Data mining is a threat to empirical research.
Campbell R. Harvey is Professor of Finance at Duke University and Investment Strategy Advisor at Man Group, PLC. In recent years, he has warned that academic journals have a strong bias towards publishing papers with positive results and that this incentivizes quantitative researchers to engage in ‘p-hacking’. We talked to him about this issue and the serious consequences for investors.
“Editors want their journals to have the highest possible impact factor. This is based on the number of citations relating to the articles they publish, and studies that support the hypothesis being tested tend to receive more citations. Authors understand this and want to produce papers that have positive results. It is also more enjoyable to work on research that supports the hypothesis being tested, which is why researchers engage in data dredging to find results that exceed traditional levels of significance. As a result, I estimate that over 50% of all empirical studies in finance are unlikely to hold up in the future.”
“In my 2017 presidential address¹ to the American Finance Association (AFA), I detailed many of the ways that researchers engage in ‘p-hacking’ – trying to achieve the lowest possible p-value, meaning the highest level of significance. Some examples of the tools found in the p-hacker’s bag of tricks are: selective reporting of results; selective sample size; arbitrary transformations of data; arbitrary ‘winsorization’ and outlier exclusion rules; and arbitrary selection of statistical tests. P-hacking reduces the chance that any result will hold up ‘out-of-sample’.”
“No, not really, because people won’t want to publish in ‘The Journal of Non-Results’, nor are they likely to be rewarded for publishing in such a journal. Instead, I have advocated a concept called ‘Registered Reports’. Here, a researcher pitches an idea to an editor. The idea is peer reviewed. If the reviews are positive, the editor makes a commitment to publish a paper, no matter what the results are. This solves a few problems. First, it allows researchers to pursue risky research that often requires very costly data collection, which they might not otherwise embark on if they believe there is a large probability the result will be negative. Second, researchers still get to publish articles in the top journals even if they might have a negative result.”
“I have put forward four different ideas in my research. First, three papers² I co-authored argued that we need to deal head-on with the issue of multiple tests. That is, if you test 20 random factors, about one can be expected to show up as ‘significant’ purely by chance. So these papers argue that the cut-off for significance needs to be raised from two standard errors to three. Other fields, such as particle physics and genome association studies, have even higher thresholds.”
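The arithmetic behind the 20-factor example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the interview: the tests are treated as independent, every factor is assumed to be truly worthless, and t-statistics are approximated by a standard normal distribution. The function names are hypothetical.

```python
import math


def two_sided_p(t_cutoff: float) -> float:
    """Two-sided p-value for a cut-off, using a standard normal approximation."""
    return math.erfc(abs(t_cutoff) / math.sqrt(2.0))


def prob_any_false_positive(t_cutoff: float, n_tests: int) -> float:
    """Chance that at least one of n independent, truly null tests clears the cut-off."""
    p = two_sided_p(t_cutoff)
    return 1.0 - (1.0 - p) ** n_tests


n_factors = 20  # the number of random factors in the example above
for cutoff in (2.0, 3.0):
    p = two_sided_p(cutoff)
    print(
        f"cut-off {cutoff:.0f} standard errors: "
        f"expected false positives = {n_factors * p:.2f}, "
        f"P(at least one) = {prob_any_false_positive(cutoff, n_factors):.2f}"
    )
```

Under these assumptions, a cut-off of two standard errors gives roughly a 60% chance that at least one of the 20 worthless factors looks significant, while a cut-off of three brings that down to roughly 5%.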
“Second, in another paper³, I advocated a bootstrapping-based approach. Place all the factors in a spreadsheet and then strip out the mean for each one, so the average return is exactly zero. Now we have a universe of factors where each factor is false because we have hardwired a zero average return. Then create a new history by randomly resampling the rows with replacement and, when finished, calculate the average return of each of the factors. They will not be zero in this new history. Save the factor return that is the highest: this is what you get with pure chance. Repeat the exercise a thousand times and, each time, save the best return you get by chance. Look at the distribution of these best factor returns, generated from a universe of factors that we have hardwired so none are true factors, and pick off the 95th percentile. The real best factor needs to beat this 95th percentile of what you can get purely by luck.”
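The bootstrap recipe described here can be sketched as follows. This is a minimal illustration of the steps as stated in the interview, not the implementation from the paper: the function name `luck_threshold`, the synthetic data and all parameter values are assumptions made for the example. Whole rows are resampled together, so any correlation between factors within a period is preserved.

```python
import random
import statistics


def luck_threshold(returns, n_boot=1000, percentile=0.95, seed=0):
    """Percentile of the best mean return obtainable by chance.

    `returns` is a list of rows (one per period), each a list of factor returns.
    """
    rng = random.Random(seed)
    n_rows = len(returns)
    n_factors = len(returns[0])

    # Strip out each factor's mean so every factor is "false" by construction.
    means = [statistics.fmean(row[j] for row in returns) for j in range(n_factors)]
    demeaned = [[row[j] - means[j] for j in range(n_factors)] for row in returns]

    best_by_chance = []
    for _ in range(n_boot):
        # Build a new history by drawing rows at random, with replacement.
        sample = [demeaned[rng.randrange(n_rows)] for _ in range(n_rows)]
        boot_means = [
            statistics.fmean(row[j] for row in sample) for j in range(n_factors)
        ]
        # Save the best average return that pure chance produced this time.
        best_by_chance.append(max(boot_means))

    best_by_chance.sort()
    index = min(n_boot - 1, int(percentile * n_boot))
    return best_by_chance[index]


# Hypothetical data: 120 months of pure noise for 10 made-up "factors".
data_rng = random.Random(42)
fake_returns = [[data_rng.gauss(0.0, 0.04) for _ in range(10)] for _ in range(120)]

threshold = luck_threshold(fake_returns)
observed_best = max(
    statistics.fmean(row[j] for row in fake_returns) for j in range(10)
)
print(f"95th percentile of best-by-luck mean return: {threshold:.4f}")
print(f"observed best mean return: {observed_best:.4f}")
print("beats luck" if observed_best > threshold else "does not beat luck")
```

With real data, `fake_returns` would be replaced by the actual factor return history, and the best observed factor would be compared against the returned threshold.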
“Third, I propose a shrinkage-based approach. In another recent paper I co-authored, we devised a model to select factors by considering both time-series information (factor by factor) and cross-sectional information (looking across many factors). This allowed us to reduce some of the noise that inevitably forms part of realized factor returns.”
“Finally, my address⁴ to the AFA argued that it does not make any sense to continue carrying out inference in the traditional way. For example, we might have two factors with identical Sharpe ratios: one is a value factor and the other is some convolution of sunspot data. You can’t just use the Sharpe ratios, you need to add economic priors. I propose a method to haircut Sharpe ratios based on prior information.”
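To make the value-versus-sunspot example concrete, the sketch below shows one simple way a prior could change the verdict on two identical Sharpe ratios. It is not claimed to be the method proposed in the address. It assumes a minimum Bayes factor of exp(-t²/2) to update the prior odds that the factor is false, and then, as a further heuristic assumption, converts the resulting probability back into an equivalent Sharpe ratio. The function names, the 20-year sample and the prior probabilities are all hypothetical.

```python
import math
from statistics import NormalDist


def posterior_null_probability(t_stat: float, prior_null_prob: float) -> float:
    """Posterior probability that the factor is false, given a prior in (0, 1)."""
    # Assumption: minimum Bayes factor exp(-t^2 / 2), the strongest case
    # the data can make against the null under a normal model.
    min_bayes_factor = math.exp(-0.5 * t_stat**2)
    prior_odds = prior_null_prob / (1.0 - prior_null_prob)
    posterior_odds = min_bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def haircut_sharpe(annual_sharpe: float, years: float, prior_null_prob: float) -> float:
    """Sharpe ratio whose two-sided p-value equals the posterior null probability."""
    t_stat = annual_sharpe * math.sqrt(years)
    p_adj = posterior_null_probability(t_stat, prior_null_prob)
    # Heuristic assumption: treat the posterior probability like a p-value
    # and map it back to a t-statistic, then to a Sharpe ratio.
    z_adj = -NormalDist().inv_cdf(p_adj / 2.0)
    return z_adj / math.sqrt(years)


# Two factors with the same annualized Sharpe ratio over the same sample.
sharpe, years = 0.5, 20
value = haircut_sharpe(sharpe, years, prior_null_prob=0.50)    # plausible economics
sunspot = haircut_sharpe(sharpe, years, prior_null_prob=0.98)  # implausible story
print(f"value factor:   haircut Sharpe = {value:.2f}")
print(f"sunspot factor: haircut Sharpe = {sunspot:.2f}")
```

The point of the sketch is only the ordering: with identical data, the factor with the weaker economic story ends up with the larger haircut.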
“This phenomenon is prevalent in both academia and financial practice. In investment management, the worst instances of p-hacking are when someone produces a ‘good’ backtest on a smart beta strategy and some ETF provider decides to launch a product based on this. As with the academic papers, more than 50% of these so-called smart beta products will fail.”
“Firms need to be very careful in fostering the right research culture to reduce the number of false positives. For example, suppose two highly qualified researchers, A and B, propose investigating two different potential strategies. A review committee thinks both have high-quality ideas, and A and B carry out their research, which is also of equally high quality. But the data supporting A’s strategy fails to hold up, while B’s strategy works well and goes live. It would be a big mistake to reward B and/or punish A. This could encourage other researchers to engage in p-hacking. That’s why the research culture is crucial for the success of an asset management firm.”
This article was initially published in our Quant Quarterly magazine.
1 C. R. Harvey, ‘The Scientific Outlook in Financial Economics’.
2 C. R. Harvey and Y. Liu, ‘Backtesting’; C. R. Harvey and Y. Liu, ‘Evaluating Trading Strategies’; and C. R. Harvey, Y. Liu and H. Zhu, ‘… and the Cross-Section of Expected Returns’.
3 C. R. Harvey and Y. Liu, ‘Lucky factors’.
4 C. R. Harvey and Y. Liu, ‘Detecting Repeatable Performance’.