The research culture is crucial for the success of an asset manager

04-12-2017 | Interview

Data mining is a threat to empirical research. Campbell R. Harvey is Professor of Finance at Duke University and Investment Strategy Advisor at Man Group, PLC. In recent years, he has warned that academic journals have a strong bias towards publishing papers with positive results and that this incentivizes quantitative researchers to engage in ‘p-hacking’. We talked to him about this issue and the serious consequences for investors.

Speed read

  • Many empirical studies are unlikely to hold up in the future
  • Investors ought to be selective when choosing a strategy
  • Asset managers should foster the right research culture

Could you explain the concept of ‘p-hacking’?

“Editors want their journals to have the highest possible impact factor. This is based on the number of citations relating to the articles they publish, and studies that support the hypothesis being tested tend to receive more citations. Authors understand this and want to produce papers that have positive results. It is also more enjoyable to work on research that supports the hypothesis being tested, which is why researchers engage in data dredging to find results that exceed traditional levels of significance. As a result, I estimate that over 50% of all empirical studies in finance are unlikely to hold up in the future.”

‘Over 50% of all empirical studies in finance are unlikely to hold up in the future’

“In my 2017 presidential address1 to the American Finance Association (AFA), I detailed many of the ways that researchers engage in ‘p-hacking’ – trying to achieve the lowest possible p-value, meaning the highest level of significance. Some examples of the tools found in the p-hacker’s bag of tricks are: selective reporting of results; selective sample size; arbitrary transformations of data; arbitrary ‘winsorization’ and outlier exclusion rules; and arbitrary selection of statistical tests. P-hacking reduces the chance that any result will hold up ‘out-of-sample’.”

Would it make sense to start a journal in support of the null hypothesis, to make sure the less significant, or even negative, results are also properly reported, as long as they are interesting from a research perspective?

“No, not really, because people won’t want to publish in ‘The Journal of Non-Results’, nor are they likely to be rewarded for publishing in such a journal. Instead, I have advocated a concept called ‘Registered Reports’. Here, a researcher pitches an idea to an editor. The idea is peer reviewed. If the reviews are positive, the editor makes a commitment to publish a paper, no matter what the results are. This solves a few problems. First, it allows researchers to pursue risky research that often requires very costly data collection, which they might not otherwise embark on if they believe there is a large probability the result will be negative. Second, researchers still get to publish articles in the top journals even if they might have a negative result.”


What would you suggest in order to improve the quality of the research carried out on factor investing?

“I have put forward four different ideas in my research. First, three papers2 I co-authored argued that we need to deal head on with the issue of multiple tests. That is, if you test 20 random factors at the usual two-standard-error cut-off, about one is expected to show up as ‘significant’ purely by chance. So these papers argue that the cut-off for significance needs to be raised from two standard errors to three. Other fields, such as particle physics and genome association studies, have even higher thresholds.”
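The arithmetic behind the ‘20 random factors’ remark can be sketched in a few lines of Python. This is only an illustration: it assumes the 20 tests are independent and that the test statistics are standard normal under the null, and the function names are ours rather than anything taken from the papers cited.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided tail probability of a standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def expected_false_positives(n_tests: int, z: float) -> float:
    """Expected number of purely random factors that clear the cut-off z."""
    return n_tests * two_sided_p(z)


def prob_any_false_positive(n_tests: int, z: float) -> float:
    """Chance that at least one of n independent random factors clears z."""
    return 1.0 - (1.0 - two_sided_p(z)) ** n_tests


if __name__ == "__main__":
    # Compare the traditional cut-off (2) with the stricter one (3).
    for cutoff in (2.0, 3.0):
        print(
            f"cut-off {cutoff}: "
            f"expected false positives = {expected_false_positives(20, cutoff):.3f}, "
            f"P(at least one) = {prob_any_false_positive(20, cutoff):.3f}"
        )
```

Under these assumptions, moving the cut-off from two to three standard errors cuts the expected number of chance ‘discoveries’ among 20 random factors from close to one to well below one tenth.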

“Second, in another paper3, I advocated a bootstrapping-based approach. Place all the factors in a spreadsheet and then strip out the mean for each one, so the average return is exactly zero. Now we have a universe of factors where each factor is false because we have hardwired a zero average return. Then create a new history by randomly sampling rows with replacement and, when finished, calculate the average returns of each of the factors. They will not be zero in this new history. Save the factor return that is the highest: this is what you get with pure chance. Repeat the exercise a thousand times and, each time, save the best return you get by chance. Look at the distribution of these best factor returns, generated from a universe of factors that we have hardwired so none are true factors, and pick off the 95th percentile. The real best factor needs to beat this 95th percentile of what you can get purely by luck.”
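The steps described above can be sketched roughly as follows. This is a minimal illustration of the procedure as it is told in the interview, not the implementation from the cited paper: the function name, the NumPy-based layout (rows are periods, columns are factors) and the synthetic returns in the demo are all assumptions made for the example.

```python
import numpy as np


def best_by_luck_threshold(returns, n_boot=1000, percentile=95.0, seed=0):
    """Percentile of the best average return obtainable by chance alone.

    returns: 2-D array, one row per period and one column per factor.
    """
    returns = np.asarray(returns, dtype=float)
    n_periods = returns.shape[0]

    # Strip out each factor's mean so every factor is 'false' by construction.
    demeaned = returns - returns.mean(axis=0)

    rng = np.random.default_rng(seed)
    best = np.empty(n_boot)
    for b in range(n_boot):
        # New history: draw whole rows with replacement, so all factors
        # share the same resampled periods.
        rows = rng.integers(0, n_periods, size=n_periods)
        # Save the highest average return across factors in this history.
        best[b] = demeaned[rows].mean(axis=0).max()

    return float(np.percentile(best, percentile))


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): 240 months, 50 factors,
    # none of which has a true premium.
    demo_rng = np.random.default_rng(42)
    factor_returns = demo_rng.normal(0.0, 0.03, size=(240, 50))

    threshold = best_by_luck_threshold(factor_returns)
    observed_best = factor_returns.mean(axis=0).max()
    print(f"best observed average return: {observed_best:.4f}")
    print(f"95th percentile of lucky best: {threshold:.4f}")
    print("beats luck" if observed_best > threshold else "consistent with luck")
```

Resampling whole rows, rather than each column separately, keeps whatever correlation exists between factors in each simulated history.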

“Third, I propose a shrinkage-based approach. In another recent paper I co-authored, we devised a model to select factors by considering both time-series information (factor by factor) and cross-sectional information (looking across many factors). This allowed us to reduce some of the noise that inevitably forms part of realized factor returns.”

“Finally, my address4 to the AFA argued that it does not make any sense to continue carrying out inference in the traditional way. For example, we might have two factors with identical Sharpe ratios: one is a value factor and the other is some convolution of sun spot data. You can’t just use the Sharpe ratios, you need to add economic priors. I propose a method to haircut Sharpe ratios based on prior information.”
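To see why identical statistics need not lead to identical conclusions, here is a small sketch that combines a test statistic with prior odds using a minimum Bayes factor, exp(-t²/2). It is an illustrative device only and should not be read as the Sharpe ratio haircut method mentioned above. The t-statistic and the two prior probabilities are hypothetical numbers chosen for the example.

```python
import math


def min_bayes_factor(t_stat: float) -> float:
    """Lower bound on the Bayes factor in favour of the null: exp(-t^2 / 2)."""
    return math.exp(-0.5 * t_stat ** 2)


def posterior_prob_null(t_stat: float, prior_prob_true: float) -> float:
    """Lower bound on the posterior probability that the factor is false.

    prior_prob_true: prior probability (strictly between 0 and 1) that the
    factor is genuine, set from economic reasoning before seeing the data.
    """
    prior_odds_null = (1.0 - prior_prob_true) / prior_prob_true
    posterior_odds_null = min_bayes_factor(t_stat) * prior_odds_null
    return posterior_odds_null / (1.0 + posterior_odds_null)


if __name__ == "__main__":
    t_stat = 2.5  # hypothetical: both factors show the same evidence
    # Hypothetical priors: an economically motivated value factor versus
    # a factor built from sunspot data.
    for name, prior in (("value", 0.50), ("sunspots", 0.02)):
        print(f"{name}: P(false | data) >= {posterior_prob_null(t_stat, prior):.3f}")
```

With the same evidence, the factor that starts with a weak economic prior remains far more likely to be a false discovery than the one with a plausible economic story.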

How serious is the p-hacking problem for investors?

“This phenomenon is prevalent in both academia and financial practice. In investment management, the worst instances of p-hacking are when someone produces a ‘good’ backtest on a smart beta strategy and some ETF provider decides to launch a product based on this. As with the academic papers, more than 50% of these so-called smart beta products will fail.”

“Firms need to be very careful in fostering the right research culture to reduce the number of false positives. For example, suppose two highly qualified researchers, A and B, propose investigating two different potential strategies. A review committee thinks both have high-quality ideas, and A and B carry out research that is of equally high quality. But the data supporting A’s strategy fails to hold up, while B’s strategy works well and goes live. It would be a big mistake to reward B and/or punish A. This could encourage other researchers to engage in p-hacking. That’s why the research culture is crucial for the success of an asset management firm.”

This article was initially published in our Quant Quarterly magazine.

1 C. R. Harvey, ‘The Scientific Outlook in Financial Economics’.
2 C. R. Harvey and Y. Liu, ‘Backtesting’; C. R. Harvey and Y. Liu, ‘Evaluating Trading Strategies’; and C. R. Harvey, Y. Liu and H. Zhu, ‘… and the Cross-Section of Expected Returns’.
3 C. R. Harvey and Y. Liu, ‘Lucky factors’.
4 C. R. Harvey and Y. Liu, ‘Detecting Repeatable Performance’.


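This document was prepared for informational purposes. It is either an English-language document produced by Robeco Institutional Asset Management B.V. or a translation of that document by Robeco Japan Company Limited. It is not intended as a solicitation or recommendation to buy or sell any individual financial product mentioned in it. The information it contains is believed to be sufficiently reliable, but its accuracy and completeness are not guaranteed. Opinions and forecasts reflect our judgment as of the date of preparation and may change without notice. Investment performance, market trends, opinions and similar information relate to a single past point in time or a past period, and past performance does not guarantee or suggest future investment results. The investment policies and strategies described may not be suitable for all investors. This document is not intended to provide legal, tax or accounting advice.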




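Trade name: Robeco Japan Company Limited, Financial Instruments Business Operator, Director-General of the Kanto Local Finance Bureau (Kinsho) No. 2780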

Member association: Japan Investment Advisers Association