26-09-2022 · Research

Academic insights into using machine learning for valuation

Machine learning (ML) mispricing models are designed to detect hidden nonlinearities that are important in predicting the fundamental value of stocks. In a recent academic paper, the authors show that ML-based mispricing models have the potential to outperform corresponding linear regression (LR) models by augmenting stylized valuation approaches such as discounted cashflow models. Thus, it is important to allow for nonlinearities and interactions in fundamental analysis.

Fundamental analysis is an approach used to determine the intrinsic or fair value of a firm, and it forms the basis for evaluating whether the company is undervalued or overvalued. Investors can potentially gain from such assessments, either by buying undervalued stocks or by selling overvalued ones, if they subscribe to the notion that a company’s share price converges to its fair value over the long run.

According to the academic literature, fundamental analysis is typically based on highly stylized valuation approaches such as discounted cashflow models which require inputs such as cashflow forecasts and discount rates. This approach is complicated by the discretion a researcher has over the choice of variables and parameters of the model.

Although these stylized models are extremely popular, explicit cash flow forecasts and discount rates are not necessarily required for fundamental analysis. For instance, an agnostic approach can estimate the fair value of a company as a linear function of its balance sheet, income statement and cashflow statement items.

To this end, Bartram and Grinblatt propose a direct approach for estimating fair values in two academic studies.1 They “take the view of a statistician with little knowledge of finance” and use LR to proxy the “peer-implied fair value” of a firm as a linear function of 21 commonly reported accounting items. They find that their signal reliably predicts future returns in the US and in most regions of the world, with the exception of the European market.
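The idea of a peer-implied fair value can be sketched as a single cross-sectional regression. This is a minimal illustration on synthetic data, not the specification used in the studies: the function name `peer_implied_fair_value`, the intercept term and the random inputs are all assumptions made for this example.

```python
import numpy as np

def peer_implied_fair_value(accounting_items, market_cap):
    """Regress market cap on accounting items across firms; return fitted values."""
    n_firms = accounting_items.shape[0]
    # Intercept column: an assumption of this sketch, not stated in the article.
    X = np.column_stack([np.ones(n_firms), accounting_items])
    coef, *_ = np.linalg.lstsq(X, market_cap, rcond=None)
    # The fitted value is what "peers" with the same accounting profile are worth.
    return X @ coef

# Synthetic stand-in for one month of data: 200 firms, 21 accounting items.
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 21))
hypothetical_weights = rng.normal(size=21)
market_cap = 10.0 + items @ hypothetical_weights + rng.normal(scale=0.5, size=200)

fair_value = peer_implied_fair_value(items, market_cap)
```

Firms whose market value sits below the fitted value would be flagged as potentially undervalued relative to their peers, and vice versa.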

Taking a data scientist approach in valuing stocks

In a recent research paper,2 Hanauer, Kononova and Rapp opt for a different approach as they “take the view of a data scientist with little knowledge of finance”. Inspired by the studies of Bartram and Grinblatt, they apply LR and ML methods to estimate the monthly fair values of stocks from 17 European countries for the period January 1993 to December 2019. Then, based on the results, they assess the return predictability of the corresponding mispricing signals, i.e., the difference between model-based fair values and actual market values.
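As a small worked example, a mispricing signal can be computed from fair values and market values as follows. Scaling the difference by market value is an assumption of this sketch (the article only states that the signal is the difference between the two), and `mispricing_signal` is a hypothetical helper name.

```python
import numpy as np

def mispricing_signal(fair_value, market_value):
    """Relative gap between model-based fair value and market value."""
    fair_value = np.asarray(fair_value, dtype=float)
    market_value = np.asarray(market_value, dtype=float)
    # Positive: the model values the stock above its market price (undervalued).
    # Negative: the model values the stock below its market price (overvalued).
    return (fair_value - market_value) / market_value

signal = mispricing_signal([120.0, 80.0, 100.0], [100.0, 100.0, 100.0])
```

In this toy case the first stock looks 20% undervalued, the second 20% overvalued and the third fairly priced.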

In their analysis, the researchers determined the fundamental values of stocks using six different approaches based on:

  • an LR model that closely followed the one set out by Bartram and Grinblatt,

  • a linear model estimated on the pooled cross section of stocks from the last 48 months, the same sample they used for the other approaches (LR pooled),

  • a model applying the least absolute shrinkage and selection operator (LASSO) to the 21 accounting variables,

  • a random forest model (RF),

  • a gradient boosting model (GBRT), and

  • a model that combines the RF and GBRT signals.

More specifically, the researchers used RF and GBRT models given that these can deal with nonlinearities and interactions, handle noisy features well, and do not require subtle tuning as is the case for more complex methods.
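To illustrate why tree-based models can help, the sketch below fits a linear regression, a random forest and a gradient boosting model to synthetic data whose target contains an interaction and a kink. It uses scikit-learn with mostly default settings; the data, the hyperparameters and the equal-weight average used as the combined model are assumptions of this example, not the setup of the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(600, 3))
# Synthetic target: an interaction (x0 * x1) plus a kink (max(x2, 0)) plus noise.
y = X[:, 0] * X[:, 1] + np.maximum(X[:, 2], 0.0) + rng.normal(scale=0.05, size=600)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

lr = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
gbrt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

pred_lr = lr.predict(X_test)
# Combined model: a simple equal-weight average (an assumption of this sketch).
pred_combo = 0.5 * (rf.predict(X_test) + gbrt.predict(X_test))

def mse(pred, target):
    """Mean squared prediction error."""
    return float(np.mean((pred - target) ** 2))

mse_lr = mse(pred_lr, y_test)
mse_combo = mse(pred_combo, y_test)
```

On data like this, the linear model cannot represent the interaction term at all, so its out-of-sample error is clearly higher than that of the combined tree-based prediction.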

ML-based signals are effective in spotting mispricing opportunities

The researchers sorted the stocks into quintile portfolios based on the various mispricing signals. They observed that all the models produced large negative (positive) mispricing signals for the first (fifth) quintile portfolios. Interestingly, the LASSO and ML signals were considerably smaller in magnitude than their LR counterparts due to the nonlinearity of their valuation models and their ability to better fit the data.
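A quintile sort of this kind can be sketched in a few lines. The helper `assign_quintiles` is hypothetical and simply splits the ranked universe into five equally sized groups; the breakpoints actually used in the paper may differ.

```python
import numpy as np

def assign_quintiles(signal):
    """Map each stock to a quintile from 1 (lowest signal) to 5 (highest signal)."""
    signal = np.asarray(signal, dtype=float)
    # Double argsort turns the signal into ranks 0..n-1 (0 = lowest signal).
    ranks = np.argsort(np.argsort(signal))
    return ranks * 5 // len(signal) + 1

signal = np.array([0.3, -0.5, 0.1, 0.9, -0.2, 0.0, 0.7, -0.8, 0.4, -0.1])
quintile = assign_quintiles(signal)
```

With the sign convention that a positive signal means the fair value exceeds the market value, quintile 5 holds the most undervalued stocks (the long leg) and quintile 1 the most overvalued ones (the short leg).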

To assess the efficacy of the models, they calculated the value-weighted and industry-adjusted monthly portfolio returns to study the relationship between the mispricing signals and subsequent monthly returns. As depicted in Figure 1, they saw that the ML approaches generated statistically and economically significant industry-adjusted return spreads, benefiting uniformly from both long and short positions. While the LR and LASSO signal spreads were significant, their economic relevance was substantially weaker, with a higher portion of their returns coming from the short leg.
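The return calculation can be illustrated with a toy example. Here the industry adjustment subtracts the value-weighted return of a stock's own industry, which is one common convention and an assumption of this sketch; the function names and the four-stock universe are hypothetical.

```python
import numpy as np

def industry_adjusted_returns(returns, market_caps, industries):
    """Subtract each industry's value-weighted return from its members' returns."""
    returns = np.asarray(returns, dtype=float)
    market_caps = np.asarray(market_caps, dtype=float)
    industries = np.asarray(industries)
    adjusted = np.empty_like(returns)
    for industry in np.unique(industries):
        mask = industries == industry
        benchmark = np.average(returns[mask], weights=market_caps[mask])
        adjusted[mask] = returns[mask] - benchmark
    return adjusted

def value_weighted_return(returns, market_caps):
    """Portfolio return with weights proportional to market capitalization."""
    return float(np.average(returns, weights=market_caps))

returns = np.array([0.02, 0.04, -0.01, 0.03])
caps = np.array([100.0, 300.0, 200.0, 200.0])
industries = np.array(["A", "A", "B", "B"])

adjusted = industry_adjusted_returns(returns, caps, industries)
long_leg = np.array([1, 3])   # hypothetical top-quintile stocks
short_leg = np.array([0, 2])  # hypothetical bottom-quintile stocks
spread = (value_weighted_return(adjusted[long_leg], caps[long_leg])
          - value_weighted_return(adjusted[short_leg], caps[short_leg]))
```

Because each return is measured relative to its industry, a spread computed this way reflects stock selection within industries rather than bets on whole sectors.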

Figure 1 | ML-based models displayed efficacy in predicting fundamental values

Source: Refinitiv, Robeco. The figure shows the annualized Fama-French six-factor alphas for long minus short quintile portfolio returns based on mispricing signals obtained from different models. The quintile portfolio returns are value-weighted and industry adjusted. The sample period is January 1993 to November 2019.

The researchers also verified the results by taking into account four different factor models. In their tests, they noted that the returns of the LR strategy were largely explained by the common factors. Similarly, the alphas for the LR (pooled) signal decreased. By contrast, the ML models delivered similar or even stronger alphas across all factor models. As such, ML methods seem to detect hidden nonlinearities that are important in predicting the fundamental value of stocks.
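A factor-model alpha is the intercept of a time-series regression of the strategy's returns on the factor returns. The sketch below shows this on simulated monthly data; the factor returns, the betas, the true alpha of 0.5% per month and the simple multiplication by 12 to annualize are all assumptions chosen for illustration.

```python
import numpy as np

def factor_alpha(strategy_returns, factor_returns):
    """Regress strategy returns on factors; return (monthly alpha, factor betas)."""
    n_months = len(strategy_returns)
    X = np.column_stack([np.ones(n_months), factor_returns])
    coef, *_ = np.linalg.lstsq(X, strategy_returns, rcond=None)
    return float(coef[0]), coef[1:]

# Simulated data: 240 months, six factors, hypothetical betas and alpha.
rng = np.random.default_rng(1)
factors = rng.normal(scale=0.03, size=(240, 6))
hypothetical_betas = np.array([0.2, -0.1, 0.3, 0.0, 0.1, -0.2])
long_short = 0.005 + factors @ hypothetical_betas + rng.normal(scale=0.01, size=240)

alpha_monthly, betas = factor_alpha(long_short, factors)
alpha_annualized = 12 * alpha_monthly  # simple annualization, an assumption
```

If a strategy's returns are largely explained by the common factors, the estimated alpha shrinks toward zero; an alpha that stays large across factor models suggests the signal captures something the factors do not.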


ML methods are expected to discover additional structure in data due to their ability to spot nonlinear patterns. Consistent with this view, this analysis shows that the portfolio spreads based on ML mispricing signals can earn large and significant alphas, and outperform corresponding LR mispricing models. These findings suggest that it is important to allow for nonlinearities and interactions in fundamental analysis.

At Robeco, we are convinced that developments in alternative data, artificial intelligence and ML are pivotal to the evolution of investing. We are currently investigating many ML applications that can potentially be of use for quantitative, fundamental and sustainable investing. Importantly, we follow a strict process when testing new variables or methods and stick to our investment philosophy that is based on robust empirical evidence, sound economic rationale and a prudent approach.



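This document was prepared by Robeco Overseas Investment Fund Management (Shanghai) Limited Company (“Robeco Shanghai”). Its content is for reference only and does not constitute advice, a professional opinion, an offer, a solicitation or an invitation by Robeco Shanghai to any person to buy or sell any product. This document should not be regarded as a recommendation to buy or sell any investment product or to adopt any investment strategy. Nothing in this document should be regarded as legal, tax or investment advice, nor does it indicate that any investment or strategy is suitable for your individual circumstances or otherwise constitute a personal recommendation to you. The information and/or analysis contained in this document has been prepared from information obtained from sources that Robeco Shanghai believes to be reliable. Robeco Shanghai makes no representation as to its accuracy, correctness, usefulness or completeness, and accepts no liability for any loss arising from the use of the information and/or analysis in this document. Neither Robeco Shanghai nor any of its affiliates, nor their directors, officers or employees, accepts any responsibility or liability for any direct or indirect loss or damage, or any other consequence, that any person incurs through reliance on the information contained in this document. This document contains forward-looking statements and forecasts concerning future business, objectives, management discipline or other matters. These statements involve assumptions, risks and uncertainties and are based on information available as of the date this document was written. Accordingly, we cannot guarantee that these forward-looking situations will occur, and actual outcomes may differ from the statements in this document. We cannot guarantee that the statistical information in this document is accurate, appropriate and complete under any particular conditions, nor that this statistical information and the assumptions on which it is based reflect the market conditions or future performance that Robeco Shanghai may encounter. The information in this document is based on current market conditions, which may well change as a result of subsequent market events or for other reasons, so the content may not reflect the latest situation. Robeco Shanghai is not responsible for updating this document or for correcting any inaccurate or omitted information in it.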