Dissecting characteristics via machine learning

Models that explain the returns of individual stocks generally use company and stock characteristics, such as market prices of financial instruments and company accounting data. These characteristics can also be used to predict expected stock returns out-of-sample. Most studies (e.g., Lewellen, 2015 or Blitz and Vidojevic, 2018) use simple linear models to form these predictions. A growing body of academic literature documents that non-linear models can improve stock return forecasts.

The main purpose of this project is to compare various machine learning methods (see e.g., Gu et al., 2018) given a large set of characteristics. Robeco Quantitative Research has access to rich historical databases that enable back-testing and evaluation of investment strategies. You will conduct a literature review and investigate the most suitable prediction methods. Once these methods are identified, the question is whether they lead to return forecasts that are superior to linear model forecasts, i.e., whether predicted returns from machine learning methods are more effective for quantitative stock selection than those from traditional linear methods. The starting point of this project could be to replicate a nonparametric model as in Freyberger et al. (2017).
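
As a rough illustration of the comparison described above, the sketch below fits a linear model and a simple non-linear model on the same characteristics and compares their out-of-sample R². Everything in it is an assumption made for illustration: the data are synthetic, the signal-to-noise ratio is far higher than in real stock returns, the function names are hypothetical, and the piecewise-linear basis is only a simplified stand-in for the spline-based approach of Freyberger et al. (2017), not their estimator.

```python
import numpy as np

def fit_ols(features, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_ols(beta, features):
    """Predictions from coefficients produced by fit_ols."""
    return np.column_stack([np.ones(len(features)), features]) @ beta

def piecewise_linear_basis(X, knots=(-0.5, 0.0, 0.5)):
    """Expand each characteristic into x and max(x - knot, 0) terms.

    The knot locations are an arbitrary illustrative choice.
    """
    cols = [X] + [np.maximum(X - k, 0.0) for k in knots]
    return np.column_stack(cols)

def oos_r2(y_true, y_pred):
    """Out-of-sample R^2 measured against a zero forecast."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)

# Synthetic data (assumption): 5 characteristics scaled to [-1, 1],
# one entering linearly and one through a non-linear (V-shaped) effect.
rng = np.random.default_rng(0)
n_obs, n_chars = 6000, 5
X = rng.uniform(-1.0, 1.0, size=(n_obs, n_chars))
y = 0.5 * X[:, 0] + (np.abs(X[:, 1]) - 0.5) + rng.normal(0.0, 0.5, n_obs)

# Treat the first 75% of rows as the estimation sample and the rest as
# the hold-out sample, mimicking an out-of-sample evaluation.
split = int(0.75 * n_obs)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Linear benchmark on the raw characteristics.
beta_lin = fit_ols(X_train, y_train)
r2_linear = oos_r2(y_test, predict_ols(beta_lin, X_test))

# Non-linear model: the same regression on the expanded basis.
B_train = piecewise_linear_basis(X_train)
B_test = piecewise_linear_basis(X_test)
beta_nl = fit_ols(B_train, y_train)
r2_nonlinear = oos_r2(y_test, predict_ols(beta_nl, B_test))

print(f"out-of-sample R2, linear:     {r2_linear:.3f}")
print(f"out-of-sample R2, non-linear: {r2_nonlinear:.3f}")
```

In this artificial setting the non-linear model should come out ahead because the data were generated with a non-linear effect; whether the same holds on real return data is exactly the question the project asks.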

Are you interested?
Let us know your motivation and send it together with your top-3 favorite internship topics, your CV and list of grades to

Blitz, Vidojevic, 2018, “The Characteristics of Factor Investing.” SSRN Working Paper

Lewellen, 2015, “The Cross-section of Expected Stock Returns.” Critical Finance Review

Freyberger, Neuhierl, Weber, 2017, “Dissecting Characteristics Nonparametrically.” NBER Working Paper

Gu, Kelly, Xiu, 2018, “Empirical Asset Pricing via Machine Learning.” Chicago Booth Research Paper