Using machine learning to build diversified portfolios that perform well out of sample

In finance, the covariance matrix is the most widely used tool for assessing the risk of a portfolio. There are, however, well-known shortcomings to relying solely on the covariance matrix when constructing portfolios whose risk is similar in sample and out of sample [1][2][3].

First, this approach assumes that the data used to estimate the covariance matrix is well described by a multivariate Gaussian distribution, which is often not the case in financial markets. Second, estimating a covariance matrix that is well posed for classical portfolio optimization techniques (e.g. Markowitz mean-variance) requires more observations than there are assets: 2000 assets, for example, require about 8 years of daily observations. Optimization based on poorly estimated covariances can lead to unexpectedly risky portfolio behavior in the future.
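The observation requirement can be illustrated with a small sketch. It is not part of the project material: it assumes 252 trading days per year and uses simulated i.i.d. Gaussian returns, and the function names are hypothetical. It shows that a sample covariance matrix estimated from fewer observations than assets is rank deficient, and therefore cannot be inverted in a mean-variance optimization.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252  # assumption: typical number of trading days per year


def years_of_daily_data(n_assets: int) -> float:
    """Years of daily returns needed to have as many observations as assets."""
    return n_assets / TRADING_DAYS_PER_YEAR


def sample_cov_rank(n_obs: int, n_assets: int, seed: int = 0) -> int:
    """Rank of the sample covariance matrix of simulated Gaussian returns."""
    rng = np.random.default_rng(seed)
    # Simulated daily returns with 1% volatility (illustrative data only).
    returns = rng.normal(scale=0.01, size=(n_obs, n_assets))
    # rowvar=False: rows are observations, columns are assets.
    cov = np.cov(returns, rowvar=False)
    return int(np.linalg.matrix_rank(cov))


if __name__ == "__main__":
    print(f"{years_of_daily_data(2000):.1f} years of daily data for 2000 assets")
    # Fewer observations than assets: the rank is below the number of assets.
    print("rank with 30 obs, 50 assets:", sample_cov_rank(30, 50))
    # More observations than assets: the matrix has full rank.
    print("rank with 200 obs, 50 assets:", sample_cov_rank(200, 50))
```

With 30 observations the rank cannot exceed 29, because demeaning the returns removes one degree of freedom, so the 50 x 50 matrix is singular.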

To address these shortcomings, investors have turned to factor-based risk models, which include other similarity measures such as an asset's industry, country, and region. While these have become standard market practice, they have not proven flawless either. In this internship topic, we therefore want to take the next step and investigate the potential of graph theory and machine learning techniques such as clustering [4].

The goal of this internship is to:

  • Write a short overview of classical portfolio construction techniques and research, and evaluate whether their underlying assumptions hold under real-world constraints.
  • Use (un)supervised learning techniques to identify correlated assets, and evaluate whether this information can be used to construct optimized portfolios that are equally risky in and out of sample. To do so, you will have access to real-world asset and alternative data.
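As an illustration of the second goal, identifying correlated assets with unsupervised learning could look like the minimal sketch below. It is one possible starting point, not the method prescribed for the project: the correlation-based distance, the average-linkage choice, the function names, and the synthetic two-factor data are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_assets(returns: np.ndarray, n_clusters: int) -> np.ndarray:
    """Group assets by hierarchical clustering on a correlation distance."""
    corr = np.corrcoef(returns, rowvar=False)
    # Map correlations in [-1, 1] to distances in [0, 1] (assumed metric).
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangular) distance vector.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def simulate_returns(n_obs: int = 500, seed: int = 0) -> np.ndarray:
    """Synthetic returns: two groups of three assets, one factor per group."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_obs, 2))
    loadings = np.array(
        [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
    )
    noise = 0.3 * rng.normal(size=(n_obs, 6))
    return factors @ loadings.T + noise


if __name__ == "__main__":
    labels = cluster_assets(simulate_returns(), n_clusters=2)
    print(labels)  # assets 0-2 share one label, assets 3-5 the other
```

On real data, the resulting cluster labels could then be compared in sample and out of sample to check how stable the groupings are.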

Ideally, the project results in insights that help Robeco better translate our alpha predictions into solutions tailored to clients’ risk profiles. The topic calls for a lot of creativity in exploring the latest techniques in machine learning and clustering, as well as a practical mindset for solving real-world problems.

Are you interested?
Let us know your motivation and send it together with your top-3 favorite internship topics, your CV and list of grades to


[1] Ledoit, Olivier, and Michael Wolf. "Honey, I shrunk the sample covariance matrix." The Journal of Portfolio Management 30.4 (2004): 110-119.

[2] Michaud, Richard O., and Robert Michaud. "Portfolio optimization by means of resampled efficient frontiers." U.S. Patent No. 6,003,018. 14 Dec. 1999.

[3] Chan, Louis K.C., Jason Karceski, and Josef Lakonishok. "On portfolio optimization: Forecasting covariances and choosing the risk model." The Review of Financial Studies 12.5 (1999): 937-974.

[4] López de Prado, Marcos. "Building diversified portfolios that outperform out-of-sample." The Journal of Portfolio Management (2016).