What is the promise of big data and machine learning for investors?
Pim van Vliet: “In 1965, US engineer Gordon Moore predicted that the number of transistors on a chip would double every two years. The realization of his accurate forecast – famously termed Moore’s Law – has resulted in increasing computing power. While this has facilitated the birth of quant investing in the past, it is currently enabling new modelling techniques.”
“For one, systematic and repeatable patterns, which simple linear models do not capture, can now be tested. For example, some variables might only work when they cross a certain threshold, especially when combined with other variables.”
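To make the threshold idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the two standardized variables, the cutoff of 1.0, the payoff of 0.5 and the noise level are invented, not taken from any Robeco model. It shows how a pattern that only switches on when one variable crosses a threshold and another is positive can look weak to a linear measure such as correlation, while being clearly visible once the data are split by region.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = []
for _ in range(5000):
    a = rng.gauss(0, 1)  # hypothetical variable A (standardized)
    b = rng.gauss(0, 1)  # hypothetical variable B (standardized)
    # Assumed pattern: a payoff only when A exceeds 1.0 AND B is positive.
    active = a > 1.0 and b > 0.0
    ret = (0.5 if active else 0.0) + rng.gauss(0, 1)
    rows.append((a, b, active, ret))

# Linear view: correlation between variable A and the simulated return.
linear_corr = pearson([r[0] for r in rows], [r[3] for r in rows])

# Threshold view: average return inside versus outside the active region.
inside = [r[3] for r in rows if r[2]]
outside = [r[3] for r in rows if not r[2]]
gap = sum(inside) / len(inside) - sum(outside) / len(outside)

print(f"linear correlation of A with return: {linear_corr:.3f}")
print(f"mean return gap, inside vs outside the region: {gap:.3f}")
```

In this toy setup the correlation stays small because the payoff is confined to a small corner of the data, whereas the gap in average returns between the two regions is large.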
“Most quant models are based on traditional data, with a linear factor structure. Although this approach is effective, it does not consider non-linear patterns. This means that residual alpha may be left on the table. We believe the use of alternative data sources and new modelling techniques potentially increases the opportunity set and leads to better model predictions. Thus, Moore’s Law is disrupting the world of quant investing.”
What is the biggest difference between traditional and machine learning-based quant research?
Weili Zhou: “The role of the researcher changes from instructor to orchestrator. With the traditional approach, the researcher instructs the computer to test specific rules on input data to see whether they can help predict the output. With machine learning, the researcher feeds both the input and output data to the computer for it to assess what the best rule is. This role change allows researchers to deal with more complexity. But you need to be cautious and pay attention to model explainability and overfitting.”
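The contrast between instructing and orchestrating can be sketched in a few lines. The example below is a toy under stated assumptions: the data, the function names `rule_hit_rate` and `learn_cutoff`, and the choice of a single-split search (a so-called decision stump) are all hypothetical. In the first function the researcher supplies the rule; in the second, the computer is given inputs and outputs and searches for the rule itself.

```python
def rule_hit_rate(xs, ys, cutoff):
    """Traditional route: the researcher supplies the cutoff and tests it."""
    predictions = [1 if x > cutoff else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)

def learn_cutoff(xs, ys):
    """Machine learning route: search for the cutoff that best splits the outputs."""
    points = sorted(set(xs))
    # Candidate cutoffs are the midpoints between neighbouring input values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]

    def squared_error(cutoff):
        left = [y for x, y in zip(xs, ys) if x <= cutoff]
        right = [y for x, y in zip(xs, ys) if x > cutoff]
        total = 0.0
        for side in (left, right):
            mean = sum(side) / len(side)
            total += sum((y - mean) ** 2 for y in side)
        return total

    return min(candidates, key=squared_error)

# Invented data: the output switches from 0 to 1 once the input exceeds 5.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

guess = rule_hit_rate(xs, ys, cutoff=3)   # researcher's hand-picked rule
learned = learn_cutoff(xs, ys)            # rule found from the data
print(guess, learned, rule_hit_rate(xs, ys, learned))
```

The learned cutoff fits this sample perfectly, which is exactly why the warning about overfitting matters: a rule found this way still has to be checked on data it has not seen.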
“Overfitting has always been a potential pitfall of quant investing. Many terms refer to this nemesis, such as ‘data mining’, ‘p-hacking’ and ‘factor fishing’. The issue here is that some patterns may come out as statistically significant, when in fact there is no real underlying phenomenon. The problem with overfitting is that a model will correctly explain the past, but fail when used in real-life situations.”
How do you potentially address this?
P.v.V.: “We have found that the cross-validation step in the research process can significantly reduce the risk of overfitting. Machine learning selects the model with the best predictive performance on ‘unseen data’ that is held out during the training phase. This step can be repeated continuously to recalibrate and retrain the model, so that it can adapt over time in a data-driven way.”
“Intensive sampling and resampling methods, such as neural network or random forest models, can therefore make stable predictions that also work out-of-sample. In addition to robust sampling, our proven investment philosophy, based on economic rationale and a prudent approach, helps to address this issue.”
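A minimal sketch of cross-validation as a guard against overfitting is shown below. It is an illustration only: the simulated data, the five interleaved folds and the k-nearest-neighbour model used as a stand-in are assumptions, not the models discussed in the interview. The point is that the most flexible setting (k = 1) explains the training data perfectly yet does worse on held-out folds than a smoother setting.

```python
import random

def knn_predict(train, x, k):
    """Average the outputs of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared prediction error on `test` for a model built on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in test) / len(test)

def cross_val_mse(data, k, n_folds=5):
    """Hold out each fold in turn and score the model on data it has not seen."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(train, held_out, k))
    return sum(scores) / n_folds

rng = random.Random(42)  # fixed seed so the sketch is repeatable
data = []
for _ in range(300):
    x = rng.random()
    # Assumed pattern: a step at 0.5, buried in noise.
    y = (1.0 if x > 0.5 else 0.0) + rng.gauss(0, 1)
    data.append((x, y))

candidates = [1, 5, 25]  # hypothetical model settings, from flexible to smooth
in_sample = {k: mse(data, data, k) for k in candidates}
out_of_sample = {k: cross_val_mse(data, k) for k in candidates}
best_k = min(candidates, key=out_of_sample.get)

for k in candidates:
    print(k, round(in_sample[k], 3), round(out_of_sample[k], 3))
print("selected k:", best_k)
```

For time-ordered financial data, practitioners often replace interleaved folds with splits that respect chronology, so that the model is never trained on observations that come after the ones it is tested on.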

“Next-generation quant models could result in improved predictions on returns, sustainability and risk.”
Pim van Vliet, Head of Conservative Equities and Chief Quant Strategist
And what about the implications of big data?
W.Z.: “Although it is difficult to measure, the amount of data is also roughly doubling every two years. Given this deluge, significant developments have taken place. Cloud-based data storage costs have plummeted. Highly dynamic investment approaches, such as intra-day momentum strategies, are now documented and implemented by some high-frequency traders. As described in the book Flash Boys, by Michael Lewis, some traders exploit a four-millisecond information advantage to beat market orders.”
“New and useful datasets are cropping up, such as those that house novel economic data on consumers and producers. Innovative techniques can also create new data. For example, natural language processing (NLP) can be used to analyze company filings, central bank statements or earnings call transcripts, and measure investor sentiment.”
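As a rough illustration of how text can be turned into a number, the sketch below scores a sentence by counting words from two small lists. The word lists, the function name `sentiment_score` and the sample sentence are all hypothetical, and real NLP pipelines rely on far richer language models than a word count; this is only meant to show the basic idea of creating new data from text.

```python
import re

# Hypothetical word lists, invented for this sketch.
POSITIVE = {"growth", "strong", "improved", "beat"}
NEGATIVE = {"decline", "weak", "impairment", "loss"}

def sentiment_score(text):
    """Return a score in [-1, 1]: +1 if only positive words occur, -1 if only negative."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

sample = "Revenue growth was strong, although margins were weak."
score = sentiment_score(sample)  # two positive words, one negative word
print(round(score, 3))
```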
“Also, macroeconomic data, which get reported with a few weeks’ delay, can be predicted in real-time with nowcasting using daily data points such as mobility indicators. Such developments are expanding the data available, likely creating alpha opportunities.”
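The nowcasting idea can be sketched as a two-step calculation: fit a simple relationship between a timely indicator and the delayed official figure on past months, then apply it to the daily readings available so far in the current month. All numbers, the variable names and the choice of a straight-line fit are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Invented history: monthly average of a daily mobility index, and the
# official growth figure that was published weeks after each month ended.
past_mobility = [90.0, 100.0, 110.0]
past_growth = [1.8, 2.0, 2.2]
slope, intercept = fit_line(past_mobility, past_growth)

# Daily readings observed so far in the current, not yet reported, month.
daily_so_far = [104.0, 106.0]
current_level = sum(daily_so_far) / len(daily_so_far)

nowcast = intercept + slope * current_level
print(round(nowcast, 2))
```

The estimate can be refreshed each day as new readings arrive, well before the official figure is published.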

What does next-generation quant investing look like?
P.v.V.: “Next-generation quant models could result in improved predictions on returns, sustainability and risk. Starting with the latter, risk is often considered to be somewhat easier to predict than return. And yes, risk is often non-linear. For example, leverage might be fine up to a certain threshold, but risk could rise disproportionately beyond this level. With next-generation models, we are seeing promising results in this area that can help us better predict stock crashes, on top of existing risk measures.”
“Also, besides absolute risk, forecasts on relative risk can be improved too. Industry classification is a standard way to group stocks, but next-generation models can better cluster similar securities beyond sectors, which should lead to better relative risk control.”
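One simple way to group securities by behaviour rather than by label is to cluster them on how their returns move together. The sketch below is a toy under stated assumptions: the four tickers, their sector labels, the return series, the 0.8 correlation cutoff and the greedy grouping rule are all hypothetical, and real clustering methods are considerably more sophisticated.

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equally long return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def cluster_by_correlation(returns, cutoff=0.8):
    """Greedy grouping: join the first cluster whose first member is highly correlated."""
    clusters = []
    for name, series in returns.items():
        for cluster in clusters:
            if pearson(returns[cluster[0]], series) >= cutoff:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no close match, so start a new cluster
    return clusters

# Invented sector labels and return series for four hypothetical stocks.
sectors = {"A": "Tech", "B": "Retail", "C": "Tech", "D": "Retail"}
returns = {
    "A": [0.01, -0.02, 0.03, -0.01, 0.02],
    "B": [0.02, -0.03, 0.05, -0.02, 0.03],
    "C": [-0.01, 0.02, -0.02, 0.01, -0.03],
    "D": [-0.02, 0.03, -0.03, 0.02, -0.04],
}

clusters = cluster_by_correlation(returns)
print(clusters)
```

In this invented example, the return-based groups cut across the sector labels: stocks A and C share a sector but land in different clusters.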
W.Z.: “In terms of sustainability, climate, ESG or SDG objectives can be efficiently implemented in a quant portfolio. Moreover, sustainability data are expanding quickly and improving in quality. Although the data are often backward-looking, nowcasting can be useful in predicting sustainability trends.”
“For example, it can be used to forecast which firms will be most effective at decarbonizing their businesses. New sustainability data can also be created using new techniques. This can be achieved by decomposing the capital expenditure and R&D of firms, or by classifying and linking firms to the 17 SDGs based on company profiles using NLP.”
P.v.V.: “Moving on to returns, the pursuit of alpha is the most challenging goal. Unlike risk and sustainability, returns are very unstable. Still, the breadth and depth of return databases are increasing. Datasets are going further back in time and are available at a higher frequency. This should increase the likelihood of successfully exploring and potentially exploiting repeating, non-linear patterns. Moreover, next-generation techniques also confirm the strength of traditional styles such as momentum interacting with low volatility.”
“Finally, next-generation quant strategies may be used to find alpha sources that are uncorrelated to existing factors. For instance, short-term signals tend to be uncorrelated, simply because they change frequently. These alpha signals can be captured and possibly exploited, although trading costs may limit potential capacity. Therefore, forecasting and reducing trading costs, another application of next-generation models, will also become more important when predicting short-term timing signals.”
Why is quant investing well equipped to handle changing investment landscapes?
W.Z.: “Quant investing has never been very dogmatic or normative and has evolved considerably over the last decades. In the past, we have seen many quant investors update their beliefs based on new evidence, for example by including low volatility and sustainability variables in their strategies. With more data and better techniques available, we believe quant investors will continue to learn and adapt to changing landscapes.”
Important information
This information is for informational purposes only and should not be construed as an offer to sell or an invitation to buy any securities or products, nor as investment advice or recommendation. The contents of this document have not been reviewed by the Monetary Authority of Singapore (“MAS”). Robeco Singapore Private Limited holds a capital markets services license for fund management issued by the MAS and is subject to certain clientele restrictions under such license. An investment will involve a high degree of risk, and you should consider carefully whether an investment is suitable for you.