

Random forest

Random forest (RF) is a popular machine learning algorithm.1 Its simplicity and versatility make it one of the most widely used learning algorithms for both regression and classification. Its applications are as diverse as object recognition, credit risk assessment and purchase recommendations based on prior customer behavior.


In practice, the RF builds a large number of individual decision trees. A decision tree is a tree-shaped model of possible options and their respective outcomes. It is a graphical representation of an algorithm that contains only conditional control statements. Each tree is built on a random sample of observations drawn from the broader dataset. In Breiman's formulation, each split also considers only a random subset of the input variables, which makes the trees less alike.
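To make the two ideas in this paragraph concrete, here is a minimal Python sketch. The feature names, thresholds and labels in `toy_tree` are invented for illustration, and the sampling is done with replacement (a bootstrap sample), which is the usual choice but is an assumption here rather than something the text above specifies.

```python
import random


def toy_tree(book_to_price, momentum):
    """A hypothetical decision tree: nothing but conditional statements."""
    if book_to_price > 0.8:
        if momentum > 0.0:
            return "buy"
        return "hold"
    if momentum < -0.1:
        return "sell"
    return "hold"


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
dataset = list(range(10))       # stand-in for ten observations
# One random sample per tree; each tree would be fitted on its own sample.
samples = [bootstrap_sample(dataset, rng) for _ in range(3)]
```

Because the samples differ, trees fitted on them would end up with different splits, which is what gives the forest its diversity.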

The RF then aggregates the predictions of the individual trees to get a more accurate and stable prediction. Training each tree on its own random sample and then aggregating the results is known as ‘bagging’ (short for bootstrap aggregating). When the outcome is a number, for example the expected return of a given stock, the tree results are averaged. When the outcome is a class variable, for example ‘true’ or ‘false’ or a type of object, the trees take a majority vote.
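The two aggregation rules can be sketched in a few lines of Python. The per-tree predictions below are made-up numbers, and the function names are hypothetical helpers rather than part of any library.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(predictions):
    """Average the numerical predictions of the individual trees."""
    return mean(predictions)


def aggregate_classification(predictions):
    """Return the most frequent class among the trees' predictions.

    On a tie, Counter.most_common keeps the class that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Illustrative outputs of four hypothetical trees.
tree_returns = [0.02, 0.05, -0.01, 0.04]   # expected return of a stock
tree_votes = [True, False, True, True]     # a true/false outcome

forest_return = aggregate_regression(tree_returns)
forest_vote = aggregate_classification(tree_votes)
```

Here the forest's return estimate is the mean of the four tree estimates, and the vote comes out as `True` because three of the four trees agree.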

To use a simple analogy, let’s imagine someone wants to buy a car and seeks advice from friends. The first friend may ask about the type of powertrain the person may be interested in, depending on the type of intended use (long vs. short distances, daily use vs. holidays only, city vs. countryside) and may come up with a recommendation based on the answers given to these possible choices.

The second friend may ask about the desired driving experience and come up with a very different decision tree (high vs. low driving position, quiet vs. sporty). The third friend may have more of an affinity for design and would therefore ask a series of questions about the desired shape of the vehicle. And so on. In the end, the person will choose the car that was most frequently recommended.

RFs have several advantages: they limit the risk of overfitting, improve prediction accuracy and produce results that tend to remain relatively stable as datasets grow. Their main drawback is that a large number of trees can make the algorithm too slow for real-time predictions.
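The trade-off between stability and speed can be illustrated with a small simulation. This is an idealized sketch, not a real forest: each "tree" is a hypothetical stand-in that returns the true value plus independent random noise. Real trees are correlated, so the gain from adding trees is smaller in practice.

```python
import random
from statistics import pstdev


def noisy_tree_prediction(rng, true_value=1.0, noise=0.5):
    """Hypothetical stand-in for one tree: the truth plus independent noise."""
    return true_value + rng.gauss(0.0, noise)


def forest_prediction(rng, n_trees):
    """Average n_trees predictions; the cost is one evaluation per tree."""
    predictions = [noisy_tree_prediction(rng) for _ in range(n_trees)]
    return sum(predictions) / n_trees


def spread_of_forest(n_trees, n_repeats=2000, seed=0):
    """Standard deviation of the forest's prediction over repeated runs."""
    rng = random.Random(seed)
    results = [forest_prediction(rng, n_trees) for _ in range(n_repeats)]
    return pstdev(results)


spread_1 = spread_of_forest(1)      # a single tree
spread_100 = spread_of_forest(100)  # an average over 100 trees
```

Under these assumptions the 100-tree average fluctuates far less than a single tree, but every prediction now requires 100 tree evaluations instead of one, which is the cost the paragraph above refers to.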

In the asset management industry, random forest algorithms are increasingly used for a number of machine learning applications, such as forecasting stock returns2 or predicting distress risk.3

Footnotes

1 Breiman, L., 2001, “Random forests”, Machine learning, Vol. 45, No. 1, pp. 5–32.
2 See for example: Dixon, M., Klabjan, D. and Bang, J. H., 2017, “Classification-based financial markets prediction using deep neural networks”, Algorithmic Finance. See also: Khaidem, L., Saha, S. and Dey, S. R., 2016, “Predicting the direction of stock market prices using random forest”, working paper.
3 See for example: Shen, F., Liu, Y., Lan, D. and Li, Z., 2019, “A dynamic financial distress forecast model with time-weighting based on random forest”. In: Xu, J., Cooke, F., Gen, M. and Ahmed, S. (eds), “Proceedings of the twelfth international conference on management science and engineering management”.



