Quantitative investing

LASSO regression

LASSO is an acronym that stands for ‘least absolute shrinkage and selection operator’. It is associated with a machine learning technique – LASSO regression – that performs both shrinkage and variable selection to simplify linear regression models and prevent overfitting.


[Figure: LASSO regression formula]

Where

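λ is the amount of shrinkage, or penalty
λ = 0 implies that all features are considered, as no parameters are eliminated (equivalent to a standard linear regression)
λ = ∞ implies that no feature is considered, as all coefficients are shrunk to zero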
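
For reference, the LASSO estimate is commonly written as the solution to the penalized least-squares problem below. Apart from λ, the notation is assumed here for illustration: y_i is the dependent variable for observation i, x_ij is the value of explanatory variable j for that observation, β_0 is the intercept, β_j is the coefficient of variable j, and n and p are the numbers of observations and explanatory variables.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}
    \left\{
      \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

The first term is the usual sum of squared residuals of a linear regression. The second term is the penalty on the absolute size of the coefficients, and it is the use of absolute values that allows some coefficients to become exactly zero.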

A linear regression allows you to determine if there is a relationship between variables. For example, it can quantify the relationship between a dependent variable (crop yields) and explanatory variables (soil fertility, temperature, water quality, etc.). But in cases where there are many candidate variables to explain crop yields, the statistical model can become complex and difficult to process.

The LASSO regression is helpful in such instances as it can select variables based on their importance. This is achieved through a process called shrinkage, which imposes a penalty that reduces the absolute size of the regression coefficients. Although reduced in magnitude, the coefficients of the most important variables will remain material, while those of the less important variables will be close to zero or exactly zero.
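
The shrinkage effect described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation used on any particular platform: the data are synthetic, the penalty value of 0.3 is arbitrary, there is no intercept, and the solver is a plain coordinate descent (one common way to fit a LASSO) applied to the squared error scaled by 1/(2n), which only rescales λ.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 if |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * sum of squared residuals + lam * sum(|beta_j|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


random.seed(0)
n_obs = 200
# Synthetic (hypothetical) data: five candidate features, but only the
# first two actually drive the response.
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.5) for row in X]

beta_unpenalised = lasso_coordinate_descent(X, y, lam=0.0)  # plain least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=0.3)        # with shrinkage

print("lambda = 0.0:", [round(b, 3) for b in beta_unpenalised])
print("lambda = 0.3:", [round(b, 3) for b in beta_lasso])
```

In this setup, the two informative features should keep sizeable coefficients that are smaller in magnitude than their unpenalized counterparts, while the three irrelevant features should be set exactly to zero.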

Through this process, it identifies which variables to keep and which ones to exclude, based on the size of their coefficients. Using our example, the technique would gradually select the variables which best predict crop yields, beginning with the most important one before working its way through the list. At some point, adding more variables would no longer improve the model's prediction accuracy enough to justify the added complexity.
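
The order in which variables are selected can be sketched for a simplified special case. Assuming standardized, uncorrelated explanatory variables (an orthonormal design) and the squared error scaled by one half, the LASSO coefficient of each variable is its least-squares coefficient shrunk toward zero by λ, and set to zero if it is smaller than λ in absolute value. The coefficient values and the `wind_speed` variable below are hypothetical, chosen only to mirror the crop yield example.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 if |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical least-squares coefficients for standardized, uncorrelated features.
ols_coefficients = {
    "soil_fertility": 2.0,
    "temperature": -1.2,
    "water_quality": 0.5,
    "wind_speed": 0.1,
}


def selected_features(penalty):
    """Names of the features whose LASSO coefficient is non-zero at this penalty."""
    return [
        name
        for name, coefficient in ols_coefficients.items()
        if soft_threshold(coefficient, penalty) != 0.0
    ]


# Relax the penalty step by step: the most important variable enters first.
for penalty in (2.5, 1.5, 1.0, 0.3, 0.0):
    print(f"lambda = {penalty}: {selected_features(penalty)}")
```

As the penalty is relaxed from 2.5 to 0, the variables enter one at a time in order of the absolute size of their coefficients. With correlated variables, the path is generally more involved and has to be computed numerically.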

Therefore, the technique allows you to simplify a model by reducing the number of parameters in a regression and filtering out potential noise in the data. It also enables you to guard against overfitting by eliminating variables with little explanatory power, potentially making the model more robust across different datasets. Additionally, it can help with models that suffer from high multicollinearity, as it can choose between correlated explanatory variables.

In general, the LASSO regression is a basic machine learning (ML) technique that can be used for many applications. It is essentially a standard linear regression with a slight twist: the penalty on the size of the coefficients. Unlike more sophisticated ML techniques, however, it is not able to pick up non-linear relationships between variables.

For our quant investing platform, it has the potential to help fine-tune models by assisting us with variable selection. For instance, we have used it to select company characteristics that have linear predictive value for risk and returns. We have also used it to identify which industries lead or lag others in terms of returns.


See also

Capital Asset Pricing Model
Random forest
Efficient (advanced) approach