03-01-2023 • Interview

'Machine learning models can spot interesting interactions'

Buzzwords such as ‘alternative data’, ‘machine learning’ and ‘natural language processing’ are quickly becoming part of the jargon used by asset managers. We uncover what these mean for the Robeco Quant Team in our discussion with Quant Researcher Clint Howard.

Authors

Investment Specialist

Summary

We hunt for alternative datasets that we can use to either validate or refute our economic intuition
Machine learning provides quant investors with an extra toolkit to study economic problems
Natural language processing can allow quant investors to go to previously unexplored places

The growing prominence of big data is widening the scope for quant strategies. So, given the multitude of new alternative datasets cropping up, how do you select which ones to use?

“Our research initiatives are premised on ideas that are driven by fundamental economic reasons. As quant investors, we have traditionally used financial statement and market data to conduct such research. Now with the deluge of alternative datasets, we have additional information that we can use and different ways to study our ideas. That said, it is important to be discerning about which datasets can add value.”

“Because we deliberately focus on the economic rationale behind our ideas before selecting data sources (whether alternative or traditional), we can be quite selective in picking the datasets that we believe will actually answer the questions we are studying. If you do not start with the economic principles, you risk choosing ill-suited datasets, which can lead to an overfitted model with weaker predictive power.”

“For example, big text data such as broker reports, company announcements and news filings are a rich treasure trove given the large volumes of data available. But these data sources only add value to our process if we can use them to research the economic intuition behind our market observations or hypotheses. Alternative datasets are, therefore, a means to an end, but not the be-all and end-all.”

Data vendors can offer the same datasets to competing asset managers. So how does the Robeco Quant Team gain unique insights?

“This is true: data vendors market and sell their datasets to several asset managers, as that is the nature of their business. So if investors simply plug the data into their models or strategies in the form they receive it, they run the risk of falling prey to alpha decay and crowding, as their peers can easily do the same thing.”

“There are a few ways to address this. An approach we favor is sourcing datasets that are as raw as possible, with minimal alterations by the vendor. This allows us to transform the granular data to suit the economic problems we are trying to study, incorporating our unique insights and domain knowledge and thereby differentiating our use of the data from that of our competitors.”

“It is important to stress again that we always start any research we do based on economic intuition. This means that we have a sensible idea about why something might work. Only then do we hunt for the datasets that we can use to either validate or refute our intuition. By following this approach, we believe the possibility of using a dataset in exactly the same manner as another asset manager diminishes.”

What can we do with machine learning (ML) that was not easy to do before?

“For decades, standard linear modeling has been the go-to approach in quant models and has laid the foundation for the success achieved by the investment style over the years. These models essentially impose linear relationships between variables, from which patterns can be deduced to establish alpha signals, risk models or portfolio construction algorithms, for example.”

“ML provides quant investors with an extra toolkit to study economic problems (or reveal such patterns). This flexible and powerful framework – through techniques such as neural networks and random forests – can uncover nonlinear relationships between variables, as well as how variables interact with each other. This can give quant investors additional insight for signal construction.”
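To make the linear-versus-interaction point concrete, here is a minimal Python sketch (standard library only, with entirely made-up toy data) in which the target depends only on the interaction of two features. A linear model in the raw features finds no relationship at all, while the same fit on the product term recovers it. In practice a random forest or neural network can learn such an interaction from the data without the analyst specifying the product term; adding it by hand here simply keeps the example dependency-free.

```python
from itertools import product
from statistics import fmean

# Hypothetical toy data: every combination of two standardized features
# taking the values -1 or +1 (a balanced grid, so the features are uncorrelated).
grid = list(product([-1.0, 1.0], repeat=2))
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]

# The target depends ONLY on the interaction of the two features.
y = [a * b for a, b in grid]


def ols_slope(x: list[float], y: list[float]) -> float:
    """Univariate OLS slope: cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


# On this balanced grid x1, x2 and x1 * x2 are mutually orthogonal, so each
# univariate slope equals the corresponding multiple-regression coefficient.
linear_fit = {"x1": ols_slope(x1, y), "x2": ols_slope(x2, y)}
interaction_fit = ols_slope([a * b for a, b in grid], y)

print(linear_fit)       # {'x1': 0.0, 'x2': 0.0}
print(interaction_fit)  # 1.0
```

The linear model's zero coefficients do not mean the features are irrelevant, only that their effect is invisible unless the model can represent how they combine.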

“For example, ML models can spot interesting interactions, such as the one between newsflow and stock-price reversals. One pattern observed in markets is that when a firm’s share price rises (or falls) by a large margin, it tends to reverse afterwards. Interestingly, we find that this reversal phenomenon is affected by the level of abnormal newsflow related to the stocks in question.”

“Specifically, if there has been more newsflow than usual on a stock around the time its share price rallies or sinks, the move does not tend to reverse. The intuition is that there has probably been a genuine reaction to a change in fundamentals if a lot of news has covered a recent event. But in the absence of significant newsflow, we do tend to see the reversal pattern, suggesting that the initial move was probably driven by noise rather than fundamentals. These kinds of insights are really interesting for us.”
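Once identified, the newsflow-conditioned reversal pattern can be written down as a simple rule. The sketch below is a hypothetical, hand-coded illustration rather than Robeco's model: it assumes abnormal newsflow is measured as the ratio of a stock's current news count to its trailing average, and that a reversal bet is taken only when that ratio is below an arbitrary threshold of 1.5. The tickers and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One stock over one formation period (all fields are hypothetical)."""
    ticker: str
    past_return: float     # price move over the period, as a decimal
    news_count: int        # news items about the stock over the same period
    avg_news_count: float  # trailing average news count: the "normal" level


def abnormal_news(obs: Observation) -> float:
    """Ratio of current to normal newsflow; values above 1 mean more news than usual."""
    return obs.news_count / max(obs.avg_news_count, 1e-9)


def reversal_signal(obs: Observation, threshold: float = 1.5) -> float:
    """Expected-reversal score: positive means 'expect the price to rise'."""
    if abnormal_news(obs) >= threshold:
        # Heavy newsflow: treat the move as a likely reaction to fundamentals.
        return 0.0
    # Quiet move: bet that it was noise and will partly reverse.
    return -obs.past_return


universe = [
    Observation("AAA", 0.20, 3, 4.0),    # big rally, little news
    Observation("BBB", 0.20, 12, 4.0),   # big rally, three times the usual news
    Observation("CCC", -0.15, 2, 3.0),   # big drop, little news
]
for obs in universe:
    print(obs.ticker, reversal_signal(obs))
# AAA -0.2
# BBB 0.0
# CCC 0.15
```

In the interview the point is that an ML model surfaced this interaction from the data; a hand-coded rule like this is only one way to express such a finding afterwards.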

And why now?

“ML, specifically neural networks, has been around since the 1940s, but there are two main reasons why the concept has only taken off more recently. The first is computational power. To put this in context, it would have taken several months to run the simplest ML model on the fanciest IBM or Bell Labs research computer back in the day. The turning point came in the 2000s, when we witnessed exponential growth in computational power, facilitating the rise of applied ML research to solve real-world problems.”

“The second reason relates to data, as ML models require a lot of it for training. The advent of big data and increasing ease of access – largely due to cloud computing – has been helpful. You can find data on just about anything these days, and this has propelled research on ML applications given the increased scope for training. Luckily for us in finance, we also get to benefit from the initial applied ML research done by computer scientists.”

What do you think of the notion that ML models are black boxes?

“If you had asked me this five to ten years ago, I would have said it was a fair statement, because back then there was a lot of hype around the results ML techniques were producing but little attention paid to what lay under the hood. Since then, there have been significant advances on this front – such as the development of the Explainable AI (XAI) toolkit – that allow users to better understand the predictions made by ML models.”

“For example, Shapley values are an XAI method that allows us to interpret ML models by analyzing the relationship between model inputs and outputs: how the different variables contribute to predicted outcomes, how the variables interact, and so on. This level of understanding is in line with our investment philosophy that all our ideas need to be supported by an economic rationale. These tools allow us to check whether ML models make decisions that are in line with our economic intuition.”
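Shapley values attribute a prediction to its inputs by averaging each feature's marginal contribution over all subsets (coalitions) S of the other features, weighted by |S|!(n − |S| − 1)!/n!. The minimal sketch below computes them exactly for a hypothetical two-feature reversal model, assuming that features outside a coalition are set to a baseline value. Exact enumeration grows exponentially with the number of features, which is why practical tools such as the open-source `shap` library rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction model(x).

    Features outside a coalition are set to their baseline value, one simple
    convention among several for 'removing' a feature.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_reversal_model(features):
    """Hypothetical model: bet on reversal unless there was heavy news."""
    past_return, heavy_news = features  # heavy_news is 0.0 or 1.0
    return -past_return * (1.0 - heavy_news)


x = [0.20, 1.0]        # big rally accompanied by heavy newsflow
baseline = [0.0, 0.0]  # no price move, normal newsflow
phi = shapley_values(toy_reversal_model, x, baseline)
print([round(v, 6) for v in phi])  # [-0.1, 0.1]
```

Here the rally on its own pushes the prediction toward a reversal bet (−0.1), while the heavy newsflow cancels it (+0.1). The attributions sum to the difference between the prediction and the baseline prediction (the efficiency property), and their signs can be checked against the economic intuition described in the interview.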

“That said, the bar for us to use ML models in our strategies is high given their complex nature. We have to be comfortable that we understand how they work, that they behave in the way that we would expect them to, and that they add value on top of our existing models. Without such XAI tools that transform ML models into ‘glass boxes’, we probably would not be able to explore the possibilities offered by ML.”

Natural language processing (NLP) has attracted a lot of attention in recent years. What are some interesting applications of NLP?

“NLP is a toolkit that can be used to analyze spoken words and text. This is quite exciting for us quant investors as it allows us to go to previously unexplored places. To put this in context, fundamental equity analysts examine broker research notes, analyze company reports, review news releases and meet with management teams, among other things. Using their expertise, they glean insights by reading between the lines. Quant investors can now potentially perform similar tasks with NLP techniques such as sentiment analysis.”

“For example, this allows us to scrutinize how brokers view a company based on how they write about it in their reports, enables us to analyze news sentiment based on the language used in articles pertaining to specific firms, and gives us the tools to assess the mood within a company based on the language used by its executives at press conferences compared to earnings calls. Moreover, this can be done swiftly across thousands of stocks. And this is just one of the many ways in which NLP can be used within quant models.”
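A very simple form of sentiment analysis counts positive and negative words. The sketch below is a dictionary-based toy, assuming hypothetical five-word lists and a net-sentiment score of (positive − negative)/(positive + negative); production systems typically rely on much larger finance-specific dictionaries or trained language models. It compares the tone of two made-up snippets, mirroring the press-conference versus earnings-call comparison described above.

```python
import re

# Hypothetical toy word lists: real applications use far larger,
# finance-specific dictionaries or trained language models.
POSITIVE = {"strong", "growth", "beat", "improved", "confident"}
NEGATIVE = {"weak", "decline", "miss", "impairment", "uncertain"}


def net_sentiment(text: str) -> float:
    """(positive - negative) / (positive + negative); 0.0 if no sentiment words."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


# Made-up snippets standing in for two venues where the same executives speak.
transcripts = {
    "press_conference": "We are confident in strong growth and improved margins.",
    "earnings_call": "Growth was strong, but demand remains uncertain and margins saw a decline.",
}
scores = {venue: net_sentiment(text) for venue, text in transcripts.items()}
gap = scores["press_conference"] - scores["earnings_call"]

print(scores)  # {'press_conference': 1.0, 'earnings_call': 0.0}
print(gap)     # 1.0
```

A noticeably more upbeat tone in one venue than in the other is the kind of gap such an analysis might flag; the score itself is only as good as the word lists behind it, and running it across thousands of stocks is simply a loop over more documents.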

But what if company executives adapt their use of words to circumvent this?

“This is classic game theory. In this scenario, quant investors start off by building NLP models to analyze the language used by executives. When the executives catch on, they change their communication style to disguise their sentiment. But everything comes full circle, as quant investors can retrain their NLP models to pick up on the changes, until the executives make further tweaks to how they convey their messages.”

“This iterative loop illustrates the idea that if you want to innovate, you need to innovate constantly. It is not only our competitors that will try to keep up with us or forge ahead, but also the companies we invest in. This means we need to continuously update and improve the way we conduct our research and implement our strategies.”

Given the promising prospects of alternative data and advanced techniques, many asset managers are investigating and applying these techniques. What distinguishes Robeco’s approach?

“We were very deliberate in how we approached the incorporation of alternative data and advanced techniques into our research and strategies. We focused firstly on laying the foundations by heavily investing in the infrastructure. We wanted to ensure that we would be able to use these datasets and tools in a robust and repeatable manner, while also being able to seamlessly integrate ML or NLP models into new or existing strategies.”

“We were aware of the risk of spending valuable hours on research and on building ML and NLP models, only to be thwarted by the complexities of implementing those models in practice. As a result of our initial investment, the production lead time to deploy new ML and NLP research in our strategies is relatively short.”

“I believe this gives us a competitive edge, as setting up state-of-the-art infrastructure requires a lot of resources, technical expertise and time to see it through to completion. After three or so years of hard work on this project, we are proud of the results and can fully focus on our research pipeline and on implementing our best ideas. This started last year with the inclusion in our strategies of a distress risk ML model that forecasts stock price crash risk.”