Quant modeling can use non-numerical data, too

10-12-2021 | Insight

We find that text analysis can predict the risk and return characteristics of corporate bonds.

  • Patrick Houweling
    Co-Head of Quant Fixed Income and Lead Portfolio Manager
  • Robbert-Jan 't Hoen

Speed read

  • More than 80% of all corporate information is in an unstructured form
  • Literature finds that text in SEC filings predicts return and volatility of stocks
  • We show that results from the literature carry over to corporate bonds

It is estimated that over 80% of all business-relevant information is in an unstructured form, such as text, video, or audio.1 However, financial models traditionally use only numerical data, such as market prices and company accounting data. Hence, tapping into the large pool of unstructured data could enrich existing models. We investigated the opportunities that unstructured data present for investing in corporate bonds.2

Mining abundant sources of non-numerical data

Textual data is an important source of non-numerical data. Examples include news articles, social media posts, transcripts of management presentations and corporate reports. Until recently, usage of such text sources in analysis required human intervention to code attributes into numerical form – a slow and tedious process. Nowadays, due to advances in natural language processing (NLP) and the immense growth in computing power, text mining techniques can be used to systematically analyze vast amounts of text data.

Academics as well as practitioners have started analyzing text data for the purpose of predicting the risks and returns of stocks and bonds. One strand of research investigates the information content of corporate reports filed by publicly listed companies in the US with the Securities and Exchange Commission (SEC). Of these SEC filings, most attention is directed towards the annual (Form 10-K) and quarterly (Form 10-Q) reports. The reports are very extensive, owing to laws and regulations that prohibit companies from making materially false or misleading statements, and from omitting material information that would render disclosures misleading. Along with the numerical data from the financial statements, these filings contain large volumes of unstructured textual information.

The information in 10-Ks and 10-Qs should enable any investor to fully understand the state of a company. In practice, however, valuable information in these reports is easily overlooked, because of the daunting challenge of reading and grasping many pages of formal and often very technical text.3 These reports therefore provide an attractive avenue of research for the application of computer-based text analysis.

Data collection and pre-processing

We obtain all 10-Ks and 10-Qs of publicly listed US issuers of corporate bonds in the Bloomberg US Corporate Investment Grade and High Yield ex. Financials indices. The sample covers the period from 1994 to 2017 and contains a total of 212,400 filings, of which 57,952 are 10-Ks and 154,448 10-Qs.

Figure 1 | Filing size

Source: Robeco, EDGAR. Sample period 1994-2017.

To facilitate later analyses, we first clean each document so that only the text, numbers and symbols in the main body of the original filing remain. Figure 1 shows the average size of the cleaned files over time, as measured by the total number of characters. As expected, we find that 10-Ks are, on average, significantly larger than 10-Qs. Moreover, there is a strong upward trend in the size of 10-Ks and 10-Qs. This is driven largely by the gradual increase over time in required disclosures.
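As a rough illustration of this cleaning step, the sketch below strips markup from a raw filing and measures its size in characters. The exact cleaning rules used in the research are not described here, so the class `_TextExtractor`, the function `clean_filing` and the sample string are all hypothetical, and the rule applied (drop tags, scripts and styles, then collapse whitespace) is an assumption.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_filing(raw: str) -> str:
    """Strip markup and collapse whitespace (an assumed cleaning rule)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()


# Made-up snippet standing in for a raw filing.
raw = "<html><body><p>Net sales rose 5%.</p><style>p{}</style></body></html>"
cleaned = clean_filing(raw)
filing_size = len(cleaned)  # size measured in characters, as in Figure 1
```

Real filings contain tables, exhibits and embedded documents that would need additional handling beyond this sketch.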

Text analysis

The next step in the research is to process the cleaned text data so that it becomes understandable to a computer. A commonly used method to convert text into a numerical format is the Bag-of-Words (BoW) model. BoW is an NLP technique that reduces the complexity of text data by removing information about word order and context. All that remains of each filing is a list of term frequencies, i.e., the number of times each unique word appears. The idea behind the model is that the more frequently a term is used, the more important it is.4
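A minimal sketch of the Bag-of-Words step might look as follows. The tokenization rule (lowercase runs of letters) and the tiny `STOP_WORDS` set are assumptions made for illustration; the research itself filters with the Loughran and McDonald stop word list.

```python
import re
from collections import Counter

# Tiny illustrative stop word set, NOT the Loughran and McDonald list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "our"}


def bag_of_words(text: str, stop_words=STOP_WORDS) -> Counter:
    """Return term frequencies, discarding word order and stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


tf = bag_of_words("The risk of default is low. Default risk is the key risk.")
# tf maps each remaining term to the number of times it appears.
```

In this toy sentence, "risk" is counted three times and "default" twice, while "the", "of" and "is" are dropped.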

Changers and non-changers

A recently published academic article documents that the similarity of a company’s consecutive 10-Ks and 10-Qs is a significant predictor of stock return and stock return volatility: companies that make more changes to the text of their report compared to their previous report (which the article labels as ‘changers’) underperform companies with fewer changes (labeled as ‘non-changers’) by a wide margin.5 The rationale for this finding is that firms tend to repeat what they reported previously and that they are only required to change the text if there are material changes to the company or to its circumstances over the reporting period. Changes in the text are thus interpreted as being negative. Although extensive text changes are not necessarily a bad sign, analysis does show that these are mostly related to negative events and negative future stock returns.

In our research, we test whether a similar effect exists for corporate bonds. If the degree of similarity between consecutive 10-Ks and 10-Qs is truly linked to firm performance, then we expect to see this reflected in corporate bond returns as well. To gauge the similarity between reports, we compare the text in a report with that of the same report published a year earlier, i.e., a 10-K is compared with the previous year’s 10-K, and a 10-Q with the 10-Q of the same quarter in the previous year.
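The text does not say which similarity measure is used. One common choice for term-frequency vectors is cosine similarity, sketched below as an assumption rather than as the method of the research. The function name `cosine_similarity` and the example counts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tf_a: Counter, tf_b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0 to 1)."""
    dot = sum(count * tf_b.get(term, 0) for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Made-up term frequencies for the same report in two consecutive years.
last_year = Counter({"risk": 3, "default": 2, "covenant": 1})
this_year = Counter({"risk": 3, "default": 2, "litigation": 4})

similarity = cosine_similarity(this_year, last_year)
```

A value close to 1 would mark the issuer as a non-changer, and a lower value as a changer.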

We evaluate the performance of changers versus non-changers on our sample of US investment grade and high yield issuers over the 1997-2017 period. Our hypothetical investment strategy for this research goes long in the bonds of the companies whose reports showed the fewest changes, and goes short in the bonds of the firms with the most changes.
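The long-short construction can be sketched as below. The portfolio size (top and bottom 20% of issuers) and equal weighting are assumptions, since the text does not specify them, and the similarity scores and returns are made-up numbers used only to exercise the function.

```python
def long_short_return(similarity, bond_return, fraction=0.2):
    """Return of long non-changers minus short changers, equally weighted.

    similarity:  issuer -> similarity of its latest report to last year's
    bond_return: issuer -> bond return over the subsequent period
    fraction:    share of issuers in each leg (an assumed 20%)
    """
    ranked = sorted(similarity, key=similarity.get)  # most changed first
    n = max(1, int(len(ranked) * fraction))
    if 2 * n > len(ranked):
        raise ValueError("too few issuers for two non-overlapping legs")
    changers = ranked[:n]        # lowest similarity: short leg
    non_changers = ranked[-n:]   # highest similarity: long leg
    long_leg = sum(bond_return[i] for i in non_changers) / n
    short_leg = sum(bond_return[i] for i in changers) / n
    return long_leg - short_leg


# Illustrative inputs only, not results from the research.
similarity = {"A": 0.99, "B": 0.95, "C": 0.90, "D": 0.80, "E": 0.60}
bond_return = {"A": 0.010, "B": 0.008, "C": 0.006, "D": 0.004, "E": 0.001}

spread = long_short_return(similarity, bond_return)
```

With five issuers and a 20% fraction, each leg holds one issuer: long A and short E.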

We find that, in investment grade as well as in high yield, non-changers have outperformed changers by over 50 bps per year and have been less risky than changers, resulting in higher Sharpe ratios for non-changers. Overall, we find that the degree of similarity between consecutive reports has predictive power for corporate bond risk and return, with stronger statistical significance in investment grade than in high yield.
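For reference, a Sharpe ratio divides the average excess return by its volatility. The sketch below assumes monthly excess returns and annualizes with the square root of 12. Both the data frequency and the annualization convention are assumptions, as the text does not state them, and the input series is invented.

```python
import math
import statistics


def sharpe_ratio(excess_returns, periods_per_year=12):
    """Annualized mean excess return divided by annualized volatility."""
    mean = statistics.mean(excess_returns)
    vol = statistics.stdev(excess_returns)  # sample standard deviation
    return (mean / vol) * math.sqrt(periods_per_year)


# Made-up monthly excess returns for one portfolio.
monthly = [0.004, -0.002, 0.006, 0.001, 0.003, -0.001]
ratio = sharpe_ratio(monthly)
```

A portfolio with the same average return but lower volatility obtains a higher ratio, which is the sense in which non-changers compare favorably.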

2 This insight is based on an extract from the paper “Continuous innovation in factor credit strategies”, April 2021, by Patrick Houweling, Frederik Muskens and Robbert-Jan 't Hoen.
3 Loughran & McDonald, 2014, “Measuring readability in financial disclosures”, The Journal of Finance, 69(4), 1643-1671.
4 We filter out uninformative words using the popular stop word list of Loughran and McDonald:
5 Cohen, Malloy & Nguyen, 2020, “Lazy prices”, The Journal of Finance, 75(3), 1371-1415.

