21-03-2023 · Market views

Quant chart: how NLP can anticipate GICS changes

The recent changes to the Global Industry Classification Standard (GICS) illustrate how rigid and slow-moving the framework is. This article argues that natural language processing (NLP) techniques can offer additional insights in today’s fast-changing market environment.

    Authors

  • Matthias Hanauer - Researcher

  • Rob Huisman - Researcher

GICS is the standard framework for grouping similar firms into sectors, industry groups, industries and sub-industries. But its methodology is rigid: revisions are infrequent and take years to implement because they involve extensive consultations with market participants. As a result, alternative classification methods have been proposed, based on customer-supplier data, textual similarity between companies’ 10-K business descriptions, technological overlap in patent data, or shared analyst coverage.

One of the major changes in the recent GICS revision is the creation of a new sub-industry, Transaction and Payment Processing Services, within the Financials sector. It includes companies such as Visa, Mastercard and PayPal, which were previously classified in the Data Processing & Outsourced Services sub-industry, part of the Software & Services industry group in the Information Technology sector.

The change reflects both the growing role these companies play in facilitating payments across platforms and markets, and the close alignment of these activities with the business activities covered by the Financial Services industry group. However, the change took effect only on 17 March 2023, two years after the first consultation on the subject started.¹

Text-based stock clustering (TBSC) is an interesting alternative to GICS. It uses NLP techniques to analyze textual data from sources such as 10-K reports and groups together companies whose texts are similar. TBSC has several advantages over GICS:

  • TBSC can be more adaptive and flexible because it can update its classifications more frequently based on new information.

  • TBSC can be more granular and accurate because it can capture the similarities and differences among companies within or across sectors based on their specific products or services.

  • TBSC can be more informative and insightful because it provides explanations for its classifications based on textual evidence.
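To make the clustering idea concrete, here is a minimal sketch under simplifying assumptions. It swaps the BERT embeddings used in this article for plain TF-IDF word weights, measures the cosine similarity between firms’ descriptions, and links any two firms whose similarity reaches a threshold (single-linkage clustering, implemented with union-find). The tickers, the one-line descriptions and the 0.3 threshold are all hypothetical; this is not the method behind Figure 1.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens only (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF, so terms used by every firm keep a small positive weight.
        vectors[name] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(docs: dict[str, str], threshold: float) -> list[list[str]]:
    """Single-linkage clustering: firms with similarity >= threshold share a group."""
    vecs = tfidf_vectors(docs)
    names = sorted(vecs)
    parent = {name: name for name in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical one-line "business descriptions"; real 10-K sections are far longer.
descriptions = {
    "PAY_A": "we process card payment transactions for merchants and consumers",
    "PAY_B": "our network processes payment transactions between card issuers and merchants",
    "BANK_A": "we accept deposits and make loans to retail and commercial customers",
    "BANK_B": "the bank provides loans mortgages and deposits to retail customers",
    "CHIP_A": "we design semiconductors and chips for data center processors",
}

if __name__ == "__main__":
    print(cluster(descriptions, threshold=0.3))
    # -> [['BANK_A', 'BANK_B'], ['CHIP_A'], ['PAY_A', 'PAY_B']]
```

Because there is no fixed taxonomy, rerunning `cluster` on next year’s descriptions regroups firms immediately. The threshold controls granularity: raising it splits groups, lowering it merges them.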

To illustrate these advantages, Figure 1 shows a 2D projection of company-specific vector embeddings derived from 10-K filings using the Bidirectional Encoder Representations from Transformers (BERT) model. We use 10-K reports for fiscal year 2021 as input to the model to test whether the NLP technique could already have anticipated the current GICS revision.
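The article does not state which method was used to reduce the BERT embeddings to two dimensions for Figure 1; principal component analysis (PCA), t-SNE and UMAP are common choices. The sketch below assumes PCA, implemented with power iteration and deflation in pure Python so that it runs without dependencies. The five three-dimensional vectors are invented stand-ins for real embeddings.

```python
import math
import random


def pca_2d(rows: list[list[float]], iters: int = 500, seed: int = 0) -> list[list[float]]:
    """Project row vectors onto their first two principal components."""
    n, dim = len(rows), len(rows[0])
    # 1. Centre every dimension on zero.
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    x = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # 2. Sample covariance matrix (dim x dim).
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # 3. Power iteration: repeated multiplication by cov converges to
        #    the eigenvector with the largest eigenvalue.
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break  # no variance left in this matrix
            v = [c / norm for c in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim))
        components.append(v)
        # 4. Deflation: subtract the found component so the next pass finds the runner-up.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    # 5. 2-D coordinates: dot products of each centred row with the two components.
    return [[sum(r[j] * c[j] for j in range(dim)) for c in components] for r in x]


# Toy 3-D "embeddings" for five firms; BERT-base vectors would have 768 dimensions.
embeddings = [
    [-4.0, 1.0, 0.0],
    [-2.0, -1.0, 0.0],
    [0.0, 0.0, 0.0],
    [2.0, 1.0, 0.0],
    [4.0, -1.0, 0.0],
]

if __name__ == "__main__":
    for firm, (pc1, pc2) in zip(["A", "B", "C", "D", "E"], pca_2d(embeddings)):
        print(f"{firm}: ({pc1:+.2f}, {pc2:+.2f})")
```

The signs of the two axes are arbitrary (an eigenvector multiplied by -1 is still an eigenvector), so only the relative positions of firms carry meaning. For real 768-dimensional embeddings of thousands of firms, a library implementation such as scikit-learn’s `PCA` would be the practical choice.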

The results show that the transaction and payment processing services companies, such as Visa, Mastercard and PayPal (light blue), are indeed closer to their new industry group, Financial Services (green), than to their previous industry group, Software & Services (brown). This finding suggests that TBSC can anticipate changes in GICS before they are officially implemented. However, we also find that the Financial Services industry group is rather heterogeneous compared with other industry groups such as Banks, Insurance, or Semiconductors & Semiconductor Equipment.
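Statements such as "closer to" and "rather heterogeneous" can be made measurable in several ways. One possible sketch, assuming each firm is represented by an embedding vector: rank industry groups by the cosine similarity between the firm and each group’s centroid (mean embedding), and measure a group’s heterogeneity as the average cosine distance of its members to that centroid. The article does not say these metrics were used, and all vectors below are invented to exercise the functions; they do not reproduce Figure 1.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a group's member embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_groups(firm: list[float], groups: dict[str, list[list[float]]]) -> list[tuple[str, float]]:
    """Groups sorted from most to least similar to the firm (cosine to each centroid)."""
    sims = {name: cosine(firm, centroid(members)) for name, members in groups.items()}
    return sorted(sims.items(), key=lambda item: item[1], reverse=True)


def dispersion(members: list[list[float]]) -> float:
    """Heterogeneity proxy: mean cosine distance (1 - similarity) to the group centroid."""
    c = centroid(members)
    return sum(1.0 - cosine(m, c) for m in members) / len(members)


# Invented 3-D vectors, chosen only to exercise the functions.
groups = {
    "Software & Services": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.95, 0.05, 0.05]],
    "Financial Services": [[0.2, 0.9, 0.1], [0.1, 0.6, 0.7], [0.4, 0.7, 0.3]],
    "Banks": [[0.0, 0.3, 0.95], [0.05, 0.25, 0.9]],
}
payment_firm = [0.35, 0.85, 0.1]  # hypothetical payment processor, kept out of all groups

if __name__ == "__main__":
    for name, sim in rank_groups(payment_firm, groups):
        print(f"{name}: similarity {sim:.3f}")
    for name, members in groups.items():
        print(f"{name}: dispersion {dispersion(members):.4f}")
```

The payment firm is deliberately kept out of every group so that its own vector does not pull a centroid toward it; a real test would likewise compute each centroid from the other members only.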

Figure 1 | 2D projection of word embeddings based on 10-K filings for the fiscal year 2021.

Source: SEC, Refinitiv, Robeco. The figure shows a 2D projection of numerical embeddings derived from BERT based on firms’ 10-K filings for the fiscal year 2021. The analysis is restricted to MSCI USA Index constituents augmented with large and liquid constituents of the FTSE World Developed and S&P Broad Market Index. The different colors indicate different GICS industry groups within the Information Technology (Software & Services, Technology Hardware & Equipment, and Semiconductors & Semiconductor Equipment) and Financials (Banks, Financial Services, and Insurance) sectors. Furthermore, the stocks from the newly created Transaction and Payment Processing Services sub-industry under the Financial Services industry group are highlighted. Previously, these stocks were included in the Software & Services industry group.

In conclusion, TBSC might be a better and more timely alternative to standard sector or industry classifications, such as GICS. By using NLP techniques to analyze textual data from various sources, TBSC can provide more adaptive, granular, accurate, informative and insightful classifications for stock analysis.

Footnote

1 The consultation on potential changes started in 2021; the changes were announced in March 2022 but only became effective in March 2023.

Disclaimer

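This document has been prepared by Robeco Overseas Investment Fund Management (Shanghai) Limited Company ("Robeco Shanghai"). Its contents are for reference only and do not constitute a recommendation, professional advice, offer, solicitation or invitation by Robeco Shanghai to anyone to buy or sell any product. This document should not be regarded as a recommendation to buy or sell any investment product or to adopt any investment strategy. Nothing in this document shall be regarded as legal, tax or investment advice, nor does it represent that any investment or strategy is suitable for your individual circumstances, or otherwise constitute a personal recommendation to you. The information and/or analysis contained in this document has been prepared on the basis of information obtained from sources that Robeco Shanghai considers reliable. Robeco Shanghai makes no representation as to its accuracy, correctness, usefulness or completeness, and accepts no liability for any loss arising from the use of the information and/or analysis in this document. Neither Robeco Shanghai nor any of its affiliates, or their directors, officers or employees, shall be responsible or liable for any direct or indirect loss or damage, or any other consequence, suffered by any person as a result of relying on the information contained in this document. This document contains forward-looking statements and forecasts regarding future business, objectives, management discipline or other matters. These statements involve assumptions, risks and uncertainties and are based on the information available as of the date of writing. Accordingly, we cannot guarantee that these forward-looking outcomes will materialize, and actual results may differ from the statements in this document. We cannot guarantee that the statistical information in this document is accurate, appropriate and complete under any particular conditions, nor that such statistical information and the assumptions from which it is derived reflect the market conditions or future performance that Robeco Shanghai may encounter. The information in this document is based on current market conditions, which may well change as a result of subsequent market events or for other reasons, so its contents may not reflect the latest situation. Robeco Shanghai is not responsible for updating this document or for correcting any inaccurate or omitted information in it.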