Beyond the numbers: using textual data to predict the returns of stocks and bonds

Models for predicting the returns of individual stocks and bonds generally use only numerical data, such as market prices of financial instruments and companies’ accounting data. Such models therefore ignore potentially relevant information that lies beyond these numbers. An important source of non-numerical data is text, such as newspaper articles, press releases, management commentaries in annual reports, and social media posts. A growing body of academic literature documents that information extracted from different textual sources can be used to predict asset returns. Deriving information from text requires text mining algorithms.

The goal of this research project is to apply various text mining algorithms to various types of textual data and assess their ability to predict the returns of individual stocks and corporate bonds. Robeco Quantitative Research has access to rich historical databases that enable the back-testing and evaluation of investment strategies. You will conduct a literature study, work with several sets of numerical and textual data, implement various algorithms, and conduct back-tests on stocks and bonds.

The project covers the entire quant model development cycle: analyzing the data, programming the back-tests, analyzing the results, discussing results with researchers and portfolio managers, writing a research report, and giving a presentation. As with all Super Quant internships, the assignment will be supervised by an experienced empirical researcher from Robeco’s Quantitative Research department. Creativity and strong analytical and programming skills are essential to complete the project successfully.

Are you interested?

Send us your motivation letter, together with your CV and list of grades, to


Cicon, 2017, “Say it again Sam: The Idiosyncratic Information Content of Corporate Conference Calls”, Review of Quantitative Finance and Accounting

Loughran, McDonald, 2011, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks”, Journal of Finance

Tetlock, Saar-Tsechansky, Macskassy, 2008, “More Than Words: Quantifying Language to Measure Firms’ Fundamentals”, Journal of Finance