Beyond the numbers: using textual data to predict the returns of stocks and bonds

Models for predicting the returns of individual stocks and bonds generally use only numerical data, such as market prices of financial instruments and company accounting data. Such models thus ignore potentially relevant information that lies beyond these numbers. An important source of non-numerical data is text, such as newspaper articles, press releases, management commentaries in annual reports, and social media posts. A growing body of academic literature documents that information extracted from different textual sources can be used to predict asset returns. Extracting this information requires text mining algorithms.

The goal of this research project is to apply a range of text mining algorithms to different types of textual data and to assess their ability to predict the returns of individual stocks and corporate bonds. Robeco Quantitative Research has access to rich historical databases that enable the back-testing and evaluation of investment strategies. You will conduct a literature study, work with several numerical and textual data sets, implement various algorithms, and run back-tests on stocks and bonds.

Creative, analytical and programming skills are essential to complete the project successfully.

Are you interested?
Let us know your motivation and send it together with your top-3 favorite internship topics, your CV and list of grades to

[1] Tetlock, Saar-Tsechansky, Macskassy, 2008, “More than words: Quantifying language to measure firms’ fundamentals”, Journal of Finance

[2] Loughran, McDonald, 2011, “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks”, Journal of Finance

[3] Cicon, 2017, “Say it again Sam: The Idiosyncratic Information Content of Corporate Conference Calls”, Review of Quantitative Finance and Accounting