Large Language Models and Return Prediction in China

Abstract

We examine whether large language models (LLMs) can extract contextualized representations of Chinese news articles and predict stock returns. The LLMs we examine include BERT, RoBERTa, FinBERT, Baichuan, ChatGLM, and their ensemble model. We find that tones and return forecasts extracted by LLMs from news significantly predict future returns. The equal- and value-weighted long-minus-short portfolios yield annualized returns of 90% and 69% on average for the ensemble model. Given that these news articles are public information, the predictive power lasts about two days. More interestingly, the signals extracted by LLMs contain information about firm fundamentals and can predict the aggressiveness of future trades. The predictive power is noticeably stronger for firms with less efficient information environments, such as firms with lower market capitalization, shorting volume, and institutional and state ownership. These results suggest that LLMs help capture under-processed information in public news, especially for firms with less efficient information environments, and thus contribute to overall market efficiency. (Presented at ABFER-JFDS Annual Conference on AI and FinTech 2024, China Fintech Research Conference (CFTRC) 2024, Summer Institute in Finance (SIF) Annual Conference 2024, Seminar Series at Sun Yat-Sen University, Tsinghua University, and Summer Institute in Digital Finance (SIDF) 2024.)
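The long-minus-short portfolio described above can be illustrated with a minimal sketch. This is not the paper's implementation: the quantile cutoff, the equal weighting, and the toy inputs are assumptions made for illustration; the paper's signals come from LLM-extracted tones and return forecasts.

```python
import numpy as np

def long_short_return(tone_scores, next_returns, quantile=0.1):
    """Equal-weighted long-minus-short return for one rebalancing period.

    tone_scores:  hypothetical LLM tone signals per stock
                  (higher = more positive news)
    next_returns: realized next-period returns for the same stocks
    quantile:     fraction of stocks in each leg (0.1 = decile sort,
                  an illustrative choice, not the paper's specification)
    """
    scores = np.asarray(tone_scores, dtype=float)
    rets = np.asarray(next_returns, dtype=float)
    k = max(1, int(len(scores) * quantile))
    order = np.argsort(scores)              # ascending by tone
    short_leg = rets[order[:k]].mean()      # lowest-tone stocks
    long_leg = rets[order[-k:]].mean()      # highest-tone stocks
    return long_leg - short_leg

# Toy example: 10 stocks whose next-day returns rise with tone.
scores = np.arange(10)
returns = np.arange(10) * 0.01
spread = long_short_return(scores, returns)  # 0.09 - 0.00 = 0.09
```

A daily spread like this would typically be annualized by compounding or scaling over roughly 252 trading days, which is how headline figures such as the 90% annualized return are usually reported.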

Date
Oct 20, 2024 12:00 AM
Lin Tan
Ph.D. Candidate in Finance

My research interests include investor structure, macro announcements, and fintech.
