Predicting Returns with Machine-Learned News Sentiment

This paper develops a machine learning method that builds its own sentiment dictionary from scratch to predict stock returns from news articles. The resulting strategy delivers much higher Sharpe ratios than those based on commercial scores like RavenPack or dictionary-based methods.

Key Performance Metrics

đź“Š How Well Does This Strategy/Model Perform?

  • Sharpe Ratio (Equal-Weighted L/S): 4.29
  • Average Daily Return (L/S): 33 bps
  • Risk-Adjusted Alpha (FF5+MOM): 4.3% per day (EW L/S)
  • Outperformance vs RavenPack: SESTM earns 33 bps vs 18 bps/day

đź’ˇ Takeaway:
SESTM delivers strong predictive power and out-of-sample alpha. It outperforms both RavenPack and dictionary-based methods while using a transparent, white-box ML approach.


Key Idea: What Is This Paper About?

The paper introduces SESTM (Sentiment Extraction via Screening and Topic Modeling), a three-step supervised learning model to extract predictive sentiment from financial news. Instead of relying on pre-built dictionaries, SESTM learns which words matter for future returns and assigns them custom weights. It shows this model yields better predictions and more profitable trading strategies.


Economic Rationale: Why Should This Work?

đź“Ś Relevant Economic Theories and Justifications:

  • Limited Attention: Investors process news with delay, especially for small or volatile stocks.
  • Textual Complexity & Novelty: Fresh news and complex sentiment signals are assimilated into prices more slowly.
  • Information Frictions: Differences in speed of assimilation suggest inefficiencies that can be exploited.

đź“Ś Why It Matters:
SESTM shows how tailored sentiment extraction reveals information traditional methods miss, enabling consistent alpha generation using public news.


How to Do It: Data, Model, and Strategy Implementation

Data Used

  • News Source: Dow Jones Newswires (1989–2017)
  • Price Data: CRSP, TAQ
  • Training Sample: 1989–2012
  • Out-of-Sample: 2004–2017
  • Articles Processed: ~6.5 million single-stock articles

Model / Methodology

  1. Screening: Select sentiment-charged words using correlation with return direction.
  2. Topic Modeling: Assign word weights based on predictive ability using supervised topic modeling.
  3. Scoring: Aggregate to article-level sentiment with penalized likelihood.
  • Key Tuning: Optimized using cross-validation on 5-year rolling windows.
  • Portfolio Construction:
    • Long top 50 sentiment stocks
    • Short bottom 50
    • Rebalanced daily, open-to-open returns

Trading Strategy (SESTM Long-Short)

  • Weighting Schemes: Equal-weighted (preferred), and value-weighted
  • Holding Period: 1 day (can be extended with decay-weighting)
  • Transaction Costs: Handled via turnover-constrained variants (EWCT strategy)

Key Table or Figure from the Paper

đź“Ś Explanation:

  • Shows performance of equal- and value-weighted long, short, and long-short portfolios using SESTM sentiment.
  • The equal-weighted long-short portfolio consistently outperforms S&P 500 (SPY), including during crises.
  • The strategy generates pure alpha: over 90% of return not explained by FF5+MOM factors.

Final Thought

💡 A machine that reads news and beats the market? SESTM proves it's possible—with transparency and theory to back it up. 🚀


Paper Details (For Further Reading)

  • Title: Predicting Returns With Text Data
  • Authors: Zheng Tracy Ke, Bryan T. Kelly, Dacheng Xiu
  • Publication Year: 2019 (NBER Working Paper 26186)
  • Journal/Source: NBER Working Paper
  • Link: https://www.nber.org/papers/w26186

Read next