Executives vs Chatbots: Unmasking Insights Through Human-AI Differences in Earnings Conference Q&A

Takeaway:
The greater the difference between ChatGPT’s and executives’ answers in earnings calls, the more new information is revealed. This HAID signal predicts return reactions, analyst behavior, and liquidity improvements.

Key Idea: What Is This Paper About?

The paper introduces HAID—a novel measure of information content based on the semantic gap between manager responses in earnings call Q&As and ChatGPT-generated answers to the same questions. If ChatGPT, acting as an informed investor, would answer differently, it implies the manager provided new information. This gap (HAID) predicts post-call return volatility, trading activity, analyst reactions, and liquidity.

Economic Rationale: Why Should This Work?

LLMs can emulate an informed investor. If a manager’s real answer differs significantly from ChatGPT’s, that difference represents incremental, value-relevant information.

Relevant Economic Theories and Justifications:

Information Asymmetry & Disclosure: New info lowers opacity (Verrecchia 2001)
Limits to Arbitrage: More info = greater analyst agreement = reduced dispersion
Price Discovery & Liquidity: New info reduces bid-ask spread, increases volume

Why It Matters:
HAID captures what’s truly new in manager answers—moving beyond tone, length, or complexity—and offers a powerful edge for alpha generation and analyst tracking.

Data, Model, and Strategy Implementation

Data Used (If Applicable)

Data Sources: Capital IQ earnings calls, CRSP, I/B/E/S, Compustat
Time Period: 2004–2020
Asset Universe: US-listed stocks with earnings calls

Model / Methodology (If Applicable)

Type of Model: GPT-3.5 + semantic similarity using BERT, Word2Vec, Cosine
Key Features:
- Context-preserving prompt: ChatGPT sees presentation and prior Q&A
- Answers same analyst question as manager
- Semantic similarity computed → HAID = 1 − similarity
- Higher HAID → more informational content

Prompt Setup (used with ChatGPT API):

System role:
"From the perspective of a top executive, please answer the following question raised by a financial analyst during an earnings conference call."
"Knowledge cutoff: {}"

Assistant role:
Provides the summarized presentation + all prior Q&A (questions + summarized executive answers).

User role:
Supplies the current analyst question (full text).

ChatGPT’s response is then compared to the actual manager’s answer to that same question.

Trading Strategy (Reconstructed)

Signal Generation: Use HAID score after earnings call
Portfolio Construction:
- Long low-HAID stocks (less new info = more predictable)
- Short high-HAID (high new info = higher price volatility, earnings drift)
- Optional: use HAID as input to alpha screen with tone, sentiment, etc.
Rebalancing Frequency: Quarterly (after calls)

Final Thought

💡 ChatGPT can reveal what’s new in a manager’s answer—quantifying surprise like never before. 🧠📞

Paper Details (For Further Reading)

Title: Executives vs Chatbots: Unmasking Insights Through Human-AI Differences in Earnings Conference Q&A
Authors: John Bai, Nicole Boyson, Yi Cao, Miao Liu, Chi Wan
Publication Year: 2025
Journal/Source: SSRN Preprint
Link: https://ssrn.com/abstract=4480056

🧠 How Do They Measure HAID?

HAID (Human-AI Difference) measures how much new information a manager provides during an earnings call Q&A—by comparing their actual answer to what ChatGPT would have said in the same situation.

✅ Step-by-Step Calculation

Generate ChatGPT Answer (Context-Preserved)
- ChatGPT is given:
  - The executive presentation (summarized)
  - All prior Q&A (questions + summarized manager responses)
  - The current analyst question
- Prompted: “From the perspective of a top executive, please answer the following question raised by a financial analyst during an earnings conference call.”
Collect Executive Answer
- The real answer from the executive to the same question is extracted from the transcript.
Compute Similarity Between Answers
- Three methods are used:
  - BERT embedding similarity
  - Cosine similarity
  - Word2Vec similarity

Calculate Semantic Gap for Each Question

For each Q&A pair:

HAID_question = 1 − similarity(ChatGPT_answer, Executive_answer)

Aggregate to the Call Level
- Average across all Q&A in a single earnings call:
```
HAID_call = Average(1 − similarity) across all questions in the call
```

🔍 Interpretation

Higher HAID = Executive said something ChatGPT didn’t expect → New, value-relevant information
Lower HAID = Executive echoed what a well-informed investor might have already inferred