Asset Embeddings: A New Language for Markets

This paper introduces asset embeddings—vector representations of stocks learned from investor portfolios using techniques like BERT and Word2Vec. These embeddings outperform traditional characteristics in explaining return comovement, offering a new framework for understanding investor behavior.

Takeaway:
Embeddings trained on investor holdings reveal powerful latent asset characteristics that outperform traditional finance models in explaining valuations, returns, and portfolio structure.


Key Idea: What Is This Paper About?

This paper introduces asset embeddings—vector representations of stocks learned from investor portfolio holdings using transformer and language modeling techniques (like BERT/GPT). By treating investor portfolios like “sentences,” the authors extract hidden firm characteristics that explain returns, valuations, and investor behavior more effectively than traditional accounting data or risk factors.


Economic Rationale: Why Should This Work?

Portfolios reflect investor beliefs about firms’ risk, return, and fundamentals. Asset embeddings capture this rich, high-dimensional information structure.

Relevant Economic Theories and Justifications:

  • Demand System Asset Pricing: Investors reveal preferences through holdings
  • Latent Factor Models: Asset returns are driven by unobserved characteristics
  • Information Frictions: Embeddings reflect signals that traditional data miss

Why It Matters:
Observable firm characteristics explain only a slice of return variation. Embeddings can be learned from real-world investor behavior to uncover deeper, alpha-generating insights.


Data, Model, and Strategy Implementation

Data Used

  • Data Sources: CRSP, Compustat, 13F filings, mutual fund & ETF holdings
  • Time Period: 2000–2021
  • Asset Universe: US public equities

Model / Methodology

  • Type of Model: Recommender system, Word2Vec, PCA, AssetBERT (transformer)
  • Key Features:
    • Portfolios are treated like sentences (ranked stock positions)
    • AssetBERT masks and predicts holdings like BERT predicts words
    • Embeddings are trained using holdings levels, ranks, or rebalancing
    • Both asset and investor embeddings are generated
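The PCA variant of this pipeline can be sketched in a few lines: factor a (investors × assets) holdings matrix with a truncated SVD so that the right singular vectors become asset embeddings and the left singular vectors become investor embeddings. This is a minimal illustration on synthetic data, not the paper's exact implementation; all dimensions and names are assumptions.

```python
import numpy as np

# Minimal sketch: asset and investor embeddings from a synthetic
# (investors x assets) holdings matrix via truncated SVD (PCA-style).
rng = np.random.default_rng(0)
n_investors, n_assets, dim = 200, 50, 8

# Synthetic latent structure: investors tilt toward hidden asset "styles".
true_styles = rng.normal(size=(n_assets, 3))
tastes = rng.normal(size=(n_investors, 3))
noise = rng.normal(scale=0.5, size=(n_investors, n_assets))
holdings = np.maximum(tastes @ true_styles.T + noise, 0)  # no short positions

# Row-normalize to portfolio weights, demean, then keep top singular vectors.
weights = holdings / holdings.sum(axis=1, keepdims=True)
X = weights - weights.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
asset_embeddings = Vt[:dim].T * S[:dim]   # (n_assets x dim)
investor_embeddings = U[:, :dim]          # (n_investors x dim)

print(asset_embeddings.shape, investor_embeddings.shape)
```

Both embedding matrices fall out of a single factorization, mirroring the paper's point that holdings data jointly characterize assets and the investors who hold them.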

Training Setup (AssetBERT):
For each investor i, the assets ai(1), ai(2), ..., ai(A) are ordered by decreasing holding size.
The model is trained to predict masked stocks in the ranked portfolio, analogous to masked language modeling in BERT.
The "sentence" Apple, IBM, Tesla, ..., Walmart
is treated like "The Fed decided to ___ rates to fight inflation": the model learns the structure of holdings the way BERT learns the structure of sentences.
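The masked-prediction setup can be sketched in plain Python: take a portfolio ranked by holding size and emit one training example per masked position. This builds a leave-one-out example for every position (a simplification: BERT masks roughly 15% of tokens at random); the tickers and helper name are illustrative, not from the paper.

```python
MASK = "[MASK]"

def make_masked_examples(portfolio):
    """Yield (masked_portfolio, position, target) for each rank in turn."""
    for pos, ticker in enumerate(portfolio):
        masked = list(portfolio)
        masked[pos] = MASK          # hide one holding, keep its rank slot
        yield masked, pos, ticker   # the model must recover the ticker

# A portfolio "sentence": tickers ranked by decreasing holding size.
portfolio = ["AAPL", "MSFT", "NVDA", "JNJ", "XOM", "WMT"]
for masked, pos, target in make_masked_examples(portfolio):
    print(masked, "-> predict", target, "at rank", pos)
```

Because position encodes holding rank, the model learns not just which stocks co-occur in portfolios but where in the ranking they tend to sit.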


Trading Strategy (Conceptual Applications)

  • Signal Generation: Use embeddings to spot over-/under-valued firms or crowding
  • Portfolio Construction: Build thematic/factor-like portfolios using embedding similarity
  • Macro Sentiment: Track how asset exposures shift with investor preferences
  • Rebalancing Frequency: Quarterly or rolling updates using holdings data
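The similarity-based portfolio construction above reduces to nearest-neighbor lookups in embedding space. A minimal sketch, using synthetic embeddings and cosine similarity (the ticker list, dimensions, and `nearest` helper are all illustrative assumptions):

```python
import numpy as np

# Sketch: rank assets by cosine similarity to a target stock's embedding,
# a building block for embedding-based thematic portfolios.
rng = np.random.default_rng(1)
tickers = ["AAPL", "MSFT", "NVDA", "JPM", "XOM", "WMT"]
E = rng.normal(size=(len(tickers), 8))  # one synthetic 8-d embedding per asset

def nearest(ticker, k=2):
    """Return the k assets with the most similar embeddings to `ticker`."""
    i = tickers.index(ticker)
    normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = normed @ normed[i]           # cosine similarity to the target
    order = np.argsort(-sims)           # descending similarity
    return [tickers[j] for j in order if j != i][:k]

print(nearest("AAPL"))
```

With real holdings-trained embeddings, such neighbors would be stocks held by similar investors in similar rank positions, which is why they can proxy for crowding or thematic exposure.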

Final Thought

💡 Portfolio data is the new language of markets. Embeddings are how we learn to speak it. 🧠📊


Paper Details (For Further Reading)

  • Title: Asset Embeddings
  • Authors: Xavier Gabaix, Ralph S.J. Koijen, Robert J. Richmond, Motohiro Yogo
  • Publication Year: 2023
  • Journal/Source: SSRN Preprint
  • Link: https://ssrn.com/abstract=4572831
