Building AI That Can Deceive
Computational modelling of social deception - RAG, emotion engines, and emergent tells.
Can Machines Learn to Lie?
Here's the question that drives this research: can we build AI systems that actually play The Traitors? Not just follow the rules, but genuinely deceive, form alliances, read emotional tells, and adapt strategies in real time?
This article explores the computational architecture required to simulate the game. No programming knowledge required - I'll focus on concepts and what they reveal about both artificial and human intelligence.
Why This Is Hard
Building AI for The Traitors is harder than chess, Go, or even poker:
- Incomplete information on steroids: You don't know anyone's role, can't observe private conversations, and can't see other players' internal states
- Natural language complexity: Accusations and alliances happen through conversation, which requires believable dialogue shaped by cultural communication styles
- Social and emotional dynamics: Success requires reading emotional states and projecting authenticity
- Long-horizon strategy: A single game spans 10+ days with hundreds of interconnected decisions
The Three Integrated Systems
1. The Knowledge System (RAG)
How does an AI agent "know" things? I use Retrieval-Augmented Generation - each AI player has a personal library describing their personality, relationships, and game experiences.
The knowledge hierarchy: Identity → Relationships → Observations → Secrets → Strategy
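To make this concrete, here's a minimal sketch of what a per-agent knowledge store might look like, with a toy term-overlap retriever standing in for real embedding search. The `KnowledgeEntry` and `AgentMemory` names and the scoring logic are illustrative assumptions, not the system's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeEntry:
    tier: str   # level in the hierarchy: identity, relationships, observations, secrets, or strategy
    text: str   # natural-language fact, e.g. "Alice defended me on day 3"
    day: int    # when the agent learned it

@dataclass
class AgentMemory:
    entries: list[KnowledgeEntry] = field(default_factory=list)

    def remember(self, tier: str, text: str, day: int) -> None:
        self.entries.append(KnowledgeEntry(tier, text, day))

    def retrieve(self, query_terms: set[str], k: int = 5) -> list[KnowledgeEntry]:
        # Toy retrieval: rank by word overlap with the query, break ties by recency.
        # A real RAG pipeline would embed entries and the query, then run vector search.
        def score(entry: KnowledgeEntry) -> tuple[int, int]:
            overlap = len(query_terms & set(entry.text.lower().split()))
            return (overlap, entry.day)
        return sorted(self.entries, key=score, reverse=True)[:k]

memory = AgentMemory()
memory.remember("identity", "I am cautious and avoid direct confrontation", day=0)
memory.remember("observations", "Bob hesitated when asked about the mission", day=2)
print([e.text for e in memory.retrieve({"bob", "mission"})])
```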
2. The Emotion & Deception Engine
An AI playing The Traitors must model psychological states, tracking a set of core emotions for each agent.
Traitors maintain two parallel emotional states: internal (what they actually feel) and displayed (what they show others). Different strategic archetypes manage this gap differently.
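As a sketch, the dual state could be represented as two emotion vectors, with the gap between them feeding the strain model below. The emotion inventory and the `TraitorAffect` layout are assumptions for illustration, not the engine's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative emotion set - the real engine's inventory may differ.
EMOTIONS = ("fear", "confidence", "suspicion", "guilt")

@dataclass
class TraitorAffect:
    internal: dict[str, float] = field(
        default_factory=lambda: {e: 0.0 for e in EMOTIONS})
    displayed: dict[str, float] = field(
        default_factory=lambda: {e: 0.0 for e in EMOTIONS})

    def masking_gap(self) -> float:
        # Size of the gap between what is felt and what is shown -
        # one plausible input to the masking-strain formula below.
        return sum(abs(self.internal[e] - self.displayed[e]) for e in EMOTIONS)

affect = TraitorAffect()
affect.internal["fear"] = 0.8   # just dodged an accusation
affect.displayed["fear"] = 0.1  # projecting calm at the Round Table
print(f"masking gap: {affect.masking_gap():.2f}")
```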
The Masking Strain Formula
Strain = Σ (Duration × Intensity × Complexity) over all active deceptions
As strain accumulates, the AI becomes more likely to produce "tells" - inconsistencies in timing, language, or emotional display. This creates emergent authenticity: tells aren't programmed, they naturally arise from sustained deception. See The Mathematics of Deception for the formal model.
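Here's the formula in runnable form, with an assumed logistic mapping from accumulated strain to tell probability. The logistic squash and its parameters are my illustrative choices for this sketch, not the formal model from the linked article.

```python
import math
from dataclasses import dataclass

@dataclass
class Deception:
    duration: float    # days the lie has been maintained
    intensity: float   # 0-1, emotional cost of the lie
    complexity: float  # 0-1, how many details must stay consistent

def masking_strain(active: list[Deception]) -> float:
    # Strain = sum of (duration x intensity x complexity) over active deceptions
    return sum(d.duration * d.intensity * d.complexity for d in active)

def tell_probability(strain: float, midpoint: float = 3.0, steepness: float = 1.5) -> float:
    # Assumed logistic squash: low strain -> rare tells, high strain -> frequent ones.
    return 1.0 / (1.0 + math.exp(-steepness * (strain - midpoint)))

lies = [
    Deception(duration=4, intensity=0.9, complexity=0.7),  # concealing a Traitor role
    Deception(duration=2, intensity=0.5, complexity=0.4),  # faking an alliance
]
s = masking_strain(lies)
print(f"strain={s:.2f}, tell probability={tell_probability(s):.2f}")
```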
3. The Simulation Framework
Orchestrates actual gameplay - managing phase progression, decision generation, and conversation flow (a sketch of the phase loop follows this list):
- Morning: Murder reveal, emotional reactions, breakfast conversation
- Mission: Team selection, execution, potential sabotage, rewards
- Round Table: Turn-based discussion, accusations, evolving suspicion, voting
- Night: Traitor conclave, information sharing, murder selection
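A minimal sketch of the orchestration loop: the four phases from the list above run in order each day, with a stubbed `handle` function standing in for the framework's real decision-generation and conversation logic.

```python
from enum import Enum, auto

class Phase(Enum):
    MORNING = auto()      # murder reveal, reactions, breakfast conversation
    MISSION = auto()      # team selection, execution, possible sabotage
    ROUND_TABLE = auto()  # discussion, accusations, banishment vote
    NIGHT = auto()        # Traitor conclave and murder selection

def handle(phase: Phase, day: int, state: dict) -> None:
    # Stub handler - a placeholder for the framework's real phase logic.
    print(f"day {day}: running {phase.name.lower()} phase")

def run_game(days: int = 10) -> None:
    state: dict = {"players": [], "suspicion": {}}
    for day in range(1, days + 1):
        for phase in Phase:  # Enum iteration follows declaration order
            handle(phase, day, state)

run_game(days=3)
```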
The "Edit Moment" System
Not all game moments are equally interesting. I built a system to identify "edit moments" - segments with high entertainment value (a scoring sketch follows this list):
- Confrontations: Heated accusations with strong defences
- Revelations: Role reveals, especially surprising ones
- Betrayals: Alliance breaks, unexpected votes
- Close votes: Nail-biter banishments decided by 1-2 votes
- Strategic brilliance: Clever plays viewers would appreciate - like identifying the secret fourth Traitor
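One way such a detector could score segments - the categories mirror the list above, while the weights and the close-vote bonus are invented for illustration:

```python
# Illustrative weights - the real scoring model is more nuanced.
MOMENT_WEIGHTS = {
    "confrontation": 0.7,
    "revelation": 0.9,
    "betrayal": 0.85,
    "close_vote": 0.8,
    "strategic_play": 0.75,
}

def edit_score(moment_type: str, intensity: float, vote_margin: int | None = None) -> float:
    """Score a game segment's entertainment value in [0, 1]."""
    base = MOMENT_WEIGHTS.get(moment_type, 0.2)
    score = base * intensity
    # Close votes get a bonus that shrinks as the margin widens.
    if moment_type == "close_vote" and vote_margin is not None:
        score += max(0.0, 0.2 - 0.1 * (vote_margin - 1))
    return min(score, 1.0)

# A banishment decided by a single vote, argued at full heat:
print(edit_score("close_vote", intensity=0.9, vote_margin=1))  # -> 0.92
```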
What This Reveals About Intelligence
Deception Requires Theory of Mind
To lie effectively, you must model what others believe. Each agent maintains beliefs about what other agents believe about them.
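A toy sketch of that second-order belief tracking, assuming a flat suspicion estimate per observer - the `BeliefModel` class, its update rule, and its 0.5 prior are illustrative assumptions:

```python
class BeliefModel:
    """Second-order beliefs: what I think each other player believes about me.

    Values are rough estimates of P(they think I'm a Traitor) in [0, 1].
    """
    def __init__(self, me: str, others: list[str], prior: float = 0.5):
        self.me = me
        self.their_view = {name: prior for name in others}

    def observe_accusation(self, accuser: str, target: str) -> None:
        # Someone accusing me is strong evidence they suspect me.
        if target == self.me and accuser in self.their_view:
            self.their_view[accuser] = min(1.0, self.their_view[accuser] + 0.2)

    def safest_audience(self) -> str:
        # The best target for a lie (or a recruitment pitch) is whoever
        # I believe is least suspicious of me.
        return min(self.their_view, key=self.their_view.get)

model = BeliefModel("Eve", others=["Alice", "Bob"])
model.observe_accusation(accuser="Alice", target="Eve")
print(model.safest_audience())  # -> "Bob" (0.5 vs Alice's 0.7)
```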
Authenticity Is Hard to Fake
My Traitor agents, despite having no explicit "tell" programming, naturally develop detectable patterns under extended masking strain.
Emotion Isn't Optional
Early "rational" agents failed - they played mechanically and were easily identified. Adding emotional modelling made agents dramatically more effective and believable.
Limitations and Honesty
Let me be clear about what the system cannot do:
- It's not human-level: Extended interaction would reveal its artificial nature
- Computation-intensive: Real-time play against humans isn't yet practical
- Dialogue limitations: Occasional awkward phrasing or inconsistent characterisation
- Strategy constraints: Agents follow frameworks rather than inventing genuinely novel approaches
Key Takeaways
- Simulating social deception requires integrated systems - knowledge, emotion, and orchestration
- Deception creates emergent tells - an architecture that models masking strain naturally produces detectable patterns, and memory systems amplify this effect
- Emotion is essential for believability - "rational" agents are easily identified as artificial
- Entertainment value can be computationally tracked - enabling automated "editing" of simulations
- Building deceptive AI teaches us about deception - with implications for detection and alignment. Future research will explore this further