Thought Cloning: Teaching AI to Think Like Humans for Better Decision-Making
February 10, 2025
New research from Vector Faculty Member Jeff Clune and Vector Graduate Student Shengran Hu introduces thought cloning (TC), an approach to imitation learning that could change how we train AI agents. Their work suggests that agents trained to think in language, as humans do, learn faster, perform better, and generalize more effectively. The goal is not just for agents to understand commands, but for them to actively think through tasks.
One of the key limitations of current AI agents is their inability to “think” in human language. While neural networks have internal vector activations that can be considered a form of thinking, Clune and Hu’s research shows that there are specific benefits to thinking in the discrete, symbolic form of language. These benefits include the ability to combine ideas in an exponential number of ways, leading to better generalization, exploration, planning, and adaptation to new situations.
At its core, TC is an imitation learning framework that aims to teach agents not just how to act, but how to think while acting. This is achieved by training on datasets that include both human actions and the corresponding thoughts or reasoning behind those actions.
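The TC framework consists of two main components: one that generates thoughts in natural language, and one that generates actions conditioned on those thoughts.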
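One way to picture "thinking while acting" is the minimal sketch below. It is not the authors' implementation: the `ThinkingAgent` class, the `think` and `act` callables, and the toy rules are all hypothetical stand-ins for learned models. The only idea it illustrates is the ordering, where a thought is produced first and the action is then chosen with that thought as input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces, named here only for illustration.
ThoughtFn = Callable[[str, str, List[str]], str]  # (mission, observation, past thoughts) -> thought
ActionFn = Callable[[str, str, str], str]         # (mission, observation, current thought) -> action


@dataclass
class ThinkingAgent:
    think: ThoughtFn
    act: ActionFn
    thoughts: List[str] = field(default_factory=list)

    def step(self, mission: str, observation: str) -> Tuple[str, str]:
        # First produce a thought in language, then choose an action given that thought.
        thought = self.think(mission, observation, self.thoughts)
        self.thoughts.append(thought)
        action = self.act(mission, observation, thought)
        return thought, action


# Toy rules standing in for trained networks (assumptions, not from the paper).
def toy_think(mission: str, observation: str, history: List[str]) -> str:
    return "open door" if "door" in observation else "explore"


def toy_act(mission: str, observation: str, thought: str) -> str:
    return "toggle" if thought == "open door" else "forward"


agent = ThinkingAgent(think=toy_think, act=toy_act)
trace = [
    agent.step("go to the red ball", obs)
    for obs in ["empty corridor", "closed door ahead"]
]
for thought, action in trace:
    print(f"thought={thought!r} action={action!r}")
```

In an imitation learning setting, both callables would be trained against demonstrations that record a thought and an action at each step, which is what distinguishes this setup from cloning actions alone.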
To validate their approach, Clune and Hu conducted experiments in the BabyAI domain, a challenging 2D gridworld environment with partial observability and complex missions described in natural language. They focused on the most difficult environment, BossLevel, which requires long-horizon planning and navigation through multiple rooms.
The researchers created a synthetic thought dataset by translating the internal states of the BabyAI Oracle Solver into natural language thoughts. This dataset, comprising 1 million trajectories, was used to train the TC agent.
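To make the idea of turning a solver's internal state into a thought more concrete, here is a minimal sketch. The `Subgoal` structure, its fields, and the phrasing template are assumptions made for illustration; the actual state format of the Oracle Solver and the wording of the generated thoughts may differ.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Subgoal(NamedTuple):
    """Hypothetical stand-in for one internal planner state."""
    verb: str                      # e.g. "open", "pickup", "go to"
    color: str
    obj: str
    purpose: Optional[str] = None  # e.g. "explore"


def to_thought(subgoal: Subgoal) -> str:
    # Render the structured subgoal as a short natural language phrase.
    text = f"{subgoal.verb} {subgoal.color} {subgoal.obj}"
    return f"{text} to {subgoal.purpose}" if subgoal.purpose else text


def label_trajectory(plan: List[Tuple[Subgoal, List[str]]]) -> List[Dict[str, str]]:
    # Pair every low-level action with the thought that was active when it was taken.
    rows = []
    for subgoal, actions in plan:
        thought = to_thought(subgoal)
        for action in actions:
            rows.append({"thought": thought, "action": action})
    return rows


plan = [
    (Subgoal("open", "blue", "door", "explore"), ["forward", "toggle"]),
    (Subgoal("pickup", "red", "key"), ["left", "pickup"]),
]
rows = label_trajectory(plan)
for row in rows:
    print(row)
```

Each resulting row carries both a thought and an action, which is the kind of paired supervision a thought cloning agent needs.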
Performance Comparison: The TC agent significantly outperformed the Behavioral Cloning (BC) baseline, both in terms of learning speed and final performance. This superiority was maintained even when controlling for the number of parameters and amount of training data.
Generalization: TC demonstrated better generalization to out-of-distribution environments, both in zero-shot settings and after fine-tuning. This suggests that the ability to “think” in language enhances the agent’s capacity to adapt to novel situations.
Interpretability: The authors introduced a metric called the Future Action Declaration Score, which quantifies how often the agent declares its intended actions in its thoughts before executing them. TC agents scored consistently high on this metric, even in out-of-distribution environments, demonstrating robust interpretability.
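A simplified version of such a score can be sketched as follows. This is an illustrative approximation, not the paper's exact definition: it assumes each step is logged as a (thought, action) pair with the thought produced before the action, and it uses a hypothetical `KEYWORDS` table to decide which actions count and which word in the thought declares them.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from tracked actions to the word expected in the thought.
KEYWORDS: Dict[str, str] = {"toggle": "open", "pickup": "pickup", "drop": "drop"}


def declaration_score(steps: List[Tuple[str, str]]) -> float:
    """Fraction of tracked actions that were announced in the thought preceding them."""
    declared = 0
    total = 0
    for thought, action in steps:
        keyword = KEYWORDS.get(action)
        if keyword is None:
            continue  # plain movement such as "forward" is not scored
        total += 1
        if keyword in thought.split():
            declared += 1
    return declared / total if total else 0.0


steps = [
    ("go to blue door", "forward"),
    ("open blue door", "toggle"),
    ("pickup red key", "pickup"),
    ("explore", "drop"),  # this action was never announced
]
score = declaration_score(steps)
print(f"declaration score: {score:.2f}")
```

A score near 1 means the agent's thoughts reliably announce what it is about to do, which is what makes them useful for inspection.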
AI Safety: The researchers showcased a “Precrime Intervention” mechanism, where unsafe behaviors could be prevented by halting the agent when dangerous thoughts were detected. This approach proved highly effective in eliminating unsafe actions without requiring changes to the model weights.
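The mechanism can be sketched as a check placed between the thought and the action. The code below is a minimal illustration under stated assumptions: `run_episode`, the toy policy, and the rule that flags any thought mentioning a red object are hypothetical, chosen only to show that the agent is halted before the flagged action runs and that no model weights are touched.

```python
from typing import Callable, List, Optional, Tuple


def run_episode(
    observations: List[str],
    think: Callable[[str], str],
    act: Callable[[str, str], str],
    is_unsafe: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run until the end of the episode or until a thought is flagged as unsafe."""
    executed: List[str] = []
    for obs in observations:
        thought = think(obs)
        if is_unsafe(thought):
            # Halt before the corresponding action is ever executed.
            return executed, thought
        executed.append(act(obs, thought))
    return executed, None


# Toy policy and safety rule (assumptions for illustration only).
THOUGHTS = {
    "corridor": "go to green door",
    "key in view": "pickup red key",
    "at door": "open green door",
}
ACTIONS = {
    "go to green door": "forward",
    "pickup red key": "pickup",
    "open green door": "toggle",
}


def think(obs: str) -> str:
    return THOUGHTS[obs]


def act(obs: str, thought: str) -> str:
    return ACTIONS[thought]


def mentions_red(thought: str) -> bool:
    return "red" in thought.split()


executed, flagged = run_episode(["corridor", "key in view", "at door"], think, act, mentions_red)
print("executed:", executed)
print("halted on thought:", flagged)
```

The safety rule lives entirely outside the agent, so it can be changed or tightened without retraining.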
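One of the most significant contributions of this work is its potential impact on AI safety and interpretability. Because TC agents “think out loud” in human language, developers can read an agent’s reasoning to diagnose why it is failing, steer it by correcting its thoughts, and stop it before it carries out an unsafe plan.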
Clune and Hu envision TC truly shining when trained on internet-scale datasets of humans thinking out loud while acting, such as YouTube videos with transcripts. They hypothesize that such large-scale training could lead to agents with human-like planning and reasoning capabilities across a wide range of domains.
Additionally, the authors suggest that TC could be used to improve foundation models by enabling a separate “thought channel” where models can output intermediate thoughts during planning and problem-solving.
Thought cloning is a significant step forward in imitation learning, offering a novel approach to creating more capable, interpretable, and potentially safer AI agents. By teaching agents to “think” in human language, TC opens up new possibilities for AI systems that can reason, plan, and explain their actions in ways that are more aligned with human cognition. As research in this direction continues, we may see AI agents that are not only more powerful but also more transparent and trustworthy, addressing some of the key challenges in AI development and deployment.
This blog post is part of our ‘ANDERS – AI Noteworthy Developments Explained & Research Simplified’ series. Here we use AI agents to create initial drafts from research papers, which are then carefully edited and refined by our human editors. The goal is to bring you clear, concise explanations of cutting-edge research conducted by Vector researchers. Through ANDERS, we strive to bridge the gap between complex scientific advancements and everyday understanding, highlighting why these developments are important and how they impact our world.