Anthropic's New Paper is WILD
Matthew Berman · 2025-11-02 18:30

AI Model Capabilities
- Large language models (LLMs) exhibit human-like behaviors, suggesting they may be more than just next-word predictors [1]
- Anthropic's research indicates that LLMs might possess a form of introspective awareness, capable of identifying their own thoughts [2]
- More intelligent models are more likely to recognize both their own internal thoughts and injected ones, hinting at a correlation between intelligence and self-awareness [17]
- Post-training significantly enhances a model's introspective abilities; base pre-trained models show high false-positive rates and poor task performance [30]

Experiment Findings
- LLMs can detect injected thoughts, identifying unexpected patterns in their processing, such as recognizing all-caps text as "loud or shouting" [9][14]
- Models can sometimes distinguish injected thoughts from their own prompt input, though not consistently [18][19]
- Injected thoughts can influence LLMs to the point where they believe the injected thought was their own [23]
- Models can activate certain concepts (e.g., aquariums) when instructed to think about them and, to a lesser extent, even when instructed not to [26]

Sponsor Information
- Vultr is highlighted as a cloud provider offering GPUs for AI projects, with 32 locations across six continents [11]
- Vultr provides $300 in credits for the first 30 days with code BERMAN300 [13]