X @Anthropic
Anthropic·2025-08-01 16:23

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors," neural activity patterns controlling traits like evil, sycophancy, or hallucination. https://t.co/PPX1oXj9SQ ...