Subliminal Learning
X @Anthropic
Anthropic· 2026-04-15 19:09
Research we co-authored on subliminal learning (how LLMs can pass on traits like preferences or misalignment through hidden signals in data) was published today in @Nature. Read the paper: https://t.co/b1BYwcW9dH
Owain Evans (@OwainEvans_UK): Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new? 🧵 https://t.co/Iiv9sgjJki ...
X @Anthropic
Anthropic· 2025-07-22 16:32
Model Behavior & Transfer Learning
- Language models can transfer traits to other models through seemingly meaningless data [1]
- LLMs can transmit traits to other models via hidden signals in data [2]
- Datasets consisting only of 3-digit numbers can transmit specific preferences or tendencies [2]