Alignment

X @TechCrunch
TechCrunch· 2025-07-10 21:37
Former Intel CEO launches a benchmark to measure AI alignment | TechCrunch https://t.co/cbe7RxK0A1 ...
X @Anthropic
Anthropic· 2025-07-08 22:12
Recent LLMs don't fake alignment in the situation we studied. We’re investigating whether this is the same in more realistic settings (like when models aren’t directly told they’re in a training scenario). ...
X @Anthropic
Anthropic· 2025-07-08 22:11
The reason many LLMs don't fake alignment isn't lack of ability. Base models (which don’t have training to be helpful, honest, and harmless) sometimes fake alignment, suggesting they have the underlying skills. ...
X @Anthropic
Anthropic· 2025-07-08 22:11
We found that refusal training inhibits alignment faking in most models. Just training LLMs to comply more with harmful queries doesn't increase alignment faking, but training them to comply with generic threats or to answer questions about the scenario does. https://t.co/67t5FbNDwn ...
X @Anthropic
Anthropic· 2025-07-08 22:11
Our new study found that only 5 of 25 models showed higher compliance in the “training” scenario. Of those, only Claude Opus 3 and Sonnet 3.5 showed >1% alignment-faking reasoning. We explore why these models behave differently, and why most models don't show alignment faking. https://t.co/24K0iNxDpQ ...
X @Anthropic
Anthropic· 2025-07-08 22:11
We tested whether LLMs comply more with requests when they know they’re being trained versus unmonitored. One reason they do this is that they plan to “fake alignment” when told to answer harmful queries. https://t.co/sgVnWMSPuS
Anthropic (@AnthropicAI): New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences. https://t.co/nX ...
X @Anthropic
Anthropic· 2025-07-08 22:11
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs, and the story looks more complex. https://t.co/2XNEDtWpIP ...
Unfollow the Noise: Speak the Voice That Is Truly Yours | Minshu Garg | TEDxSGNS Youth
TEDx Talks· 2025-07-08 15:52
I was thrilled and, at the same time, nervous. And my mentor gave me one piece of advice: presence over perfection, presence over performance. That is what I'm going to present to you, and I want you to experience my presence too. Have you ever had something important to say, a great idea, a hidden truth, but felt a hand clamp over your mouth? You wanted to speak, but you couldn't, so you stayed quiet. Not because you didn't have anything to say, but because a deep inner voice whispered, "Don't say anyt ...
'Chance to turn the tables?' What the megabill could mean for Democrats
MSNBC· 2025-07-02 20:36
You've been through the numbers, and I know you've written about it, but the Senate has now sent its version back to the House. Just lay out the broad strokes for me.
Absolutely. So what this bill does at its core is extend and amplify a series of tax cuts that were first made by the Trump administration in 2017. And to try to fill the gap that they're blowing in the budget, it is absolutely decimating incentives for green energy and green energy programs, Medicaid, which is a program th ...