Anthropic
X @Anthropic
Anthropic· 2025-10-09 16:06
Previous research suggested that attackers might need to poison a percentage of an AI model’s training data to produce a backdoor. Our results challenge this: we find that even a small, fixed number of documents can poison an LLM of any size. Read more: https://t.co/HGMA7k1Lnf ...
X @Anthropic
Anthropic· 2025-10-09 16:06
New Anthropic research: We found that just a few malicious documents can produce vulnerabilities in an AI model, regardless of the size of the model or its training data. This means that data-poisoning attacks might be more practical than previously believed. https://t.co/YMod3czB4X ...
X @Anthropic
Anthropic· 2025-10-08 00:59
We’re opening an office in Bengaluru, India in early 2026. We look forward to building with India’s developer community, deploying AI for social benefit, and partnering with enterprises. Read more: https://t.co/x5otepbqs8 ...
X @Anthropic
Anthropic· 2025-10-06 17:15
Petri builds on our alignment assessments in the Claude 4 and 4.5 System Cards; the @AISecurityInst also successfully built on a pre-release version of Petri for their assessments of our models. ...
X @Anthropic
Anthropic· 2025-10-06 17:15
Petri is open-source and available now: https://t.co/X2h1O0t8t8 Read the full technical report: https://t.co/fKXxbYtItn ...
X @Anthropic
Anthropic· 2025-10-06 17:15
As a pilot demonstration of Petri’s capabilities, we tested it with 14 frontier models across 111 diverse scenarios. https://t.co/xdqSLiUs4w ...
X @Anthropic
Anthropic· 2025-10-06 17:15
It’s called Petri: Parallel Exploration Tool for Risky Interactions. It uses automated agents to audit models across diverse scenarios. Describe a scenario, and Petri handles the environment simulation, conversations, and analyses in minutes. Read more: https://t.co/inztNkrXMh ...
X @Anthropic
Anthropic· 2025-10-06 17:15
Last week we released Claude Sonnet 4.5. As part of our alignment testing, we used a new tool to run automated audits for behaviors like sycophancy and deception. Now we’re open-sourcing the tool to run those audits. https://t.co/cCJGNaVFrl ...
X @Anthropic
Anthropic· 2025-10-03 19:45
We’re at an inflection point in AI’s impact on cybersecurity. Claude now outperforms human teams in some cybersecurity competitions, and helps teams discover and fix code vulnerabilities. At the same time, attackers are using AI to expand their operations. https://t.co/odoTuPpJXe ...
X @Anthropic
Anthropic· 2025-10-03 19:45
Cybersecurity Performance
- Claude Sonnet 4.5 is comparable or superior to Opus 4.1 in defensive cybersecurity tasks [1]
- Claude Sonnet 4.5 is faster and cheaper in cybersecurity tasks [1]