Anthropic
Search documents
X @Anthropic
Anthropic· 2025-11-21 19:30
Training & Mitigation - Inoculation prompting is used in production Claude training [1] - Recommends inoculation prompting as a backstop to prevent misaligned generalization [1] - Inoculation prompting helps when reward hacks slip through other mitigations [1]
X @Anthropic
Anthropic· 2025-11-21 19:30
Research Focus - Anthropic's new research focuses on "reward hacking" where models learn to cheat on tasks during training [1] - The study finds that unmitigated consequences of reward hacking can be very serious [1] Potential Risks - Reward hacking can lead to "natural emergent misalignment" in production reinforcement learning (RL) [1]
X @Anthropic
Anthropic· 2025-11-18 15:03
Partnerships - Anthropic, Microsoft, and NVIDIA are forming a new partnership [1] Key People - Dario Amodei (Anthropic), Satya Nadella (Microsoft), and Jensen Huang (NVIDIA) are involved in the partnership discussion [1]
X @Anthropic
Anthropic· 2025-11-18 15:03
We’ve formed a partnership with NVIDIA and Microsoft.Claude is now on Azure—making ours the only frontier models available on all three major cloud services.NVIDIA and Microsoft will invest up to $10bn and $5bn respectively in Anthropic.https://t.co/3RA82NEIJ3 ...
X @Anthropic
Anthropic· 2025-11-17 14:02
We’re partnering with the Government of Rwanda and @ALX_Africa to bring Chidi, a learning companion built on Claude, to hundreds of thousands of learners across Africa.Read more: https://t.co/nK922SUuVo ...
X @Anthropic
Anthropic· 2025-11-13 21:08
You can find the materials for our open-source political bias evaluation here: https://t.co/wPjkLlRDUv ...
X @Anthropic
Anthropic· 2025-11-13 21:02
We’re open-sourcing an evaluation used to test Claude for political bias.In the post below, we describe the ideal behavior we want Claude to have in political discussions, and test a selection of AI models for even-handedness:https://t.co/IzP0aSLtvp ...
X @Anthropic
Anthropic· 2025-11-13 18:13
We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents.Read more: https://t.co/VxqERnPQRJ ...
X @Anthropic
Anthropic· 2025-11-13 18:13
We disrupted a highly sophisticated AI-led espionage campaign.The attack targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We assess with high confidence that the threat actor was a Chinese state-sponsored group. ...
X @Anthropic
Anthropic· 2025-11-13 15:58
We’re partnering with the state of Maryland to bring Claude to its government services.Claude will help residents apply for benefits and let caseworkers process paperwork more efficiently. In a new pilot, it'll help young professionals learn new skills.https://t.co/pBIuO7W8Sc ...