Anthropic
X @Anthropic
Anthropic· 2025-11-25 20:26
RT rowan (@rowankwang): New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested, simple ones, like fine-tuning models to be honest despite deceptive instructions, worked best. https://t.co/sUEwwYSmaN
X @Anthropic
Anthropic· 2025-11-25 13:06
Our study has limitations: above all, Claude can’t use what happens outside of the chat window to refine its estimate of task-level savings. But as models improve, we think its estimates of task-level savings will improve too. We’ll return to this research soon.
X @Anthropic
Anthropic· 2025-11-25 13:06
This result implies a doubling of the baseline labor productivity growth trend, placing our estimate towards the upper end of recent studies. And if models improve, the effect could be larger still. https://t.co/lIkf3GZYRR
X @Anthropic
Anthropic· 2025-11-25 13:06
AI Productivity Gains
- Anthropic is studying how to evaluate the productivity gains Claude provides [1]
- The research aims to measure the time Claude saves in real-world use [1]
Research Focus
- The research focuses on identifying Claude's use cases and task types [1]
- The Anthropic Economic Index is used to analyze the areas where Claude is applied [1]
X @Anthropic
Anthropic· 2025-11-24 23:43
We're proud to partner with @ENERGY and the Trump Administration on the Genesis Mission. By combining DOE's unmatched scientific assets with our frontier AI capabilities, we'll support American energy dominance as well as advance and accelerate scientific productivity. U.S. Department of Energy (@ENERGY): https://t.co/0RwhGSduDn
X @Anthropic
Anthropic· 2025-11-24 18:55
RT Claude (@claudeai): Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done. https://t.co/mid2Z1qzIf
X @Anthropic
Anthropic· 2025-11-21 19:30
For more on our results, read our blog post: https://t.co/GLV9GcgvO6 And read our paper: https://t.co/FEkW3r70u6
X @Anthropic
Anthropic· 2025-11-21 19:30
We have been using inoculation prompting in production Claude training. We recommend its use as a backstop to prevent misaligned generalization in situations where reward hacks slip through other mitigations.
X @Anthropic
Anthropic· 2025-11-21 19:30
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious. https://t.co/N4mRKtdNdp
X @Anthropic
Anthropic· 2025-11-18 15:03
Dario Amodei (Anthropic), Satya Nadella (Microsoft), and Jensen Huang (NVIDIA) discuss our new partnership: https://t.co/tiC2dkK4bm