X @Anthropic
Anthropic · 2025-11-21 19:30
We have been using inoculation prompting in production Claude training. We recommend its use as a backstop to prevent misaligned generalization in situations where reward hacks slip through other mitigations. ...