AI safety

X @Anthropic
Anthropic· 2025-06-26 13:56
If you want to work with us and help shape how we keep Claude safe for people, our Safeguards team is hiring. https://t.co/UNtALvqMKh ...
Why AI Needs Limits | Vansh Chelani | TEDxWahaha Schools Youth
TEDx Talks· 2025-06-24 15:25
Hey everyone, thank you all for being here. My name is Chelani and today I'll be talking for TEDx. I'll give you another 10 seconds or so to scan this QR code right in the front, starting now. Remember to use your camera. I hope everyone has finished by now. So now I'll be talking about my excitement for the future and the power I can get using the help of AI. So what I decided to do was take it in my own hands using this QR code right here. So, the moment you scan it, half of the money on your device will be ...
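Improving the Intrinsic Transparency of Large Models: Efficient Monitoring and Spontaneous Safety Enhancement Without External Modules | Shanghai AI Lab & Shanghai Jiao Tong University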
量子位 (QbitAI)· 2025-06-23 04:45
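Submitted by the PR-TELLME team. 量子位 | official account QbitAI. As large language models (LLMs) grow more capable, concerns about their potential risks grow too, and gaining insight into their internal "thought process" to spot danger signals has become a core challenge for AI safety. The current mainstream approach uses external "black-box" monitoring modules to interpret model representations. Such methods are like scratching an itch through a boot: they sit apart from the model, their interpretive logic is opaque and their results hard to trust, and they are sensitive to shifts in data distribution and adapt poorly, so they struggle to reach the essence of the model's reasoning and cannot meet monitoring needs. A research team from Shanghai Artificial Intelligence Laboratory and Shanghai Jiao Tong University proposes an innovative solution: TELLME (Transparency Enhancement of LLMs without External modules). The method abandons complex external monitoring modules and uses a "representation decoupling" technique to directly improve the internal transparency of the model itself. A new way forward: from external monitoring to intrinsic transparency. The core idea is to make the model's internal "language of thought" (its representations) for different behaviors, especially safe versus unsafe behaviors, clearly separated in representation space. This not only opens a more reliable and simpler path for model monitoring, but also unexpectedly improves the safety of the model's outputs. A contrastive learning loss (such as the InfoNCE loss) is introduced as the core driving force. This loss pushes the model to pull the representations of questions with similar semantics or risk closer together while, for different (especially safe versus unsafe) questions, ...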
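As a rough illustration of the contrastive objective named in the excerpt above, the sketch below computes an InfoNCE-style loss over a small batch of representation vectors with safe/unsafe labels. It is not TELLME's implementation: the function name `info_nce_loss`, the `temperature` value, the use of cosine similarity, and the choice to treat same-label items as positives are all assumptions made for this example.

```python
import math


def info_nce_loss(reps, labels, temperature=0.1):
    """InfoNCE-style loss for labelled representations (illustrative only).

    reps:   list of equal-length, non-zero float vectors
    labels: one label per vector, e.g. 0 = safe, 1 = unsafe (assumed encoding)
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    units = [unit(v) for v in reps]
    losses = []
    for i, anchor in enumerate(units):
        pos_sum = 0.0  # similarity mass on same-label items (positives)
        all_sum = 0.0  # similarity mass on every other item in the batch
        for j, other in enumerate(units):
            if i == j:
                continue
            cosine = sum(a * b for a, b in zip(anchor, other))
            # Subtracting the maximum cosine (1.0) keeps exp() from overflowing;
            # the shift cancels in the ratio below.
            weight = math.exp((cosine - 1.0) / temperature)
            all_sum += weight
            if labels[j] == labels[i]:
                pos_sum += weight
        if pos_sum > 0.0:  # skip anchors with no same-label partner
            losses.append(-math.log(pos_sum / all_sum))
    if not losses:
        raise ValueError("no anchor has a same-label partner in this batch")
    return sum(losses) / len(losses)


# Toy 2-D "representations": two tight groups pointing in opposite directions.
reps = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
separated = info_nce_loss(reps, [0, 0, 1, 1])  # labels follow the two groups
mixed = info_nce_loss(reps, [0, 1, 0, 1])      # labels cut across the groups
print(separated < mixed)
```

The loss is small when items sharing a label sit close together and away from the other label, and large when the labels cut across the geometry, which is the separation the excerpt describes. In training, the loss would be minimized with respect to the model parameters that produce the representations; that step is omitted here.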
How to Build Trustworthy AI — Allie Howe
AI Engineer· 2025-06-16 20:29
Trust is a multifaceted outcome that results when product and engineering teams work together to build AI that is aligned, explainable, and secure. Learn strategies for how to build trustworthy AI and why trust is paramount for AI systems. Trustworthy AI = AI Security + AI Safety. Learn about the differences between AI Security and AI Safety and how the three focus areas of MLSecOps + AI Red Teaming + AI Runtime Security can help you achieve both and ultimately build Trustworthy AI. Trustworthy AI Issues in ...
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
AI Engineer· 2025-06-11 15:40
Great. Thank you for the introduction, and thanks to the International Advanced Natural Language Processing Conference for organizing this, and thanks as well for allowing this talk to start and kick off the conference. I appreciate it. You guys have done a great job. In terms of the topic, I do have to make sure that we understand the contextual background behind this topic today and recent events over the last few weeks and months. So I'm going to take a few minute ...