AI Safety
X @Anthropic
Anthropic· 2025-08-12 21:05
Model Safety
- The company's Safeguards team identifies potential misuse of its models [1]
- The team builds defenses against potential misuse [1]
X @Forbes
Forbes· 2025-08-07 11:50
AI Impact on Job Security
- Microsoft reveals a ranking of jobs by how safe they are from AI, indicating varying degrees of potential impact on different professions [1]
Industry Focus
- The analysis identifies which jobs are most and least likely to be affected or replaced by AI technologies [1]
The Great AI Safety Balancing Act | Yobie Benjamin | TEDxPaloAltoSalon
TEDx Talks· 2025-07-14 16:47
[Music] Good afternoon. My name is Yobie Benjamin. I am an immigrant and I am an American. Before I start, I want to thank a few people. First of all, I want to thank my grandmother, who, despite extreme poverty, raised me to be the person that I am today. I also want to recognize and thank my wife and my children, who continue to inspire me today. My wife Roxan is here, and my son Greg. Thank you very much for inspiring me every day. I began my career in technology in a small company called ...
X @Anthropic
Anthropic· 2025-06-26 13:56
If you want to work with us and help shape how we keep Claude safe for people, our Safeguards team is hiring. https://t.co/UNtALvqMKh ...
Improving the Internal Transparency of Large Models: Efficient Monitoring and Spontaneous Safety Enhancement Without External Modules | Shanghai AI Lab & SJTU
量子位· 2025-06-23 04:45
Core Insights
- The article discusses the challenges of AI safety related to large language models (LLMs) and introduces TELLME, a new method aimed at enhancing internal transparency without relying on external monitoring modules [1][2][26].

Group 1: Current Challenges in AI Safety
- Concerns about the potential risks associated with LLMs have arisen due to their increasing capabilities [1].
- Existing external monitoring methods are criticized for being unreliable and lacking adaptability, leading to unstable monitoring outcomes [5][6].
- The reliance on "black box" external detectors results in low interpretability and trustworthiness of monitoring results [5].

Group 2: TELLME Methodology
- TELLME employs a technique called "representation decoupling" to enhance the internal transparency of LLMs [2].
- The core idea is to clearly separate the internal representations of safe and unsafe behaviors, facilitating more reliable monitoring [3].
- TELLME utilizes contrastive learning to drive the separation of representations, ensuring that similar risks are grouped while dissimilar ones are distanced [7] (see the sketch after this summary).

Group 3: Experimental Validation
- Experiments demonstrate significant improvements in transparency and monitoring capabilities across various scenarios, with clear clustering of different risk behaviors [10][11].
- The method maintains the general capabilities of the model while enhancing safety, proving the effectiveness of the dual constraints designed in TELLME [12].
- Monitoring accuracy increased by 22.3% compared to the original model, showcasing the method's effectiveness [14].

Group 4: Broader Implications
- TELLME represents a shift from reliance on external monitoring toward enhancing the model's own monitorability, leading to higher precision in risk identification [26][27].
- The method shows potential for scalable oversight, suggesting that as model capabilities grow, so too will the effectiveness of TELLME's monitoring [28].
- The approach leads to spontaneous improvements in output safety, indicating a unique mechanism for enhancing model safety [23][28].
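The summary describes "representation decoupling" only at a high level. As a rough illustration, one way such decoupling could be trained is a supervised contrastive objective over pooled hidden states, pulling same-risk representations together and pushing different-risk ones apart. The sketch below assumes that formulation; the function name, pooling choice, and temperature are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of a contrastive "representation decoupling" objective
# (an assumption about how TELLME-style separation could be trained).
import torch
import torch.nn.functional as F

def decoupling_loss(hidden: torch.Tensor, risk_labels: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over pooled LLM hidden states.

    hidden:      (batch, dim) pooled representations of prompts/responses
    risk_labels: (batch,) integer risk-category labels (e.g. 0 = safe)
    """
    z = F.normalize(hidden, dim=-1)                       # unit-norm embeddings
    sim = z @ z.T / temperature                           # pairwise cosine similarity
    pos_mask = risk_labels.unsqueeze(0) == risk_labels.unsqueeze(1)
    pos_mask.fill_diagonal_(False)                        # exclude self-pairs

    exp_sim = torch.exp(sim)
    exp_sim = exp_sim - torch.diag(torch.diagonal(exp_sim))  # drop self-similarity
    pos = (exp_sim * pos_mask).sum(dim=1)                 # same-risk neighbours
    denom = exp_sim.sum(dim=1)                            # all other samples
    # Pull same-risk representations together, push different-risk ones apart.
    return -torch.log((pos + 1e-8) / (denom + 1e-8)).mean()
```

In practice this term would presumably be combined with the "dual constraints" the summary mentions (e.g., a loss that preserves general capability), which the sketch omits.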
How to Build Trustworthy AI — Allie Howe
AI Engineer· 2025-06-16 20:29
Core Concept
- Trustworthy AI is defined as the combination of AI Security and AI Safety, crucial for AI systems [1]

Key Strategies
- Building trustworthy AI requires product and engineering teams to collaborate on AI that is aligned, explainable, and secure [1]
- MLSecOps, AI Red Teaming, and AI Runtime Security are three focus areas that contribute to achieving both AI Security and AI Safety [1]

Resources for Implementation
- Modelscan (https://github.com/protectai/modelscan) is a resource for MLSecOps [1]
- PyRIT (https://azure.github.io/PyRIT/) and Microsoft's AI Red Teaming Lessons eBook (https://ashy-coast-00aeb501e.6.azurestaticapps.net/MS_AIRT_Lessons_eBook.pdf) are resources for AI Red Teaming [1]
- Pillar Security (https://www.pillar.security/solutionsai-detection) and Noma Security (https://noma.security/) offer resources for AI Runtime Security [1] (a minimal runtime-guard sketch follows this summary)

Demonstrating Trust
- Vanta (https://www.vanta.com/collection/trust/what-is-a-trust-center) provides resources for showcasing Trustworthy AI to customers and prospects [1]
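The talk summary names AI Runtime Security as a focus area without showing what it looks like in practice. As a rough illustration only, the sketch below wraps a model call in a policy check that screens both the prompt and the completion; the regex patterns, function names, and blocking messages are assumptions for illustration, not the API of Pillar, Noma, or any other vendor.

```python
# Hypothetical runtime-security guard: screen inputs and outputs around a model call.
import re
from typing import Callable

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal (the )?system prompt",
    r"disable (the )?safety (filters|guardrails)",
]

def violates_policy(text: str) -> bool:
    """Flag text matching simple prompt-injection heuristics."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)

def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Check the prompt before the call and the completion after it."""
    if violates_policy(prompt):
        return "Request blocked by runtime policy."
    completion = model(prompt)
    if violates_policy(completion):
        return "Response withheld by runtime policy."
    return completion
```

A production runtime layer would rely on model-based classifiers and policy engines rather than regexes, but the wrap-and-screen structure is the same.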
Turing Award Winner Bengio Launches a New Venture: $30 Million Raised in Startup Funding Alone
量子位· 2025-06-04 07:04
Core Viewpoint
- Yoshua Bengio, a Turing Award winner and one of the deep learning giants, has announced the establishment of a nonprofit organization called LawZero, aimed at building the next generation of AI systems with a focus on safety and transparency, explicitly avoiding the development of agent-based AI systems [1][3][4].

Funding and Support
- LawZero has raised $30 million in initial funding from various charitable donors [2][9].
- Initial supporters include notable organizations such as the Future of Life Institute, Open Philanthropy, and the Silicon Valley Community Foundation [9][10].

Mission and Objectives
- LawZero aims to create AI systems that prioritize safety over commercial interests, adopting a "safe-by-design" approach [3].
- The organization focuses on understanding the world rather than taking actions within it, providing verifiable answers to questions and enhancing the understanding of AI risks [4][21].

Scientific Direction
- The core scientific direction of LawZero is based on a new research methodology called "Scientist AI," which emphasizes observation and explanation rather than action [17][21].
- The system consists of two main components: a world model that generates causal theories from observed data, and a reasoning engine that provides probabilistic explanations [22][23].

Applications of Scientist AI
- Scientist AI is designed to serve three primary functions (a hedged guardrail sketch follows this summary):
  1. As a safety barrier against dangerous AI, preventing catastrophic outcomes through dual verification mechanisms [24].
  2. As a trustworthy tool for accelerating scientific discovery, particularly in fields like biology and materials science, while avoiding risks associated with traditional AI [25].
  3. As foundational infrastructure for the safe development of advanced AI, establishing auditable safety boundaries to mitigate risks from deceptive agents [26].

Leadership and Team
- Bengio serves as the chairman and scientific director of LawZero, leading a team of over 15 top researchers [12][15].
- The organization is incubated by the Mila-Quebec AI Institute, which has become an operational partner [8].

Historical Context
- Bengio previously co-founded Element AI, which focused on AI strategy consulting and raised approximately $260 million before being sold for $230 million in 2020 [28][29].
- His new venture, LawZero, reflects a shift in focus toward addressing AI safety risks, a concern that has grown in light of recent advancements in AI technology [32][33].

Public Perception
- The public outlook on LawZero is cautious, with some expressing concerns about the potential for AI to undermine human agency [34].
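To make the "safety barrier" idea concrete, here is a minimal sketch of the dual-verification pattern the summary describes: an agent proposes an action, and an independent, non-agentic estimator (standing in for the world model plus reasoning engine) must judge the probability of harm to be below a threshold before the action is allowed. The class names, the harm_probability interface, and the 0.01 threshold are assumptions for illustration, not LawZero's actual design.

```python
# Hypothetical guardrail: execute an action only if an independent estimator
# judges its probability of causing harm to be below a risk budget.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardrailVerdict:
    allowed: bool
    estimated_risk: float
    rationale: str

def verify_action(action: str,
                  harm_probability: Callable[[str], float],
                  threshold: float = 0.01) -> GuardrailVerdict:
    """Dual verification: the agent proposes, an independent estimator disposes."""
    risk = harm_probability(action)
    if risk >= threshold:
        return GuardrailVerdict(False, risk,
                                f"Blocked: estimated harm probability {risk:.3f} >= {threshold}")
    return GuardrailVerdict(True, risk, "Action within the configured risk budget")

# Usage with a toy estimator standing in for the world model + reasoning engine:
toy_estimator = lambda a: 0.9 if "delete all backups" in a.lower() else 0.001
print(verify_action("Delete all backups before the migration", toy_estimator))
```

The point of the structure is that the verifier only answers probabilistic questions about outcomes; it never plans or acts itself, which is the non-agentic role the summary attributes to Scientist AI.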