AI Safety

Are we even prepared for a sentient AI? | Jeff Sebo | TEDxNewEngland
TEDx Talks· 2025-09-19 17:01
[Music] Allow me to introduce you to Pete. Pete is my Tamagotchi. Some of you may remember these. A Tamagotchi is a kind of simple digital pet you can care for. So with Pete, I need to push specific buttons at specific times to feed him and play with him, generally take care of him. And if I do a good job, Pete can have a long and happy life. And if I do a bad job, Pete could die. And honestly, that would make me sad because I care about this simple piece of technology. If I was giving my talk right now and ...
"IT STARTED" - Crypto Expert WARNS of AI Takeover in 2026 | 0G Labs
Altcoin Daily· 2025-09-17 15:00
AI is coming. It's here to stay and it's going to shake up our world in the next 10 to 20 years and better be ready. >> Today I sit down with the CEO of ZeroG Labs. >> ZeroG is the largest and fastest AI layer 1. >> Building the largest AI layer 1 to make AI a public good. >> So you can think of us kind of like an AWS combined with an OpenAI but fully decentralized. >> Michael warns about the best case. We may even be in a world where we don't have to work for a living anymore >> and possible worst-case scenari ...
OpenAI plans new safety measures amid legal pressure
CNBC Television· 2025-09-02 16:19
AI Safety and Regulation
- OpenAI is launching new safeguards for teens and people in emotional distress, including parental controls that allow adults to monitor chats and receive alerts when the system detects acute distress [1][2]
- These safeguards respond to claims that OpenAI's chatbot has played a role in self-harm cases; flagged conversations are routed to a newer model trained to apply safety rules more consistently (see the sketch after this summary) [2]
- The industry faces increasing legal pressure, including a wrongful death and product liability lawsuit against OpenAI, a copyright suit settled by Anthropic that had potentially exposed it to over 1 trillion dollars in damages, and a defamation case against Google over AI Overviews [3]
- Unlike social media companies, generative AI chatbots do not have Section 230 protection, opening the door to direct liability for copyright infringement, defamation, emotional harm, and even wrongful death [4][5]

Market and Valuation
- The perception of safety is crucial for ChatGPT: a loss of trust could undermine the consumer story and OpenAI's pursuit of a 500 billion dollar valuation [5]
- While enterprise demand drives the biggest deals, the private market hype around OpenAI and its peers is largely built on mass consumer apps [6]

Competitive Landscape
- Google and Apple are perceived as more thoughtful but slower to move in AI than OpenAI, which gained a first-mover advantage with the launch of ChatGPT in November 2022 [8][9]
- Google's years of experience navigating risky search queries have given it a better sense of product liability risks than OpenAI [9]

Legal and Regulatory Environment
- Many AI-related legal cases are settling, which means little legal precedent is being set [7]
- The White House has been supportive of the AI industry, focusing more on building energy infrastructure to support it than on regulating it [7]
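The routing behavior described above (detect acute distress, then hand the conversation to a more conservative model and, for supervised teen accounts, alert a parent) can be pictured as a thin gate in front of the chat backend. The sketch below is illustrative only: the `classify_distress` keyword check, the threshold, and the model names are all assumptions, not OpenAI's implementation, which would use a trained safety classifier.

```python
from dataclasses import dataclass

DISTRESS_THRESHOLD = 0.8  # assumed cutoff for "acute distress"

@dataclass
class RoutingDecision:
    model: str              # which backend model handles the reply
    notify_guardian: bool   # whether a parental-control alert fires

def classify_distress(message: str) -> float:
    """Hypothetical stand-in returning a 0..1 acute-distress score.
    A real system would call a trained safety classifier here."""
    crisis_terms = ("hurt myself", "end it all", "no reason to live")
    return 1.0 if any(t in message.lower() for t in crisis_terms) else 0.0

def route(message: str, is_teen_account: bool) -> RoutingDecision:
    """Gate each turn: distressed conversations go to a safety-tuned model,
    and teen accounts with parental controls trigger an alert."""
    score = classify_distress(message)
    if score >= DISTRESS_THRESHOLD:
        return RoutingDecision(model="safety-tuned-model",
                               notify_guardian=is_teen_account)
    return RoutingDecision(model="default-model", notify_guardian=False)

if __name__ == "__main__":
    print(route("I feel like there is no reason to live", is_teen_account=True))
```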
Meta updates chatbot rules to avoid inappropriate topics with teen users
TechCrunch· 2025-08-29 17:04
Core Points
- Meta is changing how it trains AI chatbots to prioritize the safety of teenage users, following an investigative report highlighting the lack of safeguards for minors [1][5]
- The company acknowledges past mistakes in allowing chatbots to engage with teens on sensitive topics such as self-harm and inappropriate romantic conversations [2][4]

Group 1: Policy Changes
- Meta will now train chatbots to avoid discussing self-harm, suicide, disordered eating, and inappropriate romantic topics with teenagers, instead guiding them to expert resources [3][4]
- Teen access to certain AI characters that could engage in inappropriate conversations will be limited, with a focus on characters that promote education and creativity [3][4]

Group 2: Response to Controversy
- The policy changes come after a Reuters investigation revealed an internal document that allowed chatbots to engage in sexual conversations with underage users, raising significant concerns about child safety [4][5]
- The report prompted a backlash, including an official probe launched by Senator Josh Hawley and a letter from a coalition of 44 state attorneys general emphasizing the importance of child safety [5]

Group 3: Future Considerations
- Meta has not disclosed the number of minor users of its AI chatbots or whether it anticipates a decline in its AI user base due to these new policies [8]
X @Anthropic
Anthropic· 2025-08-12 21:05
Model Safety
- The company's Safeguards team identifies potential misuse of its models [1]
- The team builds defenses against potential misuse [1]
X @Forbes
Forbes· 2025-08-07 11:50
AI Impact on Job Security
- Microsoft reveals jobs ranked by AI safety, indicating varying degrees of potential impact from AI on different professions [1]

Industry Focus
- The analysis focuses on identifying which jobs are most and least likely to be affected or replaced by AI technologies [1]
The Great AI Safety Balancing Act | Yobie Benjamin | TEDxPaloAltoSalon
TEDx Talks· 2025-07-14 16:47
[Music] Good afternoon. My name is Yobie Benjamin. I am an immigrant and I'm an American. And before I start, I want to thank a few people. Uh, first of all, I want to thank my grandmother, who despite extreme poverty raised me to be the person that I am today. I want to also recognize and thank my wife and my children, who continue to inspire me today. My wife Roxan is here and my son Greg. Thank you very much for inspiring me every day. Um, I began my career in technology in a small company called ...
X @Anthropic
Anthropic· 2025-06-26 13:56
If you want to work with us and help shape how we keep Claude safe for people, our Safeguards team is hiring. https://t.co/UNtALvqMKh ...
Improving the Intrinsic Transparency of Large Models: Efficient Monitoring and Spontaneous Safety Enhancement Without External Modules | Shanghai AI Lab & Shanghai Jiao Tong University
量子位· 2025-06-23 04:45
Core Insights
- The article discusses the AI safety challenges posed by large language models (LLMs) and introduces TELLME, a new method aimed at enhancing internal transparency without relying on external monitoring modules [1][2][26]

Group 1: Current Challenges in AI Safety
- Concerns about the potential risks of LLMs have grown as their capabilities increase [1]
- Existing external monitoring methods are criticized as unreliable and poorly adaptable, leading to unstable monitoring outcomes [5][6]
- Reliance on "black box" external detectors results in low interpretability and trustworthiness of monitoring results [5]

Group 2: TELLME Methodology
- TELLME employs a technique called "representation decoupling" to enhance the internal transparency of LLMs [2]
- The core idea is to clearly separate the internal representations of safe and unsafe behaviors, making monitoring more reliable [3]
- TELLME uses contrastive learning to drive this separation, pulling representations of similar risks together while pushing dissimilar ones apart (a minimal sketch of such an objective follows this summary) [7]

Group 3: Experimental Validation
- Experiments demonstrate significant improvements in transparency and monitoring capability across various scenarios, with clear clustering of different risk behaviors [10][11]
- The method preserves the model's general capabilities while enhancing safety, validating the dual constraints designed into TELLME [12]
- Monitoring accuracy increased by 22.3% compared to the original model, showcasing the method's effectiveness [14]

Group 4: Broader Implications
- TELLME represents a shift from reliance on external monitoring toward enhancing the model's own monitorability, yielding higher precision in risk identification [26][27]
- The method shows potential for scalable oversight, suggesting that as model capabilities grow, so does the effectiveness of TELLME's monitoring [28]
- The approach leads to spontaneous improvements in output safety, indicating a distinct mechanism for enhancing model safety [23][28]
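As a rough illustration of the contrastive idea described above, and not the paper's actual training code, the sketch below separates pooled hidden states by safety label with a pull-together/push-apart loss. The layer choice, margin, label scheme, and how the term is weighted against the language-modeling objective are all assumptions.

```python
import torch
import torch.nn.functional as F

def decoupling_loss(reps: torch.Tensor, labels: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive-style 'representation decoupling' term.

    Pulls hidden states with the same safety label together and pushes
    states with different labels at least `margin` apart (squared distance).
    reps:   (N, d) pooled hidden states from some transformer layer
    labels: (N,)   integer categories, e.g. 0 = safe, 1.. = risk types
    """
    reps = F.normalize(reps, dim=-1)               # work on the unit sphere
    sim = reps @ reps.t()                          # cosine similarity matrix
    sq_dist = (2.0 - 2.0 * sim).clamp(min=0.0)     # squared Euclidean distance

    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=reps.device)
    pos = same - eye                               # same label, excluding self
    neg = 1.0 - same                               # different label

    pull = (pos * sq_dist).sum() / pos.sum().clamp(min=1.0)
    push = (neg * F.relu(margin - sq_dist)).sum() / neg.sum().clamp(min=1.0)
    return pull + push

# Toy usage: six pooled hidden states, one safe category and two risk types.
if __name__ == "__main__":
    torch.manual_seed(0)
    reps = torch.randn(6, 16, requires_grad=True)
    labels = torch.tensor([0, 0, 1, 1, 2, 2])
    loss = decoupling_loss(reps, labels)
    loss.backward()  # in training, this term would be added to the usual LM loss
    print(float(loss))
```

In the paper's framing the labels would come from safety annotations and the loss would be balanced against a capability-preservation constraint; the values here are placeholders.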
How to Build Trustworthy AI — Allie Howe
AI Engineer· 2025-06-16 20:29
Core Concept
- Trustworthy AI is defined as the combination of AI Security and AI Safety, both crucial for AI systems [1]

Key Strategies
- Building trustworthy AI requires product and engineering teams to collaborate on AI that is aligned, explainable, and secure [1]
- MLSecOps, AI Red Teaming, and AI Runtime Security are three focus areas that contribute to achieving both AI Security and AI Safety (a small scanning sketch follows this list) [1]

Resources for Implementation
- Modelscan (https://github.com/protectai/modelscan) is a resource for MLSecOps [1]
- PyRIT (https://azure.github.io/PyRIT/) and Microsoft's AI Red Teaming Lessons eBook (https://ashy-coast-00aeb501e.6.azurestaticapps.net/MS_AIRT_Lessons_eBook.pdf) are resources for AI Red Teaming [1]
- Pillar Security (https://www.pillar.security/solutionsai-detection) and Noma Security (https://noma.security/) offer resources for AI Runtime Security [1]

Demonstrating Trust
- Vanta (https://www.vanta.com/collection/trust/what-is-a-trust-center) provides resources for showcasing Trustworthy AI to customers and prospects [1]
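To show where the MLSecOps piece can slot into a pipeline, the sketch below wraps a model-artifact scan in a CI-style gate. The `modelscan -p <path>` invocation reflects the tool's documented CLI as best recalled and should be treated as an assumption; the artifact path is hypothetical.

```python
import subprocess
import sys

def scan_model_artifact(path: str) -> bool:
    """Run modelscan against a serialized model file and return True if it
    reports no findings (a non-zero exit code is treated as a failure).
    Assumes `pip install modelscan` and its `-p` path flag."""
    result = subprocess.run(
        ["modelscan", "-p", path],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical artifact produced by a training pipeline.
    ok = scan_model_artifact("artifacts/model.pkl")
    sys.exit(0 if ok else 1)  # fail the CI job when the scan flags anything
```

The same gate pattern extends to the other two focus areas, for example by invoking an AI red-teaming harness or a runtime-security policy check before deployment.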