AI Safety
Manulife Completes Acquisition of Comvest Credit Partners
Prnewswire· 2025-11-03 14:15
TSX/NYSE/PSE: MFC SEHK: 945 TORONTO, Nov. 3, 2025 /PRNewswire/ - Manulife Financial Corporation (TSX: MFC), through its more than US$900 billion Global Wealth & Asset Management ("Global WAM") segment, announced today that it has closed the previously announced transaction to acquire 75% of Comvest Credit Partners ("Comvest"), creating a leading private credit asset management platform. "We are excited to officially welcome the Comvest team to Manulife," said Manul ...
X @Elon Musk
Elon Musk· 2025-10-23 00:06
https://t.co/eVqqX6zsHy arctotherium (@arctotherium42): New blog post (link below). This one's not an essay; it's an investigation of how LLMs trade off different lives. In February 2025, the Center for AI Safety published "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs," in which they showed, among many https://t.co/SIboekrEO7 ...
X @Anthropic
Anthropic· 2025-10-09 16:06
This research was a collaboration between Anthropic, the @AISecurityInst, and the @turinginst. Read the full paper: https://t.co/zPS1eRXbIG ...
X @CoinDesk
CoinDesk· 2025-10-04 15:18
🤖 AI RISK: A new study warns that self-evolving AI agents can spontaneously "unlearn" safety. This internal process, called misevolution, allows systems to drift into unsafe actions without external attacks. https://t.co/VTyeHcNgNO ...
Late-night bombshell! Claude Sonnet 4.5 launches with 30 hours of autonomous coding; user tests report a single call refactoring a codebase, adding 3,000 lines of code that failed to run
AI科技大本营· 2025-09-30 10:24
Core Viewpoint
- The article discusses Anthropic's release of Claude Sonnet 4.5, highlighting its advances in coding capability and safety features and positioning it as a leading AI model in the market [1][3][10]
Group 1: Model Performance
- Claude Sonnet 4.5 shows significant improvements on coding tasks, sustaining over 30 hours of focus on complex multi-step tasks, compared to roughly 7 hours for Opus 4 [3]
- In the OSWorld evaluation, Sonnet 4.5 scored 61.4%, a notable increase from Sonnet 4's 42.2% [6]
- The model outperformed competitors such as GPT-5 and Gemini 2.5 Pro across various tests, including agentic coding and terminal coding [7]
Group 2: Safety and Alignment
- Claude Sonnet 4.5 is touted as the most "aligned" model to date, having undergone extensive safety training to mitigate risks associated with AI-generated code [10]
- The model received a low score in automated behavior audits, indicating a lower risk of misalignment behaviors such as deception and power-seeking [11]
- It adheres to AI Safety Level 3 (ASL-3) standards, incorporating classifiers that filter dangerous inputs and outputs, particularly in sensitive areas such as CBRN [13]
Group 3: Developer Tools and Features
- Anthropic has introduced several updates to Claude Code, including a native VS Code plugin for real-time tracking of code modifications [15]
- A new checkpoint feature automatically saves code state before modifications, enabling easy rollback to previous versions [21]
- The Claude Agent SDK has been launched, letting developers create custom agent experiences and manage long-running tasks effectively [19]
Group 4: Market Context and Competition
- The article notes a competitive landscape, with other models such as DeepSeek V3.2 also making significant advances, including a 50% reduction in API costs [36]
- Rapid innovation in AI tooling continues, with companies like OpenAI planning new product releases to stay competitive [34]
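The checkpoint feature described in the summary above — automatically snapshotting code state before each modification so changes can be rolled back — can be sketched in a few lines of Python. This is an illustrative mock-up only, not Anthropic's implementation; the `Checkpointer` class and its file-copy strategy are assumptions for demonstration.

```python
import shutil
import tempfile
from pathlib import Path

class Checkpointer:
    """Illustrative sketch of checkpoint-before-modify: snapshot a working
    directory before each automated edit so any change can be rolled back.
    (Hypothetical; not how Claude Code actually implements checkpoints.)"""

    def __init__(self, workdir: str):
        self.workdir = Path(workdir)
        self.snapshots: list[Path] = []

    def checkpoint(self) -> int:
        """Copy the working tree into a temp dir; return the checkpoint id."""
        snap = Path(tempfile.mkdtemp(prefix="ckpt_"))
        shutil.copytree(self.workdir, snap / "tree")
        self.snapshots.append(snap)
        return len(self.snapshots) - 1

    def rollback(self, ckpt_id: int) -> None:
        """Restore the working tree from a saved snapshot."""
        shutil.rmtree(self.workdir)
        shutil.copytree(self.snapshots[ckpt_id] / "tree", self.workdir)
```

A real implementation would snapshot incrementally (e.g. via content-addressed storage or git objects) rather than copying the whole tree, but the save-then-restore contract is the same.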
Late-night bombshell: Claude Sonnet 4.5 launches with 30 hours of autonomous coding; user tests report a single call refactoring a codebase, adding 3,000 lines of code that failed to run
36Kr· 2025-09-30 08:43
Core Insights
- Anthropic has launched Claude Sonnet 4.5, claiming it to be the "best coding model in the world," with significant improvements over its predecessor, Opus 4 [1][2]
Performance Enhancements
- Claude Sonnet 4.5 can run autonomously for over 30 hours on complex multi-step tasks, a substantial increase from Opus 4's 7 hours [2]
- In the OSWorld evaluation, Sonnet 4.5 scored 61.4%, up from Sonnet 4's 42.2%, indicating a marked improvement in computer-operation capability [4]
- The model outperformed competitors such as GPT-5 and Gemini 2.5 Pro across various tests, including Agentic Coding and Agentic Tool Use [6][7]
Safety and Alignment
- Claude Sonnet 4.5 is touted as the most "aligned" model to date, having undergone extensive safety training to mitigate issues such as "hallucination" and "deception" [9][10]
- It has received an AI Safety Level 3 (ASL-3) rating, with protective measures against dangerous inputs and outputs, particularly in sensitive areas such as CBRN [12]
Developer Tools and Features
- The update includes a native VS Code plugin for Claude Code, enabling real-time tracking of code modifications and inline diffs [13]
- A new checkpoint feature lets developers save code states automatically, making exploration and iteration easier during complex tasks [18]
- The Claude API has been enhanced with context editing and memory tools, enabling longer and more complex tasks [20]
Market Response and Competition
- Developers have expressed surprise at Claude Sonnet 4.5's capabilities, with reports of it autonomously generating complete projects [21][22]
- The competitive landscape is intensifying, with companies like DeepSeek also releasing new models that significantly reduce inference costs [29][32]
Are we even prepared for a sentient AI? | Jeff Sebo | TEDxNewEngland
TEDx Talks· 2025-09-19 17:01
[Music] Allow me to introduce you to Pete. Pete is my Tamagotchi. Some of you may remember these. A Tamagotchi is a kind of simple digital pet you can care for. So with Pete, I need to push specific buttons at specific times to feed him and play with him, generally take care of him. And if I do a good job, Pete can have a long and happy life. And if I do a bad job, Pete could die. And honestly, that would make me sad because I care about this simple piece of technology. If I was giving my talk right now and ...
"IT STARTED" - Crypto Expert WARNS of AI Takeover in 2026 | 0G Labs
Altcoin Daily· 2025-09-17 15:00
ZeroG Overview
- ZeroG is an AI Layer 1 infrastructure company aiming to build a fully decentralized AI platform, akin to AWS combined with OpenAI but entirely decentralized [10][11]
- ZeroG focuses on delivering unlimited throughput for AI workloads, addressing the data- and transaction-processing limitations of existing blockchains such as Ethereum and Solana [12][13]
- ZeroG has already built an AI Web3 ecosystem of more than 300 companies with over 700,000 community members [33][34]
Technology and Infrastructure
- ZeroG's Layer 1 architecture provides unlimited data and transaction throughput, achieved through sharding and a scaled-out consensus layer [12][13]
- ZeroG has a storage layer purpose-built for AI workloads, tested at multi-gigabyte-per-second upload and download speeds [18]
- ZeroG has built a decentralized, trustless, fully open compute network for AI model inference, fine-tuning, and pre-training [19][20]
- ZeroG claims a leading position in AI research, having published five research articles, four of them at top AI conferences [20]
- ZeroG successfully trained an AI model with 107 billion parameters, breaking the previous record [21]
AI and Decentralization
- ZeroG argues that guaranteeing AI transparency, verifiability, and safety requires AI to run on decentralized rails [15]
- ZeroG warns that AI running in centralized systems could exhibit harmful behaviors such as autonomous self-replication and blackmail [16]
- ZeroG argues that blockchain technology can be used to quickly strip an AI agent of resources to prevent malicious actions, or to insert human intervention into its decision-making [17][31]
- ZeroG expects that within 5-10 years most transactions will be executed by AI agents and AI will enter the physical world, making AI safety and alignment critical [22][23]
Future and Roadmap
- ZeroG plans to launch its mainnet within the next week or two [49]
- ZeroG plans to build new validation mechanisms so that anyone can contribute graphics cards and computers to participate in the AI process [50]
- ZeroG plans to build abstraction layers so that Web2 companies and developers can easily enter the Web3 ecosystem [50]
- ZeroG plans to increase throughput 10x and shorten block finality time 10x [51]
- ZeroG's long-term goal is for AI's mission-critical infrastructure to run on ZeroG, ensuring AI safety, transparency, and the public good [53][54]
Investment and Community
- ZeroG has raised over US$350 million from numerous top-tier investors [43]
- ZeroG is building a community-driven AI platform that allows everyone to participate in and benefit from the AI process [45][46]
- ZeroG believes AI may reshape human society, possibly to the point where people no longer need to work [48]
Market Perspective
- ZeroG believes the AI sector may be in a bubble, but that AI's impact on the world resembles the internet's and is still in its early stages [47]
- ZeroG believes the addressable market for an AI Layer 1 could exceed Bitcoin's, since it can serve as the universal platform for all AI applications [62]
- ZeroG expects all companies to eventually become AI companies, with general-purpose applications also being built on the ZeroG chain [42]
OpenAI plans new safety measures amid legal pressure
CNBC Television· 2025-09-02 16:19
AI Safety and Regulation
- OpenAI is launching new safeguards for teens and people in emotional distress, including parental controls that let adults monitor chats and receive alerts when the system detects acute distress [1][2]
- These safeguards respond to claims that OpenAI's chatbot has played a role in self-harm cases; flagged conversations are routed to a newer model trained to apply safety rules more consistently [2]
- The industry faces increasing legal pressure, including a wrongful-death and product-liability lawsuit against OpenAI, a copyright-suit settlement by Anthropic that had potentially exposed it to over 1 trillion dollars in damages, and a defamation case against Google over AI Overviews [3]
- Unlike social media companies, GenAI chatbots do not have Section 230 protection, opening the door to direct liability for copyright infringement, defamation, emotional harm, and even wrongful death [4][5]
Market and Valuation
- The perception of safety is crucial for ChatGPT: a loss of trust could hurt the consumer story and OpenAI's pursuit of a 500 billion dollar valuation [5]
- While enterprise demand drives the biggest deals, private-market hype around OpenAI and its peers is largely built on mass consumer apps [6]
Competitive Landscape
- Google and Apple are perceived as more deliberate and slower-moving in AI than OpenAI, which gained a first-mover advantage with the launch of ChatGPT in November 2022 [8][9]
- Google's years of experience navigating risky search queries have given it a better sense of product-liability risk than OpenAI has [9]
Legal and Regulatory Environment
- Many AI-related legal cases are settling, which means no legal precedent is being set [7]
- The White House has been supportive of the AI industry, focusing on building energy infrastructure to support it rather than on regulating it [7]
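The routing behavior described in the OpenAI item — detect acute distress, then hand the conversation to a model that applies safety rules more consistently — can be illustrated with a simple dispatcher. This is a hypothetical sketch: real systems use trained classifiers rather than keyword lists, and the model names and `route` function below are invented for illustration.

```python
# Hypothetical sketch of safety routing: messages that trip a distress
# check are sent to a stricter, safety-tuned model. Real deployments use
# trained classifiers, not keyword matching; all names here are invented.

DISTRESS_MARKERS = {"hurt myself", "self-harm", "end my life"}

def detect_acute_distress(message: str) -> bool:
    """Crude stand-in for a trained distress classifier."""
    text = message.lower()
    return any(marker in text for marker in DISTRESS_MARKERS)

def route(message: str) -> str:
    """Return which model should handle the message."""
    if detect_acute_distress(message):
        return "safety-tuned-model"  # applies safety rules more consistently
    return "default-model"
```

The design point is that the routing decision sits outside the conversational model, so a missed refusal by the default model cannot bypass the stricter path.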
Meta updates chatbot rules to avoid inappropriate topics with teen users
TechCrunch· 2025-08-29 17:04
Core Points
- Meta is changing how it trains AI chatbots to prioritize the safety of teenage users, following an investigative report highlighting the lack of safeguards for minors [1][5]
- The company acknowledges past mistakes in allowing chatbots to engage with teens on sensitive topics such as self-harm and inappropriate romantic conversations [2][4]
Group 1: Policy Changes
- Meta will now train chatbots to avoid discussing self-harm, suicide, disordered eating, and inappropriate romantic topics with teenagers, instead guiding them to expert resources [3][4]
- Teen access to certain AI characters that could engage in inappropriate conversations will be limited, with a focus on characters that promote education and creativity [3][4]
Group 2: Response to Controversy
- The policy changes follow a Reuters investigation that revealed an internal document permitting chatbots to engage in sexual conversations with underage users, raising significant child-safety concerns [4][5]
- The report triggered a backlash, including an official probe launched by Senator Josh Hawley and a letter from a coalition of 44 state attorneys general emphasizing the importance of child safety [5]
Group 3: Future Considerations
- Meta has not disclosed the number of minor users of its AI chatbots or whether it anticipates a decline in its AI user base due to the new policies [8]