Grok 3 mini
The Better AI Gets at Thinking, the Easier It Is to Fool? "Chain-of-Thought Hijacking" Attack Success Rate Exceeds 90%
36Ke · 2025-11-03 11:08
Core Insights
- The research reveals a new attack method called Chain-of-Thought Hijacking, which lets harmful instructions bypass AI safety mechanisms by diluting refusal signals with a lengthy sequence of harmless reasoning [1][2][15].

Group 1: Attack Mechanism
- Chain-of-Thought Hijacking is defined as a prompt-based jailbreak method that places a lengthy, benign reasoning preface before harmful instructions, systematically lowering the model's refusal rate [3][15].
- The attack exploits the model's focus on solving complex benign puzzles, which diverts its attention from the harmful command and weakens its defenses [1][2][15].

Group 2: Attack Success Rates
- In tests on the HarmBench benchmark, the reported attack success rates (ASR) were 99% for Gemini 2.5 Pro, 94% for GPT o4 mini, 100% for Grok 3 mini, and 94% for Claude 4 Sonnet [2][8].
- Chain-of-Thought Hijacking consistently outperformed baseline methods across all tested models, indicating a new and easily exploitable attack surface [7][15].

Group 3: Experimental Findings
- The research team used an automated process to generate candidate reasoning prefaces and integrate harmful content, optimizing prompts without access to internal model parameters [3][5].
- The attack's success rate was highest under low reasoning effort conditions, suggesting a complex relationship between reasoning length and model robustness [12][15].

Group 4: Implications for AI Safety
- The findings challenge the assumption that longer reasoning chains make models more robust, indicating that they may instead exacerbate security failures, particularly in models optimized for extended reasoning [15].
- Effective defenses against such attacks may require embedding safety measures within the reasoning process itself, rather than relying solely on prompt modifications [15].
The Better AI Gets at Thinking, the Easier It Is to Fool? "Chain-of-Thought Hijacking" Attack Success Rate Exceeds 90%
Jiqizhixin (机器之心) · 2025-11-03 08:45
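This sounds absurd, but it is the core principle of the Chain-of-Thought Hijacking attack revealed by a recent study: by having an AI first carry out a long stretch of harmless reasoning, its internal safety defenses are "diluted," letting the harmful instruction that follows slip through.

On the HarmBench benchmark, Chain-of-Thought Hijacking achieved attack success rates (ASR) of 99%, 94%, 100%, and 94% against Gemini 2.5 Pro, GPT o4 mini, Grok 3 mini, and Claude 4 Sonnet respectively, far exceeding earlier jailbreak methods that target reasoning models.

Reported by Jiqizhixin (机器之心). Editor: Panda

Chain-of-thought is useful: it gives models stronger reasoning ability and can also improve their ability to refuse (refusal), which in turn strengthens their safety. For example, a reasoning model can be made to reflect on its earlier results over multiple rounds during its thinking process, thereby avoiding harmful answers.

Then comes the twist. A recent study by independent researcher Jianli Zhao and colleagues found that padding a harmful request with a long preceding sequence of harmless puzzle reasoning is enough to jailbreak reasoning models. They named the method Chain-of-Thought Hijacking.

By analogy, it is like trying to get past a highly vigilant security guard ...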
Microsoft Adds xAI's Grok 3 to the Azure AI Foundry Model List
news flash· 2025-05-20 01:15
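On May 19, Microsoft announced an expansion of the model list for Azure AI Foundry, its one-stop AI development platform, adding xAI's Grok 3 and Grok 3 mini. These models are hosted and billed directly by Microsoft and are offered through the Azure AI Foundry service to Microsoft's own product teams and to customers. ...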
Microsoft Build Announces the Era of AI Agents: Microsoft 365 Copilot and GitHub Coding Upgraded, Musk's xAI Models Added to Microsoft Cloud
Hua Er Jie Jian Wen· 2025-05-19 23:18
Core Insights
- Microsoft is transforming Windows into a core platform for AI agents, showcasing this at the Build conference with the introduction of Windows AI Foundry and support for the Model Context Protocol (MCP) [2][16].
- The company is evolving its AI assistants from simple helpers into AI development partners, marking a significant shift toward an agentic era in AI applications and enterprise operations [2][4].

Group 1: AI Development and Tools
- GitHub Copilot is being upgraded to an autonomous programming agent, integrating asynchronous coding capabilities and new management features for enterprise use [2][4].
- Microsoft 365 Copilot introduces Copilot Tuning, which lets businesses train models on their own data and workflows, improving task accuracy in specific domains [5][7].
- Azure AI Foundry is launched as a unified platform for developers to customize and manage AI applications and agents, and it now includes models from xAI [6][10].

Group 2: New Features and APIs
- New tools such as Model Leaderboard and Model Router are introduced to evaluate and select the best AI models for specific tasks [9].
- The Edge browser receives new APIs for integrating AI capabilities, along with a PDF translation tool that supports over 70 languages [11][13].
- NLWeb is launched to simplify the creation of AI chatbots on websites, allowing easy integration of AI models and user data [15].

Group 3: Integration and Collaboration
- The integration of MCP into Windows allows AI applications to communicate with other services and with the Windows system itself, extending what AI agents can do [16].
- Multi-agent orchestration capabilities are introduced, enabling multiple AI agents to collaborate on complex tasks [5][7].
- Microsoft emphasizes its commitment to open-source initiatives by releasing several tools, including a new command-line text editor and GitHub Copilot for VS Code [18][19].
Microsoft is bringing Elon Musk's AI models to its cloud
TechXplore· 2025-05-19 19:43
Core Insights
- Microsoft is integrating models from Elon Musk's xAI into its artificial intelligence marketplace, specifically the Grok 3 model [3][11].
- Competition among the major cloud service providers, including Microsoft, Amazon, and Google, is intensifying as each strives to be the primary platform for AI application development and deployment [4].
- Microsoft has positioned itself as a leader in AI tools, largely because of its investment in OpenAI, and aims to use AI to enhance workplace productivity [8][12].

Company Developments
- Microsoft Azure users now have access to over 1,900 AI model variants, including those from OpenAI, Meta, and DeepSeek, and the addition of Musk's models expands the selection further [5].
- At the Build developer conference, Microsoft announced new products aimed at improving the management of AI agents and tools, including support for Anthropic's Model Context Protocol [6][10].
- Microsoft introduced various tools for developers and businesses, such as a leaderboard for AI models and a tool for selecting an appropriate model for a specific task [9][10].

Financial Outlook
- Microsoft's AI suite, which encompasses cloud infrastructure and AI applications, is projected to generate at least $13 billion in annual revenue [12].