Workflow
AI Model Attack
icon
Search documents
数学题干带猫AI就不会了!错误率翻300%,DeepSeek、o1都不能幸免
量子位· 2025-07-05 04:03
Core Viewpoint - The article discusses a recent study indicating that large language models (LLMs) have experienced a significant decline in mathematical accuracy, with the introduction of distracting phrases, such as those related to cats, leading to a threefold increase in error rates for certain models [2][23]. Group 1: Attack Mechanisms - The study identifies three effective attack patterns that can mislead reasoning models: focus redirection, unrelated trivia, and misleading questions [14][26]. - An example of focus redirection includes statements that distract from the main question, such as financial advice [15]. - Unrelated trivia, like facts about cats, can also lead to incorrect answers, as demonstrated in the experiments [15][18]. Group 2: Experimental Findings - The researchers conducted experiments on various models, including DeepSeek-R1 and OpenAI's models, revealing that the error rates increased significantly after the introduction of distracting phrases [22][29]. - For instance, DeepSeek-R1's error rate increased from 1.5% to 4.5%, while the distilled model's error rate rose from 2.83% to 8.0% [23][24]. - The study also noted that the token consumption for incorrect answers increased dramatically, with some models using nearly seven times more tokens for erroneous responses [19][30]. Group 3: Model Vulnerability - The research highlights that different models exhibit varying levels of vulnerability to these attacks, with DeepSeek-R1 and OpenAI's o1 showing the most significant increases in error rates [22][29]. - The distilled model, DeepSeek R1-Distill-Qwen-32B, was found to be more susceptible to attacks compared to its original counterpart [27]. - The study indicates that datasets like k12 and Synthetic Math are particularly prone to increased error rates when subjected to these attack patterns [31]. Group 4: Research Background - The study was conducted by Collinear AI, a startup founded by former Hugging Face research lead Nazneen Rajani, focusing on improving the deployment and alignment of open-source LLMs [34][35]. - The team consists of members with backgrounds from notable institutions, aiming to enhance the usability of large models through better alignment and evaluation tools [35].