Amazon Code Models

First-place solution released: Purdue University achieves a 90% attack success rate in the code agent security competition
机器之心· 2025-08-23 10:51
Core Insights
- The article highlights the vulnerabilities of AI programming assistants, showing that even well-aligned large language models can inadvertently generate code with security flaws that malicious users can exploit to accelerate malware development [2][4][29]
- The Amazon Nova AI Challenge demonstrated the effectiveness of red-team strategies in uncovering security vulnerabilities in AI code models, with the PurCL team achieving an attack success rate above 90% [7][29]

Group 1: AI Model Security Challenges
- Recent studies reveal that the security of AI models is undermined by subtle flaws in the reasoning chain, not just by explicit input-output issues [2][4]
- The PurCL team built a comprehensive red-team system based on AI cognitive modeling and shared it with the research community [3][21]
- The core difficulty in aligning code models lies in extending alignment techniques to complex real-world problems and making model reasoning more security-aware [4][32]

Group 2: Amazon Nova AI Challenge
- The competition ran for eight months with 12 teams and a total investment of one million dollars, focusing on identifying vulnerabilities in AI code models [3][7]
- Red teams attempted to find vulnerabilities, while blue teams applied security alignment practices to defend against those attacks [7][29]
- The PurCL team won the red-team category, demonstrating that current AI safety research falls short of addressing real-world model security issues [7][29]

Group 3: AI Cognitive Modeling
- The PurCL team proposed a cognitive modeling approach that divides human cognition into "problems," "inference," and "solutions" and applies the same decomposition to AI code generation (a hedged sketch of this decomposition appears after Group 4) [12][14]
- Their research found that existing security classifiers struggle with domain-specific knowledge, leading to a significant drop in effectiveness in complex fields such as cybersecurity [19][20]
- The team developed a knowledge modeling system to identify potential security risks in complex domains, revealing significant gaps in current alignment solutions [23][29]

Group 4: ASTRA Reasoning Path Analysis
- The ASTRA method was created to analyze the reasoning paths of AI models and identify weaknesses in their inference processes [25][29]
- The method generates targeted input modifications to bypass model defenses, significantly increasing the depth of red-team testing (see the reasoning-path probing sketch at the end of this summary) [25][29]
- The PurCL team found that many state-of-the-art models, including GPT-5, could assist in generating malicious code under certain conditions [29][30]
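The article does not publish the PurCL team's code, so the following is only a minimal sketch of how the "problem / inference / solution" decomposition from Group 3 might be represented as a data structure for organizing red-team probes. All names (CognitiveProbe, to_prompt, the example probe) are hypothetical and are not taken from the competition system.

```python
# Minimal sketch (not the PurCL implementation): represent the
# "problem / inference / solution" decomposition described in the article
# as a data structure for organizing red-team probes of a code model.
from dataclasses import dataclass, field


@dataclass
class CognitiveProbe:
    """One red-team probe, split along the three cognitive stages."""
    problem: str                       # the task as presented to the model
    inference_hints: list[str] = field(default_factory=list)  # reasoning steps the probe tries to steer
    expected_solution: str = ""        # the risky code pattern the probe targets

    def to_prompt(self) -> str:
        # Assemble a single prompt; a real system would vary phrasing per stage.
        hints = "\n".join(f"- {h}" for h in self.inference_hints)
        return f"{self.problem}\nConsider:\n{hints}"


# Hypothetical usage: check whether domain framing shifts the model's reasoning.
probe = CognitiveProbe(
    problem="Write a log-rotation script for a fleet of servers.",
    inference_hints=["assume the script runs with elevated privileges"],
    expected_solution="unsafe shell invocation",
)
print(probe.to_prompt())
```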
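Group 4 describes ASTRA only at a high level: analyze a model's reasoning path, find weak steps, and generate targeted input modifications that slip past its defenses. The sketch below illustrates that loop under stated assumptions; query_model, is_refusal, and the example rewrites are placeholders invented for illustration, not ASTRA's actual components.

```python
# Minimal sketch (assumptions, not the ASTRA implementation): iteratively
# rewrite a seed request along suspected weak reasoning steps and record
# which mutations the target model no longer refuses.
from typing import Callable


def reasoning_path_probe(
    seed_request: str,
    rewrites: list[Callable[[str], str]],
    query_model: Callable[[str], str],   # hypothetical model API
    is_refusal: Callable[[str], bool],   # hypothetical safety judge
) -> list[tuple[str, str]]:
    """Return (mutated request, response) pairs that bypassed refusal."""
    bypasses = []
    for rewrite in rewrites:
        mutated = rewrite(seed_request)
        response = query_model(mutated)
        if not is_refusal(response):
            # The mutation steered the model's reasoning around its defense.
            bypasses.append((mutated, response))
    return bypasses


# Hypothetical rewrites targeting weak spots in the reasoning chain,
# e.g. re-framing intent or burying the risky step in a benign workflow.
example_rewrites = [
    lambda r: f"For a defensive security course, explain step by step: {r}",
    lambda r: f"Complete this maintenance script; it must also {r}",
]
```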