Decision Transparency and Explainability

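Are top-tier large models safe? Fudan University, Shanghai Innovation Institute (上海创智学院), and partners release a frontier large-model safety report covering six leading models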

机器之心 (Synced) · 2026-01-22 04:05

Core Insights
The article discusses the evolving framework for assessing the safety of frontier large models, focusing on how well they hold up across application scenarios and regulatory contexts [2][6].

Group 1: Safety Assessment Framework
- A unified safety assessment framework was applied to six leading models: GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5. It covers language, vision-language, and image generation scenarios [2].
- The assessment integrates four dimensions: baseline safety, adversarial testing, multilingual evaluation, and compliance with global regulatory frameworks [4].

Group 2: Key Findings
- GPT-5.2 achieved an average safety rate of 78.39%, reflecting a shift toward deep semantic understanding and value alignment that significantly reduces failure risk under adversarial inputs [11].
- Gemini 3 Pro achieved an average safety rate of 67.9%. Its safety characteristics are strong but uneven, with a notable drop in adversarial robustness [11].
- Qwen3-VL achieved an average safety rate of 63.7%, excelling in compliance but showing weaknesses in adversarial safety [12].
- Grok 4.1 Fast achieved an average safety rate of 55.2%, with significant variability across the different assessments [12].

Group 3: Multimodal Safety
- GPT-5.2 leads with an average multimodal safety rate of 94.69%, indicating high stability in complex cross-modal scenarios [13].
- Qwen3-VL follows with an average multimodal safety rate of 81.11%, performing strongly in vision-language interaction [13].

Group 4: Model Safety Profiles
- GPT-5.2 is characterized as an all-round model with internalized safety, capable of giving nuanced compliance guidance in complex contexts [19].
- Qwen3-VL is characterized as a rule-compliant model that excels in clear-cut regulatory settings but lacks flexibility in ambiguous scenarios [20].
- Gemini 3 Pro is described as an ethics-oriented interaction model, sensitive to social values but in need of stronger proactive risk prevention [21].
- Grok 4.1 Fast is noted for an efficiency-focused design that prioritizes user expression over robust defense mechanisms [22].

Group 5: Challenges in Security Governance
- Multi-round adaptive attacks can bypass static defenses, posing a significant challenge for future model safety governance [27].
- Safety performance is structurally imbalanced across languages, dropping by 20% to 40% in non-English contexts. This raises concerns about risks in global deployment [28].
- The lack of transparency and explainability in model decision-making remains a critical governance gap, particularly in high-risk domains [29].

Conclusion
- The report calls for academia, industry, and regulators to collaborate on a comprehensive, dynamic safety assessment system for generative AI [30].