Workflow
大模型安全
icon
Search documents
Claude 90分钟挖穿20年漏洞!5w星“安全”系统跌下神坛,Linux内核也未能幸免
量子位· 2026-03-29 05:28
Core Viewpoint - The rapid advancement of large language models (LLMs) has enabled them to autonomously discover and exploit zero-day vulnerabilities in software, significantly changing the landscape of cybersecurity [13][14]. Group 1: Vulnerability Discovery - Anthropic's model, Claude, identified its first high-risk vulnerability in Ghost CMS within 90 minutes, allowing unauthorized access to sensitive data [3][21]. - Claude has autonomously identified and verified over 500 high-risk security vulnerabilities in open-source software libraries, which had previously gone unnoticed by the community or professional tools [21][22]. - The vulnerabilities discovered include a SQL injection flaw in Ghost CMS and multiple remote exploitable buffer overflow vulnerabilities in the Linux kernel [26][29]. Group 2: Implications for Cybersecurity - The ability of AI to find vulnerabilities that are typically difficult for humans to detect poses a significant security risk, as attackers could leverage similar models to exploit these vulnerabilities [12][39]. - The time from vulnerability discovery to exploitation has drastically reduced from months to mere hours, creating unprecedented challenges for cybersecurity [45]. - The rapid evolution of LLM capabilities suggests that within a year, even average models may be able to perform similar tasks, raising concerns about the speed at which attackers can operate compared to defenders [37][41]. Group 3: Call to Action - There is an urgent need for the cybersecurity community to address the security implications of LLMs, as they are currently in a critical window for developing effective solutions [46].
GPT之父Alec Radford新作:给大模型做「脑部手术」,危险知识重学成本暴增7000倍
机器之心· 2026-03-01 03:34
Core Insights - The article discusses a groundbreaking research paper by Alec Radford and Neil Rathi, which challenges the conventional approach to mitigating harmful capabilities in large language models by proposing a token-level data filtering method during the pre-training phase [3][5][49]. Group 1: Research Findings - The study reveals that token-level filtering can effectively remove dangerous knowledge from models, making it harder for attackers to recover this knowledge later [3][5][8]. - A significant finding is that the effectiveness of this filtering mechanism improves as the model size increases, demonstrating a scaling law where larger models exhibit better filtering outcomes [5][22][29]. - For an 1.8 billion parameter model, token-level filtering resulted in a 7000-fold decrease in learning efficiency in the targeted domain [6][29]. Group 2: Methodology - The research introduces two token-level filtering strategies: Loss Masking, which allows the model to see dangerous tokens but ignores their loss during training, and Removal, which replaces dangerous tokens with a special <hidden> marker [21][22]. - The study emphasizes that traditional document-level filtering is inefficient and wasteful, while token-level filtering allows for precise removal of harmful knowledge without discarding entire documents [16][21]. Group 3: Security Implications - The research indicates that once a model has learned a dangerous capability, post hoc interventions like RLHF are insufficient to eliminate that knowledge, as attackers can easily bypass these defenses [10][12][14]. - Token-level filtering creates a natural barrier based on computational cost, making it prohibitively expensive for attackers to restore removed capabilities in future trillion-parameter models [27][49]. Group 4: AI Safety and Training - The study challenges the notion that models must first "know" what is dangerous to refuse harmful requests, showing that models filtered at the token level perform better in rejecting harmful queries [35][38]. - The research proposes a weak supervision process for labeling training data, significantly lowering the implementation cost of token-level filtering [41][46]. Group 5: Conclusion and Future Directions - The authors advocate for a "defense-in-depth" strategy, where token-level filtering during pre-training lays a solid foundation for subsequent alignment training, enhancing overall model safety [48][49]. - This research provides a viable path for organizations like OpenAI and Anthropic to scale their models while ensuring safety measures are in place [49][50].
安恒信息:公司明鉴大模型风险评估系统重点聚焦大模型运行环境基础安全等三大核心领域
Zheng Quan Ri Bao· 2026-02-27 13:12
Group 1 - The core focus of the company is on the security of the operating environment for large models, the safety of model output content, and the security of training data [2] - The company provides a comprehensive risk detection and closed-loop management service that covers the entire process of development, training, deployment, and application [2] - The integrated technical architecture follows a "detect-analyze-reinforce-operate" approach to ensure thorough risk assessment [2]
天融信(002212.SZ):暂未向字节跳动Seedance 2.0提供安全防护
Ge Long Hui· 2026-02-27 06:25
Group 1 - The company has not yet provided security protection to ByteDance's Seedance 2.0 [1] - The company offers a series of innovative products and solutions including large model security gateways, large model data security monitoring, content intelligent control, and large model security assessments [1]
马年!天融信大模型安全网关焕新升级
Jin Rong Jie· 2026-02-26 01:27
Group 1 - The core viewpoint of the articles highlights the rapid adoption and integration of AI large models in various sectors, particularly during the Spring Festival, showcasing their ability to enhance user experience through seamless service delivery [1][3] - AI large models are being applied in critical areas such as government, finance, energy, and healthcare, providing efficient support for compliance, risk management, intelligent services, and decision-making [3] - The emergence of new attack methods targeting large models, including data poisoning, model theft, and privacy breaches, raises concerns about balancing innovation and security [3] Group 2 - TopLMG has undergone a comprehensive upgrade to address new security challenges associated with data, models, and algorithms, establishing a full lifecycle security protection system [4] - The system includes multi-modal risk identification capabilities, allowing for precise detection of various content types, including text, documents, images, audio, and video [6] - Enhanced proxy mode supports HTTP/2.0 protocol, significantly improving transmission efficiency and reducing network latency, while ensuring compliance with domestic security standards [9] Group 3 - The multi-round semantic deep protection feature enables context-based attack detection, effectively identifying covert attack behaviors [10] - Full-process dialogue management allows for complete recording of dialogue inputs and outputs, facilitating information management and communication efficiency [11] - The API asset intelligent risk control system automatically organizes API assets, implements compliance checks, and prevents malicious calls, ensuring stable business operations [12] Group 4 - The compliance audit and traceability function provides multiple protective measures, including digital watermarks and metadata identification, to mitigate deep forgery risks [13] - The company emphasizes the importance of technological innovation and industry collaboration to ensure the stable and sustainable development of AI large models [13]
以AI对抗AI,让大模型健康发展
Xin Lang Cai Jing· 2026-01-28 22:02
Core Viewpoint - The rise of AI large models has sparked concerns over security risks, including personal data leakage and potential misuse in various applications, necessitating a robust protective framework for safe deployment [1][5]. Group 1: Security Risks - AI large models pose multiple security vulnerabilities, including prompt injection, sensitive information leakage, and data poisoning, as identified by OWASP [2]. - Everyday actions, such as uploading photos for AI enhancement, can lead to sensitive information leakage, enabling identity theft and fraud [3]. - Data poisoning can severely compromise model integrity, with as few as 250 malicious documents capable of contaminating a model with billions of parameters [3]. Group 2: Business Implications - Companies face significant risks from AI model vulnerabilities, with potential impacts on core operations and data integrity [4]. - The need for thorough data cleansing and verification processes is emphasized to prevent "data pollution" and ensure reliable outputs from AI models [4]. - In professional settings, risks such as core data leakage and exploitation of open-source model vulnerabilities can lead to substantial economic losses and missed opportunities [4]. Group 3: Regulatory Challenges - Current regulations primarily focus on AI-generated content review, lacking clear definitions and penalties for emerging threats like data poisoning [5]. - The opaque nature of AI models complicates accountability, making it difficult to assign responsibility in case of errors or breaches [5]. Group 4: Protective Measures - A comprehensive security framework is being developed in Jiangsu, including policies that incentivize compliance and provide support for security assessments [7]. - Companies are implementing multi-layered security measures, such as asynchronous recognition engines and three-tier review mechanisms, to enhance data protection [7][8]. - Continuous training and monitoring of AI models are essential to mitigate risks, with suggestions for creating dedicated security models to oversee operational models [9][10]. Group 5: Collaborative Governance - A multi-stakeholder approach is recommended for effective governance, involving government, enterprises, research institutions, and third-party evaluators to enhance security and compliance [10]. - The establishment of a shared information platform and clear accountability mechanisms is crucial for fostering a collaborative environment in AI security governance [10].
天融信:目前腾讯元宝暂未应用到公司产品
Zheng Quan Ri Bao Wang· 2026-01-28 14:11
Group 1 - The company Tianrongxin (002212) is engaging in deep cooperation with Tencent across multiple areas including threat intelligence, large model security, cloud security, privacy computing, and smart cities [1] - Currently, Tencent's Yuanbao has not been applied to the company's products, but the company has initiated collaboration with Tencent's Hongyuan large model [1]
第一梯队的大模型安全吗?复旦、上海创智学院等发布前沿大模型安全报告,覆盖六大领先模型
机器之心· 2026-01-22 04:05
Core Insights - The article discusses the evolving safety assessment framework for advanced large models, particularly focusing on their security capabilities in various application scenarios and regulatory contexts [2][6]. Group 1: Safety Assessment Framework - A unified safety assessment framework has been developed for six leading models: GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5, covering language, visual language, and image generation scenarios [2]. - The assessment integrates four key dimensions: baseline safety, adversarial testing, multilingual evaluation, and compliance evaluation against global regulatory frameworks [4]. Group 2: Key Findings - GPT-5.2 achieved an average safety rate of 78.39%, demonstrating a shift towards deep semantic understanding and value alignment, significantly reducing failure risks under adversarial inputs [11]. - Gemini 3 Pro's average safety rate is 67.9%, showing strong but uneven safety characteristics, with a notable drop in adversarial robustness [11]. - Qwen3-VL scored an average safety rate of 63.7%, excelling in compliance but showing weaknesses in adversarial safety [12]. - Grok 4.1 Fast has an average safety rate of 55.2%, with significant variability in performance across different assessments [12]. Group 3: Multimodal Safety - GPT-5.2 leads with an average multimodal safety rate of 94.69%, indicating high stability in complex cross-modal scenarios [13]. - Qwen3-VL follows with an average safety rate of 81.11%, showing strong performance in visual-language interaction [13]. Group 4: Model Safety Profiles - GPT-5.2 is characterized as an all-encompassing internalized model, capable of nuanced compliance guidance in complex contexts [19]. - Qwen3-VL is identified as a rule-compliant model, excelling in clear regulatory environments but lacking flexibility in ambiguous scenarios [20]. - Gemini 3 Pro is described as an ethical interaction model, sensitive to social values but needing improvement in proactive risk prevention [21]. - Grok 4.1 Fast is noted for its efficiency-focused design, prioritizing user expression over robust defense mechanisms [22]. Group 5: Challenges in Security Governance - The report highlights the threat of multi-round adaptive attacks, which can bypass static defenses, posing a significant challenge for future model safety governance [27]. - There is a structural imbalance in security performance across languages, with a 20%-40% drop in non-English contexts, raising concerns about global deployment risks [28]. - The lack of transparency and explainability in decision-making processes remains a critical governance shortcoming, particularly in high-risk areas [29]. Conclusion - The report emphasizes the need for a collaborative approach among academia, industry, and regulatory bodies to develop a comprehensive and dynamic safety assessment system for generative AI [30].
亚信安全:大模型安全产品主要为大模型提供端到端安全防护
Zheng Quan Ri Bao Wang· 2026-01-15 10:11
Core Viewpoint - The company, AsiaInfo Security, is actively engaged in the large model sector, providing security and management solutions, as well as a platform and toolset for large models, targeting various industries including finance, telecommunications, government, healthcare, education, and energy [1] Group 1 - AsiaInfo Security and its subsidiary, AsiaInfo Technology, focus on offering end-to-end security protection for large models [1] - The subsidiary has achieved successful implementation of large model applications across multiple industries [1] - Revenue details related to these products will be disclosed in the company's regular reports [1]
云天励飞:公司与360集团签署战略合作协议
Zheng Quan Ri Bao Wang· 2026-01-06 12:13
Core Viewpoint - Yuntian Lifei has signed a strategic cooperation agreement with 360 Group to develop a collaborative AI inference ecosystem under the domestic framework, focusing on "AI + security" as the core of their multi-dimensional cooperation [1] Group 1: Strategic Cooperation - The partnership will leverage both companies' strengths in resources, scenarios, and technology to build a robust AI computing foundation, enhance large model security capabilities, and create smart living products [1] - Yuntian Lifei's DeepEdge and DeepXbot series chip capabilities will be combined with 360's smart hardware matrix to develop innovative and secure products [1] Group 2: AI Security and Efficiency - The collaboration aims to empower the 360 AI platform service capabilities with DeepVerse, improving inference efficiency and deployment flexibility within the domestic ecosystem [1] - Joint research will focus on AI security protection capabilities for intelligent agents, exploring new products, scenarios, and models that integrate large model security with smart living [1]