Defense in Depth
"Fighting a ground war in Iran would be like sending U.S. soldiers into hell on earth"
第一财经 · 2026-03-24 03:17
Group 1
- The article discusses the potential risks and challenges of an increased U.S. military ground presence in the Middle East, particularly for operations against Iran [3][4][5]
- Military experts suggest that the U.S. may attempt to seize Iran's enriched uranium, but doing so would require deep incursions into Iranian territory, posing significant operational difficulties [4][5]
- The capture of Khark Island, a critical oil export base for Iran, is considered a possible objective for U.S. forces, but the operation would face substantial threats from Iranian defenses [5][8]

Group 2
- Experts warn that even limited ground strikes along the Persian Gulf would necessitate a large troop deployment and carry high operational risks, potentially leading to significant casualties and resource depletion [9][11]
- A large-scale invasion of Iran could result in a prolonged conflict against a well-equipped and determined adversary, complicating U.S. military objectives [11][13]
- Historical precedents indicate that U.S. military interventions often lead to unintended consequences, suggesting that escalating military commitments in Iran could end similarly [14]
New work from Alec Radford, "father of GPT": performing "brain surgery" on large models makes relearning dangerous knowledge 7000 times more costly
机器之心 · 2026-03-01 03:34
Core Insights
- The article discusses a research paper by Alec Radford and Neil Rathi that challenges the conventional approach to mitigating harmful capabilities in large language models, proposing instead a token-level data-filtering method applied during the pre-training phase [3][5][49]

Group 1: Research Findings
- Token-level filtering can effectively remove dangerous knowledge from models, making it harder for attackers to recover that knowledge later [3][5][8]
- The effectiveness of this filtering improves as model size increases, a scaling law under which larger models exhibit better filtering outcomes [5][22][29]
- For a 1.8-billion-parameter model, token-level filtering reduced learning efficiency in the targeted domain 7000-fold [6][29]

Group 2: Methodology
- The research introduces two token-level filtering strategies: Loss Masking, which lets the model see dangerous tokens but ignores their loss during training, and Removal, which replaces dangerous tokens with a special <hidden> marker [21][22]
- Traditional document-level filtering is inefficient and wasteful; token-level filtering removes harmful knowledge precisely without discarding entire documents [16][21]

Group 3: Security Implications
- Once a model has learned a dangerous capability, post hoc interventions such as RLHF are insufficient to eliminate that knowledge, since attackers can easily bypass these defenses [10][12][14]
- Token-level filtering creates a natural barrier based on computational cost, making it prohibitively expensive for attackers to restore removed capabilities in future trillion-parameter models [27][49]
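The two filtering strategies in the Methodology section can be sketched as follows. This is a minimal toy illustration, not the paper's actual code: the function names and the masked cross-entropy helper are assumptions, and real training would operate on token ids inside a training loop. Only the `<hidden>` marker comes from the summary itself.

```python
# Toy sketch of the two token-level filtering strategies described above.
# "loss_mask", "remove", and "masked_cross_entropy" are illustrative names,
# not the paper's API.

HIDDEN = "<hidden>"  # special marker used by the Removal strategy

def loss_mask(tokens, is_dangerous):
    """Loss Masking: the model still sees every token as input, but
    dangerous positions get weight 0.0 in the training loss."""
    return [0.0 if bad else 1.0 for bad in is_dangerous]

def remove(tokens, is_dangerous):
    """Removal: dangerous tokens are replaced by the <hidden> marker
    before the sequence reaches the model at all."""
    return [HIDDEN if bad else tok for tok, bad in zip(tokens, is_dangerous)]

def masked_cross_entropy(per_token_loss, weights):
    """Average the per-token loss over unmasked positions only, so masked
    tokens contribute no gradient signal."""
    kept = sum(l * w for l, w in zip(per_token_loss, weights))
    n = sum(weights)
    return kept / n if n else 0.0

tokens = ["mix", "the", "precursor", "with", "acid"]
flags  = [False, False, True, False, True]  # toy danger labels

print(remove(tokens, flags))     # ['mix', 'the', '<hidden>', 'with', '<hidden>']
print(loss_mask(tokens, flags))  # [1.0, 1.0, 0.0, 1.0, 0.0]
print(masked_cross_entropy([2.0, 1.0, 9.0, 1.0, 9.0], loss_mask(tokens, flags)))
# ≈ 1.33 (average of 2.0, 1.0, 1.0 over the three unmasked positions)
```

The key design difference the summary highlights: Loss Masking keeps the tokens visible (so context is intact) while removing their learning signal, whereas Removal hides the tokens entirely at the cost of leaving a visible gap.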
Group 4: AI Safety and Training
- The study challenges the notion that models must first "know" what is dangerous in order to refuse harmful requests, showing that models filtered at the token level perform better at rejecting harmful queries [35][38]
- The research proposes a weak-supervision process for labeling training data, significantly lowering the implementation cost of token-level filtering [41][46]

Group 5: Conclusion and Future Directions
- The authors advocate a "defense-in-depth" strategy in which token-level filtering during pre-training lays a solid foundation for subsequent alignment training, enhancing overall model safety [48][49]
- The research offers a viable path for organizations like OpenAI and Anthropic to scale their models while keeping safety measures in place [49][50]
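The summary does not describe the weak-supervision labeling process in detail. As a purely hypothetical stand-in, the sketch below uses a small seed lexicon plus one-token context propagation to show the general shape of cheap, noisy token labeling; the lexicon, function name, and propagation rule are all assumptions, not the paper's method.

```python
# Hypothetical weak-supervision labeling pass for token-level filtering.
# A seed lexicon stands in for whatever cheap signal the real pipeline uses.

SEED_LEXICON = {"pathogen", "precursor"}  # toy seed terms, not from the paper

def weak_label(tokens, spread=1):
    """Mark lexicon hits as dangerous, then propagate the label to
    neighbors within `spread` positions. Over-labeling nearby tokens is
    the price of supervision this cheap."""
    hits = [tok.lower() in SEED_LEXICON for tok in tokens]
    labels = list(hits)
    for i, hit in enumerate(hits):
        if hit:
            for j in range(max(0, i - spread), min(len(tokens), i + spread + 1)):
                labels[j] = True
    return labels

doc = "store the pathogen sample in a sealed container".split()
print(weak_label(doc))
# [False, True, True, True, False, False, False, False]
```

Labels produced this way would then feed the Loss Masking or Removal step during pre-training; the point the summary makes is that the labeling only has to be cheap and roughly right, not perfect.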