Reinforcement Learning from Human Feedback (RLHF)

AGI by 2030? Google DeepMind Has Written a "Human Self-Preservation Guide"
虎嗅APP · 2025-04-07 23:59
Core Viewpoint
- The article discusses the dual concerns surrounding Artificial General Intelligence (AGI): its potential risks and the safety measures needed to contain them, as outlined in a report from Google DeepMind that predicts AGI could emerge by 2030 [5][21].

Group 1: AGI Predictions and Concerns
- DeepMind's report predicts the emergence of "Exceptional AGI" by 2030: a system that would surpass 99% of human adults on non-physical tasks [5][6].
- The report emphasizes the potential for "severe harm" from AI, including manipulation of political discourse, automated cyberattacks, and biosafety risks [7][8][9].

Group 2: Types of Risks Identified
- DeepMind categorizes risks into four main types: malicious use, model misalignment, unintentional harm, and systemic risks [18].
- The report highlights "malicious use," where AI is exploited for harmful purposes, and "misalignment," where an AI's actions diverge from human intentions [11][18].

Group 3: Proposed Safety Measures
- DeepMind proposes two defensive strategies: ensuring the AI is compliant during training, and enforcing strict controls during deployment to block harmful actions (a minimal sketch of the deployment-time idea follows this summary) [12][13].
- The goal is a system that minimizes the risk of "severe harm" even when the AI itself makes mistakes [14].

Group 4: Industry Perspectives on AI Safety
- AI companies differ in their approaches to safety: OpenAI focuses on automated alignment, while Anthropic advocates a safety grading system [16][20].
- DeepMind's approach is more engineering-oriented, emphasizing immediately deployable systems over theoretical frameworks [20].

Group 5: Broader Implications and Concerns
- Parts of the academic community remain skeptical about the feasibility of AGI and the clarity of its definition [22].
- The article also raises the concern of a self-reinforcing cycle of data pollution, in which AI models learn from flawed AI-generated outputs, potentially spreading misinformation at scale [23][24].
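To make the deployment-time control from Group 3 concrete, below is a minimal, hypothetical sketch of the general pattern: a monitor tags topics in each model output, and a deployment policy refuses anything that overlaps a blocked list. The `generate` stub, the `BLOCKED_TOPICS` set, and the flag schema are all assumptions for illustration; DeepMind's report does not publish a concrete implementation.

```python
from dataclasses import dataclass

# Hypothetical blocked-topic list; the report names harm areas like
# cyberattacks and biosafety, but publishes no concrete schema.
BLOCKED_TOPICS = {"bioweapon synthesis", "automated intrusion"}

@dataclass
class ModelOutput:
    text: str
    flagged_topics: set

def generate(prompt: str) -> ModelOutput:
    """Stand-in for a model call: returns text plus topics tagged by a
    (hypothetical) safety monitor running alongside the model."""
    # A real system would call the model and a separate monitor here.
    return ModelOutput(text=f"[model response to: {prompt}]", flagged_topics=set())

def guarded_generate(prompt: str) -> str:
    """Deployment-time control: refuse any output whose monitor flags
    overlap the blocked-topic list, regardless of model intent."""
    output = generate(prompt)
    if output.flagged_topics & BLOCKED_TOPICS:
        return "Request declined by deployment policy."
    return output.text

if __name__ == "__main__":
    print(guarded_generate("Summarize today's weather."))
```

The design choice this illustrates is the one the report emphasizes: the guard sits outside the model at deployment time, so even a misaligned or mistaken model cannot emit a blocked output.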