Model Specifications
Our Understanding of AI Is Still Far Too Limited, Which Is Why Transparency Matters | Tencent Research Dialogue with Overseas Experts
Tencent Research Institute · 2025-11-06 08:33
Core Viewpoint
- The article emphasizes the importance of AI transparency, arguing that understanding how AI operates is crucial for governance and for trust in its applications [2][3][9]

Group 1: Importance of AI Transparency
- The ability to "see" AI is essential in an era where AI influences social interactions, content creation, and consumer behavior, raising concerns about misinformation and identity fraud [7][8]
- AI activity labeling is emerging as a global consensus, with regulatory bodies in China and the EU mandating clear identification of AI-generated content to help users discern authenticity and reduce the risk of deception [7][8]
- Transparency not only helps people recognize when they are interacting with AI but also provides critical data for assessing AI's societal impacts and risks, which are currently poorly understood [8][9]

Group 2: Mechanisms for AI Transparency
- AI labeling is one of the fastest-advancing transparency mechanisms, with China implementing labeling standards and the EU establishing identification obligations for AI system providers [12][14]
- Discussions are ongoing about what should be labeled, who embeds the labels, and how labels can be verified, highlighting the need for effective implementation standards (a toy embed-and-verify sketch follows this summary) [12][14][15]
- The distinction between labeling AI-generated content and labeling AI's autonomous actions is crucial: current regulations focus primarily on content, leaving a gap in behavioral transparency [13]

Group 3: Model Specifications
- Model specifications serve as a self-regulatory mechanism for AI companies, outlining expected behaviors and ethical guidelines for their models [17][18]
- The challenge lies in ensuring compliance with these specifications, since companies can easily make promises that are difficult to verify without robust enforcement mechanisms [18][20]
- A balance is needed between transparency and the protection of proprietary information, as not all operational details can be disclosed without risking competitive advantage [20]

Group 4: Governance and Trust
- Transparency is vital for building trust in AI systems, allowing users to understand AI's capabilities and limitations, which is essential for responsible use and innovation [9][23]
- The article argues that transparency mechanisms should focus not only on what AI can do but also on how it operates and interacts with humans, fostering a more informed public [10][23]
- Ultimately, transparency in AI governance is a foundational step toward a reliable partnership between AI technologies and society [23]
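To make the "who embeds the label, and how is it verified" question concrete, here is a minimal, hypothetical Python sketch: the content provider attaches a signed "AI-generated" label, and anyone holding the corresponding key can check that the label still matches the content. Every name here (`embed_label`, `verify_label`, `PROVIDER_KEY`) is invented for illustration; the real schemes discussed in the article, such as China's labeling standard or the EU's identification obligations, define their own formats and would rely on asymmetric signatures or watermarking rather than a shared-secret HMAC.

```python
import hmac
import hashlib
import json

# Hypothetical shared secret held by the AI-content provider; real labeling
# schemes would use asymmetric signatures or watermarking, not a toy HMAC.
PROVIDER_KEY = b"demo-key-not-for-production"

def embed_label(content: str, provider: str, model: str) -> dict:
    """Attach a machine-readable 'AI-generated' label and sign content + label."""
    label = {"ai_generated": True, "provider": provider, "model": model}
    payload = json.dumps({"content": content, "label": label}, sort_keys=True)
    signature = hmac.new(PROVIDER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"content": content, "label": label, "signature": signature}

def verify_label(record: dict) -> bool:
    """Check that the label still matches the content and was issued by the key holder."""
    payload = json.dumps({"content": record["content"], "label": record["label"]},
                         sort_keys=True)
    expected = hmac.new(PROVIDER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

if __name__ == "__main__":
    record = embed_label("An AI-written paragraph...", "ExampleLab", "example-model-1")
    print(verify_label(record))                      # True: label intact
    record["content"] = "A human-written paragraph"  # tamper with the content
    print(verify_label(record))                      # False: label no longer verifies
```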
Hard Evidence of AI "Split Personality": 300,000 Trap Questions Tear Away the Fig Leaf of OpenAI and Google
36Kr · 2025-10-27 00:40
Core Insights
- Research conducted by Anthropic and Thinking Machines reveals that large language models (LLMs) exhibit distinct personalities and conflicting behavioral guidelines, leading to significant discrepancies in their responses [2][5][37]

Group 1: Model Specifications and Guidelines
- "Model specifications" serve as the behavioral guidelines for LLMs, setting out principles such as being helpful and ensuring safety [3][4]
- Conflicts arise when these principles clash, particularly between commercial interests and social fairness, causing models to make inconsistent choices [5][11]
- The study identified over 70,000 scenarios in which 12 leading models displayed high divergence, indicating critical gaps in current behavioral guidelines [8][31]

Group 2: Stress Testing and Scenario Generation
- Researchers generated over 300,000 scenarios designed to expose these "specification gaps" by forcing models to choose between competing principles [8][20]
- The initial scenarios were framed neutrally; value biasing was then applied to create more challenging queries, yielding a final dataset of over 410,000 scenarios [22][27]
- The study evaluated 12 leading models, including five from OpenAI and others from Anthropic and Google, to assess response divergence (a minimal scoring sketch follows this summary) [29][30]

Group 3: Compliance and Divergence Analysis
- Higher divergence among model responses often correlates with problems in the model specifications, particularly among models that share the same guidelines [31][33]
- Subjective interpretations of rules lead to significant differences in compliance among models [15][16]
- For instance, Gemini 2.5 Pro and Claude Sonnet 4 reached conflicting interpretations of what compliance with a user request required [16][17]

Group 4: Value Prioritization and Behavioral Patterns
- Different models prioritize values differently: Claude models emphasize moral responsibility, Gemini emphasizes emotional depth, and OpenAI models prioritize commercial efficiency [37][40]
- Models also exhibited systematic false positives when rejecting sensitive queries, particularly those related to child exploitation [40][46]
- Notably, Grok 4 showed the highest rate of abnormal responses, often engaging with requests that other models deemed harmful [46][49]
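As a rough illustration of the divergence analysis described above, the sketch below (not the paper's actual pipeline or metric) assumes each model's free-text response to a value-conflict scenario has already been graded into a discrete choice between two competing principles; it then computes a pairwise disagreement score and flags high-divergence scenarios as candidate specification gaps. All names (`disagreement_score`, the model labels, the 0.5 threshold) are invented for illustration.

```python
from itertools import combinations

# Hypothetical example: each model's answer to one value-conflict scenario has
# already been graded into a discrete choice between two competing principles.
# The real study covered 12 models and hundreds of thousands of scenarios; the
# grading step (mapping free text to a choice) is omitted here.
responses = {
    "model_a": "prioritize_user_request",
    "model_b": "prioritize_safety",
    "model_c": "prioritize_safety",
    "model_d": "prioritize_user_request",
}

def disagreement_score(choices: dict[str, str]) -> float:
    """Fraction of model pairs that made different choices (0 = consensus, 1 = maximal split)."""
    pairs = list(combinations(choices.values(), 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

score = disagreement_score(responses)
print(f"disagreement = {score:.2f}")  # 0.67 for the 2-vs-2 split above

# A simple triage rule: flag the scenario for human review of the model spec
# when disagreement exceeds some threshold (the 0.5 cutoff here is arbitrary).
if score >= 0.5:
    print("high divergence -> candidate specification gap")
```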