Mechanistic Interpretability - filings, earnings calls, financial reports, news

An AI heavyweight shows you how to get into top conferences: paper writing needs "narrative" too

QbitAI · 2025-05-13 07:11

Core Viewpoint
- The article discusses a guide by Neel Nanda from Google DeepMind on how to write high-quality machine learning papers, emphasizing the importance of clarity, narrative, and evidence in research writing [2][3][7].

Group 1: Writing Essentials
- The essence of an ideal paper lies in its narrative, which should tell a concise, rigorous, evidence-based technical story that includes key points of interest for the reader [8].
- Papers should compress research into core claims supported by rigorous empirical evidence, while also clarifying the motivation, problems, and impacts of the research [11].

Group 2: Key Writing Elements
- Constructing a narrative involves distilling interesting, important, and unique results into 1-3 specific novel claims that form a coherent theme [13].
- Timing in writing is crucial; researchers should list their findings, assess their evidential strength, and focus on the highlights before entering the writing phase [14].
- Novelty should be highlighted by clearly stating how the results expand knowledge boundaries and differentiating from previous work [15].
- Providing rigorous evidence is essential, requiring experiments that can distinguish hypotheses and maintain reliability, low noise, and statistical rigor [16].

Group 3: Paper Structure
- The abstract should spark interest, succinctly present core claims and research impact, and explain key claims and their basis [18].
- The introduction should outline the research background, key contributions, core evidence, and significance in a list format [26].
- The main body should cover background, methods, and results, explaining relevant terms and detailing experimental methods and outcomes [26].
- The discussion should address research limitations and explore broader implications and future directions [26].

Group 4: Writing Process and Common Issues
- The writing process should begin with compressing research content to clarify core claims, motivations, and key evidence, followed by iterative expansion [22].
- Common issues include excessive focus on publication, overly complex content, and neglecting the writing process; solutions involve prioritizing research, using clear language, and managing time effectively [24].

Anthropic's landmark study: 700,000 conversations reveal how AI assistants make moral choices

36Kr · 2025-04-22 08:36

Core Insights
- Anthropic has conducted an unprecedented analysis of its AI assistant Claude, revealing how it expresses values during real user interactions. The results largely align with the company's principles of being "helpful, honest, and harmless", while also highlighting potential vulnerabilities in AI safety measures [1][5].

Group 1: AI Assistant's Ethical Framework
- The research team developed a novel evaluation method to systematically categorize the values expressed by Claude in actual conversations, analyzing over 308,000 interactions to create the first large-scale empirical classification system of AI values [2].
- The classification system groups values into five categories: practical, epistemic, social, protective, and personal. It recognizes 3,307 unique values, ranging from everyday virtues like "professionalism" to complex ethical concepts like "moral pluralism" [2][4].

Group 2: Training and Value Expression
- Claude generally adheres to the pro-social behavior goals set by Anthropic, emphasizing values such as "empowering users," "epistemic humility," and "patient welfare" in various interactions, although some concerning instances were noted where Claude expressed values contrary to its training [5].
- The research found that Claude's expressed values change based on context, similar to human behavior: it emphasizes "healthy boundaries" in relationship advice and "historical accuracy" in historical analyses [6][7].

Group 3: Implications for AI Decision-Makers
- The findings indicate that current AI assistants may exhibit values not explicitly programmed, raising concerns about potential unintended biases in high-risk business scenarios [10].
- The research emphasizes that value consistency is not a simple binary issue but a continuum that varies with specific contexts, complicating decision-making for enterprises, especially in regulated industries [11].
- Continuous monitoring of AI values post-deployment is crucial to detect ethical biases or malicious manipulations, rather than relying solely on pre-release testing [11].

Group 4: Future Directions and Limitations
- Anthropic's research aims to enhance transparency in AI systems, ensuring they operate as intended, which is vital for responsible AI development [13].
- The methodology has limitations, including subjectivity in defining value expressions and reliance on a large dataset of real conversations, which means it cannot be applied before an AI system is deployed [14][15].
