Claude Sonnet 4.5

In-Depth Discussion of Online Learning: 99 Thoughts for Understanding the Next Core Paradigm of LLMs | Best Ideas
海外独角兽· 2025-09-30 12:06
Core Viewpoint - Online learning is seen as a key pathway to achieving higher levels of intelligence, such as L4+ or AGI, by enabling models to dynamically iterate and generate new knowledge beyond existing human knowledge [4][5][6]. Group 1: Importance of Online Learning - Online learning is expected to lead to new scaling laws for models, significantly enhancing their performance on long-term tasks, which is crucial for AGI [4]. - The ability of models to self-explore and self-reward during the exploration process is essential for surpassing human knowledge limits [5]. - A balance between exploration and exploitation is necessary for models to autonomously generate new knowledge [5]. - Online learning is necessary for complex tasks, such as writing research papers or proving theorems, where continuous learning and adjustment are required [5]. Group 2: Practical Examples and Insights - Cursor's code completion model training process exemplifies online learning, utilizing real user feedback for iterative updates [6]. - The interaction data between humans and AI can enhance intelligence, with short-term tasks providing clearer feedback compared to long-term tasks [8]. - Cursor's approach may not fully represent online learning but resembles lifelong learning or automated data collection with periodic training [9]. Group 3: Conceptual Definitions and Non-Consensus - Online learning is not a singular concept and can be divided into Lifelong Learning and Meta Online Learning, each with distinct characteristics and challenges [12][10]. - Lifelong Learning focuses on clear goals and methods, while Meta Online Learning seeks to optimize test-time scaling curves but lacks clarity in methods [12][10]. - Two technical paths for online learning exist: direct interaction with the environment for Lifelong Learning and enhancing Meta Learning to facilitate Lifelong Learning [13]. 
Group 4: Challenges and Mechanisms - Online learning heavily relies on reward signals, which can be sparse and single-dimensional, complicating the learning process [23]. - The challenge of obtaining clear reward signals in complex environments limits the applicability of online learning [23][25]. - The distinction between online learning and online reinforcement learning (RL) is crucial, as online learning emphasizes continuous adaptation rather than just model updates [18][19]. Group 5: Memory and Architecture Considerations - Memory is a critical component of online learning, allowing models to adapt and improve without necessarily updating parameters [66][68]. - Future models should possess autonomous memory management capabilities, akin to human memory systems, to enhance learning efficiency [69]. - The architecture must support continuous data collection and influence model outputs, ensuring that interactions lead to meaningful learning [30][32]. Group 6: Evaluation Paradigms - New evaluation paradigms for online learning should include real-time adaptation and interaction, moving beyond static training and testing sets [95][96]. - The performance improvement rate during interactions can serve as a key metric for assessing online learning capabilities [90][92]. - Testing should incorporate both interaction and adaptation phases to accurately reflect the system's learning ability [97].
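The "performance improvement rate during interactions" mentioned under Group 6 can be made concrete with a small sketch. The function name `improvement_rate`, the use of a least-squares slope, and the sample scores are all illustrative assumptions; the summarized article does not specify a formula.

```python
from statistics import mean


def improvement_rate(scores: list[float]) -> float:
    """Least-squares slope of task score against interaction index.

    Hypothetical metric: a positive slope means the system gets better
    as it interacts; a slope near zero means it does not adapt.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two interaction scores")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(scores)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Made-up scores over four successive interactions with the same environment.
static_model = [0.50, 0.50, 0.50, 0.50]    # no adaptation at test time
adaptive_model = [0.50, 0.60, 0.70, 0.80]  # improves with each interaction

print(f"static:   {improvement_rate(static_model):+.3f} per interaction")
print(f"adaptive: {improvement_rate(adaptive_model):+.3f} per interaction")
```

A static train/test split would score both systems only on their first interaction, where they tie; the slope separates them, which is the point of evaluating over an interaction phase followed by an adaptation phase.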
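AI Daily | Another $40 million-plus cashed out! Jensen Huang keeps trimming his Nvidia stake and is bullish on OpenAI, saying it could become the next trillion-dollar giant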
美股研究社· 2025-09-30 12:06
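Compiled by 美股研究社. In this fast-changing era, artificial intelligence is developing at unprecedented speed and creating wide-ranging opportunities. AI Daily is dedicated to uncovering and analyzing the latest AI-concept stocks and market trends, offering in-depth industry insight and value analysis.
AI Briefing
1. Zhipu's flagship model GLM-4.6 goes live; Cambricon and Moore Threads have completed adaptation. According to Zhipu, the latest GLM-4.6 model is now available, with coding ability up 27% over its predecessor GLM-4.5 and strong results in real-world programming, long-context processing, reasoning and other areas. GLM-4.6 reaches the top level among Chinese models on public benchmarks and outperforms other Chinese models on 74 real-world programming tasks.
2. DeepSeek introduces a "sparse attention" mechanism in its next-generation AI model for the first time. On September 29, DeepSeek updated an experimental AI model, describing it as an intermediate step toward a next-generation architecture. In a post on Hugging Face, DeepSeek outlined DeepSeek-V3.2-Exp and explained that it introduces a "sparse attention mechanism" called DeepSeek Sparse Attention (DSA), which allows faster and more efficient training and inference over long contexts.
3. Anthropic releases its latest AI model ...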
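Late-night bombshell! Claude Sonnet 4.5 launches with 30 hours of autonomous coding; in one user's test, a single call refactored a codebase and added 3,000 lines of code that then failed to run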
AI科技大本营· 2025-09-30 10:24
Core Viewpoint - The article discusses the release of Claude Sonnet 4.5 by Anthropic, highlighting its advancements in coding capabilities and safety features, positioning it as a leading AI model in the market [1][3][10]. Group 1: Model Performance - Claude Sonnet 4.5 has shown significant improvements in coding tasks, achieving over 30 hours of sustained focus in complex multi-step tasks, compared to approximately 7 hours for Opus 4 [3]. - In the OSWorld evaluation, Sonnet 4.5 scored 61.4%, a notable increase from Sonnet 4's 42.2% [6]. - The model outperformed competitors like GPT-5 and Gemini 2.5 Pro in various tests, including Agentic coding and terminal coding [7]. Group 2: Safety and Alignment - Claude Sonnet 4.5 is touted as the most "aligned" model to date, having undergone extensive safety training to mitigate risks associated with AI-generated code [10]. - The model received a low score in automated behavior audits, indicating a lower risk of misalignment behaviors such as deception and power-seeking [11]. - It adheres to AI Safety Level 3 (ASL-3) standards, incorporating classifiers to filter dangerous inputs and outputs, particularly in sensitive areas like CBRN [13]. Group 3: Developer Tools and Features - Anthropic has introduced several updates to Claude Code, including a native VS Code plugin for real-time code modification tracking [15]. - The new checkpoint feature allows developers to automatically save code states before modifications, enabling easy rollback to previous versions [21]. - The Claude Agent SDK has been launched, allowing developers to create custom agent experiences and manage long tasks effectively [19]. Group 4: Market Context and Competition - The article notes a competitive landscape with other AI models like DeepSeek V3.2 also making significant advancements, including a 50% reduction in API costs [36]. 
- There is an ongoing trend of rapid innovation in AI tools, with companies like OpenAI planning new product releases to stay competitive [34].
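The checkpoint feature described under Group 3 (save the code state before a modification, roll back if needed) can be illustrated with a toy sketch. This is not how Claude Code implements checkpoints; the class `CheckpointedWorkspace`, its methods, and the in-memory file map are hypothetical and only show the save-before-edit, rollback-on-demand pattern.

```python
class CheckpointedWorkspace:
    """Toy in-memory workspace that snapshots its files before every edit."""

    def __init__(self, files: dict[str, str]):
        self.files = dict(files)
        self._checkpoints: list[dict[str, str]] = []

    def edit(self, path: str, new_content: str) -> None:
        # Save the full pre-edit state automatically, then apply the change.
        self._checkpoints.append(dict(self.files))
        self.files[path] = new_content

    def rollback(self, steps: int = 1) -> None:
        # Restore the state as it was `steps` edits ago.
        if steps < 1 or steps > len(self._checkpoints):
            raise ValueError("no checkpoint that far back")
        for _ in range(steps):
            snapshot = self._checkpoints.pop()
        self.files = snapshot


ws = CheckpointedWorkspace({"app.py": "print('v1')"})
ws.edit("app.py", "print('v2')")
ws.edit("app.py", "print('v3, broken')")
ws.rollback()           # undo the last edit
print(ws.files["app.py"])
```

Copying the whole file map on every edit keeps the sketch short; a real tool would more plausibly store diffs or rely on version control.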
Anthropic releases its strongest coding model, Claude Sonnet 4.5, capable of 30 hours of autonomous coding
36Kr· 2025-09-30 09:17
Core Insights - Anthropic has launched its next-generation AI model, Claude Sonnet 4.5, claiming it to be the world's most advanced and secure model for coding and complex software agents [3][4]. - The model can run autonomously for 30 hours, a significant improvement on its predecessor's 7-hour capability, and has been upgraded with new developer tools [4][8]. Performance Metrics - Claude Sonnet 4.5 achieved an 82.0% score in the SWE-bench Verified benchmark, outperforming its predecessors and competitors like OpenAI's GPT-5 and Google's Gemini [6][7]. - In the OSWorld test, it scored 61.4%, a notable increase from Sonnet 4's 42.2% [7][8]. Developer Ecosystem Enhancements - Anthropic has expanded its developer ecosystem with tools such as Claude Code, which now includes a checkpoints feature for automatic code state saving, and a native VS Code extension for seamless integration [10]. - The introduction of advanced management tools, such as "context editing" and "memory tools," has improved task performance by 39% and reduced token consumption by 84% [10]. Security and Alignment Improvements - Claude Sonnet 4.5 is described as the most aligned model to date, with extensive safety training that reduces the occurrence of harmful behaviors [11]. - The model has been released under the ASL-3 framework, incorporating filters to detect and prevent the generation of potentially dangerous outputs, particularly in sensitive areas [11].
Pre-holiday blockbuster: a new SOTA for open-source flagship models as Zhipu's GLM-4.6 arrives
机器之心· 2025-09-30 08:45
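Report by 机器之心. 机器之心 Editorial Team.
The new generation of large models is arriving just ahead of the National Day holiday. Yesterday, DeepSeek open-sourced DeepSeek-V3.2-Exp. Today, Zhipu AI, another standout among Chinese model developers, officially released its new flagship model GLM-4.6, landing right alongside Claude Sonnet 4.5. This "pre-holiday surprise" quickly fired up the tech community; an overseas developer even exclaimed, "Do the Chinese guys ever rest???" The new model has also raised high expectations, and users pounced on it as soon as it came out. There is one difference, though: Zhipu's GLM-4.6 will remain open source and will soon be available on Hugging Face, ModelScope and other platforms under the MIT license.
New performance highs, lower token consumption: pushing past the open-source ceiling
As the latest release in the GLM series, GLM-4.6 improves across the board, including but not limited to:
- Advanced coding: on public benchmarks and real-world programming tasks, GLM-4.6's coding ability is on par with Claude Sonnet 4, making it the best known coding model in China;
- Context length: the context window grows from 128K to 200K to accommodate complex coding and agent tasks;
- Reasoning: improved reasoning ability, with support for tool calls during reasoning;
According to ...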
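Late-night bombshell: Claude Sonnet 4.5 launches with 30 hours of autonomous coding; in one user's test, a single call refactored a codebase and added 3,000 lines of code that then failed to run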
36Kr· 2025-09-30 08:43
Core Insights - Anthropic has launched Claude Sonnet 4.5, claiming it to be the "best coding model in the world" with significant improvements over Opus 4 [1][2]. Performance Enhancements - Claude Sonnet 4.5 can run autonomously for over 30 hours on complex multi-step tasks, a substantial increase from Opus 4's 7 hours [2]. - In the OSWorld evaluation, Sonnet 4.5 achieved a score of 61.4%, up from Sonnet 4's 42.2%, indicating a marked improvement in computer operation capabilities [4]. - The model outperformed competitors like GPT-5 and Gemini 2.5 Pro in various tests, including Agentic Coding and Agentic Tool Use [6][7]. Safety and Alignment - Claude Sonnet 4.5 is touted as the most "aligned" model to date, having undergone extensive safety training to mitigate issues like "hallucination" and "deception" [9][10]. - It has received an AI Safety Level 3 (ASL-3) rating and is equipped with protective measures against dangerous inputs and outputs, particularly in sensitive areas like CBRN [12]. Developer Tools and Features - The update includes a native VS Code plugin for Claude Code, allowing real-time code modification tracking and inline diffs [13]. - A new checkpoint feature enables developers to save code states automatically, facilitating easier exploration and iteration during complex tasks [18]. - The Claude API has been enhanced with context editing and memory tools, enabling the handling of longer and more complex tasks [20]. Market Response and Competition - Developers have expressed surprise at the capabilities of Claude Sonnet 4.5, with reports of it autonomously generating complete projects [21][22]. - The competitive landscape is intensifying, with other companies like DeepSeek also releasing new models that significantly reduce inference costs [29][32].
Huahong Semiconductor rises over 15%; Sci-Tech Chip ETF Index and Sci-Tech Chip ETF rise over 2%
Gelonghui APP· 2025-09-30 05:10
Group 1: Semiconductor Stocks Performance - Semiconductor stocks continue to rise strongly, with Huahong Semiconductor increasing over 15% and reaching a new historical high, while leading company SMIC rose by 2.88%, also hitting a historical high [1] - Various semiconductor ETFs, including the Fortune and Guotai ETFs, saw gains of over 2% [1] Group 2: ETF Performance Details - The Fortune Sci-Tech Chip ETF (588810) rose by 2.96% with a 5-day increase of 8.32% and an estimated scale of 577 million [2] - The Guotai Sci-Tech Chip ETF (589100) increased by 2.87% with a 5-day increase of 8.34% and an estimated scale of 641 million [2] - The top ten weighted stocks in the Sci-Tech Chip ETF include Cambricon, Haiguang Information, SMIC, and others, focusing on semiconductor materials, equipment, design, manufacturing, packaging, and testing [2] Group 3: AI Chip Industry Developments - DeepSeek announced a significant update to its services, reducing API costs by over 50%, which has been adapted by several domestic chip manufacturers [3] - Analysts from Huaxin Securities express optimism about the domestic AI chip industry, highlighting a complete industry chain from advanced processes to model acceleration by major companies [3] - Zhongyin Securities notes that the commercialization of AI applications is accelerating, leading to increased demand for computing power in the domestic market [3] Group 4: New AI Models and Market Trends - Anthropic launched a new large model, Claude Sonnet 4.5, capable of running autonomously for 30 hours, excelling in cybersecurity and financial services [4] - Tencent released and open-sourced its native multimodal image model, HunyuanImage 3.0, with a parameter scale of 80 billion [4] - TrendForce predicts a shift in AI infrastructure focus towards efficient inference services, with increasing demand for Nearline SSDs due to shortages in traditional HDDs [4]
Able to work for over 30 hours straight! Claude kicks off a new round of the AI coding race
Di Yi Cai Jing Zi Xun· 2025-09-30 04:13
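AI coding, the hottest track of 2025, is heating up again. On September 30, Beijing time, Anthropic officially released Claude Sonnet 4.5, which it calls "the best coding model in the world," with notable advances in agent building, computer use, reasoning and math. Judging by the evaluations so far, Claude has held on to its crown in coding. Many in the industry see the timing of the update as significant: OpenAI will hold its annual developer conference a week later, and not long ago OpenAI released GPT-5-Codex, which strengthens agentic coding and is claimed to handle large, complex tasks independently for up to 7 hours. This time Anthropic has raised the bar outright: Sonnet 4.5 can maintain focus for more than 30 hours on complex, multi-step tasks. The industry has endorsed this capability as well. The CEO of iGent AI said Sonnet 4.5 "resets industry expectations": it can work on code autonomously for more than 30 hours, letting engineers handle months of complex architectural work in very little time while keeping large codebases consistent. In the official evaluations, Sonnet 4.5 surpasses GPT-5 and Google's Gemini 2.5 Pro on coding, math and other benchmarks. On the real-world coding benchmark SWE-bench V ...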
More for the same price: one article that explains what makes Claude Sonnet 4.5 strong
Founder Park· 2025-09-30 03:46
Core Viewpoint - Anthropic has launched Claude Sonnet 4.5, claiming it to be the best coding model in the world, able to stay focused for more than 30 hours on complex multi-step tasks and surpassing OpenAI's GPT-5 Codex [2][9]. Pricing and Cost Efficiency - The pricing for Claude Sonnet 4.5 remains the same as its predecessor's, at $3 per million tokens for input and $15 per million tokens for output. Cost savings of up to 90% can be achieved through prompt caching, and batch processing can save 50% [2]. Developer Tools and Integration - Anthropic has introduced the Claude Agent SDK and an experimental feature called "Imagine with Claude" for developers, allowing integration with platforms like Amazon Bedrock and Google Cloud's Vertex AI [3][26]. Performance Metrics - In the SWE-bench Verified evaluation, Claude Sonnet 4.5 achieved industry-leading scores, and it scored 61.4% in the OSWorld benchmark, a significant improvement on the previous model's 42.2% [10][12]. Enhanced Features - The model includes new features such as a checkpoint function in Claude Code, context editing, and memory tools, enabling it to handle longer tasks and more complex operations [4][24]. Application and Usability - Users can interact with Claude Sonnet 4.5 through the Claude.ai website and mobile applications, with integrated functionalities for code execution and file creation directly within conversations [5][6]. Safety and Alignment - Claude Sonnet 4.5 is noted for its improved alignment and safety features, reducing undesirable behaviors such as deception and flattery, and making significant progress in defending against prompt injection attacks [24][25]. Experimental Features - The "Imagine with Claude" feature allows real-time software generation, showcasing the model's capabilities in adapting to user requests without pre-written code [31][33].
Recommendations - Anthropic recommends all users upgrade to Claude Sonnet 4.5 for enhanced performance across all applications, with updates available for both the Claude Code and developer platform [34].
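The pricing figures quoted in this summary ($3 per million input tokens, $15 per million output tokens, up to 90% savings with prompt caching, 50% with batch processing) lend themselves to a quick cost estimate. The sketch below is a back-of-the-envelope calculator, not an official billing formula. It assumes, as a simplification, that cached input tokens receive the full 90% saving and that the batch discount applies to the whole bill; the function name `estimate_cost` and the token counts are illustrative.

```python
# Prices quoted in the summary above (USD per million tokens).
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Rough cost in USD for one workload.

    ASSUMPTION: cached input tokens get the full "up to 90%" saving.
    ASSUMPTION: the 50% batch saving applies to the entire bill.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * (1 - 0.90)) * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    total = input_cost + output_cost
    return total * 0.5 if batch else total


# Illustrative workload: 2M input tokens and 0.5M output tokens.
print(f"list price:        ${estimate_cost(2_000_000, 500_000):.2f}")
print(f"half input cached: ${estimate_cost(2_000_000, 500_000, cached_fraction=0.5):.2f}")
print(f"batch processing:  ${estimate_cost(2_000_000, 500_000, batch=True):.2f}")
```

Because output tokens cost five times as much as input tokens, caching helps most on prompt-heavy workloads, while the batch discount helps regardless of the input/output mix.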
Anthropic unveils Claude Sonnet 4.5 late at night, able to work autonomously for 30 hours straight; CEO: it is more like your colleague
36Kr· 2025-09-30 03:20
Core Insights - Anthropic has launched its new AI model, Claude Sonnet 4.5, claiming it to be the best coding model and a powerful tool for building complex agents, capable of independently completing production-level development tasks [1][10] - The model has shown significant improvements in software coding capabilities, achieving a 77.2% accuracy in the SWE-bench Verified benchmark, which is nearly a 20 percentage point increase from its predecessor [2][5] - Claude Sonnet 4.5 can autonomously run for 30 hours, generating 11,000 lines of code and completing a full development cycle for an enterprise chat application [2] Performance Metrics - The model's OSWorld benchmark score improved from 42.2% to 61.4% over four months, outperforming similar products in the industry [4][5] - In specialized fields like finance and law, the model's reasoning capabilities have improved by over 30% compared to the previous version, Opus 4.1 [4][5] - Claude Sonnet 4.5 achieved a perfect score of 100% in high school math competitions and 89.1% in multilingual Q&A tasks [5] Product Ecosystem Upgrades - Anthropic has introduced several product updates, including Claude Code 2.0, which features a "checkpoint" function for code progress saving and instant rollback, enhancing developer efficiency [8] - The API capabilities have been strengthened, extending the AI agent's runtime from 7 hours to 30 hours for more complex tasks [8] - A new browser extension, Claude for Chrome, has been made available for Max subscription users, integrating code execution and document creation directly within the application [8] Developer Empowerment - The release of the Claude Agent SDK allows developers to build customized AI assistants, addressing key challenges in AI agent development such as long-term task memory management and multi-agent coordination [9] - This SDK has already been validated by engineering teams at companies like Canva, improving codebase management and product research efficiency [9] 
Safety and Compliance - Claude Sonnet 4.5 has achieved AI Safety Level 3 (ASL-3) certification, significantly reducing the false positive rate by 90% compared to earlier models [10] - The model includes advanced content detection for hazardous materials and has made notable progress in defending against prompt injection attacks, a significant risk for users [10] Commercial Strategy - Anthropic maintains competitive pricing for API calls, consistent with the previous model, at $3 per million tokens for input and $15 per million tokens for output [13] - The company positions Claude Sonnet 4.5 as the default choice for users, while still allowing access to older models for specific workflows [13] - Analysts suggest that the launch of Claude Sonnet 4.5 signifies a shift from AI as an "assistive tool" to "independent productivity," with the open SDK potentially accelerating AI agent technology adoption across industries [13][14]