Reinforcement Learning
Whoa! DeepSeek Releases Two New Models in One Go
程序员的那些事· 2025-12-02 13:49
Republished from 量子位 (WeChat official account QbitAI). A surprise strike: on the third anniversary of ChatGPT's release, DeepSeek dropped two models at once. The former focuses on balanced practicality, suited to everyday Q&A, general agent tasks, and tool calling in real-world applications; its reasoning reaches GPT-5 level, slightly below Gemini-3.0-Pro. The latter is built for extreme reasoning, matching Gemini-3.0-Pro on reasoning benchmarks, and it took gold medals at IMO 2025, CMO 2025, the ICPC World Finals 2025, and IOI 2025, notably placing second among human contestants at ICPC and tenth at IOI. Specifically, DeepSeek-V3.2 emphasizes balancing reasoning ability against output length to lower compute overhead. DeepSeek's official post states that "the DeepSeek-V3.2 model achieves the highest level among current open-source models in agent evaluations." Further details: the figure below compares DeepSeek-V3.2 and other models on various agent tool-calling benchmarks (DeepSeek-V3.2, DeepSeek-V3.2-Speciale). Reasoning on par with GPT-5; output length cut sharply relative to Kimi-K2-Thinking, reducing user wait time; DeepSeek's first model to integrate "thinking into tool call ...
Commentary on Ilya's Remarks
小熊跑的快· 2025-12-02 07:12
Core Insights
- The industry is transitioning from an era focused on "scaling" to one driven by "fundamental research" in AI development [1][2]
- Ilya categorizes AI development into three phases: the Age of Research (2012-2020), the Age of Scaling (2020-2025), and a return to the Age of Research post-2025 [2]
- Current AI models are facing limitations in scaling, necessitating a renewed focus on research methodologies similar to those used before 2020 [2][4]

Group 1: Phases of AI Development
- The Age of Research (2012-2020) was characterized by experimentation with new ideas and architectures, resulting in models like AlexNet, ResNet, and Transformer [2]
- The Age of Scaling (2020-2025) introduced a straightforward yet effective approach of using more computational power, data, and larger models for pre-training, leading to significant advancements [2]
- The anticipated return to the Age of Research suggests that the effectiveness of scaling is diminishing, prompting a need for innovative breakthroughs [2]

Group 2: Critique of Current Approaches
- Ilya questions the effectiveness of reinforcement learning and scoring methods, arguing they produce machines with limited generalization capabilities [3]
- He emphasizes the importance of value functions in decision-making, likening human emotions to a simple yet effective value function that current large models struggle to replicate (a standard definition is given after this summary) [3]
- The concept of a new intelligent system capable of self-learning and growth is proposed, envisioning an AI akin to a 15-year-old capable of various tasks [3]

Group 3: Industry Trends and Future Directions
- Ilya's recent statements align with the industry's recognition of stagnation in large language models, attributed to data limitations [4]
- Despite the diminishing returns of scaling, the focus should shift towards inference, with significant revenue projections for pure inference APIs and AI hardware rentals [4]
- SSI, the company Ilya is associated with, prioritizes research and alignment, aiming to develop safe superintelligent systems without immediate commercial considerations [4][5]
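For reference on the value-function point in Group 2, this is the standard reinforcement-learning definition (a generic textbook formulation, not something taken from Ilya's remarks):

```latex
% State-value function of a policy \pi: expected discounted return starting from state s
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0} = s \right],
\qquad \gamma \in [0, 1)
```

The value function compresses the long-horizon consequences of being in state s into one scalar, which is the role the summary ascribes to human emotions as a "simple yet effective value function."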
Flash: MEET2026 Speaker Lineup Updated Again; Attendee Registration Open, Sign Up Soon
量子位· 2025-12-02 04:59
Core Insights
- The MEET2026 Smart Future Conference will focus on cutting-edge technologies and industry developments that have garnered significant attention throughout the year [1]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies penetrate various industries, disciplines, and scenarios, becoming a core driving force for societal evolution [2]

Group 1: Conference Highlights
- The conference will cover hot topics in the tech circle this year, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [3]
- It will feature the latest collisions between academic frontiers and commercial applications, showcasing leading technological achievements from infrastructure, models, and product industries [4]
- The event will also include the authoritative release of the annual AI rankings and the annual AI trend report [5][116]

Group 2: Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, has extensive experience in AI and digital video technologies [11][12]
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national projects in AI research [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in AI core technology development and has published over 100 papers [19]

Group 3: Industry Impact
- The annual AI rankings initiated by Quantum Bit have become one of the most influential lists in the AI industry, evaluating companies, products, and individuals across three dimensions [117]
- The annual AI trend report will analyze ten significant AI trends based on technological maturity, current implementation, and potential value, highlighting representative organizations and best cases [118]
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [122]
The Strongest Open Source Yet: "Punching GPT-5" and "Kicking Gemini-3.0", Why Has DeepSeek V3.2 Improved So Much?
华尔街见闻· 2025-12-02 04:21
Core Insights
- DeepSeek has released two official models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, with the former achieving performance levels comparable to GPT-5 and the latter winning gold medals in four international competitions [1][3]

Model Performance
- DeepSeek-V3.2 has reached the highest level of tool invocation capabilities among current open-source models, significantly narrowing the gap with closed-source models [2]
- In various benchmark tests, DeepSeek-V3.2 achieved a 93.1% pass rate in AIME 2025, closely trailing GPT-5's 94.6% and Gemini-3.0-Pro's 95.0% [20]

Training Strategy
- The model's significant improvement is attributed to a fundamental change in training strategy, moving from a simple "direct tool invocation" to a more sophisticated "thinking + tool invocation" mechanism [9][11]
- DeepSeek has constructed a new large-scale data synthesis pipeline, generating over 1,800 environments and 85,000 complex instructions specifically for reinforcement learning [12]

Architectural Innovations
- The introduction of the DeepSeek Sparse Attention (DSA) mechanism has effectively addressed efficiency bottlenecks in traditional attention mechanisms, reducing complexity from O(L²) to O(Lk) while maintaining model performance (see the sketch after this summary) [6][7]
- The model's architecture allows for better context management, retaining relevant reasoning content during tool-related messages, thus avoiding inefficient repeated reasoning [14]

Competitive Landscape
- The release of DeepSeek-V3.2 signals a shift in the competitive landscape, indicating that the absolute technical monopoly of closed-source models is being challenged by open-source models gaining first-tier competitiveness [20][22]
- This development has three implications: lower costs and greater customization for developers, reduced reliance on overseas APIs for enterprises, and a shift in the industry focus from "who has the largest parameters" to "who has the strongest methods" [22]
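To ground the O(L²) to O(Lk) figure in the Architectural Innovations bullet, here is a toy top-k sparse-attention sketch in NumPy. It illustrates the general complexity argument only; the top-k selection rule, the shapes, and the absence of a lightweight indexer are assumptions made for illustration, not DeepSeek's actual DSA implementation.

```python
import numpy as np

def topk_sparse_attention(q, k, v, k_top=64):
    """Toy single-head sparse attention: each query attends only to its
    k_top highest-scoring keys, so the softmax and value aggregation cost
    O(L * k_top) instead of the dense O(L^2) over all key positions.

    q, k, v: (L, d) arrays for one head.
    NOTE: a real kernel would pick the keys with a cheap indexer and never
    materialize the full L x L score matrix; this toy forms it for clarity.
    """
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)                                       # (L, L), illustration only
    idx = np.argpartition(-scores, kth=k_top - 1, axis=-1)[:, :k_top]   # top-k key indices per query
    sel = np.take_along_axis(scores, idx, axis=-1)                      # (L, k_top) selected scores
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                                  # softmax over the kept keys only
    return np.einsum('lk,lkd->ld', w, v[idx])                           # weighted sum of selected values

L, d = 1024, 64
rng = np.random.default_rng(0)
out = topk_sparse_attention(rng.normal(size=(L, d)),
                            rng.normal(size=(L, d)),
                            rng.normal(size=(L, d)))
print(out.shape)  # (1024, 64)
```

In a production implementation the whole point is to avoid materializing the dense L×L matrix at all (the toy above still does, as noted in the comment), so both memory and compute scale with L·k rather than L².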
Transformer Author Reveals the Inside Story of GPT-5.1: OpenAI's Internal Naming Conventions Have Become a Mess
36Kr· 2025-12-01 01:25
Core Insights
- The development of AI is not slowing down but is transitioning to a new paradigm, with a focus on reasoning models rather than just pre-training [4][10][32]
- The recent release of GPT-5.1 represents a significant stability iteration rather than a minor update, emphasizing user experience and safety improvements [14][17][19]

Group 1: AI Development Trends
- There are two contrasting views on AI growth: one claims a slowdown, while the other highlights continuous advancements with new models like GPT-5.1 and Gemini 3 [5][10]
- The internal perspective shows that AI capability growth follows a smooth exponential curve, akin to Moore's Law, driven by technological iterations and computational enhancements [7][10]
- The shift from pre-training to reasoning models marks a critical turning point in AI development, with reasoning models still in their early stages and expected to progress rapidly [10][11][13]

Group 2: GPT-5.1 and Model Evolution
- GPT-5.1 is a substantial update focused on enhancing reasoning capabilities, safety, and user experience, despite appearing as a minor version change [14][15][17]
- The naming convention for models has shifted to prioritize user experience, allowing for more flexibility in development and faster iteration cycles [17][19]
- Despite improvements, GPT-5.1 still exhibits limitations in multi-modal reasoning, as demonstrated by its inability to solve simple problems that a child could easily answer [19][20]

Group 3: Future of AI and Robotics
- AI is expected to change the nature of work without eliminating jobs, as human expertise will still be required in high-stakes scenarios [32][34]
- The next significant breakthrough in AI is anticipated to come from advancements in multi-modal reasoning and embodied intelligence, particularly in home robotics [36][34]
- The progress in robotics will depend on the integration of multi-modal capabilities and general reinforcement learning, leading to a transformative leap in home automation technologies [36][34]
The Post-training Paradigm for Large Models Has Already Changed...
自动驾驶之心· 2025-12-01 00:04
Core Insights
- The article discusses the evolution of post-training paradigms in large models, particularly the shift from SFT+RLHF to a new two-stage approach involving RL Scaling and RL Alignment, which may enhance reasoning capabilities and model performance [3][4][5]

Summary by Sections

Post-Training Paradigm Shift
- The traditional two-stage post-training method of SFT+RLHF has been widely adopted since the release of GPT-3.5, providing a foundation for rapid convergence and instruction-following capabilities [3]
- The new paradigm suggests that large reasoning models may transition to a two-stage approach involving RL Scaling and RL Alignment, focusing on enhancing self-reflection and reasoning abilities without the need for a convergence foundation [4]

Advantages of the New Approach
- RL Scaling can improve model performance on verifiable tasks like math and coding, while RL Alignment adjusts the model to align with human instructions and readability (a minimal sketch of this two-stage recipe follows this summary) [4]
- This new method potentially mitigates reward hacking issues present in traditional post-training approaches, allowing for greater freedom in token search and enhancing reasoning capabilities [5]

Opportunities and Challenges
- The shift to RL Scaling presents opportunities to explore how to utilize data without clear answers and to balance the difficulty of tasks to optimize learning [7]
- There are concerns regarding safety, as the enhanced capabilities from RL Scaling may lead to harmful reasoning emerging from the model, raising questions about the effectiveness of the RL Alignment phase in ensuring safety [6][7]

Generalization and Transferability
- The performance improvements seen in math and coding tasks can be generalized to other types of tasks, indicating a broader applicability of the new model capabilities [5]
- Despite the advancements, there remains a preference for models like GPT-4o that excel in understanding user intent and following instructions, highlighting the importance of effective communication and efficiency in practical applications [7]
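As a purely illustrative sketch of the two-stage recipe described above (RL Scaling with a verifiable, rule-based reward, followed by RL Alignment with a learned preference score), consider the toy loop below. The policy representation, the generator, and the update rule are stand-ins chosen for brevity; they are assumptions, not any published training code.

```python
import random

def generate(policy, prompt):
    # Stand-in for sampling a chain-of-thought + answer from the model.
    return {"prompt": prompt, "answer": str(round(policy["bias"]) + random.randint(-1, 1))}

def verifiable_reward(sample, gold):
    # Stage-1 style reward: rule-based check against a known answer (math/code style).
    return 1.0 if sample["answer"] == gold else 0.0

def preference_reward(sample):
    # Stage-2 style reward: stand-in for a learned reward model scoring
    # helpfulness, readability, and instruction following.
    return random.random()

def update(policy, reward, lr=0.1):
    # Stand-in for a policy-gradient step (GRPO/PPO-style in practice).
    policy["bias"] += lr * (reward - 0.5)

policy = {"bias": 0.0}
math_tasks = [("0+1", "1"), ("1+1", "2")]

# Stage 1: RL Scaling on verifiable tasks (math / coding with checkable answers).
for _ in range(200):
    prompt, gold = random.choice(math_tasks)
    update(policy, verifiable_reward(generate(policy, prompt), gold))

# Stage 2: RL Alignment toward human instructions and readability.
for _ in range(200):
    update(policy, preference_reward(generate(policy, "explain your answer clearly")))

print(policy)
```

The ordering is the point of the argument above: the verifiable-reward stage is free to search over long token sequences without a preference model constraining it, and only afterwards does the alignment stage shape the result toward human-readable, instruction-following behavior.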
Transformer Author Reveals the Inside Story of GPT-5.1! OpenAI's Internal Naming Conventions Have Become a Mess
量子位· 2025-11-30 11:30
Core Insights
- The article discusses a significant paradigm shift in AI, indicating that the development of AI is not slowing down but rather transitioning to a new phase of growth [1][7][12]

Group 1: AI Development Trends
- There are two contrasting views on AI development: one claims that AI growth is slowing down, while the other highlights continuous advancements with new models like GPT-5.1 and Gemini 3 being released [3][12]
- Łukasz Kaiser argues that the perception of slowing growth is incorrect, stating that AI's capability growth follows a smooth exponential curve, akin to Moore's Law [15][16]
- The shift from pre-training to reasoning models is a key factor in this transition, with pre-training being in a later stage of its S-curve while reasoning models are still in their early stages [18][19]

Group 2: Reasoning Models and Their Impact
- The industry is focusing on smaller, cost-effective models that maintain quality, leading to the misconception that pre-training has stalled [21]
- Reasoning models, which allow for more complex thought processes and the use of tools during inference, are expected to progress rapidly due to their emerging nature (a minimal agent-loop sketch follows this summary) [22][27]
- The evolution of models like ChatGPT demonstrates a qualitative leap in performance, with newer versions incorporating reasoning and external tool usage for more accurate responses [23][24]

Group 3: GPT-5.1 Insights
- GPT-5.1 is not merely a minor update but represents a significant stability iteration, enhancing reasoning capabilities through reinforcement learning and synthetic data [34][35]
- The naming convention for versions has shifted to focus on user experience rather than technical details, allowing for greater flexibility in development [38]
- Despite improvements, GPT-5.1 still has limitations, particularly in multi-modal reasoning, as illustrated by its struggles with basic tasks that require contextual understanding [41][42]

Group 4: Future of AI and Robotics
- AI is expected to change the nature of work without eliminating jobs, as human expertise will still be needed in high-stakes scenarios [62][66]
- Home robots are anticipated to be the next visible AI revolution, driven by advancements in multi-modal capabilities and general reinforcement learning [67][69]
- The integration of these technologies is expected to lead to a significant leap in the capabilities of home robots, making them more intuitive and perceptible compared to current AI models like ChatGPT [69]
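To make the "reasoning plus external tool use at inference time" point in Group 2 concrete, here is a minimal agent-loop sketch. The message format, the calculator tool, and the fake_model policy are hypothetical placeholders, not OpenAI's API or GPT-5.1's actual mechanism.

```python
import ast
import operator

# Minimal sketch of inference-time "reasoning + tool use": the model emits either
# a tool call or a final answer; the runtime executes the tool and feeds the result
# back until the model stops. fake_model is a hypothetical stand-in policy.

SAFE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr):
    """Evaluate a simple arithmetic expression safely (the only tool here)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in SAFE_OPS:
            return SAFE_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def fake_model(transcript):
    # Hypothetical policy: if no tool result is visible yet, request one; else answer.
    if not any(line.startswith("TOOL_RESULT") for line in transcript):
        return 'CALL calculator "17 * 23"'
    result = transcript[-1].split(":", 1)[1].strip()
    return f"FINAL The answer is {result}."

def run_agent(question, max_steps=4):
    transcript = [f"USER {question}"]
    for _ in range(max_steps):
        step = fake_model(transcript)
        transcript.append(step)
        if step.startswith("FINAL"):
            return step
        if step.startswith("CALL calculator"):
            expr = step.split('"')[1]
            transcript.append(f"TOOL_RESULT: {calculator(expr)}")
    return "FINAL (step budget exhausted)"

print(run_agent("What is 17 * 23?"))  # -> FINAL The answer is 391.
```

The extra capability in this setting comes from spending more tokens and tool calls at inference time rather than from a larger pre-trained network, which is the early-stage curve the interview keeps returning to.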
New from Peking University! MobileVLA-R1: Beyond Robotic Arms, How Far Along Are Mobile Robots' VLA Capabilities?
具身智能之心· 2025-11-30 03:03
Core Insights
- The article introduces MobileVLA-R1, a new framework for quadruped robots that bridges the gap between high-level semantic reasoning and low-level action control, addressing the stability and interpretability challenges of existing methods [1][2][21]

Group 1: Need for Reconstruction of the VLA Framework
- Current quadruped robots face two main challenges: a semantic-control gap leading to instability in command execution, and a lack of traceable reasoning that complicates error diagnosis [2]
- MobileVLA-R1's breakthrough lies in decoupling reasoning from action execution, allowing robots to "think clearly" before "acting accurately", enhancing both interpretability and control robustness (a schematic sketch of this decoupling follows this summary) [2][23]

Group 2: Implementation of MobileVLA-R1
- MobileVLA-R1 employs a structured CoT dataset, a two-stage training paradigm, and multi-modal perception fusion to achieve coherent reasoning, stable control, and strong generalization [4][6]
- The structured CoT dataset includes 18K episode-level samples, 78K step-level samples, and 38K navigation-specific samples, filling the gap in reasoning supervision from instruction to action [4][5]

Group 3: Performance Evaluation
- In navigation tasks, MobileVLA-R1 achieved success rates of 68.3% and 71.5% on the R2R-CE and RxR-CE datasets, respectively, outperforming existing methods by an average of 5% [10]
- On quadruped control tasks, it achieved an average success rate of 73% across six locomotion and manipulation tasks, significantly surpassing baseline models [12][13]

Group 4: Real-World Deployment
- MobileVLA-R1 was tested on the Unitree Go2 quadruped robot in various environments, demonstrating robust adaptation to complex scenarios with a success rate of 86%-91% for complex instructions [14][18]
- The integration of depth and point-cloud encoders improved navigation success rates by 5.8%, highlighting the importance of 3D spatial information for scene understanding [19][20]

Group 5: Key Conclusions and Future Directions
- MobileVLA-R1 innovatively integrates chain-of-thought reasoning with reinforcement learning, addressing the industry's dilemma of choosing between interpretability and execution stability [21][23]
- Future directions include expanding the action space for more precise tasks, reducing reasoning latency through model optimization, and enhancing self-supervised learning to decrease reliance on labeled data [23]
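As a schematic of the "think first, then act" decoupling credited to MobileVLA-R1 above, the sketch below separates an inspectable reasoning stage from a simple velocity-command stage. The plan format, command vocabulary, and velocity interface are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass

# Schematic of decoupling high-level chain-of-thought reasoning from low-level
# locomotion control. Everything here (plan format, command set, velocity
# interface) is a hypothetical illustration, not the paper's implementation.

@dataclass
class VelocityCommand:
    vx: float   # forward velocity (m/s)
    vy: float   # lateral velocity (m/s)
    wz: float   # yaw rate (rad/s)

def reason(instruction, observation):
    """Stage 1: produce an explicit, inspectable chain-of-thought plan."""
    # A real VLA model would generate this from images / point clouds + text.
    return [
        f"Goal: {instruction}",
        "Step 1: turn toward the doorway seen on the left of the image",
        "Step 2: walk forward until the doorway is centered",
    ]

def act(plan_step):
    """Stage 2: map one reasoning step to a low-level velocity command."""
    if "turn" in plan_step:
        return VelocityCommand(vx=0.0, vy=0.0, wz=0.5)
    if "walk forward" in plan_step:
        return VelocityCommand(vx=0.4, vy=0.0, wz=0.0)
    return VelocityCommand(0.0, 0.0, 0.0)

plan = reason("go through the door on the left", observation=None)
for step in plan[1:]:
    print(f"{step} -> {act(step)}")   # the plan is traceable; the commands stay simple
```

Because the plan is plain text, a failed run can be diagnosed by reading which reasoning step produced the wrong command, which is the traceability benefit the summary emphasizes.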