Reinforcement Learning
Just in: DeepSeek-R1 paper makes the cover of Nature, with Liang Wenfeng as corresponding author
机器之心· 2025-09-17 17:00
Core Viewpoint
- The article highlights the significance of DeepSeek-R1, which is recognized as the first large language model (LLM) to pass peer review in a prestigious academic journal, Nature. This achievement marks a pivotal shift in the AI industry towards more rigorous scientific validation of AI models, moving from mere technical competition to a focus on scientific discipline and public trust [5][11][12].

Summary by Sections

DeepSeek-R1 Overview
- DeepSeek-R1 is trained using reinforcement learning, where the model receives rewards for correct answers and penalties for incorrect ones, enabling it to develop reasoning capabilities similar to human problem-solving [7][8].
- The model's ability to self-validate and reflect on its performance enhances its effectiveness in programming and advanced scientific inquiries [7].

Peer Review Significance
- The peer review process serves as a critical gatekeeper, requiring AI companies to substantiate their claims with solid evidence rather than self-promotion [10].
- The rigorous evaluation of DeepSeek-R1's methodology and limitations by external experts helps to mitigate inflated claims in the AI industry [9][10].

Training Methodology
- DeepSeek-R1 employs a novel multi-stage pipeline that enhances reasoning capabilities without relying heavily on supervised data [15].
- The model utilizes Group Relative Policy Optimization (GRPO) to reduce training costs and incorporates a dual reward mechanism based on accuracy and format [16][17].
- A structured training template guides the model to articulate its reasoning process before providing final answers, allowing for clear observation of its learning progress [18].

Performance and Limitations
- DeepSeek-R1 demonstrates advanced self-evolution capabilities, developing higher-order reasoning skills autonomously during training [20].
- Despite its advancements, the model still faces challenges such as poor readability and language mixing in its outputs [21][26].

Cold Start and Reinforcement Learning
- The development team collected a small amount of long Chain of Thought (CoT) data to stabilize the model during the early stages of reinforcement learning [22].
- The integration of language consistency rewards during training aims to improve the model's readability, although it may slightly affect performance [23].

Distillation and Model Efficiency
- The team successfully distilled the reasoning capabilities of DeepSeek-R1 into smaller models, significantly enhancing their performance [29].
- Benchmark tests indicate that DeepSeek-R1 competes effectively with state-of-the-art models in reasoning tasks, showcasing its robust capabilities [30][31].
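The GRPO and dual-reward description above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the DeepSeek team's code: the `<think>`/`<answer>` tag layout, exact-match answer checking, the equal weighting of the two rewards, and all function names are assumptions made for this example. The one idea taken from GRPO itself is that each sampled answer is scored relative to the other answers in its group (reward minus group mean, divided by group standard deviation), so no separate value model is needed.

```python
import re
from statistics import mean, pstdev

# Assumed template: reasoning in <think>...</think>, then the final answer
# in <answer>...</answer>. The tag names are an assumption for this sketch.
TEMPLATE = re.compile(
    r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer equals the reference (exact match)."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Dual reward: accuracy plus format, equally weighted (an assumption)."""
    return accuracy_reward(completion, reference) + format_reward(completion)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of sampled completions for one prompt whose reference answer is "4".
group = [
    "<think>2 + 2 is 4</think><answer>4</answer>",  # correct and well formatted
    "<think>a guess</think><answer>5</answer>",     # well formatted but wrong
    "4",                                            # no template, no credit here
]
rewards = [total_reward(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the well-formatted correct answer gets a positive advantage and the unformatted one a negative advantage; a group in which every completion earns the same reward yields all-zero advantages and therefore no learning signal for that prompt.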
Buick Zhijing L7 makes its debut: first to carry Qualcomm's SA8775P cockpit chip, with the "Xiaoyao Zhixing" driver-assistance system
Xin Lang Ke Ji· 2025-09-17 14:37
Core Viewpoint
- Buick's high-end new energy sub-brand "Zhijing" has unveiled its flagship sedan, the Zhijing L7, which integrates over a century of Buick's experience and significant investment in resources [2].

Group 1: Product Features
- The Zhijing L7 is built on the new Buick "Xiaoyao" super integration vehicle architecture and is now available at Buick dealerships, with an early bird plan offering lifetime free maintenance for orders placed before September 28 [2].
- It features the "Zhenlong" range extension system with a power output of 252 kW, achieving 0-100 km/h acceleration in 5.9 seconds and a low fuel consumption of 0.5 L per 100 km [2].
- The vehicle offers a pure electric range of 302 km and a comprehensive range of 1420 km, with fast charging capabilities allowing a 30% to 80% charge in just 18 minutes [2].

Group 2: Advanced Technology
- The Zhijing L7 is equipped with the Buick "Xiaoyao Zhixing" advanced driver assistance system, featuring the Momenta R6 flywheel model for full-scenario driving assistance, including "no-stop" city NOA and the industry's first "no-stop one-button parking" [3].
- It incorporates Qualcomm's latest SA8775P chip with a computing power of 72 TOPS, a 50-inch panoramic AR-HUD head-up display, and a 15.6-inch smart central control screen [3].

Group 3: Design and Comfort
- The vehicle dimensions are 5032 mm x 1952 mm x 1500 mm with a wheelbase of 3000 mm, featuring a starry wing exterior design and a sleek coupe shape [3].
- The interior boasts a new pure floating island design aesthetic, with high-quality Nappa leather seats and a 27-speaker Buick Sound theater-level audio system [3][4].

Group 4: Chassis and Suspension
- The Zhijing L7 utilizes a front double wishbone and rear five-link suspension structure, with RTD continuous damping variable suspension for real-time body posture control, enhancing ride comfort and stability [4].
Zhihui Jun's robot steals the show: world premiere of "the Webster flip every real man must master"
量子位· 2025-09-17 11:06
Core Viewpoint
- The article highlights the achievement of the Lingxi X2 robot, which has become the first robot globally to complete a Webster flip, a complex acrobatic maneuver that demonstrates advanced capabilities in robotics [1][7].

Group 1: Robot Capabilities
- The Lingxi X2 robot stands approximately 1.3 meters tall and possesses 25-31 degrees of freedom, although it lost 2 degrees due to the removal of its head for the Webster flip [13][14].
- The robot can perform basic movements like running and can navigate various terrains without the need for navigation systems, showcasing its autonomous obstacle avoidance capabilities [16][19].
- The successful execution of the Webster flip required overcoming significant challenges, including high dynamical complexity, real-time perception and feedback, and high hardware reliability [23][24].

Group 2: Technological Innovations
- The achievement is attributed to the Lingchuan platform, which is an AI-enhanced tool for robot motion and expression creation, allowing for the design and secondary development of robot movements [20][19].
- The robot's motion capabilities are based on a reinforcement learning strategy that utilizes human video data to train its movements, ensuring precise execution in real-world scenarios [24].

Group 3: Future Developments
- The Lingxi X2 series includes other models such as Lingxi X2-W and Lingxi X2-N, which are designed for different operational capabilities, including task intelligence and adaptability to various terrains [26][34].
- The company plans to scale production of the Lingxi X2 by the second half of 2025, with an expected output of several thousand units by the end of 2026 [36].
"A 100% Chinese car": Buick's first extended-range sedan, the Zhijing L7, debuts
Guan Cha Zhe Wang· 2025-09-17 10:38
Core Viewpoint
- The Buick Zhijing L7, the first extended-range sedan from SAIC-GM Buick, was unveiled on September 15, 2025, and is touted as the "strongest extended-range luxury sedan" in the industry, developed entirely in China [1][3].

Group 1: Product Features
- The Buick Zhijing L7 is built on the "Xiaoyao" super fusion architecture and features the "Zhenlong" extended-range technology, which includes a maximum power output of 252 kW and accelerates from 0 to 100 km/h in just 5.9 seconds [5].
- The vehicle boasts a comprehensive fuel consumption of 0.5 L per 100 km, with a pure electric range of up to 302 km and a total range of 1420 km [5].
- It supports the fastest charging in its class at 130 kW, allowing for a 30% to 80% charge in just 18 minutes [5].

Group 2: Technological Advancements
- The Zhijing L7 is equipped with the latest Qualcomm SA8775P chip, providing a neural network computing power of 72 TOPS, and features a 50-inch panoramic AR-HUD and a 15.6-inch smart central control screen [9].
- It incorporates the "Xiaoyao Zhixing" advanced driver-assistance system, which includes full-scene driving assistance capabilities and the industry's first "no-stop one-button parking" feature [7].

Group 3: Design and Comfort
- The vehicle's dimensions are 5032 mm in length, 1952 mm in width, and 1500 mm in height, with a wheelbase of 3000 mm, positioning it as a C-class sedan with a sleek fastback design [11].
- The interior features a premium design with high-quality materials, including a 27-speaker Buick Sound theater-level audio system and multi-mode headrest speakers [11].

Group 4: Market Positioning
- The Zhijing L7 will compete with domestic electric vehicles such as the Xiangjie S9 and Avita 12, and its brand strength in the new energy era remains to be validated [13].
Tencent AI Lab pioneers the RL framework Parallel-R1, teaching large models "parallel thinking"
机器之心· 2025-09-17 09:37
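Ever since Google Gemini credited part of its math olympiad success to "parallel thinking", how to give large models the ability to explore multiple reasoning paths in parallel has become a focus of the research community.

However, existing methods mostly rely on supervised fine-tuning (SFT). First, the model can only imitate pre-constructed parallel thinking data and struggles to generalize to real, complex tasks. Second, this approach is demanding on data and often requires a complex data pipeline to construct it.

To address these problems, researchers from Tencent AI Lab Seattle, the University of Maryland, Carnegie Mellon University, UNC Chapel Hill, City University of Hong Kong, Washington University in St. Louis, and other institutions (first author Tong Zheng is a PhD student at the University of Maryland; the work was completed during an internship at Tencent AI Lab Seattle) created the Parallel-R1 framework, the first framework to teach large models parallel thinking on general mathematical reasoning tasks through reinforcement learning (RL). With its novel "progressive curriculum" and "alternating reward" designs, the framework resolves the cold-start and reward-design problems in RL training.

Experiments show that Parallel-R1 not only delivers an average accuracy improvement of up to 8.4% across multiple math benchmarks, but also, through a "mid-training scaffold" strategy, achieves a 42.9% performance jump on the AIME25 test ...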
Next stop for the AI revolution: Anthropic and OpenAI spend heavily to build "virtual employees"
36Kr · 2025-09-17 05:11
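Such training does not come cheap. According to a person familiar with the matter, Anthropic plans to invest US$1 billion over the coming year specifically to build simulated office platforms known as "reinforcement learning environments" or "gyms". OpenAI is spending just as heavily: it expects its spending on data-related areas to reach US$1 billion this year alone, rising to US$8 billion by 2030. The money goes both to building virtual office environments and to paying experts.

September 17: Anthropic and OpenAI, two giants of the AI field, are working to develop "AI coworkers" that can take over complex work from humans. Their core approach is to train AI models on simulated enterprise software so that they can understand and operate real workflows the way human employees do.

To accelerate this, Anthropic plans to invest US$1 billion next year to build large-scale AI training "gyms". OpenAI, for its part, believes the entire economy could eventually become one giant "reinforcement learning machine", in which AI evolves continuously through collaboration with and feedback from humans, fundamentally reshaping productivity and ways of working.

Up to US$250 an hour: "AI tutors" are teaching large models how to do office work

Anthropic and OpenAI are doing something unprecedented: getting large language models to truly walk into the "office" and learn to be competent "digital employees".

These AI models are undergoing intensive vocational training, learning to operate all kinds of professional office software, from Salesforce's customer management system, Ze ...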
Express | The new battleground for OpenAI and Anthropic: training AI to operate enterprise software, with annual costs soaring to US$8 billion
Z Potentials· 2025-09-17 03:34
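AI developers such as Anthropic and OpenAI are putting large language models "to work in the office".

These AI models are learning to use tools ranging from Salesforce's customer relationship management software to Zendesk's customer support system to Cerner's medical records application. The goal is to teach AI to handle some of the complex tasks that white-collar workers face.

This kind of training is unlike anything AI models have undergone before. Researchers give the AI simulated applications to practice interacting with, and hire experts in various fields to demonstrate to the model how to operate them.

These techniques are not cheap. According to a person familiar with the matter, Anthropic executives have discussed internally spending US$1 billion over the coming year to build these "enterprise application clones", also known as reinforcement learning environments or gyms.

The cost of hiring human experts in fields such as biology, software programming, and medicine to teach models new knowledge and how to operate office software is also climbing.

Earlier this year, OpenAI projected that it would spend about US$1 billion this year on data-related costs (including payments to human experts and reinforcement learning gyms), rising to US$8 billion by 2030.

If successful, these AI training methods could help OpenAI and Anthropic break through some of the limits that traditional training techniques have recently run into ...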
星动纪元 (Robot Era) is hiring! Multiple directions including embodied multimodal learning and reinforcement learning
具身智能之心· 2025-09-17 00:02
Core Viewpoint
- The article outlines various job descriptions and requirements for positions related to multi-modal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning technologies [6][14][15].

Group 1: Job Descriptions
- Responsibilities include research, design, and implementation of cutting-edge multi-modal reinforcement learning algorithms to address complex real-world problems [6].
- Involvement in the collection, processing, cleaning, and analysis of multi-modal data to create high-quality training datasets [14].
- Development and optimization of multi-modal models, including training, fine-tuning, and enhancing performance across different tasks [6][15].

Group 2: Job Requirements
- Candidates should possess a master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13].
- Proficiency in programming languages such as Python and deep learning frameworks like PyTorch is essential, along with strong engineering implementation skills [13].
- Experience in publishing papers at top academic conferences (e.g., CVPR, NeurIPS) and contributions to open-source projects are preferred [13][19].

Group 3: Additional Qualifications
- Familiarity with multi-modal data cleaning, labeling, and loading, as well as an understanding of data optimization techniques, is required [14].
- Candidates should have experience with large language models and multi-modal models, including knowledge of their capabilities and applicable scenarios [14].
- High standards for data quality and attention to detail are necessary, along with proficiency in data processing tools like Pandas and NumPy [14].
Targeting range-extender buyers' pain points: Buick's new energy luxury sedan, the Zhijing L7, makes its national debut
Nan Fang Du Shi Bao· 2025-09-16 11:07
Core Insights
- SAIC-GM's new luxury electric sedan, the Zhijing L7, was officially unveiled on September 15, featuring the "Zhenlong" range extender system and advanced AI technology [1][3].
- The vehicle is positioned in the competitive 200,000-300,000 RMB market segment, aiming to provide consumers with a balanced choice between traditional fuel vehicles and electric cars [1][3].

Product Features
- The Zhijing L7's range extender system boasts a maximum power output of 252 kW, equivalent to a 3.0T V6 engine, with a 0-100 km/h acceleration time of just 5.9 seconds and a combined fuel consumption of only 0.5 L per 100 km [4][6].
- The vehicle offers a pure electric range of 302 km and a total range of 1420 km, addressing common consumer concerns regarding range anxiety [4][6].

Market Positioning
- Luxury and joint venture brands have faced significant challenges in the electric vehicle market, and the Zhijing L7 aims to fill the gap in the sedan segment for range-extended vehicles [3][4].
- The current market for range-extended vehicles is seen as a growing segment, particularly as consumer preferences evolve towards intelligent and electric solutions [6][8].

Technological Advancements
- The Zhijing L7 is equipped with the Momenta R6 flywheel model, which enhances its intelligent driving capabilities, including features like "no-stop" city navigation and automated parking [6][8].
- The vehicle utilizes Qualcomm's latest SA8775P chip, providing high computational power for its intelligent cabin and driving systems [8][10].

Strategic Vision
- The company emphasizes a long-term commitment to luxury, comfort, and quietness, aiming to balance various performance aspects rather than focusing solely on standout features [10].
Buick Zhijing L7 range-extended sedan makes its national debut
Huan Qiu Wang· 2025-09-16 11:03
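On September 15, 2025, the new energy intelligent luxury sedan Zhijing L7 made its first public appearance. As the first flagship sedan of Buick's high-end new energy sub-brand "Zhijing", the Zhijing L7 uses the top-tier "Zhenlong" range-extender technology, is the first to carry the "Xiaoyao Zhixing" driver-assistance system, and is the first car in the world to ship with the Momenta R6 flywheel large model, which is based on end-to-end "reinforcement learning", as well as Qualcomm's latest-generation SA8775P chip. The Zhijing L7 also offers a luxury chassis and a luxury comfort cabin, with equipment benchmarked against the high-end market. The Zhijing L7 has now arrived in Buick dealer showrooms nationwide, and the early bird plan has opened.

Design and Comfort: Luxury Equipment and Chassis Technology

The Zhijing L7 measures 5032 mm x 1952 mm x 1500 mm, with a long 3000 mm wheelbase. Its designers drew inspiration from nature to shape a starry wing exterior with flowing beauty and tension and a poised luxury fastback profile, with ultra-quiet NVH frameless doors all around, hidden door handles, and 20-inch starlight turbofan wheels. The galaxy starry-wing headlights and star-trail floating-light wing taillights, together with the roof-mounted lidar and the small blue lamp that signals "Xiaoyao Zhixing", blend technology into elegance.

The cabin adopts the new pure floating island design aesthetic, creating a clean, elegant, split-level space with a flowing sense of momentum. The interior materials provide 270° wraparound leather trim. The lake-island-style overhead console, the "stone in water" crystal dome light, and the galaxy gold-sand trim strips on the door panels and dashboard convey a classic, restrained Eastern sensibility and create a premium, refined atmosphere.

The Zhijing L7 has a spacious cabin ...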