Reinforcement Learning

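Zhihuijun's Robot Steals the Show: World Debut of the "Webster Flip Every Real Man Must Master"
QbitAI· 2025-09-17 11:06
By Jin Lei, reporting from Aofei Temple | QbitAI (WeChat official account: QbitAI). One big "wow" says it all: AgiBot's Lingxi X2 has become the world's first robot to complete a Webster flip! The Webster is an advanced flip technique, rated intermediate to advanced. It generally requires a powerful push-off from one leg while the other leg swings to drive the body's rotation, which demands more leg explosiveness and coordination than simpler flips. On Douyin, people regularly post videos built around landing this move, for example the account "Chongqing Xuanyang Stunts Dongge" (image source: Douyin, "Chongqing Xuanyang Stunts Dongge"). After watching Lingxi X2's Webster flip, netizens flooded the comments with that famous line: "Real men must know the Webster." Now the line may need to become "Real robots must know the Webster too." Zhihuijun joked as well: "Lingxi X2 has pulled off a move that even I can't do." First, a look at the robot. According to the official introduction, Lingxi X2 stands about 1.3 meters tall and has 25 to 31 degrees of freedom in total, including 2 in the head. The Lingxi X2 that performed the Webster had its head removed, so it presumably had 2 fewer degrees of freedom. In terms of performance, Lingxi X2's motion capabilities already reach a basic human level, and basic moves like running can handle a wide variety of terrain. Without needing navigation, Lingxi X2 can also autonomously ...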
"A 100% Chinese Car": Buick Unveils the Zhijing L7, Its First Extended-Range Sedan
Guan Cha Zhe Wang· 2025-09-17 10:38
Core Viewpoint - The Buick Zhijing L7, the first extended-range sedan from SAIC-GM Buick, was unveiled on September 15, 2025, and is touted as the industry's "strongest extended-range luxury sedan", developed entirely in China [1][3]. Group 1: Product Features - The Buick Zhijing L7 is built on the "Xiaoyao" super fusion architecture and uses "Zhenlong" extended-range technology, delivering a maximum power output of 252 kW and 0 to 100 km/h acceleration in 5.9 seconds [5]. - The vehicle has a combined fuel consumption of 0.5L per 100 km, a pure electric range of up to 302 km, and a total range of 1420 km [5]. - It supports the fastest charging in its class at 130 kW, allowing a 30% to 80% charge in 18 minutes [5]. Group 2: Technological Advancements - The Zhijing L7 is equipped with the latest Qualcomm SA8775P chip, providing 72 TOPS of neural network computing power, and features a 50-inch panoramic AR-HUD and a 15.6-inch smart central control screen [9]. - It incorporates the "Xiaoyao Zhixing" advanced driver-assistance system, which includes full-scene driving assistance and the industry's first "no-stop one-button parking" feature [7]. Group 3: Design and Comfort - The vehicle measures 5032 mm in length, 1952 mm in width, and 1500 mm in height, with a 3000 mm wheelbase, positioning it as a C-class sedan with a sleek fastback design [11]. - The interior features a premium design with high-quality materials, including a 27-speaker Buick Sound theater-level audio system and multi-mode headrest speakers [11]. Group 4: Market Positioning - The Zhijing L7 will compete with domestic electric vehicles such as the Xiangjie S9 and Avita 12, and its brand strength in the new energy era remains to be proven [13].
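Tencent AI Lab Introduces Parallel-R1, a First-of-Its-Kind RL Framework That Teaches Large Models "Parallel Thinking"
Synced· 2025-09-17 09:37
Ever since Google credited part of Gemini's math olympiad success to "parallel thinking", researchers have focused on how to give large models the ability to explore multiple reasoning paths in parallel. Existing methods, however, rely mostly on supervised fine-tuning (SFT), which has two problems. First, the model can only imitate pre-constructed parallel-thinking data and struggles to generalize to real, complex tasks. Second, the approach is very data-hungry and often needs a complex data pipeline to build that data. To address these problems, researchers from Tencent AI Lab Seattle, the University of Maryland, Carnegie Mellon University, UNC Chapel Hill, City University of Hong Kong, Washington University in St. Louis, and other institutions created the Parallel-R1 framework. (First author Tong Zheng is a PhD student at the University of Maryland, and the work was completed during an internship at Tencent AI Lab Seattle.) Parallel-R1 is the first framework to use reinforcement learning (RL) to teach large models parallel thinking on general mathematical reasoning tasks. Its novel "progressive curriculum" and "alternating reward" designs resolve the cold-start and reward-design difficulties of RL training. Experiments show that Parallel-R1 improves average accuracy by up to 8.4% across multiple math benchmarks. Through a "mid-training scaffold" strategy, it also achieves a 42.9% performance jump on AIME25 ...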
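The excerpt does not explain how the "alternating reward" works. The Python sketch below illustrates only one possible reading. Every `window` training steps, it switches between two rewards: one that scores accuracy alone, and one that also pays a bonus when a correct answer uses a parallel-thinking block. The `<Parallel>` tag and the `window` and `bonus` values are hypothetical assumptions, not Parallel-R1's actual design.

```python
# Hypothetical sketch of an alternating reward schedule; not Parallel-R1's actual implementation.
PARALLEL_TAG = "<Parallel>"  # assumed marker for a parallel-thinking block


def accuracy_reward(is_correct: bool) -> float:
    """Outcome-only reward: 1 for a correct final answer, 0 otherwise."""
    return 1.0 if is_correct else 0.0


def alternating_reward(step: int, is_correct: bool, response: str,
                       window: int = 10, bonus: float = 0.5) -> float:
    """Switch between two reward phases every `window` training steps.

    Even-numbered windows score accuracy only; odd-numbered windows add a
    bonus when a correct answer also used a parallel-thinking block.
    """
    base = accuracy_reward(is_correct)
    structure_phase = (step // window) % 2 == 1
    if structure_phase and is_correct and PARALLEL_TAG in response:
        return base + bonus
    return base


if __name__ == "__main__":
    parallel = "<Parallel> path A ... path B ... </Parallel> Answer: 7"
    print(alternating_reward(3, True, parallel))    # accuracy phase: 1.0
    print(alternating_reward(12, True, parallel))   # structure phase: 1.5
    print(alternating_reward(12, False, parallel))  # wrong answer: 0.0
```

The bonus is paid only for correct answers, so the model cannot collect reward just by emitting the tag. The source does not say whether the real framework does this.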
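The Next Stop in the AI Revolution: Anthropic and OpenAI Spend Heavily to Build "Virtual Employees"
36Kr· 2025-09-17 05:11
This kind of training is expensive. According to people familiar with the matter, Anthropic plans to spend $1 billion over the next year on simulated office platforms known as "reinforcement learning environments" or "gyms". OpenAI is also spending heavily: it expects its data-related spending to reach $1 billion this year and to rise to $8 billion by 2030. The money goes both to building virtual office environments and to paying experts. News on September 17: Anthropic and OpenAI, two giants of the AI field, are working on "AI coworkers" that can take over complex work from humans. Their core approach is to train AI models on simulated enterprise software so that the models can understand and operate real workflows the way human employees do. To speed this up, Anthropic plans to invest $1 billion next year in large-scale AI training "gyms". OpenAI, for its part, believes the entire economy may eventually become a giant "reinforcement learning machine", in which AI keeps evolving through collaboration with and feedback from humans, fundamentally reshaping productivity and how work is done. Up to $250 an hour: "AI tutors" are teaching large models how to do office work. Anthropic and OpenAI are doing something unprecedented: bringing large language models into the "office" to learn to be competent "digital employees". These AI models are going through intensive vocational training, learning to operate all kinds of professional office software, from Salesforce's customer management system to Ze ...
Express | OpenAI and Anthropic's New Battleground: Training AI to Operate Enterprise Software, with Annual Costs Set to Soar to $8 Billion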
Z Potentials· 2025-09-17 03:34
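AI developers such as Anthropic and OpenAI are putting large language models "to work in the office". These AI models are learning to use tools ranging from Salesforce's customer relationship management software to Zendesk's customer support system to Cerner's medical records applications. The goal is to teach AI to handle some of the complex tasks that white-collar workers face. This differs from any training AI models have received before. Researchers give the AI simulated applications to practice on, and they hire experts in various fields to demonstrate how to operate these applications. None of this is cheap. According to one person familiar with the matter, Anthropic executives have discussed internally spending $1 billion over the next year to build these "enterprise app clones", also known as reinforcement learning environments or gyms. The cost of hiring human experts in fields such as biology, software engineering, and medicine to teach models new knowledge and how to use office software is also rising. Earlier this year, OpenAI projected that it would spend about $1 billion this year on data-related costs, including payments to human experts and RL gyms, and that this figure would climb to $8 billion by 2030. If these training methods succeed, they could help OpenAI and Anthropic overcome some of the limitations that traditional training techniques have recently run into ...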
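A "reinforcement learning environment" for office software can be pictured as a gym-style loop. The agent sees an observation, takes an action against a simulated app, and receives a reward when the task is done. The minimal Python sketch below shows that loop. The `ToyTicketEnv` class, its ticket data, and its action format are hypothetical illustrations. They do not model Salesforce, Zendesk, Cerner, or any environment Anthropic or OpenAI actually uses. The `step` method also returns a simplified `(observation, reward, done)` triple rather than Gymnasium's exact signature.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTicketEnv:
    """Hypothetical stand-in for a simulated support-desk app (not any real product)."""
    target_id: int = 1   # ticket the task asks the agent to close
    max_steps: int = 5   # episode ends after this many actions
    tickets: dict = field(default_factory=dict)
    steps: int = 0

    def reset(self) -> dict:
        """Start a new episode with every ticket open."""
        self.tickets = {1: "open", 2: "open"}
        self.steps = 0
        return self._observe()

    def _observe(self) -> dict:
        """What the agent 'sees': the task text plus a copy of the app state."""
        return {"task": f"Close ticket #{self.target_id}", "tickets": dict(self.tickets)}

    def step(self, action: dict) -> tuple[dict, float, bool]:
        """Apply one UI-like action; reward 1.0 only once the target ticket is closed."""
        self.steps += 1
        ticket_id = action.get("ticket_id")
        if action.get("op") == "set_status" and ticket_id in self.tickets:
            self.tickets[ticket_id] = action.get("status", "open")
        success = self.tickets[self.target_id] == "closed"
        done = success or self.steps >= self.max_steps
        return self._observe(), (1.0 if success else 0.0), done


if __name__ == "__main__":
    env = ToyTicketEnv()
    obs = env.reset()
    print(obs["task"])  # Close ticket #1
    obs, reward, done = env.step({"op": "set_status", "ticket_id": 2, "status": "closed"})
    print(reward, done)  # wrong ticket: 0.0 False
    obs, reward, done = env.step({"op": "set_status", "ticket_id": 1, "status": "closed"})
    print(reward, done)  # 1.0 True
```

A real environment would presumably expose much richer state, such as rendered screens, and many more actions. The point of the sketch is only the reset/step/reward structure.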
RobotEra Is Hiring! Roles in Embodied Multimodal AI, Reinforcement Learning, and More
Ju Shen Zhi Neng Zhi Xin· 2025-09-17 00:02
Core Viewpoint - The article outlines various job descriptions and requirements for positions related to multi-modal reinforcement learning, data processing, and embodied intelligence, emphasizing the need for advanced skills in AI and machine learning technologies [6][14][15]. Group 1: Job Descriptions - Responsibilities include research, design, and implementation of cutting-edge multi-modal reinforcement learning algorithms to address complex real-world problems [6]. - Involvement in the collection, processing, cleaning, and analysis of multi-modal data to create high-quality training datasets [14]. - Development and optimization of multi-modal models, including training, fine-tuning, and enhancing performance across different tasks [6][15]. Group 2: Job Requirements - Candidates should possess a master's degree or higher in computer science, artificial intelligence, or robotics, with at least one year of research experience in computer vision or embodied intelligence [13]. - Proficiency in programming languages such as Python and deep learning frameworks like PyTorch is essential, along with strong engineering implementation skills [13]. - Experience in publishing papers at top academic conferences (e.g., CVPR, NeurIPS) and contributions to open-source projects are preferred [13][19]. Group 3: Additional Qualifications - Familiarity with multi-modal data cleaning, labeling, and loading, as well as understanding data optimization techniques is required [14]. - Candidates should have experience with large language models and multi-modal models, including knowledge of their capabilities and applicable scenarios [14]. - High standards for data quality and attention to detail are necessary, along with proficiency in data processing tools like Pandas and NumPy [14].
Targeting the Pain Points of Extended-Range Buyers: Buick's New-Energy Luxury Sedan Zhijing L7 Makes Its National Debut
Nan Fang Du Shi Bao· 2025-09-16 11:07
Core Insights - SAIC-GM's new-energy luxury sedan, the Zhijing L7, was officially unveiled on September 15, featuring the "Zhenlong" range extender system and advanced AI technology [1][3] - The vehicle is positioned in the competitive 200,000-300,000 RMB market segment, aiming to give consumers a balanced choice between traditional fuel vehicles and electric cars [1][3] Product Features - The Zhijing L7's range extender system has a maximum power output of 252 kW, equivalent to a 3.0T V6 engine, with 0-100 km/h acceleration in 5.9 seconds and combined fuel consumption of only 0.5L per 100 km [4][6] - The vehicle offers a pure electric range of 302 km and a total range of 1420 km, addressing consumers' common range anxiety [4][6] Market Positioning - Luxury and joint-venture brands have faced significant challenges in the electric vehicle market, and the Zhijing L7 aims to fill the gap for extended-range vehicles in the sedan segment [3][4] - Extended-range vehicles are seen as a growing market segment, particularly as consumer preferences shift toward intelligent and electric solutions [6][8] Technological Advancements - The Zhijing L7 is equipped with the Momenta R6 flywheel model, which enhances its intelligent driving capabilities, including "no-stop" city navigation and automated parking [6][8] - The vehicle uses Qualcomm's latest SA8775P chip, providing high computing power for its intelligent cabin and driving systems [8][10] Strategic Vision - The company emphasizes a long-term commitment to luxury, comfort, and quietness, aiming to balance all aspects of performance rather than focusing only on standout features [10]
Buick Zhijing L7 Extended-Range Sedan Makes Its National Debut
Huan Qiu Wang· 2025-09-16 11:03
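On September 15, 2025, the Zhijing L7, a new-energy intelligent luxury sedan, made its first public appearance. It is the first flagship sedan of "Zhijing", Buick's high-end new-energy sub-brand. The Zhijing L7 uses top-tier "Zhenlong" (True Dragon) range-extender technology and is the first car to carry the "Xiaoyao Zhixing" driver-assistance system. It is also the world's first vehicle to debut the Momenta R6 flywheel model, which is based on end-to-end "reinforcement learning", and it uses Qualcomm's latest-generation SA8775P chip. In addition, the Zhijing L7 offers a luxury chassis, a luxurious comfort-oriented cabin, and equipment benchmarked against the high-end market. It has already arrived in Buick dealer showrooms nationwide, and an early-bird program has launched. Design and comfort: luxury equipment and chassis technology. The Zhijing L7 measures 5032 mm × 1952 mm × 1500 mm and has a long 3000 mm wheelbase. Its designers drew inspiration from nature to create a "starry-sky spread-wing" exterior full of flowing beauty and tension, a poised luxury fastback profile, frameless doors throughout for ultra-quiet NVH, hidden door handles, and 20-inch "starlight turbofan" wheels. The "Galaxy starry-sky spread-wing" headlights, the "star-trail floating light spread-wing" taillights, the roof-mounted lidar, and the small blue light that signals "Xiaoyao Zhixing" blend technology with elegance. The cabin follows a new "pure floating island" design aesthetic, creating a clean, elegant, split-level space with a sense of flowing momentum. The interior is wrapped in leather through 270°. The "lake-center island" overhead controls, the "stone-in-water crystal" dome light, and the "galaxy gold sand" trim strips on the door panels and dashboard convey a refined, understated Eastern sensibility and create a high-end, elegant atmosphere. The Zhijing L7 has a roomy cabin ...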
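Understanding GPT-5's Secret Weapon: The Hidden Tool That Will Shape AI's Future
36Kr· 2025-09-16 10:43
Before GPT-5 was released, The Information reported that GPT-5's performance gains came mainly from a "Universal Verifier" that OpenAI had developed. GPT-5's capability upgrades later fell short of expectations. Even so, the universal verifier has become the next "holy grail" for large models and one of the hottest recent topics in the AI community. Why does it matter so much? Mainly because the previous wave of capability gains relied on "reinforcement learning with verifiable rewards" (RLVR). Put simply, RLVR starts with problems that have standard answers, such as math and programming: a correct answer earns points, a wrong answer loses points, and the training effect is immediate. The real world, however, is far more complex than "right" and "wrong". In fields such as medicine, education, and creative work, many questions have no single answer. A "good" answer may need to be professionally reliable while also showing good communication and empathy. RLVR struggles in these settings and can even make models regress on open-ended questions. To push models further, reward design has to move beyond right/wrong signals. AI needs to judge quality across different domains the way experts do, and to turn massive amounts of unstructured experience data into effective learning signals. The universal verifier was created for exactly this purpose, and it is seen as a possible trigger for the next paradigm shift in reinforcement learning. Today, a single article explains the most important ...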
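The RLVR rule the article describes, where a right answer earns points and a wrong answer loses points, fits in a few lines of Python. This is a minimal sketch that makes three assumptions: the final answer is either wrapped in `\boxed{...}` or is the last number in the response, comparison is an exact string match, and the rewards are +1 and -1. The helper names `extract_final_answer` and `verifiable_reward` are hypothetical. A production checker would need more careful normalization, for example treating `0.5` and `1/2` as equal.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last \\boxed{...} answer, or else the last number in the text."""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", response)
    if boxed:
        return boxed[-1].strip()
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None


def verifiable_reward(response: str, reference: str) -> float:
    """+1.0 when the extracted answer exactly matches the reference, -1.0 otherwise."""
    answer = extract_final_answer(response)
    if answer is None:
        return -1.0  # no checkable answer counts as wrong
    return 1.0 if answer == reference.strip() else -1.0


if __name__ == "__main__":
    print(verifiable_reward("So the total is \\boxed{42}.", "42"))  # 1.0
    print(verifiable_reward("I think the answer is 41", "42"))      # -1.0
```

The sketch also shows the limitation the article points to. The reward needs a single reference answer, so it has nothing to compare against for an open-ended medical or creative question.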
SAIC-GM's "Zhijing L7" Makes Its Public Debut
Zhong Zheng Wang· 2025-09-16 06:13
Core Viewpoint - SAIC-GM's Buick brand has launched its flagship electric sedan, the Buick Zhijing L7, which aims to compete in the high-end electric vehicle market with advanced technology and features [1] Group 1: Product Launch - The Buick Zhijing L7 made its national debut on September 15 in Shanghai [1] - The vehicle is now available in Buick dealerships across the country and has initiated an early bird program offering lifetime free maintenance for orders placed before September 28 [1] Group 2: Technology and Features - The Zhijing L7 utilizes "True Dragon" range extension technology and is equipped with the "Xiaoyao Zhixing" driver assistance system [1] - It features the Momenta R6 flywheel model based on end-to-end "reinforcement learning" and Qualcomm's latest SA8775P chip, providing a top-tier intelligent electric experience [1] - The vehicle boasts a pure electric range of 302 km and a comprehensive range of 1420 km [1] Group 3: Market Positioning - The Zhijing L7 combines global automotive expertise with local innovation, aiming to enter the first tier of the electric vehicle market [1] - The vehicle is expected to create new opportunities for the Buick brand's development in the new era, leveraging industry-leading range extension technology and luxury experience [1]