Reinforcement Learning
From Fudan, HKU, and Other Teams: WholeBodyVLA, a VLA Framework for Whole-Body Loco-Manipulation Control
具身智能之心· 2025-12-18 00:07
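Editor: 具身智能之心. Shortcomings of existing methods: Humanoid robots need precise locomotion and dexterous manipulation skills to complete challenging loco-manipulation tasks. However, existing modular and end-to-end methods fall short on "manipulation-aware locomotion": instead of planning and executing locomotion that actively creates the preconditions for manipulation (approaching the target, adjusting posture, staying stable), they treat locomotion and manipulation as independent stages. This confines the robot to a limited workspace and makes large-range loco-manipulation tasks hard to complete. The core challenge is therefore manipulation-aware locomotion: planning and executing movement that actively creates the preconditions for manipulation (approach, orientation, stability). A naive solution is to have a high-level planner sequence locomotion and manipulation, switching between skills such as navigation and grasping. However, limited closed-loop feedback and the lack of end-to-end joint optimization can cause errors to accumulate, leaving the robot in a suboptimal state that is unfavorable for subsequent manipulation. Another promising option is an end-to-end framework that performs whole-body control directly, easing the switching problems of modular pipelines, but ...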
Breaking: OpenAI Star Researcher Yao Shunyu Named Tencent's Chief AI Scientist
36Kr· 2025-12-17 10:21
Core Insights - Yao Shunyu, a prominent AI scientist from OpenAI, has joined Tencent as the Chief AI Scientist, overseeing both the AI Infra Department and the Large Language Model Department [1][2]. Group 1: Yao Shunyu's Background and Achievements - Yao Shunyu graduated from Tsinghua University, where he was known for his exceptional academic performance, and later pursued a PhD at Princeton University [2][5]. - He has made significant contributions to AI, particularly in intelligent agents, with notable works such as "ReAct" and "Tree of Thoughts," each of which has been cited over 4,000 times [9][11]. Group 2: Tencent's Strategic Moves - Tencent is undergoing a major restructuring of its internal large model development system, which includes the establishment of new departments focused on AI infrastructure and data [2]. - These moves aim to strengthen the foundational capabilities for large models, indicating a strategic shift towards enhancing computational and data resources [2]. Group 3: Future Directions in AI - Yao Shunyu has argued that the focus of AI is shifting from problem-solving to problem-definition, and that evaluation will matter more than training [20][22]. - He believes that understanding which specific tasks AI should perform is crucial for success in the evolving AI landscape, and advocates thinking more like a product manager [23].
NeurIPS Sets Off an AI Talent War, With Annual Salaries Starting at $1 Million
日经中文网· 2025-12-17 08:00
Core Insights - The NeurIPS conference has evolved into a significant recruitment platform for AI talent, with approximately 25,000 attendees this year, highlighting the increasing demand for skilled professionals in the AI sector [2][4]. Group 1: Salary Trends - The expected first-year salary for AI professionals has reached $2 million, with starting salaries for in-demand fields like reinforcement learning set at $1 million [2][5]. - Salaries for top researchers in AI are now comparable to those of professional athletes, reflecting the intense competition among companies to attract talent [4][5]. Group 2: Recruitment Landscape - Around 150 sponsoring companies participated in the conference, all aiming to recruit exceptional talent for their AI research and development departments [4]. - Major tech companies, hedge funds, and investment firms are competing for AI talent, with firms like Citadel and DE Shaw offering competitive cash salaries, sometimes exceeding $1 million [5]. Group 3: International Participation - Chinese companies such as ByteDance and Alibaba participated in the conference, indicating a strong interest in AI talent, while Japanese companies had a less prominent presence [6]. - Many AI researchers in the U.S. are from China, as evidenced by the prevalence of Chinese language at the event [5].
We Have Recently Received Many Inquiries From Students About Choosing a Direction in Embodied AI...
具身智能之心· 2025-12-17 00:05
Group 1 - The article discusses various directions in embodied intelligence, including VLN, VLA, reinforcement learning, and real2sim2real, highlighting the confusion among newcomers regarding which path to choose [1] - For those engaged in SLAM, both VLN and VLA are recommended as good entry points, especially if they have robotic arms, while low-cost hardware options like SO-100 can be utilized for experiments [1] - The importance of having a good idea is emphasized, as many new researchers face challenges in finding innovative topics, and the article offers a paper guidance service to assist them [1][2] Group 2 - The paper guidance service is led by a team of experts from top universities and leading companies, covering a range of prestigious conferences and journals [2] - The service provides a comprehensive support process from topic selection to publication strategy, aiming to help researchers produce high-quality results quickly [2][3] - The article also mentions a promotional offer where the first ten inquiries can receive a free matching with a dedicated mentor [5]
Has PPO-Clip's "Blind Spot" Been Filled? Kuaishou Proposes Entropy Ratio Clipping, a Key Leap From Local Constraints to Global Stability
机器之心· 2025-12-16 10:22
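This research was carried out by the large language model team at Kuaishou Technology; the core authors include Su Zhenpeng and Pan Leiyu, among others. The team focuses on foundation language model development and on frontier work such as Agent RL, pragmatically explores the capability boundaries of AGI, and continues to advance new AI technologies and products. The team has previously open-sourced models including Klear-46B-A2.5B and Klear-Reasoner-8B; Klear-Reasoner-8B reached SOTA results among models of the same parameter scale on math and code benchmarks. In the post-training stage of large language models, reinforcement learning has become the core paradigm for improving model capability and alignment quality. However, in the widely used off-policy training paradigm, the data used to update the current policy is generated by an older behavior policy. This causes distribution drift, which often pushes the policy outside the trust region and makes reinforcement learning training unstable. PPO's clipping of the importance sampling ratio mitigates part of the problem, but it only constrains the probability change of sampled actions and ignores the global distribution drift over unsampled actions. To address these challenges, the Kuaishou research team proposes an entropy ratio clipping method. It takes a new angle: by constraining the relative change of policy entropy, it stabilizes the global distribution and gives reinforcement learning training a more reliable means of control. Research background: reinforcement learning training has long faced ...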
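To make the "blind spot" concrete, here is a minimal pure-Python sketch, not the Kuaishou team's implementation. It assumes the entropy ratio is the new policy's entropy divided by the old policy's entropy at each step, and that a step is dropped from the objective when that ratio leaves a band around 1. The summary above only says the method constrains the relative change of policy entropy, so the exact rule, the function names, and the `beta_low` / `beta_high` parameters are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO-Clip surrogate for one sampled action."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def entropy_ratio_clipped_objective(old_probs, new_probs, actions, advantages,
                                    eps=0.2, beta_low=0.05, beta_high=0.05):
    """PPO-Clip objective with an ASSUMED entropy-ratio mask.

    A step contributes only if H(new) / H(old) lies inside
    [1 - beta_low, 1 + beta_high]. This masking rule is an illustration,
    not the formulation from the paper.
    """
    total, kept = 0.0, 0
    for p_old, p_new, action, adv in zip(old_probs, new_probs, actions, advantages):
        # Importance ratio: only sees the sampled action.
        ratio = p_new[action] / p_old[action]
        # Entropy ratio: sees the whole distribution, including unsampled actions.
        h_old, h_new = entropy(p_old), entropy(p_new)
        entropy_ratio = h_new / h_old if h_old > 0.0 else 1.0
        if 1.0 - beta_low <= entropy_ratio <= 1.0 + beta_high:
            total += ppo_clip_term(ratio, adv, eps)
            kept += 1
    return total / len(actions), kept


# Two steps with the same sampled action (index 0) and advantage.
# In step 2 the sampled action keeps probability 0.5, so its importance
# ratio is exactly 1, yet the mass on the unsampled actions shifts sharply.
OLD = [[0.5, 0.25, 0.25], [0.5, 0.25, 0.25]]
NEW = [[0.5, 0.25, 0.25], [0.5, 0.49, 0.01]]
objective, kept = entropy_ratio_clipped_objective(OLD, NEW, [0, 0], [1.0, 1.0])
print(objective, kept)
```

In the second step the importance ratio is 1, so PPO-Clip alone would accept the update unchanged, while the entropy ratio falls well below the assumed band and the step is masked out. A real implementation would work on batched token logits in a deep learning framework rather than Python lists.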
Xu Huazhe: Hurrying Up While Slowly Waiting for the Future of Embodied AI...
具身智能之心· 2025-12-16 00:02
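Author: Xu Huazhe. Editor: 具身智能之心. This article is published with Dr. Xu Huazhe's authorization and may not be reposted without permission. Yesterday I saw a post by Xu Huazhe on social media about data, mass production, robot hardware, and application scenarios. In a similar vein, at this year's IROS roundtable, Dr. Xu argued from first principles of intelligence and divided the future direction of embodied AI into three modules: desire, priors, and experience. Desire. When building agents, whether physical or virtual, one always feels that today's machine learning has no desire of its own to learn. Imagine: could we give a robot a desire of its own? Experience. Experience is a means of finally closing the loop with the world. One day at home I watched a repairman fixing our gas stove. He was standing on a ladder tightening something, his whole body in an extremely contorted posture, yet he controlled his center of gravity perfectly and kept his balance while his hands carried out very fine operations. This way of thinking also runs through the later R&D and academic exploration. A few years ago we were still discussing when robots would be able to walk on all terrain; later the topic became "parkour," "dancing," and "basketball." That rate of change tells me this part is already solved, and I would not be surprised if robots could rock climb next year. Yet this extremely fast rate of change also feels strangely out of step, because I have not seen humanoid robots truly serving people anywhere ...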
Hinton and I Invented Complex Neural Networks Together, but They Now Need an Upgrade
36Kr· 2025-12-14 23:26
Group 1 - The core idea of the article revolves around the evolution of AI, particularly the contributions of Terrence Sejnowski and Geoffrey Hinton, highlighting the significance of the Boltzmann machine in modern deep learning [1][19] - Sejnowski emphasizes that while AI technology has advanced rapidly, a true understanding of intelligence may require generations of research and patience [6][22] - The conversation touches on the limitations of current AI models, such as ChatGPT, which lack essential components of human cognition, including memory and self-generated thought processes [3][21][38] Group 2 - Sejnowski argues that the current AI models primarily simulate a small part of brain function, specifically the cerebral cortex, and miss out on critical structures like the basal ganglia and hippocampus [4][26][40] - The discussion highlights the need for AI to integrate both cognitive and reinforcement learning, akin to human development, to achieve a more holistic understanding of intelligence [27][28] - The article suggests that understanding the mechanisms of intelligence in various species could lead to a more comprehensive theory of knowledge and understanding, rather than solely focusing on replicating human brain functions [51][52]
自动驾驶之心 Is Recruiting Business Partners!
自动驾驶之心· 2025-12-14 02:03
Core Viewpoint - The article emphasizes the need for collaboration and innovation in the autonomous driving industry, highlighting the importance of engaging more talented individuals to address the challenges and pain points in the sector [2]. Group 1: Industry Direction - The main focus areas in the autonomous driving field include but are not limited to: product management, 4D annotation/data loop, world models, VLA, large models for autonomous driving, reinforcement learning, and end-to-end solutions [4]. Group 2: Job Description - The positions are primarily aimed at training collaborations in autonomous driving, targeting both B-end (enterprises, universities, research institutes) and C-end (students, job seekers) audiences for course development and original content creation [5]. Group 3: Contact Information - For discussions regarding compensation and collaboration methods, interested parties are encouraged to add the WeChat contact provided for further communication [6].
Autonomous Driving Companies Still Alive in 2025...
自动驾驶之心· 2025-12-14 02:03
Group 1: Industry Overview - The penetration rate of L2 autonomous driving is rapidly increasing, while L3 is on the verge of implementation and L4 is breaking through in scale [2] - The autonomous driving industry is undergoing a new round of reshuffling and resource integration, with some companies exiting the market, others merging or acquiring, and new players emerging [2] Group 2: New Forces in Autonomous Driving - Key new players in the autonomous driving sector include NIO, Xpeng, Li Auto, Xiaomi, Leapmotor, Didi, WM Motor, Niu Chuang, Zeekr, Avatr, Voyah, Qianli Technology, and Jiyue [4] Group 3: Tier 1 Suppliers - Major Tier 1 suppliers in the industry consist of Huawei, Baidu, DJI, ZTE, Tencent (smart cockpit/high-precision maps/simulation toolchain), SAIC Lingxu, Jianzhi Robotics, Momenta, Bosch China, Magna, and Youjia Innovation Minieye [6] Group 4: Robotaxi Companies - Companies involved in the Robotaxi segment include Baidu, Pony.ai, Shanghai Zhaofu Intelligent Technology (Hello Robotaxi), WeRide, Didi, Momenta, Qizhou Zhihang, and Yushi Technology [8] Group 5: Robotruck Companies - Key players in the Robotruck sector are Carl Power, Zhijia Technology, Winche Technology, Pony.ai, Mainline Technology, Sien Intelligent Driving, Xijing Technology, Feibu Technology, MuYue Technology (WeRide), Zitu Technology, Changxing Intelligent, Huanyu Zhixing, Xidi Intelligent Driving, Qianhua, Xingxing, Youdao Zhitu, Karui Zhixing, Qianchen, Weidu, Geely Remote, Hengrun, Hongjing, Xidi, and Qingtian Zhika [10] Group 6: Other Autonomous Driving Applications - Companies involved in various applications of autonomous driving include Meituan, Jiushi Intelligent, JD.com, Suning, Alibaba Cainiao, China Post, Baidu Apollo, VIA Technologies, Baixiniu, Zhixingzhe, Yushi Technology, Xingshen Intelligent, Jiazhi Technology, and Xiaoshi Technology [12] - Traditional automakers in the industry include SAIC, Changan, GAC (Aion), BAIC (ARCFOX), FAW, Great Wall, BYD, Geely (Freetech), Dongfeng, Chery, and Geely (Zeekr) [14] - Companies focusing on agricultural autonomous driving include Fengjiang Intelligent, Zoomlion, China Yituo, Wuniu Intelligent, Zhongke Yuandong, Lovol Heavy Industry, Chaoxing Intelligent, Bochuang Liandong, and Haoxing Technology [16] - Companies in the mining autonomous driving sector include Yikong Zhijia, Taga Zhixing, Huituo Intelligent, Lukai Zhixing, Bolai Technology, Mengshi Technology, and Qingzhi Technology [18] - Companies in the sanitation autonomous driving sector include Zhixingzhe, Kuwa, Xiantou, Gaoxian Robotics, Shenlan Technology, Haorui Intelligent, Yuwan Zhijia, and Yunchuang Zhixing [20] - Companies involved in parking solutions include Baidu, Zhuishi, Desay SV, Neusoft Reach, Hedu Technology, Niuli Technology, Hengrun Technology, Lingshi Technology, Moshih Intelligent, Oteming, Zhixingzhe, and Yushi Technology [22] Group 7: High-Precision Mapping - Major players in high-precision mapping include Baidu, Amap, NavInfo, Tencent, Huawei, Didi, JD.com, Meituan, Kuandeng, Shendong, Zhonghaiting, and ECARX [24] Group 8: Vehicle-to-Everything (V2X) Collaboration - Companies involved in vehicle-to-everything collaboration include Mogo Auto, Juefei Technology, Baidu, Huawei, Datang High-Tech, Huali Zhixing, Alibaba, Hikvision, Xingyun Interconnect, and Yunjing Zhixing [24]
Academy of Military Medical Sciences Paper Makes the Cell Press Headlines
生物世界· 2025-12-13 10:00
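Written by Wang Cong; edited by Wang Duoyu; layout by Shui Chengwen. A new study from the Academy of Military Medical Sciences recently made the Cell Press headlines. The paper, titled "Computational modeling reveals cognitive processes in simple rodent depression tests", was published online on December 2, 2025 in Cell Reports Methods, a Cell Press journal. Li Zhihan of the Academy of Military Medical Sciences is the first author and co-corresponding author, and Li Yunfeng is the corresponding author. By combining automated behavioral tracking with computational modeling, the study systematically reveals, for the first time, the complex cognitive processes hidden in simple depression behavior tests, offering a new perspective on the cognitive mechanisms of depression-like behavior. Overall, the study also underscores the importance of analyzing complete behavioral trajectories. These findings challenge the conventional understanding of depression behavior tests and provide an important theoretical basis for developing more precise animal behavior analysis methods and antidepressant treatment strategies. Paper link: https://www.cell.com/cell-reports-metho ...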