VLA Models
Viral Again: Musk Is Full of Praise
Gelonghui APP · 2025-12-22 11:12
Author | Freddie (弗雷迪); Data support | Gogu Data (www.gogudata.com)

Another Elon Musk story landed over the weekend: his social-media praise for Wang Leehom's robot backup dancers sent the topic up the trending charts. On December 18, Wang Leehom's Chengdu concert featured humanoid robots as backup dancers for the first time, performing "火力全开" ("Fire Power") together. Video of six Unitree humanoid robots landing the high-difficulty "Webster" flip drew attention at home and abroad, and even Tesla CEO Elon Musk reposted the clip with the comment "Impressive."

Come Monday, the A-share humanoid-robot sector continued its rebound and Hong Kong-listed China concept names also spiked; the index tracked by the Robot ETF (159770) rose 1.47%. The data show that, even at year-end, capital remains closely focused on this track. With 2026 approaching, where do the opportunities in humanoid robots lie? And is money already buying the dip?

On Monday the market was broadly stronger in the morning session, with the Shanghai Composite up 0.69% and the ChiNext Index up 2.23%. By sector, precious metals, components, communication equipment, and humanoid-robot themes were active, while pharmaceutical distribution and cinema stocks pulled back.

| # | Sector | Change |
| --- | --- | --- |
| 1 | Precious metals | +4.18% |
| 2 | Components | +3.17% |
| 3 | Motors | +2.98% |
| 4 | Communication equipment | ... |
Surpassing π0.5: MiVLA Cracks VLA Generalization and Data Bottlenecks via Human-Robot Mutual Imitation Pretraining
具身智能之心· 2025-12-22 01:22
Author | Zhenhan Yin et al.; Editor | 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

In the field of robot vision-language-action (VLA) models, data scarcity and weak generalization remain the two core pain points: real-robot data is expensive to collect and covers limited scenarios, simulated data suffers from the sim-to-real gap, and human data faces the embodiment (morphology) mismatch, so existing approaches struggle to deliver both data scale and transfer performance. MiVLA, proposed jointly by teams from Tongji University, the University of Electronic Science and Technology of China, and others, takes "human-robot mutual imitation pretraining" as its core innovation. It is the first to achieve generalization surpassing existing state-of-the-art models without any real-robot data, training only on a fusion of simulated robot data and human video data, offering a low-cost, highly scalable new path for generalist robot policy learning.

Why does the VLA pretraining paradigm need rethinking? Current VLA training faces a double bind: on the one hand, approaches that depend on real-robot data are constrained by the data bottleneck; on the other, approaches that rely on simulated data or human data alone are constrained by the modality gap ...
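The "human-robot mutual imitation" idea described here, fusing simulated robot trajectories and human video data in one shared representation, can be sketched as a toy alignment objective. This is a minimal illustration, not the MiVLA implementation; the encoder shapes, the pairing of clips, and the plain L2 alignment loss are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoders projecting each modality into a shared 8-d latent space.
W_human = rng.normal(size=(16, 8))  # human-video features -> latent
W_robot = rng.normal(size=(12, 8))  # sim-robot states     -> latent

def encode_human(x):
    return x @ W_human

def encode_robot(x):
    return x @ W_robot

def mutual_imitation_loss(human_batch, robot_batch):
    """Pull each branch toward the other's latent, so human video and
    simulated robot data share one action representation."""
    z_human = encode_human(human_batch)
    z_robot = encode_robot(robot_batch)
    # Symmetric L2 alignment between paired human/robot clips.
    return float(np.mean((z_human - z_robot) ** 2))

human_batch = rng.normal(size=(4, 16))  # 4 paired human video clips
robot_batch = rng.normal(size=(4, 12))  # 4 paired sim robot trajectories
loss = mutual_imitation_loss(human_batch, robot_batch)
print(loss > 0.0)
```

In a real system the encoders would be deep networks trained jointly with an action head; the point of the sketch is only that both data sources contribute gradients to one shared latent space.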
Wang Xiaogang and His "World Model": One Person Managing Ten Robot Dogs, Putting Quadrupeds to Work on the Streets First | 36Kr Interview
36氪· 2025-12-19 10:31
Core Viewpoint
- The article discusses the emergence of world models in AI, highlighting their significance in overcoming the limitations of previous VLA models and their potential applications in robotics and autonomous systems [4][9][22].

Group 1: World Model Development
- The world model is a concept that addresses the inherent limitations of VLA models, which struggle to understand physical laws and require vast amounts of data for training [9][28].
- The introduction of the "Awakening" world model 3.0 allows robots to learn physical interactions and adapt to new environments, significantly reducing the dependency on specific scene data [8][10].
- The world model enables robots to transition from rote learning to understanding general principles, enhancing their ability to perform tasks across various scenarios [10][28].

Group 2: Practical Applications
- The "Daxiao Robot" is utilizing the world model to deploy robotic dogs for urban management tasks, such as monitoring illegal parking and drone activity [6][7][12].
- The company plans to validate the world model's capabilities through real-world applications, starting with robotic dogs and expanding to more complex robotic forms in the future [16][56].
- The integration of the world model into robotic systems aims to create a closed-loop feedback mechanism, allowing for continuous improvement based on real-world performance [14][15][16].

Group 3: Commercial Strategy
- The company intends to focus on B2B applications initially, targeting sectors like smart cities and urban management where autonomous capabilities are in high demand [58].
- Future plans include expanding into logistics and home environments, leveraging existing resources and partnerships to reduce entry costs [17][56].
- The strategy emphasizes collaboration with existing platform providers while also developing proprietary solutions to enhance product reliability and performance [50][52].
Future Manufacturing Bureau | When AI Enters the Physical World: The "Can" and "Cannot" of Embodied AI, as Seen at a Skills Competition
Xin Hua Cai Jing· 2025-12-17 16:53
Xinhua Finance, Shanghai, December 17 (reporters Du Kang, Gong Wen). At the recently held 2025 Global Developer Pioneer Conference, robots showed off their skills in real scenarios such as flower arranging, material handling, and disaster rescue, turning cold technical specifications into a lively skills contest. The competition also exposed embodied AI's "clumsy" side: behind fine manipulation tasks like folding clothes and driving screws, many robots were still tethered to teleoperation controllers. It is precisely in this gap between "can" and "cannot" that the public glimpses this hot field's technical boundaries and future direction.

Reading technical progress from what robots "can" do: looking back over the past year, China's embodied-AI field has moved at a brisk pace. AgiBot's Yuanzheng A2 humanoid completed an uninterrupted 100-kilometer cross-province walk, amply proving that robots can "walk steadily"; large commercial orders have appeared one after another, with robots genuinely entering factories to handle sorting and machine loading and unloading; and the evolution of VLA (vision-language-action) models has made robot "brains" smarter, able to understand human needs. At the 2025 Global Developer Pioneer Conference, audiences once again saw first-hand what robots "can" do.

Trickier still is environmental interference. "Lighting changes, the placement of objects around the table, reflections of nearby objects on the tabletop under strong light: any of these can make a robot's 'IQ go offline' and its manipulation go wrong. This difficulty in separating the target from 'background noise' reflects a current shortfall in embodied AI's physical-scene understanding, namely insufficient generalization," a competitor told reporters. Fine tasks such as driving screws ...
2025 Commercial Embodied AI White Paper
iResearch · 2025-12-14 00:04
Commercial Embodied AI | White Paper

Core summary: Embodied AI has well and truly caught fire. Overseas, FIGURE has zero revenue yet its latest valuation has reached US$39 billion. At home, the 15th Five-Year Plan explicitly lists embodied AI in its key-industry layout, leading domestic manufacturers are landing industrial and commercial orders one after another, and Unitree claims 2025 revenue exceeding RMB 1 billion. The embodied-AI market is no longer either China singing to itself or a castle in the air: the contours of a trillion-yuan market have opened, and the US-China contest is worth watching.

Definitions: the global understanding of embodied AI. Converging academic and industry views to establish a unified evaluation baseline: embodied AI is an important direction for AI and is widely regarded as a key path toward artificial general intelligence. Its core feature is that the agent, relying on a physical body, interacts strongly with its environment and learns continuously through a closed loop of perception, understanding, decision, and action, thereby exhibiting autonomy, generalization, and adaptability. Experts worldwide broadly stress that embodied AI is not only the synthesis of machine learning, computer vision, and robotics, but also an important marker of AI becoming deployable and practical.

Classifying commercial scenarios: technology supports diverse forms that cross domains to meet diverse needs. Different forms of embodied robots are evolving in parallel to serve retail, dining, factory, logistics, education, and medical scenarios. Commercial embodied AI serves complex, dynamic environments such as retail, dining, healthcare, and security; it relies more on multimodal perception, human-robot interaction, and generalization, and aims to improve the service experience and operational flexibility. Industrial embodied AI is mainly aimed at ...
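The perception, understanding, decision, action closed loop described in this definition can be sketched as a minimal agent loop. This is purely illustrative; the function names and the toy distance-based environment are assumptions, not anything from the white paper.

```python
def perceive(env_state):             # perception: read raw signals
    return {"distance": env_state}

def understand(obs):                 # understanding: interpret the scene
    return "obstacle_near" if obs["distance"] < 1.0 else "clear"

def decide(situation):               # decision: choose a behavior
    return "stop" if situation == "obstacle_near" else "advance"

def act(action, env_state):          # action: change the environment
    return env_state if action == "stop" else env_state - 0.4

# Run the closed loop: the agent advances until the obstacle is near, then halts.
state, trace = 2.0, []
for _ in range(5):
    action = decide(understand(perceive(state)))
    trace.append(action)
    state = act(action, state)
print(trace)  # ['advance', 'advance', 'advance', 'stop', 'stop']
```

The "continuous learning" part of the definition would correspond to updating `understand` or `decide` from the outcomes of each loop iteration, which this sketch omits.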
何小鹏立“赌约”:明年8月底前达到特斯拉FSD效果
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:46
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model in the next quarter, with significant pressure on its first version [1]
- A bet was placed by Xiaopeng's chairman with the autonomous driving team, aiming to match Tesla's FSD V14.2 performance by August 30, 2026, or face a challenge [1]

Group 1: VLA Model and Industry Perspectives
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception (V), action execution (A), and a language model (L) to enhance decision-making and environmental understanding [5][11]
- The industry has shifted from relying on LiDAR and high-precision maps to adopting AI-driven models like VLA, with a notable divergence in development paths emerging by 2025 [4][11]
- Li Auto's VP emphasized the importance of real-world data over model architecture, asserting that VLA is the best solution due to their extensive data collection from millions of vehicles [6][8]

Group 2: Diverging Technical Approaches
- Huawei's approach focuses on the World Action (WA) model, which bypasses the language processing step, aiming for direct control through visual inputs [8][10]
- The World Model concept allows AI systems to simulate the physical world, enhancing predictive capabilities and decision-making in autonomous driving [9][11]
- Companies like NIO and SenseTime are also exploring the World Model approach, indicating a broader industry trend [10]

Group 3: Future Integration and Evolution
- There is a growing trend towards integrating VLA and World Models, with both technologies not being mutually exclusive but rather complementary [11][12]
- Xiaopeng's second-generation VLA model aims to combine VLA and World Model functionalities, enhancing data training and decision-making processes [14][15]
- The automotive industry anticipates further iterations in autonomous driving technology architecture over the next few years, potentially stabilizing by 2028 [15]
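The V/L/A decomposition described above, visual perception feeding a language-level reasoning step that is then decoded into control, can be sketched as a three-stage pipeline. This is a toy illustration of the decomposition only; every name and the keyword-matching "reasoning" stand in for real networks and are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    camera: List[float]  # stand-in for an image embedding
    instruction: str     # natural-language command

def vision_encode(camera):             # V: perceive the scene
    return sum(camera) / len(camera)   # toy scalar "scene feature"

def language_reason(instruction, scene_feature):  # L: reason over scene + text
    # A real VLA runs a language model here; we just branch on keywords.
    if "stop" in instruction:
        return "brake"
    return "cruise" if scene_feature < 0.5 else "slow_down"

def action_decode(decision):           # A: map the decision to control output
    return {"brake": -1.0, "slow_down": -0.3, "cruise": 0.2}[decision]

obs = Observation(camera=[0.9, 0.8, 0.7], instruction="keep lane")
accel = action_decode(language_reason(obs.instruction, vision_encode(obs.camera)))
print(accel)  # dense scene (feature 0.8) -> "slow_down" -> -0.3
```

The contrast with Huawei's WA approach described below is that WA would wire `vision_encode` straight into `action_decode`, skipping the intermediate language step.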
He Xiaopeng Makes a "Bet": Match Tesla FSD Performance by the End of Next August! Li Auto Executive Responds to Unitree Founder Wang Xingxing's Doubts: Is the VLA Bet by Multiple Automakers Reliable?
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:31
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model in the next quarter, with significant pressure on its development as it is the first version [1]

Group 1: VLA Model Development
- Xiaopeng's chairman, He Xiaopeng, has made a special bet with the autonomous driving team, promising to establish a Chinese-style cafeteria in Silicon Valley if the VLA system matches Tesla's FSD V14.2 performance by August 30, 2026 [3]
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception, action execution, and language processing to enhance decision-making capabilities [7][12]
- The VLA model aims to overcome traditional model limitations by incorporating a reasoning chain through language models, enhancing its adaptability to complex driving environments [7][12]

Group 2: Industry Perspectives
- There is a divergence in the industry regarding the development paths of VLA and world models, with companies like Li Auto and Xiaopeng favoring the VLA approach [6][12]
- Li Auto's VP, Lang Xianpeng, emphasizes the importance of real-world data in developing effective autonomous driving systems, arguing that the VLA model is superior due to its data-driven approach [8][9]
- Huawei and other companies are pursuing a world model approach, which focuses on direct control through visual inputs without the intermediary language processing [9][10][11]

Group 3: Future Integration and Trends
- Despite differing opinions, VLA and world models are not mutually exclusive and may increasingly integrate as both technologies evolve [12][17]
- The future of autonomous driving technology is expected to see further iterations and stabilization by 2028, with a potential convergence of VLA and world model methodologies [17]
With the SO-100, So Many VLA Projects Completed Hands-On...
具身智能之心· 2025-12-13 01:02
Core Viewpoint
- The article discusses the challenges and complexities faced by beginners in implementing VLA (Vision-Language-Action) models, emphasizing the need for practical experience and effective training methods to achieve successful deployment in real-world applications [2][4].

Group 1: Challenges in VLA Implementation
- Many students report difficulties in achieving effective results with open-source models like GR00T and PI0, despite low training loss in simulations [2][4].
- The transition from simulation to real-world application (sim2real) poses significant challenges, particularly in data collection and model training [6][7].
- Beginners often struggle with the intricacies of VLA models, leading to prolonged periods of trial and error without achieving satisfactory outcomes [4][6].

Group 2: VLA Model Components
- Data collection methods for VLA primarily include imitation learning and reinforcement learning, with a focus on high-quality data acquisition [6].
- Training VLA models typically requires extensive simulation debugging, especially when real-world data is insufficient, utilizing frameworks like Mujoco and Isaac Gym [7].
- Post-training, models often require optimization techniques such as quantization and distillation to reduce parameter size while maintaining performance [9].

Group 3: Educational Initiatives
- The article introduces a practical course aimed at addressing the learning curve associated with VLA technologies, developed in collaboration with industry experts [10][12].
- The course covers a comprehensive range of topics, including hardware, data collection, VLA algorithms, and real-world experiments, designed to enhance practical skills [12][25].
- The course is targeted at individuals seeking to enter or advance in the field of embodied intelligence, with prerequisites including a foundational knowledge of Python and PyTorch [22].
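The post-training quantization step mentioned in Group 2 (shrinking a trained policy's parameters while keeping performance) can be sketched with symmetric per-tensor int8 quantization. This is a generic textbook scheme, not the toolkit any particular course or model uses; the weight matrix is random toy data.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated as scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)

# Reconstruction error is bounded by half a quantization step (0.5 * scale),
# while storage drops from 4 bytes to 1 byte per weight.
err = float(np.abs(dequantize(q, s) - w).max())
print(q.dtype, err < s)
```

Real deployments usually quantize per-channel and calibrate activations as well; the per-tensor version above is only the simplest instance of the idea.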
The Global RL+VLA Paradigm: This Company's Technical Groundwork Lies Behind π*0.6
具身智能之心· 2025-12-13 01:02
This article originally appeared on 具身纪元 (author: 具身纪元, "witnessing the embodied wave, writing a new era of intelligence"); Editor | 机器之心.

In Physical Intelligence's latest π*0.6 paper, the team describes where π*0.6's iterative reinforcement-learning approach came from: it draws on familiar research by Yuke Zhu, on some of the team's own work (Chelsea Finn, Sergey Levine) that we have been tracking and covering, and also on work from domestic embodied-AI teams, such as research from Tsinghua University and Robot Era (星动纪元).

With the release of π*0.6, VLA plus online RL has become an industry-consensus, highly promising research direction (see our earlier pieces: a deep dive into the π*0.6 paper showing it goes beyond real-world RL, and NVIDIA's method for real-world self-improvement of VLA). The path large language models took from SFT to RL is likewise becoming clear in embodied research.

1. Why VLA+RL matters

Figure caption: VLA models rely on supervised fine-tuning.

In embodied intelligence (Embodi ...
The Global RL+VLA Paradigm: This Chinese Company's Technical Groundwork Lies Behind π*0.6
机器之心· 2025-12-12 03:41
Core Insights
- The article discusses the significance of integrating Vision-Language-Action (VLA) models with Reinforcement Learning (RL) in the field of Embodied AI, emphasizing the limitations of imitation learning and the necessity for robust learning methods [1][2][4].

Group 1: Importance of VLA+RL
- VLA models are being developed to apply powerful Vision-Language Models (VLM) in the control of robots, primarily through supervised fine-tuning (SFT) [2].
- Imitation learning alone is insufficient for robots to handle novel situations, necessitating the use of RL to enhance robustness and persistence in task execution [4].

Group 2: Challenges in Applying RL to VLA
- The integration of RL with VLA faces three main challenges: environmental differences, model instability, and computational demands [6].
- Direct application of RL algorithms to large VLA models can lead to catastrophic forgetting and training collapse, making it difficult to maintain performance [6].

Group 3: Solutions to VLA's RL Challenges
- The industry has proposed three types of solutions to address the challenges faced by VLA in RL applications, with a focus on internalizing high-value behaviors through SFT [7][13].
- The iRe-VLA model introduces a two-phase iterative learning process that alternates between online RL for exploration and supervised learning for consolidation [10][15].

Group 4: iRe-VLA Model Architecture
- The iRe-VLA model consists of a VLM backbone for understanding images and instructions, and an Action Head for translating features into control signals [11].
- The use of Low-Rank Adaptation (LoRA) technology allows for efficient training without the need for full model fine-tuning [12].

Group 5: Experimental Results and Analysis
- Extensive experiments in both simulated environments and real-world scenarios demonstrate the effectiveness of the iRe-VLA method, showing significant improvements in task success rates [26][30].
- The iRe-VLA model outperformed traditional methods, achieving a success rate increase from 43% to 83% in benchmark tasks [30].

Group 6: Conclusion and Future Implications
- The article concludes that the iRe-VLA approach provides a viable solution to the challenges of deploying large models in robotic control, ensuring stability and continuous learning [37][42].
- Future research directions include efficient exploration and learning of new skills under sparse rewards, as well as developing scalable RL algorithms for large VLA models [40].
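The two-phase alternation described for iRe-VLA (online RL to explore, then supervised learning to consolidate the successes) can be sketched with a one-dimensional toy policy. This is a schematic of the alternation only, not the iRe-VLA implementation: the Gaussian policy, the reward threshold, and the averaging-based "supervised" update are all assumptions.

```python
import random

random.seed(0)

def rl_explore(policy, reward_fn, episodes=200):
    """Phase 1: online RL exploration. Sample actions around the current
    policy and keep only the high-reward rollouts."""
    return [a for a in (random.gauss(policy["mean"], 2.0) for _ in range(episodes))
            if reward_fn(a) > 0.0]

def supervised_consolidate(policy, successes, lr=0.5):
    """Phase 2: supervised consolidation. Fit the policy to its own
    successful actions (the real method also replays the expert demos,
    which is what keeps the VLM backbone from drifting)."""
    if successes:
        policy["mean"] += lr * (sum(successes) / len(successes) - policy["mean"])
    return policy

# Toy task: the correct action is 2.0; reward 1 within +/-1 of it.
reward_fn = lambda a: 1.0 if abs(a - 2.0) < 1.0 else 0.0
policy = {"mean": 0.0}
for _ in range(5):  # alternate the two phases, as in the iterative scheme
    policy = supervised_consolidate(policy, rl_explore(policy, reward_fn))
print(round(policy["mean"], 2))
```

Separating exploration from the gradient step mirrors the stability argument in the article: the RL phase only gathers data, so the (here trivial) supervised phase never sees destabilizing off-policy gradients.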