VLA Models

Wang Xingxing on Models
36Kr · 2025-08-12 08:05
Core Insights
- The founder of Yushu Technology, Wang Xingxing, challenges the perception that the company focuses solely on robot hardware, emphasizing the importance of models, algorithms, and data in robotics [1][2]
- Wang expresses skepticism toward the current VLA (Vision-Language-Action) approach, arguing that existing data quality and quantity are insufficient for effective real-world interaction [1][2]
- Yushu is exploring video-driven models for robotics, which Wang believes may develop faster and have a higher probability of convergence than the VLA approach [3]

Group 1: Model and Algorithm Focus
- Yushu's model team is large relative to the company's size, but still smaller than those of major AI companies, indicating a cautious yet significant investment in model development [2]
- Wang believes that headcount in model development does not directly correlate with the quality of outcomes, suggesting that smaller teams can also innovate effectively [2]
- The company is not dismissing the VLA model entirely but is cautious about over-relying on data accumulation for training [2]

Group 2: Robotics Application and Future Vision
- Current public perception may suggest that Yushu's robots are primarily for entertainment, but internally the focus is on developing robots capable of practical tasks [5][6]
- Wang argues that practical deployment of robots in factories and homes is currently unrealistic, and that performance demonstrations are more feasible for now [6]
- The vision for future robotics centers on multifunctional capabilities rather than single-task operation, with a potential timeline of 2-5 years for a "ChatGPT moment" in robotics [7][8]

Group 3: Computational Needs
- Wang anticipates the need for low-cost, large-scale, distributed computing clusters in robotics to address computational challenges [4]
- He suggests that factories running multiple robots could benefit from on-site distributed server clusters that reduce communication latency [4]
Yushu Technology's Wang Xingxing: Advancing a Compliant, Steady IPO Process; VLA Is a Relatively Simplistic Architecture
Robot猎场备忘录· 2025-08-12 00:03
Core Viewpoints
- The humanoid robot industry is at a stage where the technology is not yet mature enough for large-scale, complex tasks, but annual shipments of humanoid robots are expected to double, with potential breakthroughs leading to significant increases in output in the next 2-3 years [4][5][6]
- Competition in the humanoid robot sector extends beyond products and markets to include founder interviews and public speaking engagements [4]
- The race to complete an IPO is critical for companies like Yushu Technology and Zhiyuan Robotics, as being first to go public can provide substantial funding support [5][6]

Industry Insights
- Hardware for humanoid robots is currently adequate but requires further improvement for larger scale, lower cost, and higher reliability [7]
- The biggest challenge in the humanoid robot sector is the AI model rather than data, and better model architectures are needed to enhance performance [7]
- The commercial viability of humanoid robots is questioned, as many companies focus on entertainment rather than practical applications [10][11]

Company Strategies
- Yushu Technology focuses on educational and research applications, while Zhiyuan Robotics and others emphasize strong AI capabilities [10][11]
- Yushu's commercial logic involves leveraging impressive robotic performances and low pricing to secure orders quickly, but sustainability remains a concern [10][15]
- Software-focused companies often announce high revenue figures but lack transparency regarding order numbers and actual product deliveries [11]

Market Dynamics
- The humanoid robot market is divided between "hardware-focused" companies like Yushu Technology and "software-focused" companies like Zhiyuan Robotics, leading to different commercialization strategies [10][12]
- Many humanoid robot startups are struggling with effective commercialization and face challenges in scaling production and real-world application [12][15]
- The industry is shifting toward self-developed foundational models, with leading startups like Figure AI taking the lead [13]
One Suite for All of VLA R&D! Tencent-Backed Humanoid Robot Startup Lands Another Major Technical Breakthrough, Pushing Open the Door to General-Purpose Robots!
Robot猎场备忘录· 2025-08-08 09:33
Core Viewpoint
- The article highlights significant technological advances by the humanoid robot startup Stardust Intelligence, particularly its self-developed AI system DuoCore and the launch of the first full-body mobile manipulation model, DuoCore-WB, which enhances the practical application of humanoid robots in real-world scenarios [2][3]

Group 1: Technological Breakthroughs
- Stardust Intelligence's DuoCore system has received a major update, giving robots a dual intelligence mode that combines instinctive responses with deep thinking and enables intelligent planning and operation in complex environments [3]
- The DuoCore system employs a highly anthropomorphic knowledge transfer mechanism, improving learning efficiency and allowing skills to transfer across scenarios without starting from scratch [4]
- The DuoCore-WB model uses a simplified imitation learning framework, allowing robots to learn complex tasks from a small number of high-quality demonstrations and achieving an average task success rate of 80% on challenging household tasks [16][24]

Group 2: Product Overview
- The Astribot Suite is a comprehensive robot learning kit comprising a high-performance robot platform (Astribot S1), an intuitive teleoperation scheme, and an efficient whole-body manipulation policy [8]
- The Astribot S1 robot is designed for general tasks and features a unique rope-driven design that mimics human muscle tissue, allowing flexible and precise movements [11]
- The S1 offers 7 degrees of freedom per arm, a maximum speed exceeding 10 m/s, and a load capacity of 10 kg, exceeding typical adult male capabilities [13]

Group 3: Market Position and Future Prospects
- Stardust Intelligence aims to become a leading provider of AI robot assistants, with a vision of enabling billions of people to have AI robot assistants and a focus on human-machine coexistence and collaboration [25]
- The company has completed five rounds of financing, with the latest round raising several hundred million yuan, indicating strong investor confidence, particularly from major tech firms [31][32]
- The company is actively pursuing commercialization, having announced pre-sales of the Astribot S1 and collaborations with leading universities and enterprises on practical applications [33]
Success Rate Up 57%: The Latest in VLA+RL! CO-RFT: Efficient Fine-Tuning of VLA Models (Beihang, Tsinghua, et al.)
具身智能之心· 2025-08-07 00:03
Core Insights
- The article presents Chunked RL, a new reinforcement learning framework designed specifically for fine-tuning Vision-Language-Action (VLA) models, which show great potential for real-world robotic control [4][8]
- The proposed CO-RFT algorithm delivers significant improvements over traditional supervised fine-tuning, achieving a 57% higher success rate and a 22.3% reduction in cycle time in real-world environments [4][29]

Section Summaries

Introduction
- VLA models integrate perception and language understanding for embodied control, showing promise for developing general policies for real-world robotic control [6]
- The main challenge in fine-tuning VLA models is the dependence on the quality and quantity of task-specific data, which limits generalization to out-of-distribution (OOD) scenarios [6][7]

Methodology
- Chunked RL is a novel reinforcement learning framework that incorporates action chunking to improve sample efficiency and stability, making it particularly suited to VLA models [8][12]
- The CO-RFT algorithm has two phases: imitation learning to initialize the backbone network and policy, followed by offline RL with action chunking to optimize the pre-trained policy (see the sketch after this summary) [16][18]

Experimental Analysis
- Experiments were conducted on a robotic platform with six dexterous manipulation tasks, evaluating CO-RFT against traditional methods [20][23]
- Results show that CO-RFT significantly outperforms supervised fine-tuning (SFT), with a 57% higher success rate and a 22.3% lower average cycle time across tasks [29][30]

Position Generalization
- CO-RFT exhibits strong position generalization, achieving a 44.3% success rate at previously unseen locations and outperforming SFT by 38% in OOD scenarios [4][29]

Importance of Data Diversity
- Data diversity is crucial to CO-RFT's performance: models trained on diverse datasets generalize significantly better than those trained on fixed datasets [32][33]
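To make the two-phase recipe concrete, here is a minimal sketch of imitation-learning initialization followed by offline TD learning over action chunks. The network shapes, the simple MLPs, and all names (ChunkPolicy, ChunkCritic, bc_loss, td_loss) are illustrative assumptions, not the CO-RFT implementation; the point is only that the critic scores whole chunks, so bootstrapping happens once per chunk rather than once per low-level step.

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, CHUNK = 32, 7, 8  # assumed dims; CHUNK = actions per chunk

class ChunkPolicy(nn.Module):
    """Maps a state to a whole chunk of future actions (action chunking)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM * CHUNK),
        )

    def forward(self, s):
        return self.net(s).view(-1, CHUNK, ACT_DIM)

class ChunkCritic(nn.Module):
    """Q(s, chunk): scores an entire action chunk, so the TD backup happens
    once per chunk instead of once per low-level control step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM * CHUNK, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, s, chunk):
        return self.net(torch.cat([s, chunk.flatten(1)], dim=-1))

policy, critic = ChunkPolicy(), ChunkCritic()
opt = torch.optim.Adam(list(policy.parameters()) + list(critic.parameters()), lr=3e-4)

def bc_loss(s, demo_chunk):
    # Phase 1: imitation learning initializes the policy on demonstration chunks.
    return ((policy(s) - demo_chunk) ** 2).mean()

def td_loss(s, chunk, reward_sum, s_next, gamma=0.99):
    # Phase 2: offline RL. reward_sum is the discounted reward accumulated over
    # the chunk's steps; bootstrapping uses the policy's proposed next chunk.
    with torch.no_grad():
        target = reward_sum + (gamma ** CHUNK) * critic(s_next, policy(s_next))
    return ((critic(s, chunk) - target) ** 2).mean()

# One illustrative update on random stand-in data:
s, demo = torch.randn(16, STATE_DIM), torch.randn(16, CHUNK, ACT_DIM)
loss = bc_loss(s, demo) + td_loss(s, demo, torch.randn(16, 1), torch.randn(16, STATE_DIM))
opt.zero_grad(); loss.backward(); opt.step()
```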
VLA-OS: Shao Lin's Team at NUS Probes the Secrets of Task Reasoning in Robot VLA Models
具身智能之心· 2025-08-01 16:02
Core Viewpoint
- The article discusses a research study by a team from the National University of Singapore on the VLA-OS framework, which systematically analyzes and dissects task planning and reasoning in Vision-Language-Action (VLA) models, aiming to inform the next generation of general-purpose robotic VLA models [2][4]

Group 1: VLA-OS Overview
- VLA-OS is a structured framework that includes a clean codebase, multimodal task planning datasets, and standardized training pipelines for VLA models [4][5]
- The framework aims to unify the various VLA paradigms and facilitate controlled experiments that identify effective task planning representations and paradigms [19][20]

Group 2: VLA Model Paradigms
- The article outlines two main approaches for integrating task reasoning into VLA models: Integrated-VLA, which combines task planning and policy learning in one model, and Hierarchical-VLA, which separates these functions into different models (sketched below) [10][12]
- Current VLA models vary considerably in architecture, training method, and task planning representation, which complicates performance comparisons [13][15]

Group 3: Experimental Findings
- The research distills 14 key findings from over 100 experiments, highlighting the advantages of visual planning representations over language-based ones and the superior performance of Hierarchical-VLA compared to Integrated-VLA [34][35]
- Findings indicate that Integrated-VLA benefits from implicit task planning, while Hierarchical-VLA demonstrates better generalization [51][52]

Group 4: Recommendations for Future Research
- The article suggests prioritizing visual representation planning and goal-image planning, with language planning as a supplementary approach [68]
- It emphasizes the importance of task planning pre-training and the need for efficient training mechanisms that avoid gradient conflicts between planning and action outputs [73]
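The following toy sketch illustrates the data-flow difference between the two paradigms. The stub types and callables are assumptions for illustration only, not the VLA-OS codebase: Integrated-VLA emits planning tokens and actions from one forward pass, while Hierarchical-VLA feeds a separate low-level policy with the planner's output.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    text: str          # language plan, e.g. "grasp the cup, then move to the sink"
    goal_image: bytes  # visual plan representation (favored by the study's findings)

def integrated_vla(obs, instruction, model):
    # One network with joint heads: planning tokens act as an auxiliary
    # objective alongside the action output of the same forward pass.
    plan, action = model(obs, instruction)
    return action

def hierarchical_vla(obs, instruction, planner, policy):
    # The planner reasons at low frequency; a separate policy conditions on
    # its output and runs at control frequency.
    plan = planner(obs, instruction)
    return policy(obs, plan)

# Stub callables stand in for real networks, just to show the interfaces.
toy_plan = Plan(text="move to the cup", goal_image=b"")
toy_model = lambda obs, ins: (toy_plan, [0.0] * 7)
toy_planner = lambda obs, ins: toy_plan
toy_policy = lambda obs, plan: [0.0] * 7
print(integrated_vla(None, "pick up the cup", toy_model))
print(hierarchical_vla(None, "pick up the cup", toy_planner, toy_policy))
```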
VLA + Reinforcement Learning Will Give Rise to More Powerful Systems!
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article discusses advances in robotic models, focusing on the development of RT-2 and RT-X, which extend robots' ability to execute tasks through visual language models and diverse datasets [5][10][11]

Group 1: RT-2 and Its Capabilities
- RT-2 is introduced as a foundational robot model that can answer visual questions and execute tasks from language instructions, showcasing the potential of remotely accessible robot models [5][7]
- By casting robot control as question answering, the model can carry out a range of basic language instructions effectively (see the tokenization sketch after this summary) [7][8]

Group 2: RT-X Dataset and Its Impact
- The RT-X dataset, assembled by DeepMind, comprises data from 34 research labs and 22 robot types, providing a diverse training corpus for robotic models [10]
- Models trained on RT-X outperform specialized models by approximately 50% across tasks, indicating the advantages of cross-embodiment models [11]

Group 3: Evolution of VLA Models
- The first-generation VLA model, RT-2, is noted for its simplicity, while second-generation models use continuous action distributions for improved performance on complex tasks [14][15]
- Second-generation VLA models incorporate specialized mechanisms for generating continuous actions, enhancing their control capabilities [17][18]

Group 4: π0 and π0.5 Models
- The π0 model, based on a 3-billion-parameter language model, is designed to handle a variety of tasks, including folding clothes, demonstrating its adaptability across environments [18][23]
- The newer π0.5 model targets long-horizon tasks in new environments, integrating high-level reasoning to manage complex instructions [28][30]

Group 5: Future Directions and Reinforcement Learning
- Future VLA models are expected to integrate reinforcement learning techniques to enhance robustness and performance, moving beyond imitation learning [34][39]
- The combination of VLA and DLA (Deep Learning Architecture) is proposed to create a more effective system, leveraging expert data to improve generalist capabilities [44][46]
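A minimal sketch of the first-generation "control as text" idea described above: each action dimension is quantized into integer bins that the language model emits as tokens, and decoding inverts the quantization. The bin count and normalized action range here are assumptions; RT-2's actual vocabulary and binning details may differ.

```python
import numpy as np

N_BINS = 256           # assumed vocabulary of action bins
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> list[int]:
    """Quantize a continuous action vector into token ids, one per dimension."""
    clipped = np.clip(action, LOW, HIGH)
    bins = np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1))
    return bins.astype(int).tolist()

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    """Invert the quantization when decoding the tokens the model generates."""
    return np.array(tokens) / (N_BINS - 1) * (HIGH - LOW) + LOW

# Round trip is exact up to quantization error of half a bin width.
a = np.array([0.12, -0.53, 0.90])
assert np.allclose(tokens_to_action(action_to_tokens(a)), a, atol=0.01)
```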
PI Co-Founder and Robotics Legend Explains in Detail How VLA + Reinforcement Learning Yields More Powerful Systems
具身智能之心· 2025-07-30 06:03
Core Viewpoint
- The article discusses advances in robotic models, focusing on the development of RT-2 and RT-X, which improve robots' ability to execute complex tasks through better datasets and model architectures [6][12][44]

Group 1: RT-2 and RT-X Models
- RT-2 is introduced as a foundational robot model that uses a visual language model to process image-based commands and execute tasks [8][10]
- The RT-X dataset, assembled by DeepMind, comprises data from 34 research labs and 22 robot types, showcasing a diverse range of robotic capabilities [13][26]
- Cross-embodiment models trained on RT-X outperform specialized models by approximately 50% across tasks, indicating the advantages of generalization in robot learning [13][29]

Group 2: Evolution of VLA Models
- First-generation VLA models like RT-2 rely on simple question-answer structures for robot control, while the second generation adopts continuous action distributions for better performance [16][19]
- Second-generation VLA models such as π0 pair a large language model with an action expert module to handle complex tasks, generating action sequences over time [22][24]
- The π0.5 model is designed for long-horizon tasks, integrating high-level reasoning to execute complex instructions in new environments [36][40]

Group 3: Integration of Reinforcement Learning
- Future VLA models are expected to incorporate reinforcement learning techniques to enhance robustness and performance, moving beyond imitation learning (one common recipe is sketched below) [44][49]
- Integrating reinforcement learning with VLA aims to create a more effective training process, allowing robots to learn from both expert data and real-world interaction [56][60]
- Current research focuses on developing stable, effective end-to-end training processes that leverage reinforcement learning to improve VLA capabilities [60]
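As a concrete example of learning from both expert data and interaction, here is a hedged sketch of advantage-weighted regression, one standard recipe for combining imitation with reinforcement signals. It is not the specific method from the talk; the small stand-in networks and all names are assumptions.

```python
import torch
import torch.nn as nn

# Small stand-in networks; a real system would use the VLA backbone instead.
policy = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 7))
value = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def awr_update(states, actions, returns, beta=1.0):
    # Advantage = observed return minus the value baseline.
    with torch.no_grad():
        adv = returns - value(states).squeeze(-1)
        weights = torch.exp(adv / beta).clamp(max=20.0)  # cap exploding weights
    # Weighted behavior cloning: actions that beat the baseline count more,
    # so expert data and successful interaction data reinforce each other.
    loss = (weights * ((policy(states) - actions) ** 2).mean(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(awr_update(torch.randn(64, 32), torch.randn(64, 7), torch.randn(64)))
```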
Domestic Humanoid Robot Hardware and Applications Accelerating Toward Deployment
2025-07-14 00:36
Summary of the Conference Call on the Domestic Humanoid Robot Industry

Industry Overview
- The domestic humanoid robot industry is accelerating deployment, with contracts awarded to companies like Zhiyuan and Yushu totaling 124 million yuan, indicating growing market demand for humanoid robot applications [1][2]
- The humanoid robot supply chain is advancing steadily, with over 80 domestic companies, primarily startups spun out of universities, focusing on application scenarios such as logistics, household chores, inspection, and textiles [1][3][4]

Key Developments
- Zhiyuan and Yushu won a procurement project for humanoid and biped robots from China Mobile Hangzhou, with a total contract value of 124 million yuan, highlighting the rapid deployment of robots in the domestic market [2]
- The standard version of the Tiangong Walker is priced at approximately 300,000 yuan, with production and orders expected to exceed 1,000 units in 2025 [2]

Application Scenarios
- The application of humanoid robots in inspection, logistics, and textiles is promising, with robots able to replace human labor in high-risk tasks such as high-altitude inspections, improving safety [3][10][11]
- In logistics, humanoid robots are expected to work alongside unmanned logistics vehicles to automate factories, enhancing efficiency and reducing human error [12][14]

Company Highlights
- UBTECH showcased the Walker S2, which features a swappable battery, and has begun taking small-scale industrial orders, indicating strong market acceptance [5]
- Yushu demonstrated advanced motion control capabilities, including climbing and dancing, with its products achieving world-leading standards [6]
- Zhiyuan introduced multiple commercial products and is actively collecting data to iterate on its technology, planning to gather 500,000 data points weekly to support full-scale deployment [7]

Competitive Landscape
- Domestic companies are making significant progress in VLA model development, establishing a shared data layer and collaborating with partners to build resource platforms [8]
- The domestic humanoid robot supply chain is outperforming international competitors in application depth and capital expenditure, with a focus on practical applications [9]

Future Prospects
- The outlook for humanoid robots in the textile industry is promising, as they can replace manual operations in labor-intensive tasks, with technological advances enabling better handling of flexible materials [16]
- The overall humanoid robot market is expected to grow, with expanding applications across sectors including logistics and inspection as companies continue to innovate and improve their products [10][17]
Latest from EmbodyX! VOTE: A General Framework for Ensemble Voting and Optimization to Accelerate VLA Models, with a 35x Throughput Speedup!
具身智能之心· 2025-07-13 09:48
Core Insights
- The article discusses the limitations of existing VLA models in generalizing to new objects and unfamiliar environments, motivating the development of a more efficient action prediction method called VOTE [4][6][9]

Group 1: Background and Motivation
- Building a universal robotic policy that can handle diverse tasks and real-world interactions has been a core focus of robotics research [6]
- VLA models perform well in familiar environments but struggle to generalize to unseen scenarios, prompting the exploration of methods that improve robustness [7][8]

Group 2: VOTE Methodology
- VOTE is a lightweight VLA model that optimizes trajectories with an ensemble voting strategy, significantly improving inference speed and reducing computational cost (a sketch of the voting idea follows this summary) [9][14]
- The model eliminates the need for additional visual modules and diffusion techniques, relying solely on the VLM backbone and introducing a special token <ACT> to streamline action prediction [9][18]
- The action sampling technique uses an ensemble voting mechanism that aggregates predictions from previous steps, improving stability and robustness [22][23]

Group 3: Performance and Evaluation
- Experimental results show that VOTE achieves state-of-the-art performance, with a 20% higher average success rate on the LIBERO task suite and a 3% improvement over CogACT on the SimplerEnv WidowX robot [9][28]
- The model achieves a 35-fold throughput increase on edge devices such as the NVIDIA Jetson Orin, demonstrating its suitability for real-time applications [9][31]
- VOTE outperforms existing models, reaching a throughput of 42 Hz on edge platforms while maintaining minimal memory overhead [31][32]
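The voting step can be pictured as follows. Because each inference call emits a chunk of future actions, several recent chunks overlap any given timestep; aggregating the overlapping predictions yields the executed action. This is a minimal sketch under stated assumptions, not the authors' implementation: the mean stands in for whatever voting rule the paper uses, and all names and shapes are illustrative.

```python
from collections import deque
import numpy as np

CHUNK = 8  # actions predicted per inference step (assumed)

class ActionVoter:
    def __init__(self, max_chunks: int = CHUNK):
        self.history = deque(maxlen=max_chunks)  # (start_step, chunk) pairs

    def add_chunk(self, start_step: int, chunk: np.ndarray):
        """chunk has shape (CHUNK, act_dim) and was predicted at start_step."""
        self.history.append((start_step, chunk))

    def action_for(self, t: int) -> np.ndarray:
        # Gather every stored chunk whose horizon covers timestep t, then
        # aggregate; averaging the overlaps is the "vote" in this sketch.
        overlapping = [c[t - s] for s, c in self.history if 0 <= t - s < len(c)]
        return np.mean(overlapping, axis=0)

voter = ActionVoter()
for step in range(3):
    voter.add_chunk(step, np.random.randn(CHUNK, 7))
print(voter.action_for(2).shape)  # (7,), aggregated over 3 overlapping chunks
```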
A New Paradigm for VLA Inference! Consistency Model CEED-VLA Delivers a 4x Speedup!
机器之心· 2025-07-13 04:58
Core Viewpoint
- The article discusses advances in Vision-Language-Action (VLA) models, focusing on the CEED-VLA model, which significantly improves inference speed while maintaining high task success rates in robotic applications [2][8][24]

Group 1: VLA Model Overview
- VLA models have become a key research direction in robotics thanks to their strong multimodal understanding and generalization capabilities [2]
- Despite these advances, VLA models face significant inference-speed bottlenecks, especially in high-frequency, high-precision tasks [2]

Group 2: Proposed Solutions
- A consistency distillation training strategy allows the model to predict multiple correct action tokens simultaneously, increasing decoding speed [4]
- A mixed-label supervision mechanism is designed to mitigate error accumulation during the distillation process [4][9]
- An early-exit decoding strategy addresses the inefficiency of Jacobi decoding, improving average inference efficiency by relaxing the convergence conditions (sketched below) [5][10]

Group 3: Experimental Results
- The proposed methods achieve more than 4x inference acceleration across multiple baseline models while maintaining high task success rates in both simulated and real-world robotic tasks [8][18]
- The CEED-VLA model shows a significant increase in manipulation task success rates, exceeding 70%, owing to the improved inference speed and control frequency [24]
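A toy sketch of Jacobi decoding with an early-exit rule, under stated assumptions: the step_fn stands in for one parallel forward pass of the model, and the convergence test and iteration budget are illustrative, not the paper's exact criteria. All token positions are refined in parallel, and iteration stops as soon as the block stabilizes instead of running to a strict fixed point.

```python
import numpy as np

def jacobi_decode(step_fn, n_tokens, max_iters=16, patience=1):
    """step_fn(tokens) -> refined tokens: one parallel refinement pass."""
    tokens = np.zeros(n_tokens, dtype=int)  # arbitrary initialization
    stable = 0
    for it in range(1, max_iters + 1):
        new_tokens = step_fn(tokens)        # refine every position at once
        stable = stable + 1 if np.array_equal(new_tokens, tokens) else 0
        tokens = new_tokens
        if stable >= patience:              # early exit: the block has settled
            return tokens, it
    return tokens, max_iters                # relaxed: accept the current block

# Toy fixed-point map standing in for a parallel forward pass of the model.
target = np.array([3, 1, 4, 1, 5, 9, 2, 6])
toy_step = lambda toks: np.where(toks == target, toks, (toks + 1) % 10)
decoded, iters = jacobi_decode(toy_step, len(target))
print(decoded, "decoded in", iters, "parallel passes")
```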