VLA Models
2025 Commercial Embodied Intelligence White Paper
iResearch (艾瑞咨询) · 2025-12-31 22:34
Core Insights
- Embodied intelligence has gained significant traction globally, with Figure achieving a valuation of $39 billion despite zero revenue, while domestic players are securing commercial orders and projecting substantial revenue growth [1][4]
- The Chinese market is integrating embodied intelligence into its strategic development plans, indicating a shift towards a trillion-dollar market potential [1][9]

Definition and Understanding
- Embodied intelligence is recognized as a crucial development in artificial intelligence, characterized by agents that interact with their environment through a physical body, showcasing autonomy and adaptability [2]
- It represents a convergence of machine learning, computer vision, and robotics, marking a significant step towards practical AI applications [2]

Commercial Scene Classification
- Different forms of embodied intelligent robots are evolving to meet diverse needs across retail, dining, manufacturing, logistics, education, and healthcare [4]
- Commercial applications focus on enhancing service experiences in dynamic environments, while industrial applications emphasize precision and stability in structured settings [4]

Strategic Significance
- Embodied intelligence is pivotal in narrowing the technological gap between China and the U.S., driving innovation across sectors including manufacturing and healthcare [6]
- The advanced-technology competition between the two nations underscores the economic and competitive stakes of breakthroughs in embodied intelligence [6]

Policy Incentives
- The Chinese government is actively promoting the development of embodied intelligence through policies, funding, and standardization efforts [9]
- Local governments are implementing their own initiatives to support industry growth, including funding for humanoid robots and establishing collaborative platforms [9]

Development Stages
- The evolution of embodied intelligence can be divided into three phases: conceptual development (1950s), technological accumulation (2000-2020), and application expansion driven by large models (2020 onwards) [11]
- In the current phase, the U.S. leverages its advantages in computational power and capital, while China accelerates its catch-up through policy support and industry collaboration [11]

Bottlenecks and Challenges
- The transition from experimental to commercial applications faces challenges such as data scarcity, high costs, and technical limitations in dexterity and generalization [13][16]
- The industry is exploring solutions, including data-collection training grounds and innovative data-acquisition methods [19]

Model Evolution
- The VLA model is emerging as a consensus direction for embodied intelligence, integrating reasoning capabilities with real-world perception and action [21]
- This evolution is expected to produce a leap in capability akin to the breakthroughs seen with large language models [21]

Commercialization Breakthroughs
- Large-scale commercialization hinges on advances in five key dimensions: endurance, latency, execution, reliability, and economic viability [29]
- Initial applications focus on low-complexity, high-ROI scenarios, with expansion into more complex environments as the technology matures [31]

Global Market Predictions
- The global market for embodied intelligence is projected to reach 19.2 billion RMB by 2025, with a compound annual growth rate of 73% over the following five years [46]
- China's market is expected to grow significantly, potentially exceeding 280 billion RMB by 2035, driven by a robust industrial ecosystem [50]

Competitive Landscape
- Competition in the sector features three main player types: AI-native challengers like Figure, traditional industrial players like ABB, and cross-industry giants like Tesla [55]
- The market is anticipated to consolidate as product homogeneity increases, setting up a potential first wave of industry shakeout [57]

Initial Player Strategies
- Startups must leverage their agility and innovation capabilities to survive against established giants, focusing on strategic partnerships and long-term value creation [59]
A Conversation with DaXiao Robotics Chairman Wang Xiaogang: Not Betting on VLA, Betting on World Models
Sou Hu Cai Jing· 2025-12-25 07:59
Core Insights
- The current technological routes in embodied intelligence, particularly the VLA model, have significant flaws in understanding the physical world and its laws [4][11]
- Many companies are building embodiments, but few products can truly understand the world and solve real problems [5]
- In 2025, the domestic market is expected to see a surge in instant-retail warehousing applications, which require 24/7 service and present an opportunity for robots to excel [5]

Group 1: Company Strategy
- DaXiao Robotics chairman Wang Xiaogang emphasizes a restrained approach: rather than entering the crowded embodiment market or betting on VLA, the company focuses on the world model as an emerging consensus direction in the industry [6][8]
- DaXiao Robotics aims to integrate software and hardware solutions, addressing the shortcomings of existing technology routes, particularly the VLA model, which does not require a true understanding of the physical world [11][12]
- The company's world model consists of three parts: multi-modal understanding, long-horizon dynamic interaction scenes, and predictive capability, which together form the core of its technology [13]

Group 2: Market Position and Opportunities
- The industry is still maturing and the leading positions have not yet been decided, leaving significant openings for new startups given the flaws in existing technology [17]
- The company sees a unique opportunity in hardware-software integration, leveraging its extensive client base from previous years to scale rapidly in robotics [18]
- Short-term goals include deploying quadruped robotic dogs with navigation and AI capabilities; the mid-term focus is commercial service scenarios such as flash-purchase warehouses [19]

Group 3: Technological Differentiation
- The ACE research paradigm proposed by DaXiao Robotics is positioned as a revolutionary change that could provide a competitive edge in the market [18]
- The world-model approach is believed to be more adaptable and to cover a wider range of scenarios than VLA, which is constrained by its embodiment [21]
- The company plans to open-source its model to gather diverse feedback and data, differentiating its development path from those taken elsewhere [22]
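The three-part world model described above stands in contrast to a VLA policy that maps observations directly to actions. A deliberately minimal sketch of the predictive part, with entirely invented one-dimensional dynamics (no relation to any company's actual model), shows the core idea: a world model selects actions by simulating their outcomes rather than by direct pattern-matching.

```python
# Toy illustration (all dynamics invented): a world model predicts the outcome
# of each candidate action, then the planner picks the action whose predicted
# outcome is best — instead of mapping observation -> action directly.

def predict_next_state(state, action):
    """Predictive component: toy 1-D point-robot dynamics."""
    position, velocity = state
    velocity = velocity + action          # action is an acceleration command
    position = position + velocity
    return (position, velocity)

def plan_with_world_model(state, goal, candidate_actions):
    """Pick the action whose *predicted* position lands closest to the goal."""
    def cost(action):
        predicted_position, _ = predict_next_state(state, action)
        return abs(predicted_position - goal)
    return min(candidate_actions, key=cost)

best = plan_with_world_model(state=(0.0, 0.0), goal=1.0,
                             candidate_actions=[-1.0, 0.0, 0.5, 1.0, 2.0])
print(best)  # → 1.0, the action predicted to reach the goal
```

Because the evaluation happens inside the learned dynamics rather than in a lookup of memorized behaviors, the same model can, in principle, be reused across scenes the policy never saw, which is the adaptability claim made for this route.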
The Industry's First RL+VLA Roundup: How Is Reinforcement Learning Pushing VLA into the Real World?
自动驾驶之心· 2025-12-24 09:22
Core Insights
- The article surveys advances in Vision-Language-Action (VLA) models for autonomous driving, highlighting a shift from traditional supervised learning towards reinforcement learning (RL) to improve generalization and reasoning [2]

Summary by Sections

VLA + RL Research Overview
- Recent work in the VLA + RL domain shows a trend towards using RL to address the limitations of previous models, particularly hallucination issues and the inefficiency of continuous action-space exploration [2]

Key Papers and Contributions
- **MindDrive**: Introduces a framework that transforms the action space into a discrete language decision space, achieving a driving score of 78.04 and a success rate of 55.09% on the Bench2Drive benchmark with a lightweight model [6]
- **WAM-Diff**: Proposes an end-to-end VLA framework that uses masked diffusion for trajectory optimization, achieving superior performance on the NAVSIM benchmark [7]
- **LCDrive**: Addresses temporal-expression and latency issues in textual chain-of-thought reasoning with a latent chain-of-thought mechanism, improving reasoning efficiency and trajectory quality [12]
- **Reasoning-VLA**: Enhances parallel trajectory generation through learnable action queries, achieving high performance across multiple datasets [13]
- **Alpamayo-R1**: Bridges reasoning and action prediction through a modular architecture and multi-stage training, improving generalization in long-tail scenarios [18]
- **AdaThinkDrive**: Introduces a dual-mode mechanism to balance decision accuracy against reasoning efficiency, achieving a PDMS score of 90.3 on the NAVSIM benchmark [20]
- **AutoDrive-R²**: Combines supervised fine-tuning and RL to improve trajectory-planning accuracy, achieving state-of-the-art performance with a significant reduction in error rates [25]
- **IRL-VLA**: Avoids reliance on simulators by using a reward world model, achieving state-of-the-art performance on the NAVSIM v2 benchmark [31]
- **DriveAgent-R1**: Integrates active perception with hybrid thinking, achieving significant improvements in decision reliability and efficiency [32]
- **Drive-R1**: Connects reasoning and planning in VLMs, providing effective methods for integrating reasoning with motion planning [37]
- **ReCogDrive**: Merges cognitive reasoning with diffusion planners, achieving state-of-the-art performance while addressing the limitations of imitation learning [38]
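The discretization idea attributed above to work like MindDrive — recasting continuous control as a choice over a small "language decision" vocabulary that RL can explore efficiently — can be sketched with a toy REINFORCE loop. Everything here (action names, reward, learning rate) is invented for illustration and is not the paper's actual training setup.

```python
import math
import random

# Toy sketch: a softmax policy over a discrete decision vocabulary, improved
# with a REINFORCE-style update. The "reward" is a stand-in for a driving score.

ACTIONS = ["keep_lane", "slow_down", "change_left", "change_right"]

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, chosen, reward, lr=0.5):
    """One policy-gradient step: grad of log pi(a) is one_hot(a) - probs."""
    probs = softmax(logits)
    return [l + lr * reward * ((1.0 if i == chosen else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

random.seed(0)
logits = [0.0] * len(ACTIONS)
for _ in range(200):
    probs = softmax(logits)
    chosen = random.choices(range(len(ACTIONS)), weights=probs)[0]
    reward = 1.0 if ACTIONS[chosen] == "slow_down" else 0.0  # stand-in reward
    logits = reinforce_step(logits, chosen, reward)

best_action = ACTIONS[max(range(len(ACTIONS)), key=lambda i: logits[i])]
print(best_action)  # → slow_down
```

The point of the discrete space is visible in the loop: exploration is a categorical sample over four options, which is far cheaper than searching a continuous trajectory space.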
Going Viral Again! Musk Is Full of Praise
Gelonghui (格隆汇) APP · 2025-12-22 11:12
Author | Freddie; Data support | Gougu Data (www.gogudata.com)

Over the weekend, Elon Musk made headlines again: his social-media praise for Wang Leehom's robot backup dancers shot to the top of trending topics.

On December 18, Wang Leehom's Chengdu concert featured humanoid robot backup dancers for the first time, performing "Fire Power" together. Video of six Unitree humanoid robots landing the high-difficulty "Webster" flip drew attention at home and abroad, and even Tesla CEO Musk reposted the clip with the comment "Impressive."

Back on Monday, the A-share humanoid robot sector continued to rebound and Chinese concept stocks in Hong Kong also pushed higher; the index tracked by the Robot ETF (159770) rose 1.47%. The data show that even at year-end, capital remains highly focused on this track.

With 2026 approaching, where are the opportunities in humanoid robots? And is capital bottom-fishing?

On Monday, the market strengthened broadly in morning trading: the Shanghai Composite closed up 0.69% and the ChiNext Index rose 2.23%. By sector, precious metals, electronic components, communication equipment, and humanoid robot themes were active, while pharmaceutical retail and cinema sectors pulled back.

| Sector | Gain |
| --- | --- |
| 1 Precious metals | +4.18% |
| 2 Electronic components | +3.17% |
| 3 Electric motors | +2.98% |
| 4 Communication equipment | ... |
Surpassing π0.5: MiVLA Uses Human-Robot Mutual Imitation Pre-training to Crack the VLA Model's Generalization and Data Bottlenecks
具身智能之心· 2025-12-22 01:22
Core Insights
- The article discusses the MiVLA model, which addresses "data scarcity" and "weak generalization" in robot vision-language-action (VLA) models through a novel "human-robot mutual imitation pre-training" approach, enabling effective training without real robot data [2][19]
- MiVLA combines simulated robot data and human video data to achieve superior generalization, offering a low-cost, scalable path to general robot policy learning [2][19]

Summary by Sections

The Need to Rebuild the VLA Pre-training Paradigm
- Current VLA training faces dual challenges: reliance on real robot data is limited by high costs and narrow scene coverage, while single-modality approaches suffer from "modality gaps" [3]
- Effective VLA pre-training requires a unified approach that balances data scale, behavioral fidelity, and cross-modal adaptation [3]

MiVLA's Design and Features
- MiVLA's core design aligns human and robot action spaces through mutual-imitation pre-training, merging the diversity of simulated robot data with the fidelity of human video data [5]
- Key features include:
  - Bidirectional human-robot action-space mapping to overcome morphological differences [7]
  - Mutual-imitation pre-training that leverages the advantages of both data sources [8]
  - A diffusion-transformer architecture to support continuous robot control [8]
  - Lightweight, efficient training for scalable deployment [8]

Experimental Validation and Results
- MiVLA was tested in both simulated and real robot environments, demonstrating significant performance improvements over baseline models [9][11]
- In simulation, MiVLA outperformed baselines across 20 representative tasks, achieving average success rates of 69% in easy mode and 66% in hard mode [10]
- In real robot tasks, MiVLA matched the performance of models pre-trained on large-scale real data while using only medium-scale mixed data [11]

Generalization Capability
- MiVLA exhibited strong adaptability across different scenes, objects, and positions, achieving an average generalization success rate of 54% with only 20 demonstrations [17][18]
- Its ability to handle unseen robot morphologies and complex tasks was validated through various experimental setups [11][14]

Conclusion and Future Directions
- MiVLA demonstrates that human-robot mutual imitation is key to overcoming data bottlenecks, enabling a more generalized VLA model without real robot data [18]
- Future work will focus on performance in extreme out-of-distribution scenarios, integrating multimodal information, and expanding data coverage [18]
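The bidirectional action-space mapping listed among the key features can be illustrated with a deliberately simple one-dimensional example. The parameterizations and ranges below are invented for illustration (the real mapping is learned and high-dimensional); the sketch only shows why a round-trip (cycle-consistency) check lets human video and robot data supervise each other.

```python
# Hypothetical illustration of bidirectional human-robot action mapping:
# a human action (fingertip spread, in cm) maps to a robot action (gripper
# opening in [0, 1]) and back. Driving the round-trip error to zero aligns
# the two action spaces, so demonstrations in either space become usable.

HUMAN_SPREAD_MAX_CM = 10.0  # assumed maximum fingertip spread

def human_to_robot(spread_cm: float) -> float:
    """Human -> robot direction: normalize and clamp to the gripper range."""
    return min(max(spread_cm / HUMAN_SPREAD_MAX_CM, 0.0), 1.0)

def robot_to_human(gripper: float) -> float:
    """Robot -> human direction: the inverse mapping."""
    return gripper * HUMAN_SPREAD_MAX_CM

def cycle_consistency_loss(spread_cm: float) -> float:
    """Round-trip error; zero means the two spaces agree on this action."""
    return abs(robot_to_human(human_to_robot(spread_cm)) - spread_cm)

print(cycle_consistency_loss(4.0))   # 0.0 for an in-range action
print(human_to_robot(12.0))          # out-of-range human action clamps to 1.0
```

In the paper's setting this mapping must also absorb morphological differences between hands and grippers, which is exactly what makes the learned, bidirectional version non-trivial.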
Wang Xiaogang and His "World Model": One Person Managing Ten Robot Dogs, Putting Quadruped Robots to Work on the Streets First | 36Kr Interview
36Kr (36氪) · 2025-12-19 10:31
Core Viewpoint
- The article discusses the emergence of world models in AI, highlighting their significance in overcoming the limitations of earlier VLA models and their potential applications in robotics and autonomous systems [4][9][22]

Group 1: World Model Development
- The world model addresses the inherent limitations of VLA models, which struggle to understand physical laws and require vast amounts of training data [9][28]
- The "Awakening" world model 3.0 allows robots to learn physical interactions and adapt to new environments, significantly reducing dependency on scene-specific data [8][10]
- The world model enables robots to move from rote learning to understanding general principles, improving task performance across varied scenarios [10][28]

Group 2: Practical Applications
- DaXiao Robot is using the world model to deploy robotic dogs for urban-management tasks, such as monitoring illegal parking and drone activity [6][7][12]
- The company plans to validate the world model's capabilities through real-world deployments, starting with robotic dogs and expanding to more complex robot forms [16][56]
- Integrating the world model into robotic systems aims to create a closed feedback loop, allowing continuous improvement based on real-world performance [14][15][16]

Group 3: Commercial Strategy
- The company will focus initially on B2B applications, targeting sectors such as smart cities and urban management where autonomous capability is in high demand [58]
- Future plans include expanding into logistics and home environments, leveraging existing resources and partnerships to reduce entry costs [17][56]
- The strategy emphasizes collaboration with existing platform providers while also developing proprietary solutions to improve product reliability and performance [50][52]
Future Intelligent Manufacturing Bureau | When AI Enters the Physical World: What Embodied Intelligence "Can" and "Cannot" Do, as Seen at a Skills Competition
Xin Hua Cai Jing· 2025-12-17 16:53
Core Insights
- The 2025 Global Developer Pioneer Conference showcased advances in robotics, highlighting both capabilities and limitations in real-world applications [1][2]
- Embodied intelligence has made significant progress over the past year, with robots demonstrating improved stability and functionality across a range of tasks [2][3]

Group 1: Technological Advancements
- The A2 humanoid robot completed a continuous 100-kilometer cross-province walk, demonstrating its stability [2]
- The evolution of the Vision-Language-Action (VLA) model has enhanced robots' cognitive abilities, allowing them to understand human commands and adapt to unfamiliar environments [2]
- Robots showcased skills such as flower arrangement and restaurant service, correctly identifying materials and controlling grip strength to prevent spills [2]

Group 2: Limitations and Challenges
- Robots still struggle with complex tasks such as folding clothes, because the variability of soft materials demands extensive training data [4]
- Precision tasks such as driving screws still require human teleoperation, as robots lack the necessary tactile feedback and an understanding of physical properties like friction and torque [6]
- In industrial settings, robots can navigate and grasp objects but still face challenges with stability and precision during operations [7]

Group 3: Future Directions
- The industry is exploring new research paradigms, with "world models" a focal point for improving spatial understanding and causal reasoning [8]
- Experts suggest embodied intelligence should evolve from imitation to reasoning, integrating planning and control into a unified framework [8][9]
- The industry must overcome data scarcity and promote collaboration through open standards and challenge competitions to enable algorithm reproducibility and commercialization [9]
2025 Commercial Embodied Intelligence White Paper
iResearch (艾瑞咨询) · 2025-12-14 00:04
Core Insights
- Embodied intelligence has gained significant traction globally, with Figure achieving a valuation of $39 billion despite zero revenue, while domestic players are securing commercial orders and projecting substantial revenue growth [1][4]
- The Chinese market is integrating embodied intelligence into its strategic development plans, indicating a shift towards a trillion-dollar market potential [1][9]

Definition and Understanding
- Embodied intelligence is recognized as a crucial development in artificial intelligence, characterized by agents that interact with their environment through a physical body, showcasing autonomy and adaptability [2][4]
- It represents a convergence of machine learning, computer vision, and robotics, marking a significant step towards practical AI applications [2]

Commercial Scene Classification
- Different forms of embodied intelligent robots are evolving to meet diverse needs across retail, dining, manufacturing, logistics, education, and healthcare [4]
- Commercial applications focus on enhancing service experiences in dynamic environments, while industrial applications emphasize precision and stability in structured settings [4]

Strategic Significance
- Embodied intelligence is pivotal in narrowing the technological gap between China and the U.S., driving innovation across sectors including manufacturing and healthcare [6][9]
- The advanced-technology competition between the two nations underscores the economic and competitive stakes of breakthroughs in embodied intelligence [6]

Policy Incentives
- The Chinese government is actively promoting the development of embodied intelligence through policies, funding, and standardization efforts [8][9]

Development Stages
- The evolution of embodied intelligence can be divided into three phases: conceptual development, technological accumulation, and application expansion driven by large models [11]
- The current phase sees intense competition between the U.S. and China in foundational models and application deployment [11]

Bottlenecks and Challenges
- Key challenges include data collection, technology maturity, high costs, and long ROI cycles, all of which hinder large-scale commercialization [13][16]
- The industry is exploring solutions to overcome data scarcity and improve training efficiency [19]

Model Evolution
- The VLA model is emerging as a consensus direction for embodied intelligence, integrating reasoning capabilities with real-world perception and action [21][23]
- This evolution is expected to bring significant advances in robot capabilities [21]

Commercialization Breakthroughs
- Commercialization is anticipated to reach a turning point as the field overcomes challenges in endurance, latency, execution, reliability, and economic viability [29][31]
- Initial applications focus on low-complexity, high-ROI scenarios, with future expansion into more complex environments [31]

Global Market Predictions
- The global market for embodied intelligence is projected to grow exponentially, with an estimated compound annual growth rate of 73% over the next five years [46]
- China's market is expected to grow significantly, potentially exceeding 280 billion yuan by 2035 [50]

Competitive Landscape
- Competition features three main player types: AI-native challengers, traditional industrial players, and cross-industry giants [55]
- The market is trending towards consolidation as product homogeneity increases, indicating an impending first round of industry shakeout [57]

Initial Players and Innovations
- Companies like Tesla and Figure AI lead in developing humanoid robots, with significant advances in capabilities and market readiness [62][64]
- Innovations in core components, such as dexterous hands and micro-servo actuators, are critical for enhancing the functionality of embodied intelligence [83][88]
He Xiaopeng Makes a "Bet": Match Tesla FSD Performance by the End of Next August
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:46
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, with significant pressure weighing on the first version [1]
- Xiaopeng's chairman made a bet with the autonomous driving team, setting the goal of matching Tesla's FSD V14.2 performance by August 30, 2026 [1]

Group 1: The VLA Model and Industry Perspectives
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception (V), action execution (A), and a language model (L) to enhance decision-making and environmental understanding [5][11]
- The industry has shifted from relying on LiDAR and high-precision maps to AI-driven models like VLA, with a notable divergence in development paths emerging by 2025 [4][11]
- Li Auto's VP emphasized the importance of real-world data over model architecture, asserting that VLA is the best solution given the company's extensive data collection from millions of vehicles [6][8]

Group 2: Diverging Technical Approaches
- Huawei's approach centers on the World Action (WA) model, which bypasses the language-processing step and aims for direct control from visual inputs [8][10]
- The World Model concept lets AI systems simulate the physical world, enhancing predictive capability and decision-making in autonomous driving [9][11]
- Companies such as NIO and SenseTime are also exploring the World Model approach, indicating a broader industry trend [10]

Group 3: Future Integration and Evolution
- There is a growing trend towards integrating VLA and World Models; the two technologies are complementary rather than mutually exclusive [11][12]
- Xiaopeng's second-generation VLA model aims to combine VLA and World Model functionality, improving data training and decision-making [14][15]
- The automotive industry anticipates further iterations of autonomous driving architecture over the next few years, potentially stabilizing by 2028 [15]
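The V/L/A decomposition described above can be shown as a stubbed data-flow sketch. Every component below is a hand-written placeholder invented for illustration, not any automaker's actual model; the point is only the pipeline shape: perception produces a scene description, a language-level reasoning step turns it into a decision, and the action head converts the decision into a control command.

```python
# Structural sketch of a VLA pipeline (all components stubbed):
# V (perceive) -> L (reason) -> A (act). Real systems train this end-to-end.

def perceive(image: list) -> dict:
    """V: turn raw pixels into a scene description (stub: any nonzero pixel
    is treated as an obstacle)."""
    return {"obstacle_ahead": sum(image) > 0}

def reason(scene: dict, instruction: str) -> str:
    """L: a language-level decision over the perceived scene (stub)."""
    if scene["obstacle_ahead"]:
        return "brake"
    return "cruise" if instruction == "drive normally" else "stop"

def act(decision: str) -> float:
    """A: map the language decision onto a throttle command in [-1, 1]."""
    return {"brake": -1.0, "cruise": 0.5, "stop": 0.0}[decision]

def vla_step(image, instruction):
    return act(reason(perceive(image), instruction))

print(vla_step([0, 0, 0], "drive normally"))  # → 0.5  (cruise)
print(vla_step([1, 0, 0], "drive normally"))  # → -1.0 (brake)
```

The World Action route described in Group 2 would, in this picture, delete the `reason` step and map `perceive` output straight to `act`, which is precisely where the two camps diverge.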
He Xiaopeng Makes a "Bet": Match Tesla FSD Performance by the End of Next August! A Li Auto Executive Responds to Unitree Founder Wang Xingxing's Doubts: Is the VLA Route Multiple Automakers Are Betting On Reliable?
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:31
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, with significant pressure weighing on the first version [1]

Group 1: VLA Model Development
- Xiaopeng's chairman, He Xiaopeng, has made a special bet with the autonomous driving team, promising to open a Chinese-style cafeteria in Silicon Valley if the VLA system matches Tesla's FSD V14.2 performance by August 30, 2026 [3]
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception, action execution, and language processing to enhance decision-making [7][12]
- By incorporating a reasoning chain through language models, the VLA model aims to overcome the limitations of traditional models and adapt better to complex driving environments [7][12]

Group 2: Industry Perspectives
- The industry diverges on the development paths of VLA versus world models, with companies like Li Auto and Xiaopeng favoring the VLA approach [6][12]
- Li Auto's VP, Lang Xianpeng, emphasizes the importance of real-world data in developing effective autonomous driving systems, arguing that the VLA model is superior because of its data-driven approach [8][9]
- Huawei and others are pursuing a world-model approach that aims for direct control from visual inputs, without intermediary language processing [9][10][11]

Group 3: Future Integration and Trends
- Despite differing opinions, VLA and world models are not mutually exclusive and may increasingly integrate as both technologies evolve [12][17]
- Autonomous driving technology is expected to see further iteration and to stabilize by 2028, with a potential convergence of VLA and world-model methodologies [17]