VLA Models
From World Models to VLA to Reinforcement Learning: How Embodied Brain and Cerebellum Algorithms Actually Work
具身智能之心· 2025-10-26 04:02
Core Insights
- The article discusses the evolution and current state of embodied intelligence, focusing on the roles of the brain and cerebellum in robotics, where the brain handles perception and planning, while the cerebellum is responsible for execution [3][10].
Technical Evolution
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection, moving to behavior cloning, and now advancing to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but faced challenges in generalization and error accumulation [8].
- The third stage, marked by the introduction of diffusion policy, improved stability and generalization by modeling action sequences [8].
- The fourth stage, emerging in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance robots' predictive and interactive capabilities [9][10].
Current Trends and Applications
- The integration of VLA with reinforcement learning enhances robots' trial-and-error learning and self-improvement abilities, while the combination with world models allows for future prediction and better planning [10].
- The article highlights the growing demand for embodied intelligence applications across various sectors, including industrial, home, restaurant, and medical rehabilitation, leading to increased job opportunities and research interest in the field [10].
Educational Initiatives
- The article outlines a structured learning program aimed at equipping individuals with comprehensive knowledge of embodied intelligence algorithms, including practical applications and real-world projects [11][14].
- The course targets individuals with a foundational understanding of embodied intelligence and aims to bridge the gap between theoretical knowledge and practical deployment [18][24].
The Trillion-Scale Robotics Race: Unitree or Figure, Which One Represents the Future?
36Kr· 2025-10-20 09:26
Core Viewpoint
- The humanoid robot industry is entering a commercialization phase, with investment focus shifting from general technology to practical application scenarios. Companies with production capabilities and self-sustaining business models are favored, while others struggle to secure funding and market presence [1][3][4].
Group 1: Industry Dynamics
- This year has seen a significant increase in orders for humanoid robot companies, with notable contracts such as UBTECH's Walker series securing nearly 500 million yuan in contracts, and the G2 from AgiBot (Zhiyuan Robotics) receiving several hundred million yuan in orders [3][4].
- Unitree (Yushu Technology) leads the industry with 25 public procurement projects this year, nearing its total for 2024, and has been recognized as a standard equipment provider in many projects [3][4].
- Despite Unitree's leadership, there are growing concerns about its technological advancements, particularly in AI and robotics, compared to competitors like Figure AI, which recently achieved a post-financing valuation of 39 billion USD [4][8].
Group 2: Competitive Landscape
- Figure AI's third-generation humanoid robot, Figure 03, has been highlighted for its design and potential for mass production, boasting a production capacity of 100,000 units annually [8][9].
- The industry faces skepticism regarding the actual capabilities of humanoid robots, with many companies, including Figure, criticized for overpromising and underdelivering on their technological advancements [11][13].
- The market is characterized by a lack of standardized applications, making it difficult for humanoid robots to achieve widespread commercial viability [20][21].
Group 3: Research and Development
- Unitree's R&D spending over the past three years totals approximately 350 million yuan, with a significant portion allocated to hardware rather than algorithm development, raising concerns about its competitive edge in AI [5][7].
- The company has introduced its own world model architecture, but it is seen as lagging behind current mainstream models, which may hinder its ability to lead the industry [7][8].
- The humanoid robot sector is still in the experimental phase, with many products not yet achieving stable operational status or generating significant commercial value [22][23].
UC Berkeley Expert Warns: Humans Have Only Five Years of Work Left
36Kr· 2025-10-11 10:18
Core Insights
- A five-year countdown has begun for robots to enter the real world, taking over not just household tasks but also roles in factories, warehouses, and data centers, marking the start of a significant revolution with the activation of a "self-evolution flywheel" [1][2][21].
Group 1: Predictions and Implications
- Sergey Levine predicts that by 2030, robots will be able to independently manage entire households, functioning like domestic helpers [2][3].
- The "self-evolution flywheel" is seen as a signal that household tasks are just the beginning, with larger impacts expected in blue-collar economies and manufacturing [2][21].
- The transition from demonstration to real-world application is supported by advancements in Robot Foundation Models and practical feedback [4][16].
Group 2: Technological Advancements
- The π0.5 model has enabled robots to perform complex household tasks in previously unseen environments, showcasing their operational capabilities [4][10].
- The VLA (Vision-Language-Action) model is crucial for enabling robots to process continuous actions and adapt to real-world tasks, moving beyond simple hard-coded instructions [17][20].
- Robots have demonstrated emergent capabilities, such as adapting their actions based on real-time feedback, which enhances their learning and operational efficiency [20].
Group 3: Economic Impact
- The cost of robots has decreased by over 50% in the past 30 years, making automation more accessible and efficient, particularly in repetitive tasks [24][30].
- The integration of robots into various sectors, including manufacturing and warehousing, is expected to significantly alter labor markets and economic structures [35].
- The partnership between humans and robots in the short term will yield substantial benefits, while long-term automation may reshape labor, education, and wealth distribution [35][36].
XPeng Replaces Its Autonomous Driving Chief, NIO Overhauls Its Team: Each Has Its Own Calculations
36Kr· 2025-10-10 12:30
Core Insights
- The leadership changes in the autonomous driving divisions of XPeng Motors and NIO indicate a competitive evolution in the smart driving landscape, with both companies adjusting their strategies to enhance their technological capabilities [2][19].
Group 1: Leadership Changes
- XPeng Motors announced that Li Liyun, the former head of the autonomous driving center, will be replaced by Liu Xianming, who previously led the world foundation model team [1][2].
- Liu Xianming, who joined XPeng Motors over a year ago, has a background in machine learning and computer vision, having worked at Facebook and Cruise [6][8].
- NIO's autonomous driving team also experienced significant personnel changes, with multiple key executives leaving, including the head of the world model and the product lead for autonomous driving [2][19].
Group 2: Strategic Focus
- Liu Xianming's promotion reflects XPeng's commitment to advancing its world foundation model, which is crucial for achieving higher levels of autonomous driving capabilities [13][14].
- The world model developed by Liu's team has a parameter scale of 72 billion, significantly larger than current mainstream VLA models, and is designed to enhance the vehicle's understanding of complex environments [14][16].
- The shift in leadership at both companies suggests a strategic pivot towards different technological approaches, with XPeng focusing on the world model and NIO restructuring to improve its AI integration and delivery efficiency [17][19].
Group 3: Industry Dynamics
- The autonomous driving sector is witnessing a bifurcation in technological approaches, primarily between VLA (Vision-Language-Action) and world model architectures, with different companies aligning with one of these strategies [17][18].
- The recent changes in leadership and organizational structure across various companies indicate a new phase of competition in the smart driving field, as firms seek to establish their technological dominance [20].
These Directions in Embodied Intelligence Make Up the So-Called Brain and Cerebellum Algorithms
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the evolution and current trends in embodied intelligence technology, emphasizing the integration of various models and techniques to enhance robotic capabilities in real-world environments [3][10].
Group 1: Technology Development Stages
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but faced challenges in generalization and error accumulation [7].
- The third stage, marked by the introduction of diffusion policy methods, improved stability and generalization by modeling action sequences [8].
- The fourth stage, beginning in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance predictive capabilities and multi-modal perception [9][10].
Group 2: Key Technologies and Techniques
- Key technologies in embodied intelligence include VLA, diffusion policy, and reinforcement learning, which collectively enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- The integration of tactile sensing with VLA models expands the sensory capabilities of robots, allowing for more precise operations in unstructured environments [10].
Group 3: Industry Implications and Opportunities
- The advancements in embodied intelligence are leading to increased demand for engineering and system capabilities, transitioning from theoretical research to practical deployment [10][14].
- There is a growing interest in training and deploying various models, including diffusion policy and VLA, on platforms like Mujoco and IsaacGym [14].
- The industry is witnessing a surge in job opportunities and research interest, prompting many professionals to shift focus towards embodied intelligence [10].
Post-Training for Embodied VLA: TeleAI Proposes a Latent-Space-Guided Method for Cross-Embodiment VLA Generalization
具身智能之心· 2025-09-16 00:03
Core Insights
- The article discusses the challenges and solutions related to the Vision-Language-Action (VLA) models in the context of cross-embodiment adaptation, highlighting the limitations of existing models and the introduction of a new framework called "Align then Steer" (ATE) to enhance performance in post-training scenarios [1][2][10].
Group 1: Challenges in VLA Models
- Current VLA models require extensive target domain data for post-training, often needing dozens to hundreds of hours, leading to significant mismatches in action distributions between pre-training and post-training phases [1][10].
- The marginal returns of simply stacking data during post-training diminish rapidly, making it ineffective for fitting the action distribution of target scenarios [1][11].
Group 2: ATE Framework Introduction
- The ATE framework proposed by the TeleAI team aims to align action distributions in latent space, allowing for efficient adaptation of VLA models without altering their core architecture [2][10].
- ATE transitions the focus from adjusting model architecture to adjusting distributions, significantly reducing data requirements for cross-embodiment adaptation [2][15].
Group 3: ATE Framework Mechanism
- The ATE framework consists of two main phases: aligning action distributions in latent space and guiding the post-training strategy updates using a classifier [14][19].
- In the alignment phase, two small Variational Autoencoders (VAEs) are constructed to embed action data into a unified latent space, ensuring that the adapted actions closely follow the pre-trained distribution [18][19].
- The guiding phase integrates a classifier guidance function to measure the difference between generated actions and target distributions, effectively steering the model outputs towards the desired action distribution [21][22].
Group 4: Experimental Results
- The ATE algorithm demonstrated an average increase of 9.8% in multi-task success rates in simulation evaluations compared to direct post-training, with a maximum success rate gain of 32% in real-world scenarios [23][24].
- The framework showed robust performance under various conditions, including lighting changes and external disturbances, maintaining task-related focus and recovery capabilities [29][30].
Group 5: Conclusion and Future Directions
- The ATE framework provides a viable solution to the challenges of data scarcity and cross-embodiment adaptation in VLA models, allowing for efficient and robust training without the need for extensive data collection or full model retraining [30].
- This framework can serve as a plug-and-play module compatible with various mainstream VLA models, enhancing their post-training cross-embodiment generalization capabilities [30].
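The steering idea summarized under the ATE mechanism can be illustrated with a toy sketch. This is not TeleAI's implementation: it assumes the target action distribution in latent space is an isotropic Gaussian and uses the gradient of its log-density as a stand-in for the guidance function, with no VAE or trained classifier involved. The names `guidance_gradient`, `steer`, `target_mean`, `scale`, and `steps` are hypothetical and chosen only for illustration.

```python
import random

def guidance_gradient(z, target_mean, target_std):
    """Gradient of an isotropic Gaussian log-density with respect to z.

    Stand-in for a learned guidance function (assumption, not the ATE classifier).
    """
    return [-(zi - mi) / target_std ** 2 for zi, mi in zip(z, target_mean)]

def steer(z, target_mean, target_std, scale=0.1, steps=50):
    """Nudge one latent code toward the target distribution by gradient ascent."""
    z = list(z)
    for _ in range(steps):
        grad = guidance_gradient(z, target_mean, target_std)
        z = [zi + scale * gi for zi, gi in zip(z, grad)]
    return z

def distance(a, b):
    """Euclidean distance between two latent codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

random.seed(0)
target_mean = [2.0, -1.0]  # hypothetical centre of the target latent distribution
# Stand-in latents, as if produced by a policy tuned on a different embodiment
latents = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(16)]
steered = [steer(z, target_mean, target_std=1.0) for z in latents]

for z, s in zip(latents[:3], steered[:3]):
    print(f"before: {distance(z, target_mean):.3f}  after: {distance(s, target_mean):.3f}")
```

With a unit standard deviation, each step shrinks the offset from `target_mean` by a factor of `1 - scale`, so the guidance strength is controlled jointly by `scale` and `steps`. A real system would apply such a gradient to the policy's generation process rather than to finished samples.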
Course Now Open! A Hands-On Tutorial on Embodied Brain and Cerebellum Algorithms
具身智能之心· 2025-09-15 00:04
Core Insights
- The exploration towards Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on the interaction and adaptation of intelligent agents within physical environments [1][3].
- The development of embodied intelligence technology has evolved through various stages, from low-level perception to high-level task understanding and generalization [6][14].
Industry Analysis
- In the past two years, numerous star teams in the field of embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, transitioning from laboratories to commercial and industrial applications [3].
- Major domestic companies like Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive ecosystem for embodied intelligence, while international players like Tesla and investment firms support advancements in autonomous driving and warehouse robotics [5].
Technological Evolution
- The evolution of embodied intelligence technology has progressed through several phases:
  - The first phase focused on grasp pose detection, which lacked the ability to model task context and action sequences [6].
  - The second phase introduced behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and performance in multi-target scenarios [6].
  - The third phase, emerging in 2023, utilized Diffusion Policy methods to enhance stability and generalization by modeling action trajectories [6][7].
  - The fourth phase, starting in 2025, explores the integration of VLA models with reinforcement learning and tactile sensing to overcome current limitations [9][11][12].
Educational Initiatives
- The demand for engineering and system capabilities in embodied intelligence is increasing as the industry shifts from research to deployment, necessitating higher engineering skills [17].
- A comprehensive curriculum has been developed to cover various aspects of embodied intelligence, including practical applications and advanced topics, aimed at both beginners and advanced learners [14][20].
Self-Developed Driving Chips at NIO, XPeng, and Li Auto: Who Is Falling Behind, Who Is Wavering, and Who Is Striding Ahead?
雷峰网· 2025-09-05 12:49
Core Viewpoint
- The delay of Nvidia's Thor chip has made the external chip supply chain uncertain and expensive, highlighting the importance of self-developed chips to reduce costs and enhance the technological narrative of companies [1][35].
Group 1: Industry Overview
- The arms race for computing power in smart driving began in 2021 with the launch of Nvidia's Orin-X chip, which boasts 254 TOPS, outperforming Mobileye's EyeQ5H and Tesla's HW3.0 [2].
- The narrative logic among car manufacturers emphasizes the importance of self-developed chips as a crucial aspect of their strategy [3].
Group 2: NIO's Chip Development
- NIO was the first to propose self-developed chips, with founder Li Bin initiating the "chip-making" project in 2020 despite skepticism [6][7].
- NIO's chip team, led by architect Zhang Danyu, has grown to around 400 members, with significant investment in R&D, totaling approximately 41.9 billion RMB from 2021 to 2024 [10][11].
- NIO aims to control the R&D process and reduce supply chain risks by fully self-developing core technologies [11][12].
Group 3: XPeng's Chip Strategy
- XPeng has been aggressive in its chip development, launching the self-developed Turing AI chip for its P7 model, but faced challenges with internal collaboration between the chip and algorithm teams [19][20].
- The company initially outsourced chip design but shifted to full self-development due to delays from partners [20][24].
- XPeng's second-generation chip is under development, targeting a 5nm process, with ambitions to integrate advanced AI models into its vehicles [27][28].
Group 4: Li Auto's Approach
- Li Auto started its chip development later than its competitors but faces fewer internal obstacles due to a unified leadership vision [29][30].
- The company has begun developing its second-generation chip, focusing on integrating its operating system and chip development under a single department [31][34].
- Li Auto aims to leverage its organizational structure to enhance collaboration between its algorithm and chip teams, which could lead to improved efficiency [35].
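Today | Physical Intelligence Industry and Capital Summit: L3 Advanced Intelligent Driving Session and Release of the VLA Model Industry White Paper and Industry Map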
Core Insights
- The article discusses the rapid development of large models and their integration into intelligent driving, highlighting the growing consensus in the industry regarding the commercial viability of L3 level intelligent driving by 2025 [1][2].
- The introduction of the Vision-Language-Action (VLA) model is expected to create a comprehensive cognitive framework similar to that of human drivers, influencing the landscape of intelligent driving and embodied intelligence, while presenting significant market and capital opportunities [1][2].
Group 1: Key Presentations and Insights
- The event featured speeches from industry leaders, including Chen Zhongyi from Guotai Junan Securities and Wu Heng from SAIC Group, emphasizing the importance of L3 intelligent driving and embodied intelligence [3].
- Zhu Feng, Chief Analyst at Guotai Junan, presented on the VLA model as a key to achieving L3 intelligent driving [3].
- Yuan Yuji from Momenta discussed the company's data-driven approach and dual-product strategy for scalable autonomous driving solutions, including mass production of assisted driving and Robotaxi applications [4].
Group 2: Technological Innovations
- He Yihan from Che Lian Tian Xia highlighted the evolution of intelligent cockpit large models, focusing on redefining vehicle experiences through AI and optimizing multiple large language models for practical applications [5].
- Liu Bin from Juefei Technology emphasized the importance of data loops in driving high-quality development in intelligent driving, providing customized data engines and services [5].
- Zhou Enze from Al-Link showcased innovations in automotive intelligent cockpit technology, significantly reducing development costs for car manufacturers while enhancing user experience [6].
Group 3: Industry Trends and Future Directions
- Wang Panqu from Zero One Automotive discussed the transition to intelligent heavy trucks under the VLA framework, aiming to become a leading global transportation robotics company through vertical integration and innovative design [6].
- A roundtable discussion involving industry, investment, and banking experts was held to explore the intersection of these sectors and their implications for the future of intelligent driving [6].
Li Auto-W (02015): Turnaround Bet Rides on i6 Performance; Operational Optimization and VLA Advantages Still to Take Effect
KAIYUAN SECURITIES· 2025-08-31 10:47
Investment Rating
- The investment rating for the company is "Buy" (maintained) [1][10].
Core Views
- The report indicates that the performance of the i6 model is crucial for reversing sales trends, with a focus on operational optimization and leveraging VLA advantages [3][4].
- Revenue forecasts for 2025-2027 have been revised downwards due to anticipated challenges in the electric vehicle market, with expected revenues of RMB 120.9 billion, 154.4 billion, and 182.8 billion respectively, reflecting year-on-year growth rates of -16.3%, 27.8%, and 18.3% [3][5].
- Non-GAAP net profit estimates for the same period have also been reduced to RMB 5.9 billion, 9.4 billion, and 13.5 billion, with corresponding year-on-year growth rates of -44.7%, 60.0%, and 43.0% [3][5].
Financial Summary and Valuation Metrics
- The company's revenue for Q2 2025 was RMB 30.25 billion, a year-on-year decline of 5% but a quarter-on-quarter increase of 17%, with vehicle deliveries slightly exceeding revised guidance at 111,000 units [3][4].
- The average selling price (ASP) decreased by RMB 6,000 to RMB 260,000 due to financial incentives and sales promotions [3].
- Gross margin for Q2 2025 fell to 20.1%, with automotive gross margin at 19.4% and service gross margin at 33.5% [3][5].
- The company is guiding for Q3 2025 revenues between RMB 24.8 billion and 26.2 billion, with deliveries expected to be between 90,000 and 95,000 units [4].
- The report highlights that the company's market capitalization corresponds to price-to-sales (PS) ratios of 1.5, 1.2, and 1.0 for 2025-2027, and price-to-earnings (PE) ratios of 30.5, 19.3, and 13.6 for the same period [3][5].
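The growth rates quoted in the forecast can be sanity-checked with a few lines of arithmetic. The sketch below recomputes year-on-year growth from the rounded revenue and non-GAAP net profit figures given in the summary; the currency is assumed to be RMB, and the helper name `yoy_growth` is hypothetical. Because the inputs are rounded to one decimal place, the recomputed rates should be expected to match the report's figures only to within about one percentage point.

```python
# Forecast figures as quoted in the summary (billions; currency assumed to be RMB)
revenue = {2025: 120.9, 2026: 154.4, 2027: 182.8}
net_profit = {2025: 5.9, 2026: 9.4, 2027: 13.5}

def yoy_growth(series):
    """Return {year: percent growth over the previous year} for consecutive years."""
    years = sorted(series)
    return {
        year: round((series[year] / series[prev] - 1) * 100, 1)
        for prev, year in zip(years, years[1:])
    }

revenue_growth = yoy_growth(revenue)
profit_growth = yoy_growth(net_profit)

print("Revenue growth (%):", revenue_growth)
print("Non-GAAP net profit growth (%):", profit_growth)
```

The 2025 growth rates (-16.3% and -44.7%) cannot be recomputed this way, since the summary does not quote the 2024 base figures.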