End-to-End VLA Paradigm

In Conversation with Chen Jianyu (陈建宇) of Robot Era (星动纪元): The Open Road and the Long Journey of Humanoid Robots
Huan Qiu Wang Zi Xun · 2025-08-12 10:01
Core Insights
- The robotics industry is converging on the "end-to-end" VLA (Vision-Language-Action) paradigm, which is becoming the foundational technology for embodied intelligence [1][2].

VLA Paradigm
- The VLA paradigm is defined as a complete closed loop of perception (Vision), understanding (Language), and action (Action) that allows robots to perform tasks in the physical world [2].
- The recent focus on "world models" is seen as an important evolution within the VLA framework, aimed at improving robots' precision, generalization, and cognitive abilities [2].

Efficiency and Collaboration
- Humanoid robots still lag behind human efficiency, but there is optimism: some industrial applications have reached over 70% of human efficiency, and 90% is expected next year [3].
- The end-to-end architecture enables real-time feedback and control, removing the delays between the separate recognition, planning, and execution phases of traditional pipelines, which is crucial for efficiency gains [3].
- Deep collaboration between software and hardware is emphasized, with a focus on self-developed dexterous hands that have reached stable mass production at significantly reduced cost [3].

Application Pathway
- The path to killer applications for humanoid robots starts with B-end (business) applications before moving to household applications, with industrial scenarios serving as a necessary phase for technology validation and data accumulation [4].
- The next five years are predicted to be a critical window for the explosion of household robots, with simple forms expected to become widespread and high-net-worth families potentially being the first to adopt general-purpose humanoid robots [4].

Ecosystem Development
- The company advocates a "software defines hardware" approach: models can adapt to different hardware, but hardware sets the upper limit of model capabilities [5].
- Open source is highlighted as a strategic choice, with the company's humanoid robot reinforcement learning framework "Humanoid Gym" and generative large model "VPP" gaining significant attention in the community [5].
- The company believes in ecosystem co-prosperity: improvements that others make to its work will ultimately benefit the company as well [5].

Future Aspirations
- The company continues to strive for world-class achievements, and the founder says with humility that it has not yet reached the standards it has set [6].