Vision-Language-Action End-to-End Model (VLA)

The Three Major "Growing" Pains of Embodied Intelligence
21 Shi Ji Jing Ji Bao Dao · 2025-04-24 13:07
Group 1: Industry Overview
- The humanoid robot industry has progressed rapidly this year, with public interest sparked by events such as the Spring Festival Gala and the first humanoid robot half marathon [1]
- Key technologies driving humanoid robots include large language models (LLM), vision-language models (VLM), and end-to-end vision-language-action models (VLA), which improve interactive perception and generalization [1][3]
- Challenges remain in data collection, in the application of robot morphologies, and in integrating the "large brain" and "small brain" systems (high-level reasoning and low-level motion control, respectively) [1][3]

Group 2: Data Challenges
- Data scarcity is an industry bottleneck, particularly the 3D data needed to train robots to perform tasks in physical environments [3][4]
- Traditional data collection is costly and time-consuming; companies such as Zhiyuan Robotics employ extensive human resources for data gathering [4]
- 3D generative AI for Sim2Real simulation is seen as a potential way to meet the high demand for generalizable data in embodied intelligence [4]

Group 3: Technological Evolution
- Robots have evolved through three stages: industrial automation, large models, and end-to-end large models, each serving different application needs [6]
- End-to-end models integrate multimodal inputs and outputs, improving decision-making efficiency and enhancing humanoid robot capabilities [6][7]
- Experts emphasize that humanoid robots are not synonymous with embodied intelligence, but they represent both significant demand for the technology and significant challenges for it [7]

Group 4: Brain Integration Solutions
- Integrating the large and small brain systems is a focus area, with companies such as Intel and Dongtu Technology proposing solutions that reduce costs and improve software development efficiency [9][10]
- Challenges in achieving brain integration include ensuring real-time performance and managing dynamic computational loads during robot operation [10][11]
- The market is pushing for technological convergence: robots must perform tasks across varied scenarios while remaining flexible and capable of intelligent interaction [12]