Questioning the VLA Model and Calling AI Wholly Inadequate? A Practitioner Responds from Afar to Unitree's Wang Xingxing
Di Yi Cai Jing (第一财经) · 2025-08-11 14:51
Core Viewpoint
- The article discusses the skepticism of Wang Xingxing, CEO of Unitree (Yushu), toward the VLA (Vision-Language-Action) model. He suggests that the robotics industry is overly focused on data while the AI behind embodied intelligence remains insufficient [3][4].

Group 1: Challenges in Robotics
- The traditional robotics industry faces three core challenges: perception limitations, decision-making gaps, and generalization bottlenecks [6][7].
- Current robots often rely on preset rules to execute tasks, which makes it hard for them to understand complex, dynamic environments [6].
- When switching between tasks, traditional robots frequently need human intervention for reprogramming or strategy adjustments [6].
- Robots need extensive retraining and debugging when confronted with new tasks or scenarios [6].

Group 2: Need for Model Reconstruction
- There is a call within the industry to reconstruct the VLA model and seek new paradigms for embodied intelligence [5][7].
- Jiang Lei emphasizes the need for a complete system that integrates hardware and software, rather than reliance on large language models alone [6].
- The current research landscape is fragmented: large language model researchers focus solely on language, while edge-intelligence researchers concentrate on smaller models [6].

Group 3: Future Directions
- Jiang Lei proposes exploring collaboration between cloud and edge computing to create a comprehensive deployment architecture for humanoid robots [6].
- The ideal "brain" model for humanoid robots should have full-parameter capabilities, while the "cerebellum" (small brain) model deployed on the robot itself must achieve breakthroughs in size and real-time performance [6].
- The industry is optimistic that humanoid robots will become a significant sector, and this year is being called the year of mass production for humanoid robots [7].
Questioning the VLA Model and Calling AI Wholly Inadequate? A Practitioner Responds from Afar to Unitree's Wang Xingxing
Di Yi Cai Jing · 2025-08-11 11:33
Core Viewpoint
- Traditional humanoid robots face three core challenges: perception limitations, decision-making gaps, and generalization bottlenecks [5].

Group 1: Industry Challenges
- The industry cannot yet use full-parameter models effectively, which points to a need for deeper collaboration among the robot's brain, cerebellum, and limbs [2].
- Traditional robots often rely on preset rules to execute tasks, which makes it hard for them to adapt to complex, dynamic environments [5].
- Robots require manual intervention for reprogramming or strategy adjustments when switching between tasks [5].

Group 2: Perspectives on VLA Model
- The VLA (Vision-Language-Action) model is seen as a controversial yet pivotal paradigm for humanoid robot motion control, and many in the industry are betting on its potential [4].
- OpenVLA, a 7-billion-parameter model built on the Llama 2 language model, is cited as an example of a smaller-scale model that still struggles to make effective use of large language models [4].
- There is a call for the industry to explore how computing power can be distributed collaboratively between the cloud and edge devices to create a comprehensive deployment architecture [4].

Group 3: Future Directions
- The ideal "brain" model for humanoid robots should be more than a large language model: it should be a complete system that deeply integrates hardware and software [4].
- The industry is encouraged to rethink the VLA model and seek new paradigms, potentially using biomimicry to develop original foundation models for embodied intelligence [6].
- Confidence in the humanoid robot industry is growing. Many believe it will become a significant sector and see this year as a potential turning point for mass production [6].