Unitree's Wang Xingxing Makes a "Provocative Claim": What Can Intelligent Driving Learn from It?

Core Viewpoint

- Current embodied AI models, particularly the VLA (vision-language-action) model, are seen as inadequate for large-scale application in robotics. More advanced model architectures are needed, along with a shift towards video generation models for better efficiency [1][10][13].

Summary by Sections

Key Bottlenecks

- The primary reason robots have not reached large-scale application is not hardware performance or cost, but the immaturity of embodied AI models [4].
- Current robot hardware is sufficient for basic functions, but the AI models have not yet reached the critical threshold needed for a breakthrough [6].
- The industry is overly focused on data and neglects the fundamental issues with the models themselves [6][8].

New Technology Directions

- Video generation models are proposed as a more promising direction than the VLA model: they can simulate robot actions in video form, and the generated video can then be translated into control signals for real robots [13].
- However, current video generation models focus too heavily on video quality, which leads to high GPU consumption that may not be necessary for robotic applications [15].

Future Technological Focus

- Over the next 2-5 years, the development of embodied intelligent robots will concentrate on three main areas:
  1. Unified end-to-end intelligent robot models to enhance capabilities [16].
  2. Lower-cost, longer-lasting hardware with mass manufacturing capabilities [16].
  3. Low-cost, large-scale distributed computing networks to support robotic operations [16].

Market Expectations

- There is a belief that once robots can operate at large scale, they could be provided to users free of charge, since the value they create could be taxed instead [17].