Core Viewpoint
- Jim Fan, head of NVIDIA's robotics business, criticizes the current state of the robotics industry: despite significant hardware advances, software iteration, standardization, and technology direction remain in disarray [1]

Group 1: Hardware Reliability
- Hardware reliability is identified as the biggest obstacle to software iteration; advanced robots such as Optimus and e-Atlas cannot be used to their full capability because of hardware limitations [3]
- Robots cannot repair themselves after damage, so they depend on extensive support from operations teams, which drives up operating costs and slows iteration [4]

Group 2: Lack of Industry Standards
- Benchmarking in robotics is described as a "catastrophe": unlike large language models, which have established benchmarks, the field has no unified standards for hardware platforms, task definitions, or evaluation criteria [5]
- Companies often define their own benchmarks for marketing purposes, which leads to misleading claims of "state-of-the-art" performance [5]

Group 3: Fundamental Questions on Mainstream Technology
- The dominant vision-language-action (VLA) approach is questioned at a fundamental level: its pre-training is misaligned with what robots physically need, emphasizing language and knowledge over physical interaction [6]
- Jim Fan advocates video world models as a more suitable pre-training objective for robot policies and suggests that current VLA models will not improve simply by adding parameters [6]

Group 4: Industry Discussion
- Jim Fan's comments sparked discussion, including skepticism about whether video world models are actually superior, since existing models such as Helix and GR00T N1 are still built on vision-language models [9]
- Jim Fan expects the next generation of models to advance in 2026, indicating that the field is still evolving [9]
NVIDIA's Jim Fan: Robotics Is Still in a State of Chaos, and Even Its Direction May Be Wrong