Core Insights
- Jim Fan criticizes the current state of the robotics industry: hardware has advanced significantly, but software iteration, standardization, and technology direction remain chaotic [3]
- He argues that the mainstream Vision-Language-Action (VLA) model is fundamentally misaligned with the actual needs of robotics, and advocates a shift toward video world models as a more suitable alternative [3][11]

Group 1: Hardware Reliability
- Hardware reliability is identified as the biggest obstacle to software iteration. Even advanced robots such as Optimus and e-Atlas are limited by problems like overheating and motor failures [7]
- Robots cannot repair themselves, which makes the problem worse: development carries high human resource costs and iterates slowly [7]

Group 2: Lack of Industry Standards
- Benchmarking in robotics is described as a "catastrophe": there are no unified standards for hardware platforms, task definitions, or evaluation criteria [9]
- Companies often create their own benchmarks for public announcements, which leads to misleading claims of "state-of-the-art" performance [9]

Group 3: Fundamental Questions on Technology Direction
- The VLA model is questioned at a fundamental level: its parameters are primarily optimized for language and knowledge rather than for the physical world, which is what matters for robotics [11]
- Jim Fan argues that the pre-training objectives of vision-language models (VLMs) do not align with the requirements of robotics, suggesting that increasing VLM parameters will not enhance VLA performance [11]
NVIDIA's Jim Fan: Robotics is still in a state of chaos, and even its direction of development may be wrong
硬AI·2025-12-29 14:24