Can Your Model Really Generalize from One Example? RoboChallenge Table30 V2 Officially Launches, Opening the Era of Generalization
机器人大讲堂 (Robot Lecture Hall) · 2026-03-25 12:54
Core Insights
- The article discusses the challenges faced by embodied intelligence models, particularly their failure to perform in real-world scenarios despite success in simulation environments [1][2]
- It highlights the launch of RoboChallenge's Table30 V2, a new evaluation platform aimed at addressing the limitations of existing assessment frameworks by incorporating real-world task complexity and generalization capabilities [4][5]

Evaluation Framework
- Existing evaluation systems are criticized for being easily manipulated, leading to models that excel in controlled environments but fail in real-world applications [6][7]
- Table30 V2 aims to break this barrier by introducing a comprehensive assessment that includes a wider range of tasks and real-world scenarios [7]

Task Upgrades
- The new platform expands the task set to 30, including 18 new complex tasks that challenge models in areas where they previously struggled, such as soft object manipulation and tool usage [9][11]
- Tasks are designed to test models' real-time perception and adaptive control, pushing the limits of spatial reasoning and causal understanding [11][12]

Evaluation Upgrades
- Table30 V2 enforces a multi-task paradigm, requiring participants to submit a single model capable of general understanding, thus preventing task-specific optimizations [13][15]
- The introduction of zero-shot testing requires models to handle unseen objects and environments, emphasizing the need for true generalization rather than memorization [15][16]

System Upgrades
- The platform has achieved a threefold increase in throughput compared to previous versions, allowing for faster testing and feedback cycles [16]
- New metrics, such as Time to Complete, are introduced to align evaluation results with real-world deployment needs, emphasizing efficiency alongside success rates [16]

Performance Data
- The latest leaderboard shows that the best-performing model, DM0, has a success rate of 62%, while many models struggle with complex tasks, indicating a significant gap in understanding and execution capabilities [18][19]
- The report reveals that while models can understand semantic instructions, their performance in fine manipulation tasks remains critically low, highlighting a disconnect between comprehension and execution [19]

Industry Collaboration
- RoboChallenge represents a collaborative effort among various organizations and institutions to standardize and normalize embodied intelligence evaluations [20]
- The platform has attracted a diverse international user base, indicating a growing community focused on advancing embodied intelligence research [20]

Upcoming Events
- The preview of Table30 V2 will debut at the RoboChallenge CVPR 2026 Workshop, marking a significant event in the field of embodied intelligence [21][23]
- Key dates include the competition registration deadline on April 25 and the final competition on May 15, 2026 [23]
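
The Evaluation Upgrades and System Upgrades points above pair success rate with Time to Complete for a single multi-task model. The article does not say how RoboChallenge computes or aggregates these numbers, so the following is only a rough sketch of one plausible way to summarize trial logs. The `Trial` record, its field names, the task names, and the unweighted averaging across tasks are all assumptions made for illustration, not the platform's actual scoring.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """One evaluation rollout. All field names here are hypothetical."""
    task: str
    success: bool
    seconds: Optional[float]  # time to finish the task; None if the trial failed


def summarize(trials: list[Trial]) -> dict[str, dict[str, Optional[float]]]:
    """Per-task success rate and mean time to complete (successful trials only)."""
    by_task: dict[str, list[Trial]] = {}
    for t in trials:
        by_task.setdefault(t.task, []).append(t)

    summary: dict[str, dict[str, Optional[float]]] = {}
    for task, rows in by_task.items():
        times = [r.seconds for r in rows if r.success and r.seconds is not None]
        summary[task] = {
            "success_rate": sum(r.success for r in rows) / len(rows),
            "mean_time_to_complete": mean(times) if times else None,
        }
    return summary


def overall_success_rate(summary: dict[str, dict[str, Optional[float]]]) -> float:
    """Unweighted mean of per-task success rates (an assumed aggregation)."""
    return mean(s["success_rate"] for s in summary.values())


# Made-up trials for two illustrative task names.
trials = [
    Trial("fold_towel", True, 40.0),
    Trial("fold_towel", False, None),
    Trial("use_tool", True, 20.0),
    Trial("use_tool", True, 30.0),
]
summary = summarize(trials)
print(summary)
print(overall_success_rate(summary))
```

Averaging per task before averaging overall keeps a task with many trials from dominating the score. Reporting time only over successful trials keeps the efficiency number from being skewed by failures that time out. Both choices are design assumptions of this sketch.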