探究具身机器人有限泛化能力的本质原因！增强策略依然有效

Research Background and Core Issues - The development of large-scale robot datasets and high-capacity models has shown strong capabilities in various tasks, but generalization remains limited in scenarios outside the training data distribution [2] - Shortcut learning, where models rely on task-irrelevant features rather than true causal relationships, is a key factor limiting generalization [2] Dataset Diversity and Fragmentation Analysis - The OXE dataset exhibits significantly lower visual and textual diversity compared to visual/multimodal datasets, even with the latest DROID dataset aimed at increasing diversity [4] - The fragmentation of the OXE dataset is evident, with distinct separation among sub-datasets, leading to a lack of overlap and effective division into smaller datasets [8] - The limited diversity is attributed to inherent constraints in the robot data collection process [6] Theoretical Connection Between Dataset Characteristics and Shortcut Learning - A mathematical framework has been established to analyze how multiple sub-datasets lead to correlations that facilitate shortcut learning [15] - The distance between task-irrelevant features across sub-datasets significantly influences shortcut learning, with models tending to rely on visual cues rather than textual instructions [16] Experimental Validation - Experiments indicate that increasing diversity within sub-datasets and reducing differences between them can effectively reduce shortcut dependencies [18] - The introduction of a "bridge" target in experiments significantly improved out-of-distribution (OOD) success rates by breaking false correlations [28] Mitigating Shortcut Learning Through Data Augmentation - Targeted data augmentation strategies can effectively increase sub-dataset diversity and reduce distribution differences, thereby alleviating shortcut learning [29] - Perspective augmentation creates shared visual contexts between sub-datasets, breaking false correlations tied to specific tasks [30] - The results confirm that carefully selected data augmentation strategies can enhance the generalization capabilities of robot policies [34]