NeurIPS 2025 Spotlight | With Just One Demonstration, the DexFlyWheel Framework Teaches Robots to "Generate Their Own Data"

Core Insights
- The article introduces DexFlyWheel, a self-improving data generation framework that targets data scarcity in dexterous manipulation, a long-standing challenge in robotics [3][12].

Research Background
- Generating dexterous manipulation data is difficult for several reasons:
1. Traditional methods built for simple grippers fail to generalize to dexterous hands, and heuristic planning struggles with high-dimensional action optimization [7].
2. Collecting human demonstrations is expensive, which limits the scale and diversity of datasets [8].
3. Pure reinforcement learning explores inefficiently and often produces unnatural movements [9].
4. Existing datasets focus mainly on grasping, which limits their use for other fine manipulation tasks [8].
5. Trajectory replay methods yield limited data diversity, because they can only apply spatial transformations within predefined scenarios [8].

DexFlyWheel Framework
- DexFlyWheel generates diverse dexterous manipulation data from a single demonstration, reducing reliance on large pre-collected datasets [12][14].
- The framework rests on two core ideas:
1. Combining imitation learning with residual reinforcement learning, which redefines the role of demonstrations and allows learned trajectories to be transferred efficiently to new scenarios [14].
2. Establishing a self-improvement loop between data and models, so that both the data and the policy performance improve continuously [17].

Experimental Results
- The framework showed significant improvements in data generation and policy performance:
1. Data diversity grew substantially: from 1 demonstration to 500 generated trajectories, with scene variety increasing 214-fold and an average of 20 object types [27].
2. Policy generalization improved, with success rates rising from 16.5% to 81.9% on challenging test sets [28].
3. DexFlyWheel outperformed baseline methods, reaching a data generation success rate of 89.8% and producing 500 diverse trajectories in 2.4 hours, significantly faster than human demonstrations and trajectory replay methods [31].

Conclusion
- DexFlyWheel addresses the long-standing data scarcity problem in dexterous manipulation with a self-improving data generation paradigm that significantly reduces data collection cost while improving generation efficiency and diversity [39].
- The framework is positioned as an important step toward making dexterous manipulation practical in real-world scenarios and advancing the development of general-purpose robots [39].
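
The two core ideas of the framework (a residual correction on top of an imitation-learned policy, and a loop in which successful rollouts are fed back into the dataset) can be illustrated with a small toy sketch. This is not the authors' implementation: the names `compose_action`, `run_flywheel`, and `attempt_rollout`, the `scale` factor, and the clipping bounds are all hypothetical, chosen only to show the general shape of the technique under simple assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def compose_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.25,   # hypothetical: how strongly the residual may correct the base
    low: float = -1.0,     # hypothetical actuator limits
    high: float = 1.0,
) -> Vector:
    """Residual composition: final = clip(base + scale * residual).

    The base action would come from an imitation-learned policy and the
    residual from a reinforcement-learned correction; both are plain
    inputs here so the sketch stays self-contained.
    """
    return [
        min(high, max(low, b + scale * r))
        for b, r in zip(base_action, residual)
    ]


def run_flywheel(
    seed_demos: Sequence[object],
    iterations: int,
    rollouts_per_iteration: int,
    attempt_rollout: Callable[[Sequence[object]], Tuple[object, bool]],
) -> List[object]:
    """Toy data/model loop: keep successful rollouts and grow the dataset.

    `attempt_rollout` is a hypothetical stand-in for "train on the current
    dataset, then roll out in a new scene"; it returns (trajectory, success).
    """
    dataset = list(seed_demos)
    for _ in range(iterations):
        new_trajectories = []
        for _ in range(rollouts_per_iteration):
            trajectory, success = attempt_rollout(dataset)
            if success:  # only successful rollouts become training data
                new_trajectories.append(trajectory)
        dataset.extend(new_trajectories)
    return dataset
```

In this sketch a small `scale` keeps the corrected action close to the demonstrated behavior, which is one plausible reading of why a residual term helps avoid the unnatural motions attributed to pure reinforcement learning; the article itself does not give the exact formulation.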