Core Insights
- The article discusses the launch of MiMo-Embodied, the world's first open-source model that integrates embodied intelligence and autonomous driving, developed by Xiaomi's MiMo team [2][6][8].

Group 1: Model Overview
- MiMo-Embodied is a unified multimodal foundation model that successfully merges the fields of autonomous driving and embodied AI [6][8].
- The model achieved state-of-the-art (SOTA) performance across 29 benchmarks in tasks related to planning, spatial understanding, environmental perception, and driving [8][25].

Group 2: Challenges Addressed
- Previous models in the embodied and autonomous driving domains lacked a unified approach, limiting their ability to interact effectively with dynamic environments [10][12].
- The absence of a comprehensive evaluation system for cross-embodied capabilities hindered the assessment of models' performance across both fields [13][14].

Group 3: Data and Training Strategy
- MiMo-Embodied utilizes a high-quality dataset that encompasses general visual understanding, embodied tasks, and driving scenarios, employing a progressive four-stage training strategy [19][21].
- The training strategy includes phases for embodied AI supervision, autonomous driving supervision, chain-of-thought reasoning, and reinforcement learning [23][24].

Group 4: Experimental Results
- Quantitative evaluations showed MiMo-Embodied's competitive results in affordance prediction, task planning, and spatial understanding, outperforming both general multimodal models and specialized embodied models [28][29].
- In autonomous driving capabilities, the model demonstrated strong performance in perception, prediction, and planning across various benchmark tests [30][31].

Group 5: Real-World Applications
- The model's qualitative assessments highlighted its effectiveness in complex interactive environments, particularly in embodied navigation and operational tasks [32][34].
- MiMo-Embodied excelled in handling diverse driving scenarios, including intersection turns, lane changes, and obstacle avoidance, showcasing its robust decision-making capabilities [38][41].
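
The progressive four-stage training strategy described under Group 3 can be pictured as a simple sequential curriculum in which each stage starts from the checkpoint left by the previous one. The sketch below is only an illustration under that assumption: the stage order follows the summary, but the `Stage` class, the stage identifiers, `run_curriculum`, and `fake_train` are hypothetical names invented here and are not taken from the MiMo-Embodied release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str  # hypothetical identifier, not from the MiMo-Embodied code
    adds: str  # what the stage introduces, paraphrased from the summary


# Stage order follows the summary; all names are illustrative assumptions.
STAGES = [
    Stage("embodied_ai_sft", "embodied AI supervision"),
    Stage("autonomous_driving_sft", "autonomous driving supervision"),
    Stage("cot_reasoning_sft", "chain-of-thought reasoning"),
    Stage("rl", "reinforcement learning"),
]


def run_curriculum(stages, train_one_stage, checkpoint=None):
    """Run the stages in order, each starting from the previous checkpoint."""
    for stage in stages:
        checkpoint = train_one_stage(stage, checkpoint)
    return checkpoint


def fake_train(stage, checkpoint):
    # Placeholder trainer: records the stage history instead of training.
    return (checkpoint or []) + [stage.name]


history = run_curriculum(STAGES, fake_train)
print(history)
```

Replacing `fake_train` with a real training routine would keep the same control flow: the only structural commitment in this sketch is that later stages build on the weights produced by earlier ones.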
Luo Fuli's (罗福莉) first result at Xiaomi: an open-source embodied foundation model