Core Insights
- Xiaomi's founder and CEO Lei Jun announced that multiple research achievements from the Xiaomi team have been selected for ICLR 2026, covering areas such as multimodal reasoning, reinforcement learning, GUI agents, end-to-end autonomous driving, and audio generation [1][3]

Group 1: Research Achievements
- The paper "Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle" addresses inefficiencies in existing reinforcement learning training, particularly Advantage Collapsing and Rollout Silencing, which hinder long-term optimization [4]
- Shuffle-R1 proposes a streamlined reinforcement learning framework that significantly improves training efficiency through two core designs, Pairwise Trajectory Sampling and Advantage-based Batch Shuffle, yielding higher-quality gradient signals and greater exposure of valuable trajectories [4]
- Experimental results indicate that Shuffle-R1 consistently outperforms various reinforcement learning baselines with minimal computational overhead [4]

Group 2: Mobile Agents and GUI
- The paper "MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning" introduces a framework to improve the reasoning and planning capabilities of mobile GUI agents, addressing challenges such as the scarcity of high-quality CoaT trajectories and the limitations of existing self-training methods [7][8]
- MobileIPL employs Thinking-level DPO and Instruction Evolution to strengthen process supervision and broaden task distribution, achieving state-of-the-art performance on mainstream GUI-agent benchmarks [8][10]
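The two Shuffle-R1 designs named above can be illustrated in a few lines. The sketch below is a minimal interpretation under stated assumptions: the function names, the per-prompt data layout, and the best-vs-worst pairing heuristic are all illustrative guesses, not the paper's exact procedure.

```python
def pairwise_trajectory_sampling(trajectories):
    """Sketch of Pairwise Trajectory Sampling: for each prompt, keep only the
    highest- and lowest-advantage rollouts so each retained pair carries a
    strong contrastive learning signal.

    `trajectories` maps a prompt id to a list of (rollout, advantage) tuples
    (a hypothetical layout assumed here for illustration)."""
    pairs = []
    for prompt_id, rollouts in trajectories.items():
        ranked = sorted(rollouts, key=lambda r: r[1])
        pairs.append((ranked[-1], ranked[0]))  # (best, worst) per prompt
    return pairs

def advantage_based_batch_shuffle(batch):
    """Sketch of Advantage-based Batch Shuffle: reorder the batch so that
    pairs with the largest total |advantage| come first, exposing the most
    informative trajectories early in each update."""
    return sorted(batch,
                  key=lambda pair: abs(pair[0][1]) + abs(pair[1][1]),
                  reverse=True)
```

For example, a prompt whose rollouts have advantages {-0.5, 0.1, 0.9} contributes the (0.9, -0.5) pair, and that pair is then ordered ahead of a flatter (0.2, 0.0) pair from another prompt.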
Group 3: Language Models
- "FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation" presents a modular reasoning framework for small language models (SLMs) that improves their performance on complex tasks without additional training or added parameters [12][13]
- FutureMind extracts advanced cognitive abilities from large language models (LLMs) through adaptive knowledge distillation, creating a dynamic reasoning pipeline that significantly improves reasoning efficiency and retrieval accuracy [12][13]

Group 4: Multimodal Reasoning
- The paper "ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding" proposes a framework that transfers mature textual reasoning capabilities to multimodal scenarios without costly model fine-tuning [16][17]
- ThinkOmni includes components such as LRM-as-a-Guide and Stepwise Contrastive Scaling, which balance perception and reasoning signals, and demonstrates consistent improvements across multiple multimodal reasoning benchmarks [17]

Group 5: Audio Generation
- "Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation" introduces a two-stage audio generation framework that combines Flow Matching pre-training with lightweight GAN fine-tuning for efficient audio generation [23][24]
- The framework strengthens audio modeling by accounting for the unique properties of audio signals and generates high-fidelity audio with better computational efficiency than existing methods [24]
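Guidance decoding of the kind Group 4 describes is commonly realized as a per-step combination of token logits from two models. The sketch below assumes the text-reasoning guide and the omni-modal base model share a vocabulary, and uses a linear contrastive interpolation with a decaying step-wise scale; both the formula and every name here are illustrative assumptions, not ThinkOmni's actual method.

```python
def contrastive_guidance_logits(base_logits, guide_logits, scale):
    """Hypothetical contrastive guidance step: push the omni-modal model's
    next-token distribution toward the text reasoning guide ("LRM-as-a-Guide"
    in the summary) by a scaled logit difference."""
    return [b + scale * (g - b) for b, g in zip(base_logits, guide_logits)]

def stepwise_scale(step, start=1.5, end=0.5, total_steps=100):
    """Hypothetical stepwise schedule: strong reasoning guidance early in the
    generation, weaker later so the base model's perception signals dominate,
    loosely mirroring the perception/reasoning balance the summary mentions."""
    t = min(step, total_steps) / total_steps
    return start + t * (end - start)
```

With scale 0.5, base logits [1.0, 2.0] and guide logits [3.0, 0.0] blend to [2.0, 1.0]; the schedule then shrinks that pull over the course of decoding.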
Lei Jun announces that multiple recent Xiaomi research results have been selected for ICLR 2026, a top international conference