Samsung's Latest MoSE: An MoE Built for Autonomous Driving Corner Cases, Straight to SOTA!

Core Insights - The article presents MoSE (Skill-by-Skill Mixture-of-Expert), a framework that improves the reasoning of small-scale vision-language models (VLMs) on autonomous driving tasks by imitating how humans learn [2][10][46].
Group 1: MoSE Framework Overview - MoSE is inspired by the way human drivers learn, acquiring driving tasks skill by skill and step by step [2][10]. - The framework uses a skill-centric routing mechanism that lets the model identify and learn the specific driving skills each scenario requires [12][14]. - MoSE achieves state-of-the-art performance on corner-case (extreme) driving scenarios while activating at least 62.5% fewer parameters than existing methods [10][35].
Group 2: Technical Implementation - The model builds on a hierarchical skill dataset and pre-trains its routers to encourage step-by-step reasoning, in line with human-like multi-step planning [2][8]. - MoSE uses a sparse mixture-of-experts (MoE) configuration in which only a portion of the model's parameters is activated during inference, which improves computational efficiency [7][21]. - The framework was evaluated on the CODA dataset, which focuses on multi-modal corner-case driving situations, and outperformed larger models [26][32].
Group 3: Experimental Results - In experiments, MoSE outperformed several state-of-the-art models with more than 8 billion parameters while using fewer than 3 billion sparsely activated parameters [35]. - MoSE remains robust even with a limited amount of training data, indicating that it uses the available data efficiently [42][44]. - Performance improves steadily as the training set grows, showing that the approach scales and adapts to different datasets and tasks [40][46].
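To make the sparse MoE idea under Technical Implementation concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not MoSE's actual implementation: the function names (`route_top_k`, `moe_forward`), the toy scaling experts, and the hand-written router logits are all hypothetical, and the skill-centric router pre-training described in the article is not modeled. It only illustrates that the router scores every expert but just the k best-scoring experts are evaluated, so most parameters stay inactive for a given input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts; the rest are skipped."""
    selected = route_top_k(router_logits, k)
    out = [0.0] * len(x)
    for idx, weight in selected:
        y = experts[idx](x)  # only k experts are ever called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in selected]


def make_expert(scale):
    """Hypothetical toy expert: scales its input by a constant."""
    return lambda x: [scale * v for v in x]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
# Assumed router scores; in a real model a learned router produces these.
router_logits = [0.1, 2.0, -1.0, 2.0]
output, used = moe_forward([1.0, -1.0], experts, router_logits, k=2)
print(used, output)
```

With four experts and k=2, half of the expert parameters are untouched for this input. In a real MoE layer the same selection is made per token inside each layer, which is where the savings in activated parameters come from.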
Group 4: Future Directions - The article notes that further research is needed on MoSE's applicability to trajectory estimation tasks and on integrating it with closed-loop evaluation in simulation environments [48]. - It also highlights MoSE's potential to be adapted to diverse downstream tasks and pre-trained models, a promising direction for autonomous driving technology [48].