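A New Paradigm for ADAS! Beijing Institute of Technology & Tsinghua's MMTL-UniAD: A Unified SOTA Framework for Multimodal and Multi-Task Learning (CVPR'25)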
自动驾驶之心 · 2025-06-23 11:34
Core Insights
- The article presents MMTL-UniAD, a unified framework for multimodal and multi-task learning in assistive driving perception, which aims to enhance the performance of advanced driver-assistance systems (ADAS) by simultaneously recognizing driver behavior, emotions, traffic environment, and vehicle actions [1][5][26].

Group 1: Introduction and Background
- Advanced driver-assistance systems (ADAS) have significantly improved driving safety over the past decade, yet approximately 1.35 million people die in traffic accidents annually, with over 65% of these incidents linked to abnormal driver psychological or physiological states [3].
- Current research often focuses on single tasks, such as driver behavior or emotion recognition, neglecting the inherent connections between these tasks, which limits the potential for cross-task learning [4][3].

Group 2: Framework and Methodology
- MMTL-UniAD employs a multimodal approach to achieve synchronized recognition of driver behavior, emotions, traffic environment, and vehicle actions, addressing the challenge of negative transfer in multi-task learning [5][26].
- The framework incorporates two core components: a multi-axis region attention network (MARNet) and a dual-branch multimodal embedding module, which effectively extract task-shared and task-specific features [5][26].

Group 3: Experimental Results
- MMTL-UniAD outperforms existing state-of-the-art methods across multiple tasks, achieving performance improvements of 4.10% to 12.09% in the mAcc metric on the AIDE dataset [18][26].
- The framework demonstrates superior accuracy in driver behavior recognition and vehicle behavior recognition, with increases of 4.64% and 3.62%, respectively [18][26].

Group 4: Ablation Studies
- Ablation experiments indicate that joint training of driver state tasks and traffic environment tasks enhances feature sharing, significantly improving task recognition accuracy [22][26].
- The results confirm that the interdependence of tasks in MMTL-UniAD contributes to overall performance and generalization capabilities [22][26].
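
The joint training discussed in the ablation studies can be illustrated with a generic multi-task objective. The article does not state the loss MMTL-UniAD actually uses, so the following is only a minimal sketch under an assumption: a common baseline in which each of the four tasks contributes a softmax cross-entropy term and the terms are combined as a weighted sum. The function names, task keys, weights, and logits are all hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def joint_loss(task_logits, task_labels, task_weights=None):
    """Weighted sum of per-task cross-entropy losses for one sample.

    Assumption: a plain weighted sum, which is a generic multi-task baseline
    and not necessarily the objective used by MMTL-UniAD.
    """
    weights = task_weights or {task: 1.0 for task in task_logits}
    per_task = {
        task: cross_entropy(task_logits[task], task_labels[task])
        for task in task_logits
    }
    total = sum(weights[task] * loss for task, loss in per_task.items())
    return total, per_task

# Made-up logits and labels for one sample; not taken from the paper.
toy_logits = {
    "driver_behavior": [2.0, 0.5, -1.0],
    "driver_emotion": [0.0, 0.0],
    "traffic_environment": [1.0, 3.0, 0.0],
    "vehicle_action": [0.2, 0.1, 0.4, 0.3],
}
toy_labels = {
    "driver_behavior": 0,
    "driver_emotion": 0,
    "traffic_environment": 1,
    "vehicle_action": 2,
}

total, per_task = joint_loss(toy_logits, toy_labels)
for task, loss in per_task.items():
    print(f"{task}: {loss:.4f}")
print(f"joint loss: {total:.4f}")
```

Because all four terms are backpropagated through whatever parameters the tasks share, one task's gradient can help or hurt another. That coupling is where both the feature sharing reported in the ablations and the negative transfer mentioned under Group 2 come from.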
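
For readers unfamiliar with the mAcc metric cited under Group 3, the sketch below shows one way it can be computed. It assumes mAcc is the unweighted mean of the per-task classification accuracies over the four tasks; the article does not spell out the definition. The task keys, predictions, and labels are hypothetical toy values, not results from the paper.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def mean_accuracy(per_task_outputs):
    """Return per-task accuracies and their unweighted mean (assumed mAcc)."""
    per_task = {
        task: accuracy(preds, labels)
        for task, (preds, labels) in per_task_outputs.items()
    }
    return per_task, sum(per_task.values()) / len(per_task)

# Toy (predictions, labels) pairs per task; made up for illustration only.
toy_outputs = {
    "driver_behavior": ([0, 1, 2, 1], [0, 1, 2, 2]),
    "driver_emotion": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "traffic_environment": ([2, 2, 1, 0], [2, 2, 1, 0]),
    "vehicle_action": ([0, 0, 1, 1], [0, 1, 1, 0]),
}

per_task_acc, macc = mean_accuracy(toy_outputs)
for task, acc in per_task_acc.items():
    print(f"{task}: {acc:.2%}")
print(f"mAcc: {macc:.2%}")
```

Under this definition every task counts equally regardless of how many classes it has, so a gain on a single task moves mAcc by only a quarter of that gain.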