Evaluating Emotional Support in Large Language Models
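DeepSeek or Gemini: which gives better emotional support? Quwan × Peking University present a dynamic evaluation based on emotional trajectories
机器之心 (Jiqizhixin) · 2025-12-07 04:33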
Core Viewpoint
- The paper "Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models", co-authored by Quwan Technology and Peking University, has been accepted to AAAI 2026. It argues that emotional support is central to human-AI interaction and that language models need a new framework for evaluating it [2][3].

Research Background
- Emotional support is a core capability in human-AI interaction, yet existing evaluations of large language models (LLMs) often rely on short, static dialogues and fail to capture the dynamic, long-term nature of emotional support [5].
- Evaluating emotional capabilities is especially important for in-house model development, because emotional support dialogue has expanded from emotion recognition and generation to broader human-centered tasks such as role-playing and casual chat [5].

Proposed Framework
- The team introduced a new evaluation framework, ETrajEval, designed to systematically assess how well LLMs provide emotional support over long-term dialogues [6].

Key Contributions
1. The framework addresses two main limitations of existing evaluation methods: the lack of long-term, dynamic interaction, and an overemphasis on model-centered response quality [8].
2. The framework adopts a user-centered perspective, focusing on the user's emotional trajectory over the course of the interaction [9].
3. Three trajectory-level metrics are proposed: Baseline Emotional Level (BEL), Emotional Trajectory Variability (ETV), and Emotional Centroid Position (ECP). Together they characterize how the user's emotional state changes over time [11].

Experimental Analysis
- The team constructed a dataset of 328 interaction environments and 1,152 disruptive events to simulate realistic emotional changes and to test how models adapt as the context evolves [14].
- Psychological theories were used to constrain model responses, encouraging supportive behaviors aligned with validated therapeutic principles [14].
- The framework was validated through extensive evaluation of leading models, which revealed significant differences in their long-term emotional support capabilities [15].

Findings
- Top open-source and closed-source models showed no significant difference in overall emotional support capability [16].
- Models designed for role-playing did not outperform general-purpose LLMs at maintaining positive emotional states [17].
- Models showed stronger long-term emotional support capabilities in English dialogues than in Chinese dialogues [17].

Visualization and Analysis
- Emotional centroid visualizations showed that models with higher BEL and ETV scores were better at guiding users toward stable, positive emotional states [21].
- Emotional trajectory visualizations indicated that models with higher ETV scores effectively helped users recover from low emotional states, confirming the team's earlier claims [22].

Conclusion
- The proposed emotional dynamic trajectory analysis framework offers a comprehensive, multidimensional evaluation of LLMs' emotional support capabilities and achieves high consistency with human evaluations [28].
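
Illustrative sketch
- The contrast between static and trajectory-level evaluation described under Research Background and Key Contributions can be made concrete with a small Python sketch. This summary does not give the paper's formulas for BEL, ETV, or ECP, so the three functions below are stand-in statistics chosen only for illustration: an average level, a mean turn-to-turn change, and a centroid over (turn index, score) points. The function names, the score range, and both example trajectories are hypothetical and are not taken from the paper.

```python
from statistics import mean

# NOTE: these are illustrative stand-ins, NOT the paper's definitions
# of BEL, ETV, or ECP, which this summary does not spell out.

def mean_level(scores):
    """Average per-turn emotion score across the whole dialogue."""
    return mean(scores)

def turn_to_turn_variability(scores):
    """Mean absolute change in score between consecutive turns."""
    return mean(abs(b - a) for a, b in zip(scores, scores[1:]))

def centroid(scores):
    """Centroid of the points (turn index, score)."""
    return (mean(range(len(scores))), mean(scores))

# Hypothetical per-turn user emotion scores in [-1, 1] (made-up data).
steady = [-0.6, -0.2, 0.2, 0.4, 0.4]
erratic = [-0.6, 0.8, -0.8, 0.6, 0.4]

for name, traj in [("steady", steady), ("erratic", erratic)]:
    print(
        name,
        "final:", traj[-1],
        "mean:", round(mean_level(traj), 2),
        "variability:", round(turn_to_turn_variability(traj), 2),
        "centroid:", centroid(traj),
    )
```

- Both made-up trajectories end on the same final score, so a snapshot of the last turn cannot tell them apart, while the turn-to-turn statistic separates the gradual recovery from the erratic one. That is the kind of difference a user-centered, trajectory-level evaluation is meant to surface.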