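Moving Beyond Vector-Based Recommendation! Ant Group Uses a Small 8B Model to Build Text-Based User Profiles That Generalize Across Tasks and Models and Achieve SOTA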

Core Insights
- The article discusses the launch of AlignXplore+, a new paradigm for personalized user modeling in large models, which represents users as text rather than with traditional vector-based methods [1][4][6].

Group 1: Paradigm Shift in Personalization
- Traditional recommendation systems rely on ID embeddings or model-specific parameters, which create a "black box" that is difficult to interpret and transfer [4][5].
- AlignXplore+ introduces a text-based user modeling approach in which user preferences can be read and interpreted directly, making the profile more transparent and adaptable [6][7].

Group 2: Key Features of AlignXplore+
- AlignXplore+ breaks data silos by processing heterogeneous data sources, allowing it to build comprehensive user profiles from fragmented digital footprints [7].
- It enables extreme transferability: user profiles can be used across various applications and models without being tied to a specific architecture [8].
- The framework adapts to real-world data noise, evolving user profiles based on past interactions without needing to reprocess all historical data [8][12].

Group 3: Framework and Methodology
- The framework consists of two main phases: SFT (Supervised Fine-Tuning) for generating high-quality training data and RL (Reinforcement Learning) for optimizing user preference summaries [10][12].
- SFT involves a "generate-validate-merge" process to ensure accurate predictions of user behavior, while RL focuses on long-term effectiveness in dynamic environments [11][13].

Group 4: Performance and Results
- AlignXplore+ achieves state-of-the-art results with an 8-billion-parameter model, outperforming larger models [14][16].
- It shows strong zero-shot transfer capabilities, with an average score of 75.10%, surpassing GPT-OSS-20B by 4.2% [17].
- The model maintains robust performance even in the absence of negative feedback, showing its resilience to imperfect data [19][20].

Group 5: Future Directions
- The team aims to explore the limits of streaming inference and how to keep text preferences both simple and comprehensive when they are updated over long periods [22].
- There is a focus on accurately mining user preferences from diverse and heterogeneous data sources in real-world scenarios [22][23].
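
The idea under Key Features that a text profile evolves from past interactions without reprocessing all historical data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from AlignXplore+: the names `TextUserProfile`, `ProfileUpdater`, `toy_keyword_updater`, and `build_profile` are hypothetical, and the updater is a plain keyword-merging function standing in for what would presumably be a language-model call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical signature: (previous profile text, new interaction batch) -> updated profile text.
# In a real system this would likely be a language-model call; here it is a plain function.
ProfileUpdater = Callable[[str, List[str]], str]


@dataclass
class TextUserProfile:
    """A user profile stored as plain text rather than as an embedding vector."""
    text: str = ""
    batches_seen: int = 0

    def update(self, new_interactions: List[str], updater: ProfileUpdater) -> None:
        # Only the previous profile text and the new batch are passed in,
        # so earlier raw interactions never need to be reprocessed.
        self.text = updater(self.text, new_interactions)
        self.batches_seen += 1


def toy_keyword_updater(previous: str, new_interactions: List[str]) -> str:
    """Stand-in for a model call: keeps a de-duplicated, ordered list of topics."""
    topics = [t for t in previous.split("; ") if t]
    for item in new_interactions:
        topic = item.strip().lower()
        if topic and topic not in topics:
            topics.append(topic)
    return "; ".join(topics)


def build_profile(stream: Iterable[List[str]], updater: ProfileUpdater) -> TextUserProfile:
    """Fold a stream of interaction batches into a single text profile."""
    profile = TextUserProfile()
    for batch in stream:
        profile.update(batch, updater)
    return profile


if __name__ == "__main__":
    stream = [
        ["Sci-fi movies", "Hiking gear"],      # e.g. signals from two different apps
        ["hiking gear", "Espresso machines"],  # a later batch with one repeated signal
    ]
    profile = build_profile(stream, toy_keyword_updater)
    print(profile.text)
```

Because the profile is ordinary text, the same string could in principle be handed to any downstream model or application as context, which is the transferability property the summary describes; how AlignXplore+ actually produces and updates its summaries is not specified here.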