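Abandoning Vector-Based Recommendation: Ant Group Uses an 8B Model to Build Text-Based User Profiles (用户“话”像), Generalizing Across Tasks and Models and Achieving SOTA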

Core Insights
- The article discusses the development of a new paradigm for personalized large model applications, emphasizing the importance of user-centric approaches in AI, particularly through the introduction of AlignXplore+ [1][10].

Group 1: Paradigm Shift in Personalization
- Traditional recommendation systems rely on ID embedding or specific parameters, which are often opaque and difficult to transfer across models [5][7].
- AlignXplore+ introduces a text-based user modeling approach that allows for better understanding and interpretation of user preferences, moving from a "black box" to a "white box" paradigm [1][8].
- This new method enables a universal interface for various downstream models, enhancing the ability to perform diverse tasks without being model-specific [9][11].

Group 2: Key Features of AlignXplore+
- AlignXplore+ breaks down data silos by processing heterogeneous data sources, allowing for a comprehensive understanding of user preferences from fragmented digital footprints [10].
- It achieves extreme transferability, enabling user profiles to be utilized across different applications and models, thus enhancing the versatility of AI systems [11][21].
- The framework adapts to real-world data noise, evolving user preferences without needing to reprocess all historical interactions, thereby maintaining stable reasoning capabilities [12][26].

Group 3: Technical Framework
- The framework consists of two main phases: SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning), focusing on generating high-quality training data and optimizing user preference summaries [16][18].
- SFT involves a "generate-validate-merge" process to create accurate preference summaries based on potential future interactions [17].
- The RL phase employs curriculum pruning and cumulative rewards to enhance the long-term effectiveness of user preference summaries in dynamic environments [18].
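The streaming update and the "universal interface" idea described above can be illustrated with a minimal sketch. It is not the AlignXplore+ implementation: the names `update_profile`, `stream_profiles`, `build_prompt`, and `toy_summarizer` are hypothetical, and the summarizer is a trivial stand-in for the profile-generating model. The sketch only shows the data flow: each new batch of interactions is folded into the previous text profile without re-reading older history, and the resulting plain-text profile can be placed in the prompt of any downstream model.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical type: maps (previous profile text, new interactions) to a new profile text.
Summarizer = Callable[[str, List[str]], str]


def update_profile(profile: str, new_interactions: List[str], summarize: Summarizer) -> str:
    """Fold one batch of interactions into the existing text profile.

    Only the previous profile and the new batch are read; earlier batches are not revisited.
    """
    if not new_interactions:
        return profile  # nothing new, keep the profile unchanged
    return summarize(profile, new_interactions)


def stream_profiles(
    batches: Iterable[List[str]], summarize: Summarizer, initial: str = ""
) -> Iterator[str]:
    """Yield the profile after each batch, carrying state forward as plain text."""
    profile = initial
    for batch in batches:
        profile = update_profile(profile, batch, summarize)
        yield profile


def build_prompt(profile: str, task: str) -> str:
    """Inject the text profile into a prompt; any text-in, text-out model can consume it."""
    return f"User preference summary:\n{profile}\n\nTask:\n{task}"


def toy_summarizer(profile: str, new_interactions: List[str]) -> str:
    """Assumption: a placeholder for the model call; it simply appends the new signals."""
    parts = [profile] if profile else []
    parts.extend(new_interactions)
    return "; ".join(parts)


if __name__ == "__main__":
    batches = [["likes sci-fi"], [], ["dislikes horror"]]
    for step, profile in enumerate(stream_profiles(batches, toy_summarizer), start=1):
        print(step, profile)
    print(build_prompt(profile, "Recommend a movie"))
```

Because the carried state is ordinary text rather than an embedding tied to one model, swapping the downstream model changes nothing in this loop; only the consumer of `build_prompt` differs.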
Group 4: Performance Metrics
- AlignXplore+ demonstrates superior performance in user understanding accuracy, transferability, and robustness compared to existing methods, achieving state-of-the-art results in various benchmark tests [19][20].
- The model shows impressive zero-shot transfer capabilities, effectively guiding tasks across different domains and models [21][23].
- Even in the absence of negative feedback, AlignXplore+ maintains significant performance advantages, showcasing its robustness in real-world scenarios [26][28].

Group 5: Future Directions
- The team aims to explore the limits of streaming inference, comprehensive user behavior analysis, and the development of a truly universal personalized reasoning engine [30].