Core Insights

- The article discusses the limitations of current recommendation systems, which often suffer from "short-term amnesia" due to computational and storage constraints, leading to the neglect of valuable long-tail data [1][3]
- MUSE (Multimodal Search-based framework) is introduced as a solution to enhance user interest modeling by leveraging multimodal information, effectively acting as a "digital hippocampus" for recommendation systems [1][4]
- The framework has been successfully implemented in Alibaba's advertising system, demonstrating a significant CTR increase of 12.6% [6][36]

Summary by Sections

Background and Evolution

- The evolution of CTR modeling has transitioned from short-term behavior analysis to long-term behavior modeling, but improvements have plateaued as historical behavior length increases [2][3]
- Users accumulate extensive behavior sequences, often exceeding one million actions, but current models typically utilize only a few thousand recent actions due to limitations in processing and storage [3][4]

MUSE Framework

- MUSE focuses on reorganizing user behavior data through multimodal information to improve the quality and usability of lifelong interest modeling [6][20]
- The framework consists of two main components: GSU (General Search Unit) for initial retrieval and ESU (Exact Search Unit) for detailed modeling, both enhanced by multimodal embeddings [20][24]

Implementation and Results

- MUSE has been fully deployed in Alibaba's advertising system, capable of modeling user behavior sequences of up to 100,000 actions, with ongoing improvements to extend this to millions [6][36]
- The implementation has shown that using high-quality multimodal embeddings significantly enhances retrieval and modeling accuracy, leading to improved business outcomes [6][36]

Engineering Considerations

- The design of MUSE allows for controlled latency despite the complexity of handling long sequences and multimodal data, primarily by decoupling the GSU from the main processing path [31][36]
- The system's architecture emphasizes efficient data retrieval and processing, minimizing the impact of network and storage delays on overall performance [36][39]

Industry Implications

- MUSE offers valuable insights for industries involved in advertising, content recommendation, and e-commerce, suggesting a shift towards integrating multimodal embeddings and enhancing user interest modeling [37][39]
- The framework encourages a reevaluation of existing systems, advocating for a focus on quality embeddings and efficient data handling to unlock new performance improvements [45][47]
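
The two-stage structure described under "MUSE Framework" (a GSU for initial retrieval, followed by an ESU for detailed modeling) can be illustrated with a minimal Python sketch. This is not the MUSE implementation: the function names (cosine, gsu_retrieve, esu_pool), the use of cosine similarity for top-k retrieval, the softmax-weighted pooling, and the toy two-dimensional embeddings are all assumptions chosen for illustration. The summary only states that both stages are enhanced by multimodal embeddings.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gsu_retrieve(target_emb, behavior_embs, k):
    """Stage 1 (hypothetical GSU): indices of the k behaviors most similar to the target."""
    scored = ((cosine(target_emb, emb), i) for i, emb in enumerate(behavior_embs))
    return [i for _, i in heapq.nlargest(k, scored)]


def esu_pool(target_emb, behavior_embs, indices):
    """Stage 2 (hypothetical ESU): softmax-weighted average of the retrieved embeddings."""
    dim = len(target_emb)
    if not indices:
        return [0.0] * dim
    sims = [cosine(target_emb, behavior_embs[i]) for i in indices]
    max_sim = max(sims)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - max_sim) for s in sims]
    total = sum(weights)
    pooled = [0.0] * dim
    for w, i in zip(weights, indices):
        for d in range(dim):
            pooled[d] += (w / total) * behavior_embs[i][d]
    return pooled


# Toy data (assumed): four past behaviors and one target item, as 2-d embeddings.
history = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
    [-1.0, 0.0],
]
target = [1.0, 0.0]

top_idx = gsu_retrieve(target, history, k=2)   # cheap filter over the full history
interest = esu_pool(target, history, top_idx)  # costlier modeling on the short list
print(top_idx, interest)
```

The point of the split is that the expensive step only ever sees k behaviors, regardless of how long the full history is. In a production setting the first stage would likely be served by a precomputed index rather than a linear scan, which is also an assumption here.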
Alimama Releases MUSE: Handling Ultra-Long Behavior Sequences of 100,000 Actions with Multimodality, and Open-Sourcing the Taobao-MM Dataset