360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
AI Engineer · 2025-07-16 17:59
Model Building and Training

- LinkedIn leverages large language models (LLMs) for personalization and ranking tasks, aiming to serve all tasks with a single model [2][3]
- User and item information is converted into text prompts, a method called "promptification" [8]
- LinkedIn builds a large foundation model, Brew XL, with 150 billion parameters, then distills it into smaller, more efficient models, such as a 3B model, for production [12]
- Distilling from a large model is more effective than training a small model from scratch [14]
- Increasing data, model size (up to 8x22B), and context length can improve model performance, but longer contexts may require model adjustments [17][18][19]

Model Performance and Generalization

- The model improves performance for cold-start users, with the gap over production models growing as the number of user interactions decreases [21]
- The model generalizes to new domains, performing on par with or better than task-specific production models on out-of-domain tasks [23]

Model Serving and Optimization

- LinkedIn focuses on model specification, pruning, and quantization to improve throughput and reduce latency in production [26]
- Gradual pruning combined with distillation is more effective than aggressive one-shot pruning, minimizing information loss [29][30]
- Mixed precision (FP8 for activations and model parameters, but FP32 for the LM head) is crucial for maintaining prediction precision [31][32]
- Sparsifying attention lets the model score multiple candidate items in a single pass, with items not attending to one another, which reduces latency [34][35]
- Through these optimization techniques, LinkedIn achieved a 7x reduction in latency and a 30x increase in throughput per GPU [36]
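The "promptification" step above can be sketched as serializing structured member and item features into an instruction-style prompt. This is a minimal illustration; the field names, template, and `promptify` function are assumptions, not LinkedIn's actual format.

```python
# Hypothetical sketch of "promptification": rendering a member's profile and
# recent interactions as a single text prompt for an LLM ranker.
# All field names and the prompt template are illustrative.

def promptify(member: dict, candidate: dict) -> str:
    """Serialize structured member/item features into a prompt string."""
    history = "\n".join(
        f"- {event['action']} on '{event['title']}'"
        for event in member["recent_interactions"]
    )
    return (
        f"Member profile: {member['headline']}, based in {member['location']}.\n"
        f"Recent activity:\n{history}\n"
        f"Candidate item: '{candidate['title']}'.\n"
        "Question: will the member engage with this item? Answer yes or no."
    )

member = {
    "headline": "ML engineer",
    "location": "Bay Area",
    "recent_interactions": [
        {"action": "liked", "title": "Scaling laws for rankers"},
        {"action": "commented", "title": "FP8 inference tricks"},
    ],
}
candidate = {"title": "Distilling a 150B model to 3B"}
prompt = promptify(member, candidate)
```

One model can then serve many surfaces (feed, jobs, notifications) because each task is just a differently promptified input to the same LLM.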
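The distillation claim (a small model trained to match the large model beats one trained from scratch) typically relies on a loss that pulls the student's output distribution toward the teacher's. A minimal sketch of the standard temperature-scaled KL distillation loss, which the talk does not spell out and is assumed here:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor keeps gradient magnitudes comparable
    across temperatures (standard Hinton-style distillation).
    """
    p = softmax([t / temperature for t in teacher_logits])
    q = softmax([s / temperature for s in student_logits])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * temperature**2

# Loss vanishes when the student reproduces the teacher exactly,
# and grows as the student's ranking of outputs diverges.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_order = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

Compared with one-hot labels, the teacher's soft probabilities carry relative preferences between candidates, which is the extra signal that makes distillation outperform training the 3B model from scratch.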
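The gradual-pruning point can be illustrated with a common recipe: ramp sparsity up over many steps (a cubic schedule in the style of Zhu and Gupta) and remove the smallest-magnitude weights at each step, rather than cutting to the final sparsity in one shot. This is a generic sketch, not LinkedIn's disclosed procedure; both functions are assumptions.

```python
def sparsity_at_step(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Cubic ramp from initial_sparsity to final_sparsity over total_steps."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - frac) ** 3

def prune_smallest(weights, sparsity):
    """Magnitude pruning: zero the fraction of weights with smallest |w|."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Early steps prune almost nothing; pressure increases gradually, giving
# distillation time to recover accuracy after each pruning increment.
schedule = [sparsity_at_step(s, 10, 0.5) for s in range(11)]
```

Interleaving small pruning increments with continued distillation is what keeps information loss low, versus an aggressive single cut.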
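Why the LM head must stay in FP32 while the rest of the model runs in FP8 can be seen with a toy example: ranking decisions hinge on small differences between candidate scores, and a coarse numeric grid in the final scoring layer can collapse those differences. The `quantize` function below is a crude stand-in for low-precision storage, not an actual FP8 implementation.

```python
def quantize(x, scale=0.25):
    """Crude stand-in for low-precision storage: snap values to a coarse grid.
    Real FP8 has a non-uniform grid, but the failure mode is the same."""
    return round(x / scale) * scale

# Two candidate logits that are close together: full precision preserves
# their order, but the coarse grid maps both to the same value, losing
# the ranking signal. This is why the LM head is kept in FP32.
logit_a, logit_b = 1.10, 1.02
full_precision_ordered = logit_a > logit_b
collapsed = quantize(logit_a) == quantize(logit_b)
```

Hidden activations and weights inside the network tolerate this rounding because later layers can compensate; the final prediction layer has no downstream layer to recover the lost resolution.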
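The sparsified-attention idea can be sketched as a block mask: every token may attend to the shared member-context prefix and to earlier tokens of its own candidate item, but never to tokens of other items, so many items are scored in one forward pass. The mask construction below is a generic illustration under those assumptions, not LinkedIn's exact kernel.

```python
def block_attention_mask(prefix_len, item_lens):
    """Causal attention mask for scoring several items in one sequence.

    Layout: [shared member/context prefix][item 0 tokens][item 1 tokens]...
    True means attention is allowed. Each item's tokens see the prefix and
    their own item only, so items cannot leak information to each other.
    """
    total = prefix_len + sum(item_lens)
    owner = [-1] * prefix_len            # -1 marks the shared prefix
    for i, n in enumerate(item_lens):
        owner += [i] * n
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):           # causal: no attending to the future
            if owner[k] == -1 or owner[k] == owner[q]:
                mask[q][k] = True
    return mask

# Prefix of 2 tokens, then two items of 2 tokens each (positions 2-3 and 4-5).
mask = block_attention_mask(2, [2, 2])
```

Because cross-item attention entries are zero, their compute can be skipped entirely, which is where the latency savings come from relative to running one forward pass per item.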