Unexpectedly, the Most Open of the New Open-Source Models Comes from Xiaohongshu
机器之心 (Synced) · 2025-06-07 03:59

Core Viewpoint
- Xiaohongshu has released its first self-developed large model, dots.llm1, marking a significant step in its engagement with the tech community and showcasing its capabilities in AI [3][10].

Model Overview
- dots.llm1 is a medium-scale MoE (Mixture of Experts) model with 142 billion total parameters, of which 14 billion are activated per token, delivering strong performance despite the comparatively small activated size [5][6]. (A minimal sketch of top-k MoE routing appears at the end of this summary.)
- Across benchmarks, dots.llm1 is competitive with models such as Qwen2.5 and Qwen3, particularly on Chinese and English language understanding, mathematics, and coding tasks [6][7].

Open Source Initiative
- The release includes not only the dots.llm1 model but also a series of pre-trained base models and intermediate checkpoints, enabling the community to study training dynamics and carry out further development and fine-tuning [8]. (A hedged loading example appears at the end of this summary.)
- The initiative reflects a broader industry trend toward open collaboration, with Xiaohongshu aiming both to contribute to and to benefit from community-driven advances [46].

Training Data and Quality
- dots.llm1 was pre-trained on 11.2 trillion high-quality tokens sourced from Common Crawl and proprietary web data, underscoring the role of data quality in model performance [28].
- Data processing ran through multiple stages, including filtering out low-quality content and checking for semantic accuracy [28][30][31]. (An illustrative cleaning pipeline is sketched at the end of this summary.)

Training Efficiency
- The model employs training optimizations developed in collaboration with NVIDIA to overlap communication and computation during training [33][35].
- Pre-training follows a two-phase strategy: a long stable phase followed by learning-rate annealing, prioritizing stability before gradual optimization [40][41]. (An illustrative schedule is sketched at the end of this summary.)

Post-Training and Fine-Tuning
- After pre-training, dots.llm1 went through two stages of supervised fine-tuning to strengthen its understanding and instruction-execution abilities across tasks [41][42].
- The fine-tuning corpus is a diverse set of high-quality instruction data, giving the model robustness in multi-turn dialogue, knowledge Q&A, and complex instruction following [44][45]. (A minimal SFT loss-masking sketch appears at the end of this summary.)

Community Engagement
- The open-source release of dots.llm1 is a strategic move to foster collaboration with developers and researchers, positioning Xiaohongshu as a serious participant in the open-model landscape [46].
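Illustrative Sketches

On the Model Overview point: in a top-k MoE layer, each token is routed to only a few experts, so per-token compute scales with the activated parameters (14B) rather than the total (142B). Below is a minimal, toy-sized sketch of top-k routing in PyTorch; the class name, layer sizes, and expert count are illustrative assumptions, not dots.llm1's actual architecture code.

```python
# Toy top-k MoE routing sketch (illustrative sizes, not dots.llm1's real config).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)            # routing probabilities
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Only k of n_experts expert MLPs run per token, which is why a 142B-parameter model can have the inference cost of a much smaller dense one.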
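On the Open Source Initiative point: a hedged sketch of loading the released weights with Hugging Face transformers. The repo id "rednote-hilab/dots.llm1.inst" is my assumption of the published name; consult the official model card for the exact id, dtype, and recommended generation settings.

```python
# Hedged loading example; verify the repo id against the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rednote-hilab/dots.llm1.inst"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "用一句话介绍一下 dots.llm1"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```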
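On the Training Data point: an illustrative cleaning pass in the spirit of the multi-stage pipeline described above (cheap rule filters, deduplication, then a quality check). All function names, heuristics, and thresholds here are assumptions made for the sketch, not the actual dots.llm1 pipeline.

```python
# Illustrative three-stage cleaning pipeline; stages and thresholds are assumed.
import hashlib

def rule_filter(doc: str) -> bool:
    """Cheap heuristics: drop very short docs or boilerplate-heavy ones."""
    lines = [l for l in doc.splitlines() if l.strip()]
    if len(doc) < 200 or not lines:
        return False
    dup_ratio = 1 - len(set(lines)) / len(lines)
    return dup_ratio < 0.3  # many repeated lines -> likely boilerplate

def dedup_key(doc: str) -> str:
    """Exact-dedup key; real pipelines also use fuzzy (e.g. MinHash) dedup."""
    return hashlib.md5(doc.encode("utf-8")).hexdigest()

def quality_score(doc: str) -> float:
    """Toy stand-in for a model-based quality/semantic filter."""
    return min(1.0, len(set(doc.split())) / 500)  # proxy: lexical diversity

def clean(corpus):
    seen = set()
    for doc in corpus:
        if not rule_filter(doc):
            continue
        key = dedup_key(doc)
        if key in seen:
            continue
        seen.add(key)
        if quality_score(doc) >= 0.5:
            yield doc
```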
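On the Training Efficiency point: one common way to realize "long stable phase, then annealing" is a warmup-stable-decay learning-rate schedule. The sketch below uses placeholder step counts and rates, not the numbers from the dots.llm1 report.

```python
# Warmup-stable-decay schedule sketch; all constants are placeholders.
def lr_at(step, warmup=2_000, stable_end=80_000, total=100_000,
          peak=3e-4, floor=1e-5):
    if step < warmup:                    # linear warmup
        return peak * step / warmup
    if step < stable_end:                # long stable phase at the peak rate
        return peak
    frac = (step - stable_end) / (total - stable_end)
    return peak + frac * (floor - peak)  # linear anneal down to the floor

for s in (0, 1_000, 50_000, 90_000, 100_000):
    print(s, f"{lr_at(s):.2e}")
```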
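On the Post-Training point: supervised fine-tuning on instruction data typically masks the prompt tokens so the loss is computed only on the response the model should learn to produce. Below is a minimal sketch using the common -100 ignore-index convention from PyTorch/transformers; this is not dots.llm1's actual post-training code.

```python
# SFT label-masking sketch: loss only on response tokens, not the prompt.
import torch
import torch.nn.functional as F

def sft_loss(logits, input_ids, prompt_len):
    """logits: (seq, vocab); input_ids: (seq,). Masks the first prompt_len tokens."""
    labels = input_ids.clone()
    labels[:prompt_len] = -100           # ignored by cross_entropy
    shift_logits = logits[:-1]           # predict token t+1 from position t
    shift_labels = labels[1:]
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

vocab, seq, prompt_len = 100, 12, 5
loss = sft_loss(torch.randn(seq, vocab), torch.randint(0, vocab, (seq,)), prompt_len)
print(loss.item())
```

In a two-stage SFT setup, the same loss is reused; what changes between stages is the mix and difficulty of the instruction data.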