MaskSearch

Xiaohongshu open-sources its first text LLM, dots.llm1; world's first AI chip design system released | AIGC Daily
创业邦· 2025-06-10 23:59
Group 1
- Xiaohongshu's hi lab has open-sourced the text large model dots.llm1, a large-scale Mixture of Experts (MoE) language model with 142 billion total parameters (14 billion activated per token) that, after training on 11.2 trillion tokens of high-quality data, achieves performance comparable to Qwen2.5-72B [1]
- Alibaba's Tongyi Lab has released and open-sourced the MaskSearch pre-training framework, which teaches AI "active search + multi-step reasoning" so it can answer complex questions more accurately and intelligently [1]
- The world's first AI-based processor chip design system, named "Enlightenment", has officially launched; it fully automates both hardware and software chip design and matches human expert designs on several key metrics [1]

Group 2
- Google's flagship AI video generation tool Veo3 has introduced a new FAST/TURBO mode that significantly cuts cost and speeds up generation, letting AI Ultra subscribers produce up to 625 eight-second videos per month, versus 125 in the standard mode [1]
Alibaba Tongyi open-sources a new "reasoning + search" pre-training framework: small models rival large ones, with significant gains across multiple open-domain QA datasets
量子位· 2025-05-31 03:34
Core Viewpoint
- Alibaba's Tongyi Lab has introduced MaskSearch, a new framework that enhances the reasoning and search capabilities of large models, delivering significant performance gains on both in-domain and cross-domain open-domain question-answering tasks [1][2]

Group 1: MaskSearch Framework
- MaskSearch is a general pre-training framework that substantially outperforms baseline methods on open-domain question-answering tasks [2]
- The framework centers on a retrieval-augmented masked prediction task (RAMP): the model consults external knowledge bases to predict masked text segments, which trains both its reasoning and its search capabilities [5][11]
- MaskSearch supports both supervised fine-tuning (SFT) and reinforcement learning (RL), allowing flexible model training [6]

Group 2: Training Methodology
- For SFT, chain-of-thought (CoT) data is generated by a multi-agent system whose agents collaborate to produce reasoning chains, retaining only those that reach the correct answer [12]
- The RL stage uses a dynamic sampling strategy and a hybrid reward system to optimize the model's multi-step search and reasoning process [15][20]
- A curriculum learning strategy gradually raises sample difficulty by increasing the number of masked elements, strengthening the model's reasoning skills [16][24]

Group 3: Experimental Results
- Experiments show that the two-stage MaskSearch training framework significantly improves large models' search and reasoning capabilities, with recall gains observed across various datasets [18][19]
- The RL approach exhibits a higher performance ceiling, particularly on in-domain tasks such as HotpotQA, indicating its effectiveness at optimizing search and reasoning [19][20]
- MaskSearch's scalability is validated: smaller models show significant performance gains after pre-training, while larger models improve more gradually [22]

Group 4: Additional Insights
- The masking strategy largely determines the difficulty of the RAMP pre-training task; experiments indicate that a perplexity-based masking strategy improves model recall [27][30]
- Different reward functions in RL training affect model performance differently, with model-based reward functions demonstrating superior stability and efficiency [31][33]
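The retrieval-augmented masked prediction task (RAMP) described above can be sketched as a mask-then-search loop. This is a minimal illustration only: the helper names (`propose_query`, `fill_masks`, `search_engine`) are hypothetical interfaces assumed for the sketch, not the released MaskSearch implementation.

```python
# Minimal sketch of a RAMP training example and solve loop,
# assuming hypothetical `search_engine` and `llm` interfaces.
MASK = "[mask]"

def make_ramp_example(text: str, spans: list[str]) -> tuple[str, list[str]]:
    """Mask salient spans (entities, dates, numbers) in a passage;
    the masked spans become the prediction targets."""
    masked = text
    for span in spans:
        masked = masked.replace(span, MASK, 1)
    return masked, spans

def solve_ramp(masked_text, search_engine, llm, max_steps=3):
    """Multi-step loop: the model issues search queries, reads the
    retrieved evidence, and finally predicts the masked spans."""
    evidence = []
    for _ in range(max_steps):
        query = llm.propose_query(masked_text, evidence)  # hypothetical API
        if query is None:            # model decides it has enough evidence
            break
        evidence.extend(search_engine(query))
    return llm.fill_masks(masked_text, evidence)          # hypothetical API

# Masking the key facts forces the model to search externally for them.
masked, answers = make_ramp_example(
    "Nikola Tesla was born in 1856 in Smiljan.", ["1856", "Smiljan"])
print(masked)  # Nikola Tesla was born in [mask] in [mask].
```

Because the masked spans are known in advance, correctness of the model's predictions can be checked automatically, which is what makes RAMP usable as a self-supervised pre-training signal.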
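The perplexity-based masking idea in Group 4 can be illustrated with a toy scorer: spans whose tokens the model predicts poorly (high perplexity) make harder, more informative masks. The scoring function and the example log-probabilities below are stand-in assumptions for illustration, not the paper's implementation.

```python
import math

def span_perplexity(token_logprobs: list[float]) -> float:
    """Perplexity of a span from per-token log-probabilities:
    ppl = exp(-mean(log p))."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_masks(spans: dict[str, list[float]], k: int) -> list[str]:
    """Select the k spans the model finds hardest (highest perplexity)
    as masking targets."""
    ranked = sorted(spans, key=lambda s: span_perplexity(spans[s]), reverse=True)
    return ranked[:k]

# Toy per-token log-probs for candidate spans (assumed values):
candidates = {
    "1856":    [-3.2, -2.9],   # model is unsure -> high perplexity
    "Smiljan": [-4.1],
    "born":    [-0.2, -0.1],   # model is confident -> low perplexity
}
print(pick_masks(candidates, 2))  # → ['Smiljan', '1856']
```

Masking high-perplexity spans rather than random ones raises task difficulty in a controlled way, which is consistent with the recall improvements the article attributes to this strategy.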