Qwen2.5
Locating the Neural Circuits Behind LLM "Cheating": New Research Reveals for the First Time How Spurious Rewards Precisely Activate Memory in Layers 18-20
量子位· 2026-01-20 01:34
Core Insights
- The article discusses the phenomenon of "Spurious Rewards" in large language models (LLMs): even false reward signals during training can raise measured accuracy [1][2]
- It highlights a "Perplexity Paradox": perplexity on answers decreases while perplexity on questions increases, indicating a trade-off between general understanding and specific memorization [3][6]

Group 1: Key Findings
- The research team found that spurious RLVR activates the model's internal memory shortcuts, producing more efficient retrieval of contaminated knowledge rather than genuine learning [1][6]
- The critical memory nodes sit in layers 18-20, which serve as functional anchors for retrieving memorized answers [10][20]
- The study used analyses including Path Patching and Jensen-Shannon Divergence (JSD) to pinpoint the layers responsible for memory retrieval and structural adaptation [9][15]

Group 2: Mechanisms and Dynamics
- The model's choice between reasoning paths and memory shortcuts is made at layers 18-20 [23]
- Neural ODEs were used to model the continuous evolution of hidden states, confirming that the separation forces peak at the critical layers [21]
- The team manipulated memory retrieval by scaling the activation of specific neurons, demonstrating a dose-dependent relationship in retrieval accuracy (a sketch of this kind of intervention follows this summary) [25][30]

Group 3: Implications and Future Directions
- The findings provide new tools for evaluating RLVR effectiveness, suggesting that improvements may be illusory if they stem from memory-activation circuits [36]
- The work opens new avenues for detecting data contamination from internal neural activation patterns, moving beyond traditional statistical methods [38]
- It proposes controllable methods for reducing reliance on contaminated knowledge without retraining the model, paving the way for new reasoning and decontamination techniques [39]
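Where the summary mentions scaling the activation of specific neurons in the critical layers, the following is a minimal sketch of such an intervention using a PyTorch forward hook. The layer indices 18-20 come from the article; the model checkpoint, neuron indices, and scale factor are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: scale a few MLP neuron activations in layers 18-20 of a
# decoder-only transformer to probe memory-shortcut behavior, loosely
# following the dose-dependent intervention described in the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B"          # assumption: any HF decoder-only model works similarly
TARGET_LAYERS = [18, 19, 20]       # the "critical memory" layers from the study
TARGET_NEURONS = [128, 512, 2048]  # hypothetical neuron indices (< hidden size)
SCALE = 2.0                        # >1 amplifies, <1 suppresses the shortcut

def make_hook(scale):
    def hook(module, inputs, output):
        # output: MLP activations, shape (batch, seq, hidden); returning a
        # value from a forward hook replaces the module's output
        output[..., TARGET_NEURONS] *= scale
        return output
    return hook

model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

handles = [
    model.model.layers[i].mlp.register_forward_hook(make_hook(SCALE))
    for i in TARGET_LAYERS
]

ids = tokenizer("What is the integral of x^2 from 0 to 1?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

for h in handles:
    h.remove()  # restore the unmodified model
```

Sweeping SCALE over a range (e.g. 0.0 to 3.0) and measuring answer accuracy on a contaminated benchmark would reproduce the dose-dependent curve the study reports, under these assumptions.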
Don't Just Train Directly: Give the Main Model a Mistake Log, and 6B Easily Beats 8B
36Ke· 2025-12-25 07:05
Core Insights
- The article introduces the "Mistake Log," a record of a large model's internal thought process at the moment it makes an error, aimed at enhancing learning through structured reflection [3][4][17]
- This contrasts with traditional training, which scores only whether the output is correct, leaving a gap: models never engage in the deep reflection humans use when learning from mistakes [2][4]

Group 1: Concept of the Mistake Log
- The Mistake Log has three layers: Question (the problem the model is addressing), Rationale (the internal reasoning state), and Mistakes (token-level error analysis) [5][8]
- The Rationale layer stores the model's hidden states at the time of the error, a snapshot of its cognitive state at the moment of the mistake [7][10]
- Together these yield structured error records showing where and how mistakes arise during training (see the data-structure sketch after this summary) [6][10]

Group 2: Implementation and Benefits
- An auxiliary model, the Copilot, learns from the main model's Mistake Log, enhancing its ability to predict and correct errors in real time [10][11]
- The Copilot dynamically adjusts the main model's reasoning trajectory based on historical errors, improving overall performance [13][14]
- Experiments show that a smaller Copilot paired with a larger main model can outperform simply scaling up the main model, indicating that error-correction capability matters in its own right [15][16]

Group 3: Future Directions
- The authors stress that the Mistake Log mechanism is only a starting point, with room to optimize its representation and the Copilot's design [17]
- They also ask whether self-reflection grounded in internal states beats external correction methods, flagging this for future research [17]
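A minimal sketch of the three-layer record described above, as a Python data structure. The field names, tensor shapes, and example values are assumptions for illustration; the paper's actual schema may differ.

```python
# Hedged sketch of a Mistake Log entry: Question / Rationale / Mistakes.
from dataclasses import dataclass, field
from typing import List
import torch

@dataclass
class TokenMistake:
    position: int   # index of the erroneous token in the output
    predicted: str  # token the main model actually produced
    expected: str   # token it should have produced
    logprob: float  # model's confidence in the wrong token

@dataclass
class MistakeLogEntry:
    # Layer 1: the problem the model was solving
    question: str
    # Layer 2: hidden states captured at the moment of the error,
    # a snapshot of the model's internal reasoning state
    rationale: List[torch.Tensor] = field(default_factory=list)
    # Layer 3: token-level error analysis
    mistakes: List[TokenMistake] = field(default_factory=list)

# A Copilot model would learn to map (question, rationale) -> corrections,
# steering the main model's reasoning trajectory at inference time.
entry = MistakeLogEntry(
    question="27 * 14 = ?",
    rationale=[torch.zeros(4096)],  # hypothetical hidden-state snapshot
    mistakes=[TokenMistake(position=3, predicted="368", expected="378", logprob=-0.4)],
)
print(entry.mistakes[0])
```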
HKU Leads DrivePI: A Spatially Intelligent 4D MLLM Unifying Understanding, Perception, Prediction, and Planning for Autonomous Driving
自动驾驶之心· 2025-12-22 09:20
Core Viewpoint
- DrivePI is introduced as a novel unified spatially aware 4D multimodal large language model (MLLM) framework that integrates coarse-grained language understanding with fine-grained 3D perception, bridging the gap between vision-based and VLA paradigms in autonomous driving [2][38]

Group 1: Project Overview
- DrivePI is led by The University of Hong Kong, with contributions from companies such as Huawei and universities including Tianjin University and Huazhong University of Science and Technology [2]
- The model performs spatial understanding, 3D perception, prediction, and planning through end-to-end optimization, handling complex autonomous-driving scenarios [4][6]

Group 2: Technical Innovations
- DrivePI fuses LiDAR with camera images, strengthening spatial understanding with accurate 3D geometric information (a fusion sketch follows this summary) [11]
- The model generates intermediate fine-grained 3D perception and prediction representations, ensuring reliable spatial awareness and improving the interpretability and safety of autonomous driving systems [11]
- A rich data engine integrates 3D occupancy and flow representations into natural-language scene descriptions, allowing the model to reason about complex spatiotemporal dynamics [11]

Group 3: Performance Metrics
- DrivePI outperforms existing VLA models, with average accuracy on nuScenes-QA 2.5% higher than OpenDriveVLA-7B, and collision rates cut by 70%, from 0.37% to 0.11% [5][16]
- In 3D occupancy and flow prediction, DrivePI reports 49.3% OccScore and 49.3% RayIoU, surpassing FB-OCC by 10.3 percentage points [15][21]
- The model also reduces L2 trajectory-planning error by 32% relative to VAD [16]

Group 4: Data Engine and Annotation
- The data engine operates in three main stages, generating diverse question-answer pairs for 4D spatial understanding and planning reasoning [12][18]
- Scene-understanding annotations are generated so the model can distinguish different views without confusion [18]

Group 5: Ablation Studies and Insights
- Ablations show that combining text and visual heads improves most tasks, supporting the unification of text understanding with 3D perception, prediction, and planning [23]
- Scaling the text training data yields significant gains in occupancy-state prediction accuracy [26]

Group 6: Future Prospects
- DrivePI is expected to inspire future research on autonomous driving systems whose interpretability and decision-making are enhanced by language reasoning and detailed 3D outputs [38]
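The fusion pattern attributed to DrivePI, projecting camera and LiDAR features into a shared token space and feeding them to the LLM alongside text, can be sketched minimally as below. All module names, dimensions, and the concatenation layout are assumptions for illustration; the actual DrivePI architecture is not reproduced here.

```python
# Hedged sketch: project multi-modal features into LLM token space and
# concatenate them into one input sequence for the backbone.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, cam_dim=1024, lidar_dim=256, llm_dim=4096):
        super().__init__()
        self.cam_proj = nn.Linear(cam_dim, llm_dim)      # image-feature adapter
        self.lidar_proj = nn.Linear(lidar_dim, llm_dim)  # point-feature adapter

    def forward(self, cam_feats, lidar_feats, text_embeds):
        # cam_feats:   (B, N_img_tokens, cam_dim)   from a vision encoder
        # lidar_feats: (B, N_pts_tokens, lidar_dim) from a point-cloud encoder
        # text_embeds: (B, N_text_tokens, llm_dim)  from the LLM embedding layer
        vision_tokens = self.cam_proj(cam_feats)
        lidar_tokens = self.lidar_proj(lidar_feats)
        # One sequence for the LLM backbone, which would decode text and,
        # via separate heads, 3D occupancy/flow and planned trajectories.
        return torch.cat([vision_tokens, lidar_tokens, text_embeds], dim=1)

fusion = MultimodalFusion()
seq = fusion(torch.randn(1, 196, 1024), torch.randn(1, 128, 256), torch.randn(1, 32, 4096))
print(seq.shape)  # torch.Size([1, 356, 4096])
```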
Puyuan Information: Its Products Have Passed Product Ecosystem Integration Certification with Alibaba Cloud's Proprietary Cloud Products
Zheng Quan Ri Bao Wang· 2025-11-26 13:41
Core Insights
- Puyuan Information confirmed on November 26 that its products have passed product ecosystem integration certification with Alibaba Cloud's proprietary cloud products [1]
- The company's products are currently connected to open-source models including Qwen2.5, Qwen3.0, and QwQ-32B [1]
Puyuan Information: Company Products Have Integrated Open-Source Models Including Qwen2.5, Qwen3.0, and QwQ-32B to Date
Ge Long Hui· 2025-11-26 09:41
Core Viewpoint
- The company's products have passed product ecosystem integration certification with Alibaba Cloud's proprietary cloud products [1]

Group 1
- The company's products are now connected to the open-source models Qwen2.5, Qwen3.0, and QwQ-32B [1]
Puyuan Information: Company Products Have Integrated Open-Source Models Including Qwen2.5, Qwen3.0, and QwQ-32B
Mei Ri Jing Ji Xin Wen· 2025-11-26 09:41
Group 1
- The article centers on the collaboration between Puyuan Information and the Alibaba ecosystem, specifically product integration and certification [2]
- Puyuan Information confirmed that its products have passed integration certification with Alibaba Cloud's proprietary cloud products [2]
- To date, Puyuan Information's products have been integrated with open-source models such as Qwen2.5, Qwen3.0, and QwQ-32B [2]
Puyuan Information (688118.SH): Company Products Have Integrated Open-Source Models Including Qwen2.5, Qwen3.0, and QwQ-32B to Date
Ge Long Hui· 2025-11-26 09:40
Core Viewpoint
- The company's products have passed product ecosystem integration certification with Alibaba Cloud's proprietary cloud products [1]

Group 1
- The company's products are now connected to the open-source models Qwen2.5, Qwen3.0, and QwQ-32B [1]
Taobao Finally Takes the Knife to Search
虎嗅APP· 2025-11-11 23:53
Core Viewpoint
- The article examines the rapid evolution of AI in e-commerce, focusing on Alibaba's Taobao platform and its integration of AI tools to enhance user experience and operational efficiency during the Double Eleven shopping festival [4][11][30]

Group 1: AI Integration in E-commerce
- Taobao's AI tools, such as the "Xiao Wan Assistant," have significantly lifted sales; some brands report order growth of more than 35% after adopting AI-driven strategies [4][11]
- The new "Search and Promotion Intelligent Product Division," led by Zhang Kaifu, marks a strategic shift toward AI-driven search and recommendation systems [7][12]
- Customer complaints on social media about the search experience created urgency for immediate improvements [8][10]

Group 2: Challenges and Strategic Focus
- The search problems stem from 22 years of accumulated complexity in Taobao's search engine, so a comprehensive upgrade requires collaboration across business, technology, and the supply chain [9][19]
- The team set three focus areas for AI evolution by 2025: upgrading search and promotion systems, raising efficiency for merchants, and launching new AI-driven shopping products for consumers [16][21]
- Transitioning to AI-driven systems requires a complete overhaul of the existing product database for compatibility with AI technologies, a significant undertaking [20][21]

Group 3: Organizational Changes and Talent Development
- The internal structure has shifted to support AI initiatives, with flexible project teams that can innovate without being constrained by traditional metrics [24][25]
- A major recruitment drive targets young talent from local universities, emphasizing creativity and technical skill in the AI domain [26][27]
- New hires receive systematic training to equip them to contribute effectively to AI projects [27]

Group 4: Performance Metrics and Future Outlook
- As of November 8, the AI-driven search and promotion capabilities had raised advertising ROI by 12% and improved search relevance under complex queries by 20% [29][30]
- Challenges remain in educating traditional merchants about AI tools, indicating a need for ongoing support and training [31]
- The company views AI as a long-term strategic focus, with plans for increased investment and further development of AI capabilities in the coming years [32][33]
New Work from Tsinghua's Tang Jie: Can Large Models Play Guandan?
量子位· 2025-09-10 10:01
Core Viewpoint
- The research indicates that large models can play a range of card games competently, demonstrating their capabilities in complex decision-making scenarios [2][4][52]

Group 1: Model Performance
- Performance varies by model and by game, with fine-tuned models beating API-based and base models [3][40]
- Among the API-based models, GPT-4o performs best overall, while GLM-4 is strong in games like DouDizhu and Guandan [39][40]
- Fine-tuned models, particularly GLM4-9B-Chat-mix, excel across multiple games, including DouDizhu, Guandan, and Uno, indicating their versatility [42][40]

Group 2: Game Selection and Learning Methodology
- The research team selected eight popular card games based on their complexity and the availability of high-quality models and data [8]
- Teacher models and opponents generated high-quality interaction data, from which the large language models learned effectively [14][16]
- Game complexity drove the number of training instances collected; complex games like DouDizhu and Guandan required larger datasets [20][21]

Group 3: Inter-Game Influence
- Models trained on similar games can enhance each other's performance, while those trained on games with significant rule differences may experience performance conflicts [52][49]
- For instance, models trained on Guandan performed well in DouDizhu, suggesting positive skill transfer between these games [45]

Group 4: Generalization and Capability
- Training on card games can erode the models' general capabilities, but mixing general data into the training process mitigates the loss (a data-mixing sketch follows this summary) [56][54]
- The mixed-training approach recovered some general capability, demonstrating the balance between specialized game skills and broader knowledge [56]
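The mixed-training recipe described above, blending teacher-generated game trajectories with general instruction data, can be sketched as below. The data here is an inline toy stand-in, and the mixing ratio is an assumption, not the paper's value.

```python
# Hedged sketch: mix card-game SFT data with general instruction data so
# game skill improves without eroding general capabilities.
import random

# Toy stand-ins for teacher-generated game trajectories and general SFT data.
game_data = [{"prompt": f"Guandan state {i}", "response": f"play card {i}"} for i in range(700)]
general_data = [{"prompt": f"general instruction {i}", "response": f"answer {i}"} for i in range(1000)]

GENERAL_RATIO = 0.3  # assumption: ~30% general data to preserve broad skills

# Solve n_general / (n_game + n_general) = GENERAL_RATIO for n_general.
n_general = min(int(len(game_data) * GENERAL_RATIO / (1 - GENERAL_RATIO)), len(general_data))
mixed = game_data + random.sample(general_data, n_general)
random.shuffle(mixed)  # interleave so batches see both distributions

print(f"{len(game_data)} game + {n_general} general = {len(mixed)} examples")
```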
Self-Search Reinforcement Learning SSRL: Agentic RL's Sim2Real Moment
机器之心· 2025-09-02 01:27
Core Insights - The article discusses the development and effectiveness of SSRL (Structured Search Reinforcement Learning) in enhancing the training efficiency and stability of Search Agents using large language models (LLMs) [6][28] - SSRL demonstrates superior performance over traditional methods that rely on external search engines, achieving effective transfer from simulation to real-world applications (Sim2Real) [6][28] Group 1 - SSRL utilizes structured prompts and format rewards to effectively extract world knowledge from models, leading to improved performance across various benchmarks and reduced hallucination [2][6] - The research highlights the high costs and inefficiencies associated with current RL training methods for Search Agents, which include full-real and semi-real search approaches [7][13] - The introduction of SSRL allows for a significant increase in training efficiency, estimated at approximately 5.6 times, while maintaining a continuous increase in training rewards without collapse [31][32] Group 2 - Experiments show that models trained with SSRL outperform those relying on external engines, particularly in real-world search scenarios, indicating the importance of integrating real-world knowledge [28][31] - The article presents findings that suggest the combination of self-generated knowledge and real-world knowledge can enhance model performance, particularly through entropy-guided search strategies [34] - The integration of SSRL with TTRL (Task-Driven Reinforcement Learning) has shown to improve generalization and effectiveness, achieving up to a 67% performance increase in certain tasks [38][39]