Industry Weekly Report: AI Demand Continues to Be Validated, Improving the Predictability of Computing Power Demand - 20251123
KAIYUAN SECURITIES · 2025-11-23 05:15
Investment Rating
- The industry investment rating is "Positive" (maintained) [1]

Core Insights
- The report highlights the stability of this year's Double Eleven sales event, with extreme low prices and user experience as the key competitive factors. AI cloud demand continues to be validated: major cloud service providers (CSPs) have raised capital expenditure (CapEx) guidance, making computing power demand more predictable [5][26][67]
- The emergence of mid-training is expected to further improve the predictability of computing power demand and serves as the foundational technology behind this trend. The mid-training phase is crucial for refining data and improving model capabilities, which in turn stimulates demand for professional data annotation services [6][30][38]

Summary by Sections

1. Internet Sector
- Double Eleven performance was stable, with total online retail sales reaching nearly 2.4 trillion yuan, year-on-year growth of over 10%. Major platforms such as Tmall and JD.com maintained significant market shares [16][19]
- Competition among platforms has intensified, with a return to low prices and an improved user experience driving growth. JD.com continues to perform well in high-value categories such as 3C electronics and home appliances [23][14]
- Major overseas internet companies are posting strong growth in advertising and cloud services, and their increased CapEx further supports the positive outlook for the AI industry [5][8][67]

2. AI Sector
- Mid-training is gaining importance: it may extend the Scaling Law and makes this round of computing power demand more predictable. The phase focuses on providing more structured supervision signals to improve downstream capabilities [30][33]
- The rise of mid-training has produced several unicorn companies in the data annotation space, indicating a growing market for AI-related services [38][36]

3. Automotive & Autonomous Driving
- The automotive sector faces headwinds as tightened policies weigh on passenger vehicle sales, with a notable decline in weekly sales figures; new vehicle launches are nevertheless expected to ramp up toward year-end [41][43]
- The Robotaxi industry is seeing significant developments: Pony.ai (Xiaoma Zhixing) and WeRide (Wenyuan Zhixing) have successfully listed on the Hong Kong Stock Exchange, and Xiaopeng Motors plans to launch Robotaxi models by 2026, showcasing advances in autonomous driving technology [45][53][67]

4. Investment Recommendations
- The report recommends focusing on companies that benefit from the ongoing internet and AI trends, such as Alibaba, Pinduoduo, and Tencent in the internet space, and Kingdee International and Beisen Holdings in the IT spending wave [8][67]
Pioneering a Mid-training Paradigm to Crack the Mystery of RL: Llama Finally Catches Up with Qwen!
机器之心 · 2025-06-30 09:49
Core Insights
- A recent research paper from Shanghai Chuangzhi Academy and Shanghai Jiao Tong University examines why foundation language models such as Llama and Qwen perform so differently under reinforcement learning (RL) training, and proposes a mid-training strategy that significantly improves Llama's compatibility with RL, narrowing its performance gap with Qwen [1][10][11]

Research Background
- Introducing large-scale RL into language models has markedly improved complex reasoning, particularly on challenging tasks such as mathematical competitions. Yet only the Qwen series has shown substantial gains from RL, raising the question of which foundational characteristics determine a model's adaptability to RL scaling [9][10]

Mid-Training Strategy
- The team ran extensive controlled mid-training experiments on Llama-3.2-3B to isolate the factors that drive downstream RL performance. They found that high-quality mathematical corpora significantly improve RL outcomes, whereas low-quality data can destabilize training [14][16][18]

Data Quality and Preprocessing
- To support large-scale ablation studies and mid-training, the team built the MegaMath-Web-Pro-Max dataset, roughly 5.5 times larger than its predecessor MegaMath-Web-Pro, and refined it with a custom quality classifier to keep only high-quality documents (a hedged filtering sketch appears after this summary) [19][25]

Two-Stage Training Approach
- The proposed mid-training runs in two stages: a first phase that builds a stable reasoning foundation, followed by specialized training that improves the model's adaptability to RL. This schedule produced significant gains across mathematical reasoning benchmarks (a schedule sketch also follows below) [27][30]

Performance Improvements
- The resulting OctoThinker base model series improved mathematical reasoning performance by 10%-20% over the original Llama models; on benchmarks such as GSM8K and MATH500, the OctoThinker models showed marked gains in accuracy and reasoning depth [31][32][33]

Future Directions
- The team plans to further refine mathematical pre-training datasets, design RL-friendly foundation models without relying on strong long-chain reasoning models, and expand the OctoThinker family with new branches such as tool-integrated reasoning [38]
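
The article does not describe how the quality classifier behind MegaMath-Web-Pro-Max was built or applied. As a minimal sketch of what classifier-based corpus filtering generally looks like, the snippet below assumes a hypothetical fastText model file `quality_clf.bin` that emits `__label__keep` / `__label__drop` labels; the confidence threshold is likewise an assumption, not a value from the paper.

```python
# Minimal sketch: classifier-based corpus filtering.
# Assumptions (not from the paper): the fastText model "quality_clf.bin",
# its __label__keep / __label__drop labels, and the 0.9 threshold.
import fasttext

THRESHOLD = 0.9  # assumed confidence cutoff


def filter_corpus(docs, model_path="quality_clf.bin"):
    """Return only the documents the quality classifier confidently keeps."""
    clf = fasttext.load_model(model_path)
    kept = []
    for doc in docs:
        # fastText's predict() expects single-line input, so strip newlines.
        labels, probs = clf.predict(doc.replace("\n", " "))
        if labels[0] == "__label__keep" and probs[0] >= THRESHOLD:
            kept.append(doc)
    return kept
```

The shape is the relevant part: score every candidate document and keep only confident positives, which is how such a classifier can recall high-quality pages from a much larger web corpus.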
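
Likewise, the two-stage "stable foundation, then specialization" schedule can be made concrete with a small sketch. The dataset mixture names, the token budgets, and the `train` placeholder below are all illustrative assumptions; the article only specifies the overall shape of the schedule.

```python
# Minimal sketch: a two-stage mid-training schedule.
# All dataset names and token budgets are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    datasets: list[str]  # corpora mixed during this stage
    tokens_b: float      # training budget in billions of tokens


SCHEDULE = [
    # Stage 1: build a stable reasoning foundation on high-quality web math.
    Stage("stable", ["megamath-web-pro-max"], tokens_b=200.0),
    # Stage 2: a shorter specialized mixture (e.g. QA-style data) intended
    # to make the resulting base model more RL-friendly.
    Stage("specialize", ["megamath-web-pro-max", "math-qa"], tokens_b=20.0),
]


def run(schedule: list[Stage]) -> None:
    for stage in schedule:
        print(f"[{stage.name}] {stage.tokens_b:.0f}B tokens over {stage.datasets}")
        # train(model, stage.datasets, stage.tokens_b)  # hypothetical training loop


if __name__ == "__main__":
    run(SCHEDULE)
```

Splitting the budget this way lets the long first stage stabilize reasoning behavior before the short second stage nudges the model toward RL-friendly formats, matching the two-phase structure the article describes.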