Large Language Models
Kuaishou's Klear-Reasoner Tops the 8B Model Rankings; GPPO Algorithm Strengthens Both Stability and Exploration
AI前线· 2025-08-22 06:07
Core Viewpoint
- Competition among large language models has highlighted the importance of mathematical and coding reasoning. Kuaishou's Klear team has introduced the Klear-Reasoner model, which achieves state-of-the-art performance on several benchmarks [1][2].

Group 1: Model Performance
- Klear-Reasoner outperforms other strong open-source models on benchmarks such as AIME2024 and AIME2025, scoring 90.5% and 83.2% respectively, which makes it the top 8B model [2].
- The performance is attributed to the GPPO (Gradient-Preserving Clipping Policy Optimization) algorithm, which improves exploration while keeping training stable [5][24].

Group 2: Technical Innovations
- GPPO retains all gradients during training, in contrast to traditional clipping methods, which discard the gradients of clipped tokens and can therefore hinder exploration and slow convergence [8][10].
- GPPO lets high-entropy tokens take part in backpropagation, preserving exploration ability and accelerating error correction [10].

Group 3: Training Methodology
- The Klear team emphasizes data quality over quantity in the supervised fine-tuning (SFT) phase, showing that high-quality data sources yield better training efficiency and outcomes [12].
- For high-difficulty tasks, retaining some erroneous samples can improve model performance by providing additional exploration opportunities [16].
- In the reinforcement learning (RL) phase, soft rewards based on test case pass rates are more effective than hard rewards, improving training stability and efficiency [19].

Group 4: Future Implications
- Beyond its benchmark results, the release of Klear-Reasoner offers a reproducible and scalable approach to training reasoning models with supervised and reinforcement learning, providing useful insights for future applications in mathematics, coding, and other RLVR tasks [24].
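The contrast between conventional clipping and gradient-preserving clipping, and between hard and soft rewards, can be sketched in a few lines. This is a minimal illustration and not the Klear team's implementation: it assumes a PPO-style clipped surrogate with a symmetric clip range `eps`, and it assumes gradient preservation is realized by rescaling the importance ratio with a stop-gradient, so the forward value stays clipped while the gradient keeps a bounded magnitude. The function names and the default `eps=0.2` are hypothetical.

```python
def grad_coeff(ratio, advantage, eps=0.2, preserve=False):
    """Coefficient multiplying grad(log pi) for a single token.

    Assumed setting: PPO-style surrogate min(r*A, clip(r, 1-eps, 1+eps)*A).
    preserve=False (plain clipping): a clipped token contributes no gradient.
    preserve=True (assumed gradient-preserving variant): a clipped token keeps
    a gradient whose size is capped at the clip boundary.
    """
    clipped_high = advantage > 0 and ratio > 1 + eps
    clipped_low = advantage < 0 and ratio < 1 - eps
    if clipped_high:
        return (1 + eps) * advantage if preserve else 0.0
    if clipped_low:
        return (1 - eps) * advantage if preserve else 0.0
    return ratio * advantage  # unclipped: the usual policy-gradient term


def hard_reward(passed, total):
    """All-or-nothing reward: 1 only if every test case passes."""
    return 1.0 if total > 0 and passed == total else 0.0


def soft_reward(passed, total):
    """Reward equal to the fraction of test cases passed."""
    return passed / total if total > 0 else 0.0


if __name__ == "__main__":
    # (ratio, advantage) pairs: clipped high, clipped low, unclipped
    for r, a in [(1.5, 1.0), (0.5, -2.0), (1.1, 1.0)]:
        print(r, a, grad_coeff(r, a), grad_coeff(r, a, preserve=True))
    print(hard_reward(7, 10), soft_reward(7, 10))
```

Under these assumptions, a token whose ratio has drifted outside the clip range still nudges the policy, but never by more than the boundary value allows. Likewise, a solution passing 7 of 10 test cases earns nothing under the hard reward but 0.7 under the soft one, so partially correct samples still carry a training signal.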
From Complicated Tricks to a Minimalist Recipe: The ROLL Team Brings a New RL4LLM Practice
机器之心· 2025-08-22 04:58
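This research was completed jointly by Taotian Group's Algorithm Technology - Future Living Lab and Aicheng Technology's Intelligent Engine Division; the core authors include Liu Zihe, Liu Jiashun, He Yancheng, and Wang Weixun. Future Living Lab brings together Taotian Group's computing power, data, and top technical talent. It focuses on frontier AI directions such as large models and multimodality, and works on foundational algorithms, model capabilities, and AI Native applications, aiming to lead AI innovation in consumer and lifestyle scenarios. Aicheng Technology has extensive hands-on experience in large-model training and optimization. The two previously open-sourced ROLL, an efficient reinforcement learning training framework for large models, and this paper is likewise a practical exploration built on ROLL.

In recent years, reinforcement learning (RL) has proven markedly effective at improving the complex reasoning abilities of large language models (LLMs) and is widely applied to tasks such as math problem solving and code generation. Models fine-tuned with RL often outperform, on reasoning, models that rely only on supervised fine-tuning or pretraining, and this has spawned a large body of related research. It has also produced a series of puzzling phenomena: different studies propose different RL optimization tricks, yet unified experimental comparisons and mechanistic explanations are lacking, and some studies even reach contradictory conclusions. For researchers and engineers, this "many methods, muddled conclusions" situation makes practical deployment harder. To address this, Alibaba's Taotian Group and Aicheng Technology, together with several universities, ...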
Stone Technology's (Roborock's) Comeback: Finding Its Own Methodology
21 Shi Ji Jing Ji Bao Dao· 2025-08-22 02:09
Core Insights
- Stone Technology (Roborock) has achieved the highest global shipment volume of robotic vacuum cleaners; its latest semi-annual report reflects both product competitiveness and progress in globalization [1][2].
- The company reported revenue of 7.903 billion yuan in the first half of 2025, a year-on-year increase of 78.96%, and net profit of 678 million yuan, with quarter-on-quarter profit growth of 53.29% in Q2 [1][2].

Financial Performance
- Revenue for the first half of 2025 reached 7.903 billion yuan, the sixth consecutive year of growth [1].
- Net profit for the first half was 678 million yuan, with the net profit margin rising to 9.2% in Q2 [1].
- Total assets at the end of the period were 19.379 billion yuan, a 10.83% increase from the beginning of the year, with net assets of 13.374 billion yuan [1].

Market Dynamics
- Domestic sales have been boosted by government subsidy policies, while overseas growth is being driven by brand building and refined channel strategies [1][4].
- The global smart robotic vacuum cleaner market shipped 20.603 million units in 2024, a year-on-year increase of 11.2%, with sales revenue of 9.31 billion USD, up 19.7% [4][5].
- The average price of robotic vacuums rose 7.6% to 452 USD, driven by continuous product and technology iteration [4].

Product Innovation
- Stone Technology has introduced advanced products such as the G30 Space Exploration version, which features AI obstacle recognition and a five-axis folding robotic arm [6][7].
- The P20 Ultra Plus addresses user pain points around cleaning and hygiene with its self-cleaning base and other advanced features [6].
- The company is transitioning from a "cleaning tool" provider to a "smart home solution provider" through new technologies [7].

International Strategy
- Stone Technology is expanding its overseas channels and product price ranges, focusing on markets such as Southern Europe and the UK [8][10].
- The company is implementing a "de-distribution" strategy in Europe, shifting from reliance on local distributors to a direct sales model [8][9].
- Manufacturing capacity established in Vietnam is intended to improve supply chain resilience and reduce geopolitical risk [10].
[Dianjin Hudongyi] Computing Chips + DeepSeek: Company Has Mass-Produced Some Computing Chips with On-Device Intelligent Processing Capability
财联社· 2025-08-22 01:19
Core Viewpoint
- The article emphasizes the importance of timely, professional interpretation of information for investors, focusing on the investment value of significant events, analysis of companies along the industry chain, and the key points of major policies [1].

Group 1: Company Developments
- The company has achieved mass production of some of its computing chips, which provide intelligent processing capability for edge-side (on-device) use; these products have already been integrated with large language models such as DeepSeek and Kimi [1].
- The company has also mass-produced several types of wafer testing equipment that can be used to test large-diameter wafers [1].
Banma Zhixing to Pursue an Independent Hong Kong IPO; SAIC Is Its Largest Customer and a Key Shareholder
Mei Ri Shang Bao· 2025-08-21 22:57
Core Viewpoint
- Alibaba plans to spin off its subsidiary Banma Network Technology Co., Ltd. (Banma Zhixing, 斑马智行) for a Hong Kong IPO, a significant move in the smart automotive sector [1][2].

Company Summary
- Banma Zhixing was established on November 22, 2015, and will no longer be included in Alibaba's consolidated financial statements starting December 27, 2024 [2].
- As of the announcement date, Alibaba holds approximately 44.72% of Banma Zhixing's shares; after the spin-off it will retain over 30% [2].
- Banma Zhixing primarily provides smart automotive operating systems and solutions. SAIC Group is both its largest customer and a significant shareholder [2][3].

Financial Performance
- Banma Zhixing's revenue for 2022, 2023, and 2024 was 805 million yuan, 872 million yuan, and 824 million yuan, respectively [3].
- The company recorded a loss and total comprehensive expense of 878 million yuan, 876 million yuan, and 847 million yuan over the same period [3].
- Research and development expenses were 1.111 billion yuan, 1.123 billion yuan, and 980 million yuan from 2022 to 2024 [3].

Client and Supplier Relationships
- SAIC Group was Banma Zhixing's largest customer from 2022 to 2024, contributing 54.7%, 47.4%, and 38.8% of the company's revenue [3].
- Alibaba was Banma Zhixing's primary supplier, accounting for 53.5%, 58.4%, and 50.5% of total purchases over the same period [3].

Strategic Implications
- The IPO is expected to strengthen Banma Zhixing's independent image among clients, suppliers, and potential strategic partners, helping it in business negotiations [4].
- The spin-off should also improve Banma Zhixing's ability to secure bank financing and broaden its external funding channels [4].

Use of IPO Proceeds
- The IPO proceeds will be allocated to research and development, market expansion, capital operations, and supplementing working capital [5].
- Specific plans include strengthening technological leadership in the smart cockpit solutions market and expanding market share both domestically and globally [5].

Market Outlook
- The smart cockpit solutions market is at a pivotal stage of development, supported by government policy, rapid growth in the passenger vehicle market, and advances in chip performance and AI technology [6].
- Global smart vehicle sales are projected to grow from 58 million units in 2024 to 86.5 million units by 2030, a compound annual growth rate of 6.9% [6].
- The market for smart cockpit solutions in China is expected to grow from 129 billion yuan to 327.4 billion yuan, a compound annual growth rate of 16.8% [6].
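The two compound annual growth rates quoted in the market outlook can be sanity-checked with the standard CAGR formula. This is a small sketch: the summary gives the 2024 to 2030 window only for global unit sales, so applying the same six-year window to the China market-size figures is an assumption, and the function name `cagr` is only illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Global smart vehicle sales, million units, 2024 -> 2030 (from the summary).
global_units = cagr(58.0, 86.5, 2030 - 2024)

# China smart cockpit solutions market, billion yuan.
# Assumption: the same 2024 -> 2030 window applies; the summary does not say.
china_market = cagr(129.0, 327.4, 6)

if __name__ == "__main__":
    print(f"global unit sales CAGR: {global_units:.1%}")
    print(f"China market size CAGR: {china_market:.1%}")
```

With a six-year window both figures round to the quoted 6.9% and 16.8%, which suggests the China market projection covers the same 2024 to 2030 period.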
Banma Network Files with the Hong Kong Stock Exchange; Major Shareholders Include SAIC and Alibaba
Ju Chao Zi Xun· 2025-08-21 07:43
Group 1
- On August 20, 2025, Banma Network, the joint venture between SAIC and Alibaba, officially submitted its IPO application to the Hong Kong Stock Exchange, with Deutsche Bank, CICC, and Guotai Junan International as joint sponsors [2].
- The IPO proceeds will be used to increase R&D investment, grow market share in China, expand globally, support business acquisitions and expansion plans, and supplement working capital [2].

Group 2
- On August 21, 2025, Alibaba announced that Banma would no longer be consolidated into its financial statements starting December 27, 2024, following a proposed spin-off plan submitted to the Hong Kong Stock Exchange [4].
- As of the announcement date, Alibaba held approximately 44.72% of Banma's shares. After the proposed adjustments and spin-off, it will continue to hold over 30% of Banma, which will remain an equity method investment [4].

Group 3
- Banma Network, established in November 2015, primarily provides intelligent vehicle operating systems, smart vehicle solutions, and digital transportation solutions for the automotive and transportation industries [5].
- According to ZhiShi Consulting, Banma is the largest software-centric intelligent cockpit solution provider in China by 2024 revenue and ranks first by solution deployment volume [5].
- Banma is one of only two third-party suppliers in China with a fully self-developed automotive operating system, and the only one that integrates the three core pillars of the smart vehicle experience: system-level operating system solutions, AI end-to-end solutions, and automotive platform services [5].
- Banma's large language model capabilities rank first among nine top Chinese automotive AI companies in the intelligent cockpit field, across real-world scenarios such as vehicle control, driving, entertainment, mobility, business, lifestyle, and social interaction [5].
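A Unicorn's Fatal Weakness: Dissecting Banma Zhixing's Prospectus, Where Deep Ties to Alibaba and SAIC Make Independence the Biggest Test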
Sou Hu Cai Jing· 2025-08-21 06:08
Core Viewpoint
- Banma Network Technology Co., Ltd. (Banma Zhixing, literally "Zebra Smart Travel") has submitted its listing application to the Hong Kong Stock Exchange. The company focuses on smart cockpit solutions, and its business model relies heavily on its major shareholders [1][4].

Group 1: Business Overview
- In 2024, Banma Zhixing recorded revenue of 824 million yuan. It is one of only two third-party suppliers in China with a fully self-developed automotive operating system [1].
- The company combines system-level operating system solutions, an AI end-to-end architecture, and in-vehicle platform services into a single offering, which differentiates it in the smart cockpit sector [1][3].

Group 2: Technology and Innovation
- The company launched China's first internet car in 2016 and introduced a voice-interactive cockpit design that remains influential in the industry [3].
- The "Yuan Shen AI" system, released in 2023, brings large language models into the vehicle environment, moving the cockpit from passive response to proactive decision-making [3].

Group 3: Shareholder Dependency and Risks
- The company depends significantly on its major shareholders: SAIC Group contributed 54.7%, 47.4%, and 38.8% of revenue from 2022 to 2024, and 47.48% in Q1 2025 [4].
- Alibaba, which is both a supplier and a customer, accounted for 53.5%, 58.4%, and 54.7% of procurement from 2022 to 2024, with a similar percentage in Q1 2025 [4].

Group 4: Organizational Changes and Market Challenges
- In July 2025, the company began layoffs reported to affect over 30% of its workforce, primarily because of difficulties in developing the 7.0 system; the company itself put the adjustment at around 10% [5].
- Competition in the smart cockpit software market, which is growing at over 19% annually, is intensifying, and the entry of traditional Tier 1 suppliers and tech giants raises the risk of technology homogenization [5].

Group 5: Future Outlook
- The company is entering the capital market while still investing heavily in technology. It must balance collaboration with its shareholders against operational independence, and ensure that R&D spending translates into stable revenue growth [6].
ByteDance Unexpectedly Open-Sources Seed-OSS: 512K Context Is 4x the Mainstream Length, and Reasoning Sets a New Record
36 Ke· 2025-08-21 03:55
Core Insights
- ByteDance's Seed team has launched Seed-OSS, an open-source model series that mirrors OpenAI's GPT-OSS strategy: it provides a version tailored for the open-source community without directly releasing the core commercial model, Doubao [2].
- Seed-OSS has a native 512K context window, four times the 128K window of mainstream open-source models, which lets it handle complex tasks that require processing large amounts of information [3][5].
- The model has 36 billion parameters and uses RoPE position encoding and the GQA attention mechanism [5][6].

Model Features
- Seed-OSS lets users set a "Thinking Budget" to control the depth of the model's reasoning, so it can be adapted to tasks of different complexity [3].
- Thinking budgets that are integer multiples of 512 tokens are recommended, because the model was trained on those intervals [5].
- Two versions of the base model are provided: one trained with synthetic instruction data for better performance, and one without it for a purer base model [6].

Performance Metrics
- Seed-OSS-36B-Base scored 65.1 on the MMLU-Pro benchmark, ahead of comparable models such as Qwen2.5-32B-Base, which scored 58.5 [7].
- On reasoning, it scored 87.7 on the BBH benchmark, a new record for open-source models [7].
- The model also performed strongly on coding tasks, scoring 76.8 on HumanEval and 80.6 on MBPP [7].

Benchmark Comparisons
- Seed-OSS-36B-Base outperformed several competitors across benchmarks including MMLU-Pro, TriviaQA, and GSM8K, covering both knowledge and reasoning [8][9].
- The instruction-tuned version, Seed-OSS-36B-Instruct, scored 91.7 on the AIME24 math competition, ranking just below OpenAI's GPT-OSS-20B [9].

Development Background
- The ByteDance Seed team, established in 2023, aims to build advanced AI foundation models and has previously released several influential projects in niche areas [10].
- Recent projects include Seed-Coder, a code generation model, and BAGEL, a multimodal model that can process text, images, and video [12].
- Seed-OSS adds a significant new entrant to China's open-source base model landscape [12].
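The 512-token granularity mentioned above suggests a small helper for choosing a thinking budget. The sketch below is hypothetical and is not part of any Seed-OSS API: the name `snap_budget` and the choice to round up rather than down are assumptions made for illustration, and how the resulting value is passed to the model is deliberately left out.

```python
def snap_budget(requested_tokens, step=512):
    """Round a requested thinking budget up to the nearest multiple of `step`.

    Hypothetical helper: assumes budgets are non-negative token counts and
    that multiples of 512 are the preferred values.
    """
    if requested_tokens < 0:
        raise ValueError("requested_tokens must be non-negative")
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-requested_tokens // step) * step


if __name__ == "__main__":
    for tokens in (0, 300, 512, 1000, 4097):
        print(tokens, "->", snap_budget(tokens))
```

Rounding up keeps the model from getting less reasoning room than the caller asked for; a caller with a strict cost ceiling might prefer rounding down instead.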
ByteDance Unexpectedly Open-Sources Seed-OSS: 512K Context Is 4x the Mainstream Length! Reasoning Sets a New Record
量子位· 2025-08-21 02:36
Core Viewpoint
- ByteDance has launched an open-source large model named Seed-OSS-36B, with 36 billion parameters, which aims to compete with existing models such as OpenAI's GPT-OSS series [1][3][4].

Model Features
- Seed-OSS-36B has a native context window of 512K, four times the 128K offered by mainstream models such as DeepSeek V3.1, allowing it to handle tasks such as legal document review and long report analysis [5][6][8].
- The model introduces a "Thinking Budget" mechanism that lets users set a token limit on the model's reasoning depth and adjust it to task complexity [9][10][12].
- The architecture has 36 billion parameters and 64 layers, and uses RoPE position encoding, the GQA attention mechanism, RMSNorm normalization, and the SwiGLU activation function [13][14].

Performance Metrics
- Seed-OSS-36B-Base scored 65.1 on the MMLU-Pro benchmark, ahead of Qwen2.5-32B-Base, which scored 58.5 [16].
- The model scored 87.7 on the BBH reasoning benchmark, a new record for open-source models, and also performed strongly on math and coding tasks [17][18].
- The instruction-tuned version, Seed-OSS-36B-Instruct, scored 91.7 on the AIME24 math competition, ranking just below OpenAI's GPT-OSS-20B [20].

Development Background
- The ByteDance Seed team, established in 2023, aims to build advanced AI foundation models and has released several influential projects, including Seed-Coder and BAGEL, which address a range of AI tasks [21][22][23].
- The team has also developed VeOmni, a distributed training framework, and Seed LiveInterpret, an end-to-end simultaneous interpretation model [24][25].

Open Source Contribution
- Seed-OSS adds a significant new entrant to China's open-source base model landscape and should encourage further progress in AI technology [26].
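Two of the architectural components named above, RMSNorm and SwiGLU, have compact textbook definitions that can be written out directly. The sketch below uses plain Python lists and illustrative values. It shows the standard formulas only and is not taken from the Seed-OSS code; the learned projection matrices of a real SwiGLU feed-forward layer are omitted, so `swiglu` here shows only the gating step on already-projected vectors.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then apply a learned scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) multiplied elementwise with up.

    `gate` and `up` stand for two linear projections of the same input;
    the projections themselves (and the output projection) are left out.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    hidden = [3.0, 4.0]
    normed = rms_norm(hidden, weight=[1.0, 1.0])
    print(normed)
    print(swiglu(gate=[0.0, 2.0], up=[5.0, 1.0]))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, which removes one reduction per layer; SwiGLU replaces a single activation with a learned gate, at the cost of an extra projection.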
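"Revenue Up, Profit Down" in the First Half: Jiafa Education, Which Aims to Build an Industry AI Agent, Says "AI-Related Product Scale Has Not Met Expectations"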
Mei Ri Jing Ji Xin Wen· 2025-08-20 23:51
Core Viewpoint
- Jiafa Education reported revenue of 273 million yuan for the first half of 2025, a year-on-year increase of 5.03%, while net profit attributable to shareholders fell 4.60% to 40.78 million yuan: revenue rose but profit declined [1][2].

Financial Performance
- Revenue from "standardized exam-site products and overall solutions" declined 11.93% to approximately 154 million yuan, and the gross margin for this product line fell 0.47 percentage points [2].
- Revenue from "smart education products and overall solutions" rose 66.55% to approximately 94.58 million yuan, although the gross margin for this segment fell 15.07 percentage points to 28.8% [2].
- Operating costs rose 20.94%, far outpacing revenue growth of 5.03% [2].
- In Q1, revenue fell 51.82% to 55 million yuan and the company posted a net loss of 10 million yuan. Q2 improved, with revenue up 49.23% to 219 million yuan and net profit up 40.3% to 51 million yuan [2].

Strategic Developments
- Jiafa Education aims to "build an industry AI Agent and reconstruct the entire teaching scene." It has fully integrated the DeepSeek large model and developed a comprehensive base for educational AI applications [3].
- Market expansion for the company's AI-related products is at an early stage. Recognition and demand are increasing, but the business has not yet reached the expected scale [3].

Shareholder Changes
- In Q2 2025, notable changes among the top ten shareholders included an increase of approximately 178,000 shares by Sichuan Development Securities Investment Fund Management Co. and an increase of about 32,600 shares by Yin Hui [3].
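The quarterly and half-year figures in the summary can be cross-checked with simple arithmetic, and the gap between cost growth and revenue growth can be turned into a single number. This is a rough sketch using only the rounded figures quoted above (in millions of yuan); small mismatches are expected from rounding, and the variable names are illustrative.

```python
# Figures from the summary, in millions of yuan (rounded as reported).
h1_revenue, h1_net_profit = 273.0, 40.78
q1_revenue, q1_net_profit = 55.0, -10.0   # Q1 was a net loss
q2_revenue, q2_net_profit = 219.0, 51.0

# Reconcile quarters against the half-year totals.
revenue_gap = (q1_revenue + q2_revenue) - h1_revenue
profit_gap = (q1_net_profit + q2_net_profit) - h1_net_profit

# Year-on-year growth rates quoted in the summary.
revenue_growth = 0.0503
cost_growth = 0.2094

# Relative change in operating cost per yuan of revenue.
cost_ratio_change = (1 + cost_growth) / (1 + revenue_growth) - 1

if __name__ == "__main__":
    print(f"revenue gap vs H1 total: {revenue_gap:+.2f} million yuan")
    print(f"profit gap vs H1 total:  {profit_gap:+.2f} million yuan")
    print(f"cost per yuan of revenue changed by {cost_ratio_change:+.1%}")
```

The quarters reconcile with the half-year totals to within rounding, and costs per yuan of revenue rose by roughly 15% year on year, which is consistent with the "revenue up, profit down" headline.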