Large Model Inference

7 Million Parameters Beat DeepSeek R1 and Others: A Samsung Researcher's Viral Solo Paper Upends Large-Model Reasoning with Recursion
机器之心· 2025-10-09 04:43
Machine Heart report (editor: 冷猫). Readers interested in HRM can refer to Machine Heart's earlier coverage.

Training Small, Thinking Big. Reasoning architectures for large models are being overturned almost too fast to follow. In June of this year, researchers at Sapient Intelligence proposed the Hierarchical Reasoning Model (HRM), whose recurrent architecture broke through the structural limits of traditional chain-of-thought (CoT) and had a major impact on how large models reason. HRM contains only 27 million parameters (about 22 times smaller than the smallest Qwen3 model, Qwen3-0.6B), yet with only 1,000 training samples it achieved excellent performance on complex reasoning tasks. Barely four months later, HRM's architecture already looks outclassed. Alexia Jolicoeur-Martineau, a senior AI researcher at the Samsung Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursive Model (TRM). How remarkable is TRM? A network with only 7 million parameters (4 times smaller than HRM) can outperform cutting-edge language models such as o3-mini and Gemini 2.5 Pro on some of the hardest reasoning benchmarks, even though those models have roughly 10,000 times as many parameters. ...
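The core idea, a single tiny network applied recursively to refine a latent scratchpad and a candidate answer, fits in a few lines. The sketch below is a simplified illustration of that recursion, not the paper's code: the layer sizes, step counts, and the additive answer update are all assumptions.

```python
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    """Illustrative TRM-style recursion: one small shared network is reused many
    times to refine a latent scratchpad z and an answer embedding y.
    Hyperparameters here are placeholders, not the paper's."""

    def __init__(self, dim: int = 128, inner_steps: int = 6, outer_steps: int = 3):
        super().__init__()
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.inner_steps = inner_steps
        self.outer_steps = outer_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # candidate answer embedding
        z = torch.zeros_like(x)  # latent reasoning state
        for _ in range(self.outer_steps):
            for _ in range(self.inner_steps):
                # refine the latent state given the input and the current answer
                z = self.core(torch.cat([x, y, z], dim=-1))
            y = y + z  # simplified answer update; the actual model's rule differs
        return y

model = TinyRecursiveModel()
print(sum(p.numel() for p in model.parameters()))  # tiny by LLM standards
```

Because the same weights are reused at every step, computational depth scales with the number of recursions rather than with parameter count, which is how such a small model can afford many refinement passes.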
How Were vLLM and SGLang, the Most Popular Open-Source LLM Inference Frameworks, Built?
AI科技大本营· 2025-09-24 02:01
Core Viewpoint
- The article discusses the development stories of vLLM and SGLang, two prominent open-source inference engines for large language models (LLMs), highlighting their innovations, community engagement, and performance metrics.

Group 1: LLM Inference Challenges
- The core challenge of LLM inference lies in deploying models with hundreds of billions of parameters under strict constraints of latency, throughput, and cost [3]
- The inference process involves applying learned knowledge to new data, which requires efficient computation and memory management [2][3]

Group 2: vLLM Development
- vLLM originated from a 2023 paper on PagedAttention, which innovatively applied operating-system memory-management techniques to the KV cache, significantly enhancing throughput [7][8] (a toy sketch of the paging idea follows this summary)
- vLLM demonstrated remarkable performance improvements, handling up to 5 times the traffic and increasing throughput by 30 times compared to previous backends [9]
- The project quickly evolved from a research initiative into a community-driven open-source project, amassing over 56,000 stars on GitHub and engaging thousands of developers [15][9]

Group 3: SGLang Development
- SGLang grew out of the paper "SGLang: Efficient Execution of Structured Language Model Programs," featuring RadixAttention for optimized performance [12]
- SGLang retains the KV cache from previous requests to reduce computation during the prefill phase, showing significant performance advantages over traditional inference engines [12]
- Although SGLang's community is smaller than vLLM's, it has over 2,000 participants and has shown rapid iteration and growth [13]

Group 4: Community Engagement
- vLLM has a robust community with over 12,000 participants in issues and pull requests, while SGLang's community is less than half that size [15][13]
- Both projects have faced challenges in managing a growing number of issues and pull requests, with vLLM generally responding faster than SGLang [13]

Group 5: Performance Metrics and Comparisons
- vLLM and SGLang have both integrated advanced features like Continuous Batching and various attention mechanisms, leading to significant performance enhancements [29]
- The competition between the two projects has intensified, with both claiming performance leadership in their respective releases [26]

Group 6: Future Trends and Developments
- As the performance race heats up, both vLLM and SGLang are focusing on reproducible methods and real-world metrics rather than benchmark results alone [26]
- The trend indicates a convergence in model architectures and features among leading inference engines, with competition shifting toward factors beyond performance [29]

Group 7: Investment and Support
- Both projects have attracted attention from investment firms and open-source foundations, with vLLM receiving support from a16z and SGLang being recognized in the PyTorch ecosystem [31][40]
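To make PagedAttention's central trick concrete: the KV cache is split into fixed-size blocks, and each request holds a block table mapping its logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length. The toy allocator below is a minimal sketch in that spirit, with assumed names and block size; it is not vLLM's actual API.

```python
class PagedKVCache:
    """Toy paged KV-cache allocator in the spirit of PagedAttention (not vLLM's API)."""
    BLOCK_SIZE = 16  # tokens per block; illustrative, not vLLM's configured value

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_table = {}  # seq_id -> list of physical block ids
        self.seq_len = {}      # seq_id -> number of cached tokens

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a (block, slot) for one new token, allocating a block on demand."""
        table = self.block_table.setdefault(seq_id, [])
        n = self.seq_len.get(seq_id, 0)
        if n % self.BLOCK_SIZE == 0:  # first token, or the current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a scheduler would preempt here")
            table.append(self.free_blocks.pop())
        self.seq_len[seq_id] = n + 1
        return table[n // self.BLOCK_SIZE], n % self.BLOCK_SIZE

    def free(self, seq_id: str) -> None:
        """Return a finished request's blocks to the shared pool."""
        self.free_blocks.extend(self.block_table.pop(seq_id, []))
        self.seq_len.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4)
for _ in range(20):
    block, slot = cache.append_token("req-1")  # KV vectors would be written to (block, slot)
cache.free("req-1")
```

SGLang's RadixAttention tackles a complementary problem: cached KV entries are indexed by token prefix in a radix tree, so requests that share a prompt prefix can reuse each other's cache during prefill instead of recomputing it.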
The Full Story of SenseTime's Chip-Business Spin-Off: A Baidu Founding-Team Member Joins, 1.5 Billion Yuan Raised in Half a Year
36氪· 2025-09-19 13:42
Core Viewpoint
- The article discusses the emergence of AI chip startups in China, focusing on the establishment of "曦望" (Sunrise) as a subsidiary of 商汤 (SenseTime) to develop large-model inference chips, aiming to reduce inference costs significantly and capitalize on the growing AI chip market [4][7][9]

Company Overview
- "曦望" was formed as part of 商汤's "1+X" strategy, which involves splitting off high-potential but resource-intensive chip development into an independent entity [5][9]
- The company aims to leverage 商汤's five years of experience in chip development to accelerate its growth and market entry [11][13]

Leadership and Team
- 王湛, a former key figure at 百度 (Baidu), has joined "曦望" as co-CEO, bringing extensive experience in managing large teams and product development [5][6]
- The executive team includes 王勇, who has 20 years of chip-industry experience; the team has grown by 50% to nearly 200 members, many of them from major tech companies [12][13]

Financial Investment and Product Development
- 商汤 has invested over 1.1 billion yuan in chip development over the past five years, and "曦望" has raised over 1.5 billion yuan in recent funding rounds [13][14]
- "曦望" has successfully produced two chips: the S1 for cloud-edge visual inference and the S2 for large-model inference, with the planned S3 intended to cut inference costs by 90% [14][15][17]

Market Context and Competitive Landscape
- The Chinese AI chip industry is at a pivotal moment, with companies like 寒武纪 (Cambricon) gaining significant market traction [9][22]
- The article highlights the importance of timing in entering the AI chip market, suggesting that "曦望" is well positioned to capitalize on current market dynamics [24][25]

Strategic Focus and Future Outlook
- "曦望" aims to focus on specific market segments and leverage its relationships with industry capital to ensure successful product commercialization [18][19]
- The company believes the future of AI chips will hinge on integrated hardware-software capabilities and the ability to anticipate market trends [25]
Tencent Cloud President Qiu Yuepeng: Tencent Cloud Has Fully Adapted Mainstream Domestic Chips
Xin Lang Ke Ji· 2025-09-16 03:26
Core Insights
- Tencent Cloud is actively participating in the open-source community and has developed a heterogeneous computing platform that integrates various chip resources to provide cost-effective AI computing power [1][5]
- The large-model industry's shift from training to inference has driven a surge in demand for inference capacity, prompting upgrades in AI infrastructure [3][4]
- Tencent Cloud's infrastructure now covers 55 availability zones globally with over 3,200 acceleration nodes, and has successfully defended against a 183% year-on-year increase in DDoS attacks [1][10]

Group 1: AI Infrastructure and Optimization
- Tencent Cloud has contributed multiple optimization technologies to open-source communities, including the FlexKV multi-level caching technology, which reduces KV-cache usage and cuts first-token latency by up to 70% [1][4] (a generic sketch of the multi-level pattern follows this summary)
- The company has improved GPU communication performance by 30% and doubled performance in common data-center environments through enhancements to DeepEP [3][4]
- The Agent Runtime solution provides a secure and efficient environment for deploying AI agents, integrating components such as execution engines and cloud sandbox services [5][6]

Group 2: Global Expansion and Client Support
- Tencent Cloud has established nine technical support centers globally and plans to build new availability zones in Osaka, Japan, and in Saudi Arabia, enhancing its international presence [1][14]
- The company migrated a large-scale project for GoTo, Indonesia's largest tech group, completing over 500 customized requirements and standing up a third availability zone in just five months [14]
- Tencent Cloud has been recognized as a leader in the global gaming-cloud platform market, providing robust infrastructure for over 10,000 games and ensuring low-latency experiences for players worldwide [10][11]

Group 3: Advanced Technologies and Services
- The Cloud Mate service, composed of multiple sub-agents, enhances cloud governance and risk management, achieving a 95% interception rate for risky SQL queries [8][9]
- AI-assisted database optimization has cut total latency for complex queries by 80%, showcasing Tencent Cloud's commitment to performance [9][10]
- The EdgeOne product, which combines AI with security acceleration, has helped over 100,000 users deploy e-commerce web pages quickly and efficiently [11][12]
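The article describes FlexKV only at this level of detail. As a reading aid, here is a generic sketch of the multi-level KV-cache pattern such systems use: look up a prompt prefix in GPU memory first, fall back to colder tiers on a miss, and promote hits toward the fast tier. The tier layout, API, and promotion policy are all assumptions, not Tencent Cloud's implementation.

```python
class TieredKVCache:
    """Generic multi-level KV-cache lookup (GPU -> CPU -> disk); illustrative only.
    Eviction between tiers is omitted for brevity."""

    def __init__(self):
        self.tiers = [("gpu", {}), ("cpu", {}), ("disk", {})]  # fastest to slowest

    def get(self, prefix_hash: str):
        for i, (_, store) in enumerate(self.tiers):
            if prefix_hash in store:
                kv = store[prefix_hash]
                self._promote(prefix_hash, kv, i)  # copy hot entries toward the GPU
                return kv
        return None  # miss: prefill must recompute this prefix

    def put(self, prefix_hash: str, kv) -> None:
        self.tiers[0][1][prefix_hash] = kv  # new entries land in GPU memory

    def _promote(self, prefix_hash: str, kv, found_at: int) -> None:
        for _, store in self.tiers[:found_at]:
            store[prefix_hash] = kv

cache = TieredKVCache()
cache.put("hash-of-shared-system-prompt", kv="<kv tensors>")
print(cache.get("hash-of-shared-system-prompt"))  # hit in the GPU tier
```

The payoff of this layout is that a prefix evicted from GPU memory can still be fetched from a slower tier far faster than recomputing its prefill, which is where the reported first-token latency savings come from.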
Blockbuster! Goldman Sachs Raises Cambricon's Target Price to 1,835 Yuan; Does the "Cambricon King" Top Wuliangye in Market Cap and Moutai in Share Price? Post-85 Founder Chen Tianshi Is Worth Over 150 Billion Yuan, and Big-Name Investor Zhang Jianping Is in the Spotlight!
Sou Hu Cai Jing· 2025-08-25 02:37
Core Viewpoint
- The A-share chip sector has become the strongest direction in the market, buoyed by the dual catalysts of Nvidia's suspension of H20 chip production and the launch of DeepSeek, with sector leader Cambricon Technologies seeing a sharp stock-price rally [1][5]

Group 1: Stock Performance
- Cambricon's stock price surged to 1,243 yuan, the second highest in A-shares after Kweichow Moutai, with a market capitalization exceeding 520 billion yuan [1]
- Since July 11, Cambricon's stock has gained 137%; year to date it has risen from under 50 yuan to 1,243 yuan, a maximum gain of more than 25 times [3]
- Goldman Sachs raised Cambricon's target price by 50% to 1,835 yuan, which, if reached, would push its market capitalization close to 770 billion yuan [3]

Group 2: Market Dynamics
- The speculation around Cambricon rests on three core points: accelerated domestic substitution driven by geopolitical factors, explosive demand for large-model inference driven by local developments like DeepSeek, and Cambricon's leading industry position as "China's Nvidia" [5]
- News of Nvidia's suspension of H20 chip production further catalyzed the surge in demand for domestic AI chips, with the official release of DeepSeek-V3.1 adapting to domestic chip architectures [5]

Group 3: Company Insights
- Cambricon founder Chen Tianshi, born in 1985, has seen his wealth rise sharply with the stock surge; his 29.63% stake is now valued at 154.1 billion yuan [5]
- Prominent investor Zhang Jianping has increased his stake in Cambricon to 6.0863 million shares, making him the seventh-largest shareholder, with holdings worth 7.566 billion yuan and a floating profit of more than 4 billion yuan [6]
"六边形战士"GPU公司完成亿元新融资
是说芯语· 2025-08-24 01:39
Core Viewpoint
- The article highlights recent developments at Zhuhai Chip Power Technology Co., Ltd. (Chip Power Technology), including its successful B2 financing round and advances in AI chip technology, particularly the RPP architecture, which is designed for parallel computing and has been adapted to mainstream open-source large models [2][4][6]

Group 1: Financing and Growth
- Chip Power Technology completed nearly 100 million yuan in B2 financing led by Feitu Venture Capital, with proceeds earmarked for RPP chip industrialization, core-technology upgrades, and expansion into the edge-computing and AI inference chip markets [2]
- The company previously secured several million yuan in B1 financing in March, led by Changshi Capital, indicating strong investor interest in its technology and market potential [2]

Group 2: Technology and Product Development
- After eight years of continuous R&D and product iteration, Chip Power Technology has built a comprehensive AI computing product matrix [3]
- Its core RPP (Reconfigurable Parallel Processor) architecture is designed specifically for parallel computing, offering high energy efficiency and compatibility with the CUDA programming model, facilitating rapid deployment of edge AI applications [4]
- The RPP-R8 chip, based on the RPP architecture, has been commercialized in fields such as AI PCs, medical testing, and storage servers, and has formed deep partnerships with leading companies like Lenovo [6]

Group 3: Product Specifications
- The RPP-R8 AE7100E is billed as the industry's smallest and thinnest GPGPU, with power consumption under 10 W, making it suitable for terminal and edge-computing devices [6]
- The chip measures 17 mm x 17 mm, and the integrated M.2 accelerator card is comparable in size to half a business card, delivering up to 32 TOPS of compute and 60 GB/s of memory bandwidth [6]
- The M.2 accelerator card supports major open-source models such as Qwen, Llama, and Stable Diffusion, demonstrating its versatility in AI applications [6]

Group 4: Future Directions
- Following the recent financing, Chip Power Technology plans to focus on developing high-end general-purpose chips with proprietary rights in China [7]
Cambricon Hits the Daily Limit, Market Cap Tops 520 Billion Yuan! Is It About to Overtake Kweichow Moutai as the New Stock King?
Sou Hu Cai Jing· 2025-08-22 07:00
Core Viewpoint
- Cambricon-U's stock price surged 20% to 1,243.20 yuan, with a market capitalization exceeding 520 billion yuan, indicating strong market interest in the AI chip sector [1][2]

Group 1: Stock Performance
- Cambricon-U's stock has risen 107.12% since July 25, reaching a new high of 1,243.20 yuan, reflecting strong market expectations for AI chips [1]
- Trading volume reached 16.09 billion yuan, showing significant investor activity [1]

Group 2: Market Drivers
- Accelerated domestic substitution is a key driver, as geopolitical factors have increased demand for independently controlled AI chips among domestic cloud and internet companies [1]
- Explosive demand for large-model inference, driven by local models like DeepSeek, is boosting the need for high-performance AI inference chips [1]
- Cambricon's capabilities in AI chip architecture design and hardware-software co-optimization are gaining recognition, earning it the nickname "China's Nvidia" [1]

Group 3: Regulatory and Operational Updates
- Cambricon's application for a specific A-share issuance in 2025 has been approved by the Shanghai Stock Exchange and is pending registration with the China Securities Regulatory Commission [2]
- The company clarified that rumors regarding substrate orders, revenue forecasts, and new-product information are false, emphasizing that there are no undisclosed material matters and that operations are normal [2]

Group 4: Industry Outlook
- Cambricon is positioned to benefit from the dual drivers of domestic substitution and rising demand for large models, suggesting a favorable industry outlook [2]
DeepSeek Sets Off Domestic AI Chips: The Three Leaders Cambricon, Huasheng Tiancheng, and Heheta Run Red-Hot, as the 500-Billion-Yuan "Cambricon King" Tops Wuliangye in Market Cap
Jin Rong Jie· 2025-08-22 06:50
Group 1: DeepSeek-V3.1 Release
- DeepSeek-V3.1 has been officially released, generating significant attention and boosting market confidence in domestic large models [1]
- The upgrade includes three main changes: a hybrid thinking mode, higher thinking efficiency, and stronger agent capabilities [1]

Group 2: Domestic Chip Market
- DeepSeek's announcement hinted at upcoming next-generation domestic chips, noting that its UE8M0 FP8 format is designed for them, which has sparked market speculation and increased interest in domestic chips [2] (a sketch of the format's arithmetic follows this summary)
- The domestic AI chip sector is at a critical window for technological breakthroughs and ecosystem adoption, with UE8M0 FP8 helping to accelerate the domestic chip industry's progress [2]

Group 3: Cambricon Technologies
- Cambricon Technologies, a leading domestic AI chip company, saw its market value exceed 500 billion yuan, with the stock peaking at 1,240.00 yuan, up more than 19% [3]
- Since September 2022, Cambricon's stock has risen over 460%, doubling in just one month, indicating strong market interest in AI chip leaders [4]

Group 4: Huawei Ecosystem and Huasheng Tiancheng
- Huasheng Tiancheng has gained attention as a key domestic player aiming to replace Nvidia, with its stock up more than 130% in the past month [6]
- The company is involved in AI computing power and is a partner in Huawei's Ascend chip ecosystem, with significant investments in AI computing centers [6]

Group 5: Heheta and Moore Threads
- Heheta's stock has surged 300% since September 2022, driven by its stake in Moore Threads, the first domestic GPU maker to support native FP8 [8]
- Moore Threads' IPO process has activated market interest in domestic GPU alternatives, while Heheta is also a leader in smart controllers across various sectors [8]
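A note on the name: UE8M0 reads most naturally as a number format rather than a chip. In the OCP microscaling convention the name follows, E8M0 denotes an unsigned 8-bit exponent with no mantissa bits, encoding a power-of-two scale shared by a block of FP8 values. The sketch below illustrates that arithmetic under those assumptions; it is not DeepSeek's code, and the exact conventions of the unreleased domestic chips are not public.

```python
import math

def ue8m0_encode(scale: float) -> int:
    """Encode a positive scale as UE8M0: 8 unsigned exponent bits, no mantissa.
    Stored byte e represents 2**(e - 127); rounding to the nearest exponent
    is an assumption here, not a documented rule."""
    e = round(math.log2(scale)) + 127
    return max(0, min(255, e))  # clamp to the representable range

def ue8m0_decode(e: int) -> float:
    return 2.0 ** (e - 127)

# Per-block scaling for FP8: choose the power-of-two scale that maps the block's
# max magnitude near FP8 E4M3's largest normal value (448).
block = [0.03, -1.7, 0.9, 2.5]
raw_scale = max(abs(v) for v in block) / 448.0
e = ue8m0_encode(raw_scale)
print(e, ue8m0_decode(e))  # quantize each v as round(v / ue8m0_decode(e)) into FP8
```

Restricting scales to powers of two keeps rescaling a cheap exponent adjustment in hardware, which is presumably why the format is attractive for chips without fast general-purpose FP8 scaling.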
Dahua Technology (002236): Server Business Poised to Become a New Growth Driver
HTSC· 2025-08-19 02:04
Investment Rating
- The investment rating for the company is maintained at "Buy" with a target price of RMB 28.56 [1][6]

Core Views
- The company's server business is expected to open new growth avenues, particularly amid rising demand for AI computing power [8][9]
- The company has successfully entered the procurement systems of major clients, which is anticipated to enhance its brand influence in the computing-power industry [9][12]
- Overall performance in the first half of 2025 shows positive growth across all business lines, with a significant increase in profitability and cash flow [15][16]

Financial Data Summary
- The company's market capitalization is RMB 59,786 million, with a closing price of RMB 18.19 as of August 18, 2025 [2]
- Revenue projections for 2024 to 2027 are RMB 32,181 million, RMB 33,275 million, RMB 35,165 million, and RMB 38,002 million respectively, with growth rates of -0.12%, 3.40%, 5.68%, and 8.07% [5]
- Net profit attributable to the parent company is projected at RMB 2,906 million in 2024, rising to RMB 4,208 million by 2027, with corresponding growth rates of -60.53%, 31.91%, 1.28%, and 8.39% [5]

Business Performance Overview
- In the first half of 2025, the company posted revenue of RMB 15.181 billion, up 2.12% year on year, with net profit of RMB 2.476 billion, up 36.80% [15][16]
- The G-end (government) business generated RMB 1.851 billion in revenue, up 4.68%, while the B-end (enterprise) business recorded RMB 4.219 billion, up 8.17% [10][16]
- Overseas business accounted for 50.25% of total revenue, with modest growth of 1.91% year on year [10][16]

Future Outlook
- The company anticipates steady growth in the second half of 2025, focusing on policy opportunities and expanding overseas markets [11][17]
- The server business is expected to benefit from rising demand for AI and computing power, with significant contracts already secured [9][12]
Is Chain-of-Thought an Illusion? Re-examining Large-Model Reasoning from a Data-Distribution Perspective; Musk Replies, and Grok Loses Its Cool
机器之心· 2025-08-14 09:11
Core Viewpoint
- The research suggests that chain-of-thought (CoT) reasoning in large language models (LLMs) may not represent true reasoning but rather a replication of patterns learned from training data, leading to fragility when faced with out-of-distribution tasks [2][10][37]

Data-Distribution Perspective on CoT
- The effectiveness of CoT is attributed to "structured inductive bias" learned within the training distribution, indicating that the reasoning chains merely reproduce common patterns rather than perform genuine logical deduction [13][37]
- A theoretical framework is introduced to quantify the relationship between training and test distributions, highlighting how distribution shifts impact reasoning performance [15]

Experimental Findings on Generalization
- In task generalization, the model shows nearly 100% accuracy within the training distribution, but accuracy drops to 0.01% under slight distribution shift, indicating a lack of true generalization [23]
- Supervised fine-tuning on a small amount of new data can restore performance, but this only expands the existing distribution boundary without enhancing abstract generalization [24]
- In length generalization, even minor changes in input sequence length significantly affect model performance, with the model tending to generate reasoning chains matching the lengths seen in training [26]
- The model is highly sensitive to format changes, with even minor alterations in input prompts leading to complete reasoning failure [28]

Universal Sensitivity to Distribution Shifts
- Sensitivity to distribution shift holds across sampling temperatures and model sizes, indicating this issue is not isolated to specific models [31]

Practical Implications
- In high-risk fields such as healthcare and finance, relying on CoT for robust reasoning is cautioned against, as misleading reasoning chains can be more dangerous than outright incorrect answers [34]
- Current evaluation methods that depend on validation sets closely aligned with training distributions may overestimate model robustness, necessitating stricter out-of-distribution testing [35] (a sketch of such a protocol follows this summary)
- While supervised fine-tuning can quickly enhance performance on specific tasks, it does not equip models with true abstract reasoning capabilities [36]
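The evaluation protocol implied by these findings, training on one distribution and then probing accuracy as a single factor (task, length, or format) shifts, is simple to sketch. The harness below is an illustration with all names assumed; it is not the paper's code.

```python
def accuracy(model, dataset) -> float:
    """Exact-match accuracy: fraction of (input, answer) pairs the model gets right."""
    return sum(model(x) == y for x, y in dataset) / len(dataset)

def shift_report(model, splits: dict) -> dict:
    """Evaluate one trained model across an in-distribution split and several
    controlled shifts, each varying a single factor relative to training."""
    return {name: accuracy(model, data) for name, data in splits.items()}

# Hypothetical usage, mirroring the paper's three axes of shift:
# splits = {
#     "in_distribution": iid_test,         # same tasks, lengths, and prompt format
#     "task_shift":      new_task_test,    # unseen task compositions
#     "length_shift":    longer_inputs,    # lengths outside the training range
#     "format_shift":    reworded_prompts, # same content, perturbed surface form
# }
# print(shift_report(model, splits))
```

The point of varying one factor at a time is attribution: if accuracy collapses on exactly one split, the failure can be pinned to that axis of distribution shift rather than to general task difficulty.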