Large Model Inference
Add an Apple Mac Studio to NVIDIA's desktop supercomputer and it gets dramatically faster: inference speed surges to 277%
量子位· 2025-10-17 04:58
Core Viewpoint
- EXO Labs has developed a new framework that accelerates large model inference by combining NVIDIA's DGX Spark with Apple's M3 Ultra, achieving a speedup of up to 2.77x for model deployment [1][5][18].

Group 1: Technology and Implementation
- The framework uses a PD (prefill and decode) separation approach: DGX Spark handles the prefill phase thanks to its high compute power, while M3 Ultra handles the decode phase, exploiting its high memory bandwidth [11][18].
- Prefill compute grows quadratically with prompt length, while decode is limited primarily by memory bandwidth, which is what makes splitting the two phases across devices advantageous [8][11].
- EXO Labs streams the KV cache between the two devices so that computation and data transfer overlap, minimizing communication cost [16][18].

Group 2: Performance Metrics
- Combining DGX Spark and M3 Ultra yields significant gains: prefill runs 3.79x faster than on M3 Ultra alone, and decode 3.37x faster than on DGX Spark alone [18][19].
- Overall, the combined system cuts total processing time to 2.32 seconds, a 2.8x speedup over using M3 Ultra alone [19].

Group 3: Industry Context
- NVIDIA is exploring a similar PD-separation design with its upcoming Rubin CPX platform, which pairs a compute-intensive processor for prefill with a high-bandwidth-memory chip for decode [20].
- Recent deliveries of DGX Spark systems to notable figures in the tech industry signal growing interest and investment in advanced AI inference hardware [22].
- Apple's latest M5 chip improves AI performance, but comparisons suggest the M3 Ultra may remain the better value in the current AI hardware landscape [26][30].
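The overlap trick described above can be sketched in a few lines: the compute-heavy device pushes each prompt chunk's KV entries onto a stream as soon as they are computed, so the transfer to the bandwidth-heavy device happens while the next chunk is still being prefilled. This is a toy sketch under stated assumptions (device roles, chunking, and the queue-based link are illustrative, not EXO's implementation):

```python
import queue
import threading

def prefill_device(prompt_chunks, kv_stream):
    """Compute-heavy device: prefill the prompt chunk by chunk, pushing each
    chunk's KV-cache entries onto the stream as soon as they are ready, so
    transfer overlaps with the next chunk's computation."""
    for chunk in prompt_chunks:
        kv = [(tok, f"kv({tok})") for tok in chunk]  # stand-in for attention KV
        kv_stream.put(kv)        # transfer proceeds while the next chunk computes
    kv_stream.put(None)          # end-of-stream sentinel

def decode_device(kv_stream, steps):
    """Bandwidth-heavy device: assemble the KV cache as chunks arrive, then
    decode autoregressively once the full cache is present."""
    kv_cache = []
    while (kv := kv_stream.get()) is not None:
        kv_cache.extend(kv)      # decode-side cache fills while prefill runs
    return [f"tok{i}" for i in range(steps)], len(kv_cache)

stream = queue.Queue(maxsize=2)  # bounded queue models limited link bandwidth
prompt = [["a", "b"], ["c", "d"], ["e"]]
t = threading.Thread(target=prefill_device, args=(prompt, stream))
t.start()
tokens, cached = decode_device(stream, steps=3)
t.join()
print(cached, tokens)  # 5 KV entries cached, 3 decoded tokens
```

The bounded queue is the key design point: it lets the producer run at most a couple of chunks ahead, so neither device ever idles waiting for a bulk transfer to finish.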
China Telecom completes the industry's first verification of heterogeneous-compute collaboration for large model inference
Xin Lang Cai Jing· 2025-10-13 23:42
Group 1
- The core viewpoint of the articles is that China Telecom Research Institute, working with various industry partners, has successfully deployed the DeepSeek series models on a combination of NVIDIA and domestic compute, cutting costs and improving efficiency for large model inference [1][2].
- The DeepSeek 671B model demonstrated throughput improvements of 30% to 72% across multiple scenarios, with concurrency doubled and inference costs reduced by up to 42% at the same throughput [1].
- The successful verification of heterogeneous-compute collaboration for large model inference reflects China Telecom's deep understanding of intelligent-computing optimization and its innovative work adapting domestic compute [2].

Group 2
- Industry consensus is shifting toward chip designs optimized separately for the prefill and decode stages of inference, with NVIDIA and Huawei each releasing chip design plans that pair "high compute, low memory" and "low compute, high memory" strategies [2].
- China Telecom Research Institute has developed a full-stack, self-developed heterogeneous mixed-inference system with three core strengths: efficient transfer between heterogeneous-chip PD pools, automatic recommendation and real-time optimization of PD resource allocation, and dynamic scheduling of inference tasks [2].
- China Telecom aims to continue advancing the high-quality development of domestic compute, building a "connected and efficiently collaborative" heterogeneous computing ecosystem for large model training and inference [2].
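The "automatic recommendation of PD resource allocation" mentioned above amounts to balancing two pipeline stages with very different per-token costs. A minimal sketch of one such heuristic (the cost model, throughput numbers, and function name are illustrative assumptions, not China Telecom's algorithm):

```python
# Recommend how to split a heterogeneous GPU pool between prefill and decode
# workers so the two stages of the inference pipeline stay roughly balanced.

def recommend_split(num_gpus, avg_prompt_len, avg_output_len,
                    prefill_tps=50_000, decode_tps=2_000):
    """prefill_tps / decode_tps: assumed per-GPU token throughput per stage.
    Prefill is compute-bound and processes far more tokens per second than
    bandwidth-bound decode, so decode usually needs the larger share."""
    prefill_work = avg_prompt_len / prefill_tps   # seconds of prefill per request
    decode_work = avg_output_len / decode_tps     # seconds of decode per request
    total = prefill_work + decode_work
    prefill_gpus = max(1, round(num_gpus * prefill_work / total))
    prefill_gpus = min(prefill_gpus, num_gpus - 1)  # keep at least one decoder
    return prefill_gpus, num_gpus - prefill_gpus

# 4K-token prompts with 512-token outputs: decode dominates the wall clock,
# so most of the pool goes to the decode side.
print(recommend_split(num_gpus=8, avg_prompt_len=4096, avg_output_len=512))
```

A production system would refine such a recommendation online from measured queue depths rather than fixed throughput constants, which is presumably what the "real-time optimization" in the summary refers to.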
7 million parameters beats DeepSeek R1 and more: a single-author Samsung paper goes viral, using recursion to upend large model reasoning
机器之心· 2025-10-09 04:43
Core Viewpoint
- The article discusses the emergence of new AI reasoning models, particularly the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM), highlighting their efficiency and performance on complex reasoning tasks despite having far fewer parameters than traditional large models [1][4][29].

Group 1: Hierarchical Reasoning Model (HRM)
- HRM, proposed by researchers at Sapient Intelligence, uses a hierarchical reasoning structure with 27 million parameters and achieves remarkable performance from only 1,000 training samples [1].
- Its architecture is based on a two-network design, which raises the parameter count relative to conventional single-network supervised learning [12].
- HRM is benchmarked across a range of tasks, demonstrating its accuracy on Sudoku-Extreme and Maze-Hard [25][29].

Group 2: Tiny Recursive Model (TRM)
- TRM, introduced by researchers at the Samsung Advanced Institute of Technology, contains only 7 million parameters yet outperforms larger models such as o3-mini and Gemini 2.5 Pro on challenging reasoning tasks [4][29].
- The model reasons recursively, iterating up to 16 times to refine its answers, demonstrating the principle of "less is more" [6][9].
- TRM's experiments show superior accuracy on Sudoku-Extreme (87.4%) and competitive performance on other benchmarks relative to HRM [27][29].

Group 3: Experimental Results and Comparisons
- The article compares HRM and TRM accuracy across several datasets, showing TRM achieving higher accuracy with fewer parameters [23][29].
- On the ARC-AGI benchmarks, the TRM-Att and TRM-MLP variants outperform HRM, underscoring the advantages of parameter efficiency and generalization [26][29].
- The findings suggest that reducing model complexity while increasing recursive iterations can improve performance, challenging traditional assumptions about model depth and parameter count [15][17].
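The "iterate up to 16 times to refine the answer" loop described above can be sketched with a single small network reused at every step. This is an illustrative toy under stated assumptions (dimensions, the latent/answer split, and the update rule are guesses from the summary, not the paper's exact architecture):

```python
import numpy as np

# TRM-style recursive refinement: one tiny two-layer network is applied
# repeatedly, updating a latent scratchpad z and re-reading the answer y.
rng = np.random.default_rng(0)
DIM, N_ITERS = 64, 16
W1 = rng.normal(0, 0.02, (3 * DIM, DIM))   # step network, layer 1
W2 = rng.normal(0, 0.02, (DIM, DIM))       # step network, layer 2
W_out = rng.normal(0, 0.02, (DIM, DIM))    # answer readout

def trm_forward(x):
    z = np.zeros_like(x)   # latent reasoning state
    y = np.zeros_like(x)   # current answer embedding
    for _ in range(N_ITERS):                      # same weights every iteration
        h = np.maximum(np.concatenate([x, y, z], axis=-1) @ W1, 0.0)  # ReLU
        z = z + h @ W2                            # refine the latent state
        y = z @ W_out                             # re-read the answer
    return y

out = trm_forward(rng.normal(size=(2, DIM)))
print(out.shape)  # (2, 64)
```

The point the sketch makes concrete is that effective depth comes from iteration count, not parameter count: the same three weight matrices are traversed 16 times.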
How were vLLM and SGLang, the most popular open-source large model inference frameworks, built?
AI科技大本营· 2025-09-24 02:01
Core Viewpoint
- The article recounts the development stories of vLLM and SGLang, two prominent open-source inference engines for large language models (LLMs), highlighting their innovations, community engagement, and performance metrics.

Group 1: LLM Inference Challenges
- The core challenge of LLM inference is serving models with hundreds of billions of parameters under strict constraints of latency, throughput, and cost [3].
- Inference applies learned knowledge to new data, which requires efficient computation and memory management [2][3].

Group 2: vLLM Development
- vLLM originated from a 2023 paper on PagedAttention, which innovatively applied operating-system memory-management techniques to the KV cache, significantly raising throughput [7][8].
- vLLM delivered remarkable gains, handling up to 5x the traffic and raising throughput 30x over previous backends [9].
- The project quickly evolved from a research initiative into a community-driven open-source project, amassing over 56,000 GitHub stars and engaging thousands of developers [15][9].

Group 3: SGLang Development
- SGLang grew out of the paper "SGLang: Efficient Execution of Structured Language Model Programs," featuring RadixAttention for optimized performance [12].
- SGLang retains the KV cache from previous requests to reduce computation during the prefill phase, showing significant performance advantages over traditional inference engines [12].
- Although SGLang's community is smaller than vLLM's, it has over 2,000 participants and has shown rapid iteration and growth [13].

Group 4: Community Engagement
- vLLM has a robust community with over 12,000 participants across issues and pull requests, while SGLang's community is less than half that size [15][13].
- Both projects have struggled to keep up with growing volumes of issues and pull requests, with vLLM generally responding faster than SGLang [13].

Group 5: Performance Metrics and Comparisons
- Both vLLM and SGLang have integrated advanced features such as continuous batching and various attention mechanisms, yielding significant performance gains [29].
- Competition between the two projects has intensified, with each claiming performance leadership in its releases [26].

Group 6: Future Trends and Developments
- As the performance race heats up, both projects are emphasizing reproducible methods and real-world metrics over raw benchmark results [26].
- Leading inference engines are converging on similar model architectures and features, shifting competition toward factors beyond performance [29].

Group 7: Investment and Support
- Both projects have attracted attention from investment firms and open-source foundations, with vLLM supported by a16z and SGLang recognized in the PyTorch ecosystem [31][40].
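The PagedAttention idea credited above for vLLM's throughput gains is essentially virtual-memory paging applied to the KV cache: each sequence stores its KV entries in fixed-size blocks tracked by a block table, so memory is allocated on demand instead of reserved for the maximum possible length. A minimal sketch (block size and the allocator are simplified assumptions, not vLLM's implementation):

```python
BLOCK_SIZE = 16  # tokens per KV block

class BlockAllocator:
    """Pool of physical KV blocks shared by all sequences."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # free physical block ids

    def alloc(self):
        return self.free.pop()

class Sequence:
    """Tracks one request's KV cache via a block table, allocating a new
    physical block only when the current one fills up."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        if self.num_tokens % BLOCK_SIZE == 0:   # current block full (or none yet)
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

alloc = BlockAllocator(num_blocks=64)
seq = Sequence(alloc)
for _ in range(40):          # 40 tokens need ceil(40/16) = 3 blocks
    seq.append_token()
print(len(seq.block_table), 64 - len(alloc.free))  # 3 3
```

Because waste is bounded to less than one block per sequence, many more sequences fit in the same GPU memory, which is where the batch-size and throughput gains come from.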
The inside story of SenseTime's chip spin-off: a Baidu founding member joins, 1.5 billion yuan raised in half a year
36氪· 2025-09-19 13:42
Core Viewpoint
- The article discusses the emergence of AI chip startups in China, focusing on the establishment of "曦望" (Sunrise) as a subsidiary of 商汤 (SenseTime) to develop large model inference chips, aiming to reduce inference costs significantly and capitalize on the growing AI chip market [4][7][9].

Company Overview
- "曦望" was formed as part of 商汤's "1+X" strategy, which spins high-potential but resource-intensive chip development off into an independent entity [5][9].
- The company aims to leverage 商汤's five years of chip-development experience to accelerate its growth and market entry [11][13].

Leadership and Team
- 王湛, formerly a key figure at 百度 (Baidu), has joined "曦望" as co-CEO, bringing extensive experience managing large teams and product development [5][6].
- The executive team includes 王勇, a 20-year chip-industry veteran; the team has grown 50% to nearly 200 members, many from major tech companies [12][13].

Financial Investment and Product Development
- 商汤 has invested over 1.1 billion yuan in chip development over the past five years, and "曦望" has raised over 1.5 billion yuan in recent funding rounds [13][14].
- "曦望" has successfully produced two chips: the S1 for cloud-edge visual inference and the S2 for large model inference, with a planned S3 intended to cut inference costs by 90% [14][15][17].

Market Context and Competitive Landscape
- The Chinese AI chip industry is at a pivotal moment, with companies such as 寒武纪 (Cambricon) gaining significant market traction [9][22].
- The article highlights the importance of timing in entering the AI chip market, suggesting that "曦望" is well positioned to capitalize on current market dynamics [24][25].

Strategic Focus and Future Outlook
- "曦望" aims to focus on specific market segments and leverage its relationships with industry capital to ensure successful product commercialization [18][19].
- The company believes the future of AI chips will hinge on integrated hardware-software capabilities and the ability to anticipate market trends [25].
Tencent Cloud President 邱跃鹏: Tencent Cloud has fully adapted to mainstream domestic chips
Xin Lang Ke Ji· 2025-09-16 03:26
Core Insights
- Tencent Cloud actively participates in the open-source community and has developed a heterogeneous computing platform that integrates diverse chip resources to provide cost-effective AI computing power [1][5].
- The large model industry's shift from training to inference has driven a surge in demand for inference capacity, prompting upgrades to AI infrastructure [3][4].
- Tencent Cloud's infrastructure now covers 55 availability zones globally with over 3,200 acceleration nodes, and it has successfully defended against a 183% year-on-year increase in DDoS attacks [1][10].

Group 1: AI Infrastructure and Optimization
- Tencent Cloud has contributed multiple optimizations to open-source communities, including the FlexKV multi-level caching technology, which reduces KV-cache usage and lowers first-token latency by up to 70% [1][4].
- The company has improved GPU communication performance by 30% and doubled performance in common data-center environments through enhancements to DeepEP [3][4].
- The Agent Runtime solution provides a secure, efficient environment for deploying AI agents, integrating components such as execution engines and cloud sandbox services [5][6].

Group 2: Global Expansion and Client Support
- Tencent Cloud has established nine technical support centers globally and plans new availability zones in Osaka, Japan, and Saudi Arabia, expanding its international presence [1][14].
- The company migrated a large-scale project for GoTo, Indonesia's largest tech group, completing over 500 customized requirements and standing up a third availability zone in just five months [14].
- Tencent Cloud has been recognized as a leader in the global gaming cloud platform market, providing robust infrastructure for over 10,000 games and low-latency experiences for players worldwide [10][11].

Group 3: Advanced Technologies and Services
- The Cloud Mate service, composed of various sub-agents, enhances cloud governance and risk management, achieving a 95% interception rate for risky SQL queries [8][9].
- AI-assisted database optimization has cut total latency for complex queries by 80%, showcasing Tencent Cloud's commitment to performance [9][10].
- The EdgeOne product, combining AI with security acceleration, has helped over 100,000 users deploy e-commerce web pages quickly and efficiently [11][12].
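A multi-level KV cache of the kind FlexKV is described as above can be sketched as a small fast tier that spills evicted prefixes to a larger slow tier, so a returning prompt prefix is reloaded rather than recomputed. Tier sizes, the LRU policy, and all names here are illustrative assumptions, not Tencent's implementation:

```python
from collections import OrderedDict

class TieredKVCache:
    """Hot KV blocks live in a small fast tier ("GPU" memory); LRU evictions
    spill to a larger slow tier ("CPU/disk") instead of being dropped."""
    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # LRU order: prefix key -> KV blocks
        self.slow = {}              # spill tier, assumed much larger
        self.fast_capacity = fast_capacity

    def put(self, prefix, kv_blocks):
        self.fast[prefix] = kv_blocks
        self.fast.move_to_end(prefix)
        while len(self.fast) > self.fast_capacity:
            victim, blocks = self.fast.popitem(last=False)  # evict LRU entry
            self.slow[victim] = blocks                      # spill, don't drop

    def get(self, prefix):
        if prefix in self.fast:
            self.fast.move_to_end(prefix)
            return self.fast[prefix], "fast"
        if prefix in self.slow:                 # promote back on reuse
            self.put(prefix, self.slow.pop(prefix))
            return self.fast[prefix], "slow"
        return None, "miss"                     # must recompute the prefill

cache = TieredKVCache(fast_capacity=2)
cache.put("sys-prompt", ["kv0"])
cache.put("userA", ["kv1"])
cache.put("userB", ["kv2"])          # evicts "sys-prompt" to the slow tier
r1 = cache.get("sys-prompt")[1]      # "slow": reloaded rather than recomputed
r2 = cache.get("userB")[1]           # "fast"
print(r1, r2)
```

The first-token latency win comes from the "slow" path: loading cached KV blocks over PCIe or from disk is typically far cheaper than rerunning prefill for a long shared prefix.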
Blockbuster! Goldman Sachs raises Cambricon's target price to 1,835 yuan; the "Cambricon King" tops Wuliangye in market cap and Moutai in share price? Post-85 founder Chen Tianshi's fortune exceeds 150 billion yuan, and veteran investor Zhang Jianping is in the spotlight!
Sou Hu Cai Jing· 2025-08-25 02:37
Core Viewpoint
- The A-share chip sector has become the market's strongest theme on the dual tailwinds of Nvidia suspending H20 chip production and the launch of DeepSeek, with sector leader Cambricon Technologies posting sharp share-price gains [1][5].

Group 1: Stock Performance
- Cambricon's share price surged to 1,243 yuan, second in the A-share market only to Kweichow Moutai, with a market capitalization exceeding 520 billion yuan [1].
- Since July 11, Cambricon's stock has risen 137%; year to date it has climbed from under 50 yuan to 1,243 yuan, a maximum gain of more than 25x [3].
- Goldman Sachs raised Cambricon's target price by 50% to 1,835 yuan; reaching it would put the company's market capitalization near 770 billion yuan [3].

Group 2: Market Dynamics
- Speculation around Cambricon rests on three core points: accelerated domestic substitution driven by geopolitical factors, explosive demand for large model inference driven by local developments like DeepSeek, and Cambricon's leading industry position as "China's Nvidia" [5].
- News of Nvidia's H20 production halt has further catalyzed demand for domestic AI chips, with the official release of DeepSeek-V3.1 adapted to domestic chip architectures [5].

Group 3: Company Insights
- Cambricon founder Chen Tianshi, born in 1985, has seen his wealth rise sharply with the stock surge; his 29.63% stake is now valued at 154.1 billion yuan [5].
- Notably, prominent investor Zhang Jianping has increased his Cambricon stake to 6.0863 million shares, making him the seventh-largest shareholder with a holding worth 7.566 billion yuan and a floating profit of over 4 billion yuan [6].
All-round "hexagon warrior" GPU company completes a new financing round of roughly 100 million yuan
是说芯语· 2025-08-24 01:39
Core Viewpoint
- The article highlights recent developments at Zhuhai Chip Power Technology Co., Ltd. (Chip Power Technology), including its B2 financing round and advances in AI chip technology, particularly the RPP architecture, which is designed for parallel computing and has been adapted for mainstream open-source large models [2][4][6].

Group 1: Financing and Growth
- Chip Power Technology completed nearly 100 million yuan in B2 financing led by Feitu Venture Capital, with funds aimed at RPP chip industrialization, core technology upgrades, and expansion into edge computing and AI inference chip markets [2].
- The company previously secured several million yuan in B1 financing in March, led by Changshi Capital, indicating strong investor interest in its technology and market potential [2].

Group 2: Technology and Product Development
- After eight years of continuous R&D and product iteration, Chip Power Technology has established a comprehensive AI computing product matrix [3].
- Its core technology, the RPP (Reconfigurable Parallel Processor) architecture, is designed specifically for parallel computing, offering high energy efficiency and compatibility with the CUDA programming language, which facilitates rapid deployment of edge AI applications [4].
- The RPP-R8 chip, based on the RPP architecture, has been commercialized in fields such as AI PCs, medical testing, and storage servers, with deep partnerships with leading companies like Lenovo [6].

Group 3: Product Specifications
- The RPP-R8 AE7100E chip is billed as the industry's smallest and thinnest GPGPU, with power consumption under 10 W, making it suitable for terminal and edge computing devices [6].
- The chip measures 17 mm x 17 mm, and the integrated M.2 acceleration card is about the size of half a business card, delivering up to 32 TOPS of compute and 60 GB/s of memory bandwidth [6].
- The M.2 acceleration card supports major open-source models such as Qwen, Llama, and Stable Diffusion, demonstrating its versatility in AI applications [6].

Group 4: Future Directions
- Following the recent financing, Chip Power Technology plans to focus on developing high-end general-purpose chips with proprietary rights in China [7].
Cambricon hits the daily limit with a market cap above 520 billion yuan! About to overtake Kweichow Moutai as the new king of A-share stocks?
Sou Hu Cai Jing· 2025-08-22 07:00
Core Viewpoint
- Cambricon-U's share price surged 20% to 1,243.20 yuan, lifting its market capitalization above 520 billion yuan and signaling strong market interest in the AI chip sector [1][2].

Group 1: Stock Performance
- Cambricon-U's stock has gained 107.12% since July 25, reaching a new high of 1,243.20 yuan on strong market expectations for AI chips [1].
- Trading volume reached 16.09 billion yuan, showcasing significant investor activity [1].

Group 2: Market Drivers
- Accelerating domestic substitution is a key driver, as geopolitical factors push domestic cloud and internet companies toward self-controlled AI chips [1].
- Explosive demand for large model inference, driven by local models like DeepSeek, is boosting the need for high-performance AI inference chips [1].
- Cambricon's technical capabilities in AI chip architecture design and hardware-software optimization are gaining recognition, earning it the nickname "China's Nvidia" [1].

Group 3: Regulatory and Operational Updates
- Cambricon's application for a targeted A-share issuance in 2025 has been approved by the Shanghai Stock Exchange and is pending registration with the China Securities Regulatory Commission [2].
- The company clarified that rumors regarding substrate orders, revenue forecasts, and new product information are false, emphasizing that there are no undisclosed significant matters and that operations are normal [2].

Group 4: Industry Outlook
- Cambricon is positioned to benefit from the dual drivers of domestic substitution and rising large model demand, suggesting a favorable industry outlook [2].
DeepSeek ignites domestic AI chips: leaders Cambricon, 华胜天成, and 和而泰 heat up as the 500-billion-yuan "Cambricon King" tops Wuliangye in market cap
Jin Rong Jie· 2025-08-22 06:50
Group 1: DeepSeek-V3.1 Release
- DeepSeek-V3.1 has been officially released, generating significant attention and boosting market confidence in domestic large models [1].
- The upgrade brings three main changes: a hybrid thinking mode, higher thinking efficiency, and stronger agent capabilities [1].

Group 2: Domestic Chip Market
- DeepSeek's announcement hinted that UE8M0 FP8 targets an upcoming next-generation domestic chip, sparking market speculation and renewed interest in domestic chips [2].
- The domestic AI chip sector is at a critical window for technological breakthroughs and ecosystem adoption, with UE8M0 FP8 helping to accelerate the domestic chip industry's progress [2].

Group 3: Cambricon Technologies
- Cambricon Technologies, a leading domestic AI chip company, saw its market value exceed 500 billion yuan, with shares peaking at 1,240.00 yuan, up more than 19% [3].
- Since September 2022, Cambricon's stock has risen over 460%, doubling in just one month, reflecting strong market interest in AI chip leaders [4].

Group 4: Huawei Ecosystem and 华胜天成
- 华胜天成 (Huasheng Tiancheng) has gained attention as a key domestic player aiming to substitute for Nvidia, with its share price up over 130% in the past month [6].
- The company is involved in AI computing power and is a partner in Huawei's Ascend chip ecosystem, with significant investments in AI computing centers [6].

Group 5: 和而泰 and Moore Threads
- 和而泰's stock has surged 300% since September 2022, driven by its stake in Moore Threads (摩尔线程), the first domestic GPU maker supporting native FP8 [8].
- Moore Threads' IPO process has rekindled market interest in domestic GPU alternatives, while 和而泰 also leads in smart controllers across various sectors [8].