AI Large Model Inference
DeepSeek Pushes vLLM to Upgrade Amid Fierce Chip Competition and the Rise of MoE Models; vLLM Core Maintainer Responds Exclusively on How It Holds the Inference "Iron Throne" with PyTorch
36Kr · 2025-12-15 00:36
Core Insights
- vLLM has rapidly become a preferred inference engine for global tech companies, with GitHub stars increasing from 40,000 to 65,000 in just over a year, driven by the open-source PagedAttention technology [1]
- Neural Magic played a crucial role in vLLM's success, utilizing a "free platform + open-source tools" strategy to build a robust enterprise-level inference stack and maintain a library of pre-optimized models [1]
- Red Hat's acquisition of Neural Magic in November 2024, including key team members like Michael Goin, is expected to enhance vLLM's competitive edge in the AI large model sector [1][2]

Development and Optimization
- The vLLM core team, led by Michael Goin, has shifted focus from optimizing Llama models to enhancing features related to the DeepSeek model, particularly with the release of DeepSeek R1 [3]
- The development cycle for version 0.7.2 was tight, efficiently supporting Qwen 2.5 VL and introducing a Transformers backend for running Hugging Face models [3]
- Version 0.7.3 marked a significant update with numerous contributors involved, enhancing DeepSeek support with multi-token prediction and MLA attention optimizations, as well as expanding support for AMD hardware [4]

Hardware Compatibility and Ecosystem
- The vLLM team is committed to building an open and efficient hardware inference ecosystem, supporting various mainstream chips and collaborating closely with hardware teams like NVIDIA and AMD [8]
- The integration of PyTorch as a foundational layer allows vLLM to support a wide range of hardware, simplifying the adaptation process for hardware vendors [10][11]
- The team's collaboration with hardware partners ensures that vLLM can maintain high performance across different platforms, with a focus on optimizing the architecture for new hardware like the Blackwell chip [8][9]

Multi-Modal Capabilities
- vLLM has evolved from a text-only inference engine to a unified service platform supporting multi-modal generation and understanding, including text, images, audio, and video [17][19]
- The introduction of multi-modal prefix caching significantly improves efficiency in processing various input types, while the decoupling of encoders enhances resource utilization for large-scale inference [18][19]
- The release of vLLM-Omni marks a milestone in multi-modal inference, allowing for seamless integration and resource allocation across different modalities [19][21]

Community and Feedback Loop
- The growing trend of companies contributing modifications back to the upstream vLLM project reflects a positive feedback loop driven by the speed of community version iterations [22][23]
- Collaboration with leading model labs and companies enables rapid feedback collection, ensuring that vLLM remains competitive and aligned with industry developments [23][24]
- The vLLM team is actively addressing developer concerns, such as startup speed, by implementing tracking projects and optimizing performance through community engagement [24][25]

Strategic Positioning
- Red Hat's deep involvement in vLLM is rooted in the strategic understanding that inference is a critical component of AI application costs, aiming to integrate cutting-edge model optimizations [26][27]
- The governance structure of vLLM is decentralized, with contributions from multiple organizations, allowing Red Hat to influence the project while adhering to open-source principles [26][27]
- The collaboration with the PyTorch team has led to significant improvements in supporting new hardware and models, reinforcing vLLM's position as a standard in inference services [27]
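The PagedAttention idea credited above for vLLM's rise can be illustrated with a toy sketch: the KV cache is carved into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to whichever blocks it was handed, so memory is committed on demand rather than reserved for the maximum length up front. This is a minimal conceptual sketch only; the class and parameter names (`BlockAllocator`, `block_size`, etc.) are illustrative and are not vLLM's actual API.

```python
# Toy sketch of the block-table bookkeeping behind paged KV caching.
# Names are illustrative, not vLLM's implementation.

class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a free pool."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks one request's logical-to-physical block table."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the last one fills,
        # so memory waste is bounded by one partially used block.
        if self.num_tokens % self.allocator.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self) -> None:
        # Finished sequences return their blocks for other requests to reuse.
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8, block_size=16)
seq = Sequence(allocator)
for _ in range(40):          # 40 tokens -> ceil(40 / 16) = 3 blocks
    seq.append_token()
print(len(seq.block_table))  # 3
```

Because blocks are granted incrementally and returned on completion, many concurrent sequences can share one physical pool, which is the memory-utilization win the article attributes to PagedAttention.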
Domestic ASICs: PD Separation and Supernodes (ASIC Series Research, Part 4)
Shenwan Hongyuan Securities· 2025-09-26 13:28
Investment Rating
- The report indicates a positive investment outlook for the ASIC industry, highlighting significant growth potential driven by increasing demand for AI applications and specialized chip designs [2]

Core Insights
- The report emphasizes the distinct business models of ASIC and GPU, noting that ASICs are specialized chips tightly coupled with specific downstream applications, while GPUs are general-purpose chips [3][10]
- ASICs demonstrate superior cost-effectiveness and efficiency, with notable examples such as Google's TPU v5 achieving 1.46 times the energy efficiency of NVIDIA's H200, and Amazon's Trainium2 reducing training costs by 40% compared to GPU solutions [3][15]
- The report forecasts that the global AI ASIC market could reach $125 billion by 2028, with significant contributions from major players like Broadcom and Marvell [30]

Summary by Sections
1. AI Model Inference Driving ASIC Demand
- The global AI chip market is projected to reach $500 billion by 2028-2030, with AI infrastructure spending expected to hit $3-4 trillion by 2030 [8]
- ASICs are recognized for their strong specialization, offering cost and efficiency advantages over GPUs, particularly in AI applications [9][14]
2. High Complexity of ASIC Design and Value of Service Providers
- ASIC design involves complex processes requiring specialized service providers, with Broadcom and Marvell being the leading companies in this space [41][42]
- The report highlights the importance of design service providers in optimizing performance and reducing time-to-market for ASIC products [55][60]
3. Domestic Developments: Not Just Following Trends
- Domestic cloud giants like Alibaba and Baidu have made significant strides in ASIC self-research, establishing independent ecosystems rather than merely following international trends [4][30]
- The report identifies key domestic design service providers such as Chipone, Aojie Technology, and Zhaoxin, which are well-positioned to benefit from the growing demand for ASICs [41]
4. Key Trends in Domestic ASIC Development
- The report identifies PD separation and supernode architectures as two core trends in domestic ASIC development, with companies like Huawei and Haiguang leading the way [4][30]
- These trends reflect a shift towards more flexible and efficient chip designs that cater to diverse industry needs [4]
5. Valuation of Key Companies
- The report includes a valuation table for key companies in the ASIC sector, indicating strong growth prospects and market positioning for firms like Broadcom and Marvell [5]
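The "PD separation" trend the report highlights refers to disaggregating the two phases of LLM inference: the compute-heavy prefill pass over the prompt and the latency-sensitive token-by-token decode loop, which can then run on separate worker pools with the KV cache handed over in between. The sketch below is a minimal single-process illustration of that split, not any vendor's implementation; all names (`Request`, `prefill_worker`, `decode_worker`) are hypothetical, and real systems transfer the KV cache across devices or machines.

```python
# Toy illustration of prefill/decode (PD) separation: two distinct worker
# roles connected by a KV-cache handoff. Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)
    output: list = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    # Prefill: process the entire prompt in one batched pass,
    # populating one KV-cache entry per prompt token.
    req.kv_cache = [f"kv{i}" for i in range(req.prompt_tokens)]
    return req

def decode_worker(req: Request) -> Request:
    # Decode: generate one token at a time, each step appending
    # a new entry to the KV cache received from the prefill worker.
    for i in range(req.max_new_tokens):
        req.kv_cache.append(f"kv{req.prompt_tokens + i}")
        req.output.append(f"tok{i}")
    return req

req = decode_worker(prefill_worker(Request(prompt_tokens=5, max_new_tokens=3)))
print(len(req.kv_cache), len(req.output))  # 8 3
```

Separating the roles lets each pool be sized and provisioned for its own bottleneck (throughput-oriented compute for prefill, memory bandwidth and low latency for decode), which is the flexibility argument behind the domestic ASIC designs the report describes.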
Xuanji Information: Zhejiang Qusu's New TGU01 Chip Mainly Targets AI Large Model Inference Scenarios
Zheng Quan Ri Bao· 2025-09-04 09:45
Group 1
- The core viewpoint of the article is that Zhejiang Qusu, a company Xuanji Information has invested in, has developed a new product, the TGU01 chip, which is primarily used for AI large model inference scenarios and is already compatible with DeepSeek software [2]

Group 2
- The TGU01 chip is specifically designed for applications in artificial intelligence, indicating a strategic focus on AI technology within the company [2]
- The compatibility with DeepSeek software suggests potential partnerships or integrations that could enhance the chip's marketability and functionality [2]
Xuanji Information (300324.SZ): Zhejiang Qusu's New TGU01 Chip Mainly Targets AI Large Model Inference Scenarios and Has Been Adapted for DeepSeek Software
Ge Long Hui· 2025-09-04 04:01
Group 1
- The company, Xuanji Information (300324.SZ), confirmed that its investee company, Zhejiang Qusu, has adapted its new TGU01 chip for the latest version of the DeepSeek software [1]
- The TGU01 chip is primarily designed for AI large model inference scenarios [1]
NVIDIA FY25Q4 Earnings Review: Results Beat Expectations, Blackwell Demand Strong, Inference Compute Demand Growing Rapidly (2025-02-28)
EBSCN· 2025-02-28 00:22
Investment Rating
- The report maintains a "Buy" rating for NVIDIA, indicating an expected investment return exceeding the market benchmark by more than 15% over the next 6-12 months [6][15]

Core Insights
- NVIDIA's FY25Q4 performance exceeded market expectations with revenue of $39.33 billion, a year-over-year increase of 78% and a quarter-over-quarter increase of 12% [1][2]
- The data center business is a significant growth driver, with FY25 revenue reaching $115.2 billion, up 142% year-over-year, and accounting for 90.6% of total revenue in Q4 [2][4]
- The demand for AI large model inference is accelerating, with the upcoming Blackwell Ultra expected to launch in the second half of 2025 [3][4]

Summary by Sections
Financial Performance
- FY25Q4 revenue was $39.33 billion, surpassing Bloomberg's consensus estimate of $38.25 billion, with a Non-GAAP gross margin of 73.5% [1]
- For FY25, total revenue reached $130.5 billion, a 114% increase year-over-year, exceeding the consensus estimate of $129.6 billion [1][5]
- Non-GAAP net profit for FY25 was $74.26 billion, a 130% increase year-over-year, with an EPS of $2.99, also above expectations [1][5]
Business Segments
- Data Center: FY25 revenue was $115.2 billion, with Q4 revenue of $35.6 billion, reflecting a 93% year-over-year increase [2]
- Gaming: FY25 revenue was $11.4 billion, with Q4 revenue of $2.5 billion, showing a decline due to supply chain constraints [2]
- Automotive: FY25 revenue reached $1.7 billion, with Q4 revenue of $600 million, marking a 103% year-over-year increase [2]
Future Guidance
- For FY26Q1, NVIDIA expects revenue of $43 billion, a 65% year-over-year increase, and a Non-GAAP gross margin of 71% [1][4]
- The company anticipates continued strong demand for the Blackwell platform and AI large model inference, projecting significant revenue growth through FY2026-2028 [4][5]
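As a quick back-of-envelope check on the growth figures reported above: a current-period revenue and a year-over-year growth rate together imply the prior-period base, so the numbers can be cross-checked for internal consistency. The helper below is purely illustrative arithmetic on the report's own figures.

```python
# Consistency check on reported revenue and YoY growth figures (in $B).

def implied_prior(current: float, yoy_growth: float) -> float:
    """Revenue one year earlier implied by current revenue and YoY growth."""
    return current / (1 + yoy_growth)

# FY25Q4 revenue of $39.33B at +78% YoY implies an FY24Q4 base of ~$22.1B.
print(round(implied_prior(39.33, 0.78), 1))   # 22.1
# FY25 revenue of $130.5B at +114% YoY implies an FY24 base of ~$61.0B.
print(round(implied_prior(130.5, 1.14), 1))   # 61.0
```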