InferenceX v2: NVIDIA Blackwell vs AMD vs Hopper (Formerly InferenceMAX)
2026-02-24 14:19
Summary of InferenceX v2: NVIDIA Blackwell vs AMD vs Hopper

Industry and Company Involved
- The document covers the competitive landscape of AI inference performance, focusing on NVIDIA's Blackwell architecture and AMD's offerings in the context of inference benchmarks and optimizations.

Core Points and Arguments
- **InferenceX v2 Overview**: InferenceX v2 builds on InferenceMAXv1, establishing a new standard for AI inference performance and economics through continuous testing across numerous GPUs and frameworks [3][4][7]
- **Benchmarking Capabilities**: InferenceX v2 is the first suite to benchmark NVIDIA's Blackwell Ultra GB300 NVL72 and B300, as well as AMD's MI355X, across the entire Pareto frontier curve [9][10] (a minimal illustration of what a frontier computation involves follows this summary)
- **Performance Comparison**:
  - AMD's MI355X is competitive with NVIDIA's B200 on performance per total cost of ownership (TCO) at FP8 precision when using disaggregated serving and wide expert parallelism [21][23]
  - NVIDIA's solutions, particularly the B200 and B300, nonetheless maintain a significant performance lead over AMD's offerings in many scenarios [28][34]
- **Energy Efficiency**: NVIDIA GPUs demonstrate superior energy efficiency, consuming significantly fewer picojoules per token than AMD across all workloads [28]
- **Composability Issues**: AMD's inference optimizations struggle with composability: individual optimizations perform well in isolation but fail to deliver competitive results when combined [29][30][31]
- **Future Focus for AMD**: AMD is advised to improve the composability of its inference optimizations and is reportedly planning to focus on software composability of FP4 and distributed inference after Chinese New Year [31][33][70]

Additional Important Content
- **Performance Improvements**: AMD has made notable gains in SGLang DeepSeek R1 FP4 configurations, nearly doubling throughput in under two months [66][67]
- **NVIDIA's Consistency**: NVIDIA's results have been more stable, with minor improvements noted for B200 SGLang over a similar timeframe [73]
- **Market Dynamics**: The document highlights the competitive dynamics between NVIDIA and AMD, emphasizing that AMD needs to increase its contributions to open-source projects and improve its software stack to remain competitive [70][42]
- **Technical Concepts**: The document explains key technical concepts such as disaggregated prefill, tensor parallelism, and the trade-off between interactivity and throughput in LLM inference [49][57][61]

This summary encapsulates the critical insights and data points from the InferenceX v2 report, providing a comprehensive overview of the competitive landscape in AI inference technology.
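To make the report's organizing concept concrete, here is a minimal Python sketch of how a throughput-versus-interactivity Pareto frontier can be computed from benchmark runs. The configuration names and numbers are entirely hypothetical; this illustrates the concept, not InferenceX's actual harness.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One benchmarked serving configuration (all values hypothetical)."""
    name: str
    tok_s_gpu: float   # system throughput: tokens/s per GPU
    tok_s_user: float  # interactivity: tokens/s per user

def dominates(a: Run, b: Run) -> bool:
    """a dominates b if it is at least as good on both axes and better on one."""
    return (a.tok_s_gpu >= b.tok_s_gpu and a.tok_s_user >= b.tok_s_user
            and (a.tok_s_gpu > b.tok_s_gpu or a.tok_s_user > b.tok_s_user))

def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep only the runs no other run dominates, sorted by interactivity."""
    frontier = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(frontier, key=lambda r: r.tok_s_user)

runs = [
    Run("TP8, batch 256", 2600.0, 20.0),
    Run("TP8, batch 64",  1800.0, 45.0),
    Run("TP4, batch 32",  1200.0, 60.0),
    Run("TP4, batch 128", 1500.0, 18.0),  # dominated by "TP8, batch 64"
]
for r in pareto_frontier(runs):
    print(f"{r.name}: {r.tok_s_gpu:.0f} tok/s/GPU at {r.tok_s_user:.0f} tok/s/user")
```

The same framing extends to the report's energy metric: picojoules per token is just sustained power divided by token throughput, so each run could carry a third axis without changing the dominance logic.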
Come to This Meetup for a Look at Frontier Technical Practice: SGLang Ultra-Long-Context Scaling, RL Post-Training Frameworks, Diffusion Language Models, and More
机器之心· 2026-01-29 08:12
Core Insights
- The article discusses the transition of artificial intelligence from a "chat" paradigm to an "actionable" intelligent-agent era, emphasizing the need for deep collaboration and experience sharing among developers in optimizing LLM systems [2]

Event Overview
- A meetup organized by the SGLang community, Machine Heart, and Zhangjiang Incubator will take place on February 6, focusing on LLM system optimization and practical implementation [2]
- The event will feature discussions on SGLang's technical roadmap, long-context expansion, RL post-training frameworks, and diffusion language model exploration [2]

Event Schedule
- 13:30-14:00: Registration
- 14:00-14:30: Keynote on the SGLang roadmap by Zhang Bozhou, core developer of SGLang [5]
- 14:30-15:00: Keynote on Omni-infer performance optimization by Zheng Jinhwan, core developer of Omni-infer [5]
- 15:00-15:30: Keynote on the slime RL scaling post-training framework by Xie Chengxing, Tsinghua University PhD student [5]
- 15:30-16:00: Keynote on SGLang CPP for long-context scaling by Cai Shangming, core developer of SGLang and Mooncake [5]

Guest Introductions
- Zhang Bozhou: Core developer of SGLang, focusing on open-source LLM support and optimization across different CUDA hardware [8]
- Zheng Jinhwan: Huawei technical expert and core contributor to Omni-infer, specializing in high-performance systems and inference optimization [9]
- Xie Chengxing: PhD student at Tsinghua University and core developer of the slime RL framework, focusing on enhancing LLM reasoning and decision-making capabilities [10]
- Cai Shangming: Researcher at Alibaba Cloud and core contributor to SGLang and Mooncake, with expertise in high-performance inference systems and distributed machine learning [10]
- Li Zehuan: Systems engineer at Ant Group and core contributor to SGLang, focusing on AI infrastructure optimization [11]
Building a Production-Grade Cloud-Native Large-Model Inference Platform on SGLang RBG + Mooncake
AI前线· 2025-12-12 00:40
Core Insights
- The article emphasizes the rapid evolution of large language model (LLM) inference services into core enterprise infrastructure, focusing on balancing performance, stability, and cost when building high-performance inference systems [2]
- It discusses the transition from monolithic to distributed architectures in LLM inference, highlighting the need for an external KVCache to relieve memory pressure and improve performance in high-demand scenarios [2][4]

Distributed KVCache and Mooncake
- Mooncake is introduced as a leading distributed KVCache storage engine designed to provide high throughput and low latency for inference frameworks like SGLang [3]
- The article outlines the challenges of managing distributed KVCache systems in production, which motivated the development of RoleBasedGroup (RBG) for unified management of caching and inference nodes [4]

RoleBasedGroup (RBG) Design and Challenges
- RBG is presented as a Kubernetes-native API aimed at AI inference, facilitating multi-role orchestration to ensure stable, high-performance operation [4][12]
- The article identifies five fundamental challenges in deploying large-model inference services, including the need for strong state management and performance optimization [12][15]

SCOPE Framework
- The SCOPE framework is introduced, covering five core capabilities: Stability, Coordination, Orchestration, Performance, and Extensibility, which are essential for managing LLM inference services [16][18]
- RBG's design allows for rapid architecture iteration and performance-sensitive operations, addressing the complexities of multi-role dependencies and operational efficiency [15][24]

Benchmark Testing and Performance Metrics
- Benchmark tests demonstrate significant improvements in KVCache hit rates and inference performance, with the L3 Mooncake cache achieving a 64.67% hit rate and reducing average TTFT (time to first token) to 2.58 seconds [32][48]
- The article highlights the importance of a multi-tier caching architecture in improving performance for applications like multi-turn dialogue and AI agents (a sketch of such a lookup path follows this summary) [44]

Conclusion and Future Outlook
- The integration of RBG and Mooncake is positioned as a transformative approach to building production-grade LLM inference services, emphasizing deep integration of high-performance design with cloud-native operational capabilities [43][44]
- The article concludes with a call for community collaboration to advance this paradigm and lay the foundation for the next generation of AI infrastructure [43]
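The benchmark results above hinge on the multi-tier caching architecture. As a rough Python sketch of how such a lookup path can be organized, the toy below models three tiers (GPU HBM, host DRAM, and a distributed L3 store in the role Mooncake plays), promoting hits toward the faster tiers. Tier names, capacities, and the promotion policy are illustrative assumptions, not Mooncake's or RBG's actual APIs.

```python
from collections import OrderedDict
from typing import Optional

class Tier:
    """A single LRU cache tier (a toy stand-in, not Mooncake's API)."""
    def __init__(self, name: str, capacity: int):
        self.name, self.capacity = name, capacity
        self.store = OrderedDict()  # key -> KV blocks, LRU order

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.store:
            return None
        self.store.move_to_end(key)          # mark recently used
        return self.store[key]

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

class TieredKVCache:
    """Look up a prefix's KV blocks in L1 (HBM), then L2 (DRAM), then the
    shared L3 store; promote hits toward the faster tiers."""
    def __init__(self) -> None:
        self.tiers = [Tier("L1-HBM", 4), Tier("L2-DRAM", 16), Tier("L3-Mooncake", 256)]
        self.hits = self.misses = 0

    def lookup(self, prefix_hash: str) -> Optional[bytes]:
        for i, tier in enumerate(self.tiers):
            value = tier.get(prefix_hash)
            if value is not None:
                self.hits += 1
                for faster in self.tiers[:i]:   # promote into faster tiers
                    faster.put(prefix_hash, value)
                return value
        self.misses += 1                        # full miss: prefill recomputes
        return None

    def insert(self, prefix_hash: str, kv_block: bytes) -> None:
        for tier in self.tiers:                 # write through all tiers
            tier.put(prefix_hash, kv_block)

cache = TieredKVCache()
cache.insert("sys-prompt:v1", b"<kv blocks>")
for _ in range(5):                              # repeated multi-turn reuse
    cache.lookup("sys-prompt:v1")
cache.lookup("unseen-prefix")
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.0%}")  # 83%
```

In this toy, write-through insertion is what lets the shared L3 tier serve prefixes across instances, mirroring the article's point that an external KVCache relieves per-node memory pressure.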
2025 New Generation Computing Industry Conference Held, Focusing on Computing-Power Standards and Technological Innovation
Zhong Guo Xin Wen Wang· 2025-09-17 08:59
Core Insights
- The 2025 New Generation Computing Industry Conference was held in Beijing, focusing on the standardization of computing power and paths for technological innovation [1][3]
- Key discussions covered the entire AI large-model pipeline of data acquisition, preprocessing, training, fine-tuning, and inference, emphasizing the use of open-source foundation models to deliver application value [3]

Standardization and Innovation
- The conference highlighted the need for high-level planning, collaboration, and quality application in building new-generation computing standards [3]
- The establishment of working groups for GPU, DPU, computing product components, liquid-cooling ecosystems, and heterogeneous computing was announced, along with the initiation of two national standards for server power supplies [4]

Technical Challenges and Solutions
- The DPU was identified as a core chip for computing power, offloading data-processing and network-forwarding tasks to free up the CPU and GPU, though the lack of unified technical standards still hinders large-scale adoption [3]
- Two core technologies were introduced to address memory challenges in inference: Mooncake, which reduces memory consumption through shared public storage, and KTransformers, which enables CPU and GPU memory to work in concert (a sketch of this offloading pattern follows below) [3]
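To give a flavor of the CPU-GPU memory collaboration that KTransformers-style systems exploit, here is a minimal PyTorch sketch that keeps bulk expert weights in pinned host memory and stages only the experts a batch actually routes to onto the GPU. The expert count, routing, and staging policy are hypothetical, and a CUDA device is assumed; this shows the general offloading pattern, not KTransformers' implementation.

```python
import torch

NUM_EXPERTS, HIDDEN = 8, 1024  # hypothetical model dimensions

# Bulk expert weights stay in pinned host RAM so host-to-device copies
# can run asynchronously; only routed experts ever touch the GPU.
cpu_experts = [torch.randn(HIDDEN, HIDDEN).pin_memory() for _ in range(NUM_EXPERTS)]

def run_expert(x: torch.Tensor, expert_id: int) -> torch.Tensor:
    """Stage one expert's weights onto the GPU, apply it, drop the device copy."""
    w = cpu_experts[expert_id].to("cuda", non_blocking=True)
    y = x @ w     # the expert's actual computation
    del w         # device copy is transient; host copy stays authoritative
    return y

x = torch.randn(4, HIDDEN, device="cuda")
routed = [0, 3]  # hypothetical router output for this batch
out = sum(run_expert(x, e) for e in routed) / len(routed)
print(out.shape)  # torch.Size([4, 1024])
```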
For a Product to Feel "Premium", the Pairing Can't Be Basic | A Guide to "Premium-Feel" Pairings in Baking
东京烘焙职业人· 2025-08-26 08:39
Core Insights
- The article emphasizes the importance of balancing basic and extravagant elements in baked goods to create high-value offerings that resonate with consumers [1][42]
- The perception of ingredients plays a crucial role in product pricing and consumer appeal, particularly among Gen Z and new middle-class consumers [2][4]

Ingredient Trends
- Ingredient quality is the primary language of product premiumization, with consumers increasingly valuing ingredient transparency [2]
- Popular ingredient trends on platforms like Xiaohongshu and Douyin include contrasting flavors and regional specialties, such as mint chocolate and Yunnan mushrooms [5][4]

Pricing Strategies
- The article discusses pricing differences within the same product category, noting that a well-curated selection of ingredients can significantly raise perceived value [9][11]
- Seasonal ingredients evoke emotional connections, with specific keywords associated with each season influencing consumer choices [13][18]

Consumer Experience
- Visual appeal alone is no longer sufficient; products must offer multi-sensory experiences to justify premium pricing [19]
- "Surprise fillings" and layered textures can create memorable experiences that drive repeat purchases [20][23]

Marketing and Storytelling
- The ultimate competitive edge of baked goods lies in storytelling: consumers seek not just food but an experience and a lifestyle [29][30]
- Limited editions and seasonal offerings serve as emotional leverage, enhancing the perceived value of products [31]

Social Media and Branding
- Products that are visually appealing and suited to social-media sharing tend to perform better in consumer engagement and sales [38][39]
- The article highlights the importance of building a narrative around products to enhance marketability and consumer interest [34][42]
Promoting Open Collaboration and Cross-Disciplinary Integration: 2025 CCF China Open Source Conference Held in Shanghai
Zhong Guo Xin Wen Wang· 2025-08-02 13:15
Core Insights
- The 2025 CCF China Open Source Conference opened in Shanghai, focusing on key directions such as open-source large models and embodied intelligence [1][3]
- Experts from academia and industry shared forward-looking views on critical technology areas including large models, open-source hardware, and intelligent operating systems [3]

Key Developments
- The conference featured the efficient inference systems Mooncake and KTransformers, developed by a team led by Zheng Weimin, showcasing their core role in supporting workloads in the intelligent era [3]
- Academician E Weinan emphasized the paradigm shift in AI from a "model-centric" to a "data-centric" approach, highlighting the need for high-quality data infrastructure to lower the barriers to AI adoption [3]

Community and Ecosystem Initiatives
- The CCF Ubiquitous Operating System Open Community was established with participation from top universities and research institutions, focusing on technology research, project incubation, standards development, application promotion, and talent cultivation [4]
- A series of strategic initiatives was launched, including the CCF-Mulan Innovation Open Source Incubator and the Omni-Infer Cloud Co-Creation Plan [3][4]

Educational and Collaborative Efforts
- Shanghai Jiao Tong University aims to integrate open-source concepts into its curriculum, fostering talent for next-generation operating systems [5]
- The collaboration model between Shanghai Jiao Tong University and Huawei emphasizes shared goals and resources to support breakthroughs in core technologies [5]