Heterogeneous Computing Architecture

Compute demand is exploding: how can Intel Xeon 6 be the deciding move?
半导体芯闻· 2025-06-27 10:21
Core Viewpoint
- The article discusses the transformation of AI infrastructure, emphasizing the need for a heterogeneous computing architecture that integrates both CPU and GPU resources to meet the demands of large AI models and their applications [2][4][7].

Group 1: AI Infrastructure Transformation
- AI large models are reshaping the computing landscape, requiring organizations to rethink their AI infrastructure beyond simply adding more GPUs [2].
- The long-underestimated value of CPUs is returning, as they play a crucial role alongside GPUs in AI workloads [3][4].
- A complete AI business architecture requires upgrading CPU and GPU resources in tandem to meet end-to-end AI business needs [5][7].

Group 2: Challenges and Solutions
- The rapid iteration of large language models poses four main challenges for processors: low GPU computing efficiency, low CPU utilization, rising data-movement bandwidth requirements, and GPU memory capacity limits [5].
- Intel has developed several heterogeneous solutions to address these challenges, including:
  - Using CPUs in the training and inference pipeline to reduce GPU dependency, improving overall training cost-effectiveness by approximately 10% [6].
  - Running optimized lightweight models on the Xeon 6 processor to improve responsiveness and free up GPU resources for the primary models [6].
  - Applying QAT hardware acceleration to KV Cache compression, significantly reducing loading delays and improving user response times [6].
  - Employing a sparsity-aware MoE CPU offloading strategy to relieve memory bottlenecks, yielding a 2.45x increase in overall throughput (a minimal sketch of this idea follows this summary) [7].

Group 3: Intel's Xeon 6 Processor
- Intel's Xeon 6 processor, launched in 2024, is positioned as a comprehensive answer to the evolving demands of data centers, featuring a modular design that decouples the I/O and compute modules [9][10].
- The Xeon 6 processor delivers significant performance gains, with up to 288 physical cores and a 2.3x increase in overall memory bandwidth compared to the previous generation [12].
- It supports advanced I/O capabilities, including a 1.2x increase in PCIe bandwidth and first-time support for the CXL 2.0 protocol, enhancing memory expansion and sharing [13].

Group 4: Cloud and Local Deployment Strategies
- Enterprises, particularly in sectors such as finance and healthcare, increasingly want AI platforms that are locally controllable, adequate in performance, and acceptable in cost [24].
- Intel's cost-effective all-in-one machine aims to close the gap for local deployment of large models, offering flexible architectures for businesses [25][26].
- The all-in-one solution includes monitoring systems and software frameworks that ease the migration of existing models to Intel's platform, ensuring cost-effectiveness and maintainability [28][29].

Group 5: Collaborative AI Ecosystem
- Collaboration between Intel and ecosystem partners is crucial for redefining how computing power is produced, scheduled, and utilized, promoting a "chip-cloud collaboration" model [17][30].
- The fourth-generation ECS instances launched by Volcano Engine, powered by Intel Xeon 6 processors, demonstrate improved performance across a range of computing scenarios [18][20].
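The sparsity-aware MoE CPU offloading mentioned in Group 2 is only described at a high level above. The sketch below illustrates the general shape of such a strategy in PyTorch: frequently activated ("hot") experts stay in GPU memory, while rarely activated experts live in CPU memory and are computed there. The class name, the top-1 routing, and the activation-count heuristic are illustrative assumptions of this sketch, not Intel's actual implementation; a CUDA device is assumed to be available.

```python
import torch
import torch.nn as nn


class OffloadedMoE(nn.Module):
    """Top-1 MoE layer where only the "hot" experts are kept on the GPU (illustrative sketch)."""

    def __init__(self, num_experts: int = 64, d_model: int = 1024, hot_fraction: float = 0.25):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        self.activation_counts = torch.zeros(num_experts)  # how often each expert fires
        self.hot_fraction = hot_fraction
        self.hot_ids: set[int] = set()  # experts currently resident on the GPU

    def rebalance(self) -> None:
        # Periodically pin the most frequently activated experts to the GPU and
        # push the rest back to CPU RAM (the "sparsity-aware" placement decision).
        k = max(1, int(self.hot_fraction * len(self.experts)))
        self.hot_ids = set(torch.topk(self.activation_counts, k).indices.tolist())
        for i, expert in enumerate(self.experts):
            expert.to("cuda" if i in self.hot_ids else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model], resident on the GPU
        top1 = self.router(x).argmax(dim=-1)  # route each token to a single expert
        out = torch.empty_like(x)
        for eid in top1.unique().tolist():
            mask = top1 == eid
            self.activation_counts[eid] += int(mask.sum())
            if eid in self.hot_ids:
                out[mask] = self.experts[eid](x[mask])                     # GPU path
            else:
                out[mask] = self.experts[eid](x[mask].cpu()).to(x.device)  # CPU path, result copied back
        return out


# Usage sketch: start with everything on the GPU, then evict the cold experts.
moe = OffloadedMoE().cuda()
moe.rebalance()
y = moe(torch.randn(8, 1024, device="cuda"))
```

In a production system the placement would be driven by profiled activation statistics and the CPU-GPU transfers would be overlapped with compute; the article reports that Intel's version of this idea raised overall throughput by about 2.45x [7].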
For 149,000 yuan, take home an all-in-one machine that runs full-strength DeepSeek smoothly! Built by a startup founded by a Tsinghua post-90s PhD
量子位· 2025-04-29 04:18
By 金磊, from 凹非寺
量子位 | Official account QbitAI

A full-strength DeepSeek all-in-one machine, with the price driven down to the 100,000-yuan level! And it is not a quantized version, but the original FP8 model with 671B parameters and the highest quality.

No suspense here: it is the 褐蚁HY90, the latest product from the Beijing-based 行云集成电路, priced at 149,000 yuan.

Some readers may ask: can its speed running DeepSeek-R1/V3 rival the official service? It can, and it is even faster. For example, we posed a question to get a feel for it:

"A Chinese character has a left-right structure, with 木 on the left and 乞 on the right. What is this character? Answer with the character only."

△ Left: the all-in-one machine; Right: the DeepSeek official site

As the video makes clear, not only are the answers accurate, but the machine is also visibly somewhat faster than the DeepSeek official site, roughly estimated at close to 22 tokens/s (see the measurement sketch at the end of this excerpt).

So where does this all-in-one machine come from? Beyond the product itself, the company carries quite a few notable labels, and perhaps the most eye-catching is its CEO: 季宇, a post-90s Tsinghua PhD, a former member of Huawei's "Genius Youth" program, and a winner of the CCF Outstanding Doctoral Dissertation Award.

So how does the 褐蚁HY90 perform when put to a wider range of tasks? Time for a round of hands-on tests across more dimensions.

Hands-on test of the 100,000-yuan-class Deep ...
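The "close to 22 tokens/s" figure above is an eyeballed estimate from the demo video. Below is a minimal sketch of how such a decode-throughput number could be measured, assuming the machine exposes an OpenAI-compatible streaming endpoint; the base_url, api_key, and model name are hypothetical placeholders, since the article does not describe what API the 褐蚁HY90 actually serves.

```python
# Rough decode-throughput measurement against an OpenAI-compatible endpoint.
# ASSUMPTION: the box is reachable at this base_url and serves this model id;
# both are hypothetical placeholders, not values documented in the article.
import time

from openai import OpenAI

client = OpenAI(base_url="http://hy90.local:8000/v1", api_key="not-needed")

start = time.perf_counter()
first_chunk_at = None
pieces = []

stream = client.chat.completions.create(
    model="deepseek-r1",  # hypothetical model id on the machine
    messages=[{
        "role": "user",
        "content": "一个汉字具有左右结构,左边是木,右边是乞。这个字是什么?只需回答这个字即可。",
    }],
    stream=True,
)
for event in stream:
    if not event.choices:
        continue
    delta = event.choices[0].delta.content or ""
    if delta:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()  # time of first generated text
        pieces.append(delta)

elapsed = time.perf_counter() - (first_chunk_at or start)
# Each streamed chunk is roughly one token, so chunks/second approximates tokens/s.
print(f"~{len(pieces) / elapsed:.1f} chunks/s over {len(pieces)} chunks")
```

A more faithful count would tokenize the concatenated output with the model's own tokenizer rather than counting stream chunks, but the chunk-rate approximation is usually close enough for an eyeball comparison like the one in the video.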