Heterogeneous Computing Architecture

Compute demand is exploding: how can Intel Xeon 6 be the deciding move?
半导体芯闻· 2025-06-27 10:21
Core Viewpoint
- The article discusses the transformation of AI infrastructure, emphasizing the need for a heterogeneous computing architecture that integrates both CPU and GPU resources to meet the demands of large AI models and their applications [2][4][7].

Group 1: AI Infrastructure Transformation
- AI large models are reshaping the computing landscape, requiring organizations to rethink their AI infrastructure beyond simply adding more GPUs [2].
- The long-underestimated value of CPUs is returning, as they play a crucial role alongside GPUs in AI workloads [3][4].
- A complete AI business architecture requires upgrading CPU and GPU resources in tandem to meet end-to-end AI business needs [5][7].

Group 2: Challenges and Solutions
- The rapid iteration of large language models poses four main challenges for processors: low GPU computing efficiency, low CPU utilization, rising data-movement bandwidth requirements, and GPU memory capacity limits [5].
- Intel has developed several heterogeneous solutions to address these challenges, including:
  - Using CPUs in the training and inference pipeline to reduce GPU dependency, improving overall training cost-effectiveness by approximately 10% [6].
  - Running optimized lightweight models on the Xeon 6 processor to improve responsiveness and free up GPU resources for the primary models [6].
  - Applying QAT hardware acceleration to KV Cache compression, significantly reducing loading delays and improving user response times [6].
  - Employing a sparsity-aware MoE CPU offloading strategy to relieve memory bottlenecks, yielding a 2.45x increase in overall throughput (a minimal sketch of this idea follows this summary) [7].

Group 3: Intel's Xeon 6 Processor
- Intel's Xeon 6 processor, launched in 2024, is positioned as a comprehensive answer to the evolving demands of data centers, featuring a modular design that decouples the I/O and compute modules [9][10].
- The Xeon 6 processor delivers significant performance gains, with up to 288 physical cores and a 2.3x increase in overall memory bandwidth compared to the previous generation [12].
- It supports advanced I/O capabilities, including a 1.2x increase in PCIe bandwidth and first-time support for the CXL 2.0 protocol, enhancing memory expansion and sharing [13].

Group 4: Cloud and Local Deployment Strategies
- Enterprises, particularly in sectors such as finance and healthcare, increasingly want AI platforms that are locally controllable, adequate in performance, and acceptable in cost [24].
- Intel's cost-effective all-in-one machine aims to close the gap for local deployment of large models, offering flexible architectures for businesses [25][26].
- The all-in-one solution includes monitoring systems and software frameworks that ease the migration of existing models to Intel's platform, ensuring cost-effectiveness and maintainability [28][29].

Group 5: Collaborative AI Ecosystem
- Collaboration between Intel and ecosystem partners is crucial for redefining how computing power is produced, scheduled, and utilized, promoting a "chip-cloud collaboration" model [17][30].
- The fourth-generation ECS instances launched by Volcano Engine, powered by Intel Xeon 6 processors, demonstrate improved performance across a range of computing scenarios [18][20].
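The sparsity-aware MoE CPU offloading mentioned in Group 2 is only described at a high level above. The sketch below illustrates the general shape of such a strategy in PyTorch: frequently activated ("hot") experts stay in GPU memory, while rarely activated experts live in CPU memory and are computed there. The class name, the top-1 routing, and the activation-count heuristic are illustrative assumptions of this sketch, not Intel's actual implementation; a CUDA device is assumed to be available.

```python
import torch
import torch.nn as nn


class OffloadedMoE(nn.Module):
    """Top-1 MoE layer where only the "hot" experts are kept on the GPU (illustrative sketch)."""

    def __init__(self, num_experts: int = 64, d_model: int = 1024, hot_fraction: float = 0.25):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        self.activation_counts = torch.zeros(num_experts)  # how often each expert fires
        self.hot_fraction = hot_fraction
        self.hot_ids: set[int] = set()  # experts currently resident on the GPU

    def rebalance(self) -> None:
        # Periodically pin the most frequently activated experts to the GPU and
        # push the rest back to CPU RAM (the "sparsity-aware" placement decision).
        k = max(1, int(self.hot_fraction * len(self.experts)))
        self.hot_ids = set(torch.topk(self.activation_counts, k).indices.tolist())
        for i, expert in enumerate(self.experts):
            expert.to("cuda" if i in self.hot_ids else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model], resident on the GPU
        top1 = self.router(x).argmax(dim=-1)  # route each token to a single expert
        out = torch.empty_like(x)
        for eid in top1.unique().tolist():
            mask = top1 == eid
            self.activation_counts[eid] += int(mask.sum())
            if eid in self.hot_ids:
                out[mask] = self.experts[eid](x[mask])                     # GPU path
            else:
                out[mask] = self.experts[eid](x[mask].cpu()).to(x.device)  # CPU path, result copied back
        return out


# Usage sketch: start with everything on the GPU, then evict the cold experts.
moe = OffloadedMoE().cuda()
moe.rebalance()
y = moe(torch.randn(8, 1024, device="cuda"))
```

In a production system the placement would be driven by profiled activation statistics and the CPU-GPU transfers would be overlapped with compute; the article reports that Intel's version of this idea raised overall throughput by about 2.45x [7].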
For 149,000 yuan, take home an all-in-one machine that runs full-strength DeepSeek smoothly! Built by a startup founded by a Tsinghua post-90s PhD
量子位· 2025-04-29 04:18
By 金磊, from 凹非寺
量子位 | Official account QbitAI

A full-strength DeepSeek all-in-one machine, with the price driven down to the 100,000-yuan level! And it is not a quantized version, but the original FP8 model with 671B parameters and the highest quality.

No suspense here: it is the 褐蚁HY90, the latest product from the Beijing-based 行云集成电路, priced at 149,000 yuan.

Some readers may ask: can its speed running DeepSeek-R1/V3 rival the official service? It can, and it is even faster. For example, we posed a question to get a feel for it:

"A Chinese character has a left-right structure, with 木 on the left and 乞 on the right. What is this character? Answer with the character only."

△ Left: the all-in-one machine; Right: the DeepSeek official site

As the video makes clear, not only are the answers accurate, but the machine is also visibly somewhat faster than the DeepSeek official site, roughly estimated at close to 22 tokens/s (see the measurement sketch at the end of this excerpt).

So where does this all-in-one machine come from? Beyond the product itself, the company carries quite a few notable labels, and perhaps the most eye-catching is its CEO: 季宇, a post-90s Tsinghua PhD, a former member of Huawei's "Genius Youth" program, and a winner of the CCF Outstanding Doctoral Dissertation Award.

So how does the 褐蚁HY90 perform when put to a wider range of tasks? Time for a round of hands-on tests across more dimensions.

Hands-on test of the 100,000-yuan-class Deep ...
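The "close to 22 tokens/s" figure above is an eyeballed estimate from the demo video. Below is a minimal sketch of how such a decode-throughput number could be measured, assuming the machine exposes an OpenAI-compatible streaming endpoint; the base_url, api_key, and model name are hypothetical placeholders, since the article does not describe what API the 褐蚁HY90 actually serves.

```python
# Rough decode-throughput measurement against an OpenAI-compatible endpoint.
# ASSUMPTION: the box is reachable at this base_url and serves this model id;
# both are hypothetical placeholders, not values documented in the article.
import time

from openai import OpenAI

client = OpenAI(base_url="http://hy90.local:8000/v1", api_key="not-needed")

start = time.perf_counter()
first_chunk_at = None
pieces = []

stream = client.chat.completions.create(
    model="deepseek-r1",  # hypothetical model id on the machine
    messages=[{
        "role": "user",
        "content": "一个汉字具有左右结构,左边是木,右边是乞。这个字是什么?只需回答这个字即可。",
    }],
    stream=True,
)
for event in stream:
    if not event.choices:
        continue
    delta = event.choices[0].delta.content or ""
    if delta:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()  # time of first generated text
        pieces.append(delta)

elapsed = time.perf_counter() - (first_chunk_at or start)
# Each streamed chunk is roughly one token, so chunks/second approximates tokens/s.
print(f"~{len(pieces) / elapsed:.1f} chunks/s over {len(pieces)} chunks")
```

A more faithful count would tokenize the concatenated output with the model's own tokenizer rather than counting stream chunks, but the chunk-rate approximation is usually close enough for an eyeball comparison like the one in the video.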