Heterogeneous Computing

With computing power demand surging, how can Intel Xeon 6 be the deciding factor?
半导体芯闻· 2025-06-27 10:21
Core Viewpoint
- The article discusses the transformation of AI infrastructure, emphasizing the need for a heterogeneous computing architecture that integrates both CPU and GPU resources to meet the demands of large AI models and their applications [2][4][7].

Group 1: AI Infrastructure Transformation
- AI large models are reshaping the computing landscape, requiring organizations to rethink their AI infrastructure beyond just adding more GPUs [2].
- The value of CPUs, long underestimated, is returning as they play a crucial role alongside GPUs in AI workloads [3][4].
- A complete AI business architecture requires upgrading CPU and GPU resources in tandem to meet end-to-end AI business needs [5][7].

Group 2: Challenges and Solutions
- The rapid iteration of large language models presents four main challenges for processors: low GPU computing efficiency, low CPU utilization, increased data-movement bandwidth requirements, and GPU memory capacity limitations [5].
- Intel has developed several heterogeneous solutions to address these challenges, including:
  - Utilizing CPUs in the training and inference pipeline to reduce GPU dependency, improving overall training cost-effectiveness by approximately 10% [6].
  - Optimizing lightweight models on the Xeon 6 processor to enhance responsiveness and free up GPU resources for the primary models [6].
  - Implementing QAT hardware acceleration for KV Cache compression, significantly reducing loading delays and improving user response times [6].
  - Employing a sparsity-aware MoE CPU offloading strategy to alleviate memory bottlenecks, yielding a 2.45x increase in overall throughput [7] (see the sketch after this summary).

Group 3: Intel's Xeon 6 Processor
- Intel's Xeon 6 processor, launched in 2024, represents a comprehensive response to the evolving demands of data centers, featuring a modular design that decouples the I/O and compute modules [9][10].
- The Xeon 6 processor delivers significant performance improvements, with up to 288 physical cores and a 2.3x increase in overall memory bandwidth compared with the previous generation [12].
- It supports advanced I/O capabilities, including a 1.2x increase in PCIe bandwidth and first-time support for the CXL 2.0 protocol, enhancing memory expansion and sharing [13].

Group 4: Cloud and Local Deployment Strategies
- Enterprises, particularly in sectors like finance and healthcare, increasingly want AI platforms that are locally controllable, adequately performant, and affordable [24].
- Intel's cost-effective all-in-one machine aims to close the gap for local deployment of large models, offering flexible architectures for businesses [25][26].
- The all-in-one solution includes monitoring systems and software frameworks that ease migration of existing models to Intel's platform, ensuring cost-effectiveness and maintainability [28][29].

Group 5: Collaborative AI Ecosystem
- Collaboration between Intel and ecosystem partners is crucial for redefining how computing power is produced, scheduled, and used, promoting a "chip-cloud collaboration" model [17][30].
- The fourth-generation ECS instances introduced by Volcano Engine, powered by Intel Xeon 6 processors, showcase the enhanced performance in various computing scenarios [18][20].
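The summary names the offloading strategy but not how it works. The sketch below shows the general shape of CPU offloading for a sparse MoE layer: expert weights stay in host memory, and only the experts the router actually selects for a batch are moved to the accelerator, run, and moved back. This is a minimal PyTorch illustration under our own assumptions (the class name `OffloadedMoELayer` and all shape parameters are ours); it is not Intel's implementation and omits pinning, prefetching, and the QAT-based KV-cache compression a production system would need.

```python
import torch
import torch.nn as nn

class OffloadedMoELayer(nn.Module):
    """Sparsity-aware MoE layer whose expert weights live in host memory (illustrative)."""

    def __init__(self, num_experts=8, top_k=2, d_model=1024, d_ff=4096,
                 device="cuda" if torch.cuda.is_available() else "cpu"):
        super().__init__()
        self.device = device
        self.top_k = top_k
        # The router is small and latency-critical, so it stays on the accelerator.
        self.router = nn.Linear(d_model, num_experts).to(device)
        # Expert FFNs stay on the CPU; only experts a batch actually routes to are moved.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: [tokens, d_model] on self.device
        scores = self.router(x).softmax(dim=-1)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e in idx.unique().tolist():          # visit only the experts that were hit
            hit = (idx == e).any(dim=-1)         # tokens routed to expert e
            gate = weights[hit][idx[hit] == e].unsqueeze(-1)
            expert = self.experts[e].to(self.device)   # pull weights in on demand
            out[hit] += gate * expert(x[hit])
            self.experts[e].to("cpu")                  # release accelerator memory
        return out

# Usage: moe = OffloadedMoELayer(); y = moe(torch.randn(4, 1024, device=moe.device))
```

The design choice the article hints at is that sparsity makes this cheap: with top-2 routing, most expert weights never need to touch GPU memory for a given batch, so host-to-device traffic scales with the experts hit rather than with total model size.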
赛道Hyper | NVIDIA teams up with MediaTek to enter the gaming laptop market
Hua Er Jie Jian Wen· 2025-06-03 02:47
Core Insights
- NVIDIA is collaborating with MediaTek to develop a high-performance APU, aiming for a market launch in early 2026, which could disrupt AMD's dominance in the APU sector and reshape the gaming laptop market [1][10].

Group 1: APU Development
- The APU will integrate NVIDIA's latest Blackwell-architecture GPU module and a custom Arm-architecture CPU core from MediaTek, with a focus on heterogeneous computing [1].
- The Blackwell architecture, built on TSMC's 4nm process, brings significant performance gains, including a 2x improvement in ray-tracing performance and a 4x increase in AI inference speed [1].
- The APU targets a thermal design power (TDP) of around 65W, roughly 30% lower than traditional "CPU + discrete GPU" configurations [2].

Group 2: Market Opportunities
- The collaboration targets two major market opportunities: performance innovation in gaming laptops and compute upgrades for AI PCs [4][7].
- The APU design aims to reduce laptop thickness by 15%-20%, catering to the demand for lightweight gaming devices, with IDC forecasting 9% year-on-year growth in global gaming laptop shipments in 2024 [6].
- The integrated NPU will support real-time voice recognition and image generation, positioning the new devices for the enterprise AI PC market [8].

Group 3: Competitive Landscape
- The partnership is expected to affect AMD's market share, since AMD's Ryzen APUs currently hold an advantage in the lightweight laptop market [9].
- Intel is also likely to face challenges from this collaboration as it accelerates its own technology development [9][10].
- The APU signals a shift toward a "high-performance era" in APU technology, potentially leading to significant innovations in product design and industry standards [10][11].
149,000 yuan to take home an all-in-one machine running full-strength DeepSeek smoothly! Built by a Tsinghua post-90s startup
量子位· 2025-04-29 04:18
Jin Lei, from Aofeisi
量子位 | WeChat official account QbitAI

A full-strength DeepSeek all-in-one machine has had its price pushed down to the 100,000-yuan level! And it is not a quantized version; it is the original FP8 model with 671B parameters, at the highest quality.

Some readers may be wondering: can its speed running DeepSeek-R1/V3 really compete with the official service? It can, and it can even be faster. For example, pose a question to get a feel for it:

"A Chinese character has a left-right structure; the left part is 木 and the right part is 乞. What character is it? Answer with the character only."

△ Left: the all-in-one machine; right: the official DeepSeek site

As the video shows, not only is the answer accurate, the machine is also visibly faster than the official DeepSeek site, by a rough estimate already approaching 22 tokens/s.

So where does this all-in-one machine come from? No suspense: it is the newly launched product from Beijing 行云集成电路, the 褐蚁HY90, priced at 149,000 yuan.

Beyond the product itself, the company carries quite a few notable labels, the most eye-catching of which is probably its CEO: Ji Yu (季宇), a post-90s Tsinghua PhD, a former Huawei "Genius Youth" hire, and a winner of the CCF Outstanding Doctoral Dissertation Award.

So how does the 褐蚁HY90 fare when given a wider range of tasks? Here comes a round of hands-on tests across more dimensions.

Hands-on testing the 100,000-yuan-class Deep ...
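For context on the "22 tokens/s" figure, decoding throughput is usually estimated by timing a generation request and dividing the number of completion tokens by the elapsed time. The sketch below assumes the machine exposes an OpenAI-compatible endpoint (the article does not say how the 褐蚁HY90 is served; the URL and model id here are placeholders of ours), and it measures an average that includes prompt processing, so a streaming measurement that isolates decode time would be more precise.

```python
# Rough tokens/s estimate against an assumed OpenAI-compatible endpoint.
import time
import requests

BASE_URL = "http://localhost:8000/v1"   # hypothetical local endpoint
MODEL = "deepseek-r1"                    # hypothetical model id

def measure_tokens_per_second(prompt: str) -> float:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    start = time.time()
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600)
    elapsed = time.time() - start
    # OpenAI-compatible servers report token counts in the "usage" field.
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

if __name__ == "__main__":
    tps = measure_tokens_per_second(
        "一个汉字具有左右结构,左边是木,右边是乞。这个字是什么?只需回答这个字即可。"
    )
    print(f"~{tps:.1f} tokens/s")   # the article's rough figure was ~22 tokens/s
```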
Surpassing DeepSeek? The hidden technical battle the giants won't talk about
36Ke· 2025-04-29 00:15
Without question, the release of the DeepSeek-R1 model has given China's AI development a significant edge and marks a milestone breakthrough in artificial intelligence.

However, technological innovation often shifts costs to the application side. Roughly 65% of early adopters report that real-world deployment requires substantial development resources for adaptation and optimization, which to some extent erodes the model's theoretical efficiency advantage.

This disruptive reasoning model not only shows clear advantages in R&D efficiency; its performance metrics rival the products of industry leaders such as OpenAI, and for China-specific application scenarios it may even surpass them, while requiring nearly 30% less computing resources than comparable products.

Its success both confirms the open-ended potential of algorithmic innovation and raises a key question about technological evolution: when future algorithmic breakthroughs hit adaptation bottlenecks with traditional computing architectures, what transformation challenges will the industry face?

Today's mainstream large models (such as GPT-4, Gemini Pro, and Llama3) are iterating two to three times per month, continuously resetting performance benchmarks. DeepSeek-R1, through its original distributed training framework and dynamic quantization techniques, has raised inference efficiency per unit of compute by 40%, and its development trajectory offers the industry a textbook case of algorithms and systems engineering evolving together.

Moreover, the team's multi-head latent attention (MLA) mechanism, while achieving a breakthrough 50% reduction in memory footprint, has also brought a marked ...
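The 50% memory figure refers mainly to the key-value (KV) cache that attention keeps for every generated token. A quick back-of-the-envelope calculation shows why compressing it matters. All shape numbers below are illustrative assumptions of ours, not DeepSeek-R1's actual configuration, and the latent width is chosen only so the ratio matches the quoted 50%; real latent-attention configurations differ.

```python
# Back-of-the-envelope KV-cache arithmetic (illustrative numbers only).
# Standard attention caches full per-head K and V vectors for every token;
# a latent-attention scheme like MLA caches one compressed latent vector instead.
n_layers   = 60
n_heads    = 64
head_dim   = 128
d_latent   = 8192        # hypothetical latent width, picked to give a 2x saving
bytes_fp16 = 2

standard_per_token = 2 * n_heads * head_dim * bytes_fp16   # K and V, per layer
latent_per_token   = d_latent * bytes_fp16                 # one latent, per layer

seq_len = 32_768
standard_total = standard_per_token * n_layers * seq_len / 2**30  # GiB
latent_total   = latent_per_token   * n_layers * seq_len / 2**30  # GiB

print(f"standard KV cache: {standard_total:.1f} GiB")   # ~60 GiB
print(f"latent KV cache:   {latent_total:.1f} GiB")     # ~30 GiB, i.e. ~50% less
```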
When GPUs meet endoscopes: how multi-core heterogeneous computing defines a new paradigm for intelligent imaging
思宇MedTech· 2025-02-28 03:56
Early cancer screening and the identification of complex lesions have long been difficult problems in digestive tract diagnostics: early-stage lesions are often tiny and concealed, so diagnosis relies on the physician's keen observation and experience. The limitations of traditional endoscopes in image clarity and real-time performance make diagnosis even more challenging; in particular, insufficient hardware compute struggles to meet the demands of high-precision diagnosis.

As 思宇MedTech understands it, 开立医疗 (SonoScape) is the first vendor to integrate a discrete graphics card (GPU) into an endoscope host, and its upcoming iEndo endoscopy platform goes further by introducing a multi-core heterogeneous architecture based on CPU, GPU, and FPGA, breaking through the compute limits of traditional endoscopes.

# Multi-core heterogeneous computing: a collaborative revolution in intelligent hardware

Multi-core heterogeneous computing distributes image acquisition, processing, and analysis tasks across different compute units, fully exploiting the cooperation of GPU, CPU, and FPGA to achieve real-time, seamless coupling between image processing and intelligent analysis. Building on this innovation, the iEndo platform that 开立医疗 is about to release is bringing an unprecedented revolution to endoscopy.

From the "single-core era" to "heterogeneous collaboration": the advantage of a multi-core heterogeneous architecture is that intelligent task allocation breaks the limits of a traditional single-core processor. Each processor handles the work it suits: the CPU makes logical decisions, the GPU accelerates image processing, and the NPU focuses on AI inference. This architecture breaks through the bottleneck of the traditional single-core processor, improves data processing efficiency while lowering power consumption, and has become the core computing power ...
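The division of labor described above (CPU for logic, GPU for image processing, FPGA/NPU for acquisition and inference) is easiest to picture as a pipeline of stages handing frames to one another. The sketch below is only a schematic of that pattern with all device work stubbed out; it is not SonoScape's iEndo software, and every name, threshold, and queue size in it is our assumption.

```python
# Schematic of a heterogeneous imaging pipeline: capture (FPGA in the real
# system) -> GPU enhancement -> NPU inference -> CPU decision logic.
import queue
import threading

raw_frames      = queue.Queue(maxsize=8)   # capture -> GPU
enhanced_frames = queue.Queue(maxsize=8)   # GPU -> NPU
findings        = queue.Queue(maxsize=8)   # NPU -> CPU logic

def capture_stage():                        # stands in for FPGA acquisition
    for i in range(100):
        raw_frames.put({"frame_id": i, "pixels": b"..."})
    raw_frames.put(None)                    # sentinel: end of stream

def gpu_enhance_stage():                    # stands in for GPU image processing
    while (frame := raw_frames.get()) is not None:
        frame["enhanced"] = True            # denoise / sharpen would happen here
        enhanced_frames.put(frame)
    enhanced_frames.put(None)

def npu_inference_stage():                  # stands in for NPU lesion detection
    while (frame := enhanced_frames.get()) is not None:
        frame["lesion_score"] = 0.9 if frame["frame_id"] % 17 == 0 else 0.1
        findings.put(frame)
    findings.put(None)

def cpu_logic_stage():                      # CPU: thresholds, alerts, UI updates
    while (frame := findings.get()) is not None:
        if frame["lesion_score"] > 0.5:
            print(f"frame {frame['frame_id']}: flag for physician review")

stages = [capture_stage, gpu_enhance_stage, npu_inference_stage, cpu_logic_stage]
threads = [threading.Thread(target=s) for s in stages]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded queues are the point of the pattern: each unit runs at its own pace, and back-pressure keeps a slow stage from letting frames pile up, which is how the pipeline stays real-time without any single processor doing all the work.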
沐曦 (MetaX) formally launches its A-share IPO: 燧原科技 (Enflame), 壁仞科技 (Biren), and 摩尔线程 (Moore Threads) had all signed counseling agreements earlier
IPO早知道· 2025-01-16 02:21
Committed to providing full-stack GPU chips and solutions for heterogeneous computing.

This article is an original piece by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao

According to IPO早知道, 沐曦集成电路(上海)股份有限公司 (hereafter "沐曦") signed a counseling agreement with Huatai United Securities on January 12, 2025, formally starting its A-share IPO process.

This makes 沐曦 the fourth "chip unicorn" to start an A-share listing process in less than half a year, after 燧原科技, 壁仞科技, and 摩尔线程: on August 23, September 10, and November 6, 2024, those three signed A-share counseling agreements with CICC, Guotai Junan Securities, and CITIC Securities respectively.

Founded in 2020, 沐曦 is committed to providing full-stack GPU chips and solutions for heterogeneous computing, with broad applications in intelligent computing, smart cities, cloud computing, autonomous driving, digital twins, the metaverse, and other frontier areas, supplying the computing power that underpins the digital economy. Its team has deep design and industrialization experience: core members average nearly 20 years of end-to-end high-performance GPU product development and have led the development and mass production of more than ten mainstream high-performance GPU products worldwide, covering GPU architecture definition, GPU IP design, GPU SoC design, and the full mass-production delivery flow of GPU system solutions.

To date, the full-stack GPU chip products built by 沐曦 cover those used for intelligent computing ...