Heterogeneous Computing
Huang Haiqing: Proposal to Form a China Heterogeneous Computing Software Ecosystem Alliance and Build a CUDA-like System for China
Xin Lang Cai Jing· 2026-02-01 16:12
(Source: Zhitong Finance) On the eve of Shanghai's "Two Sessions," Dr. Huang Haiqing, a member of the Shanghai Municipal CPPCC and chairman of 上海熠知电子科技有限公司, proposed that the Shanghai government push to establish a Chinese unified software alliance and ecosystem for "heterogeneous computing power," building a Chinese shared programming platform akin to Nvidia's, which he called a vitally important move for the industry as a whole. ...
Hygon Information: System Bus Interconnect Protocol (HSL) Boosts Compute Coordination and Ecosystem Upgrades for China's Domestic AI Industry
Jing Ji Guan Cha Wang· 2026-01-30 09:12
As demand for computing power from AI large models and similar applications grows exponentially, traditional single-form processor architectures can no longer meet the performance needs of diverse scenarios, and heterogeneous computing systems built from CPUs, GPUs, NPUs, and other chips have become the industry mainstream. Efficient coordination among the components of a heterogeneous system, however, has long been an industry pain point. Zheng Weimin, academician of the Chinese Academy of Engineering, has noted that the coordination efficiency of CPUs, GPUs, accelerator-card storage, and network modules directly determines how much of a heterogeneous system's overall performance can be realized, and that a unified, efficient interconnect bus protocol is the key to solving this problem. Against this backdrop, Hygon Information independently developed the System Bus Interconnect Protocol (HSL) and, at a seminar held in Beijing in September 2025, formally opened the protocol to the full industry stack, including GPU, IO, OS, and OEM vendors. HSL offers five core features: high bandwidth, low latency, global address-space coherence, full-stack openness, and flexible scalability, achieving multi-dimensional breakthroughs over the traditional PCIe interface. On performance, it greatly reduces data-transfer latency and raises bandwidth, supports high-speed chip-to-chip direct connection, and lets GPUs within an AI super-node make full use of CPU memory space, significantly improving AI model efficiency. On development and scaling, it simplifies programming and flexibly supports elastic expansion from single-node multi-card setups to large intelligent-computing clusters, including heterogeneous interconnection of more than ten thousand accelerator cards. On ecosystem coordination, it opens the complete bus protocol, provides IP reference designs and an open instruction set, fully supports mainstream domestic AI chips, and helps upstream and downstream partners connect efficiently, breaking down technical barriers and ...
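The global address-space coherence highlighted above is the key departure from copy-based PCIe-style transfers: with a shared address space, a GPU can read CPU-resident data directly instead of staging a copy. A minimal Python toy model (illustrative only, not the HSL protocol; the device names and copy counting are invented for the sketch) contrasts the two access patterns:

```python
# Toy model (illustrative only, not the HSL protocol): contrast a
# copy-based interconnect, where each device has private memory and
# remote data must be copied across the bus, with a global-address-space
# interconnect, where every device reads the same memory directly.

class CopyBasedBus:
    """PCIe-style model: device memories are private; remote reads copy first."""
    def __init__(self):
        self.copies = 0
        self.device_mem = {}              # device -> {addr: value}

    def write(self, device, addr, value):
        self.device_mem.setdefault(device, {})[addr] = value

    def read(self, device, owner, addr):
        if device != owner:               # remote data: explicit copy first
            self.copies += 1
            self.write(device, addr, self.device_mem[owner][addr])
        return self.device_mem[device][addr]

class GlobalAddressSpaceBus:
    """HSL-like model: one coherent address space, no explicit copies."""
    def __init__(self):
        self.copies = 0
        self.mem = {}                     # shared addr -> value

    def write(self, device, addr, value):
        self.mem[addr] = value

    def read(self, device, owner, addr):
        return self.mem[addr]             # any device reads directly

def run(bus):
    bus.write("cpu", 0x10, 42)            # CPU produces one value
    for gpu in ("gpu0", "gpu1", "gpu2", "gpu3"):
        assert bus.read(gpu, "cpu", 0x10) == 42
    return bus.copies

print(run(CopyBasedBus()))                # → 4 (one copy per GPU)
print(run(GlobalAddressSpaceBus()))       # → 0
```

The copy count is the quantity a coherent interconnect eliminates; in real systems the saved copies translate into lower latency and less pressure on bus bandwidth.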
Mu Xi Co., Ltd. Releases Its First Post-IPO Earnings Forecast: 2025 Loss Expected to Narrow by Roughly 50%; Xi Suo X-Series GPU Brand and Product Line Launched
Xin Lang Cai Jing· 2026-01-27 15:29
Core Viewpoint
- Mu Xi Co., Ltd. has released its first earnings forecast since its IPO, projecting significant revenue growth while narrowing its losses for fiscal year 2025 [1][3].

Financial Performance
- The company expects 2025 revenue of 1.6 billion to 1.7 billion yuan, year-on-year growth of 115.32% to 128.78% [1].
- The net loss attributable to the parent company is projected at 650 million to 798 million yuan, a narrowing of 43.36% to 53.86% from the previous year's loss of 1.409 billion yuan [1].
- The net loss excluding non-recurring items is expected to be 700 million to 835 million yuan, a year-on-year reduction of 20.01% to 32.94% [1].

Strategic Development
- The company is implementing a "1+6+X" development strategy, focusing on market expansion and strengthening its position in the high-performance GPU industry [3].
- Mu Xi Co., Ltd. aims to integrate AI technology across industries, which has brought growing recognition and sustained procurement from downstream customers [3].

Product Development
- The company offers a full-stack GPU product line, including the Xi Si N series, Xi Yun C series, and Xi Cai G series [4].
- Upcoming products include the C600 and C700 series chips, with the C600 series expected to enter mass production in the first half of 2026 [4].
- The newly launched Xi Suo X series targets scientific-intelligence applications, supporting a range of computational tasks and AI-driven research [5].

Market Position
- Since the IPO, the stock has fallen from a peak of 895 yuan per share to a close of 572.18 yuan, a decline of 36.07% [6].
- The stock hit a historical low of 558.58 yuan per share the same day [6].
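The quoted ranges can be sanity-checked with a few lines of arithmetic (a reader's check on the reported percentages, not figures from the filing itself):

```python
# Reader's sanity check of the reported forecast ranges (not figures
# from the filing itself); all amounts in billions of yuan.
prev_loss = 1.409                          # 2024 net loss attributable to parent

for loss in (0.798, 0.650):                # 2025 forecast endpoints
    print(f"loss narrows {(prev_loss - loss) / prev_loss:.2%}")

for revenue, growth in ((1.6, 1.1532), (1.7, 1.2878)):
    print(f"implied 2024 revenue {revenue / (1 + growth):.3f}")
```

Both ends of the revenue range imply 2024 revenue of roughly 0.743 billion yuan, so the growth percentages are internally consistent, and the loss narrowing works out to about 43.4% and 53.9%, matching the reported 43.36%-53.86% range up to rounding.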
[Huanqiu Q&A] Intel's Song Jiqiang: Embodied Intelligence Is Shifting from Pre-Programmed Modes to Autonomous Multi-Agent Collaboration
Huan Qiu Wang· 2026-01-26 07:16
[Huanqiu Wang Tech report, by Lin Mengxue] "As a frontier hotspot in artificial intelligence, embodied intelligence has continued to gain momentum since CES. Its essence lies in deeply fusing intelligent capability with physical devices, achieving active transformation of the physical world through a complete perception-decision-execution-feedback loop." Song Jiqiang, vice president of Intel Labs and president of Intel Labs China, recently gave a systematic account of the latest progress, core challenges, and industrial deployment paths of embodied-intelligence technology. On the hardware side, Intel's newly released third-generation Core Ultra for Edge processor is a key enabler. Designed for industrial-grade applications and physical AI, it delivers 180 TOPS of AI compute, is built on the Intel 18A process, and achieves marked gains in energy efficiency. Its core strength is industrial-grade reliability, including a wide operating-temperature range, a 10-year supply commitment, and high real-time determinism optimized for robotics scenarios. Combined with Intel's robotics AI suite and embodied-intelligence SDK, it forms a complete hardware-to-software solution: the former provides modular reference designs and optimized software packages that run both traditional vision models and large models (including VLA) efficiently; the latter adds key capabilities such as LLM task planning and EtherCAT real-time communication, reusing Intel's mature industrial-robotics technology to sharply cut vendors' development costs. As he explained, the defining characteristics of embodied intelligence are the physical closed loop and active interaction, distinguishing it from purely information-processing AI applications. Whether ...
A Chip Startup Takes On Nvidia and Broadcom Single-Handedly
半导体行业观察· 2026-01-22 04:05
Core Insights
- Upscale AI, a chip startup, has raised $200 million in Series A funding to challenge Nvidia's dominance in rack-level AI systems and to compete with companies such as Cisco, Broadcom, and AMD [1][3].
- The rapid influx of investors reflects a growing consensus that traditional network architectures cannot meet AI's demands for high scalability and synchronization [1][2].

Funding and Market Position
- The round was led by Tiger Global, Premji Invest, and Xora Innovation, with participation from several notable investors, bringing Upscale AI's total funding to over $300 million [1].
- The AI interconnect market is projected to reach $100 billion by the end of the decade, prompting Upscale AI to focus on this growing sector [6].

Technology and Product Development
- Upscale AI is developing a chip named SkyHammer, optimized for scale-up networks, which aims to provide deterministic latency for data transmission among rack components [9][10].
- The company emphasizes heterogeneous computing and networking, arguing that no single company can supply all the technologies AI requires [10][12].

Competitive Landscape
- Nvidia's networking revenue grew 162% year over year, underscoring the competitive pressure in the AI networking space [3].
- Upscale AI aims to build a high-radix switch and a dedicated ASIC to compete with Nvidia's NVSwitch and other existing solutions [14][16].

Strategic Partnerships and Standards
- Upscale AI is building its platform on open standards and actively participates in alliances including Ultra Accelerator Link and the SONiC Foundation [7][17].
- The company plans to expand its product line to include more traditional scale-out switches while maintaining partnerships with major data-center operators and GPU suppliers [18].
Intel VP Song Jiqiang: Agentic AI Brings Compute Challenges; Heterogeneous Computing Will Be a Key Direction for AI Infrastructure
Xin Lang Cai Jing· 2026-01-15 10:41
Core Insights
- The development of AI capability is transitioning from foundational large models to intelligent agents, with a growing focus on providing specific functions to build workflows [3][7].
- Embodied intelligence, a significant form of physical AI, integrates digital intelligence into physical devices that interact with the real world, with an emphasis on inference applications [3][7].

Group 1: AI Capability Development
- AI capability is evolving toward intelligent agents that emphasize specific functionalities for workflow construction [3][7].
- Industry analysts predict a shift in AI computing demand from training to inference, which will consume a correspondingly large share of computational resources [3][7].

Group 2: Heterogeneous Computing Infrastructure
- The need for heterogeneous infrastructure arises from multi-agent systems that must build complete workflows and run multiple streams in parallel [3][7].
- AI agents require support from various models, schedulers, and preprocessing modules, so different hardware is needed to deliver the best energy efficiency and cost-effectiveness for each part [3][7].
- Flexible heterogeneous support is needed at three levels: an open AI software stack at the top, infrastructure adaptable to small and medium enterprises in the middle, and integration of diverse hardware at the bottom [3][7].

Group 3: Embodied Intelligence Robotics
- In embodied robotics, various methods for achieving intelligent tasks are being explored, with no optimal solution yet established [4][8].
- Traditional industrial automation prioritizes reliability, real-time performance, and computational accuracy, while approaches based on large language models lean toward neural-network solutions that require differentiated computing architectures [4][8].
- The era of embodied intelligent robots is expected to bring challenges in computing power and energy consumption, with heterogeneous computing becoming the core architecture of AI infrastructure [4][8].

Group 4: Multi-Agent Systems
- Once robots scale to the millions, they are expected to transcend industrial settings and support widespread commercial and personalized applications, necessitating multi-agent systems [4][9].
- The technical stack for multi-agent systems running on physical AI devices faces numerous challenges, with heterogeneous computing a key pathway to addressing system reliability [4][9].
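The three-layer argument above reduces, at the bottom layer, to a placement decision: route each stage of an agent workflow to the device class that handles it most efficiently. A minimal Python sketch of that idea (the device table, workload types, and efficiency numbers are all hypothetical, not from the talk):

```python
# Minimal sketch of heterogeneous placement: route each stage of an
# agent workflow to whichever device class is most efficient for it.
# Hypothetical efficiency table: device -> {workload_type: work per joule}.
DEVICES = {
    "cpu": {"preprocess": 50, "llm_infer": 2,  "vision": 5},
    "gpu": {"preprocess": 10, "llm_infer": 30, "vision": 40},
    "npu": {"preprocess": 5,  "llm_infer": 45, "vision": 25},
}

def place(workload_type):
    """Pick the device class with the best efficiency for this workload."""
    return max(DEVICES, key=lambda d: DEVICES[d][workload_type])

# An agent workflow as a sequence of stages, each placed independently.
pipeline = ["preprocess", "llm_infer", "vision", "llm_infer"]
plan = [(step, place(step)) for step in pipeline]
print(plan)
# → [('preprocess', 'cpu'), ('llm_infer', 'npu'), ('vision', 'gpu'), ('llm_infer', 'npu')]
```

Real schedulers would also weigh cost, queue depth, and data locality, but the core of the argument is visible here: no single device column wins every row, which is precisely why a heterogeneous bottom layer is needed.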
Intel VP Song Jiqiang: The Center of Gravity of AI Computing Is Shifting Toward Inference
Xin Lang Cai Jing· 2026-01-15 10:41
Core Insights
- The development of AI capability is transitioning from foundational large models to intelligent agents, with a growing focus on providing specific functions to build workflows [3][7].
- Embodied intelligence, a significant form of physical AI, integrates digital intelligence into physical devices that interact with the real world, with an emphasis on inference applications [3][7].

AI Demand and Infrastructure
- Industry analysts predict that demand for AI computing power is shifting from training to inference, which will consume a correspondingly large share of computing resources [3][7].
- Building multi-agent systems is essential for creating complete workflows and achieving parallel operation, necessitating heterogeneous infrastructure [3][7].

Heterogeneous System Requirements
- Heterogeneous systems must offer flexible support at three levels: an open AI software stack at the top, infrastructure that meets the needs of small and medium enterprises in the middle, and integration of diverse hardware at the bottom [3][7].
- The bottom layer should span architectures such as CPUs, GPUs, NPUs, AI accelerators, and brain-inspired computing devices, building a flexible heterogeneous system through layered infrastructure [3][7].

Embodied Intelligence Robotics
- In embodied robotics, approaches ranging from traditional layered custom models to end-to-end VLA models are being explored, with no optimal solution yet established [4][8].
- Traditional industrial automation prioritizes reliability, real-time performance, and computational accuracy, while solutions based on large language models lean toward neural-network approaches that require differentiated computing architectures [4][8].

Future Challenges and Opportunities
- The era of embodied intelligent robots is expected to bring challenges in computing power and energy consumption, with heterogeneous computing becoming the core architecture of AI infrastructure [4][8].
- As robot fleets reach the millions, they are expected to break through the limits of industrial scenarios and broadly support commercial and personalized applications, necessitating multi-agent systems [4][8][9].
TPU, LPU, GPU: The Past, Present, and Future of AI Chips
2025-12-29 01:04
Summary of Key Points from the Conference Call

Industry Overview
- The call discusses the evolution and future of AI chips, focusing on three main types: Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Language Processing Units (LPUs) [2][3][5].

Core Insights
- AI as a driving force: the rise of artificial intelligence has made computing power the core engine of the technological revolution, with GPUs, TPUs, and LPUs each playing crucial roles [2].
- GPU evolution: NVIDIA's GPUs moved from graphics rendering to the foundation of AI training, largely thanks to the CUDA ecosystem [3][4].
- TPU development: Google's TPUs were created in response to an internal computing crisis, using a specialized architecture to raise computational efficiency [5][6].
- LPU introduction: the LPU, developed by Groq, represents a further specialization for AI inference, building on the foundation laid by TPUs [7][8][9].

Historical Context
- GPU milestone: the success of AlexNet in 2012 was a turning point for GPUs in deep learning, showcasing their advantage in accelerating training [4].
- TPU's strategic importance: Google recognized it needed stronger computing capability to support AI-driven products and services, leading to the development of TPUs [5][6].
- LPU's unique position: Groq's LPU aims to provide deterministic execution for inference, addressing the high costs and complexity of AI deployment for smaller enterprises [9].

Technical Comparisons
- Architecture differences:
  - GPUs use a general-purpose architecture with CUDA cores and Tensor Cores for parallel processing [11].
  - TPUs use a systolic-array architecture designed for efficient matrix operations [12].
  - LPUs focus on deterministic execution with a programmable pipeline, optimized for low-latency inference [14].
- Performance metrics:
  - The LPU is highly efficient at roughly 1 W per token/s, while GPUs draw far more power (250-700 W+) [14].
  - TPU v7 is reported to deliver roughly 40 times the performance of NVIDIA's NVL72 configuration [20].

Market Dynamics
- TPU v7 launch: the introduction of TPU v7 marks a shift in Google's strategy from internal use to commercialization, targeting a broader customer base [22].
- NVIDIA-Groq partnership: NVIDIA's $20 billion collaboration with Groq aims to strengthen its position in the inference market by leveraging Groq's specialized LPU technology [22][23].

Future Outlook
- Trends in AI chip development: specialized chips are expected to proliferate, with ASIC market share projected to exceed 30% by 2026 [25].
- Emergence of edge AI: demand for low-power inference chips such as LPUs is expected to grow with the proliferation of IoT devices [31].
- Sector applications: AI chips are expected to spread across industries including finance, healthcare, and manufacturing, enabling capabilities such as automated diagnostics and personalized learning [36].

Conclusion
- The evolution of AI chips reflects a dynamic interplay between technological innovation and market demand, with a clear trend toward specialization and efficiency. Competition will increasingly center on comprehensive solutions that integrate training and inference across diverse applications [37].
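The systolic-array design attributed to TPUs above can be illustrated with a toy cycle-by-cycle simulation: operand streams are skewed so that the right elements of A and B meet at processing element (i, j) at the right cycle, and each PE accumulates one output element. This is a schematic sketch of the timing idea only, not TPU hardware (real TPUs use a weight-stationary 2D matrix unit):

```python
# Toy output-stationary systolic array: PE (i, j) owns output C[i][j].
# Rows of A flow in from the left and columns of B from the top, each
# skewed by one cycle per row/column, so the k-th operands of row i and
# column j arrive at PE (i, j) on cycle t = i + j + k.

def systolic_matmul(A, B):
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    cycles = n + m + k - 2                  # last cycle any PE is active
    for t in range(cycles + 1):             # advance the array cycle by cycle
        for i in range(n):
            for j in range(m):
                s = t - i - j               # k-index reaching PE(i, j) this cycle
                if 0 <= s < k:
                    C[i][j] += A[i][s] * B[s][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))                # → [[19, 22], [43, 50]]
```

The point of the structure is that every PE does one multiply-accumulate per cycle with only nearest-neighbor data movement, which is what makes the architecture so efficient for dense matrix operations.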
Even Nvidia Is Now Copying Others' Homework
Tai Mei Ti APP· 2025-12-26 01:38
Core Insights
- Nvidia announced a $20 billion cash technology-licensing agreement with AI chip startup Groq, seen as a strategic move to blunt competition and strengthen its position in the AI market [1][9][19].
- The deal lets Groq operate independently while transferring most of its core technology assets to Nvidia, effectively turning a potential competitor into an ally [1][9].
- The AI industry is undergoing a significant shift from centralized model training to large-scale inference, with the inference market expected to grow at a 65% compound annual growth rate, reaching $40 billion by 2025 and $150 billion by 2028 [1][19].

Group 1: Nvidia's Strategic Move
- The $20 billion payment is 2.9 times Groq's $6.9 billion valuation from just three months earlier, a rare "valuation inversion" in the tech industry [1][10].
- Analysts suggest the transaction lets Nvidia buy time and eliminate a significant threat while avoiding antitrust scrutiny [1][9].
- Nvidia's cash and short-term investments totaled $60.6 billion as of October 2025, making the $20 billion outlay manageable [10].

Group 2: Groq's Technology and Market Position
- Groq was founded by Jonathan Ross, a key developer of Google's TPU, to build a chip optimized for AI inference, the Language Processing Unit (LPU) [2][3].
- The LPU architecture offers significant advantages over Nvidia's GPUs, including ultra-low latency, high energy efficiency, and deterministic computing [3][12].
- Groq's rapid rise in valuation and market presence includes partnerships with major clients such as Meta and Saudi Aramco, and it has served over 2 million developers [4][5].

Group 3: Competitive Landscape
- Nvidia faces growing inference-market competition from Google TPU, AMD MI300X, and Huawei Ascend, which are gaining share and offering cost advantages [6][7][8].
- The dominance of Nvidia's CUDA ecosystem remains a major barrier for competitors like Groq, since switching costs for enterprises are prohibitively high [5][15].
- The AI chip market is expected to consolidate, with Nvidia projected to hold 75-80% market share by 2027 while players such as AMD and Google hold smaller shares [14][19].

Group 4: Future Trends and Opportunities
- Integrating Groq's technology into Nvidia's ecosystem could yield a dual-compute solution combining GPUs for training and LPUs for inference, improving overall efficiency [11][17].
- A shift toward heterogeneous computing is anticipated, with over 80% of AI data centers expected to adopt such architectures by 2028 [17].
- Despite consolidation among major players, niche opportunities remain for startups in edge computing and specialized applications [18][19].
Tang Zhimin of Shenzhen University of Advanced Technology: Heterogeneous Computing Is Now Inevitable, and Software Decides Which Chips Win | GAIR 2025
雷峰网· 2025-12-24 03:19
Core Viewpoint
- RISC-V has the potential to integrate the characteristics of CPUs, GPUs, and AI processors, breaking through the ecological barriers of CUDA [47].

Group 1: AI and Computing Power
- The eighth GAIR Global AI and Robotics Conference will be held in Shenzhen, focusing on the core of intelligent systems: computing power [2].
- Computing power is not merely a reflection of hardware performance but a systemic capability to complete tasks under resource and time constraints [3].
- The rapid growth of generative AI's demand for compute necessitates heterogeneous computing (CPU + XPU), since CPUs alone cannot meet real-world needs [11][16].

Group 2: Software and Ecosystem
- The true determinant of realized computing power is the software and application ecosystem, not the hardware itself [20].
- The ecosystem includes all software that runs on the processors; productivity is generated by application software, not by the chips [24].
- The x86 ecosystem's large market share and inertia make it hard for new architectures to compete [26].

Group 3: RISC-V and Market Challenges
- RISC-V's openness creates new possibilities, but openness alone does not guarantee success; many open CPUs have failed commercially [27][28].
- RISC-V faces commercialization difficulties, particularly in complex computing, because of an immature software ecosystem [29].
- A robust software ecosystem is critical for RISC-V to succeed in the competitive landscape [20][29].

Group 4: Future Directions
- The future of computing architecture may return to a CPU-centric model, with RISC-V potentially unifying the characteristics of CPUs, GPUs, and AI processors [47].
- The importance of building a domestic computing ecosystem is recognized at the national level to avoid dependency on foreign technologies [33].
- Successful chip development hinges on the ability to create a comprehensive software ecosystem that adds significant value to products and services [34][45].