Dataflow Architecture
Intel Invests $350 Million in SambaNova to Challenge GPU Dominance in AI Inference
Sou Hu Cai Jing· 2026-02-25 10:36
AI infrastructure company SambaNova has raised $350 million to advance its dataflow-architecture technology, positioning it as an alternative to GPU-based AI systems.

Participants in the round include Intel Capital, which puts to rest rumors that Intel planned to acquire SambaNova outright. Other backers include Vista Equity, Cambium Capital, and several venture funds expecting rich returns when SambaNova ships its latest generation of Reconfigurable Dataflow Units (RDUs).

Intel will enter a "multi-year" partnership with the company, aiming to give customers a GPU alternative for generative-AI deployments. That means SambaNova's new RDUs will use Xeon processors, and the collaboration will also extend to hardware-software co-design.

The flexibility of that co-design should work in SambaNova's favor, given soaring memory prices. HBM2E may look like an odd choice, but Liang wants to ensure his company can keep shipping through a period of rising memory prices. "From a cost perspective, it is important to make sure we don't get caught in a supply-chain scramble," he said.

While a big improvement over its predecessor, the SN50 does not look all that impressive on paper, at least next to modern GPUs. It will offer roughly 64% of the dense FP8 compute of Nvidia's nearly two-year-old Blackwell architecture, one third of the HBM capacity, and less than …
Dayu Explains the Underlying Logic of the Li Auto L9's Full By-Wire Chassis
理想TOP2· 2026-02-08 04:51
Core Viewpoint
- The article emphasizes the transition from 2D Vision Transformers (ViT) to 3D ViT, highlighting the advantages of processing continuous video streams for a better understanding of the physical world and faster response times in autonomous vehicles [1][2]

Group 1: Transition to 3D ViT
- The traditional 2D ViT processes images in a sliced manner, limiting the information captured from each frame, while 3D ViT processes video clips, integrating spatial and temporal data for enhanced feature extraction [1][2]
- The shift to 3D ViT is not merely a change of perspective from 2D to 3D but a fundamental change in feature-extraction dimensions, operating jointly over height, width, and time [2]

Group 2: Technical Advancements
- The new chip architecture, referred to as a dataflow architecture, allows direct connections between layers on the silicon, minimizing external memory reads and writes and thus optimizing latency [2]
- The company's self-developed chip is designed to be data-driven rather than instruction-driven, achieving higher parallelism and integrating hardware and software design from the outset [4]

Group 3: Implications for Autonomous Vehicles
- The advances in chip technology necessitate corresponding improvements in vehicle control systems, leading to the development of a full by-wire chassis for the L9 model to match the enhanced processing capabilities [3]
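The 2D-versus-3D slicing contrast above can be sketched with plain array reshapes. This is an illustrative toy (frame count, resolution, and patch sizes are invented, not Li Auto's): a 2D ViT tokenizes each frame separately, while a 3D ViT cuts spatio-temporal "tubelets" so every token already mixes space and time.

```python
import numpy as np

# Toy clip: 8 frames of 224x224 RGB (hypothetical sizes, not Li Auto's).
T, H, W, C = 8, 224, 224, 3
clip = np.random.rand(T, H, W, C)

# 2D ViT slicing: each frame is cut into 16x16 patches independently,
# so temporal relationships must be recovered later (if at all).
P = 16
per_frame = clip.reshape(T, H // P, P, W // P, P, C).transpose(0, 1, 3, 2, 4, 5)
tokens_2d = per_frame.reshape(T, (H // P) * (W // P), P * P * C)
print(tokens_2d.shape)   # (8, 196, 768): 196 tokens per frame, no temporal mixing

# 3D ViT tubelets: slice along time as well (2 frames x 16 x 16),
# so a single token spans height, width, AND time.
Pt = 2
tubes = clip.reshape(T // Pt, Pt, H // P, P, W // P, P, C)
tokens_3d = tubes.transpose(0, 2, 4, 1, 3, 5, 6).reshape(
    (T // Pt) * (H // P) * (W // P), Pt * P * P * C)
print(tokens_3d.shape)   # (784, 1536): each token carries a 2-frame tubelet
```

The token count drops while each token's feature dimension grows, which is the "fundamental change in feature-extraction dimensions" the summary refers to.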
A Chip That Upends Convention
半导体行业观察· 2026-02-06 01:33
Core Viewpoint
- NextSilicon is innovating in computer architecture with its Maverick 2 processor, aiming to address challenges in high-performance computing (HPC) and artificial intelligence (AI) by utilizing a distinctive dataflow architecture that enhances performance and efficiency [2][3][16]

Group 1: Company Strategy and Architecture
- NextSilicon's Maverick 2 processor is designed to overcome limitations of traditional CPU and GPU architectures by executing computation graphs directly, eliminating the need for instruction serialization and reordering [7][8]
- The architecture allows simultaneous execution of multiple memory operations and arithmetic logic unit (ALU) operations, significantly improving performance by masking the latency sensitivity of the cores [6][9]
- The company emphasizes memory management, using a distinctive memory management unit (MMU) that handles fewer memory accesses and thereby optimizes memory access patterns [10][11]

Group 2: Performance Metrics and Testing
- NextSilicon's architecture has demonstrated unprecedented performance on benchmarks such as GUPS (Giga Updates Per Second), showcasing its ability to handle random memory access efficiently [18]
- The company aims to maximize performance by transforming workloads typically limited by computation into ones limited by memory, thus reaching optimal performance levels [19]

Group 3: Market Focus and Future Directions
- NextSilicon strategically targets the HPC market, which, despite being smaller than the AI market, provides a mature environment for technology development and customer collaboration [16][17]
- The company is exploring how to leverage its HPC chip for AI applications, indicating a future direction that combines high-performance computing with AI workloads [23][24]
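GUPS measures random read-modify-write updates scattered across a large table, exactly the access pattern Group 2 describes. A minimal Python rendition of the idea (the real HPC Challenge RandomAccess benchmark uses 64-bit XOR updates over a table sized to most of memory; the sizes here are toy values):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
table = np.zeros(1 << 20, dtype=np.uint64)   # toy table; real runs use ~half of RAM
idx = rng.integers(0, table.size, size=100_000, dtype=np.uint64)

t0 = time.perf_counter()
np.bitwise_xor.at(table, idx, idx)           # scattered read-modify-write updates
dt = time.perf_counter() - t0

gups = idx.size / dt / 1e9                   # updates per second, in billions
print(f"{gups:.6f} GUPS")
```

A conventional core mostly stalls on DRAM in this loop; a dataflow part that keeps many independent updates in flight hides that latency, which is why the article singles out GUPS.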
Why Nvidia Spent $20 Billion to Acquire Groq
半导体行业观察· 2026-01-01 01:26
Core Viewpoint
- Nvidia's acquisition of Groq's technology and talent for $20 billion raises questions about the strategic rationale behind the deal, especially given the potential for antitrust scrutiny and the actual benefits derived from Groq's technology [1][2]

Group 1: Nvidia's Acquisition Details
- Nvidia paid $20 billion for a non-exclusive license of Groq's intellectual property, including its Language Processing Unit (LPU) and associated software libraries [2]
- Groq will continue to operate independently, retaining its high-performance inference-as-a-service product, despite significant talent loss to Nvidia [2]
- The acquisition is seen as a move to eliminate competition, but the justification for the $20 billion price tag remains debatable [2]

Group 2: Technology Insights
- Groq's LPU uses static random-access memory (SRAM), which is significantly faster than the high-bandwidth memory (HBM) in current GPUs, potentially offering 10 to 80 times the speed [3]
- Groq's chip achieved a token-generation speed of 350 tok/s in tests, and 465 tok/s when running mixture-of-experts models [3]
- However, SRAM's low density means that running medium-sized language models would require hundreds or thousands of Groq LPUs, raising questions about practicality [4]

Group 3: Architectural Innovations
- Groq's key innovation is its "dataflow architecture," designed to accelerate the linear-algebra operations of inference, which could give Nvidia a competitive edge in chip performance [5][6]
- This architecture allows continuous processing of data without waiting on memory, potentially overcoming bottlenecks that slow GPU performance [6][7]
- Groq's LPU can theoretically reach performance levels comparable to high-end GPUs, though practical performance may vary [7]

Group 4: Future Implications
- Nvidia's collaboration with Groq could yield new technology options for enhancing chip performance, particularly in inference optimization, an area where Nvidia has previously lacked a strong offering [8]
- Nvidia's upcoming Rubin-series chips are designed to optimize the inference pipeline, indicating an architectural shift that could leverage Groq's technology [9]
- Groq's existing chip designs may not make excellent decoders, but they could be useful for speculative decoding, which improves performance by predicting outputs with a smaller model [9]

Group 5: Market Context
- The $20 billion price tag for Groq's technology is substantial but manageable for Nvidia, given its recent operating cash flow of $23 billion [10]
- The acquisition may not immediately affect Nvidia's current chip production; the company may be positioning itself for long-term strategic advantage [12]
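Speculative decoding, mentioned in Group 4, pairs a cheap draft model with the expensive target model: the draft proposes several tokens ahead, the target verifies them, and the longest agreeing prefix is kept. A toy sketch with stub models over a 10-token vocabulary (neither stub resembles Groq's or Nvidia's actual stack):

```python
def draft_model(prefix, k):
    # Hypothetical cheap proposer: next token = (last + 1) % 10.
    out, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(last)
    return out

def target_model(prefix):
    # Hypothetical expensive model: agrees with the draft except that it
    # emits 0 wherever the draft would emit 7.
    nxt = (prefix[-1] + 1) % 10
    return 0 if nxt == 7 else nxt

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        # In a real system the target scores all k positions in one
        # batched forward pass; the sequential loop here is for clarity.
        correct = target_model(ctx)
        if correct != tok:
            accepted.append(correct)   # take the target's token and stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

print(speculative_step([3]))  # → [4, 5, 6, 0]
```

When draft and target mostly agree, each expensive pass yields several tokens instead of one, which is where a fast but capacity-limited chip could earn its keep as the drafter.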
Li Auto CTO Xie Yan Shares the Design Thinking Behind Li Auto's Autonomous-Driving Chip at the Apsara Conference
理想TOP2· 2025-09-27 08:58
Core Viewpoint
- The article discusses the evolution of intelligent-driving algorithms and the importance of dataflow architecture in autonomous-driving technology, emphasizing the need for advanced computational architectures to handle growing demands for processing power and reasoning capability

Group 1: Evolution of Intelligent Driving Algorithms
- The evolution of autonomous-driving algorithms can be divided into three phases: an initial phase of rule-based algorithms, a second phase of end-to-end (E2E) learning, and the current phase focused on integrating vision-language models (VLM) with reinforcement learning (RL) to enhance decision-making [4][5][6]

Group 2: Importance of Language Models
- Language models are deemed essential for long-horizon reasoning in autonomous driving, as they enable the system to generalize and handle corner cases that cannot be addressed by data collection or world models alone [7][8]
- The psychological aspect of a driving model that aligns with human values and reasoning is highlighted, suggesting that language models can help instill a human-like worldview in autonomous systems [8][9]

Group 3: Computational Architecture
- The article critiques the traditional von Neumann architecture, which prioritizes computation over data, and proposes a shift toward data-driven computation to better handle the complexity of AI processing [12][13]
- The company has developed a distinctive NPU architecture organized around dataflow rather than a traditional SoC design, aiming to improve efficiency and performance in AI inference tasks [17][18]

Group 4: Performance Metrics
- The company's NPU architecture is reported to significantly outperform existing solutions, achieving up to 4.4x the performance on CNN tasks and 2-3x on Llama 2 7B tasks at a similar transistor count [2][18]
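The instruction-driven versus data-driven distinction can be shown in a few lines: instead of stepping through a serial program, a dataflow machine fires each operation the moment its operands arrive. A toy scheduler over a made-up graph (purely illustrative; this is not Li Auto's NPU design):

```python
from collections import defaultdict

# node -> (function, upstream nodes); constant sources have no inputs.
graph = {
    "a":   (lambda: 3.0, []),
    "b":   (lambda: 4.0, []),
    "mul": (lambda x, y: x * y, ["a", "b"]),
    "add": (lambda x, y: x + y, ["a", "b"]),
    "out": (lambda x, y: x + y, ["mul", "add"]),
}

def run_dataflow(graph):
    values = {}
    ready = [n for n, (_, ins) in graph.items() if not ins]
    consumers = defaultdict(list)
    for node, (_, ins) in graph.items():
        for i in ins:
            consumers[i].append(node)
    while ready:
        node = ready.pop()
        fn, ins = graph[node]
        values[node] = fn(*[values[i] for i in ins])   # fire: all operands present
        for c in consumers[node]:                      # results flow downstream
            if c not in values and all(i in values for i in graph[c][1]):
                ready.append(c)
    return values

print(run_dataflow(graph)["out"])  # 3*4 + (3+4) = 19.0
```

Note that "mul" and "add" become ready together: nothing in the graph orders them, so hardware with two units runs them at once. A von Neumann machine would have to pick a serial order and fetch an instruction for each step.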
Focusing on "New Compute": Tsingmicro's New Architecture Helps AI Technology "Overtake by Changing Lanes"
Jing Ji Wang· 2025-09-18 09:15
Group 1
- The global AI-chip market is shifting toward dataflow architecture, with companies like SambaNova and Groq reaching valuations of $5 billion and $6 billion respectively [1]
- Tsingmicro (清微智能), a spin-off from Tsinghua University, has developed and mass-produced dataflow reconfigurable-chip technology, positioning itself as a leader in this emerging field [1][2]
- Tsingmicro founder Wang Bo argues that innovation must go beyond traditional GPU architectures to overcome limits in technology and materials, advocating a "lane-changing" leapfrog approach akin to the automotive industry's transition to electric vehicles [2]

Group 2
- Tsingmicro's first "new compute" chip, TX81, secured over 20,000 orders and established intelligent-computing centers across multiple regions of China within six months of launch [2]
- Investment institutions increasingly recognize the value of new compute, with significant commitments from major funds signaling a strong market trend toward dataflow architecture [3]
- The transition to dataflow architecture is seen as a critical signal for achieving self-sufficiency in the compute industry, with momentum from models such as ChatGPT and DeepSeek 3.1 [3]
The Heart of Li Auto's Autonomous-Driving Chip: Dataflow Architecture and Hardware-Software Co-Design
理想TOP2· 2025-09-05 04:56
Core Viewpoint
- The article discusses the advances in Li Auto's self-developed chip architecture, focusing on the VLA architecture and its implications for autonomous-driving capability [1][2]

Group 1: Chip Development and Architecture
- Li Auto's self-developed chip is built on a dataflow architecture that emphasizes hardware-software co-design, making it well suited to running large neural networks efficiently [5][9]
- The chip is expected to deliver 2x the performance of leading chips when running large language models such as GPT, and 3x on CNN-style vision models [5][8]
- The development timeline from project initiation to vehicle deployment is approximately three years, a rapid pace compared with similar projects [5][8]

Group 2: Challenges and Innovations
- Achieving real-time inference on the in-vehicle chip is a significant challenge, with efforts focused on optimizing performance through various engineering techniques [3][4]
- Li Auto is implementing innovative parallel decoding methods to enhance the efficiency of action-token inference, which is crucial for autonomous driving [4]
- The integration of CPU, GPU, and NPU in the Thor chip aims to improve versatility and performance in processing large volumes of data, which is essential for autonomous-driving applications [3][6]

Group 3: Future Outlook
- The company expresses strong confidence in its innovative architecture and full-stack development capability, which it expects to become key differentiators [7][10]
- The relationship between increased computing power and improved performance in advanced driver-assistance systems (ADAS) is highlighted, suggesting a predictable enhancement in capability as the technology evolves [6][9]
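The payoff from parallel (chunked) decoding of action tokens is easy to quantify: latency tracks the number of model forward passes, which drops from one per token to one per chunk. The counts below are invented for illustration, since the article does not give Li Auto's actual token or chunk sizes.

```python
import math

TOKENS = 24   # action tokens per planning step (hypothetical)
CHUNK = 8     # tokens emitted together per forward pass (hypothetical)

autoregressive_passes = TOKENS                 # one model pass per token
parallel_passes = math.ceil(TOKENS / CHUNK)    # one model pass per chunk

print(autoregressive_passes, parallel_passes)  # latency scales with pass count
```

For a control loop that must replan many times per second, cutting passes from 24 to 3 is the difference between missing and meeting the real-time deadline.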
Major News: Chinese Team Releases the New SRDA Computing Architecture, Tackling AI Compute Costs at the Root. Has DeepSeek's "Prophecy" Come True?
Xin Lang Cai Jing· 2025-06-09 13:27
Core Insights
- The article discusses the challenges of current AI computing architectures, particularly the high cost of computational power relative to the value generated by large models, highlighting the need for innovative hardware solutions [1][3][5]
- The SRDA AI-architecture white paper released by Yupan AI proposes a new system-level simplified reconfigurable dataflow architecture aimed at addressing the core bottlenecks in AI computing [3][6][17]

Current Challenges in AI Hardware
- The existing GPGPU architecture is a general-purpose solution that does not fully meet the specific needs of large-model training and inference, leading to inefficiencies [6][7]
- Many dedicated AI architectures designed before the large-model explosion of 2023 did not anticipate the specific demands of these models, resulting in low utilization rates and reliance on advanced manufacturing processes [7][8]

Key Features of Next-Generation AI Computing Chips
- The white paper identifies insufficient memory and interconnect bandwidth, low computational efficiency, complex network designs, and excessive power consumption as the major challenges facing current AI architectures [8][12][18]
- The SRDA architecture emphasizes a dataflow-centric design, optimizing data movement and reducing memory-access frequency, which is crucial for performance and energy efficiency [11][12][14]

Innovations Proposed by SRDA
- SRDA integrates high-bandwidth, large-capacity 3D-DRAM directly into the compute chip, effectively addressing memory bottlenecks [11][14]
- The architecture features a unified network design that simplifies cluster complexity and reduces management overhead, potentially surpassing existing technologies like NVLink [12][16]
- SRDA allows reconfigurability to adapt to evolving AI models, focusing on core AI computations while minimizing unnecessary complexity [16][18]

Implications for the AI Industry
- The SRDA architecture presents a comprehensive answer to the I/O bottlenecks facing AI computing, offering a systematic approach to AI-chip development [17][18]
- Adoption of the dataflow paradigm in AI-chip design may lead to a shift in industry standards, with more companies likely to explore similar architectures in the near future [17][18]
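A back-of-envelope model of why a dataflow-centric design cuts memory traffic: streaming values through a chain of elementwise operations turns three round trips to memory into one. The byte counts are a simple accounting model, not SRDA measurements (and NumPy itself still materializes the fused line's temporaries; the point is the traffic a dataflow chip would avoid):

```python
import numpy as np

N = 1_000_000
x = np.random.rand(N).astype(np.float32)

# Unfused chain: y = relu(2x + 1) as three separate passes.
# Each pass reads N floats and writes N floats -> 6N floats of DRAM traffic.
tmp1 = x * 2.0
tmp2 = tmp1 + 1.0
y_unfused = np.maximum(tmp2, 0.0)
traffic_unfused = 6 * N * 4          # bytes, in this simple model

# Dataflow/fused: values stream through multiply -> add -> relu on chip,
# touching DRAM once on the way in and once on the way out -> 2N floats.
y_fused = np.maximum(x * 2.0 + 1.0, 0.0)
traffic_fused = 2 * N * 4            # bytes, in this simple model

print(traffic_unfused // traffic_fused)   # 3x less traffic, identical result
```

Since large-model inference is dominated by memory movement rather than arithmetic, shrinking this traffic is exactly the "reducing memory-access frequency" lever the white paper emphasizes.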