Parallel Computing

Computing Power: The Growth Potential of the Compute-Interconnect Sector from NVIDIA's Perspective - Does a "Scaling Law" Exist for Scale-Up Networks?
2025-08-21 15:05
Summary (20250821): Soochow Securities (东吴证券) argues that Scale-Up networking will not stop at today's ASIC and single-rack solutions but will require much larger cross-rack connectivity, linking racks together like building blocks. Optical fiber and AECs would then generate equivalent 1.6T connection demand at a 1:9 ratio, with one switch needed for every four chips, addressing the memory wall and meeting AI's parallel-computing needs.

NVIDIA has promoted Scale-Up networking by raising NVLink bandwidth (per-card bandwidth doubles with each product generation) and by enlarging the Scale-Up domain (e.g., the upgrade from H100 to GH200), aiming to solve the hardware memory wall and parallel-computing demand. Early adoption was low because of high cost and weak inference demand, after which NVIDIA launched the more cost-effective NVL32 solution.

Scale-Up networking is driven by the hardware memory wall and the AI computing paradigm, providing higher bandwidth within a super-node; in NVIDIA's systems, Scale-Up bandwidth is nine times Scale-Out bandwidth. As the Scale-Up domain expands, it may eventually replace Scale-Out, unifying AI networking over a single fabric.

As Scale-Up demand materializes, the optical fiber, AEC, and switch segments stand to benefit, playing a key role in cross-rack, long-reach, high-performance connectivity and driving market growth. The memory wall refers to model parameter counts and compute growing faster than the accompanying ...
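A minimal sketch of the interconnect arithmetic described above, under the note's stated assumptions (the 1:9 equivalent-1.6T ratio and one switch per four chips come from the summary; the absolute chip count below is illustrative):

```python
def interconnect_demand(num_chips, links_per_chip=9, chips_per_switch=4):
    """Estimate equivalent 1.6T links and switch count for a cross-rack
    Scale-Up fabric, per the brokerage note's assumed ratios."""
    # Each chip's Scale-Up bandwidth maps to 9 equivalent 1.6T links.
    equivalent_1p6t_links = num_chips * links_per_chip
    # The note assumes one switch for every four accelerator chips.
    switches = num_chips // chips_per_switch
    return equivalent_1p6t_links, switches

links, switches = interconnect_demand(num_chips=1024)
print(links, switches)  # 9216 equivalent 1.6T links, 256 switches
```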
Number One in Global Market Cap: How NVIDIA Entered the AI Computing Chip Business
天天基金网· 2025-08-12 11:24
This article originally appeared on 睿远FUND (author: 小睿). After overtaking Microsoft in early June to become the world's most valuable listed company, U.S. semiconductor giant NVIDIA saw its market capitalization surpass $4 trillion in early July, becoming the first company in history to reach that milestone; its share price touched an all-time high of $164.32 at the time and has since climbed above $180.

The market broadly attributes this surge to investors' firm confidence in the transformative potential of artificial intelligence. NVIDIA's key partner OpenAI also recently released GPT-5, and the market-cap milestone underscores how enterprises are redirecting capital expenditure toward AI.

NVIDIA began as a maker of gaming chips, later pivoted to crypto-mining chips, and has now become a giant in AI computing chips and the field's undisputed early winner. So how did NVIDIA enter the AI computing chip business? In the book 《黄仁勋:英伟达之芯》, the author shows readers how NVIDIA seized this once-in-a-lifetime opportunity.

A once-in-a-lifetime opportunity: how NVIDIA transformed so quickly. Any account of NVIDIA's path to AI must begin with an important figure, ...
Where Can Today's Processor Architectures Still Improve?
半导体行业观察· 2025-07-20 04:06
Core Viewpoint
- The article discusses the evolving focus of processor design from solely performance to also include power efficiency, highlighting the challenges and opportunities in current architectures [3][4].

Group 1: Performance vs. Power Efficiency
- Processors have traditionally prioritized performance, but now they must also consider power consumption, leading to a reevaluation of design choices [3].
- Improvements in performance that significantly increase power consumption may no longer be acceptable, prompting a shift towards more energy-efficient designs [3][4].
- Current architectures are experiencing diminishing returns in performance improvements, making it increasingly difficult to achieve further gains [3].

Group 2: Architectural Innovations
- 3D-IC technology offers a middle ground in power consumption, being more efficient than traditional PCB connections while still consuming more power than single-chip solutions [4].
- Co-packaged optics (CPO) is gaining traction as a means to reduce power consumption by bringing optical devices closer to silicon chips, driven by advancements in technology and demand for high-speed digital communication [4].
- Asynchronous design presents potential benefits but also introduces complexity and unpredictability in performance, which has hindered its widespread adoption [5].

Group 3: AI and Memory Challenges
- The rise of AI computing has intensified the focus on memory efficiency, as processors must manage vast amounts of parameters without excessive energy consumption [6].
- The balance between execution power and data-movement power is crucial, especially as clock frequencies continue to rise without proportional performance gains [6][7].
- Architectural features like speculative execution, out-of-order execution, and limited parallelism are essential for maximizing processor utilization [6][7].

Group 4: Cost vs. Benefit of Features
- The implementation of features like branch prediction can significantly enhance performance but may also lead to increased area and power consumption [8].
- A small, simple branch predictor can improve performance by 15%, while a larger, more complex one can achieve a 30% increase but at a much higher cost in terms of area and power [8].
- The overall overhead from branch prediction and out-of-order execution can range from 20% to 30%, indicating a trade-off between performance gains and resource consumption [8].

Group 5: Parallelism and Its Limitations
- Current processors offer limited parallelism, primarily through multiple cores and functional units, but true parallelization remains a challenge due to the nature of many algorithms [9][10].
- Amdahl's Law highlights the limitations of parallelization, as not all algorithms can be fully parallelized, which constrains performance improvements [10].
- The need for explicit parallel programming complicates the adoption of multi-core processors, as developers often resist changing their programming methods [11].

Group 6: Future Directions and Customization
- The industry may face a creative bottleneck in processor design, necessitating new architectures that may sacrifice some generality for efficiency [16].
- Custom accelerators, particularly for AI workloads, can significantly enhance power and cost efficiency by tailoring designs to specific tasks [14][15].
- The deployment of custom NPUs can lead to substantial improvements in processor efficiency, with reported increases in performance metrics such as TOPS/W and utilization [15].
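Amdahl's Law, cited above as the fundamental limit on parallel speedup, can be checked numerically with its standard formulation (a generic sketch, not code from the article):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n), where p is the
    parallelizable fraction of the work and n the number of cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

# Even with 95% of the work parallelizable, speedup saturates near 20x,
# no matter how many cores are added.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.95, n), 2))
```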
Processor Architectures: Reaching the End of the Road?
半导体芯闻· 2025-07-17 10:32
Core Insights
- The article emphasizes the shift in processor design focus from solely performance to also include power efficiency, as performance improvements that lead to disproportionate power increases may no longer be acceptable [1][2].
- Current architectures are facing challenges in achieving further performance and power-efficiency improvements, necessitating a reevaluation of microarchitecture designs [1][3].

Group 1: Power Efficiency and Architecture
- Processor designers are re-evaluating microarchitectures to control power consumption, with many efficiency improvements still possible through better design of existing architectures [1][2].
- Advancements in process technology, such as moving to smaller nodes like 12nm, continue to be a primary method for reducing power consumption [1][2].
- 3D-IC technology offers a new power-efficiency point, providing lower power and higher speed compared to traditional PCB connections [2][3].

Group 2: Implementation Challenges
- Asynchronous design presents challenges, as it can lead to unpredictable performance and increased complexity, which may negate potential power savings [3][4].
- Techniques like data and clock gating can help reduce power consumption, but they require careful analysis to identify major contributors to power usage [3][4].
- The article notes that the most significant power-savings opportunities lie at the architecture level rather than the RTL (Register Transfer Level) implementation [3][4].

Group 3: AI and Performance Trade-offs
- The rise of AI computing has pushed design teams to address the memory wall, balancing execution power and data-movement power [5][6].
- Architectural features such as speculative execution, out-of-order execution, and limited parallelism are highlighted as complex changes made to improve performance [5][6].
- The article discusses the trade-offs between the complexity of features like branch prediction and their impact on area and power consumption [9][10].

Group 4: Parallelism and Programming Challenges
- Parallelism is identified as a key method for improving performance, but current processors have limited parallelism capabilities [10][11].
- The article highlights the challenges of explicit parallel programming, which can deter software developers from utilizing multi-core processors effectively [13][14].
- The potential for accelerators to offload tasks from CPUs is discussed, emphasizing the need for efficient design to improve overall system performance [15][16].

Group 5: Custom Accelerators and Future Directions
- Custom accelerators, particularly NPUs (Neural Processing Units), are gaining attention for their ability to optimize power and performance for specific AI workloads [17][18].
- The article suggests that creating application-specific NPUs can significantly enhance efficiency, with reported improvements in TOPS/W and utilization [18][19].
- The industry may face a risk of creative stagnation, necessitating new architectural concepts to overcome existing limitations [19].
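The TOPS/W framing used above can be made concrete with a back-of-the-envelope comparison: delivered efficiency is peak throughput discounted by utilization, divided by power. All numbers below are illustrative placeholders, not figures from the article:

```python
def effective_tops_per_watt(peak_tops, power_watts, utilization):
    """Effective efficiency = delivered TOPS per watt, where delivered
    TOPS is peak throughput discounted by achieved utilization."""
    return peak_tops * utilization / power_watts

# Hypothetical general-purpose accelerator vs. a workload-specific NPU:
# the NPU has far lower peak throughput but wins on delivered TOPS/W
# because it runs at higher utilization and lower power.
gpu = effective_tops_per_watt(peak_tops=300, power_watts=400, utilization=0.35)
npu = effective_tops_per_watt(peak_tops=100, power_watts=50, utilization=0.80)
print(gpu, npu, npu / gpu)
```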
OpenAI Loosens Its Ties to NVIDIA as Google's TPU Wins Its Favor
36Kr · 2025-07-02 23:10
Group 1
- Nvidia has regained its position as the world's most valuable company, surpassing Microsoft, but faces new challenges from OpenAI's shift towards Google's TPU chips for AI product support [1][3].
- OpenAI's transition from Nvidia's GPUs to Google's TPUs indicates a strategic move to diversify its supply chain and reduce dependency on Nvidia, which has been the primary supplier for its large-model training and inference [3][5].
- The high cost of Nvidia's flagship B200 chip, priced at $500,000 for a server equipped with eight units, has prompted OpenAI to seek more affordable alternatives like Google's TPU, which is estimated to be in the thousands-of-dollars range [5][6].

Group 2
- Google's TPU chips are designed specifically for AI tasks, offering a cost-effective solution compared to Nvidia's GPUs, which were originally developed for graphics rendering [8][10].
- The TPU's architecture allows for efficient processing of matrix operations, making it particularly suitable for AI applications, while Nvidia's GPUs, despite their versatility, may not be as optimized for specific AI tasks [10][11].
- The demand for inference power in the AI industry has surpassed that for training power, leading to a shift in focus among AI companies, including OpenAI, towards leveraging existing models for various applications [15].
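The point that TPUs are built around matrix operations reflects the fact that AI workload cost is dominated by dense matrix multiplies, the primitive a TPU's systolic array executes in hardware. A tiny pure-Python sketch of that primitive and its operation count (pedagogical, not Google's implementation):

```python
def matmul(a, b):
    """Naive dense matrix multiply: the core primitive that a TPU's
    systolic array accelerates in hardware."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def mac_count(m, k, n):
    # An (m x k) @ (k x n) product needs m*k*n multiply-accumulate ops,
    # which is what makes a dedicated MAC array pay off.
    return m * k * n

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(mac_count(4096, 4096, 4096))  # one large transformer-layer matmul
```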
Quantum Computing Power Crosses a Critical Threshold
2025-06-19 09:46
Summary of Quantum Computing and Communication Conference Call

Industry Overview
- The conference focused on the **quantum computing** and **quantum communication** industries, highlighting their current status, challenges, and future potential [1][2][16].

Key Points and Arguments

Quantum Computing
- **Quantum Computing Basics**: Quantum computing utilizes quantum bits (qubits) that can exist in multiple states simultaneously, allowing for exponential speedup in specific algorithms compared to classical computing [5][14].
- **Current Technologies**: The main technologies in quantum computing include:
  - **Superconducting**: Used by companies like Google and IBM, known for high gate fidelity and long coherence times [6].
  - **Trapped Ions**: Represented by companies like IonQ, offering higher fidelity but facing scalability challenges [6].
  - **Neutral-Atom Optical Tweezers**: Lower environmental requirements but longer operation times [6].
- **Industry Stage**: The quantum computing industry is still in its early stages, primarily serving the education and research markets, with potential applications in materials, chemicals, biomedicine, and finance [1][21].

Quantum Communication
- **Key Technologies**: Quantum communication includes:
  - **Quantum Key Distribution (QKD)**: Ensures secure key distribution using quantum properties, making interception detectable [9][33].
  - **Quantum Teleportation**: Transfers quantum states using entangled particles, with significant implications for future information transmission [10].
- **Advantages**: Quantum communication offers enhanced security due to its fundamental properties, although it still relies on classical channels for information transmission [15].

Challenges and Development
- **Key Issues**: The development of quantum computing faces challenges such as:
  - Environmental noise affecting qubits [17].
  - The need for quantum error correction to achieve fault-tolerant quantum computing [4][53].
  - Weak upstream supply chains, particularly for dilution refrigerators [17][18].
- **Measurement Systems**: Current measurement systems require optimization for low-temperature environments, and specialized equipment is needed for effective quantum control [19].

Market and Future Outlook
- **Market Applications**: The primary market for quantum technologies is currently in education and research, but significant potential exists in materials science, biomedicine, and finance due to their complex computational needs [21][28].
- **Future Projections**: By 2025-2030, specialized quantum computers for optimization problems are expected to emerge, with general-purpose quantum computers gradually becoming more prevalent [23].
- **Technological Maturity**: Technologies like quantum key distribution and quantum random number generators are nearing practical application, particularly in high-security sectors [24].

Notable Companies and Developments
- **Leading Companies**: Key players in the quantum computing space include IBM, Google, and IonQ, with significant advancements in superconducting and trapped-ion technologies [30][32].
- **Investment Trends**: The potential for breakthroughs in quantum technology could lead to significant shifts in funding towards successful companies, particularly if major milestones are achieved [46].

Additional Important Content
- **Quantum Measurement**: Quantum measurement technologies are advancing rapidly, with applications in military and research fields [27].
- **Economic Challenges**: Each technology route faces unique economic challenges, and the lack of a decisive breakthrough currently prevents a clear funding shift [46].
- **Security and Commercial Value**: Enhancing security through quantum technologies can create commercial value, particularly in sectors requiring high security [47].
This summary encapsulates the key insights from the conference call, providing a comprehensive overview of the quantum computing and communication landscape, its challenges, and future opportunities.
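The exponential-state claim above (qubits existing in multiple states simultaneously) is what makes classical simulation intractable: n qubits require tracking 2^n amplitudes. A toy state-vector illustration (a pedagogical sketch, unrelated to any vendor's stack):

```python
import math

def hadamard_on_zero():
    """Apply a Hadamard gate to |0>, producing an equal superposition:
    H|0> = (1/sqrt(2)) |0> + (1/sqrt(2)) |1>."""
    inv_sqrt2 = 1 / math.sqrt(2)
    return (inv_sqrt2, inv_sqrt2)  # amplitudes of |0> and |1>

def state_size(n_qubits):
    # A classical simulator must track 2**n complex amplitudes.
    return 2 ** n_qubits

amp0, amp1 = hadamard_on_zero()
print(round(amp0**2 + amp1**2, 10))  # probabilities sum to 1
print(state_size(50))  # 2^50 amplitudes: why brute-force simulation hits a wall
```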
Alibaba Tongyi Releases a New Parallel-Computation Strategy: a 1.6B Model Performs Like 4.4B While Memory Use Plunges 95%
量子位· 2025-05-28 04:22
By 闻乐, 量子位 (QbitAI). A third scaling law for LLMs has been proposed: one that improves model capability without significantly increasing memory or time costs. A 1.6B model can approach the performance of a 4.4B model while using only 1/22 of the memory and adding only 1/6 of the latency.

This leads to the hypothesis that the scale of parallel computation (e.g., the number of parallel paths) may be a key driver of model capability, rather than relying solely on parameter scale or serial inference-time scaling (such as generating more tokens). The method can also be applied directly to existing models (such as Qwen-2.5) without training from scratch.

This is PARSCALE, proposed by Alibaba's Tongyi team. Current LLM optimization follows two main routes: parameter scaling (e.g., GPT-4) and inference-time scaling (e.g., DeepSeek-R1), both of which raise memory and time costs. The Tongyi team's new paradigm is inspired by the dual-path inference mechanism of CFG (classifier-free guidance): they extend CFG's parallelism from an inference-time generation trick into "computation scaling" across the full training and inference pipeline. Let's dig into the technical details.

Extending CFG's parallel idea to computation scaling. PARSCALE's inspiration from CFG's dual paths: CFG simultaneously runs a conditional generation path (with the input prompt) and an unconditional path (without it), then fuses the results by weighted averaging to improve ...
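A minimal sketch of the parallel-scaling idea as described: run P transformed views of the input through the same model in parallel, then fuse the outputs by a weighted average. The function names and the toy "model" are illustrative, not the PARSCALE code:

```python
def parscale_forward(model, x, transforms, weights):
    """Run the same model over P transformed views of x (conceptually
    in parallel), then fuse the P outputs by a weighted average."""
    outputs = [model(t(x)) for t in transforms]  # P parallel paths
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total

# Toy example: a "model" that squares its input; two paths (identity
# and a +1 shift), fused with equal weights.
toy_model = lambda v: v * v
paths = [lambda v: v, lambda v: v + 1]
print(parscale_forward(toy_model, 3.0, paths, weights=[0.5, 0.5]))  # (9 + 16) / 2 = 12.5
```

With P = 1 and the identity transform this reduces to an ordinary forward pass, which is why the scheme can wrap an existing model without retraining from scratch.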
An In-Depth Conversation with "the Author Jensen Huang Trusts Most": The Legend Behind NVIDIA, and AI's Next Step
聪明投资者· 2025-04-02 03:23
Stephen Witt (斯蒂芬·威特), author of 《黄仁勋:英伟达之芯》, recently joined the "诺亚 N+全球读书会" book club for an in-depth talk at ARK Wealth Singapore. The conversation centered on "AI hegemon" NVIDIA and its legendary CEO Jensen Huang, revealing lesser-known stories behind the tech giant.

A well-known journalist, bestselling author, and observer of technology culture, Witt interviewed Huang, his family and friends, partners, and NVIDIA employees with top-level access, gaining deep insight into the character and leadership style of "Silicon Valley's most charismatic CEO". NVIDIA's PR team even said they had never seen Huang respond to a journalist so positively and enthusiastically.

Drawing on six years of engagement, seven face-to-face conversations with Huang, and interviews with more than 200 related people, Witt paints a vivid, three-dimensional portrait of Huang: a technical genius, and above all a leader of exceptional vision and execution. Huang's near-obsessive love of computing has driven him to keep pushing boundaries, from 3D graphics to artificial intelligence to today's bet on the Omniverse; he has always stood at the forefront of technology.

Witt also shared many stories about how Huang attracts top talent. He motivates employees not only with wealth but with a technological dream, a leadership style that has kept many engineers willingly following him for decades, even when the company's stock ...
Is Another Chip Architecture Headed for Extinction?
半导体行业观察· 2025-04-02 01:04
Core Viewpoint
- The article discusses the ambitious vision behind the development of the Cell processor by Sony, IBM, and Toshiba, highlighting its potential to revolutionize computing architecture and its eventual shortcomings in the market [1][3][21].

Group 1: Development of the Cell Processor
- In 2000, Sony, IBM, and Toshiba announced a collaboration to develop the Cell processor, aiming for a computing architecture that could achieve unprecedented performance levels, targeting 1 trillion floating-point operations per second [3][4].
- IBM committed to investing $400 million to establish design centers and manufacturing facilities for the Cell processor, while Sony and Toshiba contributed their respective technologies [4].
- The Cell processor was designed to integrate multiple computing units on a single chip, with the goal of creating a highly parallel computing environment [4][5].

Group 2: Technical Specifications
- The Cell processor features a 64-bit PowerPC core (PPE) and up to 32 synergistic processing elements (SPEs), achieving peak performance of 1 TFLOPS in its initial prototype [11][12].
- The architecture includes a unique memory structure where SPEs cannot directly access system memory, requiring explicit data management, which increases programming complexity but enhances efficiency [9][12].
- The interconnect bus (EIB) allows for high-bandwidth communication between processing units, crucial for maximizing the processor's performance [9].

Group 3: Market Performance and Challenges
- Despite its theoretical performance, the Cell processor faced significant challenges in mass production due to high power consumption and complex architecture, leading to a reduced number of SPEs in the final version [11][12].
- The PlayStation 3, which utilized the Cell processor, struggled in the market due to its high manufacturing costs and the difficulty developers faced in optimizing games for its architecture [13][14].
- Competing products, such as Microsoft's Xbox 360, offered simpler architectures that were easier for developers to work with, further hindering the PS3's market performance [13][14].

Group 4: Legacy and Conclusion
- Although the Cell processor did not achieve mainstream success in gaming, it found applications in high-performance computing, notably in the Roadrunner supercomputer, which was the first to exceed 1 PetaFLOPS [16][18].
- The innovative design of the Cell processor influenced future computing architectures, particularly in parallel processing and GPU computing [21].
- By 2012, IBM officially discontinued support for the Cell architecture, marking the end of an era for a processor that had once held great promise [19].
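The "explicit data management" point above is what made Cell hard to program: each SPE had only a small software-managed local store, so the programmer had to move tiles of data in and out by hand, typically double-buffering so the next transfer overlaps compute on the current tile. A schematic of that pattern in Python (structure only; real SPE code used C with DMA intrinsics such as `mfc_get`/`mfc_put`):

```python
def spe_process(stream, tile_size, compute):
    """Double-buffered tile processing: fetch the next tile (standing in
    for a DMA transfer) while computing on the current one. Simulated
    sequentially here; on an SPE the two would overlap."""
    results = []
    next_tile = stream[0:tile_size]               # prefetch tile 0
    offset = tile_size
    while next_tile:
        # Swap buffers: current tile is computed on, next tile is "in flight".
        current, next_tile = next_tile, stream[offset:offset + tile_size]
        offset += tile_size
        results.extend(compute(x) for x in current)
    return results

print(spe_process(list(range(10)), tile_size=4, compute=lambda x: x * 2))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```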
In Depth | NVIDIA's Jensen Huang: The GPU Is a Time Machine That Lets People See the Future; Over the Next Decade, AI Will Surpass Humans in Some Fields While Also Empowering Them
Z Potentials· 2025-03-01 03:53
Core Insights
- NVIDIA has rapidly evolved into one of the world's most valuable companies due to its pioneering role in transforming computing through innovative chip and software designs, particularly in the AI era [2][3].

Group 1: Historical Context
- The inception of NVIDIA was driven by the observation that a small portion of code in software could handle the majority of processing through parallel execution, leading to the development of the first modern GPU [3][4].
- The choice to focus on video games was strategic, as the gaming market was identified as a potential driver for technological advancements and a significant entertainment market [5][6].

Group 2: Technological Innovations
- The introduction of CUDA allowed programmers to utilize familiar programming languages to harness GPU power, significantly broadening the accessibility of parallel processing capabilities [7][9].
- The success of AlexNet in 2012 marked a pivotal moment in AI, demonstrating the potential of GPUs in training deep learning models, which initiated a profound transformation in the AI landscape [11][12].

Group 3: Current Developments
- Major breakthroughs in computer vision, speech recognition, and language understanding have been achieved in recent years, showcasing the rapid advancements in AI capabilities [14][15].
- NVIDIA is focusing on the application of AI in various fields, including digital biology, climate science, and robotics, indicating a shift towards practical applications of AI technology [21][38].

Group 4: Future Vision
- The future of automation is anticipated to encompass all moving entities, with robots and autonomous systems becoming commonplace in daily life [26][27].
- NVIDIA's ongoing projects, such as Omniverse and Cosmos, aim to create advanced generative systems that will significantly impact robotics and physical systems [37][38].

Group 5: Energy Efficiency and Limitations
- The company emphasizes the importance of energy efficiency in computing, having achieved a remarkable 10,000-fold increase in energy efficiency for AI computations since 2016 [32][33].
- Current physical limitations in computing are acknowledged, with a focus on improving energy efficiency to enhance computational capabilities [31][32].
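The 10,000-fold efficiency figure implies a steep compounding rate; a quick check of the arithmetic (the 2016 baseline comes from the interview, while the nine-year window to 2025 and the per-year rate are derived here, not quoted):

```python
def annual_improvement(total_factor, years):
    """Geometric-mean yearly gain implied by a cumulative improvement
    factor over a given number of years."""
    return total_factor ** (1 / years)

rate = annual_improvement(10_000, years=9)  # 2016 -> 2025, assumed window
print(rate)  # roughly 2.8x energy-efficiency gain per year
```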