NVIDIA GPU
HAMi × NVIDIA: A Detailed Look at the Implementation of GPU Topology-Aware Scheduling
AI前线· 2025-10-25 05:32
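Reposted from | Dynamia 密瓜智能

As an active open-source project, HAMi is maintained by 350+ contributors from 15+ countries and has been adopted in production by 200+ enterprises and institutions, with good scalability and support guarantees.

In v2.7.0, the HAMi community officially introduced topology-aware scheduling for NVIDIA GPUs. The feature primarily addresses multi-GPU communication bottlenecks in high-performance computing (HPC) and large AI model training. Through intelligent scheduling, it places compute tasks precisely on the GPU combination with the tightest physical connections and the fastest communication, which maximizes task acceleration and improves the cluster's overall compute efficiency.

Building on a feature introduction, this article digs into the code and analyzes in detail the design and implementation principles behind HAMi's support for NVIDIA GPU topology-aware scheduling.

The core design idea of HAMi's topology-aware scheduling for NVIDIA GPUs is this: first, locally on the node, the complex physical topology is precisely quantified into "communication scores" between devices. Then, when making a decision, the scheduler uses these scores to make the final, optimal choice.

Dynamic topology score computation: the Device Plugin can use NVML to dynamically probe the physical connection topology between GPUs on a node (such as NVLink and PCIe) and quantify it into "communication scores" between devices, providing ...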
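The two-step idea in this excerpt (quantify the topology into pairwise scores, then choose the best-connected GPU combination) can be illustrated with a minimal sketch. This is not HAMi's code: the link-type names, the score values, and the functions `build_score_matrix` and `best_gpu_group` are all hypothetical, and the link map is hard-coded here rather than probed through NVML.

```python
from itertools import combinations

# Hypothetical link-type scores: higher means faster GPU-to-GPU communication.
# These values are illustrative assumptions, not HAMi's actual constants.
LINK_SCORE = {"NVLINK": 100, "PCIE_SWITCH": 30, "PCIE_HOST": 10}


def build_score_matrix(num_gpus, links):
    """Turn a {(i, j): link_type} map into a symmetric pairwise score matrix."""
    scores = [[0] * num_gpus for _ in range(num_gpus)]
    for (i, j), link_type in links.items():
        scores[i][j] = scores[j][i] = LINK_SCORE[link_type]
    return scores


def best_gpu_group(scores, free_gpus, count):
    """Return the `count` free GPUs whose summed pairwise score is highest."""
    def group_score(group):
        return sum(scores[a][b] for a, b in combinations(group, 2))
    # Exhaustive search over all combinations; fine for the handful of GPUs on one node.
    return max(combinations(sorted(free_gpus), count), key=group_score)


# Assumed example node: GPUs 0-1 and 2-3 are NVLink pairs; other pairs cross the host bridge.
links = {(0, 1): "NVLINK", (2, 3): "NVLINK",
         (0, 2): "PCIE_HOST", (0, 3): "PCIE_HOST",
         (1, 2): "PCIE_HOST", (1, 3): "PCIE_HOST"}
scores = build_score_matrix(4, links)
print(best_gpu_group(scores, free_gpus={1, 2, 3}, count=2))  # (2, 3)
```

With GPU 0 already taken, a two-GPU request lands on the remaining NVLink pair (2, 3) rather than splitting across the slower host path. Summing pairwise scores is only one possible objective; a real scheduler may weigh other factors such as fragmentation.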
Oracle Launches OCI Zettascale10 Cluster, Its Next Generation for AI
Core Insights
- Oracle announced the launch of its large-scale cloud AI supercomputer, Oracle Cloud Infrastructure (OCI) Zettascale10, during the global AI conference [1]
- OCI Zettascale10 connects tens of thousands of NVIDIA GPUs across multiple data centers, forming a multi-gigawatt cluster [1]
- The peak performance of OCI Zettascale10 can reach 16 zettaFLOPS [1]
NVIDIA Publishes Statement: NVIDIA Chips Have No Backdoors, Kill Switches, or Monitoring Software
Jing Ji Guan Cha Bao· 2025-08-06 05:52
Group 1
- NVIDIA asserts that its chips do not contain backdoors, kill switches, or monitoring software, emphasizing the importance of security in modern computing [2]
- The company highlights the widespread applications of its GPUs in various sectors, including healthcare, finance, scientific research, autonomous driving, and AI infrastructure [2]
- NVIDIA argues that embedding backdoors or kill switches could compromise global digital infrastructure and erode trust in leading technologies, and that existing laws require companies to fix vulnerabilities rather than create them [2]

Group 2
- NVIDIA's computing chips were recently reported to have serious security issues, and U.S. lawmakers have called for advanced chips to include tracking and remote shutdown capabilities [3]
- The Cyberspace Administration of China has asked NVIDIA to explain the security risks associated with its H20 computing chips sold to China, in accordance with local cybersecurity laws [3]
NVIDIA Responds to Chip "Backdoor" Concerns in an Early-Morning Post
Core Viewpoint
- NVIDIA asserts that its chips do not contain backdoors, kill switches, or monitoring software, emphasizing that these features are not part of building a trustworthy system [1][2]

Group 1: Security Concerns
- Some experts and policymakers have suggested implementing "kill switches" in hardware to mitigate misuse risks, which NVIDIA argues would be a permanent flaw beyond user control [1][2]
- Hardware kill switches have been compared to smartphone features like "Find My Phone", but NVIDIA contends that the analogy is flawed because such software functions are user-controlled [2]

Group 2: Regulatory Actions
- On July 31, the Cyberspace Administration of China summoned NVIDIA for a meeting regarding security risks associated with its H20 chips sold in China, following reports of serious security issues [2]
- NVIDIA responded by stating that its chips do not have backdoors and do not allow remote access or control [2]

Group 3: Product Developments
- NVIDIA's CEO Jensen Huang announced the submission of a request to resume sales of the H20 GPU, with the U.S. government assuring the company that a license will be granted [3]
- The company also introduced a new fully compatible NVIDIA RTX PRO GPU, aimed at digital twin AI for smart factories and logistics [3]

Group 4: AI as a Fundamental Resource
- Huang emphasized that AI has reached a pivotal point, becoming a fundamental resource akin to energy, water, and the internet, and highlighted NVIDIA's commitment to supporting open-source research and application development globally [3]
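浪人早报 | NVIDIA States Its Chips Have No Backdoors; Chery Chairman Reflects and Apologizes over Overtime; Li Auto i8 Unified into a Single Version Less Than a Week After Launch…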
Xin Lang Ke Ji· 2025-08-06 00:45
Group 1
- NVIDIA denies the existence of backdoors in its GPUs, stating that there should be no remote disabling features without user consent [2]
- Chery's chairman apologizes for past inefficiencies and announces a 30% reduction in meetings and attendees to improve operational efficiency [2]

Group 2
- Li Auto announces that the newly launched Li i8 model will be unified into a single version, the i8 Max, with a price adjustment from 349,800 yuan to 339,800 yuan, and offers additional benefits [3]
- Unitree (Yushutech) unveils a new quadruped robot, the Unitree A2, weighing approximately 37 kg with a range of about 20 km [3][4]

Group 3
- Hema X member stores will close all locations by August 31, marking the end of this membership-based retail format in China [5]
- AMD reports second-quarter revenue of $7.69 billion, a 32% year-over-year increase, but a 31% decline in adjusted net profit [6]

Group 4
- Gree Electric's chairman emphasizes the importance of product quality over low pricing, stating that misleading consumers can damage brand reputation [7]
- Taobao is set to launch a new membership system that integrates various Alibaba services, enhancing user engagement and loyalty [8]

Group 5
- JD.com plans to open its first large discount supermarket in August, with over 5,000 SKUs and competitive pricing [8]
- Geely responds to reports of a major integration of its autonomous driving teams, indicating ongoing discussions about the restructuring [9]

Group 6
- State Grid reports a record daily electricity load of 1.222 billion kilowatts, driven by high temperatures, and highlights its efforts to enhance cross-regional power transmission capabilities [11]
NVIDIA's New Open-Source Model: 3x Throughput, Runs on a Single GPU, and Takes Reasoning SOTA
量子位· 2025-07-29 05:05
Core Viewpoint
- NVIDIA has launched the Llama Nemotron Super v1.5, an open-source model designed for complex reasoning and agent tasks, achieving state-of-the-art performance while tripling throughput compared to its predecessor, and efficiently running on a single GPU [2][11].

Model Introduction
- Llama Nemotron Super v1.5 is an upgraded version of Llama-3.3-Nemotron-Super-49B-V1, specifically tailored for complex reasoning and intelligent agent tasks [3].

Model Architecture
- The model employs Neural Architecture Search (NAS) to balance accuracy and efficiency, effectively converting throughput improvements into lower operational costs [4].
- NAS generates non-standard, non-repetitive network modules, introducing two key changes compared to traditional Transformers:
  - Skip attention mechanism, which bypasses the attention layer in certain modules [6].
  - Variable Feedforward Network (FFN), where different modules utilize varying expansion/compression ratios [7].

Efficiency Improvements
- The model reduces FLOPs by skipping attention or altering FFN widths, allowing for more efficient operation under resource constraints [8].
- A block-wise distillation process was applied to the original Llama model, constructing multiple variants for each module and searching for optimal combinations [9].

Training and Dataset
- The model was trained on 40 billion tokens from three datasets: FineWeb, Buzz-V1.2, and Dolma, focusing on English single-turn and multi-turn conversations [10].
- Post-training involved a combination of supervised fine-tuning and reinforcement learning to enhance performance in key tasks such as coding, mathematics, reasoning, and instruction following [10].

Deployment and Ecosystem
- NVIDIA's AI models are optimized for running on NVIDIA GPU-accelerated systems, achieving significant speed improvements over CPU-only solutions [12].
- Llama Nemotron Super v1.5 is now open-source, available for developers on build.nvidia.com or via Hugging Face [13].

Ecosystem and Model Series
- The Llama Nemotron ecosystem integrates large language models, training and inference frameworks, optimization tools, and enterprise deployment solutions for high-performance AI application development [14].
- NVIDIA has introduced three series of large language models: Nano, Super, and Ultra, catering to different deployment needs and user profiles [16].
- The Super series, including Llama Nemotron Super v1.5, balances precision and computational efficiency for single GPU use [17].

Enterprise Support
- The Nemotron model has gained support from major enterprises like SAP, Microsoft, and Deloitte for building AI agent platforms aimed at enterprise-level process automation and complex problem-solving [17].
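To make the FLOPs claim in this summary concrete, here is a rough back-of-the-envelope sketch of how skipping attention in some blocks and shrinking FFN widths in others lowers per-token compute. It is not NVIDIA's method or the model's real configuration: the formulas are common approximations (they ignore softmax, normalization, gated FFN variants, and grouped-query attention), and the dimensions and block layout below are made-up assumptions.

```python
def attention_flops(d_model, seq_len):
    """Approximate forward FLOPs per token for one self-attention layer."""
    projections = 4 * 2 * d_model * d_model      # Q, K, V, and output projections
    scores_and_mix = 2 * 2 * seq_len * d_model   # QK^T and attention-weighted sum of V
    return projections + scores_and_mix


def ffn_flops(d_model, expansion):
    """Approximate forward FLOPs per token for a two-matrix FFN."""
    d_ff = int(d_model * expansion)
    return 2 * 2 * d_model * d_ff                # up-projection and down-projection


def model_flops(blocks, d_model, seq_len):
    """Sum per-token FLOPs over blocks given as (use_attention, ffn_expansion) pairs."""
    total = 0
    for use_attention, expansion in blocks:
        if use_attention:
            total += attention_flops(d_model, seq_len)
        total += ffn_flops(d_model, expansion)
    return total


# Hypothetical sizes, chosen only for illustration.
d_model, seq_len = 1024, 2048
uniform = [(True, 4.0)] * 4                                        # standard repeated blocks
searched = [(True, 4.0), (False, 2.0), (True, 1.0), (False, 4.0)]  # assumed NAS-style layout
base = model_flops(uniform, d_model, seq_len)
reduced = model_flops(searched, d_model, seq_len)
print(f"relative FLOPs: {reduced / base:.2f}")
```

In this toy layout, two of the four blocks drop attention and two use narrower FFNs, so the searched stack needs noticeably fewer FLOPs per token than the uniform one. How much accuracy survives such changes is what the search and distillation steps described above are meant to resolve.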
TSMC Rides a Heat Wave, Yet Dangers Lurk on All Sides
36 Ke · 2025-05-14 10:41
Group 1
- TSMC reported a record Q1 2025 revenue of $25.53 billion, a 41.6% year-over-year increase, and an operating profit of $12.38 billion, up 56.1% year-over-year [1]
- TSMC's market share has been steadily increasing since Q1 2019 and is projected to reach 68% by 2025, while Samsung's share is expected to decline from 19% to 8% in the same period [2]
- TSMC's wafer shipments in Q1 2025 were 3.26 million, which is 82% of the peak shipment of 3.97 million wafers [9][14]

Group 2
- TSMC's 8-inch and 12-inch fab utilization rates are projected to be 69% and 86% respectively in Q1 2025, indicating underutilization compared to historical levels [10][12][14]
- Declining demand for 7nm technology has led to a significant drop in sales, with expectations that TSMC may convert 7nm capacity to 5nm or 3nm nodes [21][23]
- The US share of TSMC's sales reached a record 77% in Q1 2025, driven by increased demand for AI semiconductors, particularly NVIDIA GPUs [25][27]

Group 3
- The share of smartphone sales in TSMC's revenue had decreased to 28% as of Q1 2025, while high-performance computing (HPC) sales had risen to 59% [29][31]
- TSMC's automotive semiconductor sales remain low, which may impact the future prospects of its Kumamoto factory [32]
Morgan Stanley: US Equity Strategy - Range Trading Continues Until the Picture Becomes Clear
摩根· 2025-04-27 03:56
Investment Rating
- The report maintains a trading range for the S&P 500 between 5000-5500, indicating a cautious outlook on the market [4][6][11]

Core Insights
- The dispersion of earnings per share (EPS) revisions is increasing, suggesting that the upcoming earnings season may act as a rotational catalyst rather than affecting the index level [4][25]
- The report emphasizes the importance of identifying high-quality stocks in industries that are less risky, such as Transports, Materials, Pharma/Biotech, and Tech Hardware [4][25]
- There is a notable uncertainty surrounding tariff impacts, with many companies withdrawing guidance or adopting conservative approaches due to macroeconomic uncertainties [24][41]

Summary by Sections

Market Overview
- The S&P 500 is expected to remain within the 5000-5500 range due to competing factors affecting both upside and downside risks [4][6]
- Upside risks include a more dovish Federal Reserve, a broader trade deal with China, and improved earnings revisions, while downside risks involve declining business confidence and rising back-end rates [4][11]

Earnings Revisions
- Earnings revisions breadth is currently at -24%, the lowest since the 2022 growth scare, indicating a potential for further cuts to 2025/2026 EPS [41][42]
- Cyclical and tariff-sensitive industries, such as Autos, Transports, and Tech Hardware, are leading the downward revisions [41][46]

Sector Analysis
- In Hardlines/Broadlines/Food Retail, companies are reducing exposure to China and absorbing about 50% of tariff costs, with no significant consumer slowdown observed [34]
- The Tech Hardware sector is facing challenges due to tariffs, with enterprise spending remaining robust while small and medium businesses are delaying projects [37]
- The Media & Telecom sector is experiencing weaker advertising spending, with companies adjusting guidance downward due to macro uncertainties [37]

Investment Opportunities
- The report suggests focusing on high-quality stocks in less risky industries, utilizing tools such as industry frameworks and quality stock screens to identify potential investments [4][25]
- Specific companies identified as quality cyclicals include Halliburton, Schlumberger, and Ecolab, among others [47]
Jensen Huang Buys a Team
投资界· 2024-12-07 07:14
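The next golden track.

Author | Yue Xiaoxiao
Reported by | 投资界 PEdaily

Jensen Huang has made another move.

This week, Huang appeared in Vietnam and announced that NVIDIA has acquired VinBrain, a Vietnamese healthcare startup.

The deal looks surprising, yet the signs were there. Since the start of this year, Huang has voiced strong enthusiasm for AI in healthcare on multiple public occasions. This time he bought VinBrain, a star Vietnamese company building large AI models for healthcare, adding another footnote to his bet on AI healthcare. The home page of VinBrain's website now carries the notice "VinBrain is now a part of NVIDIA".

Buying an AI healthcare team

What is VinBrain's background, and why did it catch Huang's eye?

According to public information, VinBrain's founder is Steven Truong. He worked in the artificial intelligence and software industry for 26 years, at leading technology companies such as Honeywell and IntelliCommunities, and was a key initiator of Microsoft's AI product platform.

In 2019, Steven Truong returned to Vietnam. While caring for his elderly mother, who had suffered a stroke, he experienced firsthand the inconvenience that hospital overcrowding causes patients and their families. Drawing on this experience, he had the idea of starting a business that combines AI with healthcare ...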