GPU and CPU Vendors Issue Warnings
半导体行业观察 (Semiconductor Industry Observation) · 2025-07-14 01:16
Core Viewpoint - NVIDIA has urged customers to enable Error-Correcting Code (ECC) to defend against GPUHammer, a new variant of the RowHammer attack that targets its GPUs and can manipulate data in GPU memory [3][4][5].

Group 1: GPUHammer Attack Details
- GPUHammer is the first RowHammer exploit specifically targeting NVIDIA GPUs. It lets a malicious user flip bits in GPU memory and thereby alter other users' data [3].
- The most alarming consequence is a drastic drop in AI model accuracy, from 80% to below 1% [4].
- CPUs have benefited from side-channel defense research, whereas GPUs lack parity checks and instruction-level access control, which leaves them more vulnerable to low-level fault injection attacks [5].

Group 2: Impact on AI Models
- In a proof of concept, single-bit flips corrupted an ImageNet deep neural network model and reduced its accuracy from 80% to 0.1% [5].
- GPUHammer belongs to a broader set of threats to AI infrastructure that ranges from GPU-level faults to data poisoning and model pipeline intrusions [5][6].

Group 3: Shared GPU Environment Risks
- In shared GPU environments, such as cloud machine learning platforms, a malicious tenant can launch GPUHammer attacks against adjacent workloads, degrading inference accuracy and corrupting cached model parameters without direct access to them [7].
- This introduces cross-tenant risks that current GPU security considerations often overlook [7].

Group 4: Recommendations and Mitigations
- Enabling ECC is the recommended mitigation, although it may reduce the performance of A6000 GPUs by up to 10% and decrease memory capacity by 6.25% [9][10].
- Monitoring GPU error logs for ECC-related corrections can help identify ongoing bit-flip attempts [9].
- Newer NVIDIA GPUs, such as the H100 or RTX 5090, are not affected because they have on-chip ECC [9].
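The ECC trade-off and the monitoring advice in Group 4 can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's tooling: the function names are hypothetical, the 48 GiB capacity used for the A6000 is an assumption not stated in the article, and the optional counter reader assumes the `pynvml` bindings and an ECC-capable GPU are available.

```python
def ecc_overhead(total_gib, baseline_throughput,
                 capacity_loss=0.0625, perf_loss=0.10):
    """Return (usable GiB, expected throughput) with ECC enabled.

    The default fractions are the figures quoted in the article for the
    A6000; actual overhead depends on the workload.
    """
    usable = total_gib * (1.0 - capacity_loss)
    throughput = baseline_throughput * (1.0 - perf_loss)
    return usable, throughput


def corrected_error_deltas(readings):
    """Per-interval increases of a cumulative corrected-ECC counter.

    If the counter goes down (for example after a driver reload),
    the new value is counted from zero.
    """
    deltas = []
    for prev, cur in zip(readings, readings[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


def suspicious_intervals(readings, threshold=1):
    """Indices of intervals whose increase reaches the threshold."""
    deltas = corrected_error_deltas(readings)
    return [i for i, d in enumerate(deltas) if d >= threshold]


def read_corrected_errors(gpu_index=0):
    """Read the volatile corrected-ECC count via NVML.

    Assumption: pynvml is installed and ECC is enabled on the device.
    """
    import pynvml  # imported lazily so the helpers above work without it

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        return pynvml.nvmlDeviceGetTotalEccErrors(
            handle,
            pynvml.NVML_MEMORY_ERROR_TYPE_CORRECTED,
            pynvml.NVML_VOLATILE_ECC,
        )
    finally:
        pynvml.nvmlShutdown()


# Assumed 48 GiB card, hypothetical baseline of 1000 inferences per second.
usable_gib, throughput = ecc_overhead(48, 1000)
print(usable_gib, throughput)
```

A steady stream of corrected errors on otherwise healthy hardware is the kind of signal the article suggests watching for. Polling `read_corrected_errors` on a schedule and passing the history to `suspicious_intervals` is one way to surface it, and the threshold is a tuning choice rather than a recommended value.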
Group 5: Broader Implications
- The implications of GPUHammer extend to edge AI deployments, autonomous systems, and fraud detection engines, where silent corruption may be difficult to detect or reverse [9].
- Organizations deploying GPU-intensive AI must incorporate GPU memory integrity into their security and audit frameworks to comply with regulatory standards [10].

Group 6: AMD Vulnerabilities
- AMD has warned of a new side-channel attack, the Transient Scheduler Attack (TSA), which affects multiple chip models and could lead to information leakage [11][12].
- The vulnerabilities are rated medium to low severity, and their complexity means that only attackers with local access can exploit them [11][13].
- Although the attacks are difficult to execute, AMD suggests updating to the latest Windows versions to mitigate these vulnerabilities [19].
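To make the single-bit-flip mechanism behind GPUHammer (Group 2) concrete, the sketch below flips individual bits of one floating-point value in software. It only illustrates why a single flipped bit can wreck a model weight and does not perform a RowHammer attack. The article does not state which numeric format the proof of concept targeted, so float32 and the helper name `flip_bit` are assumptions made for illustration.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a float32 and return it."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as a 32-bit unsigned integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit and reinterpret the result as a float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5
print(flip_bit(weight, 0))   # lowest mantissa bit: a negligible change
print(flip_bit(weight, 30))  # highest exponent bit: 0.5 becomes 2**127
print(flip_bit(weight, 31))  # sign bit: 0.5 becomes -0.5
```

A flip in the low mantissa bits is lost in the noise, while a flip in the top exponent bit turns an ordinary weight into a value near the limit of the format. That value can then dominate every computation it feeds into, which is consistent with the accuracy collapse the article reports.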