RTX 4090
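Training 100 million Gaussians on a single GPU to reconstruct a 25 km² city: a CPU "add-on" breaks the 3DGS memory wall
具身智能之心· 2025-12-24 00:25
Want to reconstruct a city with 3D Gaussian Splatting (3DGS)? In the past, that usually meant an expensive GPU cluster. Researchers now offer another answer: a single RTX 4090 plus enough CPU memory can handle city-scale 3D reconstruction. A research team from New York University proposed a system called CLM (CPU-offloaded Large-scale 3DGS training) at ASPLOS 2026. CLM moves the parameters that take up the most GPU memory during 3DGS training into CPU memory. This lets a single consumer graphics card train models with on the order of 100 million Gaussians, significantly lowering the hardware barrier for large-scene neural rendering. The bottleneck to scaling 3DGS: Thanks to its high-quality rendering and very fast rendering speed, 3DGS has become an important technical approach in neural rendering. However, when researchers try to apply it to complex scenes such as city blocks and large indoor spaces, a problem quickly surfaces: GPU memory becomes the most immediate and hardest bottleneck to solve. A high-fidelity 3DGS model typically contains tens of millions to hundreds of millions of Gaussians. Each Gaussian contains position, shape, color, and opa ...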
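Training 100 million Gaussians on a single GPU to reconstruct a 25 km² city: a CPU "add-on" breaks the 3DGS memory wall
36Kr · 2025-12-23 07:27
The bottleneck to scaling 3DGS. Want to reconstruct a city with 3D Gaussian Splatting (3DGS)? In the past, that usually meant an expensive GPU cluster. Researchers now offer another answer: a single RTX 4090 plus enough CPU memory can handle city-scale 3D reconstruction. A research team from New York University proposed a system called CLM (CPU-offloaded Large-scale 3DGS training) at ASPLOS 2026. CLM moves the parameters that take up the most GPU memory during 3DGS training into CPU memory. This lets a single consumer graphics card train models with on the order of 100 million Gaussians, significantly lowering the hardware barrier for large-scene neural rendering. Thanks to its high-quality rendering and very fast rendering speed, 3DGS has become an important technical approach in neural rendering. However, when researchers try to apply it to complex scenes such as city blocks and large indoor spaces, a problem quickly surfaces: GPU memory becomes the most immediate and hardest bottleneck to solve. A high-fidelity 3DGS model typically contains tens of millions to hundreds of millions of Gaussians. Each Gaussian has dozens of learnable parameters, including position, shape, color, and opacity, and training must also keep the corresponding gradients and optimizer states. The researchers note that even a 24 GB card such as the RTX 4090 can hold the complete training state for only about 10 to 20 million Gaussians, far from enough to cover a city ...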
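The 10-to-20-million figure can be sanity-checked with a back-of-envelope calculation. The sketch below is ours, not the CLM authors'. It assumes the parameterization of the original 3DGS method: 3 position, 3 scale, 4 rotation, 1 opacity and 48 spherical-harmonic color coefficients, for 59 floats in total. It also assumes fp32 storage and the Adam optimizer, so each parameter carries a gradient and two moment estimates. Rendering activations, image buffers and framework overhead are ignored, so the result is only an upper bound.

```python
"""Back-of-envelope estimate of the memory needed to train a 3DGS model."""

# Assumptions (not from the article): standard 3DGS parameterization with
# spherical-harmonic degree 3, fp32 storage, and the Adam optimizer.
PARAMS_PER_GAUSSIAN = (
    3     # position (x, y, z)
    + 3   # scale
    + 4   # rotation quaternion
    + 1   # opacity
    + 48  # color: 16 SH coefficients x 3 channels
)
BYTES_PER_FLOAT = 4
# parameter + gradient + Adam first moment + Adam second moment
COPIES_PER_PARAM = 4


def bytes_per_gaussian() -> int:
    """Bytes of training state per Gaussian, ignoring activations and buffers."""
    return PARAMS_PER_GAUSSIAN * BYTES_PER_FLOAT * COPIES_PER_PARAM


def max_gaussians(memory_bytes: int) -> int:
    """Upper bound on how many Gaussians' training state fits in memory_bytes."""
    return memory_bytes // bytes_per_gaussian()


if __name__ == "__main__":
    gib = 2**30
    print(f"{bytes_per_gaussian()} bytes per Gaussian")
    print(f"24 GiB upper bound: {max_gaussians(24 * gib) / 1e6:.1f}M Gaussians")
    print(f"100M Gaussians need {100_000_000 * bytes_per_gaussian() / gib:.1f} GiB")
```

Under these assumptions each Gaussian needs 944 bytes of training state. That caps 24 GiB at roughly 27 million Gaussians before any activations are counted, which is plausibly consistent with the article's 10 to 20 million once those are included. The same arithmetic puts 100 million Gaussians at roughly 88 GiB. That is beyond any single consumer GPU but within the range of a workstation's CPU memory.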
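Training 100 million Gaussians on a single GPU to reconstruct a 25 km² city: a CPU "add-on" breaks the 3DGS memory wall
QbitAI (量子位) · 2025-12-23 04:16
Want to reconstruct a city with 3D Gaussian Splatting (3DGS)? In the past, that usually meant an expensive GPU cluster. Researchers now offer another answer: a single RTX 4090 plus enough CPU memory can handle city-scale 3D reconstruction. A research team from New York University proposed a system called CLM (CPU-offloaded Large-scale 3DGS training) at ASPLOS 2026. CLM moves the parameters that take up the most GPU memory during 3DGS training into CPU memory. This lets a single consumer graphics card train models with on the order of 100 million Gaussians, significantly lowering the hardware barrier for large-scene neural rendering. The bottleneck to scaling 3DGS: Thanks to its high-quality rendering and very fast rendering speed, 3DGS has become an important technical approach in neural rendering. However, when researchers try to apply it to complex scenes such as city blocks and large indoor spaces, a problem quickly surfaces: GPU memory becomes the most immediate and hardest bottleneck to solve. Compiled by Fei Yang from Aofeisi | QbitAI (WeChat official account: QbitAI). A high-fidelity 3DGS model typically contains tens of millions to hundreds of millions of Gaussians. Each Gaussian has dozens of learnable parameters, including position, shape, color, and opacity, and training must also keep the corresponding gradients and optimizer states. The researchers note that even a card like the RTX 4090 ...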
The biggest gaming creator is into local AI too? Meet Parallax, which lets even laptops run large models
Synced (机器之心) · 2025-11-20 09:35
Core Viewpoint - PewDiePie, a prominent gaming influencer, has built a local AI system, sparking widespread discussion about the potential of local AI deployments compared with cloud-based solutions [1][5][6].
Group 1: Local AI System Development
- PewDiePie spent $20,000 assembling a local AI system with 10 NVIDIA GPUs: 8 modified RTX 4090s and 2 RTX 4000 Ada cards. The system can run models with 70 billion to 245 billion parameters [4].
- A local AI system gives the user complete control over the AI environment, in contrast to traditional cloud-based AI, where users rent resources without owning them [10][11].
- The key advantages of local AI are privacy, performance, and composability, which make it attractive to users concerned about data security and control [12][18].
Group 2: Rise of Local AI Projects
- Local AI projects such as Parallax have drawn significant attention, with endorsements from various AI communities and platforms [16][23].
- Parallax is described as a fully autonomous local AI operating system, challenging the notion that AI must be cloud-based [24][25].
- The system supports cross-platform deployment on different devices, letting users keep control over their models and data [26].
Group 3: Performance and Scalability
- Parallax offers three operating modes (single device, local cluster, and global cluster), enabling flexible deployment [29].
- Performance tests indicate that Parallax can significantly improve inference speed and throughput over existing solutions, with up to 3.2 times higher throughput in GPU pool configurations [31].
- The system is compatible with over 40 open-source models and runs on various operating systems, broadening its accessibility [31].
Group 4: Getting Started with Parallax
- The Parallax GitHub repository provides clear guidance for deploying models on your own devices [33].
- Users have successfully run models such as Qwen 235B on personal devices, which suggests that local AI setups are practical [34].
- An ongoing event encourages users to showcase their local AI setups for prizes, further promoting engagement with the Parallax platform [37][38].
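Panic is back! US and European markets fall together, the Dow drops more than 500 points, and Apple and Nvidia slump
Sou Hu Cai Jing· 2025-11-18 18:39
The S&P 500 fell 0.92% and broke below its 50-day moving average, a key technical support level, for the first time since April. The VIX fear gauge jumped 12.97% to 22.39, signaling a sharp rise in investor anxiety. Panic spread across Wall Street like a virus. The Dow plunged 557 points and the S&P 500 broke below key support. Meanwhile, news that Peter Thiel, the "godfather of Silicon Valley venture capital," had sold his entire Nvidia stake was fueling deep concerns about an AI bubble. Red numbers kept flashing across trading terminals. Wall Street had just been through a sleepless night: the Dow fell 557.24 points, or 1.18%, its worst showing in nearly a month. Thiel's sell-off drew wide market attention. He not only sold his entire Nvidia holding but also cut his Tesla position by 207,600 shares, a 76% reduction. The total market value of Thiel Macro's holdings at the end of the third quarter was just $74.4 million, down 65% from $212 million in the second quarter. The fund also opened new positions in Apple and Microsoft, but overall it contracted sharply. Earlier this year, Thiel warned that Nvidia was overvalued and compared the surge in tech valuations to the 1999-2000 dot-com bubble. As a co-founder of PayPal and an early investor in Facebook, he wields enormous influence in Silicon Valley, and his investment moves are closely watched. Amid this broad rout, news that Peter Thiel's fund had sold all of its Nvidia shares ...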
Insane gaming-laptop mod: a single resistor lets an RTX 4090 beat an RTX 5090
36Kr · 2025-11-12 03:47
Core Viewpoint - Adding a single resistor let an RTX 4090 gaming laptop outperform an RTX 5090 in certain benchmarks, highlighting how strongly power consumption affects performance [1][10].
Group 1: Power Consumption and Performance
- Power consumption directly affects the performance of gaming laptops, and high-end models often advertise total power consumption above 200W [3][5].
- Total power consumption usually refers to the combined power of the CPU and GPU. Higher power levels generally translate into better performance, but they also demand stronger cooling and power delivery [5][12].
- A user modified an ROG Zephyrus M16 by adding a resistor, which lowered the circuit resistance. This allowed the RTX 4090 to draw nearly double its original power limit, yielding performance that rivals the RTX 5090 [9][10].
Group 2: Benchmark Comparisons
- After the modification, the RTX 4090 in the ROG M16 outperformed the RTX 5090 in most 3DMark tests, with the largest lead (9.6%) in the Speedway benchmark [10][11].
- The modification improved performance by over 20% in most benchmarks, and by more than 35% in some tests [11][12].
Group 3: Manufacturer Limitations
- NVIDIA sets the power limits for mobile GPUs, which prevents laptop manufacturers from fully exploiting the hardware's potential [13][15].
- Although higher power limits could unlock more performance, manufacturers usually adhere to NVIDIA's restrictions to preserve product differentiation and avoid market conflicts [15][16].
- There are indications that NVIDIA may consider lifting power limits on future high-end models to cater to hardcore gamers seeking large performance gains [15][16].
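The summary says only that one added resistor "lowered the circuit resistance." A common form of this mod, assumed here rather than confirmed by the article, is to solder a resistor in parallel with a current-sense shunt. The power controller infers current from the voltage drop across the shunt. A lower effective resistance therefore makes it under-read current, so the GPU draws more real power before the controller thinks the limit is reached. The sketch works through that arithmetic with hypothetical values: a 5 mΩ shunt, an equal parallel resistor and a 175 W limit. None of these figures comes from the article.

```python
"""Shunt-mod arithmetic: why a parallel resistor raises the real power ceiling."""


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def real_power_at_limit(limit_w: float, r_nominal: float, r_effective: float) -> float:
    """Actual power drawn when the controller *reports* limit_w.

    The sense voltage is V = I * r_effective, but the controller still divides
    by the nominal shunt value, so it reports I * r_effective / r_nominal.
    Holding the reported figure at the limit lets the real current (and, at a
    fixed supply voltage, the real power) scale by r_nominal / r_effective.
    """
    return limit_w * r_nominal / r_effective


if __name__ == "__main__":
    shunt = 0.005   # hypothetical 5 milliohm current-sense shunt
    added = 0.005   # hypothetical resistor soldered in parallel with it
    r_eff = parallel(shunt, added)
    limit = 175.0   # hypothetical firmware power limit in watts
    print(f"effective shunt: {r_eff * 1000:.2f} mOhm")
    print(f"real power at a reported {limit:.0f} W: "
          f"{real_power_at_limit(limit, shunt, r_eff):.0f} W")
```

With an equal-value resistor the effective resistance halves, so the real ceiling doubles. This is in line with the article's "nearly double," although real hardware is also bounded by thermals and voltage regulation.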
Breaking the memory wall: Saining Xie's team proposes CLM, letting a single RTX 4090 handle 100 million Gaussians
Synced (机器之心) · 2025-11-11 08:40
Core Insights
- 3D Gaussian Splatting (3DGS) is an emerging method for novel view synthesis. From a set of posed images, it iteratively trains a scene representation made of many anisotropic 3D Gaussians that capture the scene's appearance and geometry [2][4].
- The team's CLM system lets 3DGS render large scenes on a single consumer-grade GPU, such as the RTX 4090, by working around GPU memory limits [6][8].
Group 1: 3DGS Overview
- 3DGS has shown revolutionary application potential in fields such as 3D modeling, digital twins, visual effects (VFX), VR/AR, and robot vision reconstruction (SLAM) [5].
- The quality of images rendered with 3DGS depends on the fidelity of the trained scene representation. Larger and more complex scenes require more Gaussians, which increases memory usage [5].
Group 2: CLM System Design
- CLM builds on the insight that 3DGS computation is inherently sparse: each training iteration accesses only a small subset of the Gaussians [8][20].
- The system uses a novel offloading strategy that minimizes performance overhead and scales to large scenes. It loads only the Gaussians needed into GPU memory on demand and keeps the rest in CPU memory [8][11].
Group 3: Performance and Efficiency
- CLM can render a large scene requiring 102 million Gaussians on a single RTX 4090 while achieving top-tier reconstruction quality [8].
- A view typically accesses only 0.39% of the Gaussians, and no single view accesses more than 1.06%, which highlights how sparse the access pattern is [23].
Group 4: Optimization Techniques
- The team exploited several distinctive properties of 3DGS to greatly reduce the communication overhead of offloading. These include precomputing the set of Gaussians accessed by each view and using spatial locality to optimize data transfer between CPU and GPU [12][17].
- Microbatch scheduling orders consecutive batches so that their access patterns overlap, raising cache hit rates and reducing redundant data transfers [24][25].
Group 5: Results and Impact
- Compared with a GPU-only training baseline, CLM increases the size of 3DGS models that can be trained by up to 6.1 times. This enables larger models that improve scene reconstruction accuracy while keeping communication and offloading overhead low [27].
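The summary rests on two facts. First, each view touches only a small, precomputable subset of Gaussians; at 1.06% of 102 million, the largest per-view working set is about 1.08 million Gaussians. Second, consecutive microbatches can be scheduled so that their subsets overlap. The toy model below is not CLM's code. It assumes the per-view index sets are already known (how CLM computes them is not covered here) and that the GPU keeps only the previous view's Gaussians resident. It then compares CPU-to-GPU transfer volume for an arbitrary view order against a greedy order that maximizes overlap between neighbors. The greedy heuristic is a stand-in for illustration, not the paper's scheduler.

```python
"""Toy model of CPU-offloaded 3DGS training: per-view working sets and view order.

Illustrative only; this is not CLM's implementation. Each view's accessed
Gaussian IDs are assumed to be precomputed, and the GPU is assumed to keep
only the previous view's Gaussians resident.
"""

from typing import Dict, List, Set


def transfer_cost(order: List[int], visible: Dict[int, Set[int]]) -> int:
    """Count Gaussians copied CPU -> GPU when views are trained in `order`."""
    cached: Set[int] = set()
    copied = 0
    for view in order:
        needed = visible[view]
        copied += len(needed - cached)  # cache misses are fetched from CPU memory
        cached = needed                 # keep only this view's Gaussians resident
    return copied


def greedy_order(visible: Dict[int, Set[int]], start: int) -> List[int]:
    """Chain views so each shares as many Gaussians as possible with the last one.

    Ties are broken toward the smaller view ID so the result is deterministic.
    """
    remaining = set(visible) - {start}
    order = [start]
    while remaining:
        prev = visible[order[-1]]
        nxt = max(remaining, key=lambda v: (len(visible[v] & prev), -v))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Hypothetical scene: 8 cameras along a street, each seeing 100 Gaussians
    # and sharing 50 with its neighbor (spatial locality).
    visible = {i: set(range(50 * i, 50 * i + 100)) for i in range(8)}
    interleaved = [0, 4, 1, 5, 2, 6, 3, 7]
    scheduled = greedy_order(visible, start=0)
    print("interleaved order copies", transfer_cost(interleaved, visible))
    print("greedy order", scheduled, "copies", transfer_cost(scheduled, visible))
```

In this toy street scene, the interleaved order shares nothing between neighbors and copies 800 Gaussians. The greedy order visits the cameras in sequence and copies 450, because half of each view's working set is already resident.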
X @vitalik.eth
vitalik.eth· 2025-10-16 01:23
RT Justin Drake (@drakefjustin): Progress toward real-time proving for Ethereum L1 is nothing short of extraordinary. In May, SP1 Hypercube proved 94% of L1 blocks in under 12 seconds using 160 RTX 4090s. Five months later Pico Prism proves 99.9% of the same blocks in under 12 seconds, with just 64 RTX 5090s. Average proving latency is now 6.9 seconds. Performance has outpaced Moore's law ever since Zcash pioneered practical SNARKs a decade ago. Today's Pico Prism results are a striking reminder of that exponen ...
Advanced Micro Devices, Inc. (AMD): A Bull Case Theory
Yahoo Finance· 2025-09-28 23:43
Core Thesis - Advanced Micro Devices, Inc. (AMD) is positioned as a strong investment opportunity thanks to potential market share gains and the challenges facing competitor Nvidia, with a target price range of $168–$187 over the next 12–18 months [2][5].
Financial Performance
- AMD reported 32% year-over-year revenue growth in Q2 2025, reaching $7.7 billion. Growth was driven by a 73% increase in gaming revenue to $1.1 billion and a 14% rise in data center revenue to $3.2 billion [3].
- Wall Street forecasts suggest a 15–20% compound annual growth rate (CAGR) for earnings per share (EPS) through 2027, despite near-term margin pressure from export controls [3].
Competitive Landscape
- Nvidia's structural GPU reliability issues, such as problems with RTX 4090 connectors, create an opening for AMD, which is seen as a stable alternative [4].
- AMD's RX 9070 XT shows strong performance and improved power efficiency, while its open-source ROCm platform strengthens its data center position [4].
Market Opportunities
- AMD could capture $3.6–$6 billion in incremental revenue from potential market share gains in the $120 billion discrete GPU segment, although Nvidia's ecosystem dominance poses challenges [5].
- The company's diversified revenue streams and competitive GPU offerings support potential valuation multiple expansion, despite macroeconomic risks such as Federal Reserve rate hikes [5].
Historical Context
- AMD's stock price has risen approximately 39% since May 2025, reflecting strong revenue growth driven by data center and Ryzen processor sales, as well as AI demand [6].
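Two quick checks on the figures above, computed here rather than taken from the source. First, the $3.6–$6 billion incremental-revenue range equals 3 to 5 percent of the $120 billion discrete GPU segment. Second, 32% growth to $7.7 billion implies year-earlier quarterly revenue of about $5.8 billion, assuming the growth rate is exact rather than rounded:

```latex
\frac{\$3.6\text{ B}}{\$120\text{ B}} = 0.03 = 3\%, \qquad
\frac{\$6\text{ B}}{\$120\text{ B}} = 0.05 = 5\%, \qquad
\frac{\$7.7\text{ B}}{1.32} \approx \$5.83\text{ B}
```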
X @Polyhedra
Polyhedra· 2025-09-25 17:00
Multi-GPU Environment
- Confirmed stable execution and compatibility of the MPI runtime on a dual-GPU setup (RTX 4090 ×2, CUDA 12.8) [1]
Hardware & Software
- Validated the MPI runtime on a dual-GPU setup with RTX 4090 ×2 and CUDA 12.8 [1]