Neural Rendering
Jensen Huang Fields Every Question at CES: Memory Is Choking GPUs, and Gamers May Be Stuck With Older Graphics Cards
量子位 · 2026-01-07 09:11
Core Viewpoint
- Jensen Huang describes robots as "AI immigrants" that can take on jobs humans are unwilling to do, and argues that AI is needed to support economic growth and job creation [10][11].

Group 1: AI and Robotics
- Huang states that the "robot revolution" will drive economic progress and create more job opportunities while keeping inflation low [11].
- He predicts that by the end of this year, robots will reach human-level capabilities in mobility, joint movement, and fine motor skills [12].
- Robots need tactile sensing as well as visual perception, which poses significant technical challenges [13].

Group 2: Autonomous Driving
- Huang introduced Alpamayo 1, described as the world's first open-source, large-scale visual-language-action (VLA) reasoning model for autonomous driving, and praised Tesla's FSD technology as world-class [15][16].
- NVIDIA's role is to provide a complete technology stack to companies developing autonomous vehicles, not to manufacture the vehicles itself [16][20].
- The company has a high industry penetration rate, with over 1 billion vehicles on the road, and expects millions to have strong autonomous driving capabilities within the next decade [20].

Group 3: AI Infrastructure and Memory Supply
- Huang introduced Vera Rubin, NVIDIA's next-generation AI supercomputing platform, and discussed the challenges posed by rising memory prices and supply constraints [24][25].
- The company is positioned as a key player in the memory market: it addresses growing demand for high-bandwidth memory (HBM) and works closely with suppliers so that production capacity aligns with product launches [36].

Group 4: Gaming and AI
- NVIDIA upgraded its super-resolution model with the new DLSS 4.5, indicating a shift toward AI-driven gaming experiences [31].
- Huang predicts that future video games will be filled with AI characters, significantly enhancing realism and interactivity [32][33].
Training 100 Million Gaussians on a Single GPU to Reconstruct a 25 km² City: CPU Offloading Breaks the 3DGS Memory Wall
具身智能之心 · 2025-12-24 00:25
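Want to reconstruct a city with 3D Gaussian Splatting (3DGS)? In the past, that usually meant an expensive GPU cluster. Now researchers offer another answer: a single RTX 4090, plus enough CPU memory, can also handle city-scale 3D reconstruction. A research team from New York University presented a system called CLM (CPU-offloaded Large-scale 3DGS training) at ASPLOS 2026. By moving the parameters that consume the most GPU memory during 3DGS training into CPU memory, it lets a single consumer-grade GPU train models with over a hundred million Gaussian points, significantly lowering the hardware barrier for large-scene neural rendering.

The scaling bottleneck of 3DGS: Thanks to its high-quality output and very fast rendering, 3DGS has become an important technique in neural rendering. However, when researchers try to apply it to complex scenes such as city blocks and large indoor spaces, a problem quickly appears: GPU memory becomes the most direct bottleneck and the hardest one to solve. A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points. Each Gaussian point includes position, shape, color, and opacity ...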
Training 100 Million Gaussians on a Single GPU to Reconstruct a 25 km² City: CPU Offloading Breaks the 3DGS Memory Wall
36Kr · 2025-12-23 07:27
Core Insights
- The article discusses CLM (CPU-offloaded Large-scale 3DGS training), a system developed by a research team from New York University. By offloading memory-intensive parameters to CPU memory, it enables city-scale 3D reconstruction on a single consumer-grade GPU, specifically the RTX 4090 [1][20].

Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a significant technology in neural rendering because of its high-quality rendering and speed, but it scales poorly to complex scenes such as urban areas, primarily because of GPU memory limitations [2].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points. Each point requires substantial memory for parameters and gradients, which makes training on a single GPU difficult [2][3].

Group 2: CLM System Design
- CLM addresses the GPU memory bottleneck by loading Gaussian parameters from CPU memory dynamically, only when they are needed, instead of keeping all parameters in GPU memory [3][4].
- The system employs three key mechanisms:
  1. **Attribute Segmentation**: Only the "key attributes" needed for visibility checks are stored in GPU memory, while the majority of parameters are offloaded to CPU memory [5][6].
  2. **Pre-rendering Visibility Culling**: CLM computes the visible Gaussian points before rendering, reducing unnecessary computation and memory usage on the GPU [7][8].
  3. **Efficient CPU Utilization**: CLM minimizes data transfer delays through micro-batching, caching, and intelligent scheduling, so the CPU can assist training without slowing it down [10][12].

Group 3: Performance Results
- On an RTX 4090, CLM trained 102.2 million Gaussian points, a 6.7-fold increase over the traditional method, which could only handle 15.3 million points [13][14].
- Despite communication overhead, CLM achieved a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [16].
- Reconstruction quality improved significantly: the 102.2 million point model reached a PSNR of 25.15 dB, compared with 23.93 dB for the 15.3 million point model [18].

Group 4: Broader Implications
- CLM is a cost-effective solution for large-scale 3D reconstruction. It addresses deployment challenges without multi-GPU setups, which benefits both academic and industrial applications [20].
- Growing demand for efficient, low-cost 3D reconstruction tools in areas such as digital twins and large-scale mapping makes CLM's approach particularly relevant [20].
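The interplay between attribute segmentation, pre-rendering culling, and caching can be illustrated with a toy sketch. This is not CLM's implementation: the `Camera` box, the `visible_indices` and `load_for_view` helpers, and the dictionary standing in for GPU memory are all hypothetical names chosen for illustration, and an axis-aligned box replaces a real view frustum.

```python
from dataclasses import dataclass

# Hypothetical toy model: each Gaussian has a 3D position (a "key" attribute
# that stays on the GPU) and a bulky payload (a "non-key" attribute that
# lives in CPU memory until a view actually needs it).


@dataclass
class Camera:
    # Axis-aligned box standing in for a real view frustum (an assumption).
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def visible_indices(positions, cam):
    """Pre-rendering culling: decide visibility from key attributes only."""
    return [
        i for i, (x, y, _z) in enumerate(positions)
        if cam.x_min <= x <= cam.x_max and cam.y_min <= y <= cam.y_max
    ]


def load_for_view(cpu_store, gpu_cache, indices):
    """Fetch non-key attributes on demand, reusing anything already cached."""
    transferred = 0
    for i in indices:
        if i not in gpu_cache:
            gpu_cache[i] = cpu_store[i]  # stands in for a CPU-to-GPU copy
            transferred += 1
    return transferred


# Key attributes: always resident on the "GPU".
positions = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0), (50.0, 50.0, 0.0), (6.0, 4.0, 1.0)]
# Non-key attributes: resident in "CPU memory".
cpu_store = {i: [0.1 * i] * 48 for i in range(len(positions))}
gpu_cache = {}

view_a = Camera(-1.0, 10.0, -1.0, 10.0)
view_b = Camera(4.0, 60.0, 4.0, 60.0)

idx_a = visible_indices(positions, view_a)
moved_a = load_for_view(cpu_store, gpu_cache, idx_a)
idx_b = visible_indices(positions, view_b)
moved_b = load_for_view(cpu_store, gpu_cache, idx_b)

print(idx_a, moved_a)  # first view: every visible point must be transferred
print(idx_b, moved_b)  # second view: overlapping points come from the cache
```

In this toy run the two views share two Gaussians, so the second view triggers only one transfer. A real system would also evict cached entries and overlap transfers with computation, which this sketch leaves out.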
Training 100 Million Gaussians on a Single GPU to Reconstruct a 25 km² City: CPU Offloading Breaks the 3DGS Memory Wall
量子位 · 2025-12-23 04:16
Core Viewpoint
- The article discusses the introduction of CLM (CPU-offloaded Large-scale 3DGS training), a system that allows for city-scale 3D reconstruction using a single consumer-grade GPU, specifically the RTX 4090, by offloading memory-intensive parameters to CPU memory, significantly lowering hardware requirements for large-scale neural rendering [1][21].

Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a crucial technology in neural rendering due to its high-quality output and rendering speed, but it faces significant challenges when applied to complex scenes like urban blocks, primarily due to GPU memory limitations [2].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points, with each point requiring substantial memory for parameters, gradients, and optimizer states. Even high-end GPUs like the RTX 4090, with 24 GB of memory, can only handle about 15-20 million points, which is insufficient for city-scale scenes [2][3].

Group 2: CLM Design Principles
- CLM is based on the observation that only a small fraction of Gaussian points are actively used during each rendering pass, with less than 1% of points accessed in large scenes [3].
- The system design of CLM involves dynamically loading Gaussian parameters from CPU memory as needed, rather than keeping all parameters in GPU memory [4].

Group 3: Key Mechanisms of CLM
- **Attribute Segmentation**: CLM retains only "key attributes" (10 parameters) necessary for visibility checks in GPU memory, while the remaining 80% of "non-key attributes" are stored in CPU memory and loaded on demand [6][7].
- **Pre-rendering Visibility Culling**: Unlike traditional methods, CLM calculates visible Gaussian point indices before rendering, reducing unnecessary GPU computations and memory usage by only loading visible points from CPU memory [9][10].
- **Efficient CPU-GPU Collaboration**: CLM employs a multi-layered design to mitigate data transfer delays, including micro-batching, caching mechanisms, and intelligent scheduling to maximize efficiency and minimize communication overhead [12][13][14][15].

Group 4: Performance Results
- CLM significantly increases model size, allowing for the training of 102.2 million Gaussian points on the "MatrixCity BigCity" dataset, a 6.7-fold increase compared to traditional methods, which maxed out at 15.3 million points [16].
- The quality of reconstruction improves with more parameters, achieving a PSNR of 25.15 dB for the 102.2 million point model, compared to 23.93 dB for the smaller model [18].
- Despite communication overhead, CLM maintains a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [19].

Group 5: Broader Implications
- CLM represents a significant advancement in addressing deployment bottlenecks in 3DGS training, integrating CPU resources into the training process without the need for multi-GPU setups, thus providing a cost-effective solution for large-scale scene reconstruction [21].
- The growing demand for efficient and low-cost 3D reconstruction tools in applications like digital twins and large-scale map reconstruction highlights the importance of CLM's approach in optimizing existing computational resources [21].
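A rough back-of-the-envelope estimate shows why roughly 15 million points already strain a 24 GB card. The sketch below rests on assumptions that the summary does not state: 59 fp32 parameters per Gaussian (the common 3DGS layout of 3 position, 3 scale, 4 rotation, 1 opacity, and 48 spherical-harmonic color values), and four copies of each value during training (parameter, gradient, and two Adam moments). Only the 10 key parameters, the 24 GB capacity, and the point counts come from the article. Activations and other buffers are ignored.

```python
BYTES_PER_FLOAT = 4        # assumption: fp32 storage
PARAMS_PER_GAUSSIAN = 59   # assumption: 3 + 3 + 4 + 1 + 48 (common 3DGS layout)
KEY_PARAMS = 10            # from the article: attributes kept in GPU memory
TRAINING_COPIES = 4        # assumption: parameter, gradient, two Adam moments


def state_gb(num_gaussians, params_per_gaussian, copies):
    """Memory in GB (10^9 bytes) for the given per-Gaussian state."""
    return num_gaussians * params_per_gaussian * BYTES_PER_FLOAT * copies / 1e9


# Full training state if everything stays on the GPU.
baseline_gb = state_gb(15_300_000, PARAMS_PER_GAUSSIAN, TRAINING_COPIES)
full_gb = state_gb(102_200_000, PARAMS_PER_GAUSSIAN, TRAINING_COPIES)

# Only the key parameters themselves, for the 102.2 million point model.
key_only_gb = state_gb(102_200_000, KEY_PARAMS, 1)

# Share of per-Gaussian parameters that are not key attributes.
non_key_fraction = 1 - KEY_PARAMS / PARAMS_PER_GAUSSIAN

print(f"15.3M points, full state:  {baseline_gb:.1f} GB")
print(f"102.2M points, full state: {full_gb:.1f} GB")
print(f"102.2M points, key params: {key_only_gb:.1f} GB")
print(f"non-key share of params:   {non_key_fraction:.0%}")
```

Under these assumptions the 15.3 million point model needs about 14 GB of training state, which fits in 24 GB once working buffers are added, while the 102.2 million point model would need about 96 GB. The non-key share comes out near 83%, in line with the roughly 80% figure quoted above.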
Simulation Special: A Complete Overview of Implementing Neural Rendering (NeRF/3DGS) in the Embodied Simulation Framework Isaac Sim
具身智能之心 · 2025-09-28 01:05
Core Viewpoint
- Neural rendering (NeRF/3DGS) is revolutionizing 3D reconstruction, significantly enhancing the realism of images used in autonomous driving and embodied intelligence simulation and addressing the limitations of traditional computer graphics rendering [3][4].

Group 1: Background and Technology
- NeRF and 3DGS use neural networks to represent spatial data and excel at novel view synthesis, which is crucial for sensor simulation in autonomous driving and embodied intelligence [3].
- Integrating NeRF and 3DGS into existing simulation frameworks is proposed as more efficient than building new frameworks from scratch: it allows real-time rendering while reusing existing 3D digital assets and algorithm interfaces [3][4].

Group 2: Implementation in Simulation Software
- NVIDIA's Isaac Sim has incorporated neural rendering, so 3DGS models can be inserted into simulation environments as static backgrounds or as dynamic, interactive objects [4][5].
- Importing a 3DGS model into Isaac Sim involves generating a USDZ model and making sure it has the physical properties needed for interaction within the simulation [5][8].

Group 3: Model Interaction and Physics
- For realistic interaction, imported models need physical attributes such as collision properties so that they interact correctly with other objects in the simulation [8][14].
- Integrating dynamic objects, such as a LEGO bulldozer, into the simulation environment demonstrates that 3DGS models can interact with both static and dynamic elements [11][15].

Group 4: Performance and Future Considerations
- Performance metrics indicate that even under a high workload, the simulation maintains a good frame rate and low memory usage, showing the efficiency of the neural rendering technology [17].
- Future challenges include improving light and shadow interactions between 3DGS models, providing accurate ground truth information for algorithms, and improving computational efficiency for larger scenes [18][19].
自动驾驶之心 Project and Paper Mentoring Is Here
自动驾驶之心 · 2025-08-07 12:00
Core Viewpoint
- The article announces the launch of project and paper mentoring from 自动驾驶之心 ("Heart of Autonomous Driving"), aimed at students who face challenges in their autonomous driving research and development [1].

Group 1: Project and Guidance Overview
- The program supports students who run into difficulties in their research, such as environment setup problems and debugging challenges [1].
- Last year's outcomes were positive, with several students publishing papers at top conferences such as CVPR and ICRA [1].

Group 2: Guidance Directions
- **Direction 1**: Multi-modal perception and computer vision, end-to-end autonomous driving, large models, and BEV perception. The mentor has published over 30 papers at top AI conferences, with more than 6,000 citations [3].
- **Direction 2**: 3D object detection, semantic segmentation, occupancy prediction, and multi-task learning based on images or point clouds. The mentor is a top-tier PhD with multiple publications at ECCV and CVPR [5].
- **Direction 3**: End-to-end autonomous driving, OCC, BEV, and world models. The mentor is also a top-tier PhD and has contributed to several mainstream perception solutions [6].
- **Direction 4**: NeRF / 3DGS neural rendering and 3D reconstruction. The mentor has published four CCF-A papers, two at CVPR and two in IEEE Transactions [7].
With Over 40,000 Authors Competing, CVPR 2025 Officially Reveals Three Breakout Themes: Did You Pick the Right Direction?
机器之心 · 2025-05-28 03:02
Core Insights
- The article discusses the latest trends in the field of computer vision, highlighting three major research directions that are gaining traction as of 2025 [3][4].

Group 1: Major Research Directions
- The three prominent areas identified are:
  1. Multi-view and sensor 3D technology, which has evolved from 2D rendering to more complex 3D evaluations, significantly influenced by the introduction of NeRF in 2020 [5].
  2. Image and video synthesis, which has become a focal point for presenting environmental information more accurately, reflecting advancements in the ability to analyze and generate multimedia content [6].
  3. Multimodal learning, which integrates visual, linguistic, and reasoning capabilities, indicating a trend towards more interactive and comprehensive AI systems [7][8].

Group 2: Conference Insights
- The CVPR 2025 conference has seen a 13% increase in paper submissions, with a total of 13,008 submissions and an acceptance rate of 22.1%, indicating a highly competitive environment [3].
- The conference emphasizes the importance of diverse voices in the research community, ensuring that every paper, regardless of the author's affiliation, is given equal consideration [8].