RTX 4090
See what people have been doing with Moltbot (Clawdbot)~ Want a GPU? Find your own way to buy one~
菜鸟教程· 2026-01-30 03:30
Core Viewpoint - The article discusses the emergence and functionality of Moltbot, previously known as Clawdbot, highlighting its ability to automate a wide range of tasks and what that implies for the future of artificial intelligence [1][10][35].

Group 1: Moltbot Overview
- Moltbot is a rebranded version of Clawdbot, with the same functionality but a new name and mascot [1].
- The platform lets users deploy AI agents for various tasks, showing its versatility in handling personal and professional responsibilities [6][35].

Group 2: User Experiences
- Users have reported deploying Moltbot for significant life changes, such as resigning from jobs, managing personal relationships, and even filing patents, demonstrating its potential to handle complex tasks [8][11][17].
- One user humorously noted that after instructing Moltbot to take over their life, they woke up to find their bank account inaccessible, but their credit score had improved to 847 [9][14].

Group 3: Financial Applications
- Users have experimented with giving Moltbot access to their investment portfolios and instructing it to trade aggressively, which resulted in significant losses despite the advanced strategies employed [23][26].
- The platform's ability to analyze market data and execute trades continuously raises questions about the reliability and effectiveness of AI in financial decision-making [22][35].

Group 4: Broader Implications
- The article suggests that combining large language models (LLMs), tools, and chat interfaces can lead to practical applications in various fields, including finance and personal management [35].
- There is growing concern among developers about the rapid advancement of AI technology, indicating a shift in the skill sets required for future roles in the industry [36].
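Training 100 million Gaussian points on a single GPU and reconstructing a 25 km² city: the 3DGS memory wall has been broken by a CPU "add-on"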
具身智能之心· 2025-12-24 00:25
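Want to reconstruct a city with 3D Gaussian Splatting (3DGS)? In the past, that usually meant an expensive GPU cluster. Researchers now offer another answer: a single RTX 4090, plus enough CPU memory, can also complete city-scale 3D reconstruction.

A research team from New York University presented a system called CLM (CPU-offloaded Large-scale 3DGS training) at ASPLOS 2026. By moving the parameters that occupy the most GPU memory during 3DGS training into CPU memory, the work lets a single consumer-grade GPU train models with over a hundred million Gaussian points, significantly lowering the hardware barrier for large-scene neural rendering.

The bottleneck to applying 3DGS at scale

3D Gaussian Splatting (3DGS) has become an important approach in neural rendering thanks to its high-quality output and very fast rendering. However, when researchers try to apply it to complex scenes such as city blocks and large indoor spaces, a problem quickly emerges: GPU memory becomes the most direct bottleneck and the hardest one to solve. A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points. Each Gaussian point includes position, shape, color, and opac ...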
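Training 100 million Gaussian points on a single GPU and reconstructing a 25 km² city: the 3DGS memory wall has been broken by a CPU "add-on"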
36Kr· 2025-12-23 07:27
Core Insights - The article discusses a new system called CLM (CPU-offloaded Large-scale 3DGS training), developed by a research team from New York University, which enables city-scale 3D reconstruction on a single consumer-grade GPU, specifically the RTX 4090, by offloading memory-intensive parameters to CPU memory [1][20].

Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a significant technology in neural rendering due to its high-quality rendering and speed, but it faces scalability issues when applied to complex scenes like urban areas, primarily due to GPU memory limitations [2].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points, and each point requires memory for its parameters and gradients, making such models difficult to train on a single GPU [2][3].

Group 2: CLM System Design
- CLM addresses the GPU memory bottleneck by dynamically loading Gaussian parameters from CPU memory only when they are needed, rather than keeping all parameters in GPU memory [3][4].
- The system employs three key mechanisms:
1. **Attribute Segmentation**: Only the "key attributes" needed for visibility checks are stored in GPU memory, while the majority of parameters are offloaded to CPU memory [5][6].
2. **Pre-rendering Visibility Culling**: CLM computes which Gaussian points are visible before rendering, reducing unnecessary computation and memory usage on the GPU [7][8].
3. **Efficient CPU Utilization**: CLM minimizes data transfer delays through micro-batching, caching, and intelligent scheduling, allowing the CPU to assist in training without slowing it down [10][12].

Group 3: Performance Results
- Running CLM on an RTX 4090 allowed the training of 102.2 million Gaussian points, a 6.7-fold increase over the traditional method, which could only handle 15.3 million points [13][14].
- Despite communication overhead, CLM achieved a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [16].
- Reconstruction quality improved, with the PSNR of the 102.2 million point model reaching 25.15 dB, compared to 23.93 dB for the 15.3 million point model [18].

Group 4: Broader Implications
- CLM represents a cost-effective solution for large-scale 3D reconstruction, addressing deployment challenges without the need for multi-GPU setups, which benefits both academic and industrial applications [20].
- The growing demand for efficient and low-cost 3D reconstruction tools in areas like digital twins and large-scale mapping makes CLM's approach particularly relevant [20].
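The attribute segmentation and pre-rendering visibility culling described above can be illustrated with a small sketch. This is not CLM's implementation: the view-cone test stands in for real frustum culling, and the names `visible_indices`, `load_visible`, and `cpu_store` are hypothetical. The only point is the order of operations: decide visibility from the small resident attributes first, then fetch the bulky attributes for visible points only.

```python
import math

def visible_indices(positions, cam_pos, cam_dir, half_fov_deg, far):
    """Return indices of points inside a simple view cone.

    Assumption: a cone test replaces proper frustum culling, and cam_dir
    is a unit vector. Only positions (a "key attribute") are consulted.
    """
    cos_limit = math.cos(math.radians(half_fov_deg))
    visible = []
    for i, p in enumerate(positions):
        d = [p[k] - cam_pos[k] for k in range(3)]
        dist = math.sqrt(sum(x * x for x in d))
        if dist == 0.0 or dist > far:
            continue  # at the camera origin or beyond the far limit
        cos_angle = sum(d[k] * cam_dir[k] for k in range(3)) / dist
        if cos_angle >= cos_limit:
            visible.append(i)
    return visible

def load_visible(cpu_store, indices):
    """Copy only the requested non-key attributes (models the CPU-to-GPU transfer)."""
    return {i: cpu_store[i] for i in indices}

# Key attributes stay resident; non-key attributes live in the hypothetical cpu_store.
positions = [(0, 0, 5), (0, 0, -5), (10, 0, 1), (0.5, 0, 3), (0, 0, 500)]
cpu_store = ["non_key_attrs_%d" % i for i in range(len(positions))]

idx = visible_indices(positions, cam_pos=(0, 0, 0), cam_dir=(0, 0, 1),
                      half_fov_deg=30, far=100)
gpu_working_set = load_visible(cpu_store, idx)
print(idx, len(gpu_working_set), "of", len(positions), "points transferred")
```

In this toy scene only two of the five points pass the test, so only their non-key attributes are copied.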
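Training 100 million Gaussian points on a single GPU and reconstructing a 25 km² city: the 3DGS memory wall has been broken by a CPU "add-on"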
量子位· 2025-12-23 04:16
Core Viewpoint - The article discusses the introduction of CLM (CPU-offloaded Large-scale 3DGS training), a system that enables city-scale 3D reconstruction on a single consumer-grade GPU, specifically the RTX 4090, by offloading memory-intensive parameters to CPU memory, significantly lowering hardware requirements for large-scale neural rendering [1][21].

Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a crucial technology in neural rendering due to its high-quality output and rendering speed, but it faces significant challenges when applied to complex scenes like urban blocks, primarily due to GPU memory limitations [2].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points, with each point requiring memory for parameters, gradients, and optimizer states. Even a high-end GPU like the RTX 4090, with 24GB of memory, can only handle about 15-20 million points, which is insufficient for city-scale scenes [2][3].

Group 2: CLM Design Principles
- CLM is based on the observation that only a small fraction of Gaussian points is used during each rendering pass, with less than 1% of points accessed in large scenes [3].
- CLM therefore loads Gaussian parameters from CPU memory dynamically as needed, rather than keeping all parameters in GPU memory [4].

Group 3: Key Mechanisms of CLM
- **Attribute Segmentation**: CLM keeps only the "key attributes" (10 parameters) needed for visibility checks in GPU memory, while the remaining "non-key attributes", about 80% of each point's parameters, are stored in CPU memory and loaded on demand [6][7].
- **Pre-rendering Visibility Culling**: Unlike traditional methods, CLM computes the indices of visible Gaussian points before rendering and loads only those points from CPU memory, reducing unnecessary GPU computation and memory usage [9][10].
- **Efficient CPU-GPU Collaboration**: CLM employs a multi-layered design to mitigate data transfer delays, including micro-batching, caching mechanisms, and intelligent scheduling to maximize efficiency and minimize communication overhead [12][13][14][15].

Group 4: Performance Results
- CLM significantly increases model size, allowing the training of 102.2 million Gaussian points on the "MatrixCity BigCity" dataset, a 6.7-fold increase over traditional methods, which maxed out at 15.3 million points [16].
- Reconstruction quality improves with more parameters, reaching a PSNR of 25.15dB for the 102.2 million point model, compared to 23.93dB for the smaller model [18].
- Despite communication overhead, CLM maintains a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [19].

Group 5: Broader Implications
- CLM represents a significant advance in addressing deployment bottlenecks in 3DGS training, integrating CPU resources into the training process without the need for multi-GPU setups and thus providing a cost-effective solution for large-scale scene reconstruction [21].
- The growing demand for efficient and low-cost 3D reconstruction tools in applications like digital twins and large-scale map reconstruction highlights the importance of CLM's approach to making better use of existing computational resources [21].
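A rough back-of-the-envelope estimate shows why the figures above are plausible. The sketch below rests on assumptions that the summary does not state: 59 float32 parameters per Gaussian (the common 3DGS layout of 3 position, 3 scale, 4 rotation, 1 opacity, and 48 spherical-harmonic values), and four stored copies of each parameter during training (value, gradient, and two Adam moment buffers). Activations and other buffers are ignored, so real usage would be higher.

```python
FLOAT_BYTES = 4            # float32
PARAMS_PER_GAUSSIAN = 59   # assumption: 3 + 3 + 4 + 1 + 48 (degree-3 spherical harmonics)
KEY_PARAMS = 10            # from the article: attributes needed for visibility checks
TRAINING_COPIES = 4        # assumption: value + gradient + two Adam moments

def footprint_gib(num_gaussians, params_per_gaussian, copies):
    """Memory in GiB for the given number of float32 values per Gaussian."""
    return num_gaussians * params_per_gaussian * copies * FLOAT_BYTES / 2**30

full = footprint_gib(102_200_000, PARAMS_PER_GAUSSIAN, TRAINING_COPIES)
baseline = footprint_gib(15_300_000, PARAMS_PER_GAUSSIAN, TRAINING_COPIES)
key_params_only = footprint_gib(102_200_000, KEY_PARAMS, 1)
non_key_fraction = 1 - KEY_PARAMS / PARAMS_PER_GAUSSIAN

print(f"102.2M points, full training state: {full:.1f} GiB")
print(f"15.3M points, full training state:  {baseline:.1f} GiB")
print(f"102.2M points, key parameters only: {key_params_only:.1f} GiB")
print(f"non-key share of parameters:        {non_key_fraction:.0%}")
```

Under these assumptions the 102.2 million point model needs roughly 90 GiB of training state, far beyond a 24GB card, while 15.3 million points need roughly 13 GiB, which leaves room for activations. The non-key share comes out near 83%, in line with the article's figure of about 80%.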
Even the biggest gaming creator is playing with local AI? Parallax, which lets even laptops run large models, is here
机器之心· 2025-11-20 09:35
Core Viewpoint - PewDiePie, a prominent gaming influencer, has built a local AI system, sparking widespread discussion about the potential of local AI deployments versus cloud-based solutions [1][5][6].

Group 1: Local AI System Development
- PewDiePie invested $20,000 to assemble a local AI system with 10 NVIDIA GPUs, including 8 modified RTX 4090s and 2 RTX 4000 Ada cards, capable of running models with 70 billion to 245 billion parameters [4].
- The local system gives complete control over the AI environment, in contrast to traditional cloud-based AI, where users rent resources without owning them [10][11].
- The key advantages of local AI are privacy, performance, and composability, making it an attractive option for users concerned about data security and control [12][18].

Group 2: Rise of Local AI Projects
- Local AI projects like Parallax have gained significant attention, with endorsements from various AI communities and platforms [16][23].
- Parallax is described as a fully autonomous local AI operating system, challenging the notion that AI must be cloud-based [24][25].
- The system supports cross-platform deployment across different devices, allowing users to maintain control over their models and data [26].

Group 3: Performance and Scalability
- Parallax offers three operational modes: single device, local cluster, and global cluster, enabling flexible deployment options [29].
- Performance tests indicate that Parallax can significantly improve inference speed and throughput compared to existing solutions, achieving up to 3.2 times higher throughput in GPU pool configurations [31].
- The system is compatible with over 40 open-source models and runs on various operating systems, which improves its accessibility [31].

Group 4: Getting Started with Parallax
- The Parallax GitHub repository provides clear guidance for users to start deploying models on their devices [33].
- Users have successfully run models like Qwen 235B on personal devices, indicating the practicality of local AI setups [34].
- An ongoing event encourages users to showcase their local AI setups, with attractive prizes, further promoting engagement with the Parallax platform [37][38].
Panic is back! Europe and the US fall together, the Dow plunges more than 500 points, Apple and Nvidia slump
Sou Hu Cai Jing· 2025-11-18 18:39
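The S&P 500 fell 0.92%, dropping below its 50-day moving average, a key technical support level, for the first time since April. The VIX, the market's fear gauge, jumped 12.97% to 22.39, showing that investor panic has risen sharply. Panic on Wall Street is spreading like a virus. The Dow plunged 557 points and the S&P 500 broke below a key support line, while Peter Thiel, the "godfather of Silicon Valley venture capital", sold his entire Nvidia stake, triggering deep concern about an AI bubble. Red numbers kept flashing on trading terminals. Wall Street has just been through a sleepless night: the Dow plunged 557.24 points, or 1.18%, its worst performance in nearly a month. Peter Thiel's sell-off has drawn wide market attention. He not only sold all of his Nvidia holdings but also cut his Tesla position by 207,600 shares, a reduction of 76%. The Thiel Macro fund's holdings were worth only $74.4 million at the end of the third quarter, down 65% from $212 million in the second quarter. The fund also opened new positions in Apple and Microsoft, but overall it contracted sharply. Earlier this year, Peter Thiel warned that Nvidia's valuation was too high and compared the surge in tech valuations to the dot-com bubble that burst in 1999-2000. As a PayPal co-founder and an early Facebook investor, he wields enormous influence in Silicon Valley, and his investment moves are closely watched. Amid this broad rout, the news that the fund of Peter Thiel, the "godfather of Silicon Valley venture capital", sold its entire Nvidia stake ...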
Crazy: an extreme gaming-laptop mod where a single resistor lets the 4090 beat the 5090
36Kr· 2025-11-12 03:47
Core Viewpoint - A modification that adds a single resistor has allowed an RTX 4090 gaming laptop to outperform an RTX 5090 in certain benchmarks, highlighting the significant impact of power consumption on performance [1][10].

Group 1: Power Consumption and Performance
- Power consumption directly influences the performance of gaming laptops, and high-end models often advertise total power consumption exceeding 200W [3][5].
- Total power consumption typically refers to the combined power of the CPU and GPU. Higher power levels correlate with better performance but demand stronger cooling and power delivery [5][12].
- A user modified their ROG Zephyrus M16 by adding a resistor, effectively lowering the circuit resistance and allowing the RTX 4090 to draw nearly double its original power limit, resulting in performance that rivals the RTX 5090 [9][10].

Group 2: Benchmark Comparisons
- After the modification, the RTX 4090 in the ROG M16 surpassed the RTX 5090 in most 3DMark tests, with the highest score in the Speedway benchmark showing a 9.6% lead [10][11].
- The modification improved performance by over 20% in most benchmarks, with some tests showing increases of more than 35% [11][12].

Group 3: Manufacturer Limitations
- NVIDIA sets the power consumption limits for mobile GPUs, which prevents manufacturers from fully using the hardware's potential [13][15].
- Despite the potential for higher performance through increased power limits, manufacturers often adhere to NVIDIA's restrictions to maintain product differentiation and avoid market conflicts [15][16].
- There are indications that NVIDIA may consider lifting power limits for future high-end models to cater to hardcore gaming enthusiasts seeking significant performance boosts [15][16].
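The summary does not explain why a single resistor raises the power limit. One common explanation, assumed here rather than confirmed by the article, is a shunt modification: the power limiter infers current from the voltage drop across a small sense resistor, so soldering a second resistor in parallel lowers the effective resistance and makes the limiter under-read the real current. The resistor values and the 100 W limit below are hypothetical.

```python
def parallel(r1, r2):
    """Effective resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def actual_power(reported_watts, r_nominal, r_added):
    """Real power draw when the limiter still assumes r_nominal.

    The limiter computes current as V_drop / r_nominal, but the true
    current is V_drop / r_effective, so the real power is scaled by
    r_nominal / r_effective.
    """
    r_effective = parallel(r_nominal, r_added)
    return reported_watts * r_nominal / r_effective

# Hypothetical values in milliohms: a 5 mOhm sense resistor with a 5 mOhm
# resistor added in parallel halves the effective resistance.
equal_mod = actual_power(100.0, r_nominal=5.0, r_added=5.0)
# A slightly larger added resistor gives "nearly double" instead of exactly double.
near_double = actual_power(100.0, r_nominal=5.0, r_added=6.0)

print(f"limiter reads 100 W, real draw with equal resistor: {equal_mod:.0f} W")
print(f"limiter reads 100 W, real draw with 6 mOhm added:   {near_double:.0f} W")
```

Under this model the firmware limit itself never changes. The GPU simply draws more before the limiter believes the limit has been reached, which is also why such a mod bypasses the cooling and power-delivery margins the limit was meant to protect.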
Breaking the VRAM wall: Saining Xie's team proposes CLM, a single RTX 4090 "leverages" 100 million Gaussian points
机器之心· 2025-11-11 08:40
Core Insights - 3D Gaussian Splatting (3DGS) is an emerging method for novel view synthesis that uses a set of posed images to iteratively train a scene representation composed of numerous anisotropic 3D Gaussians, capturing the appearance and geometry of the scene [2][4].
- The CLM system proposed by the team allows 3DGS to render large scenes using a single consumer-grade GPU, such as the RTX 4090, by addressing GPU memory limitations [6][8].

Group 1: 3DGS Overview
- 3DGS has shown revolutionary application potential in fields such as 3D modeling, digital twins, visual effects (VFX), VR/AR, and robot vision reconstruction (SLAM) [5].
- The quality of images rendered with 3DGS depends on the fidelity of the trained scene representation. Larger and more complex scenes require more Gaussians, which increases memory usage [5].

Group 2: CLM System Design
- CLM is designed around the insight that 3DGS computation is inherently sparse: only a small subset of Gaussians is accessed during each training iteration [8][20].
- The system employs a novel offloading strategy that minimizes performance overhead and scales to large scenes by dynamically loading only the necessary Gaussians into GPU memory while keeping the rest in CPU memory [8][11].

Group 3: Performance and Efficiency
- CLM can render a large scene requiring 102 million Gaussians on a single RTX 4090 while achieving top-tier reconstruction quality [8].
- Each view typically accesses only 0.39% of the Gaussian points, with a maximum of 1.06% for any single view, highlighting the sparse nature of the data [23].

Group 4: Optimization Techniques
- The team exploited several characteristics specific to 3DGS to significantly reduce the communication overhead of offloading, including pre-computing the set of Gaussians accessed by each view and using spatial locality to optimize data transfer between CPU and GPU [12][17].
- The microbatch scheduling optimization orders consecutive batches so that their access patterns overlap, improving cache hit rates and reducing redundant data transfers [24][25].

Group 5: Results and Impact
- CLM increases the trainable size of 3DGS models by up to 6.1 times compared to pure GPU training baselines, enabling larger models that improve scene reconstruction accuracy while keeping communication and offloading overhead low [27].
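The microbatch scheduling idea can be illustrated with a toy example. The sketch below is a greedy heuristic chosen for illustration, not the scheduler from the paper: each view has a precomputed set of Gaussian indices, the cache is assumed to hold only the previous view's set, and views are reordered so that each next view shares as many Gaussians as possible with the current one. The names `transfers` and `greedy_order` are hypothetical.

```python
def transfers(order, view_sets):
    """Count Gaussians fetched from CPU memory for a given view order.

    Assumption: the GPU cache holds exactly the previous view's set.
    """
    cached, total = set(), 0
    for v in order:
        total += len(view_sets[v] - cached)  # only cache misses are transferred
        cached = view_sets[v]
    return total

def greedy_order(view_sets, start=0):
    """Greedily pick the next view with the largest overlap with the current one."""
    remaining = set(range(len(view_sets))) - {start}
    order = [start]
    while remaining:
        current = view_sets[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest view index wins)
        nxt = max(sorted(remaining), key=lambda v: len(view_sets[v] & current))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy precomputed visibility sets for five camera views.
view_sets = [
    {1, 2, 3, 4},
    {7, 8, 9, 10},
    {3, 4, 5, 6},
    {9, 10, 11, 12},
    {5, 6, 7, 8},
]

naive = list(range(len(view_sets)))
ordered = greedy_order(view_sets)
print("naive order  ", naive, "transfers:", transfers(naive, view_sets))
print("greedy order ", ordered, "transfers:", transfers(ordered, view_sets))
```

In this toy case the reordering cuts transfers from 20 to 12, because every view after the first reuses half of what is already cached.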
X @vitalik.eth
vitalik.eth· 2025-10-16 01:23
RT Justin Drake (@drakefjustin) Progress toward real-time proving for Ethereum L1 is nothing short of extraordinary. In May, SP1 Hypercube proved 94% of L1 blocks in under 12 seconds using 160 RTX 4090s. Five months later Pico Prism proves 99.9% of the same blocks in under 12 seconds, with just 64 RTX 5090s. Average proving latency is now 6.9 seconds. Performance has outpaced Moore's law ever since Zcash pioneered practical SNARKs a decade ago. Today's Pico Prism results are a striking reminder of that exponen ...
Advanced Micro Devices, Inc. (AMD): A Bull Case Theory
Yahoo Finance· 2025-09-28 23:43
Core Thesis - Advanced Micro Devices, Inc. (AMD) is positioned as a strong investment opportunity due to its potential market share gains and the challenges faced by competitor Nvidia, with a target price range of $168–$187 over the next 12–18 months [2][5].

Financial Performance
- AMD reported 32% year-over-year revenue growth in Q2 2025, reaching $7.7 billion, driven by a 73% increase in gaming revenue to $1.1 billion and a 14% rise in data center revenue to $3.2 billion [3].
- Wall Street forecasts suggest a 15–20% compound annual growth rate (CAGR) for earnings per share (EPS) through 2027, despite near-term margin pressures from export controls [3].

Competitive Landscape
- Nvidia's structural GPU reliability issues, such as problems with RTX 4090 connectors, create a competitive opportunity for AMD, which is seen as a stable alternative [4].
- AMD's RX 9070 XT shows strong performance and improved power efficiency, while its open-source ROCm platform strengthens its data center positioning [4].

Market Opportunities
- AMD could capture $3.6–$6 billion in incremental revenue from potential market share gains in the $120 billion discrete GPU segment, although Nvidia's ecosystem dominance poses challenges [5].
- The company's diversified revenue streams and competitive GPU offerings support the potential for multiple expansion, despite macroeconomic risks such as Federal Reserve rate hikes [5].

Historical Context
- AMD's stock price has appreciated approximately 39% since May 2025, reflecting strong revenue growth driven by data center and Ryzen processor sales, as well as AI demand [6].