量子位
Training 100 Million Gaussian Points on a Single GPU to Reconstruct a 25 km² City: A CPU "Add-On" Breaks the 3DGS Memory Wall
量子位· 2025-12-23 04:16
Core Viewpoint - The article introduces CLM (CPU-offloaded Large-scale 3DGS training), a system that enables city-scale 3D reconstruction on a single consumer-grade GPU, the RTX 4090, by offloading memory-intensive parameters to CPU memory. This significantly lowers the hardware requirements for large-scale neural rendering [1][21].
Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a key technology in neural rendering thanks to its high-quality output and rendering speed, but it struggles with complex scenes such as urban blocks, primarily because of GPU memory limits [2].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points, and each point needs memory for its parameters, gradients, and optimizer states. Even a high-end GPU like the RTX 4090, with 24GB of memory, can handle only about 15-20 million points, which is insufficient for city-scale scenes [2][3].
Group 2: CLM Design Principles
- CLM builds on the observation that only a small fraction of Gaussian points is used in each rendering pass; in large scenes, less than 1% of points are accessed [3].
- CLM therefore loads Gaussian parameters from CPU memory dynamically, as needed, instead of keeping all parameters in GPU memory [4].
Group 3: Key Mechanisms of CLM
- **Attribute Segmentation**: CLM keeps only the "key attributes" (10 parameters) needed for visibility checks in GPU memory, while the remaining 80% of "non-key attributes" are stored in CPU memory and loaded on demand [6][7].
- **Pre-rendering Visibility Culling**: Unlike traditional methods, CLM computes the indices of visible Gaussian points before rendering and loads only those points from CPU memory, which reduces unnecessary GPU computation and memory usage [9][10].
- **Efficient CPU-GPU Collaboration**: CLM uses a multi-layered design to hide data transfer latency, including micro-batching, caching, and intelligent scheduling, to maximize efficiency and minimize communication overhead [12][13][14][15].
Group 4: Performance Results
- CLM significantly increases the trainable model size: it trains 102.2 million Gaussian points on the "MatrixCity BigCity" dataset, a 6.7-fold increase over traditional methods, which max out at 15.3 million points [16].
- Reconstruction quality improves with more parameters: the 102.2 million point model reaches a PSNR of 25.15dB, compared with 23.93dB for the smaller model [18].
- Despite the communication overhead, CLM maintains a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [19].
Group 5: Broader Implications
- CLM is a significant step toward removing deployment bottlenecks in 3DGS training. It brings CPU resources into the training process without requiring multi-GPU setups, providing a cost-effective solution for large-scale scene reconstruction [21].
- Growing demand for efficient, low-cost 3D reconstruction tools in applications such as digital twins and large-scale map reconstruction underlines the value of CLM's approach of making better use of existing computational resources [21].
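The memory argument and the key/non-key split in the CLM summary can be made concrete with a little arithmetic. The following is a minimal sketch, not CLM's actual code. It assumes the standard 3DGS attribute layout with degree-3 spherical harmonics (59 floats per Gaussian), assumes the 10 "key" parameters are position, scale, and rotation, assumes float32 storage with an Adam-style optimizer (four copies of each value), and uses a toy depth-range test as a stand-in for real frustum culling. All names (`ATTRIBUTE_FLOATS`, `training_state_gb`, `visible_indices`, `load_visible`) are hypothetical.

```python
# Assumed per-Gaussian layout (standard 3DGS with degree-3 spherical harmonics).
# These counts are an assumption, not figures quoted in the summary.
ATTRIBUTE_FLOATS = {
    "position": 3,
    "scale": 3,
    "rotation": 4,
    "opacity": 1,
    "sh_color": 48,
}
# Assumption: the 10 "key" parameters kept on the GPU for visibility checks.
KEY_ATTRIBUTES = ("position", "scale", "rotation")

BYTES_PER_FLOAT = 4    # float32
COPIES_PER_VALUE = 4   # assumption: parameter + gradient + two Adam moments


def floats_per_gaussian():
    """Total number of floats stored for one Gaussian."""
    return sum(ATTRIBUTE_FLOATS.values())


def key_fraction():
    """Share of a Gaussian's floats that must stay resident on the GPU."""
    key = sum(ATTRIBUTE_FLOATS[name] for name in KEY_ATTRIBUTES)
    return key / floats_per_gaussian()


def training_state_gb(num_gaussians):
    """Rough size of parameters, gradients, and optimizer state in GB."""
    total = num_gaussians * floats_per_gaussian() * BYTES_PER_FLOAT * COPIES_PER_VALUE
    return total / 1e9


def visible_indices(positions, near, far):
    """Toy culling: keep Gaussians whose depth z lies within [near, far]."""
    return [i for i, (_x, _y, z) in enumerate(positions) if near <= z <= far]


def load_visible(cpu_store, indices):
    """Copy only the visible Gaussians' non-key attributes out of the CPU store."""
    return {i: cpu_store[i] for i in indices}


if __name__ == "__main__":
    print(f"key attributes: {key_fraction():.1%} of each Gaussian")
    for n in (15_300_000, 102_200_000):
        print(f"{n:>11,} points -> about {training_state_gb(n):.1f} GB of training state")
```

Under these assumptions, 15.3 million points need roughly 14.4 GB of training state, while 102.2 million points need roughly 96.5 GB, far more than a 24GB card holds. Keeping only the key attributes (about 17% of each Gaussian) on the GPU and fetching the rest for visible points is what makes the larger model feasible in this picture.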
In the First Year of Agent Deployment, Agent Infra Is a Key Link | A Conversation with Tencent Cloud & Dify
量子位· 2025-12-23 04:16
Core Viewpoint - 2025 is expected to be the "Year of the Agent," marking a significant industry shift toward practical applications of Agent technology [1][2].
Group 1: Development and Challenges of Agents
- Over the course of the year, Agent technology has moved from a nascent stage to practical engineering applications [3][7].
- Key challenges in implementing Agents include the need for a robust engineering approach to managing complex systems and the importance of Agent Infrastructure (Infra) [6][21].
- The industry recognizes the value of Agents because they effectively address real-world problems, moving from theoretical discussion to tangible applications [6][12].
Group 2: Perspectives from Industry Leaders
- Industry experts point to a clear divide between the traditional Silicon Valley narrative and the practical applications seen in smaller businesses, indicating a shift toward realism in Agent development [8][10].
- AI coding tools are noted as a significant development: they are changing software engineering paradigms and serving as a universal interface for Agents [7][34].
- Experts agree that the capital market is looking for new ways of organizing, since the benefits of the previous internet era have largely been exhausted [12][13].
Group 3: Engineering and Infrastructure
- Agent Infra is crucial for managing the uncertainty inherent in Agent systems, with a focus on creating a safe and effective operating environment [21][22].
- Safety sandboxes and observability tools are essential for addressing the risks of autonomous Agent operation [22][23].
- The discussion distinguishes essential complexity from incidental complexity in enterprise problem-solving and focuses on building a common subset of solutions that applies across different challenges [27][28].
Group 4: Future Trends and Directions
- Future development of Agent Infra is expected to focus on safe and reliable operation while improving Agent intelligence through continuous use of data [38][39].
- Integrating memory management and semantic context is highlighted as a key area for enhancing Agent capabilities [40].
- The industry anticipates a significant transformation of mobile development ecosystems as Agents become mainstream, which will require changes in development methodology and collaborative practice [41][44].
量子位 Is Hiring Editors and Writers
量子位· 2025-12-23 04:16
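From the editorial team, reporting from 凹非寺
量子位 | WeChat official account QbitAI
The AI boom is still surging, but if you don't yet know how to take part... why not come to 量子位?
We are a content platform centered on tracking new developments in AI. After 8 years of accumulation, we have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning from the trends of the era.
We are currently hiring in three directions and hope that you are (or can become) a content expert in one of them:
- AI industry: follows innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
- AI finance: follows venture investment and earnings reports in the AI field and tracks capital movements across the industry chain;
- AI products: follows the progress of AI in applications and hardware devices.
All positions are full-time and based in Zhongguancun, Beijing.
The positions are open to:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time.
Positions at every level are open in all three directions, and you are welcome to apply based on your own background and experience.
Join us and you will get:
- A place at the crest of the AI wave: be among the first to encounter and understand the latest AI technologies and products, and build a complete understanding of AI.
- Hands-on use of new AI tools: apply new AI technologies and tools to your work to improve efficiency and creativity.
- Personal influence: by writing exclusive original con ...
Job details follow:
AI industry direction. Responsibilities: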
We Visited China's Top 100 Grade-A Tertiary Hospitals and Found That 40% Chose the Same AI Company
量子位· 2025-12-23 03:01
Core Viewpoint - The article discusses the challenges and opportunities in the medical AI sector, focusing on Yunzhisheng, which has established itself as a leading player by successfully integrating AI solutions into hospitals and demonstrating significant gains in operational efficiency [1][9][64].
Group 1: Medical AI Challenges
- Patients increasingly consult AI chatbots before visiting doctors, which creates communication challenges in clinical settings [2][3].
- The "hallucination rate" of general AI models in medical contexts can reach up to 40%, raising concerns about their reliability [4][5].
- Medical AI must meet stringent requirements for stability, win acceptance across different healthcare systems, and account for the high cost of medical errors [12][18][19].
Group 2: Yunzhisheng's Position
- Approximately 40% of top-tier hospitals in China have adopted Yunzhisheng's medical AI solutions, indicating a strong market presence [9][22].
- The company has deployed its solutions in 400 hospitals, and nearly 90% of generated medical records are used directly, significantly reducing the time doctors spend on documentation [22][23][25].
- Yunzhisheng's medical AI solutions are designed to fit into existing workflows, improving efficiency without adding to healthcare professionals' workload [58][60].
Group 3: Technological Advancements
- Yunzhisheng's latest model, "Shanhai·Zhimed 5.0," uses a dual-core system that can process structured information and multimodal inputs, enhancing its diagnostic capabilities [34][36].
- The model's architecture includes a three-layer data paradigm that improves its understanding of medical contexts and reduces the hallucination rate to below 3% [42].
- The company has consistently ranked at the top of medical AI evaluation platforms [45][46].
Group 4: Business Growth and Market Trends
- Yunzhisheng's medical business revenue reached 0.70 billion, a 22.3% increase year-on-year [66].
- Average revenue per medical client has more than doubled, indicating a significant increase in customer value [66].
- The broader medical AI market is expected to grow, with increasing investment and policy support aimed at integrating AI into healthcare workflows [79][82][86].
易烊千玺's Green Huawei Phone Has Really Gone AI
量子位· 2025-12-23 00:15
Core Viewpoint - The article covers the launch of Huawei's nova 15 series, highlighting its new features, design, and pricing strategy, which aims to attract a younger audience while strengthening AI capabilities in photography and communication.
Product Overview
- The nova 15 series includes three models: standard, Pro, and Ultra, all running HarmonyOS 6 [4]
- The Ultra and Pro versions feature a horizontally stacked design and a dual star lens module that resembles "two big eyes" [5]
- The Pro and Ultra versions bring the Kirin 9 series chip to the nova line for the first time, aligning performance with the Mate and Pura series [6]
- Pricing starts at 4199 yuan for the Ultra version, 3499 yuan for the Pro version, and 2699 yuan for the standard version [7][10]
AI Capabilities
- The nova 15 series adds advanced AI features, particularly in photography, with the Ultra and Pro versions featuring a dual red maple imaging system [14]
- With the red maple original color lens, color accuracy has improved by 120% and spatial resolution has increased by 100,000 times [17]
- AI assists with photo composition, and a "color dipping" feature lets users apply the colors and styles of online images to their own photos [21][22]
- The series also offers AI-driven photo editing and one-click content creation [25][28]
Communication Features
- A call summary feature automatically generates key points after a call, which can be synced to a memo [31]
- Two-way call noise reduction is optimized for noisy environments such as subways and malls [33]
- A family fraud prevention feature lets family members share risk information and help handle suspicious calls [35]
Design and Specifications
- The nova 15 series keeps a recognizable design that emphasizes youthfulness and distinctiveness [39]
- The Ultra version has a 2.5D flat screen and comes in four colors: "vibrant green," "good match purple," "zero degree white," and "fantasy night black" [40]
- Both the Ultra and Pro versions have a 6500mAh battery and support 50W wireless fast charging [42][51]
- The Ultra version is 6.8mm thick and weighs approximately 209g [43]
- The rear camera system includes three 50MP RYYB lenses, with variable aperture and optical stabilization [45]
- The series uses Kunlun glass and is rated IP68 and IP69 for dust and water resistance [47]
Performance Improvement
- According to Huawei's lab tests, overall performance of the nova 15 series has improved by 62% over the previous generation [53]
Just Before Ringing the IPO Bell, Zhipu Takes the Open-Source Coding Model SOTA Overnight
量子位· 2025-12-23 00:15
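On benchmarks such as AIME 25 and Humanity's Last Exam (HLE), GLM-4.7 scores higher than GPT-5.1; its SWE-Bench score reaches 73.8% (+5.8%), a new high for open-source models.
鱼羊 and henry, reporting from 麦蒿寺
量子位 | WeChat official account QbitAI
As 2025 counts down, the stream of new SOTA models shows no sign of slowing.
Overnight, the coding SOTA title changed hands, and the new holder was open-sourced at launch. Once again it comes from a Chinese large-model company: Zhipu AI, with GLM-4.7.
In this update, the technical report is full of Coding, Coding, and more Coding.
The most visible effect of the improved capability: the official demo shows it writing a Plants vs. Zombies game with little effort:
The chatbot on the official website and the API are both live, so you can try it online right now.
Bring on the demos
In front-end generation quality, GLM-4.7 shows a clear upgrade: page structure is cleaner and the component hierarchy is clearer.
Compared with GLM-4.6, the output looks more like a modern Web UI, and the page elements are more polished.
In short, with this release, the Christmas and New Year festive mood is suddenly in place (doge).
When expressing complex geometric structures and spatial relationships, GLM-4.7 maintains good structural consistency and stable detail.
The quality of generated 3D assets has also improved noticeably.
For slide decks and visual materials, GLM-4.7 produces a clear heading hierarchy, with elements ...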
Why Are Agents Always Dragons in Demos but Worms in Real-World Use?
量子位· 2025-12-22 09:30
Core Viewpoint - The article discusses why AI agents fall short in real-world applications compared with their impressive demonstrations, and argues that adaptability is the key to improvement [1].
Summary by Sections
Definition and Functionality of Agents
- Agents are defined as AI systems that can plan, use tools (such as search engines and databases), and remember information in order to complete complex tasks independently [3].
Adaptability Framework
- The core bottleneck in current agent systems is adaptability, specifically how models adjust their behavior based on feedback signals [6].
- A 2x2 classification framework sorts existing adaptation methods into four paradigms along two dimensions: who is optimized (the agent or the tools) and where the feedback signal comes from (tool execution results or evaluations of the agent's output) [7][8][9].
Four Paradigms of Adaptation
- **A1 Paradigm**: The agent learns from feedback on tool execution, such as whether code runs successfully [10].
- **A2 Paradigm**: The agent's final output serves as the optimization signal, as in models like DeepSeek-R1 that train reasoning capabilities through reinforcement learning [11].
- **T1 Paradigm**: Tools are pre-trained independently and then called by the agent, allowing plug-and-play use [12].
- **T2 Paradigm**: Tools are optimized based on the agent's output, creating a symbiotic relationship [13].
Benefits of Classification
- The classification helps developers avoid trial and error when improving AI capabilities, allowing targeted adaptation based on specific needs [15].
- It also clarifies the trade-offs: modifying the AI (A1/A2) is flexible but costly, while modifying the tools (T1/T2) is cheaper but limited by the AI's inherent capabilities [16].
Key Findings on Data Efficiency
- The T2 paradigm shows significantly higher data efficiency than the A2 paradigm. For instance, Search-R1, which uses A2, requires approximately 170,000 training samples, while a T2 approach needs only 2,400 samples to achieve comparable results [18][19][20].
Frontiers in Adaptability Research
- The article identifies four frontier directions for agent adaptability research:
  - **Co-Adaptation**: Agents and tools are optimized together within the same learning cycle, which raises challenges in credit assignment [21].
  - **Continual Adaptation**: Agents need to keep learning new skills in a changing environment without forgetting old ones [23].
  - **Safe Adaptation**: Further adaptation of large models may erode the safety measures established during supervised fine-tuning, leaving them more vulnerable to attacks [25].
  - **Efficient Adaptation**: For resource-constrained scenarios, the article discusses techniques such as LoRA and FlashRL for efficient learning [27].
Additional Resources
- A GitHub repository has been opened to continuously collect related papers and resources, serving as a guide for developers building agent systems [29].
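The four paradigms and the data-efficiency comparison in the summary above can be written down as a small lookup table. This is a minimal sketch under stated assumptions: the `Paradigm` record, the wording of each `signal` field, and the helper names are hypothetical and paraphrase the summary; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    name: str
    optimized: str  # what gets trained: "agent" or "tool"
    signal: str     # where the training signal comes from (paraphrased)


# Paraphrase of the four paradigms as described in the summary.
PARADIGMS = (
    Paradigm("A1", "agent", "tool execution results"),
    Paradigm("A2", "agent", "evaluation of the agent's final output"),
    Paradigm("T1", "tool", "independent pre-training, no agent feedback"),
    Paradigm("T2", "tool", "the agent's output"),
)


def paradigms_optimizing(target):
    """Return the paradigms that train the given component."""
    return [p for p in PARADIGMS if p.optimized == target]


def sample_ratio(a2_samples, t2_samples):
    """How many times more training samples the A2 setup uses than T2."""
    return a2_samples / t2_samples


if __name__ == "__main__":
    for p in paradigms_optimizing("tool"):
        print(f"{p.name}: signal = {p.signal}")
    # Figures quoted in the summary: about 170,000 samples (A2) vs 2,400 (T2).
    print(f"A2 uses about {sample_ratio(170_000, 2_400):.0f}x the samples of T2")
```

With the quoted figures, the A2 setup uses roughly 70 times as many training samples as the T2 setup, which is the gap behind the "significantly higher data efficiency" claim.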
A Silicon Valley Blackout Knocks Out Google's Robotaxis, and Musk Taunts Them to Their Face: Tesla Was Fine
量子位· 2025-12-22 09:30
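一凡, reporting from 凹非寺
量子位 | WeChat official account QbitAI
A single large-scale blackout has exposed the weak spot of the world's leading driverless-car company.
Just days after its valuation was reported to have surged past 100 billion US dollars, Waymo ground to a complete halt because of a local power outage. Its vehicles stopped in the middle of the road and caused congestion across the city, and videos of the scene went viral.
Musk piled on right away, saying that his own Robotaxi was not affected. On the face of it, the incremental L2 route that Tesla represents seems to have won a small round... at any rate, Musk sees this as a moment that shows off its superiority.
On the Robotaxi battlefield this year, Musk's every move has pushed the autonomous driving contest to a new peak. More players on both sides of the Pacific are entering the field, advancing along the "Tesla route" and competing with the "Waymo route" for the holy grail of autonomous driving.
So the question is: how did a blackout affect Waymo's Robotaxis?
Local blackout, Waymo out of service
Waymo's shutdown traces back to a fire: a San Francisco substation caught fire, causing a large-scale local outage that reportedly cut power to 130,000 residents.
Worse, because of the widespread outage, the traffic lights went dark, which brought Waymo's driverless cars to a complete standstill.
It never rains but it pours: traffic was already chaotic, and with driverless cars blocking the road it became even more congested. Waymo had to call in tow trucks overnight to haul the cars away and announced that it was suspending local service. It is still unclear when service will resume.
So why would a blackout cause Waymo to stop operating? First, there is what the official response revealed about ...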
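A Fully In-House GPU Simulation Solver × a Physical Measurement Factory for Sim-to-Real Alignment: Building a SuperApp for Embodied Synthetic Data and Accelerating the Embodied Simulation Ecosystem | 光轮智能 @ MEET2026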
量子位· 2025-12-22 08:01
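Compiled by the editorial team from MEET2026
量子位 | WeChat official account QbitAI
As AI moves from the "language world" of large-model intelligence to the "physical world" of embodied intelligence, simulation is becoming the underlying infrastructure that connects it to real-world deployment.
At this year's 量子位 MEET2026 Intelligent Future Conference, 杨海波, co-founder and president of 光轮智能, shared his observation:
The scale of embodied intelligence is far larger than that of text and vision models, because its data dimensions are more real and more complex.
This means the core of the embodied intelligence era is not the algorithm itself but whether the data it depends on is effective and scalable, and simulation is the only approach that can solve the data problem.
On the path of a simulation strategy, the industry runs into pain points such as unrealistic simulation and unreliable Sim2Real. 光轮智能 is addressing these problems with a complete, self-developed "measure, generate, solve" simulation infrastructure, providing embodied intelligence with an end-to-end solution for data, training, and evaluation.
△ 杨海波 notes that 光轮智能 is deeply engaged in the synthetic data field
杨海波 further pointed out that simulation is not an isolated technical tool; it needs to be anchored in real industry demand and to build an ecosystem through application scenarios.
Within that ecosystem, embodied simulation asset production is the wellspring: relying on automated physical measurement and generation technology, it produces standardized data assets with high physical realism, providing the core fuel for embodied training;
large-scale RL training lets agents learn efficiently by trial and error in parallel virtual scenes, turning the value of data into real embodied skills, while in turn refining the simulation ...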
The Natural Order Upended! Gemini Flash Outperforms Pro: "The Pareto Frontier Has Flipped"
量子位· 2025-12-22 08:01
Core Insights
- Gemini 3 Flash outperforms the earlier Gemini 2.5 Pro and even the flagship Gemini 3 Pro in various benchmarks, scoring 78% on SWE-Bench Verified against Gemini 3 Pro's 76.2% [1][6][9]
- Gemini 3 Flash's result on the AIME 2025 mathematics competition benchmark is notable: it scores 99.7% with code execution, indicating advanced mathematical reasoning skills [7][8]
- The article describes a shift in how flagship models are perceived, suggesting that smaller, optimized models like Flash can outperform larger ones and challenging the traditional belief that larger models are inherently better [19][20]
Benchmark Performance
- On Humanity's Last Exam, Flash scored 33.7% without tools, closely trailing Pro's 37.5% [7][8]
- Flash's results on other benchmarks include:
  - 90.4% on GPQA Diamond for scientific knowledge [8]
  - 95.2% on AIME 2025 for mathematics without tools [8]
  - 81.2% on MMMU-Pro for multimodal understanding [8]
- Flash is three times as fast as Gemini 2.5 Pro and consumes 30% fewer tokens, and it is priced at $0.50 per million input tokens and $3.00 per million output tokens [9]
Strategic Insights
- Google's team indicates that the Pro model is mainly used to distill Flash, with the focus on optimizing performance and cost [10][12][13]
- The article discusses how scaling laws are evolving, with a shift from simply increasing parameter counts to improving reasoning capabilities through advanced training techniques [15][16]
- Post-training is highlighted as a significant area for future development, with substantial room for improvement remaining in open-ended tasks [17][18]
Paradigm Shift
- The emergence of Flash has sparked debate about the validity of the "parameter supremacy" theory, since it shows that smaller, more efficient models can achieve superior performance [19][21]
- The integration of advanced reinforcement learning techniques in Flash is cited as a key factor in its success, showing that increasing model size is not the only path to greater capability [20][22]
- The article concludes with a call to reconsider blind admiration for flagship models and to take a more nuanced view of model performance [23]
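The per-token prices quoted in the Gemini 3 Flash summary translate into a simple cost formula. The sketch below uses only the two quoted prices ($0.50 per million input tokens, $3.00 per million output tokens); the function name and the example token counts are hypothetical, and real billing may include details the summary does not mention.

```python
# Prices as quoted in the summary, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00


def request_cost_usd(input_tokens, output_tokens):
    """Estimated cost of a workload from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens, 0.5 million output tokens.
    print(f"${request_cost_usd(2_000_000, 500_000):.2f}")
```

For the hypothetical workload, the input side costs $1.00 and the output side $1.50. Output tokens are six times as expensive per token as input tokens, so long generations dominate the bill.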