Hot gossip alert! NVIDIA's GPU supply is about to shrink, and the first cut falls on the RTX 50 series
量子位· 2025-12-18 02:34
Core Viewpoint
- NVIDIA plans to cut production of its GeForce RTX 50 series graphics cards by 30%-40% in the first half of 2026, prioritizing high-margin models over mid-range options [1].

Group 1: Production Cuts and Market Impact
- The cuts will primarily affect the RTX 5060 Ti 16GB and RTX 5070 Ti, models popular among mid-range gamers [6].
- With 16GB options in short supply, consumers may face a choice between lower-spec 8GB cards and higher-priced models [9].
- Anticipated increases in NAND and DRAM memory costs could push up overall prices for gaming systems, potentially discouraging consumer purchases [5].

Group 2: Supply Chain Challenges
- A memory shortage, particularly of GDDR7, is driving the production cuts, as NVIDIA cannot build cards at full capacity without sufficient memory supply [4].
- GDDR7 prices have already begun to rise; combined with reduced GPU output, this could produce a dual squeeze of shortages and price increases in the GPU market by 2026 [10].

Group 3: Competitive Landscape
- The situation has prompted consumers to discuss switching to AMD as an alternative, signaling a shift in competitive dynamics within the GPU market [11].
Two metrics for domestic AI chips: model coverage + cluster-scale capability | Baidu AI Cloud's Wang Yanpeng @ MEET2026
量子位· 2025-12-18 02:34
Core Viewpoint
- The article discusses the challenges and opportunities for domestic AI chips, particularly Baidu's Kunlun chip, in supporting large-scale training for next-generation models amid Nvidia's continued market dominance [1][5].

Group 1: Challenges in Large-Scale Training
- The evaluation of chip capability has shifted from raw computational power to the ability to stably support training of models ranging from hundreds of billions to trillions of parameters [1][5].
- The first major challenge is cluster stability: any interruption in a large-scale training system can cause significant downtime, especially in systems with thousands of GPUs [7][10].
- The second challenge is achieving linear scalability in large clusters, which requires advanced communication optimization and system-level coordination (a back-of-envelope scaling-efficiency sketch follows this summary) [10][11].
- The third challenge is the model ecosystem and precision system, where Nvidia's extensive model ecosystem provides a competitive edge in training accuracy [15][19].

Group 2: Solutions and Strategies
- To address cluster stability, the company emphasizes detailed monitoring and verification to identify potential issues preemptively [8][9].
- For scalability, the company has developed a communication strategy that bypasses CPU bottlenecks, allowing optimized task management across different workloads [14][20].
- The company is building a highly generalized operator system to ensure reliability in large-scale training, adapting to models of varying sizes and shapes [19][27].

Group 3: Current Developments and Future Directions
- The company has run large-scale training on its Kunlun chip, with models such as Qianfan-VL and Baidu Steam Engine achieving state-of-the-art performance on various tasks [28][30].
- Future work aims to support even larger clusters and more complex models on domestic chips, targeting comprehensive coverage of the major model families [27][31].
- The article highlights the importance of binding advanced self-developed models to the Kunlun chip to boost its market acceptance and performance [29].
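To make the "linear scalability" criterion concrete, here is a minimal sketch of how scaling efficiency is commonly computed for a training cluster; the throughput figures are hypothetical illustrations, not numbers from the talk.

```python
# Minimal sketch: linear-scaling efficiency of a training cluster.
# The throughput figures below are hypothetical, not Baidu's data.

def scaling_efficiency(throughput_n: float, throughput_1: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved by an n_gpus cluster.

    1.0 means perfectly linear scaling; real clusters fall below this
    as communication overhead grows with cluster size.
    """
    ideal = throughput_1 * n_gpus          # tokens/s if scaling were perfectly linear
    return throughput_n / ideal

# Example: one GPU sustains 1,000 tokens/s; 4,096 GPUs sustain 3.6M tokens/s.
eff = scaling_efficiency(throughput_n=3.6e6, throughput_1=1.0e3, n_gpus=4096)
print(f"scaling efficiency: {eff:.1%}")    # ~87.9%
```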
Xiaomi's large model "charges" into the top tier: No. 1 in open-source coding, with IQ and EQ both online
量子位· 2025-12-18 00:30
克雷西 reporting from 凹非寺 — QbitAI | WeChat official account QbitAI

Yet another domestic model has quietly moved into the first tier of open source.

This time it is neither DeepSeek nor Qwen, but MiMo-V2-Flash, the open-source model Xiaomi has just officially announced.

With a parameter count of only 309B, the model shows very high efficiency density, posting remarkable results across multiple authoritative comprehensive benchmarks.

Beyond high scores, it delivers a 2.6x inference speedup while balancing top-tier model quality with extremely low deployment cost.

At Xiaomi's just-concluded "Human x Car x Home full ecosystem" partner conference, the company positioned the model as "a new language foundation for the Agent era."

The model has also been well received overseas; one X user commented that MiMo-V2-Flash will make agents considerably more practical. Others wished for a gguf release to make it easier to adapt to the model frameworks they use.

From the technical report, we also learned the set of key techniques Xiaomi used behind MiMo-V2-Flash. Specifically:

Hiring a "dream team of private tutors" for the student model

MiMo-V2-Flash uses an MoE architecture with 309B total parameters and 256 experts; compared with giant models whose parameter counts routinely run into the trillions, and with open-source models twice its size, it punches well above its weight.

MiMo-V2-Flash adopts a dynamic activation mechanism, with the number of activated experts ...
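As a rough illustration of the dynamic expert activation described above, here is a minimal top-k MoE routing sketch in PyTorch; the sizes and k value are hypothetical, and this is not Xiaomi's actual implementation.

```python
# Minimal top-k MoE routing sketch (illustrative; not MiMo-V2-Flash's code).
# Hypothetical sizes: 8 experts, top-2 routing; real MoE layers are far larger.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (tokens, d_model)
        logits = self.router(x)                              # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)           # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                 # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                           # dispatch tokens to their experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(10, 64))   # only 2 of 8 experts run per token
```

The point of the sketch: total parameters grow with the number of experts, but compute per token grows only with k, which is how a 309B-parameter MoE can be cheap to serve.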
"Tesla's robot delivery delay is stuck on the dexterous hand, and China's dexterous hands are far ahead" | 灵心巧手 @ MEET2026
量子位· 2025-12-17 10:00
Core Viewpoint
- The dexterous hand is a core component of embodied intelligence, capable of standalone deployment in real-world scenarios without relying on humanoid robots, and represents a high-barrier, hardware-software integrated platform [7][12][13].

Group 1
- The dexterous hand is not merely an accessory to humanoid robots but the central execution platform for embodied intelligence [3][7].
- A good dexterous hand must combine high degrees of freedom, durability, cost-effectiveness, and multi-modal perception, along with solutions tailored to different scenarios [5][31].
- The global dexterous-hand market features three main technical routes: tendon-driven, rigid-link, and direct-drive transmission, and the company has solutions in all three [16][32].

Group 2
- The company argues that a truly effective dexterous hand should mimic human-hand capabilities, including high freedom of movement and the ability to handle a variety of tools [18][20].
- Dexterous-hand prices have dropped below 10,000 yuan, making them competitive with traditional two-finger grippers [23].
- The company develops both the hardware and the algorithms needed for the hand to perform a wide range of real-world tasks [24][55].

Group 3
- The company has built several dexterous-hand models, including the Linker Hand O6, which is lightweight yet exerts significant force, and the Linker Hand L20, known for speed and efficiency in industrial environments [44][46].
- The Linker Hand L30, built on a tendon-driven structure, was slated for commercialization in November 2024 and showcases advanced flexibility and responsiveness [52][53].
- The company develops key components such as tactile sensors, motors, and reducers in-house, ensuring high durability and performance [55].
Tencent restructures its large-model organization: Yao Shunyu joins, reporting to President Liu Chiping
量子位· 2025-12-17 10:00
Core Viewpoint
- Tencent has announced a significant restructuring of its AI division, notably adding Yao Shunyu, a prominent figure in the AI research community, as Chief AI Scientist [1][4][11].

Group 1: Yao Shunyu's Background and Role
- Yao Shunyu, a former OpenAI researcher and distinguished academic, has joined Tencent as Chief AI Scientist in the CEO's office, reporting directly to Tencent President Liu Chiping [2][4].
- At only 28, Yao has made substantial contributions to AI, particularly in large models and agent research, with notable works including Tree of Thoughts and ReAct [3][19].
- His recent departure from OpenAI and move to Tencent has drawn significant attention, underscoring his status as a leading talent in the AI sector [3][11].

Group 2: Organizational Changes at Tencent
- Tencent has restructured its AI organization, establishing new departments for AI Infra, AI Data, and a Data Computing Platform to strengthen its large-model development capabilities [6][8].
- The AI Infra department, led by Yao, will build the technical foundations for large-model training and inference, aiming for a competitive edge in AI infrastructure [8][10].
- The restructuring is meant to reinforce Tencent's engineering advantages and improve the efficiency of its large-model research, in line with the company's strategic goals in AI [8][12].

Group 3: Tencent's AI Product Development
- Over the past year, Tencent has launched more than 30 new models in its Hunyuan series, with Hunyuan 2.0 showing significant improvements in pre-training data and reinforcement-learning strategy [9].
- Tencent's AI product Yuanbao has rapidly gained users, becoming one of China's top AI applications, and is integrated into major platforms such as WeChat and QQ [10].
- The company is undergoing a comprehensive AI-driven efficiency transformation, with over 900 internal applications built on Hunyuan models [10][12].

Group 4: Strategic Importance of AI for Tencent
- Tencent's AI progress draws on its extensive resources, including rich scenarios, vast data, and a clear strategy, positioning the company favorably in the AI landscape [14][15].
- Recruiting top talent like Yao Shunyu signals Tencent's commitment to accelerating its AI initiatives and strengthening its capabilities in a competitive market [11][12].
The most feature-complete video generation model in the world has arrived
量子位· 2025-12-17 10:00
Core Viewpoint
- Alibaba has launched the new Tongyi Wanxiang 2.6 model, billed as the most feature-complete video generation model available, covering capabilities such as text-to-video, image generation, and audio-driven video creation [1].

Group 1: Video Generation Capabilities
- Wanxiang 2.6 introduces multi-speaker audio-driven video, along with audio-visual synchronization and multi-shot storytelling, features not available in Sora 2 [2].
- The model shows significant improvements in artistic style control, realistic portrait generation, and understanding of historical and cultural semantics in image generation [3][8].
- Its video generation capabilities include reference-based generation, subject consistency, and natural audio-visual synchronization, improving the overall user experience [11][12].

Group 2: Performance Testing
- Initial tests show Wanxiang 2.6 performing well on subject consistency and prompt understanding, achieving near 1:1 replication of the subject's appearance and accurate lip-sync [11].
- Multi-shot narrative generation is effective, with smooth transitions and coherent storytelling across shots, although some abstract actions remain challenging [17][18].
- Aesthetic quality has improved, with a cinematic feel and strong visual appeal, particularly in complex scenes such as cyberpunk cityscapes [14][24].

Group 3: Image Generation Enhancements
- Wanxiang 2.6 advances image generation in style transfer, portrait generation, and bilingual text rendering, showing a better grasp of new aesthetic styles [19][22].
- The model generated a food promotional poster with clear bilingual text and an appealing layout, indicating reliable aesthetic judgment [25][27].
- Overall performance is commendable; minor flaws remain in multi-character dialogue and complex action understanding, but the model is usable for everyday short-video creation and derivative-content tasks [28][29].
Moore Threads' algorithm stuns, taking silver at a top graphics conference! Already open-sourced
量子位· 2025-12-17 09:07
Core Viewpoint
- Moore Threads won the silver medal at the 3D Gaussian Splatting Reconstruction Challenge (3DGS Challenge) at SIGGRAPH Asia 2025, showcasing its algorithmic strength and hardware-software co-optimization in next-generation graphics rendering [1][2][13].

Group 1: 3D Gaussian Splatting Technology
- 3D Gaussian Splatting (3DGS) is a 3D scene representation and rendering technique proposed in 2023 that strikes an exceptional balance between image quality, efficiency, and resource usage [4].
- Compared with traditional Neural Radiance Fields (NeRF), 3DGS improves rendering efficiency by hundreds to thousands of times while maintaining realistic quality, adapting well to ray tracing, real-time VR/AR rendering, and multi-modal fusion [4][6].
- 3DGS is becoming a key foundational technology for embodied-AI training scenarios, supporting accurate world modeling, path planning, environmental perception, and complex task execution [7][8].

Group 2: Competition and Performance
- The 3DGS Challenge required participants to complete high-quality 3DGS reconstruction within 60 seconds from real device video sequences and SLAM point clouds, with PSNR and reconstruction speed as the evaluation metrics (a minimal PSNR sketch follows this summary) [9][10].
- Moore Threads achieved an average PSNR of 27.58 and a reconstruction time of 34 seconds, ranking third overall and significantly outperforming many teams [15][16].

Group 3: LiteGS Development
- Moore Threads developed the LiteGS foundational library to optimize the 3DGS training pipeline, cutting training time and parameter count substantially while preserving reconstruction quality [17][20].
- LiteGS achieves up to 10.8x training acceleration, reduces parameter count by more than 50%, and exceeds mainstream solutions by 0.2-0.4 dB in PSNR [20][21].
- LiteGS is fully open-sourced on GitHub to foster collaboration and the continued evolution of 3D reconstruction and rendering technology [23].

Group 4: Strategic Implications
- The result at an international graphics competition reflects Moore Threads' grasp of global technology trends and its ambition to steer the future of graphics computing [23][25].
- The company will host the first MUSA Developer Conference on December 20-21, 2025, to discuss how technologies like 3DGS can shape the future and empower fields such as embodied intelligence [25].
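For readers unfamiliar with the challenge's quality metric, here is a minimal sketch of the standard PSNR computation between a rendered frame and a ground-truth frame; the arrays are synthetic placeholders, not challenge data.

```python
# Minimal PSNR sketch for 8-bit images (illustrative; not the challenge's exact harness).
import numpy as np

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE); higher means closer to the reference."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Synthetic example: a reference frame and a noisy rendering of it.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0, 10, ref.shape), 0, 255)
print(f"PSNR: {psnr(noisy, ref):.2f} dB")         # roughly 28 dB for sigma=10 noise
```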
The evolution of large models: Words to Worlds | A conversation with SenseTime's Lin Dahua
量子位· 2025-12-17 09:07
Core Insights
- The article discusses the breakthrough of SenseTime's SenseNova-SI model, which has surpassed the Cambrian-S model in spatial intelligence capabilities [2][5][50].
- It highlights a paradigm shift in AI: away from merely scaling models and toward foundational research into multi-modal and spatial intelligence [9][20][22].

Model Performance
- SenseNova-SI achieved state-of-the-art (SOTA) results across spatial-intelligence benchmarks, outperforming both open-source and proprietary models [4][5].
- On key metrics such as spatial reasoning and hallucination suppression, SenseNova-SI scores higher than Cambrian-S [50].

Paradigm Shift in AI
- The traditional model-scaling approach is reaching its limits, necessitating a return to fundamental research [9][15][20].
- SenseTime's approach centers on a new architecture called NEO, which integrates visual and language processing at the core level, allowing a better understanding of spatial relationships [39][42].

Technological Innovations
- The NEO architecture processes visual and textual tokens in a single stream, enhancing the model's ability to understand and interact with the physical world (a minimal interleaving sketch follows this summary) [42][46].
- SenseNova-SI shows a tenfold increase in data efficiency, reaching SOTA performance with only 10% of the training data required by comparable models [49].

Industrial Application
- The article stresses making AI technologies economically viable, noting that high costs and slow processing times are barriers to widespread adoption [55][58].
- SenseTime's SekoTalk product exemplifies AI applied to real-time video generation, cutting processing time from hours to real time [64][66].

Future Directions
- The article encourages young researchers and entrepreneurs to explore fields beyond large language models, such as embodied intelligence and AI for science [68][70].
- It closes with a vision of China's potential to build AI that deeply interacts with the physical world, positioning it as a leader in this emerging landscape [72][73].
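As a loose illustration of what "processing visual and textual tokens in one stream" can mean, here is a minimal sketch that interleaves image-patch embeddings with text-token embeddings before a shared transformer; all dimensions and module names are hypothetical, and this is not SenseTime's NEO implementation.

```python
# Minimal unified-stream sketch (illustrative; not SenseTime's NEO architecture).
# Hypothetical sizes throughout: d_model=64, 16x16 patches, tiny vocabulary.
import torch
import torch.nn as nn

class UnifiedStream(nn.Module):
    def __init__(self, vocab: int = 1000, d_model: int = 64, patch: int = 16):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d_model)
        # One linear projection turns each flattened RGB patch into a "visual token".
        self.patch_proj = nn.Linear(3 * patch * patch, d_model)
        self.patch = patch
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        b, c, h, w = image.shape
        p = self.patch
        # (b, c, h, w) -> (b, n_patches, c*p*p): flatten each patch.
        patches = image.unfold(2, p, p).unfold(3, p, p)        # (b, c, h/p, w/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        vis_tokens = self.patch_proj(patches)                  # visual tokens
        txt_tokens = self.text_emb(text_ids)                   # text tokens
        # One sequence, one transformer: vision and language share attention.
        return self.encoder(torch.cat([vis_tokens, txt_tokens], dim=1))

model = UnifiedStream()
out = model(torch.randn(1, 3, 64, 64), torch.randint(0, 1000, (1, 8)))
print(out.shape)   # (1, 16 visual + 8 text tokens, 64)
```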
Helping large models "learn from their mistakes": NJUST, Baidu, and others propose a new model-memory method
量子位· 2025-12-17 09:07
Core Viewpoint
- The article discusses ViLoMem, a new method from Nanjing University of Science and Technology and Baidu that addresses large models' poor memory retention, enabling them to learn from past mistakes by separating visual and logical errors into distinct memory streams [1][5].

Group 1: ViLoMem Framework
- ViLoMem employs a dual-stream semantic memory system that stores visual errors and logical errors separately, enhancing the model's ability to learn from experience [15][16].
- The framework has two main components, memory generation and memory retrieval, which together improve performance without altering the model's parameters [18][5].

Group 2: Memory Generation
- When the model fails on a task, ViLoMem activates two branches: a visual analysis module that identifies visual errors and a logical analysis module that pinpoints logical mistakes, generating structured guidelines for each error type [19][20][21].
- Newly generated memories are similarity-matched against existing ones and either merged into more abstract rules or assigned new memory slots, preventing memory overload while abstracting general semantic patterns [22][24].

Group 3: Memory Retrieval
- Visual and logical memories are retrieved differently: visual memory uses a two-stage process of image-level similarity search followed by question-semantic filtering (a minimal sketch follows this summary) [27][28].
- Logical memory retrieval first builds an understanding of the problem before searching for relevant rules, which is more effective than simple keyword matching [29].

Group 4: Performance Improvement
- ViLoMem delivers significant gains across six multimodal reasoning benchmarks, most notably in math tasks, e.g. +6.48 for GPT-4.1 on MathVision [2][31].
- Smaller models benefit even more: Qwen3-VL-8B gains +4.38 on MMMU [31].

Group 5: Cross-Model Memory Transfer
- In one experiment, smaller models achieved better scores by using memories generated by larger models, a kind of "free knowledge distillation" [34][36].
- This suggests that experience from stronger models can directly boost weaker models without fine-tuning [36].
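Here is a loose sketch of the two-stage visual-memory retrieval idea described above (image similarity first, then question-semantic filtering); the embedding functions and memory schema are stand-ins, and this is not the ViLoMem codebase.

```python
# Two-stage retrieval sketch (illustrative; not the ViLoMem implementation).
# The embeddings below are random stand-ins for any real image/text encoder.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_visual_memories(query_img_emb, query_q_emb, memories,
                             top_m: int = 20, top_k: int = 3):
    """Stage 1: keep the top_m memories whose stored image embedding is
    closest to the query image. Stage 2: rerank those by question-semantic
    similarity and return the top_k stored guidelines."""
    stage1 = sorted(memories,
                    key=lambda m: cosine(query_img_emb, m["img_emb"]),
                    reverse=True)[:top_m]
    stage2 = sorted(stage1,
                    key=lambda m: cosine(query_q_emb, m["q_emb"]),
                    reverse=True)[:top_k]
    return [m["guideline"] for m in stage2]

# Toy usage with random embeddings standing in for a real encoder.
rng = np.random.default_rng(1)
memories = [{"img_emb": rng.normal(size=128), "q_emb": rng.normal(size=128),
             "guideline": f"rule-{i}"} for i in range(100)]
print(retrieve_visual_memories(rng.normal(size=128), rng.normal(size=128), memories))
```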
Mining motion cues from attention: unlocking 4D scene reconstruction without training
量子位· 2025-12-17 09:07
Contributed by the VGGT4D team — QbitAI | WeChat official account QbitAI

How can 3D foundation models trained on static scenes gain the ability to handle dynamic 4D scenes, without any additional training cost?

A research team from the Hong Kong University of Science and Technology (Guangzhou) and Horizon Robotics proposes VGGT4D. By analyzing the internal mechanisms of the Visual Geometry Transformer (VGGT), the work discovers and exploits motion cues hidden in its attention layers.

The core idea of VGGT4D: can 4D perception be mined directly from a pretrained 3D foundation model, with no extra training?

As a training-free framework, VGGT4D achieves strong performance on dynamic-object segmentation, camera pose estimation, and long-sequence 4D reconstruction.

The challenge of moving from 3D to 4D

In recent years, 3D foundation models such as VGGT and DUSt3R have excelled at static scene reconstruction. However, when facing dynamic 4D scenes containing moving objects (such as pedestrians and vehicles), their performance often degrades significantly. The motion of dynamic objects not only disrupts background geometry modeling but also causes severe camera pose drift.

Existing solutions typically face two kinds of challenges: High computational or training cost: they rely on heavy test-time ...
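As a rough sketch of the general idea of turning attention statistics into a dynamic-object mask (tokens whose cross-frame attention behaves inconsistently are flagged as moving), here is an illustrative snippet; the tensors and threshold are hypothetical, and this is not the VGGT4D algorithm.

```python
# Illustrative sketch: deriving a motion mask from attention statistics
# (hypothetical tensors and threshold; not VGGT4D's actual procedure).
import torch

def motion_mask_from_attention(attn: torch.Tensor, thresh: float = 0.6) -> torch.Tensor:
    """attn: (frames, heads, H*W) attention each spatial token receives per frame.

    Intuition: tokens on moving objects attend inconsistently across frames,
    so high variance over time flags them as dynamic.
    """
    per_token_var = attn.var(dim=0).mean(dim=0)        # variance over frames, averaged over heads
    norm = (per_token_var - per_token_var.min()) / \
           (per_token_var.max() - per_token_var.min() + 1e-8)
    return norm > thresh                               # boolean mask of "dynamic" tokens

attn = torch.rand(8, 4, 32 * 32)                       # 8 frames, 4 heads, 32x32 token grid
mask = motion_mask_from_attention(attn)
print(mask.sum().item(), "tokens flagged as dynamic")
```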