Tsinghua and NVIDIA build a new distillation paradigm for diffusion models: video generation 50x faster, 4-step output without clipping artifacts
QbitAI · 2025-10-22 09:12
Core Insights
- The article introduces rCM, a new distillation paradigm that accelerates video generation by up to 50x while preserving the quality and diversity of the generated content [4][20][33]

Group 1: Introduction of rCM
- rCM is a large-scale diffusion-model distillation paradigm developed by Tsinghua University and NVIDIA that extends continuous-time consistency distillation to billion-parameter models [5][9]
- The method addresses bottlenecks in existing approaches, particularly in real-world applications involving large-scale text-to-image and text-to-video models [3][9]

Group 2: Technical Innovations
- The rCM framework jointly optimizes forward and reverse divergences, improving inference speed while ensuring high-quality, diverse generations [4][11]
- Using custom FlashAttention-2 JVP CUDA kernels and compatible distributed-training strategies, rCM applies continuous-time consistency distillation to leading models such as Cosmos and Wan2.1 [13][18]

Group 3: Performance Metrics
- rCM performs strongly across large-scale text-to-image and text-to-video tasks, compressing sampling from hundreds of steps down to 1-4 steps for a 15-50x speedup [20][21]
- In evaluations, rCM matches or even surpasses teacher models that require hundreds of sampling steps [21][25]

Group 4: Quality and Diversity
- By incorporating the reverse divergence as a regularization term, rCM remedies the quality shortcomings of earlier distilled models while maintaining high diversity [19][22]
- Compared with prior state-of-the-art distillation methods, rCM produces markedly more diverse video content, effectively avoiding "mode collapse" [25][31]

Group 5: Future Applications
- rCM is expected to be widely applied in NVIDIA's Cosmos series of world models, indicating potential for broader industry adoption [34]
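The 1-4 step sampling described above rests on the consistency-model idea: a distilled network maps a noisy sample directly to a clean estimate, so generation needs only a handful of network calls with re-noising in between. A minimal NumPy sketch of multistep consistency sampling (the toy model `toy_f`, the noise schedule, and all names here are illustrative assumptions, not rCM's actual implementation):

```python
import numpy as np

def multistep_consistency_sample(f, shape, sigmas, rng):
    """Few-step sampling with a consistency model.

    f(x, sigma) -> clean estimate; sigmas is a decreasing noise schedule,
    so len(sigmas) network calls produce the final sample (e.g. 4 steps).
    """
    x = rng.standard_normal(shape) * sigmas[0]   # start from pure noise
    for i, sigma in enumerate(sigmas):
        x0 = f(x, sigma)                         # one network call
        if i + 1 < len(sigmas):
            # re-noise the clean estimate to the next (smaller) noise level
            x = x0 + sigmas[i + 1] * rng.standard_normal(shape)
        else:
            x = x0                               # final step: keep the estimate
    return x

# Toy "model": shrinks the input toward zero, more aggressively at high noise.
toy_f = lambda x, sigma: x / (1.0 + sigma)

rng = np.random.default_rng(0)
sample = multistep_consistency_sample(toy_f, (2, 3), [80.0, 10.0, 1.0, 0.1], rng)
print(sample.shape)  # (2, 3)
```

The key contrast with ordinary diffusion sampling is that each loop iteration is a full jump to a clean estimate rather than a small denoising step, which is what makes 4-step generation possible.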
KTransformers accepted at a top computer-systems conference and partnering with mainstream frameworks: Approaching.AI (趋境科技) and Tsinghua make "heterogeneous" the new inference paradigm
QbitAI · 2025-10-22 09:12
Core Insights
- KTransformers, an open-source project developed by Approaching.AI (趋境科技) and Tsinghua University's KVCache.AI team, focuses on system-level innovation in the inference phase of large models, enabling efficient operation on diverse hardware architectures at lower computational cost [2][4]

Group 1: KTransformers Overview
- KTransformers is a high-performance heterogeneous inference framework that makes full use of varied computing resources such as GPUs, CPUs, and memory [2]
- The project's paper was accepted at the prestigious SOSP 2025 conference, underscoring its significance in computer systems [2][4]

Group 2: Technical Innovations
- The framework introduces an "Expert Deferral" mechanism for efficiently scheduling experts in Mixture-of-Experts (MoE) models, reducing computational load without sacrificing model performance [7][13]
- KTransformers achieves nearly 4x speedup on a single Intel Xeon processor over a conventional PyTorch implementation, substantially improving CPU performance on expert computations [12]
- Dynamic overlapping of CPU and GPU workloads raises model throughput by roughly 1.45x with minimal impact on accuracy [15][16]

Group 3: Collaboration and Ecosystem
- KTransformers has partnered with SGLang, a mainstream inference framework, to integrate full-GPU inference with heterogeneous inference, strengthening the overall architecture for large-model deployment [5][19]
- The collaboration gives developers seamless access to both full-GPU and heterogeneous inference, which is especially valuable when GPU resources are limited [21]

Group 4: Market Position and Future Directions
- KTransformers has gained significant traction among developers, with over 15.2K stars on GitHub, indicating wide adoption as a foundational framework for large-model inference [24]
- The project aims to democratize AI capabilities beyond elite compute paths and is actively collaborating with domestic CPU and GPU platforms to promote cost-effective solutions [28][29]
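The "Expert Deferral" idea, keeping the GPU busy by letting a few low-weight expert contributions arrive late, can be illustrated with a toy router. This is a hedged sketch of the scheduling concept only; the function name, the `defer` count, and the immediate/deferred split are my assumptions, not KTransformers' actual API:

```python
import numpy as np

def route_with_deferral(scores, k, defer=1):
    """Top-k MoE routing where the `defer` lowest-scoring selected experts
    are marked deferred: their output can be computed later (e.g. on CPU)
    and merged back, overlapping with the next stage's GPU work.

    scores: (tokens, experts) routing logits.
    Returns (immediate, deferred) expert-index arrays per token.
    """
    order = np.argsort(scores, axis=-1)   # experts sorted by ascending score
    topk = order[:, -k:]                  # the k selected experts per token
    deferred = topk[:, :defer]            # lowest-scoring of the selected k
    immediate = topk[:, defer:]           # computed right away
    return immediate, deferred

scores = np.array([[0.1, 0.5, 0.2, 0.9]])   # one token, four experts
imm, dfr = route_with_deferral(scores, k=2, defer=1)
print(imm, dfr)  # [[3]] [[1]]
```

The point of the split is latency hiding: the deferred expert's contribution is small by construction (it had the lowest routing weight), so computing it asynchronously changes the output only slightly while letting CPU and GPU work overlap.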
Registration is open for the Annual AI Rankings! Five awards seeking the pioneering forces of the AI+ era
QbitAI · 2025-10-22 09:12
Organizing Committee, from Aofeisi | QbitAI (WeChat official account: QbitAI)

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers, we are officially opening registration for the 2025 Annual AI Rankings. The selection spans three dimensions, companies, products, and people, with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.

Company list · Product list · People list
2025 AI Focus Figure of the Year
Detailed criteria and registration instructions follow.
Focused on the innovative entrepreneurial forces in China's AI field, this award will select the AI startups with the greatest investment value and growth potential. Eligibility: Criteria: 2025 AI Leading Company of the Year will be open to China's AI sector, selecting the companies with the strongest overall capabilities. Eligibility:
2025 AI Leading Company of the Year
2025 AI Promising Startup of the Year
2025 AI Outstanding Product of the Year
2025 AI Outstanding Solution of the Year
1. Registered in China, or with a main business primarily serving the Chinese market;
2. Main business in AI or related industries, or AI applied extensively in the main business, with a leading position in its segment;
Criteria: 2025 AI Promising Startup of the Year
3. Mature products or services with real customer adoption and market recognition;
4. In the past year, in technology ...
Tencent open-sources Hunyuan World Model 1.1: a video becomes a 3D world in seconds, with single-GPU inference in just 1 second
QbitAI · 2025-10-22 09:12
Core Viewpoint
- Tencent has released and open-sourced Hunyuan World Model 1.1, a unified end-to-end 3D reconstruction model that generates 3D worlds from multiple views or videos with high precision and efficiency [1][3][16]

Group 1: Model Features
- Hunyuan World Model 1.1 is billed as the industry's first unified feedforward 3D reconstruction model, handling varied input modalities and producing multiple outputs simultaneously with state-of-the-art (SOTA) performance [4][18][21]
- The model supports flexible inputs, optionally incorporating camera poses, intrinsic parameters, and depth maps to improve reconstruction quality [18][20]
- It deploys on a single GPU with one-second inference, far faster than traditional methods that can take minutes or hours [22][24]

Group 2: Performance Comparison
- Against Meta's MapAnything and AnySplat, Hunyuan World Model 1.1 showed superior surface smoothness and scene regularity in 3D point-cloud reconstruction tasks [11][12][14]
- The model excels in both geometric accuracy and detail restoration, yielding more stable and realistic scene reconstructions than its competitors [14][15]

Group 3: User Accessibility
- The model is fully open-sourced: developers can clone it from GitHub and deploy it locally, while ordinary users can generate 3D scenes online from uploaded images or videos [34][37]
- The technology aims to democratize 3D reconstruction, letting anyone create professional-level 3D scenes in seconds [37]
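Feedforward 3D reconstruction of this kind ultimately produces point clouds; when depth maps and camera intrinsics are available as inputs, each pixel back-projects to a 3D point via the standard pinhole relation. A small NumPy sketch of that unprojection step (this is generic computer-vision math, not Hunyuan's code; the sample intrinsics are invented):

```python
import numpy as np

def unproject_depth(depth, K):
    """Back-project a depth map of shape (H, W) into camera-space points
    of shape (H, W, 3) using a 3x3 pinhole intrinsics matrix K."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # per-pixel coordinates
    x = (u - cx) * depth / fx                       # pinhole: x = (u - cx) z / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Hypothetical intrinsics for a 640x480 camera; a flat scene at depth 2 m.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
pts = unproject_depth(np.full((480, 640), 2.0), K)
print(pts.shape)  # (480, 640, 3)
```

The pixel at the principal point (u=320, v=240) maps to (0, 0, 2), i.e. straight along the optical axis, which is a quick sanity check for any unprojection routine.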
A world first: high-performance humanoid robots that run and jump enter the 10,000-yuan price era
QbitAI · 2025-10-22 09:12
Core Viewpoint
- The article covers the launch of Bumi, billed as the world's first high-performance humanoid robot priced under 10,000 yuan, aimed at the consumer market and bringing advanced robotics into ordinary households [3][9]

Group 1: Product Features
- Bumi is a humanoid robot that can walk, jump, and interact, designed as a programming teacher and play companion for children [12][13]
- The robot weighs 12 kg and stands under one meter tall, making it easy to handle [6]
- Its capabilities include stable walking and dancing, showcasing advanced motion-control technology [19][21]

Group 2: Educational Value
- Bumi lets children learn programming through a drag-and-drop interface, designing sequences of actions for the robot to perform [24][26]
- This interactive learning experience is positioned as a more engaging alternative to traditional extracurricular classes [27]

Group 3: Company Background
- Songyan Power, the company behind Bumi, was founded less than two years ago and has a young team composed largely of Tsinghua University graduates [28][30]
- Founder Jiang Zheyuan has a notable academic background, having progressed through Tsinghua-affiliated schools from kindergarten all the way to doctoral studies [30][32]
- Songyan Power has previously shipped several advanced robotic products, demonstrating a strong technical foundation [33][34]

Group 4: Market Position and Future Outlook
- Bumi marks a significant step toward making humanoid robots part of everyday life, shifting the perception of robotics from futuristic concept to practical household item [49][50]
- The company has completed multiple financing rounds, positioning itself among the leaders in robotics commercialization [45]
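The drag-and-drop programming described above boils down to composing named actions into a sequence the robot executes in order. A toy interpreter for such sequences (the action names and registry are invented for illustration; Bumi's actual programming interface is not detailed in the article):

```python
def run_sequence(sequence, registry):
    """Execute a child-assembled list of action names via a registry of
    callables; unknown actions are logged and skipped, much as a block-based
    editor would refuse to place an unknown block."""
    log = []
    for name in sequence:
        action = registry.get(name)
        log.append(action() if action else f"skip:{name}")
    return log

# Hypothetical action registry for a small humanoid.
registry = {
    "wave": lambda: "waved",
    "step_forward": lambda: "stepped",
    "dance": lambda: "danced",
}

print(run_sequence(["wave", "step_forward", "dance"], registry))
# ['waved', 'stepped', 'danced']
```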
A rundown of all the ICCV awards: congratulations to Jun-Yan Zhu's team on Best Paper
QbitAI · 2025-10-22 05:48
Core Points
- The ICCV 2025 conference in Hawaii highlighted significant contributions from Chinese researchers, who accounted for 50% of paper submissions [1]
- Prestigious awards were announced across the board, showcasing advances in computer vision research [3]

Award Highlights
- Best Paper Award (Marr Prize): "Generating Physically Stable and Buildable Brick Structures from Text" introduced BRICKGPT, a model that generates stable brick structures from text prompts, built on a dataset of over 47,000 structures [4][24][26]
- Best Student Paper Award: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models" proposed an inversion-free image-editing method that achieves state-of-the-art results [6][39][40]
- Best Paper Honorable Mention: "Spatially-Varying Autofocus" developed a technique for dynamic depth adjustment in imaging, improving focus clarity across a scene [7][42][44]
- Best Student Paper Honorable Mention: "RayZer: A Self-supervised Large View Synthesis Model" demonstrated 3D perception from uncalibrated images [9][47][49]

Special Awards
- Helmholtz Prize: awarded to "Fast R-CNN" for efficient object detection that significantly improved training and testing speeds [10][52][54]
- A second Helmholtz Prize recognized work on rectified activation functions that surpassed human-level accuracy on ImageNet [10][59][60]
- PAMI Mark Everingham Prize: recognized teams for their contributions to 3D modeling and visual question answering [12][63][68]
- Distinguished Researcher Award: David Forsyth and Michal Irani were honored for their impactful work in computer vision [14][73][76]
- Azriel Rosenfeld Lifetime Achievement Award: Rama Chellappa was recognized for his extensive contributions to the field [16][79]

Research Contributions
- BRICKGPT generates physically stable structures, drawing on a large dataset and novel stability mechanisms [24][26]
- FlowEdit's approach enables seamless image editing across different model architectures, increasing flexibility in applications [39][40]
- The spatially-varying autofocus technique improves image clarity by dynamically adjusting focus to scene depth [42][44]
- RayZer's self-supervised learning enables 3D scene reconstruction without calibrated camera data [47][49]

Conclusion
- ICCV 2025 showcased groundbreaking research and innovation in computer vision, with particularly notable contributions from Chinese researchers [1][3]
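BRICKGPT's core constraint, that every generated structure must be physically buildable, can be approximated by a much simpler rule: a brick is supported if it sits on the ground or horizontally overlaps some brick one layer below. A toy version of that idea follows (BRICKGPT itself uses proper physics-based stability analysis; this axis-aligned overlap test is only an illustrative stand-in):

```python
def is_buildable(bricks):
    """bricks: list of (x, y, z, w, d) axis-aligned, unit-height bricks.
    Returns True if every brick rests on the ground (z == 0) or horizontally
    overlaps at least one brick exactly one layer below it."""
    by_layer = {}
    for x, y, z, w, d in bricks:
        by_layer.setdefault(z, []).append((x, y, w, d))

    def overlaps(a, b):
        ax, ay, aw, ad = a
        bx, by, bw, bd = b
        # open-interval overlap in both horizontal axes
        return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad

    return all(
        z == 0 or any(overlaps((x, y, w, d), o) for o in by_layer.get(z - 1, []))
        for x, y, z, w, d in bricks
    )

tower = [(0, 0, 0, 2, 2), (1, 0, 1, 2, 2)]     # offset but overlapping: OK
floating = [(0, 0, 0, 2, 2), (5, 5, 1, 2, 2)]  # second brick has no support
print(is_buildable(tower), is_buildable(floating))  # True False
```

A generator can use such a check as a rejection or repair signal during decoding, which is the general pattern behind constraining text-to-structure models to physically valid outputs.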
Qwen Deep Research upgraded overnight: it can generate web pages and audio podcasts, and the new model can read doctors' handwriting
QbitAI · 2025-10-22 05:48
Core Insights
- The article covers upgrades to Qwen's deep-research capabilities, which now add auditory and visual outputs, enabling the generation of web pages and audio content [1][2]

Group 1: New Features and Functionalities
- Qwen Deep Research can now convert lengthy text into audio podcasts, making information easier to consume in fragmented spare time [3]
- Compared with the previously popular NotebookLM, the tool removes the need for users to supply source content to the AI, streamlining input [4]
- The latest visual-language model, Qwen3-VL, can even recognize hard-to-read handwritten notes, a significant improvement in model capability [7]

Group 2: User Interaction and Experience
- With deep research enabled, the system defaults to the most powerful Qwen3-Max model and first confirms the user's specific intent before proceeding [9][10]
- A full run takes roughly six minutes and produces a conventional AI text response plus a downloadable PDF [12][15]

Group 3: Performance Metrics and Comparisons
- The Qwen3-VL series now spans versions from 2 billion to 32 billion parameters, which the team says is the final update for this series [28][29]
- In evaluations, the 32B version surpasses the previous Qwen2.5-VL 72B and competes favorably with closed-source offerings from OpenAI and Anthropic [30]

Group 4: Deployment and Accessibility
- Users can generate a clean, attractive web page with dynamic effects, including a day/night mode, to present AI-generated research results [19][20]
- A deployment feature lets users publish their web content publicly or privately, offering flexibility in sharing [22]
Chinese mathematicians land another paper in one of the top four mathematics journals, a first for Lanzhou University: breaking the "smoothness" restriction on the Stokes equations
QbitAI · 2025-10-22 05:48
Core Viewpoint
- The article highlights the achievement of professors Geng Jun (Lanzhou University) and Shen Zhongwei (Westlake University), whose paper has been accepted by Inventiones Mathematicae, one of the top four mathematics journals, a milestone for Lanzhou University in the field [2][6]

Group 1: Research Focus
- The research centers on the Stokes equations, a cornerstone of fluid mechanics, specifically L∞ (maximum-norm) a priori estimates for the Stokes operator on non-smooth regions [3][4]
- The study addresses fluid motion in domains with irregular boundaries, such as natural river channels, rather than smooth pipes [4][5]

Group 2: Key Breakthroughs
- The paper establishes that in dimensions three and higher with C¹ boundaries, and in two dimensions with Lipschitz boundaries, the maximum fluid velocity can be bounded in terms of the maximum external force [11]
- It introduces a novel large-scale averaging approach to control the pressure term, enabling maximum-velocity estimates on bounded regions [12]

Group 3: Theoretical and Practical Implications
- The work fills a critical gap in the theory of the Stokes equations on non-smooth regions, clarifying the roles of C¹ and Lipschitz boundaries and strengthening the mathematical analysis framework of fluid mechanics [13]
- Practically, the findings give engineers more accurate computational tools for real-world flows, improving the precision of velocity and pressure estimates under non-smooth boundary conditions [14]

Group 4: Authors' Background
- Geng Jun and Shen Zhongwei are established mathematicians with extensive academic records who had collaborated on influential papers before this achievement [15][20]
- Geng Jun, a professor at Lanzhou University, specializes in harmonic analysis and partial differential equations; Shen Zhongwei, who recently returned to China, has a distinguished career in mathematics education and research [16][22][23]
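For reference, the Stokes system in question and a schematic form of the bound described above (the precise hypotheses, norms, and constants are in the paper; this only captures the shape of the estimate as the summary states it):

```latex
% Stationary Stokes system with no-slip boundary condition on a domain Omega
-\Delta u + \nabla p = f, \qquad \nabla \cdot u = 0 \quad \text{in } \Omega,
\qquad u = 0 \quad \text{on } \partial\Omega.

% Schematic maximum-norm estimate: peak velocity controlled by peak force,
% for C^1 boundaries in dimension d >= 3 and Lipschitz boundaries for d = 2
\|u\|_{L^\infty(\Omega)} \le C \, \|f\|_{L^\infty(\Omega)}.
```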
OpenAI releases its first ChatGPT browser, free to download and use now
QbitAI · 2025-10-21 23:50
Core Viewpoint
- OpenAI has launched ChatGPT Atlas, an AI-native browser that integrates ChatGPT's capabilities directly into the browsing experience, aiming to redefine how users interact with the web and search for information [1][7][11]

Group 1: Features of ChatGPT Atlas
- Every tab in Atlas embeds ChatGPT for direct conversation, so users can ask questions about the current page without switching tabs or copy-pasting [12][14]
- A context-aware assistant tailors responses to the content being viewed, enhancing user interaction [14]
- A memory feature lets ChatGPT retain key information from previous browsing sessions, so users can retrieve relevant details without re-explaining context [15][17]
- The "Cursor Chat" function lets users select text and have ChatGPT edit or rewrite it, speeding up tasks such as email replies and report editing [18]
- Agent Mode lets ChatGPT carry out multi-step tasks on the user's behalf, such as research, form filling, and making reservations, streamlining the browsing experience [20][22]

Group 2: Strategic Intent and Market Positioning
- The launch is widely read as a direct challenge to Google, especially ahead of the anticipated release of Gemini 3, which may reshape browser functionality [32][33]
- OpenAI aims to establish a new traffic entry point and redefine search and advertising, moving from keyword-based search to a conversational interface [34][35]
- A subscription model for the Agent features signals a new business model centered on browser-agent integration, potentially aligning with existing app ecosystems [36][38]

Group 3: Industry Implications
- ChatGPT Atlas marks a shift in the browser's role from simple web navigation to a platform for intelligent assistance and task automation [38][39]
- The evolution of AI from passive recommendation to active task execution is a significant trend, with implications for sectors such as e-commerce, travel, and financial services [39][40]
Alibaba Cloud's secret weapon debuts at a top conference: cutting NVIDIA GPU usage by 82%, with 213 GPUs doing the work of 1,192
QbitAI · 2025-10-21 23:50
Core Viewpoint
- Alibaba Cloud has introduced Aegaeon, a GPU pooling system that cuts the demand for NVIDIA GPUs by 82% through innovative resource-allocation techniques [1][3]

Group 1: Research Background
- The research was conducted in collaboration with Peking University, led by Alibaba Cloud CTO Zhou Jingren [2]
- The study found that 17.7% of GPU resources were allocated to underutilized models that accounted for only 1.35% of total request volume [4]

Group 2: Aegaeon's Innovations
- Aegaeon tackles inefficient GPU allocation with token-level auto-scaling, switching models dynamically between token-generation steps rather than waiting for entire requests to complete [10][11]
- Through optimizations including an 80% cut in initialization overhead and improved memory management, the system reduces auto-scaling overhead by 97% [14][15]

Group 3: Performance Outcomes
- Aegaeon delivers speedups of 1.5x to 9x over existing systems such as ServerlessLLM and MuxServe [18]
- In production deployment, Aegaeon served 47 models of varying sizes, raising GPU utilization from 13.3%-33.9% to 48.1% with no service-level-objective violations or interruptions [20]
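The difference between request-level and token-level scheduling can be seen in a toy simulator: instead of pinning a GPU to one model until a request finishes, the scheduler hands out one token per tick and may swap the loaded model between ticks. A hedged sketch of that idea (the request format, round-robin policy, and swap accounting are my assumptions, not Aegaeon's actual design):

```python
from collections import deque

def token_level_schedule(requests):
    """Round-robin one token per request per tick, swapping the loaded
    model between token steps whenever the next request needs a different
    model. Returns the (request_id, model) timeline and the swap count."""
    queue = deque(dict(r) for r in requests)  # copies: {'id','model','remaining'}
    timeline, loaded, swaps = [], None, 0
    while queue:
        req = queue.popleft()
        if req["model"] != loaded:            # model switch between token steps
            loaded = req["model"]
            swaps += 1
        timeline.append((req["id"], loaded))  # emit one token for this request
        req["remaining"] -= 1
        if req["remaining"] > 0:
            queue.append(req)                 # request continues next tick
    return timeline, swaps

reqs = [
    {"id": "A", "model": "m1", "remaining": 2},
    {"id": "B", "model": "m2", "remaining": 1},
]
timeline, swaps = token_level_schedule(reqs)
print(len(timeline), swaps)  # 3 tokens emitted, 3 model swaps
```

The sketch makes the engineering trade-off visible: token-level switching keeps every request progressing (no head-of-line blocking by long requests), but only pays off because Aegaeon drives the per-swap cost down by 97%.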