量子位
Three undergraduates take the lead in Kaiming He's group! Continued focus on Flow models breaks the efficiency bottleneck of normalizing-flow generation
量子位· 2025-12-15 04:04
Yuyang, from Aofeisi | QbitAI (WeChat official account: QbitAI)

New work from Kaiming He's team continues its focus on Flow models. The paper proposes a new framework called Bidirectional Normalizing Flow (BiFlow), which decouples the forward process (mapping data to noise) from the reverse process (turning noise back into images), breaking through the long-standing inefficiency of normalizing-flow generative models. Notably, the paper's three co-first authors are undergraduates from Tsinghua's Yao Class and MIT.

BiFlow: the reverse process need not be the exact inverse of the forward process

Normalizing flows (NFs) have become a principled framework for generative modeling. A standard normalizing flow consists of a forward process and a reverse process. Unlike MeanFlow's optimization of flow matching, this work mainly targets the limitations of normalizing flows as generative models. The forward process maps data to noise; the reverse process generates samples by inverting the forward process.

Traditional NF models carry a hard constraint: the reverse process must be the exact inverse of the forward process, matching it perfectly like a key and its lock. This leads to two problems. BiFlow's core innovation is to break the rule that "the reverse process must be the exact inverse of the forward process." The design idea is as follows: BiFlow decouples the design of the forward process from that of the reverse process. Constrained model design: to guarantee invertibility, many powerful general-purpose architectures (such as vision Transformers) cannot be used, and ...
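The decoupling described above can be illustrated with a minimal toy sketch (my own illustration, not the paper's architecture): a classical NF must use the exact analytic inverse of its forward map, while a BiFlow-style design trains a separate reverse model that only approximates the inverse, freeing its architecture from the invertibility constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classical NF: the reverse pass must be the exact analytic inverse.
a, b = 2.0, 0.5                          # forward map f(x) = a*x + b
forward = lambda x: a * x + b
exact_inverse = lambda z: (z - b) / a    # locked to f, like key and lock

# BiFlow-style idea (toy): fit a *separate* reverse model g(z) = c*z + d
# that only approximates f^{-1}; g's design is no longer tied to f's.
x = rng.normal(size=1000)
z = forward(x)
c, d = np.polyfit(z, x, deg=1)           # least-squares fit of z -> x
g = lambda z: c * z + d

x_rec = g(z)
print(np.max(np.abs(x_rec - x)))         # tiny reconstruction error
```

Here the affine map makes the fit trivially exact; the point is only that the reverse model is a separate, independently parameterized function.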
Quietly topping the world's hardest SQL leaderboard for over two months, this homegrown Chinese AI now opts for a high-profile open-source release!
量子位· 2025-12-14 07:12
Core Viewpoint - Ant Group's AI division, Ant Financial Technology, has made significant strides in the AI data analysis field, recently achieving top rankings in global SQL benchmarks and announcing the open-source release of its Agentar-SQL series, which includes comprehensive frameworks for real-time text-to-SQL conversion and other data capabilities [2][4][5]. Group 1: Achievements and Innovations - Ant Group's Agentar-Scale-SQL achieved a dual first-place ranking in the BIRD benchmark with an execution accuracy of 81.67% and execution efficiency of 77% [5]. - The average query accuracy of Ant Group's Agentar SQL tools exceeded 92% during a trial with a major city commercial bank, representing over a threefold improvement compared to traditional query methods [7]. - Ant Group's AI solutions have been adopted by 100% of state-owned commercial banks and over 60% of local commercial banks in China, indicating a strong market presence [18]. Group 2: Strategic Focus and Market Approach - Ant Group's CEO emphasized that the true value of AI lies in its ability to address real-world industry challenges rather than just technological advancement [9]. - The company has adopted a unique "pay-for-performance" model, reducing the barriers for small and medium-sized institutions to implement AI by allowing them to pay based on tangible business outcomes [42][43]. - Ant Group has established deep partnerships with 300 collaborators, serving over 13,000 end customers, and has upgraded its "Xinglan Plan" to enhance partner capabilities across various dimensions [45][47]. Group 3: Broader Applications and Future Directions - The AI methodologies developed in the financial sector are being adapted for broader applications, such as in public transportation and energy sectors, showcasing the versatility of Ant Group's AI capabilities [27][30][37]. 
- Ant Group's AI solutions have gained international recognition, serving over a hundred overseas financial institutions and being selected for the Hong Kong Monetary Authority's generative AI sandbox project [48][49]. - The company is positioned as a leader in the AI industry, with its technology being recognized for its robustness and applicability in various sectors beyond finance [20].
QbitAI is hiring editors and writers
量子位· 2025-12-14 07:12
Core Viewpoint - The article emphasizes the ongoing AI boom and invites individuals to join the company "Quantum Bit," which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1]. Group 1: Job Opportunities - The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4]. - Positions are open for various levels, including editors, lead writers, and chief editors, with a focus on matching roles to individual capabilities [6]. Group 2: Job Responsibilities - **AI Industry Direction**: Responsibilities include tracking innovations in infrastructure, such as chips, AI infrastructure, and cloud computing, as well as interpreting technical reports from conferences [6][7]. - **AI Finance Direction**: Focuses on venture capital, financial reports, and capital movements within the AI industry, requiring sensitivity to data and strong logical structuring [11]. - **AI Product Direction**: Involves monitoring the application of AI in software and hardware, conducting product evaluations, and engaging with entrepreneurs and product experts [11]. Group 3: Benefits and Growth - Employees will have the opportunity to engage with cutting-edge AI technologies, enhance their work efficiency through new tools, and build personal influence in the AI field [6]. - The company offers competitive salaries, comprehensive benefits including social insurance, meal allowances, and performance bonuses, fostering a dynamic and open work environment [6]. Group 4: Company Growth Metrics - By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12]. - The company is recognized as the top new media outlet in the AI and frontier technology sector according to third-party data platforms [12].
Unifying visual modalities and tasks! Kuaishou Kling and the HKUST team release a video generation model that accelerates real-world understanding
量子位· 2025-12-14 07:12
Core Insights - The article introduces UnityVideo, a new visual framework developed by research teams from Hong Kong University of Science and Technology, Chinese University of Hong Kong, Tsinghua University, and Kuaishou, which enhances video generation by integrating multiple visual modalities [1][3][4]. Group 1: Model Capabilities - UnityVideo utilizes unified training across various visual modalities such as depth maps, optical flow, skeletons, and segmentation masks, allowing the model to better understand the physical world and generate more realistic and controllable videos [3][12]. - The model demonstrates zero-shot generalization, enabling it to generate reasonable results for previously unseen objects or scenes [4][16]. - The unified training approach significantly accelerates convergence speed and improves performance in RGB video generation tasks compared to single modality training [15][16]. Group 2: Technical Innovations - UnityVideo features dynamic task routing, allowing seamless integration of three training paradigms within a single architecture [19]. - A key breakthrough is the dynamic noise scheduling strategy, which randomly selects training modes during iterations, preventing catastrophic forgetting and enabling harmonious coexistence of multiple training objectives [21][22]. - The model incorporates a context learner and a modality-adaptive switcher to effectively distinguish between different modality signals, enhancing its ability to generalize across tasks [27][30]. Group 3: Training Strategy - UnityVideo employs a two-phase curriculum learning strategy, first training on carefully selected single-person scene data to establish spatial correspondence, followed by introducing all modalities and diverse scene data [33][35]. - The OpenUni dataset, containing 1.3 million multimodal video samples, supports this unified training paradigm, ensuring balanced sampling across modalities [35][36]. 
Group 4: Performance Results - UnityVideo outperforms existing models in various tasks, achieving high scores in physical reasoning, controllable generation, and modality estimation [39][41]. - The model's qualitative results demonstrate superior understanding of physical phenomena, such as light refraction in water, and maintains high video quality without common issues like background flickering [41][42]. - In quantitative comparisons, UnityVideo achieves a background consistency score of 97.44% and an aesthetic quality score of 64.12% in text-to-video generation tasks [44]. Group 5: Generalization and Understanding - The model exhibits strong generalization capabilities, accurately estimating unseen data and overcoming overfitting issues common in specialized models [43][56]. - UnityVideo's design emphasizes the importance of integrating multiple dimensions of perception, akin to human understanding, which enhances its ability to model physical laws and improve overall video generation quality [60][65].
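The dynamic noise scheduling strategy described above, randomly selecting a training mode each iteration so that multiple objectives coexist, can be sketched roughly as follows. The mode names and sampling weights here are illustrative assumptions, not values from the paper.

```python
import random

random.seed(42)

# Hypothetical training paradigms; the real ones and their weights
# are not specified in this summary.
MODES = ["generation", "estimation", "conditional"]
WEIGHTS = [0.5, 0.25, 0.25]

def sample_mode():
    """Pick one training objective per iteration. Because every
    objective keeps being revisited, no single task can overwrite
    the others (the mechanism credited with preventing
    catastrophic forgetting)."""
    return random.choices(MODES, weights=WEIGHTS, k=1)[0]

counts = {m: 0 for m in MODES}
for _ in range(10_000):
    counts[sample_mode()] += 1
print(counts)  # empirical counts roughly proportional to WEIGHTS
```

The random per-iteration choice is the whole trick: it interleaves objectives at a fine grain instead of training them in long, forgetting-prone phases.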
OpenAI suddenly open-sources a new model! 99.9% of the weights are zero, a new sparsity method replaces MoE
量子位· 2025-12-14 05:17
Core Viewpoint - The article discusses the introduction of Circuit Sparsity technology, which allows for a significant reduction in the connections of large language models, making them more interpretable and efficient by retaining only 0.1% of the connections while achieving similar performance to traditional dense models [1][3][6]. Group 1: Circuit Sparsity Technology - Circuit Sparsity is a method that enforces sparsity in the internal connections of models, making the computation process more understandable and addressing the black-box nature of traditional dense Transformers [6][10]. - The model retains only 0.1% of its connections, allowing for a clear and traceable decision-making process, akin to a circuit diagram [10][12]. - Experimental data shows that the task-specific circuits of sparse models are 16 times smaller than those of dense models while maintaining necessary and sufficient conditions for task completion [14]. Group 2: Comparison with MoE Models - The article contrasts Circuit Sparsity with the Mixture of Experts (MoE) model, which uses a gating network to split the model into multiple expert sub-networks, leading to issues such as feature fragmentation and knowledge redundancy [16][18]. - Circuit Sparsity aims for native sparsity, allowing for clearer feature representation and avoiding the interference seen in MoE models [18]. - Despite its advantages, Circuit Sparsity currently faces high computational costs, being 100 to 1000 times more demanding than traditional dense models, which may limit its immediate applicability in the industry [20][21]. Group 3: Future Directions - The team plans to expand Circuit Sparsity technology to larger models to unlock more complex reasoning circuits, indicating ongoing research in AI interpretability [22]. 
- Two potential methods to overcome the training efficiency challenges of sparse models have been identified: extracting sparse circuits from existing dense models and optimizing training mechanisms for new interpretable sparse models [24].
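As a rough illustration of the weight-level sparsity described above (my own toy sketch, not OpenAI's implementation): keep only the top 0.1% of weights by magnitude and zero out the rest, so each output depends on a small, traceable set of connections.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))          # dense weight matrix

keep_frac = 0.001                        # retain 0.1% of connections
k = max(1, int(W.size * keep_frac))
# k-th largest |w| becomes the cutoff; everything below it is pruned.
threshold = np.partition(np.abs(W).ravel(), -k)[-k]
mask = np.abs(W) >= threshold
W_sparse = W * mask

sparsity = 1.0 - mask.mean()
print(f"{sparsity:.4%} of weights are zero")
```

Magnitude pruning is just one way to obtain such a mask; the article's point is that when so few connections survive, the computation path for a given task becomes small enough to read like a circuit diagram.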
Paying for tokens is foolish; users should pay for intelligence | RockAI's Liu Fanping @ MEET2026
量子位· 2025-12-13 08:30
Core Insights - The next stage of artificial intelligence (AI) development requires overcoming two major challenges: the Transformer architecture and the backpropagation algorithm [1][7][54] - The focus should shift from larger models to creating "living" models that possess native memory, autonomous learning, and continuous evolution capabilities [2][4][48] - This transition signifies a move from centralized cloud computing to decentralized learning, where each device can contribute to knowledge generation [3][5][70] Group 1: Hardware Awakening - The concept of "hardware awakening" suggests that devices can learn and adapt in real-time, transforming them from mere tools into active intelligent agents [4][64] - A multitude of such intelligent agents collaborating in the real world can lead to the emergence of collective intelligence [5][71] - The current reliance on the Transformer model limits the potential for true intelligence, as it does not facilitate autonomous learning or native memory [21][30][76] Group 2: Redefining Value - The future of AI will redefine the value of hardware, moving beyond traditional metrics like memory and processing power to focus on the co-creation of value between users and devices [64][66] - Users should pay for intelligence rather than token consumption, as the latter is seen as an inefficient model [15][19][21] - The emergence of devices with autonomous learning capabilities will enhance user experience and privacy, as data remains localized [68][69] Group 3: Collective Intelligence - Collective intelligence arises when each device possesses its own intelligence and can learn from the physical world, similar to human collaboration [71][76] - True intelligence is characterized by the ability to generate knowledge rather than merely disseminating it, which is a limitation of current large models [75][77] - The path to general artificial intelligence is through collective intelligence rather than the centralized model exemplified by 
companies like OpenAI [77]
Qiao Liang of Taichu Yuanqi: AI algorithms have already hit the single-chip limit | MEET2026
量子位· 2025-12-13 06:30
Core Viewpoints - The demand for computing power in industry applications is increasing exponentially due to the development of AI technology, which requires algorithms to achieve millisecond-level accuracy [1][7] - High-performance computing (HPC) will be a foundational support across various computing scenarios, from manufacturing to scientific research and AI applications [2][13] - The concept of "super-intelligent integration" has become a consensus in the industry, emphasizing the need for heterogeneous integration in hardware architecture to meet the growing computing demands of AI algorithms [3][10] Group 1 - The evolution of the computing era has shifted from traditional scientific computing to "super-intelligent integration," driven by the increasing need for computing power in AI applications [7][12] - AI's demand for computing power is largely due to the generalization of AI algorithms, which require substantial computational resources for various AI models and agents [9][10] - The importance of high-performance computing is underscored as it will permeate traditional scientific research, manufacturing, and AI applications, presenting significant market opportunities for hardware and software developers [13][16] Group 2 - The company focuses on high-performance computing and AI integration, aiming to enhance the capabilities of AI algorithms through advanced hardware design, such as the TC link for high-speed interconnection [25][27] - The development of an open-source ecosystem is essential for the growth of the AI industry, with the company advocating for collaboration among enterprises to build a robust AI ecosystem [27][28] - The company is actively involved in practical applications of HPC and AI in various fields, including scientific research, energy, and low-altitude economy, demonstrating its commitment to leveraging technology for societal benefits [28][34][36]
Toward "aerospace embodied intelligence": Beihang team proposes a new constellation-planning benchmark | NeurIPS'25
量子位· 2025-12-13 04:34
△ Demonstration of satellite-constellation task planning results

A satellite constellation is a cooperative network of multiple satellites, offering global coverage, rapid response, and high-frequency observation far beyond what any single satellite can deliver. From America's giant communication constellations to China's "Qianfan" (Thousand Sails) constellation, satellite constellations have moved from science-fiction concept to the core of industry, becoming infrastructure for the digital-economy era.

Orbiting hundreds of kilometers above the ground, these constellations quietly support key sectors such as remote sensing, communications, navigation, and weather forecasting. But behind every stably operating constellation lies a high-dimensional, dynamic, heavily constrained planning problem: within an observation window of just a few minutes, how do you schedule dozens of satellites into a cooperative observation network, execute hundreds of tasks, and still respond to emergencies such as earthquake relief, maritime search and rescue, and forest fires?

AI is becoming the key to cracking this problem. Professor Liu Si's team at Beihang University proposes AEOS-Bench, the first large-scale benchmark for real constellation scheduling, and goes further by deeply fusing the generalization ability of Transformer models with the professional demands of aerospace engineering to train AEOS-Former, a scheduling model with time constraints built in. Together, the pair sets a new technical baseline for future "AI constellation planning".

Contributed by the AEOS-Bench & AEOS-Former team | QbitAI (WeChat official account: QbitAI)

We all know that putting a satellite constellation into orbit is hard, but efficiently planning and scheduling an in-orbit constellation to execute tasks is no easier. As deployed constellations grow ever larger, ...
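To make the time-window constraint concrete, here is a deliberately simplified sketch of the kind of decision the benchmark poses: assigning tasks to slots inside their visibility windows. This is a single-satellite greedy toy of my own, not AEOS-Former, and all task names and windows are invented.

```python
# Hypothetical tasks: (task_id, window_start, window_end) in seconds.
tasks = [
    ("fire_watch", 0, 120),
    ("sea_rescue", 60, 180),
    ("quake_scan", 100, 240),
]

def greedy_schedule(tasks, duration=60):
    """Earliest-deadline-first: give each task the earliest free
    slot that still fits inside its visibility window. A satellite
    can observe only one target at a time."""
    schedule, busy_until = [], 0
    for tid, start, end in sorted(tasks, key=lambda t: t[2]):
        begin = max(start, busy_until)
        if begin + duration <= end:          # fits inside the window?
            schedule.append((tid, begin, begin + duration))
            busy_until = begin + duration
    return schedule

print(greedy_schedule(tasks))
```

The real problem layers dozens of satellites, orbital dynamics, and emergency re-tasking on top of this, which is exactly why learned schedulers like AEOS-Former are being explored.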
America's video-generation veteran enters the world-model arena
量子位· 2025-12-13 04:34
Core Insights - Runway has launched its first general world model GWM-1, which is based on the latest Gen-4.5 video generation model [1][8] - The GWM-1 includes three variants: GWM Worlds, GWM Avatars, and GWM Robotics, each designed for different applications [5][12] Group 1: GWM-1 Overview - GWM-1 utilizes an autoregressive architecture that allows for frame-by-frame prediction based on previous memory content [9] - The model supports real-time interactive control, enabling users to adjust camera angles, modify robot operation commands, or audio [10] Group 2: GWM Worlds - GWM Worlds allows users to explore a coherent and responsive environment without manually designing each space [13] - Users can provide a static scene for reference, and the model will generate an immersive, infinite, and explorable space in real-time [13] - It maintains spatial consistency of scene elements during long sequences of movement, unlike other world models that generate limited frame sequences [13] - Users can change physical rules of the environment through text prompts, facilitating training for agents in real-world actions [15][16] - GWM Worlds can also support VR immersive experiences by generating virtual environments in real-time [17] Group 3: GWM Avatars - GWM Avatars is an audio-driven interactive video generation model that simulates human dialogue with realistic facial expressions and gestures [18][19] - It can serve as a personalized tutor or enhance customer service by creating digital humans that can interact naturally [20] - The model is set to launch with an API for integration into various products or services [22] Group 4: GWM Robotics - GWM Robotics functions as a learning-based simulator rather than a fixed-rule programming model, predicting video sequences based on robot data [23] - It generates synthetic training data to enhance existing robot datasets without the need for expensive real-world data collection [24] - The model allows for direct testing of strategy 
models without deploying them on physical robots, improving safety and efficiency [26] - A Python SDK for GWM Robotics has been released, supporting multi-view video generation and long context sequences for seamless integration into modern robot strategy models [29] Group 5: Gen-4.5 Upgrades - The latest Gen-4.5 update includes native audio generation and editing capabilities, allowing for realistic dialogue, sound effects, and background audio [30][31] - Users can edit existing audio to meet specific needs and utilize multi-shot editing for consistent transformations across video segments [33]
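The autoregressive, frame-by-frame prediction attributed to GWM-1 above can be sketched as a rollout loop: each new frame is produced from a bounded window of previous frames. The `step` function here is an arbitrary stand-in for the model, not Runway's actual dynamics.

```python
from collections import deque

def step(history):
    # Stand-in deterministic "dynamics"; a real world model would
    # run a neural network over the remembered frames here.
    return sum(history) % 97

def rollout(first_frame, n_frames, context=4):
    """Generate frames one at a time, each conditioned on a
    fixed-size memory of the most recent frames."""
    frames = [first_frame]
    memory = deque([first_frame], maxlen=context)  # bounded memory
    for _ in range(n_frames - 1):
        nxt = step(memory)
        frames.append(nxt)
        memory.append(nxt)                          # oldest frame drops out
    return frames

print(rollout(first_frame=1, n_frames=8))
```

The bounded `deque` is the interesting design point: real-time interactivity (camera moves, command edits) is possible precisely because each step conditions on recent memory rather than re-rendering a whole clip.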