量子位
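Unifying visual modalities and tasks! Kuaishou Kling and an HKUST team release a video generation model that accelerates real-world understanding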
量子位· 2025-12-14 07:12
Core Insights
- The article introduces UnityVideo, a new visual framework developed by research teams from Hong Kong University of Science and Technology, Chinese University of Hong Kong, Tsinghua University, and Kuaishou, which enhances video generation by integrating multiple visual modalities [1][3][4].

Group 1: Model Capabilities
- UnityVideo utilizes unified training across various visual modalities such as depth maps, optical flow, skeletons, and segmentation masks, allowing the model to better understand the physical world and generate more realistic and controllable videos [3][12].
- The model demonstrates zero-shot generalization, enabling it to generate reasonable results for previously unseen objects or scenes [4][16].
- The unified training approach significantly accelerates convergence speed and improves performance in RGB video generation tasks compared to single modality training [15][16].

Group 2: Technical Innovations
- UnityVideo features dynamic task routing, allowing seamless integration of three training paradigms within a single architecture [19].
- A key breakthrough is the dynamic noise scheduling strategy, which randomly selects training modes during iterations, preventing catastrophic forgetting and enabling harmonious coexistence of multiple training objectives [21][22].
- The model incorporates a context learner and a modality-adaptive switcher to effectively distinguish between different modality signals, enhancing its ability to generalize across tasks [27][30].

Group 3: Training Strategy
- UnityVideo employs a two-phase curriculum learning strategy, first training on carefully selected single-person scene data to establish spatial correspondence, followed by introducing all modalities and diverse scene data [33][35].
- The OpenUni dataset, containing 1.3 million multimodal video samples, supports this unified training paradigm, ensuring balanced sampling across modalities [35][36].

Group 4: Performance Results
- UnityVideo outperforms existing models in various tasks, achieving high scores in physical reasoning, controllable generation, and modality estimation [39][41].
- The model's qualitative results demonstrate superior understanding of physical phenomena, such as light refraction in water, and the model maintains high video quality without common issues like background flickering [41][42].
- In quantitative comparisons, UnityVideo achieves a background consistency score of 97.44% and an aesthetic quality score of 64.12% in text-to-video generation tasks [44].

Group 5: Generalization and Understanding
- The model exhibits strong generalization capabilities, accurately estimating unseen data and overcoming overfitting issues common in specialized models [43][56].
- UnityVideo's design emphasizes the importance of integrating multiple dimensions of perception, akin to human understanding, which enhances its ability to model physical laws and improve overall video generation quality [60][65].
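The random selection of training modes described under Group 2 can be illustrated with a small sketch. This is not UnityVideo's implementation: the summary does not name the three training paradigms or give their sampling probabilities, so the mode labels, the uniform weights, and both function names below are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Hypothetical labels: the summary mentions three training paradigms
# but does not name them.
TRAINING_MODES = ("paradigm_a", "paradigm_b", "paradigm_c")


def sample_training_mode(rng, weights=(1.0, 1.0, 1.0)):
    """Pick one training mode at random for the current iteration.

    The uniform default weights are an assumption, not a reported setting.
    """
    return rng.choices(TRAINING_MODES, weights=weights, k=1)[0]


def simulate_schedule(num_steps, seed=0):
    """Count how often each mode is selected over num_steps iterations."""
    rng = random.Random(seed)
    return Counter(sample_training_mode(rng) for _ in range(num_steps))


if __name__ == "__main__":
    # Every mode keeps being revisited throughout training instead of
    # being trained in separate, sequential stages.
    print(simulate_schedule(3000))
```

Because each iteration draws its mode independently, no objective goes unvisited for long, which is one plausible reading of why the summary credits this strategy with preventing catastrophic forgetting.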
OpenAI suddenly open-sources a new model! 99.9% of the weights are zero, with a new sparsity method replacing MoE
量子位· 2025-12-14 05:17
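Wen Le, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Is the key to stopping AI from talking nonsense really to cut 99.9% of a large model's connections?

OpenAI has quietly open-sourced a new model with only 0.4B parameters, 99.9% of whose weights are zero.

It is the open-source implementation of the Circuit Sparsity technique.

This is a large language model variant that artificially constrains the sparsity of the model's internal connections so that its computation can be decomposed and understood. In essence, it aims to solve the black-box problem of traditional dense Transformers, letting humans clearly read the internal computational circuits and see how the AI makes decisions, so that they do not too readily believe the AI's nonsense (doge).

Some have even said outright that this "extreme sparsity + functional decoupling" approach could spell the end for the currently popular MoE (mixture of experts) models.

So what happens when a Transformer's weights are trained to be almost entirely zero?

Abandoning rough approximation in pursuit of native sparsity

First, why can this model's reasoning process be read as easily as a circuit diagram?

In the traditional large models we normally use, the internal neurons are densely connected and the weight matrices are almost entirely non-zero. Information is passed along in a highly superposed state, like a tangle of threads that cannot be pulled apart, and nobody can say how the model arrived at a given conclusion.

The non-zero weight connections that remain are like the wires in a circuit diagram: information can only travel along fixed paths. At the same time, the model uses a mean-masking pruning method to split out, for each task, a dedicated ...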
Paying for tokens is a foolish thing; users should pay for intelligence | RockAI's Liu Fanping @ MEET2026
量子位· 2025-12-13 08:30
Core Insights
- The next stage of artificial intelligence (AI) development requires overcoming two major challenges: the Transformer architecture and the backpropagation algorithm [1][7][54]
- The focus should shift from larger models to creating "living" models that possess native memory, autonomous learning, and continuous evolution capabilities [2][4][48]
- This transition signifies a move from centralized cloud computing to decentralized learning, where each device can contribute to knowledge generation [3][5][70]

Group 1: Hardware Awakening
- The concept of "hardware awakening" suggests that devices can learn and adapt in real-time, transforming them from mere tools into active intelligent agents [4][64]
- A multitude of such intelligent agents collaborating in the real world can lead to the emergence of collective intelligence [5][71]
- The current reliance on the Transformer model limits the potential for true intelligence, as it does not facilitate autonomous learning or native memory [21][30][76]

Group 2: Redefining Value
- The future of AI will redefine the value of hardware, moving beyond traditional metrics like memory and processing power to focus on the co-creation of value between users and devices [64][66]
- Users should pay for intelligence rather than token consumption, as the latter is seen as an inefficient model [15][19][21]
- The emergence of devices with autonomous learning capabilities will enhance user experience and privacy, as data remains localized [68][69]

Group 3: Collective Intelligence
- Collective intelligence arises when each device possesses its own intelligence and can learn from the physical world, similar to human collaboration [71][76]
- True intelligence is characterized by the ability to generate knowledge rather than merely disseminating it, which is a limitation of current large models [75][77]
- The path to general artificial intelligence is through collective intelligence rather than the centralized model exemplified by companies like OpenAI [77]
太初元碁's Qiao Liang: AI algorithms have already hit the single-chip limit | MEET2026
量子位· 2025-12-13 06:30
Core Viewpoints
- The demand for computing power in industry applications is increasing exponentially due to the development of AI technology, which requires algorithms to achieve millisecond-level accuracy [1][7]
- High-performance computing (HPC) will be a foundational support across various computing scenarios, from manufacturing to scientific research and AI applications [2][13]
- The concept of "super-intelligent integration" has become a consensus in the industry, emphasizing the need for heterogeneous integration in hardware architecture to meet the growing computing demands of AI algorithms [3][10]

Group 1
- The evolution of the computing era has shifted from traditional scientific computing to "super-intelligent integration," driven by the increasing need for computing power in AI applications [7][12]
- AI's demand for computing power is largely due to the generalization of AI algorithms, which require substantial computational resources for various AI models and agents [9][10]
- The importance of high-performance computing is underscored as it will permeate traditional scientific research, manufacturing, and AI applications, presenting significant market opportunities for hardware and software developers [13][16]

Group 2
- The company focuses on high-performance computing and AI integration, aiming to enhance the capabilities of AI algorithms through advanced hardware design, such as the TC link for high-speed interconnection [25][27]
- The development of an open-source ecosystem is essential for the growth of the AI industry, with the company advocating for collaboration among enterprises to build a robust AI ecosystem [27][28]
- The company is actively involved in practical applications of HPC and AI in various fields, including scientific research, energy, and low-altitude economy, demonstrating its commitment to leveraging technology for societal benefits [28][34][36]
量子位 is hiring editors and writers
量子位· 2025-12-13 04:34
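The editorial team, reporting from Aofeisi
量子位 | WeChat official account QbitAI

The AI boom is still surging, but if you don't yet know how to take part... why not come to 量子位?

We are a content platform centered on tracking new developments in AI. After 8 years of accumulation, we now have top-tier influence, broad and well-recognized industry resources, and one of the best positions for observing and learning at the forefront of the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:

AI industry: covers innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
AI finance: covers venture investment and earnings reports in the AI field, tracking capital movements along the industry chain;
AI product: covers AI progress in applications and hardware devices.

All positions are full-time and based in Zhongguancun, Beijing.

Positions are open to:

Experienced hires: covering editor, lead writer, and editor-in-chief levels, with roles matched to ability;
Campus hires: fresh graduates; internships are accepted and can convert to full-time.

Join us and you can gain:

Position details follow. Roles at every level of experience are open, and you are welcome to apply according to your own background and experience.

AI industry direction

Responsibilities:

Follow new developments at the AI infrastructure layer, including but not limited to chips, AI Infra, and cloud computing, as well as the moves of key players;
Produce accessible explanations of frontier papers, open-source communities, and technical reports from conferences (Hot Chips, NeurIPS, MLSys);
Participate in ...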
Targeting "aerospace embodied intelligence", a Beihang team proposes a new benchmark for constellation planning | NeurIPS'25
量子位· 2025-12-13 04:34
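△ Demonstration of satellite constellation mission planning results

A satellite constellation is a cooperative network of multiple satellites, offering global coverage, rapid response, and high-frequency observation far beyond what a single satellite can provide. From the giant satellite communications constellations of the United States to China's "千帆" constellation, satellite constellations have moved from science-fiction concept to industrial core and have become infrastructure for the digital economy era.

These constellations, operating hundreds of kilometers above the Earth, quietly support key industries such as remote sensing, communications, navigation, and weather forecasting. But behind every stably operating constellation lies a high-dimensional, dynamic, strongly constrained planning problem. How, within an observation window of only a few minutes, can dozens of satellites be scheduled into a cooperative observation network that carries out hundreds of tasks while also responding to sudden demands such as earthquake relief, maritime search and rescue, and forest fires?

Artificial intelligence is becoming the key to solving this problem. Professor Liu Si's team at Beihang University has proposed AEOS-Bench, the first large-scale real-world constellation scheduling benchmark, and has further combined the generalization ability of the Transformer model with the specialized requirements of aerospace engineering to train AEOS-Former, a scheduling model with embedded time constraints. This combination sets a new technical benchmark for future "AI constellation planning".

Contributed by the AEOS-Bench & AEOS-Former team
量子位 | WeChat official account QbitAI

We all know that putting a satellite constellation into orbit is hard, but efficiently planning and scheduling in-orbit constellations to carry out missions is no simple matter either. As deployed constellations grow ever larger, through manual ...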
A veteran of US video generation enters the world-model race
量子位· 2025-12-13 04:34
Core Insights
- Runway has launched its first general world model GWM-1, which is based on the latest Gen-4.5 video generation model [1][8]
- The GWM-1 includes three variants: GWM Worlds, GWM Avatars, and GWM Robotics, each designed for different applications [5][12]

Group 1: GWM-1 Overview
- GWM-1 utilizes an autoregressive architecture that allows for frame-by-frame prediction based on previous memory content [9]
- The model supports real-time interactive control, enabling users to adjust camera angles, modify robot operation commands, or audio [10]

Group 2: GWM Worlds
- GWM Worlds allows users to explore a coherent and responsive environment without manually designing each space [13]
- Users can provide a static scene for reference, and the model will generate an immersive, infinite, and explorable space in real-time [13]
- It maintains spatial consistency of scene elements during long sequences of movement, unlike other world models that generate limited frame sequences [13]
- Users can change physical rules of the environment through text prompts, facilitating training for agents in real-world actions [15][16]
- GWM Worlds can also support VR immersive experiences by generating virtual environments in real-time [17]

Group 3: GWM Avatars
- GWM Avatars is an audio-driven interactive video generation model that simulates human dialogue with realistic facial expressions and gestures [18][19]
- It can serve as a personalized tutor or enhance customer service by creating digital humans that can interact naturally [20]
- The model is set to launch with an API for integration into various products or services [22]

Group 4: GWM Robotics
- GWM Robotics functions as a learning-based simulator rather than a fixed-rule programming model, predicting video sequences based on robot data [23]
- It generates synthetic training data to enhance existing robot datasets without the need for expensive real-world data collection [24]
- The model allows for direct testing of strategy models without deploying them on physical robots, improving safety and efficiency [26]
- A Python SDK for GWM Robotics has been released, supporting multi-view video generation and long context sequences for seamless integration into modern robot strategy models [29]

Group 5: Gen-4.5 Upgrades
- The latest Gen-4.5 update includes native audio generation and editing capabilities, allowing for realistic dialogue, sound effects, and background audio [30][31]
- Users can edit existing audio to meet specific needs and utilize multi-shot editing for consistent transformations across video segments [33]
A half-century-old problem cracked in 48 hours! Terence Tao teams up and turns AI mathematics into a monster-slaying game
量子位· 2025-12-13 04:34
Core Viewpoint
- The collaboration between mathematicians and AI has led to the resolution of the long-standing Erdős 1026 problem, which had remained unsolved for 50 years, in just 48 hours [1][2][3].

Group 1: Problem Overview
- The Erdős 1026 problem was proposed in 1975 and involves determining the minimum possible value of a function related to a game theory scenario involving two players, Alice and Bob [8][10][12].
- The problem's complexity was highlighted by the introduction of a maximum constant c(n) that represents the minimum proportion of coins Bob can guarantee to take, regardless of how Alice distributes them [10][13].

Group 2: AI's Role in the Solution
- AI tools played a crucial role in solving the problem quickly, with traditional methods potentially taking weeks or months to reach a conclusion [3][5].
- The use of AI models, such as Harmonic and AlphaEvolve, allowed mathematicians to automate the construction and proof of key inequalities, transforming the original problem into a computational geometry challenge [16][18][22].

Group 3: Collaborative Efforts
- The solution involved multiple mathematicians working together, with contributions from Boris Alexeev, Koishi Chan, and Lawrence Wu, showcasing the effectiveness of human-AI collaboration [17][28][32].
- The collaborative approach of combining human insight with AI capabilities is emerging as a new trend in mathematical problem-solving [46].

Group 4: Historical Context and Future Implications
- The Erdős problems, proposed by the renowned mathematician Paul Erdős, have been a significant part of mathematical research, with many remaining unsolved [39][41].
- The increasing success of AI in solving these problems suggests a shift in how mathematical research may be conducted in the future, with AI becoming a standard tool for researchers [41][42].
SJTU SAIF's Zhu Ning: An economist's view of the paradigm shift in thinking in the AI era | MEET2026
量子位· 2025-12-13 02:00
Core Viewpoints
- The concept of scarcity has changed after the emergence of AI, prompting a need for deeper consideration on how to make better choices in the face of this new reality [6][11]
- As AI begins to replace human decision-making, competition may arise between humans and algorithms, as well as among algorithms themselves [6][22]

Economic Implications
- Economics has historically focused on technological progress and its impact on economic principles and human welfare, with fundamental concepts like "what is human?" and "what is production?" undergoing significant changes in the AI era [8][11]
- The traditional view of scarcity, which included time, computational power, and creativity, is being challenged as AI can now perform tasks that previously required significant human effort [11][12]
- AI is expected to contribute to global economic growth by 0.5% to 0.7% annually over the next decade, although this may not be sufficient to support high valuations in tech markets [14][24][25]

Industry Impact
- The nature of work is changing, with both white-collar and blue-collar jobs facing potential replacement by AI, blurring the lines between these categories [31]
- Knowledge-intensive industries, previously thought to be safe from AI disruption, are also at risk as AI capabilities evolve [33]
- Companies are encouraged to focus on how to leverage AI technology to enhance productivity and efficiency rather than seeking industries that are immune to AI [33]

Global Considerations
- There is a significant disparity in access to AI capabilities between high-income and low-income countries, which may exacerbate global wealth distribution issues [28][29]
- The shift towards AI-driven trade will lead to new regulatory and governance challenges, particularly regarding accountability in cross-border transactions [30]
Chinese robots compete in emergency rescue, and American Reddit users are rattled: we're still putting makeup on robot dogs to film skits
量子位· 2025-12-12 06:41
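Henry, reporting from Aofeisi
量子位 | WeChat official account QbitAI

When it comes to fawning over all things foreign, it is now American netizens' turn!

Recently, a post titled "Chinese robots are competing to rescue people from fires, while American robot dogs are still wearing Zuckerberg's face?" was pushed to the top of Reddit.

The poster wrote:

China's robots are already competing in emergency rescue, and we're still putting makeup on robot dogs to film skits. Saying we're not behind is just fooling ourselves.

A commenter below added a sharp follow-up:

It's not that our scientists aren't working; it's that all the funding has been sucked up by these flashy meme projects (laughs).

This has left quite a few American netizens somewhat rattled.

After all, this is not a matter of strapping a fire hydrant to a robot and filming a skit to hype expectations. It has genuinely become an event in which robots compete head to head and are scored.

The rescue event comes from GDPS 2025 (Global Developer Pioneer Conference and International Embodied Intelligence Skills Competition), recently held in Shanghai.

Interestingly, quite a few people have been made anxious or rattled by GDPS 2025.

Seen this way, China's embodied intelligence seems to be drawing more attention, and closer scrutiny, from foreigners.

What is going on?

Foreigners are the ones watching China's embodied intelligence closely

To be fair, foreign netizens have clearly started paying attention to the development of China's embodied intelligence recently, and they are watching it more closely than we are ourselves.

GDPS 2025 is a typical example.

Beyond the robot emergency rescue competition mentioned above, the sheer scale of the GDPS 2025 competition has also given foreign netizens a jolt ...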