Physics bugs keep showing up in generated videos? With VLM transfer plus token-level alignment, combustion happens in the right place and collisions obey momentum conservation | Accepted to CVPR 2026 with near-perfect scores
量子位· 2026-03-19 07:09
Core Viewpoint
- The article discusses advances in generative video models, focusing on the ProPhy framework, which aims to improve the physical understanding and spatial alignment of video generation, moving from mere visual imitation toward true physical simulation [1][8][33]

Group 1: Current State of Generative Video Models
- Generative video models such as Wan and NVIDIA's Cosmos can create highly realistic dynamic scenes that appear to mimic the real world [1][2]
- Despite their visual realism, these models often lack a true understanding of physical principles, leading to inconsistencies in generated videos [3][6][10]

Group 2: Limitations of Existing Models
- Current models rely primarily on implicit learning and coarse global physical category labels, which prevents a clear understanding of distinct physical laws and how they evolve in reality [10]
- They also lack fine-grained spatial alignment, so they cannot accurately position physical events within generated scenes [10]

Group 3: Introduction of ProPhy
- ProPhy introduces a progressive physical alignment framework that enables video diffusion models to achieve layered physical understanding and spatial physical alignment [8][9]
- The framework lets models determine not only what physical phenomena to present but also where those phenomena should occur in the video [8][9]

Group 4: Mechanism of ProPhy
- ProPhy employs a two-stage physical expert mechanism: the Semantic Physical Expert (SEB) for macro-level understanding of physical structure and the Refinement Expert Block (REB) for precise spatial alignment [13][14]
- SEB identifies potential physical phenomena from textual prompts, while REB dynamically assigns the most suitable physical expert to each spatial location [13][14]

Group 5: Experimental Results
- ProPhy shows significant improvements in physical correctness and semantic adherence, with a 19.7% increase in joint metrics on the VideoPhy2 benchmark [20][22]
- In dynamic performance evaluations, ProPhy improves the Dynamic Degree metric and overall quality scores, demonstrating its effectiveness at generating physically consistent videos [23]

Group 6: Implications and Future Directions
- ProPhy represents a shift from visual similarity to adherence to physical rules, pointing toward a controllable physical world model [26][29]
- Future work may integrate continuous dynamics modeling and physics engines with generative models, potentially leading to a new form of AI capable of simulating how the world operates [34]
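The per-location expert assignment described above can be illustrated with a toy routing sketch. This is a generic soft mixture-of-experts gate over spatial tokens, not ProPhy's actual implementation; all names, shapes, and the gating rule are illustrative assumptions.

```python
# Toy token-level physical-expert routing: each spatial token gets a soft
# assignment over a small set of "physical experts" (e.g. fluid, rigid body,
# combustion). Illustrative only -- not ProPhy's real SEB/REB design.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_tokens(tokens, expert_gates):
    """Assign each spatial token a mixture over physical experts.

    tokens:       (n_tokens, d) latent features, one per spatial position
    expert_gates: (n_experts, d) one gating vector per expert
    returns:      (n_tokens, n_experts) routing probabilities
    """
    logits = tokens @ expert_gates.T   # similarity of each token to each expert
    return softmax(logits, axis=-1)    # per-token soft assignment

n_tokens, d, n_experts = 16, 8, 3      # hypothetical sizes
tokens = rng.normal(size=(n_tokens, d))
gates = rng.normal(size=(n_experts, d))

probs = route_tokens(tokens, gates)
hard_assignment = probs.argmax(axis=1)  # which expert "owns" each location
```

A dense model would then blend expert outputs by `probs`, so different regions of the frame can follow different physical dynamics.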
CVPR 2026 | Fitting diffusion models with a "physics engine": Peng Yuxin's team at Peking University proposes NS-Diff, teaching diffusion models fluid and rigid-body mechanics
机器之心· 2026-03-19 01:25
This article presents the latest text-to-video research from Professor Peng Yuxin's team at Peking University; the paper has been accepted to CVPR 2026.

Background and motivation: imagine asking an AI to generate a video of "milk poured into coffee forming a silky vortex," only to find that the AI simply cannot produce the vortex you want. Today's video generation models such as Sora and Wan can already render cinematic visuals, but they often "paint the skin without the bones": the AI does not truly understand the physical laws of the real world, so generated videos frequently contain glitches that violate common sense.

Paper title: NS-Diff: Fluid Navier–Stokes Guided Video Diffusion via Reinforcement Learning
Paper link: http://39.108.48.32/mipl/download_paper.php?fileId=202601
Code: https://github.com/PKU-ICST-MIPL/NS-Diff_CVPR2026
Lab website: https://www.wict.pku.edu.cn/mipl

In the physical world, the flow of liquids obeys the complex Navier–Stokes equations, and ...
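For reference, the incompressible Navier–Stokes equations the article alludes to can be written in their standard textbook form (this is the general formulation, not a transcription from the NS-Diff paper):

```latex
% Incompressible Navier-Stokes: momentum balance plus the divergence-free
% constraint. u is velocity, p pressure, rho density, nu kinematic viscosity,
% f an external body force.
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  &= -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{f}, \\
\nabla\cdot\mathbf{u} &= 0 .
\end{aligned}
```

The nonlinear advection term $(\mathbf{u}\cdot\nabla)\,\mathbf{u}$ is what makes fluid behavior like vortices hard for a purely data-driven model to reproduce, which motivates using the equations as an explicit guidance signal.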
Using only "action silhouettes" to bridge video generation and robot world models: BridgeV2W teaches robots to "rehearse the future"
机器之心· 2026-02-21 02:57
How does a robot "imagine" the future? Picture a cup of coffee in front of you. As you reach for it, before your hand even touches the cup, your brain has already played out the whole process: how your arm will move, what the cup will feel like, what the table will look like after you lift it... This ability to imagine and predict future scenes is a core cognitive foundation of how humans manipulate the world.

Can robots be given the same "rehearsal ability," simulating the consequences of an action in their "mind" before executing it? That is the goal of embodied world models: letting a robot "see" the future before it acts. In recent years, this direction has made remarkable progress by leveraging the strong visual priors of large-scale video generation models such as Sora and Wan.

Yet an awkward problem has remained unresolved: a video generation model's world is woven from pixels, while a robot's language is joint angles and pose coordinates; the two describe the same physical world in entirely different "representation languages."

To solve this, the embodied-intelligence company 中科第五纪, together with a team from the Institute of Automation, Chinese Academy of Sciences, has released BridgeV2W. Through an elegantly simple design, the Embodiment Mask, an "action silhouette" rendered from robot actions, it maps actions from coordinate space seamlessly into pixel space, genuinely bridging pretrained video generation models and world models and letting robots reliably "rehearse the future." ...
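The core idea of rendering a coordinate-space action into a pixel-space "silhouette" can be sketched in a few lines. This is a minimal toy rasterizer under assumed shapes and a pre-projected trajectory; BridgeV2W's actual rendering pipeline is not described in this excerpt.

```python
# Toy "embodiment mask": rasterize a robot action (here, a gripper path already
# projected to camera pixel coordinates) into a binary image that a pixel-space
# video model could condition on. Names and shapes are illustrative assumptions.
import numpy as np

def render_embodiment_mask(pixel_points, height, width, radius=2):
    """Return an (H, W) uint8 mask with a filled disc at each projected point."""
    mask = np.zeros((height, width), dtype=np.uint8)
    ys, xs = np.mgrid[0:height, 0:width]   # row (y) and column (x) index grids
    for (px, py) in pixel_points:
        mask[(xs - px) ** 2 + (ys - py) ** 2 <= radius ** 2] = 1
    return mask

# A straight-line gripper trajectory across a hypothetical 64x64 frame.
trajectory = [(8 + 4 * t, 32) for t in range(12)]
mask = render_embodiment_mask(trajectory, 64, 64)
```

One such mask per video frame yields an action-conditioned sequence in the same pixel space the generative model already understands.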
A16Z's latest insights: video models move from breakneck progress to differentiation, and productization is the next opportunity
36Kr· 2025-10-28 00:18
Core Insights
- The video generation model industry is transitioning from a phase of rapid performance improvement to a "product era," focusing on diversity and specialization rather than just model parameters and benchmark scores [2][4][12]
- There is a growing realization that no single model can dominate all video generation tasks, leading to a trend of specialization where different models excel in specific areas [4][11][12]
- The need for better integrated products to simplify the creative process is becoming increasingly apparent, as many creators still rely on multiple tools to achieve their desired outcomes [13][15][16]

Group 1: Industry Trends
- The pace of progress in video generation models has slowed, with most mainstream models now capable of generating impressive 10-15 second videos with synchronized audio [1][6]
- The concept of a "superior model" in the video domain is being challenged, as recent releases like Sora 2 have not consistently outperformed predecessors like Veo 3 [4][11]
- The industry is shifting toward models tailored for specific capabilities, such as physical simulation and multi-shot editing, rather than one-size-fits-all solutions [2][11][12]

Group 2: Product Development
- While video generation capabilities have improved, product development has not kept pace, leaving a gap in user experience and creative efficiency [13][15]
- Companies are beginning to address this gap with tools that let users modify video elements more intuitively, such as Runway's suite of tools and OpenAI's Sora Storyboard [15][16]
- The future is expected to bring more specialized models for specific industries or scenarios, along with comprehensive creative toolkits that integrate various media elements into a cohesive workflow [16]
The overvalued New Yi Sheng
Sohu Caijing· 2025-10-15 01:18
Core Insights
- The release of Sora2 by OpenAI has introduced new dynamics in the AI sector, but it does not significantly outperform existing domestic video models from ByteDance, Kuaishou, and Alibaba [2]
- The competitive landscape is shifting, with Google launching Veo3.1 shortly after Sora2, suggesting Google could surpass OpenAI thanks to its extensive infrastructure [2]
- The AI hardware development path is uncertain, as companies like Alibaba and Google use their own AI chips, challenging the notion of a unified stack dominated by OpenAI and Nvidia [3]

Company Performance
- New Yi Sheng reported remarkable results for the first half of 2025: revenue of 10.437 billion yuan, up 282.64% year on year, and net profit of 3.942 billion yuan, up 355.68% [5]
- Q2 revenue was 6.385 billion yuan, up 57.5% quarter on quarter, with net profit of 2.37 billion yuan, up 50.7% [7]
- The company has transitioned from a traditional optical-module supplier to a core supplier of AI computing infrastructure, capitalizing on the global AI computing investment boom [7][8]

Market Trends
- Global demand for 800G optical modules is expected to reach 19.9 million units by 2025, with the market for 1.6T modules halved to 1 million units [8]
- The company has successfully launched 800G/1.6T optical-module products, establishing a competitive advantage in the AI computing infrastructure sector [8]
- High-speed optical modules (above 4.25G) account for 98.91% of total revenue, indicating a strong focus on advanced technology [9]

Client Composition
- The top five clients account for 72.74% of accounts receivable, primarily major cloud players such as Amazon, Microsoft, and Meta [11]
- ByteDance has emerged as the largest domestic client, and Alibaba is expected to procure 5 million 800G optical modules by 2025, capturing a 25% market share [11][13]
- The optimization of the client structure provides a more stable growth trajectory for the company, reducing reliance on a single market [13]

Risks and Challenges
- Inventory has increased 43.86% to 5.944 billion yuan, with a significant rise in inventory impairment losses, indicating risks from rapid technological change [14]
- Accounts receivable have surged 97.59% to 5.017 billion yuan, raising concerns about cash flow and potential bad-debt risk [15]
- The emergence of CPO technology poses a threat to traditional optical modules, with predictions suggesting it will dominate the market by 2027 [16][17]
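The growth figures above can be sanity-checked by backing out the implied prior-year base from each reported value and its year-on-year growth rate. The arithmetic below uses only the numbers quoted in the article (in billions of yuan); the implied H1 2024 figures are derived, not reported.

```python
# Back out the implied prior-year (H1 2024) base from a current value and a
# year-on-year growth percentage: base * (1 + g/100) == current.
def implied_base(current, yoy_growth_pct):
    return current / (1 + yoy_growth_pct / 100)

rev_base = implied_base(10.437, 282.64)     # implied H1 2024 revenue (bn yuan)
profit_base = implied_base(3.942, 355.68)   # implied H1 2024 net profit (bn yuan)

print(f"implied H1 2024 revenue:    {rev_base:.3f} bn yuan")
print(f"implied H1 2024 net profit: {profit_base:.3f} bn yuan")
```

This puts the implied H1 2024 base at roughly 2.7 billion yuan of revenue and 0.87 billion yuan of net profit, which is consistent with the scale of the growth the article describes.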