Video Generation Models
With only an "action silhouette", video generation meets robot world models! BridgeV2W lets robots learn to "rehearse the future"
AI科技大本营· 2026-02-11 06:50
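AI科技大本营 (ID: rgznai100)
Imagine a cup of coffee in front of you. You reach for it, and before your hand actually touches the cup, your brain has already "filled in" the whole process: how your arm will move, how the cup will feel, what the table will look like once the cup is lifted... This ability to imagine and predict future scenes is a core cognitive foundation of how humans manipulate the world.
Can robots be given the same "rehearsal ability", simulating the consequences of an action "in the mind" before carrying it out? That is what an embodied world model sets out to do: let a robot "see" the future before it acts. In recent years this direction has made striking progress by drawing on the strong visual priors of large-scale video generation models (such as Sora and Wan).
Yet one awkward problem has remained unresolved: the world of a video generation model is woven from pixels, whereas a robot's language is joint angles and pose coordinates. The two describe the same physical world in entirely different "representation languages".
To address this, 中科第五纪, together with a team from the Institute of Automation, Chinese Academy of Sciences, has released BridgeV2W. Its central and notably elegant design is the Embodiment Mask, an "action silhouette" rendered from the robot's actions. The mask maps actions from coordinate space into pixel space, bridging pretrained video generation models and world models so that robots can learn to "rehearse the future" reliably.
"Refracting AI research through the lens of a prism ...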
Jimeng responds to the Seedance 2.0 video generation controversy
Sou Hu Cai Jing· 2026-02-10 16:59
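Editor: He Longxing (何隆星); Reviewed by: Huang Qing (黄青); Approved by: Zou Jianbin (邹建宾)
Recently, ByteDance's latest video generation model, Seedance 2.0, entered closed beta testing in products including Jimeng (即梦), Doubao (豆包), and Xiaoyunque (小云雀), drawing wide attention in China and abroad. Notably, the model's striking realism has led many internet users to worry that its output could be mistaken for reality. ByteDance responded quickly to the user feedback. On February 9, platform operations staff posted in the Jimeng creator community: "Seedance 2.0 has received far more attention than expected during the closed beta. Thank you for your feedback. To keep the creative environment healthy and sustainable, we are urgently making optimizations in response to that feedback, and for now real-person images or videos cannot be used as the subject reference." They added that the platform is well aware that the boundary of creativity is respect, and that the adjusted product will be formally launched in a more complete form.
Source: The Beijing News (新京报)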
ByteDance responds to the Seedance 2.0 privacy controversy
新华网财经· 2026-02-10 08:52
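Recently, ByteDance's latest video generation model, Seedance 2.0, entered closed beta testing in products including Jimeng (即梦), Doubao (豆包), and Xiaoyunque (小云雀), drawing wide attention in China and abroad. Notably, the model's striking realism has led many internet users to worry that its output could be mistaken for reality. ByteDance responded quickly to the user feedback. On February 9, platform operations staff posted in the Jimeng creator community: "Seedance 2.0 has received far more attention than expected during the closed beta. Thank you for your feedback. To keep the creative environment healthy and sustainable, we are urgently making optimizations in response to that feedback, and for now real-person images or videos cannot be used as the subject reference." They added that the platform is well aware that the boundary of creativity is respect, and that the adjusted product will be formally launched in a more complete form.
Source: The Beijing News (新京报)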
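Feng Ji on Jimeng's Seedance 2.0: glad it comes from China; the childhood of AIGC is over
Xin Lang Cai Jing· 2026-02-09 08:09
Feng Ji (冯骥), CEO of Game Science (游戏科学), posted today about Seedance 2.0, which has just launched on Jimeng: AI's ability to understand and integrate multimodal information has made an astonishing leap. The childhood of AIGC is over. "I am glad that, at least today, Seedance 2.0 comes from China," he said.
The text of the Weibo post follows:
Yesterday I tried Seedance 2.0, which has just launched on Jimeng.
At the end of the title of its user manual it says "Kill the game!" (glossed in the post as 杀死比赛). That assessment is quite objective.
The childhood of AIGC is over.
Here are a few first impressions and predictions after trying it, for reference only:
1. Leading. The strongest video generation model on the planet right now, bar none.
2. All-round. AI's ability to understand and integrate multimodal information (text, image, video, audio) ...
4. Exploding production capacity. The cost of making ordinary video will no longer follow the traditional logic of the film and television industry and will gradually approach the marginal cost of compute. The content sector will inevitably see unprecedented inflation, and traditional organizational structures and production workflows will be thoroughly restructured. I believe anyone who has used it will quickly understand that this prediction is not alarmism.
5. Video for everyone. Following from point 4, every form of presentation that previously had to weigh production cost will easily be turned into video, typically e-commerce advertising and pre-shoots. Although the core stages of game development have not yet been directly eroded by AI, video itself will gradually become more customized, real-time, and game-like, and may become some entirely new form of entertainment in the future.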
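Topping the charts overseas, surrounded by strong rivals at home: Keling AI (可灵AI) monthly revenue soars to 140 million yuan, but the video generation war has only just begun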
Sou Hu Cai Jing· 2026-01-15 00:57
Core Insights
- Revenue of Keling AI, Kuaishou's AI video generation product, rose from 150 million yuan for the whole of Q1 2025 to more than 140 million yuan in the single month of December 2025, an annual recurring revenue (ARR) run rate of about 240 million USD (approximately 1.68 billion yuan) [1][4]
- The launch of the "Motion Control" feature during the New Year holiday significantly boosted Keling AI's downloads, leading to a 28.15% increase in Kuaishou's stock price by January 14 [1][4]
Revenue Growth
- Keling AI's revenue surpassed 140 million yuan in December 2025. A single month now nearly matches the 150 million yuan recorded for all of Q1 2025, a marked acceleration in commercialization [4]
- The growth in revenue is attributed to increased demand for video generation models and Kuaishou's ongoing investment in computational power [4][5]
Competitive Landscape
- Competition in the domestic video generation sector is intensifying, with Alibaba Cloud's Wanxiang 2.6 and Volcano Engine's Seedance 1.5 Pro targeting the same market as Keling AI [3][6]
- Keling AI's revenue structure indicates that nearly 70% comes from paid subscriptions by professional users, such as content creators and marketers [6]
Future Directions
- Kuaishou plans to focus its capital expenditures on computational upgrades and technological advancements over the next 1-2 years [5]
- To expand its user base, Keling AI aims to enhance its product offerings for ordinary users and integrate social interaction features [9][10]
Product Innovation
- Keling AI has launched several new models, including "Keling O1" and "Keling 2.6", which support advanced features such as audio-visual synchronization and multi-angle generation [6][7]
- The recent success of the "Motion Control" feature, which lets users create dynamic videos from static images, exemplifies Keling AI's strategy of simplifying the user experience and broadening its appeal [10]
Can one person with AI make an award-winning film? | The Unseen New Continent (看不见的新大陆)
Sou Hu Cai Jing· 2026-01-03 16:01
Core Insights
- The article discusses the emergence of AI video generation technology, highlighting the journey of Wang Changhu and his company, Aishi Technology, which aims to democratize video creation for everyone [1][9][32]
Group 1: Industry Overview
- The AI industry is experiencing a significant transformation, comparable to the impact of steam engines, electricity, and computers, with AI models becoming a universal technology [1][4]
- 2022 marked the beginning of the AIGC era, with tools like Midjourney and ChatGPT showcasing the potential of AI, leading to a new wave of interest and investment in AI technologies [4][5]
- The AI video generation sector was initially viewed skeptically by experts, who doubted its viability within five years, yet Aishi Technology has successfully launched a leading AI video model [6][10]
Group 2: Company Strategy
- Aishi Technology's strategy focuses on video generation, a less popular but potentially more impactful area compared to text and image models, allowing the company to carve out a unique niche [10][11]
- The company aims to make video creation accessible to everyone, transforming ordinary users into directors of their own content, thus lowering the barriers to entry in video production [13][14]
- The product "拍我AI" (Shoot Me AI) exemplifies this philosophy, enabling users to create videos with simple prompts, significantly enhancing user engagement and creativity [15][16]
Group 3: Growth Mechanism
- Aishi Technology has developed a self-reinforcing growth cycle, where high-quality models attract users, who in turn provide valuable data that enhances the models and products [20][22]
- The company's approach combines advanced technology with user-friendly design, resulting in a product that appeals to a broad audience and encourages content creation [21][22]
Group 4: Organizational Efficiency
- Aishi Technology emphasizes extreme organizational efficiency and precise technical judgment to compete with larger firms, aiming for ten times the efficiency of its competitors [24][25]
- The company adopts a flat organizational structure to facilitate rapid decision-making and resource allocation, enhancing overall productivity [26][27]
- Continuous innovation and attracting top talent are crucial for maintaining a competitive edge in the rapidly evolving AI landscape [28]
The world's most fully featured video generation model is here
量子位· 2025-12-17 10:00
Core Viewpoint
- Alibaba has launched the new Tongyi Wanxiang 2.6 model, described as the most comprehensive video generation model globally, covering capabilities such as text-to-video, image generation, and audio-driven video creation [1].
Group 1: Video Generation Capabilities
- The Wanxiang 2.6 model introduces multi-audio driven video capabilities, along with features like audio-visual synchronization and multi-shot storytelling, which were not available in Sora 2 [2].
- The model demonstrates significant improvements in artistic style control, realistic portrait generation, and understanding of historical and cultural semantics in image generation [3][8].
- The model's video generation capabilities include video reference generation, maintaining subject consistency, and natural audio-visual synchronization, which enhances the overall user experience [11][12].
Group 2: Performance Testing
- Initial tests show that Wanxiang 2.6 performs well in video subject consistency and prompt understanding, achieving a near 1:1 replication of the subject's appearance and matching lip movements accurately [11].
- The model's ability to generate multi-shot narratives is effective, with smooth transitions and coherent storytelling across different shots, although some abstract actions may still pose challenges [17][18].
- The model's aesthetic quality in video generation has improved, showcasing a cinematic feel and strong visual appeal, particularly in complex scenes like cyberpunk cityscapes [14][24].
Group 3: Image Generation Enhancements
- Wanxiang 2.6 has made advancements in image generation, particularly in style transfer, portrait generation, and bilingual text handling, demonstrating a better grasp of new aesthetic styles [19][22].
- The model successfully generated a food promotional poster with clear bilingual text and an appealing layout, indicating its reliability in aesthetic judgment [25][27].
- Overall, the model's performance is commendable, with minor flaws in multi-character dialogue and complex action understanding, but it is deemed usable for daily short video creation and secondary creation tasks [28][29].
Alibaba's latest release!
证券时报· 2025-12-16 09:56
Core Viewpoint
- Alibaba has launched the upgraded Wanxiang 2.6 model, the first video model in China to support character role-playing, improving the efficiency and quality of video creation [1].
Group 1: Product Features
- The Wanxiang 2.6 model includes features such as audio-visual synchronization, multi-camera generation, and sound-driven capabilities, making it the most comprehensive video generation model globally [1].
- The model supports videos of up to 15 seconds, the longest in China, and introduces new functions such as role-playing and storyboard control [1].
- Users can create professional-level videos by simply uploading personal videos and entering prompts, which lets them appear as the protagonist in high-quality cinematic scenes [1].
Group 2: Market Position
- The previous version, Wanxiang 2.5, was the first audio-visual synchronization video generation model released in China and ranked first in the LMArena evaluation [1].
- The enhancements in Wanxiang 2.6 further improve image quality, sound effects, and adherence to instructions, catering to professional film-level requirements [1].
Alibaba releases the Tongyi Wanxiang 2.6 series models, launching China's first role-playing feature
Zheng Quan Ri Bao· 2025-12-16 07:09
Core Insights
- Alibaba has launched the next-generation Wanxiang 2.6 model, which is the first video model in China to support character role-playing functionality, enhancing capabilities for professional film production and image creation [1][2]
Group 1: Product Features
- The Wanxiang 2.6 model includes features such as audio-visual synchronization, multi-camera generation, and sound-driven capabilities, making it the most comprehensive video generation model globally [1]
- The model has improved video quality, sound effects, and instruction adherence, achieving a maximum video length of 15 seconds, which is the highest in China [1]
- New functionalities include character role-playing and storyboard control, allowing for one-click completion of videos with single or multiple characters, as well as automatic multi-camera switching for professional tasks [1][2]
Group 2: Technological Innovations
- The model integrates multiple innovative technologies for multi-modal joint modeling and learning, capturing emotional, postural, and visual features from reference videos, along with acoustic characteristics like tone and speech rate [1]
- The high-level semantic understanding in storyboard control allows the creation of professional-grade multi-camera segments with a cohesive narrative and atmosphere [2]
Group 3: User Accessibility
- Users can experience Wanxiang 2.6 directly on the official website, and enterprise users can access the model API through Alibaba Cloud [2]
- The model family supports over ten visual creation capabilities, including text-to-image, image editing, text-to-video, and video editing, widely applied in AI comics, advertising design, and short video creation [2]
Alibaba releases the Wanxiang 2.6 series models, launching China's first role-playing feature
Ge Long Hui· 2025-12-16 04:50
Core Viewpoint
- Alibaba has launched the next-generation Wanxiang 2.6 series model, which is a comprehensive upgrade aimed at professional film production and image creation scenarios. This model is the first in China to support character role-playing functionality and includes features such as audio-visual synchronization, multi-camera generation, and sound-driven capabilities, making it the most feature-rich video generation model globally [1].
Group 1
- The Wanxiang 2.6 model is designed for professional film production and image creation [1].
- It is the first video model in China to support character role-playing functionality [1].
- The model includes advanced features such as audio-visual synchronization, multi-camera generation, and sound-driven capabilities [1].
Group 2
- The Wanxiang 2.6 model has been made available on Alibaba Cloud's Bailian and the Wanxiang official website [1].