Multimodal Generation
SII's Liu Pengfei and Sand.ai's Cao Yue: Two Teams Led by Young AI Scholars Join Forces to Open-Source an Audio-Video Foundation Model
机器之心 (Synced) · 2026-03-23 04:03
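Open-source multimodal generation has seen a fundamental, architecture-level breakthrough. daVinci-MagiHuman, an open-source foundation model for performance-grade human audio-video generation, is built around a 15-billion-parameter single-stream Transformer that jointly models text, video, and audio within one unified backbone, doing away entirely with cross-attention and modality-specific branches.
About the research team
The work was completed jointly by the GAIR Lab at Shanghai Innovation Institute (SII) and Sand.ai. SII is a new type of talent-development institution built jointly by leading universities, top companies, and research institutions. Its GAIR Lab, led by Dr. Liu Pengfei, focuses on frontier research in generative AI, covering multimodal video foundation models, text LLM pretraining, and agent construction. In multimodal world models, the lab has pursued a systematic line of work: from open-sourcing Anole, the first natively diffusion-free multimodal model, to proposing Thinking with Generated Images, a new paradigm of reasoning by generating images, to LiveTalk for real-time interactive scenarios, and to its digital-gene work on understanding and simulating the digital world. Together these form a complete research chain spanning multimodal generation, visual reasoning, and real-time interaction. Recently, the lab has released daVinci-MagiHuman, Data Darwinism, daVinci-Agency, daVin ...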
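The article says only that daVinci-MagiHuman uses a "single-stream Transformer" with no cross-attention and no modality-specific branches; it gives no implementation details. As a minimal, hypothetical illustration of what that phrase generally means (not the model's actual code), the sketch below assumes text, video, and audio have already been embedded as vectors of one shared width `D`, concatenates them into a single token sequence, and runs one self-attention with one shared set of weights over it. The toy sizes, random weights, and function names (`single_stream_block`, `self_attention`) are assumptions; a real model would add multiple heads, feed-forward layers, positional and modality embeddings, normalization, and many stacked layers.

```python
import math
import random

# Hypothetical toy width shared by all modalities (the article gives no sizes).
D = 4


def matvec(w, x):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over ONE sequence,
    using ONE set of projection weights for every token."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(D) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(D)])
    return out


def single_stream_block(text, video, audio, wq, wk, wv):
    """Concatenate all modalities into one stream and attend jointly.

    Every token may attend to every other token regardless of modality,
    so no separate cross-attention layer or per-modality branch is used.
    """
    stream = text + video + audio
    mixed = self_attention(stream, wq, wk, wv)
    # Split the joint output back into per-modality segments.
    n_t, n_v = len(text), len(video)
    return mixed[:n_t], mixed[n_t:n_t + n_v], mixed[n_t + n_v:]


def random_matrix():
    return [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)]


def random_tokens(n):
    return [[random.gauss(0, 1) for _ in range(D)] for _ in range(n)]


# Usage: 2 text tokens, 3 video tokens, 1 audio token in one stream.
random.seed(0)
wq, wk, wv = random_matrix(), random_matrix(), random_matrix()
text, video, audio = random_tokens(2), random_tokens(3), random_tokens(1)
t_out, v_out, a_out = single_stream_block(text, video, audio, wq, wk, wv)
print(len(t_out), len(v_out), len(a_out))  # 2 3 1
```

Because all tokens share one attention operation, information moves between modalities inside the same layer. A cross-attention design would instead keep a separate stream per modality and add dedicated layers in which one stream queries another.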
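Just In: The Ceiling on AI Video Has Been Blown Off! After Testing SkyReels, I Got Carried Away: I Too Have What It Takes to Be a Professional Director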
机器之心 (Synced) · 2025-11-04 03:45
Core Viewpoint
- The article discusses the rapid evolution of AI video generation, highlighting Kunlun Wanwei's launch of SkyReels, a new platform intended to give creators a comprehensive, low-barrier multimodal AI video creation experience [1][74].

Group 1: SkyReels Overview
- SkyReels is a one-stop, no-barrier multimodal AI video creation platform that integrates various AI tools and models into a single creative workspace [2][8].
- The platform features an infinite canvas on which users create content interactively with leading global AI models, with real-time effects and flexible creative control [9][10].
- SkyReels V3, the newly released multimodal video generation model, has been comprehensively optimized across image, audio, and video capabilities [2][60].

Group 2: Features and Functionalities
- The platform offers several creative modes, including the infinite canvas, digital-human narration, multi-template generation, and agent features, to cover a range of creator needs [2][17].
- Users can generate a video simply by dragging images onto the canvas and entering basic requirements, which illustrates the platform's ease of use [12][28].
- The Super Agent feature lets users brainstorm and generate content collaboratively with the agent [16][17].

Group 3: Performance and Quality
- SkyReels V3 improves video generation quality, including better consistency between subjects and backgrounds and stronger instruction following [60][62].
- The platform supports multi-character dialogue within a single shot, producing natural, fluid conversations in generated videos [63][52].
- According to the article, SkyReels V3 outperformed other industry models on various benchmark tests [62][64].

Group 4: Market Position and Future Outlook
- Kunlun Wanwei has positioned itself as a leader in AI video generation, continually enhancing its products and expanding its ecosystem [74].
- The company reported revenue of 5.8 billion yuan for the first three quarters of the year, up 52% year on year, driven by its multimodal integration strategy [74].
- The launch of SkyReels is expected to strengthen Kunlun Wanwei's position in the global AI video market and advance its goal of making professional video creation accessible to everyone [74].
X @Elon Musk
Elon Musk· 2025-08-07 18:38
Hiring & AI Focus
- The company is actively recruiting for roles in multimodal understanding and generation [1].
- The company aims to develop future AI interfaces [1].