Autoregressive Diffusion Models
Baidu Steam Engine Targets Real-Time Interaction for Long Video Generation
Core Insights
- Competition in multimodal video generation remains intense, and no company holds a definitive long-term technological advantage, according to Li Shuanglong, Baidu's Chief Architect of Commercial R&D [2].

Group 1: Industry Developments
- OpenAI recently launched its latest multimodal video generation model, Sora 2, prompting domestic AI video players, including Baidu, to update their offerings frequently [3].
- On October 15, Baidu upgraded its video generation model, Baidu Steam Engine (Wenxin Specialized), with a focus on improving the user interaction experience [3].

Group 2: Technological Advancements
- The Steam Engine model now supports real-time interactive generation of long AI videos, overcoming the conventional limit of roughly 10 seconds per video [4].
- Users start generation by uploading an image and a prompt. They can preview the result in real time and make modifications throughout the process, which gives them control over the video's plot, visuals, and transitions [4].
- The industry typically extends video length with "head and tail frame continuation" technology, but this can lead to a lack of coherence. Baidu aims to provide interactive and editable generation that better meets creators' needs [4].

Group 3: Technical Challenges and Updates
- Baidu's Steam Engine team faced numerous technical challenges in achieving these advances, including upgrading infrastructure and introducing Autoregressive Diffusion Models to eliminate training and inference biases and improve consistency [4].
- Since its release in July, the Steam Engine model has been updated frequently, on a monthly basis [4].
- Baidu is also planning a Steam Engine app, as revealed by Liu Lin, General Manager of Baidu's Commercial R&D [4].