Multi-modal Pre-training

36氪 · 2026-03-26 04:35

Core Viewpoint - The article highlights the significant contributions of female AI scientists, focusing on Zeng Yan's rapid rise and achievements in AI model development at ByteDance, her role in the Seedance 2.0 project, and the innovative approaches she has taken in video generation technology [5][12][46].

Group 1: Zeng Yan's Background and Achievements
- Zeng Yan joined ByteDance AI Lab as a fresh graduate in September 2021. She started as an algorithm engineer and soon published a paper on the X-VLM model, which aligns visual concepts with text [19][24].
- Over the next two years, she published eight papers at top international conferences and served as a reviewer for several prestigious journals [29].
- In 2023, she moved to the newly established Seed model research department, where her expertise in multi-modal pre-training became crucial [31][33].

Group 2: Key Projects and Innovations
- Zeng Yan led two significant projects, CCLM and Lynx, which focused on cross-language and cross-modal understanding, allowing models trained on English data to perform tasks in other languages [36][39].
- The PixelDance project, which addresses the balance between dynamic motion and stability in video generation, marked a pivotal moment in her career and led to her rapid promotion within the company [41][46].
- Seedance 2.0, which she was instrumental in developing, features a dual-branch diffusion transformer architecture that generates video and audio simultaneously, improving synchronization and overall quality [53][56].

Group 3: Technical Breakthroughs and Model Performance
- The Seedance 2.0 model can generate a 1-minute 2K video in 60 seconds, a 30% improvement over its predecessor, showcasing the efficiency of the optimizations made by Zeng Yan's team [62].
- The model incorporates a cross-branch calibration module to keep video and audio aligned, establishing a foundational understanding of multi-modal relationships during the pre-training phase [59][61].
- Seedance 2.0 also adds multi-shot narrative capabilities, allowing the model to understand and apply professional cinematographic techniques [64].

Group 4: Comparison with Other Female AI Scientists
- Zeng Yan and Luo Fuli, both prominent female figures in AI, share a focus on finding balance in their respective projects: Zeng Yan emphasizes dynamic stability in video generation, while Luo Fuli achieves cost efficiency in model performance [66][72].
- Their career paths differ. Zeng Yan advanced within ByteDance, whereas Luo Fuli moved between several companies, highlighting the diverse opportunities available in the AI field [73][75].

Multi-modal pre-training

Multi-granularity alignment

Artificial Intelligence

Seedance 2.0

X-VLM

CCLM