画到哪，动到哪！字节跳动发布视频生成「神笔马良」ATI，已开源！

Core Viewpoint - The article discusses the development of ATI, a new controllable video generation framework by ByteDance, which allows users to create dynamic videos by drawing trajectories on static images, transforming user input into explicit control signals for object and camera movements [2][4]. Group 1: Introduction to ATI - Angtian Wang, a researcher at ByteDance, focuses on video generation and 3D vision, highlighting the advancements in video generation tasks due to diffusion models and transformer architectures [1]. - The current mainstream methods face a significant bottleneck in providing effective and intuitive motion control for users, limiting creative expression and practical application [2]. Group 2: Methodology of ATI - ATI accepts two basic inputs: a static image and a set of user-drawn trajectories, which can be any shape, including lines and curves [6]. - The Gaussian Motion Injector encodes these trajectories into motion vectors in latent space, guiding the video generation process frame by frame [6][14]. - The model uses Gaussian weights to ensure that it can "see" the drawn trajectories and understand their relation to the generated video [8][14]. Group 3: Features and Capabilities - Users can draw trajectories for key actions like running or jumping, with ATI accurately sampling and encoding joint movements to generate natural motion sequences [19]. - ATI can handle up to 8 independent trajectories simultaneously, ensuring that object identities remain distinct during complex interactions [21]. - The system allows for synchronized camera movements, enabling users to create dynamic videos with cinematic techniques like panning and tilting [23][25]. Group 4: Performance and Applications - ATI demonstrates strong cross-domain generalization, supporting various artistic styles such as realistic films, cartoons, and watercolor renderings [28]. - Users can create non-realistic motion effects, such as flying or stretching, providing creative possibilities for sci-fi or fantasy scenes [29]. - The high-precision model based on Wan2.1-I2V-14B can generate videos comparable to real footage, while a lightweight version is available for real-time interactions in resource-constrained environments [30]. Group 5: Open Source and Community - The Wan2.1-I2V-14B model version of ATI has been open-sourced on Hugging Face, facilitating high-quality, controllable video generation for researchers and developers [32]. - Community support is growing, with tools like ComfyUI-WanVideoWrapper available to optimize model performance on consumer-grade GPUs [32].