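Let AI Be the "Action Director": Tencent Open-Sources Its Hunyuan Motion Model, Which Understands Vague Instructions and Generates High-Quality 3D Character Animation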

Core Viewpoint
- The article discusses how the scarcity of high-quality motion assets limits 3D character animation creation, and presents Tencent's solution, HY-Motion 1.0, which aims to improve motion generation through advanced data processing and model training techniques [1][2].
Group 1: Core Technology
- The HY-Motion 1.0 model is built on more than 3,000 hours of industrial-grade, refined motion data, which is essential for supporting a model of 1 billion parameters [4].
- The data engine integrates several sources, including monocular video motion capture, optical motion capture, and expressive hand-keyed animation assets from artists, to balance model generalization against generation quality [6].
- A standardized data processing pipeline covers data cleaning, normalization, and a closed-loop labeling process that uses a VLM and an LLM to increase the diversity of motion descriptions [6][7].
Group 2: Generation Pipeline
- A specialized LLM Prompt Engineering module converts user prompts into structured action descriptions with precise timing, which makes the generated motions more controllable [7][8].
- The model undergoes two-stage fine-tuning: it first learns to transform vague multilingual instructions into structured English descriptions, and is then optimized through reinforcement learning to improve semantic consistency and timing [7][13].
Group 3: Model Design
- The core architecture of HY-Motion 1.0 combines a Diffusion Transformer (DiT) with Flow Matching, using a dual-stream-to-single-stream structure for effective multimodal integration [10][12].
- Techniques such as "semantic pollution prevention" and "local constraints" are used to keep long-sequence generation logically consistent and physically continuous [12].
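To make the Flow Matching idea under Model Design more concrete, here is a minimal sketch of a flow-matching training target and a simple sampler. It rests on several assumptions not stated in the article: a straight-line path between noise and data, a plain Euler integrator, and a toy 4-value "pose" vector standing in for a motion frame. The names `interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, and `oracle` are hypothetical and are not HY-Motion's API. The `oracle` function stands in for a trained DiT network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = predict_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data (assumption): a 4-value "pose" standing in for one motion frame.
rng = random.Random(0)
pose = [0.5, -1.0, 2.0, 0.0]
noise = [rng.gauss(0.0, 1.0) for _ in pose]


def oracle(x_t, t):
    """Stand-in for a trained network: returns the exact path velocity."""
    return target_velocity(noise, pose)


loss = flow_matching_loss(oracle, noise, pose, t=0.3)
sample = euler_sample(oracle, noise, steps=10)
```

With a perfect velocity predictor the loss is zero and integrating from noise lands on the data point. In a real system the network would be conditioned on the text prompt and trained to approximate this velocity over many noise and data pairs.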
Group 4: Training Process
- Training proceeds in three stages: large-scale pretraining on 3,000 hours of data, high-quality fine-tuning on 400 hours of selected data, and reinforcement learning to improve motion realism and semantic alignment [15][16].
- The model achieved an SSAE score of 78.6%, significantly outperforming state-of-the-art models on instruction adherence [17].
Group 5: Community Engagement and Applications
- Since its open-source release, HY-Motion 1.0 has gained popularity among game developers, AI designers, and animators, who have integrated it into mainstream AI workflows such as ComfyUI [26][27].
- The model gives creators who lack expensive motion capture equipment a cost-effective way to produce high-quality motion assets [33][34].