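Controllability and naturalness are no longer an either-or choice! With tokens cut to 1/6, NTU and CUHK make motion more natural the more it is controlled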
QbitAI (量子位) · 2026-03-31 06:43
Core Viewpoint
- The article discusses the limitations of existing motion generation methods, in particular the trade-off between controllability and naturalness, and introduces MoTok as a solution that combines high-level semantic planning with low-level detail reconstruction [2][10].

Group 1: MoTok Overview
- MoTok is a new paradigm for conditional motion generation built on a diffusion-based discrete motion tokenizer. It addresses the conflict between high-level planning and low-level control [2][4].
- The method cuts the number of tokens needed for motion generation to one-sixth of what state-of-the-art (SOTA) methods use while improving motion quality: trajectory error falls by 89% and Fréchet Inception Distance (FID) by 65% [2][5].

Group 2: Three-Stage Framework
- MoTok proposes a Perception–Planning–Control framework. The perception stage interprets the input conditions, the planning stage organizes actions in a discrete token space, and the control stage reconstructs motion details with a diffusion-based decoder [4][16].
- The framework supports flexible injection of global and local conditions, so it can adapt to different input conditions and motion generation tasks [4][16].

Group 3: Token Compression and Quality
- Traditional methods need a large number of tokens to retain both high-level semantics and low-level details, which complicates downstream generation [5][6].
- MoTok uses tokens more efficiently, which eases the planning stage and improves the overall quality of the generated motion [6][7].

Group 4: Control Injection Strategy
- MoTok resolves the conflict between joint-trajectory conditions and text conditions with a coarse-to-fine control injection strategy: coarse constraints are applied during planning and fine-grained constraints during control [9][10].
- Separating the two lets semantic planning and motion control work together rather than against each other, overcoming a limitation of existing methods [10][12].

Group 5: Experimental Validation
- The article presents experiments on MoTok's two-stage constraint injection. Keeping only the coarse constraints in the planning stage increases trajectory control error, while applying only the fine-grained constraints in the control stage degrades the motion distribution [12][13].
- The results indicate that MoTok outperforms traditional methods on both text-to-motion (T2M) and motion-to-text (M2T) tasks [7][8].
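
The coarse-to-fine idea can be illustrated with a toy sketch. This is not MoTok's implementation: the 1-D trajectory, the `CODEBOOK`, the `stride`, and the functions `perceive`, `plan`, and `control` are all hypothetical names chosen for illustration, and the refinement loop is a plain iterative correction standing in for a guided diffusion decoder. It only shows the division of labor: planning sees a coarse summary of the trajectory, control applies the frame-level constraint.

```python
from typing import List

# Hypothetical codebook of 1-D positions (-4.0 to 4.0 in steps of 0.5).
CODEBOOK = [i * 0.5 for i in range(-8, 9)]


def perceive(text: str, trajectory: List[float]) -> dict:
    """Perception: collect the conditions (this toy only packages them)."""
    return {"text": text, "trajectory": trajectory}


def plan(cond: dict, stride: int) -> List[int]:
    """Planning: pick one discrete token per `stride` frames.

    Only a coarse constraint is used: the mean of the target trajectory
    over each window, snapped to the nearest codebook entry. The text
    condition is ignored in this toy.
    """
    traj = cond["trajectory"]
    tokens = []
    for start in range(0, len(traj), stride):
        window = traj[start:start + stride]
        mean = sum(window) / len(window)
        nearest = min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - mean))
        tokens.append(nearest)
    return tokens


def decode(tokens: List[int], stride: int, length: int) -> List[float]:
    """Expand each token to `stride` frames with no fine-grained constraint."""
    return [CODEBOOK[t] for t in tokens for _ in range(stride)][:length]


def control(tokens: List[int], cond: dict, stride: int,
            steps: int = 10, rate: float = 0.5) -> List[float]:
    """Control: decode tokens, then pull every frame toward the target.

    The loop is a stand-in for guided denoising, not a diffusion process.
    """
    traj = cond["trajectory"]
    motion = decode(tokens, stride, len(traj))
    for _ in range(steps):
        motion = [m + rate * (target - m) for m, target in zip(motion, traj)]
    return motion


def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


target = [0.1 * i for i in range(20)]        # 20 frames of a toy trajectory
cond = perceive("walk forward", target)
tokens = plan(cond, stride=4)                # 5 tokens for 20 frames
coarse_only = decode(tokens, 4, len(target))
refined = control(tokens, cond, stride=4)

print(len(tokens), "tokens for", len(target), "frames")
print("coarse-only error:", mean_abs_error(coarse_only, target))
print("coarse + fine error:", mean_abs_error(refined, target))
```

In this toy, the coarse-only decode keeps a visible trajectory error because each token covers several frames, which loosely mirrors the reported finding that planning-stage constraints alone are not enough for precise trajectory control.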