Discussing with the head of Li Auto's base model team the ByteDance papers I previously said would be helpful to Li Auto
理想TOP2 · 2025-09-17 05:01
Core Viewpoint
- Li Auto and ByteDance independently discovered the same fundamental issue in agent training and, shaped by their respective business needs, arrived at similar solutions with similar effects [2][4].

Group 1: Solutions and Algorithms
- Li Auto's approach favors efficient, practical engineering, while ByteDance's method is backed by more formal and comprehensive mathematical theorems that consider all possible scenarios [3][27].
- Li Auto proposed the AWE algorithm; ByteDance introduced the Entropy-Modulated Policy Gradients (EMPG) framework, which has two components: Self-Calibrating Gradient Scaling and a Future Clarity Bonus [4][10].
- AWE applies token-level adjustments during supervised fine-tuning (SFT), whereas EMPG operates at the step level during reinforcement learning (RL); both address gradient problems caused by uncertainty [4][27].

Group 2: Key Components of the Algorithms
- AWE dynamically adjusts each token's influence on parameter updates, letting the model learn easier tokens first before tackling harder ones [9].
- Self-Calibrating Gradient Scaling, in the EMPG framework, directly intervenes on and calibrates the strength of the learning signal according to the model's confidence in its own actions [10].
- The Future Clarity Bonus is an internal reward mechanism that steers the agent toward paths leading to clearer future states, improving learning efficiency [11].

Group 3: Insights on Learning Dynamics
- The core insight shared by both companies is that there is an undesirable coupling between the strength of the learning signal (the gradient) and the model's uncertainty (entropy) [24][25].
- EMPG targets uncertainty at the step level while AWE targets the token level; both use the model's internal feedback signals to guide training [27][28].
- Li Auto's AWE primarily addresses gradient magnitude, while ByteDance's EMPG tackles both gradient magnitude and credit assignment [6][27].
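The entropy–gradient coupling described in Group 3 can be seen directly in the cross-entropy gradient: for a softmax output, the gradient with respect to the logits is `softmax(logits) - onehot(target)`, so a confident (low-entropy) prediction produces a small gradient and an uncertain (high-entropy) one produces a large gradient. The sketch below is only a numerical illustration of that coupling, not code from either paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector (epsilon for numerical safety).
    return -(p * np.log(p + 1e-12)).sum()

def ce_grad_wrt_logits(logits, target):
    # Gradient of cross-entropy w.r.t. logits: softmax(logits) - onehot(target).
    g = softmax(logits)
    g[target] -= 1.0
    return g

# A confident (low-entropy) prediction of the correct class vs. an uncertain one.
confident_logits = np.array([6.0, 0.0, 0.0])
uncertain_logits = np.array([0.1, 0.0, 0.0])

h_conf = entropy(softmax(confident_logits))
h_unc = entropy(softmax(uncertain_logits))
g_conf = np.linalg.norm(ce_grad_wrt_logits(confident_logits, 0))
g_unc = np.linalg.norm(ce_grad_wrt_logits(uncertain_logits, 0))
print(f"confident: H={h_conf:.3f} |grad|={g_conf:.3f}")
print(f"uncertain: H={h_unc:.3f} |grad|={g_unc:.3f}")
```

When the model is most uncertain, it receives the largest raw update; both AWE and EMPG can be read as attempts to break or reshape exactly this coupling.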
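The summary says AWE dynamically adjusts each token's influence on parameter updates so that easier tokens are learned first. The article does not give AWE's actual weighting rule, so the following is a hypothetical sketch of the general idea: weight the per-token SFT loss by an entropy-derived factor that down-weights high-entropy (hard) tokens. The names `awe_style_weights` and `weighted_sft_loss`, and the softmax-of-negative-entropy scheme, are illustrative assumptions, not Li Auto's formula:

```python
import numpy as np

def token_entropy(probs):
    # Per-token predictive entropy from a (seq_len, vocab_size) matrix.
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def awe_style_weights(entropies, temperature=1.0):
    # Hypothetical rule: down-weight high-entropy (hard) tokens so the model
    # fits easy tokens first. The real AWE rule is not given in the article.
    w = np.exp(-entropies / temperature)
    return w / w.mean()  # normalize so the average weight stays 1

def weighted_sft_loss(target_log_probs, weights):
    # Token-weighted negative log-likelihood for SFT.
    return -(weights * target_log_probs).mean()

# Demo: three tokens, from very confident to maximally uncertain.
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],  # easy token
    [0.60, 0.20, 0.10, 0.10],  # medium token
    [0.25, 0.25, 0.25, 0.25],  # hard token (maximum entropy)
])
H = token_entropy(probs)
w = awe_style_weights(H)
print(w)  # weights decrease as token entropy rises
```

A schedule on `temperature` (relaxing it as training progresses) would let harder tokens gradually regain influence, matching the "easy first, hard later" behavior the summary describes.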
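EMPG's two components can likewise be sketched at the step level, again as a loose illustration built only from the summary above: Self-Calibrating Gradient Scaling modulates the learning signal by the policy's step-level confidence, and the Future Clarity Bonus pays an intrinsic reward for reaching states where the next-step distribution is sharp. The exact modulation functions and the coefficient `beta` here are assumptions, not the paper's formulas:

```python
import numpy as np

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum()

def self_calibrating_scale(step_probs):
    # Hypothetical: scale a step's learning signal by the model's confidence,
    # attenuating updates for steps taken under high uncertainty.
    h_max = np.log(len(step_probs))  # maximum possible entropy
    return 1.0 - entropy(step_probs) / h_max

def future_clarity_bonus(next_step_probs, beta=0.1):
    # Hypothetical intrinsic reward: a small bonus for reaching states where
    # the policy's next-step distribution is sharp (low entropy), nudging the
    # agent toward "clearer" futures.
    h_max = np.log(len(next_step_probs))
    return beta * (1.0 - entropy(next_step_probs) / h_max)

confident = np.array([0.90, 0.05, 0.05])
uncertain = np.array([1 / 3, 1 / 3, 1 / 3])
print(self_calibrating_scale(confident), self_calibrating_scale(uncertain))
print(future_clarity_bonus(confident), future_clarity_bonus(uncertain))
```

In an RL loop, the scale would multiply each step's policy-gradient term and the bonus would be added to the environment reward, which is how the two components would jointly address both gradient magnitude and credit assignment.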