Post-training paradigm
Stop large models from overthinking! Shanghai AI Lab's new post-training paradigm reshapes CoT for faster, better reasoning
QbitAI (量子位) · 2025-12-21 02:00
Core Viewpoint - The article introduces RePro (Rectifying Process-level Reward), a new post-training paradigm that aims to make large language model (LLM) reasoning more efficient by curbing "overthinking" during inference [2][30].
Group 1: RePro Overview
- RePro views the reasoning process as an optimization of the model's internal state, offering a new perspective on reshaping Chain-of-Thought (CoT) in large models [3].
- Its core idea is to treat the model's reasoning trajectory as a path toward the optimal solution on a loss surface [3].
Group 2: Correction Mechanisms
- RePro plugs a process-level reward directly into reinforcement learning with verifiable rewards (RLVR) pipelines that use algorithms such as PPO and GRPO [4].
- It defines a computable objective function J that quantifies the model's confidence given its current reasoning context; higher values indicate greater confidence in the correctness of the answer [5][6].
Group 3: Reasoning Quality Assessment
- RePro introduces a dual scoring mechanism that evaluates reasoning quality from the growth rate and the smoothness of the objective function J [10].
- The Magnitude Score measures how much the objective function improves, while the Stability Score measures whether the reasoning proceeds smoothly or is full of hesitation [11][13].
Group 4: Integration into RL Training
- To reduce computational cost, RePro applies an entropy filtering strategy: it splits the reasoning chain into logical segments and computes rewards only for the top-k segments [18][20].
- The process-level reward is computed from the improvement in the process score and is then combined with final-answer correctness to form the input to the advantage function used in reinforcement learning [21][22].
Group 5: Experimental Results
- RePro was tested on a range of tasks and showed stable accuracy improvements across different RL algorithms, including PPO and GRPO [23].
- The model generated significantly fewer tokens on average during reasoning, indicating a more efficient inference process [25][27].
- Backtracking behavior during reasoning was significantly reduced, indicating a more coherent logical flow in the model's thought process [28].
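The scoring and reward steps summarized in Groups 3 and 4 can be illustrated with a small Python sketch. The summary gives no formulas, so everything concrete here is an assumption made for illustration and not RePro's actual definition: the tanh-based Magnitude Score, the "fraction of non-decreasing steps" Stability Score, the blending `weight`, the reading of "top-k" as highest entropy, the scaling factor `alpha`, and all function names. The last function shows the group normalization commonly used in GRPO-style training.

```python
import math
import statistics
from typing import List, Sequence


def magnitude_score(j_values: Sequence[float]) -> float:
    """Net gain in J over the trajectory, squashed into (-1, 1) (assumed form)."""
    return math.tanh(j_values[-1] - j_values[0])


def stability_score(j_values: Sequence[float]) -> float:
    """Fraction of consecutive steps where J does not drop (assumed form)."""
    pairs = list(zip(j_values, j_values[1:]))
    if not pairs:
        return 1.0  # a single point has no hesitation to penalize
    return sum(cur >= prev for prev, cur in pairs) / len(pairs)


def process_score(j_values: Sequence[float], weight: float = 0.5) -> float:
    """Blend of the two scores; the weighting is a hypothetical choice."""
    return weight * magnitude_score(j_values) + (1.0 - weight) * stability_score(j_values)


def top_k_segments(entropies: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-entropy segments, returned in reading order."""
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


def process_rewards(j_values: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Reward each selected segment by the change in process score it causes.

    j_values[0] is J before any reasoning; j_values[i + 1] is J after segment i.
    """
    return [
        process_score(j_values[: i + 2]) - process_score(j_values[: i + 1])
        for i in selected
    ]


def combined_reward(is_correct: bool, rewards: Sequence[float], alpha: float = 0.1) -> float:
    """Final-answer reward plus a scaled sum of process rewards (alpha is hypothetical)."""
    return float(is_correct) + alpha * sum(rewards)


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style group normalization: (r - mean) / std within one prompt's samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    j = [0.1, 0.4, 0.3, 0.9]     # J before reasoning and after each of 3 segments
    entropies = [2.0, 0.5, 1.5]  # one entropy value per segment
    chosen = top_k_segments(entropies, k=2)
    reward = combined_reward(True, process_rewards(j, chosen))
    print(chosen, round(reward, 3))
```

Scoring only the selected segments is what keeps the extra cost bounded: J has to be evaluated at k segment boundaries per sample instead of at every token. A real implementation would obtain J and the per-segment entropies from the policy model itself; here they are plain lists so the arithmetic stays visible.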
Professor Xiao Yanghua: How far is embodied intelligence from "emergence"? | AI&Society 100 People, 100 Questions
Tencent Research Institute (腾讯研究院) · 2025-06-27 06:59
Core Viewpoint - The article discusses the transformative impact of generative AI and embodied intelligence on technology, business, and society, and argues that AI's opportunities and challenges need to be explored from multiple angles [1].
Group 1: AI Development Trends
- AI development in recent years has followed two clear trajectories: generative AI (AIGC) and embodied intelligence [5][9].
- Generative AI aims to give machines human-like cognitive abilities, while embodied intelligence focuses on enabling machines to mimic human sensory and action capabilities [10][11].
- The current AI landscape highlights the importance of data quality and training strategies over sheer data volume and computational power [6][19].
Group 2: Embodied Intelligence
- The next phase of embodied intelligence is expected to involve mind-body coordination, reflecting the philosophical question of how human-level intelligence arises [6][11].
- Applying embodied intelligence in consumer markets hinges on the machine's ability to empathize with and understand human emotional needs [6][10].
- There is a significant gap in the data that embodied intelligence needs to reach its potential; current datasets lack the scale necessary for generalization [7][24].
Group 3: AI as a Technological Revolution
- Generative AI is characterized as a technological revolution on three criteria: foundational nature, exponential productivity enhancement, and profound societal impact [13][14].
- The societal implications of AI's cognitive capabilities are vast: they could affect all human activities and raise concerns about cognitive laziness among humans [14][16].
- By contrast, the productivity impact of embodied intelligence is seen as limited compared with the cognitive advances of generative AI [15][16].
Group 4: Data and Model Relationships
- The relationship between model algorithms and data is crucial: algorithms determine the lower limit of model performance, and data defines the upper limit [20][21].
- The current focus in AI development is on improving data quality and training strategies, particularly for embodied intelligence [19][22].
- The industry faces challenges in acquiring data for embodied intelligence, which calls for innovative approaches to data collection and synthesis [25][26].
Group 5: Future Directions
- To overcome data scarcity in embodied intelligence, strategies that combine real, simulated, and synthetic data are being explored [25][26].
- Wearable devices capable of capturing real-world actions could provide a substantial data foundation for embodied intelligence [26].
- The complexity of human experience and environmental interaction presents significant challenges for the data-driven advancement of embodied intelligence [34][35].