Will Mid-Training Become the Pre-Training of the Future?
机器之心 · 2025-11-23 01:30
Group 1: Core Concepts of Mid-Training

- The concept of "Mid-Training" is emerging as a potential new phase in the training of large language models (LLMs), positioned between pre-training and post-training, with OpenAI establishing a dedicated department for it in July 2024 [5][6][7]
- Mid-Training is described as a vital stage that enhances specific capabilities of LLMs, such as mathematics, programming, reasoning, and long-context extension, while preserving the model's foundational abilities [9][10]
- The definition and implementation of Mid-Training are still not universally agreed upon; various organizations are exploring its effects and mechanisms, indicating growing interest in this area [8][11]

Group 2: Technical Insights and Strategies

- Research from Peking University and Meituan has attempted to clarify the definition of Mid-Training, focusing on data management, training strategies, and model architecture optimization [8][10]
- Key optimization strategies for Mid-Training include data curation to raise data quality, training techniques such as learning rate annealing and context extension, and architecture optimization to improve model performance [10]
- Exploration of Mid-Training has gained momentum since 2025, with increasing references in research papers from institutions such as Microsoft and Zero One [6][7]
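Learning rate annealing, one of the Mid-Training techniques mentioned above, typically means decaying the learning rate smoothly from the pre-training value toward a small final value over the mid-training phase. A minimal sketch of a cosine annealing schedule follows; the function name and the specific rate values are illustrative assumptions, not details from the article.

```python
import math

def annealed_lr(step: int, total_steps: int,
                peak_lr: float = 3e-4, final_lr: float = 3e-5) -> float:
    """Cosine learning-rate annealing (illustrative sketch).

    Decays from peak_lr at step 0 to final_lr at total_steps,
    following half a cosine wave. All values are hypothetical.
    """
    progress = min(step / total_steps, 1.0)       # fraction of phase elapsed
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1.0 -> 0.0
    return final_lr + (peak_lr - final_lr) * cosine
```

For example, `annealed_lr(0, 1000)` returns the peak rate, and `annealed_lr(1000, 1000)` returns the final rate, with a smooth cosine-shaped decay in between.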
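Context extension, the other training strategy named above, is often implemented by rescaling rotary position embeddings so that longer sequences map into the position range the model saw during pre-training. A minimal sketch of this idea (simple position interpolation, where frequencies are divided by a scale factor) is shown below; the article does not specify which extension method the surveyed work uses, so this is one common technique, not the method from the paper.

```python
def rope_frequencies(dim: int, base: float = 10000.0,
                     scale: float = 1.0) -> list:
    """Inverse frequencies for rotary position embeddings (sketch).

    With scale == 1.0 this is the standard RoPE frequency ladder;
    scale > 1.0 divides every frequency by `scale`, which is equivalent
    to interpolating positions and stretches the usable context by
    roughly that factor. Values here are illustrative assumptions.
    """
    half = dim // 2  # RoPE rotates pairs of dimensions
    return [1.0 / (scale * base ** (2 * i / dim)) for i in range(half)]
```

For instance, `rope_frequencies(128, scale=4.0)` yields frequencies a quarter of the originals, so a position 4x beyond the pre-training window produces rotation angles the model has already seen.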