Large Model Development and Computation
Slimmer Without Getting Dumber! Large Model Training and Inference Efficiency Up 30%: JD's Research on Large Model Development and Computation Published in a Nature-Portfolio Journal
量子位 (QbitAI) · 2025-05-21 04:01
Core Insights
- The article discusses groundbreaking research on large models by JD's Exploration Research Institute, published in a Nature-portfolio journal, centered on a system that trains and updates large models in open environments while collaborating with smaller models [1][2].

Group 1: Innovations and Efficiency
- The research introduces four innovative methods that improve inference efficiency by an average of 30% and reduce training costs by 70% [8].
- The four innovations are model distillation, data governance, training optimization, and cloud-edge collaboration (a generic routing sketch for the cloud-edge pattern appears at the end of this summary) [1][11].
- Model distillation employs dynamic hierarchical distillation, achieving efficient training in low-resource scenarios by adjusting only 0.5% of parameters and thereby lowering deployment costs for large models (see the parameter-efficient distillation sketch after this summary) [5][11].

Group 2: Practical Applications and Solutions
- JD's large model development technology supports enterprises in model training and production, turning bulky AI models into efficient smaller models without losing intelligence [3][4].
- The JoyBuild platform offers customized solutions for large model development and industry applications, enabling rapid transformation of general models into specialized models tailored to business needs [10][12].
- The platform can complete the entire process from data preparation to model deployment in under a week, reducing the required staffing from more than 10 scientists to just 1-2 algorithm engineers and saving 90% on inference costs [10].

Group 3: Data Governance and Optimization
- The data governance method uses cross-domain dynamic sampling algorithms that automatically mix data from different fields, combined with privacy protection and active learning techniques, to strengthen the generalization ability of large models (see the sampling sketch after this summary) [11].
- Training optimization applies a Bayesian optimization framework to hyperparameter tuning and architecture search, improving resource utilization by 40% in MPMD (multiple program, multiple data) scenarios (see the hyperparameter search sketch after this summary) [11].

Group 4: Future Prospects
- JD aims to further improve the efficiency of large model development and computation, enabling enterprises of all sizes to build proprietary AI applications at low cost and driving the large-scale adoption of AI [12].
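The "adjusting only 0.5% of parameters" figure suggests a parameter-efficient scheme. Below is a minimal sketch of generic parameter-efficient knowledge distillation: the base layer and the teacher are frozen, only a small low-rank adapter receives gradients, and the student is trained with a temperature-scaled soft-label loss plus the usual hard-label loss. The adapter rank, temperature, and loss weighting are illustrative assumptions; this is not the dynamic hierarchical distillation described in the paper.

```python
# Minimal sketch: parameter-efficient knowledge distillation. Only the
# low-rank adapter is trainable (a stand-in for the "0.5% of parameters"
# figure); base weights and the teacher stay frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankAdapter(nn.Module):
    """Trainable low-rank residual added on top of a frozen linear layer."""
    def __init__(self, frozen_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = frozen_linear
        for p in self.base.parameters():
            p.requires_grad = False                    # freeze base weights
        d_in, d_out = frozen_linear.in_features, frozen_linear.out_features
        self.down = nn.Linear(d_in, rank, bias=False)  # trainable
        self.up = nn.Linear(rank, d_out, bias=False)   # trainable
        nn.init.zeros_(self.up.weight)                 # start as a no-op residual

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-label KL against the teacher with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```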
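For the cross-domain dynamic sampling item, here is a minimal sketch of one way such mixing could work: per-domain sampling weights are recomputed from recent per-domain losses so that harder domains are drawn more often in the next round. The softmax-over-loss update, the temperature, and the domain names are assumptions; the article does not disclose JD's actual algorithm or its privacy-protection and active-learning components.

```python
# Minimal sketch: loss-driven dynamic mixing of data from several domains.
import math
import random

def update_domain_weights(domain_losses, temperature=1.0):
    """Softmax over per-domain losses -> sampling probabilities."""
    scaled = {d: loss / temperature for d, loss in domain_losses.items()}
    z = sum(math.exp(v) for v in scaled.values())
    return {d: math.exp(v) / z for d, v in scaled.items()}

def sample_batch(domain_datasets, weights, batch_size=32, rng=random):
    """Draw a mixed batch; each example's domain is chosen by its weight."""
    domains = list(weights)
    probs = [weights[d] for d in domains]
    batch = []
    for _ in range(batch_size):
        d = rng.choices(domains, weights=probs, k=1)[0]
        batch.append(rng.choice(domain_datasets[d]))
    return batch

# Usage: start from measured per-domain losses, re-weight after each eval pass.
losses = {"retail": 2.1, "logistics": 1.4, "finance": 2.9}   # hypothetical domains
weights = update_domain_weights(losses)                      # finance sampled most
```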
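For the Bayesian optimization item, below is a minimal sketch of hyperparameter search using Optuna's default TPE sampler as a stand-in for the framework mentioned in the article. The library choice, the search space, and the train_and_evaluate stub are all assumptions, not details from the paper.

```python
# Minimal sketch: Bayesian-style hyperparameter search with Optuna.
import optuna

def train_and_evaluate(lr, warmup_steps, adapter_rank):
    # Stand-in for a short fine-tuning run; returns a synthetic "loss"
    # so the sketch runs end to end.
    return (lr - 3e-4) ** 2 + abs(warmup_steps - 800) * 1e-5 + adapter_rank * 1e-3

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    warmup = trial.suggest_int("warmup_steps", 100, 2000)
    rank = trial.suggest_categorical("adapter_rank", [4, 8, 16, 32])
    return train_and_evaluate(lr=lr, warmup_steps=warmup, adapter_rank=rank)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```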
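The article does not detail the cloud-edge collaboration method. One common pattern it could resemble is a confidence-based cascade, sketched below, where a small edge model answers cheap queries locally and escalates uncertain ones to the large cloud model. The CascadeRouter class, the threshold, and the model interfaces are hypothetical.

```python
# Minimal sketch: confidence-based cloud-edge cascade (illustrative only).
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class CascadeRouter:
    edge_model: Callable[[str], Tuple[str, float]]   # returns (answer, confidence)
    cloud_model: Callable[[str], str]
    confidence_threshold: float = 0.85

    def answer(self, query: str) -> str:
        reply, confidence = self.edge_model(query)
        if confidence >= self.confidence_threshold:
            return reply                  # cheap local answer on the edge
        return self.cloud_model(query)    # escalate hard queries to the cloud

# Usage with stub models (real deployments would wrap actual model endpoints):
router = CascadeRouter(
    edge_model=lambda q: ("local answer", 0.9),
    cloud_model=lambda q: "cloud answer",
)
print(router.answer("What is the delivery ETA?"))
```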