ScaleNet
Huawei Noah's Ark Lab Releases ScaleNet: A New General Paradigm for Model Scaling
机器之心 · 2025-11-18 03:30
Core Insights
- The article examines the challenge of scaling AI models, in particular the high cost of training large-scale models from scratch and the need for efficient model-expansion methods [2][3][4].
- ScaleNet is introduced as a solution that expands pre-trained models effectively while remaining parameter-efficient, delivering significant performance gains on both visual and language tasks [5][20].

Research Motivation
- The high computational cost of training large models has driven approaches such as progressive training, which reuses the weights of a smaller model to initialize a larger one. These methods, however, typically introduce new independent parameters, increasing storage requirements and slowing optimization [4].

Core Methodology
- ScaleNet combines two key techniques: layer-wise weight sharing and lightweight adapters [6][7]; an illustrative sketch of this structure follows the Conclusion below.
- Layer-wise weight sharing lets newly added layers share parameters with existing layers of the pre-trained model, improving parameter efficiency and accelerating learning [8].
- A lightweight adapter is attached to each shared layer to provide layer-specific adjustments, so that knowledge is shared while each layer can still learn specialized functions, preserving model capacity and performance [11].

Experimental Results and Analysis
- In visual-model evaluations, ScaleNet outperformed baseline methods in accuracy at comparable parameter counts across architectures such as DeiT and Swin [14].
- For example, on DeiT-Tiny, ScaleNet reached 76.46% Top-1 accuracy with 6.45 million parameters, versus 75.01% for the baseline [15].
- ScaleNet also trained more efficiently: on DeiT-Small it reached 81.13% accuracy in 100 epochs and 15.8 hours, compared with 300 epochs and 47.3 hours for direct training [16].

Generalization to Language Models
- Applied to the Llama-3.2-1B language model, ScaleNet delivered an average improvement of 0.92% across common-sense reasoning benchmarks, indicating cross-modal applicability [17][18].
- The method also yielded stable gains on downstream visual tasks such as object detection and semantic segmentation, further confirming its generalization ability [19].

Conclusion
- ScaleNet offers an efficient and cost-effective path for expanding pre-trained models, markedly improving training efficiency and performance on both visual and language tasks. This work supports the development of larger, stronger, and more economical AI models and more sustainable growth in the AI field [20].
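To make the two techniques concrete, below is a minimal PyTorch sketch of layer-wise weight sharing combined with lightweight adapters. It is not the authors' implementation: the adapter design (a bottleneck MLP), the bottleneck width, the cyclic mapping from new layers to source blocks, and all names (Adapter, SharedLayerWithAdapter, expand_model) are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Hypothetical lightweight bottleneck adapter: down-project, non-linearity,
    up-project, added residually to the shared block's output."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the expanded model initially
        # behaves like the pre-trained source (an assumed design choice).
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class SharedLayerWithAdapter(nn.Module):
    """A new layer that reuses (shares) an existing pre-trained block and adds
    its own small adapter for layer-specific adjustment."""
    def __init__(self, shared_block: nn.Module, dim: int):
        super().__init__()
        self.block = shared_block          # shared reference, not a copy
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))

def expand_model(pretrained_blocks: nn.ModuleList, dim: int, extra_layers: int) -> nn.ModuleList:
    """Append new layers that share weights with existing blocks; the cyclic
    new-layer-to-source-block mapping here is purely illustrative."""
    expanded = list(pretrained_blocks)
    for i in range(extra_layers):
        source = pretrained_blocks[i % len(pretrained_blocks)]
        expanded.append(SharedLayerWithAdapter(source, dim))
    return nn.ModuleList(expanded)
```

Because the new layers hold references to existing blocks rather than copies, PyTorch counts each shared weight only once, and only the small adapters contribute fresh parameters. This is consistent with the near-constant parameter counts reported above, though the paper's actual sharing scheme may differ from this sketch.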