Two-Stage Architecture Editing Method

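Fei-Fei Li's Team Proposes a New Approach to Architecture Design: No Training from Scratch, Just "Graft" Key Components of Pre-trained Models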
量子位 (QbitAI) · 2025-06-20 05:53
Core Viewpoint
- The article discusses using pre-trained models as a foundation for exploring new architecture designs. It highlights a method called "Grafting", which lets researchers edit components of an existing pre-trained model to study new architectures efficiently instead of training each candidate from scratch [1][2][7].

Summary by Sections

Introduction to Grafting
- The researchers propose Grafting as a way to avoid the high cost of training models from scratch, making the exploration of new architectures efficient [2][7].

Focus on DiTs
- The research centers on Diffusion Transformers (DiTs), which are widely used for image and video generation. The authors built a testbed to measure how grafting affects model quality [4][5].

Results of Grafting
- Many hybrid designs matched the quality of the original model while using less than 2% of the pre-training compute [5][22].
- Applying Grafting to the PixArt-Σ model made generation 1.43× faster, with a quality drop of less than 2% [6][23].

Two-Stage Architecture Editing Method
- Grafting edits a pre-trained DiT in two stages: activation distillation, which initializes each new operator by training it to reproduce the activations of the operator it replaces, followed by lightweight fine-tuning of the edited model [11][16].

Challenges in Implementation
- The authors identify two main challenges: how to initialize new operators before integrating them into the model, and how to limit the error that accumulates when many operators are replaced [14][15].

Experimental Results
- Three experiments were conducted:
1. **Hybrid Architecture Experiment**: Validated that replacements are feasible; replacing 50% of the attention layers increased the FID score (lower is better) only slightly [20].
2. **Text-to-Image Generation Experiment**: Showed that the grafted architecture generates substantially faster with minimal quality loss [23].
3. **Parallelization Experiment**: Showed that restructuring the model into parallel blocks reduced its depth while improving generation quality [25][26].

Limitations and Future Potential
- The study is limited to the DiT-XL/2 model and specific replacement types, which may limit how well the findings generalize [27].
- Despite these limitations, Grafting shows significant potential for exploring new model architectures, especially when compute is limited [28].
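The first stage, activation distillation followed by swapping in the new operator, can be illustrated with a minimal pure-Python sketch under toy assumptions. A frozen "teacher" operator (a tanh-of-linear map standing in for an expensive layer such as attention) is replaced by a cheaper linear "student" operator. The student is trained with a mean-squared-error regression loss to match the teacher's activations on the same inputs. The operator choices, the loss, and all names (`teacher_op`, `student_op`, `distill`, `forward`) are hypothetical illustrations, not the authors' implementation. The second stage, lightweight fine-tuning of the grafted model, is not shown.

```python
import math
import random

# Hypothetical stand-in for an expensive pre-trained operator (e.g. an attention
# layer). Its weights stay frozen; we only read its output activations.
def teacher_op(x, w):
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

# Hypothetical cheaper replacement operator: a plain linear map.
def student_op(x, a):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def mse(y_pred, y_true):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def distill(teacher_w, inputs, dim, lr=0.05, epochs=200):
    """Stage 1 (activation distillation, toy version): train the student so its
    outputs regress onto the frozen teacher's activations for the same inputs."""
    a = [[0.0] * dim for _ in range(dim)]
    targets = [teacher_op(x, teacher_w) for x in inputs]
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            pred = student_op(x, a)
            for i in range(dim):
                # d/da[i][j] of mean_i (pred_i - y_i)^2 is 2 * (pred_i - y_i) * x_j / dim
                g = 2.0 * (pred[i] - y[i]) / dim
                for j in range(dim):
                    a[i][j] -= lr * g * x[j]
    return a

def avg_loss(a, teacher_w, inputs):
    return sum(mse(student_op(x, a), teacher_op(x, teacher_w)) for x in inputs) / len(inputs)

def forward(x, layers):
    """Run x through a stack of (operator, params) layers."""
    for op, params in layers:
        x = op(x, params)
    return x

random.seed(0)
DIM = 4
teacher_w = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(64)]

# Loss of an untrained (all-zero) student versus the distilled one.
before = avg_loss([[0.0] * DIM for _ in range(DIM)], teacher_w, inputs)
student_w = distill(teacher_w, inputs, DIM)
after = avg_loss(student_w, teacher_w, inputs)

# Graft: swap the distilled student in for the first of two (weight-shared) teacher layers,
# then measure how far the grafted model's outputs drift from the original model's.
original = [(teacher_op, teacher_w), (teacher_op, teacher_w)]
grafted = [(student_op, student_w), (teacher_op, teacher_w)]
gap = sum(mse(forward(x, grafted), forward(x, original)) for x in inputs) / len(inputs)

print(f"distillation loss: {before:.4f} -> {after:.4f}; grafted-model output gap: {gap:.6f}")
```

Because the linear student can only approximate the nonlinear teacher, a small residual remains after distillation. In a real model that residual compounds as more operators are replaced, which is the error-accumulation challenge the summary names and the reason a fine-tuning stage follows.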