Core Insights
- The article covers the first release candidate, v5.0.0rc0, of the Transformers library, marking the transition from version 4 to version 5 after a five-year development cycle [2]
- Usage has grown dramatically: daily downloads have risen from 20,000 at the time of the v4 release to over 3 million today, and total installations have surpassed 1.2 billion [2]
- The v5 update centers on simplicity, pre-training, interoperability with high-performance inference engines, and making quantization a core feature [2][3]

Evolution and Features
- v5 establishes PyTorch as the sole core backend and evolves along four dimensions: extreme simplicity, a shift from fine-tuning toward pre-training, interoperability with high-performance inference engines, and stronger quantization support [2]
- The team aims for a clean, clear approach to model integration, promoting broader standardization and greater generality [4]
- Over the past five years, one to three new models have been added weekly on average, with the goal of making the library the single trusted source for model definitions [4]

Modular Design and Tools
- Hugging Face has adopted a modular design approach that simplifies maintenance and speeds up integration while fostering community collaboration (see the modular-definition sketch after this summary) [6]
- The AttentionInterface provides a centralized abstraction layer for attention mechanisms, streamlining the management of common auxiliary functions (a registration sketch appears below) [8]
- Tools are being developed to detect similarities between new models and existing architectures, with the aim of automating conversion into the Transformers format [9][10]

Training Enhancements
- v5 expands support for pre-training, with redesigned model initialization (sketched below) and optimized operators for the forward and backward passes [15][16]
- Hugging Face continues to work closely with fine-tuning tools in the Python ecosystem and maintains compatibility with tools in the JAX ecosystem [17]

Inference Improvements
- Inference is a key focus of v5, which introduces dedicated kernels, cleaner defaults, new APIs, and optimized support for inference engines [18][19]
- v5 aims to complement specialized inference engines rather than replace them, ensuring compatibility with engines such as vLLM, SGLang, and TensorRT-LLM [21]

Local Deployment and Quantization
- The team collaborates with popular inference engines so that Transformers can serve as a backend, which raises the value of every model added to Transformers (see the vLLM example below) [23]
- Quantization is positioned as a core capability of Transformers, compatible with major features and providing a reliable framework for both training and inference (see the 4-bit loading example below) [27]
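As a concrete illustration of the modular approach referenced above: in the transformers repository, a new model can be declared in a `modular_*.py` file that inherits from an existing architecture, and a full standalone modeling file is then generated from it. The `MyModel*` class names below are hypothetical; only the inheritance pattern reflects the repo convention.

```python
# Hypothetical modular definition in the style of transformers' modular_*.py files.
# A new architecture reuses an existing one (here Llama) and overrides only what
# differs; the repo's codegen expands this into a complete modeling file.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaForCausalLM,
    LlamaModel,
)


class MyModelAttention(LlamaAttention):
    # Only the delta vs. Llama needs to be written; e.g. a different
    # normalization or rotary-embedding scheme would be overridden here.
    pass


class MyModelModel(LlamaModel):
    pass


class MyModelForCausalLM(LlamaForCausalLM):
    pass
```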
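A minimal sketch of registering a custom attention backend through the AttentionInterface, following the pattern in the transformers documentation; the wrapper here simply delegates to the built-in SDPA function, and the checkpoint name is illustrative.

```python
import torch

from transformers import AttentionInterface, AutoModelForCausalLM
from transformers.integrations.sdpa_attention import sdpa_attention_forward


def my_new_sdpa(*args, **kwargs):
    # A real custom kernel would go here; this wrapper just logs and
    # delegates to the stock SDPA implementation.
    print("entered custom attention")
    return sdpa_attention_forward(*args, **kwargs)


# Register under a custom name, then select it like any built-in backend.
AttentionInterface.register("my_new_sdpa", my_new_sdpa)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # illustrative checkpoint
    attn_implementation="my_new_sdpa",
)
```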
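The pre-training emphasis implies starting from freshly initialized weights rather than loading a checkpoint. A minimal sketch, assuming a small Llama-style config with arbitrary sizes:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Initialize a small model from scratch (random weights), as in a pre-training run.
config = LlamaConfig(
    vocab_size=32_000,
    hidden_size=512,
    intermediate_size=2_048,
    num_hidden_layers=8,
    num_attention_heads=8,
)
model = LlamaForCausalLM(config)

# Sanity check: total trainable parameter count.
print(sum(p.numel() for p in model.parameters()))
```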
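A sketch of using Transformers as a backend for vLLM, assuming a recent vLLM release that exposes the `model_impl` switch; the checkpoint is illustrative.

```python
from vllm import LLM, SamplingParams

# Ask vLLM to serve the model through the Transformers modeling code
# instead of a native vLLM implementation.
llm = LLM(model="meta-llama/Llama-3.2-1B", model_impl="transformers")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```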
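A sketch of quantization through the standard `from_pretrained` flow, here 4-bit NF4 via bitsandbytes; the checkpoint is illustrative and the `bitsandbytes` package must be installed.

```python
import torch

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load weights in 4-bit NF4 with bf16 compute, keeping the usual loading flow.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```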
Five Years On, Transformers v5 Finally Arrives
机器之心·2025-12-02 06:47