Encoder-Decoder Architecture
T5Gemma Gets Another Update: Google Is Still Sticking with the Encoder-Decoder Architecture
机器之心· 2025-12-19 03:42
Core Viewpoint
- Google has stepped up the pace of its model releases, introducing Gemini 3 Flash and, unexpectedly, T5Gemma 2, which builds on the capabilities of the Gemma 3 series [1][3].

Group 1: T5Gemma 2 Overview
- T5Gemma 2 is a new-generation encoder-decoder model, the first in the family to support multimodal and long-context capabilities, built on the foundation of Gemma 3 [9].
- The model comes in three pre-trained encoder-decoder sizes (270M-270M, 1B-1B, and 4B-4B) and is the community's first high-performance encoder-decoder model to support ultra-long contexts of up to 128K tokens [9][11].

Group 2: Innovations and Upgrades
- T5Gemma 2 continues T5Gemma's adaptation approach, converting a pre-trained decoder-only model into an encoder-decoder model, while carrying over key innovations from Gemma 3 to extend into the vision-language domain [13].
- Two architectural innovations stand out (a toy version of both appears in the first sketch after this summary):
  1. The encoder and decoder share one word-embedding table, reducing the overall parameter count and fitting more effective capacity into the same memory footprint [15].
  2. Self-attention and cross-attention are merged into a single unified attention layer, improving parallelization and inference efficiency [15][16].

Group 3: Model Capabilities
- T5Gemma 2 brings three major capability upgrades:
  1. Multimodal understanding: the model can process both images and text, enabling tasks such as visual question answering and multimodal reasoning [17].
  2. Extended context: a local-global alternating attention mechanism lets it handle context windows of up to 128K tokens (see the second sketch after this summary) [18].
  3. Large-scale multilingual support: training on larger, more diverse datasets gives it coverage of over 140 languages [19].

Group 4: Performance Results
- T5Gemma 2 sets a new standard for compact encoder-decoder models, performing strongly across key capability areas while inheriting Gemma 3's multimodal and long-context strengths [21].
- In benchmark tests, T5Gemma 2 outperforms both comparably sized Gemma 3 models and the original T5Gemma on multimodal tasks, long-context tasks, and general capabilities [25][29].
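The two innovations in Group 2 are easiest to see in code. Below is a minimal PyTorch sketch, not Google's implementation: the module names, layer counts, and dimensions are illustrative assumptions. It ties one embedding table across the encoder, the decoder, and the output head, and replaces the usual self-attention/cross-attention pair in each decoder block with a single attention call whose keys and values concatenate the encoder outputs with the decoder states.

```python
import torch
import torch.nn as nn


class UnifiedAttentionDecoderLayer(nn.Module):
    """One decoder block whose single attention module plays both roles:
    queries come from decoder states, while keys/values concatenate the
    encoder outputs with the decoder states (self + cross in one call)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, dec: torch.Tensor, enc: torch.Tensor) -> torch.Tensor:
        kv = torch.cat([enc, dec], dim=1)          # joint key/value sequence
        T, S = dec.size(1), enc.size(1)
        # Encoder positions are fully visible; decoder positions stay causal.
        mask = torch.zeros(T, S + T, dtype=torch.bool, device=dec.device)
        mask[:, S:] = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=dec.device), diagonal=1
        )
        out, _ = self.attn(dec, kv, kv, attn_mask=mask)
        dec = self.norm1(dec + out)
        return self.norm2(dec + self.mlp(dec))


class TinyEncDec(nn.Module):
    """Encoder-decoder whose input embeddings (and output head) share one
    weight matrix, so the vocabulary parameters are counted only once."""

    def __init__(self, vocab: int = 1000, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)            # shared table
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, 4, batch_first=True),
            num_layers=2,
        )
        self.dec = UnifiedAttentionDecoderLayer(d_model)
        self.lm_head = nn.Linear(d_model, vocab, bias=False)
        self.lm_head.weight = self.embed.weight              # tied head

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        enc = self.encoder(self.embed(src_ids))
        return self.lm_head(self.dec(self.embed(tgt_ids), enc))


logits = TinyEncDec()(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Merging the two attention paths leaves one projection set and one attention kernel per decoder block instead of two, which is where the parallelization and inference gains would come from.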
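The local-global alternating attention from Group 3 can be sketched the same way: local layers attend through a sliding-window mask while global layers see the full causal prefix, so only a fraction of the layers pay full quadratic cost at 128K tokens. The 1:1 interleaving, window size, and bare residual structure below are simplifying assumptions, not Gemma 3's actual configuration.

```python
import torch
import torch.nn as nn


def sliding_window_mask(seq_len: int, window: int, device=None) -> torch.Tensor:
    """True = blocked. Each query may attend only to itself and the
    (window - 1) positions immediately before it (causal by construction)."""
    idx = torch.arange(seq_len, device=device)
    rel = idx.unsqueeze(0) - idx.unsqueeze(1)      # rel[i, j] = j - i
    return (rel > 0) | (rel < -(window - 1))       # future, or too far back


class AlternatingAttentionStack(nn.Module):
    """Alternates local (sliding-window) and global (full causal) attention
    layers, the pattern used to stretch context length cheaply."""

    def __init__(self, d_model=256, n_heads=4, n_layers=4, window=128):
        super().__init__()
        self.window = window
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        local = sliding_window_mask(T, self.window, device=x.device)
        for i, attn in enumerate(self.layers):
            mask = local if i % 2 == 0 else causal   # even: local, odd: global
            out, _ = attn(x, x, x, attn_mask=mask)
            x = x + out                              # norms/MLPs omitted
        return x


h = AlternatingAttentionStack()(torch.randn(1, 512, 256))
```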
A Revival of the Encoder-Decoder Architecture? Google Releases 32 T5Gemma Models in One Go
机器之心· 2025-07-10 08:35
Core Viewpoint
- The article covers the launch of Google's T5Gemma models, arguing that they could revitalize the encoder-decoder architecture in the era of large language models (LLMs), with performance competitive against existing models [1][12].

Group 1: Model Launch and Features
- Elon Musk's announcement of the Grok 4 model drew the AI community's attention, while Google continued updating its Gemma series [1][2].
- Google also introduced a set of multimodal models for health-AI development, including MedGemma 4B and 27B, which assist with diagnosis and provide medical advice [3][4].
- T5Gemma, built on the Gemma 2 framework, uses an adaptation technique to convert pre-trained decoder-only models into encoder-decoder architectures, and ships in a range of configurations and sizes (the initialization idea is sketched after this summary) [5][8][9].

Group 2: Performance and Efficiency
- T5Gemma matches or exceeds the decoder-only Gemma models, dominating the quality-efficiency trade-off across multiple benchmarks [21][24].
- On tasks such as GSM8K, T5Gemma reached higher accuracy at similar or lower latency than comparable smaller models [22][23].
- Because the adaptation method decouples the two halves, differently sized encoders and decoders can be combined, tailoring the model to a task's balance of understanding and generation (see the pairing sketch at the end) [18][19].

Group 3: Research and Development Insights
- Google investigated whether top-tier encoder-decoder models can be built from pre-trained decoder-only models, with promising results on complex reasoning tasks [15][28].
- T5Gemma showed substantial improvements across benchmarks, suggesting a path to stronger foundation models [28][31].
- The article concludes that these advances could drive a resurgence of encoder-decoder models in the LLM era [12][33].
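The adaptation technique itself, warm-starting both halves of an encoder-decoder from a single pre-trained decoder-only checkpoint, can be illustrated with a short sketch. This is a toy PyTorch version under stated assumptions (generic block classes, made-up sizes), not the released recipe; the actual adaptation continues pre-training afterward so the freshly initialized cross-attention weights catch up.

```python
import copy
import torch.nn as nn

D_MODEL, N_HEADS, N_LAYERS = 256, 4, 6      # made-up sizes, not Gemma's


def load_pretrained_decoder_stack() -> nn.ModuleList:
    """Stand-in for loading a pretrained decoder-only transformer's blocks."""
    return nn.ModuleList(
        nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        for _ in range(N_LAYERS)
    )


def adapt_to_encoder_decoder(pretrained: nn.ModuleList):
    """Warm-start an encoder-decoder from one decoder-only checkpoint.

    Encoder: copy the pretrained blocks and run their self-attention without
    a causal mask, i.e. bidirectionally. Decoder: copy the same blocks and
    attach a brand-new cross-attention module per layer; those are the only
    randomly initialized weights. Adaptation then continues pre-training the
    whole model (reportedly with PrefixLM/UL2-style objectives; treat that
    detail as an assumption here)."""
    encoder_blocks = copy.deepcopy(pretrained)
    decoder_blocks = copy.deepcopy(pretrained)
    cross_attention = nn.ModuleList(
        nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        for _ in range(len(pretrained))
    )
    return encoder_blocks, decoder_blocks, cross_attention


enc, dec, xattn = adapt_to_encoder_decoder(load_pretrained_decoder_stack())
```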
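Because each half is adapted independently, encoder and decoder sizes can be mixed and matched. The tiny helper below is hypothetical (the class and names are not from the release), but it captures the pairing idea behind the flexibility noted in Group 2: a large encoder for understanding-heavy input paired with a small decoder for cheap generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncDecPairing:
    """Hypothetical helper naming an encoder/decoder size pairing; the
    released T5Gemma checkpoints include unbalanced pairs alongside
    balanced ones."""
    encoder: str
    decoder: str

    @property
    def name(self) -> str:
        return f"{self.encoder}-{self.decoder}"


balanced = EncDecPairing("9B", "9B")
unbalanced = EncDecPairing("9B", "2B")   # big reader, small writer
print(balanced.name, unbalanced.name)    # 9B-9B 9B-2B
```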