Encoder-Decoder Architecture
T5Gemma Models Updated Again: Google Sticks with the Encoder-Decoder Architecture
机器之心 · 2025-12-19 03:42
As the model name suggests, the T5Gemma series is closely related to T5. T5 (Text-to-Text Transfer Transformer) is an encoder-decoder large-model framework that Google proposed in 2019, and the idea of "encoder-decoder large models" can almost always be traced back to it. T5Gemma uses an "adaptation" technique to convert already-pretrained decoder-only models into an encoder-decoder architecture.

Editor | 冷猫

Lately, perhaps because the year is winding down, Google's releases have come thick and fast. Yesterday, for example, Google released Gemini 3 Flash, billed as the model with the world's best intelligence-to-cost ratio.

Just when everyone assumed that, with Gemini 3 Flash out, Google's model releases for the year were wrapped up, Google pulled out an update few saw coming: T5Gemma 2.

The T5Gemma series never seemed to leave a deep impression on the public. Google first released the T5Gemma family this July, shipping 32 models in one go. Regrettably, though, the encoder-decoder architecture never became mainstream in the large-model world, and against the backdrop of rapidly iterating decoder-only LLMs it could hardly escape the fate of being gradually marginalized ...
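To make the "adaptation" step above concrete, here is a minimal sketch, assuming a PyTorch-style model: both the encoder and the decoder of the new model are seeded from the weights of a pretrained decoder-only checkpoint, while modules with no pretrained counterpart (such as the decoder's cross-attention) keep their random initialization. The function name and the parameter-name matching scheme are illustrative assumptions, not Google's actual implementation.

```python
import torch.nn as nn

def init_from_decoder_only(enc_dec: nn.Module,
                           decoder_only_state: dict) -> None:
    """Copy shape-matching parameters from a decoder-only checkpoint
    into both the encoder and the decoder stacks; anything without a
    counterpart (e.g. cross-attention) stays at its random init.

    Hypothetical layout: parameters named "encoder.<rest>" and
    "decoder.<rest>" in the new model both map to "<rest>" in the
    decoder-only checkpoint.
    """
    target = enc_dec.state_dict()
    copied, fresh = 0, 0
    for name, tensor in target.items():
        source_name = name.split(".", 1)[-1]  # strip encoder./decoder.
        src = decoder_only_state.get(source_name)
        if src is not None and src.shape == tensor.shape:
            target[name] = src.clone()
            copied += 1
        else:
            fresh += 1  # no pretrained counterpart; keep random init
    enc_dec.load_state_dict(target)
    print(f"initialized {copied} tensors from checkpoint, {fresh} random")
```

In the published recipe this weight transfer is only the starting point: the converted model is then pretrained further, so that, among other things, the encoder learns to use bidirectional attention.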
A Revival of the Encoder-Decoder Architecture? Google Releases 32 T5Gemma Models in One Go
机器之心 · 2025-07-10 08:35
Core Viewpoint
- The article discusses the launch of Google's T5Gemma models, highlighting their potential to revitalize the encoder-decoder architecture for large language models (LLMs) and their competitive performance against existing models [1][12].

Group 1: Model Launch and Features
- Elon Musk announced the release of the Grok 4 model, drawing attention from the AI community, while Google continued to update its Gemma series [1][2].
- Google introduced a series of multimodal models for health AI development, including MedGemma 4B and 27B, which assist with diagnosis and provide medical advice [3][4].
- The T5Gemma models, built on the Gemma 2 framework, use an adaptation technique to convert pretrained decoder-only models into encoder-decoder architectures, and are offered in a range of configurations and sizes [5][8][9].

Group 2: Performance and Efficiency
- T5Gemma's performance is comparable to or better than that of the decoder-only Gemma models, dominating the quality-efficiency trade-off across multiple benchmarks [21][24].
- In practical tests such as GSM8K, T5Gemma achieved higher accuracy at similar or lower latency than smaller models [22][23].
- The flexibility of the adaptation method allows checkpoints of different sizes to be combined, enabling tailored solutions for specific tasks (see the configuration sketch below) [18][19].

Group 3: Research and Development Insights
- Google explored the feasibility of building top-tier encoder-decoder models from pretrained decoder-only models, with promising results on complex reasoning tasks [15][28].
- T5Gemma showed substantial improvements across various benchmarks, indicating its potential as a base for more powerful foundation models [28][31].
- The article suggests these advances could lead to a resurgence of encoder-decoder models in the LLM era [12][33].
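As a concrete illustration of the size-mixing flexibility noted in Group 2, here is a hypothetical configuration sketch; the config class and checkpoint names are assumptions for illustration, not an actual T5Gemma format. The design intuition: the encoder runs once over the whole input, while the decoder runs once per generated token, so input-heavy tasks (e.g. summarization) can afford a large encoder paired with a small, fast decoder.

```python
from dataclasses import dataclass

@dataclass
class EncDecPairing:
    """Hypothetical config: encoder and decoder adapted from
    differently sized pretrained checkpoints."""
    encoder_checkpoint: str  # large: runs once per input sequence
    decoder_checkpoint: str  # small: runs once per generated token

# Assumed checkpoint names, mirroring the Gemma 2 size tiers.
unbalanced = EncDecPairing(
    encoder_checkpoint="gemma-2-9b",
    decoder_checkpoint="gemma-2-2b",
)
```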