Assembly of Experts (AoE) Technology

An unofficial ("wild") DeepSeek takes off: faster than the official version, with open-sourced weights
机器之心· 2025-07-04 08:59
Core Viewpoint
- The article covers the release of DeepSeek R1T2, an open-source model developed by TNG, a German AI consulting company, which runs faster than its predecessor R1 while also improving on its benchmark performance [1][3][5].

Technical Aspects
- R1T2 is constructed with Assembly of Experts (AoE) technology, which merges three parent models: DeepSeek V3, R1, and R1-0528 [2]. A minimal weight-merging sketch illustrating the idea follows this summary.
- It is built on the DeepSeek-MoE Transformer architecture with a parameter scale of 671 billion [13].
- R1T2 is the next iteration of the earlier R1T Chimera model, upgraded to a "Tri-Mind" fusion architecture that incorporates the R1-0528 base model [14].

Performance Comparison
- R1T2 is reported to be 200% faster than R1-0528 and 20% faster than R1; on the GPQA Diamond and AIME 24 benchmarks it outperforms R1 but does not reach the level of R1-0528 [1][18].
- R1T2 is positioned as an ideal replacement for R1, offering better performance, and as a more economical alternative to R1-0528 [18].
- Compared with R1T, R1T2 is generally recommended unless the specific personality traits of R1T are required [18].

Limitations
- Due to the influence of the R1 base model, R1T2 does not support function calls; this may be addressed in future versions [20].
- Its response consistency is significantly higher than R1T's but still lower than R1-0528's [20].
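The AoE construction described above builds a child model by interpolating the weight tensors of existing parent checkpoints rather than retraining. The snippet below is a minimal sketch of that idea under stated assumptions: the parents share identical tensor names and shapes, and the function name `assemble_experts`, the mixing coefficients, and the ".experts." name filter are illustrative choices, not TNG's published merge recipe for R1T2.

```python
# Minimal sketch of an Assembly-of-Experts style merge: the child checkpoint
# is a per-tensor linear interpolation of the parents, with non-expert
# tensors copied from a designated base parent. Hypothetical example only.
import torch


def assemble_experts(parent_state_dicts, coefficients, merge_filter=None):
    """Interpolate matching tensors from several parent checkpoints.

    parent_state_dicts: list of state_dicts with identical keys and shapes
    coefficients: per-parent mixing weights, expected to sum to 1.0
    merge_filter: optional predicate on tensor names; tensors that do not
        match are copied unchanged from the first parent (the "base").
    """
    assert abs(sum(coefficients) - 1.0) < 1e-6, "mixing weights must sum to 1"
    child = {}
    for name in parent_state_dicts[0]:
        tensors = [sd[name] for sd in parent_state_dicts]
        if merge_filter is None or merge_filter(name):
            # Weighted average of the corresponding parent tensors.
            child[name] = sum(c * t for c, t in zip(coefficients, tensors))
        else:
            # Keep the base parent's tensor untouched.
            child[name] = tensors[0].clone()
    return child


if __name__ == "__main__":
    # Toy usage: merge only routed-expert tensors of three small fake parents,
    # keeping all other tensors from the first parent. Names are made up.
    torch.manual_seed(0)
    keys = ["layers.0.mlp.experts.0.w1", "layers.0.embed_tokens"]
    parents = [{k: torch.randn(4, 4) for k in keys} for _ in range(3)]
    child = assemble_experts(
        parents,
        coefficients=[0.5, 0.3, 0.2],
        merge_filter=lambda n: ".experts." in n,
    )
    print({k: v.shape for k, v in child.items()})
```

In this hypothetical setup, restricting the merge to expert tensors mirrors the general intuition behind expert-level assembly: the routed experts carry much of the behavioral difference between parents, while shared components can be inherited from a single base model.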