Core Viewpoint
- Tencent's Hunyuan TurboS model ranks 7th globally in the latest Chatbot Arena evaluation, showcasing its advanced capabilities and innovative architecture [1][2].

Group 1: Model Architecture and Innovations
- Hunyuan TurboS employs a hybrid Transformer-Mamba architecture, balancing performance and efficiency by combining Mamba's long-sequence processing with the Transformer's contextual understanding [2][7].
- The model has 128 layers and uses an innovative interleaved pattern of "AMF" (Attention → Mamba2 → FFN) and "MF" (Mamba2 → FFN) modules, maintaining high computational efficiency despite a total of 560 billion parameters (a minimal sketch of this interleaving appears after the summary) [7][14].
- An adaptive long-short thinking-chain mechanism lets the model switch dynamically between a quick-response mode and a deep-thinking mode according to problem complexity, optimizing resource allocation [2][7].

Group 2: Training and Evaluation
- The model was trained on a dataset of 16 trillion tokens, significantly improving performance over previous iterations [10][13].
- Hunyuan TurboS achieved an overall score of 1356 in the LMSYS Chatbot Arena, ranking 7th among the 239 models evaluated [2][49].
- It performed strongly across benchmarks, excelling in multi-task capabilities and multilingual support, and ranked first in Chinese, French, and Spanish [4][42].

Group 3: Post-Training Strategies
- The post-training pipeline comprises four key modules: Supervised Fine-Tuning (SFT), Adaptive Long-short CoT Fusion, Multi-round Deliberation Learning, and Two-stage Large-scale Reinforcement Learning [8][22].
- SFT data was carefully curated across multiple themes to ensure high-quality training samples [24][26].
- The adaptive long-short CoT fusion method lets the model choose between long and short reasoning chains based on task complexity, strengthening its reasoning capabilities (a toy illustration of this routing closes the article) [26][29].

Group 4: Performance Metrics
- Hunyuan TurboS outperformed many leading models in mathematical reasoning, logical reasoning, and knowledge-intensive tasks, particularly in Chinese evaluations [41][42].
- It generates outputs cost-effectively, using only 52.8% of the tokens required by comparable models while maintaining performance [43][45].
- Architecture and training optimizations yielded a 1.8x inference speedup over pure Transformer MoE models [47].
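To make the AMF/MF interleaving described in Group 1 concrete, here is a minimal, dependency-free Python sketch. The actual ratio and ordering of AMF and MF blocks in Hunyuan TurboS are not given in this summary, so a simple 1:1 alternation is assumed, and each of the 128 "layers" is treated as one sub-layer (Attention, Mamba2, or FFN); both choices are assumptions for illustration, not the model's real configuration.

```python
# Illustrative only: the real interleaving ratio of "AMF" vs. "MF" blocks in
# Hunyuan TurboS is not specified in this summary, so a 1:1 alternation is
# assumed here purely to show the idea of mixing the two block types.

NUM_LAYERS = 128  # total depth reported for Hunyuan TurboS

# Each block label expands into its sub-layer sequence.
BLOCK_SUBLAYERS = {
    "AMF": ["Attention", "Mamba2", "FFN"],  # Attention -> Mamba2 -> FFN
    "MF":  ["Mamba2", "FFN"],               # Mamba2 -> FFN (no attention)
}

def build_schedule(num_layers: int) -> list[str]:
    """Return a per-layer list of sub-layer names for an assumed
    1:1 AMF/MF interleaving, truncated to the requested depth."""
    schedule: list[str] = []
    block_cycle = ["AMF", "MF"]
    i = 0
    while len(schedule) < num_layers:
        schedule.extend(BLOCK_SUBLAYERS[block_cycle[i % len(block_cycle)]])
        i += 1
    return schedule[:num_layers]

if __name__ == "__main__":
    layers = build_schedule(NUM_LAYERS)
    # Count how often each sub-layer type appears in the 128-layer stack.
    counts = {name: layers.count(name) for name in ("Attention", "Mamba2", "FFN")}
    print(counts)  # -> {'Attention': 26, 'Mamba2': 51, 'FFN': 51} under the assumed 1:1 pattern
```

Under that assumed alternation, attention sub-layers make up only about a fifth of the stack, which illustrates how a Mamba-heavy design can limit the quadratic attention cost that dominates long-sequence processing.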
Tencent Hunyuan TurboS Technical Report Fully Released for the First Time: 560B-Parameter Hybrid Mamba Architecture with Adaptive Long-Short Chain-of-Thought Fusion
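To picture the "adaptive long-short chain-of-thought fusion" named in the title, here is a toy Python sketch of the routing idea: simple queries take a short, fast answer path, while hard ones take a long, deliberate path. In the actual model this decision is learned during post-training; the hand-written complexity heuristic, the 0.5 threshold, the token budgets, and all names (estimate_complexity, route, RoutingDecision) below are invented purely for illustration.

```python
# Toy sketch of adaptive long-short thinking-chain routing.
# Everything here (heuristic, threshold, budgets, names) is hypothetical;
# Hunyuan TurboS learns this behavior rather than using hand-written rules.

from dataclasses import dataclass

@dataclass
class RoutingDecision:
    mode: str           # "short" (quick response) or "long" (deep thinking)
    budget_tokens: int  # rough generation budget implied by the mode

def estimate_complexity(prompt: str) -> float:
    """Toy stand-in for a learned difficulty signal: longer prompts and
    math/reasoning keywords push the score up. Purely illustrative."""
    score = min(len(prompt) / 2000.0, 0.5)
    for keyword in ("prove", "derive", "algorithm", "step by step", "integral"):
        if keyword in prompt.lower():
            score += 0.3
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> RoutingDecision:
    """Pick the reasoning mode from the estimated complexity."""
    if estimate_complexity(prompt) >= threshold:
        return RoutingDecision(mode="long", budget_tokens=4096)
    return RoutingDecision(mode="short", budget_tokens=512)

if __name__ == "__main__":
    print(route("What is the capital of France?"))                                # -> short mode
    print(route("Prove that the sum of two odd numbers is even, step by step."))  # -> long mode
```

The point of the sketch is only the dispatch pattern: one model serving both a cheap response mode and an expensive reasoning mode, with the choice made per query rather than fixed in advance.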