Wenxin 5.0, a Native Multimodal Large Model with 2.4 Trillion Parameters, Officially Launches
Sou Hu Cai Jing·2026-01-22 18:45

Core Insights

- Baidu has officially launched Wenxin 5.0, a native multimodal large model with 2.4 trillion parameters that supports text, image, audio, and video as both input and output [1]
- Wenxin 5.0 has surpassed models such as Gemini-2.5-Pro and GPT-5-High in more than 40 authoritative benchmark evaluations, placing it in the top tier globally [1]

Technical Advancements

- Unlike most industry models, which bolt modalities together through "post-fusion," Wenxin 5.0 uses a unified autoregressive architecture for native multimodal modeling, allowing multiple data types to be trained jointly [2]
- The model uses a large-scale mixture-of-experts structure in which fewer than 3% of its parameters are activated per inference step, improving inference efficiency while preserving capability [2]
- The model has made significant advances in multimodal understanding, coding, and creative writing, exemplified by its ability to generate executable front-end code and to imitate classical literary styles [2]

Industry Impact

- The Wenxin Mentor program has grown to 835 experts from a range of sectors, contributing to improvements in the model's logical rigor, professional depth, and creative quality [3]
- The launch of Wenxin 5.0 signals that native multimodal technology has matured into practical use, strengthening China's competitive position in the global AI industry [3]
- As of January 15, Wenxin 5.0 ranked first on the domestic text leaderboard and eighth globally on LMArena, ahead of several mainstream models [3]
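The mixture-of-experts idea above can be made concrete with a minimal sketch: a router picks a small top-k subset of expert feed-forward blocks for each token, so only a small fraction of the layer's parameters does work per inference step. All sizes, names, and the top-k gating scheme here are illustrative assumptions, not Wenxin 5.0's actual configuration.

```python
# Toy sparse mixture-of-experts layer: large total parameter count,
# small ACTIVE fraction per token. Sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

D = 64            # hidden dimension (toy size)
NUM_EXPERTS = 80  # expert feed-forward blocks in this layer
TOP_K = 2         # experts activated per token

# Each expert is a small two-matrix feed-forward block.
experts = [(rng.standard_normal((D, 4 * D)) * 0.02,
            rng.standard_normal((4 * D, D)) * 0.02)
           for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((D, NUM_EXPERTS)) * 0.02  # router weights

def moe_forward(x):
    """Route one token vector x through only its top-k experts."""
    logits = x @ gate
    top = np.argsort(logits)[-TOP_K:]        # indices of chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU feed-forward expert
    return out

token = rng.standard_normal(D)
y = moe_forward(token)

# Only TOP_K of NUM_EXPERTS expert blocks run per token, so the active
# parameter fraction for the expert weights is TOP_K / NUM_EXPERTS.
active_fraction = TOP_K / NUM_EXPERTS
print(f"output shape: {y.shape}, active expert fraction: {active_fraction:.2%}")
```

With these toy numbers the active fraction is 2/80 = 2.5%, mirroring the article's claim that fewer than 3% of parameters fire per step; in a real deployment this sparsity is what keeps inference cost far below that of a dense 2.4-trillion-parameter model.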