Baidu Wenxin 5.0, a 2.4-Trillion-Parameter Native Omni-Modal Large Model, Is Officially Live

Core Insights

- Baidu has officially launched Wenxin 5.0, a native multimodal large model with 2.4 trillion parameters that supports text, images, audio, and video as both input and output [1][2]
- The model employs a unified autoregressive architecture for native multimodal modeling, so all data types are trained jointly in a single sequence model, improving the integration and joint optimization of multimodal features [1] (a minimal sketch of this design appears after this summary)
- Wenxin 5.0 shows significant gains in multimodal understanding, coding, and creative writing, demonstrating advanced general capability and tool use [1][2]

Technical Features

- The model uses a large-scale mixture-of-experts (MoE) structure with ultra-sparse activation, so only a small fraction of its parameters is active per token, preserving strong performance while improving inference efficiency [1] (see the mixture-of-experts sketch below)
- It incorporates end-to-end multi-round reinforcement learning trained on long-horizon task trajectory data, markedly strengthening the model's agent and tool-calling capabilities [1] (see the reinforcement-learning sketch below)

Market Position

- The launch of Wenxin 5.0 signals that native multimodal technology is maturing into practical use, and it reflects the independent innovation capability of Chinese model makers in the global AI industry [2]
- As of January 15, Wenxin 5.0 ranked first in China and eighth globally on the LMArena text leaderboard, surpassing several mainstream models including GPT-5.1-High and Gemini-2.5-Pro [3]
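The "unified autoregressive architecture" bullet describes one sequence model trained jointly on all modalities. The sketch below illustrates that general idea in PyTorch: every modality is assumed to be pre-tokenized into ids in one shared vocabulary, and a single decoder-only transformer is trained with one next-token objective. The class name, vocabulary size, and all dimensions are illustrative assumptions, not details of Wenxin 5.0.

```python
import torch
import torch.nn as nn


class UnifiedAutoregressiveLM(nn.Module):
    """Decoder-only transformer over one shared discrete token space.

    Text, image, audio, and video inputs are assumed to be pre-tokenized
    into ids in a single vocabulary, so one next-token objective jointly
    trains all modalities (hypothetical sizes throughout).
    """

    def __init__(self, vocab_size=65536, d_model=256, n_heads=8,
                 n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask: each position attends only to earlier tokens,
        # regardless of which modality those tokens encode.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.tok(tokens) + self.pos(torch.arange(seq_len, device=tokens.device))
        return self.head(self.blocks(h, mask=mask))


# Joint training step: one cross-entropy loss over an interleaved
# multimodal token sequence (ids here are random placeholders).
model = UnifiedAutoregressiveLM()
seq = torch.randint(0, 65536, (2, 128))               # [batch, seq_len]
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), seq[:, 1:].reshape(-1))
loss.backward()
```

Because the loss is modality-agnostic, gradients from text, image, audio, and video tokens update the same weights, which is what "joint training of multiple data types" amounts to in this kind of design.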
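The "ultra-sparse activation" claim refers to a standard property of sparse MoE layers: a router sends each token to only k of n experts, so most parameters stay inactive on any given forward pass while total capacity remains large. Below is a generic top-k MoE sketch; the expert count, k, and layer shapes are assumptions for illustration, not Wenxin 5.0's real configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Top-k sparse mixture-of-experts feed-forward layer (toy sizes)."""

    def __init__(self, d_model=512, n_experts=64, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                         # x: [tokens, d_model]
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.k, dim=-1)  # keep only k experts/token
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():                 # run expert e only on its tokens
                out[token_ids] += (weights[token_ids, slot].unsqueeze(-1)
                                   * expert(x[token_ids]))
        return out


# Only k of n_experts execute per token, so compute per token stays
# roughly constant even as total parameter count grows.
moe = SparseMoE()
y = moe(torch.randn(10, 512))
print(y.shape)  # torch.Size([10, 512])
```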
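The second Technical Features bullet, on multi-round reinforcement learning over long-horizon trajectories, can be pictured with a REINFORCE-style loop in which reward arrives only at the end of a multi-turn episode and credits the whole trajectory. The environment, reward, and tiny policy below are toy stand-ins, not Baidu's training setup.

```python
import torch
import torch.nn as nn

# Toy policy mapping an 8-dim "state" to logits over 4 "tool" actions.
policy = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)


def run_episode(max_turns=6):
    """Roll out a multi-turn episode; reward arrives only at the end,
    mimicking long-horizon task trajectories."""
    state = torch.randn(8)
    log_probs = []
    for _ in range(max_turns):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state = torch.randn(8)          # toy transition; a real agent loop
                                        # would execute the chosen tool here
    reward = torch.randn(()).item()     # toy terminal task reward
    return torch.stack(log_probs), reward


for step in range(3):
    log_probs, reward = run_episode()
    loss = -(reward * log_probs.sum())  # credit the entire trajectory at once
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key point the bullet is making is that optimization happens end to end over whole multi-turn tool-use episodes rather than per single response, which is what trajectory-level updates like the one above capture in miniature.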