Core Insights
- Baidu officially launched Wenxin Model 5.0, its native multimodal large model, at the 2025 Baidu World Conference on November 13. The model has a total parameter count of 2.4 trillion [2][4].

Group 1: Model Capabilities
- Wenxin Model 5.0 uses native multimodal unified modeling and supports text, images, audio, and video as both input and output [2].
- The model has been significantly upgraded in multimodal understanding, instruction following, creative writing, factual accuracy, agent planning, and tool use, and demonstrates strong understanding, logic, memory, and persuasion skills [2].
- Across more than 40 authoritative benchmark evaluations, its language and multimodal understanding capabilities are on par with models such as Gemini-2.5-Pro and GPT-5-High, while its image and video generation capabilities are comparable to specialized models in those vertical fields, placing it at a globally leading level [2].

Group 2: Technical Architecture
- The model uses a unified autoregressive architecture for native multimodal modeling: language, image, video, and audio data are integrated from the training stage onward, so that multimodal features are fused and optimized together [4].
- Built on the PaddlePaddle deep learning framework, Wenxin Model 5.0 uses an ultra-sparse mixture-of-experts (MoE) architecture with a total parameter scale exceeding 2.4 trillion and an activated-parameter ratio below 3%, which improves inference efficiency while preserving model capability [4].
- The model is trained in a large-scale tool environment on long-horizon task trajectory data, using end-to-end multi-turn reinforcement learning based on chains of thought and action, which significantly improves its agent and tool-calling capabilities [4].
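
To make the sparsity figures above concrete, the short sketch below multiplies the two numbers quoted in the text. It is only back-of-the-envelope arithmetic: the source gives "exceeding 2.4 trillion" and "below 3%", so the result is a rough order of magnitude rather than a published specification, and the names `TOTAL_PARAMS`, `ACTIVE_RATIO`, and `active_params` are illustrative choices, not anything from Baidu.

```python
# Figures as quoted in the text; both are bounds, not exact values.
TOTAL_PARAMS = 2.4e12   # total parameters ("exceeding 2.4 trillion")
ACTIVE_RATIO = 0.03     # activated-parameter ratio ("below 3%")


def active_params(total_params: float, active_ratio: float) -> float:
    """Rough count of parameters activated per token in a sparse MoE model."""
    return total_params * active_ratio


approx_active = active_params(TOTAL_PARAMS, ACTIVE_RATIO)
print(f"Roughly {approx_active / 1e9:.0f}B parameters active per token")
```

Under these assumptions, on the order of 72 billion parameters take part in each forward pass, which is why a sparse model of this size can be far cheaper to run than a dense model with the same total parameter count.
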
Group 3: Market Position and Accessibility
- The Wenxin Model 5.0 Preview is available in the Wenxin App for users to try directly, while developers and enterprise users can access the model's API services through Baidu's Qianfan large model platform [5].
- In the LMArena rankings published on November 8, the Wenxin model ERNIE-5.0-Preview-1022 placed second globally and first in China for text tasks, performing especially well in creative writing and complex problem understanding [6].
Baidu Releases Native Omni-Modal Large Model Wenxin 5.0; Robin Li: Keep Raising the Ceiling of Intelligence