Baidu Releases Native Omnimodal Large Model Wenxin 5.0; Robin Li: Keep Raising the Ceiling of Intelligence
Sou Hu Cai Jing·2025-11-13 04:20

Core Insights
- Baidu officially launched the Wenxin 5.0 model at the 2025 Baidu World Conference. The model has 2.4 trillion parameters and uses native unified multimodal modeling, supporting text, image, audio, and video as both input and output [1][3]

Group 1: Model Capabilities
- Wenxin 5.0 shows significant advances in multimodal understanding, instruction following, creative writing, factual accuracy, agent planning, and tool use, demonstrating strong capabilities in understanding, logic, memory, and persuasion [1][3]
- Across more than 40 authoritative benchmark evaluations, Wenxin 5.0's language and multimodal understanding capabilities are on par with models such as Gemini-2.5-Pro and GPT-5-High, while its image and video generation capabilities are comparable to specialized vertical-domain models, placing it among the global leaders [1][3]

Group 2: Technical Innovations
- The model uses a unified autoregressive architecture for native multimodal modeling, integrating language, image, video, and audio data from the training phase onward to allow comprehensive feature fusion and joint optimization [3]
- Built on the PaddlePaddle deep learning framework, Wenxin 5.0 uses a super-sparse mixture-of-experts architecture: total parameters exceed 2.4 trillion while the ratio of activated parameters per token is below 3%, improving inference efficiency without sacrificing capability [3]

Group 3: Market Position and Accessibility
- The Wenxin 5.0 Preview is now available in the Wenxin App for users to try directly, while developers and enterprise users can access the model through Baidu's Qianfan large model platform [4]
- As of November 8, the LMArena large model arena ranked the Wenxin model ERNIE-5.0-Preview-1022 second globally and first in China on text tasks, excelling particularly in creative writing and complex problem understanding [4]
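Baidu has not published implementation details of Wenxin 5.0, but the reported numbers (over 2.4 trillion total parameters, under 3% activated per token) describe the general sparse mixture-of-experts pattern: a router selects a few experts per token, so only a small fraction of the weights participates in each forward pass. The NumPy sketch below is a toy illustration of that idea only; all class and parameter names are hypothetical, and the sizes are tiny stand-ins, not Baidu's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoELayer:
    """Toy sparse mixture-of-experts feed-forward layer.

    A learned router scores n_experts per token and only the top_k
    experts run, so the activated-parameter ratio per token is
    roughly top_k / n_experts (hypothetical illustration, not
    Wenxin 5.0's actual architecture)."""

    def __init__(self, d_model, d_ff, n_experts, top_k):
        self.top_k = top_k
        self.router = rng.normal(0.0, 0.02, (d_model, n_experts))
        self.w_in = rng.normal(0.0, 0.02, (n_experts, d_model, d_ff))
        self.w_out = rng.normal(0.0, 0.02, (n_experts, d_ff, d_model))

    def __call__(self, x):
        # x: (tokens, d_model)
        scores = softmax(x @ self.router)              # (tokens, n_experts)
        top = np.argsort(scores, axis=-1)[:, -self.top_k:]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            gates = scores[t, top[t]]
            gates = gates / gates.sum()                # renormalize chosen gates
            for e, g in zip(top[t], gates):
                h = np.maximum(x[t] @ self.w_in[e], 0.0)  # expert FFN (ReLU)
                out[t] += g * (h @ self.w_out[e])
        return out

    def activation_ratio(self):
        # Fraction of expert parameters touched per token.
        return self.top_k / self.w_in.shape[0]

layer = SparseMoELayer(d_model=16, d_ff=32, n_experts=80, top_k=2)
y = layer(rng.normal(size=(4, 16)))
print(y.shape)                  # (4, 16)
print(layer.activation_ratio()) # 0.025 -> 2.5% of experts active per token
```

With 2 of 80 experts active, each token uses 2.5% of the expert weights, which is how a model can exceed trillions of total parameters while keeping per-token compute close to that of a much smaller dense model.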