Baidu Releases Wenxin Large Model 5.0; Robin Li Says Baidu Will Keep Raising the Ceiling of Intelligence
BIDU (US:BIDU) | Jing Ji Wang · 2025-11-21 08:26

Core Insights
- Baidu officially launched its native multimodal large model, Wenxin 5.0, at the 2025 Baidu World Conference. The model has 2.4 trillion parameters and supports text, image, audio, and video as both input and output [1][4].

Group 1: Model Capabilities
- Wenxin 5.0 has been comprehensively upgraded in multimodal understanding, instruction following, creative writing, factuality, agent planning, and tool use, demonstrating strong understanding, logic, memory, and persuasion capabilities [4].
- Across more than 40 authoritative benchmark evaluations, Wenxin 5.0's language and multimodal understanding is on par with models such as Gemini-2.5-Pro and GPT-5-High, while its image and video generation is comparable to specialized vertical-domain models, placing it at a globally leading level [4].

Group 2: Technical Innovations
- Unlike most multimodal models in the industry, which rely on late-fusion approaches, Wenxin 5.0 uses a unified autoregressive architecture for native multimodal modeling, integrating language, image, video, and audio data from the training phase onward [6].
- The model is built on the PaddlePaddle deep learning framework and uses a super-sparse mixture-of-experts architecture, with a total parameter count exceeding 2.4 trillion and an activation parameter ratio below 3%, improving inference efficiency while preserving strong capabilities (a rough sketch of this style of sparse routing appears below) [6].

Group 3: Market Position and Accessibility
- The Wenxin 5.0 Preview is now available in the Wenxin App for users to try directly, while developers and enterprise users can access Wenxin 5.0 API services through Baidu's Qianfan large model platform (an illustrative API-call sketch appears at the end of this section) [6].
- As of November 8, the latest LMArena rankings place the Wenxin model ERNIE-5.0-Preview-1022 second globally and first in China on text task evaluations, with particular strength in creative writing and complex problem understanding [6].
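The article does not disclose Wenxin 5.0's actual routing mechanism, so the following is only a minimal NumPy sketch of generic top-k mixture-of-experts routing, showing how an activation ratio near 3% arises when each token engages just k of n experts. The function name topk_moe_layer, the dimensions, the expert count, and the random gate are all illustrative assumptions, not Baidu's implementation.

```python
import numpy as np

def topk_moe_layer(x, expert_weights, gate, k=2):
    """Sparse mixture-of-experts layer: each token activates only k experts.

    x:              (n_tokens, d_model) token activations
    expert_weights: (n_experts, d_model, d_model) one linear map per expert
    gate:           (d_model, n_experts) router projection (learned in practice)
    k:              experts activated per token; k << n_experts gives sparsity
    """
    scores = x @ gate                                   # (n_tokens, n_experts)
    topk = np.argsort(scores, axis=1)[:, -k:]           # top-k expert indices
    sel = np.take_along_axis(scores, topk, axis=1)      # their raw scores
    probs = np.exp(sel - sel.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)           # softmax over k experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                         # dispatch token by token
        for j in range(k):
            e = topk[t, j]
            out[t] += probs[t, j] * (expert_weights[e] @ x[t])
    return out

rng = np.random.default_rng(0)
d_model, n_experts, k = 8, 64, 2                        # toy sizes, assumed
x = rng.standard_normal((4, d_model))
experts = rng.standard_normal((n_experts, d_model, d_model))
gate = rng.standard_normal((d_model, n_experts))
y = topk_moe_layer(x, experts, gate, k=k)
print(y.shape, f"activation ratio ~ {k / n_experts:.1%}")   # (4, 8), ~3.1%

# Arithmetic implied by the article's figures: 2.4 trillion total parameters
# at an activation ratio below 3% means fewer than ~72 billion fire per token.
print(f"< {2.4e12 * 0.03:.1e} active parameters per token")
```

The efficiency claim in the article follows directly from this structure: compute per token scales with the activated parameters (here k of n_experts), not the total parameter count.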
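For readers curious what API access might look like in practice, below is a hedged sketch of a generic chat-completion request in Python. The endpoint URL, the model identifier ernie-5.0-preview, the payload fields, and the QIANFAN_API_KEY environment variable are all hypothetical placeholders, not Qianfan's documented interface; consult Baidu's official Qianfan documentation for the real endpoint and schema.

```python
import os
import requests

# Hypothetical endpoint: illustrative only, NOT Baidu Qianfan's documented API.
QIANFAN_URL = "https://example-qianfan-endpoint/chat/completions"  # placeholder

def ask_ernie(prompt: str) -> str:
    """Send one user message to an assumed chat-completion endpoint."""
    resp = requests.post(
        QIANFAN_URL,
        headers={"Authorization": f"Bearer {os.environ['QIANFAN_API_KEY']}"},
        json={
            "model": "ernie-5.0-preview",   # assumed model identifier
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style response shape; adjust to the real schema.
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_ernie("Summarize ERNIE 5.0's multimodal capabilities."))
```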