Baidu Releases Official Version of "Wenxin 5.0"; Two Young Technical Leads Appear Publicly
Guan Cha Zhe Wang·2026-01-23 03:07

Core Insights
- Baidu has launched the official version of its native multimodal large model, Wenxin 5.0, which has 2.4 trillion parameters and uses a unified modeling technique for multimodal understanding and generation [1][4]
- The model accepts and produces multiple types of information, including text, images, audio, and video, positioning Baidu at the forefront of the global AI landscape [1][3]

Model Architecture and Performance
- Wenxin 5.0 employs a unified autoregressive architecture for native multimodal modeling, allowing joint training of multiple data types within a single framework and enhancing feature integration and optimization [4]
- In international evaluations, the model's language and multimodal understanding capabilities surpassed those of competitors such as Gemini-2.5-Pro and GPT-5-High, securing a leading position [3]

Technical Innovations
- The model incorporates a large-scale mixture-of-experts (MoE) structure with ultra-sparse activation: only a small fraction of its parameters is active for each token, preserving strong performance while improving inference efficiency [8]
- Baidu has also developed several application-level models, including a voice-token-based end-to-end speech synthesis model and real-time interactive digital-human technology, which broaden user engagement and application versatility [11][12]

Application and Industry Impact
- Baidu's digital-human generation technology has been deployed in live streaming and e-commerce, demonstrating the practical value of its models in real-world scenarios [13]
- The Qianfan platform supports deployment of Wenxin 5.0 and more than 150 SOTA models, providing a comprehensive environment for enterprises to innovate and operate efficiently [15]

Strategic Vision
- Baidu aims to build a closed-loop ecosystem integrating chips, intelligent cloud platforms, and models to power AI applications across industries, reflecting its commitment to advancing AI solutions [15]
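To make the "ultra-sparse activation" point concrete: in a mixture-of-experts layer, a router scores every expert for each token and only the top-k experts actually run, so most parameters sit idle per token. The toy sketch below (plain Python, invented sizes, not Baidu's implementation) shows the general top-k routing technique, assuming 8 experts with 2 activated per token:

```python
# Illustrative sketch of sparse top-k Mixture-of-Experts routing.
# All sizes are toy values; this is NOT Wenxin 5.0's actual architecture.
import math
import random

random.seed(0)

N_EXPERTS = 8   # total experts (production MoE models may have far more)
TOP_K = 2       # experts activated per token -> "sparse activation"
D = 8           # toy hidden size

# Router: a D x N_EXPERTS weight matrix scoring each expert for a token.
router = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(D)]
# Experts: each a simple D x D linear map.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def moe_forward(x):
    """Route x to its TOP_K highest-scoring experts; mix outputs by softmax weight."""
    scores = [sum(x[i] * router[i][e] for i in range(D)) for e in range(N_EXPERTS)]
    chosen = sorted(range(N_EXPERTS), key=lambda e: scores[e])[-TOP_K:]
    # Softmax over the selected experts only (numerically stabilized).
    m = max(scores[e] for e in chosen)
    w = [math.exp(scores[e] - m) for e in chosen]
    total = sum(w)
    w = [wi / total for wi in w]
    out = [0.0] * D
    for wi, e in zip(w, chosen):
        y = matvec(experts[e], x)
        out = [o + wi * yi for o, yi in zip(out, y)]
    return out, chosen

x = [random.gauss(0, 1) for _ in range(D)]
y, used = moe_forward(x)
print(f"activated {len(used)} of {N_EXPERTS} experts")
```

Only `TOP_K / N_EXPERTS` of the expert parameters do work per token, which is why sparse MoE models can grow total parameter count without a proportional rise in inference cost.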
