Baidu Releases Wenxin Large Model 5.0 with 2.4 Trillion Parameters
Guo Ji Jin Rong Bao·2026-01-22 09:53

Core Insights
- Baidu has launched the official version of Wenxin 5.0, its native multimodal large model. The model has 2.4 trillion parameters and supports text, image, audio, and video as both input and output [1]

Group 1: Model Technology
- Wenxin 5.0 uses a unified autoregressive architecture for native multimodal modeling. Multiple data types are trained jointly within a single framework, which improves how multimodal features are fused and optimized [1]
- The model uses a large-scale, ultra-sparse mixture-of-experts (MoE) structure that activates less than 3% of its parameters, improving inference efficiency while maintaining strong model capabilities [1]
- Training incorporates long-horizon task trajectory data and end-to-end reinforcement learning based on chains of thought and action, significantly enhancing the model's agent and tool-invocation capabilities [1]

Group 2: Application Development
- Baidu has built matrix models and specialized models on top of the Wenxin foundation model. Matrix models target product-level applications and general scenarios, while specialized models target industry applications and vertical scenarios [4]
- Three technological breakthroughs were highlighted: an end-to-end synthesis model based on voice tokens, a live-streaming technology described as surpassing human hosts within 5 minutes, and real-time interactive digital human technology [4][5]
- The real-time interactive digital human technology uses a three-state token linkage architecture for streaming control of text, voice, and video, enabling automatic action transitions and low-latency, highly expressive video output [5]

Group 3: Platform and Agent Development
- More than 1.3 million agents have been developed on Baidu's Qianfan platform, and daily tool invocations exceed ten million, reflecting the platform's broad application and integration capabilities [6]
- The platform offers comprehensive support for enterprise operations across multiple scenarios, significantly lowering the barrier to innovation for enterprise agents [5][6]
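
For a sense of scale on the sparse-activation figure reported under Model Technology, the short Python sketch below works through the arithmetic: with 2.4 trillion total parameters and less than 3% activated, fewer than roughly 72 billion parameters would be active per token. The `route_top_k` helper illustrates generic top-k expert routing, a common way MoE models achieve sparse activation. It is an assumption for illustration only, since the report does not describe how Wenxin 5.0 routes tokens, and the expert scores are made up.

```python
# Figures reported in the article.
TOTAL_PARAMS = 2.4e12        # 2.4 trillion parameters
MAX_ACTIVE_FRACTION = 0.03   # "less than 3%" of parameters activated


def active_param_upper_bound(total: float, fraction: float) -> float:
    """Upper bound on the number of parameters activated per token."""
    return total * fraction


def route_top_k(scores: list[float], k: int) -> list[int]:
    """Generic top-k routing: indices of the k highest-scoring experts.

    Hypothetical illustration; not taken from the Wenxin 5.0 report.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


bound = active_param_upper_bound(TOTAL_PARAMS, MAX_ACTIVE_FRACTION)
print(f"Fewer than about {bound / 1e9:.0f} billion parameters active per token")

# Made-up router scores for four experts; only the top two would run.
example_scores = [0.10, 0.70, 0.05, 0.15]
print("Selected experts:", route_top_k(example_scores, k=2))
```

Under these assumptions, only the selected experts' weights take part in each forward pass, which is why a model of this size can run inference at a cost closer to that of a much smaller dense model.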