Baidu Launches New Large Model
Shanghai Securities News · 2026-01-22 12:15

Core Insights
- Baidu has officially launched the Wenxin 5.0 model, which features 2.4 trillion parameters and a native multimodal architecture, enabling unified understanding and generation across text, images, audio, and video [2][3]

Group 1: Model Performance
- Wenxin 5.0 surpasses models such as Gemini-2.5-Pro and GPT-5-High in language and multimodal understanding, placing it in the top tier globally [2]
- Its image and video generation capabilities are on par with specialized vertical-domain models, indicating a leading position in the global market [2]

Group 2: Technical Innovations
- Unlike many industry models that combine modalities through post-fusion of separate encoders, Wenxin 5.0 employs a unified autoregressive architecture for native multimodal modeling, allowing text, image, audio, and video data to be trained jointly in a single sequence model (see the first sketch below) [2]
- The model incorporates a large-scale mixture-of-experts (MoE) structure with ultra-sparse activation: fewer than 3% of its parameters are active for any given token, which improves inference efficiency while preserving capability (see the second sketch below) [3]

Group 3: Development and Community Engagement
- The "Wenxin Mentor" program has expanded to 835 experts from sectors including technology, finance, culture, education, and healthcare, who guide the model on logic, depth, creativity, and value alignment [3][4]
- Wenxin 5.0 has achieved significant recognition, ranking first among domestic models in the text and visual understanding categories on the LMArena global model arena platform, with a score of 1460, surpassing models such as GPT-5.1-High [4]
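To make the architectural contrast concrete, below is a minimal sketch of what "native multimodal autoregressive modeling" means in practice: text tokens and discretized image tokens (e.g. from a VQ-style tokenizer) share one vocabulary and one next-token training objective, instead of being fused after separate per-modality encoders. The vocabulary sizes, model dimensions, and tokenization scheme here are illustrative assumptions, not Wenxin 5.0's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed unified token space: text and discretized image/audio tokens
# share one ID range (sizes are illustrative, not Baidu's configuration).
TEXT_VOCAB, IMAGE_VOCAB, AUDIO_VOCAB = 50_000, 8_192, 4_096
VOCAB = TEXT_VOCAB + IMAGE_VOCAB + AUDIO_VOCAB

class UnifiedAutoregressiveLM(nn.Module):
    """One decoder over all modalities: a single embedding table,
    a causal transformer, and one next-token head over the full vocab."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):  # tokens: (batch, seq) of mixed-modality IDs
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=causal)
        return self.head(h)     # next-token logits over ALL modalities

# Joint training: one interleaved sequence, one next-token loss, so text
# and image tokens are modeled by the same autoregressive backbone rather
# than fused after the fact.
text = torch.randint(0, TEXT_VOCAB, (1, 16))
image = torch.randint(TEXT_VOCAB, TEXT_VOCAB + IMAGE_VOCAB, (1, 32))
seq = torch.cat([text, image], dim=1)
logits = UnifiedAutoregressiveLM()(seq)
loss = F.cross_entropy(logits[:, :-1].reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
```

The design point is that generation in any modality falls out of the same sampling loop, which is what distinguishes this from post-fusion systems that attach a generator per modality.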
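The sparse-activation claim can likewise be illustrated with a toy mixture-of-experts layer: a router sends each token to only k of E expert networks, so roughly k/E of the expert parameters are touched per token. With the illustrative 2-of-64 routing below that fraction is about 3.1%, matching the order of magnitude the article reports; at the reported 2.4-trillion-parameter scale, under 3% active would mean fewer than roughly 72 billion parameters per token. All sizes here are assumptions for illustration, not Wenxin 5.0's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer: each token is routed to
    only top_k of num_experts FFNs, so most parameters stay inactive."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, E)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                 # tokens routed here
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = SparseMoELayer()
y = moe(torch.randn(10, 512))
# Fraction of expert parameters active per token: 2 / 64 ≈ 3.1%,
# the same order as the "<3% activated" figure reported for Wenxin 5.0.
print(moe.top_k / len(moe.experts))
```

This is why a sparse MoE model can hold trillions of parameters while keeping per-token inference cost closer to that of a much smaller dense model: compute scales with the activated fraction, not the total parameter count.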