The Large-Model Battle Reignites: DeepSeek, Tongyi Qianwen, Google, and OpenAI Roll Out Updates in Succession
Huafu Securities·2025-03-30 13:32

Group 1
- The DeepSeek V3 model has completed a minor version upgrade: the post-training methods were improved while the base model remains unchanged, with approximately 660 billion parameters. The open-source release supports a 128K context length, while the web, app, and API versions provide a 64K context [2][13][14] (see the DeepSeek API sketch after this summary).
- The upgrade strengthens DeepSeek V3's performance on reasoning tasks, front-end development, Chinese writing, and Chinese search, with significant improvements in tool use, role-playing, and conversational Q&A [14][15]

Group 2
- Tongyi Qwen2.5-Omni-7B has been officially open-sourced and shows strong multimodal performance. It can process text, image, audio, and video inputs simultaneously, generating text and natural speech outputs in real time, and it is designed for easy deployment on smart hardware such as mobile devices [3][19][20] (see the Qwen2.5-Omni loading sketch below).
- Qwen2.5-Omni adopts a Thinker-Talker dual-core architecture that enables efficient real-time semantic understanding and speech generation. It supports multiple input modalities and can recognize emotion from audio and video [19][20]

Group 3
- Google has released Gemini 2.5 Pro, which supports native multimodal capabilities and improves on both the base model and the post-training techniques. It leads benchmarks in reasoning, mathematics, science, and programming, and offers a context window of 1 million tokens [4][24]
- Gemini 2.5 Pro can handle complex problems drawing on text, audio, image, and video sources. It is currently available to Gemini Advanced paid subscribers and will be rolled out on Google AI Studio [24][25] (see the Gemini API sketch below).

Group 4
- OpenAI has introduced GPT-4o image generation, which focuses on creating images and is available across its subscription tiers. Pricing is 50% lower than GPT-4 Turbo, making it more accessible [30][31] (see the Images API sketch below).
- GPT-4o image generation brings notably better text rendering inside images, stronger context understanding, improved multi-object binding, and diverse style adaptation; users can generate detailed images simply by describing what they want [31][32]
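
As a concrete illustration of the 64K-context API mentioned in Group 1, the sketch below calls DeepSeek V3 through its OpenAI-compatible endpoint. The base URL follows DeepSeek's public API convention; treating the "deepseek-chat" identifier as the route to the upgraded V3 checkpoint is an assumption on our part.

```python
# Minimal sketch: querying DeepSeek V3 via its OpenAI-compatible chat API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed to be served by the upgraded V3 checkpoint
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short product page in HTML with inline CSS."},
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```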
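
For Group 2, since the Qwen2.5-Omni-7B weights are open-sourced, a local loading sketch is shown below using Hugging Face transformers. The class names follow the model's published usage documentation and may differ between transformers releases, so treat them as assumptions rather than a definitive recipe.

```python
# A minimal sketch: loading the open-source Qwen2.5-Omni-7B checkpoint locally.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

model_id = "Qwen/Qwen2.5-Omni-7B"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)

# A text-only turn; the same chat template also accepts image, audio, and video
# content parts, which is where the Thinker-Talker design comes into play.
conversation = [
    {"role": "user",
     "content": [{"type": "text",
                  "text": "Summarize your Thinker-Talker design in one sentence."}]},
]
prompt = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=prompt, return_tensors="pt", padding=True).to(model.device)

# return_audio=False keeps only the Thinker's text output; with the Talker enabled,
# generate can additionally return a speech waveform.
text_ids = model.generate(**inputs, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```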
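
Since Group 3 notes that Gemini 2.5 Pro is coming to Google AI Studio, the sketch below shows a text-only call through the google-generativeai SDK. The experimental model identifier used at launch is an assumption and may change as the model rolls out more broadly.

```python
# Minimal sketch: calling Gemini 2.5 Pro from Google AI Studio via google-generativeai.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")        # placeholder credential
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")   # assumed experimental name

# Text-only prompt; generate_content also accepts image, audio, and video parts,
# and the 1M-token context window allows very large inputs.
response = model.generate_content(
    "Derive the closed-form solution of a linear least-squares problem step by step."
)
print(response.text)
```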
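
Finally, for Group 4, the sketch below requests an image through the OpenAI Images API. GPT-4o's native image generation first shipped inside ChatGPT; the "gpt-image-1" identifier used here for API-side access is an assumption.

```python
# Minimal sketch: generating an image from a plain-language description via the
# OpenAI Images API.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder credential

result = client.images.generate(
    model="gpt-image-1",  # assumed identifier for 4o-based image generation
    prompt=(
        "A whiteboard sketch of a four-panel comic explaining photosynthesis, "
        "with clearly legible hand-written labels."
    ),
    size="1024x1024",
)

# The response carries base64-encoded image data; decode and write it to disk.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("comic.png", "wb") as f:
    f.write(image_bytes)
```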