Deep Thinking Mode
One Year After the R1 Release, DeepSeek's New Model "MODEL1" Surfaces
Sina Finance (Xin Lang Cai Jing) · 2026-01-21 04:05
Core Insights
- DeepSeek has revealed a new model architecture named "MODEL1" in its FlashMLA software, which is designed to optimize large-model inference on NVIDIA GPUs [1][2]
- MODEL1 is expected to be a highly efficient inference model with lower memory usage than the existing V3.2 model, making it suitable for edge devices and cost-sensitive applications [2]
- The company is set to launch its next flagship AI model, DeepSeek V4, in mid-February 2026; it is expected to strengthen coding capabilities [3]

Group 1
- Across the 114 code files in FlashMLA, the MODEL1 architecture is mentioned 31 times [1]
- MODEL1 supports multiple GPU architectures, with specific implementations for NVIDIA H100/H200 and B200, indicating optimization tailored to the latest GPUs [2]
- DeepSeek's existing models follow two technical routes: the V series focuses on comprehensive performance, and the R series targets complex reasoning tasks [2]

Group 2
- The V3 model, launched in December 2024, established a strong performance foundation with its efficient MoE architecture, followed by rapid iterations leading to V3.2 [3]
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduced a "deep thinking" mode [3]
- Recent technical papers from DeepSeek suggest ongoing development of new models that may integrate new training methods and AI memory modules [3]
Gemini 3 Pro Sets a New ScienceQA SOTA | xbench Bulletin
红杉汇· 2025-11-20 03:38
Core Insights
- Google has officially launched its latest foundation model, Gemini 3, which shows significant improvements in deep reasoning, multimodal understanding, and agentic coding capabilities [1]
- Gemini 3 Pro achieved a new state-of-the-art (SOTA) score of 71.6 on the xbench-ScienceQA leaderboard, surpassing Grok-4 while responding faster and costing less [1][3]

Performance Metrics
- Gemini 3 Pro scored an average of 71.6 with a BoN (best-of-N) score of 85, while Grok-4 scored 65.6, a 6-point lead over the second-place model [5]
- The average response time for Gemini 3 Pro is 48.62 seconds, compared with 227.24 seconds for Grok-4 and 149.91 seconds for GPT-5.1 [6]
- Running the ScienceQA tasks with Gemini 3 Pro costs only $3, compared to $32 for GPT-5.1, making it substantially more economical [6]

Technological Advancements
- Gemini 3 introduces a cognitive architecture that shifts from reactive to deliberate reasoning, using a "Deep Think" mode that explores multiple reasoning pathways and verifies its own results [8]
- The model employs a sparse MoE architecture, activating only a small subset of its parameters during computation, which improves efficiency while maintaining performance [8]

Developer Tools and Features
- "Vibe Coding" allows Gemini 3 to align code generation with developer intent, functioning as an autonomous agent capable of executing complex tasks within an IDE [9]
- Gemini 3 Pro integrates with Google's Antigravity platform, enabling developers to automate workflows that involve reading web pages, executing commands, and generating code [10]

Multimodal Capabilities
- Gemini 3 adopts a native multimodal architecture, allowing it to process text, code, images, video, and audio with a unified world model, which strengthens its perception and interaction capabilities [11]
- The model can generate dynamic, interactive user interfaces in real time based on user intent, marking a shift from static outputs to interactive experiences [12]

Hardware Infrastructure
- Gemini 3 is trained on Google's proprietary TPUs (Tensor Processing Units), designed for high-bandwidth, parallel computing, which supports efficient training and cost management [13]
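
The sparse MoE idea noted under Technological Advancements (only a small subset of parameters is active per computation) can be illustrated with a minimal top-k routing sketch. This is a generic toy example, not Gemini 3's or DeepSeek's actual routing: the function names, the scalar "experts", and the fixed gate scores are all hypothetical, and a real router would compute the gate scores from the input with learned weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts.

    Returns (expert_index, weight) pairs; weights are softmax-normalized
    over the selected experts only, so they sum to 1.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the k routed experts; the others are never called."""
    routes = top_k_route(gate_logits, k)
    output = sum(w * experts[i](x) for i, w in routes)
    return output, [i for i, _ in routes]


# Hypothetical toy experts: each scales a scalar input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical fixed gate scores (a real router derives these from x).
gate_logits = [0.1, 2.0, -1.0, 2.0]

output, active = moe_forward(5.0, experts, gate_logits, k=2)
print(active, output)
```

Here only two of the four experts run for the input, which is the source of the efficiency gain: compute per input scales with k rather than with the total number of experts.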