DeepSeek
DeepSeek's model has taught AI to reflect for the first time.
数字生命卡兹克· 2025-11-28 01:21
Core Insights
- DeepSeek has launched a new model, DeepSeekMath-V2, built around self-verifiable mathematical reasoning. It targets a limitation of earlier AI models, which focused solely on final answers [1][8][30].
Group 1: Model Capabilities
- DeepSeekMath-V2 does more than provide answers: it checks its own problem-solving steps, which allows it to identify and correct its own mistakes [3][49].
- The model performs at a level comparable to mathematical olympiad gold medalists, excelling on competitions such as IMO 2025 and Putnam 2024 [5][6][50].
Group 2: Philosophical Context
- The model responds to concerns raised by AI experts about the gap between AI performance on assessments and real-world problem-solving ability [12][26].
- Its approach reflects a shift from external validation to internal self-assessment, promoting a deeper understanding of mathematical reasoning [50].
Group 3: Methodology
- DeepSeekMath-V2 uses a two-part system: a Generator that creates solutions and a Verifier that critically evaluates them for logical consistency and accuracy [46][49].
- A Meta-Verifier checks that the Verifier's evaluations are themselves fair and accurate, improving the overall reliability of the model [49].
Group 4: Performance Metrics
- On IMO 2025, DeepSeekMath-V2 solved 5 of 6 problems [50].
- On Putnam 2024, it scored 118 out of 120 [50].
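The Generator, Verifier, and Meta-Verifier roles described above can be pictured as a small refinement loop. The sketch below is a toy illustration only, not DeepSeek's implementation: the names `Review`, `refine`, and the three `toy_*` stand-ins are hypothetical, and the real components are LLMs trained with reinforcement learning rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: float        # verifier's score for the proof, in [0, 1]
    issues: List[str]   # problems the verifier claims to have found


def refine(
    problem: str,
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, str], Review],
    meta_verify: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate a proof and revise it until no confirmed issue remains."""
    feedback: List[str] = []
    proof = ""
    for _ in range(max_rounds):
        proof = generate(problem, feedback)
        review = verify(problem, proof)
        # Keep only the issues that the meta-verifier confirms as real.
        confirmed = [i for i in review.issues if meta_verify(proof, i)]
        if not confirmed:
            return proof, True
        feedback = confirmed
    return proof, False


# Hypothetical stand-ins so the loop can run without any model.
def toy_generate(problem: str, feedback: List[str]) -> str:
    return "complete proof" if feedback else "draft with gap"


def toy_verify(problem: str, proof: str) -> Review:
    if "gap" in proof:
        return Review(0.5, ["step 2 is unjustified", "title looks odd"])
    return Review(1.0, [])


def toy_meta_verify(proof: str, issue: str) -> bool:
    # Rejects the spurious complaint, keeps the substantive one.
    return "unjustified" in issue


result = refine("toy problem", toy_generate, toy_verify, toy_meta_verify)
```

In this toy run the first draft is flagged, the meta-verifier discards the spurious complaint, and the second draft is accepted.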
New breakthrough! DeepSeek launches a new model
新华网财经· 2025-11-28 01:15
Core Insights
- DeepSeek released a new mathematical reasoning model, DeepSeekMath-V2, on HuggingFace. It is trained with a self-verifying framework [2].
- The model is built on DeepSeek-V3.2-Exp-Base and uses an LLM verifier to review generated mathematical proofs automatically, continuously improving performance on high-difficulty samples [3].
- DeepSeekMath-V2 reached gold-medal level on both the 2025 International Mathematical Olympiad (IMO) and the 2024 Chinese Mathematical Olympiad (CMO), and scored 118/120 on the 2024 Putnam Mathematical Competition [3][4].
Performance Metrics
- IMO 2025: 83.3%, across problems P1 to P5 [4].
- CMO 2024: 73.8%, on problems P1, P2, P4, P5, and P6 [4].
- Putnam 2024: 98.3%, on problems A1 to B6 [4].
Model Architecture
- The core architecture sets up a self-driven verification-generation loop: one LLM acts as a "reviewer" that verifies proofs and another as a "creator" that generates them, with reinforcement learning used to make the two cooperate [5].
- A "meta-verification" layer is added to suppress model hallucinations [5].
Competitive Edge
- On a self-constructed test of 91 CNML-level problems, DeepSeekMath-V2 outperformed GPT-5-Thinking-High and Gemini 2.5 Pro in every category: algebra, geometry, number theory, combinatorics, and inequalities [7].
- On the IMO-ProofBench benchmark, the model surpassed DeepMind's IMO-gold-level DeepThink on the basic set and remained strongly competitive on the harder advanced set [8].
Future Directions
- The DeepSeek team says that, although significant work remains, these results suggest that self-verifying mathematical reasoning is a viable research direction that could help build more capable mathematical AI systems [10].
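As a quick arithmetic check on the reported percentages, the sketch below converts raw scores to percentages. It assumes that the 83.3% IMO figure corresponds to full marks on five of six equally weighted problems; that reading is an assumption, since the summary does not give the per-problem scores. The helper name `percent` is illustrative.

```python
def percent(earned: float, total: float) -> float:
    """Return earned/total as a percentage, rounded to one decimal place."""
    return round(100 * earned / total, 1)


# Putnam 2024: 118 of 120 points, as stated in the summary.
putnam_2024 = percent(118, 120)

# IMO 2025: five of six problems, assuming full marks and equal weights.
imo_2025 = percent(5, 6)
```

Under those assumptions, `putnam_2024` and `imo_2025` match the 98.3% and 83.3% figures quoted above.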
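The first open-source model to win a math olympiad gold medal! DeepSeek's new model wins praise from netizens: publishing the technical documentation is remarkable!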
Hua Er Jie Jian Wen· 2025-11-28 00:46
Core Insights
- DeepSeek has launched its latest open-source mathematical reasoning model, DeepSeekMath-V2, which reached gold-medal level on the International Mathematical Olympiad (IMO) 2025, a significant breakthrough for open-source AI in complex reasoning [1][3].
Group 1: Model Performance
- DeepSeekMath-V2 solved 5 of 6 problems in a simulated IMO 2025, making it the first open-source model to reach gold-medal level in that competition [1].
- The model also performed at the top tier in other demanding competitions: it reached gold-medal level on the Chinese Mathematical Olympiad (CMO) and scored 118 out of 120 on the Putnam Mathematical Competition 2024, above the highest human score of 90 [3].
Group 2: Innovation in Training Framework
- The model is trained with a self-verification framework that includes a dedicated verifier, which assesses the quality of the proof process rather than only the correctness of the final answer [2][11].
- To prevent overfitting, DeepSeek uses a dynamic strategy that scales up compute and automatically labels difficult proofs, so that the verifier and the generator improve in step [12].
Group 3: Open Source and Community Impact
- DeepSeekMath-V2's weights are publicly available under the Apache 2.0 license, so researchers and developers can download and use the model freely. This is seen as a significant step toward the democratization of AI [2][4].
- The release has prompted discussion of how open-source models may affect the commercial viability of closed-source products, particularly for major players such as NVIDIA [2].
DeepSeek's new release reaches "math olympiad gold-medal level"
Di Yi Cai Jing· 2025-11-28 00:40
Core Insights
- DeepSeek has released a new model, DeepSeek-Math-V2, the first open-source model to reach International Mathematical Olympiad (IMO) gold-medal level [3][5].
- The model outperforms Google's Gemini DeepThink on some benchmarks [5][9].
Performance Metrics
- DeepSeek-Math-V2 scored 83.3% on IMO 2025, 73.8% on CMO 2024, and 98.3% on Putnam 2024 [4].
- On the Basic benchmark, Math-V2 scored nearly 99%, well above Gemini DeepThink's 89%. On the Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [5].
Research Implications
- The paper, "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning", stresses the rigor of the proof process rather than just the correctness of the answer [8].
- DeepSeek argues that self-verification in mathematical reasoning can support the development of more powerful AI systems [8].
Industry Reactions
- The release of Math-V2 has generated excitement in the industry, with commenters highlighting its unexpected win over Google's model [9].
- The competitive landscape is shifting as other major players, including OpenAI and Google, release new models, which raises anticipation for DeepSeek's next moves [10].
DeepSeek's new release reaches "math olympiad gold-medal level"
第一财经· 2025-11-28 00:35
Core Viewpoint
- DeepSeek has released an open-source model, DeepSeek-Math-V2, the first open-source model to reach IMO gold-medal level in mathematics. It outperforms Google's Gemini DeepThink on some benchmarks [3][5].
Group 1: Model Performance
- On the Basic benchmark, DeepSeek-Math-V2 scored nearly 99%, well above Gemini DeepThink's 89% [5].
- On the more challenging Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [5].
- The model reached gold-medal level on IMO 2025 and CMO 2024, and a near-perfect score on the Putnam 2024 exam (118/120) [8].
Group 2: Research and Development Insights
- DeepSeek emphasizes verifying mathematical reasoning comprehensively and rigorously, moving from a result-oriented approach to a process-oriented one [8].
- The model is designed to teach AI to review proofs the way a mathematician does, improving its ability to solve complex proof problems without human intervention [8].
Group 3: Industry Reactions and Expectations
- The release of Math-V2 has generated excitement in the industry, with reactions noting that DeepSeek beat expectations by defeating Google's IMO Gold model by a 10% margin [9].
- The industry is waiting for DeepSeek's next moves, especially updates to its flagship models [9].
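The "10% margin" above reads most naturally as a gap in percentage points on the Basic benchmark. The sketch below works through the reported figures under that reading. One assumption is labeled in the code: "nearly 99%" is approximated as 99.0, since the summary does not give the exact value.

```python
# Reported scores as (Math-V2, Gemini DeepThink), in percent.
# Assumption: "nearly 99%" is approximated as 99.0 for illustration.
scores = {
    "Basic": (99.0, 89.0),
    "Advanced": (61.9, 65.7),
}

# Positive margin: Math-V2 ahead; negative: Gemini DeepThink ahead.
margins = {
    name: round(math_v2 - gemini, 1)
    for name, (math_v2, gemini) in scores.items()
}
```

Under that approximation, Math-V2 leads by about 10 points on Basic and trails by 3.8 points on Advanced.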
DeepSeek's new release! The first math olympiad gold-medal-level model is here
Di Yi Cai Jing· 2025-11-28 00:22
Core Insights
- DeepSeek has released a new model, DeepSeek-Math-V2, the first open-source model to reach International Mathematical Olympiad (IMO) gold-medal level [1].
- The model outperforms Google's Gemini DeepThink on some benchmarks [1][5].
Performance Metrics
- DeepSeek-Math-V2 scored 83.3% on IMO 2025 problems and 73.8% on CMO 2024 problems [4].
- On Putnam 2024, it scored 98.3% [4].
- On the Basic benchmark, Math-V2 scored nearly 99%, while Gemini DeepThink scored 89% [5].
- On the Advanced subset, Math-V2 scored 61.9%, slightly below Gemini DeepThink's 65.7% [5].
Research and Development Focus
- The model emphasizes self-verification in mathematical reasoning, moving from a result-oriented approach to a process-oriented one [8].
- DeepSeek aims to improve the rigor and completeness of mathematical proofs, which is crucial for solving open problems [8].
- The research indicates that self-verifying mathematical reasoning is a viable direction for developing more powerful AI systems [8].
Industry Reaction
- The release has drawn significant interest, with comments highlighting DeepSeek's edge over Google's model [9].
- The industry is awaiting further developments from DeepSeek, especially updates to its flagship model [10].
DeepSeek makes a strong comeback, open-sourcing an IMO gold-medal-level math model
36 Ke· 2025-11-27 23:34
Core Insights
- DeepSeek has introduced a new model, DeepSeek-Math-V2, which aims to strengthen self-verifiable mathematical reasoning in AI [1][2].
- The model reportedly outperforms Gemini DeepThink and reaches gold-medal level in mathematical competitions [3].
Model Development
- DeepSeek-Math-V2 follows the earlier DeepSeek-Math-7b, which used 7 billion parameters to match the performance of GPT-4 and Gemini-Ultra [4].
- The new model addresses a limitation of current AI mathematical reasoning by focusing on the rigor of the reasoning process rather than only the accuracy of final answers [5][6].
Self-Verification Mechanism
- The model's self-verification system has three parts: a proof verifier, a meta-verification layer, and a self-evaluating generator [7][11].
- The verifier assesses the reasoning process in detail and gives feedback similar to that of human experts [8][10].
Training and Evaluation
- Training uses an "honest reward" mechanism: the model is rewarded for assessing its own work accurately and for identifying its own errors [11][15].
- The model achieved high scores on IMO 2025, CMO 2024, and Putnam 2024 [16][17].
Performance Metrics
- On the IMO-ProofBench benchmark, DeepSeek-Math-V2 reached nearly 99% on the basic problems and was competitive on the advanced problems [18].
- The mutual improvement cycle between the verifier and the generator significantly reduces hallucinations in large models [20].
Future Implications
- DeepSeek emphasizes that self-verifiable mathematical reasoning is a promising research direction that could lead to more powerful mathematical AI systems [20].
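One way to picture an "honest reward" is a score that blends proof quality with how closely the model's self-assessment agrees with the verifier. The sketch below is a hypothetical illustration: the function name `honest_reward`, the linear form, and the 0.75/0.25 weights are assumptions made for this example, not details taken from the summary or from DeepSeek.

```python
def honest_reward(
    verifier_score: float,
    self_score: float,
    w_proof: float = 0.75,    # hypothetical weight on proof quality
    w_honesty: float = 0.25,  # hypothetical weight on self-assessment accuracy
) -> float:
    """Blend proof quality with agreement between self-score and verifier.

    Both scores are expected in [0, 1]. The honesty term is 1 when the
    self-assessment matches the verifier exactly and shrinks as they diverge.
    """
    honesty = 1.0 - abs(self_score - verifier_score)
    return w_proof * verifier_score + w_honesty * honesty


# A flawed proof (verifier score 0.5) with two different self-assessments.
admits_flaw = honest_reward(verifier_score=0.5, self_score=0.5)
overclaims = honest_reward(verifier_score=0.5, self_score=1.0)
```

With these illustrative weights, admitting the flaw earns more than overclaiming, so inflating a self-assessment never pays for a given proof.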
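AI employee responds within minutes; ten 10,000-mu-class industrial parks to be built across town and subdistrict boundaries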
Nan Fang Du Shi Bao· 2025-11-27 23:11
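Liuxiang Creek in Shaxi Town was once a black, foul-smelling waterway; after remediation its water is clear and its banks are green.
As urbanization keeps accelerating, building a livable, resilient, and smart city is no longer optional; it is a required question for long-term development. A livable city attracts and retains talent; a resilient city withstands risk; a smart city uses technology to cure "urban ills", raise operating efficiency, and lower management costs.
Zhongshan is modernizing urban governance systematically through technology empowerment, institutional innovation, and an ecology-first approach: from AI employees that deliver "minute-level" government service responses, to integrated reforms that break institutional bottlenecks, to green development that reshapes the look of town and countryside. A people-centered, sustainable, and humane picture of urban development is gradually unfolding.
Technology empowerment: "AI employees" go on duty, driving "systematic optimization" of government services
Zhang Minhui, who works for a Shenzhen company and travels between Shenzhen and Zhongshan year-round, plans to buy a home in Zhongshan's East District. She walked into the East District Subdistrict government service center, was directed to window 31, and talked with an unusual "AI employee". Within a few minutes she had a clear picture of the conditions, required documents, and inquiry channels for buying a home in Zhongshan with housing provident fund money.
This AI employee is Zhongshan's first "government service AI specialist" deployed at the town and subdistrict level. Its average response time is only 0.8 seconds, and its answer accuracy holds steady above 80%. Unlike ordinary cloud AI services, it is deeply integrated with the business systems of 12 departments, including market regulation, social security, and medical insurance, and updates and maintains more than 800 ... in real time ...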
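Beijing releases a construction plan for space data centers; the National Development and Reform Commission will improve entry and exit mechanisms for embodied intelligence; XREAL and Google to release AI glasses in December | "Investment Morning Brief"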
Mei Ri Jing Ji Xin Wen· 2025-11-27 23:01
Important Market News
- Brent crude oil for January delivery rose $0.21, or about 0.33%, to close at $63.34 per barrel. WTI crude oil rose 1.00% to close at $59.23 per barrel [1].
- Major European stock indices closed mixed: Germany's DAX30 rose 0.31% to 23,767.56 points, the UK's FTSE 100 fell 0.02% to 9,689.65 points, France's CAC40 rose 0.04% to 8,099.47 points, and the Euro Stoxx 50 fell 0.06% to 5,652.15 points [1].
Industry Insights
- The Beijing Municipal Science and Technology Commission and the Zhongguancun Science City Management Committee released a plan for building a space data center. The plan proposes a centralized large data center system in a dawn-dusk orbit at 700 to 800 km, able to host a million-class server cluster for space-based data relay, transmission, and computing services [2].
- The "Star Eye" space perception constellation plan was officially launched. It comprises 156 satellites and aims to build a space information analysis platform and a space management service platform [2].
- The satellite internet industry is becoming a new frontier of global technology competition. The satellite communication market is currently valued at approximately 40-50 billion yuan and is expected to exceed 200-400 billion yuan by 2030, a compound annual growth rate of 10%-28% [3].
- The industry is at a turning point from "concept validation" to "scale application". Technology advances, cost reductions, and broader application scenarios are expected to create a new communication pattern of "integrated space and ground, everything interconnected" over the next decade [3].
- The National Development and Reform Commission plans to promote the healthy, standardized development of the embodied intelligence industry in three ways: establishing industry standards, accelerating breakthroughs in core technology, and promoting infrastructure construction [3].
- The humanoid robot industry is expected to see significant growth by 2025, driven by leading companies improving component performance and cutting costs, with a focus on core supply chains and application scenarios [4].
- XREAL officially opened its global headquarters in Shanghai and announced a partnership with Google to develop the "Project Aura" AR glasses, which will use Google Gemini AI as their core [5][6].
Avatr files for a Hong Kong IPO; DeepSeek launches a new model | Mei Jing Morning Brief
Mei Ri Jing Ji Xin Wen· 2025-11-27 22:19
Group 1
- The third New Quality Productivity Automotive Conference will be held from November 28 to 30, 2025 [3].
- Huawei's Mate80 and Mate80 Pro series will officially go on sale on November 28 [3].
- The first batch of seven dual-innovation artificial intelligence ETFs will launch together on November 28 [3].
Group 2
- The fire in Tai Po, Hong Kong, had killed 83 people as of November 28 [6].
- The Hong Kong government will provide emergency relief of 10,000 HKD to each household affected by the fire [6].
- Enterprises and organizations have pledged more than 600 million HKD in donations for disaster relief and recovery [13][14].
Group 3
- China's Ministry of Commerce held a video conference with the German Federal Minister of Economics and Energy to discuss issues related to Nexperia [6].
- The Chinese government is taking targeted measures to improve credit repair, simplifying application materials and raising efficiency [9].
Group 4
- Japan plans to issue approximately 11.7 trillion JPY (about 529.9 billion CNY) in government bonds to fund a new economic stimulus plan [11].
- Former Peruvian President Pedro Castillo has been sentenced to more than 11 years in prison for conspiracy to commit rebellion [11].
Group 5
- Anta Sports responded to rumors of a potential bid for Puma, saying it does not comment on market speculation [20].
- The leadership change at Wahaha Group may lead to strategic adjustments that could affect the competitive landscape [21].
Group 6
- Joy City Property has officially delisted from the Hong Kong Stock Exchange after 12 years, following a privatization plan [23].
- Avatr Technology has submitted its IPO application to the Hong Kong Stock Exchange, a significant move for a state-owned enterprise in the new energy vehicle sector [27].
Group 7
- China's share of open-source AI model downloads has surpassed that of the United States, indicating a significant advance in AI technology [31].