DeepSeek
More Than a "Test-Taker"! DeepSeek's Latest Model Breaks Through the Limits of Mathematical Reasoning, Surpassing Gemini DeepThink on Some Measures
Tai Mei Ti APP · 2025-11-28 05:45
Core Insights
- DeepSeek has released its latest mathematical model, DeepSeek Math-V2, which has generated significant excitement in the AI community because of its self-verifying approach to deep mathematical reasoning [1][2].
Model Performance
- Math-V2 demonstrates strong theorem-proving abilities, distinguishing itself from previous models that solved problems without rigorous reasoning [2].
- The model achieved gold-medal-level results on IMO 2025 and CMO 2024 and scored 118 out of 120 on Putnam 2024 [2].
Benchmarking Results
- On the basic set of the IMO-ProofBench evaluation, Math-V2 scored 99%, outperforming Google's Gemini Deep Think (89%) and GPT-5 (59%) [3].
- On the advanced set, Math-V2 scored 61.9%, just behind Gemini Deep Think's 65.7% [3].
Community Impact
- The release of Math-V2 has sparked discussion across social media platforms and developer communities, with attention on its potential to automate verification-heavy tasks in programming [5][8].
- AI experts have praised DeepSeek's return and the significance of Math-V2, describing a shift in AI development from the "chatbot" era to the "reasoner" era [8][9].
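The First Open-Source Model to Win a Math Olympiad Gold Medal! DeepSeek's New Model Wins High Praise Online: "Publishing the Technical Documentation Is Remarkable!"
Wallstreetcn · 2025-11-28 04:35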
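DeepSeek's newly released open-source math model is putting the company on the same stage as tech giants such as OpenAI and Google. The model, DeepSeekMath-V2, reached gold-medal level in what is often called the world's hardest high-school math competition, becoming the first open-source model to do so and marking a major breakthrough for open-source AI in complex reasoning.
Yesterday DeepSeek announced its latest mathematical reasoning model, DeepSeekMath-V2, which solved 5 of 6 problems in a simulated 2025 International Mathematical Olympiad (IMO), reaching gold-medal level.
This makes it the first open-source model to earn gold at an IMO-level competition, and it has drawn intense attention from the AI research and developer communities.
The result puts it directly alongside the industry giants. In July this year, an advanced version of Google DeepMind's Gemini and an experimental reasoning model from OpenAI also met the IMO 2025 gold standard, likewise solving 5 problems; they were the first AI models to reach that level.
Unlike the closed-source experimental models from Google and OpenAI, however, DeepSeekMath-V2's model weights are released publicly under the Apache 2.0 license and can be downloaded by anyone.
Notably, DeepSeekMath-V2 uses an innovative self-verification training framework. The core of the approach is to train a dedicated "verifier" ( ...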
New Breakthrough! DeepSeek Launches New Model; Sci-Tech AI ETF (588790) Fluctuates in Positive Territory
Xin Lang Cai Jing · 2025-11-28 03:15
Group 1: Market Performance
- The Shanghai Stock Exchange Sci-Tech Innovation Board Artificial Intelligence Index increased by 0.22% as of November 28, 2025, with notable gains from companies such as Zhongke Xingtou (up 4.13%) and Hongsoft Technology (up 3.00%) [1]
- The Sci-Tech AI ETF (588790) showed a mixed performance, with a recent price of 0.76 yuan and a cumulative increase of 1.75% over the past week as of November 27, 2025 [1]
- The trading volume for the Sci-Tech AI ETF was 1.17 billion yuan, with a turnover rate of 1.96% [1]
Group 2: AI Industry Development
- China's generative artificial intelligence is in a rapid development phase, with improving fundamentals for AI-related companies across both software and hardware sectors [2]
- The demand for AI applications continues to grow, and domestic computing power is rising quickly, indicating a clear development trend in China's AI sector [2]
- By 2026, the focus will shift towards the application and innovation of AI, as the large model market begins to consolidate [2]
Group 3: Fund Performance and Composition
- The Sci-Tech AI ETF has seen a significant growth of 2.848 billion yuan in scale over the past six months [3]
- The fund's shares increased by 318 million shares this month, indicating substantial growth [3]
- The latest net outflow for the Sci-Tech AI ETF was 110 million yuan, but over the past 19 trading days, there were 11 days of net inflow totaling 249 million yuan [3]
- The index tracks 30 major companies in the AI sector, with the top ten stocks accounting for 70.92% of the index [3]
"In Math, Chinese Models Have Never Lost"! DeepSeek Sweeps the Leaderboards Overnight as Math V2 Decisively Ends the "Strongest Math Model" Debate
AI Front · 2025-11-28 02:54
Core Insights
- DeepSeek has released a new mathematics reasoning model, DeepSeek-Math-V2, with 685 billion parameters, which is the first open-source model to reach the gold medal level of the International Mathematical Olympiad (IMO) [2][9]
- The model outperforms its predecessor, DeepSeek-Math-7B, which had only 7 billion parameters and was comparable to GPT-4 and Gemini-Ultra [4]
- On the IMO-ProofBench benchmark, the model scored nearly 99% on the Basic subset, surpassing Gemini DeepThink's 89%, while on the Advanced subset it scored 61.9%, slightly below Gemini DeepThink's 65.7% [5][9]
Performance Metrics
- On real competition problems, DeepSeek-Math-V2 achieved gold medal levels in IMO 2025 and CMO 2024, and scored 118 out of 120 in Putnam 2024, demonstrating strong theorem proving capabilities [7][8]
- The model's scores in specific contests include 83.3% in IMO 2025 and 73.8% in CMO 2024 [8]
Technical Advancements
- The accompanying technical paper highlights significant breakthroughs in mathematical reasoning rigor and theorem proving capabilities, and reports surpassing Google's Gemini DeepThink on some benchmarks [9][12]
- A key innovation of DeepSeek-Math-V2 is its self-verification mechanism, allowing the model to check its reasoning chain for completeness and logical consistency, which is crucial for mathematical tasks [13][16]
Community Response
- The open-source release has garnered strong reactions from developer communities, with many expressing surprise at the model's performance and potential future applications in programming [18][20]
- Users have noted the importance of mathematical correctness in AI-generated code, indicating a demand for models that excel in mathematical reasoning [20][23]
Industry Implications
- The release of DeepSeek-Math-V2 is redefining the competitive landscape of large model mathematics reasoning research, with self-verification becoming a key technological pathway for the next generation of mathematical AI [25]
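The percentage figures above can be sanity-checked with simple arithmetic. The sketch below assumes the standard IMO scoring of 7 points per problem (42 in total) and that the five solved problems each earned full marks; that breakdown is an assumption for illustration, not something the summary states. The function name `score_pct` is hypothetical.

```python
from fractions import Fraction


def score_pct(points: int, max_points: int) -> float:
    """Return points/max_points as a percentage rounded to one decimal."""
    return round(float(Fraction(points, max_points)) * 100, 1)


# Assumption: 6 IMO problems worth 7 points each, 5 of them fully solved.
IMO_PROBLEMS = 6
IMO_POINTS_PER_PROBLEM = 7
imo_pct = score_pct(5 * IMO_POINTS_PER_PROBLEM, IMO_PROBLEMS * IMO_POINTS_PER_PROBLEM)

# Putnam 2024 score as reported in the summary: 118 out of 120.
putnam_pct = score_pct(118, 120)

print(imo_pct, putnam_pct)
```

Under these assumptions, 35 of 42 points matches the reported 83.3% for IMO 2025.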
New Study Finds That "Tricking" AI with Poetry Can Effectively Bypass Safety Restrictions
Xin Jing Bao · 2025-11-28 02:28
Core Insights
- A recent study from Italy reveals that "adversarial poetry" can effectively bypass the safety mechanisms of large language models, indicating a significant vulnerability in AI systems [2][3].
Group 1: Research Findings
- The study tested 25 mainstream AI models, including those from Google, OpenAI, and Anthropic, and found that the overall attack success rate of adversarial poetry reached 62% [3].
- Some models, like DeepSeek, showed over 70% susceptibility to poetic manipulation, while Gemini was affected in over 60% of cases [3].
- In contrast, GPT-5 demonstrated a high resistance, rejecting 95% to 99% of attempts to manipulate it through poetry [3].
Group 2: Mechanism of Attack
- Adversarial poetry involves rephrasing harmful instructions into poetic forms, which can obscure the malicious intent from the models [2][4].
- An example provided in the study illustrates how a question about extracting uranium was transformed into a metaphorical poem, making it difficult for the model to recognize the underlying threat [4][5].
Group 3: Implications for AI Models
- The research suggests that larger models, which are trained on extensive datasets, are more prone to "over-interpretation" and thus more vulnerable to these poetic attacks [6].
- Smaller models, with limited training data, exhibit greater resistance to such manipulations, possibly due to their reduced ability to parse metaphorical language [6].
Group 4: Philosophical Context
- The study references Plato's concerns about the potential dangers of mimetic language, highlighting the timeless relevance of these issues in the context of modern AI [7].
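DeepSeek Releases Self-Verifying Math Model DeepSeekMath-V2! Sci-Tech AI ETF Huaxia (589010) Dips Slightly in Volatile Morning Trading, with Zhongke Xingtou Leading Gains
Mei Ri Jing Ji Xin Wen · 2025-11-28 02:11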
Group 1
- The articles highlight significant advances in AI technology, particularly the introduction of DeepSeek and its impact on the AI industry [1][2]
- The Sci-Tech AI ETF (589010) declined slightly, by 0.15%, while attracting a net inflow of 32.51 million yuan yesterday and more than 200 million yuan in total over the last five days [1]
- DeepSeek has launched a new mathematical reasoning model, DeepSeekMath-V2, which uses a self-verifying training framework and has achieved gold medal levels in various competitions [1]
Group 2
- The introduction of DeepSeek marks a pivotal moment for the AI industry, significantly reducing training and inference costs while accelerating the commercialization of AI [2]
- The DeepSeek app grew rapidly, reaching 100 million users within seven days without any advertising, making it the fastest-growing AI application globally [2]
- The Sci-Tech AI ETF closely tracks the Shanghai Stock Exchange's AI index, covering high-quality enterprises across the entire industry chain, which benefit from high R&D investment and policy support [2]
DeepSeek Launches New Model; Some Moore Threads IPO Shares Go Unsubscribed | Tech Weathervane
Group 1: Technology Developments
- DeepSeek launched a new mathematical reasoning model, DeepSeekMath-V2, which achieved gold medal levels in international competitions and demonstrated the feasibility of self-verifying reasoning paths [2]
- Quark AI glasses were released by Alibaba, featuring dual-chip design and various models priced from 1,899 to 3,799 yuan [4]
- Tianfu Communication announced its mass production capabilities for 800G and 1.6T high-speed optical engines, with ongoing investments in R&D for performance optimization [6]
Group 2: Corporate Restructuring and Workforce Changes
- HP announced a global layoff plan affecting 4,000 to 6,000 employees, approximately 10% of its workforce, to streamline operations and enhance productivity through AI [3]
- ByteDance is in negotiations to sell its subsidiary, Shanghai Mutong Technology, to Saudi Arabia's Savvy Games Group, with the deal potentially valued at 14.5 billion yuan [5]
Group 3: Market and Investment Activities
- Dongxin Co. reported a strategic cooperation framework agreement with a leading domestic cloud service provider, focusing on various technological solutions [12]
- Haichang New Materials plans to acquire a 51% stake in Shenzhen Xinwei Communications for approximately 234.6 million yuan, gaining control over the company [15]
- Wuwen Chip has completed nearly 500 million yuan in A+ round financing, attracting significant investment from both state-owned and market-oriented funds [16]
Group 4: Regulatory and Industry Insights
- The National Development and Reform Commission addressed the rapid growth and potential "bubble" in the humanoid robot industry, noting over 150 companies in the sector with a growth rate exceeding 50% [7]
- The Chinese Electronic Technology Standardization Institute clarified that existing 3C certified power banks will not be affected by new safety standards, easing consumer concerns [10]
DeepSeek Launches New Model; Some Moore Threads IPO Shares Go Unsubscribed | Fresh Morning Tech
Group 1: Technology Developments
- DeepSeek launched a new mathematical reasoning model, DeepSeekMath-V2, which achieved gold medal levels in major math competitions, showcasing the feasibility of self-verifying reasoning paths [2]
- Quark AI glasses were released by Alibaba, featuring advanced hardware and dual operating systems, with prices starting from 1,899 yuan [4]
- Tianfu Communication announced its capability for mass production of 800G and 1.6T high-speed optical engines, with ongoing investments in R&D for performance optimization [6]
Group 2: Corporate Restructuring and Acquisitions
- HP announced a global layoff plan affecting 4,000 to 6,000 employees, approximately 10% of its workforce, to streamline operations and enhance productivity through AI [3]
- ByteDance is in negotiations to sell its subsidiary, Shanghai Mutong Technology, to Saudi Arabia's Savvy Games Group, with the deal's outcome uncertain [5]
- Haichang New Materials plans to acquire a 51% stake in Shenzhen Xinwei Communications for approximately 234.6 million yuan, gaining control over the company [15]
Group 3: Market Trends and Responses
- The National Development and Reform Commission highlighted the rapid growth of humanoid robots, which are expanding at over 50% annually, while cautioning against market saturation and product redundancy [7]
- Hongmeng Zhixing reported a surge in online attacks against the company, asserting that it will pursue legal action against those spreading false information [8]
- The Chinese Electronic Technology Standardization Institute clarified that existing 3C certified power banks will remain valid despite rumors of new standards coming into effect [10]
Group 4: Financial Activities
- Muxi Co. announced its IPO plans, aiming to raise 3.904 billion yuan, potentially becoming the second domestic GPU company listed on the A-share market [13]
- Moore Threads reported that some shares went unsubscribed during its IPO: 29,302 shares, worth approximately 3.3486 million yuan [14]
- Wuwen Chip completed nearly 500 million yuan in A+ round financing, attracting investments from various state-owned and market-oriented funds [16]
GPT-5 in Trouble: DeepSeek Open-Sources the World's First Math Olympiad Gold-Medal AI, Taking On Google Head-On
36Kr · 2025-11-28 01:55
Core Insights
- DeepSeek has launched its new model, DeepSeekMath-V2, which reached gold-medal level on IMO 2025, showcasing capabilities that rival or even surpass Google's IMO gold medal model [1][3][22]
- This is the first open-source IMO gold medal model, marking a significant advancement in AI [1][24]
Model Performance
- DeepSeekMath-V2 demonstrated strong theorem-proving abilities, solving 5 out of 6 problems in the IMO 2025, achieving a gold medal level [3][4]
- In the CMO 2024, it also reached gold medal status, and in the Putnam 2024, it scored 118 out of 120, surpassing the highest human score of 90 [3][4]
Comparison with Competitors
- DeepSeekMath-V2 outperformed Google's Gemini Deep Think in the ProofBench-Basic tests and closely followed it in the ProofBench-Advanced tests [5][22]
- The model's performance indicates a significant leap in capabilities compared to existing models like OpenAI's GPT-5 and Gemini 2.5-Pro [26][28]
Self-Verification Mechanism
- A key breakthrough of DeepSeekMath-V2 is its self-verification capability, allowing it to self-assess and improve its proofs [12][36]
- The model employs a "three-in-one" system consisting of a Generator, Verifier, and Meta-Verifier to enhance its proof quality [15][16]
Training Methodology
- The training process involved a high-compute search strategy, generating numerous candidate proofs and validating them rigorously [32][35]
- The model's ability to self-correct and refine its proofs through multiple iterations significantly improved its performance [38]
Implications for AI Development
- The success of DeepSeekMath-V2 suggests a shift in AI from merely mimicking human responses to emulating human thought processes, emphasizing the importance of self-reflection in achieving advanced AI [36][37]
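The Generator, Verifier, and Meta-Verifier roles described above can be pictured as a refine loop. The following is a toy sketch only, not DeepSeek's implementation: every function name, the data layout, and the stand-in behaviors are hypothetical and chosen so the control flow is easy to follow. The verifier here raises one spurious complaint each round, and the meta-verifier filters it out.

```python
from typing import Callable, Dict, List, Tuple

Proof = Dict[str, object]


def refine_proof(
    problem: str,
    generate: Callable[[str, List[str]], Proof],
    verify: Callable[[str, Proof], List[str]],
    meta_verify: Callable[[str, Proof, str], bool],
    max_rounds: int = 5,
) -> Tuple[Proof, int]:
    """Regenerate a proof until the verifier raises no confirmed issue."""
    feedback: List[str] = []  # confirmed issues accumulated across rounds
    proof = generate(problem, feedback)
    for round_no in range(1, max_rounds + 1):
        issues = verify(problem, proof)
        # Keep only the issues the meta-verifier judges to be real.
        confirmed = [i for i in issues if meta_verify(problem, proof, i)]
        if not confirmed:
            return proof, round_no
        feedback.extend(confirmed)
        proof = generate(problem, feedback)
    return proof, max_rounds


# --- Hypothetical stand-ins so the sketch runs end to end ---
KNOWN_FLAWS = ["base case missing", "step 2 unjustified"]


def toy_generate(problem: str, feedback: List[str]) -> Proof:
    # Each flaw stays in the draft until it has been reported back.
    return {"problem": problem, "flaws": [f for f in KNOWN_FLAWS if f not in feedback]}


def toy_verify(problem: str, proof: Proof) -> List[str]:
    # Flags at most one real flaw per round, plus one spurious complaint.
    return list(proof["flaws"])[:1] + ["notation looks unusual"]


def toy_meta_verify(problem: str, proof: Proof, issue: str) -> bool:
    # Confirms an issue only if it is a genuine flaw in this proof.
    return issue in proof["flaws"]


final_proof, rounds_used = refine_proof("toy problem", toy_generate, toy_verify, toy_meta_verify)
print(final_proof["flaws"], rounds_used)
```

In this toy run the two real flaws are repaired over successive rounds, while the spurious complaint never reaches the generator because the meta-verifier rejects it.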
DeepSeek Again Breaks the Google-OpenAI Monopoly: An Open-Source IMO Gold-Medal Math Model
QbitAI · 2025-11-28 01:53
Core Insights
- DeepSeek has released a new mathematical model, DeepSeekMath-V2, focusing on self-verifiable mathematical reasoning [1][7]
- The model has achieved gold medal-level scores in IMO 2025 and CMO 2024, and scored 118/120 in Putnam 2024, surpassing the highest human score of 90 [2][43]
- DeepSeekMath-V2 is the first open-source IMO gold medal model, raising competitive pressure on companies like Google and OpenAI [4][5]
Model Performance
- DeepSeekMath-V2 outperforms GPT-5-Thinking-High and Gemini 2.5-Pro across all CNML problem categories, including algebra, geometry, number theory, combinatorics, and inequalities [2][34]
- The model's architecture includes 685 billion parameters, emphasizing strong proof verification capabilities [7]
Training Methodology
- The training process involves an iterative reinforcement learning loop that alternates between optimizing the proof verifier and the proof generator [9]
- A large dataset of 17,500 proof-required math problems was collected from AoPS competitions to train the proof verifier [12]
- The verifier is trained to identify issues in proofs and assign scores based on three levels of correctness [10]
Meta-Verification Mechanism
- A meta-verification mechanism was introduced to enhance the verifier's accuracy by assessing the validity of the identified issues [14]
- The meta-verifier is trained using a dataset created from expert evaluations of the verifier's output [15]
Proof Generation
- The trained verifier serves as a reward model for the proof generator, which learns to self-review and correct its outputs [23]
- The reward structure encourages accurate self-assessment and correction of errors in generated proofs [27]
Automation and Efficiency
- The collaboration between the verifier and generator leads to a fully automated data labeling process, replacing time-consuming manual annotations [29][35]
- The automated process ensures high consistency with expert evaluations, significantly improving efficiency [35]
Experimental Results
- The model's average quality score for proof analysis improved from 0.85 to 0.96, demonstrating the effectiveness of the meta-verification mechanism [21]
- The model's ability to generate correct proofs was validated through rigorous testing, showing superior performance across various mathematical problem categories [34][39]
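One way to picture a reward that "encourages accurate self-assessment" is to combine the verifier's score for the proof with a term for how closely the generator's own rating matches it. The sketch below is illustrative only: the three score levels (0.0, 0.5, 1.0), the weights `alpha` and `beta`, and the function name are assumptions, not values taken from the summary or confirmed from the technical paper.

```python
# Assumed three-level scale: 0.0 = flawed, 0.5 = minor gaps, 1.0 = rigorous.
LEVELS = (0.0, 0.5, 1.0)


def generator_reward(
    verifier_score: float,
    self_score: float,
    alpha: float = 0.75,  # hypothetical weight on proof quality
    beta: float = 0.25,   # hypothetical weight on honest self-assessment
) -> float:
    """Blend proof quality with agreement between self-rating and verifier."""
    if verifier_score not in LEVELS or self_score not in LEVELS:
        raise ValueError("scores must be one of the three assumed levels")
    # 1.0 when the self-rating matches the verifier, 0.0 when maximally off.
    honesty = 1.0 - abs(self_score - verifier_score)
    return alpha * verifier_score + beta * honesty


# A flawed proof that the generator admits is flawed...
admitted = generator_reward(verifier_score=0.0, self_score=0.0)
# ...versus the same flawed proof that the generator claims is perfect.
overclaimed = generator_reward(verifier_score=0.0, self_score=1.0)
# A rigorous proof that is also rated correctly.
best = generator_reward(verifier_score=1.0, self_score=1.0)

print(admitted, overclaimed, best)
```

With these assumed weights, admitting a flaw earns more than overclaiming on the same proof, while a rigorous and honestly rated proof earns the most.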