AI Mathematical Reasoning
Gemini Takes Gold Again, Beating Top University Students: The Era of AI Mathematical Reasoning Has Arrived
36Kr· 2025-08-12 00:56
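Gemini's olympiad gold medal is well deserved! An ETH Zurich PhD student tested three Gemini modes on the International Mathematics Competition for University Students (IMC). The results were far above the top-8% gold-medal threshold and far ahead of ordinary university students.

Are university students worse at math than AI?

Recently, on MathArena, Jasper Dekoninck, a PhD student at ETH Zurich's SRI Lab, launched a new competition: the International Mathematics Competition for University Students (IMC).

| Acc | Cost | 1 | 2 | 3 | 4 | | ୧ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 94.50% | $94.64 | 100% | 100% | 90% | 100% | 70% | 100% |
| 93.00% | N/A | 100% | 100% | 90% | 100% | 80% | 100% |
| 88.00% | $114.52 | 100% | 100% | 100% | 100% | 20% | 100% |

In the end, the LLMs won with high scores: language models took first place in the international mathematics competition.

Gemini far exceeds the level of ordinary university students

The International Mathematical Olympiad (IMO) has long been regarded by researchers as AI systems' mathematical ...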
Computer Industry Major Event Commentary: DeepSeek-Prover-V2 Released, Focused on Mathematical Reasoning
Huachuang Securities· 2025-05-04 09:28
Investment Rating
- The report maintains a "Recommendation" rating for the computer industry, expecting the industry index to outperform the benchmark index by more than 5% in the next 3-6 months [20].

Core Insights
- The release of DeepSeek-Prover-V2, a large language model focused on mathematical reasoning, marks a significant advancement in AI mathematics capabilities. The model features 671 billion parameters and employs a mixture of experts (MoE) architecture, enhancing its efficiency and performance in formal theorem proving [4][7].
- The report highlights the successful integration of formal and informal mathematical proofs, with DeepSeek-Prover-V2 achieving an 88.9% pass rate on the MiniF2F-test dataset, setting a new industry benchmark [7].
- Investment opportunities are identified in various sectors related to AI applications, including office software, finance, large models, industrial applications, medical technology, and more, suggesting a broad range of companies to consider for investment [7][8].

Industry Overview
- The computer industry comprises 336 listed companies with a total market capitalization of 426.57 billion yuan and a circulating market value of 363.99 billion yuan [4].
- The absolute performance of the industry over the past 12 months has been 27.0%, with a relative performance of 22.4% compared to the benchmark index [5].

Related Research Reports
- The report references several related studies, including a commentary on Alibaba's open-source Qwen3 model and a weekly report on the development trends in the intelligent driving industry [7].
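
For readers unfamiliar with formal theorem proving, the following Lean 4 snippet gives a rough idea of what a machine-checkable proof looks like. It is a toy statement chosen purely for illustration; it is not drawn from MiniF2F-test or from any DeepSeek-Prover-V2 output, and the theorem names are hypothetical.

```lean
-- Toy example (assumed for illustration): commutativity of addition on Nat.
-- Term-style proof: apply the library lemma directly.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Tactic-style proof of the same statement, closer to the step-by-step
-- scripts that automated provers usually generate.
theorem toy_add_comm' (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```

A benchmark pass rate such as the 88.9% quoted above counts a problem as solved only when the Lean checker accepts the generated proof, so there is no partial credit for a plausible-looking argument.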
Just In! DeepSeek-Prover-V2-671B Released; Netizens: DS Is the Holiday Terminator
程序员的那些事· 2025-05-01 02:04
Core Viewpoint
- DeepSeek has launched DeepSeek-Prover-V2-671B, marking a significant advancement in AI mathematical reasoning capabilities, particularly in automated theorem proving [2][4].

Group 1: Model Overview
- DeepSeek-Prover-V2-671B is a next-generation automated theorem proving expert model with 671 billion parameters, optimized for proof generation and verification in the Lean 4 framework [4][6].
- The model employs a mixture of experts (MoE) architecture, activating approximately 37 billion parameters per inference, enhancing computational efficiency while maintaining strong reasoning capabilities [4][6].

Group 2: Key Breakthroughs
- The release signifies three major milestones, including the potential for innovation across various application domains [6].
- The model's specifications include a context length of approximately 128,000 tokens, allowing it to handle complex reasoning chains and lengthy proofs [6][7].
- The attention mechanism is likely a multi-head latent attention (MLA), which compresses the key-value (KV) cache, significantly reducing memory requirements [6][7].

Group 3: Applications and Impact
- The model supports formal verification in areas such as cryptographic security proofs and chip design validation, enabling rigorous mathematical checks in automated processes [7].
- It aids mathematicians in formalizing theorems, exploring new conjectures, and proving complex mathematical problems, potentially accelerating mathematical research [7].
- The model can be utilized as an interactive educational tool, guiding students in mastering rigorous mathematical proof methods [7].
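
The mixture-of-experts idea mentioned above (only about 37 billion of 671 billion parameters active per inference) can be sketched in a few lines of Python. This is a minimal toy, assuming a generic top-k router that applies a softmax over the selected experts' scores; the article does not describe DeepSeek's actual routing scheme, and every name here (`route_top_k`, `moe_forward`, the eight scalar "experts", the logits) is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Assumption: weights are a softmax over the selected logits only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: 8 "experts", each a scalar function x -> s * x.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)

# Share of parameters active per inference, using the figures quoted above.
active_fraction = 37 / 671
print(selected, output, f"{active_fraction:.1%}")
```

In this toy, only two of the eight experts are evaluated for the input, which is the source of the efficiency gain: compute per token scales with the active experts rather than with the full parameter count.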