DeepSeek Open-Sources New Model with Major Gains in Mathematical Reasoning
Hu Xiu· 2025-05-01 00:48
Core Insights
- DeepSeek has officially released DeepSeek-Prover-V2 on Hugging Face, continuing its open-source momentum with two versions launched [1][4]
- The training core of DeepSeek-Prover-V2 combines "recursion + reinforcement learning," enabling the model to break down complex theorems into sub-goals and reasoning paths [3][8]

Model Specifications
- DeepSeek-Prover-V2-7B is based on the previous V1.5 model and supports a maximum context input of 32K [4]
- DeepSeek-Prover-V2-671B is trained on DeepSeek-V3-Base and delivers the strongest reasoning performance [4]

Training Process
- Training consists of two phases: the first focuses on a rapid mode using an "expert iteration" method, in which successful answers are fed back to refine the model [5]
- The second phase trains more complex logical reasoning capabilities, incorporating mathematical knowledge from DeepSeek-V3 and formal data [6]

Reinforcement Learning
- The GRPO reinforcement learning algorithm is introduced to enhance reasoning, allowing the model to autonomously learn to select optimal solutions from multiple candidates [8]
- The system generates 32 different proof schemes for each theorem, retaining only those verified as correct by the Lean verification system [9]

Model Distillation
- After developing the powerful 671B model, the team distilled its capabilities into a smaller 7B model, allowing users to achieve near-equivalent mathematical reasoning on resource-limited devices [10][11]

Reasoning Modes
- The rapid mode (non-CoT) prioritizes speed, generating concise Lean code answers without showing the thought process, suitable for handling large batches of problems [12]
- The logical mode (CoT) details each step of the reasoning process, ensuring clarity and transparency [12]

Performance Evaluation
- In the final assessment, DeepSeek-Prover-V2-671B achieved an 88.9% pass rate on the MiniF2F test and solved 49 problems from the PutnamBench dataset [17]

New Dataset
- DeepSeek introduced a new formal mathematics dataset, ProverBench, containing 325 problems across domains including number theory, algebra, and calculus [18][19]

Comparison and Trends
- A significant trend: the performance gap between large language models' "informal mathematical reasoning" and "formal mathematical reasoning" is narrowing [21]
- Evolution in model structure and training strategies enables models to produce rigorous, verifiable mathematical proofs [22]

Future Directions
- DeepSeek-Prover-V2 signals a shift in focus from merely generating content to generating structured logic, which may bear on the foundational structure of general artificial intelligence [33][34]
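The sample-and-verify loop described in the summary (generate many candidate proofs per theorem, keep only those the Lean checker accepts) can be sketched as follows. This is a minimal illustration, not DeepSeek's actual pipeline: `generate_proof` and `lean_verifies` are hypothetical stand-ins for an LLM sampling call and a Lean toolchain invocation.

```python
TACTICS = ["simp", "ring", "omega", "nlinarith"]

def generate_proof(theorem: str, seed: int) -> str:
    # Hypothetical stand-in for sampling one candidate proof from the
    # model; a real pipeline would call the LLM with temperature > 0.
    return f"by {TACTICS[seed % len(TACTICS)]}"

def lean_verifies(theorem: str, proof: str) -> bool:
    # Hypothetical stand-in for the Lean checker; a real pipeline would
    # invoke the Lean toolchain on the candidate proof and read its exit.
    return proof == "by ring"  # pretend only `ring` closes this goal

def best_of_n(theorem: str, n: int = 32) -> list[str]:
    """Sample n candidate proofs; keep only verifier-approved ones."""
    candidates = [generate_proof(theorem, s) for s in range(n)]
    return [p for p in candidates if lean_verifies(theorem, p)]

verified = best_of_n("(a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2")
print(len(verified))  # with this toy sampler, 8 of 32 samples pass
```

The key property the article highlights survives even in this toy version: only machine-checked proofs enter the training set, so the reward signal cannot be gamed by plausible-but-wrong reasoning.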
Lujiazui Financial Breakfast, Thursday, May 1, 2025
Wind万得· 2025-04-30 22:29
// Hot Topics // 1. In April, the People's Bank of China conducted 1.2 trillion yuan of outright reverse repo operations. With 1.7 trillion yuan of outright reverse repos maturing that month, April's operations shrank by 500 billion yuan, the first reduction since the tool was created. Since net MLF injections for the month were 500 billion yuan, the central bank's medium-term liquidity operations in April amounted to a like-for-like rollover, ending the continuous injection of medium-term liquidity underway since last October. Meanwhile, the central bank suspended open-market treasury bond trading for a fourth consecutive month. Analysts argue that the reduced rollover of outright reverse repos in April does not mean the central bank is tightening market liquidity; it very likely signals that a reserve requirement ratio cut is coming next. 2. The Private Economy Promotion Law has been enacted and takes effect May 20, the country's first foundational law dedicated to the development of the private economy. A batch of other new rules also takes effect in May, including: household registration booklets are no longer required for marriage or divorce registration; newly built residential buildings must have floor heights of at least 3 meters, and new residential buildings of four stories or more must have elevators; consumers have the right to request refunds of prepaid principal from merchants within seven days of payment; insurance companies are barred from developing universal life insurance products with terms under five years; and China and Nauru will mutually waive visas for holders of specific passport categories. 3. The U.S. economy contracted at the start of the year for the first time since 2022, driven mainly by a surge in imports ahead of tariff implementation and a slowdown in consumer spending. According to the government's preliminary estimate, first-quarter inflation-adjusted U.S. gross domestic product (GD ...
China Electronics Sector: Domestic Open-Source Models Launch En Masse, with Alibaba Qwen-3, Xiaomi MiMo, and DeepSeek Prover Released in Quick Succession
Investment Rating
- Alibaba's Qwen currently ranks at the top of the open-source model leaderboards, with expectations for continued leadership in model capability and ecosystem monetization [2]

Core Insights
- The report highlights a surge in domestic open-source models, with significant releases from Alibaba, Xiaomi, and DeepSeek showcasing advances in large language models (LLMs) [1][8]
- Alibaba's Qwen-3 series demonstrates substantial performance improvements, achieving 10-30% accuracy gains on various benchmarks and 20-40% faster inference [9][12]
- Xiaomi's MiMo model, with 7 billion parameters, excels at reasoning and code-generation tasks, outperforming larger proprietary models through innovative training strategies [10][12]
- DeepSeek's Prover-V2-671B model shows strong performance in formal logical reasoning, indicating a strategic focus on specialized AI applications [11][12]
- As more domestic models are released, the industry may face homogenization and intensifying competition, pushing vendors toward customized solutions for vertical industries [5]

Summary by Sections

Alibaba Qwen-3
- The Qwen-3 series includes models ranging from 1.5 billion to 72 billion parameters, designed for varied inference needs, with notable performance gains over previous generations [9]
- Deployment costs are significantly lower, requiring only 4 H20 GPUs for full-capacity operation, an advantage over comparable models from OpenAI and Grok [2][12]

Xiaomi MiMo
- MiMo's training used 25 trillion tokens and innovative mechanisms to improve training efficiency, achieving a 2.29x increase in training speed and a 1.96x acceleration in verification [10]

DeepSeek-Prover-V2-671B
- The model excels at mathematical theorem proving, particularly formal logic, and serves as a precursor to DeepSeek's upcoming models, reflecting the company's commitment to advancing AI capabilities [11]

Industry Trends
- The next phase for open-source models will involve customization based on user data and feedback, aiming to build long-term moats and user loyalty in specific industries [5]
Roundup: Key News from the April 30 European and U.S. Trading Sessions
news flash· 2025-04-30 15:10
Domestic News
- The manufacturing Purchasing Managers' Index (PMI) for April came in at 49.0%, down 1.5 percentage points from the previous month, indicating a contraction in manufacturing activity [3]
- The new Private Economy Promotion Law will take effect on May 20, aiming to support the development of the private sector [4]
- Total holdings of gold ETFs in the Chinese market have reached an all-time high, as reported by the World Gold Council [6]
- The People's Bank of China conducted a 1.2 trillion yuan outright reverse repurchase operation using a fixed-quantity, interest-rate-bidding method [11]

International News
- Traders are fully pricing in four 25-basis-point rate cuts by the Federal Reserve by the end of 2025 [1]
- Global gold demand in Q1 reached its highest first-quarter level since 2016, according to the World Gold Council [2]
- The U.S. economy has contracted, with GDP declining 0.3% in the first quarter, the first shrinkage since 2022 [6]
Has AI Hit the Math Ceiling? DeepSeek Quietly Open-Sources a New Model, and Netizens Exclaim: R2 Is Just Around the Corner!
Hua Er Jie Jian Wen· 2025-04-30 12:52
Just as everyone was expecting DeepSeek to officially announce its R2 large model, the company unexpectedly dropped another technical bombshell on the eve of the May Day holiday.

On April 30, DeepSeek quietly open-sourced its latest model on Hugging Face: DeepSeek-Prover-V2-671B, a large language model focused on mathematical theorem proving and optimized specifically for formal proof tasks.

DeepSeek-Prover-V2-671B uses the DeepSeek-V3 architecture, with 671 billion parameters, a MoE (mixture-of-experts) design, 61 Transformer layers, and a 7168-dimensional hidden layer.
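For readers unfamiliar with formal theorem proving, a Lean 4 statement of the flavor such a prover must close might look like the following. This is an illustrative toy example (assuming Mathlib is available), not a problem taken from the model's benchmarks; the point is that the model emits the tactic proof and Lean's kernel independently checks it.

```lean
import Mathlib.Tactic

-- A toy formal statement: the binomial expansion over the integers.
-- The prover's job is to produce the `by ...` proof term; the Lean
-- kernel then verifies it mechanically, so no incorrect proof can pass.
theorem sq_expand (a b : ℤ) :
    (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```

Here a single `ring` tactic suffices; benchmark problems like those in MiniF2F and PutnamBench typically require long, multi-step tactic proofs, which is what makes the reported pass rates notable.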
OpenAI Rolls Back the Latest Version of GPT-4o Because ChatGPT Was "Too Sycophantic"
Huxiu APP· 2025-04-30 12:21
This article comes from the WeChat official account 机器之心 (Synced); authors: Yang Wen and Panda; header image generated by AI.

Last night, Altman posted on X saying that, having found GPT-4o to be "too sycophantic," OpenAI began rolling back the latest GPT-4o update on Monday night. The rollback is 100% complete for free ChatGPT users; paid users will be updated again once their rollback finishes. He also revealed that the team is working on additional fixes to the model's personality and will share more in the coming days.

Just now, OpenAI also published a dedicated blog post responding to the incident, explaining in detail what happened and how they are handling the model's "flattery" problem.

OpenAI also noted that this issue matters. ChatGPT's sycophantic streak undermines users' trust and experience: if it always says pleasant but insincere things, it comes across as unreliable, even annoying.

To address the over-ingratiating behavior, beyond withdrawing the latest GPT-4o update, OpenAI is taking further measures. Currently, users can shape the model's behavior by giving it specific instructions through features such as custom instructions. OpenAI is also building simpler new ways for users to do this; for example, users will be able to give real-time feedback that directly influences their interactions, and to choose from multiple default personalities.

Refine core training techniques and system prompts: explicitly guide the model away from sycophancy. Add more ...
Zuckerberg's Latest Interview: AI Will Spark a Massive Revolution in Knowledge Work and Programming
Sou Hu Cai Jing· 2025-04-30 10:02
Core Insights
- Meta CEO Mark Zuckerberg discussed the competitive landscape of AI development, comparing the Llama 4 model with DeepSeek and asserting that Llama 4 offers higher efficiency and broader functionality despite DeepSeek's advances in specific areas [1][36]
- Meta AI has reached nearly 1 billion monthly users, indicating significant growth and the importance of personalized AI interactions [2][21]
- The company is developing coding agents that will automate much of the coding process within the next 12 to 18 months, which Zuckerberg expects to increase rather than decrease the demand for human jobs [1][16]

Model Development
- The Llama 4 series includes models such as Scout and Maverick, designed for efficiency and low latency with multi-modal support [4][41]
- The upcoming Behemoth model will exceed 2 trillion parameters, representing a significant leap in model size and capability [4]
- Meta is committed to open-sourcing its models after internal use, allowing others to benefit from its developments [4][41]

Competitive Landscape
- Zuckerberg believes open-source models are likely to surpass closed-source models in popularity, reflecting a trend toward more accessible AI technologies [5][36]
- The company acknowledges DeepSeek's impressive infrastructure and text-processing capabilities but argues that Llama 4's multi-modal abilities give it a competitive edge [35][36]
- The Llama licensing model is designed to facilitate collaboration with large companies while letting Meta retain some control over its intellectual property [37][39]

User Interaction and Experience
- Meta is exploring how AI can enhance user interactions, particularly through natural dialogue and personalized experiences [14][28]
- Integrating AI into existing applications such as WhatsApp is crucial for user engagement, especially in markets outside the U.S. [21]
- The company is focused on creating AI that can assist users in complex social interactions, enhancing the overall user experience [27][28]

Future Directions
- Zuckerberg envisions a future where AI seamlessly integrates into daily life, potentially through devices like smart glasses that enable constant interaction with AI [14][31]
- AI development will focus not only on productivity but also on entertainment and social engagement, reflecting the diverse applications of the technology [25][26]
- The company is mindful of the challenge of keeping AI interactions healthy and beneficial for users, emphasizing the importance of understanding user behavior [26][27]
For Commercial Deployment, Are a Humanoid Robot's Upper Limbs or Lower Limbs the Key?
Robot猎场备忘录· 2025-04-30 07:14
Which matters more for humanoid robots to achieve genuine commercialization: the upper limbs or the lower limbs?

In real deployment scenarios, a humanoid robot's tasks ultimately terminate at the arm and hand, yet research on arms is strikingly scarce. Is that because industrial robotic arms have matured over decades, so humanoid arm structures and control algorithms are already fully solved, leaving only the "cerebellum"-level control layer to focus on? At trade shows, however, humanoid robot arms visibly tremble, stutter, and move stiffly. Is that a "cerebellum"-level control problem, or a problem between the joints?

Dexterous hands, by contrast, are already heavily researched and highly valued: beyond in-house development by humanoid robot makers, startups dedicated to dexterous hands and tactile sensing have emerged. Hands remain one of the core bottlenecks in humanoid robot development.

Main text: An embodied-intelligence robot is a complex systems problem, at once academic and engineering, combining AI, robotics, and autonomous driving, and in the long run a physical-world carrier for AGI; it is shaped by compute, software algorithms, data, hardware, engineering, and other factors.
Just Now! OpenAI Rolls Back the Latest Version of GPT-4o Because ChatGPT Was "Too Sycophantic"
机器之心· 2025-04-30 04:23
Synced (机器之心) report. Editors: Yang Wen, Panda.

Last night, Altman posted on X saying that, having found GPT-4o to be "too sycophantic," OpenAI began rolling back the latest GPT-4o update on Monday night. The rollback is 100% complete for free ChatGPT users; paid users will be updated again once their rollback finishes. He also revealed that the team is working on additional fixes to the model's personality and will share more in the coming days.

Just now, OpenAI also published a dedicated blog post responding to the incident, explaining in detail what happened and how they are handling the model's "flattery" problem.

OpenAI also noted that this issue matters. ChatGPT's sycophantic streak undermines users' trust and experience: if it always says pleasant but insincere things, it comes across as unreliable, even annoying.

To address the over-ingratiating behavior, beyond withdrawing the latest GPT-4o update, OpenAI is taking further measures:
- Refine core training techniques and system prompts: explicitly guide the model away from sycophancy.
- Add more guardrails: improve honesty and transparency, key principles in the Model Spec.
- Expand user testing and feedback: let more users test and give direct feedback before deployment.
- Continue expanding evaluations: build on the Model Spec and ongoing research to help identify issues beyond sycophancy ...
Meta's LlamaCon was all about undercutting OpenAI
TechCrunch· 2025-04-30 00:15
Group 1
- Meta held its first AI developer conference, LlamaCon, announcing a consumer-facing AI chatbot app and a developer-facing API for Llama models [1]
- The releases aim to expand adoption of Meta's open Llama AI models, with a primary goal of competing against OpenAI [2][5]
- The AI chatbot app features a social feed for sharing AI chats and offers personalized responses based on user activity within Meta apps [3]

Group 2
- The Llama API simplifies app development by allowing developers to connect to Llama models with a single line of code, reducing reliance on third-party cloud providers [4]
- Meta's strategy includes undercutting proprietary AI model providers like OpenAI; executives had previously focused on surpassing OpenAI's GPT-4 [5]
- Meta views any AI lab that makes its models openly available as an ally against closed-model providers, emphasizing the value of open-source models [6][7]

Group 3
- Meta's approach may also be influenced by regulatory considerations, as the EU AI Act provides advantages to companies distributing "free and open source" AI systems [7]
- The company appears willing to launch AI products that bolster the open-model ecosystem, even if it means not delivering the most advanced models [8]