reinforcement learning

A Conversation with DeepSeek-Prover Core Author Huajian Xin (辛华剑): Multi-Agent Is a Natural Fit for Formal Mathematics | Best Minds
海外独角兽· 2025-06-12 13:27
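Guest: Huajian Xin (辛华剑). Interviewer: penny. The essay "Era of Experience" argues that to achieve AGI and build general agents that can complete complex tasks, "experience" must serve as the medium. Here "experience" means the high-quality data that models and agents accumulate during reinforcement learning and that does not exist in human datasets. Reinforcement learning is the key solution for AGI. From OpenAI o1 to DeepSeek R1, we keep seeing the potential of reinforcement learning. DeepMind's AlphaProof is regarded as an early sign of the "era of experience": as the first AI to win an award at the IMO, AlphaProof used RL algorithms to "work problems" on its own and accumulate experience. Its case shows that in fields like mathematics, where high-level human knowledge is approaching its limits, RL can break through the bottleneck via interactive trial and error and achieve superhuman results. Starting with AlphaProof, the whole field of mathematical proof has seen a dense run of AI breakthroughs over the past six months: besides AlphaProof, OpenAI's o1 model has shown striking performance in mathematical reasoning, and the DeepSeek-Prover trilogy keeps setting new records in formal mathematical proof. To understand the relationship between mathematics and AGI, 海外独角兽 interviewed DeepSeek-Prov ...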
NVIDIA (NVDA) Conference Transcript
2025-06-11 12:45
Summary of NVIDIA Conference Call - June 11, 2025

Company Overview
- **Company**: NVIDIA (NVDA)
- **Event**: Conference Call
- **Date**: June 11, 2025

Key Industry Insights
- **AI Growth**: AI is recognized as the fastest-growing technology in history, with global reach and significant potential for expansion in Europe, particularly in France and the EU [1][18]
- **Quantum Computing**: The industry is shifting towards a hybrid model of quantum and classical computing, emphasizing the importance of GPU supercomputers for error correction and data generation [9][12][15]
- **Sovereign AI**: The development of AI infrastructure in Europe is seen as crucial, with an estimated $1.5 trillion build-out expected over the coming years [17][18]

Core Company Strategies
- **Local Infrastructure**: NVIDIA is focusing on building AI factories and supercomputing centers for local consumption in Europe, which will support the region's heavy industry and robotics capabilities [16][17]
- **Physical AI Models**: The company is developing multimodal physical AI models that can reason and execute tasks based on prompts, differentiating them from traditional LLMs [19][20]
- **Gigawatt Projects**: NVIDIA is involved in multiple gigawatt projects across Europe, with a focus on regional cloud service providers and AI factories supported by government initiatives [24][26]

Financial and Operational Insights
- **Supply Chain Management**: NVIDIA's supply chain is robust, with the ability to forecast demand and place large orders with suppliers like TSMC and Micron, ensuring timely production of supercomputers [30][32]
- **Market Demand**: The company is not limited by critical components but must forecast production accurately to meet the growing demand for AI technologies [30][33]
- **Post-Training Opportunities**: NVIDIA sees significant potential in post-training processes, which involve reinforcement learning and human feedback to improve AI models [49][52]

Challenges and Risks
- **Geopolitical Concerns**: The company acknowledges the importance of local infrastructure due to data privacy and geopolitical issues, particularly in Europe [27][28]
- **Dependency on Taiwan**: NVIDIA is actively working to reduce its dependency on Taiwan for chip manufacturing, with plans to build substantial AI supercomputer infrastructure in the United States [64][66]
- **China Market**: The company has removed China from its forecasts due to export controls, resulting in a significant revenue drop, but remains optimistic about growth in other markets [70][71]

Future Outlook
- **AI Integration in Enterprises**: NVIDIA is focused on integrating AI into traditional enterprise IT systems, which presents a substantial market opportunity estimated in the hundreds of billions [96][98]
- **Continuous Improvement**: The company emphasizes ongoing software improvements that enhance the performance of its hardware, ensuring long-term value for customers [114][115]
- **Ecosystem Development**: NVIDIA is building an ecosystem around its NVLink technology, which is expected to facilitate partnerships and enhance its competitive position in the market [91][92]

Conclusion
NVIDIA is strategically positioned to capitalize on the rapid growth of AI and quantum computing, with a strong focus on local infrastructure development in Europe, robust supply chain management, and continuous innovation in AI technologies. The company faces challenges related to geopolitical risks and market dependencies but remains optimistic about its growth trajectory and market opportunities.
New "SOTA" Reasoning Model Dodges Qwen and R1? "Europe's OpenAI" Gets Roasted
量子位· 2025-06-11 05:13
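By 闻乐, from 凹非寺. 量子位 | WeChat official account QbitAI. Mistral AI, "Europe's OpenAI", has finally released its first reasoning model: Magistral. Yet it has once again drawn questions from netizens: why, again, is there no comparison with the latest Qwen and DeepSeek R1 0528? (Earlier, when the company released Ministral 3B/8B, it claimed they were "consistently better than peers" but did not compare against Qwen2.5.) A few hours before the release, Mistral AI CEO Arthur Mensch said in a fireside chat that the upcoming Magistral could compete with all other rivals. In the officially published benchmark results, the DeepSeek-R1 numbers are indeed not the latest (on the AIME-25 math test, DeepSeek-R1-0528 has raised accuracy from the older version's 70% to 87.5%), and Qwen is entirely absent from the comparison. That said, compared with the company's earlier model Mistral Medium 3, the new model improves accuracy on AIME-24 by 50%. Magistral is released in two versions: Magistral Small, a 24B-parameter open-weights version that can be self-deployed under the Apache 2.0 license. Magistral Medium ...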
ETH's Latest CMDP Framework Debuts: ANYmal Quadruped Robot "Trades Shots" with Humans at Badminton for the First Time
机器人大讲堂· 2025-06-03 10:52
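The most critical problem in human-robot collaboration is how to overcome physical constraints and improve the stability and safety of robotic systems. Recently, the Robotic Systems Lab at ETH Zurich proposed a new CMDP (constrained Markov decision process) framework. Drawing on the clear advantages of constrained reinforcement learning in reducing constraint violations and improving system robustness, the framework can effectively improve the locomotion performance of legged robots in complex environments.

| | Reward | Violations per episode |
| --- | --- | --- |
| PPO (unconstrained) | 24.96 (± 0.67) | 533.44 (± 108.94) |
| P3O | 24.13 (± 1.55) | 0.96 (± 1.35) |
| N-P3O | 24.13 (± 1.14) | 0.49 (± 0.88) |
| PPO-Lagrangian | 23.68 (± 1.87) | 0.99 (± 1.31) |
| N-IPO | 24.67 (± 0.84) | 1.33 (± 1.69) |
| CRPO | 22.28 (± 1.70) | 0.96 (± 1.22) |
| FOCOPS | 22.65 (± 3.02) | 15.82 (+ ...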
Claude 4 Core Team Members: Agent RL, the New RLVR Paradigm, and the Inference Compute Bottleneck
海外独角兽· 2025-05-28 12:14
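Compiled by: haozhen. Editor: Siqi. Original compilation by 海外独角兽; please credit the source when reposting. Anthropic released Claude 4 last Friday. It is currently the most advanced coding model and the strongest agentic model, able to code continuously for 7 hours. This article is a compilation of the latest interview with two core Anthropic researchers, Sholto Douglas and Trenton Bricken. Sholto focuses on RL scaling, while Trenton works on mechanistic interpretability:
• The biggest change in model training in 2025 is that RL finally works: given the right feedback mechanism, models can reach expert-human performance and reliability;
• By the end of this year there will be agents that can replace junior programmers, and by this time next year software engineering agents will be creating value in real tasks;
• The paradigm of reinforcement learning with verifiable rewards (RLVR) has been proven in coding and mathematics, because clear signals of this kind are easy to obtain in those domains;
• The key to the development of model self-awareness is reward. Models pursue reward in some way, and that pursuit profoundly shapes a model's "persona" and personality, eventually leading to self-awareness;
• It is easier for an AI to win a Nobel Prize than a Pulitzer Prize for fiction, because getting a model to have something like ...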
312 Trajectories Unlock 241% Performance! SJTU and SII Open-Source a Computer Use Agent That Surpasses Claude 3.7
机器之心· 2025-05-25 03:51
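Since Anthropic fired the first shot for computer use agents with Claude Computer Use, OpenAI has followed with Operator, using reinforcement learning (RL) to push computer use agents to new heights and drawing wide attention worldwide. The prevailing industry view is that a breakthrough in computer use agents requires massive trajectory data or complex reinforcement learning, which could mean extensive manual trajectory annotation and large-scale virtual machine environments to support agent learning and optimization. However, new research from Shanghai Jiao Tong University and SII offers a non-consensus answer: with only 312 human-annotated trajectories, plus richer action decisions synthesized with Claude 3.7 Sonnet, it is possible to elicit 241% of the model's performance, even surpassing Claude 3.7 Sonnet in extended thinking mode and setting a new SOTA for open-source computer use agents on Windows. [Figure: distribution of the 312 trajectories across different software] Chain-of-thought completion: grounding "actions" in "thinking". This finding sends a key signal: today's large models already have the basic ability to use a computer to complete tasks, and the performance bottleneck lies mainly in eliciting long-horizon reasoning (long-horizon planning) ability, ...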
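HKU's Yi Ma (马毅) on the History of Intelligence: DNA Was the Earliest Large Model, and the Essence of Intelligence Is Entropy Reduction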
晚点LatePost· 2025-05-23 07:41
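Understanding intelligence is not a task for researchers and engineers alone. By 程曼祺 and 刘倩. Large models already appear to have the form of intelligence: they can chat with you, think step by step, solve advanced mathematics problems, and write code efficiently. For many people that is enough: enough to bring more research results, product opportunities, huge investments, and rising share prices. Yi Ma is among those who find it insufficient. Amid the quiet, he began to ask: what is the essence of intelligence? The question is simple to state, yet there is no consensus on the answer. Ma believes our understanding of intelligence should not be superficial or short-term; we should return to the source and trace how intelligence was born and developed. Since earning his PhD from Berkeley in 2000, Ma has worked at the University of Illinois Urbana-Champaign (UIUC), Microsoft Research Asia, ShanghaiTech University, Berkeley, and the University of Hong Kong, where he is now Dean of the School of Computing and Data Science. The compressed sensing techniques he and his team proposed still influence pattern recognition in computer vision today. In the vast universe, apart from the "machine intelligence" we are building ourselves but cannot yet fully explain, the only large sample of intelligence humans have ever seen is life. Ma argues that the essence of intelligence is "learning": life is the carrier of intelligence. From the emergence of DNA, to the birth of nervous systems and the Cambrian explosion of species, to the birth of human language and mathematics, intelligence has taken different forms; what does not change is that intelligence is always learning the knowledge and regularities of the external world in order to make predictions, so that knowledge can be put to ...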
Pony Ai(PONY) - 2025 Q1 - Earnings Call Transcript
2025-05-20 13:00
Pony AI (PONY) Q1 2025 Earnings Call, May 20, 2025, 08:00 AM ET. Speaker 0: Ladies and gentlemen, thank you for standing by, and welcome to the Pony AI Inc. First Quarter 2025 Earnings Conference Call. At this time, all participants are in listen-only mode. After the management's prepared remarks, there will be a question-and-answer session. As a reminder, today's conference call is being recorded and a webcast replay will be available on the company's Investor Relations website at ir.ponyai.com. I will ...
Unleashing the Power of Reasoning Models
DDN· 2025-05-15 19:50
Today I want to talk about building the future with design matters, and about insights and future trends for this year. I want to focus on how we solve the customer's problem and less on ourselves. So I want to start off with something huge, because a lot of us know about AGI, or artificial general intelligence. I think it basically means that we want to have AI that achieves a level of intelligence comparable to humans and maybe even surpasses humans ...
A Model Trained on the World's Idle Compute Rivals R1, and the Sky Falls for Jensen Huang! Karpathy Has Invested in It
量子位· 2025-05-13 04:45
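By 白交 and 克雷西, from 凹非寺. 量子位 | WeChat official account QbitAI. Overnight, the sky fell for Jensen Huang (doge). INTELLECT-2, the world's first model trained with distributed RL, has been released. It completed its reinforcement learning training solely by pooling idle or scattered compute from around the world, greatly reducing training cost. Its performance rivals DeepSeek-R1! If the paradigm holds, RL training will be freed from its dependence on centralized compute, anyone in the world will be able to take part in model training, and the era of big companies monopolizing compute may come to an end. Just like this: compute, come, compute, come, compute from every direction. For this model version, 19 individuals/organizations contributed compute (according to the model's own answer, which also counts itself). Beyond contributing compute, a number of big names were willing to invest, including but not limited to Karpathy, FlashAttention author Tri Dao, and Hugging Face co-founder and CEO Clem Delangue. According to team members, it took only about two months from writing the model's reinforcement learning framework, prime-rl, to today's release. The infrastructure is now in place and validated, and surpassing the leading labs is only a matter of time. (OpenAI, for example?) Some have already begun to assert that future top open-source models will be trained in a distributed fashion. INTELLEC ...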