机器之心

A Problem a Master Mathematician Could Not Solve 60 Years Ago Has Been Cracked by an Oxford PhD Student
机器之心· 2025-05-24 03:13
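Selected from Quanta Magazine. Author: Leila Sloman. Compiled by 机器之心.
Addition, an operation we master in kindergarten, turns out to harbor unsolved mysteries.
It is a simple operation: the first mathematical truth we learn is that 1 plus 1 equals 2. Yet many mysteries remain about the patterns that addition can give rise to.
In exploring this puzzle, mathematicians also hope to understand the limits of addition's power. Since the early 20th century, they have been studying the properties of "sum-free sets".
A sum-free set is a subset of the integers in which the sum of any two elements does not belong to the set itself. The odd numbers, for example, form a classic sum-free set: adding any two odd numbers gives an even number, which is not in the set.
In 1965, the legendary mathematician Paul Erdős (the mathematician with the most published papers to date, as many as 1,525, co-authored with 511 collaborators) posed in a paper a simple question about how common sum-free sets are: within a set of integers, how large can the largest subset be that contains no sum of any two of its elements?
For decades afterwards, this seemingly simple question stumped countless mathematicians.
It was not until February of this year, sixty years after Erdős posed the problem, that it was finally cracked by Benjamin Bedert, a doctoral student at the University of Oxford.
Bedert proved that for any set of N integers, there exists ...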
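The definition of a sum-free set given in this article can be checked mechanically. The following is a minimal sketch, not anything from Bedert's proof: the function names are hypothetical, the sketch assumes that the two summed elements may coincide (a + a counts), and the exhaustive search is only practical for very small sets.

```python
from itertools import combinations


def is_sum_free(numbers):
    """Return True if no sum a + b (a, b in the set, a == b allowed) is in the set."""
    s = set(numbers)
    return all(a + b not in s for a in s for b in s)


def largest_sum_free_subset(numbers):
    """Brute-force search, trying the largest subsets first (tiny inputs only)."""
    items = sorted(set(numbers))
    for size in range(len(items), 0, -1):
        for candidate in combinations(items, size):
            if is_sum_free(candidate):
                return set(candidate)
    return set()


# The odd numbers below 20 are sum-free; {1, 2, 3} is not, because 1 + 2 = 3.
print(is_sum_free(range(1, 20, 2)))
print(is_sum_free({1, 2, 3}))
print(largest_sum_free_subset(range(1, 7)))
```

Erdős's question concerns how large such a subset is guaranteed to be for every set of N integers, which a brute-force search like this cannot answer in general.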
Is the GRPO Used by DeepSeek Really That Special? A 10,000-Word Analysis of Four Notable Papers
机器之心· 2025-05-24 03:13
Core Insights
- The article discusses recent advances in reasoning models, focusing on GRPO and its improved variants, and highlights how quickly reinforcement learning for reasoning is evolving [1][2][3].
Group 1: Key Papers and Models
- Kimi k1.5 is a newly released reasoning model that is trained with reinforcement learning and emphasizes long-context extension and improved policy optimization [10][17].
- OpenReasonerZero is the first complete reproduction of reinforcement learning training on a base model, and it reports significant results [34][36].
- DAPO explores improvements to GRPO that better suit reasoning training, and presents a large-scale open-source LLM reinforcement learning system [48][54].
Group 2: GRPO and Its Characteristics
- GRPO is closely related to PPO (Proximal Policy Optimization) and shares similarities with RLOO (REINFORCE Leave One Out), and many leading research works do not use GRPO [11][12][9].
- The core takeaway is that current RL algorithms are highly similar in implementation; GRPO is popular but not fundamentally revolutionary [15][6].
- GRPO includes clever modifications aimed at reasoning training rather than traditional RLHF scenarios, centered on generating multiple answers for each reasoning task [13][12].
Group 3: Training Techniques and Strategies
- Kimi k1.5's training involves supervised fine-tuning (SFT) and emphasizes behavior patterns such as planning, evaluation, reflection, and exploration [23][24].
- The training methods include a curriculum that starts with simpler tasks and gradually increases complexity, akin to how humans learn [27][28].
- The paper discusses the importance of data distribution and prompt quality for effective reinforcement learning [22][41].
Group 4: DAPO Improvements
- DAPO introduces two distinct clipping hyperparameters to improve learning dynamics and efficiency [54][60].
- It also uses dynamic sampling, removing samples with flat rewards from the batch to speed up learning [63].
- It proposes a token-level loss rather than a per-response loss to better manage learning dynamics and avoid issues with long responses [64][66].
Group 5: Dr. GRPO Modifications
- Dr. GRPO modifies GRPO to improve learning dynamics, achieving stronger performance with shorter generated lengths [76][79].
- The modifications include normalizing advantages across all tokens in a response, which helps manage the learning signal effectively [80][81].
- The paper highlights the importance of high-quality data engineering in absorbing the effects of these changes, emphasizing the need for a balanced distribution of problem difficulty [82][89].
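The ideas in Groups 2 and 4 can be illustrated with a small sketch. This is not code from any of the papers: the function names are hypothetical, the advantage formula assumes the commonly described GRPO form (reward minus group mean, divided by group standard deviation), and the default clip values are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each sampled answer relative to its own group (one prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """Dynamic sampling: a group with flat rewards yields zero advantages, so drop it."""
    return max(rewards) != min(rewards)


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped objective with separate lower and upper clip ranges."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def per_response_loss(token_losses):
    """Average within each response first, then across responses."""
    return mean([mean(resp) for resp in token_losses])


def token_level_loss(token_losses):
    """Average over all tokens in the batch, so long responses weigh more."""
    return mean([t for resp in token_losses for t in resp])


rewards = [1, 0, 0, 1]            # four sampled answers to one prompt
print(group_relative_advantages(rewards))
print(has_learning_signal([1, 1, 1, 1]))
losses = [[1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(per_response_loss(losses), token_level_loss(losses))
```

The last line shows why the aggregation choice matters: the same per-token losses give different batch losses once response lengths differ.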
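A New Breakthrough in Matrix Multiplication! XX^T Can Be Faster After All! With RL-Assisted Search, the World Record Improves by Another 5%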
机器之心· 2025-05-24 03:13
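A research team from the Shenzhen Research Institute of Big Data and The Chinese University of Hong Kong, Shenzhen has found in its latest work that XX^T, a special kind of matrix multiplication, can be accelerated further. By combining reinforcement learning with combinatorial optimization techniques, the team discovered a new algorithm that saves 5% of the multiplications.
Paper title: XX^t Can Be Faster
Paper link: https://arxiv.org/abs/2505.09814
The result sparked lively discussion on the international social media platform X and drew wide attention from scientists at MIT, Stanford, Harvard, and Google DeepMind.
Background
Matrix multiplication optimization is the "Mount Everest" of computer science. Ever since the Strassen algorithm burst onto the scene in 1969, this mathematical maze, rife with combinatorial explosion, has kept testing the limits of human ingenuity. Google DeepMind devoted four years of effort to it, successively releasing machine learning systems such as AlphaTensor and AlphaEvolve to tackle the problem. It is like a sprinter pushing the 100-metre record from 9.58 seconds to 9.57 seconds: behind every 0.01-second breakthrough lies a redefinition of the limits of computational theory.
XX^T (a matrix multiplied by its own transpose), this special kind of matrix multiplication, appears widely in practical data-science applications. Operations of this kind are executed trillions of times per minute worldwide; if the computation required by this operation could be reduced ...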
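This excerpt does not describe the internals of the new algorithm. As background only, the sketch below shows the textbook reason XX^T is a special case: the result is symmetric, so only the upper triangle needs to be computed. This is explicitly not the paper's method, and the function names are hypothetical.

```python
def gram_naive(X):
    """Compute X X^T entry by entry, ignoring symmetry."""
    n, m = len(X), len(X[0])
    return [[sum(X[i][k] * X[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


def gram_symmetric(X):
    """Compute X X^T using symmetry: fill the upper triangle and mirror it."""
    n, m = len(X), len(X[0])
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(X[i][k] * X[j][k] for k in range(m))
            G[i][j] = dot
            G[j][i] = dot
    return G


def multiplication_counts(n, m):
    """Scalar multiplications used by the two routines for an n-by-m matrix."""
    return n * n * m, n * (n + 1) // 2 * m


X = [[1, 2], [3, 4], [5, 6]]
print(gram_symmetric(X) == gram_naive(X))
print(multiplication_counts(3, 2))
```

The 5% saving reported in the article is measured against stronger baselines than this simple symmetry trick.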
Meta CEO × Microsoft CEO Dialogue Decoded: Why Has the "Distillation Factory" Become the Source of Open Source's Appeal?
机器之心· 2025-05-23 15:30
Group 1
- The core discussion at LlamaCon 2025 focused on the transformative impact of AI on the boundaries between documents, applications, and websites, as articulated by Satya Nadella [5][6].
- Nadella emphasized that modern AI acts as a "universal converter," understanding user intent and enabling a shift from "tool-oriented computing" to "intent-oriented computing," enhancing user experience [6][7].
- Nadella identified the current AI wave as a significant technological platform shift, necessitating a complete overhaul of the technology stack to optimize for AI workloads [7].
Group 2
- Nadella noted that approximately 20% to 30% of Microsoft's internal code is now generated by AI, indicating a broad application of AI in software development beyond mere code completion [7][8].
- Zuckerberg projected that by 2026, half of Meta's development work will be completed by AI, showcasing the growing reliance on AI in the tech industry [8].
- The dialogue also highlighted the strategic value of both open-source and closed-source models, with Nadella advocating for a flexible approach that supports both [9][10].
Group 3
- The concept of "distillation factories" was introduced as a key area for future development in the AI ecosystem, with both CEOs agreeing on the importance of infrastructure and toolchains for model distillation [10][11].
- Nadella pointed out the trend towards multi-model applications and the necessity of standardized protocols for seamless collaboration among various AI models [10].
- Zuckerberg acknowledged Microsoft's unique advantages in supporting multi-model collaboration infrastructure, reinforcing the significance of the "distillation factory" concept [10].
Forum Registration Is Open, Reserve Your Seat Now! Decoding the Embodied-Intelligence Model Revolution
机器之心· 2025-05-23 06:49
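Embodied AI is becoming, at unprecedented speed and in tangible form, one of the most closely watched focal points in global technology.
From the astonishing agility of Boston Dynamics' Atlas robot, to Unitree's stunning appearance at the Spring Festival Gala, to the in-depth exploration of general embodied-intelligence capabilities by Professor Fei-Fei Li and other leading scholars: with agents like these that move flexibly through the physical world and understand and carry out tasks, is the era of the "general-purpose robot" really not far off?
The rise of this wave brings together the latest advances in robotics, computer vision, natural language processing, reinforcement learning, and other fields, and is accelerating the emergence of embodied large-model technologies such as vision-language large models (VLM/VLA).
How to make machine intelligence not only "see" the physical world but also "understand", "plan", and "operate" as humans do is a major challenge and opportunity facing academia and industry alike.
Zhangjiang, a hub of Pudong's artificial intelligence industry ecosystem, focuses on frontier technology directions in embodied intelligence, science, and applications, and has built a complete AI industry chain on the strength of its distinctive industrial ecosystem. On this basis, Zhangjiang Group has joined with 机器之心 to organize this forum, themed "Embodied · Boundless: Paradigm Innovation and Architectural Revolution in Intelligent Models", inviting many embodied-intelligence companies and top technical experts to discuss the way forward for embodied AI models.
Embodied-AI companies gather, top experts share the stage
The forum brings together CEOs/CTOs of well-known companies in the embodied AI field, industry ...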
SIGIR 2025 | Tackling Scalability and Transferability, Huawei Singapore Proposes InstructRAG, with Gains of Up to 19%
机器之心· 2025-05-23 06:49
Core Viewpoint
- The article discusses the InstructRAG framework, which leverages Retrieval-Augmented Generation (RAG) to enhance task planning capabilities of large language models (LLMs) by addressing scalability and transferability challenges [1][2][30].
Group 1: Challenges in Task Planning
- Scalability is defined as the ability to expand the instruction graph by combining existing instructions into new sequences, enabling LLMs to tackle tasks without predefined paths [1][2].
- Transferability involves developing technologies that allow models to quickly adapt to new tasks and learn effectively from limited examples [2].
Group 2: InstructRAG Framework Components
- The InstructRAG framework consists of three main components:
  1. Instruction Graph, which organizes past instruction paths [4].
  2. RL-Agent, a reinforcement learning agent that expands the graph coverage [4].
  3. ML-Agent, a meta-learning agent that enhances task generalization capabilities [4].
Group 3: Instruction Graph
- The Instruction Graph is a directed graph that organizes past instruction paths, where nodes represent instruction sets and edges represent tasks [6].
Group 4: RL-Agent Functionality
- The RL-Agent operates as a Markov Decision Process (MDP) to select nodes in the instruction graph, effectively exploring its scalability [7].
- It utilizes state, action, reward, and policy learning to optimize the selection of instruction paths [8].
Group 5: ML-Agent Functionality
- The ML-Agent enhances transferability by selecting relevant paths from the RL-Agent's candidates and generating prompts for LLMs [9].
- Its training involves pre-training and fine-tuning phases to optimize performance [10][11].
Group 6: Overall Framework and Training
- The overall framework includes training, few-shot learning, and testing phases, enhancing scalability through the RL-Agent and transferability through the ML-Agent [13][16].
Group 7: Experimental Results
- InstructRAG demonstrated superior performance across multiple datasets, achieving a 19.2% improvement over the best baseline method in various tasks [22][30].
- The framework showed strong generalization capabilities when applied to unseen tasks, maintaining effectiveness with limited examples [23][28].
Group 8: Robustness and Component Importance
- InstructRAG exhibited robust performance against noise, with only an 11.1% performance drop at 50% noise, compared to a 27.2% drop for the baseline [25].
- Each component of InstructRAG significantly contributes to its performance, as evidenced by ablation studies [26][27].
Group 9: Future Directions
- Future work will focus on further enhancing the generalization capabilities of InstructRAG [30].
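The instruction graph described above (a directed graph whose nodes hold instruction sets and whose edges are labeled with tasks) can be sketched as a small data structure. This is an illustrative assumption, not the authors' implementation: the class, method names, node ids, and task labels are all hypothetical, and the path enumeration stands in for the learned RL-Agent.

```python
from collections import defaultdict


class InstructionGraph:
    """Directed graph: nodes hold instruction sets, edges carry task labels."""

    def __init__(self):
        self.instructions = {}            # node id -> set of instructions
        self.edges = defaultdict(list)    # node id -> [(next node id, task label)]

    def add_node(self, node_id, instructions):
        self.instructions.setdefault(node_id, set()).update(instructions)

    def add_edge(self, src, dst, task):
        self.edges[src].append((dst, task))

    def paths(self, start, goal, _prefix=None):
        """Yield every cycle-free node path from start to goal (depth-first)."""
        prefix = (_prefix or []) + [start]
        if start == goal:
            yield prefix
            return
        for nxt, _task in self.edges[start]:
            if nxt not in prefix:
                yield from self.paths(nxt, goal, prefix)


g = InstructionGraph()
for node, instr in [("n0", {"open"}), ("n1", {"search"}),
                    ("n2", {"filter"}), ("n3", {"answer"})]:
    g.add_node(node, instr)
# Two past tasks contributed edges; they share nodes n1 and n3.
g.add_edge("n0", "n1", "task_a")
g.add_edge("n1", "n3", "task_a")
g.add_edge("n1", "n2", "task_b")
g.add_edge("n2", "n3", "task_b")
print(list(g.paths("n0", "n3")))
```

The second path found mixes edges from both tasks, which is the sense in which combining existing instructions yields sequences no single past task contained.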
CVPR 25 | Comprehensively Improving the Robustness of Visual Perception: Generative Models Rapidly Empower 3D Detection
机器之心· 2025-05-23 04:17
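The paper's first author, Hongbin Lin (林宏彬), is from the Deep Bit Lab at the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, and the Shenzhen Future Network of Intelligence Institute, and is advised by Zhen Li (李镇). The lab's current research directions include multimodal data analysis and generation for autonomous driving, medical imaging, and molecular understanding.
Paper title: DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Paper link: https://www.arxiv.org/abs/2503.11122
GitHub: https://github.com/Hongbin98/DriveGEN
Task background
As the new-energy vehicle industry continues to develop, intelligent driver-assistance technology is being applied ever more widely. Among such approaches, vision-only autonomous driving solutions perform environment perception and analysis using only multi-view images and offer low cost and high efficiency, so they have attracted considerable attention. In real-world applications, however, the generalization ability of visual perception models is crucial.
Scholars from The Chinese University of Hong Kong, Shenzhen and other institutions propose DriveGEN, a training-free method for controllable image generation in autonomous driving. Without any additional training of the generative model, the method enables controllable augmentation of training image data, thereby, at a relatively ...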
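Addition in Place of Multiplication? Huawei Mathematicians Step In: High-Powered Design and Optimization of Ascend Operators, with a 30% Performance Gain!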
机器之心· 2025-05-23 04:17
Core Viewpoint
- The article discusses the rapid advancements in large language models (LLMs) and the challenges they face in inference, particularly regarding speed and energy efficiency. It highlights Huawei's solutions for optimizing these models through hardware-software integration, focusing on three key technologies that enhance inference speed and energy efficiency [2][4][11].
Group 1: Key Technologies
- AMLA technology transforms complex multiplication into addition operations, raising chip utilization to 71% and improving performance of the attention operator by over 30% [4][5].
- The fusion operator optimization combines multiple operators into a single composite operator, enhancing parallel processing and reducing redundant data movement, leading to substantial performance improvements in model inference [7][9].
- SMTurbo technology enables ultra-low latency memory sharing across 384 cards, achieving sub-microsecond delays and enhancing memory access throughput by over 20% in cross-machine communication scenarios [10][9].
Group 2: Future Developments
- Future research on AMLA will focus on optimizing the MLA operator for quantization scenarios, expanding its application [12].
- The fusion operator optimization will be explored across more model architectures, promoting efficient inference of large language models on Huawei's Ascend hardware [12].
- Load/Store optimization will balance read and write loads, aiming for practical benefits at large batch sizes within DeepSeek dispatch and combine scenarios [12].
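The summary says only that AMLA turns multiplications into additions and does not give the mechanism. One well-known identity of this flavor is that multiplying an IEEE 754 float by a power of two equals an integer addition on its exponent bits. The sketch below demonstrates that identity for float32. Whether AMLA works exactly this way is an assumption, and the function name is hypothetical.

```python
import math
import struct


def scale_by_power_of_two(x, k):
    """Multiply a float32 value by 2**k using one integer addition.

    Assumes x is a normal, nonzero, finite float32 and that the result
    neither overflows nor underflows; those cases are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += k << 23                      # the exponent field starts at bit 23
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(scale_by_power_of_two(3.5, 3), math.ldexp(3.5, 3))
print(scale_by_power_of_two(1.0, -2), math.ldexp(1.0, -2))
```

The standard-library `math.ldexp` performs the same scaling and is printed alongside for comparison.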
Four Turing Award Winners at the Helm: The 2025 BAAI Conference Reveals New Paths for AI Evolution
机器之心· 2025-05-23 04:17
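In 2006, Professor Geoffrey Hinton of the University of Toronto and colleagues proposed layer-wise pre-training, breaking through the technical bottleneck in training deep neural networks and laying the foundation for the revival of deep learning.
Reinforcement learning, a learning paradigm in which an agent interacts with its environment, has core ideas that predate the rise of deep learning. DQN, proposed by DeepMind in 2013, was an initial combination of deep learning and reinforcement learning, and the success of AlphaGo in 2016 brought their fusion into public view, greatly raising attention to this interdisciplinary field.
This early summer, four Turing Award winners. June 6-7, 2025, Beijing, China. Join innovators from around the world at the BAAI Conference. Register now and explore the boundless frontiers of the AI era.
Foundational theory
In the history of AI, connectionism (represented by neural networks) and behaviorism (represented by reinforcement learning) stem from different theoretical lineages, yet signs of their technical convergence appeared early. These two main lines originally grew and developed independently; today they intertwine and merge, converging into one, and together form the cornerstone of next-generation artificial general intelligence.
On June 6, the discussion of deep learning and reinforcement learning will continue at the 2025 BAAI Conference: a dialogue across time and space, like a "meeting of twin stars", that takes stock of the past and jointly seeks the ultimate answer to the riddle of intelligence.
Meanwhile, the rise of large reasoning models, the acceleration of the open-source ecosystem, and the flourishing of embodied intelligence have become 2025 ...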
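RSS 2025 | PIN-WM, a Physics-Informed World Model: Estimating Physical Properties Directly from Visual Observations for Manipulation Policy Learning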
机器之心· 2025-05-23 00:01
Core Viewpoint
- The article discusses the development of a Physics-Informed World Model (PIN-WM) that enhances the ability of robots to learn non-prehensile manipulation skills and effectively transfer these skills from simulation to real-world applications [2][4][43].
Group 1: Introduction and Background
- The research team from the National University of Defense Technology, Shenzhen University, and Wuhan University addresses the challenges in robot operation involving complex physical mechanisms such as friction and collision [1].
- Existing simulation environments often have significant discrepancies with real-world physics, complicating the Sim2Real transfer of robot control strategies [1].
Group 2: Methodology
- PIN-WM utilizes differentiable physics and rendering to directly identify rigid body physical properties from visual observations, requiring only a small number of task-agnostic interaction trajectories for learning [3][11].
- The team introduces a Physics-Aware Digital Cousins (PADC) approach, which generates variations of the world model by perturbing identified parameters to model potential biases, thereby improving the robustness of strategy learning [3][11].
Group 3: Framework and Process
- The framework consists of two main phases: system identification and strategy training, transitioning from real to simulation and back to real [10][12].
- The system identification phase involves estimating rendering and physical properties through multi-view images and interaction videos, optimizing parameters based on rendering loss [12].
Group 4: Experimental Results
- The effectiveness of PIN-WM was evaluated through experiments on classic non-prehensile tasks such as "Push" and "Flip," which are sensitive to physical mechanisms [14].
- In simulation experiments, PIN-WM outperformed data-driven methods and other physical parameter identification methods, demonstrating superior generalization and performance in both "Push" and "Flip" tasks [16][17].
- Real-world experiments confirmed the advantages of PIN-WM, showing higher success rates and fewer steps required to complete tasks compared to baseline methods [17][19].
Group 5: Conclusion
- The research team successfully demonstrated that PIN-WM significantly enhances the performance of non-prehensile manipulation skills in transferring from simulation to real-world scenarios, marking a notable advancement in robotic learning [43].
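The Physics-Aware Digital Cousins idea in Group 2 (perturbing identified parameters to obtain variants of the world model) can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function name, the parameter names `friction` and `mass`, the uniform noise model, and the 10% range are all assumptions.

```python
import random


def make_digital_cousins(identified, num_cousins=4, rel_noise=0.1, seed=0):
    """Return perturbed copies of the identified physical parameters.

    Each parameter is scaled by a factor drawn uniformly from
    [1 - rel_noise, 1 + rel_noise]; the noise model is an assumption.
    """
    rng = random.Random(seed)
    cousins = []
    for _ in range(num_cousins):
        cousin = {name: value * (1.0 + rng.uniform(-rel_noise, rel_noise))
                  for name, value in identified.items()}
        cousins.append(cousin)
    return cousins


# Hypothetical values, standing in for the output of system identification.
identified = {"friction": 0.5, "mass": 1.2}
for cousin in make_digital_cousins(identified):
    print(cousin)
```

Training a policy across such variants, rather than on the single identified model, is one way to hedge against identification error.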