机器之心

Making Reinforcement Learning Lightning-Fast: FlashRL Delivers Blazing Rollouts with a Single Command, Fully Open-Sourced
机器之心· 2025-08-12 09:51
Core Viewpoint
- The article presents FlashRL, an open-source reinforcement learning solution that uses quantized rollouts without sacrificing downstream performance, addressing the rollout-training mismatch through Truncated Importance Sampling (TIS) [4][16][37].

Group 1: DAPO and Rollout Challenges
- DAPO, developed by Tsinghua AIR and ByteDance, is an open-source SOTA system for large-scale LLM reinforcement learning, achieving a score of 50 on the AIME 2024 benchmark with the Qwen2.5-32B model [1].
- The research team identified rollout generation as a major bottleneck in RL training, consuming approximately 70% of total training time [3].
- Applying 8-bit quantization during rollout generation, combined with TIS, significantly accelerates the process while maintaining downstream performance [3][4].

Group 2: FlashRL Implementation
- FlashRL is the first open-source reinforcement learning implementation to apply INT8/FP8 in the rollout phase while matching BF16 downstream performance [4][15].
- TIS mitigates the rollout-training mismatch, allowing quantized-rollout training to match BF16-rollout training and even surpass naive BF16-rollout training [16][37].
- FlashRL supports online quantization and integrates with existing inference engines such as vLLM, extending them to handle models whose parameters are updated during training [22].

Group 3: Performance and Acceleration
- FlashRL's INT8 rollout delivers up to 1.7x higher throughput while retaining the benefits of reinforcement learning [23].
- In standard environments, the speedup from 8-bit quantization grows with model size, reaching up to 1.75x over BF16 for the 32B model [29].
- In memory-constrained environments, INT8 quantization can yield over 3x faster generation, highlighting its potential for larger models [34].

Group 4: Validation and Usage
- FlashRL's effectiveness was validated by training the DAPO-32B model, where INT8 rollout significantly improved training speed without compromising accuracy on the AIME benchmark [36][37].
- FlashRL can be enabled with a single command, letting users integrate it into their RL training without code modifications [41].
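The TIS correction described above can be sketched as follows. This is a minimal illustration, not FlashRL's actual API: the function names, the truncation constant, and the loss form are assumptions. The idea is that each sampled token's policy-gradient term is reweighted by the ratio of its probability under the BF16 training policy to its probability under the quantized rollout policy that generated it, with the ratio truncated from above to bound variance:

```python
import math

def tis_weights(logp_train, logp_rollout, clip_c=2.0):
    """Per-token truncated importance sampling (TIS) weights.

    logp_train:   log-probs of the sampled tokens under the BF16 training policy.
    logp_rollout: log-probs of the same tokens under the quantized (INT8/FP8)
                  rollout policy that generated them.
    The ratio pi_train / pi_rollout corrects the rollout-training mismatch;
    truncating it at clip_c keeps the gradient variance bounded.
    """
    return [min(math.exp(t - r), clip_c)
            for t, r in zip(logp_train, logp_rollout)]

def tis_pg_loss(logp_train, logp_rollout, advantages, clip_c=2.0):
    """Policy-gradient loss with TIS reweighting. In a real autodiff
    framework the weights would be detached (treated as constants)."""
    w = tis_weights(logp_train, logp_rollout, clip_c)
    terms = [-wi * ai * ti for wi, ai, ti in zip(w, advantages, logp_train)]
    return sum(terms) / len(terms)
```

A ratio above `clip_c` is truncated rather than clipped to zero, so high-mismatch tokens still contribute a bounded gradient instead of being discarded.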
From Natural Selection to Intelligent Evolution: The First Survey of Self-Evolving Agents and Its Road to ASI
机器之心· 2025-08-12 09:51
Core Insights
- The article discusses the limitations of static large language models (LLMs) and introduces self-evolving agents as a new paradigm in artificial intelligence [2].
- Researchers from Princeton University and other top institutions have published a comprehensive review establishing a unified theoretical framework for self-evolving agents, aiming to pave the way toward artificial general intelligence (AGI) and artificial superintelligence (ASI) [2][32].

Definition and Framework
- The review gives a formal definition of self-evolving agents, laying a mathematical foundation for research and discussion in the field [5].
- It builds a complete framework for analyzing and designing self-evolving agents along four dimensions: What, When, How, and Where [8].

What to Evolve?
- Four core pillars of self-improvement within the agent system are identified: models, context, tools, and architecture [11].
- Model evolution operates at two levels: optimizing decision policies and accumulating experience through interaction with the environment [13].
- Context evolution covers dynamic memory management and automated prompt optimization [13].
- Tool evolution includes creating new tools, mastering existing ones, and efficiently managing tool selection [13].
- Architecture evolution can target both single-agent and multi-agent systems to optimize workflows and collaboration [14].

When to Evolve?
- Evolution timing determines the relationship between learning and task execution, split into two modes: intra-test-time and inter-test-time self-evolution [17].
- Intra-test-time self-evolution occurs during task execution, allowing agents to adapt in real time [20].
- Inter-test-time self-evolution happens after task completion, with agents iterating on their capabilities based on accumulated experience [20].

How to Evolve?
- Evolution can be driven by various methodologies, including reward-based evolution, imitation learning, and population-based methods [21][22].

Where to Evolve?
- Self-evolving agents can evolve in general domains to enhance versatility, or specialize in domains such as coding, GUI interaction, finance, medicine, and education [25].

Evaluation and Future Directions
- The review emphasizes the need for dynamic evaluation metrics for self-evolving agents, focusing on adaptability, knowledge retention, generalization, efficiency, and safety [28].
- Future challenges include developing personalized AI agents, improving generalization and cross-domain adaptability, ensuring safety and controllability, and exploring multi-agent ecosystems [32].
Worth $2.5 Billion and a Founder of Four Companies, This Berkeley Professor Still Teaches Undergraduates
机器之心· 2025-08-12 07:34
机器之心 compilation. Editor: 泽南

Always exploring, always getting it done.

"I think he is one of the best computer scientists of our era. He is a true sage who understands not only how to combine academic research with commercial systems, but also how to build human ecosystems and technological progress across society as a whole," says William Wang, a professor at the University of California, Santa Barbara (UCSB).

In AI, many scholars have found success by moving into industry, but few have done what UC Berkeley professor Ion Stoica has: teaching while founding one successful company after another, "building a brand" and walking a virtuous-cycle path between the two worlds.

Ion Stoica is known as a Berkeley professor and ACM Fellow, and as a co-founder of companies including Databricks, Anyscale, LMArena, and Conviva. His story was recently profiled by Forbes.

In today's red-hot AI field, any comparison of large-model capability inevitably involves ChatBot Arena, a platform founded by Ion Stoica and his students. It reportedly began as a way to pit Berkeley's open-source Vicuna model against Stanford's Alpaca.

Today, ChatBot Arena hosts 400 ...
SenseTime's Wang Xiaogang: World Models Will Speed AI from Digital Space into the Physical World, and "Wuneng" Wants to Be That Bridge
机器之心· 2025-08-12 07:34
Core Viewpoint
- The article discusses the emergence of embodied intelligence and the significance of the "world model" as a core component in advancing AI toward human-like intelligence, highlighting the competitive landscape as the AI industry evolves toward embodied intelligence [1][2].

Industry Developments
- Major companies such as Google, Huawei, and ByteDance are launching embodied intelligence platforms and models, indicating rapid evolution in this field [3].
- SenseTime, leveraging its expertise in computer vision and multimodal large models, aims to empower the industry through its "Wuneng" embodied intelligence platform, which integrates years of technological accumulation [3][5].

Technical Challenges
- The industry faces challenges such as data scarcity, difficulty in large-scale production, and the need for generalization in embodied intelligence applications [5][13].
- Computer vision expertise is seen as a potential lever for improving world-model learning and the capabilities of embodied intelligence [14].

World Model Significance
- The world model is recognized as crucial for prediction and planning in autonomous systems, enabling robots to interact intelligently with their environments [12][17].
- SenseTime's "Kaigu" world model is designed to provide extensive data and enable simulation-based learning, significantly reducing data-collection costs [17][20].

Platform Features
- The "Wuneng" platform combines first-person and third-person perspectives for robot learning, improving the understanding of robot behavior [27][29].
- The platform addresses the industry's data challenges by providing synthetic data and supporting the development of various robotic applications [26][31].

Future Implications
- As embodied intelligence matures, it is expected to transform human-robot interaction and create new social networks involving robots, enhancing their roles in daily life [36][37].
- Integrating embodied intelligence into everyday environments such as homes and workplaces is expected to unlock significant value and functionality [39].
LLMs Keep Overcomplicating Simple Tasks; an Exasperated Karpathy: Some Tasks Don't Need That Much Thinking
机器之心· 2025-08-12 03:10
机器之心 report. Editor: 冷猫

With the emergence and spread of reasoning models and chain-of-thought, large models have gained the ability to "think deeply," greatly improving their generality across tasks.

With chain-of-thought, a large model can analyze a task in depth, plan and decompose it, and thus handle long-horizon, highly complex work. We can also observe the model's reasoning and analysis more directly, spot problems in execution, and adjust instructions accordingly to reach our goals more efficiently.

It is fair to say that reasoning models with "deep thinking" are what made today's AI agents, with their many assistive functions and autonomous capabilities, possible.

Now even Andrej Karpathy, a leading figure in AI, has sensed that something is off, and posted a long thread calling out this exasperating phenomenon.

Karpathy says that "LLMs are, by default, becoming more 'agentic' than my everyday needs call for, even somewhat beyond my average use case."

The clearest case is indeed coding: models now tend to reason for a long time, list and grep files across the entire codebase, run repeated web searches, over-analyze and over-think rare edge cases in code that is still under development and obviously incomplete, and often take minutes to return a result even for very simple queries.

Especially for simple tasks, such as quickly checking for an index error or other low-level mistake before running a script, ...
EIT Yongjiang Forum | A New University with a New Mission Invites You to Launch the Future Together
机器之心· 2025-08-12 03:10
Core Viewpoint
- The Eastern Institute of Technology (EIT) in Ningbo is hosting the 2025 Yongjiang Forum to attract outstanding scholars for interdisciplinary academic exchange and to strengthen its research capabilities [4][5].

Group 1: Forum Details
- The Yongjiang Forum will take place on November 8-9, 2025, and aims to foster academic collaboration among scholars from various fields [3][4].
- EIT focuses on four disciplinary clusters: science, engineering, information technology, and business management, with an emphasis on cutting-edge interdisciplinary fields [7][8].

Group 2: Recruitment and Benefits
- Applicants for academic positions must hold a Ph.D., have published in top-tier journals, and communicate well in both Chinese and English [10][11].
- EIT offers competitive salaries, research startup funding, and comprehensive benefits including housing allowances and high-end medical insurance [10][11].

Group 3: Application Process
- Interested applicants can apply by scanning a QR code or clicking a link, with a submission deadline of October 20, 2025 [14][15].
- Required application materials include a CV, research statement, teaching statement, and contact information for references [16][17].

Group 4: Institutional Overview
- EIT is a newly established research university supported by both private and public funding, focusing on fundamental research and technological innovation [20][21].
- Since its inception in 2020, the Yongjiang Forum has recruited over 40 high-level talents, contributing significantly to the university's faculty development [23].

Group 5: Research Achievements
- EIT has signed 100 academic leaders, including 16 academicians and 52 national high-level talents, with strong international experience among the faculty [25][26].
- The university has published 524 papers in top-tier journals and secured over 2.37 billion RMB in competitive research funding [26].

Group 6: Undergraduate Program
- EIT will admit its first undergraduates in 2025, offering four majors aligned with future development needs [28][29].
- The first cohort will consist of 74 students, with admission scores ranging from 656 to 691 [33][34].

Group 7: Strategic Partnerships
- EIT has established strategic partnerships with 12 international universities and 24 domestic institutions, focusing on resource sharing and collaborative research [31].
ICCV 2025 | Xiaohongshu's AIGC Team Proposes DynamicFace, a New Image and Video Face-Swapping Algorithm
机器之心· 2025-08-12 03:10
The lead authors of this paper are from Xiaohongshu's AIGC team (Dynamic-X-Lab), a research group focused on the AIGC field and dedicated to advancing pose-driven portrait generation and video animation. Centered on high-quality, highly controllable generative models, the team works on text-to-image (t2i), image-to-image (i2i), image-to-video (i2v), and accelerated style transfer, and shares complete open-source solutions with the developer and research communities.

In recent years, diffusion models have shown unprecedented generative power in image and video synthesis, putting face generation and editing on the fast track. In particular, the dream of driving a single static face with arbitrary expressions, poses, and even lighting is entering the mainstream toolbox and shows enormous potential across three major scenarios.

The core difficulty of face video generation is rigorously preserving the identity of the source reference face, given a reference image and an external motion sequence, while keeping the target face's motion consistent. Existing methods that pursue realistic dynamics typically run into three major challenges.

Xiaohongshu's DynamicFace brings video face swapping to a "film-grade" industrial pipeline!

Method Overview

Paper title: DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using ...
Both "Sherlock Holmes" and "Leeuwenhoek": Zhipu Open-Sources the Visual Reasoning Capability OpenAI Has Kept Hidden
机器之心· 2025-08-12 03:10
Core Viewpoint
- The article discusses the capabilities and applications of the open-source visual reasoning model GLM-4.5V, highlighting its advanced image recognition, reasoning abilities, and potential use cases across fields [6][11][131].

Group 1: Model Capabilities
- GLM-4.5V demonstrated strong visual reasoning by accurately identifying locations from images, outperforming 99.99% of human players in a global game [9][10].
- The model can analyze complex images and videos and provide detailed insights and summaries, indicating its potential as a GUI-agent application [10][11].
- It excels at recognizing and interpreting visual elements, even in challenging scenarios such as visual illusions and occlusions [19][20][54].

Group 2: Practical Applications
- GLM-4.5V can predict geographical locations from images and return detailed location data in JSON format [21][27].
- Its ability to read and interpret complex documents, including charts and graphs, makes it useful for users who need local processing without cloud dependency [101][109].
- It can assist with tasks such as coding, video summarization, and document analysis, making it a versatile tool for developers and researchers [58][71][128].

Group 3: Technical Specifications
- GLM-4.5V has 106 billion total parameters and supports 64K multimodal long context [127][128].
- The model employs techniques such as 2D-RoPE and 3D-RoPE for improved image and video processing [127][128].
- Training followed a three-phase strategy of pre-training, supervised fine-tuning, and reinforcement learning, yielding state-of-the-art results on various benchmarks [128][130].

Group 4: Industry Impact
- GLM-4.5V's open-source release allows greater transparency and customization, enabling developers to tailor the model to specific business needs [131][132].
- The shift from benchmark performance to real-world applications signals a growing emphasis on practical utility in AI development, with GLM-4.5V positioned as a foundational model for many industries [131][132].
- The model gives developers an opportunity to collaboratively shape the future of AI, moving beyond competition toward creating real-world value [133].
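The 2D-RoPE mentioned in the technical specifications extends standard rotary position embeddings from token indices to image-patch coordinates. A minimal sketch, assuming the common formulation that splits the feature dimension between row and column positions (GLM-4.5V's exact layout may differ):

```python
import math

def rope_1d(x, pos, base=10000.0):
    """Standard rotary position embedding (RoPE) on one feature vector:
    each pair (x[i], x[half+i]) is rotated by angle pos * base**(-i/half)."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        theta = pos * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[half + i] * s
        out[half + i] = x[i] * s + x[half + i] * c
    return out

def rope_2d(x, row, col):
    """2D-RoPE sketch for image patches: the first half of the feature
    dimension is rotated by the patch's row index, the second half by its
    column index, so attention scores depend on relative 2D offsets."""
    half = len(x) // 2
    return rope_1d(x[:half], row) + rope_1d(x[half:], col)
```

Because each step is a pure rotation, the embedding preserves vector norms, and the dot product between two rotated vectors depends only on their relative (row, col) offset; a 3D variant would add a third coordinate (e.g. video frame index) the same way.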
Lumina-mGPT 2.0: A Magnificent Revival of Autoregressive Models, Rivaling Top Diffusion Models
机器之心· 2025-08-12 00:15
The first author, Xin Yi, is a Ph.D. student at Nanjing University and 上海创智学院, currently interning at the Shanghai AI Laboratory; his research covers image/video generation and unified multimodal generation and understanding. The corresponding author is Gao Peng, a young scientist at the Shanghai AI Laboratory. The other authors are from the Shanghai AI Laboratory, The Chinese University of Hong Kong, Shanghai Jiao Tong University, 上海创智学院, and Zhejiang University of Technology.

The Shanghai AI Laboratory and collaborators present Lumina-mGPT 2.0, a stand-alone, decoder-only autoregressive model that unifies a broad range of tasks, including text-to-image generation, image-pair generation, subject-driven generation, multi-turn image editing, controllable generation, and dense prediction.

Core Techniques and Breakthroughs

A fully independent training architecture: unlike conventional approaches that rely on pretrained weights, Lumina-mGPT 2.0 uses a pure decoder-only Transformer trained entirely from scratch, starting from parameter initialization. This brings three advantages: unconstrained architectural design (2B- and 7B-parameter versions are provided), freedom from licensing restrictions (such as Chameleon's copyright issues), and less inherent bias inherited from pretrained models.

Paper title: Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
Paper link: arxiv.org/pdf/2507.17801
GitHub ...
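Decoder-only autoregressive image generation of the kind described above can be sketched as a token-by-token sampling loop over a flattened grid of discrete image tokens. This is an illustrative toy, not Lumina-mGPT 2.0's implementation: `next_token_logits` stands in for the trained Transformer, and the vocabulary and grid sizes are made-up assumptions.

```python
import random

def next_token_logits(token_ids, vocab_size):
    """Stub for the decoder-only Transformer: returns logits over the
    image-token vocabulary given the sequence generated so far.
    (Illustrative only; the real model is a trained network.)"""
    rng = random.Random(len(token_ids))  # deterministic toy behavior
    return [rng.random() for _ in range(vocab_size)]

def generate_image_tokens(prompt_ids, grid_h, grid_w, vocab_size=16):
    """Autoregressively sample an image as a flattened grid of discrete
    tokens, one token per step, conditioned on the prompt plus all
    previously generated tokens."""
    seq = list(prompt_ids)
    for _ in range(grid_h * grid_w):
        logits = next_token_logits(seq, vocab_size)
        seq.append(max(range(vocab_size), key=logits.__getitem__))  # greedy
    image_tokens = seq[len(prompt_ids):]
    # reshape the flat token stream back into a grid for a detokenizer
    return [image_tokens[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]
```

The point of the sketch is the unification claim: text-to-image, editing, and paired generation all reduce to the same next-token loop, differing only in what conditioning tokens precede the image tokens in `prompt_ids`.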
OpenAI Just Won IOI Gold, Behind Only Five Human Contestants! Its Competing Reasoning Model Recently Took IMO Gold
机器之心· 2025-08-12 00:15
机器之心 report. Editor: 杜伟

IOI 2025 (the 37th International Olympiad in Informatics) was held in Sucre, Bolivia, opening on July 27 and closing on August 3. Team China swept the event, with every member taking home a gold medal.

Not long before, OpenAI had earned a gold-medal-level score at IMO 2025 (the International Mathematical Olympiad).

Overnight, OpenAI's large model has pulled off another feat: in IOI 2025, one of the world's top programming competitions, OpenAI's reasoning model scored high enough for a gold medal and ranked first among AI entrants.

OpenAI competed in IOI's online AI track. Among 330 human contestants, its reasoning model's score trailed only 5 humans, taking the No. 1 spot among AI entrants.

Like the human contestants, OpenAI observed the 5-hour time limit and the 50-submission cap. Likewise, it used no internet access or RAG (retrieval-augmented generation), with access only to a basic terminal tool.

[Scoreboard excerpt: Rank, First Name, Last Name, ID, Team, ...]