Making Reinforcement Learning Lightning-Fast: FlashRL Delivers Ultra-Fast Rollouts with a Single Command, Now Fully Open-Sourced
机器之心· 2025-08-12 09:51
Core Viewpoint
- The article discusses the development and implementation of FlashRL, an open-source reinforcement learning solution that uses quantized rollouts without sacrificing downstream performance, addressing the rollout-training mismatch through Truncated Importance Sampling (TIS) [4][16][37].

Group 1: DAPO and Rollout Challenges
- DAPO, developed by Tsinghua AIR and ByteDance, is an open-source SOTA system for large-scale LLM reinforcement learning, achieving a score of 50 on the AIME 2024 benchmark with the Qwen2.5-32B model [1].
- The research team identified rollout generation as a major bottleneck in reinforcement learning training, consuming approximately 70% of total training time [3].
- Applying 8-bit quantization during rollout generation, combined with TIS, significantly accelerates the process while maintaining downstream performance [3][4].

Group 2: FlashRL Implementation
- FlashRL is the first open-source reinforcement learning implementation to apply INT8/FP8 quantization in the rollout phase while matching BF16 downstream performance [4][15].
- TIS mitigates the rollout-training mismatch, allowing quantized rollout training to match BF16 rollout training, and even to surpass naive BF16 rollout training [16][37].
- FlashRL supports online quantization and integrates with existing inference engines such as vLLM, extending them to handle models whose parameters are updated during training [22].

Group 3: Performance and Acceleration
- FlashRL's INT8 rollout can deliver up to 1.7x throughput improvement while retaining the benefits of reinforcement learning [23].
- In standard environments, the speedup from 8-bit quantization is more pronounced for larger models, reaching up to 1.75x over BF16 for the 32B model [29].
- In memory-constrained environments, INT8 quantization can yield over 3x faster generation, highlighting its potential for larger models [34].

Group 4: Validation and Usage
- The effectiveness of FlashRL was validated by training the DAPO-32B model, demonstrating that INT8 rollout significantly improves training speed without compromising AIME benchmark accuracy [36][37].
- FlashRL can be enabled with a single command, letting users integrate it into their RL training without code modifications [41].
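The core idea behind TIS is standard truncated importance sampling: because the quantized rollout policy differs slightly from the BF16 training policy, each token's gradient contribution is reweighted by the likelihood ratio between the two, truncated at a constant to bound variance. The sketch below is a minimal, generic illustration of that correction; the function names and the clipping constant are illustrative, not FlashRL's actual API.

```python
import math

def tis_weight(logp_train, logp_rollout, clip_c=2.0):
    """Truncated importance weight: min(C, pi_train / pi_rollout),
    computed from per-token log-probabilities."""
    return min(clip_c, math.exp(logp_train - logp_rollout))

def tis_loss(logps_train, logps_rollout, advantages, clip_c=2.0):
    """Policy-gradient surrogate with the TIS correction, averaged over
    tokens: each term is -min(C, ratio) * advantage * log pi_train."""
    terms = [
        -tis_weight(lt, lr, clip_c) * a * lt
        for lt, lr, a in zip(logps_train, logps_rollout, advantages)
    ]
    return sum(terms) / len(terms)
```

When the rollout and training policies agree, the weight is 1 and the loss reduces to the usual policy gradient; when the quantized rollout policy under-weights a token, the ratio (capped at `clip_c`) compensates.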
From Survival of the Fittest to Intelligent Evolution: The First Survey on Self-Evolving Agents and the Road to ASI
机器之心· 2025-08-12 09:51
Core Insights
- The article discusses the limitations of static large language models (LLMs) and introduces self-evolving agents as a new paradigm in artificial intelligence [2].
- A comprehensive review by researchers from Princeton University and other top institutions establishes a unified theoretical framework for self-evolving agents, aiming to pave the way toward artificial general intelligence (AGI) and artificial superintelligence (ASI) [2][32].

Definition and Framework
- The review gives a formal definition of self-evolving agents, laying a mathematical foundation for research and discussion in the field [5].
- It constructs a complete framework for analyzing and designing self-evolving agents along four dimensions: What, When, How, and Where [8].

What to Evolve?
- Four core pillars of self-improvement within the agent system are identified: models, context, tools, and architecture [11].
- Models can evolve at two levels: optimizing decision policies and accumulating experience through interaction with the environment [13].
- Context evolution involves dynamic memory management and automated prompt optimization [13].
- Tool evolution covers creating new tools, mastering existing ones, and efficiently managing tool selection [13].
- Architecture evolution can target both single-agent and multi-agent systems to optimize workflows and collaboration [14].

When to Evolve?
- Evolution timing determines the relationship between learning and task execution, categorized into two main modes: intra-test-time and inter-test-time self-evolution [17].
- Intra-test-time self-evolution occurs during task execution, allowing agents to adapt in real time [20].
- Inter-test-time self-evolution happens after task completion, with agents iterating on their capabilities based on accumulated experience [20].

How to Evolve?
- Evolution can be driven by various methodologies, including reward-based evolution, imitation learning, and population-based methods [21][22].

Where to Evolve?
- Self-evolving agents can evolve in general domains to enhance versatility, or specialize in domains such as coding, GUI interaction, finance, medicine, and education [25].

Evaluation and Future Directions
- The review emphasizes the need for dynamic evaluation metrics for self-evolving agents, focusing on adaptability, knowledge retention, generalization, efficiency, and safety [28].
- Future challenges include developing personalized AI agents, improving generalization and cross-domain adaptability, ensuring safety and controllability, and exploring multi-agent ecosystems [32].
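Of the methodologies the survey names, population-based evolution is the easiest to make concrete: score a population of candidate agent configurations with a reward function, keep the best, and mutate them into the next generation. The sketch below is a generic toy illustration of that loop, not an algorithm from the survey; the "configuration" here is just a number.

```python
import random

def evolve_population(candidates, reward_fn, mutate_fn, generations=5, keep=2):
    """Minimal population-based evolution loop: rank candidates by reward,
    keep the top `keep` as survivors, and refill the population by mutating
    randomly chosen survivors. Population size stays constant."""
    population = list(candidates)
    for _ in range(generations):
        scored = sorted(population, key=reward_fn, reverse=True)
        survivors = scored[:keep]
        population = survivors + [
            mutate_fn(random.choice(survivors))
            for _ in range(len(population) - keep)
        ]
    return max(population, key=reward_fn)

# Toy usage: the reward peaks at 10; mutation jitters the value.
random.seed(0)
best = evolve_population(
    candidates=[0.0, 1.0, 2.0, 3.0],
    reward_fn=lambda x: -abs(x - 10.0),
    mutate_fn=lambda x: x + random.uniform(-1.0, 1.0),
    generations=50,
)
```

Because survivors always include the current best, the best reward is monotonically non-decreasing across generations; in a real self-evolving agent the "candidate" would be a prompt, toolset, or workflow rather than a scalar.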
A $2.5 Billion Net Worth and Four Companies Founded, Yet This Berkeley Professor Still Teaches Undergraduates
机器之心· 2025-08-12 07:34
Core Insights
- The article highlights the achievements and contributions of Ion Stoica, a prominent computer science professor at UC Berkeley who has successfully bridged academic research and commercial ventures in the AI field [2][4][6].

Company and Industry Summary
- Ion Stoica is a co-founder of several successful startups, including Databricks, Anyscale, and Conviva, with Databricks currently valued at $62 billion and projected to reach $3.7 billion in annual revenue by July [10][15].
- ChatBot Arena, now known as LMArena, was established by Stoica and his students to compare AI models, hosting over 400 models and receiving more than 3.5 million user votes [4][6].
- Conviva, founded in 2006, focuses on streaming video quality and was valued at $300 million as of its last funding round in 2017 [9][10].
- Anyscale, co-founded by Stoica, has raised $260 million and is valued at $1.4 billion, providing a platform for developers to scale AI applications [15][16].
- Stoica's research has been supported by major tech companies such as Microsoft, NVIDIA, Google, and IBM, showcasing strong industry-academic collaboration [6][14].
- Open-source projects are central to Stoica's work, with both Databricks' Spark and Anyscale's Ray being open-source initiatives [16][19].
- Stoica's influence extends to his students: over 80 have benefited from his mentorship, many of whom now work in academia or at startups, including Databricks [20][22].
SenseTime's Wang Xiaogang: World Models Will Accelerate AI's Move from Digital Space into the Physical World, and "Wuneng" Aims to Be That Bridge
机器之心· 2025-08-12 07:34
Core Viewpoint
- The article discusses the emergence of embodied intelligence and the significance of the "world model" as a core component in advancing AI toward human-like intelligence, highlighting the competitive landscape as the AI industry evolves toward embodied intelligence [1][2].

Industry Developments
- Major companies such as Google, Huawei, and ByteDance are launching embodied intelligence platforms and models, indicating rapid evolution in this field [3].
- SenseTime, leveraging its expertise in computer vision and multi-modal large models, aims to empower the industry through its "Wuneng" embodied intelligence platform, which integrates years of technological accumulation [3][5].

Technical Challenges
- The industry faces challenges such as data scarcity, difficulty in large-scale production, and the need for generalization in embodied intelligence applications [5][13].
- Computer vision expertise is seen as a potential lever for improving world-model learning and the capabilities of embodied intelligence [14].

World Model Significance
- The world model is recognized as a crucial element for prediction and planning in autonomous systems, enabling robots to interact intelligently with their environments [12][17].
- SenseTime's "Kaigu" world model is designed to provide extensive data and enable simulation-based learning, significantly reducing data-collection costs [17][20].

Platform Features
- The "Wuneng" platform combines first-person and third-person perspectives for robot learning, improving the understanding of robot behavior [27][29].
- The platform aims to address the industry's data challenges by providing synthetic data and facilitating the development of various robotic applications [26][31].

Future Implications
- As embodied intelligence matures, it is expected to transform human-robot interaction and create new social networks involving robots, expanding their roles in daily life [36][37].
- Integrating embodied intelligence into everyday environments such as homes and workplaces is expected to unlock significant value and functionality [39].
LLMs Keep Overcomplicating Simple Tasks; Karpathy Is Exasperated: Some Tasks Don't Need That Much Thinking
机器之心· 2025-08-12 03:10
机器之心 report. Editor: 冷猫

With the emergence and spread of reasoning models and chain-of-thought, large models have gained a "deep thinking" capability, greatly improving their generality across tasks.

With chain-of-thought, a large model can analyze a task in depth, plan and decompose it, and thus handle long-horizon, highly complex work. At the same time, we get a more direct view of the model's reasoning and analysis, can spot problems in its execution, and can adjust our instructions accordingly to reach the goal more efficiently.

It is fair to say that reasoning models with "deep thinking" are what made possible today's AI agents, with their many assistive features and autonomous capabilities.

Sure enough, AI heavyweight Andrej Karpathy has also sensed that something is off, posting a long thread to call out this exasperating phenomenon.

Karpathy says: "LLMs in their default state are becoming more 'agentic' than my everyday needs call for, even beyond my average use case."

The clearest case is coding: models now tend to reason for a long time, list and grep files across the entire codebase, run repeated web searches, over-analyze and over-think rare edge cases in code that is under development and obviously incomplete, and often take minutes to return a result even for very simple queries.

Especially for simple tasks, such as a quick check for index errors or other low-level mistakes before running a script, …
Eastern Institute of Technology · Yongjiang Forum | A New University, A New Mission: Join Us in Shaping the Future
机器之心· 2025-08-12 03:10
Core Viewpoint
- The Eastern Institute of Technology (EIT) in Ningbo is hosting the 2025 Yongjiang Forum to attract outstanding scholars for interdisciplinary academic exchange and to strengthen its research capabilities [4][5].

Group 1: Forum Details
- The Yongjiang Forum will take place on November 8-9, 2025, and aims to foster academic collaboration among scholars from various fields [3][4].
- EIT focuses on four major disciplinary clusters: science, engineering, information technology, and business management, with an emphasis on cutting-edge interdisciplinary fields [7][8].

Group 2: Recruitment and Benefits
- Applicants for academic positions must hold a Ph.D., have published in top-tier journals, and possess strong communication skills in both Chinese and English [10][11].
- EIT offers competitive salaries, research startup funding, and comprehensive benefits including housing allowances and high-end medical insurance [10][11].

Group 3: Application Process
- Interested applicants can apply by scanning a QR code or clicking a link, with a submission deadline of October 20, 2025 [14][15].
- Required application materials include a CV, research statement, teaching statement, and contact information for references [16][17].

Group 4: Institutional Overview
- EIT is a newly established research university supported by both private and public funding, focusing on fundamental research and technological innovation [20][21].
- Since its inception in 2020, the Yongjiang Forum has recruited over 40 high-level talents, contributing significantly to the university's faculty development [23].

Group 5: Research Achievements
- EIT has signed 100 academic leaders, including 16 academicians and 52 national-level high-level talents, with a strong emphasis on international experience among faculty [25][26].
- The university has published 524 papers in top-tier journals and secured over 2.37 billion RMB in competitive research funding [26].

Group 6: Undergraduate Program
- EIT will admit its first undergraduates in 2025, offering four majors aligned with future development needs [28][29].
- The first cohort will consist of 74 students, with admission scores ranging from 656 to 691 [33][34].

Group 7: Strategic Partnerships
- EIT has established strategic partnerships with 12 international universities and 24 domestic institutions, focusing on resource sharing and collaborative research [31].
ICCV 2025 | Xiaohongshu's AIGC Team Proposes DynamicFace, a New Face-Swapping Algorithm for Images and Videos
机器之心· 2025-08-12 03:10
Core Viewpoint
- The article presents DynamicFace, an innovative method for high-quality, consistent face swapping in images and videos that leverages diffusion models and composable 3D facial priors to enhance identity and motion consistency in generated content [6][7][21].

Group 1: Technology Overview
- DynamicFace integrates diffusion models with composable 3D facial priors to achieve high-quality face swapping, addressing the challenge of maintaining identity and motion consistency [7][9].
- The method explicitly decouples facial conditions into five independent representations: identity, pose, expression, lighting, and background, enhancing the accuracy of generated images and videos [9][10].
- A dual-stream injection mechanism ensures high-fidelity identity retention, using a Face Former for global identity consistency and a ReferenceNet for fine-grained texture transfer [10][11].

Group 2: Industry Applications
- In film, directors can use a single still image of an actor to create real-time digital doubles for complex expressions and lighting adjustments, reducing the need for costly reshoots [6].
- In gaming, players can upload selfies to generate customizable 3D avatars with realistic expressions and movements, enabling personalized character creation [6].
- In social media and e-commerce, creators can produce varied promotional videos from a single brand image, while virtual influencers can maintain a consistent appearance during live streams [6].

Group 3: Performance Comparison
- DynamicFace outperforms existing state-of-the-art methods in both identity and motion consistency, achieving a 99.20% ID retrieval rate and significantly lower pose and expression discrepancies than competitors [23][24].
- Quantitative experiments on the FaceForensics++ and FFHQ datasets demonstrate DynamicFace's superior performance in maintaining high-quality facial generation while ensuring motion accuracy [24].

Group 4: Future Implications
- The article suggests that DynamicFace's fine-grained decoupling approach could inspire future work in controllable generation, potentially advancing various applications in digital content creation [28].
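The decoupling described above is what makes the swap controllable: identity comes from the source face while pose, expression, lighting, and background come from the target. The toy sketch below only illustrates that recomposition logic with placeholder embeddings; the class and function names are hypothetical and are not DynamicFace's actual interfaces.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FaceConditions:
    """Hypothetical container mirroring the five decoupled facial
    conditions (illustrative only, not the paper's API)."""
    identity: List[float]
    pose: List[float]
    expression: List[float]
    lighting: List[float]
    background: List[float]

def compose_conditions(c: FaceConditions, weights=None) -> List[float]:
    """Concatenate the independent condition embeddings into one
    conditioning vector, optionally scaling individual factors."""
    weights = weights or {}
    parts = []
    for name in ("identity", "pose", "expression", "lighting", "background"):
        w = weights.get(name, 1.0)
        parts.extend(w * x for x in getattr(c, name))
    return parts

# Face swap: identity from the source face, everything else from the target.
src = FaceConditions([0.9, 0.1], [0.2], [0.3], [0.5], [0.0])
tgt = FaceConditions([0.1, 0.8], [0.4], [0.7], [0.6], [1.0])
swapped = FaceConditions(src.identity, tgt.pose, tgt.expression,
                         tgt.lighting, tgt.background)
cond = compose_conditions(swapped)
```

Because each factor is an independent slot, any one of them can be replaced or reweighted without touching the others, which is the property the decoupled design is meant to provide.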
Part "Sherlock Holmes," Part "Leeuwenhoek": Zhipu Open-Sources the Visual Reasoning Capabilities OpenAI Has Kept Under Wraps
机器之心· 2025-08-12 03:10
Core Viewpoint
- The article discusses the capabilities and applications of GLM-4.5V, an open-source visual reasoning model, highlighting its advanced image recognition, reasoning abilities, and potential use cases across fields [6][11][131].

Group 1: Model Capabilities
- GLM-4.5V demonstrated strong visual reasoning by accurately identifying locations from images, outperforming 99.99% of human players in a global game [9][10].
- The model can analyze complex images and videos, providing detailed insights and summaries, which indicates its potential as a GUI-agent application [10][11].
- It excels at recognizing and interpreting visual elements even in challenging scenarios such as visual illusions and occlusions [19][20][54].

Group 2: Practical Applications
- GLM-4.5V can accurately predict geographical locations from images, returning detailed location data in JSON format [21][27].
- Its ability to read and interpret complex documents, including charts and graphs, makes it useful for users who need local processing without cloud dependency [101][109].
- It can assist with tasks such as coding, video summarization, and document analysis, making it a versatile tool for developers and researchers [58][71][128].

Group 3: Technical Specifications
- GLM-4.5V has 106 billion total parameters and supports 64K multi-modal long contexts, enhancing its processing capabilities [127][128].
- The model employs techniques such as 2D-RoPE and 3D-RoPE for improved image and video processing, showcasing its technical sophistication [127][128].
- Training followed a three-phase strategy of pre-training, supervised fine-tuning, and reinforcement learning, contributing to state-of-the-art results on various benchmarks [128][130].

Group 4: Industry Impact
- GLM-4.5V's open-source nature allows for greater transparency and customization, enabling developers to tailor the model to specific business needs [131][132].
- The shift from performance benchmarks to real-world applications signals a growing emphasis on practical utility in AI development, with GLM-4.5V positioned as a foundational model for various industries [131][132].
- The model gives developers an opportunity to collaboratively shape the future of AI, moving beyond competition toward creating real-world value [133].
Lumina-mGPT 2.0: A Grand Revival of Autoregressive Models, Rivaling Top Diffusion Models
机器之心· 2025-08-12 00:15
Core Viewpoint
- Lumina-mGPT 2.0 is an innovative stand-alone autoregressive image model that unifies tasks such as text-to-image generation, subject-driven generation, and controllable generation, marking significant advances in image generation technology [5][9][21].

Group 1: Core Technology and Breakthroughs
- Lumina-mGPT 2.0 uses a fully independent training architecture built on a pure decoder-only Transformer, offered in two parameter sizes (2 billion and 7 billion), avoiding biases inherited from pre-trained models [4][5].
- The model adopts the high-quality image tokenizer SBER-MoVQGAN, selected for its optimal reconstruction quality on the MS-COCO dataset [7].
- A unified multi-task processing framework seamlessly supports tasks including text-to-image generation and image editing [9].

Group 2: Efficient Inference Strategies
- Two optimizations improve generation speed while maintaining quality: quantizing the model to 4-bit integers and a sampling method that reduces GPU memory consumption by 60% [11][13].
- The optimizations enable parallel decoding, significantly accelerating the generation process [13].

Group 3: Experimental Results
- On text-to-image benchmarks, Lumina-mGPT 2.0 achieved a GenEval score of 0.80, placing it among the top generative models and excelling in the "two objects" and "color attributes" tests [14][15].
- The model showed superior performance on the Graph200K multi-task benchmark, confirming the feasibility of a pure autoregressive model for multi-modal generation tasks [17].

Group 4: Future Directions
- Despite the optimizations, sampling time remains a challenge that affects user experience, indicating a need for further enhancement [21].
- The focus will expand from multi-modal generation to multi-modal understanding, aiming to improve overall functionality and performance [21].
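The article names 4-bit integer quantization but does not detail the scheme. As a rough illustration of the general idea only (not Lumina-mGPT 2.0's actual implementation), symmetric round-to-nearest INT4 weight quantization maps each weight to an integer code in [-8, 7] plus a shared scale:

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest 4-bit quantization of a list of floats.
    Returns integer codes in [-8, 7] and a per-tensor scale factor."""
    amax = max(abs(w) for w in weights)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int4(codes, scale):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale for c in codes]

w = [0.35, -0.7, 0.1, 0.02]
codes, scale = quantize_int4(w)
w_hat = dequantize_int4(codes, scale)  # approximate reconstruction of w
```

Each weight then occupies 4 bits instead of 16 (BF16), which is where the memory savings that enable faster, more parallel decoding come from; production implementations typically quantize per-channel or per-group rather than per-tensor.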
Breaking: OpenAI Takes IOI Gold, Second Only to the Top Five Human Contestants! The Competing Reasoning Model Had Just Won IMO Gold
机器之心· 2025-08-12 00:15
Core Insights
- OpenAI's reasoning model achieved a gold-medal score at the 2025 International Olympiad in Informatics (IOI), ranking first among AI participants [1][5][9].
- The model's performance marked a significant improvement over the previous year, rising from the 49th percentile to the 98th percentile [9].
- OpenAI used a general reasoning model with no IOI-specific training, demonstrating the strength of its general reasoning capabilities [14][15].

Group 1
- The 2025 IOI took place in Sucre, Bolivia, from July 27 to August 3, with all members of the Chinese team winning gold medals [1].
- OpenAI's model scored just behind five human competitors out of 330 participants, adhering to the same constraints as human contestants [5][6].
- The model did not use the internet or retrieval-augmented generation (RAG), relying solely on a basic terminal tool [6].

Group 2
- OpenAI's recent competition results, including AtCoder and the IMO, showcase the advances made through new research methods [9].
- The model used for the IOI was the same one that won gold at the IMO, indicating its versatility across competitive domains [14].
- The strategy involved sampling answers from various models and using heuristic methods to select submissions, leading to a top-six finish overall [14].

Group 3
- OpenAI co-founder Greg Brockman praised the model's "gold medal-level performance" at the IOI [13].
- The model's success without specialized training has sparked discussion about its capabilities and potential future applications [15][17].
- There is anticipation for a public version of the model that could leverage the techniques used in these competitions [17].
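The submission strategy described above (sample many candidate solutions, then pick a few by heuristic) can be sketched generically. The scoring heuristic below is a made-up stand-in, since OpenAI has not published its selection criteria:

```python
def select_submissions(candidates, score_fn, limit=3):
    """Best-of-n selection: rank sampled candidate solutions by a
    heuristic score and keep at most `limit` of them for submission."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:limit]

# Toy usage: candidates are (solution_id, public_tests_passed, runtime_ms);
# the heuristic prefers more tests passed, breaking ties by lower runtime.
candidates = [
    ("A", 10, 900),
    ("B", 12, 1500),
    ("C", 12, 400),
    ("D", 7, 100),
]
chosen = select_submissions(
    candidates,
    score_fn=lambda c: (c[1], -c[2]),
    limit=2,
)
# chosen → [("C", 12, 400), ("B", 12, 1500)]
```

Under a contest's limited-submission rules, the quality of this heuristic ranking, rather than raw sampling volume, determines how much of the sampled pool's potential actually reaches the scoreboard.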