Costs slashed by 88%! Tongyi Lab and Peking University release ZeroSearch, activating LLMs' retrieval capability without any search
机器之心· 2025-05-29 04:53
The authors are from Tongyi Lab and Peking University. First author Sun Hao is a Ph.D. student at Peking University's School of Intelligence, advised by Professor Zhang Yan; his research focuses on RAG and agents, and he has published multiple papers at top international venues including NeurIPS, ACL, and EMNLP. The work was completed during an internship with the RAG team at Alibaba's Tongyi Lab.

Information retrieval is crucial to the reasoning performance of large language models (LLMs). Recent work has introduced reinforcement learning (RL) frameworks to activate LLMs' ability to actively gather information, but existing methods face two core challenges during training.

Method: a reinforcement-learning framework that needs no search

To address these problems, we propose the ZeroSearch framework: instead of real search, a large language model directly simulates the search engine, combined with a curriculum learning strategy. It cuts costs by 88% while outperforming methods that rely on real search engines across multiple tasks.

Conventional training requires frequent interaction with a real search engine during the rollout phase, incurring heavy API costs. Yet LLMs accumulate rich world knowledge during pretraining and can return relevant information for a query. ZeroSearch therefore introduces an LLM as a simulated search engine (Simulation LLM): without any real search, it generates retrieval documents for the policy model, dramatically lowering training cost.
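The core idea — replace API search calls with LLM-generated documents, made gradually harder by a curriculum — can be sketched in a few lines. This is a minimal illustration, assuming a generic `generate` wrapper around any instruction-tuned LLM; the prompt templates and the linear noise schedule are our assumptions, not the paper's exact design.

```python
# A minimal sketch of ZeroSearch-style simulated retrieval. `generate` is a
# hypothetical LLM wrapper (prompt -> text); prompts and the noise curriculum
# are illustrative assumptions, not the paper's exact recipe.
import random
from typing import Callable, List

def simulated_search(
    generate: Callable[[str], str],
    query: str,
    n_docs: int = 5,
    noise_ratio: float = 0.0,  # curriculum knob: fraction of unhelpful docs
) -> List[str]:
    """Generate n_docs pseudo-retrieval documents for `query` with an LLM,
    mixing in deliberately unhelpful documents so the policy model learns
    to filter noise as training progresses."""
    docs = []
    for _ in range(n_docs):
        if random.random() < noise_ratio:
            prompt = (f"Write a short passage that looks relevant to the query "
                      f"but does NOT actually answer it.\nQuery: {query}\nPassage:")
        else:
            prompt = (f"Write a short, factual passage that helps answer the "
                      f"query.\nQuery: {query}\nPassage:")
        docs.append(generate(prompt))
    return docs

def noise_schedule(step: int, total_steps: int, max_noise: float = 0.8) -> float:
    """One simple curriculum consistent with the article's description:
    ramp document noise from easy (clean) to hard (mostly noisy)."""
    return max_noise * min(1.0, step / total_steps)
```

Because every "search" is just a forward pass of the simulation LLM, the rollout phase never touches a paid API, which is where the reported 88% cost reduction comes from.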
Just now, AI scientist Zochi "earned its PhD" at ACL; beta test goes live today
机器之心· 2025-05-29 04:53
Core Viewpoint
- The article highlights the achievement of Intology's AI scientist Zochi, which has become the first AI system to independently pass peer review at a top-tier scientific conference, specifically the ACL main conference, indicating a significant milestone in AI research capabilities [1][3][5].

Group 1: AI Research Achievements
- Zochi's paper, "Tempest: Automatic Multi-Turn Jailbreaking of Large Language Models with Tree Search," has been accepted at ACL 2025, showcasing its ability to conduct independent scientific research [8][11].
- The acceptance rate for main-conference papers at top-tier venues like ACL is around 20%, making Zochi's achievement particularly noteworthy [3].
- Zochi's multi-turn attack methodology achieved a 100% success rate on GPT-3.5-turbo and a 97% success rate on GPT-4, indicating the effectiveness of its approach [11].

Group 2: Methodology and Innovation
- The research used a tree search method to autonomously explore multiple adversarial prompt branches, integrating cross-branch learning and partial compliance tracking [9].
- Zochi's approach to scientific discovery involved minimal human intervention, limited mainly to formatting and figure creation, while it independently defined research directions and conducted experiments [8][9].
- The system's innovative method, CS-ReFT, achieved a 93.94% success rate in model adaptation using only 0.0098% of parameters, surpassing GPT-3.5-Turbo [21].

Group 3: Industry Impact and Criticism
- The acceptance of Zochi's work has sparked discussions in the AI academic community about the implications of AI-generated research and the integrity of the peer review process [16][17].
- Intology faced criticism for its practices: other teams such as Sakana had previously informed conference organizers about their AI-generated submissions, raising concerns about transparency [16][17].
- Zochi's continuous output of high-quality research papers, with scores significantly above the average for AI-generated submissions, underscores its ability to tackle complex scientific challenges [23].
So Veo 3 had a precursor! Renmin University of China and ZhiDeMai Technology propose a new "image-to-sounding-video" generation framework at CVPR 2025
机器之心· 2025-05-29 03:04
Core Viewpoint
- The article discusses the innovative framework JointDiT, which enables the generation of synchronized audio and video content from static images, marking a significant advancement in AI multimodal generation [1][5][28].

Group 1: Introduction to JointDiT
- JointDiT is a collaborative effort between Renmin University of China and the ZhiDeMai Technology AI team, focusing on multimodal understanding, generation, and interaction [1].
- The framework transforms static images into dynamic videos with corresponding sounds, achieving high-quality joint generation of video and audio [1][6].

Group 2: Significance of Image-to-Sounding-Video (I2SV)
- Generating synchronized audio and video from images (I2SV) is framed as a new frontier in AI multimodal generation, addressing the need for cohesive sensory experiences [6][12].
- Traditional models have struggled to integrate visual and auditory elements effectively, often resulting in semantic misalignment and timing issues [8][10].

Group 3: Technical Innovations of JointDiT
- JointDiT employs a novel architecture that decomposes and reorganizes pre-trained audio and video models into a unified generation framework [13].
- A Perceiver Joint Attention mechanism enhances cross-modal interaction, improving synchronization and semantic consistency [15].
- JointCFG, a joint classifier-free guidance mechanism, ensures deep collaboration between audio and video, raising overall generation quality [17].

Group 4: Experimental Results
- JointDiT delivers significant improvements in video quality and audio naturalness, outperforming traditional pipeline methods on key metrics such as FVD and FAD [21].
- In subjective user evaluations, JointDiT ranked first across video quality, audio quality, and overall effect, surpassing competitors by nearly 20% [21].

Group 5: Practical Applications and Future Directions
- The advances have implications for entertainment content creation and film production, as well as for the development of more generalized multimodal models [28].
- Future research aims to extend JointDiT to image, text, audio, and video modalities, paving the way for more intelligent multimodal generation systems [28][29].
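The digest does not give JointCFG's exact formulation, but one plausible minimal reading is standard classifier-free guidance applied jointly to the paired audio and video latents with a shared scale. The sketch below illustrates that reading; `model`, the latent shapes, and the single guidance scale are assumptions, not JointDiT's actual design.

```python
# A minimal sketch of classifier-free guidance applied jointly to audio and
# video latents, in the spirit of the JointCFG described above. `model` is a
# hypothetical denoiser returning (video_noise_pred, audio_noise_pred).
import torch

def joint_cfg_step(model, v_lat, a_lat, cond, t, scale: float = 5.0):
    """One guided denoising step over paired video/audio latents."""
    # Conditional and unconditional passes share the same timestep t.
    v_c, a_c = model(v_lat, a_lat, cond=cond, t=t)
    v_u, a_u = model(v_lat, a_lat, cond=None, t=t)
    # Guide both modalities with one shared scale so neither drifts ahead
    # of the other, keeping audio-video synchronization intact.
    v_guided = v_u + scale * (v_c - v_u)
    a_guided = a_u + scale * (a_c - a_u)
    return v_guided, a_guided
```

Sharing one scale across both modalities is the design choice that makes the guidance "joint": guiding each stream independently could strengthen one modality's conditioning faster than the other's and break synchronization.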
DeepSeek-R1's "minor update" today shakes up the large-model landscape; netizens: release R2 soon
机器之心· 2025-05-29 03:04
Machine Heart report. Editors: Zenan, Panda

Beyond everyone's expectations, and after long anticipation, DeepSeek's reasoning model has finally been updated: last night DeepSeek officially announced that its R1 reasoning model had been upgraded to the latest version (0528), and in the early hours of this morning it released the model and weights.

HuggingFace link: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528

The model files were uploaded at 1 a.m.; one wonders whether DeepSeek's engineers were working overtime to the very last minute. Some netizens also noted that shipping a new model right before the Dragon Boat Festival holiday, once again, is more dependable than the holiday notice itself.

The upgraded R1 has a massive 685 billion parameters. Although it has been open-sourced, most people can only look on: without distillation, the "full-strength" version certainly cannot run locally on consumer hardware. Even so, the say-nothing, just-drop-the-link style was widely welcomed by netizens.

According to DeepSeek's limited-circulation notice, the updated R1 is released under the MIT license, meaning it can be used commercially. Judging by the version number this is a "minor" upgrade, but extensive hands-on testing shows a clearly noticeable performance improvement. The new DeepSeek-R1 configuration file also reveals more, though unsurprising, information, including ...
Meet us in Tennessee, USA: registration is open for the CVPR 2025 dinner gathering!
机器之心· 2025-05-28 10:00
As a top conference in AI, CVPR carries enormous weight: this year it received 13,008 submissions and accepted 2,878 papers, an overall acceptance rate of 22.1%. CVPR is not only an arena for cutting-edge research but also an excellent platform for global AI talent to connect.

With 2025 nearly half over, the AI field keeps iterating at an astonishing pace, constantly redrawing the boundaries of what we imagine intelligence can do. New research and applications keep emerging. The recent Google I/O event unveiled a full product lineup spanning large models, programming tools, and video and image generation models; right afterward, the prominent AI startup Anthropic released the Claude 4 family of models with a major upgrade in coding ability. All of this happened within a single week. One cannot help but marvel at how fast the field is moving. Given this pace, how do we keep a sharp eye on technology trends? Besides searching online, attending top conferences is an excellent way to learn.

On this occasion, Machine Heart, together with the Shanghai AI Laboratory, Dongfang Jinghui, and the Global University AI Academic Alliance, is hosting a dinner gathering. We warmly invite you to the "Yunfan · CVPR 2025 AI Talent Meetup" to catch up with old friends, meet new ones, and chat about recent hot topics and research directions.

Venue: near the Music City Center, Nashville, Tennessee, USA ...
92.7% accuracy approaching Claude 3.5 at 86% lower cost: LocAgent, a new open-source code-localization tool, is here
机器之心· 2025-05-28 10:00
Another study for programmers to celebrate! A research team from OpenHands, Yale, USC, and Stanford has just released LocAgent, a graph-indexed LLM agent framework built specifically for code localization, pushing localization accuracy to a new high of 92.7%. The work has been accepted to ACL 2025.

Paper title: LocAgent: Graph-Guided LLM Agents for Code Localization

1. The pain point is real: just how hard is code localization?

Every programmer knows the experience: you read a bug report and stare at it thinking, "where on earth do I change this?" Traditional approaches either rely on keyword matching (too crude), dump the entire codebase into an LLM (too inefficient), or let an agent blindly traverse directories (too clumsy). The core problem is that several layers of call relationships usually separate a natural-language issue description from the code that actually needs fixing. For example, a user reports an "XSS vulnerability ...

2. LocAgent: giving the LLM a "code map"

Figure 2: Red nodes are functions explicitly mentioned in the issue description; yellow nodes are the functions that actually need to be modified (patched). Task difficulty is defined as the shortest-path length (minimum hop count) in the code graph from a mentioned function to the target patched function; in the example shown, the task difficulty is 2 hops.
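The task-difficulty definition in Figure 2 is easy to make concrete: it is a shortest-path computation over the code graph. Here is a minimal sketch, assuming a toy graph with hypothetical function names; LocAgent's real index also covers files, classes, and imports.

```python
# A minimal sketch of the Figure 2 task-difficulty measure: hop count over a
# code graph from issue-mentioned functions to the functions actually patched.
# The toy graph and function names are hypothetical.
import networkx as nx

# Directed edges approximate "calls / references" relations between functions.
code_graph = nx.DiGraph([
    ("render_comment", "sanitize_html"),
    ("sanitize_html", "escape_tags"),
    ("render_comment", "format_markdown"),
])

def task_difficulty(graph, mentioned, patched):
    """Minimum hops from any mentioned function to any patched function."""
    g = graph.to_undirected()  # a fix may sit up- or downstream of the mention
    return min(
        nx.shortest_path_length(g, source=m, target=p)
        for m in mentioned for p in patched
        if nx.has_path(g, m, p)
    )

# An issue mentions render_comment; the actual fix lands in escape_tags.
print(task_difficulty(code_graph, {"render_comment"}, {"escape_tags"}))  # 2
```

The harder the task by this measure, the more hops of call structure the agent must reason across, which is exactly where a graph index beats keyword matching or blind directory traversal.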
AI products all look alike? Head to Google Labs to scout the next AI hit
机器之心· 2025-05-28 10:00
Core Viewpoint
- Google has introduced innovative AI applications through its Google Labs platform, showcasing experimental technologies and products that enhance user experience and creativity [3][22].

Group 1: Google Labs Overview
- Google Labs is an experimental platform for showcasing and testing new technologies and product prototypes, letting users try potential future features or services and provide feedback [5].
- The platform is organized into five sections — "Create," "Learn," "Develop," "Play," and "I/O New Products" — filled with various intriguing AI tools [6].

Group 2: National Gallery Mixtape
- The National Gallery Mixtape is a collaborative music experiment between the National Gallery in London and Google Arts & Culture that transforms classic artworks into unique musical compositions [12].
- Users can select up to six paintings from a curated collection of 200 masterpieces spanning the Renaissance to modern art, such as Van Gogh's "Sunflowers" [13].
- The Gemini model analyzes the selected artworks' colors, themes, emotions, and historical contexts, and MusicFX DJ generates music in real time from these analyses [15].

Group 3: User Interaction and Customization
- Users can create personalized mixes by adjusting the volume, order, and layering of the music segments generated from selected artworks, with options for various musical styles and moods [17][20].
- The platform also provides detailed information about the selected artworks, adding an educational dimension to the experience [19].

Group 4: Industry Context and Future Outlook
- The AI industry is increasingly competitive, with a noticeable rise in product homogeneity; Google nonetheless keeps shipping distinctive applications like those in Google Labs [21].
- Google Labs, originally launched in 2002 and reactivated in 2023, focuses on AI-driven experimental projects, particularly generative AI, search, and collaboration tools [22][23].
Huawei Pangu makes its first public appearance: an Ascend-native 72B MoE architecture ties for first in China on SuperCLUE among models under 100B parameters
机器之心· 2025-05-28 08:09
Core Insights
- The article discusses the Mixture of Grouped Experts (MoGE) model from Huawei's Pangu team, which addresses the load imbalance of traditional Mixture of Experts (MoE) models by ensuring balanced computation across devices [2][6][31].
- Pangu Pro MoE, built on the MoGE architecture, has demonstrated superior performance on industry benchmarks, scoring 59 on the SuperCLUE leaderboard with only 72 billion parameters and remaining competitive against larger models [3][26].

Technical Innovations
- MoGE introduces a grouping mechanism in the expert-selection phase: each token activates an equal number of experts within predefined groups, achieving load balancing across devices [2][12].
- A batch-level auxiliary loss function keeps expert activation balanced, enhancing overall model efficiency [16][18].

Performance Metrics
- Pangu Pro MoE reaches a throughput of 321 tokens/s on the Ascend 300I Duo platform and 1528 tokens/s on the Ascend 800I A2 platform, significantly outperforming other models of similar scale [24].
- Expert load is nearly uniform, with each expert handling approximately 12.5% of total token volume, indicating efficient resource utilization [29].

Industry Impact
- Pangu Pro MoE signals a shift from a "parameter arms race" toward practical applications, reducing cloud inference costs and supporting high-concurrency real-time scenarios [31].
- Huawei's innovations aim to redefine the value of large models and provide a robust foundation for enterprises to deploy billion-parameter models effectively [31].
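The grouping idea is simple enough to sketch. Below is a minimal illustration of group-wise top-k routing, assuming experts are partitioned evenly across groups (one group per device); the tensor shapes, softmax placement, and parameter names are our assumptions, not Pangu Pro MoE's exact implementation.

```python
# A minimal sketch of grouped expert routing in the spirit of MoGE: take a
# top-k *inside each expert group* so every token activates the same number
# of experts per group, balancing per-device load by construction.
import torch

def grouped_topk_routing(router_logits, n_groups, k_per_group):
    """router_logits: (tokens, experts). Returns a (tokens, experts) weight
    matrix where every token activates exactly k_per_group experts per group."""
    tokens, n_experts = router_logits.shape
    assert n_experts % n_groups == 0
    group_size = n_experts // n_groups
    # View experts as (groups, experts_per_group) and take a per-group top-k,
    # guaranteeing an identical expert count on every group/device.
    grouped = router_logits.view(tokens, n_groups, group_size)
    topk_vals, topk_idx = grouped.topk(k_per_group, dim=-1)
    weights = torch.zeros_like(grouped)
    weights.scatter_(-1, topk_idx, torch.softmax(topk_vals, dim=-1))
    return weights.view(tokens, n_experts)

# 8 tokens, 16 experts in 4 groups, 2 experts activated in each group.
w = grouped_topk_routing(torch.randn(8, 16), n_groups=4, k_per_group=2)
print((w > 0).sum(dim=-1))  # every token activates exactly 8 experts
```

Because every token takes exactly k_per_group experts from each group, the per-device expert count is identical by construction — the load-balancing property the article attributes to MoGE, as opposed to global top-k routing, where popular experts can pile up on one device.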
SIGGRAPH 2025 | CLR-Wire: wireframes that can be generated and interacted with? Shenzhen University's VCC shows you the magic
机器之心· 2025-05-28 08:09
Hui Huang's team at Shenzhen University has independently released CLR-Wire, a 3D wireframe generation method driven by a continuous latent space. It is the first to uniformly encode complex 3D wireframe structures into a continuous latent space, solving the long-standing difficulty of capturing wireframe geometry and topology at the same time. The technique enables efficient generation and smooth interpolation of complex 3D structures, with broad practical applications in industrial design, 3D reconstruction, and content creation. The first author is Ma Xueqi, a Ph.D. student at Shenzhen University's Visual Computing Research Center (VCC); co-authors Liu Yilin, Gao Tianlong, and Huang Qirui are also VCC graduate students. The CLR-Wire code is fully open source; trial use and feedback are welcome.

In the world of computer graphics, when we talk about 3D wireframe interpolation, what are we really discussing? Perhaps how to let a cylinder evolve smoothly into a delicate dish-like structure; perhaps how to seamlessly morph a decanter into a rounded vase; or even how a building with a roof gradually transforms into a plain rectangular block, along with free-form transitions between shapes such as funnels and plates.

The work proposes CLR-Wire. First, multi-layer cross-attention jointly encodes neural parametric curves and their discrete topological relations into fixed-length latent vectors, and a variational autoencoder builds a continuous latent-space distribution. Then, flow matching is used to generate a complete wireframe from Gaussian noise, supporting unconditional generation as well as conditional generation based on point clouds and images ...
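As a rough illustration of the flow-matching step described above, the sketch below learns a velocity field that transports Gaussian noise to latent vectors along a straight-line path. The latent size, MLP architecture, and loss conventions are generic flow-matching assumptions, not CLR-Wire's exact configuration.

```python
# A minimal flow-matching sketch: regress the constant velocity of a
# straight-line path from Gaussian noise x0 to data latents x1.
import torch
import torch.nn as nn

latent_dim = 256  # assumed size of the fixed-length wireframe latents
velocity_net = nn.Sequential(
    nn.Linear(latent_dim + 1, 512), nn.SiLU(), nn.Linear(512, latent_dim)
)

def flow_matching_loss(x1):
    """x1: (batch, latent_dim) latents from the VAE encoder."""
    x0 = torch.randn_like(x1)                      # Gaussian noise endpoint
    t = torch.rand(x1.size(0), 1)                  # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                     # straight-line probability path
    target_v = x1 - x0                             # constant velocity along the path
    pred_v = velocity_net(torch.cat([xt, t], dim=-1))
    return ((pred_v - target_v) ** 2).mean()

loss = flow_matching_loss(torch.randn(32, latent_dim))
loss.backward()  # at sampling time, integrate pred_v from t=0 noise to t=1 latents
```

A continuous latent space plus a velocity field like this is also what makes the smooth interpolations described above possible: interpolating two latents and decoding yields the cylinder-to-dish or decanter-to-vase transitions.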
LLM + RL called into question: deliberately wrong rewards still yield significant gains on math benchmarks, and the AI community erupts
机器之心· 2025-05-28 08:09
Core Insights
- The article discusses a recent paper challenging the effectiveness of reinforcement learning (RL) for training large language models (LLMs), specifically in the context of using false rewards to enhance performance [3][4][5].

Group 1: Findings on Reinforcement Learning
- False rewards, including random and incorrect ones, significantly improve the Qwen2.5-Math-7B model on the MATH-500 benchmark: random rewards yield a 21% gain and incorrect rewards a 25% gain, versus 28.8% with true rewards [5][10].
- The research questions the traditional belief that high-quality supervision signals are essential for effective RL training, suggesting that even minimal or misleading signals can yield substantial improvements [7][19].

Group 2: Model-Specific Observations
- The effectiveness of RL with false rewards appears model-dependent: other models such as Llama3 and OLMo2 showed no comparable gains under false rewards [16][17].
- The Qwen model uniquely leverages code generation for mathematical reasoning; its code-generation frequency rose from 65% before RL training to over 90% afterward [28][34].

Group 3: Implications for Future Research
- Future RL research should test these methods across diverse model families rather than relying on a single model's performance [25][49].
- Understanding the reasoning patterns a model acquires during pre-training is crucial for designing effective RL training strategies, as these patterns strongly shape downstream performance [50].
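To make the comparison concrete, here is a minimal sketch of the three reward variants the study contrasts: ground-truth, random, and deliberately inverted rewards. The `is_correct` helper is an illustrative stand-in for a real math verifier, not the paper's code.

```python
# A minimal sketch of the reward variants compared in the study: a
# ground-truth reward, a purely random reward, and an inverted reward
# that pays only for wrong answers.
import random

def is_correct(response: str, gold: str) -> bool:
    """Illustrative stand-in for a real math answer verifier."""
    return response.strip() == gold.strip()

def true_reward(response: str, gold: str) -> float:
    return 1.0 if is_correct(response, gold) else 0.0

def random_reward(response: str, gold: str) -> float:
    return float(random.random() < 0.5)       # ignores the answer entirely

def incorrect_reward(response: str, gold: str) -> float:
    return 1.0 - true_reward(response, gold)  # rewards only wrong answers
```

Under this setup the random and inverted signals carry no task information, which is why the gains on Qwen2.5-Math are interpreted as RL surfacing reasoning patterns already present from pre-training, such as code-assisted solving, rather than learning from supervision.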