机器之心

Token crisis solved? Diffusion models get 3x more out of data than autoregressive models, with performance still climbing after 480 epochs of retraining
机器之心· 2025-08-10 04:31
机器之心 report. Editor: Du Wei.
Diffusion language models (DLMs) are exceptionally strong data learners. Is the token crisis finally about to disappear? Recently, National University of Singapore AI researcher Jinjie Ni and his team took a key step toward resolving the token crisis. One of the challenges in the continued development of large language models (LLMs) is that the supply of high-quality training text (tokens) is close to exhaustion and has become a key bottleneck on further performance gains. In addition, new sources of high-quality data are scarce and costly to acquire, and become even scarcer after deduplication. As model scale keeps growing and the required data volume multiplies according to scaling laws, this produces a crisis of "not enough high-quality tokens to train on". In response, the team pretrained diffusion language models (DLMs) and autoregressive (AR) models from scratch, at scales up to 8 billion parameters, 480 billion tokens, and 480 epochs. The study reports three key findings. The team also dissected serious methodological flaws in the concurrent study "Diffusion Beats Autoregressive in Data-Constrained Settings", with the aim of jointly raising the standards of open review. Jinjie Ni described the team's conclusions and methodology in detail on X; next ...
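As a rough, back-of-the-envelope illustration of the scaling-law pressure mentioned above (this is not a figure from the article), the snippet below contrasts the roughly 20-tokens-per-parameter Chinchilla heuristic with the total token count obtained by repeating a fixed corpus over many epochs; the corpus size used here is an arbitrary example, not the study's actual setup.

```python
# Back-of-the-envelope illustration (not from the article) of the
# scaling-law pressure on data: the Chinchilla heuristic of roughly
# 20 training tokens per parameter versus the total tokens seen when a
# fixed corpus is repeated for many epochs.
params = 8e9                              # 8B-parameter model
chinchilla_unique_tokens = 20 * params    # ~1.6e11 unique tokens "wanted"
corpus_unique_tokens = 1e10               # example: a fixed 10B-token corpus
epochs = 480
total_tokens_seen = corpus_unique_tokens * epochs
print(f"wanted ~{chinchilla_unique_tokens:.1e} unique tokens; "
      f"saw {total_tokens_seen:.1e} tokens via {epochs} epochs of repetition")
```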
Tencent's Zhang Zhengyou: the three "real questions" embodied intelligence must answer
机器之心· 2025-08-10 04:31
Core Viewpoint
- Tencent has launched the Tairos platform for embodied intelligence, aiming to provide a modular support system for the development and application of large models, development tools, and data services [2][3].
Group 1: Platform Development
- The Tairos platform is a culmination of over seven years of research by Tencent's Robotics X Lab, which has developed various robotic prototypes to explore full-stack robotic technologies [2][3].
- The establishment of the Tairos platform reflects Tencent's response to current industry challenges and its strategic positioning for future ecosystems [2][3].
Group 2: Architectural Choices
- The debate between end-to-end and layered architectures in embodied intelligence is ongoing, with a preference for layered architecture due to its efficiency and practicality [4][5].
- Layered architecture allows for the integration of human prior knowledge into model structures, enhancing training efficiency and reducing data dependency [6][7].
Group 3: Knowledge Feedback Mechanism
- The SLAP³ architecture proposed by Tencent includes multi-modal perception models, planning models, and action models, with dynamic collaboration and information flow between layers based on task complexity [7][11].
- A memory bank captures unique interaction data from the action model, which can be used to update the perception and planning models, creating a feedback loop for continuous learning [11][12].
Group 4: Evolution of Models
- The architecture is designed for continuous iteration, allowing for the adjustment of prior knowledge as new insights are gained, similar to the evolution of the Transformer architecture [12][15].
- The goal is to transition towards a more efficient and native multi-modal intelligence form, despite current limitations in data availability and model exploration [15][16].
Group 5: Innovation and Commercialization
- The influx of talent and capital into the embodied intelligence field is beneficial, but there is a need for balance between short-term commercial gains and long-term technological goals [23][24].
- Companies must maintain a clear vision of their ultimate objectives and have the courage to forgo immediate commercial opportunities to focus on foundational scientific challenges [25].
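Group 3 above describes the layered SLAP³ stack and its memory-bank feedback loop in prose only. The Python sketch below shows, under assumed class and method names, how a perception, planning, and action stack with such a feedback loop could be wired together; it illustrates the layered-architecture idea, not Tencent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical sketch of a layered (perception -> planning -> action) agent
# with a memory bank feeding interaction data back to the upper layers.
# All class and method names are illustrative, not Tencent's API.

@dataclass
class MemoryBank:
    records: List[dict] = field(default_factory=list)

    def store(self, record: dict) -> None:
        self.records.append(record)

    def sample_for_update(self, k: int = 32) -> List[dict]:
        # Return the most recent interactions for offline model updates.
        return self.records[-k:]

class PerceptionModel:
    def encode(self, observation: Any) -> dict:
        # Multimodal encoding (vision, proprioception, language) would go here.
        return {"scene": observation}

class PlanningModel:
    def plan(self, scene: dict, goal: str) -> List[str]:
        # Decompose the goal into subtasks using the encoded scene.
        return [f"step toward: {goal}"]

class ActionModel:
    def act(self, subtask: str) -> dict:
        # Low-level control; returns the interaction outcome.
        return {"subtask": subtask, "success": True}

def run_episode(goal: str, observation: Any, memory: MemoryBank) -> None:
    perception, planner, actor = PerceptionModel(), PlanningModel(), ActionModel()
    scene = perception.encode(observation)
    for subtask in planner.plan(scene, goal):
        outcome = actor.act(subtask)
        # Interaction data captured by the action layer is stored so the
        # perception/planning layers can later be updated from it.
        memory.store({"scene": scene, "subtask": subtask, "outcome": outcome})

memory = MemoryBank()
run_episode("pick up the cup", observation="rgb_frame_0", memory=memory)
print(len(memory.sample_for_update()))
```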
The missing piece for unified understanding and generation? Tencent releases X-Omni: reinforcement learning revives discrete autoregressive generation and renders long-text images with ease
机器之心· 2025-08-10 04:31
Core Insights
- The article discusses the advancements in image generation technology, particularly focusing on the X-Omni model developed by Tencent's team, which significantly enhances the quality of autoregressive image generation through reinforcement learning [2][4][5].
Group 1: Model Development
- The X-Omni model utilizes reinforcement learning to improve the aesthetic quality of generated images and its ability to follow complex instructions, showcasing superior performance in rendering long texts [5][6].
- The model architecture is based on discrete tokens and employs a diffusion decoder to generate images, allowing for a unified approach to visual understanding and generation [6][11].
Group 2: Reinforcement Learning Approach
- The reinforcement learning process incorporates a comprehensive reward model that evaluates image generation quality from multiple dimensions, including human aesthetic preferences and text-image semantic alignment [9][12].
- The introduction of the GRPO reinforcement learning method enhances the model's image generation capabilities, demonstrating that RL optimization surpasses traditional supervised fine-tuning methods [8][19].
Group 3: Performance Evaluation
- The X-Omni model outperforms existing models in various benchmarks, achieving high scores in both text rendering and instruction-following capabilities, with scores of 0.901 in English and 0.895 in Chinese for text rendering [13][14].
- In instruction-following assessments, X-Omni achieved an overall score of 87.65, indicating its effectiveness in understanding and executing complex prompts [14].
Group 4: Unique Findings
- Unlike traditional autoregressive models that rely heavily on classifier-free guidance (CFG) to enhance generation quality, X-Omni can produce high-quality images without CFG, demonstrating a high degree of integration between visual and language generation mechanisms [17].
- The research highlights the unique advantages of reinforcement learning in image generation, providing more comprehensive and efficient optimization signals compared to conventional methods [19].
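The summary above describes a multi-dimensional reward plus GRPO optimization without giving code. Below is a minimal Python sketch of how such a composite reward and group-relative advantages might be computed; the scorer names, weights, and dummy values are assumptions for illustration, not X-Omni's actual configuration.

```python
import statistics
from typing import Callable, Dict

# Hypothetical sketch: several scoring dimensions (aesthetic preference,
# text-image alignment, fidelity of rendered text) combined into one scalar,
# plus the group-relative advantage normalization used by GRPO-style updates.

def composite_reward(image, prompt,
                     scorers: Dict[str, Callable],
                     weights: Dict[str, float]) -> float:
    return sum(weights[name] * scorer(image, prompt)
               for name, scorer in scorers.items())

def grpo_advantages(group_rewards):
    # Advantage of each sample relative to its prompt group (GRPO-style).
    mu = statistics.mean(group_rewards)
    sigma = statistics.pstdev(group_rewards) or 1.0
    return [(r - mu) / sigma for r in group_rewards]

# Dummy scorers standing in for learned reward models.
scorers = {
    "aesthetic": lambda img, p: 0.8,   # human-preference score
    "alignment": lambda img, p: 0.7,   # text-image semantic alignment
    "ocr":       lambda img, p: 0.9,   # accuracy of text rendered in the image
}
weights = {"aesthetic": 0.3, "alignment": 0.4, "ocr": 0.3}

r = composite_reward(None, "a poster with the slogan 'X-Omni'", scorers, weights)
print(r, grpo_advantages([r, 0.5, 0.9]))
```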
After 40 years, the limit of Dijkstra's algorithm is broken again: Tsinghua's Ran Duan team wins the STOC Best Paper award with a faster shortest-path algorithm
机器之心· 2025-08-10 04:00
Core Viewpoint
- The article discusses groundbreaking research by a team from Tsinghua University that presents a new algorithm for finding shortest paths in directed graphs, which significantly improves upon the traditional Dijkstra algorithm by eliminating unnecessary sorting steps, thus reducing computational complexity [9][13][17].
Summary by Sections
Dijkstra Algorithm Overview
- Dijkstra's algorithm, introduced in 1956, is a classic method for finding the shortest path from a source node to all other nodes in a graph, widely used in applications such as network routing and map navigation [11].
- The algorithm operates by repeatedly selecting the unsettled node with the smallest tentative distance and updating the distances of its adjacent nodes until all shortest paths are found [11][17].
New Research Breakthrough
- The new algorithm breaks the O(m + n log n) time barrier achieved by Dijkstra's algorithm with Fibonacci-heap priority queues on sparse graphs, demonstrating that Dijkstra is not the optimal approach to the single-source shortest path (SSSP) problem [17][18].
- The research introduces a deterministic O(m log^(2/3) n) time algorithm for SSSP in directed graphs with non-negative real edge weights, marking a significant advancement in the field [17][18].
Methodology and Implementation
- The new approach focuses on calculating distances without fully sorting them, using a layered recursive method to group nodes and performing detailed shortest-path calculations only on key nodes [13][14].
- The algorithm employs a divide-and-conquer strategy, reducing the size of the frontier set of nodes and thus minimizing the overhead of maintaining a globally ordered set of nodes [22][24].
Technical Details
- The algorithm is designed for constant-degree graphs and operates in the comparison-addition model, where each operation takes unit time [17][19].
- It introduces a bounded multi-source shortest path (BMSSP) subproblem, allowing efficient distance calculations without sorting all nodes [24][27].
Conclusion
- This research not only improves the efficiency of shortest-path computation but also opens new avenues for further exploration in graph algorithms, potentially impacting the many fields that rely on efficient routing and pathfinding [9][13].
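For reference, here is the textbook Dijkstra routine the overview above describes, as a small Python sketch. The binary heap effectively keeps the frontier ordered by tentative distance, which is exactly the per-node ordering cost the new O(m log^(2/3) n) algorithm is designed to avoid. This is the classic algorithm, not the Tsinghua team's new one.

```python
import heapq
from typing import Dict, List, Tuple

# Classic Dijkstra with a binary heap. Assumes non-negative edge weights.
def dijkstra(graph: Dict[str, List[Tuple[str, float]]], source: str) -> Dict[str, float]:
    dist = {source: 0.0}
    heap = [(0.0, source)]                       # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                             # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                     # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist

g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0)], "b": []}
print(dijkstra(g, "s"))   # {'s': 0.0, 'a': 1.0, 'b': 3.0}
```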
Embodied intelligence amid a data deadlock: who will break through first?
机器之心· 2025-08-10 01:30
机器之心 PRO · Member Newsletter, Week 32 --- This week we unpack 2 AI & Robotics industry developments worth a closer look ---
1. Embodied intelligence amid a data deadlock: who will break through first? Is real-world data necessarily the unavoidable path to general-purpose robots? Is synthetic data forever limited to "padding out volume"? Can teleoperation, currently the most direct way to collect data, strike a sustainable balance between control efficiency and scalability? Does large-scale deployment of Sim2Real require a "standardized simulation" platform? In multimodal teleoperation systems, does the fusion of language + gestures + touch mean that technology is actively lowering the bar for human operators? ...
2. OpenAI board chair: "billing by token" is dead wrong; the market will ultimately choose "paying for outcomes". Why does Bret Taylor call "applied AI" the way forward for founders? How will "long-tail agent companies" replace traditional SaaS? What is the fundamental flaw of "billing by token"? Why will the AI market ultimately settle on "pay for outcomes"? How does an outcome-oriented business model accommodate today's AI shortcomings? How has Bret Taylor's business model fared in practice at Sierra? What is the new paradigm for AI programming? ...
The full issue of this newsletter contains 2 in-depth analyses + 30 quick updates on key AI & Robotics developments, ...
GPT-5 has too many problems; Altman and his team respond to everything, and the botched charts were because they were "too tired"
机器之心· 2025-08-09 06:02
机器之心 report · 机器之心 editorial team.
The higher the expectations beforehand, the greater the disappointment afterward: that is probably how most people in the industry felt after watching GPT-5's loudly trailed, high-profile launch. Perhaps during internal testing OpenAI really did believe GPT-5 was its strongest model yet, but once it entered the real world that no longer seemed to be the case. One X user found GPT-5 helpless in front of what is arguably elementary-school math and quipped: which school awarded the officially touted "PhD-level" intelligence? And it is not just math: since GPT-5's release, social media has been flooded with cases of GPT-5 "slipping up" on logic and coding tasks. Between the early hype, the sloppy chart errors during the livestream, and users' disappointment after trying it, GPT-5 was met less with flowers and applause than with complaints and skepticism, and OpenAI co-founder and CEO Sam Altman seemed unable to sit still any longer, conceding that there were indeed some problems with GPT-5's rollout. Shortly after the launch, in an AMA on Reddit r/ChatGPT, Sam Altman and core members of the GPT-5 team answered users' questions, ranging from the embarrassing "chart crime" at the launch event to users' complaints that GPT ...
ARPO: Agentic Reinforced Policy Optimization lets agents explore one step further at critical moments
机器之心· 2025-08-09 06:02
Core Viewpoint
- The article introduces a novel method called Agentic Reinforced Policy Optimization (ARPO), designed to enhance the performance of large language models (LLMs) in multi-round interactions by addressing the challenges of uncertainty and exploration during tool usage [3][41].
Group 1: Research Motivation and Background
- The emergence of Agentic Reinforcement Learning (RL) is driven by the need for LLMs to engage in dynamic multi-round interactions with external tools, moving from static problem-solving to a more interactive agent-environment reasoning paradigm [8].
- Existing Agentic RL methods often underestimate the value of multi-round interactions due to sparse rewards and overuse of tools, leading to a lack of fine-grained exploration of tool usage [8][41].
- The study identifies a significant increase in entropy (uncertainty) after tool calls, indicating an opportunity for exploration that current methods do not fully leverage [14][16].
Group 2: ARPO Methodology
- ARPO introduces an entropy-driven adaptive rollout strategy that enhances exploration during high-entropy tool usage phases, allowing for more diverse reasoning paths [11][20].
- The method includes four key steps: initialization of global rollout, monitoring entropy changes, adaptive branching based on entropy, and defining termination conditions for the rollout process [24][27].
- ARPO incorporates advantage attribution estimation to help the model better internalize the value differences in tool usage at each step [28][30].
Group 3: Experimental Results
- ARPO outperforms existing sample-level RL methods, achieving better performance with only half the tool call budget across 13 challenging benchmarks, demonstrating its efficiency in training multi-round reasoning agents [21][41].
- The method shows consistent improvements in performance metrics such as Pass@3 and Pass@5, particularly in dynamic, multi-round tasks [37][39].
- In comparative tests, ARPO achieves higher accuracy than GRPO and DAPO in various tasks, including deep search and knowledge-intensive reasoning [41][42].
Group 4: Future Directions
- Future research may explore the application of ARPO in multi-modal tasks, expanding its capabilities beyond text-based reasoning to include images and videos [42].
- There is potential for integrating a broader range of external tools to enhance complex task performance through optimized tool usage strategies [42].
- The scalability and real-time deployment of ARPO in larger models and dynamic environments could further improve its practical value and cost-effectiveness [42].
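As a rough illustration of the entropy-monitoring step described in Group 2 above, the sketch below computes token-level entropy before and after a tool call and branches extra rollouts when the increase crosses a threshold. The threshold, branching factor, and probability values are illustrative assumptions, not ARPO's actual settings.

```python
import math
from typing import List

# Minimal sketch: measure policy token-entropy after a tool response and,
# if it spikes relative to a baseline, branch additional partial rollouts
# from that point in the trajectory.

def token_entropy(probs: List[float]) -> float:
    # Shannon entropy of a next-token distribution (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_branch(entropy_after_tool: float, baseline_entropy: float,
                  threshold: float = 0.3) -> bool:
    # Branch when uncertainty rises noticeably after the tool response.
    return (entropy_after_tool - baseline_entropy) > threshold

baseline = token_entropy([0.8, 0.15, 0.05])   # before the tool call
after_tool = token_entropy([0.4, 0.3, 0.3])   # after the tool response
if should_branch(after_tool, baseline):
    n_branches = 2   # sample extra partial rollouts from this prefix
    print(f"high entropy after tool call -> branch {n_branches} rollouts")
```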
ICCV 2025 | A new backdoor attack takes aim at Scaffold federated learning: NTU and 0G Labs reveal security vulnerabilities in centralized training
机器之心· 2025-08-09 03:59
Core Viewpoint
- The article introduces BadSFL, a novel backdoor attack method specifically designed for the Scaffold Federated Learning (SFL) framework, highlighting its effectiveness, stealth, and persistence compared to existing methods [2][39].
Group 1: Background on Federated Learning and Scaffold
- Federated Learning (FL) allows distributed model training while protecting client data privacy, but its effectiveness is heavily influenced by the distribution of training data across clients [6][10].
- In non-IID scenarios, where data distribution varies significantly among clients, traditional methods like FedAvg struggle, leading to poor model convergence [7][10].
- Scaffold was proposed to address these challenges by using control variates to correct client updates, improving model convergence in non-IID settings [7][12].
Group 2: Security Vulnerabilities in Scaffold
- Despite its advantages, Scaffold introduces new security vulnerabilities, particularly against malicious clients that can exploit the model update mechanism to inject backdoor behaviors [8][9].
- The reliance on control variates in Scaffold creates a new attack surface, allowing attackers to manipulate these variates to guide benign clients' updates towards malicious objectives [9][16].
Group 3: BadSFL Attack Methodology
- BadSFL operates by subtly altering control variates to steer benign clients' local gradient updates in a "poisoned" direction, enhancing the persistence of backdoor attacks [2][9].
- The attack utilizes a GAN-based data poisoning strategy to enrich the attacker's dataset, maintaining high accuracy for both normal and backdoor samples while remaining covert [2][11].
- BadSFL demonstrates superior persistence, maintaining attack effectiveness for over 60 rounds, which is three times longer than existing benchmark methods [2][32].
Group 4: Experimental Results
- Experiments conducted on MNIST, CIFAR-10, and CIFAR-100 datasets show that BadSFL outperforms four other known backdoor attacks in terms of effectiveness and persistence [32][33].
- In the initial 10 rounds of training, BadSFL achieved over 80% accuracy on backdoor tasks while maintaining around 60% accuracy on primary tasks [34].
- Even after the attacker ceases to upload malicious updates, BadSFL retains backdoor functionality significantly longer than benchmark methods, demonstrating its robustness [37][38].
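To make the attack surface concrete, the sketch below shows a simplified SCAFFOLD-style client update in Python, where each local gradient step is corrected by control variates; these are exactly the quantities a malicious client could bias to steer benign clients' corrected updates. This is a toy reconstruction of the standard SCAFFOLD client step with illustrative hyperparameters, not the paper's code or the attack itself.

```python
import numpy as np

# Simplified SCAFFOLD client update: local gradient steps are corrected by
# (global control variate - local control variate), and the local control
# variate is refreshed after training ("option II" style refresh).

def scaffold_client_update(x_global, c_global, c_local, grad_fn,
                           lr=0.1, local_steps=5):
    y = x_global.copy()
    for _ in range(local_steps):
        g = grad_fn(y)
        # Corrected step: client drift relative to the global direction removed.
        y -= lr * (g - c_local + c_global)
    # Refresh the client's control variate from the net progress made locally.
    c_local_new = c_local - c_global + (x_global - y) / (local_steps * lr)
    delta_x, delta_c = y - x_global, c_local_new - c_local
    return delta_x, delta_c, c_local_new

# Toy quadratic objective: gradient of 0.5 * ||w - target||^2.
target = np.array([1.0, -2.0])
grad_fn = lambda w: w - target

x, c_g, c_l = np.zeros(2), np.zeros(2), np.zeros(2)
dx, dc, c_l = scaffold_client_update(x, c_g, c_l, grad_fn)
print(dx, dc)
```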
Users tear into GPT-5 and plead "give me back GPT-4o"; Altman relents
机器之心· 2025-08-09 03:59
机器之心 report · 机器之心 editorial team.
For people who had grown used to the older models, this change is genuinely hard to take, and many users want these "old friends" back as soon as possible, especially GPT-4o.
4o is back; is it available on your end yet?
After a long wait, GPT-5 finally arrived, but people do not seem satisfied with the model. For those who already have access to GPT-5, this is what the page looks like now: the previous models have all disappeared. The reason is that, as part of the GPT-5 launch, OpenAI removed the model picker from ChatGPT. That drop-down menu used to gather OpenAI's confusingly named lineup of models and let users switch between them for different needs, for example choosing GPT-4o for complex tasks or the more efficient o4-mini for lighter work, as well as switching between generations, say from last year's GPT-4o to the newer GPT-4.1. This is how it used to look. With the new release, however, OpenAI has made GPT-5 the default model in ChatGPT and automatically assigns users different sub-versions depending on the task type. To vent their frustration, many people turned to memes, funny and resigned in equal measure.
Source: https://x.com/pengkeshen281/ ...
Shanghai AI Lab, Zhejiang University EagleLab and others propose RRVF: exploiting the "asymmetry of verification" to learn visual reasoning from images alone
机器之心· 2025-08-09 03:59
Core Insights
- The article discusses the concept of "Asymmetry of Verification," which posits that verifying the quality of a solution is often easier than creating one from scratch, thus reshaping the future of AI [3][4].
- The RRVF (Reasoning-Rendering-Visual-Feedback) framework exemplifies how to leverage this principle to tackle complex visual reasoning challenges [4][19].
Summary by Sections
Research Background
- The research was conducted by a team from Shanghai AI Lab, Zhejiang University EagleLab, and Shanghai Chuangzhi Academy, focusing on multimodal large models and reasoning [2].
Verification Asymmetry
- The principle of verification asymmetry suggests that tasks with objective truths and quick verification can be efficiently solved by AI through iterative guess-and-check methods [3].
RRVF Framework
- RRVF operates without expensive image-text paired data, allowing models to self-validate in a closed-loop system [9][11].
- The framework consists of three main components: Iterative Visual Reasoning, Visual Feedback, and Visual Judge, which collectively enhance the model's learning process [11][12][13].
Experimental Results
- RRVF demonstrated superior performance compared to traditional supervised fine-tuning (SFT), achieving a code execution rate of 97.83% without any standard code answers [21].
- The 7B model trained with RRVF outperformed the 72B model that provided feedback, showcasing a self-learning effect [22].
- RRVF maintained high performance on unseen datasets, indicating strong generalization capabilities [23].
Implications for AI Development
- The findings suggest that the future bottleneck in AI development may lie in designing efficient verification environments rather than solely in model size [23].
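The closed loop described above (reason, render, judge, feed back) can be summarized in a few lines of Python. The sketch below is a hypothetical rendering of that cycle, with placeholder functions standing in for the model, the renderer, and the visual judge; it is not the RRVF implementation, and the accept threshold and iteration count are assumptions.

```python
from typing import Callable, Tuple

# Toy sketch of a reasoning -> rendering -> visual-feedback loop:
# the model proposes code from a target image, the code is rendered,
# a visual judge scores the match, and the feedback drives the next try.

def rrvf_loop(target_image, propose_code: Callable, render: Callable,
              judge: Callable, max_iters: int = 4,
              accept_score: float = 0.95) -> Tuple[str, float]:
    feedback = ""
    best_code, best_score = "", 0.0
    for _ in range(max_iters):
        code = propose_code(target_image, feedback)      # reasoning step
        rendered = render(code)                          # rendering step
        score, feedback = judge(rendered, target_image)  # visual feedback step
        if score > best_score:
            best_code, best_score = code, score
        if score >= accept_score:
            break
    return best_code, best_score

# Stand-ins so the loop runs end to end.
propose = lambda img, fb: "plot(x, y)" if fb else "plot(x)"
render = lambda code: f"<render of {code}>"
judge = lambda out, img: (0.97, "") if "x, y" in out else (0.6, "y-axis missing")
print(rrvf_loop("target.png", propose, render, judge))
```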