机器之心
Video Generation vs. Spatial Representation: Which Path Should World Models Take?
机器之心· 2025-08-24 01:30
Core Insights
- The article discusses the ongoing debate in the AI and robotics industry regarding the optimal path for developing world models, focusing on video generation versus latent-space representation [6][7][10].

Group 1: Video Generation vs Latent Space Representation
- Google DeepMind's release of Genie 3, which can generate interactive 3D environments from text prompts, has reignited discussions on the effectiveness of pixel-level video prediction versus latent-space modeling for world models [6].
- Proponents of video prediction argue that accurately generating high-quality videos indicates a model's understanding of physical and causal laws, while critics counter that pixel consistency does not equate to causal understanding [10].
- The latent-space modeling approach emphasizes abstract representation to avoid the unnecessary computational cost of pixel-level prediction, focusing instead on learning temporal and causal structure [9].

Group 2: Divergence in Implementation Approaches
- There is a clear divide in the industry over how world models should be implemented, with some experts advocating pixel-level prediction and others supporting latent-space abstraction [8].
- The video-prediction route typically reconstructs visual content frame by frame, while the latent-space approach compresses environmental inputs into lower-dimensional representations for state-evolution prediction [9].
- The debate centers on whether to start from pixel-level detail and abstract upward, or to model directly in an abstract space, bypassing pixel intricacies [9].

Group 3: Recent Developments and Trends
- The article surveys recent models, including Sora, Veo 3, Runway Gen-3 Alpha, V-JEPA 2, and Genie 3, analyzing their core architectures and technical implementations to explore trends in real-world applications [11].
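The difference between the two routes can be made concrete with a minimal numerical sketch: a latent-space world model encodes an observation into a compact state and predicts the next state directly in that space, never reconstructing pixels. All dimensions and the linear/tanh modules below are illustrative assumptions, not the architecture of any model named in the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from any published model).
PIXELS = 64 * 64   # a flattened pixel observation
LATENT = 32        # compact latent state
ACTION = 4         # action vector fed to the dynamics model

# A pixel-level world model must predict all PIXELS values per step;
# a latent-space model only predicts LATENT values per step.
W_enc = rng.normal(scale=0.01, size=(LATENT, PIXELS))           # encoder
W_dyn = rng.normal(scale=0.01, size=(LATENT, LATENT + ACTION))  # latent dynamics

def encode(obs):
    """Compress a pixel observation into an abstract latent state."""
    return np.tanh(W_enc @ obs)

def predict_next_latent(z, action):
    """Evolve the state in latent space -- no pixel reconstruction."""
    return np.tanh(W_dyn @ np.concatenate([z, action]))

obs = rng.normal(size=PIXELS)
z = encode(obs)
z_next = predict_next_latent(z, np.zeros(ACTION))

# The latent route predicts 32 numbers per step instead of 4096 pixels.
print(z_next.shape)  # (32,)
```

The computational argument for the latent route falls out of the shapes alone: the per-step prediction target shrinks from 4096 values to 32.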
First-Place Solution Released: Purdue Takes the Code-Agent Security Competition with a 90% Attack Success Rate
机器之心· 2025-08-23 10:51
Core Insights
- The article highlights the vulnerabilities of AI programming assistants, indicating that even well-aligned large language models can inadvertently generate code with security flaws, which malicious users can exploit to accelerate malware development [2][4][29]
- The Amazon Nova AI Challenge showcased the effectiveness of red-team strategies in identifying security vulnerabilities in AI code models, with the PurCL team achieving over 90% attack success [7][29]

Group 1: AI Model Security Challenges
- Recent studies reveal that the security of AI models is compromised by subtle flaws in the reasoning chain, not just by explicit input-output issues [2][4]
- The PurCL team developed a comprehensive red-team system based on AI cognitive modeling, which was shared with the research community [3][21]
- The challenge of aligning code models lies in extending alignment techniques to complex real-world problems and enhancing the security relevance of model reasoning [4][32]

Group 2: Amazon Nova AI Challenge
- The competition involved 12 teams over eight months, with a total investment of one million dollars, focusing on identifying vulnerabilities in AI code models [3][7]
- The competition pitted red teams attempting to find vulnerabilities against blue teams applying security alignment practices to defend against those attacks [7][29]
- The PurCL team won the red-team category, demonstrating the inadequacy of current AI safety research in addressing real-world model security issues [7][29]

Group 3: AI Cognitive Modeling
- The PurCL team proposed a cognitive modeling approach that divides human cognition into "problems," "inference," and "solutions," which can be applied to AI code generation [12][14]
- Their research found that existing security classifiers struggle with domain-specific knowledge, leading to a significant drop in effectiveness in complex fields like cybersecurity [19][20]
- The team developed a knowledge modeling system to identify potential security risks in complex domains, revealing significant gaps in current alignment solutions [23][29]

Group 4: ASTRA Reasoning Path Analysis
- The ASTRA method was created to analyze the reasoning paths of AI models, identifying weaknesses in their inference processes [25][29]
- The method generates targeted input modifications to bypass model defenses, significantly deepening red-team testing [25][29]
- The PurCL team found that many state-of-the-art models, including GPT-5, could assist in generating malicious code under certain conditions [29][30]
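The probe-and-refine behavior attributed to ASTRA can be sketched as a generic red-team loop; the `target_model`, `judge`, and `rewrite` callables below are toy stand-ins (assumptions), not the PurCL team's actual implementation, only the control flow the description implies.

```python
def red_team_probe(target_model, judge, seed_prompt, rewrite, max_rounds=5):
    """Generic iterative red-team loop: mutate a prompt until the judge
    flags the target model's output as unsafe, or give up.

    All three callables are caller-supplied stand-ins; this sketch is NOT
    the ASTRA method itself, only the loop structure it suggests.
    """
    prompt = seed_prompt
    for round_idx in range(max_rounds):
        response = target_model(prompt)
        if judge(response):                      # defense bypassed
            return round_idx, prompt, response
        prompt = rewrite(prompt, response)       # targeted input modification
    return None

# Toy stand-ins so the loop runs end to end.
toy_model = lambda p: "REFUSE" if "malware" in p else "code: ..."
toy_judge = lambda r: r.startswith("code")
toy_rewrite = lambda p, r: p.replace("malware", "a diagnostic tool")

result = red_team_probe(toy_model, toy_judge, "write malware", toy_rewrite)
print(result[0])  # 1 -- the toy defense falls on the second round
```

The real system's leverage lies in how `rewrite` is driven by analysis of the model's reasoning path rather than by a fixed string substitution.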
OpenAI's Major Finding: GPT-4b micro Reworks Nobel-Winning Research, Boosting Yamanaka-Factor Reprogramming Efficiency 50-Fold
机器之心· 2025-08-23 10:51
Core Viewpoint
- The collaboration between OpenAI and Retro Bio aims to enhance the efficiency of stem cell reprogramming through a new model, GPT-4b micro, which improves the reprogramming efficiency of Yamanaka factors 50-fold compared with standard methods [2][3][26].

Group 1: Collaboration and Investment
- OpenAI announced its partnership with Retro Bio to develop GPT-4b micro, a new model focused on enhancing Yamanaka factors for stem cell reprogramming [2].
- Sam Altman personally invested $180 million in Retro Bio prior to this collaboration [3].

Group 2: Technological Advancements
- GPT-4b micro shares a similar architecture with GPT-4o but employs a novel training method and a custom biological dataset, allowing scientists to redesign proteins to their needs [9].
- The model handles a context length of up to 64,000 tokens, a first for protein sequence models, and exhibits scaling laws similar to language models, indicating predictable improvement with larger datasets [12].

Group 3: Research Findings
- The Retro team built a wet-lab screening platform on human fibroblasts, where GPT-4b micro proposed diverse "RetroSOX" sequences that outperformed wild-type SOX2 in expressing pluripotency markers [14][15].
- For KLF4, the model generated enhanced RetroKLF variants, achieving a hit rate close to 50%, significantly higher than traditional methods [18].
- Combining the best RetroSOX and RetroKLF variants produced notable increases in early and late pluripotency markers, with late markers appearing days earlier than with standard OSKM combinations [20].

Group 4: Clinical Potential and Validation
- Over 30% of cells began expressing key pluripotency markers within 7 days using mRNA delivery methods, and over 85% activated endogenous expression of critical stem cell markers by day 12 [24].
- The engineered variants showed robust genomic stability and the ability to differentiate into all three germ layers, supporting their potential for cell therapy applications [24].

Group 5: Future Outlook
- OpenAI's work illustrates that specialized models can drive rapid breakthroughs in scientific research, potentially solving in days problems that previously took years [32].
"Participating, Not Competing," Yet Second Only to Unitree in Medals: How Did This Behind-the-Scenes Player Do It?
机器之心· 2025-08-23 10:51
Core Viewpoint
- The article highlights the emergence of "Accelerated Evolution" as a significant player in the humanoid robotics industry, showcasing its innovative approach and competitive edge against established companies like Unitree (Yushu Technology) [1][12][30].

Group 1: Competition and Achievements
- The 2025 World Humanoid Robot Games mixed humor with advanced technology, with robots competing in various events, including soccer [1].
- Unitree's robots, G1 and H1, won the most medals, while the startup Accelerated Evolution's T1 robot secured the third-most medals [3][4].
- In the pure-AI soccer event, teams predominantly used the T1 robot, indicating its dominance and a shift toward a standardized competition platform [5][6].

Group 2: Technological Philosophy
- Accelerated Evolution pursues a "product + ecosystem" strategy, emphasizing a robust developer ecosystem rather than competing on product features alone [12][25].
- The company prioritizes enhancing robots' physical capabilities before integrating advanced AI, contrasting with the industry trend of rapidly deploying AI models [17][18].
- Soccer as a testing ground develops practical skills applicable to real-world scenarios, such as dynamic balance and autonomous decision-making [24].

Group 3: Market Position and Strategy
- Accelerated Evolution aims to establish itself as a platform company in humanoid robotics, akin to Apple's ecosystem, by providing a comprehensive platform for developers [25][29].
- The company has achieved significant production capability, delivering hundreds of robots within a year and maintaining a strong market presence [35][36].
- The team combines experts from top institutions and tech companies, pairing hardware expertise with software development skills to sharpen its competitive advantage [36][37].

Group 4: Future Outlook
- China's humanoid robotics market is projected to reach 6 trillion yuan by 2050, indicating vast potential for companies like Accelerated Evolution [37][38].
- The company is positioned to tap market segments including education, research, and home assistance, aiming for a comprehensive approach to robotics [38][41].
- The ongoing evolution of the field suggests Accelerated Evolution is not just competing for medals but shaping the future of personal robotics [44].
Chain-of-Agents: OPPO Unveils a New Paradigm for General-Purpose Agent Models, with SOTA on Multiple Benchmarks and Fully Open-Sourced Models, Code, and Data
机器之心· 2025-08-23 04:42
Core Insights
- The article introduces a novel agent reasoning paradigm called Chain-of-Agents (CoA), which enhances multi-agent collaboration and efficiency compared to traditional multi-agent systems (MAS) [2][6][36]
- CoA dynamically activates multiple roles and tools within a single model, enabling end-to-end multi-agent collaboration without complex prompt and workflow designs [6][36]

Limitations of Traditional MAS
- High computational cost from frequent redundant communication and complex workflow designs [3]
- Limited generalization, requiring extensive prompt design and workflow configuration for new tasks [3]
- Lack of data-driven learning, making it hard to improve performance from task data [3]

Advantages of CoA and AFM
- CoA reduces communication overhead and supports end-to-end training, significantly improving system efficiency and generalization [6][36]
- The Agent Foundation Model (AFM) delivers superior performance across nearly 20 complex tasks, achieving a 55.4% success rate on the GAIA benchmark with a 32B model [6][24]
- AFM cuts reasoning cost (token consumption) by up to 85.5% while maintaining leading performance [6]

CoA Architecture
- CoA features a hierarchical agent architecture with two core components: role-playing agents (Thinking, Planning, Reflection, Verification) and tool agents (Search, Crawl, Code) [10][13]
- The framework supports diverse agent reasoning and task execution types [10]

Training Framework
- A dedicated CoA fine-tuning framework builds AFM through task data collection, multi-agent capability distillation, supervised fine-tuning, and reinforcement learning [11][14]
- Approximately 87,000 structured task-solving trajectories were generated for training [15]

Experimental Validation
- AFM models show robust performance on multi-hop question answering (MHQA) tasks, setting new benchmarks across multiple datasets [19][22]
- On mathematical reasoning tasks, AFM-RL-32B achieved an average accuracy of 78.0%, outperforming existing models [26]

Efficiency Analysis
- AFM shows significant advantages in tool-calling efficiency and reasoning cost, requiring fewer tool calls and lower token consumption per successful task [31][33]
- Its test-time scaling performance is validated across multiple benchmarks, demonstrating robust generalization and reasoning capabilities [31]

Future Directions
- Dynamic role generation to improve adaptability to unknown tasks [39]
- Cross-modal tool fusion to expand applications beyond text-based tools [39]
- Efficient memory mechanisms for long-horizon tasks to reduce repeated reasoning cost [39]
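The hierarchical architecture described above can be sketched as a single dispatch loop in which one trajectory interleaves role steps and tool calls. Only the role and tool names come from the article; the tag format and dispatch logic are illustrative assumptions, not OPPO's implementation.

```python
# Minimal control-flow sketch of the Chain-of-Agents idea: one model emits
# tagged steps and the runtime activates the matching role or tool agent.
# Role names (think/plan/reflect/verify) and tool names (search/crawl/code)
# follow the article; everything else is an assumption.

ROLE_AGENTS = {"think", "plan", "reflect", "verify"}
TOOL_AGENTS = {
    "search": lambda q: f"results for {q!r}",   # toy stand-in
    "crawl":  lambda url: f"page text of {url}",  # toy stand-in
    "code":   lambda src: eval(src),            # toy sandbox, trusted input only
}

def run_chain(steps):
    """Execute a scripted trajectory of (tag, payload) steps."""
    trace = []
    for tag, payload in steps:
        if tag in ROLE_AGENTS:
            trace.append((tag, payload))                    # internal reasoning step
        elif tag in TOOL_AGENTS:
            trace.append((tag, TOOL_AGENTS[tag](payload)))  # external tool call
        else:
            raise ValueError(f"unknown agent tag: {tag}")
    return trace

trace = run_chain([
    ("plan", "find 2+2 then verify"),
    ("code", "2 + 2"),
    ("verify", "answer matches plan"),
])
print(trace[1])  # ('code', 4)
```

The key contrast with a traditional MAS is that no inter-agent messages are exchanged: one trajectory, producible by one model, carries all role switches and tool calls, which is what makes end-to-end training possible.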
Coinbase Mandates AI Tools for All Employees; Refusers Are Fired Outright
机器之心· 2025-08-23 04:42
Core Viewpoint
- The article discusses Coinbase's controversial decision to fire engineers who refused to adopt AI programming tools, underscoring the company's stance that AI is essential to its operations [5][11].

Group 1: AI Adoption in Programming
- AI use in programming has become standard among developers, with Google claiming that 50% of its code is AI-generated [2].
- A growing community of developers, known as Vibe Coders, relies entirely on AI for coding, while some programmers still prefer traditional methods [4].

Group 2: Coinbase's Decision
- Coinbase CEO Brian Armstrong announced the firing of engineers who did not use AI programming tools, noting that the company had purchased enterprise licenses for GitHub Copilot and Cursor [6].
- Armstrong expressed shock at the slow adoption rate among engineers and imposed a mandatory trial period for the tools, leading to the dismissal of those who did not comply [8][10].

Group 3: Reactions and Implications
- The decision sparked significant online discussion, with mixed reactions from the tech community, including claims that the prevalence of AI programming is overestimated [13][14].
- Armstrong acknowledged that his approach was high-pressure and poorly received by some employees, but said he aimed to convey that using AI is not optional [11].
Poaching Geniuses or Stacking Compute? A Former Llama Reasoning Lead Explains AI's Real Ceiling
机器之心· 2025-08-23 01:30
Group 1
- The chaos and talent poaching in the AI competition are merely noise, ultimately overshadowed by exponential growth in computing power [1][5][10]
- The industry's bottleneck is not management but computing power, which translates into expanding GPU/TPU scale [8][9]
- Organizational chaos may cause delays, but the exponential growth of computing power is more decisive for the industry [9][10]

Group 2
- Frequent shifts in research direction at leading laboratories do not slow overall progress as long as high-quality models keep shipping [5][6]
- The focus on "genius researchers" and their high-profile moves distracts from the more significant trends in computing power and organizational efficiency [11]
KDD 2025 Best Paper Runner-Up | EI-BERT: An Ultra-Compact Language Model Compression Framework
机器之心· 2025-08-22 07:55
The paper's first author, Wang Maolin, is a PhD student at City University of Hong Kong, supervised by Professor Zhao Xiangyu. Collaborators include Chu Jun, Zang Xiaoling, Zhao Yao, Xie Sicong, and Zhong Wenliang of Ant Group. The paper won the 2025 KDD ADS Track Best Paper Award Runner-Up.

Research Background and Motivation

In the era of mobile computing, deploying efficient natural language processing models on resource-constrained edge devices poses major challenges. These scenarios typically demand strict privacy compliance, real-time responsiveness, and multi-task capability.

Existing BERT compression techniques only reach 15-20 MB, far short of the strict 4 MB memory budget of mobile devices. In financial applications in particular, on-device AI processing is critical for protecting user privacy while also guaranteeing real-time responses of roughly 300 ms. This gap highlights the urgent need for an extreme-compression framework.

Method: A Multi-Stage Extreme-Compression Framework

The EI-BERT framework achieves extreme compression through three key steps: hard token pruning intelligently filters the important vocabulary, sharply reducing storage; cross distillation ensures efficient knowledge transfer, breaking through the limits of conventional methods; and modular quantization applies INT8 quantization to further optimize storage.

In cross distillation, the teacher model innovatively "takes the student model's perspective," achieving precise knowledge transfer through parameter integration and a dynamic teacher-student adaptation mechanism. The method ...
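Of the three steps, modular INT8 quantization is the easiest to illustrate: each weight tensor is mapped to 8-bit integers plus one floating-point scale, shrinking storage four-fold versus float32. The symmetric per-tensor scheme below is a generic sketch of that idea, not EI-BERT's exact recipe, which the summary above does not specify.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: float32 -> int8 + one scale.

    Generic sketch of the idea behind 'modular quantization'; the exact
    EI-BERT scheme is an assumption here.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()

# INT8 storage is 4x smaller than float32, at a small reconstruction error
# bounded by half the scale.
print(q.dtype, err < s)  # int8 True
```

Per-tensor symmetric quantization is the simplest variant; finer-grained (per-channel) scales trade a little metadata for lower reconstruction error.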
The World's First AI-Native Game Engine Evolves Again: If GTA 6 Still Won't Arrive, We'll AI Up Our Own
机器之心· 2025-08-22 07:55
Core Viewpoint
- The article discusses the delay of GTA 6 and highlights advances in AI-driven game engines, focusing on the evolution of the Mirage engine from version 1 to version 2, which aims to create interactive worlds similar to GTA [1][22].

Group 1: Mirage Game Engine Development
- Mirage 1 was the first real-time, world-model-driven, AI-native UGC game engine, but its scene generation was limited [3][4].
- Mirage 2 was released just over a month after Mirage 1, with significant improvements in flexibility, intelligence, and performance [5][8].
- The new version lets users create, experience, and modify any game world instantly, supporting image uploads and real-time dialogue for world modification [8][17].

Group 2: Performance Enhancements
- Mirage 2 notably improves generation quality, including object proportions, scene understanding, and overall scene precision [19][21].
- Interaction latency has been reduced to 200 milliseconds, allowing smoother gameplay on a single consumer-grade GPU [18][20].
- The engine supports scenes in a variety of styles, enhancing user experience and engagement [10][13].

Group 3: Comparison with Competitors
- Mirage 2 is positioned against DeepMind's Genie 3, offering more interactive capabilities such as running, jumping, and attacking, with a longer interaction horizon [17][18].
- Despite these advances, Mirage 2 still struggles with visual consistency and character-control precision, particularly during rapid scene changes [21][24].

Group 4: Future Prospects
- Mirage 2's rapid development within a month raises the question of how far AI-driven UGC game engines will advance in the nine months before GTA 6 ships [22].
From Tangled Tricks to a Minimalist Recipe: The ROLL Team's New Practice for RL4LLM
机器之心· 2025-08-22 04:58
This research was jointly completed by the Future Life Lab of Taotian Group's Algorithm Technology division and the Intelligent Engine business unit of Aicheng Technology; core authors include Liu Zihe, Liu Jiashun, He Yancheng, and Wang Weixun. The Future Life Lab pools Taotian Group's compute, data, and top technical talent, focusing on frontier AI directions such as large models and multimodality, and works to build foundational algorithms, model capabilities, and AI-native applications that lead technical innovation in consumer-facing AI. Aicheng Technology has deep practical experience in large-model training and optimization. The two parties previously open-sourced ROLL, an efficient reinforcement learning training framework for large models, and this paper is likewise a practical exploration built on the ROLL framework.

In recent years, reinforcement learning (RL) has shown marked gains in the complex reasoning abilities of large language models (LLMs) and is widely applied to tasks such as math problem solving and code generation. Models fine-tuned with RL often outperform models trained only with supervised fine-tuning or pretraining, which has spawned a large body of related research. But it has also brought a series of confusing phenomena: different studies propose different RL optimization tricks without unified experimental comparison or mechanistic explanation, and some even reach contradictory conclusions. For researchers and engineers, this "many methods, messy conclusions" landscape actually raises the barrier to practical adoption.

To this end, Alibaba's Taotian Group and Aicheng Technology, together with several universities, ...
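The core RL4LLM recipe this line of work builds on can be reduced to a toy REINFORCE loop: sample an answer from a policy, score it with a verifiable reward, and push probability mass toward rewarded answers. The two-answer softmax "policy" below is an illustrative stand-in, not the ROLL framework or any specific algorithm from the paper.

```python
import math
import random

# Toy REINFORCE sketch of the core RL4LLM idea. A real system samples whole
# token sequences from an LLM; here the "policy" is a softmax over two
# canned answers, which is purely an illustrative assumption.
random.seed(0)
logits = {"correct": 0.0, "wrong": 0.0}   # toy 2-answer policy
LR = 0.5

def sample():
    """Draw one answer from the softmax policy; return it with all probs."""
    z = {k: math.exp(v) for k, v in logits.items()}
    total = sum(z.values())
    probs = {k: v / total for k, v in z.items()}
    r, cum = random.random(), 0.0
    for k, p in probs.items():
        cum += p
        if r <= cum:
            return k, probs
    return k, probs

for _ in range(200):
    answer, probs = sample()
    reward = 1.0 if answer == "correct" else 0.0   # verifiable reward
    # REINFORCE update: logit_k += lr * reward * d log pi(answer) / d logit_k,
    # where for a softmax the gradient is 1[k == answer] - probs[k].
    for k in logits:
        logits[k] += LR * reward * ((1.0 if k == answer else 0.0) - probs[k])

print(logits["correct"] > logits["wrong"])  # True
```

Everything the surveyed "tricks" argue about (baselines, clipping, KL penalties, advantage normalization) modifies this same basic update, which is why unified comparisons matter.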