机器之心

At WAIC, the world's first large model with "native memory" debuts, and it is not a Transformer
机器之心· 2025-07-26 09:32
Core Viewpoint
- Google is initiating a transformation in AI architecture, moving beyond the limitations of existing attention mechanisms with the introduction of the MoR architecture, indicating a consensus on the need for architectural innovation in the AI field [2][3].

Group 1: RockAI's Innovations
- RockAI has developed the Yan architecture, a non-Transformer framework that significantly reduces computational complexity, allowing offline operation on low-power devices like the Raspberry Pi [5][10].
- The Yan 2.0 Preview model possesses native memory capabilities, enabling it to remember interactions over time, unlike traditional models that forget previous conversations [6][12].
- The architecture allows end-to-end memory integration, sparing users the manual management of external knowledge bases [19][22].

Group 2: Challenges with Transformer Models
- Transformer models face issues such as data scarcity and high computational requirements, making them difficult to deploy on low-power devices without significant modification [9][10].
- The reliance on massive datasets for pre-training is becoming increasingly challenging due to the difficulty of acquiring high-value data [9].
- Current models often cannot learn or update parameters during inference, limiting their adaptability [9][10].

Group 3: Vision for AI Development
- RockAI aims to create intelligent devices that can learn and evolve independently, moving away from static models that require cloud connectivity [24][25].
- The concept of collective intelligence is emphasized: individual devices with learning capabilities can share knowledge and evolve together, contributing to the development of AGI [26][29].
- The company's mission is to give every device its own intelligence, promoting offline smart capabilities [27][28].

Group 4: Market Reception and Future Directions
- RockAI's approach has gained recognition at industry events, with hardware manufacturers showing interest in the technology's unique memory capabilities [34][35].
- The company is committed to continuing its challenging yet correct technological path, focusing on enhancing autonomous learning capabilities [36][37].
- RockAI's long-term vision reflects a commitment to fundamental technological advances, akin to the journeys of leading AI research labs [37][38].
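The article does not disclose Yan 2.0's internals, but the "native memory" claim can be pictured with a toy sketch in which the model carries a persistent store that inference both reads and writes, so nothing is forgotten between sessions. Everything below (class name, methods, data model) is illustrative, not RockAI's design:

```python
# Toy illustration only (not RockAI's Yan architecture): a model object
# whose inference loop reads from and writes to a persistent memory store,
# so later turns can recall earlier interactions without an external
# knowledge base.
class NativeMemoryModel:
    def __init__(self):
        self.memory = {}  # persists across turns and sessions

    def observe(self, key, value):
        # Write side: memory updates happen at inference time,
        # not only during pre-training.
        self.memory[key] = value

    def respond(self, prompt):
        # Read side: answer from memory when a stored key appears in the prompt.
        for key, value in self.memory.items():
            if key in prompt:
                return f"Recalled {key}: {value}"
        return "No relevant memory"

model = NativeMemoryModel()
model.observe("favorite_color", "blue")
print(model.respond("remind me of my favorite_color"))  # -> Recalled favorite_color: blue
```

The contrast with a stateless chat model is the point: here the store outlives the prompt window, which is the behavior the summary attributes to Yan 2.0 Preview.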
How far are we from an AGI assistant on the phone? A benchmark and scheduling system for composite long-horizon mobile-agent tasks released
机器之心· 2025-07-26 09:32
Core Insights
- The article discusses the transition from atomic task automation to complex long-horizon task management in mobile agents, highlighting the challenges current systems face in handling composite tasks that require multi-application interaction and information synthesis [4][6][10].

Group 1: Current State of Mobile Agents
- Multi-modal large language models (MLLMs) have shown promising results on single-screen actions and short-chain tasks, indicating initial maturity in on-device task automation [4].
- Existing mobile GUI agents exhibit significant capability gaps on complex long-horizon tasks, struggling to generalize from atomic to composite tasks [6][10].

Group 2: Proposed Solutions
- The researchers introduced a dynamic evaluation benchmark called UI-Nexus, which covers complex long-horizon tasks across 50 applications and is built from 100 task templates averaging 14.05 optimal steps [7][21].
- A multi-agent task scheduling system, AGENT-NEXUS, was proposed to handle instruction distribution, information transfer, and process management without modifying the underlying agent models [7][19].

Group 3: Task Complexity and Types
- The article categorizes composite tasks into three types based on subtask dependencies: Independent Combination, Context Transition, and Deep Dive, each presenting unique challenges for mobile agents [11][13][21].
- A detailed analysis of error cases revealed that mobile agents often fail due to poor progress management and information handling, leading to issues like context overflow and failed information transfer [16][32].

Group 4: Experimental Findings
- Testing across various mobile agents showed task completion rates below 50%, while AGENT-NEXUS improved completion rates by 24% to 40% at only about an 8% increase in inference cost [27][30].
- Agent performance improved significantly when given manually split atomic instructions; UI-TARS in particular rose from an 11% to a 60% completion rate [29].

Group 5: Future Outlook
- The article envisions a new generation of AI operating systems capable of efficiently coordinating and managing complex task demands, transforming mobile devices into intelligent personal assistants [34][36].
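The released AGENT-NEXUS system is not reproduced here, but the scheduling idea the summary describes can be sketched: split a composite instruction into atomic subtasks, forward each subtask's result to the subtasks that declare a dependency on it, and leave the underlying agent untouched. All function and field names below are assumptions for illustration:

```python
# Hedged sketch of the scheduling idea (not the actual AGENT-NEXUS code):
# a composite task is a list of atomic subtasks; the scheduler injects only
# the intermediate results each subtask declares it needs, so the agent's
# context never has to hold the whole long-horizon history.
def schedule(composite_task, agent):
    context = {}
    for subtask in composite_task:
        # Pass forward only declared dependencies (information transfer).
        deps = {k: context[k] for k in subtask.get("needs", [])}
        result = agent(subtask["instruction"], deps)
        context[subtask["name"]] = result  # progress management
    return context

def toy_agent(instruction, deps):
    # Stand-in for an unmodified mobile GUI agent: records what it was given.
    return f"{instruction} | given {sorted(deps)}"

plan = [
    {"name": "find_price", "instruction": "Look up the price"},
    {"name": "message", "instruction": "Send it to Bob", "needs": ["find_price"]},
]
results = schedule(plan, toy_agent)
```

The explicit `needs` lists are one simple way to model the Context Transition task type; the Independent Combination type would simply be subtasks with empty dependency lists.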
GPT-4 core contributor and Tsinghua alumnus Shengjia Zhao appointed Chief Scientist of Meta Superintelligence Labs
机器之心· 2025-07-26 08:19
Core Viewpoint
- Meta has established Meta Superintelligence Labs (MSL) to strengthen its research and development in advanced artificial intelligence, aiming to realize Mark Zuckerberg's vision of long-term general intelligence [1][2].

Group 1: Establishment of MSL
- MSL encompasses all foundational research, product development, and the FAIR team, along with a newly formed lab focused on next-generation model development [1].
- The lab is led by Alexandr Wang, former CEO of Scale AI and now Chief AI Officer at Meta [4].

Group 2: Talent Acquisition
- Following the underwhelming performance of Meta's Llama 4 model, the company has significantly increased its investment in talent acquisition, offering highly competitive salaries to attract top researchers in Silicon Valley [2].
- Meta is engaged in a talent war with OpenAI, actively recruiting leading experts from various AI organizations [3].

Group 3: Key Appointments
- Shengjia Zhao, a co-creator of ChatGPT, has been appointed Chief Scientist of MSL, working directly with Zuckerberg and Wang to set the lab's research agenda and scientific direction [6][11].
- Zhao has a notable background, having joined OpenAI in June 2022 and contributed to several high-profile projects, including ChatGPT and GPT-4 [16][18].

Group 4: Research Direction
- Zhao's recent research has introduced a new paradigm that may define the future scientific direction of Meta's AI initiatives [12][13].
- Zhao's appointment and the recruitment of top talent raise the question of how far they can lift Meta's research capabilities toward Zuckerberg's ambitious goals [22].
Turing Award laureate Hinton's first public talk in China: what should we do once AI surpasses humans?
机器之心· 2025-07-26 08:19
Machine Heart report, by the Machine Heart editorial team

AI is bound to become smarter than humans. What happens after that?

This morning at the World Artificial Intelligence Conference (WAIC), Geoffrey Hinton, winner of the 2024 Nobel Prize in Physics and the 2018 Turing Award and widely called the godfather of AI, delivered the opening keynote, titled "Will digital intelligence replace biological intelligence?"

The talk spanned the history of the AI field, its future direction, how language models work, the respective characteristics of digital and biological computation, and the concerns raised by AI's progress. Hinton spoke highly of today's large-model technology, arguing that it thinks in the same way humans do.

Are large language models thinking the way humans do?

Thank you all very much for giving me this opportunity to share my personal views on the history of AI and its future.

For more than 60 years, academia has held two different paradigms for understanding AI. One is the logic-based view: reasoning can be achieved by expressing and manipulating symbolic rules. The other, held by Turing and von Neumann, is that the basis of intelligence lies in learning the connections of a neural network, a process in which understanding comes first.

That turned our attention to the relationships between words in language. Psychologists had a different theory: they held that meaning is carried by sets of semantic features. In 1985, I built a very small model that tried to combine these two directions, to better understand how humans understand words. For each ...
Hands-on with the viral Step 3 from Jieyue Xingchen (StepFun): SOTA performance, king of open-source multimodal reasoning
机器之心· 2025-07-26 08:19
Core Viewpoint
- The article highlights the launch of Step 3, a new-generation open-source base model from Jieyue Xingchen (StepFun), positioned as a leading open-source VLM (vision-language model) that excels on multiple benchmarks and has significant commercial potential [1][2][11].

Group 1: Model Features and Performance
- Step 3 is recognized for its strong performance, surpassing other open-source models on benchmarks such as MMMU, MathVision, and SimpleVQA [1][41].
- The model integrates multimodal capabilities, combining text and visual understanding, which is essential for real-world applications [10][39].
- Step 3 is designed to balance intelligence, cost, efficiency, and versatility, addressing key challenges in AI deployment [7][8].

Group 2: Technical Innovations
- The underlying architecture of Step 3 uses a proprietary MFA (Multi-matrix Factorization Attention) design, optimized for efficiency and performance, particularly on domestic chips [29][31].
- The model has 321 billion parameters in total, with 316 billion in the LLM and 5 billion in the visual encoder, showcasing its extensive capabilities [33][34].
- Step 3 employs advanced distributed inference techniques, improving resource allocation and reducing operational costs [38].

Group 3: Commercialization and Market Impact
- The launch of Step 3 marks a significant step toward commercialization for Jieyue Xingchen, with revenue projected to approach 1 billion yuan in 2025 [54].
- The model has already been integrated into various smart devices, with partnerships established with more than half of the top 10 domestic smartphone manufacturers [54].
- The establishment of the "Model-Chip Ecological Innovation Alliance" with multiple chip manufacturers marks a strategic move to foster collaboration and reduce costs across the AI ecosystem [51][52].

Group 4: Industry Positioning
- Step 3 is positioned as an answer to the industry's pressing need for a practical, open-source multimodal reasoning model, filling a significant market gap [58][60].
- The article emphasizes the shift from competitive price wars to collaborative innovation as the sustainable growth path for the industry [59][60].
- Jieyue Xingchen's rapid iteration and comprehensive model matrix have solidified its reputation as a leader in multimodal AI [57].
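MFA's exact design is proprietary and the article gives only its name, but the generic motivation behind any factorized attention scheme (the family MFA's name suggests) is easy to show with parameter arithmetic. The dimensions below are arbitrary illustrative values, not Step 3's actual configuration:

```python
# Illustrative arithmetic only, not MFA's actual design: factorizing a
# d x d attention projection into two low-rank matrices (d x r and r x d)
# shrinks the parameter count from d*d to 2*d*r, which is the generic
# efficiency argument for factorized attention.
def full_params(d):
    return d * d

def factored_params(d, r):
    return d * r + r * d

d, r = 4096, 256  # hypothetical model width and factorization rank
saving = 1 - factored_params(d, r) / full_params(d)
print(f"low-rank (r={r}) uses {saving:.0%} fewer parameters than full-rank")
```

Smaller projections also mean a smaller KV cache per token, which is one reason factorized designs are attractive for cost-sensitive inference hardware.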
Who's Adam? The most outrageous NeurIPS review is out
机器之心· 2025-07-25 10:34
Machine Heart report, by the Machine Heart editorial team

By now everyone has received their NeurIPS 2025 review results, right? Going by past experience, it must be time to vent about reviewer comments.

Sure enough, we just spotted this year's most outrageous NeurIPS comment on X, posted by Yiping Lu, a Peking University alumnus and assistant professor in the Department of Industrial Engineering and Management Sciences at Northwestern University. Within hours of going up, it had been viewed over a hundred thousand times.

The reviewer's comment:

"Both architectures are optimized with Adam."
Who/what is "Adam"? I think this is a very serious typo that the authors should have removed before submission.

Yes, that really is a review of Lu's NeurIPS paper. Even Professor Dan Roy could not hold back, calling the NeurIPS reviews a complete pile of garbage.

(Embedded screenshot, a repost by Dan Roy of Yiping Lu, @2prime PKU, posted 9 hours earlier: "Anyone knows adam?" ... "dimension"? ... l. 336: "Both architectures are optimized with Adam" ... Who/what is "Adam"? I think this is a ve ...)
Agent KB: an experience pool lets agents learn from each other! New open-source SOTA on GAIA, with Pass@1 improved by up to 6.66
机器之心· 2025-07-25 07:15
Recently, a research team from OPPO, Yale University, Stanford University, the University of Wisconsin-Madison, the University of North Carolina at Chapel Hill, and other institutions jointly released the Agent KB framework. The work builds an experience pool and, through a two-stage retrieval mechanism, enables effective experience sharing across AI agents. Via hierarchical experience retrieval, Agent KB lets an agent learn from the successful experience gathered on other tasks, significantly improving complex reasoning and problem solving.

Paper: https://arxiv.org/abs/2507.06229
Code: https://github.com/OPPO-PersonalAI/Agent-KB

Agent memory systems: from working alone to learning together

Throughout the development of AI agents, the memory system has been a key component for continual learning and the evolution of intelligence. Broadly, an agent memory system includes short-term memory for temporary information within the current conversation or task, long-term memory for important knowledge, experience, and learned results, and working memory as an active cache while handling the current task; some systems also include episodic memory recording problem-solving strategies for specific scenarios.

However, existing memory systems share a fundamental limitation: experience cannot be shared effectively across different agent frameworks. Because different tasks often have different ...
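The paper's actual pipeline is more sophisticated, but the two-stage retrieval idea described above can be sketched with plain keyword overlap standing in for real similarity scoring. The data model (summaries plus step lists) and all names here are assumptions for illustration:

```python
# Hedged sketch of two-stage experience retrieval in the spirit of
# Agent KB (not the paper's implementation): stage 1 coarsely ranks past
# tasks by summary relevance; stage 2 pulls the concrete steps out of
# the best-matching tasks.
def overlap(a, b):
    # Toy relevance score: number of shared lowercase words.
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(query, experience_pool, top_tasks=1):
    # Stage 1: task-level retrieval over experience summaries.
    ranked = sorted(experience_pool,
                    key=lambda e: overlap(query, e["summary"]),
                    reverse=True)
    # Stage 2: fine-grained retrieval of steps inside the top tasks.
    steps = []
    for exp in ranked[:top_tasks]:
        steps.extend(exp["steps"])
    return steps

pool = [
    {"summary": "book a flight online", "steps": ["open site", "search flights"]},
    {"summary": "solve a web research question", "steps": ["search", "cross-check sources"]},
]
hits = retrieve("answer a web research question", pool)
```

The hierarchy is the point: the coarse pass keeps the fine pass from scanning every step of every past task, which is what makes a shared experience pool tractable as it grows.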
A800 and H800 at prices this low: some compute perks for the summer
机器之心· 2025-07-25 07:15
Core Viewpoint
- The article promotes a summer cash-consumption rebate campaign by Yingbo Cloud, targeting university users and aiming to boost their AI research with discounted computing services [1].

Group 1: Activity Details
- The campaign runs from now until August 31 [4].
- Users receive cash vouchers proportional to their cash consumption, under a tiered rebate structure in which spending over 10,000 yuan directly earns a 30% rebate [5][6].
- Three additional benefits are offered: registration and first-recharge vouchers, recharge bonuses, and cash-consumption bonuses, all expiring on August 31 [7].

Group 2: Pricing and Discounts
- For users who meet the consumption threshold, A800 pricing starts at 4.26 yuan per card per hour and H800 at 9.33 yuan per card per hour [2][9].
- Pricing examples show that the hourly cost decreases with higher consumption: with spending over 10,000 yuan, the A800 comes to 4.26 yuan and the H800 to 9.33 yuan [9].

Group 3: Voucher Details
- Vouchers are valid for three months, so users should plan their usage to avoid expiration [11].
- Specific recharge amounts yield different voucher values, such as 100 yuan for a 1,000-yuan recharge and 1,600 yuan for an 8,000-yuan recharge [8].

Group 4: Company Background
- Yingbo Cloud, a wholly owned subsidiary of Hongbo Co., Ltd., was established in June 2022 and focuses on providing GPU computing services and supporting AI technology development [14][15].
- The company aims to empower sectors including AIGC and university research by offering comprehensive intelligent computing services [16].
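The two recharge tiers the article quotes can be captured in a tiny lookup. Only the 1,000-yuan and 8,000-yuan tiers are given in the source; the fallback behavior for other amounts below is an assumption, not the promotion's actual rules:

```python
# Voucher values for the two recharge tiers named in the article
# (1,000 yuan -> 100-yuan voucher; 8,000 yuan -> 1,600-yuan voucher).
# Intermediate tiers are not listed in the source, so amounts between
# the published points fall back to the lower tier here -- an assumption.
def voucher(recharge_yuan):
    tiers = [(8_000, 1_600), (1_000, 100)]  # (threshold, voucher value)
    for threshold, value in tiers:
        if recharge_yuan >= threshold:
            return value
    return 0

print(voucher(8_000))  # -> 1600
```

Note that the per-yuan return grows with the recharge (100/1,000 is 10%, while 1,600/8,000 is 20%), consistent with the tiered structure the article describes.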
150PB of industrial data plus an agent revolution: Siemens ushers in a new era of AI manufacturing
机器之心· 2025-07-25 04:29
Core Viewpoint
- Siemens is at the forefront of integrating AI into industrial processes, exemplified by its Industrial Copilot and Industrial Foundation Model, which enhance automation and efficiency in manufacturing environments [9][30][65].

Group 1: Historical Context and Development
- Siemens' journey in industrial AI began in 1964 with the creation of the Zuse Graphomat Z64, marking the start of computer-generated art and the long evolution toward AI in industry [2][4].
- Over the past 60 years, Siemens has turned its Erlangen factory into a hub for more than 100 AI applications, using digital-twin technology to mirror real-world processes [6][9].

Group 2: Industrial Copilot and AI Integration
- The Industrial Copilot acts as a bridge between human language and machine operations, letting users issue natural-language commands that the system translates into actionable tasks [10][18].
- The system significantly improves efficiency, enabling engineers to generate automation code quickly and cutting development time by nearly 50% and deployment time by 30% [14][15].

Group 3: Industrial Foundation Model (IFM)
- The Industrial Foundation Model is a family of models grounded in 150PB of validated industrial data, designed to understand and operate within the constraints of industrial environments [24][28].
- Unlike general-purpose AI models, the IFM is tailored to comprehend machine language and industrial logic, making it suitable for complex manufacturing processes [25][28].

Group 4: Data and Knowledge as Competitive Advantages
- Siemens holds a unique data asset of 150PB spanning the stages of product design and manufacturing, giving it a competitive edge in AI model training [34][36].
- The company's extensive experience and industry know-how are critical for navigating the complexities of data collection, cleaning, and model deployment in industrial settings [40][41].

Group 5: Strategic Moves and Future Outlook
- Recent strategic actions include the acquisition of Altair for over $10 billion, strengthening Siemens' capabilities in industrial simulation and AI-driven optimization [67].
- Siemens is also reskilling its workforce so that employees can collaborate effectively with AI, emphasizing the importance of cultural acceptance of AI in industrial environments [62][65].
Quark and Zhejiang University open-source OmniAvatar: one image plus one audio clip generates a long video
机器之心· 2025-07-25 04:29
Recently, the Quark technical team and Zhejiang University jointly open-sourced OmniAvatar, an innovative audio-driven full-body video generation model. Given just one image and one audio clip, OmniAvatar generates the corresponding video, with markedly improved lip-sync detail and smoother full-body motion. Prompts can further give precise control over the subject's pose, emotion, scene, and other elements.

OmniAvatar is open source:

Model: https://huggingface.co/OmniAvatar/OmniAvatar-14B
Code: https://github.com/Omni-Avatar/OmniAvatar
Arxiv: https://arxiv.org/abs/2506.18866
Project Page: https://omni-avatar.github.io/

Below are some of OmniAvatar's examples in podcast, singing, interaction, and dynamic-background scenarios.

Experiments show that OmniAvatar leads across multiple dimensions, including lip sync, facial and half-body video generation, and text control, while better balancing video quality, accuracy, and aesthetics.

(Benchmark table comparing methods on FID, FVD, and Sync-C/Sync-... metrics; truncated in the source.)