机器之心
Anyone can train their own personal Agent: Shanghai Jiao Tong University open-sources a full-stack on-device Agent toolchain that beats GPT-5 in real-world scenarios!
机器之心· 2025-09-10 07:31
Open your phone and let an AI Agent automatically handle tedious tasks for you, such as ordering food delivery, booking hotels, or shopping online. This is becoming a new paradigm for smartphone interaction. Just now, a new challenger has entered the scene: a team from the IPADS Lab at Shanghai Jiao Tong University has officially open-sourced a full "toolkit" for mobile agents called MobiAgent.

APP: https://github.com/IPADS-SAI/MobiAgent/releases/download/v1.0/Mobiagent.apk

A personal agent that can autonomously handle most everyday tasks is moving from science fiction into reality. Yet the "last mile" toward truly hands-free use is not easy to travel. How to efficiently train and deploy agent models on phones has long seemed to be the preserve of a few large companies, from the acquisition of high-quality operation data ...

The complete guide to raising an Agent: three steps

To teach an AI to use a phone, it first has to see how humans operate one. MobiAgent's first core contribution is an AI-assisted agile data-collection "pipeline". In the past, preparing "textbooks" (annotated data) for AI was expensive and slow. Now, a lightweight MobiAgent tool records every human operation trajectory on the phone: taps, swipes, text input, and so on. For some simple tasks, a single recording ...
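Conceptually, one recorded trajectory in such a pipeline can be pictured as a structured log of UI actions. The schema below is purely hypothetical, invented for illustration; MobiAgent's actual data format may differ:

```python
import json

# A hypothetical schema for one recorded operation trajectory:
# each entry captures a single UI action a human performed on the phone.
trajectory = [
    {"step": 1, "action": "tap", "target": "search_box", "screen": "home"},
    {"step": 2, "action": "input", "text": "hotel near airport", "target": "search_box"},
    {"step": 3, "action": "swipe", "direction": "up", "screen": "results"},
    {"step": 4, "action": "tap", "target": "book_button", "screen": "detail"},
]

# Serialize one step the way a lightweight recorder might log it.
print(json.dumps(trajectory[0]))
print("steps recorded:", len(trajectory))
```

A log like this is cheap to collect yet sufficient for an agent model to learn action sequences from, which is the point of the agile-collection pipeline described above.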
AI making things up: is someone finally doing something about it?
机器之心· 2025-09-10 04:05
A 机器之心 report. Editors: +0, Zhang Qian. Imagine if large AI models such as ChatGPT could mark every place where they are uncertain as they generate text. Wouldn't you trust their answers far more? Last weekend, a paper released by OpenAI set the community abuzz. It systematically traced the root cause of hallucination to the reward signal: standard training and evaluation procedures reward guessing rather than rewarding a model for admitting uncertainty. Recognizing this problem and finding a targeted fix may be why GPT-5's hallucination rate dropped sharply. As large models push deeper into high-stakes domains such as medical consultation and legal advice, hallucination becomes ever more troublesome, so many researchers are working in this direction. Beyond tracing the causes of hallucination as OpenAI did, many are studying hallucination detection. Existing detection techniques, however, hit bottlenecks in practice: they typically work only on short factual queries, or require expensive external resources for verification. To address this, a new study from ETH Zurich and MATS proposes a low-cost, scalable method that identifies "hallucination tokens" in long-form content in real time, and scales to models as large as 70B parameters. Paper title: Real-Time Detection of ...
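The general idea behind token-level hallucination detection, training a cheap probe on a model's internal signals to flag unsupported tokens, can be sketched on toy data. Everything below (the two-feature token representation, the linear probe, the synthetic data) is an illustrative assumption, not the paper's actual method or features:

```python
import math
import random

random.seed(0)

# Toy stand-in for two per-token signals a detector might probe
# (e.g. a confidence score and a hidden-state projection):
# supported tokens cluster high, hallucinated tokens cluster low.
supported = [(random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)) for _ in range(200)]
hallucinated = [(random.gauss(-2.0, 1.0), random.gauss(-2.0, 1.0)) for _ in range(200)]
data = [(x, 0) for x in supported] + [(x, 1) for x in hallucinated]

# A linear probe (logistic regression) trained by plain gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.1
for _ in range(300):
    gw = [0.0, 0.0]
    gb = 0.0
    for (x1, x2), y in data:
        p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        gw[0] += (p - y) * x1
        gw[1] += (p - y) * x2
        gb += p - y
    w[0] -= lr * gw[0] / len(data)
    w[1] -= lr * gw[1] / len(data)
    b -= lr * gb / len(data)

# Flag tokens whose predicted hallucination probability exceeds 0.5.
correct = 0
for (x1, x2), y in data:
    p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
    correct += int((p > 0.5) == bool(y))
accuracy = correct / len(data)
print(f"probe accuracy on toy data: {accuracy:.2f}")
```

The appeal of this family of methods is that the probe adds almost no inference cost, which is what makes per-token, real-time flagging in long-form generation plausible.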
In the inaugural year of AI applications, this benchmark competition showcases the speed and ambition of Chinese innovation
机器之心· 2025-09-10 04:05
An original 机器之心 report. Editor: Wu Xin. A collective preview of the future of financial intelligence witnessed entrepreneurs sprinting and reflected an industry's evolution. AI in 2025 is running a dual-track marathon: at one end, the underlying large models keep evolving, far from their ceiling; at the other, applications are erupting across scenarios. a16z's latest list of the world's top 100 GenAI applications sends a clear signal: in applying AI to transform industries, Chinese players already show a globally leading edge. Meanwhile, the State Council's "AI+" action plan has added fuel to the fire: AI's reach is expanding from pilots in new productive forces to society at large, and is viewed as a core engine of future modernization. This pulse was on full display at the AFAC2025 Financial Intelligence Innovation Competition. A benchmark financial-intelligence event held for three consecutive years, it has become a gathering point for AI startup teams at home and abroad. Over the three-month season, 11 teams stood out in the startup track; their winning projects target real financial pain points, span breakthroughs in underlying technology and complex systems engineering, are highly deployable, and show notably cross-disciplinary innovation. "China's speed of application deployment leads the world," said another judge, chief of staff and board director at xcube.co, and of the Singapore FinTech Festival and GFT ...
Apple event: AirPods that measure heart rate, a Watch that plays music, and an ultra-thin iPhone Air
机器之心· 2025-09-09 23:21
A 机器之心 report. Editors: Yang Wen, Panda. At 1 a.m. Beijing time on September 10, with Tim Cook's "Good Morning", Apple's fall 2025 launch event, themed "Awe Dropping", officially kicked off. The event ran 75 minutes, with AirPods, Apple Watch, and the iPhone 17 series taking turns on stage. The most memorable selling points: earbuds that measure heart rate, a watch that plays music, and an ultra-thin iPhone Air. This year's iPhone 17 series comprises four models, priced as follows: iPhone 17 starting at $799 / ¥5,999; all of the above models open for pre-order on Friday, September 12, and are slated to ship the following Friday (September 19). As for the much-anticipated AI features, the event offered precious little. Most of the consumer-facing AI features that were mentioned, such as Visual Intelligence and live translation in iMessage and FaceTime, had already been shown at WWDC back in June, and they are not Apple innovations either: competitors such as Google and Samsung shipped similar features a year earlier. More tellingly, Apple's stock dipped half an hour before the event even began, and fell 1.48% afterward ...
A new RAG reasoning paradigm built from first principles: Ant Group's DIVER tops an authoritative benchmark
机器之心· 2025-09-09 11:46
In the current technical paradigm driven by large language models (LLMs), retrieval-augmented generation (RAG) has become a core technique for boosting a model's knowledge and mitigating hallucination. Yet existing RAG systems remain significantly limited on tasks requiring multi-step logical reasoning. The specific challenges are as follows:

To establish a rigorous evaluation framework, the research community introduced BRIGHT, the first authoritative benchmark for reasoning-intensive retrieval. It covers real queries drawn from knowledge-intensive domains such as economics, psychology, mathematics, and programming. What these queries share is that their answers cannot be obtained explicitly through conventional direct retrieval, which causes many RAG systems to fail. BRIGHT instead requires building an evidence chain through multi-step reasoning, i.e., "first principles": deriving from the root cause rather than reasoning by analogy.

Paper title: DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval
arXiv: https://arxiv.org/pdf/2508.07995
Code and model open-source link:

Surface Relevance: traditional methods such as TF-IDF/BM25 rely excessively on lexical overlap and tend to recall documents that share keywords with the query, leaving retrieval stuck at shallow text matching ...
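The surface-relevance failure mode is easy to reproduce with a toy lexical scorer. The scorer below is a crude stand-in for TF-IDF/BM25, and the query and documents are invented examples (not from the benchmark): a document that merely repeats the query's keywords outranks the one containing the reasoning-relevant answer.

```python
# Toy lexical-overlap scorer: counts distinct shared terms,
# a crude stand-in for TF-IDF/BM25-style keyword matching.
def lexical_score(query: str, doc: str) -> int:
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms)

query = "why does my pendulum swing slower on a mountain"

# Shares many keywords with the query but answers nothing.
doc_keyword_match = "pendulum swing mountain photos: slower shutter speeds for mountain pendulum clocks"
# Contains the first-principles answer but shares almost no keywords.
doc_reasoning_match = "gravitational acceleration decreases with altitude, increasing a clock's period"

s1 = lexical_score(query, doc_keyword_match)
s2 = lexical_score(query, doc_reasoning_match)
print(s1, s2)  # → 4 1: the keyword-stuffed document wins
```

This is exactly the trap BRIGHT is built to expose: the relevant document must be reached by reasoning from cause to effect, not by matching surface vocabulary.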
The new 文心 reasoning model gives us confidence
机器之心· 2025-09-09 11:46
A report by the 机器之心 editorial team. With today's large language models, the fear is not that they can't solve a task but that they talk nonsense: because hallucination exists, we instinctively distrust AI output. Just last week, OpenAI's paper "Why Language Models Hallucinate" circulated widely; the researchers argued that eliminating hallucination requires fixing the scoring mechanisms used during model training and developing entirely new techniques. In AI, though, technology moves faster than expected. As if in answer to OpenAI's research, at this morning's WAVE SUMMIT Deep Learning Developer Conference 2025, Baidu released a new model that raises trustworthiness a large step, with more accurate factuality plus marked gains in instruction following and agent capabilities. The release is 文心 X1.1, a deep-thinking model that upgrades the flagship X1 released in April. It went live at launch, free for everyone to try, and is also available to enterprise customers and developers through Baidu AI Cloud's Qianfan platform. The upgraded model targets factuality, instruction following, and agent/tool-calling ability, bringing a marked lift in overall capability. In numbers: compared with 文心 X1, X1.1 improves factuality by 34.8%, instruction following by 12.5%, and agent capability by 9.6%. That means it is more reliable when providing information and executing tasks ...
Is SFT far inferior to RL? A timeless razor principle opens the door to "lifelong learning" for large models
机器之心· 2025-09-09 11:46
Core Viewpoint - The article discusses the challenges and advancements in large models, particularly focusing on the phenomenon of catastrophic forgetting and the advantages of reinforcement learning (RL) over supervised fine-tuning (SFT) in mitigating this issue [1][3][29]. Group 1: Large Models and Their Challenges - The era of large models has arrived, becoming a core component of intelligent infrastructure supporting various applications such as language processing, visual analysis, and robotics [1]. - Most deployed large models are "static" and lack the ability for dynamic learning and self-improvement, which is essential for achieving artificial general intelligence (AGI) [2][3]. - Catastrophic forgetting occurs when models lose previously learned skills while learning new tasks, posing a significant challenge for long-term learning agents [3]. Group 2: Research Insights on Catastrophic Forgetting - Researchers have proposed various methods to address catastrophic forgetting, including regularization, experience replay, and parameter tuning [5]. - A recent study from MIT's Improbable AI Lab revealed fundamental patterns and training strategies related to forgetting in large models, gaining significant attention [6][7]. Group 3: Findings from the Study - The study compared two common post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL), finding that RL is less prone to forgetting [8][29]. - A new principle called the "forgetting law" was introduced, indicating that the KL divergence between the fine-tuned policy and the base policy is a key predictor of forgetting [10][30]. - The research demonstrated that RL maintains better retention of prior knowledge while learning new tasks compared to SFT, which often sacrifices old knowledge for new performance [15][29].
Group 4: Mechanisms and Theoretical Contributions - The study identified that the online nature of RL contributes to its KL divergence minimization, which helps retain prior knowledge [21][30]. - The authors provided a theoretical basis for RL's KL-minimizing behavior, explaining that RL naturally prefers solutions closer to the original model [24][30]. - The findings suggest that future training methods should aim to minimize KL divergence to achieve continuous learning without forgetting [31][32].
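The "forgetting law" above ties forgetting to the KL divergence between the fine-tuned and base policies. A minimal sketch of that quantity on toy next-token distributions follows; the numbers and the direction of drift are invented for illustration, and the study's actual measurement protocol is in the paper:

```python
import math

def kl_divergence(p, q):
    """Forward KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary.
base = [0.40, 0.30, 0.20, 0.10]
sft_tuned = [0.05, 0.05, 0.10, 0.80]  # drifts far from the base policy
rl_tuned = [0.30, 0.25, 0.20, 0.25]   # stays close (RL's KL-minimizing bias)

kl_sft = kl_divergence(base, sft_tuned)
kl_rl = kl_divergence(base, rl_tuned)
print(f"KL(base, SFT-tuned): {kl_sft:.3f}")
print(f"KL(base, RL-tuned):  {kl_rl:.3f}")
```

Under the forgetting law, the smaller divergence in the RL-style case is exactly what predicts less catastrophic forgetting, which motivates the paper's suggestion that future training methods should explicitly minimize KL divergence from the base model.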
DPad: a middle way for diffusion LLMs; Duke's Yiran Chen team achieves up to 61x training-free inference acceleration
机器之心· 2025-09-09 08:56
Author team: from the Duke University CEI Center. The work was completed by interns Xinhua Chen and Sitao Huang together with Dr. Cong Guo, advised by Prof. Hai Li and Prof. Yiran Chen.

Diffusion large language models (dLLMs), with parallel decoding and a distinctive global-planning ability, promise to overcome the efficiency bottleneck and planning deficits of autoregressive (AR) models. But their global planning relies on bidirectional attention over all subsequent text, which introduces severe computational redundancy and leaves the potential of existing open-source models largely untapped. Current dLLMs face a battle of routes: one keeps global planning but is extremely slow at inference ("global bidirectional attention", e.g., LLaDA); the other chases speed at the cost of planning ("block-wise bidirectional attention", e.g., Block Diffusion). How to reconcile these two routes, so a model can both keep the big picture and speed up inference, has become a growing concern in the field. Duke's Yiran Chen team took a different path: they revealed the "scratchpad mechanism" by which dLLMs achieve global planning and found it highly redundant. Based on this, they propose DPad (Diffusion Scratchpad), a training-free method that drops large numbers of ineffective suffix tokens a priori, drastically cutting computation while preserving the core planning ability, charting a middle route between the two camps. Combined with existing optimization techniques, the method achieves almost no loss ...
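As a rough sketch of the suffix-dropping idea (an assumption-laden illustration of the concept, not the team's implementation): at each decoding step, keep only a nearby window of suffix positions and drop the distant ones before attention, rather than attending to every future token.

```python
def kept_suffix_positions(current: int, seq_len: int, window: int) -> list[int]:
    """Of the suffix positions after `current`, keep only a nearby window;
    distant suffix tokens are dropped before attention
    (the kind of redundancy DPad targets)."""
    return list(range(current + 1, min(current + 1 + window, seq_len)))

seq_len = 1024
current = 8
full_suffix = seq_len - current - 1   # what fully bidirectional attention would cover
kept = kept_suffix_positions(current, seq_len, window=32)
print(len(kept), "of", full_suffix)   # → 32 of 1015 suffix positions retained
```

Even this crude pruning shows why the savings can be large: attention over the suffix shrinks from roughly a thousand positions to a few dozen, while nearby future tokens, the ones most useful for planning, are preserved.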
Silicon Valley's 996, confirmed? The AI fire is burning away Silicon Valley's weekends
机器之心· 2025-09-09 08:56
Core Viewpoint - The "996" work culture, initially seen as a phenomenon unique to Chinese tech companies, is increasingly becoming a reality in Silicon Valley, with evidence of longer working hours and changes in employee consumption patterns [2][3][9]. Group 1: Evidence of 996 in Silicon Valley - A blog post by Ara Kharazian, an economist at fintech company Ramp, highlights the increase in Saturday work hours among employees in San Francisco, reflected in their consumption trends [3][7]. - Data from Ramp shows a significant increase in dining and takeout spending on Saturdays in 2025 compared to 2024, indicating that employees are working longer hours on weekends [7][8]. - This trend is unique to San Francisco, as other major tech hubs do not show a similar increase in Saturday spending, with New York's increase being only a quarter of that in San Francisco [8][9]. Group 2: Broader Implications and Reactions - The increase in Saturday spending is not limited to tech companies but is observed across various industries in San Francisco, suggesting a widespread adoption of longer working hours [9]. - Some industry leaders express concerns that forcing employees to work long hours can lead to talent attrition, ultimately harming company progress [18][20]. - The phenomenon of "996" is contrasted with a more relaxed work culture in Europe, where the concept of "996" humorously refers to taking significant time off rather than long working hours [25][26].
Altman personally wrote a blog post praising them. Who are these two outstanding researchers?
机器之心· 2025-09-09 06:45
Core Viewpoint - OpenAI's recent advancements in AI technology, particularly with ChatGPT, are attributed to the contributions of two key researchers, Jakub Pachocki and Szymon Sidor, who have effectively combined cutting-edge research with engineering practices to solve numerous challenges [1][3][4]. Group 1: Contributions of Jakub Pachocki - Jakub Pachocki is recognized as a pivotal figure at OpenAI, serving as the Chief Scientist and leading significant projects such as the development and pre-training of GPT-4 [4][8]. - He played a crucial role in the OpenAI Five project, where AI defeated human champions in the game Dota 2, which bolstered confidence in the potential of large-scale reinforcement learning (RL) [4][8]. - Pachocki's academic background includes a focus on high-dimensional convex optimization, which is closely related to the training of modern neural networks [6][8]. Group 2: Contributions of Szymon Sidor - Szymon Sidor, who graduated from MIT, has made significant contributions to various core projects at OpenAI, including the development of large-scale RL systems and advancements in robotics [12][13]. - His early research explored the intersection of reinforcement learning and natural language processing (NLP), laying the groundwork for techniques used in aligning ChatGPT and training reasoning models [12][14]. - Sidor's involvement in the OpenAI Five project and his contributions to the GPT-4 technical report highlight his integral role in the company's advancements [13][14]. Group 3: Internal Dynamics and Leadership Changes - Following the unexpected dismissal of CEO Sam Altman, both Jakub Pachocki and Szymon Sidor, along with other key personnel, resigned in protest, which triggered a significant employee backlash [16][17]. - The internal crisis led to a restructuring of OpenAI's leadership, with Pachocki being appointed as the new Chief Scientist after Altman's return [17].