$5,000 to solve one Olympiad problem? Terence Tao warns that what AI must scale next is "cheaper"
机器之心· 2025-07-25 04:29
AI and mathematics are inseparable. 机器之心 report, 机器之心 editorial team. The development of AI depends on advances in mathematics, and AI's progress in turn depends on its ability to solve mathematical problems. At the recently concluded IMO, Google's new advanced Gemini model solved five of the six extremely difficult problems, reaching this year's gold-medal threshold (35/42) and becoming the first AI system officially recognized as a gold medalist by the Olympiad organizing committee. Terence Tao, tenured professor of mathematics at UCLA, Fields Medalist, and the Chinese-American mathematician known as the "Mozart of Math", attended this year's IMO award ceremony. He has been following the AI models competing at the IMO closely. But he also expressed a degree of concern, hoping that next year AI models can be scientifically compared and evaluated under more controlled conditions. Professor Tao argues that a student or team that might struggle to reliably earn even a bronze medal under standard exam conditions could, under certain modified formats, consistently reach gold-medal level. Therefore, absent a uniform, controlled testing methodology not self-selected by the participating teams, the performance of different AI models on competitions such as the IMO should be viewed with caution, and overly simplistic "like-for-like" comparisons should be avoided. Professor Tao's concern with how AI is developed and evaluated is long-standing. Just now, he posted on mathstodon about ...
HKUST & Beijing Humanoid propose LOVON: a new paradigm for open-world, full-domain target tracking with legged robots!
机器之心· 2025-07-25 04:29
Core Viewpoint
- The LOVON framework represents a significant advancement in the field of robotics, enabling legged robots to autonomously navigate complex, dynamic environments by integrating large language models, open-vocabulary visual detection, and precise language-motion mapping [2][5][20].

Group 1: Introduction to LOVON
- The LOVON framework addresses the challenges of long-range multi-target navigation in open environments, overcoming limitations of traditional methods that struggle with real-time visual disturbances and target loss [1][5].
- It combines the task-planning capabilities of large language models with open-vocabulary visual detection and a language-motion mapping model, allowing efficient navigation in dynamic, unstructured settings [2][5].

Group 2: Core Modules of LOVON
- LOVON integrates three core modules to create a closed loop of language, vision, and motion, enhancing the robot's navigation capabilities [9].
- The framework employs Laplacian-variance filtering to stabilize visual processing, improving the detection rate of clear frames by 25% during robot movement [11][12].
- An adaptive execution logic allows robots to respond to unexpected situations, such as target loss or external disturbances, by switching to search mode or seamlessly executing new commands [13][15].

Group 3: Performance Metrics
- In simulation environments such as GymUnreal, LOVON achieved a success rate of 1.00, significantly outperforming traditional methods, which had a success rate of 0.94 [18].
- Training is remarkably efficient, requiring only 1.5 hours compared to 360 hours for the best competing model, a 240-fold improvement [18].

Group 4: Real-World Applications
- LOVON has been successfully deployed on various legged-robot platforms, including Unitree Go2, B2, and H1-2, showcasing its plug-and-play capability without the need for extensive customization [19].
- The framework is poised to transform applications in smart homes, industrial inspection, and field research, providing robust support for diverse tasks [20][21].

Group 5: Key Features
- LOVON demonstrates exceptional open-world adaptability, enabling robots to recognize a wide range of objects in unfamiliar environments [23].
- It excels at multi-target long-range tracking, executing complex tasks smoothly and without interruption [23].
- It exhibits strong robustness in dynamic environments, maintaining stable tracking of moving targets across varied terrain [23].
- Its anti-interference capabilities allow it to quickly reacquire targets and continue tasks despite disruptions [23].
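The Laplacian-variance filtering mentioned above is a classic blur-detection heuristic: sharp frames produce strong second derivatives (high variance), while motion-blurred frames do not. The sketch below is a minimal illustration of that idea only; the kernel, threshold, and list-of-lists frame format are assumptions for the example, not LOVON's actual implementation:

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response of a 2-D grayscale image.

    High variance -> sharp frame; low variance -> blurred frame.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def keep_sharp_frames(frames, threshold=50.0):
    """Drop frames whose Laplacian variance falls below an (illustrative) threshold."""
    return [f for f in frames if laplacian_variance(f) >= threshold]

# A checkerboard (sharp edges everywhere) vs. a flat gray frame (no edges at all).
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
blurry = [[128 for _ in range(8)] for _ in range(8)]
print(keep_sharp_frames([sharp, blurry]) == [sharp])  # True
```

In practice this is usually done with a library convolution (e.g. an OpenCV Laplacian followed by `.var()`), but the pure-Python version makes the statistic itself explicit.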
Chinese founding team that left Meta raises an $8M seed round to build a visual-AI memory brain
机器之心· 2025-07-25 02:03
机器之心 report, 机器之心 editorial team. Dr. Shawn Shen, co-founder and CEO (left); Ben (Enmin) Zhou, co-founder and CTO (right). Memories.ai, an AI research lab founded by a team of former top scientists from Meta Reality Labs, has officially announced the completion of an $8 million seed round. The round was led by Susa Ventures, with participation from Samsung Next, Fusion Fund, and other well-known institutions. The Memories.ai team has already achieved a major breakthrough in the large-model field, taking aim at AI systems' "memory loss" problem and building a powerful "memory brain" for visual models. "The strongest brain": as everyone knows, large models have textbook "goldfish memory". For example, most AI systems lack any memory of earlier frames and struggle to understand how one moment relates to the next. As the old joke goes, "memory is a first-in, first-out stack", except the large model's stack never seems big enough. This "goldfish memory" limits their usefulness in applications that require deep understanding of scenes and dynamic change, and they perform especially poorly on video-heavy tasks. To solve this problem once and for all, Memories.ai, through its core innovation, the Large Visual Memory Model (LVMM), ...
Peking University & 灵初 release a comprehensive survey of embodied VLA! One article to see VLA technical roadmaps and future trends clearly
机器之心· 2025-07-25 02:03
Core Insights
- The article discusses the rapid advancements in Vision-Language-Action (VLA) models, which are capable of extending intelligence from the digital realm to physical tasks, particularly in robotics [1][9].
- A unified framework for understanding VLA models is proposed, centered on action tokenization, which categorizes eight main types of action tokens and outlines their capabilities and future trends [2][10].

VLA Unified Framework and the Action-Token Perspective
- VLA models rely on at least one visual or language foundation model to generate action outputs based on visual and language inputs, aiming to execute specific tasks in the physical world [9][11].
- The framework categorizes action tokens into eight types: language description, code, affordance, trajectory, goal state, latent representation, raw action, and reasoning [10][16].

Action Token Analysis
- **Language Description**: Describes actions in natural language, divided into the sub-task level (language plan) and the atomic-action level (language motion) [16][20].
- **Code**: Represents task logic in code form, allowing efficient communication between humans and robots, but faces challenges related to API dependencies and execution rigidity [22][23].
- **Affordance**: A spatial representation indicating how objects can be interacted with, emphasizing semantic clarity and adaptability [25][26].
- **Trajectory**: Represents continuous spatial states over time, utilizing video data to enrich training-data sources [29][30].
- **Goal State**: A visual representation of the expected outcome, aiding action planning and execution [34][35].
- **Latent Representation**: Encodes action-related information through large-scale data pre-training, enhancing training efficiency and generalization [36][37].
- **Raw Action**: Directly executable low-level control commands for robots, showing potential for scalability similar to large language models [38][39].
- **Reasoning**: Expresses the thought process behind actions, enhancing model interpretability and decision-making [42][45].

Data Resources in VLA Models
- The article organizes data resources into a pyramid: web data and human videos at the base, synthetic and simulation data in the middle, and real robot data at the top, each contributing uniquely to model performance and generalization [47][48][49].

Conclusion
- VLA models are positioned as a key pathway to embodied intelligence, with ongoing research focusing on action-token design, open challenges and future directions, and the practical application of VLA technology in real-world scenarios [51].
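The eight-way taxonomy above is essentially a closed set of categories, which is naturally expressed as an enumeration. The sketch below is a hypothetical illustration of how a codebase might tag training examples by action-token type; neither the enum nor the helper comes from the survey itself:

```python
from enum import Enum

class ActionToken(Enum):
    """The eight action-token categories from the survey's taxonomy."""
    LANGUAGE_DESCRIPTION = "language description"    # sub-task plans / language motions
    CODE = "code"                                    # executable task logic
    AFFORDANCE = "affordance"                        # spatial interaction cues
    TRAJECTORY = "trajectory"                        # continuous spatial states over time
    GOAL_STATE = "goal state"                        # visual depiction of the outcome
    LATENT_REPRESENTATION = "latent representation"  # pretrained action embeddings
    RAW_ACTION = "raw action"                        # low-level control commands
    REASONING = "reasoning"                          # the thought process behind actions

def tag_example(example: dict, token: ActionToken) -> dict:
    """Attach a token-type tag to a training example (hypothetical helper)."""
    return {**example, "action_token": token.name}

print(len(ActionToken))  # 8
```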
MeanFlow scores another win: Peking University proposes MP1, a new paradigm for robot learning that achieves SOTA in both speed and success rate
机器之心· 2025-07-24 09:33
About the authors: Sheng Juyi, PhD student at Peking University, researching robot manipulation skill learning; Wang Ziyi and Li Peiming, master's students at Peking University, researching video understanding and analysis; Liu Yong, professor at the College of Control Science and Engineering, Zhejiang University, researching autonomous robots and intelligent systems; Liu Mengyuan, assistant professor at Peking University Shenzhen Graduate School, researching human behavior understanding and robot skill learning. In current VLA models, the "A", the action-generation model, determines both the quality and the speed of action generation. Specifically, generative models face a fundamental trade-off between inference speed and task success rate. Diffusion models (such as Diffusion Policy and DP3) generate high-quality action sequences through multi-step iteration, but inference is slow and struggles to meet real-time control requirements; flow-based models (such as FlowPolicy) offer fast inference but require extra architectural constraints or a consistency loss to guarantee valid trajectories, which adds design complexity and may limit performance and generalization. In addition, robot manipulation faces another challenge: data-efficient few-shot generalization. Standard imitation-learning policies are prone to "feature collapse", mistakenly mapping key states that require different actions to similar latent ...
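The speed/quality trade-off described here can be seen in miniature: sampling from a flow (or diffusion) model amounts to integrating an ODE, and each integration step costs one network evaluation, so fewer steps mean faster but cruder sampling. The toy below integrates a known ODE with Euler steps purely to illustrate that trade-off; it is not MP1, MeanFlow, or any of the cited policies:

```python
import math

def euler_sample(x0, velocity, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with `steps` Euler steps.

    Each step stands in for one network evaluation in a flow/diffusion sampler:
    more steps -> a more accurate trajectory, but proportionally slower inference.
    """
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * velocity(x, i * dt)
    return x

# Toy dynamics dx/dt = -x, whose exact solution at t=1 is x0 * e^{-1}.
v = lambda x, t: -x
exact = math.exp(-1.0)
one_step = euler_sample(1.0, v, steps=1)      # fastest, crudest estimate
many_steps = euler_sample(1.0, v, steps=100)  # 100x the cost, far more accurate
print(abs(many_steps - exact) < abs(one_step - exact))  # True
```

One-step approaches like MeanFlow aim to collapse this loop to a single evaluation without paying the accuracy penalty that naive one-step integration incurs.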
3 days to go! Tune in to the livestream for the 2025 WAIC Yunfan Award ceremony & Friends Night!
机器之心· 2025-07-24 09:33
The 2025 WAIC Yunfan Award ceremony and Yunfan Young Friends Night officially opens at 18:30 on July 27! The event invites 150 Yunfan Award winners, conveners, judges, and candidates, along with global AI technology leaders, rising academic stars, and top investors, aiming to create the young-talent stage with the highest concentration of top global AI talent during WAIC.

Global AI rising stars compete for the Yunfan Award
- Witness the moment of glory: the 2025 WAIC Yunfan Award winners will be announced;
- Hear the pioneers speak: experience the wisdom, passion, and vision of the new generation of AI leaders;
- Catch the frontier pulse: capture the most noteworthy new directions and new forces in AI from the live atmosphere.

The 2025 WAIC Yunfan Award gathers top global AI academic and industry forces. Its conveners are Yao Qizhi (Andrew Yao), dean of the Institute for Interdisciplinary Information Sciences and the School of Artificial Intelligence at Tsinghua University and president of the Shanghai Qi Zhi Institute; Zhang Yaqin, chair professor of intelligent science at Tsinghua University and dean of its Institute for AI Industry Research; and Zhou Bowen, director and chief scientist of the Shanghai AI Laboratory and Huiyan Chair Professor at Tsinghua University, who together seek out top young AI talent truly leading the transformation. With its broad international influence, the award attracts top young AI talent worldwide to compete!

Toward the future of young AI forces. Lock in the livestream, witness history, and join the celebration. The award ceremony will be livestreamed globally! You will: The WAIC Yunfan Award, in 2 ...
创智 breakthrough: AI autonomously discovers 106 neural network architectures that surpass human designs, a first
机器之心· 2025-07-24 06:50
Is scientific discovery still a human monopoly? While the world marvels at AI reaching gold-medal level in math competitions, a more far-reaching breakthrough is quietly unfolding. Unlike solving IMO problems, which are closed-ended, genuine scientific discovery is an open-ended, long-horizon cognitive process: posing original questions, designing experiments, observing patterns, forming scientific hypotheses, and then approaching the truth through continual trial and iteration. The complexity of this process far exceeds any standardized test; what it demands is not raw computation but genuine scientific creativity. The AI super-intelligence system released today by a research team led by 创智学院 demonstrates for the first time that AI is capable of end-to-end scientific discovery: operating fully autonomously, the system discovered 106 neural network architectures that surpass human designs (beating strong baselines such as Mamba2 and Gated DeltaNet on multiple benchmarks). More striking still, it offers preliminary evidence that scientific breakthroughs can be industrially mass-produced, much like training models. This marks our entry into a new era of Long-Horizon Superintelligence: scientific discovery has entered the age of scaling laws! From math gold medals to scientific discovery: a generational leap in cognitive complexity. One of the most striking recent achievements in AI is its breakthrough performance in math competitions. Google and other research ...
DeepRare launches: the world's first evidence-traceable agentic diagnostic system, taking on medicine's "Last Exam" problem
机器之心· 2025-07-24 06:50
Core Viewpoint
- The article discusses the challenges of diagnosing rare diseases and introduces DeepRare, an innovative AI-driven diagnostic system designed to improve the accuracy and efficiency of rare-disease diagnosis [1][4][40].

Group 1: Rare Disease Challenges
- Over 350 million people globally are affected by rare diseases, with more than 7,000 types identified, 80% of which are genetic [1].
- Patients often face significant delays in diagnosis, averaging over 5 years, with more than 7 consultations and 3 misdiagnoses, leading to a misdiagnosis rate of 40%-50% [1].
- The high heterogeneity of symptoms and fragmented information complicate the diagnostic process, making traditional AI models inadequate [2].

Group 2: DeepRare System Overview
- DeepRare is the world's first reasoning-based intelligent diagnostic system for rare diseases, developed by Shanghai Jiao Tong University in collaboration with several institutions [4][6].
- The system uses a multi-agent architecture combined with large language models to simulate the diagnostic reasoning process of clinical doctors [6].
- It supports multi-modal inputs, including free text, structured phenotype data, and genomic data, allowing adaptive responses to various input scenarios [8].

Group 3: Diagnostic Workflow and Performance
- The diagnostic process in DeepRare consists of two main stages, gene analysis and knowledge matching, drawing on over 40 medical tools and databases for comprehensive reasoning [11][13].
- The system has shown significant performance improvements, with an average Recall@1 of 57.18%, surpassing existing methods by 23.79 percentage points [23].
- On real clinical cases, DeepRare achieved a Recall@1 of 70.6%, significantly outperforming Exomiser [33].

Group 4: Case Study and Impact
- A case study highlighted the successful diagnosis of a 20-month-old child with Prader-Willi syndrome using DeepRare, a condition that had previously gone undiagnosed [18][19].
- The system's ability to provide accurate diagnostic suggestions not only aids clinical decision-making but also offers hope to families facing undiagnosed conditions [20].

Group 5: Future Prospects
- DeepRare represents a paradigm shift in rare-disease diagnosis, with potential applications in research to accelerate the interpretation of ambiguous variants and expand the set of treatable rare diseases [40][41].
- The online DeepRare platform has been launched, offering structured input and diagnostic suggestions for clinical doctors [38].
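Recall@1, the headline metric here, is simply the fraction of cases whose true diagnosis is ranked first by the system (and Recall@k generalizes this to the top k suggestions). A generic sketch follows; the disease names and data are illustrative toys, not DeepRare's evaluation code:

```python
def recall_at_k(ranked_predictions, truths, k=1):
    """Fraction of cases whose ground-truth label appears in the top-k ranked predictions."""
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)

# Illustrative toy data: ranked differential diagnoses for four cases.
preds = [
    ["Prader-Willi", "Angelman"],
    ["Marfan", "Ehlers-Danlos"],
    ["Gaucher", "Fabry"],
    ["Pompe", "Gaucher"],
]
truths = ["Prader-Willi", "Ehlers-Danlos", "Gaucher", "Gaucher"]
print(recall_at_k(preds, truths, k=1))  # 0.5
print(recall_at_k(preds, truths, k=2))  # 1.0
```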
Vibe Coding is booming, and YouWare breaks out with a "community + product" approach
机器之心· 2025-07-24 04:08
Core Viewpoint
- The article discusses the emergence of "Vibe Coding", a new paradigm in AI-driven creative work that allows users to interact with AI in a more intuitive and collaborative manner, moving away from traditional programming methods [1][5].

Group 1: Vibe Coding Concept
- Vibe Coding was introduced by Andrej Karpathy in February 2025, emphasizing a creative process in which users can engage with AI without needing to understand code, focusing instead on ideas and concepts [2].
- The approach redefines the relationship between humans and machines from a directive model to a collaborative one, likening it to the dynamic between a director and a cinematographer [5].

Group 2: Industry Developments
- The acquisition of Windsurf by Google DeepMind for $2.4 billion highlights the competition for AI coding talent and technology, underscoring the growing importance of Vibe Coding in the industry [1].
- YouWare, a platform designed to facilitate Vibe Coding, aims to build a community where users can easily share and develop AI-driven applications without extensive technical knowledge [9][29].

Group 3: User Experience and Community Engagement
- YouWare's design focuses on intuitive user experiences, allowing users to generate shareable content through simple natural-language prompts without writing any code [15][20].
- The platform has attracted 100,000 creative Vibe Coders and accumulated 300,000 projects within four months, indicating strong community engagement [29].

Group 4: Product Features and Innovations
- YouWare's AI App Generator enables users to create applications with AI capabilities from a single prompt, streamlining the process and eliminating the need for complex API configurations [17][28].
- Features like the "Boost" function enhance user-generated content, making it more visually appealing, while emoji-based feedback fosters a friendly interaction atmosphere [23][24].

Group 5: Future Trends in AI
- The article suggests a shift in focus from merely developing AI models to applying AI technology effectively, highlighting the need for a new generation of AI product managers who can define the right problems for AI to solve [49][50].
- The ongoing AI APP Challenge by YouWare encourages community participation and innovation, reflecting the platform's commitment to fostering a vibrant creative ecosystem [52].
Free dinner in Vienna! Don't miss this gathering during ACL 2025!
机器之心· 2025-07-24 04:08
Core Insights
- The AI field continues to develop rapidly, with new research emerging, particularly in video generation and autonomous agents, driving significant advances in state-of-the-art (SOTA) technologies [2][3].

Event Overview
- The ACL 2025 conference is a major platform for researchers and industry professionals in natural language processing to share the latest findings and discuss future trends [3].
- A special event, the "Yunfan・ACL 2025 AI Talent Meetup", has been organized to facilitate informal discussions on cutting-edge technologies and talent networking, co-hosted by several prominent organizations [4].

Meetup Details
- The meetup is scheduled for July 30, 2025, from 16:00 to 20:30 in Vienna, Austria, with an expected attendance of 250 participants [6].
- The agenda includes sessions for young scholars, talent showcases, and networking dinners, aimed at discussing key issues in technology and its applications [6].
- There will also be opportunities for job seekers and recent graduates to engage with companies through poster presentations and recruitment discussions [7].

Previous Events
- The organizing company has successfully hosted several similar events, including the "Yunfan・ICLR 2025 AI Talent Meetup" and the "CVPR 2025 Paper Sharing Session", enhancing brand influence and talent acquisition for its partners [10].