机器之心

Meta Superintelligence Labs Reorganized into Four Departments, Some Executives to Depart
机器之心· 2025-08-20 00:15
Machine Heart report. Editor: Panda

According to Bloomberg, Meta will reorganize its Meta Superintelligence Labs (MSL). Specifically, MSL and Meta's earlier AI units such as FAIR will be restructured into four new AI-focused departments. Meta Chief AI Officer Alexandr Wang said in an internal memo that the superintelligence lab will be split into smaller units focused on AI research, infrastructure, hardware, product integration, and the company's long-term superintelligence goals. He wrote: "Superintelligence is on the way, and to take it seriously we need to organize around the key areas that will get us there."

Notably, the name of Meta Chief AI Scientist and founding FAIR head Yann LeCun does not appear in these reports. The aggressive poaching and reshuffling have also created a degree of turmoil inside Meta; see our earlier report "Yes, LeCun Will Report to 28-Year-Old Alexandr Wang! Some Exclusive Inside Details on Meta's New AI Team".

According to another New York Times report citing people familiar with the matter, some executives are expected to leave after the reorganization. Meta is also reportedly considering integrating third-party AI models into its products, marking ...
DeepSeek Open-Sources a New Base Model, but It's Not V4: It's V3.1-Base
机器之心· 2025-08-20 00:15
Machine Heart report. Editor: Panda

Last night, after DeepSeek announced in its user groups that "the DeepSeek online model has been upgraded to V3.1, with the context length extended to 128k" and updated the UI (removing the R1 label next to DeepThink), the company released a new model, DeepSeek-V3.1-Base, on Hugging Face.

Model page: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base

As the name suggests, this is the latest base model in the DeepSeek-V3 series. As for why it is named V3.1 rather than following the earlier convention of V3 plus a four-digit date (e.g., V3-0324), the community has offered plenty of speculation, but DeepSeek has not given an official explanation. Consistent with the company's usual practice, the model ships first and the write-up and promotion come later.

The model drew broad attention from the AI community as soon as it was released, climbing to 4th place on the Hugging Face trending models list within just a few hours.

In terms of technical specs, DeepSeek-V3.1-Base differs little from DeepSeek-V3: the same parameter count, using ...
ICCV 2025 | RobustSplat: Decoupling Densification and Dynamics for Transient-Robust 3DGS 3D Reconstruction
机器之心· 2025-08-19 09:45
Core Viewpoint
- The article discusses the RobustSplat method, which addresses the challenges of 3D Gaussian Splatting (3DGS) in rendering dynamic objects by introducing a delayed Gaussian growth strategy and a scale-cascade mask guidance method to reduce rendering artifacts caused by transient objects [2][21].

Research Motivation
- The motivation stems from the dual role of Gaussian densification in 3DGS: it enhances scene detail but also risks overfitting dynamic areas, leading to artifacts and scene distortion. The goal is to balance static structure representation against suppression of dynamic interference [6][8].

Methodology
- **Transient Mask Estimation**: Utilizes a mask MLP with two linear layers to output pixel-wise transient masks, distinguishing transient from static regions [9].
- **Feature Selection**: DINOv2 features are chosen for their balance of semantic consistency, noise resistance, and computational efficiency, outperforming alternatives such as Stable Diffusion and SAM features [10].
- **Supervision Design**: Combines an image residual loss with a feature cosine similarity loss to optimize the mask MLP, improving recognition of dynamic areas (a hedged sketch of this combined supervision appears below, after this section) [12].
- **Delayed Gaussian Growth Strategy**: The core strategy postpones densification so that the static scene structure is optimized first, reducing the risk of misclassifying static areas as transient [13].
- **Scale-Cascade Mask Guidance**: Initially estimates transient masks from low-resolution features, then transitions to high-resolution supervision for more accurate mask predictions [14].

Experimental Results
- Experiments on the NeRF On-the-go and RobustNeRF datasets show that RobustSplat outperforms baselines such as 3DGS, SpotLessSplats, and WildGaussians across metrics including PSNR, SSIM, and LPIPS [16][21].

Summary
- RobustSplat effectively reduces rendering artifacts caused by transient objects through these strategies, demonstrating superior performance in complex scenes with dynamic elements while preserving detail [19][21].
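To make the supervision design above concrete, here is a minimal PyTorch-style sketch of a two-layer mask MLP trained with a combined image-residual and feature-cosine-similarity objective. The layer sizes, the pseudo-labeling heuristic, and the loss weighting are illustrative assumptions for this sketch, not RobustSplat's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskMLP(nn.Module):
    """Two-linear-layer MLP mapping per-pixel features to a transient probability.
    Hidden size and sigmoid output are assumptions, not the paper's exact config."""
    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (H*W, feat_dim) per-pixel features (e.g., DINOv2); output in [0, 1]
        return torch.sigmoid(self.net(feats)).squeeze(-1)

def mask_supervision_loss(mask, rendered, target, feat_render, feat_target, lambda_feat=0.5):
    """Combine an image-residual term with a feature cosine-similarity term.
    Intuition per the summary: pixels with large photometric residual or low
    feature similarity between render and capture are likely transient, so the
    mask should be high there and low elsewhere. The thresholding pseudo-label
    and weighting below are crude assumptions."""
    residual = (rendered - target).abs().mean(dim=-1)                 # (H*W,) photometric error
    cos_sim = F.cosine_similarity(feat_render, feat_target, dim=-1)   # (H*W,) feature agreement
    loss_img = F.binary_cross_entropy(mask, (residual > residual.mean()).float())
    loss_feat = (mask * cos_sim.clamp(min=0)).mean()                  # discourage masking where features agree
    return loss_img + lambda_feat * loss_feat

# Toy usage on random tensors, just to show the expected shapes.
mlp = MaskMLP(feat_dim=384)
feats = torch.rand(1024, 384)
mask = mlp(feats)
loss = mask_supervision_loss(mask, torch.rand(1024, 3), torch.rand(1024, 3),
                             torch.rand(1024, 384), torch.rand(1024, 384))
```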
Father of Reinforcement Learning Richard Sutton's Latest Talk Unveils the OaK Architecture: An Eight-Step Vision Toward Superintelligence
机器之心· 2025-08-19 09:45
Core Viewpoint
- Richard Sutton, the father of reinforcement learning and 2024 ACM Turing Award winner, presented a vision for achieving artificial general intelligence (AGI) and superintelligence through the OaK architecture, which is grounded in experiential learning and lays out a clear roadmap for AI development [2][4].

Group 1: OaK Architecture Overview
- The OaK architecture is not a complete algorithm but a vision that breaks the goal down into eight necessary steps, highlighting current gaps and potential development paths [2][6].
- Sutton emphasizes the importance of a simple, general agent architecture that learns from experience rather than relying on pre-defined domain knowledge [10][13].

Group 2: Key Concepts in OaK Architecture
- The architecture centers on "open-ended abstraction," allowing the agent to continuously develop its conceptual framework and understanding of the world without being limited by predefined knowledge [13][28].
- Sutton distinguishes two phases, design time (before deployment) and runtime (during operation), advocating for experience-based learning at runtime so the agent can adapt to the world's complexity [18][20].

Group 3: Learning and Decision-Making
- The architecture proposes that agents should learn solely from runtime experience, since the complexity of the world cannot be fully anticipated or pre-defined [30][31].
- Sutton argues that an agent's knowledge is inherently approximate given the world's vast complexity, which makes runtime learning and planning essential [37][38].

Group 4: Reinforcement Learning and the Reward Hypothesis
- The reinforcement learning framework is defined by the goal of maximizing a scalar reward signal, which is central to the agent's learning process [42][47].
- Sutton posits that even a simple reward signal can give rise to intelligent behavior in a sufficiently complex environment [51].

Group 5: Common Agent Model
- The common model of an intelligent agent includes perception, a value function, a reactive policy, and a transition model, which are interconnected to support learning and planning (a toy sketch of this loop appears below, after this section) [58][61].
- This model serves as the foundation of the OaK architecture, which extends it with higher-level abstractions and multiple value functions for different subproblems [67][72].

Group 6: Implementation Steps of OaK Architecture
- Implementing the OaK architecture involves eight parallel steps, including learning a policy to maximize reward, generating new state features, and constructing the corresponding subproblems [82][85].
- Each step depends on achieving reliable continual deep learning and on the ability to generate and evaluate new features [86][90].

Group 7: Future Directions and Challenges
- Sutton acknowledges that while some steps of the OaK architecture are feasible today, significant challenges remain, particularly achieving reliable continual learning in nonlinear deep networks [89][96].
- The architecture aims to create a system that evolves through an open-ended cycle of exploration and learning, with the ultimate goal of strengthening the agent's ability to abstract and generalize from experience [160].
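As a rough illustration of the common agent model and the scalar-reward objective described above, here is a toy, runnable Python sketch: perception maps observations to an agent state, a reactive policy selects actions, a value function is updated from runtime experience, and the transition model used for one-step lookahead is simply assumed perfect. The 6-state chain environment, TD(0) update, and epsilon-greedy policy are illustrative assumptions, not Sutton's specification.

```python
import random

ACTIONS = [0, 1]

def env_step(state, action):
    """Toy environment: action 1 moves toward state 5, which pays a reward of 1."""
    next_state = min(state + 1, 5) if action == 1 else max(state - 1, 0)
    return next_state, (1.0 if next_state == 5 else 0.0)   # scalar reward signal

model = env_step                       # assumed-perfect transition model used for planning
value = {s: 0.0 for s in range(6)}     # value function V(s)

def perceive(observation):
    return observation                 # identity perception in this toy

def policy(state, eps=0.1):
    """Reactive policy: epsilon-greedy over a one-step lookahead with the model."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    def lookahead(a):
        s2, r = model(state, a)
        return r + value[s2]
    return max(ACTIONS, key=lookahead)

def run_episode(steps=20, alpha=0.1, gamma=0.9):
    state, total = perceive(0), 0.0
    for _ in range(steps):
        action = policy(state)
        next_state, reward = env_step(state, action)
        # TD(0): learn the value function from runtime experience only.
        value[state] += alpha * (reward + gamma * value[next_state] - value[state])
        state, total = next_state, total + reward
    return total

print(run_episode())   # cumulative reward collected in one 20-step episode
```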
X-SAM: From "Segment Anything" to "Any Segmentation": A Unified Multimodal Large Model for Image Segmentation, Reaching SoTA on 20+ Image Segmentation Datasets
机器之心· 2025-08-19 06:33
Core Viewpoint
- The article discusses the development of X-SAM, a unified multimodal large language model for image segmentation that extends the capabilities of existing models with pixel-level understanding and interaction through visual prompts [4][26].

Background and Motivation
- The Segment Anything Model (SAM) excels at dense segmentation mask generation but is limited by its reliance on a single input mode, which hinders its applicability across varied segmentation tasks [4].
- Multimodal large language models (MLLMs) have shown promise on tasks such as image description and visual question answering, but they are fundamentally restricted in handling pixel-level visual tasks, limiting progress toward generalized models [4].

Method Design
- X-SAM introduces a unified framework that extends the segmentation paradigm from "segment anything" to "any segmentation" by incorporating visual grounded segmentation (VGS) tasks [4].
- The model employs a dual-projector architecture to strengthen image understanding and a segmentation connector that provides rich multi-scale information for segmentation [11][12].
- X-SAM uses a three-stage progressive training strategy, consisting of segmentor fine-tuning, alignment pre-training, and mixed fine-tuning, to optimize performance across diverse image segmentation tasks (a hypothetical sketch of this schedule appears below, after this section) [16][22].

Experimental Results
- X-SAM has been evaluated on more than 20 segmentation datasets, achieving state-of-the-art performance across seven different image segmentation tasks [19].
- Its metrics show significant improvements over existing models on a range of segmentation tasks, demonstrating versatility and effectiveness [20][21].

Summary and Outlook
- X-SAM represents a significant advance in image segmentation and lays a foundation for future work on video segmentation and the integration of temporal information [26].
- Future directions include extending the model to video segmentation tasks, potentially advancing video understanding technologies [26].
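Below is a hypothetical sketch of the three-stage progressive training schedule named above (segmentor fine-tuning, alignment pre-training, mixed fine-tuning). The summary only names the stages; which modules train in each stage and what data each stage uses are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Stage:
    name: str
    trainable_modules: List[str]   # assumed; everything else stays frozen
    data_mixture: str              # assumed

SCHEDULE = [
    Stage("segmentor_finetune", ["segmentation_decoder"], "segmentation datasets"),
    Stage("alignment_pretrain", ["dual_projectors"], "image-text alignment data"),
    Stage("mixed_finetune",
          ["llm", "dual_projectors", "segmentation_connector", "segmentation_decoder"],
          "mixture of 20+ segmentation datasets and dialogue data"),
]

def run_schedule(train_one_stage):
    """train_one_stage(stage) encapsulates the per-stage optimization loop."""
    for stage in SCHEDULE:
        print(f"stage={stage.name}, trainable={stage.trainable_modules}, data={stage.data_mixture}")
        train_one_stage(stage)

if __name__ == "__main__":
    run_schedule(lambda stage: None)   # stub: replace with a real per-stage training loop
```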
Seven Years On, OpenAI Officially Compares Five Generations of GPT, Yet Netizens Miss the "Wild" First Generation
机器之心· 2025-08-19 06:33
Core Viewpoint
- The article discusses the evolution of the GPT series from GPT-1 to GPT-5, highlighting significant improvements in the models' ability to understand complex queries and generate coherent responses [2][5][49].

Group 1: Evolution of GPT Models
- GPT-1 was characterized by awkward, nonsensical responses, reflecting a limited grasp of complex questions [2][11][12].
- GPT-5, by contrast, produces well-structured, coherent, contextually relevant answers, a substantial leap in performance over its predecessors [5][20][49].
- The internal experiences of OpenAI staff reflect how profoundly the models' capabilities have changed over seven years of development [6][49].

Group 2: Specific Comparisons
- When asked about the feasibility of annual full-body MRI scans for cancer detection, GPT-1's response was illogical and confusing, while GPT-2 offered a slightly better but still unhelpful answer [11][12].
- GPT-4 gave a more reliable response but lacked a personal touch, whereas GPT-5 not only answered the question effectively but also provided emotional value, resembling a conversation with a knowledgeable doctor [20][21][49].
- The article emphasizes that GPT-5's responses are not only accurate but also considerate, indicating a shift toward more human-like interaction [20][21][49].

Group 3: User Reactions
- User opinions on the different GPT models vary, with some expressing nostalgia for the wild, unpredictable character of GPT-1 and suggesting it had a unique appeal [50][51].
- Some comments suggest that GPT-1 felt more like a "true AGI" than its successors, pointing to a divergence in user preferences [53][54].
Sketch3DVE: Sketch-Driven Free Editing of 3D Scene Videos
机器之心· 2025-08-19 02:43
Core Viewpoint
- The article discusses the development of Sketch3DVE, a novel method for 3D scene video editing that lets users manipulate videos with simple sketches, enhancing creativity and personalization in video content creation [3][22].

Part 1: Background
- Recent advances in video generation models have significantly improved text-to-video and image-to-video generation, with growing attention to precise control over camera trajectories because of its important application prospects [6].
- Existing approaches fall into two categories: one feeds camera parameters directly into the model, while the other builds explicit 3D representations from a single image and renders novel-view images [8][9].
- Despite these advances, editing real videos with significant camera motion remains challenging, because editing must preserve the original motion patterns and local features while synthesizing new content [8][9].

Part 2: Algorithm Principles
- The user starts from the first frame of a 3D scene video, marks the editing area with a mask, and draws a sketch specifying the geometry of the new object [12].
- The system uses the MagicQuill image editing algorithm to process the first frame and produce the edited result, and uses the DUSt3R algorithm for 3D reconstruction to analyze the full input video [13].
- A 3D mask propagation algorithm transfers the mask from the first frame to subsequent frames, keeping the edited region consistent across viewpoints [14].
- A final video generation model fuses the edited image, the multi-view video, and the original input video to produce a scene-edited video with precise 3D consistency (a hypothetical end-to-end sketch of this pipeline appears below, after this section) [14].

Part 3: Effect Demonstration
- The method produces high-quality 3D scene video edits, supporting operations such as adding, removing, and replacing objects while maintaining good 3D consistency [16].
- It can handle complex cases involving shadows and reflections, producing reasonable editing results thanks to training on real video datasets [17].
- Users can also edit the first frame with image completion methods, demonstrating the system's versatility in generating realistic 3D scene video edits [19].
- Sketch3DVE offers an effective alternative to traditional model-insertion workflows, enabling personalized 3D object generation and high-fidelity scene video editing without requiring extensive expertise [22].
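As a data-flow illustration of the pipeline described above, here is a toy Python sketch. Every function is a placeholder standing in for a component named in the summary (MagicQuill sketch-guided editing, DUSt3R reconstruction, 3D mask propagation, the video generation model); none of them are the real APIs of those projects, and the placeholder logic is deliberately trivial.

```python
import numpy as np

def edit_first_frame(frame: np.ndarray, mask: np.ndarray, sketch: np.ndarray) -> np.ndarray:
    """Placeholder for MagicQuill-style editing: paint sketch content into the masked region."""
    edited = frame.copy()
    edited[mask > 0] = sketch[mask > 0]
    return edited

def reconstruct_scene(frames: list) -> dict:
    """Placeholder for DUSt3R-style reconstruction: pretend depth is constant."""
    return {"depth": [np.ones(f.shape[:2]) for f in frames]}

def propagate_mask_3d(mask: np.ndarray, scene: dict) -> list:
    """Placeholder 3D mask propagation: reuse the first-frame mask for every frame.
    The real method reprojects the mask through the reconstructed geometry."""
    return [mask.copy() for _ in scene["depth"]]

def generate_edited_video(edited_frame, masks, scene, frames) -> list:
    """Placeholder for the video generation model: composite the edit into each frame."""
    out = []
    for frame, m in zip(frames, masks):
        f = frame.copy()
        f[m > 0] = edited_frame[m > 0]
        out.append(f)
    return out

def sketch3dve(frames, mask, sketch):
    edited = edit_first_frame(frames[0], mask, sketch)          # 1. edit frame 0 from the user's sketch
    scene = reconstruct_scene(frames)                           # 2. recover 3D structure of the clip
    masks = propagate_mask_3d(mask, scene)                      # 3. carry the edit region to all frames
    return generate_edited_video(edited, masks, scene, frames)  # 4. render the 3D-consistent edited video

# Toy usage on random data, just to show the data flow.
frames = [np.random.rand(64, 64, 3) for _ in range(8)]
mask = np.zeros((64, 64), dtype=np.uint8); mask[20:40, 20:40] = 1
sketch = np.random.rand(64, 64, 3)
video = sketch3dve(frames, mask, sketch)
```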
A Tsinghua IIIS Professor Walks You Through Training Agents with Reinforcement Learning
机器之心· 2025-08-19 02:43
In the era of large-model agents, one of the most important techniques is training general-purpose agents with agentic reinforcement learning (Agentic RL). ASearcher is the AReaL team's first Agentic RL project, building an end-to-end search agent on top of AReaL's fully asynchronous Agentic RL.

AReaL allows an agent to perform up to 128 complex environment interactions; at the same time, its minimalist code design lets users implement complex long-horizon tool use within a single file.

In this session, Professor Yi Wu will lead core members of the AReaL team and the ASearcher project, using a multi-turn search agent as the example, to walk everyone through ultra-fast Agentic RL training with minimal code.

Talk topic: A Tsinghua IIIS professor walks you through training agents with reinforcement learning

Talk abstract:

Pre-stream preparation:

Speaker bios:

Stream time: 19:30-20:30, August 21, Beijing time

Stream reservation:

The stream includes a QA session; feel free to join the group and chat.

1. Where Agentic RL is hard: long-horizon tool use
2. The ASearcher project: fully asynchronous RL unlocks agent long-horizon tool ...
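For readers unfamiliar with the setup the session covers, here is a hedged Python sketch of what a multi-turn search agent's interaction loop might look like: the model alternates between reasoning steps and search-tool calls for up to a fixed interaction budget, and the finished trajectory would then be scored for RL training. Function names, the message format, and the stopping convention are illustrative assumptions, not AReaL's or ASearcher's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Trajectory:
    messages: List[dict] = field(default_factory=list)
    reward: float = 0.0

def run_search_agent(llm: Callable[[List[dict]], dict],
                     search_tool: Callable[[str], str],
                     question: str,
                     max_turns: int = 128) -> Trajectory:
    """Roll out one multi-turn search episode within a fixed interaction budget."""
    traj = Trajectory(messages=[{"role": "user", "content": question}])
    for _ in range(max_turns):
        step = llm(traj.messages)                      # {"type": "search" | "answer", "content": str}
        traj.messages.append({"role": "assistant", "content": step["content"]})
        if step["type"] == "answer":                   # the agent decides it has enough evidence
            break
        results = search_tool(step["content"])         # long-horizon tool call
        traj.messages.append({"role": "tool", "content": results})
    return traj

# In an asynchronous Agentic RL setup, many such rollouts run concurrently; finished
# trajectories are scored (e.g., answer correctness) and sent to the trainer without
# blocking generation. That scoring/update step is omitted in this sketch.
```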
An Open-Source Genie 3-Style World Model Is Here: Real-Time, Long-Duration Interaction, Runs on a Single GPU, Built by a Chinese Company
机器之心· 2025-08-19 02:43
Core Viewpoint
- The article discusses the launch of the open-source interactive world model Matrix-Game 2.0 by Kunlun Wanwei, which demonstrates significant advances in real-time interactive generation and simulation of complex environments, rivaling proprietary models such as Google DeepMind's Genie 3 [1][3][11].

Group 1: Product Overview
- Matrix-Game 2.0 is an open-source model with 1.8 billion parameters that can run on a single GPU and generate virtual environments at 25 FPS [12][36].
- Users can upload an image and interact with the generated virtual world via keyboard controls, enabling real-time movement and perspective changes [19][40].
- The model is noted for simulating realistic environments, including complex terrain and dynamic elements, enhancing user immersion [8][21].

Group 2: Technical Innovations
- The model adopts a visual-driven interactive world modeling approach, moving away from language-based prompts toward visual understanding and learning of physical laws [35][40].
- Matrix-Game 2.0 integrates an autoregressive diffusion generation mechanism, which helps produce longer videos while limiting content drift and error accumulation (a toy sketch of such a loop appears below, after this section) [42][45].
- The data production pipeline used for training includes more than 1.2 million video clips, with an accuracy rate exceeding 99% [37][38].

Group 3: Market Impact and Future Prospects
- The emergence of Matrix-Game 2.0 signals a shift in the world-model landscape, indicating that such technologies are moving toward practical applications in fields including gaming and robotics [57][59].
- The article highlights the potential of world models to serve as training environments for AI, addressing challenges such as data scarcity and generalization in embodied intelligence [57][58].
- Kunlun Wanwei's continued open-source efforts are expected to accelerate the practical deployment of world models, increasing their utility across sectors [54][59].
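As a rough illustration of autoregressive, action-conditioned chunked generation of the kind described above, here is a toy Python sketch: each new chunk of frames is produced conditioned on a bounded window of recent frames plus per-frame user actions, then fed back as context for the next chunk. The chunk size, context length, and the denoise() stand-in are assumptions for illustration, not Matrix-Game 2.0's actual implementation.

```python
import numpy as np

def denoise(context_frames: np.ndarray, actions: np.ndarray, chunk: int) -> np.ndarray:
    """Stand-in for the diffusion model: returns `chunk` new frames of shape (H, W, 3)."""
    h, w, c = context_frames.shape[1:]
    return np.random.rand(chunk, h, w, c)

def interactive_rollout(first_frame: np.ndarray, get_actions, steps: int = 10,
                        chunk: int = 4, context: int = 8):
    frames = [first_frame]
    for _ in range(steps):
        ctx = np.stack(frames[-context:])          # bounded window of recent frames limits error buildup
        actions = get_actions(chunk)               # e.g., per-frame keyboard / mouse input
        new_frames = denoise(ctx, actions, chunk)  # generate the next chunk conditioned on context + actions
        frames.extend(new_frames)                  # feed the chunk back autoregressively
    return frames

# Toy usage: 64x64 frames, all-zero actions.
video = interactive_rollout(np.random.rand(64, 64, 3), lambda n: np.zeros((n, 4)))
```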
A New Image-to-Video Trick Is Blowing Up Overseas: Draw a Couple of Strokes on an Image to Bring It to Life, Finally Ditching Text Prompts
机器之心· 2025-08-19 02:43
Machine Heart report. Editors: Du Wei, Yang Wen

Now, AI understands what you draw.

Higgsfield AI is an interesting company. It ships new features every few days, relentlessly makes its presence felt on X, and was once rumored to be in acquisition talks with Meta, though nothing came of it in the end.

The company focuses on AI video generation and is best known for cinematic camera-control technology. Three months ago it went viral with AI camera-movement video generation, which we covered at the time: a single photo enabling more than 70 kinds of blockbuster-grade camera moves, a real gut punch to cinematographers.

A few days ago it released two new features, Draw-to-Video and Product-to-Video.

With the former, you upload a static image, draw shapes, text, or arrows on it, and it generates cinematic video footage. The feature blew up online as soon as it launched, racking up more than 5.3 million views on X in just four days.

The latter lets you generate polished, cinematic ad videos for free with simple drag-and-drop operations. So far it has drawn 1.6 million views on X.

According to The Information, Meta Platforms is seeking partnerships with startups developing AI video generation and editing models, and has discussed a potential acquisition with video generation startup Higgsfield, but those talks are currently ...