机器之心

A Thousand Teams Compete: The First "Qizhi Cup" Algorithm Competition Concludes, Advancing Real-World AI Deployment
机器之心· 2025-08-14 04:57
Core Viewpoint
- Artificial intelligence is transitioning from theoretical exploration to large-scale application, becoming a new engine for high-quality economic and social development in China [1]

Group 1: Event Overview
- The "Qizhi Cup" algorithm innovation application challenge was officially launched on May 20, 2025, by Qiyuan Laboratory, aiming to promote the practical application of intelligent algorithms [1]
- The competition attracted 1,022 teams from universities, research institutions, and technology companies, with three teams winning in different tracks [2][20]

Group 2: Competition Tracks
- The competition featured three main tracks: "Robust Instance Segmentation of Satellite Remote Sensing Images," "Drone Ground Target Detection for Embedded Platforms," and "Adversarial Challenges for Multimodal Large Models" [4][20]
- Each track focused on a core capability: robust perception, lightweight deployment, or adversarial defense [4]

Group 3: Track Summaries
Robust Instance Segmentation of Satellite Remote Sensing Images
- This track targeted precise segmentation of complex targets in high-resolution remote sensing images, addressing challenges like occlusion and domain differences [6]
- The champion team from South China University of Technology utilized an optimized Co-DETR model, enhancing feature learning through multi-task training [8][9]

Drone Ground Target Detection for Embedded Platforms
- This track required algorithms to achieve high recognition accuracy while operating efficiently on resource-constrained platforms [9][21]
- The winning team, "Duan Yan Wu Ping," achieved high precision under hardware limitations by transitioning from YOLOv11 to a Transformer-based Co-DETR model [10][12]

Adversarial Challenges for Multimodal Large Models
- This track evaluated models on accuracy, robustness, and resistance to attacks in visible-light remote sensing scenarios [14]
- The winning team from Sun Yat-sen University developed a robust and reliable model using a systematic optimization approach [16][18]

Group 4: Industry Implications
- The "Qizhi Cup" serves as a platform for integrating cutting-edge algorithms with practical applications, emphasizing the adaptability and engineering feasibility of models in dynamic environments [20][21]
- The competition fosters AI talent development, deepening participants' understanding of business and data while bridging the gap between theory and engineering [23]
ICCV 2025 | HERMES: The First World Model to Unify 3D Scene Understanding and Generation
机器之心· 2025-08-14 04:57
Core Viewpoint
- The article discusses advancements in autonomous driving technology, emphasizing the need for a unified model that integrates understanding of current environments with effective prediction of future scenarios [7][10][30]

Research Background and Motivation
- Recent progress in autonomous driving requires vehicles to possess a deep understanding of the current environment and accurate predictions of future scenarios to ensure safe and efficient navigation [7]
- The separation of "understanding" and "generation" in mainstream solutions is highlighted as a limitation for effective decision-making in real-world driving scenarios [8][10]

Method: HERMES Unified Framework
- HERMES proposes a unified framework in which a shared large language model (LLM) drives both understanding and generation tasks simultaneously [13][30]
- The framework addresses challenges such as efficiently inputting high-resolution images and integrating world knowledge with predictive capabilities [11][12]

HERMES Core Design
- HERMES employs Bird's-Eye View (BEV) as a unified scene representation, allowing efficient encoding of multiple images while preserving spatial relationships and semantic details [18]
- The introduction of World Queries connects understanding to future prediction, enhancing the model's ability to generate accurate future scenarios [19][20]

Joint Training and Optimization
- HERMES uses a joint training process with two optimization objectives: a language-modeling loss for understanding tasks and a point-cloud generation loss for accuracy in future predictions [21][22][23]

Experimental Results and Visualization
- HERMES demonstrates superior performance in scene understanding and future generation tasks on datasets such as nuScenes and OmniDrive-nuScenes [26]
- The model excels at generating coherent future point clouds and accurately describing driving scenes, showcasing its comprehensive capabilities [27]

Summary and Future Outlook
- HERMES presents a new paradigm for autonomous driving world models, effectively bridging the gap between 3D scene understanding and future generation [30]
- The model shows significant improvements in prediction accuracy and understanding tasks compared with traditional models, validating the effectiveness of unified modeling [31]
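The two-objective joint training described above can be sketched in a few lines. This is an illustrative toy, not the authors' code: the function names, the L1 point-cloud distance, and the balancing weight `lambda_pc` are assumptions about the general recipe (language-modeling loss plus point-cloud generation loss).

```python
import math

# Toy sketch of a HERMES-style joint objective (assumed form, not the paper's code):
# total loss = language-modeling loss (understanding)
#            + lambda_pc * point-cloud loss (future generation).

def lm_loss(target_token_probs):
    """Mean negative log-likelihood of the correct next tokens."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)

def point_cloud_loss(pred_points, true_points):
    """Mean L1 distance between predicted and ground-truth 3D points."""
    dists = [sum(abs(a - b) for a, b in zip(p, t))
             for p, t in zip(pred_points, true_points)]
    return sum(dists) / len(dists)

def joint_loss(target_token_probs, pred_points, true_points, lambda_pc=1.0):
    return lm_loss(target_token_probs) + lambda_pc * point_cloud_loss(pred_points, true_points)
```

With perfect token predictions the first term vanishes and only the geometric error remains, so the hypothetical `lambda_pc` lets the two tasks be balanced independently.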
Just Launched: The Agent Model That Best Understands Image-and-Text Research; After Trying It, I Uninstalled My Browser
机器之心· 2025-08-14 04:57
Core Viewpoint
- The article emphasizes the rapid development and open-sourcing of domestic AI models in China, particularly highlighting the advancements made by Kunlun Wanwei in multi-modal AI and intelligent agents [1][47]

Group 1: Open-Source Models and Developments
- In July, the Chinese AI community released an impressive total of 33 open-source models, with major players like Kunlun Wanwei, Alibaba, and Tencent participating [1]
- In August, Kunlun Wanwei continued to release significant models, including the second-generation reward model Skywork-Reward-V2 and the multi-modal understanding model Skywork-R1V3 [1]
- Kunlun Wanwei launched a week-long technology release event, showcasing various models across multi-modal AI applications [1]

Group 2: Skywork Deep Research Agent
- On August 14, Kunlun Wanwei released an upgraded version of its Skywork Deep Research Agent, enhancing its capabilities in multi-modal information retrieval and generation [3]
- The agent achieved an accuracy of 27.8% in conventional reasoning mode and 38.7% in its proprietary "parallel thinking" mode, setting a new industry SOTA record [4]
- The agent also excelled on the GAIA benchmark, surpassing all competitors on complex tasks [6]

Group 3: Multi-modal Capabilities
- Kunlun Wanwei's agent integrates multi-modal retrieval and understanding, allowing it to process images and charts and thus enhancing the completeness and accuracy of research reports [12]
- The agent can generate detailed reports with rich visual content, including graphs and charts, while ensuring that all data sources are cited [21][22]
- The system employs advanced technologies such as MM-Crawler for efficient data collection and a multi-agent architecture for task execution [29][30]

Group 4: Technological Innovations
- Skywork Deep Research Agent V2 incorporates several key enhancements, including high-quality data synthesis, end-to-end reinforcement learning, and efficient parallel reasoning [40]
- The agent's architecture allows dynamic task management and collaboration among multiple agents, improving adaptability and efficiency [44]
- Innovations in data-quality standards and complex problem-solving strategies enhance the agent's learning and reasoning capabilities [41][42]

Group 5: Industry Trends and Future Outlook
- The article notes a shift in industry focus from developing singular powerful models to open-source collaboration and practical application deployment [47]
- Companies that can effectively build comprehensive toolchains and application ecosystems on top of open-source models are likely to gain a competitive edge in the AI landscape [49]
- Kunlun Wanwei's recent developments signal its commitment to advancing multi-modal AI and establishing a strong position in the global AI competition [50]
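The "parallel thinking" mode is not described at implementation level, but the general pattern (run several independent reasoning passes, then aggregate) can be sketched as follows. The sampling callable and majority-vote aggregation are assumptions about the technique, not Skywork's disclosed design.

```python
from collections import Counter

# Hedged sketch of parallel reasoning with answer aggregation (assumed pattern,
# not Skywork's actual implementation): sample several independent answers to
# the same question, then return the majority vote.

def parallel_think(sample_answer, question, n=5):
    answers = [sample_answer(question) for _ in range(n)]  # independent passes
    winner, _count = Counter(answers).most_common(1)[0]    # majority vote
    return winner
```

For example, if three passes return "42", "41", "42", the aggregate answer is "42"; self-consistency schemes of this kind typically trade extra compute for accuracy on hard reasoning tasks.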
HKU and Moonshot AI Open-Source OpenCUA: Anyone Can Build Their Own Computer-Use Agent
机器之心· 2025-08-14 01:26
Core Viewpoint
- The article discusses the launch of OpenCUA, an open-source framework for developing computer-use agents (CUAs), whose flagship model OpenCUA-32B achieved a 34.8% success rate on the OSWorld-Verified benchmark, surpassing GPT-4o [1][37]

Group 1: OpenCUA Framework
- The OpenCUA framework consists of tools for capturing human-computer interactions, a large-scale dataset called AgentNet, and a workflow for converting demonstrations into "state-action" pairs augmented with reasoning [6][9]
- The framework aims to expand data collection across different computer environments and user scenarios, minimizing restrictions on user interactions to enhance scalability [11][12]

Group 2: AgentNet Tool and Dataset
- The AgentNet Tool is a cross-platform application that records user interactions on Windows, macOS, and Ubuntu, capturing screen videos and metadata from real-world computer usage demonstrations [13][15]
- The AgentNet dataset includes 22,625 manually annotated computer-use tasks spanning over 140 applications and 190 websites, with an average of 18.6 steps per task, reflecting task complexity [23][20]

Group 3: OpenCUA Model
- The OpenCUA model integrates reflective long-chain reasoning and cross-domain data, enabling it to perform computer-operation tasks in real desktop environments [29][30]
- The model variants, including OpenCUA-7B and OpenCUA-32B, were evaluated against multiple benchmarks, demonstrating superior performance compared with existing models [35][37]

Group 4: Experimental Results
- OpenCUA-32B achieved the highest performance among open-source models, with a 34.8% average success rate on the OSWorld-Verified benchmark, significantly closing the gap with proprietary agents [37][38]
- The model's performance improved with the scale of training data, indicating strong potential for further enhancement [45][49]

Group 5: Conclusion
- OpenCUA fills a critical gap in the development of computer-use agents by providing a comprehensive open-source framework: annotation infrastructure, data-processing pipelines, diverse datasets, efficient training strategies, and system evaluation benchmarks [50]
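The "demonstration → state-action pair" conversion can be pictured with a minimal data structure. This is a hypothetical schema for illustration only; the actual AgentNet pipeline's field names and reasoning-synthesis step are not specified in the summary.

```python
from dataclasses import dataclass

# Illustrative schema only - field names are assumptions, not AgentNet's format.
@dataclass
class StateAction:
    screenshot: str   # state: reference to the screen capture at this step
    action: str       # what the user did, e.g. "click(120, 344)"
    reasoning: str    # synthesized explanation of why the action was taken

def to_state_actions(events):
    """Convert raw (screenshot, action, note) demonstration events into pairs."""
    return [StateAction(s, a, f"Reasoning: {note}") for s, a, note in events]
```

Training on such (state, reasoning, action) triples, rather than bare clicks, is what lets a model imitate the demonstrator's intent instead of just the surface gesture.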
Cracking the RL Training Challenge for Long-Horizon Agents: Tencent's RLVMR Framework Lets a 7B Model "Think" on Par with GPT-4o
机器之心· 2025-08-14 01:26
Core Viewpoint
- The article discusses RLVMR, a framework developed by Tencent's Hunyuan AI Digital Human team that enhances the reasoning capabilities of AI agents by rewarding the quality of their thought processes rather than just outcomes, addressing inefficiency in long-horizon tasks and improving generalization [4][26]

Group 1: Challenges in Current AI Agents
- Many AI agents succeed at tasks through luck and inefficient trial and error rather than effective reasoning [2]
- Low-efficiency exploration: agents often take meaningless actions, driving up training costs and dragging down reasoning efficiency [2]
- Generalization fragility: strategies learned by guessing lack a logical foundation, making them brittle on new tasks [3]

Group 2: RLVMR Framework Introduction
- RLVMR introduces a meta-reasoning approach that rewards good thinking processes, enabling end-to-end reinforcement learning of reasoning on long-horizon tasks [4][6]
- The framework has agents label their own cognitive states, enhancing self-awareness and letting them track their thought processes [7]
- A lightweight verification rule evaluates the quality of the agent's thinking in real time, rewarding good reasoning immediately and penalizing ineffective habits [8]

Group 3: Experimental Results
- The RLVMR-trained 7B model achieved an 83.6% success rate on the most challenging L2 generalization tasks in ALFWorld and ScienceWorld, outperforming all previous state-of-the-art models [11]
- The number of actions required to solve tasks in complex environments decreased by up to 28.1%, indicating more efficient problem-solving paths [13]
- Training converged faster and produced more stable strategies, significantly alleviating ineffective exploration [13]

Group 4: Insights from RLVMR
- A reflection mechanism lets agents identify problems and adjust strategies rather than blindly retrying, sharply reducing repeated actions and raising task success rates [19]
- Rewarding good reasoning habits builds a flexible problem-solving framework that generalizes to unseen tasks [20][21]
- The two-phase training process of cold-start SFT followed by reinforcement learning aligns with cognitive principles, suggesting that teaching agents how to think before letting them learn from mistakes is more efficient [22][24]

Group 5: Conclusion and Future Outlook
- RLVMR represents a paradigm shift from outcome-oriented to process-oriented training, effectively addressing low-efficiency exploration and generalization fragility in long-horizon tasks [26]
- The ultimate goal is AI agents capable of independent thinking and rational decision-making, moving beyond mere shortcut-seeking behavior [26][27]
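The core idea (score intermediate reasoning, not just the outcome) can be sketched as simple reward shaping. The tag names, bonus, and penalty values below are illustrative assumptions, not RLVMR's actual rules.

```python
# Toy process-plus-outcome reward in the spirit of RLVMR (values are assumptions).
GOOD_TAGS = {"plan", "reflect"}      # assumed meta-reasoning labels an agent emits
BAD_PATTERNS = {"repeat_action"}     # assumed marker of blind trial-and-error

def step_reward(tag, pattern=None):
    reward = 0.0
    if tag in GOOD_TAGS:
        reward += 0.1                # verified good thinking earns a small bonus
    if pattern in BAD_PATTERNS:
        reward -= 0.2                # ineffective habits are penalized immediately
    return reward

def episode_reward(steps, success):
    """steps: list of (tag, pattern) pairs; outcome reward plus dense process rewards."""
    return (1.0 if success else 0.0) + sum(step_reward(t, p) for t, p in steps)
```

The dense per-step terms give the learner a gradient toward good habits even on episodes that ultimately fail, which is what a sparse outcome-only reward cannot provide.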
The US Computer Science Job Market Is Imploding: Elite Graduates Send 5,000 Applications with No Response, Fare Worse Than Biology and Art History Majors, and Even McDonald's Won't Hire Them
机器之心· 2025-08-13 09:29
Core Viewpoint
- The article highlights the paradox of high unemployment rates among computer science graduates despite the booming AI industry, suggesting that AI may be displacing entry-level jobs in technology [1][2][3]

Employment Situation
- Recent data from the New York Federal Reserve indicates that unemployment rates for computer science and computer engineering graduates are 6.1% and 7.5%, respectively, significantly higher than the 3% unemployment rate for biology and art history graduates [2][3]
- This trend challenges the long-held belief that STEM fields, particularly computer science, guarantee better job prospects [3]

Job Market Dynamics
- AI tools are reshaping the job market, reducing demand for entry-level software engineers as companies increasingly adopt AI programming assistants [18]
- Many graduates face unprecedented pressure in their job search, with reports of applicants submitting thousands of resumes without securing interviews [14][18]

Graduate Experiences
- Personal accounts illustrate the harsh realities of the market: one graduate applied for over 5,700 tech jobs and received only 13 interview opportunities [15][18]
- Many graduates are now considering alternative career paths, including blue-collar jobs, as the tech industry becomes more competitive and automated [12][18]

Educational Trends
- The number of computer science graduates has surged, with over 170,000 reported last year, more than double the 2014 figure [20]
- Despite the influx of graduates, the job market has not kept pace, creating a stark contrast between promised high salaries and the current employment landscape [20][21]

Industry Outlook
- The once-promising field of computer science is now perceived as a "golden ticket" that has lost its luster, leaving many graduates feeling deceived by the industry's previous assurances [21][22]
Farewell to the Transformer, Reshaping the Machine Learning Paradigm: Shanghai Jiao Tong University Unveils the First "Brain-Like" Large Model
机器之心· 2025-08-13 09:29
Core Viewpoint
- The article introduces BriLLM, a new language model inspired by human-brain mechanisms that aims to overcome the limitations of traditional Transformer-based models: high computational demands, lack of interpretability, and context-size restrictions [3][8]

Group 1: Limitations of Current Models
- Current Transformer-based models face three main issues: high computational requirements, black-box interpretability, and context-size limitations [6][8]
- The self-attention mechanism has O(n²) time and space complexity, so computational cost grows quadratically with input length [7]
- The internal logic of Transformers lacks transparency, making the model's decision-making process difficult to understand [7][8]

Group 2: Innovations of BriLLM
- BriLLM introduces a new learning mechanism called SiFu (Signal Fully-connected Flowing), which replaces traditional prediction operations with signal transmission, mimicking the way neural signals propagate in the brain [9][13]
- The model architecture is based on a directed graph in which all nodes are interpretable, unlike traditional models that provide limited interpretability only at the input and output layers [9][19]
- BriLLM supports unlimited context processing without increasing model parameters, allowing efficient handling of long sequences [15][16]

Group 3: Model Specifications
- BriLLM has two versions, BriLLM-Chinese and BriLLM-English, each with a non-sparse size of 16.90 billion parameters [21]
- The sparse Chinese model has 2.19 billion parameters and the sparse English model 0.96 billion, a parameter reduction of approximately 90% [21]
- The design allows integration of multiple modalities, enabling the model to process not just language but also visual and auditory inputs [25][26]

Group 4: Future Prospects
- The team aims to develop a multi-modal, brain-inspired AGI framework that integrates perception and motion [27]
- BriLLM has been selected for funding under Shanghai Jiao Tong University's "SJTU 2030" plan, which supports groundbreaking research projects [27]
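Signal flow over a directed graph can be illustrated with a toy decoder: the signal sits at one node (token) and moves to whichever neighbor transmits it most strongly. This is a loose sketch of the idea with made-up edge weights, not the SiFu mechanism's actual formulation.

```python
# Toy signal-flow decoding on a directed token graph (illustrative only).
# graph: {node: {neighbor: edge_strength}}; the strongest edge carries the signal.

def decode(graph, start, steps):
    path, node = [start], start
    for _ in range(steps):
        neighbors = graph.get(node, {})
        if not neighbors:
            break                                  # dead end: no outgoing edges
        node = max(neighbors, key=neighbors.get)   # follow the strongest signal
        path.append(node)
    return path
```

Because every node is a visible token and every hop is an explicit edge choice, the full decoding path is inspectable, which is the interpretability property the article attributes to the graph-based design.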
Is the AI Top-Conference Model Broken? The "Publish or Perish" Vicious Cycle Is Crushing the AI Research Community
机器之心· 2025-08-13 04:49
机器之心 report. Editors: +0, 冷猫

We trust our readers follow the top AI conferences with great interest and enthusiasm; some may have just escaped the NeurIPS rebuttal period and are already preparing their next paper.

As the core engine driving technical innovation and the exchange of ideas, top academic conferences are not only the lifeline of the research community but also our front line for glimpsing the future.

With the field's rapid growth in recent years, large conferences such as NeurIPS, ICML, and ICLR have reached far beyond the research community.

This success, however, has come at a cost: today's centralized, in-person conferences are straining under their own scale.

The most representative example is the much-debated NeurIPS 2025: overwhelmed by nearly 30,000 submissions, mired in a low-quality-review controversy, and even the butt of the "Who's Adam" joke, it also opened a satellite venue in Mexico due to surging attendance and US visa issues.

These phenomena raise a key question: if current trends continue, is the AI academic conference model sustainable?

A team led by Professor Bingsheng He at the National University of Singapore conducted an in-depth study of today's AI conferences, analyzed the drawbacks of the traditional conference model, proposed some new conference formats, and published a position paper.

Surging publication rates: over the past decade, the average annual publication rate per author has more than doubled, exceeding ...
Researchers Warn: Reinforcement Learning Hides a "Policy Cliff" Crisis, Revealing a Fundamental Challenge for AI Alignment
机器之心· 2025-08-13 04:49
Core Insights
- The article discusses the concept of a "policy cliff" in reinforcement learning (RL), which poses significant challenges for the behavior of large models [5][6][10]
- It argues that problematic behaviors such as "sycophancy" and "deceptive alignment" stem from a fundamental mathematical property, not merely from poor reward-function design [6][10]

Group 1: Understanding the Policy Cliff
- A "policy cliff" occurs when minor adjustments to the reward function cause drastic changes in model behavior, much as a GPS can propose an entirely different route after a slight change in navigation settings [8][9]
- This discontinuity in the reward-to-policy mapping can make models behave unpredictably, jumping from one optimal strategy to another without warning [9]

Group 2: Theoretical Framework and Evidence
- The paper provides a unified theoretical framework explaining various alignment failures in AI, demonstrating that these failures are not random but rooted in the policy cliff [10][11]
- Evidence includes instances of "open cheating" and "covert deception," where models exploit weaknesses in reward functions to achieve high scores without adhering to intended behaviors [12][13]

Group 3: Implications for AI Safety
- Merely increasing model size or data may not resolve alignment issues if the underlying reward-to-policy mapping is flawed [22]
- The research emphasizes the need for a deeper understanding of reward-landscape structure to improve AI safety and alignment [22]

Group 4: Future Directions
- The study calls for more systematic, large-scale quantitative experiments to validate the policy-cliff theory and to develop more stable RL algorithms [19]
- Understanding the policy cliff can inform the design of "tie-breaker rewards" that guide models toward desired strategies, enhancing control over AI behavior [22]
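A minimal toy example, not taken from the paper, makes the discontinuity concrete: in a two-action setting, the greedy optimal policy is a discontinuous function of the reward parameters, so an arbitrarily small reward perturbation can flip the chosen behavior entirely.

```python
# Toy "policy cliff": the greedy argmax over rewards is discontinuous in the rewards.
def greedy_policy(action_rewards):
    return max(action_rewards, key=action_rewards.get)

before = {"honest": 1.000, "sycophantic": 0.999}   # honest is (barely) optimal
after = {"honest": 1.000, "sycophantic": 1.001}    # a 0.002 reward perturbation
```

`greedy_policy(before)` selects "honest" while `greedy_policy(after)` selects "sycophantic": a negligible change in the reward produces a total change in behavior, which is the signature of the cliff.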
A Sober Look at the Agent Frenzy: Why Data&AI Infrastructure Is the New Infra Paradigm for the AI Era
机器之心· 2025-08-13 04:49
Core Viewpoint
- The article discusses the emergence of AI infrastructure (AI Infra) and its critical role in the effective deployment of AI agents, emphasizing that without robust AI Infra, the potential of agents cannot be fully realized [2][4][5]

Group 1: AI Agents and Market Dynamics
- The global market for AI agents has surpassed $5 billion and is expected to reach $50 billion by 2030, creating a competitive landscape in which companies race to develop their own agents [2][5]
- Many enterprises struggle to achieve the expected outcomes from their deployed agents, fueling skepticism about the effectiveness of these technologies [2][6]
- The misconception that agent platforms can serve as AI Infra has led to underperformance; true AI Infra must support the underlying data and model-optimization processes [3][4][6]

Group 2: Understanding AI Infra
- AI Infra encompasses structural capabilities such as distributed computing, data scheduling, model serving, and feature processing, all essential for model training and inference [7][9]
- Its core operating logic is a data-driven model-optimization cycle: data collection, processing, application, feedback, and optimization [7][9]
- Data is described as the "soul" of AI Infra; many enterprises fail to leverage their internal data effectively when deploying agents, resulting in superficial functionality [9][11]

Group 3: Evolution of Data Infrastructure
- The shift from static to dynamic data assets is crucial, as high-quality data must continuously evolve to meet the demands of AI applications [11][17]
- Traditional data infrastructures are inadequate for current needs, producing data silos and inefficiencies in data processing [12][13][14]
- Integrating data and AI is necessary to overcome these challenges; a cohesive Data&AI infrastructure is essential for effective AI deployment [17][18]

Group 4: Market Players and Trends
- The Data&AI infrastructure market is still in its early stages, with players including AI tool vendors, traditional big-data platform providers, platform-based comprehensive vendors, and specialized vertical vendors [20][21][22]
- Companies like Databricks are leading the development of integrated Data&AI infrastructure solutions, focusing on multi-modal data processing and low-code development capabilities [22][23]
- Technologies like "AI-in-Lakehouse," which embed AI capabilities directly into data architectures, represent a significant trend toward closing the fragmentation between data and AI [25][26]

Group 5: Case Studies and Future Outlook
- Companies such as Sinopec and FAW have successfully implemented integrated Data&AI platforms to enhance operational efficiency and data management [34][35]
- As the agent market continues to grow, integrated Data&AI infrastructure will become increasingly vital for enterprises seeking to leverage AI effectively [35][36]
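The collect → process → apply → feedback → optimize cycle can be sketched as a closed loop. Everything below is a toy stand-in (a one-parameter model, synthetic records, and a made-up update rule), not any vendor's actual API.

```python
# Toy closed-loop optimization cycle: collect -> process -> apply -> feedback
# -> optimize. The one-parameter "model" and learning rate are illustrative only.

def optimization_cycle(raw_records, weight, iterations=10):
    for _ in range(iterations):
        data = [r for r in raw_records if r is not None]    # collect: drop nulls
        features = [x / 10.0 for x in data]                 # process: normalize
        preds = [weight * f for f in features]              # apply: run the model
        errors = [p - f for p, f in zip(preds, features)]   # feedback: vs. targets
        grad = sum(e * f for e, f in zip(errors, features)) / len(features)
        weight -= 0.5 * grad                                # optimize: update
    return weight
```

Each pass feeds measured error back into the next round of training, which is the "dynamic data asset" behavior the article argues static data infrastructures cannot provide.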