机器之心
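"Precision Surgery" for Large Models: Meituan's Intelligent Customer Service Proposes Reverse Learning for Targeted Correction, Improving Risk Control by 38%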
机器之心· 2025-09-19 10:43
Core Viewpoint
- Meituan's intelligent customer service has introduced a reverse learning technique that suppresses specific errors and risky behaviors in its models, improving key risk control indicators by more than 38 percentage points while maintaining overall service quality [2][6].

Group 1: Background and Mechanism
- The intelligent customer service system combines an end-to-end large-model agent with a data feedback mechanism, forming a closed-loop optimization scheme that automatically collects and uses real dialogue data from online services [3].
- This scheme improves the model's instruction following, natural expression, and reasoning over complex states, which significantly raises the overall problem-solving rate across business scenarios [3].

Group 2: Challenges and Solutions
- Despite these gains, relying on unverified online interactions can introduce erroneous strategies or inappropriate behaviors, causing key service quality indicators to decline [4].
- Reverse learning is proposed as a surgery-like behavior editing technique that precisely "removes" undesirable behaviors or sensitive knowledge from the model while preserving its original capabilities [6].

Group 3: Adaptive Learning Method
- The adaptive reverse learning method (ALKN) systematically collects the dialogue data that needs to be "forgotten", giving reverse learning a clear optimization target [9].
- The algorithm has three key components: low-entropy loss function optimization, symmetric transformation iterative training, and adaptive parameter localization. Together they improve training stability and performance retention [11][12].

Group 4: Performance and Future Outlook
- The adaptive reverse learning method shows significant advantages over various baseline methods, maintaining overall performance while effectively suppressing undesirable behaviors [15].
- Future work may integrate reverse learning with reinforcement learning algorithms into a hybrid optimization framework, improving decision-making robustness in dynamic environments [17].
Tackling the Training-Inference Discrepancy in Large Models: Ant Group Open-Sources Its Next-Generation Reasoning Model Ring-flash-2.0
机器之心· 2025-09-19 10:43
Core Viewpoint
- The article discusses the release of Ring-flash-2.0 by Ant Group's Bailing team, highlighting its potential to reshape the competitive landscape of large models by achieving high performance with fewer activated parameters and improved training stability [1][4][26].

Performance Overview
- Ring-flash-2.0 has 100 billion total parameters and 6.1 billion activated parameters. It scores 86.98 on the AIME math benchmark and reaches an Elo score of 90.23 on CodeForces, with a throughput of over 200 tokens per second [1][21].
- Its performance is comparable to state-of-the-art (SOTA) dense models at the 40-billion-parameter scale, a significant advance on reasoning tasks [1][21].

Technical Innovations
- The icepop algorithm enables stable long-horizon reinforcement learning (RL) training by freezing tokens that show a large discrepancy between training and inference, so that no gradient is backpropagated through them [6][10][13].
- The two-staged RL approach follows supervised fine-tuning (SFT) with reinforcement learning from verifiable rewards (RLVR) and from human feedback (RLHF) [14][16].

Cost Efficiency
- Ring-flash-2.0 matches the performance of a 40-billion-parameter dense model while activating only 6.1 billion parameters, which the article describes as a turning point for cost efficiency in the large-model competition [17][21].
- The high-sparsity, low-activation design significantly reduces inference costs in high-concurrency scenarios [21].

Market Implications
- Competition among large models is shifting from parameter count to cost-effectiveness, and Ring-flash-2.0 is positioned as a leading option in this new phase [18][25].
- The article suggests that Ring-flash-2.0 may mark the beginning of a "high cost-performance era" for large models, following the advances initiated by GPT-4 [26].
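The token-freezing idea attributed to icepop above can be illustrated with a small sketch. This is not Ant Group's implementation: the function names, the use of a probability ratio as the discrepancy measure, and the thresholds `low` and `high` are all assumptions made for illustration. The sketch only shows the general pattern of masking tokens whose training-side and inference-side probabilities disagree too much, so that they contribute nothing to the loss.

```python
import math


def discrepancy_mask(train_logps, infer_logps, low=0.5, high=2.0):
    """Return one 0/1 entry per token: 1.0 keeps the token, 0.0 freezes it.

    The ratio p_train / p_infer is the (assumed) discrepancy measure;
    `low` and `high` are hypothetical thresholds, not published values.
    """
    mask = []
    for lp_train, lp_infer in zip(train_logps, infer_logps):
        ratio = math.exp(lp_train - lp_infer)
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return mask


def masked_mean_loss(token_losses, mask):
    """Average the per-token losses over the kept tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(token_losses, mask)) / kept


# Toy example: the second token's probability differs 9x between the two engines.
train_logps = [math.log(0.5), math.log(0.9), math.log(0.20)]
infer_logps = [math.log(0.5), math.log(0.1), math.log(0.21)]
mask = discrepancy_mask(train_logps, infer_logps)
loss = masked_mean_loss([1.0, 5.0, 3.0], mask)
```

In an autograd framework the same mask would be multiplied into the per-token loss, which zeroes the gradient for the frozen tokens.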
Does Understanding Help Generation? RecA Self-Supervised Training Lifts Unified Multimodal Models Straight to SOTA
机器之心· 2025-09-19 00:46
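Xie Ji (谢集) is a senior at Chu Kochen Honors College, Zhejiang University, currently visiting the University of California, Berkeley (BAIR), whose research focuses on unified multimodal understanding and generation models. The second author is Trevor Darrell of UC Berkeley, and the third author is Luke Zettlemoyer of the University of Washington. The corresponding author is XuDong Wang, a Research Scientist at Meta GenAI who received his PhD from UC Berkeley (BAIR lab); he completed this work during his PhD.

Background: the challenges of unified multimodal understanding and generation models

Unified Multimodal Models (UMMs) aim to unify visual understanding and generation within a single model architecture. UMMs inherit the abilities of Multimodal Large Language Models (MLLMs) and can easily tell an object's left from right, its color, and its category. Yet many generative models cannot even produce "a black cat and a white dog" or "yellow broccoli". This reflects an imbalance between visual understanding and generation in current unified multimodal models: they tend to excel at understanding image content but struggle to generate images from text descriptions. Why is that? In fact, images are a "dense" modality while text is a "sparse" modality; from a dense information ...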
NVIDIA Takes a $5 Billion Stake in Intel and Will Release Combined CPU+GPU Chips: Is This the Endgame?
机器之心· 2025-09-19 00:46
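Reported by 机器之心, 机器之心 Editorial Department

Together they announced that they will merge the CPU and GPU in PCs into a super SoC.

On Thursday evening, the news that NVIDIA is buying $5 billion of Intel stock set the tech world abuzz. The two companies issued announcements simultaneously on September 18, declaring a long-term strategic partnership. NVIDIA will invest $5 billion in Intel common stock, and under the new partnership the two companies will jointly develop multiple generations of custom data center and PC products.

Specifically, the two companies will focus on using NVIDIA NVLink to seamlessly connect the NVIDIA and Intel architectures, combining NVIDIA's strengths in AI and accelerated computing with Intel's leading CPUs and x86 ecosystem to offer customers top-tier solutions.

For the data center, Intel will build NVIDIA-custom x86 CPUs, which NVIDIA will integrate into its AI infrastructure platforms and offer to the market.

For personal computing, Intel will build and bring to market x86 systems-on-chip (SoCs) that integrate RTX GPU chiplets. These new x86 RTX SoCs will power a wide range of PCs that need world-class integrated CPUs and GPUs.

NVIDIA plans to invest $5 billion in Intel common stock at $23.28 per share (equivalent to a 5% stake). However, the investment is still subject to customary closing conditions, including ...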
Just In: OpenAI Tops the ICPC 2025 Programming Contest with a Perfect Score, and Gemini Also Reaches Gold-Medal Level
机器之心· 2025-09-18 04:32
Core Insights
- OpenAI and Gemini both reached gold-medal level at the ICPC 2025 competition, showcasing significant advances in AI capabilities in competitive programming [1][26][46].

Group 1: OpenAI's Performance
- OpenAI solved all 12 problems in 5 hours, outperforming all human teams and achieving the highest rank [1][10].
- The AI system submitted correct answers for 11 problems on the first attempt; the most challenging problem was solved after 9 attempts [10][11].
- OpenAI competed with a "general reasoning model ensemble" without any optimization specific to the ICPC competition [15].

Group 2: Gemini's Performance
- Gemini solved 10 out of 12 problems in 677 minutes, ranking second among human teams [3][28].
- The AI began the competition 10 minutes late but still achieved gold-level performance [28].
- Gemini demonstrated advanced problem-solving capabilities, including solving a problem that no human team could [33][38].

Group 3: Competition Context
- The ICPC is recognized as the largest and most prestigious university-level programming competition, attracting participants from nearly 3,000 universities across 103 countries [6][46].
- The competition emphasizes perfect solutions and time management, with only the top four teams receiving gold medals [6][46].

Group 4: Implications for AI
- The success of AI at the ICPC highlights its potential to provide innovative solutions and complement human expertise in complex problem-solving scenarios [46].
- AI is transitioning from a mere information processing tool to a key player in assisting with intricate reasoning tasks [46].
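To make the "perfect solutions and time management" point concrete, here is a minimal sketch of the customary ICPC ranking rule: teams are ordered by problems solved, then by total penalty time. The 20-minute penalty per rejected attempt is the standard ICPC convention and is not stated in the summary above; the function name and the sample teams are invented for illustration.

```python
def icpc_rank(teams, penalty_per_reject=20):
    """Rank teams by (problems solved descending, penalty minutes ascending).

    `teams` maps a team name to a list of (accepted_minute, rejected_attempts)
    tuples, one per solved problem. Unsolved problems add no penalty.
    """
    scored = []
    for name, solved in teams.items():
        penalty = sum(minute + penalty_per_reject * rejects
                      for minute, rejects in solved)
        scored.append((name, len(solved), penalty))
    # More solved problems first; ties broken by the smaller penalty time.
    scored.sort(key=lambda row: (-row[1], row[2]))
    return scored


# Hypothetical scoreboard: A and B both solve two problems, C solves one.
sample_teams = {
    "A": [(10, 0), (50, 1)],  # 10 + 50 + one rejected attempt (20) = 80
    "B": [(5, 0), (30, 0)],   # 5 + 30 = 35
    "C": [(20, 0)],           # 20
}
ranking = icpc_rank(sample_teams)
```

Team C has the lowest penalty but ranks last because solved count dominates, which is why a single extra accepted problem outweighs any amount of speed.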
OneSearch: Unveiling Kuaishou's Secret to "One-Step" E-Commerce Search
机器之心· 2025-09-18 04:32
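机器之心 release, 机器之心 Editorial Department

With a little over a month to go, the annual "Double Eleven" shopping festival is almost here! As a consumer, how do you usually look for the products you want? Perhaps you eagerly type keywords into the search box, only to find that the product list keeps falling short. So where exactly does the problem lie?

It all starts with the traditional search architecture commonly used by e-commerce platforms. Mainstream systems today adopt a cascaded "recall -> coarse ranking -> fine ranking" architecture.

So which stages cause us to keep seeing unsatisfactory products? The reasons are:

1. OneSearch: an end-to-end generative framework for e-commerce search

To address the many challenges facing traditional e-commerce search systems, industry typically adopts a cascaded architecture to achieve strong commercial returns and system stability. However, with the rise of large language models, researchers have begun exploring how to use their powerful semantic understanding and world knowledge to further improve the search experience. Against this backdrop, Kuaishou proposed OneSearch, the industry's first industrially deployed end-to-end generative framework for e-commerce search.

Recall layer: for example, when you search for "red dress", the system quickly filters more than ten thousand items containing the keywords "red" and "dress" out of hundreds of millions of products. This step aims for speed and coverage, but its precision is low: clickbait listings inevitably slip through (for example, a title that forces in a trending term, saying "red dress" while actually selling a matching cardigan).

Coarse-ranking layer: the system uses a lightweight model on these ten-thousand-plus items ...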
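The cascaded "recall -> coarse ranking -> fine ranking" pipeline described above can be sketched in a few lines. This is a toy illustration and not Kuaishou's system: the field names (`sales`, `relevance`, `rating`), the scoring rules, and the sample catalog are all hypothetical. It does show how a keyword-stuffed listing passes recall and is only filtered out by a later, more expensive stage.

```python
def recall(query_terms, catalog):
    """Keyword recall: keep every item whose title contains all query terms."""
    return [item for item in catalog
            if all(term in item["title"] for term in query_terms)]


def coarse_rank(items, keep):
    """Cheap scoring stage (here: sales volume only)."""
    return sorted(items, key=lambda it: it["sales"], reverse=True)[:keep]


def fine_rank(items, keep):
    """Costlier scoring stage (here: relevance times rating)."""
    return sorted(items, key=lambda it: it["relevance"] * it["rating"],
                  reverse=True)[:keep]


def cascade_search(query_terms, catalog, coarse_keep=3, fine_keep=2):
    """Run the three stages in order, shrinking the candidate set each time."""
    return fine_rank(coarse_rank(recall(query_terms, catalog), coarse_keep),
                     fine_keep)


# Hypothetical catalog; the cardigan title mentions "red dress" to attract clicks.
sample_catalog = [
    {"title": "red dress summer", "sales": 120, "relevance": 0.9, "rating": 4.5},
    {"title": "cardigan to match red dress", "sales": 300, "relevance": 0.2, "rating": 4.8},
    {"title": "blue dress", "sales": 500, "relevance": 0.1, "rating": 4.9},
    {"title": "red dress evening", "sales": 80, "relevance": 0.95, "rating": 4.0},
]
recalled = recall(["red", "dress"], sample_catalog)
results = cascade_search(["red", "dress"], sample_catalog)
```

Each stage optimizes its own objective, so an error made early (a relevant item dropped at recall or coarse ranking) cannot be recovered later; an end-to-end generative framework is an attempt to remove that mismatch.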
Starting from an Official Account Agent: What Does a Truly Useful Agent Need?
机器之心· 2025-09-18 04:32
Core Viewpoint
- The article discusses the evolution and practical applications of AI agents, focusing on Tencent's "公众号智能体" (Public Account Intelligent Agent) and its role in enhancing user experience and operational efficiency in various industries [2][8][35].

Group 1: AI Agent Functionality
- The "公众号智能体" can automatically read and update articles from authorized public accounts, addressing the challenge of information overload for users [5].
- A basic yet practical feature of the agent is the article recommendation assistant, which filters and summarizes relevant articles based on user needs [6][8].
- The agent's capabilities highlight the need for a robust industrial platform to support effective AI applications in real-world scenarios [8][11].

Group 2: Industrialization of AI Agents
- Tencent launched its ADP 3.0 platform to facilitate the development of intelligent agents, moving from simple applications to complex business services [12][16].
- The platform supports advanced capabilities such as "Agentic RAG," which allows agents to autonomously plan and execute complex tasks by breaking them down into manageable steps [17].
- Workflow capabilities improve the reliability and stability of complex processes, ensuring compliance with standard operating procedures in industries like hospitality [19][20].

Group 3: Multi-Agent Collaboration
- The platform introduces multi-agent collaboration modes, allowing agents to be integrated into workflows and tasks to be distributed among different agents [21][24].
- This collaborative approach increases the capacity to handle complex tasks but also raises challenges in managing communication and synchronization between agents [23].

Group 4: Open Ecosystem and Integration
- The ADP platform features a "model plaza" that supports third-party models, reflecting a trend towards flexibility and avoiding vendor lock-in for enterprises [25].
- Over 140 high-quality plugins are available, enabling users to select the most cost-effective models for their needs [26].
- Tencent plans to open-source key technologies in the agent domain, promoting a collaborative ecosystem and demonstrating confidence in its foundational technologies [27].

Group 5: User Accessibility and Market Integration
- The article emphasizes the importance of seamlessly integrating agents into existing user workflows to maximize their value [30].
- A case study of "绝味食品" (Juewei Food) illustrates how AI agents can enhance marketing efforts, achieving significant improvements in sales performance and customer engagement metrics [31].
- The ultimate goal is to make AI agents not just backend tools but frontline assets that interact directly with consumers, addressing the critical "last mile" challenge in AI application [32][35].

Group 6: Future Directions and Competitive Landscape
- Competition in the AI agent space is shifting from model capabilities to practical engineering and ecosystem integration [36].
- The article concludes that a clear and pragmatic framework for developing reliable AI agents is essential for future advances in the industry [37].
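A Strong Pillar for Bilibili's Overseas Expansion: The Newly Open-Sourced Text-to-Speech Model IndexTTS-2.0 Brings Zero-Shot TTS into a Dual-Dimension Era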
机器之心· 2025-09-18 04:32
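Have you recently come across some "addictive" yet amazing AI videos on Bilibili? Examples include an English-language version of 《甄嬛传》 (Empresses in the Palace), flying tanks, and Cao Cao battling Sun Wukong... These works not only reproduce the original characters' timbre faithfully, they also restore emotion and prosody to a high degree. Even more surprising, all of them were generated by AI!

[Embedded videos: "The English version of 《甄嬛传》 is here"; "Make the tank fly"; "Bilibili open-sources index-tts-2.0: long-video test, really impressive results, Cao Cao battles Sun Wukong"; "What if AI hosted a Chinese-language Apple launch event: indextts2 demo"]

These videos were reportedly made with IndexTTS-2.0, the text-to-speech model newly open-sourced by Bilibili's Index team. The model has attracted considerable attention in communities at home and abroad since its demo was released, and the project now has more than 10k stars on GitHub.

Paper title: IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Paper link: https://arxiv.org/abs/2506.21619

In recent years, large-scale text-to-speech (Text-to-Spe ...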
Tongyi DeepResearch Makes a Striking Debut! Performance on Par with OpenAI, with Model, Framework, and Recipe Fully Open-Sourced
机器之心· 2025-09-18 01:01
Core Insights
- The article discusses the advances of Tongyi DeepResearch, highlighting its progression from basic conversational capabilities to sophisticated research functionality. It achieves state-of-the-art (SOTA) results across multiple benchmarks while being fully open-source [1][3].

Data Strategy
- The improvement in model capabilities is attributed to a multi-stage data strategy designed to generate high-quality training data without relying on expensive manual annotation [5].
- The team introduced Agentic Continual Pre-training (CPT) to establish a solid foundation for the model, using a systematic and scalable data synthesis approach [6].
- The data generation process restructures and constructs questions from a wide array of knowledge documents, web crawler data, and knowledge graphs, creating an open-world knowledge memory anchored by entities [6].

Reasoning Modes
- Tongyi DeepResearch offers both a native ReAct Mode and a Heavy Mode for managing complex multi-step research tasks [11].
- In ReAct Mode, the model follows a standard thinking-action-observation cycle and supports many interaction rounds with a context length of 128K [12].
- Heavy Mode uses a new IterResearch paradigm that deconstructs tasks into research rounds, allowing the agent to maintain cognitive focus and high-quality reasoning [13][14].

Training Methodology
- The training process integrates Agentic CPT, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL), establishing a new paradigm for agent model training [17][20].
- The team customized RL algorithms based on GRPO so that learning signals align with the model's current capabilities, and implemented strategies to improve training stability [21].
- Indicators tracked during training show clear learning effects, with rewards rising consistently, which indicates effective exploration and adaptation [23].

Application Deployment
- Tongyi DeepResearch powers various internal applications within Alibaba. The team also built a simulated training environment to reduce development costs and improve speed [27].
- The team developed a stable and efficient tool sandbox to ensure reliable tool calls during agent training and evaluation [27].
- The collaboration with the Gaode (Amap) app focuses on improving complex query experiences in navigation and local services, showcasing a practical application of agent capabilities [28].

Legal Intelligence
- Tongyi Farui (通义法睿) serves as a legal intelligence agent, providing professional legal services such as legal Q&A, case law retrieval, and document drafting on top of an agent architecture [30].
- The reported metrics for Tongyi Farui indicate higher quality in answer points, case citations, and legal references than other models [31].

Research Contributions
- The Tongyi DeepResearch team has consistently published technical reports, contributing to the open-source community and advancing the field of deep research agents [33].
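The thinking-action-observation cycle mentioned under Reasoning Modes can be sketched as a simple loop. This is a generic ReAct-style skeleton and not Tongyi DeepResearch's code: `react_loop`, `toy_policy`, and `toy_tools` are hypothetical names, and the scripted policy stands in for a language model call.

```python
def react_loop(question, policy, tools, max_rounds=8):
    """Alternate thought -> action -> observation until the policy answers.

    `policy(history)` returns (thought, action_name, argument); the special
    action name "answer" ends the loop and returns `argument` as the answer.
    """
    history = [("question", question)]
    for _ in range(max_rounds):
        thought, action, argument = policy(history)
        history.append(("thought", thought))
        if action == "answer":
            return argument, history
        observation = tools[action](argument)
        history.append(("action", f"{action}({argument!r})"))
        history.append(("observation", observation))
    return None, history  # round budget exhausted without an answer


def toy_policy(history):
    """Scripted stand-in for a model: search once, then answer."""
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return "I need to look this up.", "search", history[0][1]
    return "The observation answers the question.", "answer", observations[-1]


# Hypothetical tool registry with a single stubbed search tool.
toy_tools = {"search": lambda query: f"stub result for: {query}"}

answer, trace = react_loop("What is ReAct?", toy_policy, toy_tools)
```

Because the full history is fed back to the policy every round, context grows with each step, which is one reason a long context window matters for this mode.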
Making Robots Do "More Than Just Walk": Nav-R1 Ushers in a New Era of Navigation with Reasoning
机器之心· 2025-09-18 01:01
Core Insights
- The article discusses the challenges of enabling robots to understand and execute complex navigation commands in real-world environments, emphasizing the need for better reasoning, path planning, and action execution [2][4].

Group 1: Key Innovations
- The paper introduces a new foundation model called Nav-R1, which integrates perception, reasoning, and action in 3D environments, improving the robot's ability to think clearly before acting [5].
- A large dataset, Nav-CoT-110K, consisting of approximately 110,000 Chain-of-Thought trajectories, is constructed for cold-start training, allowing the model to learn reasoning and action decision-making before reinforcement learning optimization [8].
- Nav-R1 uses three complementary reward mechanisms during reinforcement learning: Format Reward, Understanding Reward, and Navigation Reward, which together improve the model's logical behavior and alignment with human expectations [9][13].

Group 2: Experimental Results
- Nav-R1 shows significant improvements in success rate and path efficiency across various navigation tasks, an increase of approximately 8% over other advanced methods [14].
- In real-world experiments, Nav-R1 was tested on a mobile robot platform and performed robustly when navigating complex indoor environments such as meeting rooms and corridors [18][23].

Group 3: Practical Applications
- The capabilities of Nav-R1 suggest applications in service robots and home assistants, where understanding and navigating cluttered environments is crucial for user experience [29].
- In healthcare settings, Nav-R1 can improve robot navigation in hospitals and nursing homes, supporting safe and reliable operation in complex environments [30].
- The model's reasoning and control capabilities also apply to augmented reality (AR) and virtual reality (VR) scenarios, where virtual agents need to navigate physical spaces [31].
- In industrial and hazardous environments, Nav-R1's robustness and generalization make it suitable for tasks in factories, mines, and disaster sites [32].
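As a rough illustration of how the three reward signals named above (Format, Understanding, Navigation) might be combined into one scalar for reinforcement learning, consider the sketch below. Every detail here is an assumption: the `<think>`/`<action>` tag template, the exact-match understanding check, the distance-based navigation term, and the equal weights are invented for illustration and are not taken from the Nav-R1 paper.

```python
import math
import re


def format_reward(output):
    """1.0 if the output follows the assumed <think>...</think><action>...</action> template."""
    pattern = r"^<think>.+</think>\s*<action>.+</action>$"
    return 1.0 if re.match(pattern, output.strip(), flags=re.DOTALL) else 0.0


def understanding_reward(predicted_answer, reference_answer):
    """1.0 on a case-insensitive exact match with the reference answer (assumed metric)."""
    same = predicted_answer.strip().lower() == reference_answer.strip().lower()
    return 1.0 if same else 0.0


def navigation_reward(final_xy, goal_xy, scale=1.0):
    """Decays from 1.0 toward 0.0 as the final position moves away from the goal."""
    return math.exp(-math.dist(final_xy, goal_xy) / scale)


def total_reward(output, predicted_answer, reference_answer,
                 final_xy, goal_xy, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; equal weights are a placeholder choice."""
    parts = (
        format_reward(output),
        understanding_reward(predicted_answer, reference_answer),
        navigation_reward(final_xy, goal_xy),
    )
    return sum(weight * part for weight, part in zip(weights, parts))


# Toy rollout: well-formed output, correct answer, robot stops exactly on the goal.
sample_output = "<think>The kitchen is to the left.</think><action>turn_left</action>"
sample_total = total_reward(sample_output, "Kitchen", "kitchen", (1.0, 2.0), (1.0, 2.0))
```

Keeping the terms separate makes it easy to see which behavior a policy is being paid for: a rollout can reach the goal yet still lose reward for malformed output or a wrong answer.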