The "smartest" booth at WAIC: conversational AI with its eyes and ears fully switched on
QbitAI · 2025-07-28 06:42
Core Viewpoint - The article highlights the advancements in Agora's conversational AI engine, showcasing its new features that enhance real-time interaction and user experience in various applications [4][5][31].
Group 1: Upgrades of the Conversational AI Engine
- The upgraded conversational AI engine includes a selective attention locking feature that allows it to accurately capture user commands in noisy environments, filtering out 95% of background noise [12][16].
- The engine now has visual understanding capabilities, enabling it to recognize and interpret images in real-time, enhancing its contextual awareness during interactions [18][23].
- Integration with mainstream digital human solutions allows for more human-like interactions, where digital avatars can express emotions and gestures, making conversations feel more natural [25][30].
Group 2: Applications and Market Position
- The conversational AI engine has been successfully implemented across various sectors, including education and smart hardware, demonstrating its versatility and reliability [38][44].
- Agora's long-standing expertise in Real-Time Engagement (RTE) technology positions it favorably in the growing market for multimodal AI interactions, which combine audio and visual inputs [49][50].
- The focus on user experience rather than just technical specifications is expected to enhance the competitive edge of Agora's products in the evolving AI landscape [51][52].
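Nearly 40% better resistance to interference, with no adversarial training: new distillation method from Beihang and Shanghai AI Lab improves model robustness | ICML 2025
QbitAI · 2025-07-28 06:42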
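Contributed by the ROME team · QbitAI
As AI models keep growing in scale, dataset distillation (DD) can approach the training results of the full dataset while using far less data, which improves training efficiency and lowers training cost.
However, models trained on distilled datasets still struggle to resist interference and hold their performance in safety-critical tasks such as medical diagnosis and autonomous driving.
A research team from Beihang University, Shanghai Artificial Intelligence Laboratory, and the University of Liverpool (UK) has proposed a new method named ROME, the first to bring information bottleneck theory into the dataset distillation task. Without any adversarial training, the method markedly improves the adversarial robustness of models, with a maximum gain of nearly 40%.
Experiments show that, across different datasets, ROME substantially surpasses the previous best methods in robustness, with the top figure jumping from the previous 43.97% to 103.09%.
The work has been accepted to ICML 2025, a top international machine learning conference, and the project code and data are fully open source.
The core idea is to minimize the redundant information between the input data and its intermediate-layer latent representation while strengthening that representation's usefulness for the final label information, thereby improving the adversarial robustness of the synthetic data at the source.
In addition, ROME introduces a component based on the conditional entropy bottleneck (Conditional ...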
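For orientation, the core idea described above has the same shape as the classical information bottleneck objective. The formula below is that textbook form, written as a sketch; the excerpt does not give ROME's actual loss, so the symbols here (input X, intermediate-layer representation Z with parameters θ, label Y, trade-off weight β) are assumptions for illustration only.

```latex
\min_{\theta}\; \mathcal{L}_{\mathrm{IB}}(\theta)
  = I(X; Z_{\theta}) \;-\; \beta\, I(Z_{\theta}; Y),
  \qquad \beta > 0
```

Minimizing the first term squeezes redundant input information out of the representation, and the second term rewards whatever the representation still says about the label, which corresponds to the two halves of the idea stated in the article.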
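Highest energy efficiency! After two more years of grinding away at compute-in-memory, he delivers a brand-new AI chip for on-device and edge large models
QbitAI · 2025-07-28 06:42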
Core Viewpoint - The article highlights the launch of the M50 AI chip by Houmo Intelligent, which is said to offer the highest energy efficiency in the industry among compute-in-memory (integrated storage and computing) chips, marking a significant advance in AI hardware [3][4][8].
Group 1: Product Launch and Specifications
- The M50 chip features 160 TOPS@INT8 physical computing power, 100 TFLOPS@bFP16 floating-point computing power, and a bandwidth of 153.6 GB/s, with a typical power consumption of only 10W [4][8].
- The M50 is built on the second-generation compute-in-memory technology developed by Houmo Intelligent, which allows for significant improvements in energy efficiency [8][9].
Group 2: Technological Innovation
- Compute-in-memory technology merges computation and storage, eliminating the need for data transfer between memory and processing units, thus overcoming the "power wall" and "memory wall" limitations of traditional architectures [11][12].
- The M50 utilizes SRAM-CIM technology, which involves deep structural changes to SRAM arrays, enabling parallel loading and computation, thereby doubling efficiency [12][15].
Group 3: Software and Ecosystem
- Accompanying the M50 is the new compiler toolchain, Houmo Avenue®, which simplifies the optimization process for developers by automatically searching for the best strategies [24].
- The company has developed a complete product matrix that includes various hardware solutions for both terminal and edge computing, enhancing the accessibility of AI capabilities across different applications [28][36].
Group 4: Market Positioning and Future Outlook
- Houmo Intelligent's focus on compute-in-memory is seen as a necessary differentiation strategy in a competitive landscape dominated by giants like NVIDIA and Huawei [37][40].
- The company aims to address the increasing demand for computing power and bandwidth in the era of large models, with a vision of making AI capabilities ubiquitous in everyday devices [41][42].
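The efficiency claim can be sanity-checked from the figures quoted in Group 1. The snippet below is a back-of-the-envelope sketch, assuming the peak throughput numbers and the "typical" 10 W power figure can simply be divided; real workloads rarely sustain peak throughput, and the function and constant names are illustrative, not from the vendor.

```python
# Back-of-the-envelope efficiency from the figures quoted in the summary.
# Assumption: peak throughput and "typical" power can be divided directly.


def throughput_per_watt(throughput: float, watts: float) -> float:
    """Return throughput per watt for a given peak figure and power draw."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return throughput / watts


M50_INT8_TOPS = 160.0      # quoted: 160 TOPS@INT8
M50_BFP16_TFLOPS = 100.0   # quoted: 100 TFLOPS@bFP16
M50_TYPICAL_WATTS = 10.0   # quoted: typical power consumption of 10 W

int8_efficiency = throughput_per_watt(M50_INT8_TOPS, M50_TYPICAL_WATTS)
bfp16_efficiency = throughput_per_watt(M50_BFP16_TFLOPS, M50_TYPICAL_WATTS)

print(f"INT8:  {int8_efficiency:.1f} TOPS/W")
print(f"bFP16: {bfp16_efficiency:.1f} TFLOPS/W")
```

Under these assumptions the quoted numbers work out to 16 TOPS/W at INT8 and 10 TFLOPS/W at bFP16.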
LeCun responds to Shengjia Zhao's appointment as "Chief Scientist"
QbitAI · 2025-07-28 06:42
Core Viewpoint - The appointment of Shengjia Zhao as the Chief Scientist of Meta's Superintelligence Labs signifies a strategic shift in Meta's AI leadership, emphasizing the importance of young talent in the rapidly evolving AI landscape [1][29].
Group 1: Leadership Changes
- Shengjia Zhao, a Chinese scientist born in the 1990s and a key contributor to ChatGPT and o3, has been appointed as the Chief Scientist of Meta's Superintelligence Labs [1][29].
- Yann LeCun, a Turing Award winner born in 1960, remains the Chief Scientist of Meta's Fundamental AI Research (FAIR) and has confirmed his ongoing role [2][3][5].
- There is public speculation regarding LeCun's position and the dynamics within Meta's AI teams, particularly following Zhao's appointment [11][28].
Group 2: Structural Changes in AI Teams
- FAIR, founded by LeCun in December 2013, has been a core institution for AI research at Meta, achieving significant breakthroughs in various fields [17].
- Recently, FAIR has been integrated into the newly formed Meta Superintelligence Labs, indicating a shift in its operational focus [15][19].
- The restructuring has led to a perceived marginalization of FAIR, as it now operates alongside a separate team focused on consumer products and AGI research [22][23].
Group 3: Zhao's Background and Contributions
- Zhao graduated from Tsinghua University and later obtained a PhD from Stanford University, where he received multiple prestigious awards [30][32].
- He has been a pivotal figure at OpenAI, contributing to the development of ChatGPT and other models, and is recognized for his work on chain-of-thought reasoning models [32][33][34].
- Zhao's leadership in Meta's AI strategy is anticipated to bring innovative advancements to the company [35].
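One round of instruction tuning turns a large model into an all-round team of experts; 8B model outperforms the full fine-tuning baseline | ACL25 Oral
QbitAI · 2025-07-28 06:42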
Core Insights - The article discusses the limitations of current methods for upgrading large language models (LLMs) and introduces a new framework called Sparse Interpolation Mixture of Experts (SIMoE) that allows for efficient and effective model adaptation with minimal fine-tuning costs [1][4].
Group 1: Limitations of Current Methods
- Existing upgrade methods for LLMs face two main limitations: reliance on manual experience for selecting upgrade locations and lack of a systematic mechanism to balance expert specialization and collaboration [4][7].
- The first limitation involves a static upgrade strategy that ignores the dynamic differences between model layers and task-specific requirements, leading to suboptimal performance [7][8].
- The second limitation is the inefficiency in expert collaboration, where traditional methods either force collaboration among experts or train them independently, resulting in knowledge redundancy and poor generalization [9][10].
Group 2: Introduction of SIMoE
- SIMoE offers a novel solution by enabling automatic upgrades of standard LLMs to high-performance sparse expert models through a single-stage fine-tuning process [4][6].
- The framework utilizes structured sparse optimization to identify neuron-level expert parameters, combining shared incremental parameters with orthogonal penalties to achieve dual breakthroughs in performance and efficiency [4][14].
Group 3: Performance Metrics
- SIMoE has demonstrated superior performance metrics, with an 8B model outperforming the fully fine-tuned baseline by 1.6% in ROUGE-L scores, a 10% increase in safety metrics, and a 30% reduction in inference memory [6][24].
- In various benchmark tests, SIMoE has shown significant improvements in accuracy across multiple tasks, including a 2.8% increase in zero-shot settings and 75.02% accuracy in few-shot scenarios [24][27].
Group 4: Innovations in SIMoE
- The framework introduces a structured sparse upgrade mechanism that transforms the selection of upgrade locations into a learnable sparse optimization problem, enhancing global optimization capabilities [15][16].
- Additionally, SIMoE incorporates a "non-involution protocol" within expert teams to balance collaboration and specialization, ensuring efficient knowledge transfer and minimizing parameter redundancy [20][22].
Group 5: Experimental Validation
- SIMoE has been validated through extensive experiments on both visual and natural language models, showcasing its effectiveness in few-shot learning and cross-task generalization [22][25].
- The results indicate that SIMoE consistently outperforms baseline models across various datasets and tasks, reinforcing its potential as a leading framework for LLM adaptation [24][27].
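To make "shared incremental parameters with orthogonal penalties" more concrete, here is a toy sketch of one way such a penalty could look. It is not SIMoE's published formulation: the summary gives no equations, so the mask representation, the squared-dot-product penalty, and every function name below are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonality_penalty(expert_masks: Sequence[Sequence[float]]) -> float:
    """Sum of squared pairwise dot products between expert masks.

    The value is zero exactly when every pair of masks is orthogonal.
    For non-negative masks that means no two experts select the same neuron.
    """
    return sum(dot(a, b) ** 2 for a, b in combinations(expert_masks, 2))


def expert_weights(
    base: Sequence[float],
    shared_delta: Sequence[float],
    mask: Sequence[float],
) -> List[float]:
    """Hypothetical expert weights: base weights plus a masked shared update."""
    return [w + m * d for w, d, m in zip(base, shared_delta, mask)]


# Two experts over four neurons (toy values, not from the paper).
disjoint_masks = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
overlapping_masks = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]

disjoint_penalty = orthogonality_penalty(disjoint_masks)        # no shared neurons
overlapping_penalty = orthogonality_penalty(overlapping_masks)  # one shared neuron

first_expert = expert_weights(
    base=[1.0, 1.0, 1.0, 1.0],
    shared_delta=[0.5, -0.5, 0.25, 0.0],
    mask=disjoint_masks[0],
)
print(disjoint_penalty, overlapping_penalty, first_expert)
```

In this toy setup the penalty is zero for the disjoint masks and positive for the overlapping ones, which is the pressure toward specialization that the summary attributes to the orthogonal penalty; the shared update is what every expert has in common.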
Who would have thought: this central state-owned enterprise got Shannon and Turing to "shake hands" once more
QbitAI · 2025-07-28 05:35
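Jin Lei, reporting from WAIC · QbitAI
Interestingly, the one that got Shannon and Turing to shake hands once more turns out to be a central state-owned enterprise.
What does it mean for the two of them to shake hands?
Of these two masters, one defined how "information" can be transmitted efficiently and accurately, while the other opened the exploration of how "intelligence" can be created and simulated.
Their handshake, then, stands for a fusion of information technology and communication technology.
Picture yourself out on the open ocean, a "dead zone" for traditional communications: the satellite signal is weak and expensive, and never mind video calls, even sending one clear picture can take ages.
Now, thanks to that "handshake", smooth video calls at sea have become a reality.
Behind this, the telecom company did not launch some super satellite or lay a transoceanic fiber cable. In fact, the amount of data the crew's phones exchange with the outside world is only one hundredth, or even one thousandth, of that of a traditional video call.
This is AI Flow (智传网), developed by the Institute of Artificial Intelligence of China Telecom (TeleAI). It is not the simple data transmission you might assume; rather, it connects agents to one another so that they collaborate efficiently, break through the performance limits of a single model, and achieve emergent intelligence through connection and interaction.
Once released, the technology stunned observers abroad; one netizen offered this assessment:
A major breakthrough: an AI framework that could reshape how GenAI works.
So how exactly does AI Flow break down the "wall" of transmitting intelligence? ...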
Unboxing open-source Coze: the three core Agent components go public, 9K stars in 48 hours
QbitAI · 2025-07-28 03:25
Core Viewpoint - The article discusses the recent open-source release of Coze's products, which aims to facilitate the development and deployment of AI agents, marking a significant step towards making agent technology more accessible and practical for developers [1][45].
Group 1: Open Source Products
- Coze has released two new open-source products: Coze Studio and Coze Loop, alongside the previously released Eino framework, creating a comprehensive open-source ecosystem for agent development [2][5][32].
- Coze Studio is a low-code platform designed to simplify the creation of AI workflows, while Coze Loop focuses on the development, evaluation, and monitoring of agents [12][21][25].
- The open-source products are licensed under the Apache 2.0 license, allowing for commercial use and modifications without the requirement to open-source changes [7][57].
Group 2: Market Trends and Challenges
- The article highlights the growing popularity of agents, transitioning from novelty items to practical tools, as evidenced by the increasing support from major companies and the emergence of various successful agent applications [3][46].
- Despite the enthusiasm, the widespread adoption of agents faces challenges, including inconsistent user experiences and high development barriers, which Coze aims to address through its open-source offerings [47][50].
Group 3: Development and Evaluation Capabilities
- Coze Studio provides a complete workflow engine, allowing developers to easily create agents by dragging and dropping functional components, thus lowering the technical barrier for entry [16][19].
- Coze Loop offers a comprehensive solution for prompt development, evaluation, and monitoring, enabling developers to assess agent performance across multiple dimensions [25][30].
- Eino, the earlier released framework, provides a unified component abstraction and flexible orchestration capabilities, enhancing the development process for AI applications [36][39].
Group 4: Future Implications
- The open-source initiative is expected to accelerate the deployment of agents across various industries, particularly in internal automation, small teams, and vertical sectors like healthcare and finance [43][42].
- Coze's open-source strategy is seen as a proactive move to capitalize on the impending explosion of agent technology, aiming to create a robust ecosystem that fosters collaboration and innovation among developers [45][56].
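AI hallucination becomes the first keyword of WAIC as Hinton sounds the alarm; iFlytek's upgraded Spark X1 shows a new breakthrough in mitigation
QbitAI · 2025-07-28 02:26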
Core Viewpoint - The term "hallucination" has become a hot topic at WAIC this year, highlighting the challenges and risks associated with AI models, particularly in their reliability and practical applications [1][12][20].
Group 1: AI and Hallucination
- Nobel laureate Hinton emphasized the complex coexistence of humans and large models, suggesting that humans may also experience hallucinations similar to AI [2][3][15].
- Hinton warned about the potential dangers of AI, advocating for the development of AI that does not seek to harm humanity [4][20].
- The phenomenon of hallucination, where AI generates coherent but factually incorrect information, is a significant barrier to the reliability and usability of large models [5][18].
Group 2: Technological Developments
- The upgraded version of iFlytek's large model, Spark-X1, focuses on addressing hallucination issues, achieving notable improvements in both factual and fidelity hallucination governance [7][30].
- The performance comparison of various models shows that Spark-X1 outperforms others in text generation and logical reasoning tasks, with a hallucination rate significantly lower than its competitors [8][30].
- iFlytek's advancements include a new reinforcement learning framework that provides detailed feedback, enhancing the model's training efficiency and reducing hallucination rates [27][29].
Group 3: Industry Implications
- The collaboration between major AI companies like Google, OpenAI, and Anthropic on hallucination-related research indicates a collective effort to ensure AI safety and reliability [9][21].
- The ongoing evolution of AI capabilities raises concerns about the potential for AI to exceed human control, necessitating a focus on safety measures and governance frameworks [19][24].
- The concept of "trustworthy AI" is emerging as a critical factor for the successful integration of AI across various industries, ensuring that AI applications are reliable and effective [25][44].
What makes a reasoning model truly usable? StepFun's Step 3: open-source, multimodal, low-cost, and adapted to Chinese-made chips
QbitAI · 2025-07-27 11:57
Core Viewpoint - The article emphasizes the significance of the new multi-modal reasoning model, Step 3, released by StepFun, which fills a gap in the current AI landscape by combining strong reasoning capabilities with efficiency and open-source accessibility [4][5][25].
Group 1: Model Features
- Step 3 is a 321-billion-parameter MoE model with multi-modal reasoning capabilities, set to be officially open-sourced on July 31 [5][24].
- It achieved new state-of-the-art (SOTA) results in open-source multi-modal reasoning benchmarks [6].
- The model's inference decoding cost is only one-third of DeepSeek's, demonstrating superior efficiency [8].
Group 2: Market Trends
- The industry is shifting towards multi-modal models, with reasoning capabilities becoming a focal point as generative AI enters a reasoning era [10].
- Efficiency, cost, and deployment friendliness are now critical factors in evaluating model performance, beyond just ranking [11][12].
Group 3: Innovations in Step 3
- Step 3 incorporates two key innovations: the AFD distributed inference system and the MFA attention mechanism, enhancing decoding efficiency and reducing inference costs [31][35].
- AFD separates Attention and FFN tasks to optimize resource usage, while MFA improves KV cache and computation efficiency [32][36].
Group 4: Cost Efficiency
- Step 3's design allows it to operate with significantly lower costs compared to competitors, achieving a cost that is only 30% of DeepSeek-V3's on certain hardware [42].
- The model supports FP8 quantization, further reducing memory access and latency [41].
Group 5: Industry Collaboration
- StepFun has initiated the "Model-Chip Ecological Innovation Alliance" with nearly ten chip and infrastructure partners to enhance model adaptability and computational efficiency [54][55].
- This collaboration aims to ensure that the model can run effectively on various hardware, including domestic (Chinese-made) chips [51][52].
Group 6: Application and Market Potential
- StepFun's multi-modal reasoning model has been successfully integrated into various applications, including automotive and mobile devices, with significant interest from top manufacturers [60][69].
- The company anticipates a revenue of nearly 1 billion RMB in 2025, indicating a clear commercialization path for its technology [74].
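One drone gets lost, two find the way! Beihang's high-low drone collaborative navigation: the high-altitude drone holds the big picture, the low-altitude drone checks the details, and neither gets lost in complex scenes
QbitAI · 2025-07-27 11:57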
Core Viewpoint - The article discusses a new paradigm for drone navigation called "high-low drone collaboration," where a high-altitude drone acts as a "panoramic commander" for global perception and reasoning, while a low-altitude drone serves as a "ground scout" for precise navigation and target search, enabling quick target identification in complex environments [1].
Group 1: Navigation and Target Identification
- The high-low drone collaboration allows for efficient target finding, whereas a single drone may fly either too high to see ground details or too low to recognize larger landmarks [1].
- The system can also locate small targets, such as a dog, through coordinated efforts [3].
- If the target carries specific lettering or matches a given description, the drones can match these precisely [5].
- The drones can achieve accurate identification based on environmental details surrounding the target [7].
Group 2: Dataset and Framework Development
- The research team constructed the HaL-13k dataset to support their tasks, enhancing the original UAV-Need-Help dataset with high-altitude drone trajectory and perception data [9][11].
- A collaborative framework named AeroDuo was designed and evaluated in the OpenUAV simulation environment, demonstrating an effective balance among environmental coverage, navigation accuracy, and autonomy [9].
Group 3: Enhancements in Perception and Decision-Making
- To improve the perception and decision-making capabilities of the high-low drone system, a multi-modal unified framework called Pilot-LLM was developed, utilizing large language models for multi-modal reasoning [13].
- A global map construction module was proposed to integrate historical information from high-altitude drones, enhancing environmental understanding and target localization [13].
- A lightweight decoder is used to generate target probability distribution maps, balancing exploration capabilities and spatial modeling effectiveness [14].
Group 4: Low-Altitude Drone Navigation Strategy
- The low-altitude drone employs a three-stage navigation search strategy, starting with selecting high-confidence areas based on the high-altitude drone's predicted probability map [16].
- A reinforcement learning-based obstacle avoidance strategy is utilized for safe and flexible path execution [16].
- The collaborative model can be easily expanded to multiple drone cooperation, allowing the high-altitude drone to predict multiple potential target locations and assign tasks to various low-altitude drones using optimization algorithms [16].
Group 5: Real-World Application and Future Prospects
- Optimizing action control ensures safe obstacle avoidance, and supplementing real-world data for model training aids in transitioning the high-low drone system from simulation to real-world scenarios [17].
- The research findings are set to be published in ACM MM 2025, indicating ongoing advancements in drone technology [10].
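The first navigation stage and the multi-drone extension in Group 4 can be pictured with a small sketch: pick the most probable cells from the high-altitude drone's probability map, then hand each one to a low-altitude drone. This is an illustrative stand-in, not the AeroDuo implementation: the grid representation, the plain top-k selection, and the greedy nearest-drone assignment (in place of the unspecified "optimization algorithms") are all assumptions, as are the function names.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) in the probability grid


def top_k_cells(prob_map: List[List[float]], k: int) -> List[Cell]:
    """Return the k grid cells with the highest predicted target probability."""
    scored = [
        (p, (r, c))
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [cell for _, cell in scored[:k]]


def greedy_assign(drones: List[Cell], targets: List[Cell]) -> Dict[int, Cell]:
    """Give each target to the nearest still-free drone (Manhattan distance).

    A greedy stand-in for the optimization step mentioned in the article.
    """
    free = dict(enumerate(drones))
    assignment: Dict[int, Cell] = {}
    for target in targets:
        if not free:
            break
        nearest = min(
            free,
            key=lambda i: abs(free[i][0] - target[0]) + abs(free[i][1] - target[1]),
        )
        assignment[nearest] = target
        del free[nearest]
    return assignment


# Toy 3x3 probability map from the high-altitude drone (made-up values).
prob_map = [
    [0.05, 0.10, 0.02],
    [0.01, 0.60, 0.03],
    [0.15, 0.02, 0.02],
]
candidates = top_k_cells(prob_map, k=2)
plan = greedy_assign(drones=[(0, 1), (2, 2)], targets=candidates)
print(candidates, plan)
```

Greedy assignment is easy to follow but can be suboptimal when drones compete for nearby targets; a real system would more likely solve the assignment jointly and hand path execution to the obstacle-avoidance policy.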