Alibaba Cloud's secret weapon debuts at a top conference: slashing Nvidia GPU usage by 82%, with 213 GPUs doing the work of 1,192
量子位· 2025-10-21 23:50
Mengchen, reporting from Aofeisi. QbitAI | WeChat account QbitAI

Alibaba Cloud's secret weapon debuted at SOSP, a top systems conference: a new technique that cuts Nvidia GPU demand by 82%. It quickly drew considerable attention and discussion. The research is a collaboration between Alibaba and Peking University, led by Alibaba Cloud CTO Zhou Jingren. It introduces Aegaeon, a new GPU pooling system that uses token-level auto-scaling to shrink GPU usage from 1,192 cards down to 213.

The starting point was an observation about Alibaba Cloud's own business. On Model Studio (the Bailian platform), the team noticed a frustrating pattern: 17.7% of GPUs were allocated to serving models almost nobody used, yet those models handled only 1.35% of total requests. Previously, running all these models meant either giving each model its own GPU, leaving the GPUs of many cold models sitting idle, or using older methods to pack 2-3 models onto a single GPU (GPU memory could not hold more); either way, resource utilization was very low.

Aegaeon targets exactly this pain point: through fine-grained resource scheduling, it fundamentally changes the rules of GPU allocation.

Cold models occupy long-tail resources

Specifically, of the 779 models they measured, 94.1% are long-tail models averaging fewer than 0.2 requests per second. Meanwhile, popular models such as DeepSeek and Qwen (通义千问), despite their heavy request volumes, also often ...
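To make the utilization gap concrete, here is a back-of-the-envelope sketch in Python. The model counts and request rates are taken from the figures above; the per-GPU serving capacity is an invented assumption, and the calculation only illustrates why pooling long-tail models helps. It is not Aegaeon's token-level scheduler.

```python
# Back-of-the-envelope comparison of dedicated vs. pooled GPU allocation
# for long-tail models. Model counts and request rates come from the article;
# PER_GPU_CAPACITY_RPS is a hypothetical assumption, and this is NOT the
# Aegaeon scheduler itself.
import math

N_MODELS = 779              # models observed on Model Studio (from the article)
LONG_TAIL_SHARE = 0.941     # 94.1% of models are long-tail (from the article)
LONG_TAIL_RPS = 0.2         # < 0.2 requests/s each, taken as an upper bound
PER_GPU_CAPACITY_RPS = 5.0  # assumed sustainable requests/s per GPU (hypothetical)

n_long_tail = round(N_MODELS * LONG_TAIL_SHARE)

# Dedicated serving: every cold model keeps at least one GPU warm,
# no matter how little traffic it actually sees.
dedicated_gpus = n_long_tail

# Pooled serving: size the pool for the aggregate long-tail load
# instead of per-model peaks.
aggregate_rps = n_long_tail * LONG_TAIL_RPS
pooled_gpus = math.ceil(aggregate_rps / PER_GPU_CAPACITY_RPS)

print(f"long-tail models:     {n_long_tail}")
print(f"dedicated allocation: {dedicated_gpus} GPUs")
print(f"pooled allocation:    {pooled_gpus} GPUs for ~{aggregate_rps:.0f} req/s total")
```

Under these assumed numbers the dedicated scheme burns hundreds of GPUs on near-idle models, which is the waste the article says Aegaeon attacks with much finer, token-level multiplexing.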
AI workhorses can now "learn by doing"! Shanghai AI Lab and partners release a new framework for agent self-evolution
量子位· 2025-10-21 23:50
Core Viewpoint
- The article discusses the introduction of the MUSE framework, which aims to enhance the capabilities of LLM agents by enabling them to accumulate experience and evolve continuously, addressing the challenges of long-horizon tasks and memory limitations [1][5].

Group 1: MUSE Framework Overview
- MUSE stands for Memory-Utilizing and Self-Evolving, designed to create a closed-loop system for LLM agents that allows them to learn from experience and evolve over time [5].
- The framework consists of a hierarchical memory module that organizes different levels of experience, including strategic, procedural, and tool memory [7][8].

Group 2: Key Mechanisms of MUSE
- The first step involves a hierarchical memory module that allows agents to retain and apply historical knowledge, overcoming the "forgetfulness" of traditional LLMs [7].
- The second step is self-reflection, where agents evaluate their task execution and convert raw execution trajectories into structured experiences, refining their standard operating procedures (SOPs) [10][11].
- The third step focuses on self-evolution, enabling agents to continuously improve through a cycle of planning, execution, reflection, and experience extraction [13][15]; a minimal sketch of this loop follows after the summary.

Group 3: Experimental Results
- MUSE demonstrated state-of-the-art (SOTA) performance on the TAC benchmark, achieving a score of 51.78% and surpassing existing methods that used larger models [16].
- The framework's ability to accumulate experience leads to improved performance over time, showcasing its potential for long-term productivity tasks [19].

Group 4: Future Prospects
- The MUSE framework signifies a new phase of experience-driven lifelong learning for AI agents, moving beyond static testing models [29].
- Future research directions include optimizing memory, enriching experience sources, integrating human feedback, and developing comprehensive evaluation standards for long-term tasks [30][31].
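As a rough illustration of the plan-execute-reflect-extract cycle summarized above, here is a minimal Python sketch. Every name in it (HierarchicalMemory, reflect, and so on) is hypothetical; the actual MUSE memory schema, reflection prompts, and scoring are not reproduced here.

```python
# Minimal sketch of an experience-accumulating agent loop in the spirit of MUSE.
# Names (HierarchicalMemory, reflect, ...) are hypothetical, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class HierarchicalMemory:
    strategic: list[str] = field(default_factory=list)   # high-level strategies / SOPs
    procedural: list[str] = field(default_factory=list)  # step-by-step procedures
    tool: list[str] = field(default_factory=list)        # tool-usage notes

def plan(task: str, memory: HierarchicalMemory) -> list[str]:
    # In practice an LLM would condition on retrieved memory; stubbed here.
    return [f"step for '{task}' using {len(memory.strategic)} known strategies"]

def execute(steps: list[str]) -> dict:
    # Placeholder executor: returns a raw trajectory and a success flag.
    return {"trajectory": steps, "success": True}

def reflect(result: dict) -> dict:
    # Self-reflection: turn the raw trajectory into structured experience.
    return {"sop": "refined procedure", "lesson": "what worked and what failed",
            "useful": result["success"]}

def run_task(task: str, memory: HierarchicalMemory) -> None:
    steps = plan(task, memory)
    result = execute(steps)
    experience = reflect(result)
    if experience["useful"]:
        memory.strategic.append(experience["sop"])      # self-evolution: later
        memory.procedural.append(experience["lesson"])  # tasks reuse this memory

memory = HierarchicalMemory()
for t in ["book a flight", "summarize a report"]:
    run_task(t, memory)
print(len(memory.strategic), "strategies accumulated")
```

The point of the loop is that memory grows monotonically across tasks, which is why the benchmark scores reported above improve as more experience accumulates.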
iFlytek's just-released earnings report: net profit soars 202%
量子位· 2025-10-21 09:05
Core Viewpoint
- The latest quarterly report from iFlytek (Keda Xunfei) shows significant growth in revenue and profit, driven by advancements in AI technology and its industrial application [1][2].

Financial Performance
- iFlytek achieved revenue of 6.078 billion yuan in Q3 2025, representing a year-on-year increase of 10.02% [4].
- Net profit attributable to shareholders reached 172.25 million yuan, a remarkable increase of 202.40% compared to the previous year [4].
- Net profit excluding non-recurring items was 26.24 million yuan, up 76.5% year-on-year [4].
- Operating cash flow showed strong performance, with a net amount of 895 million yuan, reflecting growth of 25.19% [6].

Business Operations
- The two core profit indicators demonstrate the company's improved profitability in its main business [5].
- For the first three quarters of 2025, total revenue reached 16.99 billion yuan, a 14.41% increase year-on-year, with a net loss of 67 million yuan, narrowing the loss by 80.6% compared to the previous year [8][9].

AI Technology and Market Position
- iFlytek's advancements in large AI models have become a key driver of revenue growth, with significant progress in core technology, product deployment, and ecosystem development [13].
- The "Xunfei Spark" model has undergone key upgrades, outperforming competitors in various capabilities, including mathematics and translation [14][15].
- The company has secured the highest number and amount of bids for large-model projects in the industry, with Q3 bids totaling 545 million yuan, surpassing the combined total of the second- through fifth-ranked competitors [16].

Research and Development
- iFlytek continues to increase its R&D investment, planning to raise up to 4 billion yuan through an issuance of A-shares to fund development of the Spark education model and computing platform [18][19].

Ecosystem Growth
- The AI ecosystem is showing strong growth, with 690,000 new large-model developers added and a total of 1.22 million ecosystem developers [17].
The embedding black box is history! A new framework has models "explain first, then learn the embedding"
量子位· 2025-10-21 09:05
Contributed by the UIUC team. QbitAI | WeChat account QbitAI

Let the model explain first, then learn the embedding! Researchers from UIUC, ANU, HKUST, UW, TAMU and other universities have released GRACE, an interpretable generative embedding framework.

Over the past few years, text embedding models have evolved in waves, from BERT to E5, GTE, LLM2Vec and Qwen-Embedding. These models map text into a vector space for tasks such as semantic retrieval, clustering, and question-answer matching. However, most of these methods share a common flaw: they treat the large language model as a "dumb encoder". Text goes in, a vector comes out, and nothing tells us why two texts are similar. The "contrastive learning + pooling" recipe is effective, but it essentially throws away the LLM's reasoning and generation capabilities.

Put simply, GRACE no longer "compresses text into a vector"; it "lets the model explain first, then learn the embedding": the model first generates a reasoning note (rationale) for each text, then encodes these rationales into embeddings. A reward signal encourages the model to produce more logical, more semantically consistent reasoning.

Method overview: generation, representation, and optimization in one

In brief, GRACE comprises three key modules: ...
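The two-stage idea (generate a rationale first, then embed the rationale rather than the raw text) can be sketched as below. Both stages are stubbed with placeholders so the snippet runs standalone; GRACE's real LLM generator, trained encoder, and reward-based optimization are not shown.

```python
# Minimal sketch of the "explain first, then embed" pipeline behind GRACE.
# generate_rationale() and encode() are placeholders, not GRACE's actual models.
import hashlib
import numpy as np

def generate_rationale(text: str) -> str:
    # In GRACE an LLM writes a reasoning note (rationale) for each text;
    # stubbed here so the pipeline is runnable end to end.
    return f"This text is about: {text.lower()}"

def encode(rationale: str, dim: int = 64) -> np.ndarray:
    # Placeholder encoder: a deterministic pseudo-embedding seeded by the
    # rationale. The real framework encodes the rationale with a trained
    # embedding model and optimizes it with a reward that favors logical,
    # semantically consistent rationales.
    seed = int(hashlib.sha256(rationale.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def embed(text: str) -> np.ndarray:
    return encode(generate_rationale(text))  # two stages: explain, then embed

a = embed("GPU pooling for LLM serving")
b = embed("scheduling GPUs for model inference")
print("cosine similarity:", float(a @ b))
```

The key design point is that the rationale sits between the text and the vector, so the representation is something a human can read and audit instead of an opaque pooled activation.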
"Most beautiful product manager" Song Ziwei: first product of her AI hardware startup revealed
量子位· 2025-10-21 09:05
Core Viewpoint
- The article discusses the entrepreneurial venture of Song Ziwei, a former product manager at vivo, who is entering the AI smart hardware market with a focus on an "AI makeup mirror" [1][2][4].

Group 1: Company Overview
- Song Ziwei's startup, "Wei Guang Dian Liang," completed its angel round of financing in September, with investors including Zhongke Chuangxing and Jiuhua Venture Capital [4][5].
- The financing will primarily be used for AI hardware research and development, application software development, and team building to accelerate technological innovation and market expansion [5][6].

Group 2: Product Focus
- The company aims to create AI hardware that is fashionable and appealing to young users, integrating AI agent technology with high-frequency life scenarios [7][9].
- The first product being developed is an "AI makeup mirror," which aims to differentiate itself from previous generations of smart mirrors that have been criticized for lacking true intelligence [18][22].

Group 3: Market Context
- The smart makeup mirror market is not new, having seen initial interest in 2017, but many products have been deemed overpriced and underperforming [18][21].
- Competitors like the domestic brand Jiayao have explored intelligent features, such as AI voice interaction and skin detection, achieving international sales success [23][25].

Group 4: Technological Advancements
- Advancements in multimodal AI capabilities over the past two years may enhance the functionality of makeup mirrors, allowing features like virtual makeup try-on and personalized makeup suggestions based on various factors [27][28].
- The competitive edge of future AI makeup mirrors will rely on the underlying algorithms and cloud software, potentially leading to a Hardware-as-a-Service (HaaS) model [29][30].

Group 5: Entrepreneurial Background
- Song Ziwei, born in 1994 and a physics graduate of Shanghai University, previously worked at Huawei and vivo, where she gained recognition as a product manager [34][35][36].
- Her rise to fame began in 2019 at the iQOO Neo launch, where her expertise and stage presence garnered significant attention, earning her the nickname "the most beautiful product manager" [37][40].
- After leaving vivo and briefly joining Li Auto, she transitioned to entrepreneurship, indicating a clear vision for her startup shortly after her departure [44][48][50].
Live from IROS: Unitree, Hesai and 自变量 cross swords in Hangzhou, with Meituan hosting from center stage
量子位· 2025-10-21 05:41
henry, reporting from IROS. QbitAI | WeChat account QbitAI

If anyone stole the show on IROS Day 1, it was Meituan, hands down. It hosted the 2025 Meituan Robotics Research Institute Academic Annual Meeting, and its exhibition area was packed wall to wall.

After all, the technical all-stars of the entire robotics scene showed up: Meituan vice president Mao Yinian, Professor Xi Ning of HKU, Hesai founder Li Yifan, 自变量机器人 CEO Wang Qian, Unitree founder and CEO Wang Xingxing, 星海图 co-founder Xu Huazhe, Professor Ding Wenbo of Tsinghua, Professor Xu Chao of Zhejiang University, Professor Zhao Mingguo of Tsinghua, and more.

These big names did not just show up in person; their keynotes were full of highlights and quotable lines. The roundtable was just as lively: on the first principles of robotics, Wang Xingxing and Xu Huazhe sparred live over which is currently holding robots back, the hardware or the software.

Next, let's take a deeper look at more highlights from the annual meeting.

Robotics for Better Life (机致生活)

Almost every team working on embodied intelligence now stresses the same consensus: don't walk around with a hammer looking for nails. Technology is not the goal but a tool; it has to return to real scenarios, solve real problems, and truly become productive capacity. And when it comes to "scenarios," hardly anyone in the industry understands them better than Meituan. More importantly, within Meituan's strategy, the relationship between "scenarios" and "technology" has long been thought through. Meituan vice president and Robotics Research Institute chairman Mao Yinian, in ...
Apple AI picks Mamba: better than Transformer on agent tasks
量子位· 2025-10-21 05:41
Core Viewpoint
- The article discusses advancements in AI models, particularly focusing on the Mamba model, which shows potential to surpass Transformer models in efficiency and generalization capabilities for long tasks and multi-interaction agent tasks [1][10].

Group 1: Transformer Limitations
- Transformer models, while intelligent, face significant computational costs that grow quadratically with the length of the input sequence, making them inefficient for long documents [4][5].
- For instance, processing 1,000 words requires handling 1 million word-pair relationships, and for documents with tens of thousands of words, the computational burden can reach billions [5].

Group 2: Mamba Model Advantages
- Mamba, as a state space model (SSM), utilizes a lightweight design that does not rely on global attention mechanisms, instead maintaining an updated internal state to understand input information [7][10]; a toy sketch of this recurrence follows after the summary.
- This approach results in three significant advantages: linear growth in computational requirements with sequence length, support for streaming processing, and stable memory usage that does not increase significantly with longer sequences [13].

Group 3: Performance Enhancements with Tools
- The introduction of external tools enhances Mamba's performance, allowing it to handle complex tasks more effectively. For example, in multi-digit addition tasks, Mamba with pointer tools can achieve near-100% accuracy after training on 5-digit addition, while Transformers struggle with 20-digit tasks [15].
- In code debugging tasks, Mamba's ability to simulate interactive debugging processes leads to significantly higher accuracy compared to Transformers when faced with complex codebases [15].
- Mamba's combination with external tools addresses its memory limitations, resulting in improved efficiency and performance in agent-based tasks [16][18].
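A toy NumPy sketch of the contrast described above: attention must consider every pair of positions, while an SSM carries a fixed-size state forward step by step, so its cost grows linearly with length. The recurrence here is a generic 1-D linear SSM for illustration, not Mamba's selective-scan kernel.

```python
# Toy contrast between attention's quadratic pair count and an SSM's linear scan.
# Generic 1-D linear state space recurrence for illustration; not Mamba's kernels.
import numpy as np

def attention_pair_count(seq_len: int) -> int:
    # Quadratic: every token attends to every token.
    return seq_len * seq_len

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    # Linear recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    # Cost grows linearly with sequence length; the state size stays fixed.
    h = np.zeros(A.shape[0])
    ys = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t
        ys[t] = C @ h
    return ys

rng = np.random.default_rng(0)
d = 8                              # fixed state size (assumed, for illustration)
A = 0.9 * np.eye(d)                # toy stable transition matrix
B, C = rng.standard_normal(d), rng.standard_normal(d)

for n in (1_000, 10_000):
    y = ssm_scan(rng.standard_normal(n), A, B, C)
    print(f"len={n:>6}: attention pairs={attention_pair_count(n):>12,}, "
          f"ssm steps={len(y):>6,}, state floats={d}")
```

For 1,000 tokens the pair count is already 1,000,000 (the article's own example), while the scan does 1,000 constant-size updates, which is why memory and compute stay stable as sequences stream in.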
ChatGPT hit too: Amazon server outage takes down half the internet
量子位· 2025-10-21 03:38
克雷西, reporting from Aofeisi. QbitAI | WeChat account QbitAI

When Amazon coughs, half the internet shakes. An AWS server outage forced a large number of internet services offline, and ChatGPT was caught in the blast as well. The failure hit the us-east-1 region in the eastern United States, the most critical piece of AWS's global service footprint. According to the outage-tracking platform Downdetector, more than 6.5 million user incident reports were filed that day.

The incident also took down social platforms including Reddit, leaving people with almost nowhere to vent. Even AWS's own customer-support ticketing system went down, so there was no way to file an error report either. Fortunately, Musk's X does not run on AWS and was unaffected, which at least gave netizens somewhere to discuss the outage. Some joked with memes that Musk was the biggest winner of the whole episode. Jokes aside, the people actually hit by the outage probably weren't laughing at all...

Amazon's service failure hit every industry

Just how wide was the blast radius of this Amazon outage? Start with developers. Besides Docker, another essential developer tool, npm, ran into the same problem, and the popular AI coding tools Cursor and Vercel were not spared either. Beyond developers, other workers were affected too: the video-conferencing software Zoom, and Slack, the workplace platform OpenAI also uses, ...
Long-sequence inference no longer stalls! A KV cache management framework from Peking University and Huawei delivers a 4.7x inference speedup
量子位· 2025-10-21 03:38
Contributed by the LouisKV team. QbitAI | WeChat account QbitAI

Peking University and Huawei have jointly introduced a new way to manage the KV cache, with inference 4.7x faster than the previous SOTA. When large models process long sequences, the KV cache's memory footprint grows linearly with sequence length and has become a severe bottleneck for model deployment.

To address this, a research team from Peking University and Huawei proposed LouisKV, an efficient KV cache retrieval framework designed for all kinds of long-sequence scenarios, including long inputs and long outputs. Through a novel semantics-aware retrieval strategy and a decoupled, fine-grained management mechanism, it achieves up to a 4.7x inference speedup with almost no loss in model accuracy, offering a new way past the long-sequence inference bottleneck of LLMs.

Key insights

Academia and industry have proposed many KV cache optimizations, and KV cache retrieval is one of the most promising directions. These methods offload the complete KV cache to larger-capacity CPU memory and, at inference time, retrieve only the most critical subset of KV entries back to the GPU for computation, effectively relieving GPU memory pressure.

However, existing KV retrieval methods still face a dual bottleneck of efficiency and accuracy. To design a more efficient retrieval strategy, the research team first ran an experimental analysis of how critical KV entries are accessed across different long-sequence tasks, arriving at two key insights. ...
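The general KV-cache-retrieval recipe described above (offload the full cache to CPU memory, pull back only the most relevant subset per step) can be sketched as a simple top-k selection. The scoring below is a plain dot product and the sizes are made up; LouisKV's semantics-aware strategy and decoupled management are more involved than this.

```python
# Generic sketch of KV cache retrieval: keep the full cache in CPU RAM and
# bring only a top-k subset back for computation. Made-up sizes and scoring;
# this is NOT LouisKV's semantics-aware retrieval.
import numpy as np

def retrieve_topk(query: np.ndarray,
                  cpu_keys: np.ndarray,    # [n_tokens, d], offloaded to CPU memory
                  cpu_values: np.ndarray,  # [n_tokens, d]
                  k: int = 256):
    """Return the k KV pairs whose keys score highest against the query."""
    scores = cpu_keys @ query               # relevance of each cached token
    idx = np.argpartition(scores, -k)[-k:]  # top-k selection in O(n)
    return cpu_keys[idx], cpu_values[idx]   # only this subset moves to the GPU

rng = np.random.default_rng(0)
n_tokens, d = 100_000, 128                  # long sequence, head dim (assumed)
keys = rng.standard_normal((n_tokens, d))
values = rng.standard_normal((n_tokens, d))
q = rng.standard_normal(d)

k_sel, v_sel = retrieve_topk(q, keys, values, k=256)
print(f"kept {k_sel.shape[0]} of {n_tokens} KV pairs "
      f"({k_sel.nbytes + v_sel.nbytes:,} bytes instead of "
      f"{keys.nbytes + values.nbytes:,})")
```

The efficiency/accuracy tension the article mentions lives in this step: scoring too coarsely misses critical entries, while scoring every token per decoding step reintroduces the transfer and compute overhead the offload was meant to avoid.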
Registration is open for the Artificial Intelligence Annual Awards! Five award categories, seeking the pioneers of the AI+ era
量子位· 2025-10-21 03:38
From the Organizing Committee. QbitAI | WeChat account QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers on the same road, we are officially opening registration for the "2025 Artificial Intelligence Annual Awards".

This is the eighth year of QbitAI's annual AI awards. Over eight years we have witnessed technological breakthroughs and real-world deployment, the fusion and reshaping of industries, and wave after wave of companies, people, and products pushing the era forward. In an age where AI is redefining everything, intelligent technology is no longer a single tool but a driving force in the co-evolution of industry and society. Through this annual selection we hope to discover and pay tribute to the explorers and practitioners who truly lead change and push boundaries.

This year's selection spans three dimensions (companies, products, and people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.

Company list
- 2025 Artificial Intelligence Annual Leading Company
- 2025 Artificial Intelligence Annual Promising Startup

Product list
- 2025 Artificial Intelligence Annual Outstanding Product
- 2025 Artificial Intelligence Annual Outstanding Solution

People list
- 2025 Artificial Intelligence Annual Person in Focus

Detailed criteria and registration instructions are given below.

2025 Artificial Intelligence Annual Leading Company: open to China's AI field, recognizing the companies with the strongest overall strength. Eligibility: Judging criteria:

2025 Artificial Intelligence Annual Promising Startup: focused on China's ...