QbitAI

 Large-model companies are building agents, not browsers: hands-on testing shows why
 QbitAI · 2025-10-31 06:27
 Core Insights
- The article discusses the emergence of a desktop agent named "Xiao Yue," which can interact with the entire computer system through natural language commands, enabling users to perform various tasks seamlessly [1][2][40].

Group 1: Product Features
- Xiao Yue is designed to operate as a floating ball on the desktop, distinguishing itself from browser-based agents by being more interactive and visually appealing [3][6].
- The agent supports multiple functionalities, including internet access, browser searching, Excel processing, and local system interaction [6].
- Notably, Xiao Yue can reuse operation steps through "smart plans" and set up scheduled tasks for automatic execution, allowing for parallel task processing [8][28].

Group 2: Practical Applications
- The agent can assist users in setting up programming environments, significantly reducing the time spent on this task, which is traditionally cumbersome [8][14].
- For instance, Xiao Yue can automatically create a conda virtual environment with specific packages installed, demonstrating its capability to handle complex programming tasks [14][25].
- The agent can also upgrade existing projects, such as enhancing a simple Snake game by replacing its interface and adding features like a score leaderboard [21][24].

Group 3: Limitations and Future Trends
- Despite its advanced features, users have reported that Xiao Yue can be slow, with task completion times measured in minutes, which may not meet the expectations of impatient users [36][37].
- The current version of Xiao Yue is only available for Mac, with a Windows version reportedly in development [39].
- The article emphasizes that the trend of agents taking over computer operations is a significant development in human-computer interaction, suggesting a future where users can interact with computers as easily as conversing with another person [40][47].
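For readers unfamiliar with the environment-setup step mentioned above, creating a conda environment with specific packages comes down to a single `conda create` invocation. The sketch below only builds that command without running it. The environment name and package list are hypothetical examples, and nothing here reflects how Xiao Yue works internally.

```python
import shlex

def conda_create_cmd(env_name: str, python_version: str, packages: list[str]) -> list[str]:
    """Build the argv for `conda create`: -n names the env, -y skips the confirmation prompt."""
    return ["conda", "create", "-n", env_name, f"python={python_version}", *packages, "-y"]

# Hypothetical environment name and packages, for illustration only.
cmd = conda_create_cmd("demo-env", "3.11", ["numpy", "pandas"])
print(shlex.join(cmd))  # conda create -n demo-env python=3.11 numpy pandas -y
# To actually run it (requires conda on PATH): subprocess.run(cmd, check=True)
```

Building an argv list rather than a shell string avoids quoting problems when an agent fills in names and package lists automatically.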
 Straight from Microsoft: OpenAI posted an $11.5 billion net loss last quarter
 QbitAI · 2025-10-31 06:27
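Jay, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

What's going on? Could it be that the junior partner, busy building apps for Apple lately, has finally made its boss lose patience??

Come on, let's take a look at what exactly happened.

Now we know why OpenAI wanted to convert into a public benefit corporation...

Sharp-eyed netizens noticed that OpenAI actually lost $11.5 billion last quarter!

The key point is that this isn't some media outlet's gossip. It was disclosed by none other than OpenAI's biggest backer, Microsoft.

Microsoft loses $3.1 billion because of its junior partner

Let's be honest: Microsoft has made a fortune in this AI wave.

In the third quarter of 2025 (Microsoft's fiscal Q1 2026), Microsoft's net income reached $27.7 billion, up 12% year over year.

Yet despite earning so much, Microsoft seems a little "unhappy."

The gist: this quarter's profit could have surged past the $30 billion mark, if not for one junior partner dragging it down!

Current-year net income and EPS were negatively impacted by losses on the investment in OpenAI, which reduced them by $3.1 billion and $0.41 per share, respectively.

| (In millions, except percentages and per share amounts) | | | | Three Months Ended September 30, | Percentage Change |
| --- | --- | --- | ...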
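Both headline numbers above reduce to simple arithmetic. The "would have passed $30 billion" remark adds the $3.1 billion hit back to the $27.7 billion of reported net income. The $11.5 billion figure was presumably obtained by scaling Microsoft's share of the loss up by its ownership fraction, since under equity-method accounting an investor records roughly its stake times the investee's loss. The article does not state the stake. The 27% used below is a hypothetical value chosen for illustration, and real equity-method figures can include other adjustments.

```python
# Figures from Microsoft's disclosure as reported above (billions of USD).
REPORTED_NET_INCOME = 27.7
OPENAI_LOSS_HIT = 3.1

# HYPOTHETICAL ownership fraction, for illustration only; not stated in the article.
ASSUMED_STAKE = 0.27

def income_before_hit(reported: float, hit: float) -> float:
    """Counterfactual net income if the OpenAI loss had not been recorded."""
    return reported + hit

def implied_investee_loss(share_of_loss: float, stake: float) -> float:
    """Equity method (simplified): investor's share of loss = stake * investee's loss."""
    if not 0 < stake <= 1:
        raise ValueError("stake must be a fraction in (0, 1]")
    return share_of_loss / stake

print(round(income_before_hit(REPORTED_NET_INCOME, OPENAI_LOSS_HIT), 1))  # 30.8
print(round(implied_investee_loss(OPENAI_LOSS_HIT, ASSUMED_STAKE), 1))    # 11.5
```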
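 Kimi open-sources a new linear attention architecture that beats full-attention models for the first time, with up to 6x faster inference
 QbitAI · 2025-10-31 06:27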
 Core Insights
- The era of Transformers is being redefined with the introduction of the Kimi Linear architecture, which surpasses traditional full-attention models under the same training conditions [2][10].

Group 1: Kimi Linear Architecture
- Kimi Linear employs a novel attention mechanism that reduces KV cache requirements by 75% and achieves up to 6x faster inference in long-context tasks [4][26].
- The architecture introduces Kimi Delta Attention (KDA), which allows fine-grained control over memory retention, enabling the model to discard redundant information while preserving important data [12][10].
- KDA's state update mechanism is based on an improved Delta Rule, which keeps the model stable even on sequences of millions of tokens by preventing exploding or vanishing gradients [13][14].

Group 2: Performance and Efficiency
- The model uses a 3:1 hybrid layer design, in which every three linear-attention layers are followed by one full-attention layer, balancing global semantic modeling with resource efficiency [15].
- Kimi Linear outperforms traditional Transformers on multiple benchmarks, such as MMLU and BBH, while maintaining accuracy on mathematical reasoning and code generation tasks [22][26].
- The architecture can be deployed seamlessly with the existing vLLM inference framework, making it easy to upgrade Transformer-based systems to Kimi Linear [21].

Group 3: Industry Trends
- The dominance of Transformers is being challenged, with alternatives such as state space models (SSMs) showing potential for efficient computation and long-sequence modeling [28][30].
- Companies like Apple are exploring SSM architectures for their energy efficiency and lower latency, indicating a shift away from exclusive reliance on Transformers [30].
- The emergence of Kimi Linear signals a move toward more diverse innovation in AI architecture, departing from the conventional Transformer path [32].
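The NumPy sketch below makes the delta-rule update and the 3:1 layout concrete. It implements a gated delta-rule recurrence with a per-channel (fine-grained) forget gate and a helper that lays out three linear-attention layers per full-attention layer. It is a simplified, single-head, token-by-token illustration written for this summary, not Kimi's KDA kernel. The function names and gate parameterization are assumptions. The KV-cache helper reflects one plausible reading of the 75% figure: only full-attention layers cache keys and values per token, while linear layers keep a fixed-size state.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One token of a gated delta-rule recurrence (simplified, single head).

    S:     (d_k, d_v) fixed-size memory state
    q, k:  (d_k,) query and key; k is assumed to be L2-normalised
    v:     (d_v,) value to store under key k
    alpha: (d_k,) per-channel forget gate in (0, 1] (fine-grained decay)
    beta:  scalar write strength in [0, 1]
    """
    S = alpha[:, None] * S                # decay each key channel independently
    pred = S.T @ k                        # value the memory currently returns for k
    S = S + beta * np.outer(k, v - pred)  # delta rule: write only the prediction error
    return S, S.T @ q                     # updated state and this token's output

def hybrid_schedule(n_layers, linear_per_full=3):
    """Layer types for an N:1 hybrid: N linear-attention layers, then one full layer."""
    period = linear_per_full + 1
    return ["full" if (i + 1) % period == 0 else "linear" for i in range(n_layers)]

def kv_cache_saving(schedule):
    """Fraction of per-token KV cache avoided vs. an all-full-attention stack,
    counting only full-attention layers as caching keys/values per token."""
    return 1 - schedule.count("full") / len(schedule)

# Demo: the state stays (d_k, d_v) however many tokens are processed,
# and re-writing the same key overwrites the old value instead of piling up.
d_k, d_v = 4, 3
S = np.zeros((d_k, d_v))
k = np.array([1.0, 0.0, 0.0, 0.0])
ones = np.ones(d_k)
S, _ = gated_delta_step(S, k, k, np.array([1.0, 2.0, 3.0]), ones, beta=1.0)
S, out = gated_delta_step(S, k, k, np.array([4.0, 5.0, 6.0]), ones, beta=1.0)
print(out)                                   # [4. 5. 6.]
print(hybrid_schedule(8))
print(kv_cache_saving(hybrid_schedule(48)))  # 0.75
```

Setting `alpha` below 1 on some channels makes those channels forget faster. This is the kind of fine-grained retention control the summary attributes to KDA.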
 China's first domestic GPU stock wins IPO approval, raising 8 billion yuan
 QbitAI · 2025-10-31 04:09
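克雷西, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

The STAR Market's first domestic GPU stock is on its way!

According to the website of the China Securities Regulatory Commission (CSRC), Moore Threads' IPO registration application has been approved.

Moore Threads cleared registration just four months after submitting its prospectus on June 30.

Moore Threads plans to raise 8 billion yuan in this IPO.

IPO registration approved four months after acceptance

The 8 billion yuan Moore Threads plans to raise will be used mainly for R&D.

Of this, 2.509 billion yuan will go to Moore Threads' next-generation self-controlled AI training-and-inference chip R&D project, 2.502 billion yuan to its next-generation self-controlled graphics chip R&D project, and 1.981 billion yuan to its next-generation self-controlled AI SoC chip R&D project. Another 1.006 billion yuan will replenish working capital.

| No. | Project name | Implementing entity | Total project investment | Proceeds to be invested | Project filing no. | Environmental approval no. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Moore Threads next-generation self-controlled AI training-and-inference chip R&D project | Moore Threads | 250,957.98 | 250,957.98 | 京朝科信局备 ... | |
...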
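A few lines of arithmetic check the four allocations listed above. They sum to 7.998 billion yuan, which rounds to the headline 8 billion. The sketch takes the figures from the article (converted from units of 100 million yuan to billions, with labels shortened) and also prints each item's share of the total.

```python
# Planned uses of the IPO proceeds, in billions of yuan (figures from the article).
allocations = {
    "next-gen AI training-and-inference chip R&D": 2.509,
    "next-gen graphics chip R&D": 2.502,
    "next-gen AI SoC chip R&D": 1.981,
    "working capital": 1.006,
}

total = sum(allocations.values())
print(f"total: {total:.3f} billion yuan")  # total: 7.998 billion yuan
for name, amount in allocations.items():
    # Share of the total planned raise, as a percentage.
    print(f"{name}: {amount / total:.1%}")
```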
 VLA is the hottest topic around, and this one survey is all you need
 QbitAI · 2025-10-31 04:09
 Core Insights
- The article discusses the rapid growth and significance of the Vision-Language-Action (VLA) field, highlighting its potential to enable robots to understand human language, perceive the world, and perform tasks effectively [5][6].

Definition and Standards
- To qualify as VLA, a model must use a backbone pre-trained on large-scale vision-language data, which supplies language understanding, visual generalization, and task transfer [7][8].
- Models that merely combine separate visual and text encoders are classified as "Multimodal Policies," while Large Behavior Models (LBMs) refer to policies trained on extensive robot demonstration data [10][12].

Trends in VLA
- **Trend 1: Efficient Architecture Paradigms**
  Discrete diffusion models allow action sequences to be generated in parallel, improving efficiency and performance [14][16].
- **Trend 2: Embodied Chain-of-Thought (ECoT)**
  ECoT makes robots more capable by having them generate intermediate reasoning steps before executing actions, improving planning and interpretability [17][18][20].
- **Trend 3: Action Tokenization**
  This trend focuses on converting continuous robot actions into discrete tokens that VLMs can understand, improving efficiency and the integration of reasoning with action [21][24].
- **Trend 4: Reinforcement Learning (RL)**
  RL is being reintroduced as a fine-tuning tool for VLA policies, addressing the limitations of imitation learning in extreme scenarios [25][26].
- **Trend 5: Efficiency Optimization**
  Efforts to optimize VLA models aim to reduce costs and hardware requirements, making the field more accessible to smaller research labs [27][28].
- **Trend 6: Video Prediction for Physical Intuition**
  Video generation models carry an inherent understanding of temporal dynamics and physical laws, which can enhance robot control [29][35].
- **Trend 7: Realistic Evaluation Benchmarks**
  New evaluation methods are being developed to overcome saturation in existing benchmarks, focusing on future-frame prediction and action generation capabilities [36][39].
- **Trend 8: Cross-Modal Learning**
  Architectural innovations are essential for developing generalist robot policies that can operate across different action spaces [40][42].

Challenges and Future Directions
- The article highlights a "performance ceiling" problem in mainstream simulation evaluations, where high scores do not necessarily translate into real-world capability [43][44].
- Two critical areas needing more attention are data quality and in-context learning, which could be pivotal for breakthroughs in VLA research [48][49].
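Action tokenization (Trend 3) is easiest to see with the simplest common scheme: clip each action dimension to a known range, split that range into equal bins, and emit the bin index as a token. The sketch below is a minimal illustration under that assumption. The 256 bins, the [-1, 1] range, and the 7-DoF action layout are hypothetical choices, and the survey's trend also covers more elaborate tokenizers.

```python
import numpy as np

def tokenize_actions(actions, low, high, n_bins=256):
    """Map continuous actions to integer tokens via uniform per-dimension binning."""
    actions = np.clip(actions, low, high)
    unit = (actions - low) / (high - low)                 # rescale to [0, 1]
    return np.minimum((unit * n_bins).astype(int), n_bins - 1)

def detokenize_actions(tokens, low, high, n_bins=256):
    """Map each token back to the centre of its bin."""
    return low + (np.asarray(tokens) + 0.5) / n_bins * (high - low)

# Hypothetical 7-DoF action: 6 end-effector deltas plus a gripper command, all in [-1, 1].
low, high = -1.0, 1.0
action = np.array([0.12, -0.5, 0.0, 0.9, -1.0, 1.0, 0.33])
tokens = tokenize_actions(action, low, high)
recovered = detokenize_actions(tokens, low, high)
max_err = np.abs(recovered - action).max()   # bounded by half a bin width
print(tokens)
print(max_err <= (high - low) / (2 * 256))   # True
```

More bins lower the reconstruction error but enlarge the vocabulary the VLM must predict over. That is the efficiency trade-off this trend tries to manage.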
 QbitAI 2025 annual rankings: final call for applications! Company, product, and people lists now open for nominations
 QbitAI · 2025-10-31 04:09
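Organizing Committee, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more peers and fellow travelers, we are officially opening applications for the "2025 Artificial Intelligence Annual Rankings."

This selection covers three dimensions (companies, products, and people) with five award categories. Companies are warmly invited to apply! Let us witness this year's stars together and light the way forward.

Company rankings: 2025 AI Leading Enterprise of the Year; 2025 AI Promising Startup of the Year
Product rankings: 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year
People rankings: 2025 AI Person in Focus of the Year

Detailed selection criteria and how to apply are as follows.

2025 AI Leading Enterprise of the Year: selects the companies with the strongest overall strength in China's AI field.
Eligibility:
1. Registered in China, or with a main business oriented primarily toward the Chinese market;
2. Main business in AI and related industries, or AI widely applied in the main business, with an industry-leading position in its niche;
3. Mature products or services that have been adopted by real customers and recognized by the market;
4. Over the past year, in technology ...
Selection criteria:

2025 AI Promising Startup of the Year: focuses on the innovative and entrepreneurial forces in China's AI field, selecting the AI startups with the greatest investment value and growth potential.
Eligibility:
Selection criteria: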
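 The first instance-aware 3D reconstruction model! NTU & StepFun propose an instance-decoupled 3D reconstruction model to boost scene understanding
 QbitAI · 2025-10-31 04:09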
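Contributed by the iGGT team, QbitAI | WeChat official account: QbitAI

Humans naturally perceive both the geometric structure and the semantic content of the 3D world, but for AI, "having both" has long been a major challenge.

Traditional methods handle 3D reconstruction (low-level geometry) and spatial understanding (high-level semantics) separately, which leads to error accumulation and poor generalization. Newer methods try to "lock" a 3D model to a specific vision-language model (VLM). This not only limits the model's perception (for example, it cannot distinguish two different instances of the same category) but also keeps it from scaling to stronger downstream tasks.

Now NTU and StepFun have proposed IGGT (Instance-Grounded Geometry Transformer), an innovative end-to-end, large unified Transformer that for the first time integrates spatial reconstruction and instance-level contextual understanding.

To address these problems, the main contributions of this work are:
- End-to-end unified framework: IGGT, a large unified Transformer that brings spatial reconstruction and instance-level contextual understanding into a single model trained end to end.
- Large-scale instance dataset: a new large-scale dataset, InsScene-15K, containing 15K scenes, 200 million images, and high-quality, 3D-consistent instance-level masks annotated through a novel data pipeline.
- Instance decoupling and plug-and-play: pioneering an "instance-grounded ...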
 Autonomous driving companies are making Feishu standard equipment
 QbitAI · 2025-10-31 04:09
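一凡, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

What new consensus has emerged among autonomous driving companies, which represent the technological frontier?

The industry has grown rapidly in 2025. Installations of L2 assisted driving have exploded. Momenta holds a firmly leading share of the urban NOA market, Horizon Robotics' Journey chips have passed 10 million units in mass production, and DeepRoute.ai's solution is on more than 130,000 mass-produced vehicles. XPeng and Li Auto have begun pushing toward L4.

Among driverless players, Pony.ai is racing to deploy a 1,000-vehicle Robotaxi fleet this year, WeRide has collected autonomous driving permits in 7 countries, and Neolix has delivered more than 10,000 unmanned delivery vehicles.

These physical-AI trailblazers come from different fields, excel at different businesses, and bet on different technical routes. Yet when it comes to tools for accumulating knowledge and improving efficiency, they agree: they have all embraced Feishu.

Why is this happening?

QbitAI put the question to a number of front-line practitioners and found the answer: using AI to lean-produce AI.

The autonomous driving industry is using AI to lean-produce AI

Lean production is a philosophy that originated in the automotive industry. It is a process of continuous improvement that, mainly through automation and just-in-time production, eliminates as much waste as possible, lowering costs and making a company's products more competitive.

In the AI era, AI is both a tool of lean production and a possible product of it. Using AI to lean-produce AI means improving efficiency and speeding up R&D. Specifically, this can be broken ...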
 OpenAI's first GPT-5 bug-hunting agent: reads code, finds vulnerabilities, and writes fixes, fully automatically
 QbitAI · 2025-10-31 00:58
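henry, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

AI coding has been hot for most of the year, and now AI debugging has arrived too!

OpenAI has just released Aardvark, a GPT-5-powered "white-hat" agent.

This "AI security researcher" helps developers and security teams automatically find and fix security vulnerabilities in large codebases.

According to OpenAI, Aardvark identified 92% of known and artificially injected vulnerabilities, and it can pinpoint issues that only surface under complex conditions.

OpenAI VP Matt Knight said: "Our developers found Aardvark genuinely valuable in how clearly it explained issues and guided them to fixes. That signal told us we are on a meaningful path."

And it isn't just OpenAI. Throughout October, Anthropic, Google, and Microsoft released similar white-hat agents in quick succession.

Agentic AI + automated vulnerability patching

OpenAI officially describes Aardvark as an agentic security researcher.

Aardvark's core task is to continuously analyze source code repositories in order to identify security vulnerabilities, assess exploitability, determine risk levels, and propose targeted fixes ...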
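The triage part of the workflow described above (assess exploitability, determine a risk level, then prioritize fixes) can be illustrated with a toy ranking step. This is a hypothetical sketch. The `Finding` fields, the severity × exploitability score, and the thresholds are assumptions for illustration, not Aardvark's actual method or any OpenAI API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A hypothetical scanner finding; fields and 0..1 scales are illustrative."""
    file: str
    description: str
    severity: float        # impact if exploited (0..1, assumed scale)
    exploitability: float  # how reachable/triggerable it is (0..1, assumed scale)

    @property
    def risk(self) -> float:
        return self.severity * self.exploitability

def risk_level(score: float) -> str:
    """Bucket a risk score; thresholds are arbitrary example values."""
    if score >= 0.6:
        return "critical"
    if score >= 0.3:
        return "high"
    if score >= 0.1:
        return "medium"
    return "low"

def triage(findings: list[Finding]) -> list[tuple[str, str, float]]:
    """Rank findings by risk, highest first, so fixes can be proposed in that order."""
    ranked = sorted(findings, key=lambda f: f.risk, reverse=True)
    return [(f.file, risk_level(f.risk), round(f.risk, 2)) for f in ranked]

report = triage([
    Finding("utils.py", "verbose error message leaks stack trace", 0.3, 0.5),
    Finding("auth.py", "SQL query built by string concatenation", 0.9, 0.8),
])
print(report)  # [('auth.py', 'critical', 0.72), ('utils.py', 'medium', 0.15)]
```

Multiplying impact by exploitability is a common way to separate severe but unreachable bugs from severe and easily triggered ones. A real system would derive both numbers from analysis, for example by trying to reproduce the issue.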
 Windows AI assistant gets a free upgrade! It can operate the computer, log in to websites, and generate code
 QbitAI · 2025-10-31 00:58
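梦晨, reporting from Aofei Temple, QbitAI | WeChat official account: QbitAI

Windows Copilot has received an official update: now everyone can have, for free, an AI assistant that operates the computer's interface.

Specifically, the Researcher agent in Microsoft 365 Copilot has gained a "Computer Use" capability, enabling smarter research, deeper insights, and more comprehensive reports.

The AI assistant goes from "talking" to "doing"

Previously, Researcher could only call specific functions through APIs. Its computer use capability is instead backed by a set of new tools that Researcher's orchestration layer can invoke. The orchestration layer connects to a sandboxed environment that provides a screenshot of every step.

Why does a deep research agent need "computer use"? Microsoft gives several reasons.

The update is now live in the preview of Microsoft 365 Copilot and can be accessed by joining the Frontier Program.

On BrowseComp, a benchmark focused on complex multi-step browsing tasks, Researcher with Computer Use performed 44% better than the current version of Researcher. Here is one example task:

In the late 2010s, a company with an unconventional management structure (multiple CEOs) provided brain surgery assistance services ...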
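The design described above, in which an orchestration layer issues tool actions into a sandbox and gets a screenshot back after each step, amounts to an observe-act loop. Below is a hypothetical toy version. `FakeSandbox` and the scripted policy are stand-ins invented for illustration. A real system would drive an isolated browser or VM and use a model to choose each action. Nothing here is a Microsoft 365 Copilot API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class FakeSandbox:
    """Hypothetical stand-in for an isolated browser/VM session."""
    actions: list[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real sandbox would return an image; a label is enough here.
        return f"screen#{len(self.actions)}"

    def execute(self, action: str) -> str:
        self.actions.append(action)
        return self.screenshot()  # capture the screen after every step

def run_agent(policy: Callable[[str], Optional[str]], sandbox: FakeSandbox,
              max_steps: int = 10) -> list[str]:
    """Observe-act loop: look at the latest screenshot, pick an action, execute, repeat."""
    observation = sandbox.screenshot()
    trace = [observation]
    for _ in range(max_steps):
        action = policy(observation)
        if action is None:  # the policy decides the task is done
            break
        observation = sandbox.execute(action)
        trace.append(observation)
    return trace

# Scripted policy standing in for the model that would choose actions.
script = iter(["open https://example.com", "click 'Sign in'", "type search query"])
sandbox = FakeSandbox()
trace = run_agent(lambda obs: next(script, None), sandbox)
print(trace)  # ['screen#0', 'screen#1', 'screen#2', 'screen#3']
```

Keeping every action inside the sandbox and recording a screenshot per step gives an auditable trace. The `max_steps` cap bounds runaway sessions.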