No more choosing between fast and slow thinking! Huawei open-sources a 7B model with free mode switching, same accuracy with chains of thought nearly 50% shorter
量子位· 2025-09-10 08:01
Yunzhong, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). A home-grown open-source model means no more choosing between fast thinking and slow thinking! Huawei's newly released openPangu-Embedded-7B-v1.1 has only 7B parameters yet carries a dual "thinking engine". For a long time, large models could not offer fast-thinking and slow-thinking modes at once, a major pain point for the industry; amid the current melee among the big players, everyone is looking for a way out, but the open-source world had so far lacked a model that can switch freely between the two modes. Fast or slow? AI, too, gets "choice paralysis" when facing problems of varying difficulty. Now, through a progressive fine-tuning strategy and a distinctive fast/slow-thinking adaptive mode, openPangu-Embedded-7B-v1.1 supports both manually switching between "fast thinking" and "slow thinking" and automatically, seamlessly switching between the two modes according to problem difficulty. It answers simple questions in a flash and deliberates over complex tasks, filling this capability gap among open-source large models and delivering a win on both efficiency and accuracy. On multiple authoritative benchmarks covering general knowledge, math, and code, the model's accuracy improves substantially over the previous model, and introducing automatic mode switching does not sacrifice accuracy. On benchmarks such as CMMLU, openPangu-Embedded-7B-v1.1 keeps accuracy unchanged while shortening the average chain-of-thought length by nearly 50 ...
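The article does not describe how the mode switch is exposed at inference time. Purely as a hypothetical sketch: assuming the checkpoint is published as a standard Hugging Face model with a chat template, and assuming (for illustration only) that the thinking mode is selected with a control tag in the system prompt, manual switching might look roughly like this. The repo path, tag names, and control mechanism below are placeholders, not the documented interface.

```python
# Hypothetical sketch: switching openPangu-Embedded-7B-v1.1 between fast and slow
# thinking via a system-prompt control tag. The tag names and repo path are
# assumptions for illustration; consult the official model card for the real interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "<org>/openPangu-Embedded-7B-v1.1"  # placeholder repo path

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto", trust_remote_code=True)

def ask(question: str, mode: str = "auto") -> str:
    # mode: "fast" (short direct answer), "slow" (long chain of thought), or "auto"
    system = {"fast": "/fast_think", "slow": "/slow_think", "auto": ""}[mode]
    messages = ([{"role": "system", "content": system}] if system else []) + [
        {"role": "user", "content": question}
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=1024)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("What is 17 * 24?", mode="fast"))
print(ask("Prove that the sum of two odd numbers is even.", mode="slow"))
```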
The first Data Agent benchmark is here! 2,007 test tasks cover heterogeneous data sources from databases and PDFs to video and audio
量子位· 2025-09-10 08:01
Contributed by the FDABench team | QbitAI (WeChat official account: QbitAI). How good are data agents, really? Benchmark them and find out! Nanyang Technological University and the National University of Singapore, together with Huawei, have open-sourced FDABench, the first comprehensive benchmark dedicated to heterogeneous, mixed-data analysis by data agents. The benchmark spans 50+ data domains, sets up multiple difficulty levels and task types, and introduces an Agent-Expert collaboration framework to ensure test-case quality and data consistency; it supports Data Agents, RAG, semantic operators, and four typical Data Agent workflow patterns. Using FDABench, the team evaluated a variety of data-agent systems and found that each shows distinct strengths in response quality, accuracy, latency, and token cost. The details follow. Sweeping up databases, PDFs, video, and audio as heterogeneous data sources: the growing demand for data-driven decision making has created an urgent need for data agents that can integrate structured and unstructured data for analysis. △ A sample Data Agent. To address these challenges, the team proposed FDABench, the first data-agent benchmark designed specifically to evaluate agents in multi-source data-analysis scenarios. First, because it is hard to design test cases that assess an agent's full range of abilities on multi-source analysis tasks, a comprehensive data ...
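FDABench's actual task schema and harness are not shown in this excerpt. The sketch below only illustrates, in generic Python, the kind of evaluation loop such a benchmark implies, reporting accuracy, latency, and token cost over a file of tasks; the task format, scoring rule, and `my_agent` stub are assumptions, not FDABench's API.

```python
# Illustrative-only sketch of a data-agent evaluation loop: run an agent over tasks
# drawn from heterogeneous sources and record accuracy, latency, and token cost.
import json
import time

def my_agent(question: str, sources: list[dict]) -> tuple[str, int]:
    """Placeholder agent: returns (answer, tokens_used)."""
    return "stub answer", 0

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def run_benchmark(task_file: str) -> dict:
    tasks = json.load(open(task_file))          # assumed schema: [{"question", "answer", "sources"}, ...]
    acc, latency, tokens = [], [], []
    for t in tasks:
        start = time.perf_counter()
        answer, used = my_agent(t["question"], t["sources"])
        latency.append(time.perf_counter() - start)
        tokens.append(used)
        acc.append(exact_match(answer, t["answer"]))
    n = len(tasks)
    return {
        "accuracy": sum(acc) / n,
        "avg_latency_s": sum(latency) / n,
        "avg_tokens": sum(tokens) / n,
    }
```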
NVIDIA's new GPU, built for ultra-long context and video generation
量子位· 2025-09-10 01:28
Henry, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). Jensen Huang is going after token-heavy workloads. At the AI Infra Summit, NVIDIA just announced a brand-new GPU built specifically for million-token-scale code generation and generative-video applications: the NVIDIA Rubin CPX GPU. According to Huang, Rubin CPX is the first CUDA GPU tailored for ultra-long-context AI, letting a model reason over millions of tokens "in one breath". Better still, NVIDIA claims the more you use it, the more you save: every $100 million invested is said to return $5 billion in token revenue (a 50x return, per Huang). Responding to this pitch, industry players such as Cursor, Runway, and Magic said Rubin CPX would bring breakthroughs in coding productivity, generative video creation, and autonomous large-model agents, respectively. So what exactly is this GPU? The first CUDA GPU built for ultra-long-context AI: Rubin CPX is based on the NVIDIA Rubin architecture, uses a monolithic die with built-in NVFP4 compute, and targets high performance and energy efficiency for AI inference. Its performance gains show up in several areas; for a quick comparison, take the A100. In terms of compute ...
A wave of Claude cancellations! Accused of quietly swapping in a degraded model at peak hours, an engineer lists nine grievances and urges everyone to unsubscribe
量子位· 2025-09-10 01:28
Mengchen, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). The call drew more than 2,000 likes, and plenty of users backed it up by actually cancelling, including heavy users on the top-priced 20x Max plan. Anthropic had an excellent reputation in the developer community, with Claude Code alone estimated at $500 million in annualized revenue, so what provoked the outrage? Engineer Ahmad Osman enumerated a long list of grievances, and even that list was not complete, which gives a sense of how angry this developer is. In his view, Claude's big crisis is not some recent shady maneuver; the product itself has gone wrong. An AI engineer has already led a call for everyone to unsubscribe (PoS here stands for Piece of Shit). Commenters added that the worst part is the model quietly getting worse, so you waste an hour before you even realize it; no professional development environment should be unable to pin a version. Fine, the ranting and the cancelling are done, but the work still has to get done, and nobody is going back to artisanal hand-written code. So what do people use next? Many have defected en masse to OpenAI Codex next door, enough to catch Sam Altman's attention. OpenAI Codex on the rise: open Reddit's Claude Code subreddit from the past few days and it is wall-to-wall discussion of OpenAI Codex, to the point you wonder whether you walked into the wrong room. During daytime peak hours, what you get is a degraded ...
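On the commenter's point that any professional setup should be able to pin a version: a minimal sketch with the Anthropic Python SDK, requesting a dated model snapshot rather than a floating alias so the backing model cannot change silently underneath a workflow. The specific model IDs are illustrative examples, not a recommendation from the article.

```python
# Sketch of model-version pinning: ask for a dated snapshot instead of a floating
# alias so the model behind your tooling stays fixed. Model IDs are illustrative.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PINNED_MODEL = "claude-3-5-sonnet-20241022"   # dated snapshot: stays fixed
# FLOATING   = "claude-3-5-sonnet-latest"     # alias: may be remapped over time

response = client.messages.create(
    model=PINNED_MODEL,
    max_tokens=512,
    messages=[{"role": "user", "content": "Refactor this function to be pure."}],
)
print(response.content[0].text)
```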
Cook squeezes the toothpaste tube dry! The 5,999-yuan iPhone 17 gets a high refresh rate, and the new earbuds add heart-rate sensing and live translation
量子位· 2025-09-09 20:23
By Cook's account, this round of new products puts design at the core of everything, and judging by the results, the iPhone line's camera module has indeed largely said goodbye to the old "bathroom heater" layout. Keleixi and Yuyang, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). The standard iPhone finally gets a high refresh rate! At the just-concluded Apple keynote, the iPhone, AirPods Pro, and Apple Watch took the stage one after another.

| New | New | New |
| --- | --- | --- |
| iPhone 17 Pro | iPhone Air | iPhone 17 |
| Innovative design for peak performance and extra-long battery life. | The thinnest iPhone yet, hiding a high-performance core. | More likable, and now more durable. |

Of course, what excites Apple fans most is that the entire iPhone lineup now gets high refresh rates: the model may be the base one, but the refresh rate is not. Beyond the iPhone, the earbuds and the watch also received major upgrades, with the toothpaste tube squeezed all the way out. The AirPods Pro, for instance, become a smart wearable that can do simultaneous interpretation and also supports heart-rate monitoring, and the Apple Watch adds 5G connectivity plus a headline new health feature. It has to be said that Cook's knife work has genuinely softened this time (here's hoping NVIDIA's Jensen Huang also ...
Transformer author: DeepSeek is where the promise is, don't count on OpenAI
量子位· 2025-09-09 20:23
Core Viewpoint
- The article discusses Ashish Vaswani's shift towards open-source AI research, criticizing closed-source companies for hindering scientific exploration and innovation in the field of artificial intelligence [2][28].

Group 1: Background and Motivation
- Ashish Vaswani, co-creator of the Transformer model, believes that companies like OpenAI are too focused on commercialization, neglecting foundational research [2][4].
- After facing challenges with the Scaling Law and internal conflicts at Adept AI, Vaswani decided to establish Essential AI to pursue open-source research [10][11][14].
- The shift to focus entirely on foundational research was supported by the board and investors, indicating a recognition of the need for more open alternatives in AI [19][20].

Group 2: Vision for Open Source
- Vaswani aims to create a platform similar to DeepSeek in the Western world, emphasizing the importance of open-source AI in education and healthcare [6][26].
- He argues that AI should not only serve commercial interests but also benefit the general public, allowing access to advanced technologies in underserved areas [28][29].
- Essential AI's recent research suggests that breakthroughs in large language models can be achieved during the pre-training phase, potentially reducing training costs significantly [31][33].

Group 3: Challenges with Closed Source
- The article highlights the "innovator's dilemma" faced by AI companies, where the focus on commercialization can stifle innovation and long-term research [39][42].
- Many AI firms have shifted resources away from research to prioritize immediate profitability, especially in a challenging market environment [36][38].
- Vaswani expresses concern that the best-performing models from closed-source companies may actually impede progress in AI development [35].

Group 4: Funding and Sustainability
- Vaswani proposes a cross-subsidization model, where revenue from certain services can support open-source initiatives, ensuring the sustainability of Essential AI [53][57].
- He believes that open-source models can be more profitable in the long run, as they foster a collaborative ecosystem that drives innovation [61].
Humans get it instantly, AI falls apart: a simple test makes GPT-5, Gemini, and other top models collectively flunk
量子位· 2025-09-09 12:20
Core Viewpoint
- The article discusses the limitations of AI models in understanding distorted text, highlighting that while humans can easily comprehend such text, AI struggles significantly, indicating a fundamental gap in AI's text recognition capabilities [2][23].

Group 1: AI Performance in Text Recognition
- A research team from various institutions conducted experiments showing that leading AI models like OpenAI's GPT-5 and Google's Gemini perform poorly when faced with distorted Chinese characters and English words [2][4].
- In tests involving 100 four-character Chinese idioms, AI models achieved a maximum average matching rate of only 12.1%, while humans scored 100% [7].
- For 100 eight-letter English words, AI models also failed to provide correct answers, demonstrating a consistent inability to recognize and interpret distorted text [10][20].

Group 2: Human vs. AI Understanding
- Humans can easily read distorted text due to their advanced visual processing and understanding of language structure, while AI relies on pattern matching without grasping the underlying text structure [9][24].
- The inability of AI to separate and combine symbols leads to failures in recognizing even slightly altered text, which humans can still understand [26][28].
- The research emphasizes that human reading comprehension is a multi-modal process, relying on various sensory inputs and reasoning abilities, unlike AI's current capabilities [29].

Group 3: Implications for AI Development
- The findings suggest that to enhance AI's resilience in text recognition, there is a need to rethink how vision-language models (VLMs) integrate visual and textual information, potentially requiring new training data and structural insights [28].
- The results also highlight the challenges AI faces in practical applications, such as education and security, where non-standard text recognition is crucial [30].
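As a rough illustration of the two ingredients such a test implies, rendering a word as a distorted image and scoring a model's transcription with a character-level match rate, here is a minimal sketch. It is not the research team's actual pipeline; the shear distortion and the scoring rule are assumptions made for illustration.

```python
# Rough sketch (not the paper's pipeline): distort a rendered word with an affine
# shear, then score a model's transcription with a character-level match rate
# akin to the reported "average matching rate".
from PIL import Image, ImageDraw, ImageFont

def render_distorted(word: str, shear: float = 0.6) -> Image.Image:
    """Render `word` and apply a shear so characters slant/overlap like the test stimuli."""
    img = Image.new("L", (40 * len(word), 64), color=255)
    ImageDraw.Draw(img).text((4, 16), word, fill=0, font=ImageFont.load_default())
    # Affine shear: x' = x + shear * y  (coefficients a, b, c, d, e, f)
    return img.transform(img.size, Image.AFFINE, (1, shear, 0, 0, 1, 0), fillcolor=255)

def char_match_rate(prediction: str, target: str) -> float:
    """Fraction of target characters reproduced at the right position."""
    hits = sum(p == t for p, t in zip(prediction, target))
    return hits / max(len(target), 1)

stimulus = render_distorted("idiom")          # would be shown to a vision-language model
print(char_match_rate("idiam", "idiom"))      # 0.8: four of five characters correct
```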
ERNIE X1.1 is released! Three standout capabilities, with hands-on tests here
量子位· 2025-09-09 12:20
Xifeng, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). Baidu's deep-thinking model just got an upgrade! The upgraded ERNIE X1.1 shows marked gains in factuality, instruction following, and agent capabilities. Baidu demonstrated it on a complex, long-horizon task in an intelligent customer-service scenario: with the user's question placed in the System Prompt, ERNIE X1.1 uses its built-in agent ability to automatically decompose the complex task, call different tools to plan and execute step by step, and strictly follow the service flow and business rules. We then had it write a Python script in which 25 colored particles bounce inside a cylindrical vacuum container and leave trails, with the container rotating and the scene zooming. The result is smooth, and the particles stay inside the container the whole time. It also produced an HTML animation of merge sort, with the sorting process visualized dynamically so the algorithm's steps are clear at a glance. The newest open-source thinking model, ERNIE-4.5-21B-A3B-Thinking, was also released; trained on top of ERNIE-4.5-21B-A3B, it performs strongly across content creation, logical reasoning, math, code generation, and tool calling. In addition, Baidu released ERNIEKit, a development kit for the ERNIE models that offers a more convenient post-training workflow: just 4 GPUs are enough to fine-tune the ERNIE-4.5-300B-A47B model efficiently, further lowering the barrier for developers to put the model into real applications ...
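For reference, a heavily simplified 2D stand-in for the prompt described above (the demo in the article was a rotating 3D cylinder with scene zoom). This sketch keeps only the core behavior of colored particles bouncing inside a circular wall and leaving trails, and it is not ERNIE X1.1's actual output.

```python
# Minimal 2D stand-in for the particle prompt: 25 colored particles bounce inside a
# circular wall and leave trails. Simplified from the article's 3D rotating-cylinder demo.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

N, R, DT = 25, 1.0, 0.02
rng = np.random.default_rng(0)
pos = rng.uniform(-0.5, 0.5, (N, 2))
vel = rng.uniform(-1.0, 1.0, (N, 2))
colors = plt.cm.hsv(np.linspace(0, 1, N, endpoint=False))
trails = [pos.copy()]

fig, ax = plt.subplots(figsize=(5, 5))
ax.set_xlim(-1.1, 1.1); ax.set_ylim(-1.1, 1.1); ax.set_aspect("equal")
ax.add_patch(plt.Circle((0, 0), R, fill=False))
scat = ax.scatter(pos[:, 0], pos[:, 1], c=colors)
lines = [ax.plot([], [], lw=0.5, color=c)[0] for c in colors]

def step(_):
    global pos, vel
    pos = pos + vel * DT
    r = np.linalg.norm(pos, axis=1)
    hit = r > R
    if hit.any():
        n = pos[hit] / r[hit, None]                                 # outward wall normals
        vel[hit] -= 2 * (vel[hit] * n).sum(1, keepdims=True) * n    # reflect velocity at wall
        pos[hit] = n * R                                            # clamp back onto the wall
    trails.append(pos.copy())
    hist = np.stack(trails)                                         # (steps, N, 2) trail history
    for i, ln in enumerate(lines):
        ln.set_data(hist[:, i, 0], hist[:, i, 1])
    scat.set_offsets(pos)
    return [scat, *lines]

anim = FuncAnimation(fig, step, frames=400, interval=20, blit=True)
plt.show()
```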
Consistency on par with Nano Banana: China's Vidu Q1 supports 7 reference images at once | Hands-on
量子位· 2025-09-09 12:20
Buyuan, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI). The AI image-generation race has gone into overdrive lately! From the breakout of Nano Banana to the back-to-back launches of Jimeng AI 4.0 and Doubao 4.0, Vidu, which had been focused on video models, could not hold back either: Vidu Q1 reference-based image generation makes its grand entrance, supporting 7 reference images at once, with subject consistency that is no worse than Google's Nano Banana (Nano Banana supports at most 3 reference images). QbitAI got an early hands-on with the model, and it performs quite well: 7 freely usable reference images give you a great deal of control, and a simple natural-language description is all it takes. You can even generate fashion-editorial shots directly and skip the photo shoot altogether. We explored a lot of fun use cases; the prompts and images are all below, take a look! What can you do with 7 reference images? We tried several tricks, such as blending clashing elements into one harmonious picture, or producing fashion spreads... with enough creativity, anything can be composited. Anything can be composited: whether it is Qin Shi Huang riding a polar bear while drinking lemonade in Shanghai, or Li Bai riding a rocket to the Moon, drop in the reference images and see whether Vidu Q1's result matches what you imagined. Trendy pieces instantly become an OOTD: with that many references, why not do a full outfit swap? Put every item on the subject in one go and show off your styling skills. With this set of prompts, whether it is a Mediterranean ...
Led by an AlphaGo author, 8 robotic arms collaborate with zero collisions: DeepMind's new work appears in a Science family journal
量子位· 2025-09-09 12:20
Core Viewpoint
- The article discusses the innovative RoboBallet project, which combines Graph Neural Networks (GNNs) with Reinforcement Learning to enhance multi-robot collaboration in complex environments, showcasing significant advances in robotic motion planning and task allocation [5][9][24].

Summary by Sections

Introduction
- RoboBallet is a collaborative robotic system that allows multiple robotic arms to work together efficiently in a shared space without collisions [1][2].

Technical Innovation
- The project uses GNNs for the policy network and state-action value estimation, enabling control of up to 8 robotic arms and 56 degrees of freedom [6][9].
- It addresses three coupled sub-problems that are traditionally challenging for existing algorithms: task allocation, task scheduling, and motion planning [10][12].

Methodology
- The environment is modeled as a graph, where nodes represent robots, tasks, and obstacles, and edges denote relationships among them [11][14].
- The GNN handles dynamically sized graphs and generates joint-velocity commands for the robotic arms based on observed states [14][15].

Performance Metrics
- RoboBallet was tested in a simulated environment with 4 to 8 robots, 40 tasks, and 30 obstacles, demonstrating superior performance compared to traditional methods [18][19].
- Planning is fast: each planning step takes approximately 0.3 milliseconds on an NVIDIA A100 GPU, over 300x real-time planning speed [21].
- Average execution time decreased by about 60% as the number of robots increased from 4 to 8 [22].

Generalization and Applications
- The model exhibits zero-shot transfer, allowing it to adapt to new environments without additional training [24].
- RoboBallet's efficiency can optimize work-cell layouts, reduce task execution time by 33%, and enable fault-tolerant planning [24].
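To make the graph formulation concrete, here is a toy sketch in plain PyTorch: robots, tasks, and obstacles become graph nodes, one round of message passing mixes information along the edges, and a decoder emits a joint-velocity command per arm (7 joints per arm, so 8 arms give 56 degrees of freedom). The feature sizes, layer widths, and fully connected edge pattern are arbitrary illustrative choices, not DeepMind's architecture.

```python
# Toy sketch (not DeepMind's implementation): message passing over a robots/tasks/obstacles
# graph, decoding one joint-velocity vector per robot arm.
import torch
import torch.nn as nn

class GraphPolicy(nn.Module):
    def __init__(self, node_dim: int = 16, hidden: int = 64, joints: int = 7):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, node_dim))
        self.update = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, node_dim))
        self.decode = nn.Linear(node_dim, joints)   # per-robot joint velocities

    def forward(self, x: torch.Tensor, edges: torch.Tensor, robot_idx: torch.Tensor):
        # x: (num_nodes, node_dim) node features; edges: (2, num_edges) src->dst indices
        src, dst = edges
        messages = self.msg(torch.cat([x[src], x[dst]], dim=-1))       # one message per edge
        agg = torch.zeros_like(x).index_add_(0, dst, messages)         # sum messages per node
        x = self.update(torch.cat([x, agg], dim=-1))                   # updated node embeddings
        return self.decode(x[robot_idx])                               # (num_robots, joints)

# Example sized like the reported setup: 8 robot nodes, 40 task nodes, 30 obstacle nodes.
num_robots, num_nodes = 8, 8 + 40 + 30
x = torch.randn(num_nodes, 16)
robot = torch.arange(num_robots).repeat_interleave(num_nodes - 1)
other = torch.cat([torch.tensor([j for j in range(num_nodes) if j != i]) for i in range(num_robots)])
edges = torch.stack([other, robot])        # messages flow from every other node into each robot
policy = GraphPolicy()
joint_vel = policy(x, edges, torch.arange(num_robots))
print(joint_vel.shape)                     # torch.Size([8, 7]) -> 56 degrees of freedom in total
```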