量子位
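A conversation with Zhou Erjin of 原力灵机: a 2.4B model is enough, the key is being "embodied-native"; closing the loop is the most efficient method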
量子位· 2026-02-13 05:42
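Heng Yu, reporting from Aofeisi. QbitAI | WeChat official account: QbitAI. A company that (for now) builds only the embodied brain has released an embodied model with just 2.4B parameters. Current industry bellwethers are larger: Physical Intelligence's π0 totals about 3.3 billion parameters, and π0.6 is estimated at more than 5 billion. In an industry where even the hardware form factor has not yet settled, are 2.4B parameters really enough? The company's answer is yes. It is enough to process three camera views at 728x728 in real time with an inference latency of only 60 milliseconds, and, combined with a reinforcement learning mechanism, the model can keep improving through trial and error on real robots. This is DM0, the first embodied-native model product from the embodied-intelligence startup 原力灵机. It is a lightweight 2.4B "little cake" that an RTX 5090 is enough to run. Because it was trained from scratch, and because the company takes a different view of embodied data collection from the rest of the industry, among other reasons, the company calls it the "first embodied-native large model". Released alongside the model were the open-source embodied-native framework Dexbotic 2.0 and the embodied-native mass-production workflow DFOL. The person steering the technical roadmap behind this embodied software trio is Zhou Erjin, a partner at 原力灵机 who leads its large-model work. He has long been well known in AI circles. Zhou is only 33, yet he has been known in the AI field for 13 years: back in 2013, when deep learning and artificial intelligence were still niche, Zhou, then a sophomore interning at Megvii (旷视), as first author ...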
Shunyu Yao's Google debut: new Gemini model smashes SOTA, leaving only 7 humans to defend carbon-based programming
量子位· 2026-02-13 05:42
Core Insights - Google has significantly upgraded its AI model, Gemini 3 Deep Think, in response to competition from Claude Opus 4.6 and GPT Codex 5.3 [1] Performance Metrics - Gemini 3 Deep Think achieved an unprecedented score of 84.6% on the ARC-AGI-2 benchmark, surpassing previous models that scored between 60%-70% [3][26] - In the Humanity's Last Exam (HLE), it scored 48.4%, setting a new state-of-the-art (SOTA) [4][22] - The model also scored 3455 Elo points on Codeforces, ranking it as the 8th in the world [2] - In the International Math Olympiad 2025, it reached gold medal level with a score of 81.5% [5][33] Cost Efficiency - The upgrade has reduced the reasoning cost by 82%, from $77.16 to $13.62 per task [29] Applications and Capabilities - Gemini 3 Deep Think can analyze sketches, model complex shapes, and generate files for 3D printing [8] - It successfully identified a subtle logical flaw in a complex mathematical paper that was missed during human peer review [10][11] - The model optimized a method for growing complex crystals, achieving a thickness greater than 100 microns, which was previously difficult [14] Research and Development Team - The development team includes notable Chinese scientists, such as Yi Tay and Shunyu Yao, who have significant backgrounds in AI and physics [36][41] - Yi Tay has previously worked on early large language models and returned to Google DeepMind after a stint in a startup [38] - Shunyu Yao has a strong academic background, having published in top journals and worked on advanced topics in quantum physics [41][42]
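As a quick check of the cost figure in this summary, the reduction from $77.16 to $13.62 per task can be recomputed directly. This is only a sanity-check sketch; the helper name `percent_reduction` is illustrative and does not come from the source.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# Per-task reasoning costs quoted in the summary (US dollars).
old_cost = 77.16
new_cost = 13.62

reduction = percent_reduction(old_cost, new_cost)
print(f"Reduction: {reduction:.1f}%")
```

The result rounds to the 82% quoted in the summary.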
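Million-row Excel files and PPT layout handled in one click! This Hangzhou power-sector AI startup wants to lighten the load for office workers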
量子位· 2026-02-13 02:52
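Yun Zhong, reporting from Aofeisi. QbitAI | WeChat official account: QbitAI. The productivity revolution promised by AI is stuck behind the walls of the cloud. You have surely run into this awkward situation: ChatGPT's answers are logically flawless, but faced with your local database of several hundred megabytes it "goes on strike"; Manus is an impressive concept, but every upload, desensitization pass, and sync quietly wears down your patience. Not to mention the sensitive data that cannot be touched or uploaded at all. For this reason, what we need may be a loyal "AI partner" rooted deep in the operating system and unfazed by data scale, rather than a pure chat companion. The evolution of agents is confirming this trend: starting with OpenClaw (formerly ClawdBot/Moltbot), AI has been moving from the cloud to local execution, but complex execution boundaries, permission control, and long-term stability remain the "deep water" for general-purpose solutions. Now the role that fills this gap has arrived: XMO-AgentBox, a local agent workbench. Carrying "power-dispatch-grade" genes, it aims to let AI truly take root locally. It did not come out of nowhere. The company behind it, Hangzhou-based 曦谋决策, works deep in the power and energy industry and specializes in large-scale mathematical optimization and time-series forecasting. Its leader, Xin Yan (辛焱), has worked at Siemens, Tsinghua University, and Alibaba DAMO Academy, and has more than 10 years of hands-on experience in power-industry algorithms and products. From supporting provincial-grid power dispatch to leading AI dispatch competitions, they ...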
Hire a full-stack stand-in for $1 an hour: MiniMax M2.5 lets ordinary workers feel what it is like to be the boss
量子位· 2026-02-13 02:52
Core Insights - The article discusses the launch of MiniMax's new model M2.5, which excels in full-stack coding and intelligent agent capabilities, positioning itself alongside Claude Opus 4.6 in performance [2][7][21] - M2.5 is designed to handle both front-end and back-end development, offering a comprehensive solution for coding tasks and data management at a low cost [6][10][61] - The model's rapid processing speed and cost-effectiveness signal a significant advancement in AI applications, indicating an impending explosion of AI utility in various sectors [59][61][64] Group 1: Model Capabilities - M2.5 supports multiple platforms including PC, mobile apps, React Native, and Flutter, making it a versatile tool for developers [3][4] - It can generate complete, functional code for complex projects, such as an e-commerce website with advanced features [12][14] - The model has achieved an 80.2% score on the SWE-Bench Verified leaderboard and ranks first in multi-language tasks [8][10] Group 2: Performance Metrics - M2.5 generates output at 100 tokens per second (TPS), double the speed of mainstream flagship models, which improves its efficiency in data processing and bug fixing [21][61] - The model has only 10 billion activated parameters, making it the smallest flagship model in its class while still delivering top-tier performance [20][64] Group 3: Practical Applications - M2.5 has been successfully integrated into real-world scenarios, such as automating financial report generation and data organization tasks [27][32] - The model's ability to analyze data and provide business insights, such as identifying cost-saving opportunities, showcases its advanced analytical capabilities [38][40] Group 4: Industry Implications - The rapid advancements in the M2 series indicate a shift towards more capable AI models that can independently manage complex tasks, reducing the need for constant developer oversight [59][66] - M2.5's introduction is seen as a catalyst for broader AI adoption across industries, with the potential to transform workflows and productivity [59][64]
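Xiaomi's first-generation robot VLA large model is here! Silky smooth as Dove chocolate, with inference latency of just 80 ms | fully open source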
量子位· 2026-02-12 12:42
Core Insights - The article discusses the rising prominence of embodied intelligence and robotics, highlighting the increasing interest from both large and small companies, as well as capital investment and media coverage [2][3] - There is a growing expectation for embodied robots to transition from being merely demonstrative to becoming practical tools that enhance productivity in real-world applications [3][4] - Xiaomi's new embodied VLA model, Xiaomi-Robotics-0, addresses critical issues such as the frequent pauses and slow corrections in robotic execution, aiming for greater autonomy and efficiency [7][8] Group 1: Industry Trends - The embodied robotics sector is at a pivotal point, characterized by impressive demonstrations of capabilities while also facing scrutiny regarding their actual value in industrial settings [3][4] - The industry is experiencing a paradigm shift where the focus is on the autonomy of robots, moving beyond human-assisted operations to fully autonomous systems [4][6] Group 2: Xiaomi-Robotics-0 Innovations - Xiaomi-Robotics-0 features three core technological innovations: architecture design, pre-training strategies, and post-training mechanisms, all aimed at enabling robots to understand complex environments and execute actions continuously and accurately [12][13] - The model employs a dual-brain architecture that separates a "brain" for decision-making from a "cerebellum" that generates continuous action chunks, which enhances the smoothness and precision of robotic movements [16][21] - A two-phase pre-training approach is utilized to maintain the model's visual understanding while training it to perform actions, ensuring that the robot can interpret complex instructions and plan continuous movements [24][30] Group 3: Performance Metrics - Xiaomi-Robotics-0 has achieved outstanding results in various benchmarks, surpassing approximately 30 existing models in environments like LIBERO, CALVIN, and SimplerEnv [44][45] - The model demonstrated a 100% success rate in the LIBERO-Object task and maintained high throughput in real-world tasks such as towel folding and LEGO disassembly, showcasing its practical capabilities [47][54][57] - The model's performance indicates that it does not sacrifice understanding capabilities for control abilities, maintaining high scores across multiple evaluation metrics [49][58] Group 4: Strategic Direction - Xiaomi's approach in the embodied intelligence field appears to focus on practical applications rather than merely showcasing advanced technology, aiming to address real-world industrial challenges [61][65] - The company has recently open-sourced its models, including TacRefineNet, which enhances fine-grained control without relying on visual input, indicating a commitment to transparency and collaboration within the industry [74][76] - This open-source strategy lowers barriers for smaller developers, allowing them to build upon Xiaomi's foundational work and contribute to the development of specialized applications in robotics [78][79]
No need to beg anyone to retouch your photos this Spring Festival! Xiaohongshu open-sources a new SOTA for image editing
量子位· 2026-02-12 11:00
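Yun Zhong, reporting from Aofeisi. QbitAI | WeChat official account: QbitAI. AI image generation has a new "heavy hitter". Today, Xiaohongshu's foundation model FireRed-Image-Edit made its official debut. What earns FireRed-Image-Edit the "heavy hitter" label is not only its striking leaderboard results but also the "tough exam paper" and "advanced training ground" that the Xiaohongshu team tailored for it. 1. Redefining the standard: RedEdit Bench. In AI image generation, existing benchmarks often fail to cover users' real, complex needs. To address this, the team introduced RedEdit Bench, an in-depth evaluation suite. Full-scenario coverage: it contains 15 subtasks. Beyond routine additions, removals, and modifications to an image, the benchmark also looks ahead by including high-frequency practical scenarios such as portrait beautification and low-quality image enhancement. Comparison results show that FireRed-Image-Edit, with more precise understanding, stronger identity (ID) preservation, and an efficient architecture, stands out in multiple authoritative tests and achieves SOTA on several leaderboards, including ImgEdit and GEdit, reaching an industry-leading level. △ Metric comparison on mainstream leaderboards and the in-house benchmark. The technical foundation behind this efficient architecture comes from an important exploration by Xiaohongshu's Super Intelligence Team in image generation and editing. Key point: the project's code and technical report ...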
Musk responds to the wave of xAI co-founder departures: this is the organization evolving
量子位· 2026-02-12 11:00
Core Viewpoint - The recent departures at xAI, including two co-founders, have raised questions about whether this is a normal turnover or indicative of deeper issues within the company. Elon Musk has stated that the restructuring is aimed at improving execution efficiency and is not merely a layoff [4][9][10]. Group 1: Company Restructuring - Musk emphasized that the departures are part of a necessary organizational restructuring to enhance efficiency at the current scale of the company [9][10]. - The company is still actively recruiting, indicating a focus on future growth despite the recent changes [5][12]. - The restructuring is described as a response to the company's rapid growth, suggesting that some individuals may be better suited for earlier stages of a startup rather than a more mature organization [11][14]. Group 2: Employee Departures - The recent wave of departures includes co-founders Tony Wu and Jimmy Ba, who publicly expressed intentions to pursue new opportunities, suggesting a voluntary exit rather than a forced departure [15][17]. - The narrative from departing employees contrasts with Musk's framing of the situation, leading to speculation about the true nature of these departures [20][21]. - A significant number of core team members have left xAI since February, indicating a potential trend of increasing turnover within the organization [27][38]. Group 3: External Factors - xAI was recently acquired by SpaceX in an all-stock deal, which is expected to influence the company's operational focus and priorities moving forward [38]. - The company is facing external regulatory scrutiny, particularly related to its AI products, which may contribute to internal pressures and the recent departures [40][42]. - The competitive landscape in AI, with major players like OpenAI and Google ramping up efforts, raises concerns about the potential impact of talent turnover on xAI's future capabilities [43][44].
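2025 embodied-intelligence venture capital panorama: 55.4 billion yuan of hot money, 4 valuation tiers, a 1 billion yuan cash-flow threshold | 量子位智库 report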
量子位· 2026-02-12 09:30
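The following article is from 量子位智库. Analyst: Wang Xinyi (王昕祎). 量子位智库 | WeChat official account: AI123All. On January 28, 2025, at the Year of the Snake Spring Festival Gala, 16 Unitree robots dressed in floral northeastern jackets and clutching red handkerchiefs danced the yangge. Unitree founder Wang Xingxing probably did not expect that this gala performance would become the starting point of 2025's embodied-intelligence funding surge. Soon after, in March, the newly founded 它石智航 closed two record-setting angel rounds in succession, raising more than RMB 1.6 billion in total; in June, 银河通用 secured an RMB 1.1 billion Series B led by CATL, then raised roughly another RMB 2 billion in December, with its valuation soaring to RMB 21 billion, a new record. In 2025, investment events in the embodied-intelligence sector jumped from 173 in 2024 to 447, and total capital inflow rose from 13.7 billion to 55.4 billion yuan, reaching more than 250% and 400% of their 2024 levels, respectively. A single round of 100 million yuan and cumulative funding of 1 billion yuan became the sector's new thresholds. Meanwhile, three camps gathered at the table: financial capital, industry giants, and state-owned investors. Alibaba ranked first for the year by total investment, while state-backed investors such as Shenzhen Capital Group (深创投), the Beijing Robotics Industry Development Investment Fund, China Merchants Venture, and the CCTV Converged Media Industry Fund repeatedly backed star companies such as 首形科技 and 银河通用. In 2025, the embodied-intelligence sector staged an unprecedented capital frenzy ...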
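The year-over-year figures in this excerpt (deal count 173 to 447, capital 137 to 554 in units of 100 million yuan) are easy to misread, because "2.5 times the old level" and "a 250% increase" are different quantities. The sketch below computes both readings; the function name `growth` is illustrative and not from the source.

```python
from typing import Tuple


def growth(before: float, after: float) -> Tuple[float, float]:
    """Return (multiple of the old level, percentage increase over it)."""
    multiple = after / before
    increase_pct = (multiple - 1.0) * 100.0
    return multiple, increase_pct


# Figures quoted in the excerpt: 2024 value, then 2025 value.
deal_multiple, deal_increase = growth(173, 447)        # number of deals
capital_multiple, capital_increase = growth(137, 554)  # units of 100 million yuan

print(f"Deals:   {deal_multiple:.2f}x the 2024 level (+{deal_increase:.0f}%)")
print(f"Capital: {capital_multiple:.2f}x the 2024 level (+{capital_increase:.0f}%)")
```

Read this way, the quoted "250%" and "400%" match the 2025 totals as a share of the 2024 totals, while the increases themselves are roughly 158% and 304%.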
Skip the couplets for your 2026 New Year greetings; let AI write you a song instead
量子位· 2026-02-12 09:30
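Xi Feng, reporting from Aofeisi. QbitAI | WeChat official account: QbitAI. AI has quietly surpassed me in yet another way: this time it sings more in tune than I do! When I first heard this new song, I thought it came from some accomplished "little Dao Lang" ... It is hard to explain in a sentence or two, so just have a listen: The story goes roughly like this: a young man who has just finished his exams and graduated speaks of his reluctance to part from his teachers and classmates, with the innocent awkwardness unique to youth and high hopes for the future. Well produced, isn't it? Lively rhythm, flowing melody, and rising and falling emotion, all at a professional standard. But would you believe it? From lyrics to arrangement, the whole song was generated by AI in one click. The "little Dao Langs" simply express their idea in one sentence, wait less than a minute, and get a complete piece of music 2 to 6 minutes long, with a stable overall structure, pitch that stays on key, and a natural vocal timbre that does not drift. All of this comes from 音潮 V3.0, a new model just released by 自由量级, an AI company focused on developing its own music large models. Compared with the previous generation, 音潮 V3.0 shows significant improvements in singing quality, overall listenability and memorability, richness of arrangement, and musical completeness. 音潮 V3.0 is now live on the web and in the official app, and is open to all users as a free trial. That being the case, QbitAI is at it again, so on to the hands-on test. An AI "soul singer" writes songs for you. Open the app and you will see four creation modes: one-sentence songwriting, photo songwriting, lyrics songwriting, 热歌改 ...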
Huawei upgrades its industry agent algorithm architecture! MindScale writes its own prompts and workflows, and KV Cache cuts tokens by 5.7x
量子位· 2026-02-12 07:52
Core Viewpoint - The article emphasizes the significance of industry-specific agents in enhancing productivity and value creation through the application of large models in various sectors [1]. Group 1: Challenges in Industry Agent Development - The MindScale project identifies four core challenges in the widespread application of agents across industries: self-evolving workflows, automated prompt optimization, historical knowledge reuse, and complex reasoning evaluation [4]. - The project aims to address these challenges by providing solutions in collaboration with various partners [4]. Group 2: Workflow Development and Automation - The algorithm package includes the EvoFabric agent algorithm, which facilitates self-evolving workflows, allowing for rapid generation of executable workflows from natural language documents and historical tool libraries using SOP2Workflow [5][6]. - The traditional manual maintenance of workflows relies heavily on expert experience, which poses challenges in reusing historical knowledge and maintaining efficiency in training and inference [7]. Group 3: Prompt Optimization Techniques - The article discusses the implementation of a prompt optimization algorithm, SCOPE, which allows developers to optimize prompts between inference steps, achieving over 20% accuracy improvement in specific scenarios [11]. - The C-MOP model introduces a feedback loop for prompt optimization, addressing conflicts in text gradients and enabling automatic prompt optimization based on positive and negative feedback [11][14]. Group 4: Efficiency and Performance Enhancements - MindScale focuses on optimizing training and inference efficiency for industry-specific models, with the TrimR algorithm significantly reducing inference latency by up to 70% in high-concurrency scenarios without compromising accuracy [14][16]. 
- The introduction of KV-Embeddings redefines the use of KV Cache, enhancing performance in chain-of-embedding scenarios and reducing the number of generated tokens by up to 5.7 times [16]. Group 5: Hardware Adaptation and Implementation - MindScale includes code implementations that are compatible with Ascend hardware, enabling industry developers to build high-precision and efficient agents based on domestic computing power [18]. - The TrimR algorithm employs a lightweight verifier to detect and truncate unnecessary intermediate thoughts without requiring fine-tuning of the large model or verifier, suitable for high-concurrency production environments [19].
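The summary describes TrimR only at a high level: a lightweight verifier detects and truncates unnecessary intermediate thoughts, with no fine-tuning of the large model. The toy sketch below illustrates that general idea of verifier-driven early stopping. It is not the TrimR algorithm itself; the stopping rule (the same answer repeated `patience` times in a row) and the names `truncate_reasoning` and `toy_verifier` are assumptions made for illustration.

```python
from typing import Callable, List, Optional


def truncate_reasoning(
    thoughts: List[str],
    extract_answer: Callable[[str], Optional[str]],
    patience: int = 2,
) -> List[str]:
    """Keep thoughts until the same answer appears `patience` times in a row.

    `extract_answer` plays the role of a cheap verifier: it reads one
    intermediate thought and returns the answer it commits to, or None.
    """
    kept: List[str] = []
    last_answer: Optional[str] = None
    streak = 0
    for thought in thoughts:
        kept.append(thought)
        answer = extract_answer(thought)
        if answer is None:
            continue  # no committed answer; the streak is left unchanged
        if answer == last_answer:
            streak += 1
        else:
            last_answer, streak = answer, 1
        if streak >= patience:
            break  # the answer has stabilised, so later thoughts are dropped
    return kept


def toy_verifier(thought: str) -> Optional[str]:
    """Hypothetical stand-in for a small verifier model: parse 'answer=...'."""
    marker = "answer="
    if marker not in thought:
        return None
    return thought.split(marker, 1)[1].strip()


thoughts = [
    "Set up the equation.",
    "First attempt: answer=12",
    "Re-check the arithmetic: answer=14",
    "Verify by substitution: answer=14",
    "Try another method just in case: answer=14",
]
kept = truncate_reasoning(thoughts, toy_verifier, patience=2)
print(f"Kept {len(kept)} of {len(thoughts)} thoughts")
```

In a real serving stack the check would run while tokens are still being generated, so that stopping early saves decoding time; here the list of thoughts is fixed in advance to keep the example self-contained.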