Chinese robots compete in emergency rescue, and US netizens on Reddit are rattled: we're still putting makeup on robot dogs for skits
量子位· 2025-12-12 06:41
Henry, from 凹非寺
量子位 | WeChat official account QbitAI

When it comes to fawning over foreign tech, it's the American netizens' turn now!

Recently, a post asking "Chinese robots are competing to rescue people from fires, and American robot dogs are still wearing Zuckerberg face masks?" shot to the top of Reddit.

The poster wrote:

China's robots are already competing in emergency rescue, while we're still putting makeup on robot dogs for skits. Saying we're not behind is lying to ourselves.

A commenter below landed a perfect follow-up jab:

It's not that our scientists aren't working; it's that the funding has all been sucked into these flashy meme projects (lol).

That left more than a few American netizens rattled.

After all, this isn't strapping a fire hydrant to a robot and filming skits to hype expectations; it has genuinely become an event robots can enter, compete in, and be scored on.

The rescue event comes from GDPS 2025 (the Global Developer Pioneer Summit and International Embodied Intelligence Skills Competition), recently held in Shanghai.

Interestingly, the people anxious and rattled over this GDPS 2025 are far from few.

By this measure, it oddly seems to be foreigners who follow China's embodied AI most closely.

What's going on?

Foreigners are watching China's embodied AI most closely

To be fair, foreign netizens have clearly started tracking the progress of China's embodied intelligence lately, watching it more closely than we do ourselves.

This GDPS 2025 is a case in point.

Beyond the robot emergency-rescue competition above, the sheer scale of the GDPS 2025 contests also came as a real jolt to foreign netizens ...
Just three steps to claim an AI phone!
量子位· 2025-12-12 06:41
Core Viewpoint - The article discusses the launch and capabilities of AutoGLM, an AI framework that allows smartphones to perform tasks autonomously based on natural language commands, marking a significant advancement in mobile AI technology [2][12]. Group 1: AutoGLM Overview - AutoGLM is a vision-language-model-based intelligent assistant framework for smartphones, enabling a paradigm shift from chat-based interactions to actionable tasks [12][13]. - The framework allows users to describe tasks in natural language, which the AI interprets to understand user intent and execute operations on the smartphone [13]. Group 2: Installation Process - The article outlines a simplified three-step process for users to install AutoGLM on their Android devices, utilizing tools like Claude Code and GLM-4.6 [8][11]. - The steps include installing ADB Keyboard, connecting the phone to a computer, and using Claude Code to execute the installation command [9][11]. Group 3: Development Timeline - The development of AutoGLM has spanned 32 months, with three significant milestones, including its open-source release, which allows local deployment and cloud-based experiences [14]. - Key milestones include the first AI agent capable of automatically operating a phone in October 2024, the first fully automated AI-issued red envelope in November 2024, and the release of AutoGLM 2.0 in August 2025, which operates in a cloud environment [14].
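The three steps above can be sketched as a shell routine. This is a minimal sketch, assuming a standard `adb` setup: the APK filename and the final `claude` invocation are illustrative assumptions, not commands taken from the article.

```shell
# Hypothetical sketch of the three-step AutoGLM install flow described above.
# The apk name and the claude prompt are illustrative assumptions.
install_autoglm() {
    # Step 1: install ADB Keyboard so the agent can type into any app.
    adb install -r ADBKeyboard.apk

    # Step 2: connect the phone over USB debugging and confirm it is visible.
    adb devices

    # Step 3: hand the rest of the installation to Claude Code.
    claude "install AutoGLM on the connected Android device"
}
```

The routine assumes USB debugging is already enabled in the phone's developer options; `adb devices` is the quickest way to confirm the connection before handing control to the agent.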
量子位 is hiring editors and writers
量子位· 2025-12-12 06:41
Editorial Desk, from 凹非寺
量子位 | WeChat official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part in it... why not join 量子位?

We are a content platform centered on tracking new developments in AI. After eight years of accumulation, we have top-tier influence, broad and well-recognized industry resources, and a prime vantage point for observing and learning at the crest of the era.

We are currently hiring across three tracks, and we hope you are (or can become) a content expert in one of them:

AI Industry track: covers infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
AI Finance track: covers venture capital and earnings in AI, tracking capital moves along the industry chain;
AI Product track: covers AI progress in applications and hardware terminals.

All positions are full-time, based in Zhongguancun, Beijing. Openings span every seniority level; you are welcome to apply according to your background and experience.

AI Industry track
Experienced hires: editor, lead writer, and managing editor levels, matched to ability;
Campus hires: fresh graduates; internships accepted, with conversion to full-time possible.

What you can gain by joining us:
Stand at the crest of the AI wave: be the first to encounter the latest AI technologies and products, and build a complete AI knowledge system.
Master new AI tools: apply new AI techniques and tools in your work to boost efficiency and creativity.
Build personal influence: by writing exclusive original con ...
$1 billion in OpenAI equity traded for Disney rights! Mickey Mouse rides to Sora's rescue
量子位· 2025-12-12 06:41
一水, from 凹非寺
量子位 | WeChat official account QbitAI

There really is no free lunch!

To get Mickey Mouse into Sora, OpenAI has just officially announced a partnership with Disney.

One of the deal's terms: OpenAI must sell Disney $1 billion worth of company equity, and Disney also gains the right to increase its stake later.

The moment the news broke, Bloomberg led the rubbernecking with a rather blunt headline: Altman just realized there's no free lunch at Disney.

After bleeding this much, the celebratory posts from OpenAI CEO Sam Altman and President Greg Brockman now read a bit like forced smiles (just kidding).

In exchange, though, OpenAI's video generator Sora can now generate more than 200 popular IP characters out in the open: Mickey Mouse, Snow White, Buzz Lightyear, Iron Man, and the rest, all covered. ChatGPT Images will gain the same capability.

It has to be said: Disney's reputation for having "the strongest legal team on Earth" is no exaggeration; even OpenAI now has to cut a pound of flesh, in effect, to buy peace.

OpenAI and Disney strike a three-year deal

Jokes aside, let's look at the formal agreement the two sides reached.

According to OpenAI's announcement, Disney will become Sora's first major content licensing partner, for a term of three years, with the first year's license being exclusive.

Under the agreement, Sora will gain access to Disney's ...
Google's agents step up: an enhanced Gemini Deep Research and a dedicated API arrive
量子位· 2025-12-12 06:41
Core Insights - OpenAI and Google are both making significant updates in the AI space, with Google launching an enhanced version of Gemini Deep Research aimed at reducing hallucinations and excelling in complex information retrieval and analysis tasks [1][3][10]. Group 1: Gemini Deep Research Enhancements - The enhanced Gemini Deep Research is built on Gemini 3 Pro and will soon be integrated into various Google services such as Google Search, NotebookLM, Google Finance, and the upgraded Gemini App [3][8]. - This version of Gemini Deep Research can perform iterative reasoning, allowing it to generate queries, read and integrate search results, and identify knowledge gaps, significantly improving its web search capabilities [10][12]. - In benchmark tests like HLE, BrowseComp, and DeepSearchQA, the enhanced model has achieved state-of-the-art (SOTA) results, showcasing its superior performance in complex research tasks [10][12]. Group 2: DeepSearchQA Benchmark - Google has released the DeepSearchQA benchmark dataset to provide a more comprehensive evaluation standard for deep search and research tasks, addressing the limitations of existing benchmarks [5][12]. - The dataset includes 900 manually designed causal chain tasks from 17 domains, requiring detailed answer sets, which better measure the model's multi-step reasoning and information fusion capabilities [12]. Group 3: Interactions API - Google has introduced the Interactions API, designed to provide a unified interface for developers to interact with Gemini 3 Pro and Deep Research agents [6][16]. - This API is particularly suited for scenarios requiring multi-step reasoning, tool invocation, and long-term task execution, enhancing the capabilities of existing models [17][18]. - The Interactions API simplifies workflows and adapts better to developer environments by expanding the core capabilities of content generation and supporting server-side state, interpretable data models, and remote tool support [18].
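The iterative-reasoning loop described above (generate queries, read and integrate search results, identify knowledge gaps, repeat) can be sketched in a few lines. This is a toy illustration, not Google's Deep Research or the Interactions API: every function below is a hypothetical stub.

```python
# Toy sketch of an iterative research loop: propose queries, absorb results,
# detect remaining gaps, repeat. All components are hypothetical stand-ins.

def search(query: str) -> str:
    """Stub retriever: returns a canned 'document' for a query."""
    corpus = {
        "what is X": "X is a system with parts A and B.",
        "what is part A": "Part A handles input.",
        "what is part B": "Part B handles output.",
    }
    return corpus.get(query, "")

def find_gaps(notes: list[str]) -> list[str]:
    """Stub gap detector: asks about any part mentioned but not yet explained."""
    text = " ".join(notes)
    gaps = []
    for part in ("A", "B"):
        if "parts" in text and f"Part {part} handles" not in text:
            gaps.append(f"what is part {part}")
    return gaps

def deep_research(question: str, max_rounds: int = 3) -> list[str]:
    notes: list[str] = []
    queries = [question]
    for _ in range(max_rounds):
        if not queries:
            break                      # no remaining knowledge gaps
        notes += [search(q) for q in queries if search(q)]
        queries = find_gaps(notes)     # identify what is still unexplained
    return notes

notes = deep_research("what is X")
print(len(notes))  # 3: the overview plus one note per part
```

The point of the sketch is the control flow: each round's queries are derived from gaps in the notes accumulated so far, which is what distinguishes iterative research from a single retrieval pass.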
Consumer agents catch fire fast, but the bigger value lies in enterprises | 中关村科金 @ MEET2026
量子位· 2025-12-12 05:30
Core Viewpoints - The transition from the mobile internet's "human-machine connection" to the AI era's "intelligent connection" signifies a profound restructuring within enterprises, where the essence lies in stronger connections rather than merely enhanced tools [1][9]. - Intelligent agents are emerging as super connectors, weaving together people, data, knowledge, and intelligence into the entire operational framework of enterprises, thus forming a new digital workforce [2][12]. Group 1: Intelligent Agent Implementation - The implementation of intelligent agents is not a one-time project but a long-term endeavor driven by continuous iteration across three elements: scenario selection, data and knowledge governance, and model construction [3][14][17]. - Enterprises are advised to focus on three key platforms for effective intelligent agent deployment: a large model platform for cognitive capabilities, an AI capability platform for perception, and an AI data platform for organizational memory [19][20][25]. Group 2: Market Opportunities and Applications - Intelligent agents are creating significant value in both internal and external enterprise operations, enhancing collaboration among employees and improving customer engagement through marketing, customer service, and sales empowerment [12][36]. - The marketing service scenario is highlighted as the most typical and effective application area for intelligent agents, enabling efficient interaction with millions of users through a unified management system [35][36]. Group 3: Industry-Specific Applications - In the financial sector, the company has served over 200 banks and 500 financial institutions, developing numerous intelligent agent solutions for risk control, consumer protection, and credit scenarios [41]. 
- The industrial sector is also seeing extensive applications of intelligent agents, with a focus on leveraging large language models and other advanced technologies to enhance operational efficiency and optimize processes [45][46]. Group 4: Global Expansion and Future Outlook - The company positions itself as a leading provider of enterprise-level large model technology and application services, actively expanding into international markets such as Hong Kong, Singapore, Malaysia, Thailand, and Indonesia [47]. - The future of intelligent agents in enterprises hinges on creating substantial value, with a higher demand for industry know-how, accuracy, and compliance compared to consumer-facing applications [49][50].
Skip word-by-word generation! Ant Group's Zhao Junbo: diffusion models let us edit tokens directly | MEET2026
量子位· 2025-12-12 03:00
Core Viewpoint - The article discusses the shift from autoregressive models to diffusion architecture in language models, highlighting the potential for faster generation speeds and lower computational costs with diffusion models [2][8]. Group 1: Diffusion Architecture Insights - Diffusion architecture allows for direct modification and control of tokens during inference, unlike autoregressive models that require re-generating entire segments [2][15]. - The recent release of LLaDA 2.0 marks a significant milestone, achieving a scale of 100 billion parameters for diffusion language models [4][44]. - The development of diffusion models is still in its early stages, but it has attracted attention from major companies like Google and ByteDance, as well as several startups [5][41]. Group 2: Technical Aspects and Comparisons - Diffusion models operate on a "fill-in-the-blank" mechanism rather than a sequential token generation, which can lead to more efficient data utilization [12][21]. - In terms of parameter efficiency, diffusion models can achieve similar performance with fewer parameters compared to autoregressive models under the same computational constraints [15][23]. - The unique characteristics of diffusion models allow for continuous training, unlike autoregressive models that plateau after several epochs [24][26]. Group 3: Future Directions and Community Engagement - The article emphasizes the need for further exploration of the scaling laws specific to diffusion language models, which differ from those of autoregressive models [56]. - The community is encouraged to participate in the development and optimization of diffusion models, as the ecosystem is still in its infancy [56]. - Upcoming collaborations and API releases are planned to enhance accessibility and integration of diffusion models into various applications [51].
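The "fill-in-the-blank" decoding that separates diffusion language models from left-to-right generation can be illustrated with a toy unmasking loop. The confidence model below is a fake stand-in, not LLaDA; the structure it shows is the real distinction: several positions are revealed per step, in any order, rather than strictly left to right.

```python
import random

MASK = "_"

def toy_scores(tokens, position):
    """Fake 'model': returns (confidence, best_token) for a masked position.
    Stands in for a real diffusion LM's per-position prediction."""
    target = "the cat sat on the mat".split()
    # Pretend the model is more confident next to already-revealed neighbors.
    revealed = sum(tokens[j] != MASK for j in (position - 1, position + 1)
                   if 0 <= j < len(tokens))
    return revealed + random.random(), target[position]

def diffusion_decode(length=6, per_step=2):
    """Iteratively reveal the highest-confidence masked positions, a few per step.
    Unlike autoregressive decoding, any position may be filled at any step."""
    tokens = [MASK] * length
    while MASK in tokens:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Rank masked positions by model confidence; reveal the top few in parallel.
        ranked = sorted(masked, key=lambda i: -toy_scores(tokens, i)[0])
        for i in ranked[:per_step]:
            tokens[i] = toy_scores(tokens, i)[1]
    return " ".join(tokens)

print(diffusion_decode())  # "the cat sat on the mat"
```

Because every step operates on the whole sequence, a decoder like this can also re-mask and refill a span, which is the "directly edit tokens" property the talk highlights; autoregressive decoding would instead have to regenerate everything after the edit point.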
CUHK and Meituan open-source a "visual reasoning generalist"! Ten task types across images and video, all in one model
量子位· 2025-12-12 01:00
Contributed by the OneThinker team
量子位 | WeChat official account QbitAI

Sweeping 31 mainstream benchmarks and nailing 10 core task types, the vision-model world's "generalist" is here!

CUHK's MMLab and a Meituan research team have open-sourced OneThinker, a unified RL-based multimodal visual-reasoning generalist covering ten core vision tasks across both image and video modalities.

OneThinker performs strongly across all 31 mainstream vision benchmarks. Multi-task training not only yields mutual gains across tasks; the model also reasons sensibly on tasks it has never seen, an early showing of a generalist model's ability to generalize.

From "specialist models" to a "generalist system"

Real-world visual data is complex and varied, often containing both static images and dynamic video. Vision task types are equally diverse: question answering, grounding, segmentation, tracking, and more.

Against this backdrop, the traditional "single-task, single-modality" RL reasoning-model architecture has two fundamental problems:

1. It cannot uniformly model complex real-world scenes.
2. Knowledge stays siloed, limiting transfer.

Work represented by Vision-R1, Video-R1, and VLM-R1 has already achieved notable results on image QA, video understanding, and object detection. But most of these RL models share one limitation: they can handle only a single modality or a single task. With no links across modalities and tasks, reasoning ability is fragmented and hard to generalize.

Let's take a look at On ...
GPT-5.2 overtakes Google's Gemini 3 Pro, as expected! Core contributions from Peking University School of Mathematical Sciences alumni
量子位· 2025-12-12 01:00
Core Insights - OpenAI has released GPT-5.2, which significantly enhances capabilities in various practical fields, including spreadsheet creation, presentation design, coding, and understanding lengthy documents [1][2][3] - The model shows a marked improvement in visual understanding, accurately identifying more components on circuit boards [4] - GPT-5.2 has achieved a new state-of-the-art score of 90.5% in the ARC-AGI-1 test, with a dramatic reduction in task costs from $4,500 to $11.64, indicating a 390-fold efficiency increase over the past year [12][13] Performance Enhancements - GPT-5.2 demonstrates a 71% win rate against human experts in GDPval tests, completing tasks that typically take humans 4-8 hours in a fraction of the time [18][19] - In investment banking tasks, GPT-5.2 Thinking improved its score from 59.1% to 68.4%, reflecting a 9.3% increase in performance [21] - The model's coding capabilities have also improved, achieving an 80% score on SWE-bench Verified and 55.6% on the more challenging SWE-Bench Pro [25][26] Visual and Contextual Understanding - The model has shown a 50% reduction in error rates for understanding scientific paper graphics and has improved spatial awareness of elements in images [34][36] - GPT-5.2 Thinking is the first model to achieve near 100% accuracy on a 256k context length task, showcasing its ability to handle long documents effectively [30] Tool Utilization and Scientific Applications - Tool invocation capabilities have reached new heights, with GPT-5.2 achieving 98.7% in multi-turn interactions in telecom scenarios [40] - In scientific assessments, GPT-5.2 Pro scored 93.2% in GPQA Diamond evaluations, indicating its suitability for assisting researchers [45] Team and Development Insights - OpenAI's recent advancements have been attributed to a new wave of talent, many of whom have strong mathematical backgrounds and joined the company in 2024 [57][58][59]
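The cost figures quoted above imply the efficiency multiple directly; a quick sanity check of the arithmetic (dollar amounts from the summary, with "390-fold" being the article's rounding):

```python
# Cost per ARC-AGI-1 task, as quoted in the summary above.
old_cost = 4500.00   # dollars per task, roughly a year earlier
new_cost = 11.64     # dollars per task, with GPT-5.2

ratio = old_cost / new_cost
print(f"cost reduced by a factor of ~{ratio:.0f}x")  # ~387x, i.e. roughly the quoted ~390x
```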
Qualcomm's Wan Weixing: hybrid AI and distributed collaboration are the future | MEET2026
量子位· 2025-12-11 11:37
Core Viewpoint - The evolution of AI applications can be categorized into four stages: Perception AI, Generative AI, Agent AI, and Physical AI [9]. Group 1: Stages of AI Evolution - The first stage, Perception AI, includes traditional technologies such as natural language processing, speech noise reduction, and image recognition, which have been commercialized in many terminal devices for years [13][14]. - The second stage, Generative AI, emerged with the rise of ChatGPT, focusing on pre-training with large datasets and completing specific tasks under human supervision, including text-to-image generation and chatbots [14][19]. - The third stage, Agent AI, allows for autonomous actions, predictions, intent understanding, and task orchestration with minimal human intervention [18][19]. - The fourth stage, Physical AI, is still in the research phase, where AI can understand the physical world and respond according to real physical laws [21][22]. Group 2: Current Industry Trends - The industry is currently transitioning from Generative AI to Agent AI, with a focus on enhancing terminal capabilities from single text modalities to multi-modal interactions [4][19]. - The deployment of large models on terminal devices faces challenges such as memory limitations, bandwidth constraints, and power consumption [6][30][34]. Group 3: Advantages and Challenges of Edge AI - The primary advantage of running large models on terminal devices is personalization, as data generation occurs close to the source, enhancing privacy and security [31]. - Edge AI also offers the benefits of being free and not requiring internet connectivity [32]. - Challenges include memory limitations that restrict model size, bandwidth limitations affecting inference speed, and the need for efficient power management in high-integration devices [34][35][36]. 
Group 4: Technological Innovations - Qualcomm has developed several technological innovations to address these challenges, including quantization and compression techniques to reduce memory usage, parallel decoding to enhance token generation speed, and advanced NPU architectures for improved performance [37][39][40]. - The parallel decoding technique allows for the generation of multiple tokens simultaneously, improving efficiency and user experience [41][42]. Group 5: Future of AI Experience - The future AI experience is expected to evolve towards a hybrid AI model, where efficient models run on the edge provide personalized services, while larger models in the cloud offer more powerful capabilities [55][57]. - Qualcomm aims to ensure seamless collaboration between edge and cloud environments through low-latency, high-speed, and secure connectivity [58].
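The parallel-decoding idea above (propose several tokens at once, then keep only the prefix a verifier agrees with) can be sketched with two toy stand-in models. This is one common speculative-style formulation for illustration, not Qualcomm's actual implementation; both "models" below are fakes.

```python
# Toy sketch of draft-and-verify parallel decoding. Both models are stand-ins:
# the drafter is cheap but imperfect, the target model is the ground truth.

def draft_model(prefix):
    """Cheap drafter: proposes the next 3 tokens in one shot."""
    guess = "a quick brown fox jumps high".split()
    return guess[len(prefix):len(prefix) + 3]

def target_model_next(prefix):
    """'Big' model: the single token it would produce next."""
    truth = "a quick brown fox jumps far".split()
    return truth[len(prefix)] if len(prefix) < len(truth) else None

def parallel_decode():
    out = []
    while target_model_next(out) is not None:
        accepted = 0
        for tok in draft_model(out):
            # Accept draft tokens only while they match the big model's choice;
            # verifying a batch of drafts replaces several serial decode steps.
            if tok == target_model_next(out):
                out.append(tok)
                accepted += 1
            else:
                break
        if accepted == 0:
            out.append(target_model_next(out))  # fall back to one verified token
    return " ".join(out)

print(parallel_decode())  # "a quick brown fox jumps far"
```

When the drafter is right, several tokens are committed per round instead of one, which is where the speed and user-experience gains described above come from; when it is wrong, the output still matches what the big model alone would have produced.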