Workflow
腾讯研究院
icon
Search documents
腾讯研究院AI速递 20260128
腾讯研究院· 2026-01-27 16:03
Group 1 - Microsoft has launched its self-developed AI chip Maia 200, which utilizes TSMC's 3nm process, featuring over 140 billion transistors and achieving FP4 performance exceeding 10 PetaFLOPS, three times that of Amazon's third-generation Trainium [1] - The Maia 200 chip is designed specifically for AI inference, equipped with 216GB of HBM3e memory and a bandwidth of 7TB/s, providing a 30% performance improvement per dollar compared to the latest hardware [1] - Maia 200 will support large models such as OpenAI's GPT-5.2 and is already deployed in a data center in the central United States, with a preview version of the SDK available [1] Group 2 - Anthropic has introduced the MCP service for Claude, integrating productivity tools like Figma, GitHub, and Canva, allowing users to directly invoke third-party services within conversations [2] - This upgrade transforms Claude from a passive chatbot into an intelligent platform capable of actively scheduling external resources, enabling users to command workflows across applications using natural language [2] - The MCP protocol is open-sourced, aiming to establish a competitive edge in defining the "operating system" of the AI era, with a focus on deep integration to enhance initial user experience [2] Group 3 - DeepSeek has open-sourced its OCR model DeepSeek-OCR 2, which employs a new decoder that allows the model to read in a structured order rather than mechanically scanning, improving its understanding of complex layouts and tables [3] - The model achieved a score of 91.09% in the OmniDocBench v1.5 test, a 3.73% improvement over its predecessor, with the reading order edit distance reduced from 0.085 to 0.057 [3] - This architecture has the potential to evolve into a unified multimodal encoder capable of processing text, speech, and visual content within the same parameter space [3] Group 4 - The Kimi K2.5 model has been released and open-sourced, recognized as one of the most intelligent and versatile models, supporting both visual and text inputs, as well as thinking and non-thinking modes [4] - K2.5 introduces agent cluster capabilities, allowing it to autonomously create up to 100 avatars to process 1500 steps in parallel, reducing actual runtime by up to 4.5 times [4] - Alongside this, Kimi Code has been launched, supporting terminal execution and integration with mainstream editors, enabling programming assistance through image and video inputs, with the Agent SDK set to be open-sourced [4] Group 5 - Alibaba has launched the flagship reasoning model Qwen3-Max-Thinking, which competes with GPT-5.2-Thinking and Claude-Opus-4.5 across 19 benchmark tests [5] - This model features adaptive tool invocation capabilities, automatically calling search engines and code interpreters as needed, eliminating the need for manual selection by users [5] - It employs an experience accumulation testing strategy that focuses computational resources on smarter reasoning processes rather than stacking parallel paths, achieving more accurate and efficient reasoning outcomes [5] Group 6 - Tencent's Sogou Input Method has announced a comprehensive AI upgrade with its 20th major version, integrating the mixed Yuan model, reaching over 100 million AI users, and averaging nearly 2 billion voice uses daily [6] - The AI voice model has improved fluency by 40% and achieved an accuracy rate of 98%, with dialect recognition enhanced by 30%, maintaining a 97% accuracy rate even in low-volume scenarios below 20 decibels [6] - The AI translation model now supports over 30 languages for instant translation, and the AI typing model's vocabulary has expanded exponentially, with local life vocabulary exceeding 50 million [6] Group 7 - Hyper3D has released Rodin Gen-2 Edit, a 3D generation platform that integrates natural language-based local editing capabilities, marking the first commercial product to combine 3D generation and editing into a complete workflow [7] - Users can select areas and input text commands for local adjustments, with the ability to import any existing models, including those generated by third-party AI, for editing, ensuring seamless integration with the original model [7] - This advancement signifies a shift in 3D generation from a "gacha" model to an iterative workflow era, with the platform now compatible with mainstream workflows like Blender, Maya, and Unity [7] Group 8 - Ant Group has unveiled its embodied research, introducing the high-precision spatial perception model LingBot-Depth, which significantly enhances depth output quality in complex material scenes like transparent and reflective surfaces without hardware changes [8] - The model utilizes a masked depth modeling approach, treating naturally missing depth from sensors as learning signals rather than noise, outperforming top-tier depth cameras in depth accuracy and pixel coverage [8] - In practical tests, the dexterous hand successfully grasped transparent glass cups and reflective stainless steel cups, with the model fully open-sourced and ready for deployment [8] Group 9 - Anthropic's CEO Dario Amodei has published a lengthy article warning that by 2027, humanity may face a "technological coming-of-age," with AI potentially forming a "data center genius nation" with 50 million "citizens" [9] - The article analyzes five major crises: risks of AI autonomy, misuse of biological weapons, authoritarian control, economic disruption, and existential crises, warning that AI could disrupt the balance between "capability" and "motivation" [9] - Anthropic advocates for a "Constitutional AI" approach and reasonable regulation to build defenses, despite being viewed as an outlier in the industry, with its valuation increasing sixfold over the past year, urging humanity to face civilizational tests with courage [9]
腾讯郭凯天:让AI成为尊重人、成就人、有温度的力量
腾讯研究院· 2026-01-27 15:33
2026 年 1 月 27 日,腾讯研究院主办的 腾 讯 科 技向善创新节 202 6 正式举办。 腾讯集团 高级副 总 裁 郭凯天 先生在现场进行了开场致辞。 以下为郭凯天先生的致辞全文: 各位朋友,大家好 : 很高兴新年伊始,就和大家在线下 再次相聚 。 今年是科技向善大会的第九年。在这些年,我们时时刻 刻对照科技向善这面镜子,跟用户感受对标,跟行业发展秩序对标 , 和时代和国家的期待 对标 。 一路走来,在科技向善的指引下,我们在乡村振兴、数字支教、公共应急、智慧养老等领域 , 持续输 出富含社会价值的公共产品 ; 在支持原创科技创新方面, 也 先后发起设立了科学探索奖和 新基石研 究员 计划,助力国家科技进步。 其实这些年里 , 从我们提出科技向善 开始 , 就经常有 很多人会问我 , 腾讯坚持科技向善 , 把 " 用 户为本,科技向善 " 作为使命 ,到 底得到了什么?这个问题我们 自己也 经常会去思考 、 去对照 。尽 管 得到什么并不重要, 但 我仍然相信我们得到的非常丰富 。 其中最核心的一点,我想是 科技向善让腾讯始终知道自己是谁,应该做什么 、 不应该做什么,腾讯跟 用户的关系 、 跟社会 ...
腾讯研究院AI速递 20260127
腾讯研究院· 2026-01-26 16:03
Group 1: Tencent's Innovations - Tencent launched the Mix Yuan 3.0 model with 80 billion parameters, utilizing MoE architecture for image editing and multi-image fusion, now available on Yuanbao and Mix Yuan official websites [1] - The model exhibits "thinking" capabilities, understanding content before reasoning for editing steps, enabling functions like adding, deleting, modifying, style changes, and old photo restoration [1] - Users can create memes, virtual character collaborations, and e-commerce poster designs, trained on millions of data points covering over 80 tasks [1] Group 2: Yuanbao's Social AI Features - Yuanbao initiated the internal testing of "Yuanbao Club," allowing users to create or join groups and interact with AI for chat summaries and interest tracking [2] - The platform will integrate Tencent Meeting's audio and video capabilities, supporting features like "watch together" and "listen together," with AI available for queries [2] - Tencent announced a 1 billion cash red envelope promotion for the Spring Festival, potentially reviving the popularity of WeChat red envelopes and encouraging users to transition from "single-player AI" to "social AI" [2] Group 3: Clawdbot and Open Source Developments - Clawdbot, an open-source project created by Peter Steinberger, can run locally and integrate with tools like WhatsApp, Telegram, and GitHub, receiving over 30,000 stars on GitHub [3] - MiniMax M2.1 serves as the core engine, demonstrating excellent performance in tool invocation at a low cost, enabling developers to implement complex workflows like car price comparison and email processing [3] - Users praise M2.1 for its remarkable "cost-performance ratio," allowing continuous operation of a super-intelligent workflow for just $10 per month [3] Group 4: Advances in AI Interaction - iFlytek's Starry Sky Intelligent Agent platform announced a major upgrade, fully integrating with the AIUI open platform for rapid customization of voice tones through natural language [4] - The upgrade enhances multimodal hyper-human interaction capabilities, allowing for voice replication and digital avatar creation from a single photo, with automatic expression and action generation [4] - RPA digital employees have upgraded intelligent components to assist with web automation and visual data processing, enabling non-programmers to quickly orchestrate automated workflows [4] Group 5: Insights from Toco AI - Toco AI, founded by former NetEase Cloud Music CTO, aims to introduce modeling methodologies into AI coding, addressing architecture and maintainability challenges [7] - The founder believes that standardized code will become less important, emphasizing the significance of business description, understanding, and long-term planning in the AI era [7] - Toco is positioned to redefine UML with an AI-native approach, embedding architect capabilities suitable for new projects and system restructuring, aiming to become an industry standard like Spring for Java [7] Group 6: Strategic Directions from Jiyue - Jiyue's new chairman, Yin Qi, focuses on foundational model development and terminal commercialization, dedicating over 80% of time to core product technology [8] - He asserts that AGI must interact with the physical world, identifying three core scenarios: individuals, transportation, and home, with vehicles as the primary entry point, ultimately leading to robotics [8] - Jiyue's 2026 strategy emphasizes breakthroughs in foundational models, multimodal integration of text, voice, and images, and differentiated VLA capabilities for terminal execution devices [8] Group 7: AI in Aerospace - The European Space Agency's FLPP program collaborates with German MT Aerospace to utilize AI-driven laser sensors for real-time defect detection, reducing carbon fiber tank weld analysis time by 95% [6] - NASA's Expedition 74 team tests AI-assisted tools for voice-to-text conversion, enhancing communication efficiency between crew members and ground control [6] - Research indicates that AI's "scientific autonomy" concept allows for real-time data analysis in extraterrestrial missions, though over-reliance on synthetic data may lead to "cognitive illusions" affecting reliability [6] Group 8: Palantir's Perspective on AI - Palantir's CEO critiques Silicon Valley's "dopamine economy" in his new work "Tech Republic," advocating a shift from consumer internet to "survival engineering," focusing on defense and energy sectors [11] - He argues that the strategic nature of AI prevents complete privatization, with the coupling of government and enterprise being a key variable in national competitiveness [11] - The article suggests using engineering thinking to combat corporate "spiritual hollowing," including clear objective functions, iterative cultural development, and retaining innovation redundancy [11]
是时候了,见个面吧
腾讯研究院· 2026-01-26 07:04
当远程的连接已如此紧密, 我们反而更需要一次真实的相聚。 当数字的潮水涌向远方, 我们选择在此刻锚定坐标。 时隔四年,腾讯科技向善创新节终于回归线下。 这不仅是一次简单的重启,更是四年来,我们再一次主动选择"在场"。 就在明天—— 2026年1月27日, 深圳市南山区荔园路9号 G&G 创意社区 我们在这里等你。 23 场分享,52 位嘉宾, 将共同探讨AI 如何塑造世界, 以及人如何在技术洪流中,坚守我们何以为人的根本。 今天下午 5:30,现场报名截止。 是时候了,见个面吧。 扫码报名: 腾讯 科技向善创新节 2026年1月27日 主会场 世界 9:30-9:40 致辞 郭凯天 腾讯集团高级副总裁、腾讯研究院理事长 9:40-9:55 Al时代,为谁而来? 袁晓辉 腾讯研究院创新研究中心主任、资深专家 9:55-10:05 远方"树洞"的回音 : 麦克法兰跨洋答网友问 艾伦·麦克法兰 英国历史学家、人类学家、剑桥大学国王学院终身院士 (10:05-10:35 AI原生一代:组织与人的进化 司晓 腾讯集团副总裁、腾讯研究院院长 卡兹克 虚实传媒CEO、"数字生命卡兹克"主理人 张笑宇 亚洲图书奖得主、新锐科 ...
腾讯研究院AI速递 20260126
腾讯研究院· 2026-01-25 16:01
生成式AI 一、OpenAI Codex预告,今先揭秘Codex CLI核心智能体循环 1. OpenAI CEO奥特曼预告下周起将发布Codex相关重磅内容,官方同步发布技术博客揭秘Codex CLI核心架构—— 智能体循环; 2. 智能体循环通过Responses API协调用户指令、模型推理与本地工具执行,采用"提示词前缀一致"策略触发缓存优 化性能; 3. Codex支持零数据保留配置保障隐私,利用自动压缩技术管理上下文窗口,后续将深入介绍工具调用和沙箱模型。 https://mp.weixin.qq.com/s/dEpQBtt3wHgzPz_HEIDVdA 二、谷歌 DeepMind 发布 D4RT,彻底颠覆了动态 4D 重建范式 1. 谷歌DeepMind发布D4RT,将3D重建、相机追踪、动态物体捕捉统一成"查询"动作,速度比现有SOTA快18至 300倍; 2. 核心创新是统一的时空查询接口,AI先全局"阅读"视频生成场景表征,再按需搜索任意像素的3D轨迹、深度和位 姿; 3. 该技术对具身智能、自动驾驶和AR意义重大,让AI实时理解动态环境,但训练仍需10亿参数模型和64个TPU。 1. 百 ...
腾讯研究院AI每周关键词Top50
腾讯研究院· 2026-01-24 02:33
Group 1: Key Trends in AI - The article highlights the emergence of significant AI keywords and trends, including advancements in models and applications across various companies [2][3][4]. - Notable AI models mentioned include GLM-4.7-Flash by Zhiyuan AI and Step3-VL-10B by Jieyue Xingchen, indicating a competitive landscape in AI model development [3]. - Companies like OpenAI and Anthropic are leading in AI applications, with innovations such as ChatGPT Translate and permanent memory features [3][4]. Group 2: Company Innovations and Developments - Tesla is advancing with its AI5 chip, showcasing the importance of hardware in AI development [3]. - Apple is introducing AI devices similar to AirTag, indicating a trend towards consumer-oriented AI products [4]. - OpenAI's recent court testimonies and the unveiling of new models reflect ongoing legal and ethical discussions in the AI sector [4]. Group 3: Perspectives on AI Future - Sequoia Capital asserts that AGI (Artificial General Intelligence) has arrived, suggesting a paradigm shift in AI capabilities [4]. - OpenAI emphasizes the importance of model understanding and collaboration, which could shape future AI interactions [4]. - Anthropic discusses the concept of a new AI constitution, indicating a focus on ethical frameworks in AI development [4].
没有人类参与的AI音乐才会趋于平庸|破晓访谈
腾讯研究院· 2026-01-23 08:48
生成式人工智能(GenAI)引爆了一场深刻的生产力范式革命。在文化产业领域,既有的内容创作、价 值生成、商业模式与消费形态等正面临着全面重塑,引发全行业对未来的深刻追问。 腾讯研究院与中国传媒大学文化产业管理学院合作推进《GenAI重塑文化产业》研究项目,聚焦GenAI 在长视频、短视频、音乐、动画、网络文学等重点领域的应用,分析多领域产品、产线、管线的变化趋 势,探索文化产业智能化发展路径。期望汇聚技术涌现的"智能之光"与人类永恒的"智慧之光",迎接文 化产业变革时刻。 继前四期专题访谈之后,本期我们聚焦腾讯音乐娱乐集团(以下简称TME),与腾讯音乐曲库版权高级 总监, 酷狗内容运营总监 兼启明星 AI 音乐项目负责人李玲玲,共话音乐产业在GenAI时代的技术变 革、模式创新、版权困局与创新突破。 本期嘉宾: 李玲玲 腾讯音乐曲库版权高级总监&酷狗内容运营总监,启明星AI音乐项目负责人 课题组: 中国传媒大学文化产业管理学院 :刘江红、田卉、 陈娴颖、李苏怡 等 腾讯研究院 :孙怡、田小军、冯宏声等 致谢: 腾讯音乐研究院: 陈晓宇 【 观点速览 】 1. GenAI对于音乐产业的核心价值:极大提升创作效率 ...
腾讯研究院AI速递 20260123
腾讯研究院· 2026-01-22 16:01
生成式AI 一、Runway发布全新Gen 4.5模型,57.1%的人分不清真假 1. Runway发布全新Gen 4.5图生视频模型,在镜头控制和故事叙事能力上实现显著提升,能在5秒内快速生成包含近 景、中景、远景的三个镜头; 2. 在1000人参与的测试中,仅有57%的人能分辨AI生成视频与真实视频,模型在人物面部一致性、光影逻辑和物理 规律表现上接近电影级水准; 3. 视频生成模型正进入新一轮升级期,真实度、声画同步、局部控制精细化和更长生成时长成为行业共同趋势。 https://mp.weixin.qq.com/s/_CryhmDwnF9C3O-6LM2-iQ 二、谷歌Gemini变身免费家教,接入SAT全真模考,讲错题 1. 谷歌联手The Princeton Review将全套SAT模拟题整合进Gemini,用户可免费进行全真模考,分数立等可取并获 得详细错题解析; 2. 测试涵盖阅读写作和数学两大模块,支持自定义倒计时和提示功能,Gemini会把解题思路拆解成详细步骤辅助理 解; 3. SAT只是第一步,谷歌计划将Gemini逐步扩展到更多标准化考试,同时通过垂直领域渗透策略让AI成为各行业的 ...
探元计划NextGenAI考古赛道:方案火热征集,四大场景命题等您共创
腾讯研究院· 2026-01-22 08:44
腾讯探元计划NextGen AI考古赛道四大"特定命题"正式发布,1月31日前面向全球公开征集技术团队揭 榜挂帅;同时,持续征集"开放命题"技术方案,欢迎技术团队携文化场景单位联合申报。 当千年瓷片在数字空间重聚轮廓,当水下题刻褪去岁月侵蚀的斑驳,当残破壁画重现昔日艺术风华,当海量 陶片缀连起殷商文明脉络——AI与考古的跨界融合,正为沉睡的文化遗产注入新生动能。四大著名文博场景 单位携世界级文化瑰宝重磅亮相,腾讯探元计划提供最高百万级的专项资助,面向全球招募顶尖技术团队, 以先进数字技术破解古老文物保护难题,共赴这场跨越时空的文化传承之约。 四大文化场景需求发布, 诚邀技术团队共创解法! 【场景一】 景德镇陶瓷智拼:御窑"古陶瓷基因库"的AI文物修复 作为千年瓷都的璀璨名片,明正统青花云龙纹大缸承载着御窑烧造的巅峰技艺与中华陶瓷文化的精髓, 却因历史变迁碎为15000片珍贵碎片。传统手工修复不仅周期漫长,更面临二次损伤的风险,海量碎片的 数字化复原需求迫切。 御窑 博物馆 古陶瓷基因库页面 古陶瓷基因库库房 我们期待您:构建纯自动化、非接触式3D碎片虚拟复原平台,攻克断裂面智能识别、几何特征精准匹 配、全局拼 ...
2025年AI治理报告:回归现实主义
腾讯研究院· 2026-01-22 08:44
宏观格局: 发展优先,安全"软着陆" 2025年2月的巴黎"人工智能行动峰会"是一个标志性时刻,与两年前布莱切利峰会笼罩的"安全焦虑"不 同,巴黎峰会的关键词悄然变更为"创新"与"行动",这一变化折射出全球治理的底层逻辑重构。在这种 背景下,全球监管竞速出现了"逆转",过去被视为"监管高地"的区域开始主动寻求松绑。 欧盟的自我修正 。随着《AI法案》进入实施期,复杂的合规成本开始显现,为了挽救产业竞争力,欧 盟在2025年不得不推出"数字综合提案 (Digit al O mnibus) ",推迟高风险义务生效时间并试图简化规 则,这表明即便是最坚定的监管者也必须在发展现实面前低头。 美国的"去监管化" 。特朗普政府展现了鲜明的"美国优先"色彩,撤销了前任政府侧重安全的行政令, 转而通过《确保国家人工智能政策框架》限制各州分散立法,试图以统一的联邦规则为产业扫清障碍。 如果说前两年全球对AI的态度还夹杂着"末日恐惧",那么2025年,风向已彻底改变。全球AI治理正在经 历一场深刻的"去理想化"进程。面对技术与产业的双重压力,各主要经济体不约而同地调整了身位:治 理的重心从"防范假设性的末日风险",迅速转移到了" ...