Hunyuan 2.0
Tencent Upgrades Its Large-Model R&D Structure, Appoints Former OpenAI Researcher Yao Shunyu to Senior Roles
Xin Lang Cai Jing· 2025-12-17 13:57
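Zhitong Finance has learned that Hunyuan's flagship model TurboS is the industry's first ultra-large-scale MoE model built on a hybrid linear attention mechanism, and that it has kept up a pace of one new version per month since its release at the start of the year. On December 5, the latest flagship model, Hunyuan 2.0, was launched; today, Tencent Hunyuan released World Model 1.5, the first real-time world model in China that is open for public trial. In fact, Tencent is not alone in speeding up on AI: several major tech companies have shown clear strategic resolve on AI and have made frequent high-profile moves this year. For example, the Doubao phone from ByteDance, launched this month, has drawn considerable outside attention, and the Volcano Engine FORCE conference opens tomorrow morning, where new Doubao large-model products are expected to be released. Reportedly, the AI Infra department will be responsible for building the technical capabilities of the large-model training and inference platforms, focusing on core technologies such as distributed training and high-performance inference serving, building core competitiveness in large-model AI infrastructure, and providing technical support and services for large-model algorithm R&D and deployment in business scenarios. After the restructuring, the AI Data department and the Data Computing Platform department will be responsible, respectively, for building the large-model data and evaluation system and for building an integrated data-intelligence platform for big data and machine learning. Wang Di continues to serve as deputy general manager of the Large Language Model department, reporting to Vinces Yao; Liu Yuhong heads the AI Data department and Chen Peng heads the Data Computing Platform department, both reporting to company vice president Jiang Jie. Tencent said that this upgrade of its large-model R&D structure, while further strengthening Tencent's engineering advantages, aims ...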
Tencent (00700) AI Quietly Accelerates
Zhitong Finance (智通财经网) · 2025-12-17 13:16
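In 2025, China's AI battlefield has evolved from a contest over model parameters into a broad competition over capital efficiency, infrastructure, and traffic entry points. According to Wallstreetcn (华尔街见闻), a three-way contest with sharply different strategies has taken shape in China: Alibaba has chosen heavy-asset investment, ByteDance has chosen to break through on traffic, and Tencent has chosen "internal validation plus external enablement", first refining capabilities within its own ecosystem and then delivering mature capabilities to users in a plug-in fashion. To outsiders, this strategy looks overly restrained, even somewhat laid-back. Recently, however, Tencent's AI has been quietly accelerating. On December 17, Wallstreetcn reported that Tencent issued an internal announcement upgrading its large-model R&D structure and formally establishing the AI Infra department, the AI Data department, and the Data Computing Platform department. As a key part of Tencent's large-model system, the newly formed AI Infra department will serve as its foundation, responsible for core technologies such as distributed training and high-performance inference serving. This means Tencent is comprehensively strengthening its large-model R&D system at the organizational level. The announcement also shows that Vinces Yao becomes Chief AI Scientist in the "CEO/President's Office", reporting directly to Tencent President Liu Chiping (Martin Lau); he also heads the AI Infra department and the Large Language Model department, reporting to Lu Shan, president of Tencent's Technology and Engineering Group. Behind this flattening of reporting lines is a clear signal that Tencent is raising its AI R&D efficiency and racing to stake out its AI position. The move parallels one made by Google. Earlier, Google also announced that it would AIS ...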
Led by "Tsinghua Yao Class" Alumnus Yao Shunyu, Tencent Upgrades Its Large-Model R&D Structure
Nan Fang Du Shi Bao· 2025-12-17 12:09
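Earlier reports claimed that the well-known OpenAI researcher Yao Shunyu had joined Tencent's Hunyuan large-model team, where he would build a research team under his own leadership, with an annual salary of over 100 million yuan. At the time, Tencent denied the report on its official WeChat account, calling it false. As one of the top talents in AI, Yao Shunyu comes with plenty of "genius" narratives. According to public information, he graduated from the renowned "Tsinghua Yao Class" and went on to pursue a PhD in computer science at Princeton University. Along the way, the halo of being "named to MIT TR35 at age 27" and his time as co-founder of the Tsinghua University student rap club have often been talked about. In August 2024, Yao Shunyu joined OpenAI. There he worked as a research scientist, focusing on moving large language models from theoretical research to practical application, particularly the development of AI agents. He reportedly led development of OpenAI's first released agent model and product, and also took part in the Deep Research project. Tencent Hunyuan is Tencent's core AI R&D team, and the arrival of any top talent may be read as a signal that it is strengthening its AI capabilities. Earlier, Yao Shunyu proposed the idea of "the second half of AI", that is, a shift from pursuing model scale to defining useful tasks, which resonated across the industry. His career choice will no doubt also be read as a sign of where AI is headed. In a conversation in May this year, Yao Shunyu was asked how he would build agents in WeChat if he were in charge of it. He said he would first wait, watch, and explore ...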
Overseas Tech Industry, 2025 Issue 46: GPT-5.2 Raises the Ceiling on Model Capability Again; Alibaba Pushes to Build Consumer-Facing Entry Points
GUOTAI HAITONG SECURITIES· 2025-12-13 12:52
Investment Rating
- The report maintains an "Overweight" rating for the industry [1]
Core Insights
- The release of GPT-5.2 focuses on enhancing capabilities in professional knowledge work and enterprise applications, achieving a score of 70.9% in the GDPval test, significantly outperforming previous models [9][10]
- Alibaba has established a C-end business unit to create a super app for AI, integrating various services to enhance user engagement with AI technology [10]
- The Trump administration has approved the export of H200 AI chips to China, which will enhance domestic training capabilities, although the performance of H200 is relatively behind newer models [11][14]
Summary by Sections
Investment Recommendations
- The report recommends focusing on AI computing, cloud vendors, AI applications, and AI social networking sectors [6][29]
Industry News
- The report highlights significant developments in the AI sector, including the release of new models by Tencent and Meituan, and advancements in AI semiconductor technology by Broadcom and Oracle [25][26][28]
Market Performance
- The report provides a review of market performance, noting fluctuations in major indices and specific stock performances within the tech sector [15][19]
Tencent Research Institute: Top 50 Weekly AI Keywords
Tencent Research Institute · 2025-12-13 02:33
Group 1: Key Trends in AI Industry
- The article highlights the top 50 keywords in AI, showcasing significant developments and trends in the industry [2][3]
- Major companies like NVIDIA, Google, and Meta are leading advancements in AI technologies, particularly in chip development and model architecture [3][4]
Group 2: Chip Developments
- NVIDIA's H200 export and new GPU architecture are pivotal in enhancing computational capabilities [3]
- The CUDA Toolkit 13.1 is a significant release that supports developers in optimizing AI applications [3]
Group 3: Model Innovations
- Google introduced the Titans architecture and deep thinking models, indicating a focus on improving AI reasoning capabilities [3]
- New models such as GLM-4.6V by Zhipu and LongCat-Image by Meituan reflect the ongoing innovation in AI model development [3]
Group 4: AI Applications
- Companies are integrating AI into various applications, including AI wearable devices by Meta and AI interviewers by Anthropic, showcasing the practical use of AI in everyday scenarios [3][4]
- The introduction of tools like VibeVoice by Microsoft and Qwen3-TTS by Alibaba demonstrates the expanding role of AI in enhancing user experiences [3][4]
Group 5: Industry Events and Perspectives
- Events such as talent loss at Apple and red alerts at Microsoft highlight challenges faced by major tech companies in the AI landscape [4]
- Various perspectives from industry leaders, including Yann LeCun and Andrew Ng, discuss the current state and future opportunities in AI applications [4]
Humanoid Robots Busy Going to Work, Large Models Iterating Nonstop: Key Developments, Dec 1-7
Sou Hu Cai Jing· 2025-12-10 09:45
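From December 1 to 7, model iteration and technical breakthroughs advanced in parallel across the global AI field, while policy support, real-world deployment, and accelerating overseas expansion reinforced one another in the robotics industry; the key developments offered plenty of highlights.
1. Tesla Optimus shows off jogging: Optimus's official channel released a video of the robot jogging in the lab at 2-2.5 m/s with markedly improved stability, and also showed a multi-robot charging scene, reinforcing expectations of mass production in 2026.
5. Caocao Mobility brings humanoid robots into Robotaxi scenarios: Caocao Mobility (曹操出行) partnered with Dobot (越疆科技), and the Dobot Atom humanoid robot has been stationed at Hangzhou's "Green Intelligent Mobility Island", where it completed validation across scenarios including guided tours and unmanned operations and maintenance.
6. NVIDIA open-sources its core autonomous-driving model Alpamayo-R1: the model and datasets are open, improving the reasoning efficiency and generalization of autonomous-driving AI, lowering the industry's R&D threshold, and promoting a shared technology ecosystem.
7. Hangzhou's AI traffic-police robot "Hangxing No. 1" (杭行1号) goes on duty: it has formally begun its "internship" at the intersection of Binsheng Road and Changhe Road in Binjiang District, handling tasks such as traffic direction and information exchange, and helping put smart traffic management into practice.
8. Mistral AI releases the Mistral 3 series: the European AI company launched a new generation of open models, all under the Apache 2.0 license, balancing performance with the open-source ecosystem and strengthening Europe's AI competitiveness.
9. Youibot (优艾智合) releases the dual-arm collaborative inspection-and-operation robot "Junyi" (钧仪): billed as the world's first of its kind, it is already carrying out high-risk inspections in the high-voltage room of a China Southern Power Grid 220 kV substation, improving the automation and safety of power operations and maintenance ...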
Tencent Research Institute AI Express 20251209
Tencent Research Institute · 2025-12-08 16:01
Group 1: Microsoft VibeVoice-Realtime-0.5B
- Microsoft has open-sourced the lightweight real-time TTS model VibeVoice-Realtime-0.5B, which achieves a first-packet latency of only 300 milliseconds and gained 12.3K stars within 12 hours of release [1]
- The model uses an interleaved window architecture for smooth reading of long texts, supports up to 4 speaker roles in natural dialogue, offers emotion recognition and expression, and keeps a long-term context memory of up to 90 minutes [1]
- It supports both Chinese and English speech generation, with a word error rate of approximately 2% on the LibriSpeech and SEED TTS test sets and speaker similarity above 0.65, making it suitable for AI assistants, meeting notes, and podcast generation [1]
Group 2: Zhipu GLM-4.6V
- Zhipu has officially launched and open-sourced the GLM-4.6V series of multimodal large models, including the 106B-A12B base version and the 9B lightweight Flash version, with the context window increased to 128k tokens and costs reduced by 50% compared with GLM-4.5V [2]
- The model architecture integrates Function Call capabilities natively into the visual model, enabling a seamless link from visual perception to executable actions [2]
- The 9B version outperforms Qwen3-VL-8B, while the 106B-parameter version competes with Qwen3-VL-235B, which has about twice as many parameters; supported applications include mixed text-and-image layout, visual shopping, and front-end replication [2]
Group 3: Kling O1 Features
- Kling O1 has introduced the "Subject Library" feature, which lets users upload multi-angle reference images to create custom characters, props, and scenes, supporting up to 7 subjects in video O1 and 10 subjects in image O1 [3]
- A new AI image completion feature can automatically expand a primary reference image into more perspectives and intelligently generate subject descriptions, alongside a large official subject library that is continuously updated [3]
- The "Comparison Template" feature enables one-click integration of multimodal creation, allowing efficient side-by-side comparison of all inputs and final products and enhancing the potential for viral content [3]
Group 4: Meituan LongCat-Image Model
- Meituan's LongCat team has released and open-sourced the 6B-parameter LongCat-Image model, which reaches open-source SOTA levels on image editing benchmarks such as ImgEdit-Bench (4.50) and GEdit-Bench (7.60/7.64) [4]
- The model employs a unified architecture for text-to-image and image editing with a progressive learning strategy, and scored 90.7 in Chinese text generation, leading significantly in an evaluation covering 8105 common Chinese characters [4]
- The comprehensive open-source release includes multi-stage text-to-image and image editing capabilities, with strongly competitive results on GenEval (0.87) and DPG-Bench (86.8) [4]
Group 5: Tencent HY 2.0 and DeepSeek V3.2
- Tencent has officially launched its self-developed large model HY 2.0, with 406B total parameters (32B active) and a 256K ultra-long context window, placing it at the forefront of industry capabilities [6]
- DeepSeek V3.2 has been integrated into Tencent's ecosystem; it focuses on enhancing reasoning performance and long-text generation quality, achieving capabilities comparable to GPT-5 in public reasoning evaluations and slightly below Gemini-3 Pro [6]
- Both models have been deployed in Tencent's native applications such as Yuanbao and ima, with Tencent Cloud opening API and platform services, and products like QQ Browser and Sogou Input Method are gradually integrating them [6]
Group 6: Alibaba Qwen3-TTS
- Alibaba's Tongyi team has released the new-generation text-to-speech model Qwen3-TTS, offering 49 high-fidelity character voices, including distinct tones like "Mo Rabbit" (lively and cute) and "Cang Mingzi" (deep and wise) [7]
- The model supports 10 languages (Chinese, English, German, French, Spanish, Italian, Portuguese, Japanese, Korean, and Russian) and 9 Chinese dialects, preserving authentic intonation and regional accents [7]
- On the MiniMax TTS multilingual test set, it outperformed competitors like MiniMax, ElevenLabs, and GPT-4o Audio Preview in average WER, with significant perceptual improvements in prosody control compared with the previous generation [7]
Group 7: NVIDIA NVARC Model
- NVIDIA's 4B small model NVARC topped the ARC-AGI 2 test with a score of 27.64%, surpassing GPT-5 Pro's 18.3%, at a cost of only 20 cents per task, approximately 1/36 of GPT-5 Pro's cost per task [8]
- The model employs a zero-pretraining deep learning approach, using large-scale synthesis of high-quality data (over 3.2 million augmented samples) and test-time fine-tuning for rapid adaptation to each question [8]
- It simplifies puzzle understanding by using a dialogue template with the small Qwen3-4B model and uses the NeMo RL framework for supervised fine-tuning, moving complex reasoning into an offline synthetic-data pipeline [8]
Group 8: Pudu Robotics PUDU D5 Series
- Pudu Robotics has launched the industrial-grade autonomous-navigation quadruped robot PUDU D5 series in both wheeled and point-foot versions, equipped with an NVIDIA Orin and RK3588 dual-chip architecture for a total computing power of 275 TOPS [9]
- The robot features four fisheye cameras and dual 192-line LiDAR for centimeter-level positioning and environmental reconstruction, can carry a 30-kilogram load, has a range of 14 kilometers on a single charge, and carries an IP67 protection rating [9]
- Using a bionic wheel-leg fusion system, it can reach speeds of up to 5 meters per second, climb 30° slopes, and clear 25-centimeter obstacles, making it suitable for applications such as park inspections, material transportation, and guided distribution [9]
Group 9: Karpathy's AI Prompting Strategy
- Andrej Karpathy emphasizes that large language models should be viewed not as entities but as simulators, advising against prompts like "What do you think?" because they imply a "you" that does not exist [10]
- He suggests more effective questioning strategies, such as "What kind of group of people is suitable for exploring the topic xyz? How would they respond?", so that LLMs can guide or simulate multiple perspectives rather than being limited to a single AI persona [11]
- Karpathy highlights that the "you" in models is deliberately designed and engineered through SFT and RLHF, and that the model fundamentally remains a token simulation engine rather than an emergent "mind" built over time [11]
AI Evolution Express | OpenAI to Release GPT-5.2 as Early as Next Tuesday
Di Yi Cai Jing· 2025-12-06 12:43
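① OpenAI will release GPT-5.2 as early as next Tuesday; ② Meta acquires AI wearables company Limitless; ③ Tencent Hunyuan 2.0 goes live; ④ China's first vertical large model for breast pathology is released in Tianjin. ...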
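Doubao AI Assistant: Grand Ideals, Harsh Reality? Morgan Stanley: Major Phone Makers Prefer In-House Development, Making Deployment Very Difficult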
硬AI· 2025-12-02 09:07
Core Viewpoint
- Morgan Stanley expresses skepticism about the practical implementation of the Doubao AI assistant, despite its impressive demonstration of features, and maintains a positive outlook on "super apps" like WeChat, Taobao, and Meituan [2][3][4].
Group 1: Challenges in Implementation
- The Doubao AI assistant requires deep system-level integration, necessitating modifications to the operating system, which directly impacts the core interests of smartphone manufacturers (OEMs) [4][6].
- The successful implementation and promotion of the Doubao AI assistant depend on extensive technical collaboration and commercial negotiations with various smartphone OEMs, which poses significant challenges [7][11].
Group 2: Competitive Landscape
- Major hardware players, including Apple, Huawei, and Xiaomi, are likely to develop their own AI assistants rather than collaborate with ByteDance, leaving limited options for partnerships with Doubao [10][11].
- The competitive environment in the Chinese market presents high entry barriers for Doubao to establish a broad hardware ecosystem [11][12].
Group 3: Investment Strategy
- Given the difficulties in hardware breakthroughs, Morgan Stanley recommends investing in software application giants with substantial traffic and use cases, asserting that the dominance of "super apps" remains unchallenged [13][14].
- The report reiterates "overweight" ratings for Tencent, Alibaba, and Meitu, providing specific rationales for each:
  - Tencent is viewed as the best AI application proxy in China, with plans to launch its next-generation AI model, Hunyuan 2.0 [14].
  - Alibaba is identified as the best AI infrastructure stock, with accelerating cloud revenue growth expected [14].
  - Meitu is recognized as a beneficiary of AI multimodal capabilities, particularly in its "last mile" service capabilities that general AI assistants cannot fully replace [14].
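Doubao AI Assistant: Grand Ideals, Harsh Reality? Morgan Stanley: Major Phone Makers Prefer In-House Development, Making Deployment Very Difficult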
美股IPO· 2025-12-02 05:02
Core Viewpoint
- Morgan Stanley expresses skepticism about the implementation of Doubao AI assistant, despite its impressive demonstration of a rich functional ecosystem, emphasizing a preference for "super apps" like WeChat, Taobao, and Meituan [3][6][10]
Group 1: Implementation Challenges
- The demonstration of Doubao AI assistant showcased impressive "multimodal" and "agent" capabilities, but transitioning from demonstration to mass production poses significant challenges [7]
- Deep system-level integration requires modifications to the operating system, directly impacting the core interests of smartphone manufacturers (OEMs) [5][8]
- Major smartphone manufacturers are likely to develop their own AI assistants rather than collaborate with ByteDance, limiting the potential OEM partners for Doubao [8][9]
Group 2: Market Dynamics
- The reality is that major hardware players will not easily relinquish control, as companies like Apple, Huawei, and Xiaomi prefer to maintain their technological independence [8]
- ByteDance has indicated it does not plan to develop its own smartphones but is exploring potential collaborations with various manufacturers, raising questions about the feasibility of this business model [9]
Group 3: Investment Strategy
- Given the difficulties in breaking through at the hardware level, Morgan Stanley recommends investing in software application giants with substantial traffic and scenarios [10]
- The firm maintains a positive outlook on "super apps" in China, asserting their positions are unlikely to be undermined by system-level AI like Doubao [10]
- Morgan Stanley reiterates "overweight" ratings for Tencent, Alibaba, and Meitu, providing specific rationales for each [11][12]