Large-Model R&D
Tencent AI Lab Deputy Director Dong Yu (俞栋) Departs, Hunyuan Team's "Changing of the Guard" Underway | 智能涌现 Exclusive
36Kr · 2025-12-29 06:02
By Zhou Xinyu | Edited by Su Jianxun. 智能涌现 (Intelligent Emergence) has learned from multiple independent sources that Dong Yu (俞栋), former deputy director of Tencent AI Lab, will leave Tencent in the near term for personal development reasons. As of press time, Tencent had not officially responded. During his time at Tencent, Yu led research teams that published hundreds of papers at top academic conferences and journals, and drove the application of NLP, speech, and digital-human technologies across Tencent's businesses. Yu also contributed substantially to the development of Tencent's large model "Hunyuan". The Hunyuan team sits within Tencent's Technology and Engineering Group (TEG) and spans departments including Big Data, AI Lab, and the Machine Learning Platform Department. Within Hunyuan's R&D system, Yu was also responsible for multimodal generation and understanding, as well as part of the text research. On building Hunyuan's talent pipeline, Tencent has not let up in the slightest: even as veterans depart, fresh blood keeps arriving. Since DeepSeek upended the table in 2025, the major tech companies have quickly converged on a consensus: the foundation model is the core competitive advantage, and base-model capability sets the ceiling on the AI application experience. Around this focus on large-model R&D, 智能涌现 has recently reported exclusively that Tencent is carrying out a series of internal adjustments. On one hand, Tencent is bringing in new blood and increasing its investment in talent. In the second half of 2025, after former OpenAI researcher Shunyu Yao (姚顺雨) joined Tencent as Chief AI Scientist in the "CEO/President's Office", among other roles, Hunyuan also quickly attracted several core employees from companies such as ByteDance, Alibaba, and Moonshot AI. On the other hand, Tencent ...
Tencent Upgrades Its Large-Model R&D Structure, Establishing New AI Infra, AI Data, and Other Departments
Sina Finance · 2025-12-17 08:54
Editor in charge: He Junxi. Sina Tech news, the afternoon of December 17: Tencent has upgraded its large-model R&D structure, newly establishing an AI Infra Department, an AI Data Department, and a Data Computing Platform Department to comprehensively strengthen its large-model R&D system and core capabilities. Vincesyao has been appointed Chief AI Scientist in the "CEO/President's Office", reporting to Tencent President Martin Lau (刘炽平); he concurrently heads the AI Infra Department and the Large Language Model Department, reporting to Technology and Engineering Group President Lu Shan (卢山). As a key part of Tencent's large-model system, the AI Infra Department will be responsible for building the technical capabilities of the large-model training and inference platforms, focusing on core technologies such as distributed training of large models and high-performance inference serving, building core AI Infra competitiveness, and providing stable, efficient technical support and services for large-model algorithm R&D and deployment in business scenarios. After the restructuring, the AI Data Department and the Data Computing Platform Department will be responsible, respectively, for building the large-model data and evaluation system and for building a data-intelligence fusion platform for big data and machine learning. Wang Di (王迪) continues as deputy general manager of the Large Language Model Department, reporting to Vincesyao. Liu Yuhong (刘煜宏) heads the AI Data Department and Chen Peng (陈鹏) heads the Data Computing Platform Department, both reporting to company Vice President Jiang Jie (蒋杰). (Luo Ning)
Breaking: Anthropic Fully Bans Chinese-Controlled Companies from Using Claude, and No Matter Where You Are, There Is No Way Around It
菜鸟教程 (Runoob) · 2025-09-05 07:04
Core Viewpoint
- The new policy, announced on September 5, 2025, restricts access to Claude services for Chinese companies and entities with significant Chinese capital, impacting their ability to develop competitive AI models [1][9].

Group 1: Policy Implications
- The policy applies to mainland Chinese companies and overseas subsidiaries with over 50% Chinese ownership, as well as entities using Claude indirectly through cloud services or third-party platforms [1].
- The restrictions are not limited to direct users of Claude but also include companies that access the service indirectly, regardless of their registration location [9].

Group 2: Competitive Concerns
- There are concerns that Chinese companies could use subsidiaries to access Claude for military or intelligence applications, potentially accelerating their own AI model development to compete with U.S. and allied tech firms [5].
- Anthropic has chosen to prioritize security over profit, advocating for stricter export controls and enhanced domestic infrastructure for AI development [6].

Group 3: Industry Impact
- The sudden shutdown of Claude's API could halt ongoing projects for multinational businesses, prompting a shift towards developing domestic AI models and ensuring compliance and security [10].
- As external access becomes increasingly restricted, the focus shifts to developing indigenous solutions to maintain competitiveness in the AI landscape [11].
Zhipu's GLM-4.5 Team Shares Late-Night Details: Longer Context Ahead, Smaller Models on the Way, and a Promise to Ship New Models Soon
AI前线 (AI Frontline) · 2025-08-29 08:25
Core Insights
- The GLM-4.5 model focuses on expanding context length and improving its hallucination prevention capabilities through effective Reinforcement Learning from Human Feedback (RLHF) processes [6][10][11].
- Future development will prioritize reasoning, programming, and agent capabilities, with plans to release smaller parameter models [6][50][28].

Group 1: GLM-4.5 Development
- The team behind GLM-4.5 includes key contributors who have worked on various significant AI projects, establishing a strong foundation for the model's development [3].
- The choice of GQA over MLA in the architecture was made for performance considerations, with specific weight initialization techniques applied (see the sketch after this summary) [12][6].
- There is an ongoing effort to enhance the model's context length, with potential releases of smaller dense or mixture of experts (MoE) models in the future [9][28].

Group 2: Model Performance and Features
- GLM-4.5 has demonstrated superior performance in tasks that do not require long text generation compared to other models like Qwen 3 and Gemini 2.5 [9].
- The model's effective RLHF process is credited for its strong performance in preventing hallucinations [11].
- The team is exploring the integration of reasoning models and believes that both reasoning and non-reasoning models will coexist and complement each other in the long run [16][17].

Group 3: Future Directions and Innovations
- The company plans to focus on developing smaller MoE models and enhancing the capabilities of existing models to handle more complex tasks [28][50].
- There is an emphasis on improving data engineering and the quality of training data, which is crucial for model performance [32][35].
- The team is also considering the development of multimodal models, although current resources are primarily focused on text and vision [23][22].

Group 4: Open Source vs. Closed Source Models
- The company believes that open-source models are closing the performance gap with closed-source models, driven by advancements in resources and data availability [36][53].
- The team acknowledges that while open-source models have made significant strides, they still face challenges in terms of computational and data resources compared to leading commercial models [36][53].

Group 5: Technical Challenges and Solutions
- The team is exploring various technical aspects, including efficient attention mechanisms and the potential for integrating image generation capabilities into language models [40][24].
- There is a recognition of the importance of fine-tuning and optimizing the model's writing capabilities through improved tokenization and data processing techniques [42][41].
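Since the summary only names the architectural choice, here is a minimal sketch of grouped-query attention (GQA), the mechanism the GLM-4.5 team says it picked over MLA. The head counts, tensor shapes, and the softmax helper below are illustrative assumptions, not GLM-4.5's actual configuration or weight-initialization scheme.

```python
# Minimal sketch of grouped-query attention (GQA): several query heads share
# one key/value head, shrinking the KV cache relative to standard multi-head
# attention. All sizes here are toy values chosen for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads          # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # index of the shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d) # (seq, seq) scaled dot products
        out[h] = softmax(scores) @ v[kv]     # weighted sum of shared values
    return out

# Toy usage: 8 query heads sharing 2 KV heads (a 4x smaller KV cache than MHA).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 32))
k = rng.standard_normal((2, 16, 32))
v = rng.standard_normal((2, 16, 32))
print(gqa_attention(q, k, v, n_kv_heads=2).shape)  # (8, 16, 32)
```

Compared with standard multi-head attention, the only structural change is the `h // group` indexing that maps each query head onto a shared KV head, which is why GQA is commonly described as a drop-in way to cut KV-cache memory at inference time.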