Foundation Models
Alibaba Qwen Tech Lead Lin Junyang: The Model Is the Product, and Building Models Is Building Products
Xin Lang Cai Jing· 2026-01-11 02:40
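In Lin Junyang's view, as active learning matures, Agents will be able to work autonomously for long stretches in a delegated, hands-off mode, evolving on their own and choosing their own course of action while carrying out general-purpose tasks. This places extremely high demands on the upper limit of model capability, and it also means that building a foundation model is itself building a product. Sina Tech, morning of January 11: At the AGI-Next Frontier Summit, Qwen tech lead Lin Junyang, speaking about the relationship between foundation models and Agents, said: "The model is the product. Building a foundation model today is in fact building a product, and researchers also need to act like product managers, turning research results into systems that are usable in the real world." He further noted that Agents can move into both the virtual world and the physical world, which is why Embodied Reasoning has emerged. ...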
Tencent AI Lab Deputy Director Yu Dong Departs as the Hunyuan Team's "Changing of the Guard" Continues | Intelligent Emergence Exclusive
36Kr · 2025-12-29 06:02
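By Zhou Xinyu | Edited by Su Jianxun. Intelligent Emergence has learned from multiple independent sources that Yu Dong, formerly deputy director of Tencent AI Lab, will soon leave Tencent for personal career reasons. As of press time, Tencent had not responded officially. During his time at Tencent, Yu Dong led research teams that published hundreds of papers at top academic conferences and in journals, and drove the application of NLP, speech, and digital-human technologies in Tencent's businesses. Yu Dong also contributed substantially to the development of Tencent's large model "Hunyuan". The Hunyuan team belongs to Tencent's Technology and Engineering Group (TEG) and spans departments including big data, AI Lab, and the machine learning platform department. Within Hunyuan's R&D organization, Yu Dong was also responsible for multimodal generation and understanding, as well as part of the text research work. Tencent cannot afford to slacken in building Hunyuan's talent pipeline: even as "veterans" leave, new blood keeps arriving. Since the start of 2025, after DeepSeek upended the table, major tech companies have quickly converged on a consensus: the foundation model is the core competitive advantage, and foundation-model capability sets the upper bound on the AI application experience. Intelligent Emergence recently reported exclusively that Tencent is making a series of internal adjustments centered on large-model R&D. On one hand, Tencent is bringing in new blood and increasing its investment in talent. In the second half of 2025, after former OpenAI researcher Yao Shunyu joined Tencent as Chief AI Scientist in the "CEO/President's Office", among other roles, Hunyuan quickly attracted several core employees from ByteDance, Alibaba, Moonshot AI, and other companies. On the other hand, Tencent ...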
A16z Leads $41 Million Round in Mirelo, a Major Bet on a European Audio Foundation Model
深思SenseAI· 2025-12-27 01:11
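Seb Johnson: Hi everyone, welcome back to Scaling Europe. I'm Seb Johnson, and I'm here with CJ Simon-Gabriel, one of the co-founders of Mirelo AI. Mirelo AI just announced a remarkably large $41 million seed round led by A16z and Index Ventures. That is a big raise, and it is led by some truly top-tier VCs. What I find especially interesting is that you are building a "foundation model" in Europe. So for those who don't know, could you start with a quick introduction to Mirelo AI? CJ: Thanks for having me. We focus mainly on "audio" for video content and games, so what we build today is mostly music and sound effects. Our idea is actually simple: you give us your video, we tell you "which sound belongs where", and we generate the audio. You can generate sound effects, and you can add music too. Seb Johnson: Why did you decide to build this business? Over the past year, AI video generation has iterated rapidly in both model capability and product form; the marginal cost of producing video keeps falling, while generation speed and controllability have improved markedly. Many AI creators today have been through this: the footage is ready in a few minutes, and the real headache is what comes after: sound effects, score, pacing, atmosphere ...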
Yuxin Technology's Han Dong: With AI's Sudden Acceleration, the DeepSeek Release Meant He "Didn't Get a Proper New Year Holiday"
Xin Lang Cai Jing· 2025-12-09 08:19
Core Insights
- The "2025 China Enterprise Competitiveness Conference" was held in Beijing on December 9-10, where Han Dong, Vice President of Yuxin Technology, discussed the rapid acceleration of AI technology in 2024 and 2025, particularly highlighting the release of DeepSeek during the 2025 Spring Festival, which impacted his year-end planning as a digital transformation leader in a listed company [5].

Group 1
- AI technology is currently experiencing a trough phase in its lifecycle, particularly for generative AI and foundational models, which presents strategic opportunities for companies to position themselves effectively [5].
- The market sentiment has shifted from previous enthusiasm for models to a more pragmatic approach focused on practical implementation, with financial institutions, including large banks, reassessing the value of AI technology [5].
- The readiness of data infrastructure and AI data capabilities has rapidly advanced, moving from the nascent stage to near the expected peak, becoming a critical foundation for the successful deployment of AI technology [5].
Bosch's Latest 41-Page Survey of Trajectory Planning for Autonomous Driving
自动驾驶之心· 2025-12-05 00:03
Core Insights
- The article discusses the advancements and applications of foundation models (FMs) in trajectory planning for autonomous driving, highlighting their potential to enhance understanding and decision-making in complex driving scenarios [4][5][11].

Background Overview
- Foundation models are large-scale models that learn representations from vast amounts of data, applicable to various downstream tasks, including language and vision [4].
- The study emphasizes the importance of FMs in the autonomous driving sector, particularly in trajectory planning, which is deemed the core task of driving [8][11].

Research Contributions
- A classification system for methods utilizing FMs in autonomous driving trajectory planning is proposed, analyzing 37 existing methods to provide a structured understanding of the field [11][12].
- The research evaluates the performance of these methods in terms of code and data openness, offering practical references for reproducibility and reusability [12].

Methodological Insights
- The article categorizes methods into two main types: FMs customized for trajectory planning and FMs that guide trajectory planning [16][19].
- Customized FMs leverage pre-trained models, adapting them for specific driving tasks, while guiding FMs enhance existing trajectory planning models through knowledge transfer [19][20].

Application of Foundation Models
- FMs can enhance trajectory planning capabilities through various approaches, including fine-tuning existing models, utilizing chain-of-thought reasoning, and enabling language and action interactions [9][19].
- The study identifies 22 methods focused on customizing FMs for trajectory planning, detailing their functionalities and the importance of prompt design in model performance [20][32].

Challenges and Future Directions
- The article outlines key challenges in deploying FMs in autonomous driving, such as reasoning costs, model size, and the need for suitable datasets for fine-tuning [5][12].
- Future research directions include addressing the efficiency, robustness, and transferability of models from simulation to real-world applications [12][14].

Comparative Analysis
- The study contrasts its findings with existing literature, noting that while previous reviews cover various aspects of autonomous driving, this research specifically focuses on the application of FMs in trajectory planning [13][14].

Data and Model Design
- The article discusses the importance of data curation for training FMs, emphasizing the need for structured datasets that include sensor data and trajectory pairs [24][28].
- It also highlights different model design strategies, including the use of existing visual language models and the combination of visual encoders with large language models [27][29].

Language and Action Interaction
- The research explores models that incorporate language interaction capabilities, detailing how these models utilize visual question-answering datasets to enhance driving performance [38][39].
- It emphasizes the significance of training datasets and evaluation metrics in assessing the effectiveness of language interaction in trajectory planning [39][41].
BUPT's First IJRR Paper: With Samsung Research China, Tsinghua University, and Others, Exploring "Large Models for Robot Manipulation"
机器人大讲堂· 2025-11-24 08:31
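Building the general-purpose robot of the film "I, Robot" has long been the goal of robotics researchers. However, achieving general robot manipulation in unstructured scenes remains challenging. Learning-based methods are regarded as an effective path to general manipulation, but challenges remain: 1) unnatural interaction with humans, 2) data scarcity, 3) limited perception, 4) limited decision-making, 5) inaccurate pre- and post-processing, 6) insufficiently robust policies, and 7) poor transferability across environments. Recently, the team of Professor Fang Bin at Beijing University of Posts and Telecommunications, together with Samsung Research China, Professors Sun Fuchun and Liu Huaping of Tsinghua University, and Academician Zhang Jianwei of the University of Hamburg, Germany, published the article "What Foundation Models can Bring for Robot Learning in Manipulation : A Survey" in the International Journal of Robotics Research, examining how foundation models can empower intelligent robot manipulation. https://journals.sagepub.com/eprint/NHMPYHAYJ6SUVQYSUWZI/full The emergence of foundation models has kindled researchers' hopes of solving the problems above: 1) LLMs can directly generate policy code or action sequences and facilitate natural interaction between robots and their environment. 2) VFMs enhance ...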
Chinese and International Experts Jointly Explore the AI Technology Frontier and Industry Empowerment
Xin Lang Cai Jing· 2025-11-21 07:23
Core Insights
- The fifth Intelligent Computing Innovation Forum was held in Hangzhou, focusing on the theme "Computing Relies on Intelligence, Computing for Intelligence," attracting international experts to discuss advancements in AI technologies and their applications across various scientific fields [1]

Group 1: AI Model Development
- Scientists are exploring the potential of AI in solving scientific problems, emphasizing that current large language models have not yet reached human-level reasoning capabilities [2]
- The development of scientific foundational models requires collaboration with scientists to effectively tokenize and train diverse scientific data, addressing complex interdisciplinary issues [2]
- The learning paradigm of foundational models is evolving through imitation learning, reinforcement learning, and autonomous learning, with a shift towards task processing applications [2]

Group 2: Efficiency and Resource Consumption
- The efficiency of foundational models is critical for large-scale AI application deployment, with a noted exponential increase in token consumption correlating with model capability improvements [3]
- The cost of generating tokens decreases with higher reasoning efficiency, necessitating collaborative optimization across the industry to enhance model performance [3]

Group 3: Practical Applications and Collaboration
- The application of intelligent systems in dynamic environments is gaining attention, highlighting the importance of responsive robotics [4]
- China is recognized for its leading capabilities in intelligent manufacturing, serving as an excellent testing ground for new technology applications [4]
- There is a call for scientists worldwide to establish collaborative networks to enhance research outcomes and create new possibilities through cooperation [4]
Liu Debing on the Ceiling, Liu Zhiyuan on the Inflection Point: They Have Revealed the Script for China's AI Decade Ahead of Time
36Kr · 2025-11-20 09:57
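He describes where the present moment sits within the coming decade as "the eve of entering the climax of the AI revolution". At the 2025 AI+ Conference held in Zhongguancun, the key "progress bar" for the next ten years of Chinese AI is coming into focus. During a break in the conference, Liu Debing, senior advisor to the AI 100 Forum and chairman of Zhipu, and Liu Zhiyuan, co-founder and chief scientist of ModelBest and associate professor at Tsinghua University, gave an exclusive interview to Zhidongxi. The two long-time frontline practitioners shared their observations and thinking about the next decade, from foundation models to the evolution of agents. On foundation-model competition, Liu Debing did not sidestep reality: now that open source is mainstream and results can be publicly verified, gaps in model capability are quickly magnified. "When frontline open-source models score 90, training yet another model that scores 85 is not very competitive." He also stressed that it is important to keep doing what is hard but right, even when the investment is enormous, because "the foundation model determines the upper limit of the entire AI industry's development". In his view, the key variables ahead will come increasingly from the maturing of the open-source ecosystem, deep deployment in industry scenarios, and the broad participation that follows as AI gradually becomes a "capability for everyone". In Liu Zhiyuan's view, one notable inflection point of 2025 is "AI + programming", a capability that is becoming an important pillar of software productivity. On how large models move toward agents, what he emphasizes is not piling on more knowledge but giving models "the capacity to grow by learning autonomously in a designated job", like a university graduate, through feedback from real tasks ...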
Zhongtai Securities: Gemini 3 Pro Leaps Forward Across the Board, Opening a New Landscape for Agent Platforms
Zhi Tong Cai Jing· 2025-11-20 08:01
Core Insights
- The release of Gemini 3 by Google demonstrates significant advancements in AI model capabilities, indicating that the progress in model intelligence has not yet reached its ceiling [1][2]
- The report suggests focusing on companies with strong fundamentals in the foundational computing layer, model layer, and B-end vendors that deeply integrate services into business processes [1]

Investment Events
- Google officially launched the Gemini 3 series, including the Gemini 3 Pro model, on November 18, 2025, achieving state-of-the-art (SOTA) performance across multiple evaluation dimensions [1]

Performance Metrics
- Gemini 3 Pro scored 37.5% in the Humanity's Last Exam, surpassing GPT-5.1 (26.5%) and Claude Sonnet 4.5 (13.7%), showcasing doctoral-level reasoning capabilities [2]
- In the MathArena Apex test, Gemini 3 Pro achieved a score of 23.4%, significantly outperforming GPT-5.1 (1.0%) and Claude Sonnet 4.5 (1.6%), indicating a leap in deep reasoning abilities [2]

Multi-Modal Architecture and User Interface
- Gemini 3 Pro continues the original multi-modal architecture and introduces a Generative User Interface (Generative UI) that allows for customized interactive responses based on user prompts [3]
- Google launched the Antigravity platform for AI agent development, enabling developers to utilize models like Gemini 3 Pro and Claude Sonnet 4.5 for free, enhancing programming efficiency through autonomous task execution [3]

Search Enhancements
- Google has upgraded its search capabilities with Gemini 3, improving query fan-out technology to enhance search efficiency and user experience through interactive tools and dynamic visual presentations [4]

Ecosystem Trends
- The report highlights a trend of major foundational model companies building comprehensive ecosystems, with firms like OpenAI, Anthropic, and Google transitioning from model providers to platform developers [5]
- In coding scenarios, tools like Antigravity and Anthropic's Claude Code are being integrated into foundational models, blurring the lines between standalone SaaS products and model modules [5]
OmniDexGrasp Unveiled: Foundation Models + Force Feedback, a General Solution That Lets Robots "Understand Instructions and Grasp Dexterously"
具身智能之心· 2025-10-31 00:04
Core Insights
- The article discusses the OmniDexGrasp framework, which addresses the challenges of dexterous grasping in robotics by combining foundation models with force feedback control to achieve generalizable and physically feasible grasping [1][2][21].

Group 1: Challenges in Dexterous Grasping
- Current dexterous grasping solutions face a dilemma between data-driven approaches, which struggle with generalization due to limited datasets, and foundation models, which often fail to translate abstract knowledge into physical actions [2].
- The core issue is the inability to balance generalization and physical feasibility, leading to failures in grasping new objects or in complex scenarios [2].

Group 2: OmniDexGrasp Framework
- OmniDexGrasp employs a three-stage approach: generating human grasping images, action transfer to robots, and force feedback control, effectively bridging the gap between abstract knowledge and physical execution [4][21].
- The framework retains the generalization capabilities of foundation models while ensuring physical feasibility through precise action transformation and control strategies [4].

Group 3: Key Modules of OmniDexGrasp
- **Module 1**: Generates human grasping images to help robots understand how to grasp objects, utilizing a variety of input designs to accommodate different user needs [6][8].
- **Module 2**: Translates human grasping images into robot actions, addressing the challenge of aligning human intent with robotic capabilities through a three-step transfer strategy [9][12].
- **Module 3**: Implements force feedback control to ensure stable and safe grasping, adapting to the physical properties of objects and preventing damage during the grasping process [12][13].

Group 4: Experimental Results
- OmniDexGrasp demonstrated an average success rate of 87.9% across six core grasping tasks, significantly outperforming traditional methods [15].
- In comparative tests, OmniDexGrasp showed superior generalization, especially with new objects, achieving success rates that far exceeded those of existing solutions [16][18].

Group 5: Future Directions
- The framework suggests future enhancements through multi-modal observation integration and deeper control task development, aiming for end-to-end general manipulation capabilities [22].
- The potential for OmniDexGrasp to extend beyond grasping to broader manipulation tasks is highlighted, indicating its versatility in robotic applications [20].
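
The summary describes force feedback control (Module 3) only in prose. As a rough illustration of the general idea, and not the OmniDexGrasp authors' controller, the Python sketch below closes a single simulated finger until the measured contact force reaches a target. The `SimulatedFinger` class, the spring-like contact model, and every parameter value (`target_force`, `gain`, `tolerance`, stiffness) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class SimulatedFinger:
    """Hypothetical 1-D finger pressing on a spring-like object (illustration only)."""
    position: float = 0.0      # finger closure in metres
    contact_at: float = 0.02   # closure at which the finger first touches the object
    stiffness: float = 500.0   # assumed object stiffness in N/m

    def read_force(self) -> float:
        # Zero force before contact, then linear in how far the object is compressed.
        return max(0.0, self.position - self.contact_at) * self.stiffness


def force_feedback_grasp(finger, target_force=2.0, gain=0.0005,
                         tolerance=0.05, max_steps=1000):
    """Close the finger until the measured force is within tolerance of target_force.

    Proportional control: each step moves the finger by gain * (force error),
    so the finger slows as the force nears the target instead of squeezing
    at full speed. Returns the number of control steps taken.
    """
    for step in range(max_steps):
        error = target_force - finger.read_force()
        if abs(error) <= tolerance:
            return step
        finger.position += gain * error
    raise RuntimeError("target force not reached within max_steps")


if __name__ == "__main__":
    finger = SimulatedFinger()
    steps = force_feedback_grasp(finger)
    print(f"stopped after {steps} steps at {finger.read_force():.2f} N")
```

With a stiffer simulated object the same gain produces larger force changes per step, which is one reason a real controller would need to adapt to each object's physical properties, as the summary notes.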