Large Language Models
Li Jianzhong: How Large-Model Technology Innovation Is Driving the Evolution of the AI Ecosystem and Applications
AI科技大本营· 2025-04-24 03:39
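[Introduction] Over eight years of AI waves, from perception to generation and now to the age of agents, artificial intelligence has been evolving at astonishing speed. At the 2025 Global Machine Learning Technology Conference, Li Jianzhong, Senior Vice President of CSDN and Chief Technology Expert at Boolan, laid out a sweeping blueprint of AI development and, in an original move, compared it with the evolutionary history of biological intelligence, revealing the central role of "language" in leaps of intelligence. Follow Li Jianzhong's thinking for insight into AI's past, present, and exciting future.

Author | Li Jianzhong  Produced by | AI科技大本营 (ID: rgznai100)

Hello, everyone! I founded the Global Machine Learning Technology Conference (ML-Summit) in 2017, and looking back, it moves me deeply that with your support we have now accompanied AI along its path for eight years. Over these eight years, the entire field of artificial intelligence has undergone sweeping changes. Today I would like to share some of my research and thinking on the latest developments in large models.

I compared the stages of AI development with the stages on Earth from biological intelligence to human intelligence, and found some very interesting patterns. Let us first look at the four stages of AI development.

Stage one: the 1940s marked the dawn of artificial intelligence. From Turing's theoretical model of the computer and the first conceptions of neural networks in the 1940s, to the 1956 Dartmouth Conference, where the term "artificial intelligence" was first proposed, AI then entered the eras of symbolism, behaviorism ...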
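AI Agents Keep "Crashing"? A Former DeepSeek Employee Teams Up with Fei-Fei Li and Other Heavyweights to Open-Source a New Framework That Teaches Models to Truly Reason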
AI前线· 2025-04-24 03:03
Core Viewpoint: Most AI agents are still in the "pilot purgatory" phase and have not yet moved into real-world deployment, even though 2025 was expected to be the "year of AI agents" [1][2].

Group 1: Current State of AI Agents
- In a poll on the social platform X, 64.2% of votes placed AI agents in "pilot purgatory", while only 6.4% judged them "smarter than the hype" [2].
- The article stresses that AI systems must become more stable and reliable before they can serve enterprise applications [2].

Group 2: Introduction of RAGEN
- RAGEN, a new system developed by a team including researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, aims to improve how AI agents perform in real-world scenarios [2][5].
- RAGEN targets multi-turn interaction, in which agents must reason under uncertainty and remember the dialogue history [5].

Group 3: StarPO Framework
- RAGEN is built on StarPO, a custom reinforcement learning framework that emphasizes learning from experience rather than rote memorization [5][7].
- StarPO alternates between two phases: rollout, in which the LLM generates complete interaction sequences, and update, in which the model's parameters are updated based on normalized cumulative rewards [7].

Group 4: Training Challenges and Solutions
- In the "Echo Trap", agents that receive high rewards early in training converge on repetitive responses, and their reasoning ability declines [12].
- To stabilize training, the enhanced variant StarPO-S introduces three mechanisms: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping [19].
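The StarPO update step and the StarPO-S mechanisms listed above can be illustrated with a few NumPy functions. This is a minimal sketch under stated assumptions, not RAGEN's code: the function names, the keep fraction, and the clip bounds `eps_low=0.2` / `eps_high=0.28` are hypothetical choices, and "uncertainty" is assumed to mean the spread of rewards across several rollouts of the same prompt.

```python
import numpy as np

def normalized_returns(trajectory_rewards):
    """Sum each trajectory's per-turn rewards, then z-normalize across the batch."""
    returns = np.array([sum(r) for r in trajectory_rewards], dtype=float)
    return (returns - returns.mean()) / (returns.std() + 1e-8)

def filter_by_uncertainty(groups, keep_fraction=0.25):
    """Keep the prompts whose rollouts disagree most (highest reward std).

    groups: dict mapping a prompt id to the cumulative rewards of several rollouts.
    """
    stds = {pid: float(np.std(rs)) for pid, rs in groups.items()}
    k = max(1, int(round(len(stds) * keep_fraction)))
    return sorted(stds, key=stds.get, reverse=True)[:k]

def asymmetric_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """PPO clipped surrogate whose upper bound (1 + eps_high) is wider than the lower one.

    No KL-penalty term is added, mirroring the "KL removal" mechanism.
    """
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -surrogate.mean()  # minimize the negative surrogate
```

Widening only the upper clip bound lets low-probability actions with positive advantage grow faster, which is one plausible way to counter the collapse toward repetitive answers; the exact values in RAGEN may differ.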
Group 5: Evaluation Environments
- RAGEN includes three symbolic test environments for evaluating decision-making: Bandit, Sokoban, and Frozen Lake, each designed to probe a different aspect of agent performance [15][17].
- The environments minimize interference from prior knowledge, so agents must rely solely on the strategies they have learned [15].

Group 6: Future Implications
- RAGEN is a significant step toward AI agents with autonomous reasoning, although applying these methods to real-world business processes remains challenging [24].
- The article emphasizes optimizing reward mechanisms so that they reward the quality of the reasoning process, not just the correctness of the outcome [24].
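A text environment in the spirit of the Bandit task can be sketched with a simple reset/step interface. The class name, the neutral arm labels "A" and "B", and the payout probabilities are hypothetical illustrations; RAGEN's actual environments and interfaces may differ.

```python
import random

class TwoArmedBandit:
    """Single-step text bandit: the agent names an arm and receives a 0/1 reward."""

    def __init__(self, probs=(0.3, 0.7), seed=None):
        self.probs = probs        # hypothetical payout probability for each arm
        self.arms = ("A", "B")    # neutral labels that carry no prior-knowledge hints
        self.rng = random.Random(seed)

    def reset(self):
        """Return the initial text observation shown to the agent."""
        return "Two arms are available: A or B. Reply with the arm you choose."

    def step(self, action):
        """Return (observation, reward, done) for a text action."""
        action = action.strip().upper()
        if action not in self.arms:
            return "Invalid arm.", 0.0, True
        idx = self.arms.index(action)
        reward = 1.0 if self.rng.random() < self.probs[idx] else 0.0
        return f"You pulled {action}.", reward, True
```

Because the arm labels say nothing about the payouts, the agent can only learn which arm is better from rewards across episodes.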
Yunnan Explores Building a Large Model for Intelligent Law Enforcement
Zhong Guo Huan Jing Bao· 2025-04-24 01:35
Core Viewpoint: Yunnan Province is launching an innovative application project for intelligent ecological-environment supervision based on a large language model, to address a shortage of enforcement personnel, a broad regulatory scope, and high regulatory difficulty [1].

Group 1: Data Foundation
- The model draws on the extensive data resources of the Yunnan Provincial Ecological Environment Data Resource Center, integrating more than 500 legal documents, 100 typical enforcement cases, and more than 60,000 pollution-source records into a high-quality corpus for training the large language model [2].
- The corpus covers ecological-environment laws, standards, technical guidelines, and enforcement cases, so that enforcement personnel receive answers based on the most accurate and comprehensive information [2].

Group 2: Intelligent Applications
- The model powers four intelligent assistants (Environmental Knowledge Assistant, Pollution Source Assistant, Violation Behavior Analysis Assistant, and Pollution Source Statistical Analysis Assistant), significantly reducing the time spent on traditional manual data retrieval and analysis [3].
- Each assistant serves specific functions, such as answering questions based on laws and regulations, providing quick references during on-site enforcement, generating penalty suggestions for violations, and automating statistical analysis to improve efficiency [3].

Group 3: Future Directions
- Yunnan plans to further unlock the potential of its ecological-environment data and explore AI applications, gradually embedding AI into ecological-environment regulatory tasks to accelerate the digital transformation of supervision [4].
New Oriental's Overseas Revenue Growth Slows; No Current Plans to Develop a Large Model
Di Yi Cai Jing· 2025-04-23 13:50
Growth in the overseas test-preparation and consulting businesses slowed because of the macroeconomic situation and changes in international relations.

| (in thousands US$, except per ADS(1) data) | 3Q FY2025 | 31 |
| --- | --- | --- |
| Net revenues | 1,183,055 | |
| Operating income | 124,519 | |
| Non-GAAP operating income (2)(3) | 142,056 | |
| Net income attributable to New Oriental | 87,255 | |
| Non-GAAP net income attributable to New Oriental (2)(3) | 113,344 | |
| Net income per ADS attributable to New Oriental - basic | 0.54 | |
| Net income per ADS attributable to New Oriental - diluted | 0.54 | |
| Non-GAAP net ...
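As a quick sanity check, the 3Q FY2025 figures can be turned into margins on net revenues. This sketch assumes the operating-income cell means 124,519 thousand US$ (the other cells use comma separators), and the helper name `margin` is hypothetical; these ratios are simple arithmetic on the table, not figures reported by New Oriental.

```python
# 3Q FY2025 figures from the table, in thousands of US$.
net_revenues = 1_183_055
operating_income = 124_519          # assumed reading of the "124.519" cell
non_gaap_operating_income = 142_056
net_income = 87_255                 # attributable to New Oriental

def margin(numerator, revenue):
    """Return a line item as a percentage of net revenues."""
    return 100.0 * numerator / revenue

for name, value in [
    ("Operating", operating_income),
    ("Non-GAAP operating", non_gaap_operating_income),
    ("Net", net_income),
]:
    print(f"{name} margin: {margin(value, net_revenues):.1f}%")
```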
AI Roundup: OpenAI Releases GPT-4.1, Zhipu Releases the GLM-4-32B-0414 Series
China Post Securities· 2025-04-23 07:54
- GPT-4.1 significantly improved coding, scoring 54.6% on SWE-bench Verified, 21.4 percentage points above GPT-4o and 26.6 points above GPT-4.5 [12][13][15]
- GPT-4.1 showed stronger instruction following, scoring 38.3% on Scale's MultiChallenge benchmark, 10.5 percentage points above GPT-4o [12][13][17]
- GPT-4.1 set a new SOTA in long-context understanding, scoring 72.0% on the Video-MME benchmark, 6.7 percentage points above GPT-4o [12][13][22]
- GLM-4-32B-0414 was pretrained on 15T of high-quality data and used reinforcement learning to improve instruction following, engineering code, and function calling [26][28][30]
- GLM-Z1-32B-0414 strengthened mathematical and logical reasoning through reinforcement learning with pairwise-ranking feedback, significantly improving its ability to solve complex tasks [31][33]
- GLM-Z1-Rumination-32B-0414 focuses on deep reasoning and open-ended problem solving, using extended reinforcement learning and search tools [34]
- Seed-Thinking-v1.5 uses an MoE architecture with 200B parameters and scored 86.7% on AIME 2024 and 55.0% on Codeforces, showing strong STEM and coding reasoning [35][37][41]
- Seed-Thinking-v1.5 was trained with a dual-track reward mechanism that combines strategies for verifiable and non-verifiable data to optimize model outputs [36][38][40]
- OpenAI o3/o4-mini brought visual reasoning into the chain of thought (CoT), reaching 96.3% accuracy on the V* benchmark, a major breakthrough in multimodal reasoning [42][46][48]
- Video-R1 applied the T-GRPO algorithm to bring temporal reasoning into video tasks, reaching 35.8% accuracy on VSI-Bench and surpassing GPT-4o [63][65][68]
- Pangu Ultra, a dense model with 135B parameters, achieved top performance on most English benchmarks and all Chinese benchmarks, rivaling larger MoE models such as DeepSeek-R1 [69][73][74]
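The T-GRPO idea mentioned for Video-R1 can be illustrated on top of plain GRPO, whose advantages are rewards z-scored within a group of sampled responses. The sketch assumes T-GRPO compares accuracy on temporally ordered frames against shuffled frames and adds a bonus to correct ordered-frame answers only when ordering helps; the defaults `alpha=0.3` and `mu=0.8` and all function names are illustrative assumptions, not Video-R1's code.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantage: z-score each reward within its sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def temporal_bonus(correct_ordered, correct_shuffled, alpha=0.3, mu=0.8):
    """Return alpha if accuracy on ordered frames is at least mu times the shuffled accuracy."""
    p_ordered = np.mean(correct_ordered)
    p_shuffled = np.mean(correct_shuffled)
    return alpha if p_ordered >= mu * p_shuffled else 0.0

def t_grpo_rewards(base_rewards, correct_ordered, correct_shuffled, alpha=0.3, mu=0.8):
    """Add the temporal bonus only to correct responses from the ordered-frame group."""
    bonus = temporal_bonus(correct_ordered, correct_shuffled, alpha, mu)
    return [r + (bonus if ok else 0.0) for r, ok in zip(base_rewards, correct_ordered)]

# Usage: shape rewards for one group, then compute GRPO advantages.
shaped = t_grpo_rewards([1.0, 1.0, 0.0, 1.0], [True, True, False, True], [True, False, False, False])
advantages = grpo_advantages(shaped)
```

The bonus only pays off when the model actually does better with frames in the right order, which is meant to discourage answering from a single frame.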
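Agents, DeepSeek, and Multimodality Steal the Show! 60+ Prominent Guests Explore the Future of AI as the 2025 Global Machine Learning Technology Conference Wraps Up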
AI科技大本营· 2025-04-21 10:24
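The following article is from CSDN, by CSDN.

Author | 《新程序员》 (New Programmer) editorial team  Produced by | CSDN (ID: CSDNnews)

In 2025, as everything grows toward "intelligence", enthusiasm in the AI field keeps rising, driving a new wave of technological innovation and industrial exploration.

... new approaches to solving them? Around these key questions, you are welcome to watch the replay of the first day and see how leading technologists analyzed them in depth, from theory and algorithms to practical applications, to learn more about the latest AI developments:

Large-Model Technology Innovation Driving the Evolution of the AI Ecosystem and Applications: Li Jianzhong, Senior Vice President of CSDN and Chief Technology Expert at Boolan

On April 18-19, the 2025 Global Machine Learning Technology Conference (ML-Summit 2025), co-hosted by CSDN and Boolan, a high-end IT consulting and education platform, opened at the Radisson hotel at Xijiao Manor in Shanghai Hongqiao (上海虹桥西郊庄园丽笙大酒店). Focusing on frontier AI trends and real-world deployment, the conference covered 12 major tracks, including the evolution of large language model technology, AI agents, embodied intelligence, and DeepSeek technical analysis and industry practice, and brought together more than 60 distinguished guests from leading global technology companies and academic institutions to present the latest technical directions and application frontiers in AI.

Amid the wave of generative AI reshaping technological boundaries, industry ...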
Brain-Inspired Intelligence Is Key to AI's Next Breakthrough; Shanghai Builds a Full Industrial Chain for the New Track
Di Yi Cai Jing· 2025-04-19 05:49
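Pu Muming, academician of the Chinese Academy of Sciences, said that we should draw on the structure and computational characteristics of the brain so that artificial intelligence can break through the limits of computing power, data, and parameter scale and reach a higher level of artificial general intelligence.

Since the concept of brain-inspired computing was first proposed in the 1980s, research on brain-inspired artificial networks, brain-inspired machine learning, brain-inspired chips, and related fields has kept emerging. With continued breakthroughs in the three elements of computing power, chips, and algorithms, and with deepening interdisciplinary integration, brain-inspired intelligence now faces a new opportunity for development.

On the afternoon of April 18, at the 2025 National Brain-Inspired Intelligence Industry Innovation and Development Promotion Conference, the Brain-Inspired Intelligence Industry Innovation and Development Alliance was launched, and a brain-inspired intelligence future-industry fund matrix made its debut.

The fund matrix was jointly initiated by 10 investment institutions, including the Shanghai Future Industry Fund, Bokang Gongying Fund (博康共赢基金), Daohe Fund (道禾基金), and Yangpu Sci-Tech Innovation Group (杨浦科创集团). It will focus on the new track of brain-inspired intelligence, supporting frontier research, deployment, and application of brain-inspired technologies, improving the industry's layout, and promoting the regional development of the brain-inspired industry.

Shanghai, which in 2017 became the first region in China to plan for brain-inspired intelligence, has achieved important results in brain-inspired computing chips, brain-inspired vision systems, and other fields by deepening fundamental and original theoretical research, accelerating breakthroughs in key core technologies, and undertaking major national strategic tasks.

At present, the scaling law is about to hit the bottlenecks of computing power and data, and performance gains in artificial intelligence will slow. Further optimization of new algorithms and artificial network models, drawing on the low-power yet complex and sophisticated ...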
Altman Boasts "At or Near Genius Level"! OpenAI Unveils a Major Release!
Zheng Quan Shi Bao· 2025-04-17 04:31
Core Insights: OpenAI has launched two new reasoning models, o3 and o4-mini, which can reason with images, marking a significant advance for the o series [1][6].

Group 1: Model Performance
- o3 is described as OpenAI's most powerful reasoning flagship model, excelling on programming, mathematics, science, and visual-perception benchmarks [1][8].
- o4-mini is optimized for cost-effective reasoning, balancing performance and affordability [1][8].
- In external evaluations, o3 made 20% fewer major errors than its predecessor, o1, on difficult real-world tasks, especially in programming and creative tasks [8].

Group 2: Image Reasoning Capabilities
- Both models can integrate images into their reasoning, that is, "think with images" [10].
- Users can upload many kinds of images, and the models can interpret them even when image quality is low [10].
- For example, o3 can analyze a photo of a notebook and work out the written content through reasoning [10].

Group 3: Task Execution and Tool Utilization
- o3 and o4-mini can execute tasks autonomously, using the tools available in ChatGPT and custom user tools via the API [13].
- The models can carry out complex tasks such as searching for data, generating code, and creating visualizations in response to user queries [13].

Group 4: Future Developments
- OpenAI CEO Sam Altman indicated that o3 will soon be upgraded to a professional version, o3-pro [4].
- OpenAI has been releasing models at a rapid pace, including the recent GPT-4.1 series, which aims to attract users with cost-effective options [15].
- GPT-5, whose release has been delayed by integration challenges, remains eagerly anticipated [16].
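The autonomous tool use described above follows the general shape of an agent loop: at each step the model either requests a tool or returns a final answer, and tool outputs are fed back as observations. This is a generic sketch with a scripted stand-in model and hypothetical tools (`search`, `add`); it does not use or reproduce OpenAI's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real system would wire in search, code execution, image tools, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for '{query}'",
    "add": lambda args: str(sum(float(x) for x in args.split(","))),
}

Model = Callable[[List[str]], Tuple[str, str]]

def run_agent(model: Model, task: str, max_steps: int = 5) -> str:
    """Alternate between tool calls and observations until the model answers."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        kind, payload = model(transcript)
        if kind == "final":
            return payload
        tool_name, _, arg = payload.partition(":")  # payload format "tool:argument"
        observation = TOOLS[tool_name](arg)
        transcript.append(f"{tool_name} -> {observation}")
    return "stopped: step limit reached"

def scripted_model(transcript: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: call one tool, then answer from its observation."""
    if len(transcript) == 1:
        return ("tool", "add:2,3")
    return ("final", transcript[-1].split("-> ")[1])

answer = run_agent(scripted_model, "add 2 and 3")
```

The step limit is a simple guard against a model that keeps calling tools without ever answering.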
Inside Zhipu's 沉思 (Rumination) Model, China's First DeepResearch AI Agent
2025-04-15 14:30
Summary of Conference Call on Zhipu's AI Product

Company and Industry
- The conference call focused on Zhipu, a company specializing in AI technology, and in particular on its latest product, the AutoGLM 沉思 (Rumination) version, a deep-research AI agent. The product has drawn significant market attention for combining deep research with the ability to carry out operations [1][14].

Core Points and Arguments
- **Product Launch**: Zhipu AI introduced the AutoGLM Rumination version at the Zhongguancun Forum, highlighting its dual capabilities in deep research and operational tasks [1].
- **Technical Architecture**: The product combines a chain model with test-time scaling and is designed specifically to complete tasks efficiently [2].
- **Local vs. Cloud Operation**: Unlike competitors such as Manus, which runs in the cloud, Zhipu's agent runs on the user's local machine. Local operation addresses user concerns about data security and reduces computational costs [3][4].
- **Model Development**: Zhipu's agent is built on an in-house model tailored to specific tasks, which improves its task planning compared with the general-purpose models used by competitors such as Linus [5].
- **User Experience**: Current limitations include the inability to handle multiple windows at once and relatively long processing times for tasks [6][8].
- **Computational Power**: The company has invested approximately 2000 units of computational power, equivalent to roughly 2000 to 3000 NVIDIA 4090 graphics cards, to support the X400 product [9].
- **Marketing Strategy**: The decision to offer the product for free is attributed to missed market opportunities and competitive pressure from free offerings by other companies such as Deep Sync [10].
- **Future Business Model**: The company is considering evolving its business model, potentially introducing a membership system depending on market conditions [10].
- **Data Source Challenges**: The agent has trouble accessing some online sources, particularly academic databases such as Google Scholar, which limits its functionality [11][12].

Other Important Content
- **Browser Compatibility**: The agent currently supports mainly Chrome, with plans to improve compatibility with other browsers such as IE and Safari [8].
- **User Authentication**: Users must log in to access certain features; login information is stored locally for convenience [12].
- **Investor Engagement**: The call included a Q&A session for investors, indicating a focus on transparency and stakeholder engagement [13].

This summary covers the key points of the conference call on Zhipu's AI product, highlighting its technological innovations, market positioning, and future strategies.
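DeepRoute.ai's Zhou Guang: Intelligent Driving Ultimately Comes Down to AI Technology, Not Just Scale | Embodied Intelligence Dialogue #13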
晚点Auto· 2025-04-14 13:47
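The following article is from 晚点LatePost, by the LatePost team.

Only with a "generalist" in mobility can there be a stronger intelligent-driving system.

Text | Zhang Jiahao  Editor | Cheng Manqi

Fully driverless operation has always been regarded as the jewel in the crown of the autonomous driving industry. Just as there are 19 routes to the summit of Mount Everest, different companies have chosen different paths toward the ultimate goal of driverless driving.

Waymo, Pony.ai, and their peers chose the RoboTaxi route based on high-definition maps and already run RoboTaxi services on specific routes, offering residents rides with no driver. Automakers and suppliers represented by Tesla take an incremental route instead: they sell cars bundled with driver-assistance solutions and collect data to iterate step by step, trying to approach the technical limit.

No one can be sure which route will reach the summit, and still other companies are trying different routes.

At this year's NVIDIA GTC (GPU Technology Conference), Zhou Guang of DeepRoute.ai proposed a new approach. He said that large language models developed from weak expert models (the original Siri), to generalists (ChatGPT), and then to strong expert models (vertical models). Intelligent driving can follow the same path: a generalist in mobility that can drive a car well, ride a motorcycle well, and let a delivery robot find you at any time could then evolve into a strong expert model ...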