Reinforcement Learning
Zhuoyu Technology Adopts Tongyi Large Models to Jointly Build an End-to-End World Model
Alibaba Cloud · 2025-04-24 09:13
Core Insights
- The article highlights the collaboration between Zhuoyu Technology and Alibaba Cloud, focusing on the integration of the Tongyi large model and the development of an end-to-end world model [1][2]
- Zhuoyu's end-to-end world model incorporates reinforcement learning and chain reasoning technology, enhancing safety in urban navigation and enabling personalized driving styles and natural language interaction [2]

Summary by Sections
- **Integration with Alibaba Cloud**
  - Zhuoyu Technology has fully migrated its core business systems, including big data and intelligent manufacturing, to Alibaba Cloud [1]
  - The company has established a GPU resource pool on the Alibaba Cloud PAI platform to meet the high computational demands of its model training [2]
- **Model Training Efficiency**
  - The training method combines pre-training and post-training, resulting in a training efficiency improvement of over 50% compared to single GPU clusters [2]
  - GPU utilization has been increased to over 95% thanks to the serverless capabilities of the Alibaba Cloud PAI platform, which simplifies cluster operations and ensures full observability of the training process [2]
- **Development Acceleration**
  - In research and development, Zhuoyu has integrated Tongyi Lingma and Tongyi Qianwen to accelerate development, achieving a code adoption rate of 29% [2]
AI Agents Keep "Crashing"? Former DeepSeek Employee Teams Up with Fei-Fei Li and Other Leading Researchers to Open-Source a New Framework That Teaches Models to Truly Reason
AI前线· 2025-04-24 03:03
Core Viewpoint
- The article discusses the current state of AI agents, indicating that most are still in "pilot purgatory" and have not yet transitioned to real-world applications, despite expectations for 2025 to be the "year of AI agents" [1][2]

Group 1: Current State of AI Agents
- A survey on the social platform X reveals that 64.2% of AI agents are stuck in pilot purgatory, while only 6.4% are smarter than the hype [2]
- The article highlights the need for advancements in AI systems to enhance their stability and reliability in enterprise applications [2]

Group 2: Introduction of RAGEN
- A new system called RAGEN, developed by a team including researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington, aims to improve AI agents' performance in real-world scenarios [2][5]
- RAGEN focuses on multi-turn interaction scenarios, requiring agents to reason under uncertainty and remember historical dialogues [5]

Group 3: StarPO Framework
- RAGEN is built on a custom reinforcement learning framework named StarPO, which emphasizes learning through experience rather than rote memorization [5][7]
- The StarPO framework alternates between two phases: rollout, where the LLM generates complete interaction sequences, and update, where the model updates its parameters based on normalized cumulative rewards [7]

Group 4: Training Challenges and Solutions
- The article discusses the "Echo Trap" phenomenon, where agents generate repetitive responses because of early high rewards, leading to a decline in reasoning ability [12]
- To improve training stability, the enhanced version StarPO-S introduces three key mechanisms: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping [19]
Group 5: Evaluation Environments
- RAGEN includes three symbolic testing environments to evaluate decision-making capabilities: Bandit, Sokoban, and Frozen Lake, each designed to assess a different aspect of agent performance [15][17]
- These environments aim to minimize interference from prior knowledge, so that agents must rely solely on learned strategies for decision-making [15]

Group 6: Future Implications
- RAGEN represents a significant step toward developing AI agents with autonomous reasoning capabilities, although challenges remain in applying these methods to real-world business processes [24]
- The article emphasizes the importance of optimizing reward mechanisms to focus on the quality of reasoning processes, not just the correctness of outcomes [24]
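The normalized-return update and the asymmetric clipping summarized above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the clip bounds (0.2 below, 0.28 above) and the batch-level reward normalization are assumed values chosen only to show the mechanism.

```python
# Sketch of a StarPO-style update step: normalize cumulative rewards across
# a batch of rollouts, then apply a PPO surrogate loss whose clip range is
# asymmetric (a wider upper bound lets the policy move further on positively
# rewarded trajectories). Clip values are illustrative assumptions.

def asymmetric_ppo_loss(ratio, advantage, clip_low=0.2, clip_high=0.28):
    """PPO surrogate loss with an asymmetric clip range (negated: minimized)."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_high), 1 - clip_low) * advantage
    return -min(unclipped, clipped)


def normalize_returns(returns):
    """Normalize cumulative rewards across a batch of rollouts."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    std = var ** 0.5 or 1.0  # guard against a zero-variance batch
    return [(r - mean) / std for r in returns]
```

With a positive advantage, a probability ratio of 1.5 is clipped at 1.28 rather than the symmetric 1.2, which is the sense in which the clipping favors promising trajectories.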
AI Agents Keep "Crashing"? Former DeepSeek Employee Teams Up with Fei-Fei Li and Other Leading Researchers to Open-Source a New Framework That Teaches Models to Truly Reason
AI前线· 2025-04-24 03:03
Many expect 2025 to be the "year of AI agents": agent systems built for specific tasks on top of the large language models provided by OpenAI, Anthropic, Google, DeepSeek, and other organizations. However, a recent survey on the social platform X shows that most agents are still at the trial stage and have not truly left the laboratory, remaining stuck in "enterprise pilot" status.

Compiled by | Tina

The reasoning-agent training framework is now open source

Unlike static tasks such as problem solving or code generation, RAGEN trains agents in multi-turn interaction scenarios, requiring them to reason under uncertainty, remember historical dialogue, and respond flexibly to change.

| AI agents in the enterprise right now are ... | |
| --- | --- |
| Smarter than the hype | 6.4% |
| Stuck in pilot purgatory | 64.2% |
| Powerful, but high effort | 24.8% |
| Nearing real scale | 4.6% |

But a team that includes Fei-Fei Li may be about to change that: together with researchers from Northwestern University, Microsoft, Stanford University, and the University of Washington ...
SenseTime Jueying Sets a New Milestone for Intelligent Driving: Generative Autonomous Driving R-UniAD Makes Safety More Certain and Pushes Beyond Human Driving Limits
Guan Cha Zhe Wang· 2025-04-24 01:18
Core Insights
- The article discusses the advancements in autonomous driving technology by SenseTime's Jueying (绝影), particularly its R-UniAD technology framework, which integrates reinforcement learning and world models to overcome the limitations of existing end-to-end autonomous driving systems [1][2][3]

Group 1: Technology Advancements
- SenseTime has developed the R-UniAD technology solution, which incorporates reinforcement learning to enhance the interaction between end-to-end autonomous driving systems and the real world, thereby improving safety and reliability [2][3]
- The VLAR architecture, which combines vision, language, action, and reinforcement learning, is a key breakthrough in achieving generative autonomous driving capabilities [6][9]
- The R-UniAD framework consists of a three-stage process: initial training through imitation learning, reinforcement learning through interaction with a world model, and efficient distillation for in-vehicle deployment [9]

Group 2: Safety and Performance Improvements
- The R-UniAD technology aims to significantly reduce the need for real-world data by generating virtual scenarios, lowering the requirement for high-quality corner-case data by two orders of magnitude [9]
- The model's performance is designed to exceed human driving capabilities, with a reported reduction in collision rates of an order of magnitude compared to human drivers [9]
- The system's ability to handle complex scenarios, such as construction-site interruptions, is enhanced through 4D simulation and reinforcement learning, allowing better prediction of and response to unforeseen obstacles [10][12][16]

Group 3: Commercialization and Partnerships
- SenseTime's autonomous driving solutions are currently deployed in collaboration with four automotive manufacturers, with seven vehicle models already equipped with the technology [1][21]
- The company is accelerating mass production of its autonomous driving solutions, with further deployments planned for 2025, including partnerships with major automotive brands such as Dongfeng and Chery [21][23]
- The R-UniAD technology has received certification from the China Automotive Technology and Research Center, marking it as a leading product in the autonomous driving field [23]

Group 4: Future Developments
- The Jueying Kaiwu (绝影开悟) world model has been upgraded to version 2.0, enabling near real-time interaction and 4D scenario generation, which is crucial for training autonomous driving models [17][19][20]
- The upgraded model can generate diverse and complex driving scenarios, including extreme-risk situations, which are essential for training robust autonomous systems [19][20]
- SenseTime aims to integrate its advanced AI technologies with the automotive industry to create a comprehensive ecosystem for intelligent driving, focusing on safety, adaptability, and user experience [24][25]
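The three-stage recipe attributed to R-UniAD (imitation learning, then reinforcement learning against a world model, then distillation for deployment) can be illustrated with a deliberately tiny toy. Every name below, including the tabular policy, the two driving states, and the hand-written world-model rewards, is a hypothetical simplification for illustration; the real system trains neural driving policies.

```python
# Toy sketch of a three-stage pipeline: imitation pretraining, RL refinement
# against a simulated world model, and distillation onto a deployment set.
# All states, actions, and rewards here are invented for illustration.

ACTIONS = ("brake", "keep", "accelerate")

def imitation_pretrain(expert_demos):
    """Stage 1: initialize the policy by copying expert state->action pairs."""
    return dict(expert_demos)

def rl_refine(policy, world_model, states, steps=10):
    """Stage 2: adjust actions using rewards from a (simulated) world model."""
    for _ in range(steps):
        for s in states:
            # Greedily pick the action the world model scores highest.
            policy[s] = max(ACTIONS, key=lambda a: world_model(s, a))
    return policy

def distill(policy, student_states):
    """Stage 3: distill the refined policy onto a smaller deployment set."""
    return {s: policy[s] for s in student_states}

def world_model(state, action):
    """Hypothetical reward model: braking is safest near an obstacle."""
    if state == "obstacle_ahead":
        return {"brake": 1.0, "keep": -1.0, "accelerate": -2.0}[action]
    return {"brake": -0.5, "keep": 1.0, "accelerate": 0.5}[action]

policy = imitation_pretrain({"clear_road": "accelerate", "obstacle_ahead": "keep"})
policy = rl_refine(policy, world_model, ["clear_road", "obstacle_ahead"])
deployed = distill(policy, ["obstacle_ahead"])
```

The point of the sketch is the division of labor: imitation supplies a reasonable starting policy, the world model supplies cheap simulated feedback that replaces scarce real corner-case data, and distillation produces the compact artifact that ships in the vehicle.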
Agents, DeepSeek, and Multimodal Hot Topics Take Center Stage! 60+ Heavyweight Speakers Explore the Future of AI as the 2025 Global Machine Learning Summit Comes to a Successful Close!
AI科技大本营· 2025-04-21 10:24
This article originally appeared on CSDN ("Empowering one hundred million technologists").

Author | New Programmer editorial team
Produced by | CSDN (ID: CSDNnews)

In 2025, as everything grows toward "intelligence," the AI boom keeps heating up, leading a new wave of technological innovation and industrial exploration.

... new ways to crack these problems? Around these key questions, viewers are welcome to rewatch the videos from the conference's first day and see how many leading technologists offered in-depth analysis from theory and algorithms through to practical application, to learn more about the latest progress in AI:

"AI Ecosystem and Application Evolution Driven by Large Model Technology Innovation", Li Jianzhong, Senior Vice President of CSDN and Chief Technology Expert at Boolan

On April 18-19, the 2025 Global Machine Learning Summit (ML-Summit 2025), organized by CSDN together with the high-end IT consulting and education platform Boolan, opened at the Radisson hotel at Shanghai Hongqiao Xijiao Garden. Centered on the most cutting-edge AI trends and deployment practices, the conference covered 12 major tracks, including the evolution of large language model technology, AI agents, embodied intelligence, and DeepSeek technical analysis and industry practice, bringing together more than 60 distinguished speakers from top global technology companies and academic institutions to present the technical directions and application frontiers of the AI field.

Amid the wave of generative AI redrawing technical boundaries, industry prac ...
Machinery Equipment Industry Commentary: The First Humanoid Robot Marathon Wraps Up; How Did Each Team's Locomotion Perform?
Soochow Securities· 2025-04-21 09:33
Investment Rating
- The report maintains an "Accumulate" rating for the mechanical equipment industry [1]

Core Insights
- The first humanoid robot marathon took place on April 19, 2025, in Beijing, with 21 robot teams participating in a 21-kilometer race [1][2]
- The event showcased the capabilities of humanoid robots, with notable performances from TianGong Ultra and SongYan Power N2, highlighting advancements in robotic movement and control [4][6]
- Reinforcement learning technology was prevalent among the participating robots, indicating a promising direction for future development in humanoid robotics [5][36]

Summary by Sections
- **Event Overview**
  - The first humanoid robot marathon was held on April 19, 2025, in Beijing, featuring 21 robot teams competing in a half marathon [1][14]
- **Participating Teams**
  - A total of 21 humanoid robot teams participated, including notable entries such as TianGong Ultra, Kuavo, and SongYan Power N2 [2][16]
- **Race Format and Rules**
  - Robots ran the marathon under remote operation, accompanied by their engineers. Each robot started at one-minute intervals, maintaining a distance of over one meter from the others [3][19]
- **Race Results**
  - TianGong Ultra won the marathon with a time of 2 hours 40 minutes 42 seconds, benefiting from advanced technology and design [4][22]
  - SongYan Power N2 secured second and third places, demonstrating excellent stability and a humanoid gait without requiring dedicated support [4][26]
- **Future Development Directions**
  - The marathon set three world records, while emphasizing the need for improved robustness and hardware stability for commercial viability [32][35]
  - The report suggests that enhancing the robots' endurance and joint cooling capabilities is crucial for their long-term operational success [37]
- **Investment Recommendations**
  - The report recommends focusing on the supply chains of TianGong Robotics and SongYan Power, highlighting specific companies for potential investment [6][38]
OpenAI Releases o3 and o4-mini: Breakthroughs in Visual Reasoning and Tool Use
GOLDEN SUN SECURITIES· 2025-04-20 05:22
Investment Rating
- The report maintains an "Accumulate" rating for the industry [7]

Core Insights
- OpenAI has released two groundbreaking models, o3 and o4-mini, which enhance visual reasoning and tool-use capabilities, marking a significant leap in ChatGPT's intelligence [11][12]
- The MCP (Model Context Protocol) is gaining traction, aiming to standardize how large models access context and thereby accelerating the development of AI applications [3][31]

Summary by Sections
- **OpenAI Model Releases**
  - OpenAI launched o3 and o4-mini on April 16, showcasing advanced reasoning capabilities through image processing and tool utilization, and setting new performance benchmarks [11][12]
  - o3 is noted for its superior performance on complex tasks, achieving a 20% reduction in significant errors compared with its predecessor, o1, particularly in programming and creative tasks [12][13]
  - o4-mini is optimized for quick and cost-effective reasoning, outperforming o3-mini on various non-STEM tasks [12][13]
- **Visual Reasoning and Tool Usage**
  - The new models can integrate images into their reasoning processes, dynamically manipulating images and collaborating with tools such as Python for data analysis and web searches [19][23]
  - They can generate detailed responses quickly, often within a minute, by effectively combining multiple tools to address complex queries [25][26]
- **MCP Influence and Ecosystem Development**
  - MCP serves as a standardized protocol for connecting AI models to various tools and data sources, enhancing reliability and efficiency in AI systems [3][31]
  - The protocol is being adopted by major companies, including Google and Tencent, which is expected to lower development barriers for AI applications [35][36]
- **Investment Opportunities**
  - The report suggests focusing on several sectors, including IaaS (e.g., Cambricon, Alibaba), waste-to-energy power generation (e.g., Wangneng Environment), and SaaS (e.g., Kingsoft Office, Yonyou Network) [4][36][37]
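Since MCP is built on JSON-RPC 2.0, a tool invocation is just a structured request message. The sketch below shows the general shape of such a request; the tool name `web_search` and its arguments are hypothetical examples, not part of any particular MCP server.

```python
# Sketch of an MCP-style tool invocation. MCP uses JSON-RPC 2.0 messages;
# the "tools/call" method carries the tool name and its arguments.
# The specific tool and arguments below are invented for illustration.
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = make_tool_call(1, "web_search", {"query": "o4-mini benchmarks"})
```

The value of the standardization is that any client that can emit this message shape can drive any conforming server's tools, which is what lowers the integration barrier the report describes.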
Large Models: From Next-Word Prediction to Industry Deployment
Zhejiang University· 2025-04-18 07:55
Investment Rating
- The report does not provide a specific investment rating for the industry.

Core Insights
- The report discusses the evolution of large language models (LLMs) and their applications in various fields, emphasizing their ability to learn from vast amounts of unannotated data and perform tasks traditionally requiring human intelligence [48][49][50]
- It highlights the significance of pre-training and fine-tuning in enhancing model performance, with a focus on the advantages of using large datasets for training [35][56]
- The report also addresses the challenges faced by LLMs, including hallucination, bias, and outdated information, and suggests that integrating external data sources can mitigate these problems [63][80]

Summary by Sections
- **Large Language Models**
  - Large language models utilize vast amounts of unannotated data to learn about the physical world and the patterns of human language [48]
  - The training process involves pre-training on diverse datasets followed by fine-tuning for specific tasks [35][56]
- **Training Techniques**
  - The report outlines various training techniques, including supervised fine-tuning (SFT) and instruction tuning, which help models generalize to unseen tasks [56][59]
  - Reinforcement learning from human feedback (RLHF) is also discussed as a method to align model outputs with human preferences [59]
- **Applications and Use Cases**
  - The report emphasizes the versatility of LLMs in applications ranging from natural language processing to complex problem-solving tasks [48][49]
  - It mentions specific use cases, such as predicting conditions like epilepsy in healthcare [162][211]
- **Challenges and Solutions**
  - The report identifies key challenges such as hallucination, bias, and the need for timely information, proposing the use of external databases to enhance model accuracy and relevance [63][80]
  - It suggests that addressing these challenges is crucial for the broader adoption of LLMs in various industries [63][80]
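The pre-train-then-fine-tune split the report describes can be made concrete with a deliberately tiny toy: a bigram "language model" counted from raw text, then nudged by curated instruction pairs. This is a didactic sketch only; production LLMs learn neural next-token distributions, and the corpus, pairs, and upweighting here are invented for illustration.

```python
# Toy illustration of pre-training vs. supervised fine-tuning:
# pre-training counts next-word statistics from unannotated text,
# fine-tuning upweights transitions taken from curated pairs.
from collections import defaultdict

def pretrain(corpus):
    """Count bigrams from raw text (next-word prediction on unlabeled data)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def finetune(counts, pairs, weight=5):
    """Supervised fine-tuning: upweight transitions from curated pairs."""
    for prompt_word, response_word in pairs:
        counts[prompt_word][response_word] += weight
    return counts

def predict(counts, word):
    """Greedy next-word prediction; None if the word was never seen."""
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None

model = pretrain(["the model predicts the next word", "the next token"])
model = finetune(model, [("model", "answers")])
```

After pre-training alone, `predict(model, "model")` would follow raw corpus statistics; fine-tuning overrides that with the curated behavior, which is the same division of labor SFT performs at scale.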
21 Teams Enter the Humanoid Robot Half Marathon; Each Competitor Allowed at Most Three Human "Pacers"
Di Yi Cai Jing· 2025-04-18 05:07
The organizing committee expects the first robot to cross the finish line at around 10:10 tomorrow morning.

On the morning of April 18, the world's first humanoid robot half marathon announced its list of competitors. In the half marathon starting at 7:30 tomorrow morning, 21 robot teams will set off from the south gate of Phase I of Nanhaizi Park in Beijing Yizhuang; the teams come from the "national team," private companies, and university research groups.

At TianGong Ultra's top running speed, completing a 21.0975-kilometer half marathon would take a little under two hours. However, because of mid-race stops such as battery swaps, the organizing committee expects the first robot to finish at around 10:10 tomorrow morning.

Besides the national team and research participants, robot makers such as Lingbao CASBOT and SongYan Power will also take part in this half marathon. Di Yi Cai Jing's review shows that the competing robots mainly follow reinforcement-learning-based algorithmic approaches and are mostly operated by remote control during the race. In the actual race, each humanoid robot forms a team together with human athletes; the human "pacer group" may have at most three members, possibly consisting of the robot's engineer, operator, and pace-setter.

According to the previously announced running rules, robots on the track will follow a Z ...
Two Months After a Google Executive Joined, Is ByteDance's AI Organization Going Flat?
阿尔法工场研究院· 2025-04-17 10:47
This article originally appeared on AI科技评论 (AI Technology Review), by Liang Bingjian.

ByteDance's AI Lab was the company's main AI exploration department before Seed was established. It is currently managed by Li Hang, who since 2024 has reported to Zhu Wenjia, then head of Seed. In late February this year, Wu Yonghui, formerly a Vice President at Google DeepMind, joined ByteDance as head of foundational research at Seed; since then, Li Hang has reported to Wu Yonghui.

ByteDance's AI Lab was founded in 2016 and initially led by Ma Weiying, former Executive Vice President of Microsoft Research Asia, who reported directly to Zhang Yiming. The AI Lab currently has multiple sub-teams, covering directions such as robotics and AI4S, spanning almost all frontier research areas in artificial intelligence. By 2018 the team had grown to 150 people and was the core department for ByteDance's AI research.

The AI Lab's main research focus has been developing innovative technology serving ByteDance's content platforms; it contributed to features such as gesture recognition and short-video effects. Its research results were applied in products like Toutiao and Douyin, serving as a cornerstone of Douyin's growth into a nationally dominant app and establishing ByteDance's then-leading position in China's AI field.

As Douyin and TikTok secured overwhelmingly dominant market positions, traffic monetization became a top-level concern for ByteDance, and the AI Lab's importance within the company declined. In 2020, the AI Lab was converted from a group-level forward-looking project into a technology middle platform, serving ...