Reinforcement Learning
RL Environments and RL for Science: Data Foundries and Multi-Agent Architectures
2026-01-07 03:05
JAN 07, 2026 ∙ PAID
Worker Automation, RL as a Service, Anthropic's next big bet, GDPval and Utility Evals, Computer Use Agents, LLMs in Biology, Mid-Training, Lab Procurement Patterns, Platform Politics and Access
Last June, we argued that scaling RL is the critical path to unlocking further AI capabilities. As we will show, the past several months have affirmed our ...
Former OpenAI Chief Scientist Ilya Sutskever: The End of the Scaling Myth and a Return to the Age of Research
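36Kr· 2026-01-04 05:13
"If we scaled up another 100x, would everything be completely transformed? I don't think so. So we are back in the age of research."
His appearance was news in itself
On November 25, 2025, Ilya Sutskever appeared on Dwarkesh Patel's podcast. That alone was news.
Since leaving OpenAI in 2024 and founding Safe Superintelligence (SSI), Ilya had all but disappeared from public view. His new company had raised $3 billion at a $32 billion valuation, yet almost nothing about it was public: no product launches, no technical blog, not a single word on social media.
So when the 96-minute interview went live, the entire AI research community stopped what it was doing. Dario Amodei (CEO of Anthropic) declared it "Ilya podcast day" on social media, joking that it was reason enough to call in sick.
The conversation did not disappoint. Ilya's assessment of where AI development stands was more candid, more insightful, and more unsettling than any public statement.
Opening: science fiction becomes reality
The conversation opened with an almost philosophical observation.
Ilya: "You know what's crazy? That all of this is real."
Dwarkesh: "Meaning what?"
Ilya: "All this AI stuff, all of this happening in the Bay Area... doesn't it feel like science fiction coming true ...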
People with autonomous-driving experience are still in high demand in other fields
自动驾驶之心 (Heart of Autonomous Driving)· 2025-12-31 00:31
Group 1 - The article describes this year's autonomous driving industry as intensely competitive, with companies competing on technology, cost, and efficiency [1]. - Many professionals have moved into sectors such as embodied AI and drones; because autonomous driving is a comparatively mature AI field, its algorithm engineers are highly sought after [1][2]. - Leading technology has converged on a few directions: end-to-end systems, VLA, world models, and reinforcement learning, while many mid-tier companies are still working on problems such as OCC (occupancy) and multi-sensor fusion perception [3].
Group 2 - The paid autonomous-driving community has officially passed 4,000 members and offers coverage of technology roadmaps and job postings [3]. - The community thanks its supporters and announces new-year perks and discounts, encouraging everyone to keep pushing in the year ahead [4]
People with autonomous-driving experience are still in high demand in other fields
自动驾驶之心 (Heart of Autonomous Driving)· 2025-12-28 03:30
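The autonomous driving industry was eventful again this year. At this key juncture, as the technology spreads into mass-market vehicles, competition is fierce on every front: technology, cost, and efficiency. The same was true for us this year: we added many B2B clients and began expanding from online to offline, and our consumer-facing content has gradually shifted from general-interest material to more specialized, detailed coverage.
In the first half of the year, many people in autonomous driving switched to embodied AI, and that is still happening. The L4, embodied-AI, and drone sectors are all hiring in large numbers, and because autonomous driving is a comparatively mature AI field, its algorithm talent is in high demand; several leading companies (DJI, Unitree, AgiBot, Hello, and others) pay well.
2026 arrives next week, so it is time for the year-end review.
People who have worked in autonomous driving have used large compute clusters, solved all kinds of corner cases, and are strong at coordinating with upstream and downstream teams, all of which the other sectors lack.
This year, leading autonomous-driving technology converged on a few major directions: one-stage end-to-end, VLA, world models (reconstruction + simulation), and reinforcement learning. The mid-tier companies we have been in contact with are still tackling OCC (occupancy), mapless driving, multi-sensor fusion perception, and so on, and all of them will have many open positions next year.
This year, membership in 自动驾驶之心's paid community officially passed 4,000. If you want to follow technology roadmaps, roundtables, research reports, and job postings, come and take a look.
Thanks to all our followers, new and old, for your support; for the new year we are offering a range of perks and discounts. Let's all keep pushing in the new year.
Knowledge Planet (星球) coupons: 40% off for new members, 50% off renewals
Add our assistant to ask about the promotions ...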
DiffusionDriveV2 Core Code Walkthrough
自动驾驶之心 (Heart of Autonomous Driving)· 2025-12-22 03:23
Core Viewpoint - The article walks through DiffusionDriveV2, which builds on DiffusionDrive's truncated diffusion approach to end-to-end autonomous driving and adds reinforcement learning to improve trajectory planning and safety [1].
Group 1: Model Architecture - DiffusionDriveV2 imposes reinforcement learning constraints on a truncated diffusion model for autonomous driving [3]. - The environment is encoded from bird's-eye-view (BEV) features together with the ego vehicle's state [5]. - The trajectory planning module uses multi-scale BEV features to predict the ego vehicle's trajectory more accurately [8].
Group 2: Trajectory Generation - Anchors are obtained by clustering the ego vehicle's ground-truth future trajectories with K-Means; the anchors are then perturbed with Gaussian noise to produce varied candidate trajectories [12]. - Cross-attention fuses trajectory features with BEV features, strengthening the model's predictions [15][17]. - The final trajectory is obtained by combining the predicted trajectory offsets with the input trajectory, which keeps the output continuous and coherent [22].
Group 3: Reinforcement Learning and Safety - Intra-Anchor GRPO optimizes the policy within each behavioral intention (anchor), favoring safe, goal-directed trajectories [27]. - Generated trajectories are scored on safety, comfort, rule compliance, progress, and feasibility, so performance is evaluated across varied driving scenarios [28]. - A modified advantage estimate gives a clear learning signal by penalizing trajectories that lead to collisions [30].
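The summary names Intra-Anchor GRPO and a collision-penalizing advantage but gives no formula. As a minimal sketch, assuming standard GRPO normalization (reward minus group mean, divided by group standard deviation) applied separately to the samples drawn from each anchor, plus a fixed negative advantage for any colliding trajectory, the idea could look like this. The function name `intra_anchor_advantages` and the `-1.0` collision value are hypothetical and not taken from the DiffusionDriveV2 code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def intra_anchor_advantages(anchor_ids, rewards, collided,
                            collision_advantage=-1.0, eps=1e-6):
    """Hypothetical intra-anchor, GRPO-style advantage estimate.

    anchor_ids[i] -- which anchor (behavioral intention) sample i was drawn from
    rewards[i]    -- scalar trajectory score for sample i
    collided[i]   -- True if sample i's trajectory leads to a collision
    """
    # Group sample indices by the anchor they came from.
    groups = defaultdict(list)
    for i, anchor in enumerate(anchor_ids):
        groups[anchor].append(i)

    advantages = [0.0] * len(rewards)
    for idxs in groups.values():
        # Statistics are computed only over this anchor's samples, so each
        # trajectory is compared with siblings sharing the same intention.
        group_rewards = [rewards[i] for i in idxs]
        mu, sigma = mean(group_rewards), pstdev(group_rewards)
        for i in idxs:
            if collided[i]:
                # Assumed rule: a collision always gets a fixed negative signal.
                advantages[i] = collision_advantage
            else:
                advantages[i] = (rewards[i] - mu) / (sigma + eps)
    return advantages


adv = intra_anchor_advantages(
    anchor_ids=[0, 0, 1, 1],
    rewards=[1.0, 3.0, 10.0, 0.0],
    collided=[False, False, False, True],
)
print([round(a, 3) for a in adv])  # → [-1.0, 1.0, 1.0, -1.0]
```

Because each anchor is normalized against its own samples, anchor 0's better trajectory (reward 3.0) still receives a positive advantage even though it scores below anchor 1's non-colliding sample; a single normalization over all four samples would give it a negative one. This is one plausible reading of "optimizing within a behavioral intention", not necessarily the paper's exact rule.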
Group 4: Noise and Exploration - The model uses multiplicative noise to keep sampled trajectories smooth, addressing the inherent scale mismatch between near (proximal) and far (distal) trajectory segments [33]. - Additive noise, by contrast, can break a trajectory's smoothness, so the multiplicative scheme improves the quality of exploration during training [35].
Group 5: Loss Function and Training - The total loss combines the reinforcement learning loss with an imitation learning loss to prevent overfitting and preserve general driving capability [39]. - Trajectory reconstruction and classification-confidence terms also contribute to the total loss, guiding the model toward accurate trajectory predictions [42].
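To make the contrast between additive and multiplicative exploration noise concrete, here is a minimal sketch, assuming ego-frame (x, y) waypoints and, for the multiplicative case, one Gaussian scale factor per axis shared by all waypoints. The function names and the `sigma` value are hypothetical; this illustrates the general idea rather than the DiffusionDriveV2 implementation.

```python
import random

Traj = list[tuple[float, float]]  # (x, y) waypoints in the ego frame


def additive_noise(traj: Traj, sigma: float, rng: random.Random) -> Traj:
    # Independent Gaussian offset per waypoint: neighbouring points move
    # independently (a jagged path), and near and far points get the same
    # absolute perturbation despite very different distances from the ego car.
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in traj]


def multiplicative_noise(traj: Traj, sigma: float, rng: random.Random) -> Traj:
    # One scale factor per axis shared by every waypoint: the whole path is
    # stretched or compressed, so its shape stays smooth and each point's
    # displacement grows in proportion to its coordinate values.
    sx = 1.0 + rng.gauss(0.0, sigma)
    sy = 1.0 + rng.gauss(0.0, sigma)
    return [(x * sx, y * sy) for x, y in traj]


def roughness(traj: Traj) -> float:
    # Largest second difference |p[i+1] - 2*p[i] + p[i-1]| over both axes;
    # zero for evenly spaced points on a straight line.
    return max(
        abs(traj[i + 1][k] - 2 * traj[i][k] + traj[i - 1][k])
        for i in range(1, len(traj) - 1)
        for k in (0, 1)
    )


straight = [(2.0 * t, 0.5 * t) for t in range(1, 9)]
print(roughness(multiplicative_noise(straight, 0.1, random.Random(0))))
print(roughness(additive_noise(straight, 0.1, random.Random(0))))
```

For an evenly spaced straight path, multiplicative noise leaves the roughness at zero (up to floating-point rounding) while additive noise does not, and with a shared scale factor the farthest waypoint moves eight times as far as the nearest one. That distance-proportional perturbation is one way a multiplicative scheme can respect the scale difference between near and far segments.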
In Depth | The Chinese-American founder of Surge AI, a $10 billion AI unicorn: no fundraising, small scale, and another path for AI startups
Z Potentials· 2025-12-19 03:01
Core Insights - Surge AI, founded by Edwin Chen, passed $1 billion in revenue within four years without external funding and with fewer than 100 employees, and has been profitable since inception [4][6][7]. - The company focuses on high-quality AI training data, prioritizing quality over quantity, and aims to build AI that benefits humanity rather than AI that merely maximizes engagement [6][11][12].
Company Overview - Surge AI is a leading AI data company that supports model training at frontier AI labs, and it has grown rapidly and profitably without venture capital [4][6]. - Instead of following the usual Silicon Valley playbook of fundraising and marketing, it prioritizes product quality and alignment with its customers [9][10].
Business Model and Strategy - Surge AI runs with a small, highly skilled team, on the view that advances in AI make it possible to operate efficiently without a large organization [7][8]. - It avoids typical Silicon Valley promotion and relies on word of mouth and the quality of its product to win clients [9][10].
Data Quality and Evaluation - Surge AI defines data quality in a nuanced way, judging outputs by their emotional and intellectual resonance rather than by whether they meet superficial criteria [11][12]. - It uses a comprehensive system of signals to assess each data contribution, so that only high-quality outputs are used for model training [13][14].
AI Industry Trends - The conversation raises the concern that many AI models are optimized for benchmarks rather than real-world use, creating a gap between benchmark performance and practical utility [18][19]. - It also anticipates a shift toward more diverse and specialized models, shaped by the distinct characteristics and goals of different research labs [42]
Reinforcement Learning Tutorial - RLVR with NVIDIA & Unsloth
Matthew Berman· 2025-12-15 13:00
This is the tech that got AI to be the best in the world at chess, Go, League of Legends, and even master autonomous driving. And today, I'm going to show you how to set it up and actually run it on your home computer. And by the way, I'm partnering with Nvidia on this video. They wanted me to put together this tutorial, and I thought it would be awesome to show you how to do RL locally. So, how did this actually happen? How did AI surpass humans at all of these games? The answer is reinforcement learning. An ...
Rivian Unveils Plans For Autonomous Driving
YouTube· 2025-12-11 17:32
RJ, Rivian finally has an AI story. I actually would like to stop asking why now. You're ready to talk a bit more about where Rivian feels its AI competencies are. Yes. So Rivian launched the first vehicles at the end of 2021, and almost immediately following that, we began the process of designing the clean sheet approach to how we were to integrate across the business sets on the autonomy platform, and that's the background level of the vehicle and of course, with the enterprise. And in doing that, we redefi ...
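No fundraising, no cash burn, no team expansion: the AI unicorn founded by a Chinese-American CEO has broken into the core supply chains of Google and Anthropic, with revenue now nearly 10 billion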
Sou Hu Cai Jing· 2025-12-10 07:15
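Editor | Dongmei
While Meta was spending $14.3 billion on a stake in rival Scale AI, this company, founded by a former Google engineer and with only a tenth of its rival's headcount, had quietly passed $1 billion in annual revenue without ever taking outside investment.
In the AI arena, the spotlight always follows stars such as OpenAI and Google as they release the next trillion-parameter model, while the training data that shapes a model's "thinking" and "character" is like a forgotten foundation.
Silicon Valley is staging a drama of stark contrasts. On one side, Meta spent $14.3 billion to acquire nearly half of data-labeling company Scale AI, turning its founder Alexandr Wang into a Silicon Valley celebrity.
On the other side is its low-key rival Surge AI: no fundraising in the nearly five years since its founding, almost no press releases in the past two years, and a tenth of its rival's headcount, yet it has quietly surpassed $1 billion in revenue, financially overtaking the heavily funded Scale AI.
The company has built an elite annotator network called "Surge Force" with a very high bar for entry. Besides a strong professional background, applicants must complete 5 trial writing tasks and be approved by a senior annotator before they can join.
The network includes professionals from around the world and has even recruited professors from Stanford, Princeton, and Harvard to help train AI, with the aim of encoding human expertise, creativity, and values into data. ...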
Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute
AI Engineer· 2025-12-09 15:51
[music] Hey everyone, it's great to meet you all. Really great to be here today. My name is Rhythm. This is my co-founder Linden. Our third co-founder, Yash, couldn't make it today, but we're all very excited to be here. Um, three of us were previously researchers at OpenAI, and now we're bringing frontier AI inside of enterprise at Applied Compute. Today, we're going to be talking about efficient reinforcement learning. As some context on Applied Compute, we help enterprises build their own intelligence to p ...