RL

From RLHF and PPO to GRPO and training reasoning models: the reinforcement learning primer you need
机器之心· 2025-06-22 04:26
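Selected from unsloth.ai. Author: Unsloth Team. Reinforcement learning (RL) has become one of the indispensable techniques for today's LLMs. From LLM alignment to reasoning-model training to today's agentic reinforcement learning (Agentic RL), you can find reinforcement learning in almost every corner of the AI field. Recently Unsloth, the team formed by brothers Daniel Han and Michael Han (its open-source fine-tuning project of the same name has more than 40,000 GitHub stars), released a reinforcement learning tutorial. Starting from Pac-Man, it moves in plain terms from RLHF and PPO to GRPO, and shares tips on training reasoning models with GRPO. Learn all about reinforcement learning and how to train your own reasoning model with GRPO. This is a complete guide from beginner to advanced. What you will learn: this article covers everything you need to know about GRPO, reinforcement learning (RL), and reward functions, from beginner to advanced, plus the basics of using GRPO with Unsloth. If you need to learn how to implement GRPO step by step, this guide is worth reading. ❓ What is reinforcement learning (RL)? The goal of reinforcement learning is to increase the chance of seeing "good" outcomes and decrease the chance of seeing "bad" outcomes. That's it! What "good" and "bad" mean is intricate, "increase" and "decrease" also need careful thought, and even what "outcome" means varies ...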
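The tutorial's framing of RL (make "good" outcomes more likely and "bad" ones less likely) can be made concrete with a toy example. The sketch below is not Unsloth's code and is not GRPO itself: it assumes a two-outcome softmax policy with hypothetical rewards of +1 ("good") and -1 ("bad"), and applies the exact expected policy-gradient update with a mean-reward baseline. All function names, rewards, and the learning rate are illustrative choices.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, lr=0.5):
    """One exact (expected) policy-gradient step for a softmax policy.

    For a softmax policy, d E[reward] / d logit_i = p_i * (r_i - baseline),
    where baseline is the current expected reward.
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - baseline) for l, p, r in zip(logits, probs, rewards)]


# Hypothetical setup: outcome 0 is "good" (+1), outcome 1 is "bad" (-1).
rewards = [1.0, -1.0]
logits = [0.0, 0.0]  # start with both outcomes equally likely

history = [softmax(logits)[0]]  # probability of the "good" outcome over time
for _ in range(20):
    logits = policy_gradient_step(logits, rewards)
    history.append(softmax(logits)[0])

print(f"P(good) before training: {history[0]:.3f}")
print(f"P(good) after 20 steps:  {history[-1]:.3f}")
```

Each step shifts probability toward the rewarded outcome in proportion to how much better than average it is. Practical methods such as PPO and GRPO estimate this signal from sampled model outputs rather than computing it exactly.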
With a pipeline spanning ADC and RLT, this innovative drugmaker has raised nearly $600 million in total
36Kr· 2025-06-22 01:59
Core Insights
- Immunome completed a $150 million financing round in January 2025, aimed at advancing its core pipeline into clinical translation [1]
- The company announced the first patient dosing in the Phase I clinical trial of its ROR1-targeting ADC (antibody-drug conjugate) IM-1021 in March 2025 [1]
- Immunome's core pipeline product, Varegacestat (AL102), has completed patient enrollment in the Phase III RINGSIDE trial, with topline data expected in the second half of 2025 [1]
Company Overview
- Immunome has raised a total of $598.9 million over 21 funding rounds since its establishment 17 years ago, indicating strong investor confidence [1][21]
- The company specializes in targeted oncology and leverages rapid antibody screening and precise delivery technologies to address the limitations of traditional cancer treatments [1][2]
Technology and Innovation
- Immunome's competitive edge lies in its Memory B cell technology and Targeted Effector platform, which enhance antibody affinity and specificity while minimizing off-target damage [3][4]
- The Memory B cell platform uses patient-derived memory B cells to discover antibodies that are naturally equipped to target tumor-specific antigens, which is particularly useful for resistant tumors [5]
- The Targeted Effector platform allows modular design to optimize drug delivery and efficacy, significantly improving the therapeutic index compared to traditional methods [6][7]
Product Pipeline
- Immunome's pipeline includes three clinical-stage products: Varegacestat (AL102), IM-1021 (ROR1 ADC), and IM-3050 (RLT, radioligand therapy) [8]
- Varegacestat is an oral γ-secretase inhibitor targeting rare sarcomas, showing a 64% objective response rate in Phase II trials [9]
- IM-1021 is designed to overcome resistance in solid tumors, using a high drug-to-antibody ratio (DAR8) to enhance efficacy [11]
- IM-3050 targets FAP-positive cancer-associated fibroblasts, demonstrating significant potential in overcoming tumor microenvironment barriers [16][17]
Financial Performance
- In 2024, Immunome reported revenues of $9.04 million, reflecting a 35.5% year-over-year decline, although losses have narrowed compared to 2023 [23]
- The financing structure is heavily weighted towards post-IPO funding, which poses dilution risks for existing shareholders [22]
100+ autonomous driving datasets: here are 5 you really ought to know
自动驾驶之心· 2025-06-22 01:35
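Autonomous driving technology is growing ever more popular, and developer datasets keep emerging. "自动驾驶之心" has collected more than 100 high-quality autonomous driving datasets, providing rich material for beginners and engineers. This article introduces only 5 of them, covering tasks from perception (object detection, segmentation) to visual odometry. Whether you are a newcomer or a research engineer, these 5 datasets are worth your attention; for more resources, join the community to get the full materials! That said, this article covers only a small part of the resources available in the "自动驾驶之心" community. To get details on all 100+ datasets and to exchange ideas in real time with like-minded peers in the industry, join the "自动驾驶之心" Knowledge Planet (知识星球) community! 1. KITTI dataset. The KITTI dataset is one of the most classic and widely used benchmark datasets in autonomous driving. Its data was collected in street environments in Karlsruhe by a vehicle carrying high-precision sensors (stereo color/grayscale cameras, a Velodyne 3D LiDAR, GPS/IMU, and so on). The dataset includes annotations for multiple perception tasks, including stereo vision, optical flow, visual odometry, and 3D object detection and tracking (for example, image sequences and 3D object trajectories). Its rich urban, highway, and rural scenes let KITTI serve for evaluating the performance of in-vehicle vision algorithms ...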
More Americans are claiming Social Security early despite drawbacks: Tips to prepare for retirement
Yahoo Finance· 2025-06-21 14:01
The number of Americans claiming Social Security benefits is on the rise, with the number of claims between January and May nearly 18% higher than the same time one year ago. And there could be risk to retirees' overall earnings if the trend continues. Joining me now in studio is Larry Sprung, the founder of Mitlin Financial. So Larry, this is really interesting. We're seeing this surge in Americans claiming Social Security benefits at an early age, 62, highest in over two decades. What's driving this ...
X @Bankless
Bankless· 2025-06-20 15:38
→ Ethereum L1 is the world ledger. → ETH is the world reserve asset. https://t.co/a9A25W9yoy ...
OpenAI's approach is questioned; Meta researcher says it simply cannot build superintelligence
36Kr· 2025-06-20 12:00
Core Insights
- The pursuit of "superintelligence" represents a significant ambition among leading AI companies like Meta, OpenAI, and Google DeepMind, with substantial investments being made in this direction [1][3][4]
- Sam Altman of OpenAI suggests that building superintelligence is primarily an engineering challenge, indicating a belief in a feasible path to achieve it [3][4]
- Meta AI researcher Jack Morris argues that the current approach of using large language models (LLMs) and reinforcement learning (RL) may not be sufficient to construct superintelligence [1][2]
Group 1: Current Approaches and Challenges
- Morris outlines three potential methods for building superintelligence: purely supervised learning (SL), RL from human validators, and RL from automated validators [2]
- The integration of non-text data into models is believed not to enhance overall performance, as human-written text carries intrinsic value that sensory inputs do not [2][6]
- The concept of a "data wall" or "token crisis" is emerging, where the availability of text data for training LLMs is becoming a concern, leading to extensive efforts to scrape and transcribe data from various sources [8][19]
Group 2: Learning Algorithms and Their Implications
- The two primary learning methods identified for potential superintelligence are SL and RL, with SL being more stable and efficient for initial training [10][22]
- The hypothesis that superintelligence could emerge from SL alone is challenged by the limitations of current models, which may not exhibit human-level general intelligence despite excelling in specific tasks [15][16]
- The combination of SL and RL is proposed as a more viable path, leveraging human feedback or automated systems to refine model outputs [20][22][28]
Group 3: Future Directions and Speculations
- The potential for RL to effectively transfer learning across various tasks remains uncertain, raising questions about the scalability of this approach to achieve superintelligence [34]
- The competitive landscape among AI companies is likely to intensify as they seek to develop the most effective training environments for LLMs, potentially leading to breakthroughs in superintelligence [34]
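A breakthrough in open-world mobile manipulation! The first multimodal agent for indoor mobile grasping debuts; the fine-tuned model reaches 90% zero-shot action accuracy in real environments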
机器之心· 2025-06-20 11:59
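In home service robotics, getting a robot to understand natural-language instructions in open environments, dynamically plan its path of action, and execute operations precisely has long been a core challenge for academia and industry. Recently, a research team from Shanghai AI Laboratory together with the National University of Singapore, the University of Hong Kong, and other institutions proposed the "OWMM-Agent" embodied agent, the first multimodal agent (VLM Agent) architecture designed specifically for open-world mobile manipulation (OWMM). It is the first to unify the modeling of global scene understanding, robot state tracking, and multimodal action generation. The work also uses a simulator to synthesize agent trajectory data and fine-tunes OWMM-VLM, a multimodal large model for this task; in real-environment tests, the model reaches 90% zero-shot single-step action prediction accuracy. Paper: https://arxiv.org/pdf/2506.04217 GitHub: https://github.com/HHYHRHY/OWMM-Agent 1. Background: mobile grasping under open semantics. When handling open instructions such as "clear the table and put the fruit back in the bowl" in a home setting, traditional mobile grasping robots often rely on a pre-built 3D reconstruction of the scene or a semantic map, which is time-consuming and copes poorly with dynamic environments. The core difficulties of the OWMM task are: 2. OWMM-Agent: using a VLM to rebuild the robot's "brain ...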
5 REITs To Earn $50,000 With A $573,400 Investment
Seeking Alpha· 2025-06-19 12:15
Our approach has earned us 500+ five-star reviews from satisfied members who are already seeing the benefits. Don't miss out: join now and start maximizing your returns! We invest thousands of hours and over $100,000 annually into researching the most profitable investment opportunities, all to bring you real estate strategies at just a fraction of the cost. There's nothing better than REITs (VNQ), in my opinion, if your goal is to earn high passive income and retire early. He is the leader of the investing gr ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-06-18 22:09
RT JC Christopher (@JohnChr08117285) Here is the 5th time Terique @Muzeishen and I saw a driverless Tesla robotaxi with a trail car! We also saw a Tesla validation robotaxi not far behind this driverless one. And this was away from Congress Avenue going into neighborhoods and office parks. So exciting! @WholeMarsBlog @SawyerMerritt @teslaownersSV ...
GRAIL Announces Positive Top-Line Results From The Galleri® PATHFINDER 2 Registrational Study
Prnewswire· 2025-06-18 13:01
Core Insights
- GRAIL, Inc. announced positive top-line performance and safety results from the PATHFINDER 2 study, which involved 25,578 participants and aimed to evaluate the Galleri multi-cancer early detection test [1][5]
Group 1: Study Results
- The PATHFINDER 2 study demonstrated that adding Galleri to standard cancer screening significantly increased cancer detection rates compared to the previous PATHFINDER study [2]
- In the PATHFINDER study, Galleri had a positive predictive value (PPV) of 43%, specificity of 99.5%, and cancer signal origin (CSO) accuracy of 88%. The PATHFINDER 2 study showed a substantially higher PPV while maintaining consistent CSO accuracy and specificity [3]
Group 2: Safety and Regulatory Aspects
- No serious safety concerns were reported in the PATHFINDER 2 study, indicating a favorable safety profile for the Galleri test [4]
- GRAIL plans to submit the PATHFINDER 2 study results to the U.S. FDA as part of the Galleri premarket approval application, which is currently in process under a Breakthrough Device Designation [5][6]
Group 3: Future Directions
- Detailed results from the PATHFINDER 2 study are expected to be presented at a leading international oncology meeting later this year [6]
- The study aims to evaluate the Galleri test's performance across various measures, including PPV, negative predictive value (NPV), sensitivity, specificity, and CSO prediction accuracy [7]