RL

X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-06-23 02:49
RT Tesla Owners Silicon Valley (@teslaownersSV): Stepping into a driverless Tesla Robotaxi https://t.co/k5ABLAA7QT ...
Morgan Stanley: Mindray Medical_Off the Beaten Path
Morgan· 2025-06-23 02:10
June 18, 2025 05:02 PM GMT. Medical Technology | North America. Off the Beaten Path. In this note we flag each week a handful of news drops that we think may have been missed or piqued our interest. [EW, MDT] Real-World Data in EMEA Favoring SAVR vs. TAVR in Younger Patients: A new study in Italy suggests that SAVR may be associated with more favorable mortality outcomes in intermediate surgical risk patients than TAVR, with curves decently below from about 1 year onwards (see below), which in theory is negativ ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-06-23 00:10
Stepping into a driverless Tesla Robotaxi https://t.co/k5ABLAA7QT ...
Free Diving | 60 Minutes Archive
60 Minutes· 2025-06-22 22:55
60 Minutes rewind. Ever tried to hold your breath for as long as you can? Most of us have. Well, this is a story about people who hold their breath for a really long time and dive down to depths never even approached before. The sport is called free diving, and it involves going down hundreds of feet into cold and dark waters on one single breath. It's considered an extreme sport because it's very dangerous. It's an experimental sport because it's revealing human capabilities which had never even been imagine ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-06-22 17:27
RT Tesla Owners Silicon Valley (@teslaownersSV): Seeing driverless Cars will become a normal thing.🤖🚖 are here @anuarbekiman https://t.co/xy5I7gBRmi ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-06-22 15:19
Seeing driverless Cars will become a normal thing.🤖🚖 are here @anuarbekiman https://t.co/xy5I7gBRmi ...
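In LLM reinforcement learning, is DPO still the little brother next to PPO?
自动驾驶之心 (Heart of Autonomous Driving)· 2025-06-22 14:09
Author: hzwer (黄哲威) | Editor: 自动驾驶之心. Original link: https://zhuanlan.zhihu.com/p/696732944 Paper: https://arxiv.org/pdf/2404.10719v2 This is a new paper from April; the first author's affiliation is Tsinghua. It has three main parts: 1. In both theory and experiments, DPO may have fundamental flaws. 2. It studies several key factors behind PPO's gains. 3. Experiments confirm that PPO can crush DPO on hard tasks (competitive programming) and reach a new SoTA. The paper first points out a situation that puzzles the industry: on most open-source leaderboards DPO holds the leading positions, yet, as is widely known, the best closed-source models, GPT4 and Claude, both use a PPO approach. This naturally raises two questions: 1. Does DPO really have an advantage over PPO? 2. How can PPO also be made to score well on leaderboards? DPO's flaws: When tuning PPO, a common phenomenon is that the language model discovers flaws in the reward model and ...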
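From RLHF and PPO to GRPO and training reasoning models: the reinforcement learning primer you need
机器之心 (Synced)· 2025-06-22 04:26
Selected from unsloth.ai. Author: Unsloth Team. Reinforcement learning (RL) has become one of the indispensable techniques for today's LLMs. From LLM alignment to reasoning-model training to today's agentic reinforcement learning (Agentic RL), you can find RL in almost every corner of the AI field. Recently Unsloth, the team formed by brothers Daniel Han and Michael Han (its namesake open-source project for fine-tuning models has passed 40,000 GitHub stars), released a reinforcement learning tutorial that starts from Pac-Man, walks accessibly from RLHF and PPO to GRPO, and shares tips on training reasoning models with GRPO. Get a full picture of reinforcement learning and how to train your own reasoning model with GRPO. This is a complete guide from beginner to advanced. What you will learn: This article covers everything you need to know about GRPO, reinforcement learning (RL), and reward functions, from beginner to advanced, along with the basics of using GRPO with Unsloth. If you need to learn how to implement GRPO step by step, this guide is worth reading. ❓What is reinforcement learning (RL)? The goal of reinforcement learning is: It's that simple! What "good" and "bad" mean is intricate, "increase" and "decrease" also need careful thought, and even what "outcome" means var ...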
With a pipeline spanning ADC and RLT, this innovative drug company has raised nearly $600 million in total
36Kr· 2025-06-22 01:59
Core Insights
- Immunome completed a $150 million financing round in January 2025, aimed at advancing its core pipeline into clinical translation [1]
- The company announced the first patient dosing in the Phase I clinical trial of its ROR1-targeting ADC (antibody-drug conjugate) IM-1021 in March 2025 [1]
- Immunome's core product pipeline, Varegacestat (AL102), has completed patient enrollment in the Phase III RINGSIDE trial, with topline data expected in the second half of 2025 [1]
Company Overview
- Immunome has raised a total of $598.9 million over 21 funding rounds since its establishment 17 years ago, indicating strong investor confidence [1][21]
- The company specializes in targeted oncology and leverages rapid antibody screening and precise delivery technologies to address the limitations of traditional cancer treatments [1][2]
Technology and Innovation
- Immunome's competitive edge lies in its Memory B cell technology and Targeted Effector platform, which enhance antibody affinity and specificity while minimizing off-target damage [3][4]
- The Memory B cell platform utilizes patient-derived memory B cells to discover antibodies that are naturally equipped to target tumor-specific antigens, particularly useful for resistant tumors [5]
- The Targeted Effector platform allows for modular design to optimize drug delivery and efficacy, significantly improving the therapeutic index compared to traditional methods [6][7]
Product Pipeline
- Immunome's pipeline includes three clinical-stage products: Varegacestat (AL102), IM-1021 (ROR1 ADC), and IM-3050 (RLT) [8]
- Varegacestat is an oral γ-secretase inhibitor targeting rare sarcomas, showing a 64% objective response rate in Phase II trials [9]
- IM-1021 is designed to overcome resistance in solid tumors, utilizing a high drug-to-antibody ratio (DAR8) to enhance efficacy [11]
- IM-3050 targets FAP-positive cancer-associated fibroblasts, demonstrating significant potential in overcoming the tumor microenvironment barriers [16][17]
Financial Performance
- In 2024, Immunome reported revenues of $9.04 million, reflecting a 35.5% year-over-year decline, although losses have narrowed compared to 2023 [23]
- The financing structure is heavily weighted towards post-IPO funding, which poses dilution risks for existing shareholders [22]
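100+ autonomous driving datasets: surely you should know these 5?
自动驾驶之心 (Heart of Autonomous Driving)· 2025-06-22 01:35
Autonomous driving technology keeps heating up, and developer datasets of all kinds keep appearing. 自动驾驶之心 has compiled more than 100 high-quality autonomous driving datasets, offering rich material for beginners and engineers. This article introduces only 5 of them, covering task scenarios from perception (object detection, segmentation) to visual odometry. Whether you are a newcomer or a research engineer, these 5 datasets deserve attention. 1. KITTI dataset: The KITTI dataset is one of the most classic and most widely used benchmark datasets in autonomous driving. Its data was collected in the street environments of Karlsruhe with a vehicle carrying high-precision sensors (stereo color/grayscale cameras, a Velodyne 3D LiDAR, GPS/IMU, and so on). The dataset contains annotations for a range of perception tasks, including stereo vision, optical flow, visual odometry, and 3D object detection and tracking (for example, image sequences and 3D object trajectories). Its rich urban, highway, and rural scenes have made KITTI, for evaluating the performance of in-vehicle vision algorithms, ...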