Reinforcement Learning
Lei Jun: No matter how advanced assisted driving becomes, the human driver is still critical
Sou Hu Cai Jing· 2026-01-03 14:52
In the livestream, Lei Jun emphasized two points: first, he welcomed everyone to test-drive the new version of HAD assisted driving and experience its marked progress; second, no matter how advanced or capable assisted driving becomes, everyone must pay attention to safety, because the human driver is still critical.

According to the company, the Xiaomi HAD Enhanced Edition introduces reinforcement learning and world-model technology, bringing three major improvements. First, in longitudinal behavior, acceleration and braking are handled more smoothly and in a more human-like way, improving the sense of safety. Second, in lateral behavior, the vehicle is more decisive when merging under acceleration, merging under deceleration, and maneuvering around obstacles on narrow roads, planning its path earlier. Third, active safety is upgraded: in addition to the existing AEB automatic emergency braking function, an AES emergency steering assist function has been added.

瑞财经, Liu Zhiying: On the evening of January 3, Xiaomi founder, chairman and CEO Lei Jun kicked off his first livestream of 2026, tearing down a brand-new Xiaomi YU7 on camera; the livestream lasted roughly four to five hours. 瑞财经 covered the entire session. ...
Even with $30 billion you might not "recreate GPT-4"? Yang You of NUS, in his latest long essay, exposes the truth behind the AI growth bottleneck
量子位· 2025-12-31 03:37
Core Viewpoint
- The article discusses the growing anxiety surrounding the "AI bottleneck" as the third anniversary of ChatGPT approaches, questioning whether current technological paradigms can effectively use increased computational power to develop models significantly stronger than GPT-4 [1][2].

Group 1: Nature of Intelligence and Its Measurement
- Intelligence is fundamentally about energy conversion: AI has transformed electricity into reusable intelligence over the past decade, but the efficiency of this conversion is now under scrutiny [6].
- The essence of intelligence is not explanation but prediction, characterized by the ability to forecast future states and bear the consequences of those predictions [7][10].
- Current models derive their intelligence primarily from the pre-training phase, which consumes the most energy and computation, raising questions about whether intelligence will keep growing stably with continued computational investment [15][20].

Group 2: Computational Paradigms and Their Limitations
- The real bottleneck is not that computational growth has stopped, but that the intelligence gained per unit of additional compute is diminishing [22][27].
- The essay challenges the mainstream narrative by noting that pre-training, fine-tuning, and reinforcement learning are all fundamentally gradient computation and parameter updates, rather than distinct methodologies [12][11].
- The success of the Transformer architecture is attributed to its compatibility with GPU systems, which has enabled a stable feedback loop between computational growth, model scaling, and capability enhancement [16][18].

Group 3: Future Directions and Exploration
- Future AI infrastructure should focus on the overall scalability of parallel computing systems rather than just single-chip performance, with an emphasis on maintaining or improving the ratio of computation to communication cost [24][25].
- Proposed exploration directions include higher precision, better optimizers, and more scalable architectures or loss functions, all aimed at ensuring that increased computational investment yields proportional gains in intelligence [25][26].
- The essay concludes that as long as more efficient ways of organizing computation can be found, the upper limits of intelligence are far from being reached [27].
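A point worth unpacking from Group 2 of the essay summarized above is the claim that pre-training, fine-tuning, and reinforcement learning are "fundamentally gradient computation and parameter updates". The toy PyTorch sketch below illustrates that reading: the same compute-loss, backpropagate, update-parameters loop serves both a supervised objective and a REINFORCE-style RL objective. The model, sizes, and random rewards are invented for illustration and are not from the essay.

```python
# Illustrative sketch: pre-training/SFT and RL reduce to the same
# loss -> gradient -> parameter-update machinery; only the loss changes.
import torch
import torch.nn.functional as F

vocab, dim = 100, 32
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab, dim),
    torch.nn.Linear(dim, vocab),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def step(loss):
    opt.zero_grad()
    loss.backward()   # gradient computation
    opt.step()        # parameter update

tokens = torch.randint(0, vocab, (8, 16))   # toy token batch

# 1) Pre-training / supervised fine-tuning: next-token cross-entropy.
logits = model(tokens[:, :-1])
ce = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
step(ce)

# 2) RL (REINFORCE-style): same machinery, loss is now
#    -(reward * log-prob of the sampled actions).
logits = model(tokens[:, :-1])
logp = F.log_softmax(logits, dim=-1)
actions = torch.distributions.Categorical(logits=logits).sample()
reward = torch.randn(actions.shape)          # stand-in scalar rewards
chosen = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
pg_loss = -(reward * chosen).mean()
step(pg_loss)
```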
The most important first step in an L4 data closed loop: choosing the right loss function for the whole organization
自动驾驶之心· 2025-12-31 00:31
Author | Li Zhongli   Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1973693169792213913

Preface: A question I came across prompted me to write an overview article on the data closed loop, and it resonated with a lot of readers, so I am following up with the points I consider most critical (a record of the pitfalls we hit), hoping to offer a different perspective to everyone still working on autonomous driving.

The original question: have the data closed-loop platforms the various companies are building for autonomous driving really closed the loop?

[Data closed loop as the driver of problem solving, part 01] Treat the autonomous driving team as a reinforcement learning model: why did I drop MPI and switch to MPS / MPD as the "loss function"?

With 2025 almost over, let me share some of the pitfalls we ran into while building our data closed loop. A quick introduction first: I have worked on data in the autonomous driving industry for more than seven years, from the earliest days of copying data off industrial PCs with a hard drive to now owning the data closed loop and quality system for an L4 logistics autonomous vehicle line. Over these years I have become more and more convinced of one thing: if you treat the entire autonomous driving org ...
From big-tech designer to a super one-person company: a 6,000-word look back at my 2025 with AI
歸藏的AI工具箱· 2025-12-30 10:34
I never really had the habit of writing year-end reviews. I figured that with the year finally ending, and with Christmas and New Year's coming up, there was nothing to publish and I could take a break.

Let's start with this year's report card as a content creator. I have never really thought of myself as a content creator, never treated myself as one, and am arguably not even a very competent one. I rarely follow big accounts, rarely look at other people's numbers, and rarely look at my own. This has both upsides and downsides:
1. The upside: I don't burn myself out easily and don't develop metrics anxiety
2. The downside: sometimes, precisely because I don't look at the numbers, the numbers end up getting me. Algorithm changes and shifts in the whole ecosystem all require adaptation, and I tend to adapt more slowly than others

But this morning I saw the news that Manus had been acquired, and then Zhipu's IPO, and it felt like a lot really did happen this year. Yesterday I also read the review Qiaomu wrote, which I found meaningful, so I decided to write one after all. I did do a lot this year, and both my life and the AI field changed a great deal, so I decided to write it down.

Over the past year, the main change in my identity was going from a designer at a big tech company to a freelancer (flexibly employed). I have also finally stopped judging myself by the numbers and shifted my attention back to "what am I good at, and what am I willing to do for the long term": staying relaxed while picking up the tempo, and using AI to turn 歸藏 into a super one-person company that can keep publishing, generate income, and help friends in the industry. AI content creation, year ...
A human-level AI star teacher, personalized for each learner, cleaves education's "impossible triangle"
量子位· 2025-12-30 03:57
Jay, from 凹非寺. 量子位 | Official account QbitAI

Take a look: this is a new species of AI application in education. The pacing of the lecture, the tone, the interaction, it all feels remarkably natural. More importantly, it can not only "lecture like a teacher" but also deliver one-on-one personalized instruction to every single learner.

This AI tutor comes from an AI-native application company called 与爱为舞. Since launching at the start of the year, it has provided learning companionship and one-on-one explanation services to users numbering in the millions.

Education has always faced an "impossible triangle" of scale, quality, and cost. Being personalized for every learner, serving millions of students, and being almost indistinguishable from a human teacher is harder still. How exactly does it do it?

What 与爱为舞 uses to cleave this impossible triangle is a hard-core technological sword. AI education needs more than just "answers". Forging this sword takes three core components: model + speech + engineering.

Start with the model: thanks to the scaling of CoT, large models' ability to solve complex problems has grown exponentially, their problem-solving has leapt forward, and they can even win "Olympiad gold medals". To win an Olympiad title, the AI only needs to produce the standard answer. But that is not enough for education. Consider a simple English grammar question: Lily expects _ her grandparents in the countryside next month. A. ...
Hard tech pushes higher and robot stocks run hot: Haozhi Electromechanical gains more than 6%, the Robot ETF Fund (159213) pushes for a fifth straight daily gain after attracting over 63 million yuan in strong inflows over three consecutive days. Is the "golden decade" of humanoid robots beginning?
Sou Hu Cai Jing· 2025-12-30 03:42
On December 30, the Shanghai Composite opened lower, pushed higher, turned positive several times, and held a narrow range around the flat line. Hard tech trended upward through the swings. As of 11:08, the Robot ETF Fund (159213) was up 0.67%, pushing for a fifth consecutive daily gain, with heavy net subscriptions of 20 million yuan during the session; including today, it has drawn more than 63 million yuan of inflows over three consecutive trading days.

[Top ten constituents of the index tracked by the Robot ETF Fund (159213)]

| No. | Code | Name | SWS Level-1 Industry | Change | Est. Weight ▼ |
| --- | --- | --- | --- | --- | --- |
| 1 | 002230 | 科大讯飞 | Computers | -0.18% | 9.96% |
| 2 | 300124 | 汇川技术 | Machinery & Equipment | 0.19% | 9.94% |
| 3 | 601689 | 拓普集团 | Automobiles | 0.81% | 7.71% |
| 4 | 002236 | 大华股份 | Computers | -0.16% | 4.59% |
| 5 | 002008 | 大族激光 | Machinery & Equipment | -0.81% | 4.27% |
| 6 | 688169 | 石头科技 | Home Appliances | -0.74% | 3.86% |
| 7 | ... | | | | |
Bohai Securities Research Institute morning meeting notes (2025.12.30)-20251230
BOHAI SECURITIES· 2025-12-30 02:58
Macro and Strategy Research
- The profit growth rate of industrial enterprises in China has marginally declined by 1.8 percentage points to 0.1% year-on-year for January to November 2025, with November showing a significant drop of 13.1% compared to October, which is a decrease of 7.6 percentage points [4]
- The industrial added value growth rate for November was 4.8%, a slight decrease of 0.1 percentage points from October, influenced by insufficient domestic demand and a high base effect from the previous year [4]
- The revenue profit margin for January to November was 5.29%, down by 2.0% year-on-year, indicating a further expansion of the decline compared to the previous months [4]
- Among 41 industrial sectors, 18 achieved positive profit growth during the same period, with notable growth in sectors such as ferrous metal smelting and processing, non-ferrous metal mining, and high-tech manufacturing [5]

Fund Research
- The market saw a continued inflow of nearly 50 billion yuan into the CSI A500 index, with the ETF market scale reaching a new high of over 6 trillion yuan [7][11]
- The average return for equity funds was 2.69%, with 87.08% of funds reporting positive returns, while bond funds and other categories also showed positive performance [10]
- The ETF market experienced a net inflow of 914.98 billion yuan, with bond ETFs leading the inflow at 599.48 billion yuan [10]

Company Research: WuXi AppTec
- WuXi AppTec is positioned as a leading integrated CRDMO provider, offering end-to-end drug development and manufacturing services, with a focus on continuous development through both organic and inorganic growth strategies [15]
- The CRO industry is thriving due to the high costs and long timelines associated with drug development, leading to increased demand for specialized services [15]
- WuXi Chemistry reported a strong performance in its integrated services, with a significant number of new molecules added to its pipeline, indicating robust growth potential [15]
- The company has streamlined its operations by divesting its clinical services research business, allowing it to focus on core competencies and enhance its service offerings [16]

Industry Research: Light Industry Manufacturing & Textile Apparel
- The Chinese government plans to continue funding support for the "old-for-new" consumption policy in 2026, which has already driven over 2.5 trillion yuan in sales for related products in 2025 [19]
- Retail sales of clothing and footwear saw a year-on-year increase of 3.5% in November, reflecting a positive trend in consumer spending [19]
- The light industry manufacturing sector underperformed compared to the CSI 300 index, indicating challenges in the current market environment [19]
A 10,000-character long read: what pain points remain in VLA architectures and models?
具身智能之心· 2025-12-30 01:11
Editor | 具身智能之心

Our previous roundtable on VLA models and real-robot deployment was very well received across the industry. The platform team has been transcribing the conversation, and today we are sharing the first part, on "VLA architectures and models".

Zhang Qiang: Thank you for the introduction. Hello everyone, I am Zhang Qiang, from the Beijing Humanoid Robot Center. My research direction and background are in humanoid robots, which I have been working on since around 2021, first on the Fourier, GR-1, and Embodied robots, and now on our Tiangong (天工) platform. My main research directions are motion control, VLA, and world models and embodied large models built on humanoid robots. I hope everyone will follow our work. I am very glad to accept 具身智能之心's invitation and to discuss these topics with the other guests today. Thank you!

Host: Great, then let's officially begin. Welcome, everyone, to 具身智能之心's round ...
QwenLong-L1.5 released: one recipe and three key techniques give a 30B MoE model long-text reasoning that rivals GPT-5
机器之心· 2025-12-29 04:44
Core Insights
- The article discusses the challenges faced by large models in long-text reasoning, highlighting issues such as false prosperity in performance metrics and difficulties in multi-hop reasoning tasks [2][3]
- It introduces QwenLong-L1.5, a new model designed to address these challenges through a comprehensive post-training framework that includes data synthesis, reinforcement learning optimization, and memory management [4][32]

Group 1: Challenges in Long-Text Reasoning
- Models often achieve high scores on simple tasks but struggle with complex multi-hop reasoning, revealing limitations in deep understanding [2]
- The training data for long-text tasks is complex and heterogeneous, leading to instability in reinforcement learning algorithms and potential performance degradation [14][16]
- The physical memory limitations of models restrict their ability to process extensive knowledge, necessitating compromises that can result in the loss of critical information [3]

Group 2: QwenLong-L1.5 Model Features
- QwenLong-L1.5 is built on the Qwen3-30B-A3B architecture and aims to provide a systematic solution to long-text reasoning challenges [4]
- The model incorporates a high-quality data synthesis pipeline that generates multi-hop reasoning tasks, enhancing the model's ability to think critically [9]
- It employs a stable and efficient reinforcement learning strategy to address challenges such as distributional drift and credit assignment problems [12][17]

Group 3: Performance Improvements
- QwenLong-L1.5 has shown significant performance improvements, achieving an average score increase of 9.9 points compared to its predecessor [26]
- The gains are particularly evident in complex reasoning tasks, with notable improvements on benchmarks like MRCR and CorpusQA [26][27]
- It demonstrates superior capability on ultra-long tasks, showcasing its potential to process information beyond traditional memory limits [28][29]

Group 4: Conclusion and Open Source
- The combination of data synthesis, reinforcement learning optimization, and memory management in QwenLong-L1.5 provides a validated path for addressing long-text reasoning challenges [32]
- The company encourages open collaboration and sharing of the technology, with relevant details available in the published paper and on GitHub [32]
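To make the "multi-hop data synthesis" idea from Group 2 of the summary above concrete, here is a minimal, hypothetical sketch: it chains two facts through a bridge entity (the founder) and buries the two supporting sentences among distractor text, so answering requires combining evidence scattered across a long context. The names, template, and parameters are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical two-hop long-context QA synthesis (illustration only).
import random

def make_sample(company, founder, birth_year, distractors, n_docs=50, seed=0):
    """Two-hop QA: the answer needs both supporting sentences,
    which are buried among distractor documents."""
    support = [
        f"{company} was founded by {founder}.",
        f"{founder} was born in {birth_year}.",
    ]
    question = f"In which year was the founder of {company} born?"
    rng = random.Random(seed)
    docs = support + rng.sample(distractors, k=n_docs - len(support))
    rng.shuffle(docs)   # hide the evidence at random positions
    return {"context": "\n".join(docs), "question": question, "answer": str(birth_year)}

sample = make_sample(
    "Acme Corp", "Alice Zhang", 1970,
    distractors=[f"Filler sentence number {i}." for i in range(500)],
)
print(sample["question"], "->", sample["answer"])
```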
Agent RL training on a personal computer? Jiaxuan You's team open-sources OpenTinker
机器之心· 2025-12-29 03:04
Abstract: As large models enter the "year of the agent", reinforcement learning (RL) is increasingly recognized as a key technology on the road to general artificial intelligence, yet it has long remained in the ivory tower of a handful of labs. The monolithic design of traditional RL frameworks, their expensive GPU memory overhead, and their complex engineering pipelines have deterred many teams with good ideas.

Recently, the U Lab team led by Prof. Jiaxuan You at UIUC open-sourced OpenTinker, a new "RL-as-a-Service" (RLaaS) system. Through a carefully decoupled architecture and a friendly API, it frees algorithm development from compute constraints, letting more developers launch agent training with very little code, whether at a research institution with GPU clusters or on a CPU-only personal computer.

Preface: challenges and breakthroughs in the post-training era. Entering 2025, the core of competition has shifted from sheer model scale to agents capable of long-horizon decision making, and reinforcement learning is the engine driving this paradigm shift. Yet for most academics, startups, and even some large tech companies, deploying a reliable agent training pipeline is still a grueling engineering battle. The bottleneck in existing RL infrastructure is not only algorithmic; it is an engineering "Achilles' heel": many people understand the theory but struggle to actually get a production-oriented reinforcement learning system running.

The research team comes from the University of Illinois Urbana-Champaign ...
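To illustrate the decoupling idea the OpenTinker summary above describes, where rollout workers are separated from the service that owns the model and does the learning, here is a minimal, hypothetical sketch. The class names, methods, toy environment, and tabular "trainer" are all invented for illustration; this is not OpenTinker's actual API, which should be taken from the project's own documentation.

```python
# Hypothetical sketch of a decoupled rollout/trainer split (illustration only).
from dataclasses import dataclass, field
import random

@dataclass
class Transition:
    obs: int
    action: int
    reward: float

class TrainerClient:
    """Thin interface the rollout worker depends on; in a real RLaaS
    deployment these methods would be RPC/HTTP calls to a remote service."""
    def act(self, obs: int) -> int: ...
    def submit(self, trajectory: list) -> None: ...

@dataclass
class LocalStubTrainer(TrainerClient):
    """CPU-only stand-in with a tabular 'policy', so the example runs anywhere."""
    prefs: dict = field(default_factory=dict)

    def act(self, obs: int) -> int:
        return self.prefs.get(obs, random.randint(0, 1))

    def submit(self, trajectory: list) -> None:
        for t in trajectory:                  # keep actions that earned reward
            if t.reward > 0:
                self.prefs[t.obs] = t.action

def rollout(client: TrainerClient, length: int = 16) -> list:
    traj = []
    for obs in range(length):                 # toy env: the correct action is obs % 2
        a = client.act(obs)
        traj.append(Transition(obs, a, 1.0 if a == obs % 2 else 0.0))
    return traj

client = LocalStubTrainer()
for _ in range(20):                           # decoupled loop: collect locally, submit for learning
    client.submit(rollout(client))
print("learned preferences:", client.prefs)
```

The design point this sketch tries to convey is that the rollout side only needs a narrow client interface; swapping the local stub for a networked client changes the backend (CPU laptop vs. GPU cluster) without touching the agent loop.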