Reinforcement Learning
Highlights from Yijian Auto's interview with Xiaomi's Chen Guang ...
自动驾驶之心· 2025-12-26 01:56
The following article is from 一见Auto (Yijian Auto), by Yi Silin. Yijian Auto covers ambition, methodology, and the new order in automotive competition; it is the auto-reporting brand of 21st Century Business Herald. Author: Yi Silin. Source: "Jian Tan | Xiaomi's Chen Guang: we don't want to manufacture technology anxiety anymore." Li Auto's intelligent-driving team has pivoted fully from end-to-end + world model to VLA (Vision Language Action), introducing a large language model (LLM) into the algorithm architecture. Driving-software supplier 元戎启行 (DeepRoute.ai) has likewise committed to VLA. The industry also has firm VLA opponents. Huawei says it will not move toward VLA and will instead commit to WA (World Action, world model). XPeng, like Huawei, is trying to drop the Language component. Amid this debate, end-to-end still shows great potential, and Xiaomi Auto is one company continuing to dig deep in that direction. "Competition is so fierce now that people get anxious and tend to use all kinds of methods or technologies to make users feel more advanced," Chen Guang, head of end-to-end at Xiaomi Auto, told 21 Auto · Yijian Auto. "But whether it's VA, WA, or VLA, to me they are all the same; they all ...
Machine Learning Applications Series: A Decoupled Temporal Contrastive Stock-Selection Model Driven by Reinforcement Learning
Southwest Securities· 2025-12-25 11:40
Quantitative Models and Construction

Model Name: DTLC_RL (Decoupled Temporal Contrastive Learning with Reinforcement Learning)
- **Model Construction Idea**: The model aims to combine the nonlinear predictive power of deep learning with interpretability by decoupling feature spaces, enhancing representations through contrastive learning, ensuring independence via orthogonal constraints, and dynamically fusing the spaces using reinforcement learning[2][11][12]
- **Model Construction Process**:
  - **Feature Space Decoupling**: Three orthogonal latent spaces are constructed to capture market systemic risk (β space), stock-specific signals (α space), and fundamental information (θ space). Each space is equipped with a specialized encoder: a TCN for the β space, a Transformer for the α space, and a gated residual MLP for the θ space[11][12][92]
  - **Contrastive Learning**: Introduced within each space to enhance robustness by constructing positive and negative sample pairs based on return similarity. The InfoNCE loss maximizes the similarity of positive pairs while minimizing that of negative pairs:
    $$L_{\mathrm{InfoNCE}} = -\mathbb{E}\left[\log \frac{\exp\left(f(x)^{\top} f(x^{+})/\tau\right)}{\exp\left(f(x)^{\top} f(x^{+})/\tau\right) + \sum_{i=1}^{N-1} \exp\left(f(x)^{\top} f(x_{i}^{-})/\tau\right)}\right]$$
    where \(f(x)\) is the feature representation, \(x^+\) is the positive sample, \(x_i^-\) are the negative samples, and \(\tau\) is the temperature parameter[55][56]
  - **Orthogonal Constraints**: A loss term ensures the outputs of the three spaces are statistically independent, reducing multicollinearity and enhancing interpretability[12][104]
  - **Reinforcement Learning Fusion**: A PPO-based reinforcement learning mechanism dynamically adjusts the weights of the three spaces based on market conditions.
    The reward function includes components for return correlation, weight stability, and weight diversification:
    $$r_{t} = R_{t}^{IC}\big(\widehat{y}_{t}, y_{t}\big) + \lambda_{s} R_{t}^{stable} + \lambda_{d} R_{t}^{div}$$
    The PPO optimization uses GAE advantage estimation and a clipped policy loss:
    $$L^{CLIP} = \mathbb{E}\left[\min\big(rA,\ \mathrm{clip}(r, 1-\varepsilon, 1+\varepsilon)A\big)\right]$$[58][120][121]
- **Model Evaluation**: The DTLC_RL model demonstrates strong predictive power and interpretability, with dynamic adaptability to market conditions[2][12][122]

Model Name: DTLC_Linear
- **Model Construction Idea**: A baseline model for comparison, using a linear layer to fuse the three feature spaces[98][100]
- **Model Construction Process**: The encoded information from the three spaces is concatenated and passed through a linear layer with a Softmax activation to generate fusion weights. The model is trained with a multi-task loss, including IC maximization, the contrastive learning loss, and the orthogonal constraints[98][104]
- **Model Evaluation**: Provides a benchmark for evaluating the contribution of reinforcement learning in DTLC_RL[98][103]

Model Name: DTLC_Equal
- **Model Construction Idea**: A simpler baseline that weights the three feature spaces equally, with no dynamic adjustment[98]
- **Model Construction Process**: The outputs of the three spaces are directly averaged to generate predictions[98]
- **Model Evaluation**: Serves as a control group to assess the benefits of dynamic weighting in DTLC_RL[98][103]

---

Model Backtesting Results

DTLC_RL
- **IC**: 0.1250[123]
- **ICIR**: 4.38[123]
- **Top 10% Portfolio Annualized Return**: 34.77%[123]
- **Annualized Volatility**: 25.41%[123]
- **IR**: 1.37[123]
- **Maximum Drawdown**: 40.65%[123]
- **Monthly Turnover**: 0.71X[123]

DTLC_Linear
- **IC**: 0.1239[105]
- **ICIR**: 4.25[105]
- **Top 10% Portfolio Annualized Return**: 32.95%[105]
- **Annualized Volatility**: 24.39%[105]
- **IR**: 1.35[105]
- **Maximum Drawdown**: 35.94%[105]
- **Monthly Turnover**: 0.76X[105]

DTLC_Equal
- **IC**: 0.1202[105]
- **ICIR**: 4.06[105]
- **Top 10% Portfolio Annualized Return**: 32.46%[105]
- **Annualized Volatility**: 25.29%[105]
- **IR**: 1.28[105]
- **Maximum Drawdown**: 40.65%[105]
- **Monthly Turnover**: 0.71X[105]

---

Quantitative Factors and Construction

Factor Name: Beta_TCN
- **Factor Construction Idea**: Captures market systemic risk by quantifying stock sensitivity to common risk factors such as macroeconomic fluctuations and market sentiment[67]
- **Factor Construction Process**:
  - Five market-related features are selected, including beta to market returns, volatility sensitivity, liquidity beta, size exposure, and market-sentiment sensitivity[72]
  - A TCN encoder processes 60-day time series, using dilated causal convolutions to capture short- and medium-term trends. The output is a 32-dimensional vector of systemic-risk features[68]
- **Factor Evaluation**: Demonstrates moderate stock-selection ability and effectively captures market-related information[73]

Factor Name: Alpha_Transformer
- **Factor Construction Idea**: Extracts stock-specific alpha signals from price-volume time series[76]
- **Factor Construction Process**:
  - Thirteen price-volume features are encoded using a multi-scale Transformer, with separate layers for short-, medium-, and long-term information.
    Outputs are fused using a gated mechanism and passed through a fully connected layer for return prediction[77][78]
- **Factor Evaluation**: Exhibits strong predictive power and stock-selection ability, with relatively low correlation to market benchmarks[81][82]

Factor Name: Theta-ResMLP
- **Factor Construction Idea**: Focuses on fundamental information to assess financial safety margins and risk resistance[88]
- **Factor Construction Process**: Eight core financial indicators, including PE, PB, ROE, and dividend yield, are encoded using a gated residual MLP. The architecture includes an input projection, gated residual blocks, and a final output layer[92]
- **Factor Evaluation**: Provides stable stock-selection performance with lower turnover and drawdown than the other spaces[95][96]

---

Factor Backtesting Results

Beta_TCN
- **IC**: 0.0969[73]
- **ICIR**: 3.73[73]
- **Top 10% Portfolio Annualized Return**: 27.73%[73]
- **Annualized Volatility**: 27.19%[73]
- **IR**: 1.02[73]
- **Maximum Drawdown**: 45.80%[73]
- **Monthly Turnover**: 0.79X[73]

Alpha_Transformer
- **IC**: 0.1137[81]
- **ICIR**: 4.19[81]
- **Top 10% Portfolio Annualized Return**: 32.66%[81]
- **Annualized Volatility**: 23.04%[81]
- **IR**: 1.42[81]
- **Maximum Drawdown**: 27.59%[81]
- **Monthly Turnover**: 0.83X[81]

Theta-ResMLP
- **IC**: 0.0485[95]
- **ICIR**: 1.87[95]
- **Top 10% Portfolio Annualized Return**: 23.88%[95]
- **Annualized Volatility**: 23.96%[95]
- **IR**: 0.99[95]
- **Maximum Drawdown**: 37.41%[95]
- **Monthly Turnover**: 0.41X[95]
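The InfoNCE objective and the composite fusion reward above can be sketched in NumPy. This is an illustrative sketch, not the report's implementation; the temperature and λ values are assumptions:

```python
import numpy as np

def infonce_loss(f_x, f_pos, f_negs, tau=0.1):
    """InfoNCE loss for one anchor.

    f_x, f_pos: (d,) representations of the anchor and its positive pair;
    f_negs: (N-1, d) representations of the negative samples;
    tau: temperature (the value here is an assumption, not the report's).
    """
    pos = np.exp(f_x @ f_pos / tau)
    neg = np.exp(f_negs @ f_x / tau).sum()
    return -np.log(pos / (pos + neg))

def fusion_reward(ic, stability, diversity, lam_s=0.1, lam_d=0.1):
    """Composite reward r_t = IC term + lambda_s * stability + lambda_d * diversity
    (lambda values are illustrative)."""
    return ic + lam_s * stability + lam_d * diversity
```

Lower InfoNCE values mean the anchor sits closer to its positive pair than to the negatives in the learned space, which is exactly what the contrastive stage rewards.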
A Physical Intelligence insider's notes (from data collection to VLA to RL)
自动驾驶之心· 2025-12-25 09:33
The following article is from 具身智能之心, by 具身智能之心. Original link: https://vedder.io/misc/state_of_robot_learning_dec_2025.html This post studies a blog written by a Physical Intelligence (PI) insider describing the current state of robot learning. It is genuine front-line experience; many practitioners will recognize what it says. It is candid, high quality, and worth a careful read, and its views on IL, DAgger, and RL are all first-hand. Essentially all robot-learning systems today (December 2025) are pure behavior cloning (BC, also called imitation learning) systems. Humans provide (near-)optimal task demonstrations, and the machine-learning model ...
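The behavior-cloning setup described above reduces policy learning to supervised learning on demonstration pairs. A minimal linear-policy sketch in NumPy (illustrative only; this is not PI's stack, and real systems use large neural policies):

```python
import numpy as np

# Behavior cloning: fit a policy to (observation, action) pairs taken
# from human demonstrations, treating it as plain supervised regression.
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 4))       # demonstration observations
W_expert = rng.normal(size=(4, 2))    # hypothetical "expert" mapping
actions = obs @ W_expert              # demonstrated (near-optimal) actions

# Least-squares fit of the cloned linear policy pi(o) = o @ W
W_clone, *_ = np.linalg.lstsq(obs, actions, rcond=None)
mse = float(np.mean((obs @ W_clone - actions) ** 2))
```

On states covered by the demonstrations the clone matches the expert almost exactly; the blog's point is what happens off that distribution, where pure BC has no training signal.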
Xiaomi's Chen Guang: we don't want to manufacture technology anxiety anymore
In 2025, the intelligent-driving industry developed a case of "terminology overload": VLA, VA, WA and more, splitting into camps and fueling constant debate. Li Auto's intelligent-driving team pivoted fully from end-to-end + world model to VLA (Vision Language Action), introducing a large language model (LLM) into the algorithm architecture. Driving-software supplier 元戎启行 (DeepRoute.ai) has likewise committed to VLA. The industry also has firm VLA opponents. Huawei says it will not move toward VLA and will instead commit to WA (World Action, world model). XPeng, like Huawei, is trying to drop the Language component. Amid this debate, end-to-end still shows great potential, and Xiaomi Auto is one company continuing to dig deep in that direction. "Competition is so fierce now that people get anxious and tend to use all kinds of methods or technologies to make users feel more advanced," Chen Guang, head of end-to-end at Xiaomi Auto, told 21 Auto · Yijian Auto. "But whether it's VA, WA, or VLA, to me they are all the same; it all comes down to maximizing the model's intelligence density." Among today's leading new-energy players, Xiaomi Auto started end-to-end R&D relatively late. In 2024, Xiaomi formally consolidated an internal "End-to-End Algorithms and Features Department" responsible for production development, at least three months behind Li Auto and NIO. But Xiaomi has caught up fast: this February it pushed its 3-million-clip end-to-end stack (HAD) to all users, and in July it again ...
A condensed version of Li Auto's MindGPT-4o-Vision technical report
自动驾驶之心· 2025-12-25 03:24
Author: 理想TOP2. Source: 理想TOP2. Original link: 理想MindGPT-4o-Vision技术报告压缩版. On December 2, 2025, Li Auto released the MindGPT-4ov technical report. Link: https://arxiv.org/abs/2512.02895 The trade-off between general capability and vertical-domain adaptation: migrating a general multimodal large language model (MLLM) to a vertical application faces two main tensions. Catastrophic forgetting: injecting domain-specific knowledge tends to degrade the model's original general understanding capabilities. Lack of a systematic post-training methodology: existing approaches either ignore data quality and cost control, or sacrifice base capabilities and user experience while optimizing domain skills, lacking a full-pipeline engineering solution covering data production, training, and deployment. Three key inefficiencies and biases in current multimodal training: coarse resource allocation, where traditional data-synthesis methods treat all data equally, ignoring differences in information density, so high-value data is under-mined and low ...
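One common mitigation for the catastrophic forgetting described above (a standard recipe in the field, not necessarily MindGPT-4ov's method) is to mix general-domain "replay" samples into the vertical-domain fine-tuning stream. A minimal sketch, where the ratio and data names are illustrative assumptions:

```python
import random

def mixed_batch(domain_data, general_data, batch_size=8, replay_ratio=0.25, seed=0):
    """Assemble a fine-tuning batch that interleaves general-domain 'replay'
    samples with vertical-domain samples, so the model keeps seeing the
    distribution it was pretrained on while learning domain skills."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_ratio)
    batch = rng.sample(general_data, n_replay)
    batch += rng.sample(domain_data, batch_size - n_replay)
    rng.shuffle(batch)
    return batch

# Hypothetical sample names, for illustration only
batch = mixed_batch([f"auto_qa_{i}" for i in range(100)],
                    [f"general_{i}" for i in range(100)])
```

Tuning `replay_ratio` trades domain gains against retention of general capability; the report's own pipeline is more elaborate than this.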
Dwarkesh's latest podcast: a year-end review of AI progress
36Kr · 2025-12-24 23:15
Core Insights
- Dwarkesh's podcast features prominent AI figures Ilya Sutskever and Andrej Karpathy, indicating his significant standing in the AI community [1]
- The article summarizes Dwarkesh's views on AI advancements, particularly regarding the timeline for achieving AGI [1]

Group 1: AI Development and AGI Timeline
- The focus on "mid-training" using reinforcement learning is seen as evidence that AGI is still far off, as it suggests models lack strong generalization capabilities [3][16]
- The idea of pre-trained skills is questioned, as human labor's value lies in the ability to flexibly acquire new skills without heavy training costs [4][24]
- AI's economic diffusion lag is viewed as an excuse for insufficient capabilities, rather than a natural delay in technology adoption [27][28]

Group 2: AI Capabilities and Limitations
- AI models currently lack the ability to fully automate even simple tasks, indicating a significant gap in their capabilities compared to human workers [25][30]
- The adjustment of standards for AI capabilities is acknowledged as reasonable, reflecting a deeper understanding of intelligence and labor complexity [31]
- The scaling laws observed in pre-training do not necessarily apply to reinforcement learning, with some studies suggesting a need for a million-fold increase in computational power to achieve similar advancements [10][33]

Group 3: Future of AI and Continuous Learning
- Continuous learning is anticipated to be a major driver of model capability enhancement post-AGI, with expectations for preliminary features to emerge within a year [13][40]
- Achieving human-level continuous learning may take an additional 5 to 10 years, indicating that breakthroughs will not lead to immediate dominance in the field [14][41]
- The potential for an explosion in intelligence once models reach human-level capabilities is highlighted, emphasizing the importance of ongoing learning and adaptation [36]

Group 4: Economic Implications and Workforce Integration
- The integration of AI labor into enterprises is expected to be easier than hiring human workers, as AI can be replicated without the complexities of human recruitment [29]
- The current revenue gap between AI models and human knowledge workers underscores the distance AI still has to cover in terms of capability [30]
- The article suggests that if AI models truly reached AGI levels, their economic impact would be profound, with businesses willing to invest significantly in AI labor [29]
GPT-5 panned as no progress? Epoch's year-end report pushes back: AI is advancing at full speed, and ASI is closer
36Kr · 2025-12-24 11:17
Core Insights
- The core message of the article is that AI development has accelerated rather than stagnated, with significant advancements in capabilities observed in recent months [7][10]

Group 1: AI Model Performance
- Epoch AI tested several open-source Chinese models on FrontierMath, revealing that they lagged behind top global AI models by approximately seven months [1]
- The only model to score was DeepSeek-V3.2, achieving a score of about 2% [4]
- While top models like GPT and Gemini performed well on traditional math tests, their accuracy on FrontierMath was still low, indicating that all AI models struggle with complex mathematical problems [5][6]

Group 2: AI Capability Growth
- The Epoch Capabilities Index (ECI) indicates that AI capability growth has accelerated since April 2024, nearly doubling the previous growth rate [10]
- Contrary to perceptions that AI progress has slowed since the release of GPT-4, data shows that advancements continue, particularly in reasoning abilities rather than just increasing model size [12]

Group 3: Cost and Accessibility of AI
- The cost of AI reasoning has dramatically decreased, with token prices dropping over tenfold from April 2023 to March 2025, making AI more accessible to a broader audience [19]
- High-performance AI models can now run on consumer-grade hardware, suggesting that advanced AI capabilities will soon be widely available [22]

Group 4: Research and Development Trends
- A significant portion of OpenAI's computational resources in 2024 is allocated to experiments rather than direct training or inference, highlighting the experimental nature of current AI development [25][28]
- NVIDIA's AI computing power has been doubling approximately every ten months since 2020, indicating rapid growth in the hardware necessary for AI advancements [29]
Group 5: Insights on AI's Future Impact
- Epoch AI suggests that the majority of AI's value may come from automating routine tasks across the economy rather than solely from accelerating research and development [49]
- The potential for AI to transform industries may occur gradually over years or decades, rather than through sudden breakthroughs [52]
Gathering in Hong Kong! Robotics industry heavyweights speak out
China Fund News (中国基金报) · 2025-12-24 10:41
[Lead] Gathering in Hong Kong: robotics and AI industry leaders explore new directions for future development. On the afternoon of December 20, 2025, during the first Hong Kong International AI Art Festival, the 2025 Robotics Industry and AI Investment Forum, "Breaking Boundaries, Fusing the New, Investing in the Future", was successfully held. At the forum, UBTECH Chief Brand Officer Tan Min, AMD Greater China VP of Sales Chao Yaxin, Xuntu Technology founder and CEO Li Luodan, Shanghai Lingjing Zhiyuan CEO Sun Bo, Iluvatar CoreX (天数智芯) deputy GM of edge products Song Yuanying, and Kaiyuan Securities chief machinery analyst Meng Pengfei discussed AI's disruptive impact and the outlook for robotics technology under the theme "ecosystem co-building, fusing tradition and innovation". 01 Is the humanoid form a necessary vehicle for general artificial intelligence? Meng Pengfei: China has become the world's largest market for both industrial and service robots. The 2022 debut of Musk's first-generation Optimus showed us the potential of humanoid robots as an epoch-making product. Humanoids are not only key to closing the AI loop; they can also push human civilization and technology forward. 2026 is an important node, with humanoid robots approaching mass production. May I ask: where do humanoid robots stand today in technology and industry development? And must robots take humanoid form to achieve general artificial intelligence? Tan Min: The humanoid is not the only form, but it is the best form for embodied intelligence. General artificial intelligence needs a general data base, and non-humanoid robots struggle to be fully general. Humanoid robots can collect real-world data and build world models, for Phy ...
The industry's first RL+VLA roundup: how is reinforcement learning pushing VLA toward the real world?
自动驾驶之心· 2025-12-24 09:22
MindDrive WAM-Diff

Paper title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Paper link: https://arxiv.org/abs/2512.13636
Project page: https://xiaomi-mlab.github.io/MindDrive/
Institutions: Huazhong University of Science and Technology; Xiaomi Auto

One-sentence summary: To address inefficient exploration of continuous action spaces during online reinforcement learning for VLA models, MindDrive uses a dual-expert (decision expert + action expert) architecture that converts the action space into a discrete language-decision space, enabling efficient online RL training.

Core contributions:
- A dual LoRA-adapter architecture: the decision expert handles scene reasoning and language decisions, while the action expert maps decisions to feasible trajectories, establishing a dynamic language-to-action mapping.
- An online closed-loop RL framework built on the CARLA simulator, using sparse rewards with the PPO algorithm plus KL regularization to avoid catastrophic forgetting.
- On the Bench2Drive benchmark, a lightweight Qwen-0.5B model achieves a driving score of 78.04 and a 55.09% success rate, surpassing same-scale SOTA models.
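The "PPO plus KL regularization" recipe above can be sketched as a clipped surrogate with a KL penalty toward a frozen reference policy. This is a generic illustration of the technique, not MindDrive's code; the ε and β values are assumptions:

```python
import numpy as np

def kl_ppo_loss(ratio, adv, logp_new, logp_ref, eps=0.2, beta=0.05):
    """PPO clipped surrogate plus a KL penalty toward a frozen reference
    policy, the standard way to RL-finetune a language policy while
    limiting drift from its base model.

    ratio: pi_new(a|s) / pi_old(a|s); adv: advantage estimates (e.g. GAE);
    logp_new / logp_ref: log-probs under the current and reference policies.
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    surrogate = np.minimum(ratio * adv, clipped).mean()
    kl = (logp_new - logp_ref).mean()   # Monte-Carlo KL estimate
    return -surrogate + beta * kl       # minimized by the optimizer
```

The clip bounds how far one update can push the policy, while the KL term anchors the finetuned model to the pretrained VLA, the role the paper assigns to its regularizer.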
How does SD navigation information land in autonomous driving?
自动驾驶之心· 2025-12-23 00:53
Recently I discussed with industry experts how SD navigation information is applied in autonomous driving; sharing the takeaways:

Map vendors' navigation data (SD/SD Pro) is already used in many production systems. Navigation provides lanes, coarse-grained waypoints, and similar information, effectively giving the driver a rough global and local view, so feeding navigation data into the on-vehicle model follows naturally. The navigation module currently has two core responsibilities:

1. Lane-level global route planning: search for an optimal lane sequence toward the target lane;
2. Provide explicit semantic guidance to behavior planning, so the vehicle can prepare lane changes, deceleration, and yielding in advance.

There is also another very important part: providing the reference line, which downstream planning and control strongly needs. With a reference line, planning pressure drops sharply; the vehicle effectively already has a reference path to drive and only needs to refine it. Beyond that, navigation can also supply planning constraints and priorities, route monitoring, and replanning.

The specifics of ego localization, road-structure construction, and perception-to-map matching were illustrated in a figure (not reproduced here).

In the two-stage approach, navigation is fed into the perception model, which outputs a navi path; the navi path is then an input to the ML planner, which predicts the ego trajectory. In the one-stage framework, SD ...
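The reference-line idea above can be sketched by resampling coarse SD waypoints into an evenly spaced polyline that a planner can refine. An illustrative sketch only; production stacks fit smooth curves and fuse the map with perception:

```python
import numpy as np

def reference_line(waypoints, step=1.0):
    """Resample coarse SD navigation waypoints (N, 2) into an evenly spaced
    reference line by arc-length linear interpolation."""
    wp = np.asarray(waypoints, dtype=float)
    seg = np.linalg.norm(np.diff(wp, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    s_new = np.arange(0.0, s[-1] + 1e-9, step)        # even spacing along the path
    return np.stack([np.interp(s_new, s, wp[:, 0]),
                     np.interp(s_new, s, wp[:, 1])], axis=1)

# A coarse L-shaped route densified to 1 m spacing
ref = reference_line([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
```

Downstream planning can then work in a lateral-offset frame around `ref`, which is why having the reference line "extremely reduces planning pressure", as the discussion above puts it.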