Talk | Sutton, the father of reinforcement learning, responds to Hinton from afar: today's AI is "short on understanding, long on parameter tuning"
AI科技大本营· 2026-02-13 08:15
Core Viewpoint
- The article emphasizes that AI should not be feared, as it is a natural extension of human intelligence and evolution, and advocates for a decentralized approach to AI governance rather than one based on fear [1][3]

Group 1: Current State of AI
- The current consensus is that AI is advancing rapidly, but this should be critically examined, as the field may not be progressing as significantly as perceived [6][8]
- AI's current capabilities, such as language processing and image generation, are seen as breakthroughs, but they do not represent the essence of intelligence, which lies in understanding and adaptability [7][8]
- The speaker argues that current AI models are "weak minds," lacking true understanding and reliability despite their vast knowledge [8][9]

Group 2: Definition of Intelligence
- Intelligence is defined as the ability to acquire and apply knowledge and skills, emphasizing the centrality of learning [12][13]
- The article critiques mainstream AI's focus on computation and human imitation, suggesting a need for a deeper understanding of intelligence [14]

Group 3: Integrated Science of Mind
- The speaker proposes establishing an Integrated Science of Mind that applies to humans, animals, and machines, highlighting the commonalities among different forms of intelligence [15][16]
- Reinforcement learning (RL) is presented as a foundational approach for this new science, focusing on learning through interaction with the environment [18][20]

Group 4: Transition from Data to Experience
- The article discusses the shift from the "Era of Human Data," in which AI learns from existing human knowledge, to the "Era of Experience," in which AI learns dynamically from interactions with the world [25][27]
- This transition is necessary for AI to create new knowledge rather than merely summarize existing information [26]

Group 5: Principles of Experiential AI
- The principles of experiential AI are based on the exchange of signals (experience) between the agent and the world, which forms the foundation of intelligence [36][38]
- The goal of an intelligent agent is to maximize reward signals, which define truth and objectives [39][40]

Group 6: Future of AI and Society
- The speaker predicts that the future will involve the creation of superintelligent AI and enhanced humans, leading to profound societal changes [44]
- There is a call for decentralized cooperation in AI governance, in contrast with centralized control driven by fear [46]
- The philosophical implication is that AI is a natural progression in the universe's evolution, and humanity's role is to embrace this development with courage and pride [47][48]
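The agent-world signal exchange described above can be sketched as a minimal reward-maximizing loop. Below is an illustrative two-armed bandit in Python; the environment, epsilon-greedy rule, and reward probabilities are toy assumptions for illustration, not anything from the talk:

```python
import random

def run_agent(steps=5000, epsilon=0.1, seed=0):
    """Minimal experiential loop: act, observe a reward signal, update estimates."""
    rng = random.Random(seed)
    true_means = [0.3, 0.7]   # hidden reward probabilities of the two actions
    values = [0.0, 0.0]       # agent's running value estimate per action
    counts = [0, 0]
    total = 0.0
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: values[i])
        # The "world" emits a reward signal in response to the action.
        r = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]   # incremental mean update
        total += r
    return values, total / steps

values, avg_reward = run_agent()
print(values, avg_reward)
```

Nothing here is labeled by a human: the agent's knowledge of which action is better comes entirely from its own stream of experience, which is the distinction Sutton draws between the two eras.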
MiniMax M2.5 officially released, lifting the share price 35%
36Ke· 2026-02-13 04:15
The source material for this article consists of MiniMax's official blog posts and a technology-development timeline compiled by the editor; the text was written by MiniMax 2.5, with the editor deleting only one notable error and adding the day's share-price movement. It can be read as a test of MiniMax's writing ability.
I. Model Positioning and Core Capabilities
II. Technical Framework Analysis: Continuity and Engineering Optimization
2.1 Overall Architecture Design
According to MiniMax's officially published technical information, M2.5 adopts the same mixture-of-experts (MoE) architecture as M2, with a total parameter count of 230 billion but only 10 billion parameters activated at inference time. This "extreme sparsity" design philosophy is the defining trait of the M series, aiming for "small activation, big intelligence" computational efficiency.
From a technology-evolution perspective, M2.5's framework largely carries over M2.1. According to MiniMax's technical-evolution documentation, M2.1 mainly strengthened multilingual programming capabilities, focusing on cross-language logic alignment in complex software engineering; M2.5 builds on this to further optimize performance in programming, tool calling, retrieval augmentation (RAG), and office-productivity scenarios. This indicates no fundamental architectural change in M2.5, but rather engineering updates and capability extensions within the existing framework.
2.2 The Forge Agent-Native Reinforcement Learning Framework
In February 2026, MiniMax officially released its new-generation flagship model M2.5. According to MiniMax's official release ...
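The "230B total, 10B active" trade-off follows from top-k expert routing: each token passes through only the few experts its router selects. A toy routing sketch in Python (the expert count, k, and router scores are illustrative assumptions; MiniMax has not published M2.5's router internals in this excerpt):

```python
import math
import random

def topk_route(logits, k=2):
    """Keep the k highest-scoring experts; softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(idx, exps)]

rng = random.Random(0)
n_experts, k = 64, 2
logits = [rng.gauss(0.0, 1.0) for _ in range(n_experts)]   # toy router scores
routing = topk_route(logits, k)

# Only k of n_experts expert FFNs run for this token, so the active
# expert-parameter fraction is k / n_experts:
active_fraction = k / n_experts
print(routing, active_fraction)
```

At M2.5's reported scale the analogous ratio is roughly 10B of 230B (about 4.3%), which is what makes inference cheap relative to a dense model of the same total size.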
Musk "turns the knife on himself" for Valentine's Day! Self-interest, or a benefit to all humanity?
电动车公社· 2026-02-11 16:06
Core Viewpoint
- Tesla is undergoing significant changes in its Full Self-Driving (FSD) strategy, shifting from a one-time purchase model to a subscription model, which may affect user adoption and revenue generation [2][18][28]

Group 1: FSD Subscription Model Changes
- Elon Musk announced the discontinuation of lifetime FSD transfer rights by March 31, with a new subscription priced at $199 per month; the last opportunity for a one-time purchase at $8,000 falls before Valentine's Day [2][3][5]
- The transition to a subscription model is seen as a strategy to increase revenue, with potential FSD subscription profits estimated at $2 billion annually if the user base grows significantly [26][38]
- The FSD user base is currently limited, at about 1.1 million paying users, a penetration rate of less than 12% [26][28]

Group 2: AI Chip Development
- Tesla is nearing completion of its AI5 chip design, expected to enhance FSD capabilities significantly, with roughly a fivefold performance increase over the previous generation [5][6]
- The company plans to build a new chip factory, TeraFab, with a monthly capacity of 1 million wafers to meet the heavy chip demand of its AI initiatives [11][12]
- The AI5 design focuses on reducing cost and power consumption rather than maximizing raw compute, in line with Tesla's broader strategy of scaling production [7][11]

Group 3: Market Position and Future Prospects
- Tesla's updated mission statement reflects an ambition beyond sustainable energy, aiming to "build an extraordinary world" through AI integration in its vehicles and robots [15][16]
- The company is adapting FSD for the Chinese market, which presents unique challenges due to different traffic conditions and regulations, indicating a long-term market-penetration strategy [61][66]
- The potential for FSD to significantly reduce insurance premiums, highlighted by Lemonade's announcement of a 50% discount for FSD users, underscores the technology's perceived safety advantages [40][41]
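The article's subscription figures can be sanity-checked with back-of-envelope arithmetic (the input quantities come from the article; the derived numbers are an illustrative calculation, not the article's own):

```python
monthly_fee = 199                    # USD per month, per the article
annual_target = 2_000_000_000        # the article's ~$2B/year estimate

# Monthly subscribers implied by $2B/year of subscription revenue:
subscribers_needed = annual_target / (monthly_fee * 12)
print(round(subscribers_needed))     # ≈ 837,521 subscribers

# The $8,000 one-time price expressed in subscription months:
one_time_price = 8_000
breakeven_months = one_time_price / monthly_fee
print(round(breakeven_months, 1))    # ≈ 40.2 months
```

The roughly 40-month break-even horizon helps explain the push toward subscriptions: most owners keep a car longer than that, so recurring billing captures more revenue per vehicle over its life.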
CICC: A Ten-Year Outlook for AI: Key 2026 Trends, Model Technology Edition
中金· 2026-02-11 05:58
Investment Rating
- The report maintains a positive outlook on the AI industry, focusing in particular on advances in large-model technology and their applications across productivity scenarios [2][3]

Core Insights
- In 2025, global large-model capabilities advanced significantly, overcoming challenges in reasoning, programming, and multimodality, although issues such as stability and hallucination rates remain [2][3]
- Looking ahead to 2026, breakthroughs in reinforcement learning, model memory, and context engineering are anticipated, moving from short-context generation to long reasoning-chain tasks and from text interaction to native multimodal capabilities [2][3][4]
- The pre-training scaling law is expected to continue, with flagship models reaching higher parameter counts and intelligence ceilings, driven by NVIDIA's GB-series chips and the adoption of more efficient model architectures [3][4]

Summary by Sections

Model Architecture and Optimization
- The report expects continuation of the Transformer architecture, with consensus around the Mixture of Experts (MoE) approach, which balances performance and efficiency [40][41]
- Various attention mechanisms are being optimized for computational efficiency, with a focus on hybrid approaches that combine different types of attention [49][50]

Model Capabilities
- The report highlights significant improvements in reasoning, programming, agentic capabilities, and multimodal tasks, indicating that large models have reached genuine productivity in various fields [13][31]
- Complex reasoning has improved, with interleaved thinking chains allowing seamless transitions between thought and action [24][28]

Market Dynamics
- Competition among leading global model makers remains intense, with OpenAI, Anthropic, and Gemini pushing the boundaries of model intelligence and exploring AGI [31][32]
- Domestic models are catching up, maintaining a static gap of about six months behind international counterparts, with significant capability gains [32][33]

Future Outlook
- The report anticipates that continuous learning and model memory will address the "catastrophic forgetting" problem, enabling models to adapt dynamically according to task importance [4][5]
- Integrating high-quality data with large-scale compute is crucial for strengthening reinforcement learning, which is expected to play a key role in unlocking advanced model capabilities [3][4]
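The hybrid-attention idea the report mentions, mixing cheap local attention with occasional full attention, can be sketched as mask construction. The window size and 1-in-4 global-layer schedule below are illustrative assumptions, not values from the report:

```python
def sliding_window_mask(seq_len, window):
    """Causal mask: token i attends only to the last `window` positions."""
    return [[(j <= i) and (j > i - window) for j in range(seq_len)]
            for i in range(seq_len)]

def full_causal_mask(seq_len):
    """Standard causal mask: token i attends to all positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def layer_masks(seq_len, n_layers, window=4, global_every=4):
    """Every `global_every`-th layer uses full causal attention; the rest are local."""
    return [full_causal_mask(seq_len) if (l + 1) % global_every == 0
            else sliding_window_mask(seq_len, window)
            for l in range(n_layers)]

masks = layer_masks(seq_len=8, n_layers=8)
attended = [sum(map(sum, m)) for m in masks]
print(attended)   # local layers attend to far fewer positions than full layers
```

Stacking mostly-local layers keeps attention cost near-linear in sequence length, while the periodic full layers preserve long-range information flow, which is the performance/efficiency balance the report describes.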
TTCS, the first test-time co-evolution synthesis framework: breaking through reasoning bottlenecks via self-play
机器之心· 2026-02-10 08:52
Core Insights
- The article introduces the Test-Time Curriculum Synthesis (TTCS) framework, which addresses challenges in Test-Time Training (TTT) by generating curriculum data aligned with the model's capability frontier, thus improving performance on difficult test problems [2][10][30]

Group 1: Motivation and Background
- A core motivation is the field's shift from merely expanding parameters in large language models (LLMs) to leveraging test-time scaling for effective training [5]
- Existing TTT methods struggle with high-difficulty test questions because noisy pseudo-labels lead to ineffective learning [2][7]

Group 2: Methodology
- TTCS operates as a co-evolutionary framework with two agents: a Synthesizer, which generates questions at the model's capability frontier, and a Solver, which attempts to solve them [11][14]
- A capability-adaptive reward mechanism ensures that the generated questions are neither too easy nor too difficult, sustaining a dynamic learning environment [16]

Group 3: Experimental Results
- TTCS delivered significant gains in mathematical reasoning, with Qwen2.5-Math-1.5B reaching an average score of 41.49, up from 17.30, an improvement of +24.19 [3][20]
- On challenging AIME competition problems, TTCS outperformed strong baselines such as TTRL, demonstrating its effectiveness on high-difficulty questions [22][23]

Group 4: Broader Implications
- The framework also generalizes across various reasoning tasks beyond mathematics, indicating that the model learns universal reasoning logic rather than overfitting [22]
- The findings suggest that adaptive teaching (a dynamic Synthesizer) is more effective than static high-capability models, underscoring the value of tailored learning experiences [25][26]

Group 5: Conclusion and Future Outlook
- TTCS represents a reconstruction of the test-time computing paradigm, positioning models as active curriculum designers rather than passive problem solvers [30]
- The framework addresses critical issues of data scarcity and difficulty gaps in test-time training, paving the way for self-evolving agents capable of continuous evolution in unknown environments [30]
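The capability-adaptive reward described above can be sketched as a shaping function that peaks when the Solver's empirical success rate sits near a target difficulty. This is a minimal illustration of the idea only; the exact reward used in the TTCS paper may differ:

```python
def synthesizer_reward(solver_success_rate, target=0.5):
    """Peak reward when questions sit at the Solver's capability frontier.

    success_rate ≈ 1 means too easy, ≈ 0 means too hard; both are
    down-weighted. (Illustrative shaping, not the paper's exact formula.)
    """
    return 1.0 - abs(solver_success_rate - target) / max(target, 1.0 - target)

# The Solver attempts each synthesized question several times; the empirical
# success rate then drives the Synthesizer's reward.
for rate in (0.0, 0.25, 0.5, 1.0):
    print(rate, synthesizer_reward(rate))
```

This is what makes the loop co-evolutionary: as the Solver improves, previously hard questions drift toward a success rate of 1 and stop paying off, pushing the Synthesizer to generate harder ones.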
Reinforcement learning is deciding the ceiling of intelligent driving
36Ke· 2026-02-10 04:45
Core Insights
- The development of intelligent driving is not a linear technological curve but the product of interplay among technical paradigms, engineering constraints, and real-world scenarios [1]
- As the industry moves beyond proof of concept, single technical terms can no longer explain the real differences in capability [2]
- Computing power, data quality, system architecture, and engineering stability now determine the upper and lower bounds of intelligent driving [3]

Group 1: Evolution of Learning Techniques
- Recent discussions reveal that various paths, such as end-to-end, VLA, and world models, converge on reinforcement learning [5]
- Reinforcement learning is shifting from a "technical option" to a "mandatory option" in the industry [7]
- Products such as AlphaGo and ChatGPT have shown that letting AI learn through trial and error is the fastest route of evolution [8][9]

Group 2: Learning Methodologies
- Understanding reinforcement learning requires a grasp of imitation learning, previously the favored approach in intelligent driving [11]
- Imitation learning lets AI learn from human driving data but has limits, such as inheriting bad habits and struggling in unfamiliar situations [14][16]
- Reinforcement learning, as demonstrated by AlphaGo, lets AI explore new strategies through self-play, achieving performance beyond human intuition [17]

Group 3: Reinforcement Learning Mechanisms
- Reinforcement learning works by trial and error, with the model learning to drive well through a feedback loop [26]
- Reward-function design is crucial, since it translates driving performance into quantifiable scores [30]
- Balancing conflicting objectives, such as safety versus efficiency, is central to reward-function design [32]

Group 4: World Models and Advanced Learning
- Integrating world models with reinforcement learning enriches the training environment, allowing AI to simulate real-world scenarios [42][49]
- High-fidelity virtual environments let AI weigh the long-term consequences of actions, improving decision-making [50]
- Coupling world models with reinforcement learning creates a feedback loop that accelerates model iteration and performance [52]

Group 5: Industry Trends and Future Directions
- The importance of data is being redefined, with a shift toward the ability to model the world rather than simply accumulate raw data [56]
- Companies are working to strengthen the "modeling capacity" of their systems, a crucial factor for intelligent driving [60]
- Intelligent driving systems are evolving toward a stage where AI can independently understand environments and refine strategies, a significant advance for the industry [62]
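The reward-design tension described in Group 3, safety versus efficiency versus comfort, is typically handled by scalarizing the competing objectives with weights. A minimal sketch (all terms, weights, and thresholds are illustrative, not from any production driving stack):

```python
def driving_reward(collision, progress_m, jerk, time_gap_s,
                   w_progress=0.01, w_jerk=0.1, w_gap=0.5):
    """Scalarize conflicting driving objectives into one reward signal.

    collision    -> large negative terminal penalty (safety dominates)
    progress_m   -> forward progress this step, rewarding efficiency
    jerk         -> comfort penalty on the magnitude of acceleration change
    time_gap_s   -> penalize following closer than a 1.5 s headway
    """
    if collision:
        return -100.0
    r = w_progress * progress_m
    r -= w_jerk * abs(jerk)
    r -= w_gap * max(0.0, 1.5 - time_gap_s)   # only unsafe gaps are penalized
    return r

print(driving_reward(False, progress_m=25.0, jerk=0.4, time_gap_s=2.0))  # ≈ 0.21
print(driving_reward(False, progress_m=25.0, jerk=0.4, time_gap_s=1.0))  # ≈ -0.04
print(driving_reward(True, 0.0, 0.0, 0.0))                               # -100.0
```

Note how the same efficient maneuver flips from positive to negative reward once the headway drops below the safety threshold; tuning these weights is exactly the balancing act the article describes.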
Training sped up 1.8x, inference overhead cut 78%: precise question selection efficiently accelerates RL training
36Ke· 2026-02-09 10:39
Core Insights
- The article introduces MoPPS, a new framework for model-predictive prompt selection that improves the efficiency of reinforcement-learning fine-tuning for large language models by accurately predicting question difficulty without expensive evaluations by large models [5][26]

Group 1: Training Efficiency
- MoPPS significantly reduces training compute by minimizing reliance on large-model self-evaluation, cutting rollouts by up to 78.46% compared with traditional methods [15][18]
- The framework accelerates training by 1.6x to 1.8x over conventional uniform sampling, ensuring that the most informative questions are selected [16][26]

Group 2: Methodology
- MoPPS uses a lightweight Bayesian model to predict question difficulty, with a Beta distribution estimating each question's success rate and allowing efficient updates from training feedback [8][9]
- The framework uses Thompson sampling for active question selection, balancing exploration and exploitation to identify questions of optimal challenge [10][12]

Group 3: Performance Metrics
- Experiments show a high correlation between predicted and actual question difficulty, demonstrating MoPPS's reliability and effectiveness in training scenarios [19][22]
- The framework is compatible with various reinforcement-learning algorithms and adapts to different sampling strategies, broadening its applicability across training contexts [20][24]

Group 4: Industry Impact
- The research has drawn attention from major industry players such as Alibaba, Tencent, and Ant Group, indicating its potential impact on AI and machine learning [4]
- MoPPS represents a significant advance in cost-effective fine-tuning of large models, potentially influencing future reinforcement-learning applications [26]
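The Beta-posterior-plus-Thompson-sampling core of MoPPS can be sketched in a few lines. The uniform prior, batch size, and 50%-success difficulty target below are illustrative choices; the published method also includes refinements (such as its recursive update) not shown here:

```python
import random

class BetaBandit:
    """Per-question Beta posterior over the model's success probability."""
    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta   # uniform prior

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, successes, attempts):
        # Conjugate update from this round's rollout outcomes.
        self.alpha += successes
        self.beta += attempts - successes

def select_questions(bandits, batch, rng, target=0.5):
    """Thompson sampling: draw a success rate per question, then keep the
    questions whose draw lands closest to the target difficulty (~50%)."""
    draws = [(abs(b.sample(rng) - target), i) for i, b in enumerate(bandits)]
    return [i for _, i in sorted(draws)[:batch]]

rng = random.Random(0)
bandits = [BetaBandit() for _ in range(100)]
picked = select_questions(bandits, batch=8, rng=rng)
bandits[picked[0]].update(successes=2, attempts=4)   # feedback from training
print(picked)
```

Because selection only requires sampling from cheap Beta posteriors, the expensive step of rolling out the large model on every candidate question is avoided, which is where the reported rollout savings come from.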
Training sped up 1.8x, inference overhead cut 78%! Precise question selection efficiently accelerates RL training | Tsinghua KDD
量子位· 2026-02-09 09:50
Core Insights
- The article discusses significant advances in the reasoning capabilities of large language models (LLMs) through reinforcement-learning fine-tuning, highlighting the high cost of inefficient training processes [1][2]

Group 1: Training Efficiency
- Traditional "uniform sampling" wastes compute by randomly selecting questions that provide no effective learning signal [2]
- "Dynamic sampling," while more efficient, still incurs high cost because it requires extensive self-evaluation by the model [2][6]
- The proposed MoPPS framework dynamically predicts question difficulty without the expensive self-evaluation process, improving training efficiency [3][6]

Group 2: MoPPS Framework
- MoPPS uses a lightweight Bayesian model to quickly estimate question difficulty, enabling efficient selection of training data [8][10]
- The framework models each question as a "bandit," using a Beta distribution to estimate success rates from training feedback [9][10]
- MoPPS introduces a recursive update mechanism that refines difficulty estimates over time, adapting to the model's evolving capabilities [11][13]

Group 3: Performance Improvements
- MoPPS demonstrates a 1.6x to 1.8x training speedup while cutting inference cost by up to 78.46% compared with traditional methods [18][21]
- The framework shows significant advantages across various reasoning tasks, achieving better performance with fewer computational resources [18][21]
- The correlation between predicted and actual question difficulty is high, validating the accuracy of MoPPS's difficulty estimation [25][29]

Group 4: Versatility and Future Applications
- MoPPS is compatible with multiple reinforcement-learning algorithms and adapts to different sampling strategies [26][28]
- Its ability to incorporate prior knowledge can further accelerate early training, making it a versatile tool for large-scale model fine-tuning [28][31]
- The research points to broader future applications in reinforcement-learning fine-tuning of larger models [31]
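The recursive update mechanism mentioned in Group 2 has to track a success probability that drifts as the model improves. One common way to sketch this is an exponentially discounted Beta update; the decay form and constants here are illustrative, not the paper's exact rule:

```python
def decayed_beta_update(alpha, beta, successes, attempts, decay=0.9):
    """Discount old evidence before adding new, so the posterior tracks a
    success probability that drifts as the model gets stronger."""
    alpha = decay * alpha + successes
    beta = decay * beta + (attempts - successes)
    return alpha, beta

# A question the model used to fail (posterior dominated by early failures)...
alpha, beta = 1.0, 9.0
# ...is now being solved consistently; the posterior mean recovers quickly.
means = []
for _ in range(10):
    alpha, beta = decayed_beta_update(alpha, beta, successes=4, attempts=4)
    means.append(alpha / (alpha + beta))
print([round(m, 2) for m in means])
```

Without the decay, the nine early failures would keep the estimated success rate pinned low long after the model had mastered the question, so the question would keep being misclassified as "optimally hard."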
All for the Agent: Qwen, StepFun, and Gemini open fire in the "3.5 model war"; will Chinese New Year be the key inflection point?
36Ke· 2026-02-06 10:15
Core Insights
- The AI model competition is heating up, with multiple new releases expected around Chinese New Year in early 2026, including significant updates from major players such as OpenAI and Anthropic and domestic companies such as Qwen and DeepSeek [1][2][20]

Group 1: Upcoming Model Releases
- Major updates are anticipated from Qwen, with Qwen3-Max-Thinking highlighted as its best model to date and Qwen 3.5 expected soon [2][4]
- Other companies such as ByteDance are also set to release new models, including Doubao 2.0 and Seedream 5.0, in March [5]
- The upcoming releases are not limited to minor iterations but represent a broader trend of simultaneous major updates across the industry [7][21]

Group 2: Shift in Model Capabilities
- The new generation of models is shifting focus from merely bigger and stronger to practical applications and enhanced reasoning [8][23]
- Reinforcement learning is being reintroduced, and reasoning is becoming a default capability rather than a unique selling point [9][10]
- Long-context handling is emphasized as a core upgrade, with models such as GLM-5 and Gemini 3.5 designed for real-world applications rather than benchmark metrics [14][16]

Group 3: The Role of Agents
- Agents are evolving from demonstration tools into central components of AI systems, with a focus on completing complex tasks with minimal human intervention [17][19]
- New models are designed to improve multi-agent collaboration and maintain context across long tasks, signaling a shift toward more integrated AI solutions [17][19]
- The success of these models will depend on their ability to be embedded into various systems, transforming them from simple assistants into essential operational engines [19][25]

Group 4: Competitive Landscape and Market Dynamics
- The release timing is strategic, capitalizing on the heightened attention around Chinese New Year, which previously coincided with major developments in the AI sector [20][21]
- The releases are expected to prompt rapid head-to-head comparisons in real-world applications, with developers and users able to test capabilities almost immediately [22][23]
- The true measure of success will not be the initial release but the ability to integrate these models into everyday tools and systems, shaping the competitive landscape for the year ahead [25][26]
Daily Roundup of Investment Bank / Institution Views (2026-02-05)
Jin Shi Shu Ju· 2026-02-05 12:26
Group 1: Gold and Silver Market Outlook
- A Reuters survey indicates that gold prices are expected to reach a new high of $4,746.50 per ounce in 2026, driven by geopolitical uncertainty and strong central-bank purchases, a significant increase from last year's forecast of $4,275 [1]
- The average 2026 silver price expectation has been raised to $79.50 per ounce, up from $50 in the previous year's survey [1]

Group 2: Currency and Economic Analysis
- The strong US dollar is pressuring gold and silver prices; analysts suggest that a continued dollar rebound could weigh further on gold [2]
- UBS forecasts a 10% rise in global stock markets by year-end, with a focus on diversification into markets such as China, Japan, and Europe, driven by strategic autonomy and fiscal expansion [3]
- Mitsubishi UFJ reports that the Japanese yen has fallen to a near two-week low on election expectations, with continued selling pressure possible as confidence in the ruling party's stability grows [4]
- Goldman Sachs warns of upside fiscal risks in Japan ahead of the upcoming elections, suggesting that unless the Bank of Japan accelerates rate hikes, the yen may weaken further [6]

Group 3: Sector-Specific Insights
- Zhongtai Securities is positive on the raw-material pharmaceutical sector, highlighting innovations in small nucleic acids and ADC toxins as growth catalysts [7]
- CITIC Securities recommends focusing on automotive companies with strong cost pass-through capabilities and global footprints, as rising raw-material prices are expected to squeeze margins in the first quarter of 2026 [8]
- Galaxy Securities identifies two main paths for AI-driven gains: improving platform efficiency and improving production efficiency through content and tools, suggesting a focus on internet stocks and AI-related applications [9]