Reinforcement Learning
Amazon's "blind" robot dazzles in its 30-second parkour debut, led by Chinese researchers
量子位· 2025-10-06 05:42
henry | 量子位 QbitAI. Have you ever seen a "blind" robot demo like this? With no cameras, lidar, or any other perception unit, it picks up a 4.5 kg (9 jin) chair, climbs onto a 1-meter-high table, and then somersaults off. And it is not just showing off: it handles real work too, carrying boxes with ease, leaping onto a table in a single bound, and clambering up slopes on all fours. These smooth combos come from OmniRetarget, the first humanoid (legged) robotics research result released by Amazon's robotics team FAR (Frontier AI for Robotics). OmniRetarget enables reinforcement learning policies to learn long-horizon loco-manipulation skills in complex environments and to transfer zero-shot from simulation to a humanoid robot. As one commenter put it: it can both parkour and work, which makes it ten times better than Tesla's Optimus. In addition, preserving task-relevant interactions makes the data amenable to efficient augmentation, generalizing from a single demonstration to different robot embodiments, terrains, and object configurations, and reducing the cost of collecting data for each variant. In comparisons with other motion-retargeting methods, OmniRetarget showed across-the-board advantages on every key dimension: hard constraints, object interaction, terrain interaction, and data augmentation.
What are the applications of reinforcement learning to robotic arms, quadrupeds, and humanoids?
具身智能之心· 2025-10-05 16:03
Core Viewpoint
- The article discusses the importance of reinforcement learning (RL) in the development of embodied intelligent robots, highlighting its applications in various complex tasks and the challenges faced by newcomers in the field [3][4][10].

Group 1: Reinforcement Learning Applications
- Reinforcement learning is crucial for gait control in humanoid and quadruped robots, enabling them to perform tasks such as climbing stairs, running, and dancing [3][9].
- The VLA+RL approach for robotic arms is gaining popularity in academia, enhancing the efficiency and smoothness of robot operations [4][9].

Group 2: Challenges in Learning and Research
- The complexity and breadth of reinforcement learning make it difficult for beginners to enter the field, often leading to frustration and abandonment of studies [6][10].
- A lack of a comprehensive learning system can result in repeated mistakes and missed opportunities for aspiring researchers [7][10].

Group 3: Educational Offerings
- To address the challenges faced by newcomers, the company has launched a 1v6 small-class paper-guidance program in reinforcement learning, aimed at graduate students and others needing publication guidance [7][8].
- The course includes 14 weeks of concentrated online guidance followed by 8 weeks of follow-up support, focusing on paper-idea confirmation, project implementation, experimental guidance, and writing refinement [10][12].

Group 4: Course Structure and Content
- The course covers various topics, including paper direction and submission analysis, reinforcement learning basics, simulation environments, and writing guidance [10][18].
- Students work on specific ideas involving quadruped robots, humanoid robots, and robotic arms, following a structured path toward a paper suitable for submission to top conferences [19][30].

Group 5: Expected Outcomes
- Participants are expected to produce a paper draft that meets the requirements of specific conferences or journals, with support through the writing and submission process [29][34].
- The course emphasizes the full research cycle: methodology, engineering, evaluation, writing, submission, and maintenance [36].
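The "reinforcement learning basics" such a course starts from can be shown with the smallest possible example. Below is a hedged sketch of tabular Q-learning on an invented 1-D corridor environment; the states, rewards, and hyperparameters are all illustrative and not taken from the course:

```python
import random

# Invented toy environment: a 1-D corridor with states 0..4.
# The agent starts at 0 and earns reward 1.0 on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left / step right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action_index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            nxt, r, done = step(s, ACTIONS[a])
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[nxt]) * (not done) - q[s][a])
            s = nxt
    return q

q = train()
# Greedy policy after training: 1 means "step right" toward the goal.
policy = [max(range(2), key=lambda i: q[s][i]) for s in range(N_STATES)]
```

After training, the greedy policy steps right from every non-goal state, and the value of the final step converges to the terminal reward.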
From "knowing the problem" to "knowing the person": UserRL teaches agents to put people first
机器之心· 2025-10-05 06:42
"Knowing others is wisdom; knowing yourself is enlightenment." (Tao Te Ching) The ancients saw it long ago: true human intelligence lies not only in deriving formulas and mastering skills, but in understanding others and reading their intentions. Today's large language models already perform impressively on code, math, and tool use, yet they still lack that ability to "know people" needed to become true partners to users. This is largely because real-world interaction is far more complex than problem solving. That is the next-era challenge for agents: moving from "solving problems" to "understanding users." Truly answering it requires a new dynamic evaluation framework and training mechanism: one that not only measures a model's performance in interaction, but also drives it to learn, in a world of user uncertainty and competing objectives, to ask well, judge soundly, and answer with evidence. To that end, a research team from UIUC and Salesforce proposes a systematic solution whose two components complement each other, turning "user-centric" from a slogan into reproducible pipelines, interfaces, and evaluation metrics.
UserBench paper: https://arxiv.org/pdf/2507.22034
UserBench code: https://github.com/SalesforceAIResearch/UserBench
In real interactions, user goals are often not fully formed at the outset (underspecification), but rather ...
With just one demonstration, can a robot grasp anything like a human hand? DemoGrasp raises the ceiling for dexterous grasping
具身智能之心· 2025-10-04 13:35
Core Viewpoint
- The article discusses the innovative DemoGrasp framework, which enables robots to perform dexterous grasping tasks from a single demonstration, overcoming traditional challenges in robotic manipulation [2][20].

Group 1: Traditional Challenges in Robotic Grasping
- Traditional reinforcement learning methods struggle with high-dimensional action spaces, requiring complex reward functions and often generalizing poorly [1][2].
- Robots trained in simulation often fail in real-world scenarios due to imprecise physical parameters and environmental variation [1][2].

Group 2: Introduction of DemoGrasp
- DemoGrasp, developed by a collaboration of Peking University, Renmin University of China, and BeingBeyond, uses a single successful demonstration to redefine the grasping task [2][4].
- The framework significantly improves performance in both simulated and real environments, marking a breakthrough in robotic grasping technology [2][4].

Group 3: Core Design of DemoGrasp
- The core innovation comprises three components: demonstration trajectory editing, single-step reinforcement learning (RL), and vision-guided sim-to-real transfer [4][10].
- The design lets robots optimize "editing parameters" instead of exploring new actions, greatly reducing the dimensionality of the action space [6][7].

Group 4: Performance Results
- DemoGrasp outperforms existing methods in simulation, achieving success rates of 95.5% on seen categories and 94.4% on unseen categories [10].
- The framework adapts to six different robotic embodiments without hyperparameter adjustments, achieving an average success rate of 84.6% on unseen datasets [11].

Group 5: Real-World Performance
- In real-world tests, DemoGrasp achieved an overall success rate of 86.5% across 110 unseen objects, handling a wide range of everyday items [14].
- The framework successfully grasps small, thin objects such as coins and cards, which defeat traditional methods due to collision issues [14].

Group 6: Limitations and Future Directions
- Despite its strengths, DemoGrasp still struggles with functional grasping tasks and highly cluttered scenes [17][19].
- Future improvements may include segmenting demonstration trajectories for finer-grained decision-making and integrating visual feedback for dynamic scene adjustment [19][20].
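The "optimize editing parameters instead of exploring raw actions" idea can be caricatured as a one-step decision problem: the policy tunes a single low-dimensional edit applied to the fixed demonstration and receives one reward per rollout. The 1-D "trajectory", the offset, and the hill-climbing optimizer below are invented stand-ins, not DemoGrasp's actual formulation:

```python
import random

# Invented stand-in: a recorded 1-D "demo" wrist path, and a new object
# displaced by an unknown offset. The policy optimizes ONE scalar edit
# parameter that shifts the whole demo, instead of re-learning every action.
DEMO = [0.0, 0.2, 0.5, 0.9]   # wrist positions from the single demonstration
TRUE_OFFSET = 0.3             # how far the new object has moved (unknown to policy)

def rollout_reward(edit):
    """One reward per rollout: how well the shifted demo endpoint reaches the object."""
    shifted_end = DEMO[-1] + edit
    object_pos = DEMO[-1] + TRUE_OFFSET
    return -abs(shifted_end - object_pos)

def optimize_edit(iters=200, sigma=0.2, seed=0):
    """Tiny hill-climbing stand-in for single-step RL over the edit parameter."""
    rng = random.Random(seed)
    best, best_r = 0.0, rollout_reward(0.0)
    for _ in range(iters):
        cand = best + rng.gauss(0.0, sigma)  # perturb the current best edit
        r = rollout_reward(cand)
        if r > best_r:                       # keep only improvements
            best, best_r = cand, r
    return best

edit = optimize_edit()
```

Because the search space is a single scalar rather than a long action sequence, even this naive optimizer recovers an edit close to the true object offset.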
Peking University alumnus and Chinese scholar Chi Jin takes on a new role: tenured associate professor at Princeton University
机器之心· 2025-10-04 05:30
Core Insights
- Chi Jin, a Chinese scholar, has been promoted to tenured associate professor at Princeton University, effective January 16, 2026, marking a significant milestone in his academic career and recognition of his foundational contributions to machine learning theory [1][4].

Group 1: Academic Contributions
- Jin joined Princeton's Department of Electrical Engineering and Computer Science in 2019 and has rapidly gained influence in the AI field over his six-year tenure [3].
- His work addresses fundamental challenges in deep learning, particularly the effectiveness of simple optimization methods like Stochastic Gradient Descent (SGD) in non-convex optimization scenarios [8][12].
- Jin's research has established a theoretical foundation for two core issues: efficiently training large and complex models, and ensuring these models are reliable and beneficial in human interactions [11].

Group 2: Non-Convex Optimization
- One of the main challenges in deep learning is non-convex optimization, where loss functions have multiple local minima and saddle points, complicating the optimization process [12].
- Jin has demonstrated across multiple papers that even simple gradient methods can effectively escape saddle points in the presence of minimal noise, allowing continued exploration toward better solutions [12][17].
- His findings provide a theoretical basis for the practical success of deep learning, alleviating concerns about the robustness of optimization in large-scale model training [18].

Group 3: Reinforcement Learning
- Jin's research has also significantly advanced reinforcement learning (RL), particularly in establishing sample efficiency, which is crucial for applications with high interaction costs [19].
- He has provided rigorous regret bounds for foundational RL algorithms, proving that model-free algorithms like Q-learning can remain sample-efficient even in complex settings [22].
- This theoretical groundwork not only answers academic questions but also guides the development of more robust RL algorithms for deployment in high-risk applications [23].

Group 4: Academic Background
- Jin holds a Bachelor's degree in Physics from Peking University and a Ph.D. in Electrical Engineering and Computer Science from the University of California, Berkeley, where he was advised by renowned professor Michael I. Jordan [25].
- This background equipped him with the strong mathematical and analytical foundation essential for his theoretical research in AI and machine learning [25].

Group 5: Recognition and Impact
- Jin, along with other scholars, received the 2024 Sloan Award, highlighting his contributions to the field [6].
- His papers have garnered 13,588 citations in total on Google Scholar, indicating the impact of his research in the academic community [27].
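The saddle-point result can be illustrated on the classic toy function f(x, y) = x^2 - y^2, which has a saddle at the origin: plain gradient descent started exactly there never moves, while a small amount of per-step noise lets it escape along the descending y direction. This is a sketch of the phenomenon only, not Jin's actual perturbed-gradient algorithm or its guarantees:

```python
import random

def f(p):
    """Toy objective with a saddle point at the origin: f(x, y) = x^2 - y^2."""
    return p[0] ** 2 - p[1] ** 2

def grad(p):
    x, y = p
    return (2 * x, -2 * y)

def descend(p, steps=100, lr=0.1, noise=0.0, seed=0):
    """Gradient descent with optional per-step Gaussian perturbation."""
    rng = random.Random(seed)
    x, y = p
    for _ in range(steps):
        gx, gy = grad((x, y))
        x -= lr * gx + noise * rng.gauss(0, 1)
        y -= lr * gy + noise * rng.gauss(0, 1)
    return x, y

plain = descend((0.0, 0.0))              # gradient is zero at the saddle: stuck
noisy = descend((0.0, 0.0), noise=1e-3)  # tiny noise breaks the symmetry and escapes
```

The unperturbed run stays frozen at the saddle, while the perturbed run drifts into the negative-curvature direction and reaches much lower function values.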
RuscaRL: recent work that the head of Li Auto's foundation-model team is very pleased with
理想TOP2· 2025-10-03 09:55
Core Viewpoint
- The article discusses the importance of reinforcement learning (RL) in enhancing the intelligence of large models, emphasizing the need for effective interaction between models and their environments to obtain high-quality feedback [1][2].

Summary by Sections

Section 1: Importance of Reinforcement Learning
- RL is crucial to the advancement of large-model intelligence, with a focus on enabling models to interact with broader environments to achieve capability generalization [1][8].
- Key techniques under exploration include RLHF (Reinforcement Learning from Human Feedback), RLAIF (Reinforcement Learning from AI Feedback), and RLVR (Reinforcement Learning with Verifiable Rewards) [1][8].

Section 2: RuscaRL Framework
- The RuscaRL framework is introduced as a solution to the exploration bottleneck in RL, applying scaffolding theory from educational psychology to enhance the reasoning capabilities of large language models (LLMs) [12][13].
- The framework employs explicit scaffolding and verifiable rewards to guide model training and improve response quality [13][15].

Section 3: Mechanisms of RuscaRL
- Explicit scaffolding: structured guidance through rubrics helps the model generate diverse, high-quality responses, with external support gradually withdrawn as the model's capabilities improve [14].
- Verifiable rewards: RuscaRL designs rewards based on rubrics, providing stable and reliable feedback during training, which enhances exploration diversity and keeps knowledge consistent across tasks [15][16].

Section 4: Future Implications
- Both MindGPT and MindVLA, which target the digital and physical worlds respectively, could benefit from the advances made through RuscaRL, indicating a promising future for self-evolving models [9][10].
- The current challenges in RL are not purely algorithmic: they also involve the systemic integration of algorithms and infrastructure, calling for innovative approaches to capability building [9].
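A rubric-based verifiable reward can be sketched as a checklist scorer: each rubric item is a programmatic check on the response, and the reward is the fraction of checks passed. The rubric items below are invented placeholders for illustration, not RuscaRL's actual rubrics:

```python
# Invented rubric: each item pairs a description with a programmatic check
# on the response text. Real rubrics would encode task-specific criteria.
RUBRIC = [
    ("states a final answer", lambda r: "answer:" in r.lower()),
    ("shows at least two reasoning steps", lambda r: r.count("\n") >= 2),
    ("stays under the length budget", lambda r: len(r) <= 500),
]

def rubric_reward(response, rubric=RUBRIC):
    """Verifiable reward: the fraction of rubric checks the response passes."""
    passed = sum(1 for _, check in rubric if check(response))
    return passed / len(rubric)

good = "Step 1: compare terms.\nStep 2: simplify.\nAnswer: 42"
bad = "not sure"
r_good = rubric_reward(good)
r_bad = rubric_reward(bad)
```

Because every check is deterministic, the reward is stable across training runs, which is the property the scaffolding mechanism relies on for reliable feedback.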
Dreams have it all? Google's new world model trains purely on "imagination" and learns to mine diamonds in Minecraft
机器之心· 2025-10-02 01:30
Core Insights
- Google DeepMind's Dreamer 4 supports the idea that agents can learn skills for interacting with the physical world through imagination, without direct interaction [2][4].
- Dreamer 4 is the first agent to obtain diamonds in the challenging game Minecraft purely from standard offline datasets, demonstrating significant advances in offline learning [7][21].

Group 1: World Model and Training
- World models enable agents to understand the world deeply and select successful actions by predicting future outcomes from their own perspective [4].
- Dreamer 4 uses a novel shortcut forcing objective and an efficient Transformer architecture to accurately learn complex object interactions while supporting real-time human interaction on a single GPU [11][19].
- The model can be trained on large amounts of unlabeled video, requiring only a small amount of action-paired video, opening the possibility of learning general world knowledge from diverse online videos [13].

Group 2: Experimental Results
- In the offline diamond challenge, Dreamer 4 significantly outperformed OpenAI's offline agent VPT [15] while using 100 times less data [22].
- Dreamer 4 surpassed behavior cloning both in which key items it acquired and in how quickly it obtained them, indicating that world-model representations are better suited for decision-making [24].
- The agent succeeded in 14 of 16 interaction tasks in the Minecraft environment, showcasing robust capabilities [29].

Group 3: Action Generation
- With only 10 hours of action-labeled training, Dreamer 4 reached a PSNR of 53% and SSIM of 75%, indicating that the world model absorbs most of its knowledge from unlabeled videos with minimal action data [32].
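The "learning in imagination" idea behind world models can be sketched in miniature: a dynamics model predicts future states, and the agent selects actions purely by evaluating rollouts inside that model, never touching the real environment while planning. Everything below (the 1-D environment, the hand-coded dynamics, the exhaustive planner) is an invented toy, not Dreamer 4's learned Transformer world model:

```python
from itertools import product

# Invented toy: the real world is a 1-D line; the "world model" here is a
# hand-coded stand-in for a learned dynamics model. Planning happens only
# inside the model ("in imagination"), never in the real environment.
TARGET = 3

def model_step(state, action):
    """Imagined dynamics: s' = s + a, reward 1.0 on reaching the target."""
    nxt = state + action
    reward = 1.0 if nxt == TARGET else 0.0
    return nxt, reward

def imagine_return(state, actions):
    """Roll out a candidate action sequence entirely inside the model."""
    total = 0.0
    for a in actions:
        state, r = model_step(state, a)
        total += r
    return total

def plan(state, horizon=4):
    """Exhaustively score imagined action sequences; return the best first action."""
    best = max(product((-1, +1), repeat=horizon),
               key=lambda seq: imagine_return(state, seq))
    return best[0]

first_action = plan(0)
```

Even this brute-force planner shows the key property: the agent commits to "step toward the target" based solely on predicted, never experienced, outcomes.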
SemiAnalysis founder Dylan's latest interview: AI, semiconductors, and US-China
傅里叶的猫· 2025-10-01 14:43
Core Insights
- The article distills a podcast featuring Dylan Patel, founder of SemiAnalysis, focusing on the semiconductor industry and AI computing demand, particularly the collaboration between OpenAI and Nvidia [2][4][20].

OpenAI and Nvidia Collaboration
- OpenAI's partnership with Nvidia is not merely a financial arrangement but a strategic move to meet its substantial computing needs for model training and operation [4][5].
- OpenAI has 800 million users but generates only $1.5 to $2 billion in revenue, and faces competition from trillion-dollar companies like Meta and Google [4][5].
- Nvidia's $10 billion investment in OpenAI aims to support the construction of a 10 GW cluster, with Nvidia capturing a significant portion of the GPU orders [5][6].

AI Industry Dynamics
- The AI industry is characterized by a race to build computing clusters, where the first to establish such infrastructure gains a competitive edge [7].
- The risk for OpenAI lies in converting its investments into sustainable revenue, especially given its $30 billion contract with Oracle [6][20].

Model Scaling and Returns
- Dylan argues against the notion of diminishing returns in model training, suggesting that large computational increases can still yield substantial performance improvements [8][9].
- He likens the current state of AI capability to a "high school" level, with room to grow to "college graduate" levels [9].

Tokenomics and Inference Demand
- The concept of "tokenomics" is introduced, emphasizing the economic value of AI outputs relative to their computational cost [10][11].
- OpenAI must maximize its computing capacity while managing inference demand that roughly doubles every two months [10][11].

Reinforcement Learning and Memory Mechanisms
- Reinforcement learning is highlighted as a critical area for AI development, in which models learn through iterative interactions with their environment [12][13].
- AI models need improved memory mechanisms, with a focus on optimizing long-context processing [12].

Hardware, Power, and Supply Chain Issues
- AI data centers currently consume 3-4% of U.S. electricity, and the rapid growth of AI infrastructure is putting significant pressure on the power grid [14][15].
- The industry faces labor shortages and supply-chain challenges, particularly in building new data centers and power-generation facilities [17].

U.S.-China AI Stack Differences and Geopolitical Risks
- Dylan argues that without AI the U.S. risks losing its global dominance, while China is making long-term investments across many sectors, including semiconductors [18][19].

Company Perspectives
- OpenAI is viewed positively but criticized for spreading its focus across many applications, which may dilute its execution [20][21].
- Anthropic is seen as a strong competitor thanks to its concentrated effort on software development, particularly the coding market [21].
- AMD offers competitive pricing but lacks revolutionary breakthroughs compared to Nvidia [22].
- xAI's potential is acknowledged, but concerns remain about its business model and funding challenges [23].
- Oracle is positioned as a low-risk player benefiting from its established cloud business, in contrast to OpenAI's high-stakes approach [24].
- Meta is viewed as having a comprehensive strategy with significant potential, while Google has made a notable turnaround in its AI strategy [25][26].
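The "tokenomics" framing reduces to unit arithmetic: serving cost per token follows from GPU cost and throughput, and margin compares that cost to the price charged. All numbers below are hypothetical, chosen only to show the calculation:

```python
def cost_per_million_tokens(gpu_hour_cost, tokens_per_second):
    """Serving cost per 1M output tokens on one GPU; both inputs are assumptions."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost / tokens_per_hour * 1_000_000

# Hypothetical numbers: a $2/hour GPU generating 400 tokens/s.
cost = cost_per_million_tokens(2.0, 400)
# Hypothetical list price of $5 per 1M tokens -> gross margin fraction.
margin = 1 - cost / 5.0
```

With these made-up figures, serving costs come to roughly $1.39 per million tokens, so the margin depends entirely on sustaining throughput and the price per token, which is why inference demand doubling so quickly matters.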
A new SOTA synthesis framework: reinforcement learning as the engine, task synthesis as the fuel, jointly from Ant Group and HKU
量子位· 2025-10-01 03:03
Core Insights
- The article discusses the launch of PromptCoT 2.0 by Ant Group and the University of Hong Kong, focusing on task synthesis as the direction for the second half of large models [1][5].
- The team emphasizes task synthesis and reinforcement learning as foundational technologies for advancing large models and intelligent agents [6][7].

Summary by Sections

Introduction to PromptCoT 2.0
- PromptCoT 2.0 is a comprehensive upgrade of the PromptCoT framework, initially introduced a year ago [4][16].
- The framework aims to enhance the capabilities of large models through task synthesis, particularly in the context of complex real-world problems [5][9].

Importance of Task Synthesis
- Task synthesis is viewed as a critical area spanning problem synthesis, answer synthesis, environment synthesis, and evaluation synthesis [9].
- The team argues that without a sufficient amount of high-quality task data, reinforcement learning cannot be used effectively [9].

Framework and Methodology
- The team has developed a general, powerful problem-synthesis framework, decomposed into concept extraction, logic generation, and training of the problem-generation model [10][13].
- PromptCoT 2.0 introduces an Expectation-Maximization (EM) cycle to iteratively optimize the reasoning chain, yielding more challenging and diverse generated problems [15][23].

Performance and Data Upgrades
- PromptCoT 2.0 shows significant performance improvements, allowing strong reasoning models to reach new state-of-the-art results [17].
- The framework has generated 4.77 million synthetic problems, which are harder and more discriminative than existing datasets [19][20].

Future Directions
- The team plans to explore agentic environment synthesis, multi-modal task synthesis, and self-rewarding mechanisms to further enhance the capabilities of large models [27][28].
- Integrating self-rewarding and game-theoretic approaches is seen as a potential avenue for improving model performance [29].
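The EM cycle can be caricatured with a toy alternation: an E-step selects the rationale that, under the current model, best explains verifiably solvable problems, and an M-step refits the model toward that selection. The "model" here is just a dict of preference weights and the solve rates are invented, so this only mirrors the shape of the loop, not PromptCoT 2.0's actual method:

```python
# Invented stand-in for an EM-style synthesis loop. "Rationales" are templates
# with a hidden usefulness score (how often problems built from them turn out
# to be verifiably solvable); the model's state is a weight per rationale.
RATIONALES = ["shallow", "medium", "deep"]
SOLVE_RATE = {"shallow": 0.2, "medium": 0.5, "deep": 0.9}  # hidden, invented

def em_iterations(rounds=10, lr=0.5):
    weights = {r: 0.5 for r in RATIONALES}  # model's current preferences
    for _ in range(rounds):
        # E-step: score each rationale by how well it explains solvable
        # problems under the current model.
        scores = {r: weights[r] * SOLVE_RATE[r] for r in RATIONALES}
        best = max(scores, key=scores.get)
        # M-step: refit the weights toward the best-explaining rationale.
        for r in RATIONALES:
            target = 1.0 if r == best else 0.0
            weights[r] += lr * (target - weights[r])
    return weights

w = em_iterations()
```

After a few rounds the loop concentrates its preference on the rationale that yields solvable problems, which is the self-improving behavior the EM cycle is meant to produce at scale.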
Heavyweight release from Fudan, Tongji, CUHK and others: a comprehensive survey of reinforcement learning across the full lifecycle of large language models
机器之心· 2025-09-30 23:49
Core Insights
- The article discusses the significant advances in reinforcement learning (RL) techniques that enhance the capabilities of large language models (LLMs), particularly in understanding human intent and following user instructions [2][3].
- A comprehensive survey titled "Reinforcement Learning Meets Large Language Models," conducted by researchers from top institutions, summarizes the role of RL throughout the entire lifecycle of LLMs [2][3].

Summary by Sections

Overview of Reinforcement Learning in LLMs
- The survey details the application strategies of RL across the stages of LLMs, including pre-training, alignment fine-tuning, and reinforced reasoning [3][6].
- It organizes existing datasets, evaluation benchmarks, and mainstream open-source tools and training frameworks relevant to RL fine-tuning, providing a clear reference for future research [3][6].

Lifecycle of LLMs
- The survey systematically covers the complete application lifecycle of RL in LLMs, detailing the objectives, methods, and challenges at each stage from pre-training onward [11][12].
- A classification overview of how RL operates within LLMs is presented, highlighting the interconnections between the different stages [5][6].

Focus on Verifiable Rewards
- The survey emphasizes Reinforcement Learning with Verifiable Rewards (RLVR), summarizing its applications in improving the stability and accuracy of LLM reasoning [7][9].
- It discusses how RLVR optimizes the reasoning process and improves the model's adaptability to complex tasks through automatically verifiable reward mechanisms [7][9].

Key Contributions
- The survey makes three main contributions: a comprehensive lifecycle overview of RL applications in LLMs, a focus on advanced RLVR techniques, and an integration of key research resources essential for experiments and evaluation [9][11].
- It provides valuable references for researchers exploring RL in the context of LLMs [11][12].

Challenges and Future Directions
- Despite significant progress, scalability and training stability remain challenges for large-scale RL on LLMs, which is still computationally intensive and often unstable [12][13].
- Reward design and credit assignment, particularly over long-horizon reasoning, remain difficult for model learning [12][13].
- The survey highlights the need for standardized datasets and evaluation benchmarks to facilitate comparison and validation of RL fine-tuning methods [12][13].
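The RLVR primitive the survey centers on can be sketched as an automatically checkable reward: extract the final answer from a response and compare it to ground truth, yielding a binary reward with no learned reward model. The `Answer:` format and the extraction regex below are assumptions for illustration:

```python
import re

def verifiable_reward(response, ground_truth):
    """Binary RLVR-style reward: 1.0 iff the response's labelled answer matches."""
    # Assumed format: the response contains a line like "Answer: <number>".
    match = re.search(r"answer:\s*(-?\d+(?:\.\d+)?)", response, re.IGNORECASE)
    if not match:
        return 0.0  # unverifiable responses earn no reward
    return 1.0 if float(match.group(1)) == float(ground_truth) else 0.0

r_good = verifiable_reward("reasoning...\nAnswer: 12", "12")
r_bad = verifiable_reward("reasoning...\nAnswer: 13", "12")
r_none = verifiable_reward("I give up", "12")
```

Because the check is fully automatic, this reward scales to millions of training problems without human labeling, which is exactly why the survey highlights RLVR for reasoning stability.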