Deep Reinforcement Learning

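Zhongyuan Jintaiyang (中原金太阳) applies for a patent on a method for calculating the wind power capacity range in distribution networks considering carbon capture benefits, enabling a dynamic trade-off between carbon benefits and economic cost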
Jin Rong Jie· 2025-08-23 01:21
Group 1
- Henan Zhongyuan Jinyang Technology Co., Ltd. has applied for a patent titled "A Calculation Method for Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits" [1]
- The patent application was published as CN120524785A and was filed in March 2025 [1]
- The invention relates to the field of wind power capacity configuration and involves a method that integrates artificial intelligence algorithms with the physical laws of energy systems [1]
Group 2
- Henan Zhongyuan Jinyang Technology Co., Ltd. was established in 2020 and is located in Zhengzhou; it is primarily engaged in technology promotion and application services [2]
- The company has a registered capital of 90 million RMB and has invested in 41 enterprises [2]
- The company has participated in 91 bidding projects and holds 21 patents and 6 administrative licenses [2]
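Dineike (狄耐克): Brain-computer interaction division proposes an active brain-computer interface co-control scheme based on deep reinforcement learning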
news flash· 2025-07-02 03:19
Core Insights
- Dr. Peng Junren from Dineike's Brain-Computer Interface (BCI) division published a paper in the "Annals of the New York Academy of Sciences" discussing a new approach to shared autonomy between human electroencephalography (EEG) and TD3 deep reinforcement learning [1]
- The study indicates that approximately 15%-30% of users are unable to operate traditional BCI systems effectively due to physiological differences, highlighting a gap in current technology, which measures only internal brain activity without considering environmental factors [1]
- Dineike's BCI division proposes an active BCI co-control scheme based on deep reinforcement learning, aiming to provide a new paradigm for the universal application of BCIs through collaborative decision-making between humans and AI agents [1]
- Dineike's next steps focus on breakthroughs in core brainwave-interaction technologies and on their industrialization, moving from laboratory research to practical applications [1]
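Embodied intelligence: a map of the global top 50 Chinese researchers (including a mentor-student relationship chart for the embodied intelligence sector)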
Robot猎场备忘录· 2025-06-30 08:09
Core Viewpoint - The development of embodied intelligence technology is a leading trend in the AI and robotics sector, involving advanced techniques such as large language models (LLMs), vision-language models (VLMs), reinforcement learning, deep reinforcement learning, and imitation learning [1].
Group 1: Embodied Intelligence Technology
- Embodied intelligence technology encompasses various cutting-edge techniques, including LLMs, VLMs, reinforcement learning, deep reinforcement learning, and imitation learning [1].
- Humanoid robot control has progressed from model-based control algorithms to dynamic model control and optimal control algorithms, and currently to simulation combined with reinforcement learning [1].
- The concepts most frequently mentioned by humanoid robotics companies are imitation learning and reinforcement learning, which are researched primarily by academic teams and teams at leading tech companies [1].
Group 2: Academic Contributions
- UC Berkeley and Stanford University are leading institutions in AI and robotics research, with notable alumni contributing to the embodied intelligence sector [2].
- Four prominent figures from UC Berkeley, known as the "Four Returnees," moved from Tsinghua University to UC Berkeley and then to entrepreneurial ventures in embodied intelligence [2].
Group 3: Notable Individuals in the Field
- Wang He and Lu Cewu are key representatives of Stanford University alumni who are now involved in the embodied intelligence startup scene in China [3].
- Wang He, who received his PhD from Stanford in 2021, is now an assistant professor at Peking University and the founder of a leading humanoid robotics startup [3].
- Lu Cewu, a former postdoctoral researcher at Stanford, is a co-founder and chief scientist of a unicorn collaborative robotics company and the founder of an embodied intelligence startup [3].
Group 4: Global Talent Pool
- The majority of the top 50 Chinese individuals in the embodied intelligence field have educational backgrounds from prestigious institutions such as UC Berkeley, Stanford, MIT, and CMU, often under the mentorship of industry leaders [4].
- A detailed mapping of the top 50 Chinese talents in the field includes their educational history, research directions, and current positions in leading tech companies or startups [5].
HKUST | End-to-end LiDAR omnidirectional obstacle avoidance system for quadruped robots (Unitree G1/Go2 + PPO)
具身智能之心· 2025-06-29 09:51
Core Viewpoint - The article discusses the Omni-Perception framework developed by a team from the Hong Kong University of Science and Technology, which enables quadruped robots to navigate complex dynamic environments by processing raw LiDAR point cloud data directly for omnidirectional obstacle avoidance [2][4].
Group 1: Omni-Perception Framework Overview
- The Omni-Perception framework consists of three main modules: the PD-RiskNet perception network, a high-fidelity LiDAR simulation tool, and a risk-aware reinforcement learning policy [4].
- The system takes raw LiDAR point clouds as input, extracts environmental risk features using PD-RiskNet, and outputs joint control signals, forming a complete closed control loop [5].
Group 2: Advantages of the Framework
- Using spatiotemporal information directly avoids the information loss incurred when converting point clouds to grids or maps, preserving the precise geometric relationships in the original data [7].
- Dynamic adaptability is achieved through reinforcement learning, allowing the robot to optimize its obstacle avoidance strategy for previously unseen obstacle shapes [7].
- Computational efficiency improves because there are fewer intermediate processing steps than in traditional SLAM and planning pipelines [7].
Group 3: PD-RiskNet Architecture
- PD-RiskNet is a hierarchical risk perception network that processes near-field and far-field point clouds differently to capture local and global environmental features [8].
- Near-field processing uses farthest point sampling (FPS) to reduce data density while retaining key geometric features, and employs gated recurrent units (GRUs) to capture local dynamic changes [8].
- Far-field processing uses average down-sampling to reduce noise and extract spatiotemporal features from distant parts of the environment [8].
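The farthest point sampling step mentioned for near-field processing can be illustrated with a minimal sketch. This is the generic greedy FPS algorithm in plain Python, not the PD-RiskNet implementation; the function name `farthest_point_sampling`, the `start` parameter, and the toy `cloud` are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_point_sampling(points: Sequence[Point], k: int, start: int = 0) -> List[int]:
    """Return the indices of k points chosen by greedy farthest point sampling."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from everything selected so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy point cloud: two tight clusters plus one outlier.
cloud = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.9, 0.1, 0.0),
]
fps_indices = farthest_point_sampling(cloud, 3)
```

On this toy cloud the sampler keeps one point from each spatially distinct region and skips the near-duplicates, which is the sense in which FPS reduces density while retaining geometric coverage.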
Group 4: Reinforcement Learning Strategy
- The obstacle avoidance task is modeled as an infinite-horizon discounted Markov decision process, with a state space that includes the robot's kinematic information and historical LiDAR point cloud sequences [10].
- The action space directly outputs target joint positions, allowing the policy to learn the mapping from raw sensor inputs to control signals without complex inverse kinematics [11].
- The reward function incorporates obstacle avoidance and distance maximization rewards to encourage the robot to seek open paths while penalizing deviations from target speeds [13][14].
Group 5: Simulation and Real-World Testing
- The framework was validated against real LiDAR data collected using the Unitree G1 robot, demonstrating high consistency in point cloud distribution and structural integrity between simulated and real data [21].
- The Omni-Perception tool showed significant advantages in rendering efficiency, maintaining linear growth in rendering time as the number of environments increased, unlike traditional methods, which exhibited exponential growth [22].
- In various tests, the framework achieved a 100% success rate in static obstacle scenarios and demonstrated superior performance in dynamic environments compared to traditional methods [26][27].
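To make the reward description in Group 4 concrete, here is a hedged sketch of how a per-step reward with a speed-tracking penalty, an obstacle avoidance penalty, and an open-space bonus might be combined, together with the discounted return that an infinite-horizon discounted MDP optimizes. The article does not give the actual terms or weights; every name, threshold, and weight below (`step_reward`, `safe_dist`, `w_track`, and so on) is a hypothetical choice for illustration.

```python
from typing import Sequence


def step_reward(
    velocity: Sequence[float],
    target_velocity: Sequence[float],
    lidar_ranges: Sequence[float],
    safe_dist: float = 0.5,   # hypothetical safety radius in metres
    max_range: float = 5.0,   # hypothetical clipping range for LiDAR returns
    w_track: float = 1.0,     # hypothetical weights, not taken from the article
    w_avoid: float = 2.0,
    w_open: float = 0.1,
) -> float:
    """Toy reward: track the target speed, stay clear of obstacles, prefer open space."""
    if not lidar_ranges:
        raise ValueError("lidar_ranges must not be empty")
    # Penalize squared deviation from the commanded velocity.
    tracking_error = sum((v - t) ** 2 for v, t in zip(velocity, target_velocity))
    # Penalize the nearest return once it falls inside the safety radius.
    avoid_penalty = max(0.0, safe_dist - min(lidar_ranges)) / safe_dist
    # Reward large (clipped) ranges, normalized to [0, 1].
    openness = sum(min(r, max_range) for r in lidar_ranges) / (len(lidar_ranges) * max_range)
    return -w_track * tracking_error - w_avoid * avoid_penalty + w_open * openness


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute sum_t gamma**t * r_t over a finite rollout."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Same velocity tracking, but one rollout step has an obstacle 0.25 m away.
open_space = step_reward((1.0, 0.0), (1.0, 0.0), [5.0, 5.0, 5.0])
near_obstacle = step_reward((1.0, 0.0), (1.0, 0.0), [0.25, 5.0, 5.0])
```

With these assumed weights, the step taken close to an obstacle scores lower than the step in open space even though both track the target speed exactly, which is the qualitative behavior the reward description calls for.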
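In tribute to Qian Xuesen, Chinese researchers develop the AI virtual reality exercise system Lingjing (灵境) to tackle adolescent obesity and reveal how VR exercise drives weight loss and enhances brain cognition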
生物世界· 2025-06-24 03:56
Core Viewpoint - Adolescent obesity is a global public health crisis with rising prevalence, leading to increased risks of cardiovascular and metabolic diseases as well as cognitive impairment [2]
Group 1: Research and Development
- A research team from Shanghai Jiao Tong University and other institutions developed the world's first VR-based exercise intervention system, REVERIE, aimed at overweight adolescents [4][8]
- The REVERIE system uses deep reinforcement learning and a Transformer-based virtual coach to provide safe, effective, and empathetic exercise guidance [4][8]
Group 2: Study Design and Methodology
- The study was a randomized controlled trial with 227 overweight adolescents, comparing outcomes between VR exercise, real-world exercise, and a control group [11]
- Participants were assigned to different groups, including VR and real-world sports, and all groups received uniform dietary management over the eight-week intervention [11]
Group 3: Results and Findings
- After eight weeks, the VR exercise group had lost an average of 4.28 kg of body fat and the real-world exercise group 5.06 kg, which the study reports as comparable results [13]
- Both the VR and real-world exercise groups showed improvements in liver enzyme levels, LDL cholesterol, physical fitness, mental health, and willingness to exercise [13]
- VR exercise produced greater cognitive function gains than real-world exercise, supported by fMRI findings indicating increased neural efficiency and plasticity [14]
Group 4: Safety and Implications
- The injury rate in the VR exercise group was 7.69%, lower than the 13.48% in the real-world exercise group, with no severe adverse events reported [15]
- The REVERIE system is positioned as a promising solution for addressing adolescent obesity and promoting overall health improvements beyond weight loss [16][17]
ByteDance's ByteBrain team proposes a reinforcement learning VMR system with second-level inference
news flash· 2025-06-05 06:49
Core Insights - ByteDance's ByteBrain team, in collaboration with UC Merced and UC Berkeley, has developed a VMR system based on deep reinforcement learning that reduces inference time to 1.1 seconds while maintaining near-optimal performance [1]
Group 1
- The system addresses the long-neglected but critical problem of virtual machine rescheduling (VMR) [1]
- The research was presented at the EuroSys 2025 conference, highlighting its academic significance [1]
- The two co-first authors of the paper are interns on ByteDance's ByteBrain team, indicating the company's investment in nurturing talent [1]
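Deep reinforcement learning empowers urban firefighting optimization: a Chinese Academy of Sciences team proposes a new DRL method to solve the facility allocation problem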
36Kr· 2025-06-03 07:27
Core Viewpoint - The presentation by Dr. Liang Haojian focuses on optimizing urban emergency fire facility allocation using a hierarchical deep reinforcement learning (DRL) approach, highlighting the advantages and potential of deep learning in geographic spatial optimization [1][4][17].
Geographic Spatial Optimization
- Geographic spatial optimization combines combinatorial optimization with geographic information science, addressing practical problems such as spatial layout and resource allocation in urban development [4][5].
- Traditional methods for solving spatial optimization problems include exact algorithms, approximation algorithms, and heuristic algorithms, each with its own limitations [4][5].
Deep Learning in Geographic Spatial Optimization
- Neural spatial optimization (NeurSPO) aims to use deep learning to solve spatial optimization problems, motivated by the need for faster heuristic methods and for the automatic design of new algorithms [6].
- The two main constructs of NeurSPO are deep construction and deep improvement, which focus on stepwise solution construction and local search enhancement, respectively [6].
SpoNet Model
- The SpoNet model integrates dynamic coverage information and attention mechanisms to address location selection problems, allowing the model to focus on specific parts of the input sequence during decoding [7][11].
- In a case study of emergency facilities in Beijing's Chaoyang District, the model selected 20 out of 132 candidate facilities to maximize coverage [11].
AIAM Model
- The adaptive interaction attention model (AIAM) is designed to solve the p-median problem by incorporating user-facility interaction, enhancing local search capabilities [12][16].
- The model demonstrated feasibility by retaining 15 hospitals out of 80 candidates to minimize the total distance to residents [16].
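The two problem types in the case studies above (selecting p facilities to maximize coverage, and the p-median objective of minimizing total distance) can be sketched with classical baselines. The code below is a greedy heuristic for maximal coverage and a plain evaluation of the p-median cost; it is not SpoNet or AIAM, and the function names and toy data are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def greedy_max_coverage(
    coverage: Dict[Hashable, Set[int]], p: int
) -> Tuple[List[Hashable], Set[int]]:
    """Greedily open up to p facilities, each time taking the largest marginal coverage."""
    chosen: List[Hashable] = []
    covered: Set[int] = set()
    remaining = dict(coverage)
    for _ in range(min(p, len(remaining))):
        # Facility that covers the most demand points not yet covered.
        best = max(remaining, key=lambda f: len(remaining[f] - covered))
        if not remaining[best] - covered:
            break  # no facility adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


def p_median_cost(dist: Sequence[Sequence[float]], open_facilities: Sequence[int]) -> float:
    """Total distance from each demand point (row) to its nearest open facility (column)."""
    return sum(min(row[f] for f in open_facilities) for row in dist)


# Hypothetical candidate sites and the demand points each one covers.
sites = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
chosen_sites, covered_points = greedy_max_coverage(sites, p=2)

# Hypothetical distance matrix: 3 demand points x 2 candidate facilities.
distances = [[1.0, 4.0], [3.0, 2.0], [5.0, 1.0]]
cost_one = p_median_cost(distances, [0])
cost_two = p_median_cost(distances, [0, 1])
```

Greedy selection is only a baseline: it carries no optimality guarantee for the p-median objective, which is part of why learned construction and improvement methods are of interest for these problems.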
Hierarchical DRL for Fire Facility Allocation
- The hierarchical DRL approach addresses the challenges of urban emergency fire facility allocation by improving fire risk prediction accuracy, optimizing resource allocation, and enhancing response timeliness [17][21].
- The model incorporates multi-dimensional spatiotemporal feature extraction, uncertainty considerations, and a hierarchical strategy for facility layout optimization [18][21][22].
Future Outlook
- The research team plans to advance geographic spatial optimization by integrating geographic computing mechanisms, extending the approach to large-scale emergency response problems, and designing more efficient DRL frameworks [23][24][25].
- The proposed hierarchical DRL method aims to address inefficiencies in traditional fire facility layouts and improve emergency management [25].