Deep Reinforcement Learning
First AI controller completes in-orbit verification of satellite attitude adjustment
Ke Ji Ri Bao· 2025-11-14 00:20
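According to a Phys.org report on the 10th, scientists at the University of Würzburg in Germany have developed the world's first artificial intelligence (AI) in-orbit satellite attitude controller and verified it on a nanosatellite named InnoCube. The result marks a key step toward autonomy in space systems. On the morning of October 30, 2025, during a satellite pass lasting just 9 minutes, the AI controller executed a complete attitude maneuver. By commanding the reaction wheels, the AI precisely moved the satellite from its initial attitude to the target attitude, and it maintained stable control in subsequent tests. (Source: Science and Technology Daily) The research, named the "Learning Attitude Control In-Orbit Verification Project", aims to develop a new generation of autonomous attitude control systems. Its core advance is the use of deep reinforcement learning: a neural network learns the control strategy on its own in a simulated environment instead of relying on a traditional fixed algorithm. Compared with traditional methods, this approach automates a parameter-tuning process that used to take months and lets the controller adapt on its own to changes in the real environment, removing the need for manual calibration. The team says the verification opens new prospects for deep-space exploration. In interplanetary or deep-space missions with communication delays, intelligent autonomous control systems will be key to spacecraft survival. The latest progress suggests that a new era of satellite control is arriving. To minimize risk, the research team first trained the controller in a high-fidelity simulation on the ground and then uploaded the mature algorithm to the satellite in orbit. Encouragingly, the AI controller trained in simulation ...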
AI-Empowered Asset Allocation (19): The Practical Innovation Path of Institutional AI + Investment
Guoxin Securities· 2025-10-29 07:16
Core Insights - The report emphasizes the transformative impact of AI on asset allocation, highlighting the shift from static optimization to dynamic, intelligent evolution in decision-making processes [1] - It identifies the integration of large language models (LLMs), deep reinforcement learning (DRL), and graph neural networks (GNNs) as key technologies reshaping investment research and execution [1][2] - The future of asset management is seen as a collaborative effort between human expertise and AI capabilities, necessitating a reconfiguration of organizational structures and strategies [3] Group 1: AI in Asset Allocation - LLMs are revolutionizing the understanding and quantification of unstructured financial texts, thus expanding the information boundaries traditionally relied upon in investment research [1][11] - The evolution of sentiment analysis from basic dictionary methods to advanced transformer-based models allows for more accurate emotional assessments in financial contexts [12][13] - The application of LLMs in algorithmic trading and risk management is highlighted, showcasing their ability to generate quantitative sentiment scores and identify early warning signals for market shifts [14][15] Group 2: Deep Reinforcement Learning (DRL) - DRL provides a framework for adaptive decision-making in asset allocation, moving beyond static models to a dynamic learning approach that maximizes long-term returns [17][18] - The report discusses various DRL algorithms, such as Actor-Critic methods and Proximal Policy Optimization, which show significant potential in financial applications [19][20] - Challenges in deploying DRL in real-world markets include data dependency, overfitting risks, and the need for models to adapt to different market cycles [21][22] Group 3: Graph Neural Networks (GNNs) - GNNs conceptualize the financial system as a network, allowing for a better understanding of risk transmission among financial institutions [23][24] - The ability of GNNs to model systemic risks and conduct stress testing provides valuable insights for regulators and investors alike [25][26]
Group 4: Institutional Practices - BlackRock's AlphaAgents project exemplifies the integration of AI in investment decision-making, focusing on overcoming cognitive biases and enhancing decision-making processes through multi-agent systems [27][30] - The report outlines the strategic intent behind AlphaAgents, which aims to leverage LLMs for complex reasoning and decision-making in asset management [30][31] - J.P. Morgan's AI strategy emphasizes building proprietary, trustworthy AI technologies, focusing on foundational models and automated decision-making to navigate complex financial systems [42][45] Group 5: Future Directions - The report suggests that the future of asset management will involve a seamless integration of AI capabilities into existing workflows, enhancing both decision-making and execution processes [39][41] - The emphasis on creating a "financial brain" through proprietary AI technologies positions firms like J.P. Morgan to maintain a competitive edge in the evolving financial landscape [52]
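The summary above names Proximal Policy Optimization (PPO) without showing its objective. As a rough illustration only, the sketch below computes PPO's standard clipped surrogate term for a handful of samples. The ratio and advantage values are made up, the function names are hypothetical, and nothing here is taken from the Guoxin Securities report itself.

```python
def ppo_clipped_term(ratio, advantage, eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- estimated advantage of the sampled action
    eps       -- clipping range (0.2 is a commonly used default)
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Average clipped surrogate over a batch; PPO maximizes this quantity."""
    terms = [ppo_clipped_term(r, a, eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


# Illustrative (made-up) batch: three sampled allocation decisions.
example_ratios = [1.5, 0.5, 1.0]
example_advantages = [1.0, -1.0, 2.0]
example_objective = ppo_clipped_objective(example_ratios, example_advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is one reason PPO is often considered when training data is scarce or noisy.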
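Take it home for 9,998 yuan! The world's first humanoid robot priced under 10,000 yuan is here: 21 degrees of freedom, and it can talk, walk, and dance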
机器之心· 2025-10-22 08:46
Core Viewpoint - The article highlights the launch of the Bumi robot by Songyan Power, marking a significant step in making humanoid robots accessible to consumers with a price point of 9998 yuan, which is lower than many high-end smartphones, thus entering the consumer-grade market for the first time [4][5][39]. Product Overview - The Bumi robot features 21 degrees of freedom (DOF), allowing for advanced movement capabilities, including walking, dancing, and interacting with users [20][36]. - Weighing only 12 kg and standing at 94 cm, Bumi is designed to be lightweight and safe for children, making it suitable for educational and entertainment purposes [16][17][34]. - The robot is equipped with a 48V battery system, providing a runtime of 1 to 2 hours, which is adequate for short-term applications [32][33]. Company Background - Songyan Power has rapidly gained attention in the humanoid robot industry, completing six rounds of financing within two years and becoming a key player in the market [7][39]. - The company first gained public recognition during the Beijing Yizhuang Half Marathon, where its N2 robot independently completed the race, showcasing its capabilities [8][9]. Technological Innovation - The company utilizes self-developed servo motors and advanced motion control algorithms to ensure precise and stable movements of the robots [41]. - Songyan Power has made significant advancements in deep reinforcement learning, allowing robots to learn and adapt through trial and error, enhancing their performance in complex tasks [43][45]. Market Strategy - The company focuses on smaller humanoid robots, which are more affordable and versatile compared to full-sized models, catering to various applications in education, entertainment, and exhibitions [40][46]. - The successful integration of domestic supply chains has enabled the company to reduce costs and enhance production capabilities, contributing to the competitive pricing of the Bumi robot [47][48].
ICLR 2025 | SmODE: An Ordinary Differential Equation Neural Network for Generating Smooth Control Actions
自动驾驶之心· 2025-09-01 23:32
Core Viewpoint - The research team led by Professor Li Shengbo from Tsinghua University has developed a novel smoothing neural network called SmODE, which utilizes ordinary differential equations (ODE) to enhance the smoothness of control actions in reinforcement learning tasks, thereby improving the usability and safety of intelligent systems [4][23]. Background - Deep Reinforcement Learning (DRL) has proven effective in solving optimal control problems in various applications, including drone control and autonomous driving. However, the smoothness of control actions remains a significant challenge due to high-frequency noise and unregulated Lipschitz constants in neural networks [5][19]. Key Technologies of SmODE - **Smoothing ODE Design**: The team designed a smoothing neuron structure based on ODEs that can adaptively filter high-frequency noise while controlling the Lipschitz constant, thus enhancing the performance of control systems [8][9]. - **Smoothing Network Structure**: SmODE is structured to be integrated into various reinforcement learning frameworks, featuring an input module, a smoothing ODE module, and an output module, which can be adjusted based on task complexity [14][16]. - **Reinforcement Learning Algorithm Based on SmODE**: SmODE can be easily combined with existing deep reinforcement learning algorithms, requiring additional loss terms to regulate the time constant and Lipschitz constant during training [16][17]. Experimental Results - In experiments with Gaussian noise variance set at 0.05, SmODE demonstrated significantly lower action volatility compared to traditional MLP networks, enhancing vehicle comfort and safety during tasks such as sine curve tracking and lane changing [19][21]. - In the MuJoCo benchmark tests, SmODE outperformed other networks (LTC, LipsNet, and MLP) in terms of average action smoothness across various tasks, indicating its effectiveness in real-world applications [21][22]. 
Conclusion - The SmODE network effectively addresses the oscillation issues in action outputs within deep reinforcement learning, providing a new approach to enhance the performance and stability of intelligent systems in real-world applications [23].
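To make the idea of ODE-based smoothing concrete, here is a minimal sketch of a first-order low-pass filter, tau * dy/dt = u - y, integrated with forward Euler and applied to a noisy action signal. This is not the SmODE neuron or network. The time constant is fixed here, whereas the summary describes it as regulated during training, and the sine-plus-noise "policy output" is a stand-in invented for illustration. Only the noise variance of 0.05 comes from the summary.

```python
import math
import random


def low_pass_filter(inputs, tau=0.1, dt=0.01, y0=0.0):
    """Forward-Euler integration of tau * dy/dt = u - y.

    A larger tau filters more aggressively (smoother output, more lag).
    """
    y = y0
    outputs = []
    for u in inputs:
        y += (dt / tau) * (u - y)
        outputs.append(y)
    return outputs


def mean_abs_change(signal):
    """Average step-to-step change, used here as a simple fluctuation measure."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:])) / (len(signal) - 1)


rng = random.Random(0)            # fixed seed so the run is repeatable
noise_std = math.sqrt(0.05)       # Gaussian noise with variance 0.05
raw_actions = [math.sin(0.01 * k) + rng.gauss(0.0, noise_std) for k in range(1000)]
smooth_actions = low_pass_filter(raw_actions)

print("raw fluctuation:   ", mean_abs_change(raw_actions))
print("smooth fluctuation:", mean_abs_change(smooth_actions))
```

The filtered signal changes far less from step to step, at the cost of some lag behind the underlying sine. That trade-off is the reason a learned, state-dependent time constant is attractive compared with a fixed one.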
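Zhongyuan Jintaiyang applies for a patent on a method for calculating wind power capacity ranges in distribution networks that accounts for carbon capture benefits, enabling a dynamic trade-off between carbon benefits and economic cost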
Jin Rong Jie· 2025-08-23 01:21
Group 1 - Henan Zhongyuan Jintaiyang Technology Co., Ltd. has applied for a patent titled "A Calculation Method for Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits" [1] - The patent application was published under CN120524785A and was filed in March 2025 [1] - The invention relates to the field of wind power capacity configuration and involves a method that integrates artificial intelligence algorithms with physical laws of energy systems [1] Group 2 - Henan Zhongyuan Jintaiyang Technology Co., Ltd. was established in 2020 and is located in Zhengzhou, primarily engaged in technology promotion and application services [2] - The company has a registered capital of 90 million RMB and has invested in 41 enterprises [2] - The company has participated in 91 bidding projects and holds 21 patents, along with 6 administrative licenses [2]
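Dineike (狄耐克): Brain-computer interaction division proposes an active brain-computer interface shared-control scheme based on deep reinforcement learning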
news flash· 2025-07-02 03:19
Core Insights - Dr. Peng Junren from Dineike's Brain-Computer Interface (BCI) division published a paper in the "Annals of the New York Academy of Sciences" discussing a new approach to shared autonomy between human electroencephalography and TD3 deep reinforcement learning [1] - The study indicates that approximately 15%-30% of users are unable to effectively operate traditional BCI systems due to physiological differences, highlighting a gap in current technology that only measures internal brain activity without considering environmental factors [1] - Dineike's BCI division proposes an active BCI co-control scheme based on deep reinforcement learning, aiming to provide a new paradigm for the universal application of BCIs through collaborative decision-making between humans and AI agents [1] - The next steps for Dineike involve focusing on breakthroughs in core technologies related to brainwave interaction and the industrialization of these technologies, moving from laboratory research to practical applications [1]
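The news flash names TD3 but does not describe how it is combined with EEG signals, so the sketch below covers only the generic TD3 critic target (clipped double-Q learning with target policy smoothing) for a scalar action. The function and argument names are hypothetical, the toy policy and critics are placeholders, and none of this reflects the specifics of the published shared-autonomy scheme.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_policy, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Generic TD3 critic target for a scalar action.

    1. Target policy smoothing: add clipped Gaussian noise to the target action.
    2. Clipped double-Q: bootstrap from the smaller of the two target critics.
    """
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_policy(next_state) + noise, -max_action, max_action)
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_min


# Toy placeholders standing in for trained target networks (assumptions).
toy_policy = lambda state: 0.5
toy_q1 = lambda state, action: 2.0
toy_q2 = lambda state, action: 1.0

y = td3_target(1.0, next_state=None, done=False,
               target_policy=toy_policy, target_q1=toy_q1, target_q2=toy_q2,
               gamma=0.9, noise_std=0.0)
```

Taking the minimum of two critics counteracts value overestimation, which matters when the agent's actions are blended with a human's and overconfident value estimates could push the agent to override the user.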
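Embodied intelligence: a map of the global top 50 Chinese researchers (including a "mentor-student relationship chart" for the embodied intelligence track)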
Robot猎场备忘录· 2025-06-30 08:09
Core Viewpoint - The development of embodied intelligence technology is a leading trend in the AI and robotics sector, involving advanced techniques such as large language models (LLM), visual multimodal models (VLM), reinforcement learning, deep reinforcement learning, and imitation learning [1]. Group 1: Embodied Intelligence Technology - Embodied intelligence technology encompasses various cutting-edge techniques, including LLM, VLM, reinforcement learning, deep reinforcement learning, and imitation learning [1]. - The evolution of humanoid robots has progressed from model-based control algorithms to dynamic model control and optimal control algorithms, and currently to simulation combined with reinforcement learning [1]. - The most frequently mentioned concepts in humanoid robotics companies are imitation learning and reinforcement learning, primarily researched by academic and leading tech company teams [1]. Group 2: Academic Contributions - UC Berkeley and Stanford University are leading institutions in the AI and robotics research field, with notable alumni contributing to the embodied intelligence sector [2]. - Four prominent figures from UC Berkeley, known as the "Four Returnees," have transitioned from Tsinghua University to UC Berkeley and then to entrepreneurial ventures in embodied intelligence [2]. Group 3: Notable Individuals in the Field - Wang He and Lu Ce Wu are key representatives of individuals who graduated from Stanford University and are now involved in the embodied intelligence startup scene in China [3]. - Wang He, a 2021 PhD graduate from Stanford, is now an assistant professor at Peking University and the founder of a leading humanoid robotics startup [3]. - Lu Ce Wu, a postdoctoral researcher at Stanford, is a co-founder and chief scientist of a unicorn collaborative robotics company and a founder of an embodied intelligence startup [3]. 
Group 4: Global Talent Pool - The majority of the top 50 Chinese individuals in the embodied intelligence field have educational backgrounds from prestigious institutions such as UC Berkeley, Stanford, MIT, and CMU, often under the mentorship of industry leaders [4]. - A detailed mapping of the top 50 Chinese talents in the field includes their educational history, research directions, and current positions in leading tech companies or startups [5].
HKUST | End-to-end LiDAR omnidirectional obstacle avoidance system for quadruped robots (Unitree G1/Go2 + PPO)
具身智能之心· 2025-06-29 09:51
Core Viewpoint - The article discusses the Omni-Perception framework developed by a team from the Hong Kong University of Science and Technology, which enables quadruped robots to navigate complex dynamic environments by directly processing raw LiDAR point cloud data for omnidirectional obstacle avoidance [2][4]. Group 1: Omni-Perception Framework Overview - The Omni-Perception framework consists of three main modules: PD-RiskNet perception network, high-fidelity LiDAR simulation tool, and risk-aware reinforcement learning strategy [4]. - The system takes raw LiDAR point clouds as input, extracts environmental risk features using PD-RiskNet, and outputs joint control signals, forming a complete closed-loop control [5]. Group 2: Advantages of the Framework - Direct utilization of spatiotemporal information avoids information loss during point cloud to grid/map conversion, preserving precise geometric relationships from the original data [7]. - Dynamic adaptability is achieved through reinforcement learning, allowing the robot to optimize obstacle avoidance strategies for previously unseen obstacle shapes [7]. - Computational efficiency is improved by reducing intermediate processing steps compared to traditional SLAM and planning pipelines [7]. Group 3: PD-RiskNet Architecture - PD-RiskNet employs a hierarchical risk perception network that processes near-field and far-field point clouds differently to capture local and global environmental features [8]. - The near-field processing uses farthest point sampling (FPS) to reduce data density while retaining key geometric features, and employs gated recurrent units (GRU) to capture local dynamic changes [8]. - The far-field processing uses average down-sampling to reduce noise and extract spatiotemporal features from distant environments [8]. 
Group 4: Reinforcement Learning Strategy - The obstacle avoidance task is modeled as an infinite horizon discounted Markov decision process, with state space including the robot's kinematic information and historical LiDAR point cloud sequences [10]. - The action space directly outputs target joint positions, allowing the policy to learn the mapping from raw sensor inputs to control signals without complex inverse kinematics [11]. - The reward function incorporates obstacle avoidance and distance maximization rewards to encourage the robot to seek open paths while penalizing deviations from target speeds [13][14]. Group 5: Simulation and Real-World Testing - The framework was validated against real LiDAR data collected using the Unitree G1 robot, demonstrating high consistency in point cloud distribution and structural integrity between simulated and real data [21]. - The Omni-Perception tool showed significant advantages in rendering efficiency, maintaining linear growth in rendering time as the number of environments increased, unlike traditional methods which exhibited exponential growth [22]. - In various tests, the framework achieved a 100% success rate in static obstacle scenarios and demonstrated superior performance in dynamic environments compared to traditional methods [26][27].
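The near-field branch described above relies on farthest point sampling (FPS) to thin the point cloud while keeping its geometric spread. A minimal pure-Python sketch of the standard greedy FPS procedure follows. The point coordinates and the function name are illustrative assumptions, and how PD-RiskNet batches or orders points is not specified in the summary.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen.

    points -- list of (x, y, z) tuples
    k      -- number of points to keep
    start  -- index of the first selected point
    Returns the indices of the selected points, in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        next_idx = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_idx)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_idx]))
    return selected


# Illustrative (made-up) near-field points: a tight cluster plus two outliers.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0),
         (0.0, 5.0, 0.0), (0.2, 0.1, 0.0)]
print(farthest_point_sampling(cloud, 3))  # → [0, 2, 3]
```

Unlike uniform random subsampling, FPS keeps the two isolated points rather than spending the budget on the dense cluster, which is the behavior wanted when a single nearby obstacle must not be dropped.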
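In tribute to Qian Xuesen, Chinese scholars develop Lingjing (灵境), an AI virtual reality exercise system, to tackle adolescent obesity and reveal the mechanisms by which VR exercise reduces weight and improves brain cognition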
生物世界· 2025-06-24 03:56
Core Viewpoint - Adolescent obesity is a global public health crisis with rising prevalence, leading to increased risks of cardiovascular and metabolic diseases, as well as cognitive impairments [2] Group 1: Research and Development - A research team from Shanghai Jiao Tong University and other institutions developed the world's first VR-based exercise intervention system, REVERIE, aimed at overweight adolescents [4][8] - The REVERIE system utilizes deep reinforcement learning and a Transformer-based virtual coach to provide safe, effective, and empathetic exercise guidance [4][8] Group 2: Study Design and Methodology - The study included a randomized controlled trial with 227 overweight adolescents, comparing outcomes between VR exercise, real-world exercise, and a control group [11] - Participants were assigned to different groups, including VR and real-world sports, with all groups receiving uniform dietary management over an eight-week intervention [11] Group 3: Results and Findings - After eight weeks, the VR exercise group lost an average of 4.28 kg of body fat, while the real-world exercise group lost 5.06 kg, showing comparable results [13] - Both VR and real-world exercise groups showed improvements in liver enzyme levels, LDL cholesterol, physical fitness, mental health, and exercise willingness [13] - VR exercise demonstrated superior cognitive function enhancement compared to real-world exercise, supported by fMRI findings indicating increased neural efficiency and plasticity [14] Group 4: Safety and Implications - The injury rate in the VR exercise group was 7.69%, lower than the 13.48% in the real-world exercise group, with no severe adverse events reported [15] - The REVERIE system is positioned as a promising solution for addressing adolescent obesity and promoting overall health improvements beyond weight loss [16][17]
ByteDance's ByteBrain team proposes a reinforcement learning VMR system with second-level inference
news flash· 2025-06-05 06:49
Core Insights - ByteDance's ByteBrain team, in collaboration with UC Merced and UC Berkeley, has developed a VMR system based on deep reinforcement learning, achieving a significant reduction in inference time to 1.1 seconds while maintaining near-optimal performance [1] Group 1 - The VMR system addresses the long-neglected but critical issue of virtual machine re-scheduling (VMR) [1] - The research has been presented at the prestigious EuroSys25 conference, highlighting its academic significance [1] - The two co-first authors of the paper are interns from ByteDance's ByteBrain team, indicating the company's investment in nurturing talent [1]