Deep Reinforcement Learning - filings, earnings calls, financial reports, news - Reportify

Deep Reinforcement Learning

They Spent the Spring Festival in the Lab

Xin Lang Cai Jing· 2026-02-23 21:43

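(Source: Tianjin Daily) Reposted from Tianjin Daily. During the Spring Festival holiday, good news came out of the Deep Reinforcement Learning Laboratory at Tianjin University: version R1.5 of the embodied foundation model, developed in-house by the lab's embodied-intelligence team, made its official debut running on several robot models. Walking into the lab, the reporter found it set up as a "Spring Festival at home" scene. "Tidy up the living room, then make a cup of milk tea for the guest." As soon as the words were spoken, a robot started moving. It first "saw" a wad of paper on the floor; its robotic arm then picked up a broom and swept the trash away along a planned route. Next it turned to the workbench, gently picked up a paper cup, added tea, added milk, stirred, and steadily carried a steaming cup of milk tea to the "guest". The whole sequence ran in one go, with no human intervention. What makes this robot so attuned to people is the "brain" it has just been given: version R1.5 of the embodied foundation model developed by the team.

"It looks simple, but a great deal of work went into it," Tang Hongyao (汤宏垚), an associate researcher and one of the lab's core young members, explained to the reporter. To let the robot handle complex environments as flexibly as a person does, the team designed a "cerebrum + cerebellum" division-of-labor architecture. He offered an analogy: the cerebrum is a multimodal large model, responsible for understanding instructions, decomposing tasks, and planning paths; the cerebellum is a reinforcement-learning-based decision and control module, responsible for executing specific actions precisely. How much force to use so a cup is not crushed, and how to keep balance while moving, are things the cerebellum computes in real time. "The cerebrum makes the decisions and the cerebellum does the control; when the two work well together, the robot ...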

Deep reinforcement learning

Robots running embodied foundation model R1.5

Embodied foundation model R1.5

A Tsinghua PhD Student Born in 1998 Dropped Out to Build Robots and Raised Nearly 500 Million Yuan in a Month

3 6 Ke· 2025-11-26 10:42

Core Insights
- The company Songyan Power has completed nearly 200 million yuan in Pre-B+ round financing, led by CICC Capital, to enhance technological innovation and expand high-value application scenarios [1]
- The humanoid robot and embodied intelligence sector is experiencing significant capital influx, with Songyan Power raising nearly 500 million yuan across five financing rounds in 2023 [1]
- A strategic partnership with "Programming Cat" was announced to create a humanoid robot programming education laboratory, targeting the consumer market with the launch of the Bumi robot priced under 10,000 yuan [1]

Financing and Growth
- Songyan Power's recent financing round follows a previous Pre-B round of nearly 300 million yuan, indicating strong investor confidence and growth potential [1]
- The company aims to bridge the gap from research and development to mass production and delivery, focusing on expanding its ecosystem [1]

Product Development and Market Strategy
- The Bumi robot, priced at 9,998 yuan, is designed for technology enthusiasts and youth learning programming, marking a shift towards consumer-level products [1][25]
- The company emphasizes that lowering the price of humanoid robots is a strategic move to expand the market rather than engaging in price wars with competitors [5][9]

Leadership and Vision
- The founder, Jiang Zheyuan, reflects on the challenges of transitioning from a technical focus to understanding market dynamics and consumer needs [2][3]
- The company is positioned to capitalize on the growing demand for affordable humanoid robots, aiming to make them accessible to households [4][5]

Competitive Landscape
- Songyan Power differentiates itself from competitors by targeting a broader consumer base rather than focusing solely on B2B applications, which are currently more saturated [12][14]
- The company acknowledges the presence of established players like Yuzhu but believes its unique pricing strategy and market approach will allow it to carve out a significant share [16][22]

Future Outlook
- The company anticipates that achieving sales of over 10,000 units will help cover research and development costs, indicating a healthy financial model [23]
- The strategic focus on consumer education and programming capabilities is expected to enhance the product's value proposition and market acceptance [25][31]

SIASUN (SZ:300024)

Humanoid robots

Deep reinforcement learning

E1 and N2 robots

Human Teams Face Their Strongest AI Challenger? Musk Announces Grok 5 Will Take On the Best Human Players in League of Legends

Sou Hu Cai Jing· 2025-11-26 10:17

Core Insights
- Elon Musk announced that the AI model Grok 5 will challenge top human teams in League of Legends by 2026 [1]
- The core design goal of Grok 5 is to "master any game through reading instructions and experimenting," aiming to validate its general artificial intelligence capabilities [3]
- Grok 5 is set to have a parameter scale of 6 trillion, double that of Grok 3 and Grok 4, and is expected to outperform in all metrics [4]

Game Challenge Details
- The challenge will include limitations such as only being able to view the screen through a camera, with a vision range not exceeding normal eyesight [3]
- Response delays and click rates will be strictly matched to human limits to avoid any technological advantages [3]
- The addition of StarCraft as a competitive project was proposed by Oriol Vinyals, indicating potential expansion of the challenge [3]

AI Development Significance
- Games like StarCraft and League of Legends have become important testing grounds for AI capabilities, with mature AI able to achieve high precision in operations and tactical decisions through deep reinforcement learning [5]
- However, there remains a gap in long-term strategic planning and response to unexpected situations compared to human players [5]
- A fair competition between Grok 5 and top human teams could mark a significant milestone in the history of AI development [5]

Artificial general intelligence

Deep reinforcement learning

League of Legends

StarCraft

First AI Controller Completes In-Orbit Validation of Satellite Attitude Adjustment

Ke Ji Ri Bao· 2025-11-14 00:20

Core Insights
- The development of the world's first artificial intelligence (AI) attitude controller for satellites by scientists at the University of Würzburg represents a significant advancement in the autonomy of space systems [1][2]
- The AI controller was successfully validated on a nanosatellite named InnoCube, demonstrating its ability to perform complete attitude maneuvers within a short time frame [1]
- The project uses deep reinforcement learning: the neural network learns its control strategy autonomously in a simulated environment, a departure from traditional fixed algorithms [1]

Group 1
- The AI controller executed a complete attitude maneuver during a 9-minute satellite pass, adjusting the satellite's orientation with precision [1]
- The approach automates the parameter tuning process that traditionally takes months, enabling the controller to adapt to real environmental changes without manual calibration [1]
- High-fidelity simulations were conducted on the ground before the mature algorithms were uploaded to the satellite, ensuring reliability in real space conditions [1]

Group 2
- InnoCube serves as a platform for testing new concepts directly in orbit, highlighting its role in advancing space technology [2]
- The wireless satellite bus SKITH, which replaces traditional wiring with wireless data transmission, reduces weight and potential failure points in the control system [2]
- The validation of this AI controller opens new prospects for deep space exploration, where intelligent autonomous control systems will be crucial for spacecraft survival in interplanetary or deep space missions [2]

Deep reinforcement learning

In-orbit AI satellite attitude controller

Wireless satellite bus SKITH

AI-Empowered Asset Allocation (Part 19): The Practical Innovation Path of Institutional AI + Investing

Guoxin Securities· 2025-10-29 07:16

Core Insights
- The report emphasizes the transformative impact of AI on asset allocation, highlighting the shift from static optimization to dynamic, intelligent evolution in decision-making processes [1]
- It identifies the integration of large language models (LLMs), deep reinforcement learning (DRL), and graph neural networks (GNNs) as key technologies reshaping investment research and execution [1][2]
- The future of asset management is seen as a collaborative effort between human expertise and AI capabilities, necessitating a reconfiguration of organizational structures and strategies [3]

Group 1: AI in Asset Allocation
- LLMs are revolutionizing the understanding and quantification of unstructured financial texts, thus expanding the information boundaries traditionally relied upon in investment research [1][11]
- The evolution of sentiment analysis from basic dictionary methods to advanced transformer-based models allows for more accurate emotional assessments in financial contexts [12][13]
- The application of LLMs in algorithmic trading and risk management is highlighted, showcasing their ability to generate quantitative sentiment scores and identify early warning signals for market shifts [14][15]

Group 2: Deep Reinforcement Learning (DRL)
- DRL provides a framework for adaptive decision-making in asset allocation, moving beyond static models to a dynamic learning approach that maximizes long-term returns [17][18]
- The report discusses various DRL algorithms, such as Actor-Critic methods and Proximal Policy Optimization, which show significant potential in financial applications [19][20]
- Challenges in deploying DRL in real-world markets include data dependency, overfitting risks, and the need for models to adapt to different market cycles [21][22]

Group 3: Graph Neural Networks (GNNs)
- GNNs conceptualize the financial system as a network, allowing for a better understanding of risk transmission among financial institutions [23][24]
- The ability of GNNs to model systemic risks and conduct stress testing provides valuable insights for regulators and investors alike [25][26]

Group 4: Institutional Practices
- BlackRock's AlphaAgents project exemplifies the integration of AI in investment decision-making, focusing on overcoming cognitive biases and enhancing decision-making processes through multi-agent systems [27][30]
- The report outlines the strategic intent behind AlphaAgents, which aims to leverage LLMs for complex reasoning and decision-making in asset management [30][31]
- J.P. Morgan's AI strategy emphasizes building proprietary, trustworthy AI technologies, focusing on foundational models and automated decision-making to navigate complex financial systems [42][45]

Group 5: Future Directions
- The report suggests that the future of asset management will involve a seamless integration of AI capabilities into existing workflows, enhancing both decision-making and execution processes [39][41]
- The emphasis on creating a "financial brain" through proprietary AI technologies positions firms like J.P. Morgan to maintain a competitive edge in the evolving financial landscape [52]
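
The summary names Proximal Policy Optimization among the DRL algorithms discussed but gives no detail. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate objective for a batch of probability ratios and advantage estimates. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for this example; nothing here is taken from the report or from any institution's implementation.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    clip_eps:   clipping range (0.2 is a commonly used value, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps] so one update cannot
        # move the policy too far from the one that collected the data.
        clipped_r = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

In an allocation setting the actions could be portfolio weight changes and the advantages could come from realized returns, but that mapping is an assumption of this sketch, not something the summary specifies.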

AI-empowered asset allocation

Large language models

Deep reinforcement learning

Graph neural networks

Take It Home for 9,998 Yuan! The World's First Humanoid Robot Under 10,000 Yuan Is Here: 21 Degrees of Freedom, Talks, Walks, and Dances

机器之心· 2025-10-22 08:46

Core Viewpoint
- The article highlights the launch of the Bumi robot by Songyan Power, marking a significant step in making humanoid robots accessible to consumers with a price point of 9998 yuan, which is lower than many high-end smartphones, thus entering the consumer-grade market for the first time [4][5][39].

Product Overview
- The Bumi robot features 21 degrees of freedom (DOF), allowing for advanced movement capabilities, including walking, dancing, and interacting with users [20][36].
- Weighing only 12 kg and standing at 94 cm, Bumi is designed to be lightweight and safe for children, making it suitable for educational and entertainment purposes [16][17][34].
- The robot is equipped with a 48V battery system, providing a runtime of 1 to 2 hours, which is adequate for short-term applications [32][33].

Company Background
- Songyan Power has rapidly gained attention in the humanoid robot industry, completing six rounds of financing within two years and becoming a key player in the market [7][39].
- The company first gained public recognition during the Beijing Yizhuang Half Marathon, where its N2 robot independently completed the race, showcasing its capabilities [8][9].

Technological Innovation
- The company utilizes self-developed servo motors and advanced motion control algorithms to ensure precise and stable movements of the robots [41].
- Songyan Power has made significant advancements in deep reinforcement learning, allowing robots to learn and adapt through trial and error, enhancing their performance in complex tasks [43][45].

Market Strategy
- The company focuses on smaller humanoid robots, which are more affordable and versatile compared to full-sized models, catering to various applications in education, entertainment, and exhibitions [40][46].
- The successful integration of domestic supply chains has enabled the company to reduce costs and enhance production capabilities, contributing to the competitive pricing of the Bumi robot [47][48].

Humanoid robots

Consumer-grade robots

Deep reinforcement learning

Songyan Power Bumi (小布米)

ICLR 2025 | SmODE: An Ordinary Differential Equation Neural Network for Generating Smooth Control Actions

自动驾驶之心· 2025-09-01 23:32

Core Viewpoint
- The research team led by Professor Li Shengbo from Tsinghua University has developed a novel smoothing neural network called SmODE, which utilizes ordinary differential equations (ODE) to enhance the smoothness of control actions in reinforcement learning tasks, thereby improving the usability and safety of intelligent systems [4][23].

Background
- Deep Reinforcement Learning (DRL) has proven effective in solving optimal control problems in various applications, including drone control and autonomous driving. However, the smoothness of control actions remains a significant challenge due to high-frequency noise and unregulated Lipschitz constants in neural networks [5][19].

Key Technologies of SmODE
- **Smoothing ODE Design**: The team designed a smoothing neuron structure based on ODEs that can adaptively filter high-frequency noise while controlling the Lipschitz constant, thus enhancing the performance of control systems [8][9].
- **Smoothing Network Structure**: SmODE is structured to be integrated into various reinforcement learning frameworks, featuring an input module, a smoothing ODE module, and an output module, which can be adjusted based on task complexity [14][16].
- **Reinforcement Learning Algorithm Based on SmODE**: SmODE can be easily combined with existing deep reinforcement learning algorithms, requiring additional loss terms to regulate the time constant and Lipschitz constant during training [16][17].

Experimental Results
- In experiments with Gaussian noise variance set at 0.05, SmODE demonstrated significantly lower action volatility compared to traditional MLP networks, enhancing vehicle comfort and safety during tasks such as sine curve tracking and lane changing [19][21].
- In the MuJoCo benchmark tests, SmODE outperformed other networks (LTC, LipsNet, and MLP) in terms of average action smoothness across various tasks, indicating its effectiveness in real-world applications [21][22].

Conclusion
- The SmODE network effectively addresses the oscillation issues in action outputs within deep reinforcement learning, providing a new approach to enhance the performance and stability of intelligent systems in real-world applications [23].
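
To give a feel for how an ODE with a time constant can filter high-frequency noise, the sketch below integrates the textbook first-order low-pass equation dy/dt = (u - y) / tau with forward Euler and compares step-to-step fluctuation before and after filtering. This is not the SmODE neuron: SmODE's actual equations, its learned time constants, and its Lipschitz regularization are not given in the summary. The names `low_pass_filter` and `fluctuation`, the 0.5 Hz sine reference, and `tau=0.1` are assumptions for illustration; the noise variance of 0.05 is borrowed from the summary only as a convenient test value.

```python
import math
import random


def low_pass_filter(signal, tau, dt):
    """Forward-Euler integration of dy/dt = (u - y) / tau over a sampled signal."""
    if tau < dt:
        raise ValueError("tau must be at least dt for a stable Euler step")
    alpha = dt / tau          # fraction of the gap to the input closed per step
    y = signal[0]             # start the filter state at the first sample
    out = []
    for u in signal:
        y = y + alpha * (u - y)
        out.append(y)
    return out


def fluctuation(actions):
    """Mean absolute change between consecutive actions (lower is smoother)."""
    diffs = [abs(b - a) for a, b in zip(actions, actions[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical test signal: a slow sine "command" corrupted by Gaussian noise.
rng = random.Random(0)
dt = 0.01
clean = [math.sin(2 * math.pi * 0.5 * k * dt) for k in range(1000)]
noisy = [c + rng.gauss(0.0, math.sqrt(0.05)) for c in clean]
smooth = low_pass_filter(noisy, tau=0.1, dt=dt)

print("fluctuation, noisy:   ", fluctuation(noisy))
print("fluctuation, filtered:", fluctuation(smooth))
```

A larger `tau` suppresses more noise but makes the output lag the input further, which is the trade-off a learned or regularized time constant would presumably have to manage.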

Deep reinforcement learning

Ordinary differential equation neural networks

SmODE (Smooth Ordinary Differential Equations)

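Zhongyuan Jinyang (中原金太阳) Applies for a Patent on a Method for Calculating the Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits, Enabling a Dynamic Trade-off Between Carbon Benefits and Economic Cost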

Jin Rong Jie· 2025-08-23 01:21

Group 1
- The company Henan Zhongyuan Jinyang Technology Co., Ltd. has applied for a patent titled "A Calculation Method for Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits" [1]
- The patent application was published under CN120524785A and was filed in March 2025 [1]
- The invention relates to the field of wind power capacity configuration and involves a method that integrates artificial intelligence algorithms with physical laws of energy systems [1]

Group 2
- Henan Zhongyuan Jinyang Technology Co., Ltd. was established in 2020 and is located in Zhengzhou, primarily engaged in technology promotion and application services [2]
- The company has a registered capital of 90 million RMB and has invested in 41 enterprises [2]
- The company has participated in 91 bidding projects and holds 21 patents, along with 6 administrative licenses [2]

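GOLDEN SUN (SZ:300606)

Carbon capture benefits

Wind power capacity configuration

AI algorithms

Deep reinforcement learning

Graph convolutional networks

Bayesian deep learning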

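Dnake: Brain-Computer Interaction Division Proposes an Active Brain-Computer Interface Shared-Control Scheme Based on Deep Reinforcement Learning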

news flash· 2025-07-02 03:19

Core Insights
- Dr. Peng Junren from Dnake's Brain-Computer Interface (BCI) division published a paper in the "Annals of the New York Academy of Sciences" discussing a new approach to shared autonomy between human electroencephalography and TD3 deep reinforcement learning [1]
- The study indicates that approximately 15%-30% of users are unable to effectively operate traditional BCI systems due to physiological differences, highlighting a gap in current technology that only measures internal brain activity without considering environmental factors [1]
- Dnake's BCI division proposes an active BCI co-control scheme based on deep reinforcement learning, aiming to provide a new paradigm for the universal application of BCIs through collaborative decision-making between humans and AI agents [1]
- The next steps for Dnake involve focusing on breakthroughs in core technologies related to brainwave interaction and the industrialization of these technologies, moving from laboratory research to practical applications [1]

Dnake (Xiamen) Intelligent Technology (SZ:300884)

Deep reinforcement learning

Active brain-computer interface shared-control scheme based on deep reinforcement learning

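Embodied Intelligence: A Map of the Global Top 50 Chinese and Ethnic Chinese Researchers (Including a "Mentor-Student Relationship Map" of the Embodied Intelligence Field)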

Robot猎场备忘录· 2025-06-30 08:09

Core Viewpoint
- The development of embodied intelligence technology is a leading trend in the AI and robotics sector, involving advanced techniques such as large language models (LLM), vision-language models (VLM), reinforcement learning, deep reinforcement learning, and imitation learning [1].

Group 1: Embodied Intelligence Technology
- Embodied intelligence technology encompasses various cutting-edge techniques, including LLM, VLM, reinforcement learning, deep reinforcement learning, and imitation learning [1].
- The evolution of humanoid robots has progressed from model-based control algorithms to dynamic model control and optimal control algorithms, and currently to simulation combined with reinforcement learning [1].
- The most frequently mentioned concepts in humanoid robotics companies are imitation learning and reinforcement learning, primarily researched by academic and leading tech company teams [1].

Group 2: Academic Contributions
- UC Berkeley and Stanford University are leading institutions in the AI and robotics research field, with notable alumni contributing to the embodied intelligence sector [2].
- Four prominent figures from UC Berkeley, known as the "Four Returnees," have transitioned from Tsinghua University to UC Berkeley and then to entrepreneurial ventures in embodied intelligence [2].

Group 3: Notable Individuals in the Field
- Wang He and Lu Cewu are key representatives of individuals who graduated from Stanford University and are now involved in the embodied intelligence startup scene in China [3].
- Wang He, a 2021 PhD graduate from Stanford, is now an assistant professor at Peking University and the founder of a leading humanoid robotics startup [3].
- Lu Cewu, a postdoctoral researcher at Stanford, is a co-founder and chief scientist of a unicorn collaborative robotics company and a founder of an embodied intelligence startup [3].

Group 4: Global Talent Pool
- The majority of the top 50 Chinese individuals in the embodied intelligence field have educational backgrounds from prestigious institutions such as UC Berkeley, Stanford, MIT, and CMU, often under the mentorship of industry leaders [4].
- A detailed mapping of the top 50 Chinese talents in the field includes their educational history, research directions, and current positions in leading tech companies or startups [5].

Humanoid robots

Deep reinforcement learning
