Deep Reinforcement Learning
A Tsinghua PhD dropout born in 1998 builds robots and raises nearly 500 million yuan in one month
36Kr · 2025-11-26 10:42
Core Insights
- The company Songyan Power has completed nearly 200 million yuan in Pre-B+ round financing, led by CICC Capital, to enhance technological innovation and expand high-value application scenarios [1].
- The humanoid robot and embodied intelligence sector is experiencing significant capital influx, with Songyan Power raising nearly 500 million yuan across five financing rounds in 2023 [1].
- A strategic partnership with "Programming Cat" was announced to create a humanoid robot programming education laboratory, targeting the consumer market with the launch of the Bumi robot priced under 10,000 yuan [1].

Financing and Growth
- Songyan Power's recent financing round follows a previous Pre-B round of nearly 300 million yuan, indicating strong investor confidence and growth potential [1].
- The company aims to bridge the gap from research and development to mass production and delivery, focusing on expanding its ecosystem [1].

Product Development and Market Strategy
- The Bumi robot, priced at 9,998 yuan, is designed for technology enthusiasts and youth learning programming, marking a shift towards consumer-level products [1][25].
- The company emphasizes that lowering the price of humanoid robots is a strategic move to expand the market rather than engaging in price wars with competitors [5][9].

Leadership and Vision
- The founder, Jiang Zheyuan, reflects on the challenges of transitioning from a technical focus to understanding market dynamics and consumer needs [2][3].
- The company is positioned to capitalize on the growing demand for affordable humanoid robots, aiming to make them accessible to households [4][5].

Competitive Landscape
- Songyan Power differentiates itself from competitors by targeting a broader consumer base rather than focusing solely on B2B applications, which are currently more saturated [12][14].
- The company acknowledges the presence of established players like Yuzhu but believes its unique pricing strategy and market approach will allow it to carve out a significant share [16][22].

Future Outlook
- The company anticipates that achieving sales of over 10,000 units will help cover research and development costs, indicating a healthy financial model [23].
- The strategic focus on consumer education and programming capabilities is expected to enhance the product's value proposition and market acceptance [25][31].

Human teams face their strongest AI challenger? Musk announces Grok 5 will take on the top human players in League of Legends
Sou Hu Cai Jing· 2025-11-26 10:17
Core Insights
- Elon Musk announced that the AI model Grok 5 will challenge top human teams in League of Legends by 2026 [1].
- The core design goal of Grok 5 is to "master any game through reading instructions and experimenting," aiming to validate its general artificial intelligence capabilities [3].
- Grok 5 is set to have a parameter scale of 6 trillion, double that of Grok 3 and Grok 4, and is expected to outperform in all metrics [4].

Game Challenge Details
- The challenge will include limitations such as only being able to view the screen through a camera, with a vision range not exceeding normal eyesight [3].
- Response delays and click rates will be strictly matched to human limits to avoid any technological advantages [3].
- The addition of StarCraft as a competitive project was proposed by Oriol Vinyals, indicating potential expansion of the challenge [3].

AI Development Significance
- Games like StarCraft and League of Legends have become important testing grounds for AI capabilities, with mature AI able to achieve high precision in operations and tactical decisions through deep reinforcement learning [5].
- However, there remains a gap in long-term strategic planning and response to unexpected situations compared to human players [5].
- A fair competition between Grok 5 and top human teams could mark a significant milestone in the history of AI development [5].

First AI controller completes in-orbit validation of satellite attitude adjustment
Ke Ji Ri Bao· 2025-11-14 00:20
Core Insights
- The development of the world's first artificial intelligence (AI) attitude controller for satellites by scientists at the University of Würzburg represents a significant advancement in the autonomy of space systems [1][2].
- The AI controller was successfully validated on a nanosatellite named InnoCube, demonstrating its ability to perform complete attitude maneuvers within a short time frame [1].
- The project utilizes deep reinforcement learning technology, allowing the neural network to autonomously learn control strategies in a simulated environment, which is a departure from traditional fixed algorithms [1].

Group 1
- The AI controller executed a complete attitude maneuver during a 9-minute satellite transit, adjusting the satellite's position with precision [1].
- The innovative approach automates the parameter tuning process that traditionally takes months, enabling the controller to adapt to real environmental changes without manual calibration [1].
- High-fidelity simulations were conducted on the ground before uploading the mature algorithms to the satellite, ensuring reliability in real space conditions [1].

Group 2
- InnoCube serves as a platform for testing new concepts directly in orbit, highlighting its role in advancing space technology [2].
- The wireless satellite bus SKITH, which replaces traditional wiring with wireless data transmission, reduces weight and potential failure points in the control system [2].
- The validation of this AI controller opens new prospects for deep space exploration, where intelligent autonomous control systems will be crucial for spacecraft survival in interplanetary or deep space missions [2].

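As a rough illustration of the "learn in simulation first" idea in the summary above, the sketch below defines a toy single-axis attitude model of the kind a reinforcement learning agent could be trained against. Everything in it is an assumption made for illustration: the class name `SingleAxisAttitudeEnv`, the inertia, torque limit, time step, and reward shape are hypothetical, and the hand-tuned `pd_policy` is only a stand-in for the slot a learned policy would occupy. None of it is taken from the InnoCube project.

```python
class SingleAxisAttitudeEnv:
    """Toy single-axis rigid-body attitude model (hypothetical values throughout)."""

    def __init__(self, inertia=0.01, dt=0.1, max_torque=1e-3):
        self.inertia = inertia        # moment of inertia in kg*m^2 (assumed)
        self.dt = dt                  # integration step in seconds (assumed)
        self.max_torque = max_torque  # actuator torque limit in N*m (assumed)
        self.angle = 0.0              # pointing error in rad
        self.rate = 0.0               # angular rate in rad/s

    def reset(self, angle=1.0, rate=0.0):
        self.angle, self.rate = angle, rate
        return (self.angle, self.rate)

    def step(self, torque):
        # Saturate the commanded torque, then integrate one step
        # (update the rate first, then the angle with the new rate).
        torque = max(-self.max_torque, min(self.max_torque, torque))
        self.rate += (torque / self.inertia) * self.dt
        self.angle += self.rate * self.dt
        # Penalize pointing error and residual rate; a learning agent
        # would try to maximize the sum of these rewards.
        reward = -(self.angle ** 2 + 0.1 * self.rate ** 2)
        return (self.angle, self.rate), reward


def pd_policy(obs, kp=0.002, kd=0.01):
    """Hand-tuned placeholder occupying the slot of a learned policy."""
    angle, rate = obs
    return -kp * angle - kd * rate


def rollout(env, policy, steps=600):
    """Run one episode and return the final observation and total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total += reward
    return obs, total


if __name__ == "__main__":
    final_obs, total_reward = rollout(SingleAxisAttitudeEnv(), pd_policy)
    print(final_obs, total_reward)
```

In a training setup, `pd_policy` would be replaced by a neural network whose parameters are adjusted to increase the episode reward; only the environment interface would stay the same.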
AI-Empowered Asset Allocation (19): How Institutions Are Innovating with AI + Investment in Practice
Guoxin Securities· 2025-10-29 07:16
Core Insights
- The report emphasizes the transformative impact of AI on asset allocation, highlighting the shift from static optimization to dynamic, intelligent evolution in decision-making processes [1].
- It identifies the integration of large language models (LLMs), deep reinforcement learning (DRL), and graph neural networks (GNNs) as key technologies reshaping investment research and execution [1][2].
- The future of asset management is seen as a collaborative effort between human expertise and AI capabilities, necessitating a reconfiguration of organizational structures and strategies [3].

Group 1: AI in Asset Allocation
- LLMs are revolutionizing the understanding and quantification of unstructured financial texts, thus expanding the information boundaries traditionally relied upon in investment research [1][11].
- The evolution of sentiment analysis from basic dictionary methods to advanced transformer-based models allows for more accurate emotional assessments in financial contexts [12][13].
- The application of LLMs in algorithmic trading and risk management is highlighted, showcasing their ability to generate quantitative sentiment scores and identify early warning signals for market shifts [14][15].

Group 2: Deep Reinforcement Learning (DRL)
- DRL provides a framework for adaptive decision-making in asset allocation, moving beyond static models to a dynamic learning approach that maximizes long-term returns [17][18].
- The report discusses various DRL algorithms, such as Actor-Critic methods and Proximal Policy Optimization, which show significant potential in financial applications [19][20].
- Challenges in deploying DRL in real-world markets include data dependency, overfitting risks, and the need for models to adapt to different market cycles [21][22].

Group 3: Graph Neural Networks (GNNs)
- GNNs conceptualize the financial system as a network, allowing for a better understanding of risk transmission among financial institutions [23][24].
- The ability of GNNs to model systemic risks and conduct stress testing provides valuable insights for regulators and investors alike [25][26].

Group 4: Institutional Practices
- BlackRock's AlphaAgents project exemplifies the integration of AI in investment decision-making, focusing on overcoming cognitive biases and enhancing decision-making processes through multi-agent systems [27][30].
- The report outlines the strategic intent behind AlphaAgents, which aims to leverage LLMs for complex reasoning and decision-making in asset management [30][31].
- J.P. Morgan's AI strategy emphasizes building proprietary, trustworthy AI technologies, focusing on foundational models and automated decision-making to navigate complex financial systems [42][45].

Group 5: Future Directions
- The report suggests that the future of asset management will involve a seamless integration of AI capabilities into existing workflows, enhancing both decision-making and execution processes [39][41].
- The emphasis on creating a "financial brain" through proprietary AI technologies positions firms like J.P. Morgan to maintain a competitive edge in the evolving financial landscape [52].

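The Proximal Policy Optimization method named under Group 2 is built around a clipped surrogate objective, which limits how far a single update can move the policy. The sketch below shows that standard objective in plain Python; the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the sample numbers are illustrative assumptions and are not drawn from the Guoxin Securities report.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch.

    ratios:     new-policy probability / old-policy probability per sample
    advantages: estimated advantage of the sampled action per sample
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical batch: two rebalancing actions with made-up advantages.
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

The `min` is what makes the objective conservative: a sample whose ratio has already moved past the clip range in the favorable direction contributes no further incentive to move, which is one reason the method is often considered for noisy settings such as financial data.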
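Take it home for 9,998 yuan! The world's first humanoid robot under 10,000 yuan arrives: 21 degrees of freedom, and it talks, walks, and dances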
机器之心· 2025-10-22 08:46
Core Viewpoint
- The article highlights the launch of the Bumi robot by Songyan Power, marking a significant step in making humanoid robots accessible to consumers with a price point of 9998 yuan, which is lower than many high-end smartphones, thus entering the consumer-grade market for the first time [4][5][39].

Product Overview
- The Bumi robot features 21 degrees of freedom (DOF), allowing for advanced movement capabilities, including walking, dancing, and interacting with users [20][36].
- Weighing only 12 kg and standing at 94 cm, Bumi is designed to be lightweight and safe for children, making it suitable for educational and entertainment purposes [16][17][34].
- The robot is equipped with a 48V battery system, providing a runtime of 1 to 2 hours, which is adequate for short-term applications [32][33].

Company Background
- Songyan Power has rapidly gained attention in the humanoid robot industry, completing six rounds of financing within two years and becoming a key player in the market [7][39].
- The company first gained public recognition during the Beijing Yizhuang Half Marathon, where its N2 robot independently completed the race, showcasing its capabilities [8][9].

Technological Innovation
- The company utilizes self-developed servo motors and advanced motion control algorithms to ensure precise and stable movements of the robots [41].
- Songyan Power has made significant advancements in deep reinforcement learning, allowing robots to learn and adapt through trial and error, enhancing their performance in complex tasks [43][45].

Market Strategy
- The company focuses on smaller humanoid robots, which are more affordable and versatile compared to full-sized models, catering to various applications in education, entertainment, and exhibitions [40][46].
- The successful integration of domestic supply chains has enabled the company to reduce costs and enhance production capabilities, contributing to the competitive pricing of the Bumi robot [47][48].

ICLR 2025 | SmODE: An Ordinary Differential Equation Neural Network for Generating Smooth Control Actions
自动驾驶之心· 2025-09-01 23:32
Core Viewpoint
- The research team led by Professor Li Shengbo from Tsinghua University has developed a novel smoothing neural network called SmODE, which utilizes ordinary differential equations (ODE) to enhance the smoothness of control actions in reinforcement learning tasks, thereby improving the usability and safety of intelligent systems [4][23].

Background
- Deep Reinforcement Learning (DRL) has proven effective in solving optimal control problems in various applications, including drone control and autonomous driving. However, the smoothness of control actions remains a significant challenge due to high-frequency noise and unregulated Lipschitz constants in neural networks [5][19].

Key Technologies of SmODE
- **Smoothing ODE Design**: The team designed a smoothing neuron structure based on ODEs that can adaptively filter high-frequency noise while controlling the Lipschitz constant, thus enhancing the performance of control systems [8][9].
- **Smoothing Network Structure**: SmODE is structured to be integrated into various reinforcement learning frameworks, featuring an input module, a smoothing ODE module, and an output module, which can be adjusted based on task complexity [14][16].
- **Reinforcement Learning Algorithm Based on SmODE**: SmODE can be easily combined with existing deep reinforcement learning algorithms, requiring additional loss terms to regulate the time constant and Lipschitz constant during training [16][17].

Experimental Results
- In experiments with Gaussian noise variance set at 0.05, SmODE demonstrated significantly lower action volatility compared to traditional MLP networks, enhancing vehicle comfort and safety during tasks such as sine curve tracking and lane changing [19][21].
- In the MuJoCo benchmark tests, SmODE outperformed other networks (LTC, LipsNet, and MLP) in terms of average action smoothness across various tasks, indicating its effectiveness in real-world applications [21][22].

Conclusion
- The SmODE network effectively addresses the oscillation issues in action outputs within deep reinforcement learning, providing a new approach to enhance the performance and stability of intelligent systems in real-world applications [23].

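To make the link between an ODE, a time constant, and smoother actions concrete, the sketch below uses a generic first-order low-pass ODE, tau * dy/dt = -y + u, integrated with explicit Euler steps. This is a textbook filter swapped in for illustration, not the SmODE neuron itself; the function names, `tau`, `dt`, and the alternating test signal are all assumptions.

```python
def low_pass_filter(inputs, tau=0.5, dt=0.1):
    """Euler integration of tau * dy/dt = -y + u for a sequence of inputs u."""
    y = inputs[0]  # start the state at the first input value
    outputs = []
    for u in inputs:
        # A larger tau means slower tracking and stronger smoothing.
        y += (dt / tau) * (u - y)
        outputs.append(y)
    return outputs


def total_variation(values):
    """Sum of absolute step-to-step changes, a simple roughness measure."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Hypothetical raw action signal: a constant command of 1.0 with
    # alternating +/-0.3 high-frequency disturbance.
    raw = [1.0 + (0.3 if k % 2 == 0 else -0.3) for k in range(50)]
    smooth = low_pass_filter(raw)
    print(total_variation(raw), total_variation(smooth))
```

The trade-off visible here is the usual one for filtering: a larger time constant suppresses more fluctuation but also delays the response to genuine changes in the command, which is presumably why the summary mentions regulating the time constant during training rather than fixing it by hand.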
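Zhongyuan Jintaiyang applies for a patent on a method for calculating the wind power capacity range in distribution networks that accounts for carbon capture benefits, enabling a dynamic trade-off between carbon benefit and economic cost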
Jin Rong Jie· 2025-08-23 01:21
Group 1
- Henan Zhongyuan Jintaiyang Technology Co., Ltd. has applied for a patent titled "A Calculation Method for Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits" [1].
- The patent application was published under CN120524785A and was filed in March 2025 [1].
- The invention relates to the field of wind power capacity configuration and involves a method that integrates artificial intelligence algorithms with physical laws of energy systems [1].

Group 2
- Henan Zhongyuan Jintaiyang Technology Co., Ltd. was established in 2020 and is located in Zhengzhou, primarily engaged in technology promotion and application services [2].
- The company has a registered capital of 90 million RMB and has invested in 41 enterprises [2].
- The company has participated in 91 bidding projects and holds 21 patents, along with 6 administrative licenses [2].

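Dineike: brain-computer interaction division proposes an active brain-computer interface shared-control scheme based on deep reinforcement learning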
news flash· 2025-07-02 03:19
Core Insights
- Dr. Peng Junren from Dineike's Brain-Computer Interface (BCI) division published a paper in the "Annals of the New York Academy of Sciences" discussing a new approach to shared autonomy between human electroencephalography and TD3 deep reinforcement learning [1].
- The study indicates that approximately 15%-30% of users are unable to effectively operate traditional BCI systems due to physiological differences, highlighting a gap in current technology that only measures internal brain activity without considering environmental factors [1].
- Dineike's BCI division proposes an active BCI co-control scheme based on deep reinforcement learning, aiming to provide a new paradigm for the universal application of BCIs through collaborative decision-making between humans and AI agents [1].
- The next steps for Dineike involve focusing on breakthroughs in core technologies related to brainwave interaction and the industrialization of these technologies, moving from laboratory research to practical applications [1].

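Embodied intelligence: a map of the global top 50 Chinese nationals and ethnic Chinese (including a "mentor-student relationship map" for the embodied intelligence track)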
Robot猎场备忘录· 2025-06-30 08:09
Core Viewpoint
- The development of embodied intelligence technology is a leading trend in the AI and robotics sector, involving advanced techniques such as large language models (LLM), visual multimodal models (VLM), reinforcement learning, deep reinforcement learning, and imitation learning [1].

Group 1: Embodied Intelligence Technology
- Embodied intelligence technology encompasses various cutting-edge techniques, including LLM, VLM, reinforcement learning, deep reinforcement learning, and imitation learning [1].
- The evolution of humanoid robots has progressed from model-based control algorithms to dynamic model control and optimal control algorithms, and currently to simulation combined with reinforcement learning [1].
- The most frequently mentioned concepts in humanoid robotics companies are imitation learning and reinforcement learning, primarily researched by academic and leading tech company teams [1].

Group 2: Academic Contributions
- UC Berkeley and Stanford University are leading institutions in the AI and robotics research field, with notable alumni contributing to the embodied intelligence sector [2].
- Four prominent figures from UC Berkeley, known as the "Four Returnees," have transitioned from Tsinghua University to UC Berkeley and then to entrepreneurial ventures in embodied intelligence [2].

Group 3: Notable Individuals in the Field
- Wang He and Lu Ce Wu are key representatives of individuals who graduated from Stanford University and are now involved in the embodied intelligence startup scene in China [3].
- Wang He, a 2021 PhD graduate from Stanford, is now an assistant professor at Peking University and the founder of a leading humanoid robotics startup [3].
- Lu Ce Wu, a postdoctoral researcher at Stanford, is a co-founder and chief scientist of a unicorn collaborative robotics company and a founder of an embodied intelligence startup [3].

Group 4: Global Talent Pool
- The majority of the top 50 Chinese individuals in the embodied intelligence field have educational backgrounds from prestigious institutions such as UC Berkeley, Stanford, MIT, and CMU, often under the mentorship of industry leaders [4].
- A detailed mapping of the top 50 Chinese talents in the field includes their educational history, research directions, and current positions in leading tech companies or startups [5].

HKUST | End-to-end LiDAR omnidirectional obstacle avoidance system for quadruped robots (Unitree G1/Go2 + PPO)
具身智能之心· 2025-06-29 09:51
Core Viewpoint
- The article discusses the Omni-Perception framework developed by a team from the Hong Kong University of Science and Technology, which enables quadruped robots to navigate complex dynamic environments by directly processing raw LiDAR point cloud data for omnidirectional obstacle avoidance [2][4].

Group 1: Omni-Perception Framework Overview
- The Omni-Perception framework consists of three main modules: PD-RiskNet perception network, high-fidelity LiDAR simulation tool, and risk-aware reinforcement learning strategy [4].
- The system takes raw LiDAR point clouds as input, extracts environmental risk features using PD-RiskNet, and outputs joint control signals, forming a complete closed-loop control [5].

Group 2: Advantages of the Framework
- Direct utilization of spatiotemporal information avoids information loss during point cloud to grid/map conversion, preserving precise geometric relationships from the original data [7].
- Dynamic adaptability is achieved through reinforcement learning, allowing the robot to optimize obstacle avoidance strategies for previously unseen obstacle shapes [7].
- Computational efficiency is improved by reducing intermediate processing steps compared to traditional SLAM and planning pipelines [7].

Group 3: PD-RiskNet Architecture
- PD-RiskNet employs a hierarchical risk perception network that processes near-field and far-field point clouds differently to capture local and global environmental features [8].
- The near-field processing uses farthest point sampling (FPS) to reduce data density while retaining key geometric features, and employs gated recurrent units (GRU) to capture local dynamic changes [8].
- The far-field processing uses average down-sampling to reduce noise and extract spatiotemporal features from distant environments [8].

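The farthest point sampling step mentioned under Group 3 can be sketched in a few lines. The version below is the standard greedy algorithm in plain Python, written for readability rather than speed; the function names, the choice of the first point as the seed, and the sample points are assumptions and do not reflect how PD-RiskNet implements it.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k):
    """Return indices of k points chosen greedily to be mutually far apart.

    Starts from index 0 (an arbitrary choice), then repeatedly adds the
    point whose distance to the already selected set is largest.
    """
    if k <= 0 or not points:
        return []
    k = min(k, len(points))
    selected = [0]
    # Squared distance from every point to its nearest selected point.
    nearest = [squared_distance(p, points[0]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(idx)
        for i, p in enumerate(points):
            d = squared_distance(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return selected


if __name__ == "__main__":
    # Hypothetical near-field cloud: two close points and two distant ones.
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
    print(farthest_point_sampling(cloud, 3))
```

Compared with uniform random sampling, this greedy choice tends to keep isolated points such as thin obstacles, at a cost of O(n * k) distance evaluations per frame.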
Group 4: Reinforcement Learning Strategy
- The obstacle avoidance task is modeled as an infinite horizon discounted Markov decision process, with state space including the robot's kinematic information and historical LiDAR point cloud sequences [10].
- The action space directly outputs target joint positions, allowing the policy to learn the mapping from raw sensor inputs to control signals without complex inverse kinematics [11].
- The reward function incorporates obstacle avoidance and distance maximization rewards to encourage the robot to seek open paths while penalizing deviations from target speeds [13][14].

Group 5: Simulation and Real-World Testing
- The framework was validated against real LiDAR data collected using the Unitree G1 robot, demonstrating high consistency in point cloud distribution and structural integrity between simulated and real data [21].
- The Omni-Perception tool showed significant advantages in rendering efficiency, maintaining linear growth in rendering time as the number of environments increased, unlike traditional methods which exhibited exponential growth [22].
- In various tests, the framework achieved a 100% success rate in static obstacle scenarios and demonstrated superior performance in dynamic environments compared to traditional methods [26][27].
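In a discounted Markov decision process like the one described under Group 4, the quantity the policy tries to maximize is the discounted sum of rewards. The sketch below computes that sum for a finite recorded rollout, which is how the infinite-horizon objective is typically approximated in practice; the function name, the discount factor, and the reward values are illustrative assumptions, not figures from the Omni-Perception work.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one rollout."""
    total = 0.0
    # Accumulate from the last step backwards so each reward is
    # multiplied by gamma once per step of delay.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical per-step rewards: small positive values for keeping
    # clearance, one large penalty for a near collision.
    print(discounted_return([0.1, 0.1, -1.0, 0.1], gamma=0.9))
```

A discount factor below 1 keeps the infinite sum finite and makes near-term outcomes, such as an imminent collision, weigh more heavily than distant ones.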