Deep Reinforcement Learning
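Armies of freebie hunters drain model tokens; a star AI company sees a wave of departures as it goes public; an East China tech giant's AI infrastructure hits obstacles in South China | AI Intelligence Bureau VOL.2
Leiphone (雷峰网) · 2026-03-26 04:11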
Group 1
- The article discusses the surge in token consumption following the emergence of Openclaw, which has given rise to a hidden "wool harvesting" (freebie-exploitation) industry targeting overseas model vendors [2]
- Various methods to exploit OpenAI and Google services are highlighted, including the use of virtual cards and unlimited API access packages, which are being resold for profit [3][4]
- A specific example is provided where users exploit Google Cloud's promotional offers by creating multiple accounts to gain significant free computing resources at minimal cost [4]
Group 2
- A notable internal conflict has arisen within a prominent embodied intelligence company, leading to the departure of key team members and the establishment of a competing firm [5]
- The founder of the original company, previously a high-ranking executive at a leading consumer electronics firm, faces scrutiny as the company struggles with governance issues [6]
- Another AI company is experiencing a wave of employee departures just before its IPO, attributed to long vesting periods for stock options and limited resources for innovation [7]
Group 3
- A new AI video generation model from Beijing has begun limited API access for major domestic clients, with significant interest and demand reported [8]
- A leading embodied intelligence hardware company is facing supplier dissatisfaction due to demands for confidential supplier lists, raising concerns about potential risks [9]
- A major internet company in East China is encountering challenges in expanding its computing power project in South China due to past negative experiences with local authorities [10]
Group 4
- Three domestic AI chip companies have entered the procurement list of a major internet company in East China, with potential orders amounting to billions of RMB [10]
- A significant shift in the robotics industry is noted, driven by a young innovator who demonstrated the capabilities of deep reinforcement learning, leading a major company to hire him at a high salary [11]
- A humorous anecdote illustrates the challenges faced by a billion-dollar unicorn company in impressing investors, highlighting the gap between advanced technology and practical application [12][13]
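ICLR 2026 Oral | Chinese Academy of Sciences team proposes new framework "SparseRL": deep reinforcement learning can automatically generate high-performance CUDA code
Synced (机器之心) · 2026-03-25 07:01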
Core Insights
- The article discusses the introduction of a new framework called SparseRL by a team from the Chinese Academy of Sciences, which integrates deep reinforcement learning into the task of generating sparse CUDA code, aiming to optimize code performance based on the structure of sparse matrices [2][5].
Group 1: Framework and Methodology
- SparseRL improves the compilation success rate by 20% and execution speed by 30% in classic SpMV tasks [3][16].
- The framework employs a pre-trained language model as a policy network, where each token generation represents an action, and the compilation results and execution times serve as reward signals [12][18].
- The training process consists of three stages: pre-training on CUDA code, supervised fine-tuning with sparse matrix-code pairs, and reinforcement learning optimization focusing on both correctness and efficiency [18][20].
Group 2: Innovations and Challenges
- A key innovation is the use of sinusoidal position embeddings to help the model understand the spatial relationships of non-zero elements in sparse matrices, akin to positional encoding in Transformers [13][14].
- The hierarchical reward function balances correctness and efficiency, ensuring that the generated code is both functional and performant [14][17].
- The method faces challenges such as high computational costs for reinforcement learning training, the need for retraining on new hardware architectures, and a potential lack of human-like coding style and interpretability in generated code [20].
Group 3: Significance and Future Directions
- SparseRL signifies a paradigm shift from generating merely runnable code to producing high-performance code, suggesting a new potential for AI in handling performance optimization tasks [22].
- Future plans include extending the method to multi-GPU distributed sparse computing, exploring integration with traditional AutoTuning techniques, and supporting a wider range of sparse operators [22].
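The two ideas named in the SparseRL summary, sinusoidal embeddings of non-zero positions and a hierarchical reward, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the embedding dimension, and the tier values (-1.0, -0.5, 1.0 plus a log-speedup bonus) are all assumptions chosen for illustration, since the summary does not give them.

```python
import math


def sinusoidal_embedding(position: int, dim: int = 8, base: float = 10000.0) -> list[float]:
    """Transformer-style sinusoidal encoding of a single integer index."""
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb[:dim]


def embed_nonzero(row: int, col: int, dim: int = 8) -> list[float]:
    """Encode one non-zero element's (row, col) coordinate as a single vector.

    Concatenating a row encoding and a column encoding is an assumption;
    the summary only says sinusoidal position embeddings are used.
    """
    return sinusoidal_embedding(row, dim) + sinusoidal_embedding(col, dim)


def hierarchical_reward(compiled: bool, correct: bool,
                        baseline_ms: float, runtime_ms: float) -> float:
    """Tiered reward: compilation first, then correctness, then speed.

    The tier values are illustrative assumptions, not taken from the paper.
    Both timings must be positive.
    """
    if not compiled:
        return -1.0  # generated CUDA code failed to compile
    if not correct:
        return -0.5  # compiled, but output differs from the reference
    speedup = baseline_ms / runtime_ms
    return 1.0 + math.log(speedup)  # correct: bonus grows with speedup


if __name__ == "__main__":
    print(embed_nonzero(3, 5))
    print(hierarchical_reward(True, True, baseline_ms=2.0, runtime_ms=1.0))
```

Ordering the tiers this way means a fast but incorrect kernel can never score higher than a slow correct one, which is one plausible reading of "balances correctness and efficiency".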
They "celebrated the Lunar New Year" in the lab
Sina Finance · 2026-02-23 21:43
Core Insights
- A team at Tianjin University developed and demonstrated the R1.5 embodied-intelligence model, showcasing it on various robotic tasks during the Spring Festival holiday [1][2].
Group 1: Technological Advancements
- The R1.5 model enables robots to perform tasks such as cleaning, making tea, and adapting to new environments without prior exposure, demonstrating "zero-shot adaptation" capabilities [2].
- The robot uses a dual-architecture system, where the "brain" (multi-modal model) handles task understanding and planning, while the "cerebellum" (reinforcement learning module) executes precise actions [1][2].
Group 2: Collaboration and Outreach
- The team integrates expertise from different fields, including new media, to communicate its research through engaging videos that illustrate the robots' capabilities in real-life scenarios [2].
- By presenting its work visually, the lab aims to improve public understanding and appreciation of its technology, moving beyond traditional academic dissemination [2][3].
Group 3: Recognition and Future Aspirations
- The laboratory recently received the first prize in the 2025 China Society of Image and Graphics Technology Progress Award for its project on intelligent decision-making technologies [2].
- The team is committed to advancing its research and aims to bring its AI innovations into everyday life, reflecting a vision for a more intelligent future [3].
Born in 1998, a Tsinghua PhD dropout builds robots and raises nearly 500 million yuan in a month
36Kr · 2025-11-26 10:42
Core Insights
- The company Songyan Power has completed nearly 200 million yuan in Pre-B+ round financing, led by CICC Capital, to enhance technological innovation and expand high-value application scenarios [1]
- The humanoid robot and embodied intelligence sector is experiencing significant capital influx, with Songyan Power raising nearly 500 million yuan across five financing rounds in 2023 [1]
- A strategic partnership with "Programming Cat" was announced to create a humanoid robot programming education laboratory, targeting the consumer market with the launch of the Bumi robot priced under 10,000 yuan [1]
Financing and Growth
- Songyan Power's recent financing round follows a previous Pre-B round of nearly 300 million yuan, indicating strong investor confidence and growth potential [1]
- The company aims to bridge the gap from research and development to mass production and delivery, focusing on expanding its ecosystem [1]
Product Development and Market Strategy
- The Bumi robot, priced at 9,998 yuan, is designed for technology enthusiasts and youth learning programming, marking a shift towards consumer-level products [1][25]
- The company emphasizes that lowering the price of humanoid robots is a strategic move to expand the market rather than engaging in price wars with competitors [5][9]
Leadership and Vision
- The founder, Jiang Zheyuan, reflects on the challenges of transitioning from a technical focus to understanding market dynamics and consumer needs [2][3]
- The company is positioned to capitalize on the growing demand for affordable humanoid robots, aiming to make them accessible to households [4][5]
Competitive Landscape
- Songyan Power differentiates itself from competitors by targeting a broader consumer base rather than focusing solely on B2B applications, which are currently more saturated [12][14]
- The company acknowledges the presence of established players like Yuzhu but believes its unique pricing strategy and market approach will allow it to carve out a significant share [16][22]
Future Outlook
- The company anticipates that achieving sales of over 10,000 units will help cover research and development costs, indicating a healthy financial model [23]
- The strategic focus on consumer education and programming capabilities is expected to enhance the product's value proposition and market acceptance [25][31]
Human teams face their strongest AI challenger? Musk announces Grok 5 will take on the best human players in League of Legends
Sohu Finance · 2025-11-26 10:17
Core Insights
- Elon Musk announced that the AI model Grok 5 will challenge top human teams in League of Legends by 2026 [1]
- The core design goal of Grok 5 is to "master any game through reading instructions and experimenting," aiming to validate its general artificial intelligence capabilities [3]
- Grok 5 is set to have a parameter scale of 6 trillion, double that of Grok 3 and Grok 4, and is expected to outperform in all metrics [4]
Game Challenge Details
- The challenge will include limitations such as only being able to view the screen through a camera, with a vision range not exceeding normal eyesight [3]
- Response delays and click rates will be strictly matched to human limits to avoid any technological advantages [3]
- The addition of StarCraft as a competitive project was proposed by Oriol Vinyals, indicating potential expansion of the challenge [3]
AI Development Significance
- Games like StarCraft and League of Legends have become important testing grounds for AI capabilities, with mature AI able to achieve high precision in operations and tactical decisions through deep reinforcement learning [5]
- However, there remains a gap in long-term strategic planning and response to unexpected situations compared to human players [5]
- A fair competition between Grok 5 and top human teams could mark a significant milestone in the history of AI development [5]
First AI controller completes in-orbit verification of satellite attitude adjustment
Science and Technology Daily · 2025-11-14 00:20
Core Insights
- The development of the world's first artificial intelligence (AI) attitude controller for satellites by scientists at the University of Würzburg represents a significant advancement in the autonomy of space systems [1][2]
- The AI controller was successfully validated on a nanosatellite named InnoCube, demonstrating its ability to perform complete attitude maneuvers within a short time frame [1]
- The project utilizes deep reinforcement learning technology, allowing the neural network to autonomously learn control strategies in a simulated environment, which is a departure from traditional fixed algorithms [1]
Group 1
- The AI controller executed a complete attitude maneuver during a 9-minute satellite pass, adjusting the satellite's orientation with precision [1]
- The innovative approach automates the parameter tuning process that traditionally takes months, enabling the controller to adapt to real environmental changes without manual calibration [1]
- High-fidelity simulations were conducted on the ground before uploading the mature algorithms to the satellite, ensuring reliability in real space conditions [1]
Group 2
- InnoCube serves as a platform for testing new concepts directly in orbit, highlighting its role in advancing space technology [2]
- The wireless satellite bus SKITH, which replaces traditional wiring with wireless data transmission, reduces weight and potential failure points in the control system [2]
- The validation of this AI controller opens new prospects for deep space exploration, where intelligent autonomous control systems will be crucial for spacecraft survival in interplanetary or deep space missions [2]
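To make "learning control strategies in a simulated environment" concrete, here is a minimal Python sketch of the kind of simulator a reinforcement learning agent could train against. Everything in it is an assumption for illustration: the single-axis rigid-body model, the inertia and torque limits, the reward weights, and the class and function names. It is not the Würzburg team's simulator, and a hand-written PD law stands in for the trained neural network.

```python
class SingleAxisAttitudeEnv:
    """Toy single-axis attitude simulator (hypothetical, not the InnoCube model)."""

    def __init__(self, inertia: float = 0.05, dt: float = 0.1, max_torque: float = 0.001):
        self.inertia = inertia        # kg*m^2, assumed value
        self.dt = dt                  # s, integration step
        self.max_torque = max_torque  # N*m, assumed actuator limit
        self.angle = 0.0              # rad
        self.rate = 0.0               # rad/s
        self.target = 0.0             # rad

    def reset(self, angle: float, target: float) -> tuple[float, float]:
        self.angle, self.rate, self.target = angle, 0.0, target
        return (self.target - self.angle, self.rate)

    def step(self, torque: float) -> tuple[tuple[float, float], float]:
        # Clip the commanded torque to the actuator limit.
        torque = max(-self.max_torque, min(self.max_torque, torque))
        # Semi-implicit Euler integration of inertia * angular_accel = torque.
        self.rate += (torque / self.inertia) * self.dt
        self.angle += self.rate * self.dt
        error = self.target - self.angle
        # Reward penalizes pointing error and residual spin (weights assumed).
        reward = -(error ** 2) - 0.1 * self.rate ** 2
        return (error, self.rate), reward


def pd_policy(obs: tuple[float, float], kp: float = 0.002, kd: float = 0.02) -> float:
    """Placeholder policy; in DRL a trained network would map obs to torque."""
    error, rate = obs
    return kp * error - kd * rate


def run_episode(env, policy, start_angle: float, target: float, steps: int):
    obs = env.reset(start_angle, target)
    total_reward = 0.0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total_reward += reward
    return obs[0], total_reward


if __name__ == "__main__":
    final_error, ret = run_episode(SingleAxisAttitudeEnv(), pd_policy, 0.0, 0.5, 600)
    print(final_error, ret)
```

In a training setup, `pd_policy` would be replaced by the policy being optimized, and the episode return would drive the updates.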
AI-Empowered Asset Allocation (19): Institutions' Practical Path to Innovation in AI + Investment
Guoxin Securities· 2025-10-29 07:16
Core Insights
- The report emphasizes the transformative impact of AI on asset allocation, highlighting the shift from static optimization to dynamic, intelligent evolution in decision-making processes [1]
- It identifies the integration of large language models (LLMs), deep reinforcement learning (DRL), and graph neural networks (GNNs) as key technologies reshaping investment research and execution [1][2]
- The future of asset management is seen as a collaborative effort between human expertise and AI capabilities, necessitating a reconfiguration of organizational structures and strategies [3]
Group 1: AI in Asset Allocation
- LLMs are revolutionizing the understanding and quantification of unstructured financial texts, thus expanding the information boundaries traditionally relied upon in investment research [1][11]
- The evolution of sentiment analysis from basic dictionary methods to advanced transformer-based models allows for more accurate emotional assessments in financial contexts [12][13]
- The application of LLMs in algorithmic trading and risk management is highlighted, showcasing their ability to generate quantitative sentiment scores and identify early warning signals for market shifts [14][15]
Group 2: Deep Reinforcement Learning (DRL)
- DRL provides a framework for adaptive decision-making in asset allocation, moving beyond static models to a dynamic learning approach that maximizes long-term returns [17][18]
- The report discusses various DRL algorithms, such as Actor-Critic methods and Proximal Policy Optimization, which show significant potential in financial applications [19][20]
- Challenges in deploying DRL in real-world markets include data dependency, overfitting risks, and the need for models to adapt to different market cycles [21][22]
Group 3: Graph Neural Networks (GNNs)
- GNNs conceptualize the financial system as a network, allowing for a better understanding of risk transmission among financial institutions [23][24]
- The ability of GNNs to model systemic risks and conduct stress testing provides valuable insights for regulators and investors alike [25][26]
Group 4: Institutional Practices
- BlackRock's AlphaAgents project exemplifies the integration of AI in investment decision-making, focusing on overcoming cognitive biases and enhancing decision-making processes through multi-agent systems [27][30]
- The report outlines the strategic intent behind AlphaAgents, which aims to leverage LLMs for complex reasoning and decision-making in asset management [30][31]
- J.P. Morgan's AI strategy emphasizes building proprietary, trustworthy AI technologies, focusing on foundational models and automated decision-making to navigate complex financial systems [42][45]
Group 5: Future Directions
- The report suggests that the future of asset management will involve a seamless integration of AI capabilities into existing workflows, enhancing both decision-making and execution processes [39][41]
- The emphasis on creating a "financial brain" through proprietary AI technologies positions firms like J.P. Morgan to maintain a competitive edge in the evolving financial landscape [52]
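The DRL framing of asset allocation described above can be sketched as a single environment step in Python: the agent's action becomes portfolio weights, and the reward is the period's log return after trading costs. This is a generic textbook-style formulation offered as an assumption, not the method of the Guoxin report or of any institution it covers; the function names and the 0.1% cost rate are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Map unconstrained policy outputs to long-only weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_step(prev_weights: list[float], logits: list[float],
                   asset_returns: list[float], cost_rate: float = 0.001):
    """One allocation step: returns (new_weights, reward).

    reward = log of the net portfolio growth factor, where a proportional
    transaction cost (assumed 0.1% of turnover) is deducted.
    """
    weights = softmax(logits)
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    gross = sum(w * (1.0 + r) for w, r in zip(weights, asset_returns))
    net = gross * (1.0 - cost_rate * turnover)
    return weights, math.log(net)


if __name__ == "__main__":
    w, r = portfolio_step([0.5, 0.5], [1.0, 0.0], [0.02, -0.01])
    print(w, r)
```

Summing log-return rewards over an episode gives the log of cumulative growth, which is why this form suits an objective of maximizing long-term returns; an Actor-Critic or PPO agent would supply the `logits`.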
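Take one home for 9,998 yuan! The world's first humanoid robot under 10,000 yuan is here: 21 degrees of freedom, and it can talk, walk, and dance awkwardly
Synced (机器之心) · 2025-10-22 08:46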
Core Viewpoint
- The article highlights the launch of the Bumi robot by Songyan Power, marking a significant step in making humanoid robots accessible to consumers with a price point of 9,998 yuan, which is lower than many high-end smartphones, thus entering the consumer-grade market for the first time [4][5][39].
Product Overview
- The Bumi robot features 21 degrees of freedom (DOF), allowing for advanced movement capabilities, including walking, dancing, and interacting with users [20][36].
- Weighing only 12 kg and standing 94 cm tall, Bumi is designed to be lightweight and safe for children, making it suitable for educational and entertainment purposes [16][17][34].
- The robot is equipped with a 48V battery system, providing a runtime of 1 to 2 hours, which is adequate for short-term applications [32][33].
Company Background
- Songyan Power has rapidly gained attention in the humanoid robot industry, completing six rounds of financing within two years and becoming a key player in the market [7][39].
- The company first gained public recognition during the Beijing Yizhuang Half Marathon, where its N2 robot independently completed the race, showcasing its capabilities [8][9].
Technological Innovation
- The company utilizes self-developed servo motors and advanced motion control algorithms to ensure precise and stable movements of the robots [41].
- Songyan Power has made significant advancements in deep reinforcement learning, allowing robots to learn and adapt through trial and error, enhancing their performance in complex tasks [43][45].
Market Strategy
- The company focuses on smaller humanoid robots, which are more affordable and versatile compared to full-sized models, catering to various applications in education, entertainment, and exhibitions [40][46].
- The successful integration of domestic supply chains has enabled the company to reduce costs and enhance production capabilities, contributing to the competitive pricing of the Bumi robot [47][48].
ICLR 2025 | SmODE: An ordinary differential equation neural network for generating smooth control actions
自动驾驶之心 (Heart of Autonomous Driving) · 2025-09-01 23:32
Core Viewpoint
- The research team led by Professor Li Shengbo from Tsinghua University has developed a novel smoothing neural network called SmODE, which utilizes ordinary differential equations (ODE) to enhance the smoothness of control actions in reinforcement learning tasks, thereby improving the usability and safety of intelligent systems [4][23].
Background
- Deep Reinforcement Learning (DRL) has proven effective in solving optimal control problems in various applications, including drone control and autonomous driving. However, the smoothness of control actions remains a significant challenge due to high-frequency noise and unregulated Lipschitz constants in neural networks [5][19].
Key Technologies of SmODE
- **Smoothing ODE Design**: The team designed a smoothing neuron structure based on ODEs that can adaptively filter high-frequency noise while controlling the Lipschitz constant, thus enhancing the performance of control systems [8][9].
- **Smoothing Network Structure**: SmODE is structured to be integrated into various reinforcement learning frameworks, featuring an input module, a smoothing ODE module, and an output module, which can be adjusted based on task complexity [14][16].
- **Reinforcement Learning Algorithm Based on SmODE**: SmODE can be easily combined with existing deep reinforcement learning algorithms, requiring additional loss terms to regulate the time constant and Lipschitz constant during training [16][17].
Experimental Results
- In experiments with Gaussian noise variance set at 0.05, SmODE demonstrated significantly lower action volatility compared to traditional MLP networks, enhancing vehicle comfort and safety during tasks such as sine curve tracking and lane changing [19][21].
- In the MuJoCo benchmark tests, SmODE outperformed other networks (LTC, LipsNet, and MLP) in terms of average action smoothness across various tasks, indicating its effectiveness in real-world applications [21][22].
Conclusion
- The SmODE network effectively addresses the oscillation issues in action outputs within deep reinforcement learning, providing a new approach to enhance the performance and stability of intelligent systems in real-world applications [23].
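The intuition behind using an ODE to smooth control actions can be shown with a first-order lag, tau * dx/dt = -x + u, integrated with forward Euler. The Python sketch below is only a stand-in: the summary does not give the actual SmODE neuron equations, so the fixed time constant `tau`, the test signal, and the fluctuation metric are all assumptions, and the Lipschitz regularization mentioned above is not modeled.

```python
import math


def ode_smooth(inputs: list[float], tau: float = 0.5, dt: float = 0.1) -> list[float]:
    """Integrate tau * dx/dt = -x + u with forward Euler (requires dt <= tau).

    A larger tau filters high-frequency input more strongly; SmODE reportedly
    learns its time constant, whereas here it is fixed by hand.
    """
    alpha = dt / tau
    x = inputs[0]
    out = []
    for u in inputs:
        x += alpha * (u - x)
        out.append(x)
    return out


def fluctuation(actions: list[float]) -> float:
    """Mean absolute change between consecutive actions (assumed metric)."""
    return sum(abs(b - a) for a, b in zip(actions, actions[1:])) / (len(actions) - 1)


# Slow sine "intended action" plus an alternating high-frequency disturbance.
raw = [math.sin(0.05 * t) + 0.2 * (-1) ** t for t in range(200)]
smooth = ode_smooth(raw)

if __name__ == "__main__":
    print(fluctuation(raw), fluctuation(smooth))
```

The filtered sequence follows the slow sine while strongly attenuating the step-to-step jitter, which is the qualitative effect the experiments above report for action volatility.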
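Zhongyuan Jintaiyang applies for a patent on a method for calculating wind power capacity ranges in distribution networks that accounts for carbon capture benefits, enabling a dynamic trade-off between carbon benefit and economic cost
Jinrongjie (金融界) · 2025-08-23 01:21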
Group 1
- Henan Zhongyuan Jintaiyang Technology Co., Ltd. has applied for a patent titled "A Calculation Method for Wind Power Capacity Range in Distribution Networks Considering Carbon Capture Benefits" [1]
- The patent application was published under CN120524785A and was filed in March 2025 [1]
- The invention relates to the field of wind power capacity configuration and involves a method that integrates artificial intelligence algorithms with physical laws of energy systems [1]
Group 2
- Henan Zhongyuan Jintaiyang Technology Co., Ltd. was established in 2020 and is located in Zhengzhou, primarily engaged in technology promotion and application services [2]
- The company has a registered capital of 90 million RMB and has invested in 41 enterprises [2]
- The company has participated in 91 bidding projects and holds 21 patents, along with 6 administrative licenses [2]