Deep Reinforcement Learning (DRL)
Ningbo Robotics Company Completes Funding Round Worth Hundreds of Millions of Yuan
DT新材料· 2026-03-03 16:29
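According to DT新材料, on February 27 PNDbotics announced that it had completed a funding round worth several hundred million yuan, led by CITIC Goldstone (中信金石). The funds will go mainly toward precision manufacturing equipment, construction of an intelligent manufacturing center, and core technology R&D. PNDbotics was founded in Ningbo in September 2023 and focuses on developing full-size general-purpose humanoid robots; it has reached a production capacity of several hundred humanoid robots per year. The company's founder and CEO is Yan Xunge (闫巡戈), its co-founder and CTO is Cui Haotian (崔昊天), and Zhang Zitao (张子陶) is a co-founder. The company uses algorithms that fuse WBC (whole-body dynamics control) with MPC (model predictive control), combined with deep reinforcement learning (DRL), to give its robots highly natural motion. It has also developed in-house the PSA series of integrated actuators, which combine a motor, reducer, encoder, servo drive, and communication unit, and cover a torque range of 60 N·m to 340 N·m to meet the torque requirements of different joints. In 2024, PNDbotics released its first-generation Adam humanoid robot, which stands 1.67 m tall and weighs 62 kg ...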
AI-Empowered Asset Allocation (XIX): The Practical Innovation Path of Institutional AI + Investment
Guoxin Securities· 2025-10-29 06:51
Group 1
- The core conclusion emphasizes the transformation of information foundations through LLMs, which convert vast amounts of unstructured text into quantifiable Alpha factors, fundamentally expanding the information boundaries of traditional investment research [1].
- The technology path has been validated, with a full-stack technology framework for AI-enabled asset allocation established, including signal extraction via LLMs, dynamic decision-making through DRL, and risk modeling with GNNs [1].
- AI is evolving from a supportive tool to a central decision-making mechanism, driving asset allocation from static optimization to dynamic intelligent evolution, reshaping the buy-side investment research and execution logic [1].

Group 2
- The practical application of AI investment systems relies on a modular collaborative mechanism rather than a single model's performance, as demonstrated by BlackRock's AlphaAgents, which utilizes LLMs for cognition and reasoning, external APIs for real-time information, and numerical optimizers for final asset allocation calculations [2].
- Leading institutions are competing on an "AI-native" strategy, focusing on building proprietary, trustworthy AI core technology stacks, as evidenced by JPMorgan's approach, which is centered around "trustworthy AI and foundational models," "simulation and automated decision-making," and "physical and alternative data" [2].
- Domestic asset management institutions should focus on strategic restructuring and organizational transformation, adopting a differentiated and focused approach to technology implementation, emphasizing a practical and efficient "human-machine collaboration" system [3].

Group 3
- The report discusses the evolution of financial sentiment analysis mechanisms, highlighting the transition from early dictionary-based methods to advanced LLMs that can understand context and financial jargon, underscoring the importance of creating domain-specific LLMs [12][13].
- LLMs are being applied in algorithmic trading and risk management, providing real-time sentiment scores and monitoring global information flows to identify potential market risks [14][15].
- Despite the promising applications of LLMs, challenges such as data bias, high computational costs, and the need for explainability remain significant barriers to their widespread adoption in finance [15][16].

Group 4
- Deep Reinforcement Learning (DRL) offers a dynamic adaptive framework for asset allocation, contrasting with traditional static optimization methods, allowing for continuous learning and decision-making based on market interactions [17][18].
- The core architecture of DRL in finance includes various algorithms like Actor-Critic methods and Proximal Policy Optimization (PPO), which show significant potential for investment portfolio management [19][20].
- Key challenges for deploying DRL in real financial markets include data dependency, overfitting risks, and the need to integrate real-world constraints into the learning framework [21][22].

Group 5
- Graph Neural Networks (GNNs) conceptualize the financial system as a network, allowing for a better understanding of risk transmission and systemic risk, which traditional models often overlook [23][24].
- GNNs can be utilized for stress testing and dynamic assessments of the financial system's robustness, providing valuable insights for regulatory bodies [25][26].
- The insights gained from GNNs can help investors develop more effective hedging strategies by understanding interdependencies within financial networks [26].

Group 6
- BlackRock's AlphaAgents project aims to enhance decision-making by addressing cognitive biases in human analysts and leveraging LLMs for complex reasoning, moving beyond mere data processing [30][31].
- The dual-layer decision-making process in AlphaAgents involves collaborative and adversarial debates among AI agents, enhancing the robustness of investment decisions [31][33].
- Backtesting results indicate that the multi-agent framework significantly outperforms single-agent models, demonstrating the value of collaborative AI in investment strategies [34][35].

Group 7
- JPMorgan's AI strategy focuses on building proprietary, trustworthy AI technologies, emphasizing the importance of trust and security in AI applications within finance [45][46].
- The bank is committed to developing foundational models and generative AI capabilities, aiming to control key AI functionalities and ensure compliance with regulatory standards [49][50].
- By integrating multi-agent simulations and reinforcement learning, JPMorgan seeks to create sophisticated models that can navigate complex financial systems and enhance decision-making processes [53][54].
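Group 4 of this summary names Proximal Policy Optimization (PPO) among the DRL algorithms with potential for portfolio management. The following is a minimal sketch of the clipped surrogate objective that PPO maximizes. It is not taken from the report: the function name, the sample numbers, and the `clip_eps=0.2` default are illustrative assumptions.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios[i]     = new_policy_prob / old_policy_prob for sampled action i
    advantages[i] = estimated advantage of that action
    clip_eps      = clipping range (0.2 is a common choice, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch: three sampled rebalancing actions.
    ratios = [1.5, 0.5, 1.0]
    advantages = [1.0, -1.0, 2.0]
    print(ppo_clipped_objective(ratios, advantages))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is one reason PPO is often considered for noisy settings such as market data.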
X-Nav: An End-to-End Cross-Platform Navigation Framework Whose Universal Policy Achieves Zero-Shot Transfer
具身智能之心· 2025-07-22 06:29
Core Viewpoint
- The article presents the X-Nav framework, which enables end-to-end cross-embodiment navigation for mobile robots, allowing a single universal policy to be deployed across different robot embodiments, including wheeled and quadrupedal robots [3][4].

Group 1: Existing Limitations
- Current navigation methods are often designed for specific robot embodiments, limiting their generalizability across platforms [4].
- Navigation tasks require robots to move without collisions in complex environments, relying on visual observations, target positions, and proprioceptive information, but existing methods face significant limitations [4].

Group 2: X-Nav Architecture
- The X-Nav architecture consists of two core phases: expert policy learning and universal policy refinement [5][8].
- Phase 1 trains multiple expert policies using deep reinforcement learning (DRL) on randomly generated robot embodiments [6].
- Phase 2 refines these expert policies into a single universal policy using a Nav-ACT transformer model [8].

Group 3: Training and Evaluation
- Training uses the Proximal Policy Optimization (PPO) algorithm, with a reward function that includes task rewards and regularization rewards, tailored for wheeled and quadrupedal robots [10][16].
- Experimental validation shows that X-Nav outperforms other methods in success rate (SR) and success weighted by path length (SPL), with Jackal achieving an SR of 90.4% and an SPL of 0.84 [13].
- Scalability studies indicate that increasing the number of training embodiments significantly improves adaptability to unseen robots [14].

Group 4: Ablation Studies
- Ablation studies validate the design choices, showing that using L1 loss instead of MSE reduces performance because large errors are not penalized enough [21].
- Executing complete action chunks delays the quadruped's adaptation to dynamic changes, while omitting temporal ensembling (TE) leads to rough, non-smooth actions in wheeled robots [21].
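The L1-versus-MSE ablation in Group 4 turns on how each loss weights large errors. The sketch below uses made-up per-dimension action errors (an assumption, not data from the article) to show how much of each loss is attributable to the single largest error.

```python
def l1_loss(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mse_loss(errors):
    """Mean squared error."""
    return sum(e * e for e in errors) / len(errors)


def largest_error_share(errors, power):
    """Fraction of the total loss contributed by the largest error.

    power=1 corresponds to L1, power=2 to MSE.
    """
    contributions = [abs(e) ** power for e in errors]
    return max(contributions) / sum(contributions)


if __name__ == "__main__":
    # Hypothetical errors: two small deviations and one large one.
    errors = [0.1, 0.1, 2.0]
    print("L1 :", l1_loss(errors), "share:", largest_error_share(errors, 1))
    print("MSE:", mse_loss(errors), "share:", largest_error_share(errors, 2))
```

With these numbers the large error accounts for roughly 91% of the L1 loss but about 99.5% of the MSE loss, so the gradient under MSE concentrates more strongly on correcting outlier actions. This is consistent with the explanation given in the summary.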
Group 5: Real-World Testing
- Real-world tests in indoor and outdoor environments demonstrate a success rate of 85% and an SPL of 0.79, confirming the generalizability of the X-Nav framework across different sensor configurations [22].
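The SR and SPL figures quoted for X-Nav can be read more easily with the metrics written out. The sketch below assumes the commonly used definition of SPL, in which each successful episode is credited with the ratio of the shortest path length to the path actually taken. The article does not state its exact formula, and the episode data here are hypothetical.

```python
def success_rate(successes):
    """Fraction of episodes that reached the goal."""
    return sum(1 for s in successes if s) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by path length, averaged over all episodes.

    Each successful episode contributes shortest / max(taken, shortest);
    failed episodes contribute 0. Assumes shortest_lengths are positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if ok:
            total += shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Hypothetical episodes: optimal success, detoured success, failure.
    successes = [True, True, False]
    shortest = [10.0, 10.0, 10.0]
    taken = [10.0, 20.0, 15.0]
    print("SR :", success_rate(successes))
    print("SPL:", spl(successes, shortest, taken))
```

Under this definition SPL can never exceed SR, so a gap between the two (for example an SR of 85% against an SPL of 0.79) indicates how much longer than optimal the successful paths were.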