Reinforcement Learning
Bohai Securities Research Institute Morning Meeting Minutes (2025.12.30)-20251230
BOHAI SECURITIES· 2025-12-30 02:58
Macro and Strategy Research
- From January to November 2025, profits of China's industrial enterprises grew 0.1% year-on-year, 1.8 percentage points lower than the January-to-October figure. November profits alone fell 13.1% year-on-year, a decline 7.6 percentage points deeper than October's [4]
- Industrial added value grew 4.8% in November, 0.1 percentage points lower than in October, held back by insufficient domestic demand and a high base from the previous year [4]
- The revenue profit margin for January to November was 5.29%, down by 2.0% year-on-year, a wider decline than in previous months [4]
- Of 41 industrial sectors, 18 posted positive profit growth over the same period, notably ferrous metal smelting and processing, non-ferrous metal mining, and high-tech manufacturing [5]

Fund Research
- Nearly 50 billion yuan continued to flow into the CSI A500 index, and the ETF market reached a new high of over 6 trillion yuan [7][11]
- Equity funds returned 2.69% on average, with 87.08% of funds reporting positive returns; bond funds and other categories also posted positive returns [10]
- The ETF market recorded a net inflow of 914.98 billion yuan, led by bond ETFs at 599.48 billion yuan [10]

Company Research: WuXi AppTec
- WuXi AppTec is positioned as a leading integrated CRDMO provider offering end-to-end drug development and manufacturing services, and it pursues growth through both organic and inorganic strategies [15]
- The CRO industry is thriving because drug development is costly and slow, which increases demand for specialized services [15]
- WuXi Chemistry reported strong performance in its integrated services and added a significant number of new molecules to its pipeline, indicating robust growth potential [15]
- The company has streamlined operations by divesting its clinical services research business, allowing it to focus on core competencies and strengthen its service offerings [16]

Industry Research: Light Industry Manufacturing & Textile Apparel
- The Chinese government plans to keep funding the "old-for-new" consumption policy in 2026; the policy has already driven over 2.5 trillion yuan in sales of related products in 2025 [19]
- Retail sales of clothing and footwear rose 3.5% year-on-year in November, reflecting a positive trend in consumer spending [19]
- The light industry manufacturing sector underperformed the CSI 300 index, indicating challenges in the current market environment [19]
A 10,000-Character Long Read: What Pain Points Remain in VLA Architectures and Models?
具身智能之心· 2025-12-30 01:11
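Editor: 具身智能之心

The previous roundtable on VLA models and real-robot deployment was well received across the industry. The platform team has been compiling the transcript, and today we share the first part, "VLA architectures and models".

Zhang Qiang: OK, thank you to the host for the introduction. Hello everyone, I'm Zhang Qiang, from the Beijing Humanoid Robot Center. My research direction and background are both in humanoid robots, which I have worked on since around 2021. I have done research on Fourier, GR-1, and Embodied robots, as well as on our current Tiangong platform. My main research directions are motion control, VLA, and world models and embodied large models for humanoid robots. I hope you will follow our work. I'm glad to accept the invitation from 具身智能之心 and to discuss these questions with the other guests today. Thank you!

Host: OK, let's formally begin. Welcome, everyone, to 具身智能之心's ...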
QwenLong-L1.5 Released: One Recipe and Three Key Techniques Bring a 30B MoE Model's Long-Text Reasoning on Par with GPT-5
机器之心· 2025-12-29 04:44
Core Insights
- The article discusses the challenges large models face in long-text reasoning, including inflated performance metrics ("false prosperity") and difficulty with multi-hop reasoning tasks [2][3]
- It introduces QwenLong-L1.5, a new model that addresses these challenges with a comprehensive post-training framework covering data synthesis, reinforcement learning optimization, and memory management [4][32]

Group 1: Challenges in Long-Text Reasoning
- Models often score well on simple tasks but struggle with complex multi-hop reasoning, revealing limits in deep understanding [2]
- Training data for long-text tasks is complex and heterogeneous, which destabilizes reinforcement learning algorithms and can degrade performance [14][16]
- Physical memory limits restrict how much content a model can process, forcing compromises that can lose critical information [3]

Group 2: QwenLong-L1.5 Model Features
- QwenLong-L1.5 is built on the Qwen3-30B-A3B architecture and aims to provide a systematic solution to long-text reasoning challenges [4]
- The model uses a high-quality data synthesis pipeline that generates multi-hop reasoning tasks, strengthening its ability to reason over multiple steps [9]
- It employs a stable and efficient reinforcement learning strategy to handle problems such as distributional drift and credit assignment [12][17]

Group 3: Performance Improvements
- QwenLong-L1.5 improves on its predecessor by an average of 9.9 points [26]
- The gains are most evident in complex reasoning tasks, with notable improvements on benchmarks such as MRCR and CorpusQA [26][27]
- It handles ultra-long tasks well, showing that it can process information beyond traditional memory limits [28][29]

Group 4: Conclusion and Open Source
- The article concludes that combining data synthesis, reinforcement learning optimization, and memory management gives QwenLong-L1.5 a validated path for addressing long-text reasoning challenges [32]
- The company encourages open collaboration and sharing of the technology, with details available in the published paper and on GitHub [32]
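The memory-management idea in this summary can be illustrated with a minimal sketch. The summary does not describe QwenLong-L1.5's actual mechanism, so everything here is an assumption: the document is read chunk by chunk, and a bounded "memory" string is updated after each chunk. The names `split_into_chunks`, `read_with_memory`, and `keep_matching_sentences` are hypothetical, and `keep_matching_sentences` is a toy keyword filter standing in for what would be a model call in practice.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def read_with_memory(
    document: str,
    question: str,
    update_memory: Callable[[str, str, str], str],
    chunk_size: int = 1000,
) -> str:
    """Fold a long document into a running memory, one chunk at a time.

    Only the current chunk and the memory are visible at each step, so the
    full document never has to fit into a single context window.
    """
    memory = ""
    for chunk in split_into_chunks(document, chunk_size):
        memory = update_memory(memory, chunk, question)
    return memory


def keep_matching_sentences(memory: str, chunk: str, question: str) -> str:
    """Toy stand-in for a model call (assumption, not the paper's method):
    keep sentences from the chunk that mention a keyword from the question."""
    keywords = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    kept = [
        s.strip()
        for s in chunk.split(".")
        if any(k in s.lower() for k in keywords)
    ]
    if not kept:
        return memory
    return (memory + " " + ". ".join(kept)).strip()
```

For example, `read_with_memory(text, "Who owns the bakery?", keep_matching_sentences)` walks through `text` in 1000-character chunks and accumulates only the sentences that mention "owns" or "bakery". A real system would also need to cap the memory length and handle sentences split across chunk boundaries, which this sketch ignores.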
Can Agentic RL Training Run on a Personal Computer? Jiaxuan You's Team Open-Sources OpenTinker
机器之心· 2025-12-29 03:04
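Abstract

As large models enter the "year of the agent", reinforcement learning (RL) is increasingly recognized as a key technology on the path to artificial general intelligence, yet it has long remained in the ivory towers of a few labs. The monolithic design of traditional RL frameworks, their expensive GPU memory overhead, and their complex engineering workflows have deterred many teams with good ideas.

Recently, the U Lab team led by Professor Jiaxuan You at UIUC open-sourced OpenTinker, a new "RL-as-a-Service" (RLaaS) system. With a finely decoupled architecture and a friendly API, it keeps compute from limiting algorithm development: whether at a research institution with a GPU cluster or on a personal computer with only a CPU, more developers can launch agent training with very little code.

Preface: Challenges and Breakthroughs in the Post-Training Era

In 2025, the core of competition has shifted from model scale to agents capable of long-horizon decision-making, and RL is the engine driving this paradigm shift. However, for most academics, startups, and even some large technology companies, deploying a reliable agent training pipeline remains a hard engineering battle. The bottleneck of existing RL infrastructure is not only algorithmic; it is an engineering "Achilles' heel": many people understand the theory but struggle to actually run an RL system built for real-world applications.

The research team is from the University of Illinois ...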
Algorithms "Ignite" a New Engine: AI Becomes a "Booster" for Space Propulsion Technology
Huan Qiu Wang Zi Xun· 2025-12-29 01:27
Core Viewpoint
- The integration of artificial intelligence (AI) into space propulsion technology is reshaping the field, enabling more efficient and innovative propulsion systems for deep space exploration [1][3].

Group 1: AI in Propulsion Design
- AI, particularly machine learning, is improving both the design and the real-time operation of propulsion systems by mimicking human-like learning through trial and error [3].
- This approach lets machines develop a form of "intuition" by running millions of simulations to find optimal solutions to propulsion problems [3].
- A notable application is optimizing heat transfer in nuclear thermal propulsion systems, where AI can significantly improve efficiency compared with traditional methods [4][5].

Group 2: Nuclear Thermal Propulsion
- Nuclear thermal propulsion is highlighted as a promising technology that uses nuclear reactions to generate heat, allowing spacecraft to travel long distances with minimal fuel consumption [4].
- AI supports the design of more efficient nuclear thermal systems by enabling real-time adjustments based on data feedback, which optimizes system performance [4][5].
- Advanced materials and complex geometries in engine design, supported by AI, improve heat transfer efficiency and reduce overall system weight [5].

Group 3: Advancements in Nuclear Fusion
- Nuclear fusion has significant potential for future propulsion systems, and AI is accelerating its development from large ground-based facilities toward compact, space-ready devices [6].
- AI is used to keep high-energy plasma stable inside fusion reactors, which is crucial for sustaining fusion reactions [6].
- AI-optimized magnetic field configurations are paving the way for smaller, more efficient fusion devices that could become the foundation of future nuclear-powered spacecraft [6].

Group 4: Fuel Management in Space Missions
- Once in space, AI's role shifts to managing fuel efficiency, which is critical for complex missions that may require mid-course adjustments [7].
- AI can calculate the most fuel-efficient trajectories in real time, adapting to changing mission parameters and ensuring optimal fuel usage [7].
- Continuous AI monitoring of spacecraft systems allows early detection of potential issues, improving mission safety and efficiency [7].
The Market Is Punishing End-to-End Algorithm Engineers Who Only Know Theory......
自动驾驶之心· 2025-12-29 01:07
Core Insights
- The article discusses the automotive industry's current difficulty in recruiting algorithm talent for end-to-end production roles, highlighting a gap between candidates' skills and the high salaries these positions command [1]
- A new course titled "End-to-End Practical Class for Mass Production" has been designed to close this gap, focusing on essential algorithms and practical applications in autonomous driving [1]

Course Overview
- The course has eight chapters covering various aspects of end-to-end algorithms, including the integration of perception tasks and learning-based control algorithms [6]
- It stresses understanding both one-stage and two-stage end-to-end frameworks, with practical examples and real-world applications [7][8]
- Key algorithms include reinforcement learning, trajectory optimization, and spatial-temporal planning, which are crucial for mass production of autonomous driving systems [10][12]

Target Audience
- The course targets advanced learners with a foundational understanding of autonomous driving technologies, including familiarity with algorithms such as reinforcement learning and diffusion models [14][16]
- It is also meant to be accessible to those with weaker foundations, as the instructor will provide guidance to help participants get up to speed quickly [14]

Course Logistics
- The course starts on November 30 and is expected to last three months, with offline video lectures and online Q&A sessions [14][17]
- Participants need a GPU (a 4090 or better is recommended) and a basic understanding of Python and PyTorch [16]
Amazon Team Trains Humanoid Robot Gait in 15 Minutes on a Single GPU!
具身智能之心· 2025-12-29 00:04
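Authors: Younggyo Seo et al. Editor: 具身智能之心

In humanoid robot control, reinforcement learning (RL) has already achieved transfer from simulation to reality, but high-dimensional action spaces and the need for strong domain randomization lead to long training cycles that severely limit iteration speed.

The fast RL recipe proposed by the Amazon FAR lab team is built on tuned off-policy RL algorithms (FastSAC, FastTD3). By combining algorithm tuning, minimalist reward design, and massively parallel simulation, it is the first to train a robust humanoid locomotion policy in 15 minutes on a single GPU. It also supports fast deployment of whole-body motion tracking tasks, reshaping the sim-to-real iteration loop for humanoid robots.

Paper title: Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

FastSAC-Humanoid project page: https://youngg ...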
Amazon Team Trains Humanoid Robot Gait in 15 Minutes on a Single GPU! A New Locomotion Recipe
具身智能之心· 2025-12-28 10:00
DiffusionDriveV2 Core Code Walkthrough
自动驾驶之心· 2025-12-28 09:23
Core Viewpoint
- The article discusses the DiffusionDrive model, which uses a truncated diffusion approach for end-to-end autonomous driving, and covers its architecture and the integration of reinforcement learning to improve trajectory planning and safety [1].

Group 1: Model Architecture
- DiffusionDriveV2 is a truncated diffusion model constrained by reinforcement learning; the walkthrough starts from its overall architecture for autonomous driving [3].
- The model encodes the environment, including bird's-eye view (BEV) features and vehicle status, to better capture the driving context [5].
- The trajectory planning module uses multi-scale BEV features to improve the accuracy of trajectory predictions [8].

Group 2: Trajectory Generation
- The model first clusters the vehicle's ground-truth future trajectories with K-Means to create anchors, then perturbs the anchors with Gaussian noise [12].
- Trajectory prediction applies cross-attention between trajectory features and BEV features, allowing more accurate trajectory generation [15][17].
- The model also adds time encoding to capture the temporal aspect of trajectory prediction [14].

Group 3: Reinforcement Learning Integration
- The Intra-Anchor GRPO method optimizes the policy within each specific behavior intention, improving safety and goal-oriented trajectory generation [27].
- The reinforcement learning loss is designed to reduce instability during early denoising steps, using a discount factor to adjust the influence of rewards across steps [28].
- The model provides a clear learning signal by truncating negative advantages and applying strong penalties for collisions, which leads to safer trajectories [30].

Group 4: Noise Management
- The model uses multiplicative rather than additive noise to preserve the structure of trajectories and keep exploration paths smooth [33].
- This addresses the inherent scale inconsistencies between trajectory segments, allowing more coherent and realistic trajectory generation [35].

Group 5: Evaluation Metrics
- Generated trajectories are evaluated on safety, comfort, rule compliance, progress, and feasibility, and the results are aggregated into a single score [27].
- Specific metrics assess safety (collision detection), comfort (acceleration and curvature), and adherence to traffic rules, giving a holistic evaluation of trajectory performance [27].
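The reinforcement learning and noise ideas summarized for DiffusionDriveV2 can be made concrete with a small sketch. This is not the project's code: the function names, the default `collision_penalty`, the per-axis scale factors, and the exact weighting formula are all assumptions chosen to illustrate three points from the summary, namely advantages computed within one anchor's group with negative values truncated and collisions penalized, a discount that down-weights early denoising steps, and multiplicative noise that rescales a trajectory instead of jittering each point.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def truncated_group_advantages(
    rewards: Sequence[float],
    collided: Sequence[bool],
    collision_penalty: float = -1.0,  # assumed value, not from the source
    eps: float = 1e-8,
) -> List[float]:
    """Advantages for trajectories sampled from ONE anchor (one intention).

    Rewards are normalized within the group; negative advantages are
    truncated to zero, and collisions receive a fixed strong penalty.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    advantages = []
    for reward, hit in zip(rewards, collided):
        if hit:
            advantages.append(collision_penalty)
        else:
            advantages.append(max(0.0, (reward - mean) / (std + eps)))
    return advantages


def denoising_step_weights(num_steps: int, gamma: float) -> List[float]:
    """Per-step loss weights; step 0 is the earliest (noisiest) step.

    With 0 < gamma < 1, early steps get smaller weights than the final one.
    """
    return [gamma ** (num_steps - 1 - t) for t in range(num_steps)]


def multiplicative_noise(
    trajectory: Sequence[Point], sigma: float, rng: random.Random
) -> List[Point]:
    """Scale a whole trajectory by (1 + noise) per axis.

    Every waypoint shares the same factors, so far waypoints move more than
    near ones and the path keeps its shape.
    """
    scale_x = 1.0 + rng.gauss(0.0, sigma)
    scale_y = 1.0 + rng.gauss(0.0, sigma)
    return [(x * scale_x, y * scale_y) for x, y in trajectory]
```

As a usage note, `denoising_step_weights(3, 0.5)` yields weights that grow toward the final denoising step, and `multiplicative_noise` preserves the ratios between waypoints along each axis. Additive per-point noise would not preserve those ratios, which is the structural argument the summary makes.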
After Much Thought, We Still Need to Recruit People to Scale This Up Together (Deployment/Product Roles)
自动驾驶之心· 2025-12-27 09:36
Core Viewpoint
- The article emphasizes the need for collaboration and innovation in the L2 intelligent driving sector, and the importance of bringing in more talented people to tackle industry challenges and advance the technology [2].

Group 1: Industry Dynamics
- The L2 intelligent driving sector is entering a critical phase in which overcoming existing difficulties requires a collective effort from industry professionals [2].
- The company aims to strengthen its platform with roundtable discussions, practical and industrial-grade courses, and consulting services that add value to the industry [2].

Group 2: Key Directions
- The main focus areas include, but are not limited to, autonomous driving product management, 4D annotation and the data closed loop, world models, VLA, large models for autonomous driving, reinforcement learning, and end-to-end solutions [4].

Group 3: Job Descriptions
- The company is seeking training collaborations in autonomous driving, focused mainly on business-facing (B-end) partnerships with enterprises, universities, and research institutions, along with consumer-facing (C-end) offerings for students and job seekers [5].