Reinforcement Learning

OpenAI reveals the principles behind ChatGPT Agent: reinforcement learning lets the model autonomously explore the best tool combinations
量子位· 2025-07-23 10:36
Core Insights
- The article discusses the technical details and implications of OpenAI's newly launched ChatGPT Agent, marking a significant step in the development of intelligent agents [1][2]

Group 1: ChatGPT Agent Overview
- ChatGPT Agent consists of four main components: Deep Research, Operator, and additional tools such as a terminal and image generation [3][9]
- The integration of Deep Research and Operator was driven by user demand for a more versatile tool that could handle both research and visual-interaction tasks [6][11]

Group 2: Training Methodology
- The training method integrates all tools into a single virtual-machine environment, allowing the model to autonomously explore the best tool combinations through reinforcement learning [12]
- The model learns to switch between tools seamlessly, completing tasks efficiently without explicit instructions on tool usage [13][14]

Group 3: Team Structure and Collaboration
- The ChatGPT Agent team is a merger of the Deep Research and Operator teams, consisting of roughly 20 to 35 members who collaborated closely to complete the project in a few months [19][20]
- The team emphasizes a user-scenario-driven approach, with application engineers participating in model training and researchers involved in deployment [21][22]

Group 4: Challenges and Future Directions
- The main training challenges were stability issues and the need for robustness against external factors such as website downtime and API limitations [24]
- Future development aims at a general-purpose super agent capable of handling a wide range of tasks, with a focus on adaptability and user-feedback integration [25][26]

Group 5: Security Measures
- The team has implemented multi-layered security measures to address potential risks, including monitoring for abnormal behavior and requiring user confirmation for sensitive actions [27]
- Special attention is given to biological risks, ensuring that the agent cannot be misused for harmful purposes [24][27].
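The idea of a model learning from reward alone which tool to reach for can be illustrated with a minimal sketch. This is not OpenAI's training setup: the tool names, reward values, and the epsilon-greedy bandit policy below are invented stand-ins for the concept.

```python
import random

# Hedged sketch: an epsilon-greedy policy that learns which tool tends to
# succeed for a task type. Tool names and rewards are invented for illustration.
TOOLS = ["browser", "terminal", "image_gen"]

class ToolPolicy:
    def __init__(self, tools, epsilon=0.1):
        self.epsilon = epsilon
        self.value = {t: 0.0 for t in tools}   # running mean reward per tool
        self.count = {t: 0 for t in tools}

    def choose(self):
        if random.random() < self.epsilon:     # explore other tools
            return random.choice(list(self.value))
        return max(self.value, key=self.value.get)  # exploit best-so-far

    def update(self, tool, reward):
        self.count[tool] += 1
        # incremental running-mean update
        self.value[tool] += (reward - self.value[tool]) / self.count[tool]

policy = ToolPolicy(TOOLS)
random.seed(0)
for _ in range(500):
    tool = policy.choose()
    # pretend the browser succeeds most often for this task type
    reward = 1.0 if (tool == "browser" and random.random() < 0.8) else 0.0
    policy.update(tool, reward)

print(max(policy.value, key=policy.value.get))
```

The real system credits tools by whether the overall task succeeds rather than per-call rewards, but the exploration/exploitation trade-off is the same.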
A long-form summary of end-to-end autonomous driving
自动驾驶之心· 2025-07-23 09:56
Core Viewpoint
- The article discusses the current development status of end-to-end autonomous driving algorithms, comparing them with traditional algorithms and highlighting their advantages and limitations [1][3][53]

Summary by Sections

Traditional vs. End-to-End Algorithms
- Traditional autonomous driving algorithms follow a pipeline of perception, prediction, and planning, where each module has distinct inputs and outputs [3]
- End-to-end algorithms take raw sensor data as input and directly output path points, simplifying the process and reducing error accumulation [3][5]
- Traditional algorithms are easier to debug and offer some interpretability, but they suffer from cumulative errors because the perception and prediction modules cannot be made fully accurate [3][5]

Limitations of End-to-End Algorithms
- End-to-end algorithms struggle with corner cases because they rely heavily on data-driven methods [7][8]
- Imitation learning in these algorithms makes it difficult to learn an optimal ground truth and to handle exceptional cases [53]
- Current end-to-end paradigms include imitation learning (behavior cloning and inverse reinforcement learning) and reinforcement learning, with evaluation methods categorized as open-loop or closed-loop [8]

Current Implementations
- The ST-P3 algorithm is highlighted as an early end-to-end autonomous driving work, using a framework that includes perception, prediction, and planning modules [10][11]
- ST-P3's innovations include a perception module with an ego-centric cumulative alignment technique and a prediction module with a dual-path prediction mechanism [11][13]
- The planning phase of ST-P3 refines predicted trajectories by incorporating traffic-light information [14][15]

Advanced Techniques
- The UniAD system employs a full Transformer framework for end-to-end autonomous driving, integrating multiple tasks to enhance performance [23][25]
- The TrackFormer framework focuses on the collaborative updating of track queries and detect queries to improve prediction accuracy [26]
- The VAD (Vectorized Autonomous Driving) method introduces vectorized representations for better structural information and faster trajectory-planning computation [32][33]

Future Directions
- End-to-end algorithms still rely primarily on imitation-learning frameworks, whose inherent limitations need further exploration [53]
- Introducing more constraints and multi-modal planning methods aims to address trajectory-prediction instability and improve model performance [49][52]
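The behavior-cloning objective at the heart of these imitation-learning planners can be sketched in a few lines: regress predicted future path points against expert trajectories. The feature dimension, horizon, and linear "policy head" below are invented for illustration; real systems use deep networks over camera/LiDAR features.

```python
import numpy as np

# Hedged sketch of the behavior-cloning (BC) objective: minimize the
# mean-squared error between predicted and expert waypoints.
rng = np.random.default_rng(0)

FEAT_DIM = 64    # stand-in for pooled BEV/sensor features
HORIZON = 6      # number of future (x, y) waypoints to predict

W = rng.normal(scale=0.01, size=(FEAT_DIM, HORIZON * 2))  # linear policy head

def bc_loss(features, expert, W):
    # mean-squared error between predicted and expert waypoints
    pred = features @ W
    return float(np.mean((pred - expert) ** 2))

features = rng.normal(size=(32, FEAT_DIM))
expert = rng.normal(size=(32, HORIZON * 2))   # synthetic "expert" trajectories

initial_loss = bc_loss(features, expert, W)
for _ in range(200):
    grad = 2 * features.T @ (features @ W - expert) / features.shape[0]
    W -= 0.01 * grad                           # plain SGD step
final_loss = bc_loss(features, expert, W)

print(initial_loss > final_loss)
```

The cumulative-error and corner-case problems discussed above follow directly from this objective: the model only sees states the expert visited, so it has no supervised signal for recovering from its own mistakes.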
Quark Health Model research report surfaces: the first in China, a look at the deep engineering behind a chief-physician-level "AI brain"
机器之心· 2025-07-23 08:57
Core Insights
- The Quark Health Model has passed assessments in 12 core medical disciplines, making it the first AI model in China to achieve this milestone and demonstrating its advanced capabilities in the healthcare sector [1][3]

Group 1: Research Summary
- Developing high-performance reasoning models for healthcare remains challenging despite rapid advances in general AI models. The Quark Health Model established a comprehensive process that improves performance and interpretability by clearly defining data sources and learning methods [3][5]
- The team emphasizes high-quality thinking data (Chain-of-Thought, CoT) as the foundational material for enhancing the model's reasoning capabilities through reinforcement learning [5][6]

Group 2: Data Production Lines
- The model employs two parallel data production lines, one for verifiable data and one for non-verifiable data, ensuring a systematic approach to data quality and model training [6][17]
- The first production line focuses on cold-start data and model fine-tuning, using high-quality data generated by state-of-the-art language models and validated by medical professionals for accuracy and reliability [19][24]

Group 3: Reinforcement Learning and Training
- The reinforcement-learning phase is critical for enhancing reasoning, with a focus on generating diverse, high-quality outputs through iterative training and data selection [24][26]
- The training process incorporates mechanisms to evaluate and improve reasoning quality, including preference reward models and verification systems that ensure the accuracy and relevance of outputs [33][38]

Group 4: Quality Assessment and Challenges
- The model addresses multi-solution and multi-path scenarios in healthcare with an evaluation system that recognizes the value of diverse reasoning paths and outputs [31][32]
- Training includes strategies to mitigate "cheating" behaviors, ensuring outputs are not only structurally sound but also medically accurate and reliable [40][42]
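The "verifiable data" production-line idea above can be sketched as a filter: sample several chain-of-thought traces per question and keep only those whose final answer a programmatic checker accepts. The generator and checker below are invented stand-ins; the real pipeline uses language models and physician-validated references.

```python
# Hedged sketch of rejection-style filtering for verifiable CoT training data.
# All names and sample traces here are hypothetical.

def fake_generate_traces(question):
    # stand-in for sampling several CoT traces from a language model
    return [
        {"cot": "2 + 2 equals 4", "answer": "4"},
        {"cot": "2 + 2 equals 5", "answer": "5"},    # flawed trace
        {"cot": "two plus two is four", "answer": "4"},
        {"cot": "unsure, guessing", "answer": "22"}, # flawed trace
    ]

REFERENCE_ANSWERS = {"2+2": "4"}  # verifiable ground truth

def build_sft_data(questions):
    kept = []
    for q in questions:
        for trace in fake_generate_traces(q):
            if trace["answer"] == REFERENCE_ANSWERS[q]:  # checker accepts
                kept.append({"question": q, **trace})
    return kept

data = build_sft_data(["2+2"])
print(len(data))  # two traces survive the filter
```

The non-verifiable production line cannot use an exact-match checker like this, which is why the report pairs it with preference reward models instead.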
Alibaba open-sources its strongest coding model: 480 billion parameters, agent scores surpassing Kimi K2, training details published
36Kr· 2025-07-22 23:53
Core Insights
- Alibaba's Qwen team has released its latest flagship programming model, Qwen3-Coder-480B-A35B-Instruct, claimed to be the most powerful open-source programming model to date, featuring 480 billion parameters and supporting up to 1 million tokens of context [1][2][16]
- The model has achieved state-of-the-art performance in various programming and agent tasks, surpassing other open-source models and even competing with proprietary models like GPT-4.1 [1][3][20]
- Qwen3-Coder is designed to significantly enhance productivity, allowing novice programmers to accomplish tasks in a fraction of the time experienced developers would need [2][24]

Model Specifications
- Qwen3-Coder comes in multiple sizes; the current release is the most powerful variant at 480 billion parameters, larger than Alibaba's previous flagship Qwen3 at 235 billion parameters but smaller than Kimi K2 at 1 trillion parameters [2][3]
- The model supports a native context of 256K tokens, extensible to 1 million tokens, and is optimized for programming tasks [16][20]

Performance Metrics
- In benchmark tests, Qwen3-Coder outperformed other models in categories such as Agentic Coding, Agentic Browser Use, and Agentic Tool Use, achieving the best performance among open-source models [1][3][20]
- Specific results include 69.6 on SWE-bench Verified and 77.5 on TAU-Bench Retail, showcasing its capabilities on real-world programming tasks [3][20]

Pricing Structure
- The Qwen3-Coder API is available on Alibaba Cloud's platform with tiered pricing based on input-token volume, ranging from $1 to $6 per million input tokens and $5 to $60 per million output tokens depending on the token range [4][5][24]
- The pricing is competitive compared to other models like Claude Sonnet 4, which has lower input and output costs [4][5]

User Experience and Applications
- Qwen3-Coder is available for free on the Qwen Chat web platform, letting users experience its capabilities firsthand [6][24]
- Users have reported impressive results in tasks including game development and UI design, with the model demonstrating high completion rates and aesthetic quality [9][11][12]

Future Developments
- The Qwen team is actively working on improving the model's performance and exploring self-improvement capabilities for coding agents [24]
- More model sizes are expected, aiming to balance deployment cost and performance [24]
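The tiered pricing above can be made concrete with a small cost calculator. The article gives only the overall ranges ($1 to $6 per million input tokens, $5 to $60 per million output tokens); the specific tier boundaries below are invented for illustration and are not Alibaba Cloud's published tiers.

```python
# Hedged sketch of a tiered token-pricing calculator with hypothetical tiers.
TIERS = [
    # (max input tokens in request, $ per 1M input, $ per 1M output)
    (32_000,     1.0,  5.0),
    (128_000,    1.8,  9.0),
    (256_000,    3.0, 15.0),
    (1_000_000,  6.0, 60.0),
]

def request_cost(input_tokens, output_tokens):
    # the tier is selected by the request's input-token count
    for limit, in_price, out_price in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds maximum context")

print(round(request_cost(100_000, 2_000), 4))
```

Selecting the tier from the input size alone matches the article's description of pricing "based on input token volume"; a production billing system would also need to handle the extended-context boundary explicitly.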
ByteDance releases the GR-3 model, opening a new era for the general-purpose robot "brain"
Jing Ji Guan Cha Bao· 2025-07-22 07:23
Core Insights
- ByteDance's Seed team launched a new Vision-Language-Action model (VLA) named GR-3, which boasts strong generalization, understanding of abstract concepts, and the ability to manipulate flexible objects [2][3]

Model Features
- GR-3's key advantage lies in its exceptional generalization ability and understanding of abstract concepts, allowing efficient fine-tuning with minimal human data [3]
- The model uses a Mixture-of-Transformers (MoT) architecture, integrating vision-language and action-generation modules into an end-to-end model with 4 billion parameters [3]
- GR-3 can perform a series of actions from verbal commands such as "clean the table," executing tasks like packing leftovers and disposing of trash [3]

Training Methodology
- GR-3 employs a three-in-one data training method, combining teleoperated robot data, human VR trajectory data, and publicly available image-text data to enhance model performance [4]
- Teleoperated robot data ensures stability and accuracy on basic tasks, while human VR trajectory data enables rapid learning of new tasks at nearly double the efficiency of traditional methods [4]

Application and Performance
- In practical applications, GR-3 performs strongly on general pick-and-place tasks, maintaining high command adherence and success rates even in unfamiliar environments [6]
- On long-horizon table-cleaning tasks, GR-3 achieves an average completion rate above 95% from the single command "clean the table" [6]
- The model exhibits remarkable flexibility and robustness in delicate operations, successfully completing tasks like hanging clothes regardless of garment type [6]

Future Developments
- The Seed team plans to expand the model's scale and training data to further improve GR-3's generalization to unknown objects [7]
- Future enhancements will introduce reinforcement learning (RL) so the robot can learn from trial and error during real operations [7]
- The release of GR-3 is seen as a significant step toward a general-purpose robotic "brain," with aspirations for robots to assist in daily human tasks [7]
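The "three-in-one" data recipe above amounts to sampling each training example from a weighted mixture of sources. The weights, source names, and contents in this sketch are invented; ByteDance has not published a mixing ratio in this summary.

```python
import random

# Hedged sketch of weighted mixture sampling over three data sources.
SOURCES = {
    "teleop":    ["teleop_ep_%d" % i for i in range(100)],
    "human_vr":  ["vr_traj_%d" % i for i in range(100)],
    "web_pairs": ["img_text_%d" % i for i in range(100)],
}
WEIGHTS = {"teleop": 0.5, "human_vr": 0.3, "web_pairs": 0.2}  # hypothetical

def sample_batch(batch_size, rng):
    names, weights = zip(*WEIGHTS.items())
    batch = []
    for _ in range(batch_size):
        src = rng.choices(names, weights=weights)[0]  # pick a source
        batch.append((src, rng.choice(SOURCES[src])))  # pick an example
    return batch

rng = random.Random(0)
batch = sample_batch(1000, rng)
counts = {name: sum(1 for s, _ in batch if s == name) for name in WEIGHTS}
print(counts)
```

Keeping the grounded teleoperation data dominant while mixing in cheaper VR and web data is one plausible reading of why the method preserves stability on basic tasks.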
On robot data: reinforcement-learning expert Sergey Levine just wrote a good article
机器之心· 2025-07-22 04:25
Core Viewpoint
- The article discusses the challenges and limitations of using alternative data to train large models in artificial intelligence, particularly robotics, emphasizing that while alternative data reduces cost, it often compromises the model's generalization capabilities [6][30][40]

Group 1: Challenges in Training Large Models
- Training large models, especially in robotics, requires vast amounts of real-world interaction data, which is costly to obtain [2][4]
- Researchers are exploring alternative data sources to balance cost and training effectiveness, but achieving this balance is complex [5][8]

Group 2: Alternative Data Strategies
- Methods for obtaining alternative data include simulation, human videos, and handheld gripper devices, each with its own strengths and weaknesses [10][12][13]
- While these methods have produced significant research results, they are compromises that can weaken the inherent capabilities of large-scale learning models [14]

Group 3: Limitations of Alternative Data
- Relying on alternative data can create a disconnect between the training environment and real-world applications, limiting effective generalization [26][28]
- The design decisions made when creating alternative data strongly affect the overlap between strategies that succeed in the real world and those learned from alternative data [23][24]

Group 4: Importance of Real-World Data
- Real-world data is essential for broad generalization, as it lets models learn the true mechanisms of the world [36]
- Alternative data should be viewed as a supplementary source of knowledge, not a replacement for real-world experience [37][38]

Group 5: The Concept of "Sporks"
- The term "sporks" describes alternative-data approaches that try to combine the benefits of large-scale training with the cost-effectiveness of alternative data [39][40]
- Other "spork" methods include hybrid systems that combine manual design with learned components, aiming to mitigate the high data demands of machine learning [41][42]
Computer industry commentary report on Kimi: dual breakthroughs with Researcher and K2, driven by the twin engines of reinforcement-learning innovation and open-source intelligence
Huaxin Securities· 2025-07-21 13:34
Investment Rating
- The report maintains a "Recommended" rating for the computer industry, indicating an expected outperformance of more than 10% relative to the benchmark index [10]

Core Insights
- The report highlights significant advances in AI and computer technology, particularly Moonshot AI's Kimi-Researcher and Kimi K2 models, which demonstrate breakthroughs in end-to-end reinforcement learning and open-source intelligence [5][6]
- The computer industry has outpaced the broader market, gaining 12.1% over the past month and a remarkable 60.5% over the past year, versus a 14.7% gain for the CSI 300 index [2][3]

Summary by Sections

Market Performance
- The computer industry shows strong relative performance: up 12.1% over 1 month, 10.3% over 3 months, and 60.5% over 12 months [2]

Investment Highlights
- Kimi-Researcher, launched in June 2025, achieved a Pass@1 score of 26.9% on the Humanity's Last Exam benchmark, setting a new record in the field [5]
- The Kimi K2 model, released in July 2025, features a MuonClip optimizer that improves training stability and supports complex task processing with a 16K context length, achieving a Pass@1 score of 65.8% on the SWE-bench Verified benchmark [6]
- The Kimi series is positioned to drive the democratization of AI, with API tools enabling developers to integrate intelligent agents into various applications [8]

Investment Recommendations
- The report suggests focusing on leading AI and computer companies with core innovative capabilities to capture long-term structural growth opportunities [9]
- Notable companies to watch include Google (GOOGL.O) and Microsoft (MSFT.O), which are expected to leverage their AI and cloud-computing positions for future growth [9]
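Pass@1 scores like those cited above are conventionally computed with the standard unbiased pass@k estimator popularized by code-generation benchmarks. A small sketch, assuming n sampled solutions per problem of which c pass the tests:

```python
from math import comb

# Unbiased pass@k estimator: probability that at least one of k draws
# (without replacement) from n samples is one of the c correct ones.
def pass_at_k(n, c, k):
    if n - c < k:
        return 1.0  # every k-subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# with k=1 this reduces to the fraction of correct samples, c/n
print(pass_at_k(10, 3, 1))
```

The report does not state how many samples per problem were drawn for the quoted scores; the estimator is shown only to clarify what a Pass@1 number measures.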
Why is reinforcement-learning research not recommended for graduate students?
自动驾驶之心· 2025-07-21 11:18
Original link: https://www.zhihu.com/question/1900927726795334198

Preface

I haven't answered academic questions in a long time, but half of the grant applications I've been reviewing lately are related to reinforcement learning, so Zhihu keeps recommending RL content to me... so let me talk briefly about reinforcement learning.

If you study reinforcement learning only as far as a master's degree, even at Tsinghua or Peking University, the most important fundamental skill is calling libraries: figuring out which package to call and when is enough. Beyond that, it is about how to arrange and combine components and how to shrink the solution space; for some algorithms, a basic procedural understanding suffices.

If you are doing a PhD, I suggest switching directions. In my view, carving fine details into today's reinforcement learning is a waste of time and life. Of course, if your goal is to publish many papers and land a faculty position, that works; you just may go a long time without producing truly excellent work, and if you are only making a living, that will not bother you.

My impression of reinforcement learning is that it is ancient and primitive. It feels as if I were still holding a ...
Built for embodied learning: after 12 hardware iterations, this bipedal robot platform's stability improved by 300%......
具身智能之心· 2025-07-21 08:24
Core Viewpoint
- TRON1 is a cutting-edge research platform designed for educational and scientific use, featuring a modular design that supports multiple locomotion forms and algorithms to maximize research flexibility [1]

Function Overview
- TRON1 serves as a humanoid-gait development platform, well suited to reinforcement-learning research, and supports external devices for navigation and perception [6][4]
- The platform supports development in C++ and Python, making it accessible to users without C++ experience [6]

Features and Specifications
- The comprehensive perception expansion kit includes:
  - GPU: NVIDIA Ampere architecture with 1024 CUDA Cores and 32 Tensor Cores
  - AI compute: 157 TOPS (sparse) and 78 TOPS (dense)
  - Memory: 16GB LPDDR5 with 102.4 GB/s bandwidth [16]
- TRON1 can integrate various sensors, including LiDAR and depth cameras, to facilitate 3D mapping, localization, navigation, and dynamic obstacle avoidance [13]

Development and Customization
- The SDK and development documentation are well structured, making secondary development easy even for beginners [34]
- Software and model structures can be updated online for convenience [36]

Additional Capabilities
- TRON1 supports voice interaction, including voice wake-up and control, suitable for educational and interactive applications [18]
- The platform can be fitted with robotic arms for various mobile manipulation tasks, supporting both single-arm and dual-leg configurations [11]

Product Variants
- TRON1 is available in standard and EDU versions, both featuring a modular design and similar mechanical parameters, including a maximum payload of approximately 10 kg [26]
Humanoid robot industry chain update
2025-07-21 00:32
Summary of Key Points from the Conference Call

Industry Overview
- The humanoid robot industry is growing rapidly, with many large companies entering the market, including traditional automotive-parts manufacturers, smartphone companies, and internet firms, which accelerates development and the exploration of practical applications [1][8][10]

Company-Specific Insights

Tesla
- Tesla is considering replacing its harmonic gear reducer due to wear under high-intensity use, which may delay the launch of its third-generation robot by 4-6 months, now expected in Q3 or Q4 of this year [1][2][5]
- The company is making hardware adjustments to improve the robot's durability and impact resistance, indicating that the original design's stability was insufficient for long-term use [2][14]
- New gear structures, such as cycloidal pinwheel gears, are being tested, but their maturity and reliability still need validation [13][22]

Yush Robot
- Yush Robot is a leading player in the domestic robot industry, with high product maturity and strong after-sales service, nearing commercialization through software-development partnerships [3][7]

Zhiyuan Company
- Zhiyuan recently acquired a listed company but has not yet triggered a backdoor-listing concept. Its recent demonstration of a robot with a wheeled chassis and dual-arm structure was deemed technically unremarkable [4][6]

Technological Developments
- Core humanoid-robot technologies focus on VLA operation, VLA post-training, and reinforcement learning, aiming to raise operation success rates for commercial applications [1][11]
- The dexterous-hand market is diverging: some companies see reduced orders due to ineffective grasping algorithms, leading many to switch to specialized grippers [12][25][26]

Market Trends
- Component maturity has improved significantly, especially for joint parts such as harmonic gear reducers, but new designs still require extensive testing [13][22]
- The entry of large companies into the humanoid-robot sector is accelerating development and enhancing supply-chain management and ecosystem building [10]

Challenges and Future Outlook
- General-purpose robots face challenges in achieving intelligent capabilities; expectations are that it may take several years before they can enter the household market [32][33]
- Transitional robotic solutions, such as wheeled mobility and specialized grippers, are seen as more feasible in the near term than fully humanoid robots [34]

Additional Insights
- The industry is witnessing a split in the performance of dexterous-hand manufacturers, with some companies thriving while others struggle for lack of effective grasping algorithms [12][25][26]
- Data collection for dexterous hands is challenging due to high precision requirements and immature collection methods, leading to reliance on virtual simulation environments [28]

This summary encapsulates the key points discussed in the conference call, highlighting the current state and future direction of the humanoid robot industry and the specific companies involved.