After a month of silence, openPangu's performance surges 8%! Huawei's 1B open-source model is here
机器之心· 2025-09-05 04:31
Core Viewpoint
- Huawei's openPangu Embedded-1B model represents a significant advance in edge AI, enabling powerful AI capabilities on resource-constrained devices and paving the way for intelligent upgrades across industries [1][5]

Group 1: Model Performance and Efficiency
- The openPangu Embedded-1B model, with 1 billion parameters, sets a new state-of-the-art (SOTA) in performance and efficiency, demonstrating that smaller models can deliver substantial capabilities [2][3]
- The model's overall average score reached 63.90, surpassing similar-sized models and matching larger models such as Qwen3-1.7B, showcasing its parameter efficiency [3][4]
- In mathematical reasoning, the model scored 82.76% on the GSM8K benchmark and 81.83% on the MATH dataset, significantly outperforming its peers [3][4]

Group 2: Technical Innovations
- The model employs software-hardware co-design, optimizing its architecture to match the characteristics of Ascend hardware and ensure efficient resource utilization [9][10]
- A two-stage curriculum learning approach enhances the model's reasoning capabilities by simulating a human-like learning progression [15][16]
- Offline On-Policy knowledge distillation enables a more flexible and effective training process, improving the model's accuracy and generalization [18][19]

Group 3: Reinforcement Learning and Future Directions
- The model incorporates a multi-source reward reinforcement learning mechanism, improving performance through feedback targeted to task complexity [22][25]
- Future work aims to integrate fast and slow thinking within a single model, adapting responses to problem difficulty to improve both speed and accuracy [29][30]
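The summary names offline On-Policy knowledge distillation without giving the objective. As a rough illustration of the general distillation idea only (not Huawei's implementation), a teacher model's token distribution can supervise a student via a KL term computed on sequences the student itself generated; all function names and toy logits below are hypothetical:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits):
    """Forward KL(teacher || student) for one token position.

    In on-policy distillation, the scored sequence is sampled from the
    *student*, while the teacher supplies the target distribution.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: a vocabulary of 3 tokens
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distill_kl(teacher, student)
print(round(loss, 4))  # positive when the distributions disagree
```

The loss is zero only when the student matches the teacher exactly, so minimizing it pulls the small model's distribution toward the large one's on the states the student actually visits.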
The technical development roadmap of embodied intelligence, seen through nearly 1,000 works!
具身智能之心· 2025-09-05 00:45
Core Insights
- The article reviews the evolution and challenges of embodied intelligence, emphasizing the need for a comprehensive understanding of its development, open problems, and future directions [3][4]

Group 1: Robotic Manipulation
- The survey on robotic manipulation traces the transition from mechanical programming to embodied intelligence, focusing on the evolution from simple grippers to dexterous multi-fingered hands [5][6]
- Key challenges in dexterous manipulation include data collection methods such as simulation, human demonstration, and teleoperation, as well as skill-learning frameworks like imitation learning and reinforcement learning [5][6]

Group 2: Navigation and Manipulation
- The discussion of robotic navigation emphasizes physics simulators as a response to the high cost and scarcity of real-world training data, with a focus on Sim-to-Real transfer challenges [9][15]
- The evolution of navigation techniques is outlined as a transition from explicit to implicit memory, and the role of various simulators in narrowing the Sim-to-Real gap is analyzed [15][16]

Group 3: Multimodal Large Models
- Embodied multimodal large models (EMLMs) show potential to bridge gaps between perception, cognition, and action, driven by advances in large-model technology [17][19]
- Identified challenges include cross-modal alignment difficulties, high computational resource demands, and weak domain generalization [19]

Group 4: Teleoperation and Data Collection
- The survey on teleoperation of humanoid robots discusses integrating human cognition with robotic capabilities, particularly in hazardous environments, while addressing challenges such as high degrees of freedom and communication limitations [29][30]
- Key components of teleoperation systems include human state measurement, motion retargeting, and multimodal feedback mechanisms [30][33]

Group 5: Vision-Language-Action Models
- The analysis of Vision-Language-Action (VLA) models covers their evolution from cross-modal learning architectures to the integration of visual language models and action planners [33][36]
- Core challenges lie in real-time control, multimodal action representation, and system scalability; proposed future directions include adaptive AI and cross-embodiment generalization [36][41]
GPT-5 criticized as overhyped and underperforming; an OpenAI co-founder explains why: we kept it in an "ivory tower," with too little contact with the real world
AI前线· 2025-09-04 06:30
Core Insights
- OpenAI is shifting its focus from consumer markets to enterprise markets with the launch of GPT-5, despite initial setbacks in its release [2][5]
- GPT-5 has received positive feedback from enterprise users, indicating its potential in the corporate sector [5][24]
- GPT-5's pricing is competitive, with significant cost reductions over time making it more accessible to businesses [34][35]

Summary by Sections

OpenAI's Market Shift
- Sam Altman aims to capitalize on the enterprise market with GPT-5, moving beyond the consumer-focused ChatGPT [2]
- Initial criticism of GPT-5 led to a temporary rollback to GPT-4 for paid users, but the model is designed for enterprise applications [2][5]

Enterprise Adoption
- Companies such as Cursor, Vercel, and Factory have adopted GPT-5 as their default model, citing improvements in speed, performance, and cost [2][3]
- Box's CEO described GPT-5 as a breakthrough in reasoning capabilities, surpassing previous systems [3]
- JetBrains has integrated GPT-5 into its AI Assistant, highlighting its efficiency in generating tools quickly [3][4]

Technical Developments
- OpenAI's Greg Brockman discussed the evolution of reasoning in AI models, emphasizing the importance of reinforcement learning for reliability [8][10]
- The transition from offline to online learning is noted as a significant shift in AI training methodology [10][12]

Cost Efficiency
- OpenAI has achieved a 1,000-fold reduction in model costs over two and a half years, improving accessibility for users [34][35]
- The company continues to improve computational efficiency and model architecture to reduce costs further [35]

Future Directions
- GPT-5's potential as a collaborative partner in research and development is highlighted, with implications for fields including mathematics and biology [21][22]
- OpenAI is exploring the integration of AI models into real-world applications, aiming to enhance productivity and problem-solving capabilities [24][40]
With the spirit of young sci-tech innovation as a bridge, the SXSW tech and arts festival sends an invitation to the 2025 Bund Conference
Jing Ji Guan Cha Wang· 2025-09-04 04:50
Core Insights
- The 2025 Bund Conference will take place September 10-13 in Shanghai, showcasing cutting-edge technology and cross-disciplinary creativity and attracting global attention [1]
- The conference received a special video message from SXSW, highlighting the inspiration drawn from the Bund Conference and the vibrant creativity of China's younger generation [1][2]
- The conference aims to connect advanced technology with everyday life, featuring interactive exhibits and discussions on major scientific topics [3][4]

Group 1: Event Overview
- The Bund Conference is recognized as a high-level global fintech and frontier-technology event; the 2025 theme is "Reshaping Innovation Growth" [4]
- The event will include a main forum, over 40 open insight forums, a global theme day, 18 Creator innovation stages, nearly 10,000 square meters of technology exhibitions, a technology market, and a technology innovation competition [4]

Group 2: Participation and Engagement
- Nearly 20,000 young tech talents from more than 10 countries, including China, the US, the UK, and Australia, have registered for activities such as the Innovators Stage and the AI Innovation Competition [2]
- The conference will feature prominent AI figures, including Turing Award winner Richard Sutton and other leading young AI innovators [2]

Group 3: Technological Innovations
- The technology exhibition will demonstrate practical applications of advanced medical services, such as health-management tools and interactive robotics [3]
- The conference emphasizes technology's impact on daily life and creative expression, echoing the views of SXSW's Neil Minocha [3]
Four top AI researchers leave Apple, three of them Chinese
36Kr· 2025-09-04 02:13
Core Insights
- The recent talent exodus from Apple highlights significant movement in the AI sector, with four key researchers leaving for various companies, indicating a broader trend beyond high salaries alone [1][3][24]

Group 1: Talent Movement
- Apple has lost four prominent AI researchers: Jian Zhang, head of robotics research, and three members of the foundational models team, Nan Du, Zhao Meng, and John Peebles [1][3][4]
- Three of the four departing researchers are Chinese [3][24]
- Jian Zhang has joined Meta's Robotics Studio, Nan Du and John Peebles have moved to OpenAI, and Zhao Meng has joined Anthropic [3][12][23]

Group 2: Individual Contributions
- Jian Zhang spent a decade at Apple focusing on automation technology and AI applications in robotics, with a strong academic background in bionic flying vehicles [5][8][10]
- Nan Du previously worked at Google for over seven years, contributing to major projects including a 1-trillion-parameter model and the second-generation Pathways Language Model [20][21]
- John Peebles has expertise in generative AI and large language models, having worked on improving model inference capabilities at Apple [16][21]
- Zhao Meng specializes in multimodal AI and generative models, with a notable academic record and contributions to zero-shot learning techniques [22][23]

Group 3: Industry Context
- Apple's talent loss reflects a larger industry trend in which companies compete fiercely for top AI talent, driving significant personnel shifts [24][31]
- Meta's aggressive recruitment strategy has been a focal point, but the Apple departures suggest that factors such as company culture and research alignment also matter for retention [26][31]
- The competitive landscape is intensifying, with OpenAI and Anthropic also offering lucrative compensation packages, further complicating talent dynamics [26][27]
Songyan Power: the robot "kid prodigy" that ran out of the Tsinghua campus
Xin Jing Bao· 2025-09-03 02:02
Group 1
- Songyan Power has achieved significant recognition in humanoid robotics, winning multiple awards including runner-up in the first humanoid-robot marathon and gold medals in gymnastics and long jump at the World Humanoid Robot Games [1][4]
- Founded by Jiang Zheyuan, who dropped out of Tsinghua University, the company has grown from a small startup into a notable player in the robotics industry, focused on developing humanoid robots such as the N2 and E1 [2][3]
- The N2 robot, priced at tens of thousands of yuan, has gained substantial market traction, with total orders exceeding 2,500 units and contract value surpassing 100 million yuan, positioning Songyan Power as a leading humanoid-robot manufacturer [3][4]

Group 2
- Songyan Power targets diverse application scenarios for its robots, including education, research, cultural tourism, and commercial performances, with plans to expand into overseas markets [4][5]
- The company has developed the "Xiao Nuo" robot, featuring over 30 degrees of freedom for facial expressions, aimed at elderly care, exhibition reception, and psychological counseling [5]
- Beijing's supportive policies and initiatives since 2019 have fostered a robust robotics ecosystem, contributing to 50% year-on-year growth in the city's robotics-industry revenue in 2024 [5][6]
A new paradigm for robotic manipulation: a systematic survey of VLA models | Jinqiu Select
锦秋集· 2025-09-02 13:41
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models built on large Vision-Language Models (VLMs) as a transformative paradigm in robotic manipulation, addressing the limitations of traditional methods in unstructured environments [1][4][5]
- It highlights the need for a structured classification framework to counter research fragmentation in the rapidly evolving VLA field [2]

Group 1: New Paradigm in Robotic Manipulation
- Robotic manipulation is a core challenge at the intersection of robotics and embodied AI, requiring deep understanding of visual and semantic cues in complex environments [4]
- Traditional methods rely on predefined control strategies that struggle in unstructured real-world scenarios, revealing limits in scalability and generalization [4][5]
- Large VLMs offer a revolutionary alternative, enabling robots to interpret high-level human instructions and generalize to unseen objects and scenes [5][10]

Group 2: VLA Model Definition and Classification
- VLA models are defined as systems that use a large VLM to understand visual observations and natural-language instructions, followed by a reasoning process that generates robotic actions [6][7]
- VLA models fall into two main types, Monolithic Models and Hierarchical Models, each with distinct architectures and functionality [7][8]

Group 3: Monolithic Models
- Monolithic VLA models can be implemented as single-system or dual-system architectures, integrating perception and action generation into a unified framework [14][15]
- Single-system models process all modalities together, while dual-system models separate reflective reasoning from reactive behavior to improve efficiency [15][16]

Group 4: Hierarchical Models
- Hierarchical models consist of a planner and a policy that can operate independently; the modular design adds flexibility in task execution [43]
- These models are further divided into Planner-Only and Planner+Policy categories, the former focusing solely on planning and the latter integrating action execution [43][44]

Group 5: Advancements in VLA Models
- Recent advances include richer perception modalities, such as 3D and 4D perception, and the integration of tactile and auditory information [22][23][24]
- Improving reasoning and generalization is crucial for enabling VLA models to perform complex tasks in diverse environments [25][26]

Group 6: Performance Optimization
- Performance optimization focuses on inference efficiency through architectural adjustments, parameter optimization, and inference-acceleration techniques [28][29][30]
- Dual-system models have emerged to balance deep reasoning with real-time action generation, easing deployment in real-world scenarios [35]

Group 7: Future Directions
- Future research directions include memory mechanisms, 4D perception, efficient adaptation, and multi-agent collaboration to further enhance VLA capabilities [1][6]
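The planner-plus-policy decomposition described for hierarchical VLA models can be sketched schematically: a high-level planner turns an instruction into subgoals, and a low-level policy maps each subgoal to motor commands. The class names and hard-coded subgoals below are illustrative stand-ins, not any surveyed system's API:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    image_desc: str  # stand-in for camera input

class Planner:
    """High-level module: instruction + observation -> subgoal sequence."""
    def plan(self, instruction: str, obs: Observation) -> List[str]:
        # A real planner would query a VLM; here we hard-code a decomposition.
        return [f"locate target for: {instruction}",
                f"grasp target for: {instruction}",
                f"place target for: {instruction}"]

class Policy:
    """Low-level module: subgoal + observation -> motor command."""
    def act(self, subgoal: str, obs: Observation) -> str:
        return f"execute[{subgoal}]"

def run_hierarchy(instruction: str, obs: Observation) -> List[str]:
    """Run the planner once, then execute each subgoal with the policy."""
    planner, policy = Planner(), Policy()
    return [policy.act(goal, obs) for goal in planner.plan(instruction, obs)]

actions = run_hierarchy("put the cup on the shelf", Observation("tabletop scene"))
print(actions)
```

Because the two modules only share the subgoal strings, either one can be swapped out independently, which is the modularity advantage the survey attributes to hierarchical designs.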
Performance approaching the strongest closed-source models: Tongyi Lab open-sources Mobile-Agent-v3, setting new SOTA on 10 GUI benchmarks
机器之心· 2025-09-02 03:44
Core Viewpoint
- The article covers the launch of GUI-Owl and Mobile-Agent-v3, advanced open-source models for GUI automation that outperform existing models across varied environments [1][29]

Group 1: Key Achievements
- GUI-Owl achieves state-of-the-art (SOTA) performance on both Android and desktop platforms, with the 32B model surpassing top closed-source models in multiple evaluations [21][29]
- The models operate in a cloud environment, enabling dynamic task execution and data collection across operating systems including Android, Ubuntu, macOS, and Windows [11][29]

Group 2: Technical Innovations
- A self-evolving data-production pipeline minimizes human involvement in generating high-quality training data, letting the models iteratively optimize themselves [11][14]
- GUI-Owl's capabilities include advanced UI-element grounding, long-horizon task planning, and robust reasoning, enabling it to understand and execute complex tasks effectively [16][20]

Group 3: Reinforcement Learning Framework
- A scalable reinforcement learning (RL) system enhances the model's stability and adaptability in real-world environments, allowing continuous learning from interaction [22][26]
- The Trajectory-aware Relative Policy Optimization (TRPO) algorithm addresses the sparse, delayed reward signals of GUI-automation tasks, improving learning efficiency [26]

Group 4: Conclusion
- GUI-Owl and Mobile-Agent-v3 represent a significant advance in open-source GUI automation, providing a powerful tool for varied applications while reducing deployment and resource costs [29]
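The summary points to sparse, delayed rewards as the core difficulty of RL for GUI automation: a trajectory of many clicks may only be scored at the end. A standard way to spread that terminal signal back over earlier steps is a discounted-return computation; this is a generic illustration of the credit-assignment problem, not the paper's Trajectory-aware Relative Policy Optimization algorithm:

```python
def discounted_returns(rewards, gamma=0.99):
    """Propagate sparse/delayed rewards backward as discounted returns."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# A 5-step GUI trajectory where only the final step is rewarded
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
rets = discounted_returns(rewards, gamma=0.9)
print([round(r, 4) for r in rets])  # earlier steps still receive credit
```

With gamma below 1, each earlier action receives geometrically less credit for the eventual success, which is what makes long GUI trajectories with a single terminal reward hard to learn from.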
XDog: a low-cost embodied research platform, quadruped robot dog + single arm (with VLA / reinforcement learning / simulation / sim2real tutorials)
具身智能之心· 2025-09-02 02:00
Core Viewpoint
- Xdog is a low-cost, multifunctional quadruped robot dog and robotic-arm development platform designed for embodied-AI developers, featuring a comprehensive curriculum for robotics research and learning [1][2]

Hardware Overview
- Xdog integrates voice control, sim2real, real2sim, target recognition and tracking, autonomous arm grasping, and reinforcement-learning gait control, covering most of the technology stack for embodied lower-limb control [2][5]
- The robot dog measures 25 cm x 20 cm x 30 cm, weighs 7.0 kg, and reaches a top speed of 7.2 km/h with a maximum rotation speed of 450 degrees per second [3][11]
- The main control chip is the Allwinner H616, with a quad-core 1.6 GHz CPU, 4 GB RAM, and 32 GB storage [4][5]
- The robotic arm reaches a maximum height of 0.85 m and has a grasping range of 0.4 m around its base [7]

Software and Functionality
- The system supports several control methods, including voice control via TCP, keyboard control, visual control, and reinforcement learning for autonomous movement [15][17]
- Development is based on ROS1 with Python as the primary language; a GPU of at least an RTX 2080 Ti is recommended for inference [16][24]
- The platform's curriculum spans basic ROS knowledge through advanced reinforcement-learning principles and practical applications [22][23]

Team and Support
- The project is led by a team of experienced instructors responsible for project advancement, technical support, and course development [22]
- After-sales service runs for one year post-delivery; video and source-code access is granted immediately upon hardware receipt [26]

Delivery and Consultation
- Delivery is completed within three weeks of payment [25]
- For further inquiries, prospective customers can consult the assistant via WeChat [27]
Large models start playing Honor of Kings
量子位· 2025-09-02 01:40
Core Insights
- The article discusses the Think-In-Games (TiG) framework, which lets large language models play Honor of Kings while learning in real time, effectively bridging the gap between decision-making and action [1][3][4]

Group 1: TiG Framework Overview
- TiG reframes reinforcement-learning decision-making as a language-modeling task: the model generates strategies guided by language and optimizes them through online reinforcement learning [3][4]
- The framework teaches large language models macro-level reasoning skills, focusing on long-term goals and team coordination rather than micro-level actions [6][9]
- The model acts more like a strategic coach than a professional player, converting decisions into text and selecting macro actions based on game state [7][9]

Group 2: Training Methodology
- Training follows a multi-stage approach combining supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance model capabilities [12][16]
- A relabeling algorithm tags each game state with the most critical macro action, providing a robust signal for subsequent training [9][11]
- The Group Relative Policy Optimization (GRPO) algorithm maximizes the advantage of generated content while limiting divergence from a reference model [9][11]

Group 3: Experimental Results
- Combining SFT and GRPO significantly improves performance: Qwen-2.5-32B's accuracy rises from 66.67% to 86.84% after applying GRPO [14][15]
- Qwen-3-14B reaches 90.91% accuracy after training with SFT and GRPO [2][15]
- TiG matches traditional reinforcement-learning methods while substantially reducing data and computational requirements [17]
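GRPO's distinguishing trick, as described above, is scoring each sampled response against its own group of samples rather than a learned value network. A minimal sketch of that group-relative advantage term follows; it is illustrative only, and the full GRPO objective also includes a clipped probability ratio and a KL penalty toward the reference model:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and std of its group, so no value network is needed."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

# Four sampled macro-action rationales for the same game state,
# each scored by a reward function (values here are made up)
rewards = [1.0, 0.0, 0.5, 0.5]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])
```

Responses that beat their group average get positive advantages (and are reinforced), those below it get negative ones, which is what lets GRPO train on comparative quality without estimating absolute state values.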