Self-Supervised Reinforcement Learning
Unitree Robot's Spring Festival Gala Video Sparks Heated Discussion Overseas! Netizens: Can't Believe My Eyes, Stunning
Xin Lang Cai Jing · 2026-02-17 02:08
Core Insights
- The performance of Unitree's humanoid robot during the Spring Festival Gala has sparked significant discussion overseas, with a video from Unitree garnering nearly 100,000 views in under 10 hours, highlighting the robot's impressive capabilities [1][2].

Group 1: Public Reaction
- Many overseas viewers expressed astonishment at the robot's performance, with comments such as "This is awesome! Couldn't believe my eyes, truly shocking" and "What an amazing achievement. So human-like movements" [1][3].
- Observers noted that China is emerging as a leading force in robotics, with comments reflecting a sense of living in a futuristic world filled with robots and AI [1][3].
- The performance has been described as a significant technological achievement, with users commenting on the advancements made within just one year [1][3].

Group 2: Technical Analysis
- Technical enthusiasts analyzed the robot's movements, suggesting that Unitree may have achieved advanced self-supervised reinforcement learning, allowing for real-time adjustments during complex maneuvers [2][4] (a hedged sketch of what such a reward could look like follows this summary).
- The robot's ability to perform a kip-up and its impressive power-to-weight ratio were highlighted, indicating the use of new integrated-reducer technology that enables it to support over 50 kg of weight [2][4].
- The robot's quick recovery after a fall during the live performance was cited as an example of embodied AI, surpassing the capabilities publicly demonstrated by Tesla's Optimus [2][4].

Group 3: Market Position
- Comparisons between Unitree and Boston Dynamics are growing, with some users suggesting that while Boston Dynamics creates high-end robots, Unitree is producing more affordable models that are ready for market delivery [2][4].
- The $16,000 price point of the G1 model has been highlighted as a potential game-changer, with some suggesting that 2026 could mark the year humanoid robots become commonplace in households [2][4].
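The "self-supervised reinforcement learning" the commenters speculate about usually means a reward computed from the robot's own proprioception rather than from human labels. The sketch below is purely illustrative of that idea for a fall-recovery skill; the state fields, the 0.9 m nominal height, and the weights are all assumptions, not Unitree's actual training setup.

```python
# A minimal sketch of a self-supervised fall-recovery reward: every term is
# derived from the robot's own sensors (height, orientation, torques), so no
# human labeling is needed. All names and constants here are hypothetical.
import numpy as np

def recovery_reward(state: dict) -> float:
    """Score a state by how 'recovered' the robot is, using only proprioception."""
    # Reward standing tall: torso height relative to a nominal standing height.
    height_term = state["torso_height"] / 0.9  # hypothetical 0.9 m nominal height
    # Reward an upright torso: projection of the torso's up-axis onto world up.
    upright_term = max(0.0, float(state["torso_up_vec"] @ np.array([0.0, 0.0, 1.0])))
    # Penalize actuator effort so recoveries stay smooth and physically feasible.
    effort_term = -1e-3 * float(np.sum(np.square(state["joint_torques"])))
    return 1.0 * height_term + 0.5 * upright_term + effort_term

# Example: a fallen pose scores low, an upright pose scores high.
fallen = {"torso_height": 0.2,
          "torso_up_vec": np.array([1.0, 0.0, 0.0]),   # lying sideways
          "joint_torques": np.zeros(12)}
upright = {"torso_height": 0.9,
           "torso_up_vec": np.array([0.0, 0.0, 1.0]),
           "joint_torques": np.zeros(12)}
print(recovery_reward(fallen), recovery_reward(upright))
```

Because the reward needs no demonstrations, a policy can be trained on massive simulated falls and then adjust in real time, which is consistent with the behavior the commenters describe.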
New from Peking University, EvoVLA: Sharply Reducing Robot Hallucination, Long-Horizon Success Rate Surges 10%
具身智能之心 · 2025-11-30 03:03
Core Viewpoint
- The article discusses the emergence of EvoVLA, a self-evolving Vision-Language-Action (VLA) model developed by a team from Peking University, which addresses "stage hallucination" in existing VLA models during long-horizon tasks, significantly improving success rates and reducing hallucination rates [1][5][40].

Group 1: Problem Identification
- Embodied AI is on the verge of a breakthrough, but existing VLA models exhibit a critical weakness in long-horizon manipulation tasks, often leading to "cheating" behaviors [2].
- In long-sequence tasks, VLA models frequently experience "stage hallucination," mistakenly believing they have completed a task stage when they have not [3][4].

Group 2: Solution Overview
- The Peking University research team proposed the EvoVLA framework, which uses a self-supervised approach to enhance VLA model performance [5].
- EvoVLA incorporates three core modules that work in synergy to form a closed-loop self-supervised reinforcement learning system [10] (illustrative sketches of all three follow this summary).

Group 3: Key Innovations
- Stage Alignment Reward (SAR): an innovative reward function that counters hallucination by scoring the agent against detailed semantic descriptions of task stages, generated using the Gemini model [11][13].
- Pose-Based Object Exploration (POE): a mechanism that shifts the focus from pixel prediction to the geometric relationships between objects and the robot's gripper, making exploration more efficient [17][19][21].
- Long-Horizon Memory: a context-selection mechanism that retrieves the most relevant historical information, preventing catastrophic forgetting during complex tasks [22][23][25].

Group 4: Benchmarking and Results
- The team introduced the Discoverse-L benchmark, comprising three progressively challenging tasks (Stack, Jujube-Cup, and Block Bridge) to validate long-horizon capabilities [26][27][28][29].
- EvoVLA achieved an average success rate of 69.2% on the Discoverse-L benchmark, surpassing the previous best model, OpenVLA-OFT, by 10.2% [34].
- In real-world trials, EvoVLA demonstrated strong Sim2Real generalization, achieving a 55.2% success rate on a novel stacking-and-insertion task, outperforming OpenVLA-OFT by 13.4% [37].

Group 5: Conclusion
- EvoVLA offers an elegant solution to the reliability issues VLA models face in long-horizon tasks, showcasing the potential of improved reward design, exploration mechanisms, and memory strategies in advancing embodied AI [40][41].
- The self-evolving paradigm, which uses large language models to generate "error sets" for strategy learning, may be a crucial step toward autonomous learning in general-purpose robotics [42].
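To make the Stage Alignment Reward concrete: the idea, per the summary, is to score the current observation against per-stage semantic descriptions (which the paper generates with Gemini) and to penalize claiming progress the evidence does not support. The sketch below is not the paper's implementation; the stage texts, the placeholder encoder, and the penalty rule are all assumptions for illustration.

```python
# A toy stage-alignment reward: reward similarity to the claimed stage's
# description, and penalize "stage hallucination" when an earlier stage
# matches the observation better. Embeddings are random placeholders
# standing in for a real vision-language encoder.
import numpy as np

STAGES = [
    "the gripper approaches the red block",       # illustrative stage texts;
    "the gripper grasps the red block",           # EvoVLA generates stage
    "the red block rests on the blue block",      # descriptions with Gemini
]

def embed(text_or_obs) -> np.ndarray:
    """Placeholder encoder: a real system would embed images and text jointly."""
    seed = abs(hash(str(text_or_obs))) % (2**32)
    return np.random.default_rng(seed).standard_normal(64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def stage_alignment_reward(obs, claimed_stage: int) -> float:
    """Reward alignment with the claimed stage; punish better-matching earlier stages."""
    sims = [cosine(embed(obs), embed(s)) for s in STAGES]
    reward = sims[claimed_stage]
    # If an earlier stage fits the observation better, the policy is likely
    # claiming progress it has not made -- subtract the gap as a penalty.
    best_earlier = max(sims[:claimed_stage], default=-1.0)
    if best_earlier > sims[claimed_stage]:
        reward -= best_earlier - sims[claimed_stage]
    return reward

print(stage_alignment_reward("camera frame at t=42", claimed_stage=1))
```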
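Likewise, a hedged sketch of pose-based exploration: instead of rewarding novelty in pixel space, reward novelty in the relative pose between gripper and object. The buffer, the nearest-neighbor metric, and the bonus scale are assumptions, not details from the paper, and a fuller version would include orientation as well as position.

```python
# Toy pose-based exploration bonus: novelty is measured in the space of
# gripper-object relative positions, not pixels, so the bonus is cheap to
# compute and directly tied to manipulation-relevant geometry.
import numpy as np

class PoseNoveltyBonus:
    def __init__(self, scale: float = 1.0):
        self.visited: list[np.ndarray] = []   # relative poses seen so far
        self.scale = scale

    def __call__(self, gripper_pos: np.ndarray, object_pos: np.ndarray) -> float:
        rel = gripper_pos - object_pos         # 3-D gripper-object offset
        if not self.visited:
            self.visited.append(rel)
            return self.scale
        # Bonus = distance to the nearest previously visited relative pose:
        # revisiting known geometry earns ~0, new geometry earns more.
        dists = np.linalg.norm(np.stack(self.visited) - rel, axis=1)
        self.visited.append(rel)
        return self.scale * float(dists.min())

bonus = PoseNoveltyBonus()
print(bonus(np.array([0.1, 0.0, 0.3]), np.zeros(3)))  # first visit: full bonus
print(bonus(np.array([0.1, 0.0, 0.3]), np.zeros(3)))  # repeat visit: zero bonus
```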
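Finally, a minimal sketch of the long-horizon memory's context selection: store an embedding per past step and retrieve only the top-k entries most relevant to the current observation, rather than feeding the whole history to the model. The encoder dimension and k are placeholders; the actual retrieval criterion in EvoVLA may differ.

```python
# Toy long-horizon context memory: similarity-based top-k retrieval over
# stored step embeddings, so relevant distant history survives while the
# context stays bounded (guarding against catastrophic forgetting).
import numpy as np

class ContextMemory:
    def __init__(self):
        self.keys: list[np.ndarray] = []
        self.payloads: list[str] = []

    def write(self, key: np.ndarray, payload: str) -> None:
        self.keys.append(key / (np.linalg.norm(key) + 1e-8))
        self.payloads.append(payload)

    def read(self, query: np.ndarray, k: int = 4) -> list[str]:
        """Return the k stored payloads most similar to the query embedding."""
        q = query / (np.linalg.norm(query) + 1e-8)
        sims = np.stack(self.keys) @ q
        top = np.argsort(sims)[::-1][:k]
        return [self.payloads[i] for i in top]

rng = np.random.default_rng(0)
mem = ContextMemory()
for t in range(100):                         # write 100 fake past steps
    mem.write(rng.standard_normal(64), f"step {t}")
print(mem.read(rng.standard_normal(64)))     # fetch the 4 most relevant steps
```

The design point is the same one the summary credits to EvoVLA: selection, not accumulation, is what keeps long-horizon context usable.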