Reinforcement Learning
Latest from Peking University! MobileVLA-R1: Beyond Robotic Arms, How Far Has Mobile Robots' VLA Capability Come?
具身智能之心· 2025-11-30 03:03
Core Insights
- The article discusses the introduction of MobileVLA-R1, a new framework for quadruped robots that bridges the gap between high-level semantic reasoning and low-level action control, addressing the stability and interpretability challenges of existing methods [1][2][21]

Group 1: Need for Reconstruction of the VLA Framework
- Current quadruped robots face two main challenges: a semantic-control gap that makes command execution unstable, and a lack of traceable reasoning that complicates error diagnosis [2]
- MobileVLA-R1's breakthrough lies in decoupling reasoning from action execution, letting the robot "think clearly" before "acting accurately," which improves both interpretability and control robustness [2][23]

Group 2: Implementation of MobileVLA-R1
- MobileVLA-R1 employs a structured CoT dataset, a two-stage training paradigm, and multi-modal perception fusion to achieve coherent reasoning, stable control, and strong generalization [4][6]
- The structured CoT dataset includes 18K episode-level samples, 78K step-level samples, and 38K navigation-specific samples, filling the gap in reasoning supervision from instruction to action (see the sketch below) [4][5]

Group 3: Performance Evaluation
- In navigation tasks, MobileVLA-R1 achieved success rates of 68.3% and 71.5% on the R2R-CE and RxR-CE benchmarks, respectively, outperforming existing methods by an average of 5% [10]
- In quadruped control tasks, it achieved an average success rate of 73% across six locomotion and manipulation tasks, significantly surpassing baseline models [12][13]

Group 4: Real-World Deployment
- MobileVLA-R1 was tested on the Unitree Go2 quadruped robot in various environments, demonstrating robust adaptation to complex scenarios with a success rate of 86%-91% on complex instructions [14][18]
- Integrating depth and point-cloud encoders improved navigation success rates by 5.8%, highlighting the importance of 3D spatial information for scene understanding [19][20]

Group 5: Key Conclusions and Future Directions
- MobileVLA-R1 innovatively integrates chain-of-thought reasoning with reinforcement learning, resolving the industry's dilemma of choosing between interpretability and execution stability [21][23]
- Future directions include expanding the action space toward more precise tasks, reducing reasoning latency through model optimization, and strengthening self-supervised learning to reduce reliance on labeled data [23]
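The structured CoT supervision described above pairs each instruction with intermediate reasoning and a low-level action target, so the model learns to reason before it moves. A minimal sketch of what one step-level sample and its supervision string could look like; all field names and values are illustrative assumptions, not MobileVLA-R1's actual schema:

```python
# A minimal sketch of one step-level CoT training sample.
# Field names and values are assumptions for illustration only.
step_sample = {
    "instruction": "Walk to the red chair and stop in front of it",
    "observation": {"rgb": "frame_0421.png", "depth": "depth_0421.png"},
    "chain_of_thought": [
        "The red chair is roughly 3 m ahead, slightly to the left.",
        "The floor is clear, so move forward and yaw left about 10 degrees.",
    ],
    "action": {"vx": 0.4, "vy": 0.0, "yaw_rate": 0.17},  # low-level velocity command
}

def build_prompt(sample: dict) -> str:
    """Serialize a sample into a supervision string: reasoning first, action last,
    mirroring the 'think clearly, then act accurately' decoupling described above."""
    thought = " ".join(sample["chain_of_thought"])
    action = sample["action"]
    return (
        f"Instruction: {sample['instruction']}\n"
        f"Reasoning: {thought}\n"
        f"Action: vx={action['vx']} vy={action['vy']} yaw_rate={action['yaw_rate']}"
    )

print(build_prompt(step_sample))
```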
Flash Report! MEET2026 Speaker Lineup Updated Again; Register Soon to Attend
量子位· 2025-11-29 04:02
Core Insights
- The MEET2026 Smart Future Conference will focus on the cutting-edge technologies and industry developments that have garnered significant attention throughout the year [1]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies penetrate various industries, disciplines, and scenarios, becoming a core driving force for societal evolution [2]

Group 1: Conference Highlights
- The conference will cover this year's hot topics in the tech circle, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [3]
- It will feature the latest collisions between academic frontiers and commercial applications, showcasing leading technological achievements across infrastructure, models, and products [4]
- The event will also include the authoritative release of the annual AI rankings and the annual AI trend report [5][116]

Group 2: Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, has extensive experience in AI and digital video technologies [11][12]
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national AI research projects [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in core AI technology development and has published over 100 papers [19]

Group 3: Industry Impact
- The annual AI rankings initiated by Quantum Bit have become one of the most influential lists in the AI industry, evaluating companies, products, and individuals across three dimensions [117]
- The annual AI trend report will analyze ten significant AI trends based on technology maturity, implementation status, and potential value, highlighting representative organizations and best cases [118]
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [122]
The Core Technology Behind the HunyuanOCR Model Revealed: Unified Framework, Truly End-to-End
量子位· 2025-11-29 04:02
Core Insights
- Tencent's HunyuanOCR model is a commercial-grade, open-source, lightweight OCR-specific visual language model with 1 billion parameters, combining a native ViT with a lightweight LLM architecture [1]
- The model excels in perception capabilities (text detection and recognition, complex document parsing) and semantic abilities (information extraction, text-image translation), winning the ICDAR 2025 DIMT challenge and achieving SOTA results on OCRBench among models under 3 billion parameters [2]

Model Performance and Popularity
- HunyuanOCR ranks in the top four on Hugging Face's trending list, has over 700 stars on GitHub, and received day-0 integration from the official vLLM team [3]

Team Achievements
- The HunyuanOCR team highlights three major breakthroughs:
  1. Unified efficiency, supporting tasks such as text detection, complex document parsing, and visual question answering within a single lightweight framework [5]
  2. A simplified end-to-end architecture that eliminates dependencies on pre-processing and reduces deployment complexity [6]
  3. Data-driven innovations that use high-quality data and reinforcement learning to enhance OCR task performance [8]

Core Technology
- HunyuanOCR focuses on lightweight model structure design, high-quality pre-training data production, application-oriented pre-training strategies, and task-specific reinforcement learning [11]

Lightweight Model Structure
- The model employs an end-to-end training and inference paradigm, requiring only a single inference pass to produce complete results and avoiding the error accumulation common in traditional architectures [14][19]

High-Quality Data Production
- The team built a large-scale multimodal training corpus with over 200 million image-text pairs, covering nine core real-world scenarios and more than 130 languages [21]

Pre-Training Strategy
- HunyuanOCR uses a four-stage pre-training strategy focused on visual-language alignment and understanding, with dedicated stages for long-document processing and application-oriented training [29][32]

Reinforcement Learning Approach
- The model applies reinforcement learning to enhance performance, using a hybrid reward strategy for structured tasks and LLM-based rewards for open-ended tasks (see the sketch below) [36]

Data Quality and Reward Design
- The data construction process emphasizes quality, diversity, and difficulty balance, using an LLM to filter out low-quality data and keep training effective [39]
- Adaptive reward designs are implemented for the various tasks, ensuring precise and verifiable outputs [40][42]
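The hybrid reward strategy mentioned above can be pictured as exact, rule-based scoring for structured outputs plus a model-judged score for open-ended ones. The sketch below shows the routing idea only; the function names, the word-overlap stand-in for an LLM judge, and the task-type labels are assumptions, not HunyuanOCR's actual reward implementation:

```python
import json
from difflib import SequenceMatcher

def structured_reward(pred: str, gold: str) -> float:
    """Rule-based reward for structured tasks (e.g. table/JSON parsing):
    1.0 for an exact, parseable match, partial credit otherwise."""
    try:
        if json.loads(pred) == json.loads(gold):
            return 1.0
    except json.JSONDecodeError:
        pass
    return SequenceMatcher(None, pred, gold).ratio()  # soft fallback score

def llm_judge_reward(pred: str, reference: str) -> float:
    """Stand-in for an LLM-based judge used on open-ended tasks
    (e.g. free-form translation); a real system would call a scoring model here."""
    overlap = len(set(pred.split()) & set(reference.split()))
    return overlap / max(len(reference.split()), 1)

def hybrid_reward(task_type: str, pred: str, gold: str) -> float:
    # Route each sample to the reward appropriate for its task type.
    if task_type in {"parsing", "extraction"}:
        return structured_reward(pred, gold)
    return llm_judge_reward(pred, gold)

print(hybrid_reward("parsing", '{"total": 42}', '{"total": 42}'))
```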
Class Starts Tomorrow! What Does End-to-End Mass Production Actually Involve? We Have Prepared a Hands-On Deployment Course...
自动驾驶之心· 2025-11-29 02:06
Core Viewpoint
- The article emphasizes the importance of end-to-end mass production in the automotive industry, highlighting the scarcity of qualified talent and the need for comprehensive training programs that address the field's various challenges [1][3]

Course Overview
- The course covers the essential algorithms behind end-to-end production, including single-stage and two-stage frameworks, reinforcement learning applications, and trajectory optimization [3][9]
- It aims to provide practical experience and insight into production challenges, focusing on real-world applications and expert guidance [3][16]

Course Structure
- Chapter 1 introduces the overview of end-to-end tasks, discussing the integration of perception and control algorithms and the importance of efficient data handling [9]
- Chapter 2 focuses on the two-stage end-to-end algorithm framework, explaining its modeling and information-transfer process [10]
- Chapter 3 covers the single-stage end-to-end algorithm framework, emphasizing its advantages in information transmission and performance [11]
- Chapter 4 discusses the use of navigation information in autonomous driving, detailing the formats and encoding methods of navigation maps [12]
- Chapter 5 introduces reinforcement learning algorithms, highlighting why they are needed to complement imitation learning for better generalization (a brief sketch follows this summary) [13]
- Chapter 6 involves practical projects on trajectory-output optimization, combining imitation learning and reinforcement learning techniques [14]
- Chapter 7 presents fallback strategies for trajectory planning, focusing on smoothing algorithms that make the output more reliable [15]
- Chapter 8 shares production experience from multiple perspectives, offering strategies for optimizing system capability [16]

Target Audience
- The course is aimed at advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming [17][18]
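Chapter 5's point that reinforcement learning complements imitation learning is commonly realized as a weighted objective: an imitation term anchors the planner to expert trajectories while an RL term rewards closed-loop behavior. A schematic sketch under assumed tensor shapes and weighting; this is not the course's exact recipe:

```python
import torch
import torch.nn.functional as F

def combined_planner_loss(pred_traj, expert_traj, reward, log_prob, il_weight=0.7):
    """Mix an imitation term with a simple policy-gradient RL term.

    pred_traj, expert_traj: (B, T, 2) planned vs. expert waypoints
    reward:                 (B,) scalar closed-loop reward per rollout
    log_prob:               (B,) log-probability of the sampled trajectory
    """
    il_loss = F.l1_loss(pred_traj, expert_traj)        # stay close to demonstrations
    advantage = reward - reward.mean()                  # crude baseline subtraction
    rl_loss = -(advantage.detach() * log_prob).mean()   # REINFORCE-style term
    return il_weight * il_loss + (1.0 - il_weight) * rl_loss

# Toy usage with random tensors, just to show the call shape.
loss = combined_planner_loss(
    torch.randn(4, 30, 2), torch.randn(4, 30, 2), torch.randn(4), torch.randn(4)
)
print(loss.item())
```

The il_weight knob reflects the usual design choice: keep imitation dominant for stability, and let the RL term correct behaviors that demonstrations alone fail to generalize.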
AI Luminary Ilya Declares the End of the Scaling Era! Asserts That the Concept of AGI Is Misleading
混沌学园· 2025-11-28 12:35
Group 1
- The era of AI scaling has ended, and the focus is shifting back to research: merely increasing computational power is no longer sufficient for breakthroughs [2][3][15]
- A significant bottleneck in AI development is generalization ability, which is currently inferior to that of humans [3][22]
- Emotions serve as a "value function" for humans, providing immediate feedback for decision-making, a capability that AI currently lacks [3][6][10]

Group 2
- Current AI models are becoming homogenized by pre-training, and the path to differentiation lies in reinforcement learning [4][17]
- SSI, the company co-founded by Ilya Sutskever, is focused solely on groundbreaking research rather than competing on computational power [3][31]
- Superintelligence is defined as an intelligence that can learn to do everything, emphasizing a growth mindset [3][46]

Group 3
- To govern AI better, it is essential to deploy it gradually and demonstrate its capabilities and risks publicly [4][50]
- The industry should aim to create AI that cares for all sentient beings, which is seen as a more fundamental and simpler goal than focusing on humans alone [4][51]
- The transition from the scaling era to a research-focused approach will require exploring new paradigms and methodologies [18][20]
Flash Report! MEET2026 Speaker Lineup Updated Again; Register Soon to Attend
量子位· 2025-11-28 04:11
Core Insights
- The MEET2026 Smart Future Conference will focus on cutting-edge technologies and industry developments that have garnered significant attention throughout the year [1][2]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies are penetrating various industries, disciplines, and scenarios, becoming a core driving force for societal evolution [2]

Event Highlights
- Key topics of discussion will include reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [3]
- The conference will showcase the latest collisions between academic frontiers and commercial applications, featuring leading technological achievements across infrastructure, models, and products [4]
- An authoritative release of the annual AI rankings and the annual AI trend report is anticipated during the conference [5][116]

Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, will be a key speaker [11][12]
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, will also present [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, is among the notable attendees [19]
- Other prominent figures include Wang Ying, Vice President of Baidu Group, and Han Xu, Founder and CEO of WeRide [24][28]

Awards and Reports
- The "Artificial Intelligence Annual Rankings" initiated by Quantum Bit has become one of the most influential rankings in the AI industry, evaluating companies, products, and individuals across three dimensions [117]
- The "2025 Annual AI Trend Report" will analyze ten significant AI trends based on technological maturity, current implementation, and potential value, highlighting representative organizations and best cases [118]

Conference Details
- The MEET2026 Smart Future Conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, and registration is now open [119][121]
- The event aims to attract thousands of technology professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [122]
Li Auto Discloses Some New Technical Information
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article discusses the progress and challenges in Li Auto's autonomous driving development, focusing on its end-to-end model and VLA (Vision-Language-Action) integration [2][5][9]

Group 1: Model Performance and Data Utilization
- The performance gains of the end-to-end model slow down beyond a certain amount of training data: after 10 million clips, the model's MPI (miles per intervention) only doubled over five months [5]
- To further improve performance, Li Auto adjusted the training-data mix, increasing the amount of generated data (including corner cases) and adding manual rules for safety and compliance in special scenarios [5][9]

Group 2: VLA Integration and Decision-Making
- VLA is introduced to strengthen the end-to-end model's decision-making, addressing illogical behavior, a lack of deep thinking in decisions, and insufficient anticipatory judgment based on the scenario [5][6]
- VLA incorporates spatial intelligence, linguistic intelligence, and an action policy, allowing the model to understand and communicate spatial information and to generate smooth driving trajectories with diffusion models (see the sketch below) [6][9]

Group 3: Simulation and Testing Efficiency
- Li Auto upgraded its model evaluation by using a world model for closed-loop simulation and testing, cutting testing costs from 18.4 per kilometer to 0.53 per kilometer [9][11]
- The closed-loop training framework AD-R1 was introduced, enabling efficient data management and reinforcement learning, with high-value data flowing back to the cloud platform through a series of processing steps [11][12]

Group 4: Computational Power and Resources
- Li Auto's total computational power is 13 EFLOPS, with 3 EFLOPS dedicated to inference and 10 EFLOPS to training, running on 50,000 training and inference cards [13]
- Inference power is emphasized as crucial in the VLA era, since it is needed to generate simulated training environments [13]
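The article only says that VLA generates driving trajectories with a diffusion model; the general pattern is to start from noise and iteratively denoise a waypoint sequence conditioned on scene features. Below is a condensed DDPM-style sampling sketch; the noise schedule, tensor shapes, and the denoiser interface are assumptions for illustration, not Li Auto's implementation:

```python
import torch

@torch.no_grad()
def sample_trajectory(denoiser, scene_feat, horizon=30, steps=50):
    """Reverse diffusion over a (1, horizon, 2) sequence of (x, y) waypoints."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    traj = torch.randn(1, horizon, 2)  # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(traj, scene_feat, t)                # predicted noise at step t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (traj - coef * eps) / torch.sqrt(alphas[t])  # posterior mean
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise
    return traj  # denoised waypoints in the ego frame

# Toy usage: a denoiser that ignores conditioning, just to show the loop runs.
dummy_denoiser = lambda x, cond, t: torch.zeros_like(x)
print(sample_trajectory(dummy_denoiser, scene_feat=None).shape)
```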
NeurIPS 2025 Best Paper Awards Announced; the Decade-Old Classic by Kaiming He, Jian Sun, et al. Takes a Prize
36Kr· 2025-11-27 07:27
Core Insights
- NeurIPS 2025 announced its best paper awards, recognizing four papers, including significant contributions from Chinese researchers [1][2]
- The Test of Time Award went to Faster R-CNN, highlighting its lasting impact on computer vision [1][50]

Best Papers
- The first best paper, "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)," was authored by a team from multiple institutions, including the University of Washington and Carnegie Mellon University [5][6]
- The second best paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free," was a collaboration among researchers from Alibaba, the University of Edinburgh, Stanford University, MIT, and Tsinghua University [14][15]
- The third best paper, "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities," was authored by researchers from Princeton University and the Warsaw University of Technology [21][24]
- The fourth best paper, "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training," was a collaboration between PSL University and Bocconi University [28][29]

Runners Up
- Three runner-up papers were also recognized, including "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" from Tsinghua University and Shanghai Jiao Tong University [33][34]
- Another runner-up, "Optimal Mistake Bounds for Transductive Online Learning," was authored by researchers from Kent State University, Purdue University, Google Research, and MIT [38][39]
- The third runner-up, "Superposition Yields Robust Neural Scaling," came from MIT [42][46]

Test of Time Award
- The Test of Time Award went to the Faster R-CNN paper, which has been cited over 56,700 times and has profoundly influenced the computer vision field [50][52]
- The paper introduced a fully learnable two-stage process that replaced traditional region-proposal methods, achieving high detection accuracy at near real-time speed (see the sketch below) [50][52]
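Faster R-CNN's two-stage design (a learned region proposal network followed by a classification-and-refinement head) is available off the shelf in torchvision's reference implementation. A minimal inference sketch; the image path and the 0.5 score threshold are placeholders:

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# Load the pretrained two-stage detector: an RPN proposes candidate regions,
# then a shared head classifies and refines each proposal.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("street.jpg"), torch.float)  # placeholder path
with torch.no_grad():
    pred = model([img])[0]  # dict of boxes, labels, scores for one image

keep = pred["scores"] > 0.5  # keep only confident detections
print(pred["boxes"][keep], pred["labels"][keep])
```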
After Leaving OpenAI, Sutskever Speaks for 1.5 Hours: AGI Could Arrive in as Few as 5 Years
36Kr· 2025-11-27 05:43
Core Insights
- The interview covers the strategic vision of Safe Superintelligence (SSI) and the challenges in AI model training, particularly the gap between model performance on evaluations and in real-world applications [1][3][5]

Group 1: AI Development and Economic Impact
- SSI's CEO predicts that human-level AGI will be achieved within 5 to 20 years [5]
- Current AI investments, such as allocating 1% of GDP to AI, are seen as significant yet underappreciated by society [3][5]
- The economic impact of AI is expected to become more pronounced as the technology permeates more sectors [3][5]

Group 2: Model Performance and Training Challenges
- There is a "jagged" performance gap: models excel on evaluations yet often make basic errors in practical applications [5][6]
- Relying on ever-larger datasets and computational power for training has reached its limits, indicating a need for new approaches [5][6]
- Training environments may inadvertently optimize for evaluation metrics rather than real-world applicability, leading to poor generalization [6][21]

Group 3: Research and Development Focus
- SSI is prioritizing research over immediate commercialization, aiming for a direct path to superintelligence [5][27]
- The company believes that fostering competition among AI models can help break the "homogeneity" of current models [5][27]
- A shift from the "scaling" era back to a "research" era is anticipated, emphasizing the need for innovative ideas rather than merely scaling existing models [17][28]

Group 4: Value Function and Learning Mechanisms
- The concept of a value function is likened to human emotions, suggesting it could guide AI learning more effectively (see the sketch below) [11][12]
- The importance of internal feedback mechanisms in human learning is highlighted, which could inform better AI training methodologies [25][39]
- SSI's approach may involve deploying AI systems that learn from real-world interactions, enhancing their adaptability and effectiveness [35][37]

Group 5: Future of AI and Societal Implications
- The potential for rapid economic growth driven by advanced AI systems is acknowledged, with impacts varying by regulatory environment [38][39]
- SSI's vision includes developing AI that cares for sentient beings, which may lead to more robust and empathetic AI systems [41][42]
- The company is aware of the challenges of aligning AI with human values and the importance of demonstrating AI's capabilities to the public [40][41]
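The "value function" analogy refers to the standard reinforcement learning notion of a learned estimate of long-run return that gives immediate feedback on each state, long before the final outcome is known. A textbook TD(0) update on a toy chain environment makes the idea concrete; the environment and constants here are purely illustrative:

```python
import random

# Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
# The value estimate plays the role of an instant "feeling" about a state,
# available long before the episode's final reward arrives.
V = [0.0] * 5            # values of 5 chain states; state 4 is terminal and rewarding
alpha, gamma = 0.1, 0.99

for _ in range(1000):
    s = 0
    while s != 4:
        s_next = s + 1 if random.random() < 0.8 else max(s - 1, 0)  # noisy forward walk
        r = 1.0 if s_next == 4 else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])              # TD(0) update
        s = s_next

print([round(v, 2) for v in V])  # states closer to the goal earn higher values
```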
Moonshot AI Unveils Its Reinforcement Learning Training Acceleration Method: Training Speed Up 97%, Long-Tail Latency Down 93%
量子位· 2025-11-27 04:34
Core Viewpoint
- The article introduces Seer, a new acceleration engine developed by Moonshot AI and Tsinghua University, which significantly speeds up reinforcement learning (RL) training of large language models (LLMs) without changing the core training algorithms [1][8]

Performance Improvement
- Seer improves the rollout efficiency of synchronous RL by 74% to 97% and reduces long-tail latency by 75% to 93% [3][23]

Technical Architecture
- Seer consists of three main modules:
  1. Inference Engine Pool: built on DRAM/SSD, it contains multiple inference instances and a global KVCache pool for load balancing and data reuse [9]
  2. Request Buffer: the unified entry point for all rollout requests, managing metadata and request states for precise resource scheduling [10]
  3. Context Manager: maintains context views for all requests and generates scheduling decisions based on context signals [11]

Key Technologies
- Divided Rollout: breaks responses into independent requests and segments, reducing memory fluctuation and load imbalance (see the sketch below) [12][13]
- Context-Aware Scheduling: uses a "speculative request" strategy to obtain length features for requests early, alleviating long-request delays [17]
- Adaptive Grouped Speculative Decoding: exploits similar response patterns within a group to build a dynamic reference library for draft generation, improving decoding efficiency [19]

Experimental Validation
- In experiments on models including Moonlight, Qwen2-VL-72B, and Kimi-K2, Seer raised throughput by 74% to 97% over the baseline system veRL, with markedly lower long-tail latency [21][23]
- In the Moonlight task, for example, the last 10% of requests took 3,984 seconds with veRL, while Seer reduced this to 364 seconds, an 85% reduction in long-tail latency [23]

Financing and Future Plans
- Moonshot AI is reportedly close to completing a new funding round of several hundred million dollars, which could lift its valuation to $4 billion [32][33]
- The company is in talks with investment firms including IDG Capital and existing shareholder Tencent, with plans to close the round by year-end and begin an IPO process next year [36][37]
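The divided-rollout idea splits one long generation request into shorter segments that can be re-queued and routed to whichever inference instance is least loaded, with the shared KVCache pool letting a segment resume where the previous one stopped. A simplified scheduling sketch; the class and function names, segment size, and load bookkeeping are assumptions, not Seer's API:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Engine:
    load: int = 0                       # tokens currently queued on this instance
    name: str = field(compare=False, default="")

def divided_rollout(prompt_id, max_tokens, engines, segment=512):
    """Plan max_tokens of generation in fixed-size segments, each placed on the
    least-loaded engine. A real system would also hand the request's KV cache
    to the chosen instance so the next segment resumes without recomputation."""
    heapq.heapify(engines)
    produced, schedule = 0, []
    while produced < max_tokens:
        eng = heapq.heappop(engines)                  # least-loaded instance
        chunk = min(segment, max_tokens - produced)
        schedule.append((prompt_id, eng.name, produced, produced + chunk))
        eng.load += chunk                             # simple load bookkeeping
        produced += chunk
        heapq.heappush(engines, eng)
    return schedule

engines = [Engine(0, "gpu-0"), Engine(0, "gpu-1"), Engine(0, "gpu-2")]
print(divided_rollout("req-7", 2048, engines))
```

Splitting long responses this way is what keeps a single very long rollout from pinning one engine while others sit idle, which is the load-imbalance problem the article attributes to conventional synchronous rollout.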