Reinforcement Learning
Not sure how to start a paper in the VLA direction? Some students already have CCF-A papers...
自动驾驶之心· 2025-08-22 12:00
Core Insights
- The article discusses the advancements of the Li Auto VLA driver model, highlighting its improved capabilities in semantic understanding, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3][5]

Group 1: VLA Model Capabilities
- The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via chain-of-thought, and a closer approximation to human driving intuition through trajectory planning [1]
- Four core abilities of the VLA model are showcased: spatial understanding, reasoning, communication and memory, and behavioral ability [1][3]

Group 2: Research and Development Trends
- The VLA model has evolved from VLM+E2E, integrating cutting-edge techniques such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5]
- While industry is still optimizing traditional perception and planning tasks, the academic community is increasingly shifting focus toward large models and VLA, leaving many subfields open for exploration [5]

Group 3: VLA Research Guidance Program
- A VLA research paper guidance program has been launched to positive feedback, aimed at helping participants systematically master key theory and develop their own research ideas [6]
- The program runs as a structured 14-week curriculum, covering topics from traditional end-to-end autonomous driving to paper-writing methodology [9][11][30]

Group 4: Course Structure and Requirements
- Each session is capped at 8 participants and targets individuals with a background in VLA and autonomous driving at various academic levels [12][15]
- Participants are expected to have a foundation in deep learning, Python programming, and PyTorch, with specific hardware recommendations for optimal performance [21][22]

Group 5: Expected Outcomes
- Participants will study classic and cutting-edge research papers, build coding skills, and learn methodologies for writing and submitting papers, culminating in a draft paper [20][34]
- The program aims to deepen participants' understanding of algorithms and their trade-offs, and to stimulate research ideas through structured guidance [20][34]
Has AI lost its way? Reinforcement learning godfather Sutton unveils the OaK architecture, challenging the current AI paradigm and proposing a new vision of superintelligence
AI科技大本营· 2025-08-22 08:05
Core Concept
- The OaK architecture is a systematic response to the need for intelligent agents that can continuously learn, model the world, and plan effectively, aiming to achieve superintelligence through experiential learning [3][5][7]

Group 1: OaK Architecture Overview
- OaK is a model-based reinforcement learning framework characterized by continually learning components, a specialized learning rate for each weight, and a five-step evolution path called FC-STOMP [3][26]
- The architecture emphasizes runtime learning over design-time learning, advocating online learning in which agents learn from real-world interaction [13][14][21]

Group 2: Key Features of OaK
- The architecture is designed to be domain-general, empirical, and capable of open-ended complexity, allowing agents to form the concepts they need within their computational resources [16][19]
- The "Big World" hypothesis posits that the world is far more complex than any intelligent agent can fully comprehend, so agents must operate with approximate models and strategies [19][20]

Group 3: Learning Mechanisms
- OaK introduces the concept of subproblems: agents autonomously generate subproblems from curiosity and intrinsic motivation, creating a cycle of problem solving and feature generation [28][31]
- The architecture's core process comprises eight steps, including learning the main policy, generating new state features, creating subproblems, and using learned models for planning; a toy sketch of this loop follows below [27][29]

Group 4: Challenges and Future Directions
- Two significant challenges remain: achieving reliable continual deep learning and generating new state features, both critical to the architecture's success [37][38]
- The OaK framework aims to offer a comprehensive answer to fundamental AI problems, including a mechanism, currently missing in AI, for using learned models in planning [40]
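Since the talk describes OaK conceptually rather than as code, here is a runnable toy sketch of the loop's shape only. Every name and mechanism below is hypothetical: a tabular Q-function stands in for continual deep learning, "features" are stubbed as trivial state predicates, and option models collapse to running averages.

```python
import random

class OaKSketch:
    """Toy stand-in for the OaK cycle; names and structure are illustrative."""

    def __init__(self, n_actions=2, alpha=0.1, gamma=0.95):
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma
        self.q = {}                # (state, action) -> value: the main policy
        self.features = []         # learned state predicates
        self.option_models = {}    # feature index -> modeled payoff of its option

    def act(self, s):
        # Act from the main policy (epsilon-greedy over the tabular values).
        if random.random() < 0.1:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q.get((s, a), 0.0))

    def learn(self, s, a, r, s2):
        # Continually update the main policy and value function (Q-learning).
        best = max(self.q.get((s2, b), 0.0) for b in range(self.n_actions))
        td = r + self.gamma * best - self.q.get((s, a), 0.0)
        self.q[(s, a)] = self.q.get((s, a), 0.0) + self.alpha * td
        # Generate a candidate state feature; in OaK, useful features are
        # ranked and each spawns a subproblem ("attain this feature").
        self.features.append(lambda state, pivot=s2: state == pivot)
        # Model what the option for each subproblem achieves (crude average).
        for i in range(len(self.features)):
            old = self.option_models.get(i, 0.0)
            self.option_models[i] = old + 0.5 * (r - old)

    def plan(self):
        # Plan at the option level: here, just report the most promising option.
        return max(self.option_models, default=None, key=self.option_models.get)

# Minimal usage on a two-state toy world.
agent, state = OaKSketch(), 0
for _ in range(100):
    a = agent.act(state)
    next_state = (state + a) % 2
    agent.learn(state, a, float(next_state == 1), next_state)
    state = next_state
print(agent.plan())
```

The point of the sketch is the ordering of the cycle (act, learn the main policy, propose features, spawn subproblems, model their options, plan), not any claim about Sutton's actual design.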
Kuaishou's Klear-Reasoner tops the 8B model leaderboard, with the GPPO algorithm strengthening both stability and exploration!
AI前线· 2025-08-22 06:07
Core Viewpoint
- Competition among large language models has highlighted the importance of mathematical and coding reasoning, and Kuaishou's Klear team has introduced the Klear-Reasoner model, which achieves state-of-the-art performance across multiple benchmarks [1][2]

Group 1: Model Performance
- Klear-Reasoner outperforms other strong open-source models on benchmarks such as AIME2024 and AIME2025, scoring 90.5% and 83.2% respectively, making it the top 8B model [2]
- The model's performance is attributed to the innovative GPPO (Gradient-Preserving Clipping Policy Optimization) algorithm, which enhances exploration while maintaining training stability [5][24]

Group 2: Technical Innovations
- GPPO retains all gradients during training, in contrast to traditional clipping methods, which can hinder exploration and slow convergence; a sketch of the idea follows below [8][10]
- GPPO lets high-entropy tokens participate in backpropagation, preserving exploration ability and accelerating error correction [10]

Group 3: Training Methodology
- During the supervised fine-tuning (SFT) phase, the Klear team prioritizes data quality over quantity, demonstrating that high-quality data sources yield better training efficiency and outcomes [12]
- For high-difficulty tasks, retaining some erroneous samples can enhance model performance by providing additional exploration opportunities [16]
- In the reinforcement learning (RL) phase, soft rewards based on test-case pass rates are more effective than hard rewards, improving training stability and efficiency [19]

Group 4: Future Implications
- Beyond its strong results, Klear-Reasoner offers a reproducible, scalable recipe for reasoning models across supervised and reinforcement learning, with useful lessons for mathematics, coding, and other RLVR tasks [24]
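The following PyTorch sketch contrasts a standard PPO clipped loss with a gradient-preserving variant in the spirit of GPPO. It is a minimal illustration, not the Klear team's published formula: the symmetric `eps`, the token-mean reduction, and the exact straight-through rescaling are assumptions.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    # Standard PPO: tokens whose ratio falls outside the clip range
    # contribute zero gradient to the policy update.
    ratio = torch.exp(logp_new - logp_old)
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv)
    return -surr.mean()

def gppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    # Gradient-preserving variant (sketch): the forward value is still the
    # clipped surrogate, but a straight-through factor routes a rescaled
    # gradient through `ratio` even when it lies outside the clip range,
    # so heavily clipped (often high-entropy) tokens keep learning signal.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    gp = clipped.detach() * ratio / ratio.detach()  # forward value == clipped
    return -(gp * adv).mean()
```

In the forward pass `gp` equals the clipped ratio, so the GPPO-style loss value matches the clipped surrogate; only the backward pass differs, with clipped tokens receiving a rescaled rather than zero gradient.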
From tangled tricks to a minimalist recipe: the ROLL team brings new practice to RL4LLM
机器之心· 2025-08-22 04:58
This research was jointly completed by the Future Life Lab of Taotian Group's Algorithm Technology division and the Intelligent Engine business unit of Aicheng Technology; core authors include Liu Zihe, Liu Jiashun, He Yancheng, and Wang Weixun. The Future Life Lab pools Taotian Group's compute, data, and top technical talent, focusing on frontier AI directions such as large models and multimodality, and building foundational algorithms, model capabilities, and AI-native applications to drive technical innovation in consumer-facing AI. Aicheng Technology brings extensive hands-on experience in large model training and optimization. The two parties previously open-sourced ROLL, an efficient reinforcement learning training framework for large models, and this paper is likewise a practical exploration built on the ROLL framework.

In recent years, reinforcement learning (RL) has proven markedly effective at improving the complex reasoning abilities of large language models (LLMs) and is widely applied to tasks such as mathematical problem solving and code generation. RL-finetuned models routinely outperform models trained only with supervised fine-tuning or pretraining, which has spawned a large body of follow-up research. It has also produced a series of puzzling phenomena: different studies propose different RL optimization tricks without unified experimental comparison or mechanistic explanation, and some even reach contradictory conclusions. For researchers and engineers, this "many methods, messy conclusions" situation makes practical adoption harder, not easier.

To address this, Alibaba's Taotian Group and Aicheng Technology, together with several universities, based ...
Can motion capture equipment become the next blue ocean for embodied large models?
机器人大讲堂· 2025-08-21 10:11
Group 1: Development of Embodied Intelligence
- The concept of embodied intelligence dates back to the 1950s, with Turing laying the groundwork for its potential development [1]
- Researchers such as Rodney Brooks and Rolf Pfeifer provided significant theoretical support in the 1980s and 1990s, marking the early exploration and theoretical development phase [1]
- The early 2000s saw the integration of interdisciplinary methods and technologies, turning embodied intelligence into a more complete academic branch [1]
- The rapid advancement of deep learning in the mid-2010s injected new momentum into the field, and industrial application has accelerated since 2020 [1]

Group 2: Large Models and Their Evolution
- Large models are machine learning models with vast parameter counts, widely applied in NLP, computer vision, and multimodal fields [2]
- Their lineage traces back to early AI research on logic reasoning and expert systems, which were limited by hard-coded knowledge [2]
- Google's introduction of the Transformer in 2017 significantly enhanced sequence modeling, leading to the mainstream adoption of pre-trained language models [2]
- The emergence of ChatGPT in late 2022 propelled advancements in NLP, and GPT-4 introduced multimodal capabilities in March 2023 [2]

Group 3: Embodied Large Models
- Embodied large models evolved from non-embodied large models, initially focusing on single-modal language models before expanding to multimodal inputs and outputs [4]
- Google's RT series is exemplary: RT-1 integrated vision, language, and robotic actions for the first time in 2022, and RT-2 enhanced multimodal fusion and generalization in 2023 [4]
- Embodied large models are expected to move toward more general applications, driven by foundation models such as RFM-1 [4]

Group 4: Data as a Core Barrier
- The contest between real and synthetic data is crucial for embodied robots, which face data scarcity and high collection costs [15]
- Embodied robot datasets remain far smaller than text and image datasets, with only 2.4 million data points available [15]
- Various organizations are expected to release high-quality embodied intelligence datasets in 2024, such as AgiBotWorld and Open X-Embodiment [15]

Group 5: Motion Capture Systems
- Motion capture technology records and analyzes real-world movement, having evolved from manual keyframe drawing to modern high-precision methods [23]
- A motion capture system combines hardware (sensors, cameras) and software (data-processing modules) to generate three-dimensional motion data [23]
- System types include mechanical, acoustic, electromagnetic, inertial, and optical capture, each with its own advantages and limitations [25]

Group 6: Key Companies in the Motion Capture Industry
- Beijing Duliang Technology specializes in optical 3D motion capture systems, offering high-resolution, high-precision solutions [28]
- Lingyun Technology is a professional supplier of configurable vision systems, providing optical motion capture systems with real-time tracking capabilities [29]
- Aofei Entertainment pursues motion capture solutions through investments in companies such as Nuoyiteng, which builds high-precision products on MEMS inertial sensors [30]
- Liyade, a leading company in audiovisual technology, applies optical motion capture technology across a range of applications [31]
- Zhouming Technology has developed a non-wearable human-posture motion capture system that leverages computer vision and AI [32]
- Xindong Lianke focuses on high-performance MEMS inertial sensors and is expanding into motion capture hardware for robots [33]
SAIC-GM joins hands with Momenta: the Buick Zhijing L7 will become an "AI driving master"
Xin Lang Cai Jing· 2025-08-21 06:47
Core Insights
- SAIC-GM has signed a strategic cooperation agreement with Momenta to enhance advanced driver-assistance systems tailored to Chinese roads and users [1]

Group 1: Strategic Partnership
- The collaboration aims to combine technological integration and safety expertise to develop advanced driving technologies [1][4]
- The partnership signals Buick's commitment to pairing global automotive expertise with local innovation to lead in the new energy vehicle market [4]

Group 2: Product Launch
- Buick's high-end electric sub-brand "Zhijing" will launch its first intelligent luxury sedan, the Zhijing L7, featuring the Momenta R6 Flywheel model based on end-to-end reinforcement learning [2]
- The Zhijing L7 will offer full-scene driver-assistance capabilities, including "no-break" urban NOA and the industry's first "no-stop one-button parking" feature [2][3]

Group 3: Technological Advancements
- The Momenta R6 Flywheel model draws on 3 billion kilometers of real-world driving data and 70 million key data points to improve decision-making and adaptability in complex driving scenarios [2]
- The vehicle can effectively handle challenging situations such as close cut-ins, blind spots, and other high-risk scenarios, providing a seamless driving experience [2][3]

Group 4: User Experience Enhancements
- The Zhijing L7's advanced features include smooth driving akin to an experienced human driver's, precise recognition on narrow roads, and efficient toll-booth navigation [3]
- The "no-stop one-button parking" feature performs real-time parking-space recognition and optimal trajectory planning, significantly improving parking efficiency [3]
Sip Some VC | a16z talks with OpenAI researchers: an official breakdown of GPT-5, and why high-quality use cases will replace benchmarks as the true measure of AGI
Z Potentials· 2025-08-21 03:09
Core Viewpoint
- The release of GPT-5 marks a significant advance in AI capabilities, particularly in reasoning, programming, and creative writing, with notable improvements in reliability and behavior design [3][4][6]

Group 1: Model Improvements
- GPT-5 shows a substantial reduction in sycophancy and hallucination, indicating a more reliable interaction model [4][14]
- Its programming capabilities have taken a qualitative leap, letting users build applications with minimal coding knowledge, which is expected to foster the emergence of many small businesses [6][17]
- The team treats user experience and practical application, rather than benchmark scores alone, as the key metrics for evaluating model performance [20][21]

Group 2: Training and Development
- The development process for GPT-5 centered on desired capabilities, with the team designing assessments to reflect real user value [22][23]
- Integrating deep-research capabilities into the model lets it carry out complex tasks efficiently, leveraging high-quality data and reinforcement learning [16][26]
- Mid-training techniques were introduced to update the model's knowledge and improve its performance without extensive retraining [45]

Group 3: Future Implications
- The advancements in GPT-5 are expected to unlock new use cases and increase daily usage among a broader audience, seen as a critical indicator of progress toward AGI [21][15]
- The model's creative-writing assistance has been highlighted, showcasing its potential to help users with complex writing tasks [29][31]
- The future of AI is anticipated to feature autonomous agents capable of performing real-world tasks, with ongoing research focused on enhancing their capabilities [36][41]
Breaking the efficiency bottleneck of long-horizon agent reasoning! MIT and NUS jointly unveil a new reinforcement learning training method
量子位· 2025-08-20 10:21
Core Viewpoint
- The MEM1 framework, developed by MIT and the National University of Singapore, addresses the challenges AI agents face in managing complex tasks and memory efficiently, achieving significant improvements in inference speed and memory usage over traditional models [2][22]

Group 1: Framework Overview
- MEM1 lets AI agents autonomously manage their working memory and reasoning processes, much as humans consolidate their thoughts after a stretch of work [4][10]
- The framework achieves near-constant memory usage, significantly reducing the computational cost of growing dialogue rounds [6][12]

Group 2: Performance Metrics
- The MEM1-7B model runs inference 3.5 times faster than a traditional 14B model, with a peak token count roughly one quarter as large [2][3]
- On a complex 16-objective task, MEM1 outperformed larger models and models with external memory modules on accuracy, context length, and inference speed [17][18]

Group 3: Training Methodology
- MEM1 is trained end-to-end with reinforcement learning, using an attention-masking mechanism that lets the agent focus on relevant historical information while compressing it efficiently [12][22]
- Each training turn involves three key operations: extracting key information, integrating it with internal memory, and pruning redundant content (see the sketch after this summary) [14][20]

Group 4: Practical Applications
- MEM1 has been tested on document-retrieval QA, open-domain web QA, and multi-round online-shopping scenarios, demonstrating its adaptability and effectiveness in real-world applications [19][20]

Group 5: Industry Implications
- The industry norm has been to bolt on external memory modules, which can be cumbersome and less effective; MEM1's approach suggests a shift toward self-managed memory learned through reinforcement learning [22]
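For intuition, here is a minimal sketch of the extract-integrate-prune cycle as an explicit function over text. In MEM1 itself the policy is trained end-to-end with RL and an attention mask to perform this consolidation internally; the `summarize` callable and the prompt strings below are hypothetical stand-ins, not the paper's interface.

```python
from typing import Callable

def mem1_consolidate(memory: str, observation: str,
                     summarize: Callable[[str], str]) -> str:
    """One consolidation turn in the spirit of MEM1 (illustrative only)."""
    # 1. Extract: pull out only the task-relevant facts from the new turn.
    extracted = summarize(f"Extract only task-relevant facts:\n{observation}")
    # 2. Integrate: merge them with the existing internal memory state.
    merged = summarize(f"Merge into a single compact state:\n{memory}\n{extracted}")
    # 3. Prune: the merged state *replaces* the raw history, so the context
    #    the agent attends to stays near-constant across dialogue turns.
    return merged

# Trivial usage with a length-capped stand-in for the model call.
memory = ""
for obs in ["page A says X", "page B repeats X and adds Y"]:
    memory = mem1_consolidate(memory, obs, lambda p: p[-200:])
print(memory)
```

Because the consolidated state replaces the raw transcript each turn, peak context length stays bounded, which is the mechanism behind the reported near-constant memory usage.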
Reinforcement-learning large models get behind the wheel: SAIC-GM partners with Momenta to unlock a "veteran driver" assisted-driving experience
Xin Hua Cai Jing· 2025-08-20 02:37
Core Viewpoint
- The collaboration between SAIC General Motors and Momenta marks a significant advance in assisted-driving technology, leveraging AI models to enhance vehicle safety and user experience [1][2][5]

Group 1: Strategic Collaboration
- SAIC General Motors has signed a strategic cooperation agreement with Momenta to deepen collaboration in assisted-driving technology [1]
- The Momenta R6 Flywheel model will debut in the Buick high-end new energy brand "Zhijing" with the Zhijing L7 sedan, offering full-scene assisted-driving capabilities [1][2]

Group 2: Market Position and Historical Context
- Momenta's city NOA (Navigation on Autopilot) leads the market with a cumulative installation share above 50% among independent intelligent-driving solution providers [2]
- SAIC General Motors has a long history in assisted driving, starting with the 1999 Cadillac DeVille, the first mass-produced vehicle to use infrared thermal imaging for animal detection [2][3]

Group 3: Technological Advancements
- The R6 Flywheel model uses reinforcement learning, allowing the vehicle to evolve and improve its driving strategies beyond mere imitation of human drivers [3][4]
- The Zhijing L7 performs strongly in complex driving scenarios, effectively avoiding hazards in diverse road conditions [3][4]

Group 4: Enhanced Driving Experience
- The Zhijing L7's "seamless" city NOA function provides a driving experience akin to that of an experienced driver, with precise recognition and smooth stops [4]
- Its parking assistance includes a "no-stop one-button parking" feature that identifies parking spaces in real time while the vehicle is still moving, significantly improving parking efficiency [5]

Group 5: System Integration and Future Prospects
- Successful assisted driving requires the coordination of many vehicle systems, including body structure, electronic architecture, and the intelligent cockpit [5]
- The partnership opens a new phase for SAIC General Motors' assisted-driving technology, aiming to balance safety with driving fluidity while addressing industry pain points [5]
Tencent Research Institute AI Digest 20250820
腾讯研究院· 2025-08-19 16:01
Core Insights
- The article discusses advancements in generative AI models, highlighting new releases and updates from various companies, including Nvidia, OpenAI, and Tencent, among others.

Group 1: Nvidia's Nemotron Nano 2 Model
- Nvidia released the Nemotron Nano 2 model with 9 billion parameters, using a Mamba-Transformer hybrid architecture and achieving inference throughput up to 6 times that of traditional models [1]
- The model competes with Qwen3-8B, showing comparable or superior performance on mathematics, coding, reasoning, and long-context tasks; it is fully open-source and supports a 128K context length [1]
- It was trained on 20 trillion tokens, compressing a 12-billion-parameter model down to 9 billion, and can run on a single A10G GPU [1]

Group 2: OpenAI's GPT Model Comparison
- OpenAI president Greg Brockman shared a comparison of responses from GPT-1 through GPT-5 to the same prompts, showcasing significant improvements in knowledge retention, logical structure, and language coherence [2]
- Earlier models such as GPT-1 and GPT-2 often produced nonsensical answers, while GPT-5 gave more logical, rich, and emotionally attuned responses [2]
- Interestingly, some users preferred the earlier models, finding them more "wild" and "unconventional," with GPT-1 even likened to "true AGI" [2]

Group 3: DeepSeek Model Update
- DeepSeek's latest online model has been upgraded to version 3.1, extending context length to 128K and available through the official web, app, and mini-programs [3]
- The update is a routine version iteration, unrelated to the anticipated DeepSeek-R2, which is not expected to be released in August [3]
- The expanded context capacity will improve long-document analysis, codebase understanding, and consistency across long conversations [3]

Group 4: Nano Banana Model
- The mysterious AI drawing model Nano Banana demonstrated exceptional character consistency in LMArena evaluations, accurately preserving facial features and expressions and outperforming competitors such as GPT-4o and Flux [4]
- Though not officially confirmed, the model is said to originate from Google DeepMind and is currently available only in LMArena's battle mode, without a public interface [4]
- Beyond character consistency, it excels at background replacement, style transfer, and text modification, effectively executing a range of complex image-editing tasks [4]

Group 5: Alibaba's Qwen-Image-Edit Model
- Alibaba launched the Qwen-Image-Edit model, built on its 20-billion-parameter Qwen-Image model and supporting both semantic and appearance editing [5][6]
- The model can perform precise text edits while preserving the original font, size, and style, achieving state-of-the-art performance on multiple public benchmarks [6]
- It performs well on tasks such as adding signage, replacing backgrounds, and modifying clothing, though it still struggles with multi-round modifications and complex font generation [6]

Group 6: Tencent's AutoCodeBench Dataset
- Tencent's Hunyuan team released the AutoCodeBench dataset to evaluate large-model coding capabilities, featuring 3,920 high-difficulty problems across 20 programming languages [7]
- The dataset is notable for its difficulty, practicality, and diversity; leading industry models have scored below 55 on it, underscoring its challenge [7]
- A complete open-source toolchain accompanies it, including the data-generation workflow AutoCodeGen and the evaluation tools AutoCodeBench-Lite and AutoCodeBench-Complete [7]

Group 7: Higgsfield's Draw-to-Video Feature
- AI startup Higgsfield introduced the Draw-to-Video feature, allowing users to draw arrows and shapes on images and input action commands to generate cinematic dynamic visuals [8]
- The feature complements the Product-to-Video function and supports multiple video-generation models, making advertisement videos easier to create than with text prompts alone [8]
- Founded in October 2023, Higgsfield has garnered attention for its advanced cinematic-control technology and user-friendly design [8]

Group 8: Zhiyuan's A2 Humanoid Robot
- Zhiyuan Robotics completed a 24-hour livestream of its humanoid robot A2 walking outdoors, in 37°C air temperatures and 61°C ground temperatures [9]
- The A2 showed strong environmental adaptability, autonomously avoiding obstacles, planning paths, and adjusting gait without remote control, using hot-swappable battery technology for quick recharging [9]
- Three industry dialogues held during the event discussed the development path of humanoid robots, marking a milestone in the transition from technology development to commercial production [9]

Group 9: Richard Sutton's OaK Architecture
- Richard Sutton, the father of reinforcement learning and a 2024 ACM Turing Award winner, introduced the OaK (Options and Knowledge) architecture, outlining a path to superintelligence through operational experience [10][11]
- The OaK architecture consists of eight steps, including learning policies and value functions, generating state features, and maintaining metadata [11]
- It emphasizes open-ended abstraction, actively discovering features and patterns at runtime, though key prerequisites such as reliable continual deep learning must be solved before the superintelligence vision can be realized [11]

Group 10: OpenAI's GPT-5 Release Review
- OpenAI VP and ChatGPT head Nick Turley acknowledged the misstep of discontinuing GPT-4o, having underestimated users' emotional attachment to models, and plans to provide clearer timelines for model retirement [12]
- Turley described a polarized user base: casual users prefer simplicity while heavy users require complete model-switching options, a tension he aims to balance through menu settings [12]
- On the business model, Turley cited strong subscription growth, with enterprise users rising from 3 million to 5 million, and future exploration of transaction commissions while ensuring commercial interests do not interfere with content recommendations [12]