Reinforcement Learning
We've helped yet another student land a VLA algorithm position......
具身智能之心· 2025-08-22 16:03
Core Insights
- The article urges readers to join the "Embodied Intelligence Heart Knowledge Planet," a comprehensive community for learning and sharing knowledge in embodied intelligence, a field that is growing rapidly in popularity and demand [1][16][85].
Community Features
- The community offers video content, written materials, learning pathways, Q&A sessions, and job exchange opportunities, aiming to serve both beginners and advanced learners in embodied intelligence [1][2][17].
- It has established a job referral mechanism with multiple leading companies in the embodied intelligence sector, connecting job seekers directly with employers [10][17].
Learning Resources
- The community has compiled over 30 technical pathways covering aspects of embodied intelligence such as data collection, algorithm deployment, and simulation [2][16].
- It provides access to nearly 40 open-source projects and 60 datasets related to embodied intelligence, reducing the time needed for research and development [16][30][36].
Networking and Collaboration
- The community hosts roundtable discussions and live broadcasts on the latest developments in the embodied intelligence industry, fostering collaboration among members [4][76].
- Members can freely ask questions and receive guidance on career choices and research directions [78].
Industry Insights
- Members come from renowned universities and leading companies in the field, bringing a diverse range of expertise and perspectives [16][20][21].
- The community provides summaries of industry reports and research papers, keeping members informed about the latest trends and applications in embodied intelligence [23][26].
Three sets of keywords that capture the recent views of everyone bullish on 理想 (Li Auto)
理想TOP2· 2025-08-22 13:29
Core Viewpoint
- The article contrasts the VC (Venture Capital) and PE (Private Equity) mindsets toward 理想 (Li Auto), showing how each mindset shapes the evaluation of the company's potential and performance.
VC Mindset
- The VC mindset focuses on long-term potential, typically over a 3-5 year horizon, and analyzes the core value of 理想 as a leading physical AI company [2].
- VCs are more tolerant of mistakes and failures on the way to a long-term goal, believing in the transformative potential of AI technology [2][5].
- The VC perspective emphasizes the low marginal cost of software and the large potential for future value creation, regardless of immediate financial metrics [9].
PE Mindset
- The PE mindset is short-term oriented, typically evaluating the company on a timeline of less than a year and focusing on concrete financial metrics such as sales volume, revenue, and profit margins [3].
- PEs require solid evidence of value and are less forgiving of short-term misjudgments, which leads to a more critical view of 理想's recent performance [4][19].
- The PE perspective is driven by recent financial data, which has been disappointing and results in a low valuation on those specific metrics [15][16].
Physical AI
- 理想's approach to physical AI combines AI software with hardware, which the article presents as a significant advance over traditional software-hardware integration [6][7].
- The article stresses 理想's ability to integrate AI software and hardware tightly, a capability that those focused solely on hardware or on traditional software may underestimate [7].
Recent Performance and Criticism
- Recent performance metrics have drawn criticism from the PE perspective, particularly regarding delivery targets and product expectations [15][16].
- Specific issues include unmet delivery guidance, product delays, and customer dissatisfaction, which have contributed to a negative perception among PE investors [16].
- The article notes that the VC mindset may overlook these issues because of its focus on long-term potential, whereas the PE mindset is less tolerant of such discrepancies [18][19].
Still not sure how to get started on a VLA paper? Some students already have CCF-A publications......
自动驾驶之心· 2025-08-22 12:00
Core Insights
- The article discusses the advances of the Li Auto VLA driver model, highlighting its improved semantic understanding, reasoning, and trajectory planning, all of which are crucial for autonomous driving [1][3][5].
Group 1: VLA Model Capabilities
- The VLA model shows stronger semantic understanding through multimodal input, better reasoning through chains of thought, and trajectory planning that comes closer to human driving intuition [1].
- Four core abilities of the VLA model are showcased: spatial understanding, reasoning, communication and memory, and behavior [1][3].
Group 2: Research and Development Trends
- The VLA model evolved from VLM+E2E and integrates several cutting-edge technologies, including end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5].
- While industry is still optimizing traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for exploration [5].
Group 3: VLA Research Guidance Program
- A VLA research paper guidance program has been launched and has received positive feedback; it aims to help participants systematically grasp key theory and develop their own research ideas [6].
- The program follows a structured 14-week curriculum, covering topics from traditional end-to-end autonomous driving to methods for writing research papers [9][11][30].
Group 4: Course Structure and Requirements
- Each session is limited to 8 participants and targets people with a background in VLA and autonomous driving at various academic levels [12][15].
- Participants are expected to have a foundation in deep learning and Python programming and to be familiar with PyTorch; specific hardware is recommended for optimal performance [21][22].
Group 5: Expected Outcomes
- Participants will study classic and cutting-edge research papers, build coding skills, and learn how to write and submit research papers, culminating in a draft paper [20][34].
- The program aims to deepen participants' understanding of algorithms and their strengths and weaknesses, and to stimulate research ideas through structured guidance [20][34].
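Has AI lost its way? Sutton, the godfather of reinforcement learning, unveils the OaK architecture, challenging the current AI paradigm and proposing a new vision for superintelligence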
AI科技大本营· 2025-08-22 08:05
Core Concept
- The OaK architecture is a systematic response to the need for intelligent agents that can learn continually, model the world, and plan effectively, with the aim of reaching superintelligence through experiential learning [3][5][7].
Group 1: OaK Architecture Overview
- OaK is a model-based reinforcement learning framework characterized by continually learning components, a dedicated learning rate for each weight, and a five-step evolution path called FC-STOMP [3][26].
- The architecture prioritizes runtime learning over design-time learning, advocating online learning in which agents learn from real-world interaction [13][14][21].
Group 2: Key Features of OaK
- The architecture is designed to be domain-general, empirical, and capable of open-ended complexity, allowing agents to form whatever concepts they need within their computational resources [16][19].
- The "Big World" hypothesis holds that the world is far more complex than any intelligent agent can fully comprehend, so agents must operate with approximate models and strategies [19][20].
Group 3: Learning Mechanisms
- OaK introduces subproblems: agents autonomously generate subproblems out of curiosity and intrinsic motivation, which drives a cycle of problem solving and feature generation [28][31].
- The architecture's core process involves eight steps, including learning the main policy, generating new state features, creating subproblems, and using learned models for planning [27][29].
Group 4: Challenges and Future Directions
- Two significant challenges remain: reliable continual deep learning and the generation of new state features, both of which are critical to the architecture's success [37][38].
- The OaK framework aims to provide a comprehensive answer to fundamental AI problems by offering a mechanism for using learned models in planning, which current AI lacks [40].
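The idea of learning at run time with a separate learning rate for each weight can be illustrated with a small sketch. This is not the OaK algorithm, which the summary does not specify: the linear predictor, the fixed step sizes, and the names `online_update` and `train_online` are all assumptions made for illustration.

```python
def online_update(weights, step_sizes, features, target):
    """One online update of a linear predictor (hypothetical helper).

    Each weight has its own step size, so some weights can adapt
    quickly while others change slowly.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    new_weights = [
        w + alpha * error * x
        for w, alpha, x in zip(weights, step_sizes, features)
    ]
    return new_weights, error


def train_online(stream, step_sizes, passes=1):
    """Process (features, target) pairs one at a time, as they arrive."""
    weights = [0.0] * len(step_sizes)
    for _ in range(passes):
        for features, target in stream:
            weights, _ = online_update(weights, step_sizes, features, target)
    return weights


if __name__ == "__main__":
    # Toy data generated by target = 2*x0 - 1*x1 (an assumption for the demo).
    stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
    print(train_online(stream, step_sizes=[0.5, 0.1], passes=200))
```

In this sketch the step sizes are fixed by hand; an architecture like the one described would presumably adapt them from experience, which the sketch does not attempt.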
Kuaishou's Klear-Reasoner tops the 8B model rankings; the GPPO algorithm strengthens both stability and exploration!
AI前线· 2025-08-22 06:07
Core Viewpoint
- Competition among large language models has highlighted the importance of mathematical and coding reasoning, and Kuaishou's Klear team has introduced the Klear-Reasoner model, which achieves state-of-the-art results on several benchmarks [1][2].
Group 1: Model Performance
- Klear-Reasoner outperforms other strong open-source models on benchmarks such as AIME2024 and AIME2025, scoring 90.5% and 83.2% respectively, which makes it the top 8B model [2].
- The performance is attributed to the GPPO (Gradient-Preserving Clipping Policy Optimization) algorithm, which improves exploration while maintaining training stability [5][24].
Group 2: Technical Innovations
- GPPO retains all gradients during training, in contrast to traditional clipping methods, which can hinder exploration and slow convergence [8][10].
- GPPO lets high-entropy tokens take part in backpropagation, preserving exploration ability and accelerating error correction [10].
Group 3: Training Methodology
- The Klear team emphasizes data quality over quantity in the supervised fine-tuning (SFT) phase, showing that high-quality data sources yield better training efficiency and outcomes [12].
- For high-difficulty tasks, retaining some erroneous samples can improve model performance by providing additional exploration opportunities [16].
- In the reinforcement learning (RL) phase, soft rewards based on test-case pass rates are more effective than hard rewards, improving training stability and efficiency [19].
Group 4: Future Implications
- Beyond its benchmark results, the release of Klear-Reasoner offers a reproducible and scalable recipe for training reasoning models with supervised and reinforcement learning, providing useful guidance for future work in mathematics, coding, and other RLVR tasks [24].
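The contrast between hard rewards and soft rewards based on test-case pass rates, mentioned under the training methodology, can be sketched as follows. The function names and the exact reward definitions are assumptions for illustration; the summary does not give the Klear team's formula.

```python
def hard_reward(test_results):
    """All-or-nothing reward: 1.0 only if every test case passes.

    `test_results` is a list of booleans, one per test case
    (a hypothetical representation chosen for this sketch).
    """
    return 1.0 if test_results and all(test_results) else 0.0


def soft_reward(test_results):
    """Graded reward: the fraction of test cases that pass."""
    if not test_results:
        return 0.0
    return sum(1 for passed in test_results if passed) / len(test_results)


if __name__ == "__main__":
    # A solution that passes 3 of 4 test cases.
    results = [True, True, False, True]
    print(hard_reward(results))  # 0.0
    print(soft_reward(results))  # 0.75
```

Under the hard reward, a nearly correct solution gets the same signal as a completely wrong one; the soft reward distinguishes them, which is one plausible reason for the reported gain in stability.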
From a tangle of tricks to a minimalist recipe: the ROLL team brings a new RL4LLM practice
机器之心· 2025-08-22 04:58
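This research was carried out jointly by the Future Living Lab (未来生活实验室), part of Algorithm Technology at 淘天集团, and the Intelligent Engine Division of 爱橙科技. The core authors include 刘子贺, 刘嘉顺, 贺彦程, and 王维埙. The Future Living Lab brings together 淘天集团's compute, data, and top technical talent; it focuses on frontier AI directions such as large models and multimodality, and works on foundational algorithms, model capabilities, and AI Native applications, with the goal of leading AI innovation in consumer and lifestyle scenarios. 爱橙科技 has extensive practical experience in training and optimizing large models. The two teams previously open-sourced ROLL, an efficient reinforcement learning training framework for large models, and this paper is likewise a practical exploration built on ROLL.

In recent years, reinforcement learning (RL) has proven markedly effective at improving the complex reasoning abilities of large language models (LLMs) and is widely applied to tasks such as mathematical problem solving and code generation. Models fine-tuned with RL often outperform, on reasoning, models that rely only on supervised fine-tuning or pretraining, and this has spawned a large body of related research. With it, however, has come a series of puzzling phenomena: different studies propose different RL optimization tricks but lack unified experimental comparisons and mechanistic explanations, and some even reach contradictory conclusions. For researchers and engineers, this situation of "many methods, muddled conclusions" makes real-world adoption harder rather than easier.

To address this, Alibaba's 淘天集团 and 爱橙科技, together with several universities, ...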
Can motion capture equipment become the next blue ocean for embodied large models?
机器人大讲堂· 2025-08-21 10:11
Group 1: Development of Embodied Intelligence
- The concept of embodied intelligence dates back to the 1950s, when Turing laid the groundwork for its later development [1].
- Researchers such as Rodney Brooks and Rolf Pfeifer provided significant theoretical support in the 1980s and 1990s, the phase of early exploration and theory building [1].
- The early 2000s brought the integration of interdisciplinary methods and technologies, turning embodied intelligence into a more complete academic branch [1].
- Rapid progress in deep learning in the mid-2010s injected new momentum into the field, and industrial application has increased since 2020 [1].
Group 2: Large Models and Their Evolution
- Large models are machine learning models with very large parameter counts, widely applied in NLP, computer vision, and multimodal fields [2].
- Their development can be traced back to early AI research on logical reasoning and expert systems, which were limited by hard-coded knowledge [2].
- Google's introduction of the Transformer model in 2017 significantly improved sequence modeling and led to the mainstream adoption of pre-trained language models [2].
- The emergence of ChatGPT in late 2022 propelled the NLP field forward, and GPT-4 introduced multimodal capabilities in March 2023 [2].
Group 3: Embodied Large Models
- Embodied large models evolved from non-embodied large models, starting with single-modal language models and expanding to multimodal inputs and outputs [4].
- Google's RT series exemplifies embodied large models: RT-1 integrated vision, language, and robotic actions for the first time in 2022, and RT-2 improved multimodal fusion and generalization in 2023 [4].
- Embodied large models are expected to move toward more general applications, driven by foundation models such as RFM-1 [4].
Group 4: Data as a Core Barrier
- The competition between real data and synthetic data is crucial for embodied robots, which often face data scarcity and high collection costs [15].
- Embodied robot datasets are far smaller than text and image datasets, with only 2.4 million data points available [15].
- Various organizations are expected to release high-quality embodied intelligence datasets in 2024, such as AgiBotWorld and Open X-Embodiment [15].
Group 5: Motion Capture Systems
- Motion capture technology records and analyzes real-world actions and has evolved from manual keyframe drawing to modern high-precision methods [23].
- A motion capture system consists of hardware (sensors, cameras) and software (data processing modules) and produces three-dimensional motion data [23].
- The main types of motion capture systems are mechanical, acoustic, electromagnetic, inertial, and optical, each with its own advantages and limitations [25].
Group 6: Key Companies in Motion Capture Industry
- Beijing Duliang Technology specializes in optical 3D motion capture systems, offering high-resolution, high-precision solutions [28].
- Lingyun Technology is a professional supplier of configurable vision systems and provides optical motion capture systems with real-time tracking [29].
- Aofei Entertainment participates in motion capture through investments in companies such as Nuoyiteng, which offers high-precision products based on MEMS inertial sensors [30].
- Liyade is a leading audiovisual technology company that uses optical motion capture technology in various applications [31].
- Zhouming Technology has developed a non-wearable human posture motion capture system based on computer vision and AI [32].
- Xindong Lianke focuses on high-performance MEMS inertial sensors and is expanding into motion capture hardware for robots [33].
SAIC-GM "joins hands" with Momenta; the Buick Zhijing L7 (至境L7) will become an "AI driving master"
Xin Lang Cai Jing· 2025-08-21 06:47
Core Insights
- SAIC-GM has signed a strategic cooperation agreement with Momenta to develop advanced driver assistance systems tailored to Chinese roads and users [1].
Group 1: Strategic Partnership
- The collaboration aims to combine technology integration and safety expertise to develop advanced driving technologies [1][4].
- The partnership signals Buick's commitment to pairing global automotive expertise with local innovation in order to lead in the new energy vehicle market [4].
Group 2: Product Launch
- Buick's high-end electric sub-brand "Zhijing" will launch its first intelligent luxury sedan, the Zhijing L7, featuring the Momenta R6 flywheel large model, which is based on end-to-end reinforcement learning [2].
- The Zhijing L7 will offer full-scenario driving assistance, including "no-break" urban NOA and what is described as the industry's first "no-stop one-button parking" feature [2][3].
Group 3: Technological Advancements
- The Momenta R6 flywheel large model draws on 3 billion kilometers of real-world driving data and 70 million key data points to improve decision-making and adaptability in complex driving scenarios [2].
- The vehicle can handle challenging situations such as close cut-ins, blind spots, and other high-risk scenarios [2][3].
Group 4: User Experience Enhancements
- The Zhijing L7's features include driving as smooth as that of an experienced human driver, precise recognition on narrow roads, and efficient toll booth navigation [3].
- The "no-stop one-button parking" feature recognizes parking spaces in real time and plans an optimal trajectory, improving parking efficiency [3].
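喝点VC | a16z in conversation with OpenAI researchers: an official breakdown of GPT-5; high-quality use cases will replace benchmarks as the true measure of AGI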
Z Potentials· 2025-08-21 03:09
Core Viewpoint
- The release of GPT-5 marks a significant advance in AI capabilities, particularly in reasoning, programming, and creative writing, with notable improvements in reliability and behavior design [3][4][6].
Group 1: Model Improvements
- GPT-5 shows a substantial reduction in sycophancy (flattery) and hallucination, indicating a more reliable interaction model [4][14].
- The model's programming capabilities have taken a qualitative leap, allowing users to build applications with minimal coding knowledge, which is expected to foster many small businesses [6][17].
- The team treats user experience and practical applications, rather than benchmark scores alone, as the key metrics for evaluating model performance [20][21].
Group 2: Training and Development
- Development of GPT-5 started from the capabilities the team wanted, with assessments designed to reflect real user value [22][23].
- Integrating deep research capabilities into the model has improved its ability to perform complex tasks efficiently, drawing on high-quality data and reinforcement learning [16][26].
- Mid-training techniques have been introduced to update the model's knowledge and improve its performance without extensive retraining [45].
Group 3: Future Implications
- The advances in GPT-5 are expected to unlock new use cases and increase daily usage among a broader audience, which the team sees as a critical indicator of progress toward AGI [21][15].
- The model's ability to assist with creative writing is highlighted, showing its potential to help users with complex writing tasks [29][31].
- The future of AI is expected to feature autonomous agents capable of performing real-world tasks, with ongoing research focused on strengthening their capabilities [36][41].
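Breaking the efficiency bottleneck in long-horizon agent reasoning! MIT and the National University of Singapore jointly introduce a new reinforcement learning training method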
量子位· 2025-08-20 10:21
Core Viewpoint
- The MEM1 framework, developed by MIT and the National University of Singapore, addresses the difficulty AI agents have in handling complex tasks while managing memory efficiently, and achieves significant gains in inference speed and memory usage over traditional models [2][22].
Group 1: Framework Overview
- MEM1 lets AI agents manage their own working memory and reasoning, much as people organize their thoughts after a period of work [4][10].
- The framework keeps memory usage nearly constant, sharply reducing the computational cost that otherwise grows with the number of dialogue rounds [6][12].
Group 2: Performance Metrics
- The MEM1-7B model runs inference 3.5 times faster than a traditional 14B model, with a peak token count about one-fourth of the latter's [2][3].
- On a complex 16-target task, MEM1 outperformed larger models and models with external memory modules in accuracy, context length, and inference speed [17][18].
Group 3: Training Methodology
- MEM1 is trained end to end with reinforcement learning and uses an attention masking mechanism that lets the agent focus on relevant historical information while compressing it efficiently [12][22].
- Training involves three key operations: extracting key information, integrating it with internal memory, and pruning redundant content [14][20].
Group 4: Practical Applications
- MEM1 has been tested in document retrieval QA, open-domain web QA, and multi-round online shopping scenarios, demonstrating its adaptability in realistic settings [19][20].
Group 5: Industry Implications
- The industry's usual approach has been to bolt on external memory modules, which can be cumbersome and less effective; MEM1 points toward memory that agents manage themselves, learned through reinforcement learning [22].
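The three operations named under the training methodology (extract, integrate, prune) and the near-constant memory footprint can be illustrated with a toy loop. In MEM1 this behavior is learned through reinforcement learning; the hand-written keyword rule, the `max_items` cap, and the names `consolidate` and `run_episode` below are all hypothetical stand-ins used only to show the shape of the idea.

```python
def consolidate(memory, observation, keywords, max_items=4):
    """Fold one turn's observation into a bounded working memory.

    Hypothetical rule-based stand-in for a learned consolidation step.
    """
    # 1. Extract: keep only observation lines that mention a keyword.
    extracted = [
        line for line in observation
        if any(keyword in line for keyword in keywords)
    ]
    # 2. Integrate: merge with existing memory, skipping duplicates.
    merged = list(memory)
    for line in extracted:
        if line not in merged:
            merged.append(line)
    # 3. Prune: keep only the most recent `max_items` entries.
    return merged[-max_items:]


def run_episode(observations, keywords, max_items=4):
    """Run a multi-turn episode, carrying only the consolidated memory."""
    memory = []
    sizes = []
    for observation in observations:
        memory = consolidate(memory, observation, keywords, max_items)
        sizes.append(len(memory))
    return memory, sizes


if __name__ == "__main__":
    turns = [
        [f"price of item {i} is {10 + i}", f"banner ad {i}"]
        for i in range(10)
    ]
    memory, sizes = run_episode(turns, keywords=["price"])
    print(sizes)   # memory size per turn stays capped at max_items
    print(memory)
```

Because each turn sees only the consolidated memory plus the new observation, the context stays bounded however many turns the episode has, whereas appending the full history would grow linearly.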