Reinforcement Learning
A Survey of Nearly 300 Works: The Development of Manipulation Tasks Through the Lens of High-Level Planning and Low-Level Control
具身智能之心· 2026-01-06 00:32
Core Insights
- The article surveys the advances in robotic manipulation driven by rapid progress in visual, language, and multimodal learning, emphasizing how large foundation models strengthen robots' perception and semantic representation capabilities [1][2].

Group 1: High-Level Planning
- High-level planning clarifies action intentions, organizes action sequences, and allocates attention over the environment, providing structured guidance for low-level execution [4].
- Its core components are task decomposition and decision guidance, which integrate multimodal information to answer "what to do" and "in what order" [4].
- Task planning based on large language models (LLMs) maps natural language to task steps, with methods such as SayCan and Grounded Decoding improving skill selection and planning [5].
- Multimodal large language models (MLLMs) move beyond pure text input by integrating visual and language reasoning, with models such as PaLM-E and VILA performing strongly on embodied tasks [8].
- Code generation techniques convert plans into executable programs, improving the precision of language-based plans through methods such as Code as Policies and Demo2Code [9].
- Motion planning uses LLMs and VLMs to generate continuous motion targets, linking high-level reasoning with low-level trajectory optimization [10].
- Affordance learning establishes intrinsic associations between perception and action across geometric, visual, semantic, and multimodal dimensions [11].
- 3D scene representation turns environmental perception into structured action proposals, bridging perception and action through techniques such as Gaussian splatting [12].

Group 2: Low-Level Learning-Based Control
- Low-level control translates high-level plans into precise physical actions, addressing the "how to do it" side of robotic manipulation [14].
- Learning strategies for skill acquisition are categorized into three main types, including pre-training and model-free reinforcement learning [16].
- Input modeling defines how robots perceive the world, emphasizing the integration of multimodal signals through reinforcement learning and imitation learning [18].
- Visual-action models use both 2D and 3D visual inputs to improve action generation, while visual-language-action models integrate semantic, spatial, and temporal information [19].
- Additional modalities such as tactile and auditory signals improve robustness in contact-rich manipulation scenarios [20].

Group 3: Challenges and Future Directions
- Despite significant technical progress, robotic manipulation faces four core challenges: the lack of universal architectures, data and simulation bottlenecks, insufficient multimodal physical interaction, and safety and collaboration issues [23][27][28][29].
- Future research directions include developing a "robotic brain" with flexible modal interfaces, establishing autonomous data collection mechanisms, enhancing multimodal physical interaction, and ensuring safety in human-robot collaboration [30].
- The review calls for a unified framework that integrates high-level planning and low-level control, with a focus on overcoming bottlenecks in data efficiency, physical interaction, and safe collaboration so that robotic manipulation can move from laboratory settings to real-world applications [31].
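As a rough illustration of the skill-selection idea behind SayCan mentioned under Group 1: the published method ranks candidate skills by combining a language-model estimate of how useful each skill is for the instruction with a learned estimate of how likely the skill is to succeed in the current state. The sketch below assumes both estimates are already available as plain numbers. The skill names, the scores, and the `select_skill` helper are hypothetical and are not taken from the survey.

```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float], affordances: Dict[str, float]) -> str:
    """Pick the skill with the highest (LLM relevance x affordance) product.

    llm_scores:  how useful the language model thinks each skill is for the instruction.
    affordances: how likely each skill is to succeed in the current state.
    Skills missing from `affordances` are treated as infeasible (score 0).
    """
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    return max(combined, key=combined.get)


# Hypothetical numbers for the instruction "clean up the spill".
llm_scores = {"pick up sponge": 0.5, "go to sink": 0.3, "open drawer": 0.2}
affordances = {"pick up sponge": 0.1, "go to sink": 0.9, "open drawer": 0.6}

best = select_skill(llm_scores, affordances)
print(best)
```

With these made-up numbers the language model prefers "pick up sponge", but the low affordance (no sponge within reach) shifts the choice to a skill the robot can actually execute.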
Yuandong Tian's 2025 Year-End Review: On Being Laid Off and Research Directions for 2026
自动驾驶之心· 2026-01-06 00:28
Core Insights
- The article recounts the complexities and challenges the author faced in project work and personal career decisions in AI and machine learning research [3][4][5].

Group 1: Project Work and Challenges
- The author came under significant pressure when asked to help with the Llama 4 project, a complex decision that involved weighing potential outcomes against personal integrity [3].
- Despite the challenges, the team made progress on core reinforcement learning problems, including training stability and model architecture design, which contributed to a shift in research perspective [3].

Group 2: Career Decisions and Transitions
- After more than a decade at the company, the author considered leaving for economic and personal reasons but ultimately decided to stay, which reflects how difficult such transitions are [4].
- Navigating the ups and downs at work provided valuable material for future creative writing, blending professional and personal growth [5].

Group 3: Research Directions
- The author's two main research directions in 2025 were large model reasoning, which gained traction following the release of the team's continuous latent space reasoning work, and understanding the "black box" of models [6].
- Efforts to improve reasoning efficiency include approaches such as using discrete tokens and parallel reasoning chains, which have shown promising results in reducing computational cost while improving performance [7].

Group 4: Interpretability and Future Directions
- The author stresses the importance of interpretability, arguing that understanding how AI systems work is crucial for using the technology ethically and effectively [10].
- Current efforts to demystify model training are still at an early stage, with a focus on deriving design principles from first principles to guide future AI models [11].
Yuandong Tian's 2025 Year-End Review: Firefighting on Llama 4, Then Laid Off, Now Co-Founder of a Stealth Startup
机器之心· 2026-01-04 08:05
Core Insights
- The article covers the experiences and reflections of a prominent AI researcher, including the impact of layoffs at Meta and plans for future work [1][2][3].

Group 1: Layoffs and Career Reflections
- The researcher worked on the Llama 4 project during a critical period and faced the complexities of decision-making under pressure, which led to a deeper understanding of societal dynamics [4].
- After more than a decade at Meta, the researcher had considered leaving but stayed until the company made the decision for them, which provided new material for creative writing [5].
- After the layoffs, the researcher received numerous job offers but chose to become a co-founder of a new startup, marking a shift toward entrepreneurship [6].

Group 2: Research Directions for 2025
- The main research directions for 2025 were large model reasoning and understanding the "black box" of models, with a focus on improving training efficiency and interpretability [7][8].
- The researcher's team contributed theoretical analyses and practical applications that improve model performance and efficiency [8][9].

Group 3: Importance of Interpretability
- The article stresses the need for interpretability in AI, arguing that understanding how AI models work is essential for trust and effective deployment [11][12].
- It highlights the difficulty of explaining model behavior from first principles and calls for deeper insight into the emergent structures and training dynamics of AI models [12].

Group 4: Future of Work and AI Integration
- The integration of AI into the workforce is transforming traditional roles, shifting the emphasis from valuing human experience to assessing a person's ability to enhance AI capabilities [20][23].
- The article presents two potential scenarios for the future, one where AI achieves superintelligence and another where traditional scaling methods fail, and argues that both underscore the necessity of interpretability [21][23].

Group 5: The Role of Independent Thinking
- Individuals will need to maintain independent thought and creativity, since reliance on AI-generated content may lead to a decline in original thinking [29][30].
- The article emphasizes the transition from employee to entrepreneur or founder, with clear goals driving proactive thinking and innovation [31][33].
A Chinese Team Gets There First on the Track the Former OpenAI CTO Bet On: Everyone Gets a Ticket to AI's "Second Half"
机器之心· 2026-01-04 03:01
Core Viewpoint
- The article discusses the challenges that small entrepreneurs and researchers face in an AI field dominated by large companies, and highlights new tools such as Mind Lab's MinT that aim to democratize access to advanced AI training capabilities [1][2][3].

Group 1: AI Landscape and Challenges
- AI is increasingly perceived as a domain dominated by large companies, leaving smaller players and researchers feeling lost [1][2].
- The traditional path from academia to industry is being questioned, particularly its relevance in the current AI environment [1].
- The saturation of pre-training has created new bottlenecks in deploying AI systems, necessitating a shift toward post-training and reinforcement learning [10][11].

Group 2: Innovations in Post-Training
- Mind Lab, a research center backed by a team of young scientists, has developed the Mind Lab Toolkit (MinT), which allows efficient training of trillion-parameter models using standard CPUs, cutting costs tenfold [3][5].
- MinT is designed to address the limitation that current AI models become "frozen" after training, enabling continuous learning from real-world interactions [23][24].
- The platform's architecture lets users focus on data and algorithms while MinT manages the infrastructure, significantly improving engineering efficiency [31][39].

Group 3: Competitive Landscape
- MinT is positioned as a competitor to Thinking Machines' Tinker, with both platforms offering compatibility and advanced capabilities for post-training [21][25].
- MinT has reached significant milestones, including being the first to implement 1T LoRA-RL for efficient reinforcement learning on trillion-parameter models [25][36].
- The team behind MinT has published over 100 papers with more than 30,000 citations, indicating a strong research foundation [6].

Group 4: Market Applications and Benefits
- MinT is expected to benefit startups in the agent domain and top academic labs constrained by computational resources, allowing them to validate algorithms at lower cost [41][44].
- The platform supports a wide range of applications, from basic research to specific industry needs [44].
- By lowering the barriers to entry for reinforcement learning and post-training, MinT aims to let more organizations use advanced AI capabilities [49][50].
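The summary above does not describe how MinT implements LoRA-RL, so the following is only a generic sketch of why LoRA (low-rank adaptation) makes post-training cheaper. LoRA freezes a weight matrix W of shape d_out x d_in and trains two small factors, B (d_out x r) and A (r x d_in), so the trainable parameter count drops from d_out * d_in to r * (d_out + d_in). The layer size and rank below are hypothetical.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of a layer's parameters that LoRA actually trains.

    The frozen matrix W has d_out * d_in parameters; the trainable
    low-rank factors B (d_out x rank) and A (rank x d_in) together
    have rank * (d_out + d_in) parameters.
    """
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return low_rank / full


# Hypothetical layer: an 8192 x 8192 projection adapted with rank 16.
frac = lora_trainable_fraction(d_out=8192, d_in=8192, rank=16)
print(f"trainable fraction: {frac:.4%}")
```

For this made-up layer, fewer than half a percent of the parameters receive gradients, which is the basic reason low-rank methods reduce the memory and compute needed for post-training.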
Lei Jun: However Advanced Assisted Driving Becomes, Human Driving Remains Critical
Sou Hu Cai Jing· 2026-01-03 14:52
Core Viewpoint
- Xiaomi founder and CEO Lei Jun hosted his first live stream of 2026, showcasing the new Xiaomi YU7 and emphasizing the importance of safety in advanced driving assistance systems [3].

Group 1: Product Launch and Features
- The live stream lasted approximately four to five hours and highlighted the new features of the Xiaomi YU7 [3].
- The enhanced Xiaomi HAD (Hyper Autonomous Driving) system incorporates reinforcement learning and world model technology, leading to significant improvements in user experience [3].

Group 2: User Experience Enhancements
- In longitudinal control, the vehicle's acceleration and braking are now smoother and more human-like, which improves the sense of safety [3].
- In lateral control, the system acts more decisively in acceleration, lane changes, and route planning [3].
- Active safety has been upgraded with a new AES (Active Emergency Steering) function alongside the existing AEB (Automatic Emergency Braking) feature [3].
Even $30 Billion May Not "Rebuild GPT-4"? NUS's Yang You's Latest Long Essay: Exposing the Truth Behind AI's Growth Bottleneck
量子位· 2025-12-31 03:37
Core Viewpoint
- The article discusses the growing anxiety about an "AI bottleneck" as the third anniversary of ChatGPT approaches, questioning whether current technological paradigms can use additional compute to build models significantly stronger than GPT-4 [1][2].

Group 1: Nature of Intelligence and Its Measurement
- Intelligence is fundamentally about energy conversion: over the past decade AI has turned electricity into reusable intelligence, but the efficiency of that conversion is now under scrutiny [6].
- The essence of intelligence is not explanation but prediction, characterized by the ability to forecast future states and bear the consequences of those predictions [7][10].
- Current models derive their intelligence primarily from the pre-training phase, which consumes the most energy and computation, raising questions about whether intelligence will keep growing steadily with continued computational investment [15][20].

Group 2: Computational Paradigms and Their Limitations
- The real bottleneck is not that compute has stopped growing but that the returns of compute to intelligence growth are diminishing [22][27].
- The article challenges the mainstream narrative by arguing that pre-training, fine-tuning, and reinforcement learning are all fundamentally gradient computation and parameter updates rather than distinct methodologies [12][11].
- The success of the Transformer architecture is attributed to its compatibility with GPU systems, which enabled a stable feedback loop between compute growth, model scaling, and capability gains [16][18].

Group 3: Future Directions and Exploration
- Future AI infrastructure should focus on the overall scalability of parallel computing systems rather than single-chip performance, with an emphasis on maintaining or improving the ratio of computation cost to communication cost [24][25].
- Several exploration directions are proposed, including higher precision, more advanced optimizers, and more scalable architectures or loss functions, all aimed at ensuring that additional compute yields proportional gains in intelligence [25][26].
- The article concludes that as long as more efficient ways of organizing computation can be found, the upper limits of intelligence are far from being reached [27].
The Most Important First Step in an L4 Data Loop: Choosing the Right Loss Function for the Whole Organization
自动驾驶之心· 2025-12-31 00:31
Core Viewpoint
- The article emphasizes the importance of defining appropriate primary metrics (loss functions) for autonomous driving data loops, arguing that traditional metrics such as MPI (Miles Per Intervention) are inadequate for motivating problem-solving and improving system performance [5][10][87].

Group 1: Data Loop and Metrics
- The organization should be viewed as a large model in which the primary metric acts as the loss function that guides optimization [15][87].
- The common metric MPI is criticized for measuring how often human intervention is needed rather than how well the vehicle avoids "stupid" or "dangerous" actions [22][80].
- The article introduces two new metrics, MPS (Miles Per Stupid) and MPD (Miles Per Dangerous), which are more closely aligned with the actual performance of the autonomous system [10][44][80].

Group 2: Limitations of MPI
- MPI is defined as total mileage divided by the number of interventions, which can mislead organizations into optimizing for fewer interventions rather than better vehicle behavior [18][22].
- The timing of an intervention often does not coincide with when the underlying problem occurs, so the metric is misaligned with performance [25][26].
- Relying on MPI can create negative incentives, encouraging teams to avoid reporting issues rather than addressing them [26][90].

Group 3: MPS and MPD Implementation
- MPS tracks how often the vehicle takes "stupid" actions, while MPD tracks "dangerous" actions, giving a clearer picture of system performance [44][80].
- The organization can use triggers to define and capture these behaviors, allowing a more precise analysis of performance [47][85].
- MPS and MPD can drive self-improvement within the organization, keeping the focus on better vehicle behavior rather than merely fewer human interventions [87][90].

Group 4: Examples and Case Studies
- The article gives examples of applying MPS and MPD in real scenarios, such as analyzing sudden braking events and their causes, which can yield actionable insights for system improvement [49][51][66].
- It stresses understanding the context behind the metrics: both improvements and deteriorations should be investigated thoroughly [59][78].
- The article concludes that effective metrics should not only reflect performance but also steer the organization toward continuous improvement and problem resolution [87][90].
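The three metrics discussed above can be sketched in a few lines. MPI follows the definition given in the summary (total mileage divided by the number of interventions). The summary does not give formulas for MPS and MPD, so this sketch assumes they are computed analogously, as miles per "stupid" event and miles per "dangerous" event. The `DriveLog` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DriveLog:
    """One logged drive (hypothetical schema for illustration)."""
    miles: float
    interventions: int     # times a human took over
    stupid_events: int     # trigger-detected "stupid" behaviors
    dangerous_events: int  # trigger-detected "dangerous" behaviors


def miles_per(total_miles: float, count: int) -> float:
    """Miles per event; infinite when no event was recorded."""
    return float("inf") if count == 0 else total_miles / count


def summarize(logs: List[DriveLog]) -> Dict[str, float]:
    """Aggregate MPI, MPS, and MPD over a set of drives."""
    miles = sum(log.miles for log in logs)
    return {
        "MPI": miles_per(miles, sum(log.interventions for log in logs)),
        "MPS": miles_per(miles, sum(log.stupid_events for log in logs)),
        "MPD": miles_per(miles, sum(log.dangerous_events for log in logs)),
    }


logs = [
    DriveLog(miles=120.0, interventions=1, stupid_events=4, dangerous_events=0),
    DriveLog(miles=80.0, interventions=1, stupid_events=1, dangerous_events=1),
]
metrics = summarize(logs)
print(metrics)
```

In this made-up data the fleet looks fine on MPI (100 miles per intervention) while MPS is only 40, which illustrates the article's point that intervention counts can hide frequent poor behavior.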
From Big-Tech Designer to Super One-Person Company: A 6,000-Character Look Back at My 2025 with AI
歸藏的AI工具箱· 2025-12-30 10:34
Core Insights
- The article reflects on significant changes in the AI industry and in the author's career over the past year, highlighting the importance of adapting to new technologies and platforms [2][3].

Group 1: Personal Career Changes
- The author moved from a designer role at a large company to freelancing, using AI to build a sustainable one-person business that benefits industry peers [4].
- Shifting from judging oneself by data to focusing on long-term interests and skills has led to a more relaxed yet productive work rhythm [4].

Group 2: Social Media and Content Creation
- The author does not identify as a traditional content creator, which has helped avoid data anxiety and internal conflict but has also meant slower adaptation to platform changes [5][6].
- Twitter and Jike have been the primary platforms for engagement, with nearly 25,000 followers on Jike and 110,000 on Twitter; the author emphasizes the importance of interacting with international users [12][10].
- The author has started producing videos, which have performed well on platforms such as Douyin and Xiaohongshu, indicating that video content is a necessary adaptation in the AI landscape [17][19].

Group 3: AI Community and Networking
- The author has built a paid community to support the AIGC Weekly, which has proven effective in fostering collaboration and sharing among members [21][30].
- A recent promotional event for the community attracted around 2,000 paid members, showing the potential of community-driven marketing [28].

Group 4: AI Product Development
- The article discusses the rise of Vibe Coding and Agent tools, their significance in AI programming, and the author's contributions to tutorials and community knowledge sharing [38][34].
- The author has engaged with various AI product teams, gaining insights into industry trends and product development [43].

Group 5: Future Trends in AI
- The article anticipates key technological breakthroughs in AI, particularly in reinforcement learning and multimodal capabilities, which are expected to drive significant advances in the coming years [52][55].
- Products such as Chatwise and Manus are noted for their potential to redefine how users interact with AI, pointing toward more integrated and user-friendly AI solutions [58][60].
A Human-Level AI Master Teacher, Personalized for Every Student, Cuts Through Education's "Impossible Triangle"
量子位· 2025-12-30 03:57
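Jay, reporting from Aofeisi
量子位 | WeChat official account QbitAI

Take a look: this is a new species of AI application in education.

The lecture pacing, the tone, the interaction: it all feels remarkably natural.

More importantly, it can not only "lecture like a teacher" but also teach each student one-on-one in a personalized way.

This AI tutor comes from an AI-native application company called 与爱为舞. Since launching at the start of the year, it has provided learning companionship and one-on-one explanations to users numbering in the millions.

Education has always been constrained by an "impossible triangle" of scale, quality, and cost.

Personalizing for every learner, serving a million students, and being almost indistinguishable from a human teacher all at once is harder still.

How exactly does it do this?

What 与爱为舞 uses to cut through this impossible triangle is a hardcore technical greatsword.

AI education needs more than "answers"

Forging this greatsword takes three core components: model + speech + engineering.

Start with the model.

Thanks to the scaling of CoT, large models' ability to solve complex problems has grown exponentially, and their problem-solving skill has advanced by leaps and bounds, to the point of winning "Olympiad gold medals."

To win an Olympiad, an AI only needs to give the standard answer. Education does not work that way.

Consider a simple English grammar question:

Lily expects _ her grandparents in the countryside next month. A. ...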
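Hard Tech Surges and Robotics Stocks Heat Up: Haozhi Electromechanical (昊志机电) Rises Over 6%, Robot ETF Fund (159213) Heads for a Fifth Straight Daily Gain After Drawing More Than 63 Million Yuan in Three Days. Is the "Golden Decade" of Humanoid Robots Beginning?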
Sou Hu Cai Jing· 2025-12-30 03:42
Core Viewpoint
- The humanoid robot and embodied intelligence industry is growing rapidly, and a newly established standardization committee aims to address lagging standards and high collaboration costs in the sector [3].

Group 1: Market Performance
- The Shanghai Composite Index opened lower but showed signs of recovery; the Robot ETF Fund (159213) rose 0.67%, putting it on track for a fifth consecutive daily gain, and attracted net subscriptions of 20 million yuan [1].
- The Robot ETF Fund has seen strong inflows, accumulating over 63 million yuan in the last three trading days [1].
- The index's constituent stocks were mixed, with notable gains from New Times, which hit the daily limit, and Haozhi Electromechanical (昊志机电), which rose over 6% [6].

Group 2: Industry Developments
- The establishment of the standardization committee for humanoid robots and embodied intelligence is a significant step toward a high-quality supply of standards and toward the maturation and application of related technologies [3].
- The committee will develop industry standards across several domains, including common foundational technologies, components, systems, and safety, to guide healthy industry development [3].

Group 3: Future Outlook
- The industry is expected to transition from "0-1" to "1-10" by 2025, focusing on technology convergence, with a shift toward mass production and commercialization anticipated in 2026 [4].
- Key milestones for 2026 include the completion of hardware platform design for Tesla's Gen2.5 robot and the start of large-scale manufacturing by August [8].
- The humanoid robot sector is projected to trend strongly upward, driven by policy support and industry progress, with potential IPOs for leading domestic companies in the first half of 2026 [8][10].

Group 4: Technological and Policy Insights
- The evolution of models and hardware is crucial for the robotics sector, with real-world data becoming a core productivity driver and the VLA architecture expected to dominate applications by 2025 [9].
- The transition from industrial robots to general-purpose robots is underway, with applications expanding beyond data collection and education into industrial and logistics settings [9].
- Policymakers worldwide increasingly recognize the importance of general-purpose robots, with major economies elevating the sector to a national strategic level, which gives the industry a clear development outlook and long-term certainty [10].