Reinforcement Learning
While You're Still Agonizing Over a Research Direction, Your Classmates Already Have CCF-A Papers...
具身智能之心· 2025-11-04 00:05
Group 1
- The article introduces a new research guidance service focused on embodied intelligence, addressing common challenges faced by newcomers in selecting research topics and methodologies [1][2]
- The guidance covers advanced topics such as multimodal large models, reinforcement learning, and robot simulation, providing tailored one-on-one support [2][3]
- The service is backed by a team of experienced mentors from prestigious institutions and leading companies, ensuring high-quality assistance throughout the research process [2][3]

Group 2
- The program emphasizes a dual perspective from both industry and academia, aiming not only for publication but also for practical application and value [3]
- An introductory offer is available for the first ten inquiries, allowing students to receive personalized mentorship and tailored advice on suitable conferences and journals [4]
Robots "Learn by Doing": Humans No Longer Have to Babysit Factory Robots
Di Yi Cai Jing· 2025-11-03 12:49
A robot on the production line picks up an iPad and places it into a functional test station. The test platform closes, and the inspection is finished within seconds. The robot reaches out again, steadily removes the iPad, and turns to carry it to the next process step.

Luo Jianlan, partner and chief scientist at Zhiyuan Robotics, told Di Yi Cai Jing that embedding reinforcement learning directly into a real production line helps optimize the robot's RL training objectives and reduces the manpower and material investment required on site. "When reinforcement learning is deployed on the line, the line's pass rate, cycle tempo, and yield directly become the robot's objectives. The robot can train on the line's native signals, and deployment can be compressed to the minute level."

Luo also noted, however, that deploying RL on real machines carries risks of material loss and safety incidents. "This requires pre-training and the robot's low-level control to keep the risks of on-site learning within a controllable range."

With deployment efficiency addressed, replicating the approach at scale is the next challenge Zhiyuan faces. Luo revealed that the team is using a local private cloud together with an OTA (Over-the-Air) update mechanism so that real-machine RL experience from different process steps can be shared, enabling models to be updated and reproduced in batches. Even though real-machine RL has improved deployment efficiency on the line, Luo stressed that it depends not only on the algorithm itself but even more on deep integration with factory systems: everything from communication technology to data interfaces has to be opened up step by step in the real environment. "Only when these underlying links run smoothly, ...
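A minimal sketch of how line-native signals (pass/fail, cycle tempo) might be folded into a scalar RL reward. The signal names, weights, and target tempo below are illustrative assumptions for exposition, not Zhiyuan's actual interface:

```python
def line_reward(passed: bool, cycle_time_s: float, target_cycle_s: float = 12.0) -> float:
    """Shape a reward from native production-line signals (illustrative).

    `passed` stands in for the test station's pass/fail output;
    `cycle_time_s` is the measured pick-test-place cycle time.
    """
    pass_term = 1.0 if passed else -1.0  # station pass/fail dominates the signal
    # penalize only cycles slower than the target tempo, proportionally
    tempo_term = min(0.0, (target_cycle_s - cycle_time_s) / target_cycle_s)
    return pass_term + 0.5 * tempo_term
```

A passing cycle at or under the target tempo earns the full pass reward; slow cycles are docked proportionally, so the policy is pushed toward both yield and takt time.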
For the Red-Hot VLA Field, This One Survey Is All You Need
具身智能之心· 2025-11-03 00:03
Core Insights
- The article discusses the rapid growth and significance of the Vision-Language-Action (VLA) field, highlighting its potential to enable robots to understand human language, perceive the world, and perform tasks effectively [2][7].

Summary by Sections

VLA Overview
- VLA submissions have risen dramatically, from single digits to 164 papers, an 18-fold increase [6].
- A model qualifies as VLA if it uses a backbone pre-trained on large-scale visual-language data, emphasizing language understanding, visual generalization, and task transfer [8][9].

Trends in VLA
- **Trend 1: Efficient Architecture.** Discrete diffusion models are emerging as a new paradigm, allowing parallel generation of action sequences and improving efficiency [15][17].
- **Trend 2: Embodied Chain-of-Thought (ECoT).** ECoT lets robots generate intermediate reasoning steps before acting, improving planning and interpretability [18][19].
- **Trend 3: Action Tokenizer.** Continuous robot actions are converted into discrete tokens that VLMs can process, improving efficiency and integrating reasoning with action [22].
- **Trend 4: Reinforcement Learning (RL).** RL is re-emerging as a crucial tool for fine-tuning VLA policies, particularly in extreme scenarios [26][27].
- **Trend 5: Efficiency Optimization.** Efforts are underway to reduce the cost and complexity of VLA models, making them accessible to smaller labs [28][29].
- **Trend 6: Video Prediction.** Video generation models are used to give VLA an understanding of temporal dynamics and physical laws [30].
- **Trend 7: Realistic Evaluation Benchmarks.** New evaluation methods address the saturation of existing benchmarks, including future-frame prediction tasks [37][39].
- **Trend 8: Cross-Embodiment Learning.** Architectural innovations are essential for universal robot policies that can operate across different body structures [41][43].

Challenges and Future Directions
- The article highlights the "performance ceiling" of mainstream simulation evaluations, where high scores do not necessarily translate into real-world capability [44].
- Two areas needing more attention are data quality and the potential of in-context learning to enhance VLA systems [49][50].
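The core idea behind the action-tokenizer trend, mapping continuous actions onto a discrete vocabulary a language model can emit, can be sketched with simple uniform binning. The bin count and action range here are illustrative assumptions; real tokenizers use more sophisticated learned or frequency-space schemes:

```python
import numpy as np

def tokenize_action(action: np.ndarray, low: float = -1.0, high: float = 1.0,
                    n_bins: int = 256) -> list[int]:
    """Quantize each continuous action dimension into one of n_bins discrete
    tokens, so a language model can emit actions as ordinary token ids.
    Uniform binning is the simplest possible scheme (illustrative only)."""
    clipped = np.clip(action, low, high)
    bins = ((clipped - low) / (high - low) * (n_bins - 1)).round().astype(int)
    return bins.tolist()

def detokenize_action(tokens: list[int], low: float = -1.0, high: float = 1.0,
                      n_bins: int = 256) -> np.ndarray:
    """Invert tokenize_action: map token ids back to continuous values."""
    return low + np.asarray(tokens) / (n_bins - 1) * (high - low)
```

The round trip loses at most half a bin width per dimension, which is the usual trade-off between vocabulary size and action precision.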
4x Faster Than Cursor's New Model: SWE-1.5, Built on Thousands of NVIDIA GB200s, Fulfills Devin's Dream; Real-World Tests Reportedly Reveal a Performance "Waterloo"?
36Kr· 2025-10-31 12:16
Core Insights
- Cognition has launched its new high-speed AI coding model, SWE-1.5, designed for high performance and speed in software engineering tasks; it is now available in the Windsurf code editor following Cognition's acquisition of Windsurf in July [1][2]
- SWE-1.5 runs at up to 950 tokens per second, 13 times faster than Anthropic's Sonnet 4.5, cutting some task completion times from 20 seconds to 5 seconds [2][4]

Model Performance
- SWE-1.5 is a frontier-scale model with hundreds of billions of parameters, designed to deliver top-tier performance without compromising speed [2]
- The model scored 40.08% on the SWE-Bench Pro benchmark, just below Claude Sonnet 4.5's 43.60% [4]

Technical Infrastructure
- The model was trained on a cluster of thousands of NVIDIA GB200 NVL72 chips, which can deliver up to 30 times the performance of NVIDIA H100 GPUs while cutting costs and energy consumption by up to 25% [8]
- SWE-1.5 uses the custom Cascade agent framework for end-to-end reinforcement learning, underscoring the importance of high-quality coding environments for downstream model performance [9]

Development Strategy
- SWE-1.5 is part of a broader strategy to integrate the model into the Windsurf IDE, aiming for a unified system that combines speed and intelligence [10]
- Cognition plans to keep iterating on model training, framework optimization, and tool development to improve speed and accuracy [11]

Market Positioning
- The launch of SWE-1.5 coincides with the release of Cursor's Composer model, signaling strategic convergence in the AI developer-tools market, with both companies betting on proprietary models and low-latency developer experiences [13]
- SWE-1.5's 950 tokens per second is nearly four times Composer's reported 250 tokens per second [14]
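The headline speed claims reduce to simple ratios; the figures below are taken directly from the article, and "nearly four times" follows from 950 vs. 250 tokens per second:

```python
# Throughput figures as reported in the article
swe15_tps = 950       # SWE-1.5, tokens per second
composer_tps = 250    # Cursor Composer, tokens per second

speedup_vs_composer = swe15_tps / composer_tps  # ~3.8x, "nearly four times"

# Reported task completion times, before and after
task_before_s, task_after_s = 20, 5
task_speedup = task_before_s / task_after_s     # 4x

print(round(speedup_vs_composer, 1), task_speedup)
```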
4x Faster Than Cursor's New Model! SWE-1.5, Built on Thousands of NVIDIA GB200s, Fulfills Devin's Dream! Real-World Tests Reportedly Reveal a Performance "Waterloo"?
AI前线· 2025-10-31 05:42
Core Insights
- Cognition has launched its new high-speed AI coding model SWE-1.5, designed for high performance and speed in software engineering tasks, now available in the Windsurf code editor [2][3]
- SWE-1.5 operates at up to 950 tokens per second, 13 times faster than Anthropic's Sonnet 4.5, significantly improving task completion times [3][4][6]

Performance and Features
- SWE-1.5 is built on a model with hundreds of billions of parameters, aiming for top-tier performance without compromising speed [3][4]
- The model's speed advantage stems from a collaboration with Cerebras, which optimized the model for lower latency and better performance [3][6]
- On the SWE-Bench Pro benchmark, SWE-1.5 scored 40.08%, just behind Sonnet 4.5's 43.60%, indicating near-state-of-the-art coding performance [6]

Development and Infrastructure
- SWE-1.5 was trained on a cluster of thousands of NVIDIA GB200 NVL72 chips, offering up to 30 times better performance at 25% lower cost than previous hardware [10]
- Training uses the custom Cascade AI framework and extensive reinforcement learning to strengthen the model's capabilities [10][11]

Strategic Vision
- SWE-1.5 is part of a broader strategy to integrate AI coding capabilities directly into the Windsurf IDE, improving user experience and performance [13][15]
- Cognition emphasizes a co-designed system of model, inference stack, and agent framework to achieve both speed and intelligence [13][14]

Market Position and Competition
- The launch coincides with Cursor's release of its own high-speed model, Composer, indicating strategic convergence in the AI developer-tools market [17]
- Both companies lean on reinforcement learning in their models, a shared approach to building efficient coding agents [17]

User Feedback and Performance
- Early user feedback on SWE-1.5 points to impressive speed, though some users reported weaker task completion compared with models such as GPT-5 [18][19]
The Big Direction for L4 Is Set: Li Auto's Autonomous Driving Team Unveils a New Paradigm at a Top Global AI Conference
机器之心· 2025-10-31 04:11
Core Viewpoint
- The article discusses AI's transition into its "second half," emphasizing the need for new evaluation and configuration methods for AI to surpass human intelligence, particularly in the context of autonomous driving [1][5].

Group 1: AI Paradigm Shift
- AI is moving from reliance on human-generated data to experience-based learning, as highlighted by Rich Sutton's paper "The Era of Experience" [1].
- Former OpenAI researcher Yao Shunyu argues that AI must develop new evaluation methods to tackle real-world tasks effectively [1].

Group 2: Advancements in Autonomous Driving
- At ICCV 2025, Li Auto's expert Zhan Kun presented a talk on evolving from a data closed loop to a training closed loop in autonomous driving [2][4].
- Li Auto introduced a systematic approach to integrating world models and reinforcement learning into mass-produced autonomous driving systems, a significant technological milestone [5].

Group 3: Li Auto's Technological Innovations
- Li Auto's advanced driver-assistance technology, LiAuto AD Max, is based on the Vision-Language-Action (VLA) model, marking a shift from rule-based algorithms to end-to-end solutions [7].
- The company reports significant gains in driver-assistance capability, including a notable increase in takeover mileage (MPI) over the past year [9].

Group 4: Challenges and Solutions in Data Utilization
- Li Auto found that basic end-to-end learning hit diminishing returns as training data expanded to 10 million clips, largely due to sparse data for critical driving scenarios [11].
- The company aims to move from a single data closed loop to a more comprehensive training closed loop, spanning data collection and iterative training through environmental feedback [12][14].

Group 5: World Model and Synthetic Data
- Li Auto is developing a VLA vehicle model with prior knowledge and driving capabilities, supported by a cloud-based world-model training environment that combines real, synthetic, and exploratory data [14].
- Synthetic data generation has improved the training data distribution, strengthening the stability and generalization of Li Auto's driver-assistance system [24].

Group 6: Research Contributions and Future Directions
- Since 2021, Li Auto's research team has produced numerous papers, expanding from perception tasks to advanced topics such as VLM/VLA and world models [28].
- The company is tackling challenges in interactive intelligent agents and reinforcement learning engines, which are critical to the future of autonomous driving [35][38].

Group 7: Commitment to AI Development
- Li Auto has committed nearly half of its R&D budget to AI, with multiple teams focused on applications including driver assistance and smart industrial solutions [43].
- The company has iterated rapidly on its strategic AI products, including the VLA driver model launched with the Li Auto i8 [43].
HKUST Proposes a New Algorithm That Reshapes Large-Model Reasoning: Random-Policy Valuation Turns Out to Be a "Genius Move" for LLM Mathematical Reasoning
机器之心· 2025-10-31 04:11
Core Insights
- The article introduces ROVER (Random Policy Valuation for Diverse Reasoning), a novel approach that simplifies reasoning in large language models (LLMs) by evaluating a completely random policy to find optimal reasoning paths, bypassing traditional reinforcement learning (RL) iterations [3][4][11].

Group 1: ROVER's Methodology and Advantages
- ROVER significantly outperforms existing methods on mathematical reasoning benchmarks, achieving higher quality and diversity in generated reasoning through a minimalist approach [4][9].
- The algorithm eliminates the value network and reference model, making it more lightweight than traditional RL methods [9][16].
- ROVER consists of three simple steps: estimating Q-values, constructing the policy via softmax sampling to preserve diversity, and a training objective that reduces computational load and improves stability [19][21][24].

Group 2: Performance Metrics
- On high-difficulty tasks such as AIME24, AIME25, and HMMT25, ROVER improved pass@1 by +8.2 and pass@256 by +16.8 [9][26].
- ROVER achieved a pass@1 of 30.6 on AIME24, surpassing the best baseline (DAPO) by 19.1 points, and a pass@1 of 14.6 on HMMT25, a 106% improvement over the strongest baseline [26][27].
- The diversity of strategies generated by ROVER improves by 17.6% over baselines, covering more problem-solving paths [29][31].

Group 3: Implications and Future Directions
- ROVER reflects a methodological shift: in structured tasks, simplification rather than added complexity can drive performance gains [38].
The 具身智能之心 Discussion Groups Are Live! VLA, RL, Navigation, Data Collection, and More
具身智能之心· 2025-10-30 10:00
Group 1
- A technical exchange group focused on embodied intelligence has been established, inviting participation from various subfields [1]
- The group spans nearly 20 sub-directions, including humanoid robots, quadrupeds, and robotic arms, and areas such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multimodal perception, simulation, and data collection [1]
- Participants are encouraged to collaborate and discuss technology and industry developments [1]
Before AI's Dawn, Those Who Set Out Earliest
投资界· 2025-10-30 08:36
Core Viewpoint
- The article discusses the evolving landscape of AI investment in China, highlighting the shift from merely "catching up" to establishing a unique innovation path driven by domestic capabilities and market conditions [6][11].

Group 1: Investment Trends
- BlueRun Ventures has been actively investing across AI sectors, including foundational models, embodied intelligence, and AI hardware, building a systematic investment map [5][14].
- The firm emphasizes open-source models and their cost-effectiveness, which foster rapid iteration and application development [9][10].
- Its investment strategy centers on five key trends, including the rise of open-source large language models, reinforcement learning, and the development of autonomous systems [9][10].

Group 2: Market Dynamics
- China's economic structure is transforming, with technology-driven growth becoming the new mainline, supported by rising domestic demand and consumption [7][8].
- Competition between Chinese AI entrepreneurs and their U.S. counterparts follows a dual-track approach, leveraging open-source ecosystems and diverse application scenarios [7][8].
- The emergence of successful Chinese AI products such as DeepSeek signals a shift toward independent innovation and global competitiveness [8][11].

Group 3: Talent and Ecosystem
- Talent density, particularly in AI and related fields, is crucial to the success of new ventures, with a notable influx of young, highly educated entrepreneurs returning to China [13][16].
- BlueRun Ventures has built a supportive ecosystem for entrepreneurs, including initiatives such as Booming Camp and Booming Hub, to foster collaboration and innovation [18][19].
- The firm believes the future of AI investment lies in early-stage opportunities and stresses independent thinking amid market noise [19][20].
Jensen Huang Takes the Stage in Person for NVIDIA's Favorite Coding Marvel: Cursor 2.0's Self-Developed Model Races 4x Faster
36Kr· 2025-10-30 07:33
Core Insights
- Cursor has launched its self-developed coding model, Composer, reported to be four times faster than comparable models and designed for low-latency intelligent coding tasks that complete in under 30 seconds [1][6][9].

Group 1: Product Features
- Composer reaches 200 tokens per second and supports up to eight intelligent agents running in parallel, using git worktrees or remote machines to prevent file conflicts [2][6].
- The update introduces a new code review feature that simplifies viewing changes across multiple files without switching back and forth [3].
- A voice mode enables voice-driven programming, alongside improvements to context-aware copy/paste prompts [5][6].

Group 2: Market Position and Strategy
- Cursor, valued at over $10 billion, has historically relied on external models such as Claude, which limited its innovation and profitability; Composer's release marks a strategic shift toward self-reliance in AI model development [6][22].
- The recent updates signal a move away from dependence on external models, with Composer tested against open-source alternatives rather than proprietary models like GPT and Claude [22][30].

Group 3: User Experience and Feedback
- Early testers report that Cursor 2.0 is significantly faster, generating results in seconds and improving the overall user experience [16][26].
- Some developers note that while Composer is fast, its intelligence may trail competitors such as Sonnet 4.5 and GPT-5, underscoring a competitive landscape in AI programming tools [30][34].