Reinforcement Learning
The Technical Development Roadmap of Embodied Intelligence, Seen Through Nearly 1,000 Papers
自动驾驶之心· 2025-09-07 23:34
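Edited by 具身智能之心
Whenever embodied intelligence comes up, many papers claim "breakthroughs" and "innovation", but few lay out the whole technical roadmap so that readers can clearly see how the field has developed, which problems it has run into, and where it is heading.
How does robotic manipulation let a robot arm precisely "imitate" humans? How does multimodal fusion make an agent feel "present" in its environment? How does reinforcement learning drive a system to evolve on its own? How do teleoperation and data collection break spatial limits? These key topics in embodied intelligence deserve a careful review.
Today we present several fairly comprehensive survey papers from the field and unpack how each of these directions has developed.
Robotic manipulation
Reference paper: The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey
Paper link: https://arxiv.org/abs/2507.11840
Author affiliation: 浙 ...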
Diffusion's Multimodal Trajectory Output Falls Short of Expectations: Can It Fill the VLA Role in Autonomous Driving?
自动驾驶之心· 2025-09-07 23:34
Core Viewpoint
- The article discusses the evolution and current state of autonomous driving paradigms, focusing on the transition from end-to-end systems to Vision-Language-Action (VLA) frameworks and the challenges faced in achieving effective multi-modal trajectory outputs [2][3][11].
Group 1: End-to-End Systems
- The end-to-end autonomous driving network directly maps raw sensor inputs to control commands, eliminating traditional processing steps and maximizing information retention [4].
- Iterative practices in engineering involve clustering bad cases and retraining models, but this often leads to new issues arising from updates [8].
- Tesla's "daily update model" offers a solution by continuously evolving the model through the integration of bad cases into training samples [9].
Group 2: Emergence of Dual Systems
- The introduction of large language models (LLMs) has led to the rapid adoption of the "end-to-end + VLM" dual system approach, which enhances generalization in zero-shot and few-shot scenarios [11].
- Early VLMs focused on recognizing specific semantics, and the EMMA architecture incorporates reasoning to assist in vehicle control [12].
Group 3: VLA and Diffusion Framework
- The VLA framework outputs driving commands that are processed by a diffusion decoder to generate safe and smooth vehicle trajectories [16].
- Current challenges in the VLA + diffusion architecture include subpar multi-modal trajectory outputs, the "brain split" issue between VLA and diffusion systems, and the quality of single-modal trajectories [18][19].
- The alignment of language and action (LA alignment) remains a critical challenge, as the practical value of language models in autonomous driving is still uncertain [19].
Group 4: Future Directions
- Future work should focus on scalable system solutions that leverage data advantages and enhance the capabilities of foundational models through reinforcement learning [20][22].
- The "generate + score" paradigm has proven effective in other domains, and the next steps involve optimizing trajectory quality through self-reflection mechanisms [22].
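The "generate + score" paradigm mentioned above can be illustrated with a toy sketch: sample several candidate trajectories, score each one, and keep the best. Everything here is an assumption made for illustration. The random sampler stands in for a learned generator such as a diffusion decoder, and the hand-written `score` function stands in for a learned scorer; neither is taken from the article.

```python
import random


def generate_candidates(num_candidates, horizon, rng):
    """Hypothetical generator: random lateral offsets per time step.

    A real system would sample these from a learned model instead.
    """
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(num_candidates)
    ]


def score(trajectory):
    """Hypothetical scorer: higher is better.

    Penalizes distance from the lane centre (offset 0) and
    step-to-step changes, as a crude proxy for smoothness.
    """
    deviation = sum(abs(x) for x in trajectory)
    roughness = sum(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))
    return -(deviation + roughness)


def generate_and_score(num_candidates=16, horizon=8, seed=0):
    """Generate candidates, score each, and return the top-scoring one."""
    rng = random.Random(seed)
    candidates = generate_candidates(num_candidates, horizon, rng)
    return max(candidates, key=score)


best = generate_and_score()
```

The design point is only the separation of concerns: the generator proposes diverse options, and a separate scoring step decides among them.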
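Tsinghua-Trained Former Tencent Robotics X Core Member Starts a Company; the Industry's First Dexterous Hand That Can "Use a Phone One-Handed" Arrives | Emerging New Projects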
36Kr · 2025-09-06 23:56
Core Insights
- SourceRise Intelligent Robotics has launched its first product, Apex Hand, which emphasizes balanced performance across multiple key indicators and is the first dexterous hand capable of single-handedly operating a smartphone [1][3]
- The company was founded by Yang Sicong, who has a strong academic background and experience in robotics, and has a team with extensive research and patent experience in dexterous hands and tactile perception [1][3]
Company Overview
- SourceRise Intelligent Robotics was founded by Yang Sicong, who graduated from Beijing University of Aeronautics and Astronautics and Tsinghua University, and has experience at Tencent Robotics X [1]
- The co-founder and CTO, Li Wangwei, holds a PhD from the National University of Singapore, and the founding team has published nearly 50 top-tier papers and holds over 100 patents in the field [1]
Product and Business
- Apex Hand features six core capabilities:
  - Degrees of freedom: 21, covering the necessary range for human hand tasks
  - Dynamic performance: Response and acceleration close to human levels
  - Load capacity: Approximately 2.5 kg fingertip force and a vertical lifting limit of about 30 kg
  - Robustness: Can withstand certain impacts and operate stably in unknown conditions
  - Precision: Accuracy of ≤0.1 mm, close to zero backlash transmission
  - Tactile perception: Achieved through self-developed electronic skin for physical world sensing [3][4]
- Apex Hand can perform various tasks, including pulse checking, mouse operation, and stable grasping of smooth objects in confined spaces [3][4]
Technical Advantages
- The company's main technical advantage lies in its full-stack development experience in both dexterous hands and tactile sensors, ensuring load capacity and safety in environmental interactions [3]
- SourceRise has pioneered a brain-like ultra-high temporal and spatial resolution tactile processing technology, featuring sub-millisecond communication delays and supporting simultaneous transmission from thousands of tactile points [4]
Market Potential
- The market for dexterous hands is projected to exceed $5 billion by 2030, indicating significant growth potential [16]
- The company believes that the current technological level in the industry still has considerable room for improvement, making it a favorable time for entrepreneurship in the dexterous hand sector [16]
Major Result in Science Robotics: With Just 2 Hours of Training, Robots' Flexible Assembly Skills Approach Top Human Level
机器人大讲堂· 2025-09-06 11:43
Core Insights
- The article discusses the challenges in robotic manipulation and introduces a new system called HIL-SERL that significantly improves the efficiency and effectiveness of robotic training in real-world scenarios [1][2].
Traditional Methods Challenges
- Traditional robotic control methods require extensive engineering design or imitation learning, which often lack adaptability and struggle in new environments [1].
- These methods fail to achieve human-level proficiency and speed, leading to inefficiencies in real-world applications [1].
HIL-SERL System Overview
- The HIL-SERL system developed by a research team at UC Berkeley allows robots to learn complex tasks with only 1 to 2.5 hours of real-world training, achieving near-perfect success rates and surpassing human execution speeds [2][3].
- The system combines human guidance with autonomous exploration, creating an efficient and safe learning loop [3].
System Architecture
- HIL-SERL consists of three core components: an actor process, a learner process, and a replay buffer integrated within the learner [4].
- It employs off-policy reinforcement learning techniques to optimize the policy by reusing historical data, allowing robots to learn from human demonstrations and assess the contribution of different actions towards achieving goals [4].
Performance in Multi-Task Scenarios
- The system was tested on challenging tasks such as precision assembly, dual-arm coordination, and dynamic manipulation, demonstrating its versatility [5][8].
- In precision assembly tasks, robots achieved sub-millimeter accuracy, while in dual-arm coordination tasks, they effectively managed complex operations requiring synchronized movements [8].
Results and Adaptability
- After 1 to 2.5 hours of training, robots achieved nearly 100% success rates and executed tasks 1.8 times faster than traditional imitation learning methods, which had an average success rate of 49.7% [9].
- The robots exhibited remarkable adaptability, successfully adjusting to unexpected situations, such as misalignments or disturbances, showcasing their ability to learn from real-time feedback [12].
Learning Mechanism
- HIL-SERL's adaptability stems from its ability to evolve different control strategies based on task requirements, allowing for real-time adjustments and corrections [13][16].
- For high-precision tasks, the system employs a closed-loop response strategy, while for dynamic tasks, it utilizes an open-loop predictive strategy, demonstrating a high level of confidence in executing planned actions [13].
Conclusion
- The research highlights the potential of HIL-SERL to overcome traditional reinforcement learning limitations, enabling efficient learning of complex skills in real-world environments [14].
- This advancement opens new avenues for industrial applications, particularly in flexible manufacturing sectors requiring small-batch production [14].
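The off-policy setup described under System Architecture, where a learner reuses stored data that includes human demonstrations, can be sketched roughly as follows. This is a minimal illustration, not HIL-SERL's actual code: the `ReplayBuffer` class, the two-buffer layout, and the half-and-half sampling ratio in `sample_mixed` are all assumptions chosen to keep the example small.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, n, rng):
        # Sample with replacement so small buffers can still fill a batch.
        return [self.data[rng.randrange(len(self.data))] for _ in range(n)]


def sample_mixed(demo_buffer, online_buffer, batch_size, rng):
    """Assumed scheme: half of each batch from demonstrations, the rest from online data."""
    half = batch_size // 2
    return demo_buffer.sample(half, rng) + online_buffer.sample(batch_size - half, rng)


rng = random.Random(0)
demo_buffer = ReplayBuffer(capacity=100)    # filled once from human demonstrations
online_buffer = ReplayBuffer(capacity=100)  # filled continuously by the actor process

for i in range(10):
    demo_buffer.add(("demo", i))
    online_buffer.add(("online", i))

# The learner process would draw batches like this for each gradient update.
batch = sample_mixed(demo_buffer, online_buffer, batch_size=8, rng=rng)
```

In a two-process design of this kind, the actor only appends transitions and the learner only samples them, which is what lets data collection on the robot and policy updates run concurrently.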
To Leapfrog a "Version", What Kind of "Environment" Does an Agent Need?
机器之心· 2025-09-06 07:00
Core Viewpoint
- The article discusses the recent transformation of AI startup you.com from a search engine to an AI infrastructure company following a $100 million Series C funding round. This shift aligns with the "product-driven infrastructure" strategy and reflects a broader trend of commercializing Agentic AI from laboratory settings [1].
Group 1: Agent Environment and Its Evolution
- The focus of artificial intelligence is shifting from content creation to goal-driven, autonomous Agentic AI, driven by rapid advancements in the field [4].
- AI agents are expected to become the new interface for human-computer interaction, allowing users to issue commands in natural language without needing to write code [5].
- Companies like Cursor, Bolt, and Mercor have achieved significant revenue growth by leveraging unique intelligent agent products [6].
Group 2: Development of Agent Environment
- The development of a suitable "Agent Environment" is crucial for modern intelligent applications, balancing the need for freedom in code execution with security and isolation [7].
- Companies like E2B and Modal Labs are providing secure, isolated cloud environments (sandboxes) for running AI-generated code [7].
- The concept of Agent Environment can be traced back to reinforcement learning, where it serves as a simulated space for training agents through trial and error [8].
Group 3: Real-World Application and Safety
- As LLM-based agents advance, the requirements for their environments are evolving from training spaces to operational zones, necessitating safe access to real-world tools [9].
- Different types of agents require distinct environments, such as physical environments for robots and digital environments for virtual assistants [10].
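In Depth | OpenAI Co-Founder: GPT-5's Breakthrough Is That Intelligence Is Starting to Reach Genuinely Deep Cognitive Territory; Ideally, Users Default to Our Automatic Selection Rather Than Manual Configuration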
Z Potentials· 2025-09-06 04:40
Core Insights
- OpenAI has released GPT-5 and GPT-OSS, marking significant advancements in AI technology and accessibility [4][3]
- GPT-5 is the first hybrid model, designed to enhance user experience by automatically selecting model architectures [5][6]
- The evolution of OpenAI's reasoning capabilities has transitioned from simple next-token prediction to more complex reasoning paradigms [9][10]
Group 1: OpenAI's Technological Advancements
- The release of GPT-5 and GPT-OSS has seen millions of downloads within days, showcasing the demand for these technologies [4]
- GPT-5's breakthrough lies in its ability to engage in deep cognitive tasks, surpassing the limitations of its predecessor, GPT-4 [24][25]
- The model's training has shifted from a one-time training approach to a more iterative reasoning-training cycle, enhancing its learning efficiency [9][10]
Group 2: Learning Mechanisms and Challenges
- OpenAI emphasizes the importance of real-world experience for models to develop generalization capabilities, highlighting the limitations of purely theoretical training [6][15]
- The company is exploring the potential of real-time online learning, aiming to allow models to adapt continuously during operation [10][11]
- Current bottlenecks in AI development are primarily related to computational power, which is essential for enhancing model capabilities [11][12]
Group 3: Future Directions and Applications
- OpenAI is focused on creating models that can assist in complex problem-solving, with applications in various fields, including mathematics and biology [25][22]
- The company aims to improve the integration of AI into real-world applications, ensuring that models can handle the complexities of diverse environments [27][30]
- OpenAI's vision includes making AI technology accessible to a broader audience, with plans for aggressive pricing strategies to enhance adoption [39][40]
After a Month of Silence, openPangu Performance Jumps 8%! Huawei's 1B Open-Source Model Is Here
机器之心· 2025-09-05 04:31
Core Viewpoint
- Huawei's openPangu Embedded-1B model represents a significant advancement in edge AI, enabling powerful AI capabilities on resource-constrained devices, thus paving the way for intelligent upgrades in various industries [1][5].
Group 1: Model Performance and Efficiency
- The openPangu Embedded-1B model, with 1 billion parameters, achieves a new state-of-the-art (SOTA) record in performance and efficiency, demonstrating that smaller models can deliver substantial capabilities [2][3].
- The model's overall average score reached 63.90, surpassing similar models and matching larger models like Qwen3-1.7B, showcasing its parameter efficiency [3][4].
- In mathematical reasoning, the model scored 82.76% on the GSM8K benchmark and 81.83% on the MATH dataset, significantly outperforming its peers [3][4].
Group 2: Technical Innovations
- The model employs a software-hardware co-design, optimizing its architecture to align with the characteristics of Ascend hardware, ensuring efficient resource utilization [9][10].
- A two-stage curriculum learning approach is utilized to enhance the model's reasoning capabilities, simulating a human-like learning process [15][16].
- The introduction of offline On-Policy knowledge distillation allows for a more flexible and effective training process, improving the model's accuracy and generalization [18][19].
Group 3: Reinforcement Learning and Future Directions
- The model incorporates a multi-source reward reinforcement learning mechanism, enhancing its performance through targeted feedback based on task complexity [22][25].
- Future developments aim to integrate fast and slow thinking processes within a single model, allowing for adaptive responses based on problem difficulty, thus improving both speed and accuracy [29][30].
The Technical Development Roadmap of Embodied Intelligence, Seen Through Nearly 1,000 Papers
具身智能之心· 2025-09-05 00:45
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, emphasizing the need for a comprehensive understanding of its development, issues faced, and future directions [3][4].
Group 1: Robotic Manipulation
- The survey on robotic manipulation highlights the transition from mechanical programming to embodied intelligence, focusing on the evolution from simple grippers to dexterous multi-fingered hands [5][6].
- Key challenges in dexterous manipulation include data collection methods such as simulation, human demonstration, and teleoperation, as well as skill learning frameworks like imitation learning and reinforcement learning [5][6].
Group 2: Navigation and Manipulation
- The discussion on robotic navigation emphasizes the importance of physics simulators in addressing high costs and data scarcity in real-world training, with a focus on the Sim-to-Real transfer challenges [9][15].
- The evolution of navigation techniques is outlined, transitioning from explicit memory to implicit memory, and the role of various simulators in narrowing the Sim-to-Real gap is analyzed [15][16].
Group 3: Multimodal Large Models
- The exploration of embodied multimodal large models (EMLMs) reveals their potential to bridge perception, cognition, and action gaps, driven by advancements in large model technologies [17][19].
- Challenges identified include cross-modal alignment difficulties, high computational resource demands, and weak domain generalization [19].
Group 4: Teleoperation and Data Collection
- The survey on teleoperation of humanoid robots discusses the integration of human cognition with robotic capabilities, particularly in hazardous environments, while addressing challenges such as high degrees of freedom and communication limitations [29][30].
- Key components of teleoperation systems include human state measurement, motion retargeting, and multimodal feedback mechanisms [30][33].
Group 5: Vision-Language-Action Models
- The analysis of Vision-Language-Action (VLA) models covers their evolution from cross-modal learning architectures to the integration of visual language models and action planners [33][36].
- The article identifies core challenges in real-time control, multimodal action representation, and system scalability, while proposing future directions for adaptive AI and cross-entity generalization [36][41].
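GPT-5 Criticized as Overhyped and Underperforming; OpenAI Co-Founder Explains Why: We Kept It in an "Ivory Tower" With Too Little Contact With the Real World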
AI前线· 2025-09-04 06:30
Core Insights
- OpenAI is shifting its focus from consumer markets to enterprise markets with the launch of GPT-5, despite initial setbacks in its release [2][5]
- GPT-5 has received positive feedback from enterprise users, indicating its potential in the corporate sector [5][24]
- The pricing strategy for GPT-5 is competitive, with significant reductions in costs over time, making it more accessible for businesses [34][35]
Summary by Sections
OpenAI's Market Shift
- Sam Altman aims to capitalize on the enterprise market with GPT-5, moving beyond the consumer-focused ChatGPT [2]
- Initial criticisms of GPT-5 led to a temporary rollback to GPT-4 for paid users, but the model is designed for enterprise applications [2][5]
Enterprise Adoption
- Companies like Cursor, Vercel, and Factory have adopted GPT-5 as their default model, citing improvements in speed, performance, and cost [2][3]
- Box's CEO described GPT-5 as a breakthrough in reasoning capabilities, surpassing previous systems [3]
- JetBrains has integrated GPT-5 into its AI Assistant, highlighting its efficiency in generating tools quickly [3][4]
Technical Developments
- OpenAI's Greg Brockman discussed the evolution of reasoning in AI models, emphasizing the importance of reinforcement learning for reliability [8][10]
- The transition from offline to online learning is noted as a significant shift in AI training methodologies [10][12]
Cost Efficiency
- OpenAI has achieved a 1000-fold reduction in model costs over two and a half years, enhancing accessibility for users [34][35]
- The company continues to focus on improving computational efficiency and model architecture to further reduce costs [35]
Future Directions
- The potential for GPT-5 to serve as a collaborative partner in research and development is highlighted, with implications for various fields including mathematics and biology [22][21]
- OpenAI is exploring the integration of AI models into real-world applications, aiming to enhance productivity and problem-solving capabilities [24][40]
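To put the 1000-fold cost reduction over two and a half years in perspective, it can be converted to an equivalent yearly factor. The calculation below assumes the reduction compounded at a constant geometric rate, which the summary does not state; the helper name `annualized_factor` is made up for this example.

```python
def annualized_factor(total_factor, years):
    """Equivalent constant per-year factor for a total change over `years` years."""
    return total_factor ** (1.0 / years)


# 1000x over 2.5 years is 1000 ** (1 / 2.5) = 10 ** 1.2 per year.
yearly = annualized_factor(1000.0, 2.5)
remaining_share = 1.0 / yearly  # fraction of the previous year's cost still paid

print(f"about {yearly:.2f}x cheaper per year")
print(f"each year costs about {remaining_share:.1%} of the year before")
```

Under that assumption the yearly factor comes out a little under 16x, meaning each year's cost is roughly 6% of the previous year's.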
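With the Spirit of Young Tech Innovators as a Bridge, the SXSW ("South by Southwest") Tech and Arts Festival Sends an Invitation to the 2025 Bund Conference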
Jing Ji Guan Cha Wang· 2025-09-04 04:50
Core Insights
- The 2025 Bund Conference will take place from September 10 to 13 in Shanghai, showcasing cutting-edge technology and cross-disciplinary creativity, attracting global attention [1]
- The conference has received a special video message from SXSW, highlighting the inspiration drawn from the Bund Conference and the vibrant creativity of China's younger generation [1][2]
- The Bund Conference aims to connect advanced technology with everyday life, featuring various interactive exhibits and discussions on significant scientific topics [3][4]
Group 1: Event Overview
- The Bund Conference is recognized as a high-level global fintech and frontier technology event, with the 2025 theme being "Reshaping Innovation Growth" [4]
- The event will include a main forum, over 40 open insight forums, a global theme day, 18 Creator innovation stages, nearly 10,000 square meters of technology exhibitions, a technology market, and a technology innovation competition [4]
Group 2: Participation and Engagement
- Nearly 20,000 young tech talents from over 10 countries, including China, the US, the UK, and Australia, have registered to participate in various activities such as the Innovators Stage and AI Innovation Competition [2]
- The conference will feature prominent figures in AI, including Turing Award winner Richard Sutton and other leading young AI innovators [2]
Group 3: Technological Innovations
- The technology exhibition will demonstrate practical applications of advanced medical services, such as health management tools and interactive robotics [3]
- The conference emphasizes that technology impacts daily life and creative expression, aligning with the views expressed by SXSW's Neil Minocha [3]