具身智能之心
Latest from CoRL 2025! ControlVLA: robots learn after watching just 10 times, another capability upgrade for the general-purpose robot "brain"!
具身智能之心· 2025-09-25 09:54
Core Insights
- The article presents ControlVLA, a novel framework that lets robots learn complex tasks from minimal human demonstrations, achieving a success rate above 75%, nearly four times that of traditional methods [1][10][15].

Group 1: Research Background
- Robots face significant challenges when performing tasks in real-world settings, especially with limited demonstrations. Existing few-shot learning methods often rely on simulation-augmented data or pre-built modules and struggle with the gap between simulation and reality [7][8].
- Recent Vision-Language-Action (VLA) models show promise in improving robot performance across tasks and environments, but efficiently adapting them to specific tasks under data scarcity remains a challenge [8][9].

Group 2: ControlVLA Framework
- ControlVLA integrates pre-trained VLA models with object-centric representations to enable efficient few-shot fine-tuning for robot manipulation tasks. A ControlNet-style architecture preserves the rich prior knowledge of the VLA model while focusing attention on task-critical objects [9][10].
- The ControlVLA workflow consists of three main steps:
  1. Pre-train a large-scale VLA model on diverse manipulation datasets to learn the conditional distribution from visual and language instructions to the action space [12].
  2. Extract object-centric representations from demonstration videos to capture the geometric and positional features of task-relevant objects [12].
  3. Fine-tune the model with a dual attention mechanism that injects object information while preserving the pre-trained policy [12].

Group 3: Experimental Results
- The research team tested ControlVLA on the Astribot S1 robot, showing that it can efficiently complete both short-horizon and complex long-horizon tasks with only 10-20 demonstrations [14][15].
- Across eight real-world tasks, ControlVLA achieved an overall success rate of 76.7%, far surpassing the 20.8% of traditional methods [15][19].
- On long-horizon tasks, ControlVLA maintained an average success rate of 60%, roughly three times that of the best existing methods, demonstrating its ability to curb error accumulation during task execution [19][24].

Group 4: Generalization and Cost Efficiency
- ControlVLA generalized robustly, maintaining a 60%-70% success rate on unseen objects and new backgrounds, indicating adaptability to dynamic environments [24][26].
- The framework substantially cuts the cost of collecting real manipulation demonstrations: it reached an 80% success rate on the OrganizeToy task with only 20 demonstrations, while other methods needed 100 to reach similar performance [21][26].
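The ControlNet-style fine-tuning idea described above (keep the pre-trained VLA policy's behavior intact at the start of fine-tuning, and inject object-centric features through a zero-initialized branch) can be sketched as follows. This is a minimal illustration, not the paper's code; the module names, shapes, and the simple mean-pooling of object tokens are all assumptions:

```python
import torch
import torch.nn as nn

class ObjectCentricAdapter(nn.Module):
    """ControlNet-style adapter sketch: object-centric features enter the
    frozen VLA backbone through a zero-initialized projection, so before
    any fine-tuning the adapted policy reproduces the pre-trained one."""

    def __init__(self, obj_dim: int, hidden_dim: int):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, hidden_dim)
        # Zero-initialized output projection: the adapter contributes
        # nothing at step 0, preserving the pre-trained VLA prior.
        self.zero_proj = nn.Linear(hidden_dim, hidden_dim)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, backbone_hidden: torch.Tensor,
                obj_tokens: torch.Tensor) -> torch.Tensor:
        # backbone_hidden: (B, T, H) features from the frozen VLA model
        # obj_tokens:      (B, K, obj_dim) object-centric features from demos
        obj_feat = self.obj_proj(obj_tokens).mean(dim=1, keepdim=True)
        return backbone_hidden + self.zero_proj(obj_feat)
```

Only the adapter (and, in the paper's setup, the object-conditioned attention) is trained on the 10-20 demonstrations, which is what makes such small data budgets workable.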
A look at how VLA is applied and implemented across different scenarios, drawn from 300+ papers......
具身智能之心· 2025-09-25 04:00
Core Insights
- The article discusses the emergence of Vision-Language-Action (VLA) models, marking a shift in robotics from traditional policy-based control to a more general robotic technology paradigm that enables active decision-making in complex environments [2][5][20].
- It emphasizes integrating large language models (LLMs) and vision-language models (VLMs) to enhance robotic operation, providing greater flexibility and precision in task execution [6][12].
- The survey lays out a clear classification of VLA methods into autoregressive, diffusion, reinforcement learning, hybrid, and specialized approaches, and addresses the distinct contributions and challenges within each category [7][10][22].

Group 1: VLA Model Overview
- VLA models represent a significant advance in robotics, unifying perception, language understanding, and executable control within a single modeling framework [15][20].
- The article categorizes VLA methods into five paradigms (autoregressive, diffusion, reinforcement learning, hybrid, and specialized), detailing their design motivations and core strategies [10][22][23].
- Integrating LLMs into VLA systems turns them from passive input parsers into semantic intermediaries, improving their ability to handle long, complex tasks [29][30].

Group 2: Applications and Challenges
- VLA models have practical applications across robot embodiments, including robotic arms, quadrupeds, humanoid robots, and autonomous vehicles, demonstrating deployment in diverse scenarios [8][20].
- The article identifies key challenges in the VLA field, such as data limitations, inference speed, and safety, which must be addressed to accelerate the development of VLA models and general robotic technology [8][19][20].
- High-quality datasets and simulation platforms are crucial for effectively training and evaluating VLA models, mitigating data scarcity and the risks of real-world testing [16][19].

Group 3: Future Directions
- The survey outlines future research directions for VLA, including addressing data limitations, improving inference speed, and strengthening safety measures to advance general embodied intelligence [8][20][21].
- It highlights the need for scalable, efficient VLA models that can adapt to varied tasks and environments, calling for continued innovation in this rapidly evolving field [20][39].
- The article concludes that VLA models can bridge perception, understanding, and action, positioning them as a key frontier in embodied artificial intelligence [20][21][39].
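The autoregressive and diffusion paradigms named above differ mainly in how actions are generated: one decodes an action sequence token by token, each step conditioned on what came before, while the other denoises an entire action chunk at once from Gaussian noise. A toy sketch of the two control flows (illustrative only; the `policy_step` and `denoiser` callables stand in for learned networks):

```python
import numpy as np

def autoregressive_decode(policy_step, horizon, start_token):
    """Autoregressive paradigm: emit one action at a time,
    conditioning each step on the sequence decoded so far."""
    seq = [start_token]
    for _ in range(horizon):
        seq.append(policy_step(seq))  # next action from the history
    return seq[1:]

def diffusion_decode(denoiser, horizon, action_dim, steps, rng):
    """Diffusion paradigm: sample a whole action chunk at once by
    iteratively denoising from Gaussian noise."""
    x = rng.standard_normal((horizon, action_dim))  # pure noise
    for t in reversed(range(steps)):
        x = denoiser(x, t)  # one denoising step toward the data manifold
    return x
```

The survey's hybrid paradigm combines elements of both, e.g. autoregressive high-level decoding over diffusion-generated low-level chunks.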
Personalized Real-to-Sim-to-Real navigation via 3DGS captured with mobile devices
具身智能之心· 2025-09-25 00:04
Group 1
- The core problem in embodied AI is the sim-to-real dilemma: high simulation fidelity conflicts with cost, making it hard to transfer policies that succeed in simulation to real-world applications [2].
- The potential of 3D Gaussian Splatting (GS) has been underexplored; recent advances enable high-fidelity 3D representations from standard devices, closing the gap between low-cost real-scene reconstruction and embodied navigation [3][4].
- The proposed method, EmbodiedSplat, is a four-stage pipeline that captures real scenes with low-cost mobile devices and reconstructs them in high fidelity for effective training and deployment [4][6].

Group 2
- Stage one captures RGB-D data with an iPhone 13 Pro Max and the Polycam app, simplifying the process and lowering operational barriers [11].
- Stage two performs mesh reconstruction, using DN-Splatter for 3D GS training to ensure geometric consistency and minimize reconstruction error [11][12].
- Stage three trains in simulation with a composite reward function that balances success, path efficiency, and collision avoidance, using a two-layer LSTM for decision-making [10][13].

Group 3
- Stage four is real-world deployment: the Stretch robot connects to a remote server for policy inference, navigating in real time from observed images [14][17].
- Experiments validate the zero-shot performance of pre-trained policies in new environments and reveal that scene scale significantly impacts performance [20][22].
- Fine-tuning pre-trained policies yields substantial performance gains across environments, demonstrating the effectiveness of personalized adaptation [25][28].

Group 4
- The study finds limited success for zero-shot transfer from simulation to the real world, with significant performance gaps observed [32].
- Fine-tuning improves the transferability of policies, with success rates rising markedly after adaptation [32][35].
- Large-scale pre-training is essential: it provides foundational knowledge that aids adaptation to new environments and helps overcome real-world challenges [35][44].
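The composite reward described in stage three can be illustrated with a small sketch. The weights below are placeholder assumptions for illustration, not values from the paper:

```python
def composite_reward(success: bool, progress: float, collided: bool,
                     w_success: float = 10.0, w_progress: float = 1.0,
                     w_collision: float = -0.5,
                     step_penalty: float = -0.01) -> float:
    """Balance the three objectives from the paper's description:
    task success, path efficiency, and collision avoidance.

    progress: reduction in distance to the goal achieved this step.
    """
    r = step_penalty              # small per-step cost favors short paths
    r += w_progress * progress    # reward moving closer to the goal
    if collided:
        r += w_collision          # penalize contact with obstacles
    if success:
        r += w_success            # large terminal bonus on reaching the goal
    return r
```

Shaping the reward this way lets the policy trade off reaching the goal quickly against the risk of collisions, rather than optimizing success alone.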
ARTS 2025 star lineup | Speakers announced for the 3rd Autonomous Robotic Technology Seminar; join us at Zhejiang University, registration now open!
具身智能之心· 2025-09-25 00:04
ARTS 2025

To promote exchange among front-line young scholars and engineers in autonomous robotics, foster collaboration between academia and industry, and build a deep, focused platform for technical exchange, the Chinese Association of Automation has decided to host the Autonomous Robotic Technology Seminar (ARTS).

ARTS advocates a pragmatic scientific spirit of rational criticism and willingness to question, and actively explores free and equal exchange of ideas. Its main topics include sensing and perception, autonomous navigation, state estimation, mobile-robot localization and mapping, motion planning, modeling and control, multi-robot systems, embodied intelligence, and medical and bionic robotics.

The third ARTS will open on October 18-19, 2025 at Zhejiang University. You are warmly invited to attend and to offer comments and suggestions on the organization of the conference!

This year's program features more than 40 invited young leading researchers, along with academic debates, an academic talk show, the ARTS scholarship, and company visits. Note: seats are limited; please register early before registration closes. For details on the offline meeting, scan the QR code to join the [ARTS 2025 discussion group].

1. Organization
Host: Chinese Association of Automation
Organizers: College of Control Science and Engineering, Zhejiang University; School of Automation and Sensing, Shanghai Jiao Tong University
Co-organizer: 深蓝学院

2. Agenda
9:20-9:50 ...
Embodied AI's "ImageNet moment": Fei-Fei Li's team announces a top global embodied-intelligence challenge
具身智能之心· 2025-09-25 00:04
Editor: 机器之心

In the history of computer vision, the ImageNet challenge was hailed as a watershed for AI, igniting the deep-learning wave. Will embodied intelligence and robotics see a similar "inflection point"?

The answer may be coming into focus. Fei-Fei Li's team and the Stanford AI Lab have officially announced that the first BEHAVIOR Challenge will land at NeurIPS 2025. It is a "super benchmark" tailor-made for embodied intelligence, covering the 1,000 most essential everyday tasks in real household settings (cooking, cleaning, tidying...), and for the first time takes 50 complete long-horizon tasks as its core track, testing whether robots can carry out operations that genuinely resemble human life in a photorealistic virtual environment.

Why is BEHAVIOR worth watching? Unlike earlier fragmented benchmarks, BEHAVIOR is the first to argue that a true household robot must simultaneously master cross-room navigation, bimanual fine manipulation, long-term planning, and dynamic adaptation. The task scale is unprecedented: 1,000 household activities and 50 complete long-horizon challenges, with a single task on average requiring ...
What we've been doing lately in embodied AI: community, hardware, and job hunting......
具身智能之心· 2025-09-25 00:04
Core Viewpoint
- The article describes ongoing efforts in embodied intelligence, focusing on community building, hardware development, and job opportunities in the sector [3][4][12].

Group 1: Community Development
- The community aims to be a comprehensive platform for technical exchange on embodied intelligence, hosting both academic and engineering discussions [14].
- It has established a closed loop across industry, academia, job seeking, and Q&A exchanges, providing solutions to problems members encounter [6].
- The community is actively improving its structure and content to better serve members, with enhanced offerings planned after the holiday [3].

Group 2: Hardware and Product Development
- Work is underway to address complaints about hardware cost and usability, with better solutions to be introduced soon [3].
- The community is testing and developing embodied products, aiming to give users effective platforms for their needs [3].

Group 3: Job Opportunities and Academic Guidance
- Numerous universities have inquired about recruitment in embodied intelligence, particularly for research assistants, PhD candidates, and postdoctoral positions [3].
- Members are encouraged to prepare for upcoming job and academic opportunities; the community accepts resume submissions for internal referrals [3][20].

Group 4: Educational Resources
- The community has compiled over 30 technical roadmaps for newcomers, significantly reducing the time needed for research and learning [6][8].
- Resources including open-source projects, datasets, and learning paths are available to support members' studies and career development [14][31][37].

Group 5: Industry Insights and Networking
- The community has established connections with many leading companies in the field, facilitating job referrals and industry insight [20][22].
- Members can engage with industry leaders through forums and live discussions, gaining knowledge about the latest developments in embodied intelligence [6][20].
具身智能之心 National Day & Mid-Autumn deals are here! Courses, community, hardware, paper tutoring, and more
具身智能之心· 2025-09-24 06:32
Group 1
- The article promotes a series of discounts and offers on embodied-intelligence courses and services, available from September 24 to October 12 [1][4][6].
- New members joining the knowledge community get a 30% discount, while existing members can renew at 50% off [1][4].
- Various courses, including VLA, VLN, and reinforcement learning, are offered at a 20% discount [2][4].

Group 2
- The Super Discount Card gives a 30% discount on all courses within one year [4][7].
- One-on-one paper tutoring carries a maximum discount of 5,000 yuan (for a 1,000-yuan fee), while group tutoring (1v6) offers a 1,000-yuan reduction [4][7].
- The article also highlights several research hardware options, including reinforcement-learning platforms and robotic arms [4][7].
We're setting up a casual embodied-AI gossip group!
具身智能之心· 2025-09-24 06:32
Recently 峰哥 received feedback from 具身智能之心 followers asking whether the community could have a less formal group (one that doesn't post articles and academic content every day) for daily chat about the industry, gossip, job hunting, and similar topics. Fair point: our existing groups are all quite academic, probably a reflection of our education-tech IP.

After some thought, a fun group seemed well worth having, so one was created right away (there is only enough bandwidth to maintain one, so it is capped at 500 members and will close to newcomers once full). This group will not repost any 具身智能之心 articles or livestream content; it is purely for industry, product, and academic discussion, and of course chatting about work, job hunting, and startups is welcome too.

If you're interested, add my WeChat oooops-life for an invitation. We hope you are currently working in the embodied-AI industry or actively engaged in related research. When adding on WeChat, remember to note: nickname + institution/company + join group.
Today's Talk is here! New infrastructure for embodied intelligence: from large models to the real world
具身智能之心· 2025-09-24 02:30
Core Viewpoint
- The article previews an upcoming event hosted by the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) on embodied intelligence and its new infrastructure, highlighting the field's importance to the AI industry [1].

Event Details
- The event, "AI 智源Talk", takes place on September 24, 2025, from 14:00 to 17:30 at BAAI [2].
- It is organized by BAAI and supported by organizations including Baidu PaddlePaddle and the China Internet Association's AI Committee [2].

Agenda Overview
- Opening remarks by a vice president of BAAI [3].
- A presentation on the innovation foundations of embodied intelligence by the head of embodied data [3].
- A session on the operating framework and construction of the embodied "brain" by the head of large models [5].
- An upgrade of the 智源 evaluation system, including release of the autumn 2025 leaderboard [5].
- A discussion of FlagScale's technical practice and value verification in embodied-intelligence scenarios [5].

Participation Information
- Attendees can register via QR code and join the WeChat group for further discussion of embodied intelligence [6].
[CEAIS 2025] Full program announced; early-bird registration in full swing!
具身智能之心· 2025-09-24 00:04
Core Viewpoint
- The article highlights the establishment and growth of the Artificial Intelligence Institute at Xi'an Jiaotong University, its role in advancing research on embodied intelligence and robotics, and the upcoming second China Embodied Intelligence and Systems Conference (CEAIS 2025) in Xi'an [2][4].

Group 1: Conference Overview
- CEAIS 2025 will take place on November 1, 2025 at the Xi'an Jianguo Hotel, focusing on cutting-edge research in embodied intelligence [4].
- The conference will feature over ten academicians and nearly a hundred senior experts, with four keynote speeches and fifteen technical sub-forums covering topics such as foundation models, intelligent robotics, and emotional embodied intelligence [4][8].

Group 2: Conference Schedule
- Registration opens on October 31, 2025, followed by a dinner and a meeting of the Embodied Intelligence Committee [7].
- The main conference day, November 1, includes an opening ceremony, keynote reports, and multiple technical forums on topics such as embodied-intelligence computing architecture, intelligent driving, and robot sensors [8][10].

Group 3: Technical Forums and Topics
- Key forum topics include the emergent capabilities of large models, advances in embodied-intelligence computing chips, and the development of humanoid and bionic robots [9][12].
- Specific reports will cover multimodal flexible tactile sensing, autonomous learning in robotic systems, and the integration of AI with traditional medicine [10][13].

Group 4: Registration and Participation
- Early-bird registration is 1,800 RMB for non-members and 1,200 RMB for members, with discounts for students [41].
- Participants are encouraged to register online and will receive a conference manual and invoice upon registration [42][44].