具身智能之心

Embodied VLA Post-Training: TeleAI Proposes a Latent-Space-Guided Cross-Embodiment Generalization Method for VLA
具身智能之心· 2025-09-09 00:03
Editor | 机器之心. Built on multimodal foundation models, Visual-Language-Action (VLA) models are pretrained on large amounts of robot manipulation data and hold the promise of general-purpose embodied manipulation. However, today's VLA base models still fall well short of this goal: applying them to a target scenario requires collecting tens or even hundreds of hours of target-embodiment data for post-training. In particular, when the target embodiment differs from the pretraining embodiments, the action distributions of the pretraining and post-training stages become severely mismatched, creating the cross-embodiment adaptation (Cross-Embodiment Adaptation) challenge for VLA models. Countering this mismatch by simply stacking more target-embodiment data during post-training yields rapidly diminishing marginal returns and still struggles to fit the target scenario's action distribution. To address this problem, the embodied-intelligence team at TeleAI (the Institute of Artificial Intelligence of China Telecom) proposes an "Align then Steer" (ATE) cross-embodiment generalization framework for VLA ...
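The excerpt cuts off before describing ATE's actual mechanism, but the problem it targets, target-embodiment action statistics that differ from the pretraining embodiment's, can be illustrated with a toy sketch. Everything below (the function names, the per-dimension affine alignment, the Gaussian assumption) is a hypothetical illustration of the distribution-mismatch idea only, not TeleAI's method, which works in a learned latent space rather than raw action space:

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def fit_affine_alignment(actions_src, actions_tgt):
    """Fit a per-dimension affine map that moves target action statistics onto source statistics."""
    mu_s, sd_s = actions_src.mean(0), actions_src.std(0)
    mu_t, sd_t = actions_tgt.mean(0), actions_tgt.std(0)
    scale = sd_s / sd_t
    shift = mu_s - scale * mu_t
    return lambda a: a * scale + shift

rng = np.random.default_rng(0)
src = rng.normal(0.0, 0.1, size=(1000, 7))  # pretraining embodiment: 7-DoF action deltas
tgt = rng.normal(0.3, 0.5, size=(1000, 7))  # target embodiment: shifted and rescaled actions

align = fit_affine_alignment(src, tgt)
tgt_aligned = align(tgt)

# Mismatch before vs. after alignment, measured against the source distribution.
kl_before = gaussian_kl(tgt.mean(0), tgt.var(0), src.mean(0), src.var(0))
kl_after = gaussian_kl(tgt_aligned.mean(0), tgt_aligned.var(0), src.mean(0), src.var(0))
```

In practice a learned latent space plays the role of this affine map, but the before/after divergence contrast is the same intuition: post-training data is first brought into the distribution the pretrained policy already models, then used to steer it.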
Small-group (1-on-6) research-paper tutoring for the VLA direction is here~
具身智能之心· 2025-09-09 00:03
VLA research background and introduction. VLA (Vision-Language-Action) models are a new paradigm in embodied intelligence: given a language instruction and visual input, they directly generate actions that a robot can execute. This paradigm breaks the old limitation of training a large model for a single task, and pushes robot models toward greater generality and broader scene generalization. VLA's importance in both academia and industry lies in how it integrates visual information, language instructions, and action decisions, markedly improving robots' ability to understand and adapt to complex environments.

By moving beyond the single-task limits of traditional methods, VLA lets robots make decisions autonomously across diverse scenes and flexibly handle unseen environments, with broad applications in manufacturing, logistics, and home services. It has also become a research hotspot, driving frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA and fostering collaboration between academia and industry. Its adaptability spans platforms from robot arms to quadrupeds and humanoid robots, giving it wide potential and practical value across intelligent robotics and making it a key driving force in the field.

From an industry perspective, embodied intelligence is booming both at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab toward commercialization, while tech giants like Huawei, JD.com, and Tencent are also ...
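As a minimal illustration of the VLA interface described above, a language instruction plus a camera frame mapped directly to an executable action, here is a toy sketch. The class names, the 7-dimensional action vector, and the placeholder policy are all assumptions for illustration, not the API of any real model:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    image: np.ndarray   # H x W x 3 camera frame
    instruction: str    # natural-language command

class ToyVLAPolicy:
    """Hypothetical stand-in for a VLA model: (image, instruction) -> action.

    A real VLA (e.g. OpenVLA or pi0) would run a vision-language backbone and
    decode continuous or tokenized actions; here we return a fixed-size
    end-effector delta purely to show the interface shape.
    """
    ACTION_DIM = 7  # e.g. 6-DoF end-effector delta + 1 gripper command

    def predict(self, obs: Observation) -> np.ndarray:
        assert obs.image.ndim == 3 and obs.instruction
        return np.zeros(self.ACTION_DIM)  # placeholder action

policy = ToyVLAPolicy()
obs = Observation(image=np.zeros((224, 224, 3)), instruction="pick up the cup")
action = policy.predict(obs)
```

The key property of the paradigm is visible even in this sketch: a single `predict` call consumes both modalities at once, so swapping the instruction changes behavior without retraining a task-specific model.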
It took a long time to put together this embodied-AI learning roadmap......
具身智能之心· 2025-09-08 04:00
Last Saturday I went to Yangpu to meet a friend who leads the algorithm team at a top embodied-AI company. We didn't talk tech; we talked about the news of Unitree's listing, and he vented about teammates not being professional enough, leaving him to clean up a lot of problems.

Since I've been running embodied-AI media for a while, I helped him break down the practitioners we currently see. They fall into a few groups: people from autonomous driving, people from large models (little hardware exposure), people from traditional robotics (little algorithm exposure), plus students from mechanical engineering, communications, and other backgrounds without a complete technical loop. It's a new direction: the industry is moving fast, but training hasn't kept up, so many newcomers learn in ad-hoc ways. That's not really their fault; many advisors haven't pivoted that quickly either.

The root cause is the lack of a systematic training pipeline, which leaves the field short on both the quantity and quality of talent. We've already laid out learning roadmaps for many embodied-AI subfields inside our community; work through them and they will help you become a practitioner who genuinely understands embodied AI. If you're not yet a member, you're welcome to join and exchange ideas with members from nearly 200 embodied-AI companies and institutions.

The "具身智能之心知识星球" is the embodied community we've been maintaining: videos + articles + learning roadmaps + Q&A + job-hunting exchange all in one, with nearly 2,000 members. We aim to reach roughly 10,000 members within two years, building a hub for exchange and technical sharing that both beginners and advanced learners visit often. Inside the community we regularly ...
IROS 2025 | Toward Physical Intelligence: the "桃源" and Real-World Robot Learning Challenge Launches
具身智能之心· 2025-09-08 00:03
Core Viewpoint
- The Shanghai Artificial Intelligence Laboratory is hosting a multimodal robot learning workshop during the IROS 2025 conference, aiming to bridge the gap between simulation and real-world applications in embodied intelligence [1][19].

Event Overview
- The "IROS 2025" event includes a real-world robot learning challenge, inviting global innovators to participate [1].
- The challenge features two main tracks focused on operational and navigation tasks for embodied intelligence [1].

Challenge Details
- Track 1 involves creating a multimodal robotic operating system capable of understanding and executing language commands in an open desktop environment [6].
- Track 2 focuses on developing a multimodal mobile robot navigation system that can interpret language instructions and navigate in real physical environments [9].

Rewards and Recognition
- The winning team will receive a cash prize of 70,000 RMB, with opportunities to showcase their algorithms at the IROS Workshop and engage with top experts in the field [2][18].
- The total prize pool is worth nearly 1 million RMB, including cash, prizes, and robot vouchers [18].

Schedule
- Key dates: registration opens July 25, the testing server launches July 30, submissions close September 30, the offline challenge runs October 18, and the award ceremony follows on October 20 [18].
The Role and Performance Evaluation of Embodiment in Intuitive Whole-Body Teleoperation of Mobile Manipulation Robots
具身智能之心· 2025-09-08 00:03
Core Insights
- The article explores teleoperation of mobile manipulation robots, emphasizing the need for high-quality datasets in dynamic environments, which are currently lacking [3][4].
- It aims to balance three key factors in long-term manipulation tasks: embodiment, cognitive load, and task efficiency [3].

Research Background
- Existing datasets primarily cover fixed-base robotic arms, limiting their applicability to stable workspaces [3].
- Mobility introduces additional complexity, increasing the cognitive load on operators and necessitating effective feedback mechanisms [3].

Related Work Review
- Previous research has mainly optimized short-term tasks, neglecting long-term manipulation scenarios [4].
- This study differentiates itself by evaluating the combined effects of control paradigms and feedback modalities on operator experience in high cognitive-demand tasks [4].

Teleoperation System Design
- The system uses the PAL Tiago++ robot and HTC Vive Pro VR equipment, testing four interface combinations [5].

Controller Embodiment Schemes
- Decoupled embodiment controller (SBC): allows independent control of base and arm movements [6].
- Coupled embodiment controller (WBC): integrates full-body control with a focus on task-space dynamics [6].

Feedback Modalities
- The study examines how operators perceive the robot's view through different feedback modalities, including immersive VR and traditional screens [7].

User Research Design
- A mixed design quantifies the impact of different interface combinations on operator performance and experience [9].

Assessment Metrics
- Metrics include usability, workload, performance, and ergonomics, covering both task performance and operator experience [15].

Key Findings
- Feedback modality and controller type significantly affect task completion time, with VR increasing completion time by 142 seconds [19].
- Success rates remained high across conditions, indicating that VR does not compromise task quality despite longer completion times [19].
- Usability scores were lower in VR, with SBC showing slightly better usability than WBC [20][22].
- Workload was notably higher in VR, with SBC leading to greater physical demand and WBC causing more frustration [23].
- Ergonomic assessments indicated moderate risk during long-term operations, with WBC showing greater variability in physical demand [26].

VR-Specific Analysis
- SBC users relied more on head-camera perspectives in VR, while VR-induced dizziness was noted in real scenarios [32].
The 具身智能之心 teleoperation technical exchange group is here!
具身智能之心· 2025-09-08 00:03
The 具身智能之心 teleoperation technical exchange group is here! Students working in related directions are welcome to join and exchange ideas. Add the assistant on WeChat (AIDriver005) with a note of your nickname + institution + teleoperation to be added to the group promptly. ...
具身智能之心 back-to-school deals! This year's are a bit different......
具身智能之心· 2025-09-08 00:03
On that basis, if you're interested, feel free to dive in; there is real room to make a mark across data, robot hardware, and algorithms. Because embodied AI has developed over such a short time, many students lack a system and a roadmap. To address this pain point, 具身智能之心 has developed several tutorials for the embodied domain, along with a solid embodied community for learning.

Big back-to-school discounts (Sept 1-14):
- 299 RMB limited-time super discount card: 30% off all courses
- Embodied courses at 30% off (one-year term)
- 具身智能之心知识星球: 66 RMB off
- Embodied-AI paper tutoring: up to 10,000 RMB in deductions

Course and community highlights. September's back-to-school season is here again, and the labs are starting to fill with the sound of keyboards. I still remember campus nights spent debugging a robot car; in the blink of an eye, robot algorithms have gone from traditional pipeline approaches to end-to-end. Recently a student about to start his recommended-admission master's reached out to 峰哥: his advisor told him to explore embodied intelligence on his own, and he wasn't sure what to make of the field......

Technically, embodied algorithms bring a further step up in holistic perception. Take grasping: early approaches first handled perception with pose estimation and 3D vision, then moved to planning and execution, and finally the grasp itself; the process was cumbersome and generalized poorly. Today's VLA or VA approaches instead let robots move fluidly through learning. There is no shortage of ...
ByteDance's latest Robix! An all-in-one large model: a single model handles robot reasoning, task planning, and interaction
具身智能之心· 2025-09-08 00:03
Core Viewpoint
- The article discusses Robix, a unified visual-language model by ByteDance, aimed at addressing the limitations of existing hierarchical robotic systems in understanding and executing tasks in dynamic environments [2][3][4].

Group 1: Problem Identification
- Current hierarchical robotic systems suffer from capability fragmentation: they rely heavily on large language models (LLMs) or visual-language models (VLMs) for task decomposition, neglecting human-robot interaction and embodied reasoning [3][4].
- Modular interaction and planning frameworks are rigid and lack robustness, making it difficult for robots to adapt to real-time environmental changes [3][4].

Group 2: Proposed Solution
- Robix serves as the high-level cognitive hub in hierarchical robotic systems, integrating 3D spatial understanding and visual localization to enhance task planning and human interaction [2][5].
- The model employs a three-stage training strategy (continuous pre-training, supervised fine-tuning, and reinforcement learning) to systematically improve its capabilities [5][13].

Group 3: Key Contributions
- Robix introduces a unified high-level cognitive model that integrates reasoning, long-term task planning, and natural-language interaction within an end-to-end framework [5][6].
- Extensive experiments demonstrate Robix's performance advantages over commercial baselines such as GPT-4o and Gemini 2.5 Pro across various dimensions [5][24].

Group 4: Architecture and Mechanism
- Robix operates at the high-level cognitive layer, handling multimodal reasoning, adaptive task planning, and human-robot interaction, while lower-level controllers execute the atomic action commands it generates [7][8].
- Its outputs include atomic action commands, natural-language responses, and structured reasoning trajectories that guide decision-making [11][12].

Group 5: Training Strategy
- Training draws on a comprehensive dataset of 200 billion tokens, focusing on embodied reasoning, visual localization, and task-centric reasoning [13][14].
- The supervised fine-tuning phase adapts the pre-trained model for high-level cognitive tasks, covering diverse human-robot interaction scenarios and high-quality reasoning trajectories [17][18].

Group 6: Performance Evaluation
- Robix outperforms existing models on basic embodied reasoning, offline task planning, and online real-world scenarios, showing significant accuracy improvements [22][24][27].
- In online evaluations, Robix achieves an average task progress of 92.6%, surpassing Gemini-2.5-Pro while demonstrating lower response latency [29][32].

Group 7: Future Directions
- Future work will focus on robustness in dynamic environments and long-term memory to support complex, extended tasks in real-world settings [36][38].
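The high-level/low-level split the summary describes, a cognitive model emitting atomic action commands that a lower-level controller executes, can be sketched as a simple loop. All names and the fixed three-step plan below are hypothetical stand-ins; Robix itself produces such commands via multimodal reasoning rather than a script:

```python
from typing import List

class ToyHighLevelPlanner:
    """Hypothetical stand-in for a high-level cognitive model like Robix.

    Decomposes a task into atomic action commands for a low-level controller,
    mirroring the article's description of Robix's outputs (atomic actions,
    language responses, reasoning trajectories).
    """
    def plan(self, task: str) -> List[str]:
        # A real model would reason over multimodal input; this is a fixed script.
        return [f"locate({task})", f"grasp({task})", f"place({task}, table)"]

class ToyLowLevelController:
    """Stand-in for the lower-level controller (e.g. a VLA policy)."""
    def execute(self, command: str) -> bool:
        # A real controller turns each atomic command into motor actions;
        # here we just accept any well-formed command string.
        return command.endswith(")")

planner = ToyHighLevelPlanner()
controller = ToyLowLevelController()
results = [controller.execute(cmd) for cmd in planner.plan("cup")]
```

The design point this sketch captures is the division of labor: the planner never emits motor torques, and the controller never reasons about the task, so either layer can be swapped independently.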
[Referral available] Galaxy General Robotics' 2026 campus recruitment is in full swing
具身智能之心· 2025-09-07 01:39
Core Viewpoint
- The article emphasizes the rapid advancement of technology, particularly in embodied AI models, which are expected to lead to the widespread adoption of humanoid robots across various industries, showcasing a promising future for general intelligence [2].

Company Overview
- Galaxy General Robotics is a leading global company in the field of embodied intelligent models, established in May 2023, focusing on providing general robotic products for users worldwide, with applications in commercial, industrial, and medical sectors [4][5].
- The company has research centers in Beijing, Shenzhen, Suzhou, and Hong Kong, and has established joint laboratories with institutions such as Peking University and Xuanwu Hospital [5].

Product Development Timeline
- May 2023: Company established [6].
- June 2024: Launch of the first self-developed embodied-model robot, Galbot G1 [6].
- January 2025: Release of the world's first end-to-end embodied grasping foundation model, GraspVLA [6].
- May 2025: Announcement of the world's first humanoid-robot smart retail solution, securing orders from 100 stores [6].
- June 2025: Introduction of the product-level end-to-end embodied navigation model TrackVLA and the first end-to-end embodied model for the retail industry, GroceryVLA [7].
- August 2025: Launch of Galbot G1 Premium, the first robot equipped with NVIDIA's latest Jetson Thor chip, and victory in the sorting-skills competition at the World Humanoid Robot Games [7].

Recruitment and Opportunities
- The company is seeking graduates from domestic and international universities for various positions, emphasizing a passion for learning, exploration, and practical application [9].
- Available positions include roles in embodied multimodal models, humanoid reinforcement-learning control, robot planning and control, and embodied software-system development [10].