Bridging the Gap Between Simulation and Real Data: A Survey of Key Real2Sim2Real Work!
具身智能之心· 2025-09-24 00:04
Editor: 具身智能之心. All content comes from 具身智能之心知识星球, China's first full-stack embodied intelligence learning community, where nearly 2,000 members discuss the embodied-AI industry and research.

Real2Sim2Real work from the past three years at a glance:

Paper: Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators
Link: https://arxiv.org/pdf/2409.13228?
Venue: ICRA 2025
Affiliation: Max Planck Institute for Intelligent Systems

Paper: RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning

Because real-world data collection is costly, many teams at home and abroad are working on Real2Sim and Real2Sim2Real. Unlike some embodied-AI companies that are firmly committed to real-robot data collection, these teams believe ...
Westlake University Releases the World Model WorldForge, Turning Ordinary Video Models into "World Engines"
具身智能之心· 2025-09-24 00:04
Core Viewpoint
- The article discusses advances in AI video generation, focusing on the WorldForge framework developed by the Westlake University AGI Lab, which allows precise control over video generation without sacrificing quality or retraining models [2][3][32].

Summary by Sections

Introduction to AI Video Generation
- Since the introduction of Sora, the realism of AI-generated videos has improved significantly, but controllability remains a challenge [2].
- Current methods either require expensive fine-tuning or degrade quality because of noise and artifacts in the guiding signals [2].

WorldForge Framework
- WorldForge is a new framework that enables precise control during video generation without modifying model weights, effectively adding a "director's brain" to video diffusion models [3][32].
- The framework can generate 360° videos from a single image and reframe videos with complex camera movements [6][21].

Method Overview
- The framework operates on a training-free guidance principle, injecting "spatiotemporal geometry" during inference [12].
- It employs a series of guiding modules to enforce spatial and temporal consistency while preserving creative freedom [13].

Key Innovations
1. **Intra-step Recursive Refinement (IRR)**: ensures that generated motion strictly follows predefined camera trajectories by incrementally correcting predictions with real content [15].
2. **Flow-Gated Latent Fusion (FLF)**: separates motion and appearance channels in the latent space, sending control signals only to motion channels while preserving detail in appearance channels [16].
3. **Dual-Path Self-Correction Guidance (DSG)**: balances trajectory accuracy and image quality by dynamically adjusting the guiding signals based on differences between guided and non-guided paths [17].

Performance Highlights
- WorldForge excels at generating 360° panoramic views from a single image, overcoming the limitations of traditional panorama methods [21].
- It enables cinematic-level video reframing: users can specify complex camera movements while maintaining stability and reducing artifacts [23].
- The framework supports video editing, such as stabilizing footage, removing unwanted objects, and seamlessly integrating new elements [29].

Advantages of WorldForge
- Its training-free nature significantly lowers the barrier to creating high-quality 3D/4D visual content for film, gaming, and digital-twin applications [32][34].
- It can be integrated into various mainstream video models without targeted retraining, showing strong generalization across domains [34].
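WorldForge's actual modules are not reproduced in this summary, so here is only a generic, toy illustration of the training-free, channel-gated guidance idea loosely analogous to FLF: during sampling, the latent is blended toward a guide signal on "motion" channels only, leaving "appearance" channels untouched. Every name, the stand-in denoiser, and the blend rule are assumptions for illustration, not the framework's real API.

```python
import random

random.seed(0)

def toy_denoise_step(x, t):
    # Stand-in for one diffusion denoising step: shrink the value toward zero.
    return x * (1.0 - 1.0 / t)

def guided_sampling(guide, motion_mask, steps=10, strength=0.5):
    """Blend latent channels toward `guide`, but only where motion_mask is True."""
    latent = [random.gauss(0, 1) for _ in guide]  # start from pure noise
    for t in range(steps, 0, -1):
        latent = [toy_denoise_step(x, t + 1) for x in latent]
        # Training-free guidance: inject the guide signal on motion channels only.
        latent = [
            (1 - strength) * x + strength * g if m else x
            for x, g, m in zip(latent, guide, motion_mask)
        ]
    return latent

guide = [1.0] * 8                       # desired trajectory signal (toy)
mask = [True] * 4 + [False] * 4         # first four channels act as "motion"
out = guided_sampling(guide, mask)
```

After sampling, the masked channels end up close to the guide while the unmasked ones are unaffected by it, which is the gist of steering a frozen model without touching its weights.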
Whenever Someone Asks for an Embodied-AI Beginner's Roadmap, I Always Recommend This Complete Set of Tutorials
具身智能之心· 2025-09-24 00:04
Core Insights
- The article discusses the evolution and components of embodied intelligence, focusing on the "brain" and "cerebellum" roles in robotics: the brain handles perception and planning, while the cerebellum is responsible for execution [3][12].

Technical Development
- Embodied intelligence has progressed through several stages: from grasp pose detection to behavior cloning, and now to diffusion policies and VLA models, marking a shift from low-level perception to high-level understanding and generalization [7][12].
- The first stage focused on grasp pose detection using point clouds or images for static object manipulation, but lacked the context modeling needed for complex tasks [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations, but struggled with generalization and multi-target scenarios [7].
- The third stage, marked by the introduction of diffusion policy methods, improves stability and generalization by modeling entire action trajectories [8].
- The fourth stage, emerging in 2025, explores integrating VLA models with reinforcement learning and world models to overcome limitations in feedback and future prediction [10].

Subfields and Applications
- Subfields within embodied intelligence include simulation, VLA, diffusion policy, and world models, with VLA and world models currently gaining traction in autonomous driving and embodied applications [5][6].
- Integrating tactile sensing and reinforcement learning with VLA models is expected to improve robots' capabilities in complex environments [10].

Industry Impact
- These advances are producing humanoid robots, robotic arms, and quadrupedal robots for manufacturing, home automation, food service, and healthcare [12].
- Demand for engineering and systems capabilities is rising as embodied intelligence transitions from research to deployment [17].

Educational Initiatives
- The article promotes a comprehensive curriculum covering the full spectrum of embodied intelligence algorithms, catering to both beginners and advanced learners [14][18].
- The course aims to equip participants with practical skills in simulation, model training, and the application of various embodied-intelligence techniques [17][25].
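The behavior-cloning stage described above — learning a policy from expert demonstrations by supervised regression — can be sketched minimally. This is a toy linear policy fit by gradient descent on synthetic demonstrations, not any specific paper's method; the expert rule and all names are made up for illustration.

```python
import random

random.seed(1)

# Toy "expert": the demonstrated action is a fixed function of the state.
states = [random.uniform(-1, 1) for _ in range(64)]
demos = [(s, 2.0 * s + 1.0) for s in states]   # (state, expert action) pairs

# Behavior cloning = supervised regression: fit a_hat = w*s + b to the demos
# by minimizing mean squared error with plain gradient descent.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    gw = gb = 0.0
    for s, a in demos:
        err = (w * s + b) - a              # prediction error on one demo
        gw += 2 * err * s / len(demos)     # d(MSE)/dw
        gb += 2 * err / len(demos)         # d(MSE)/db
    w -= lr * gw
    b -= lr * gb
```

With noiseless linear demonstrations the cloned policy recovers the expert almost exactly, which also hints at the stage's weakness noted above: it can only be as good as the demonstrations it imitates.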
VLA and Related Directions Account for Nearly Half of Embodied Work at Top Conferences, Especially These...
具身智能之心· 2025-09-23 04:00
Judging from this year's top robotics and AI conferences, VLA and its derivative directions account for nearly half of embodied-AI output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.

Imagine being able to issue a command in natural language and have any action you want executed smoothly, and sustained over long, continuous sequences. So what exactly is VLA?

VLA breaks the single-task limits of traditional methods, letting robots make autonomous decisions in diverse scenarios and flexibly handle unseen environments, with broad applications in manufacturing, logistics, and home services. VLA models have become a research hotspot, driving frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA and fostering collaboration between academia and industry. Their adaptability spans robotic arms, quadrupeds, and humanoid robots, offering broad potential and practical value for all kinds of intelligent robots and making VLA a key driving force in the field.

From an industry perspective, embodied intelligence is booming at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab to commercialization, tech giants like Huawei, JD.com, and Tencent are actively entering the space, and together with companies abroad such as Tesla and Figure AI they are pushing the field forward.

Many readers have written in asking ...
Nearly 20 具身智能之心 Discussion Groups Are Here! Welcome to Join
具身智能之心· 2025-09-23 04:00
Group 1
- A set of technical exchange groups focused on embodied intelligence has been established, inviting participation from various subfields [1].
- The groups cover nearly 20 sub-directions, including humanoid robots, quadrupeds, and robotic arms, and areas such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multimodal perception, simulation, and data collection [1].
- Participants are encouraged to discuss both technology and industry developments [1].
Why Can VLA Fold Towels but Not Estimate Object Poses Accurately? Decoding How Embodied "Spatial Perception" Is Being Completed
具身智能之心· 2025-09-23 00:03
Core Viewpoint
- The article discusses the OnePoseViaGen framework, which addresses the challenge of 6D object pose estimation in robotics, enabling robots to accurately perceive and interact with unknown objects from a single reference image, without pre-existing 3D models [2][3][31].

Summary by Sections

Introduction to the Problem
- Current robotic systems can perform simple tasks like folding towels but struggle with complex interactions requiring precise spatial awareness, such as grasping unfamiliar objects [1][2].
- The inability to close the loop between generated models, real objects, and spatial poses is a significant barrier to effective robotic interaction with the physical world [2].

OnePoseViaGen Framework
- OnePoseViaGen estimates the 6D pose of unknown objects using only a single reference image, combining single-view 3D generation, coarse-to-fine alignment, and text-guided domain randomization [2][5].
- The framework follows a logical progression: address the absence of 3D models, calibrate real-world scale and pose, then enhance robustness through domain adaptation [5][7].

Key Research Achievements
- The pipeline begins by generating a textured 3D model from a single RGB-D anchor image, enforcing geometric consistency through normal-vector estimation [8][9].
- A two-step alignment strategy refines scale and pose: a coarse alignment followed by a precise optimization process [10][12][13].
- Text-guided domain randomization creates diverse 3D model variants, making pose estimation robust to variations in lighting and occlusion [14][15].

Performance Validation
- OnePoseViaGen outperforms existing methods on benchmark datasets, achieving an average ADD of 81.27% and ADD-S of 93.10%, significantly higher than competitors such as Oryon and Any6D [16][17].
- In challenging scenarios such as heavily occluded environments, OnePoseViaGen maintains high accuracy, demonstrating its effectiveness in real-world applications [20][22].

Real-World Application
- In real robotic experiments, the framework achieved a 73.3% success rate on single-arm and dual-arm object-manipulation tasks, far exceeding baseline methods [23][24][25].
- Qualitative results show that the generated 3D models closely match real object textures and structures, enabling precise pose estimation even under occlusion [27].

Ablation Studies
- Ablation experiments confirm the necessity of the coarse-to-fine alignment and the contribution of domain randomization to the framework's robustness [28][30].

Conclusion
- OnePoseViaGen represents a significant advance in robotic perception, enabling accurate pose estimation and interaction with unknown objects without extensive 3D model libraries or multi-view inputs, paving the way for robots to operate in open-world environments [31].
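The ADD metric cited in the benchmark numbers above is, in essence, the average distance between the object model's points transformed by the ground-truth pose and by the estimated pose. Here is a toy 2D sketch of that computation (angle + translation instead of a full 3D rotation; function names are mine, not from the paper):

```python
import math

def transform(points, angle, tx, ty):
    """Apply a toy 2D rigid transform (rotate by `angle`, then translate)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def add_metric(points, pose_gt, pose_est):
    """Mean point-to-point distance between the two transformed point sets."""
    gt = transform(points, *pose_gt)
    est = transform(points, *pose_est)
    return sum(math.dist(p, q) for p, q in zip(gt, est)) / len(points)

# Square object model; the estimate is off by a 0.1-unit translation.
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
err = add_metric(model, pose_gt=(0.0, 0.0, 0.0), pose_est=(0.0, 0.1, 0.0))
# A pure 0.1-unit translation error yields an ADD of exactly 0.1.
```

In benchmark reporting, a pose counts as correct when its ADD falls below a threshold tied to the object's size, which is what the percentage scores summarize; ADD-S uses the closest-point distance instead, to handle symmetric objects.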
Why the Four "Data" Problems Embodied Intelligence Cannot Avoid Are So Hard: Data Collection, Data Flywheels, Data Factories, and Simulated Synthetic Data
具身智能之心· 2025-09-23 00:03
Core Viewpoint
- The article discusses the evolution and significance of embodied intelligence, emphasizing its philosophical roots and the necessity of physical interaction for intelligent systems [4][5][7].

Group 1: Historical Development
- The concept of embodied intelligence traces back to developments in philosophy and cognitive science, highlighting the importance of physical interaction in cognitive processes [4].
- Key experiments, such as Richard Held's "passive movement" kitten study, demonstrate the intrinsic link between perception and action, reinforcing the idea that active engagement with the environment is crucial for learning [5].
- The article outlines the shift from viewing intelligence as disembodied computation to a more integrated approach that includes physical embodiment [6][7].

Group 2: Current Trends in Embodied Intelligence
- Building immersive training environments is essential, requiring the integration of physical properties and sensory feedback [9][10].
- Large-scale, systematic robot training facilities are identified as critical infrastructure for advancing embodied intelligence [12].
- High-level robot training platforms are emerging across China, indicating rapid growth in this sector [12].

Group 3: Data Collection and Training
- High-quality, diverse behavioral data is crucial, spanning visual, interaction, and semantic-understanding data [15][17].
- Structured data collection methods, including teleoperation and wearable devices, enhance robot training [19][20].
- A systematic approach to data collection, with a focus on stability in object-grasping tasks, leads to improved predictive capabilities in robotic systems [22][23][25].

Group 4: Future Directions and Challenges
- Integrating embodied intelligence with large models is seen as a key pathway for advancing robotic technology, requiring a collaborative framework between edge and cloud computing [26][29].
- A comprehensive training ecosystem combining real and virtual environments is needed to facilitate effective learning and adaptation [34][35].
- The future of embodied intelligence relies on diverse embodied agents and a robust learning-and-evolution framework to ensure continuous improvement and adaptability [31][36].

Group 5: Practical Applications
- Embodied intelligence is being applied in logistics, consumer electronics, and healthcare, showcasing its potential to address real-world challenges [30][33].
- Establishing training centers and collaborative platforms is crucial for fostering innovation and standardization in the field [42][45].
- Open-source ecosystems and collaboration among industry players are highlighted as drivers of progress [74].
MBZUAI Robotics Lab Recruiting Fully Funded PhD Students and Visiting Researchers for Fall 2026
具身智能之心· 2025-09-23 00:03
Core Insights
- The article announces graduate-student recruitment for the Robotics Cognition and Learning (RCL) lab led by Dr. Xingxing Zuo at MBZUAI, spanning several advanced fields in robotics and artificial intelligence [1][2].

Recruitment Focus
- The lab seeks candidates in Robotics, 3D Computer Vision, Mixed Reality, State Estimation, Learning-based Visual-Inertial SLAM, Multi-sensor Fusion, Reinforcement Learning, VLN/VLA, Humanoid-Object Interaction, and Embodied AI [2].

Admission Requirements
- Candidates should have a strong interest in robotics or AI, a solid mathematical foundation, programming skills, self-management, motivation, innovation, and a rigorous research attitude. PhD applicants are expected to have published in top-tier venues as primary authors; robotics-competition experience is a plus [3].

Benefits and Compensation
- PhD students receive a tax-free scholarship of approximately 420,000 RMB per year, along with free round-trip airfare, ample GPU computing resources, and sufficient robotics and sensor hardware [3].

Hardware Acquisition
- The RCL lab has ordered various robotics hardware; some is already in use, and most is expected to arrive by November 2025 [4].

Application Deadlines
- For Fall 2026 admission, the application system opens on September 1, 2025, with an early deadline of November 15 and a late deadline of December 15. Applications for visiting researchers and domestic interns are accepted year-round [6].

Application Process
- Interested candidates should send an English resume, transcripts, and representative papers to Dr. Zuo's email, and also submit complete application materials through the university's official application system [7].

Institutional Background
- MBZUAI is recognized as the world's first university dedicated to artificial intelligence; it recently admitted 115 undergraduate students at an acceptance rate of approximately 5% [9].
Why Can VLA Fold Towels but Not Estimate Object Poses Accurately? How Is Embodied Intelligence's "Spatial Perception" Being Completed?
具身智能之心· 2025-09-22 09:00
Authors: Zheng Geng et al. Editor: 具身智能之心

Picture this contrast: a VLA model can smoothly handle geometric manipulation like folding towels and tidying clothes, yet when asked to "grab an unfamiliar condiment bottle with a robotic arm" or "estimate the 3D pose of an unknown part," it fails repeatedly, either grasping air or knocking the object over. Behind this lies a key bottleneck for deploying embodied intelligence: 6D object pose estimation.

Anyone who has worked with robotic manipulation knows that precise-interaction tasks like "grab the part" or "place the bottle" hinge on spatial perception: you need the object's 3D position (translation) and orientation (rotation), and the estimated scale must match the real world. Yet existing methods keep compromising: they either rely on pre-scanned CAD models (which rarely exist for real objects) or require multi-view images (impractical to capture in real-time settings); even single-view reconstruction falls into scale ambiguity, since the object's true size is unknown.

This produces a stark capability gap: VLA can rely on visual planning to complete tasks like "fold the towel" that do not depend on precise spa...
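The "3D position (translation) plus orientation (rotation)" that 6D pose estimation must recover is a rigid transform with six degrees of freedom, commonly stored as a 4x4 homogeneous matrix. A minimal sketch (yaw-only rotation for brevity; all function names are mine, not from the paper):

```python
import math

def pose_matrix(tx, ty, tz, yaw):
    """4x4 homogeneous transform: rotate about z by `yaw`, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c,   -s,  0.0, tx],
        [s,    c,  0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(T, p):
    """Map an object-frame point into the camera frame."""
    x, y, z = p
    v = (x, y, z, 1.0)
    return tuple(sum(T[i][j] * v[j] for j in range(4)) for i in range(3))

T = pose_matrix(1.0, 0.0, 0.5, math.pi / 2)   # 90° yaw, then shift
corner = apply(T, (1.0, 0.0, 0.0))            # lands at approximately (1, 1, 0.5)
```

Getting this matrix right, at the correct metric scale, is exactly what lets a planner convert "grab the bottle" into gripper coordinates; scale ambiguity in single-view reconstruction corrupts the translation column and sends the gripper to the wrong place.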
In the Embodied-AI Direction, Should You Take a Job or Pursue a PhD?
具身智能之心· 2025-09-22 04:00
Core Viewpoint
- The article discusses whether individuals in the field of embodied intelligence should pursue a PhD or enter the job market, emphasizing the importance of foundational knowledge and suitability for pioneering roles in this evolving industry [1][2].

Group 1: Foundations and Suitability
- A solid foundation in embodied intelligence, particularly in robotics-related areas, is necessary to be competitive in the job market [1].
- Research in a field with many unresolved issues demands the temperament of a "pioneer" and strong problem-solving skills [1][2].

Group 2: Community and Resources
- The "Embodied Intelligence Heart Knowledge Planet" community is introduced as a comprehensive platform for beginners, offering videos, articles, learning paths, and job-exchange opportunities [2][4].
- The community aims to grow from nearly 2,000 members to 10,000 within two years, providing a space for technical sharing and collaboration [2].

Group 3: Practical Support and Networking
- The community answers practical questions on equipment usage, data collection, and model deployment, helping members apply knowledge in projects [4].
- It has established a job-referral mechanism with leading companies in the embodied-intelligence sector, connecting job seekers and employers [6][14].

Group 4: Educational Content and Learning Paths
- The community has compiled over 30 technical routes across embodied intelligence, significantly reducing the time needed for research ramp-up [4][14].
- It offers a wealth of resources, including open-source projects, datasets, and technical learning routes, catering to both beginners and advanced researchers [14][19][28].