具身智能之心

VLA or VTLA? This Company Is Upending the Future of Robotics with "Superhuman Tactile" Technology!
具身智能之心· 2025-08-13 00:04
Tactile sensing matters enormously, yet many problems remain unsolved: resolution is low, real-time performance cannot be guaranteed, and units break down soon after purchase because build quality is poor. On the show floor, however, we found one tactile-sensor hardware company that has struck the best balance among resolution, real-time performance, durability, and cost: Daimon Robotics (戴盟机器人). We spent the past few days at WRC25 looking at each embodied-robotics company's products and capabilities. Frankly, hardware and technology have improved considerably over last year. We also saw several companies that had skipped WAIC25. The overall takeaway is that today's robot bodies can already meet the needs of some scenarios; it is the perceptual "brain" that now lags behind the hardware. Many related technologies were on display, VLA models in particular. As the new generation of end-to-end vision-language-action models, VLA is a focus for companies and research institutions alike. The demos, however, exposed a clear problem: while vision provides rich environmental information, during physical interaction (grasping or manipulating objects) it cannot precisely sense a target's material, hardness, friction, and similar properties. In industrial assembly, surgery, and home service especially, robots must perform high-precision tasks, and applying too much force by accident can have serious consequences. In recent days, Daimon Robotics (戴盟机器人) announced the completion of a hundred-million-RMB-level angel++ financing round, led by 招商局创投 with 东方嘉富 and 架桥资本 following on. This round of fund ...
How Does AI Learn, Step by Step, to "Read" Spatiotemporal Structure? A Survey of the Five Levels on the Road to the 4D World
具身智能之心· 2025-08-13 00:04
Core Viewpoint
- The article discusses the advancements in 4D spatial intelligence reconstruction, emphasizing its significance in computer vision and its applications in virtual reality, digital twins, and intelligent interactions. The research covers both foundational reconstruction techniques and higher-level understanding of spatial relationships and physical constraints [1][2].

Group 1: Levels of 4D Spatial Intelligence Reconstruction
- Level 1 focuses on the reconstruction of basic 3D attributes such as depth perception, camera positioning, point cloud construction, and dynamic tracking, forming the digital skeleton of 3D space (see the sketch after this summary) [6].
- Level 2 shifts to the detailed modeling of specific objects within the scene, including humans and various structures, while addressing the spatial distribution and dynamic interactions among these elements [8].
- Level 3 aims to construct complete 4D dynamic scenes by introducing the time dimension, supporting immersive visual experiences [10][11].

Group 2: Interaction and Physical Modeling
- Level 4 represents a significant breakthrough by establishing dynamic interaction models among scene elements, with a focus on human interactions and their relationships with objects [13][15].
- Level 5 addresses the challenge of physical realism by integrating fundamental physical laws into the reconstruction process, enhancing embodied intelligence tasks such as robotic motion imitation [18][22].
- The hierarchical framework illustrates the evolution of AI's cognitive abilities from basic observation to understanding physical laws, marking a shift from "looking real" to "moving real" in virtual environments [23].
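As a concrete illustration of the Level 1 "digital skeleton", the sketch below unprojects a depth map into a 3D point cloud using pinhole intrinsics. It is a minimal example of one Level-1 operation, not code from the surveyed papers; the intrinsics and image size are illustrative assumptions.

```python
# Minimal sketch: depth map (H, W) in meters -> (N, 3) point cloud.
# Intrinsics (fx, fy, cx, cy) are assumed pinhole parameters.
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - cx) * z / fx  # back-project along x
    y = (v - cy) * z / fy  # back-project along y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth

# Toy example with a flat 4x4 depth map:
depth = np.full((4, 4), 2.0)
cloud = depth_to_pointcloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)
```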
A Roundup of Work on Embodied Object-Goal Navigation, Vision-Language Navigation, and Point-Goal Navigation!
具身智能之心· 2025-08-12 07:04
Core Insights
- The article surveys the development and methodologies of embodied navigation, focusing on point-goal navigation and visual-audio navigation techniques [2][4][5].

Group 1: Point-Goal Navigation
- Comparisons between model-free and model-based learning for point-goal navigation highlight the strengths of different approaches to planning and execution [4].
- RobustNav benchmarks the robustness of embodied navigation methods, providing a framework for evaluating performance [5].
- Significant advances in visual odometry have proven effective for embodied point-goal navigation (see the sketch after this summary) [5].

Group 2: Visual-Audio Navigation
- The integration of audio and visual signals in navigation tasks is explored, emphasizing how sound improves navigation efficiency [7][8].
- Multiple projects and papers focus on audio-visual navigation, indicating growing interest in multimodal approaches [8][9].
- Platforms such as SoundSpaces 2.0 were developed to facilitate research in visual-acoustic learning, further bridging visual and auditory navigation [8].

Group 3: Object-Goal Navigation
- Several methodologies for object-goal navigation are outlined, including modular approaches and self-supervised learning techniques [9][13].
- Auxiliary tasks are emphasized as a way to strengthen exploration and navigation, pointing toward more sophisticated learning frameworks [13][14].
- Benchmarks such as DivScene evaluate large language models for object navigation, reflecting the growing complexity of navigation tasks [9][14].

Group 4: Vision-Language Navigation
- Advances in vision-language navigation highlight the role of language in guiding navigation tasks [22][23].
- Techniques such as semantically-aware reasoning and history-aware multimodal transformers are being developed to improve navigation accuracy and efficiency [22][23].
- Integrating language with visual navigation is a critical research area, with various projects enhancing the interaction between visual inputs and language instructions [22][23].
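To make the visual-odometry point concrete, here is a minimal sketch of how an egomotion estimate (such as one produced by visual odometry) can keep a point-goal vector current when no GPS+compass sensor is available. The 2D state and update rule are illustrative assumptions, not any specific paper's formulation.

```python
# Minimal sketch: re-express the goal in the agent's new body frame after
# an estimated egomotion of (delta_xy, delta_yaw) in the old frame.
import numpy as np

def update_goal(goal_xy: np.ndarray, delta_xy: np.ndarray,
                delta_yaw: float) -> np.ndarray:
    c, s = np.cos(-delta_yaw), np.sin(-delta_yaw)
    rot = np.array([[c, -s], [s, c]])        # rotate into the new frame
    return rot @ (goal_xy - delta_xy)        # translate, then rotate

goal = np.array([3.0, 0.0])                  # goal 3 m straight ahead
goal = update_goal(goal, np.array([0.25, 0.0]), 0.0)  # step 0.25 m forward
print(goal)  # [2.75 0.  ]
```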
New from CMU! Cross-Embodiment World Models Enable Few-Shot Robot Learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article presents a novel approach to training visuomotor policies for robots by leveraging existing low-cost data sources, significantly reducing the need for expensive real-world data collection [2][11].

Group 1: Methodology
- The proposed method rests on two key insights:
  1. Embodiment-agnostic world-model pretraining uses optical flow as the action representation, enabling pretraining on cross-embodiment datasets followed by fine-tuning with minimal target-embodiment data [3][12].
  2. The Latent Policy Steering (LPS) method improves policy outputs by searching for better action sequences in the latent space of the world model (see the sketch after this summary) [3][12].

Group 2: Experimental Results
- Real-world experiments showed that pairing the policy with a world model pretrained on existing datasets yields significant gains: over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations [3][9].

Group 3: Challenges and Solutions
- The article highlights the embodiment gap that arises when pretraining across different robots, and argues that world models are better suited than policies to cross-embodiment pretraining followed by fine-tuning on a new embodiment [11][12].
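A minimal sketch of the Latent Policy Steering idea as described above: sample several candidate action sequences from the policy, roll each out in the world model's latent space, and keep the highest-scoring one. The stub world model, module shapes, and scoring head here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Stand-in latent dynamics model: z' = f(z, a)."""
    def __init__(self, z_dim=8, a_dim=2):
        super().__init__()
        self.f = nn.Linear(z_dim + a_dim, z_dim)
    def step(self, z, a):
        return torch.tanh(self.f(torch.cat([z, a], dim=-1)))

def steer(world_model, scorer, z0, sample_actions,
          num_candidates=16, horizon=8):
    """Pick the sampled action sequence whose latent rollout scores best."""
    best_score, best_actions = -float("inf"), None
    for _ in range(num_candidates):
        actions = sample_actions(horizon)     # proposals from the policy
        z, score = z0, 0.0
        for a in actions:
            z = world_model.step(z, a)        # predicted next latent
            score = score + scorer(z).item()  # e.g., a goal-progress head
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions

wm = TinyWorldModel()
scorer = nn.Linear(8, 1)
z0 = torch.zeros(8)
proposals = lambda h: torch.randn(h, 2) * 0.1  # stub for policy sampling
best = steer(wm, scorer, z0, proposals)
print(best.shape)  # torch.Size([8, 2])
```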
Probing the Root Cause of Embodied Robots' Limited Generalization! Augmentation Strategies Still Work
具身智能之心· 2025-08-12 00:03
Research Background and Core Issues
- Large-scale robot datasets and high-capacity models have shown strong capabilities across tasks, but generalization remains limited in scenarios outside the training data distribution [2].
- Shortcut learning, where models rely on task-irrelevant features rather than true causal relationships, is a key factor limiting generalization [2].

Dataset Diversity and Fragmentation Analysis
- The OXE dataset exhibits significantly lower visual and textual diversity than mainstream visual/multimodal datasets, even after including the newer DROID dataset, which was collected specifically to increase diversity [4].
- OXE is also highly fragmented: its sub-datasets are clearly separated with little overlap, so the collection effectively splits into many small, isolated datasets [8].
- This limited diversity is attributed to inherent constraints of the robot data-collection process [6].

Theoretical Connection Between Dataset Characteristics and Shortcut Learning
- A mathematical framework is established to analyze how composing multiple sub-datasets induces spurious correlations that invite shortcut learning [15].
- The distance between task-irrelevant features across sub-datasets strongly influences shortcut learning, with models tending to rely on visual cues rather than textual instructions [16].

Experimental Validation
- Experiments indicate that increasing diversity within sub-datasets and reducing differences between them effectively reduces shortcut dependence [18].
- Introducing a "bridge" target in experiments significantly improved out-of-distribution (OOD) success rates by breaking false correlations [28].

Mitigating Shortcut Learning Through Data Augmentation
- Targeted data-augmentation strategies can increase sub-dataset diversity and reduce distribution differences, thereby alleviating shortcut learning [29].
- Perspective augmentation creates shared visual contexts between sub-datasets, breaking false correlations tied to specific tasks (see the sketch after this summary) [30].
- The results confirm that carefully chosen augmentation strategies can enhance the generalization capabilities of robot policies [34].
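To make the augmentation idea concrete, here is a minimal sketch of viewpoint-style augmentation using standard torchvision transforms, so that frames from different sub-datasets share overlapping visual contexts. The specific transforms and parameters are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch: perturb camera viewpoint, framing, and lighting per frame
# so "which-lab's-camera" cues stop being a usable shortcut.
from torchvision import transforms

viewpoint_aug = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.3, p=0.9),  # simulate new viewpoints
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),        # vary framing
    transforms.ColorJitter(brightness=0.2, contrast=0.2),       # vary lighting
])

# Applied when sampling from any sub-dataset:
# frame = viewpoint_aug(frame)   # frame: PIL.Image or (C, H, W) tensor
```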
Robot Context Protocol Open-Sourced for the First Time: Alibaba DAMO Academy Releases Three Major Embodied-Intelligence Components in One Go
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- Alibaba's DAMO Academy announced the open-sourcing of several models and protocols aimed at advancing embodied intelligence, addressing compatibility challenges across data, models, and robots, and streamlining the development process [1][2].

Group 1: Open-Source Models and Protocols
- The RynnRCP protocol was introduced to integrate heterogeneous data, models, and robotic systems, creating a seamless workflow from data collection to action execution [2][5].
- RynnVLA-001 is a vision-language-action model that learns human manipulation skills from first-person-view videos, enabling smoother robotic-arm control [7].
- The RynnEC model adds multimodal large-language capabilities, analyzing scenes along 11 dimensions to improve object recognition and interaction in complex environments [7].

Group 2: Technical Framework and Features
- The RCP framework connects robot bodies with sensors, providing standardized interfaces and compatibility across different transport layers and model services [5].
- RobotMotion bridges large models and robot control, converting low-frequency commands into high-frequency control signals for smoother robot motion (see the sketch after this summary) [5][6].
- The framework ships with integrated simulation and real-machine control tools, supporting quick developer onboarding along with task regulation and trajectory visualization [5].

Group 3: Industry Engagement and Community Building
- DAMO Academy is investing actively in embodied intelligence, focusing on system and model development and collaborating with stakeholders to build industry infrastructure [7].
- The WorldVLA model, which merges world models with action models, has drawn significant attention for its enhanced understanding and generation capabilities [8].
- The "Embodied Intelligence Heart" community was established to foster collaboration among developers and researchers in the field, providing resources and support across technical directions [11][12].
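The command-upsampling step attributed to RobotMotion can be sketched as simple interpolation from low-frequency model waypoints to a high-frequency control stream. Linear interpolation and the rates shown are illustrative assumptions; the actual framework may use smoother splines or dynamics-aware filtering.

```python
# Minimal sketch: resample (T, dof) joint targets at cmd_hz into a denser
# (T', dof) stream at ctrl_hz by per-joint linear interpolation.
import numpy as np

def upsample_commands(waypoints: np.ndarray, cmd_hz: float,
                      ctrl_hz: float) -> np.ndarray:
    t_cmd = np.arange(len(waypoints)) / cmd_hz
    t_ctrl = np.arange(0, t_cmd[-1], 1.0 / ctrl_hz)
    return np.stack(
        [np.interp(t_ctrl, t_cmd, waypoints[:, j])
         for j in range(waypoints.shape[1])], axis=1)

cmds = np.array([[0.0, 0.0], [0.1, 0.2], [0.3, 0.2]])  # 10 Hz, 2-DoF targets
stream = upsample_commands(cmds, cmd_hz=10.0, ctrl_hz=200.0)
print(stream.shape)  # (40, 2)
```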
The Embodied Intelligence Heart (具身智能之心) Technical Exchange Group Is Live!
具身智能之心· 2025-08-11 06:01
Group 1
- A technical exchange group has been established focusing on embodied intelligence technologies, including VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, object-goal navigation, mapping and localization, and navigation [1].
- Interested individuals can add the assistant's WeChat AIDriver005 to join the community [2].
- To expedite approval, include your organization/school, name, and research direction in the request note [3].
Looking for a Few Data-Collection Experts to Build Something Together......
具身智能之心· 2025-08-11 06:01
具身智能之心 is recruiting 3 data-collection (数采) experts, in China or abroad, working mainly on teleoperation, AR, or full-body motion capture.

Job Content
- Join us in taking on embodied data-collection project development, course development, and related work.

Requirements
- At least 1 year in a related research direction; practitioners at embodied-intelligence companies and PhD holders or above (including current PhD students) preferred.
- ...

Contact Us
- For details on compensation and the work itself, add the lead's WeChat: oooops-life.
China's First Full-Stack Embodied Intelligence Learning Community Is Here!
具身智能之心· 2025-08-11 06:01
Core Insights
- The article emphasizes the value of a community that provides solutions to problems in the field of embodied intelligence, highlighting the importance of knowledge sharing and collaboration among members [3][16].

Group 1: Community and Resources
- The community has built a closed loop across industry, academia, job seeking, and Q&A exchange, providing timely solutions and job opportunities [3][4].
- A list of over 30 technical routes has been compiled, helping members find benchmarks, reviews, and learning paths efficiently [4][16].
- Industry experts are invited to answer questions and share insights, enhancing the learning experience for members [4][17].

Group 2: Educational Support
- Resources for beginners include curated technical stacks and learning paths to ease entry into the field of embodied intelligence [11][16].
- For those already engaged in research, valuable industry frameworks and project proposals support further development [13][16].

Group 3: Job Opportunities and Networking
- A job-referral mechanism has been established with multiple companies in the embodied-intelligence sector, connecting members with potential employers [10][17].
- Members are encouraged to discuss career choices and research directions, fostering a supportive environment for professional growth [79][83].

Group 4: Research and Development
- The community has compiled open-source projects, datasets, and simulation platforms relevant to embodied intelligence, facilitating research and development efforts [16][30][36].
- Research directions such as vision-language navigation, reinforcement learning, and multimodal models are covered, reflecting the community's commitment to staying at the forefront of technological advancement [20][58][70].
China's First Hands-On, Full-Stack Tutorial on Embodied Brain + Cerebellum Algorithms
具身智能之心· 2025-08-11 00:14
Core Viewpoint
- On the road to Artificial General Intelligence (AGI), embodied intelligence is a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1][6].

Industry Analysis
- Over the past two years, numerous star teams have emerged in embodied intelligence, founding highly valued companies such as Xinghaitu, Galaxy General, and Zhujidongli and driving advances in embodied brain and cerebellum technologies [3].
- Major domestic companies such as Huawei, JD.com, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build an embodied-intelligence ecosystem, while internationally Tesla and U.S. investment institutions back companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5].

Technological Evolution
- The development of embodied intelligence has progressed through several stages:
  - The first stage focused on grasp-pose detection, which struggled with complex tasks because it lacked context modeling [6].
  - The second stage used behavior cloning, letting robots learn from expert demonstrations but revealing weak generalization and poor performance in multi-target scenarios (see the sketch after this summary) [6].
  - The third stage introduced Diffusion Policy methods, improving stability and generalization in task execution through sequence modeling [7].
  - The fourth stage, emerging in 2025, explores combining VLA models with reinforcement learning and tactile sensing to overcome limits in feedback and future-prediction capability [8].

Product and Market Development
- The evolution from grasp-pose detection through behavior cloning to advanced VLA models marks a shift toward agents that can perform complex tasks in open environments, driving a surge of products in industrial, home, dining, and healthcare settings [9].
- As the industry moves from research to deployment, demand for engineering and systems capability is rising, requiring higher engineering standards [12].

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full embodied-intelligence algorithm stack, from basic tasks to advanced models such as VLA and its integrations [9][12].
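For readers new to the second stage, here is a minimal behavior-cloning sketch: regress expert actions from observations with an MSE loss. The network size and synthetic data are illustrative assumptions for a toy setup, not part of the curriculum itself.

```python
# Minimal behavior cloning: supervised regression from observations to
# expert actions over a stand-in demonstration dataset.
import torch
import torch.nn as nn

obs_dim, act_dim = 16, 2
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in "expert demonstrations": (observation, action) pairs.
obs = torch.randn(256, obs_dim)
expert_actions = torch.randn(256, act_dim)

for epoch in range(100):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, expert_actions)  # imitate the expert
    opt.zero_grad()
    loss.backward()
    opt.step()
```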