具身智能之心
Slamtec (思岚) Launches the First Consumer-Grade Underwater LiDAR Category: RPLIDAR U1
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- The company has launched the RPLIDAR U1, the first consumer-grade underwater LiDAR, marking the beginning of high-precision laser SLAM navigation in underwater environments [1][4].

Group 1: Product Features
- RPLIDAR U1 is compact, with a size comparable to a ping pong ball, making it suitable for various devices [4].
- The innovative technology allows RPLIDAR U1 to be cost-effective for consumer applications [6].
- The system is designed to overcome challenges posed by underwater conditions, such as reduced detection range and noise, which have historically limited the use of traditional LiDAR in water [7][10].

Group 2: Performance and Testing
- RPLIDAR U1 achieves an underwater maximum detection range of 5 meters and meets IPX8 waterproof standards [8].
- The product has undergone extensive testing to ensure it can handle various underwater challenges, including different water qualities and surface materials [11][18].

Group 3: Applications
- The RPLIDAR U1 is paired with the SLAMKIT underwater mapping and navigation solution, enabling efficient mapping and navigation without the need for odometry support [22].
- Potential applications include nearshore vegetation exploration, pool cleaning, and seabed exploration [24][26][29].

Group 4: Availability
- RPLIDAR U1 is now open for sample requests and product reservations from industry clients [32].
ICCV 2025 Results Are Out! 24% Acceptance Rate; Did You Snag a Ticket to Hawaii?
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- The article discusses the significant increase in submissions to the ICCV 2025 conference, reflecting rapid growth in the computer vision field and the challenges faced in the peer review process due to the high volume of submissions [3][26][31].

Submission and Acceptance Data
- ICCV 2025 received 11,239 valid submissions, with 2,699 papers accepted, resulting in an acceptance rate of 24% [3][4].
- In comparison, ICCV 2023 had 8,260 submissions and accepted 2,160 papers, yielding an acceptance rate of approximately 26.15% [6]. (The 2025 and 2023 rates are recomputed from these counts in the snippet after this summary.)
- Historical data shows ICCV 2021 had 6,152 submissions with a 26.20% acceptance rate, and ICCV 2019 had 4,323 submissions with a 25% acceptance rate [6].

Peer Review Challenges
- Despite the increase in submissions, the acceptance rate has remained relatively stable, hovering around 25% to 26% [4].
- The ICCV 2025 conference implemented a new policy to enhance accountability and integrity, identifying 25 irresponsible reviewers and rejecting 29 associated papers [4][5].
- The article highlights the growing challenges in the peer review process as submission volumes exceed 10,000, with NIPS expected to surpass 30,000 submissions [31].

Recommendations for Peer Review System
- The article advocates for a two-way feedback loop in the peer review process, allowing authors to evaluate review quality while reviewers receive formal recognition [34][38].
- It suggests a systematic reviewer reward mechanism to incentivize high-quality reviews [38].
- The need for reforms in the peer review system is emphasized to address issues of fairness and accountability [36][37].
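As a quick check on the figures above, the quoted acceptance rates follow directly from the submission and acceptance counts; a minimal sketch in Python, using only the counts reported in this summary:

```python
# Recompute ICCV acceptance rates from the counts reported above.
counts = {
    "ICCV 2025": (11239, 2699),
    "ICCV 2023": (8260, 2160),
}
for venue, (submitted, accepted) in counts.items():
    rate = accepted / submitted
    print(f"{venue}: {accepted}/{submitted} = {rate:.2%}")
# ICCV 2025: 2699/11239 = 24.01%
# ICCV 2023: 2160/8260 = 26.15%
```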
RoboSense 2025 Machine Perception Challenge Officially Launched
具身智能之心· 2025-06-25 13:52
Core Viewpoint
- The RoboSense Challenge 2025 aims to systematically evaluate the perception and understanding capabilities of robots in real-world scenarios, addressing the limitations of traditional perception algorithms in complex environments [1][44].

Group 1: Challenge Overview
- The challenge is organized by multiple prestigious institutions, including the National University of Singapore and the University of Michigan, and is officially recognized as part of IROS 2025 [5].
- The competition will take place in Hangzhou, China, with key dates including registration starting in June 2025 and award decisions on October 19, 2025 [3][46].

Group 2: Challenge Tasks
- The challenge includes five real-world tasks focusing on various aspects of robotic perception, such as language-driven autonomous driving, social navigation, sensor placement optimization, cross-modal drone navigation, and cross-platform 3D object detection [6][9].
- Each task is designed to test the robustness and adaptability of robotic systems under different conditions, emphasizing the need for innovative solutions in perception and understanding [44].

Group 3: Technical Features
- The tasks require the development of end-to-end multimodal models that integrate visual sequences with natural language instructions, aiming for deep coupling between language, perception, and planning [7].
- The challenge emphasizes the importance of robust performance in dynamic environments, including the ability to handle sensor placement variations and social interactions with humans [20][28].

Group 4: Evaluation Metrics
- The evaluation framework includes multiple dimensions such as perception accuracy, understanding through visual question answering (VQA), prediction of trajectories, and planning consistency with language commands [9][22].
- Baseline models and their performance metrics are provided for each task, indicating the expected computational resources and training requirements [13][19][39].

Group 5: Awards and Incentives
- The challenge offers a total prize pool exceeding $10,000, with awards for first, second, and third places, as well as innovation awards for outstanding contributions in each task [40][41].
- All teams that complete valid submissions will receive official participation certificates, encouraging widespread engagement in the competition [41].
Latest from Tongji University! A Comprehensive Survey of Multimodal Perception for Embodied Navigation
具身智能之心· 2025-06-25 13:52
Core Insights
- The article presents a comprehensive analysis of multimodal navigation methods, emphasizing the integration of various sensory modalities such as visual, audio, and language processing to enhance navigation capabilities [4][32].

Group 1: Research Background
- Goal-oriented navigation is a fundamental challenge in autonomous systems, requiring agents to navigate complex environments to reach specified targets. Over the past decade, navigation technology has evolved from simple geometric path planning to complex multimodal reasoning [7][8].
- The article categorizes goal-oriented navigation methods based on reasoning domains, revealing commonalities and differences among various tasks, thus providing a unified framework for understanding navigation methods [4].

Group 2: Navigation Tasks
- Navigation tasks have increased in complexity, evolving from simple point navigation (PointNav) to more complex multimodal paradigms such as ObjectNav, ImageNav, and AudioGoalNav, each requiring different levels of semantic understanding and reasoning [8][12].
- Navigation tasks are formally defined as a decision-making process in which agents must reach specified goals in unknown environments through a series of actions [8].

Group 3: Datasets and Evaluation
- The Habitat-Matterport 3D (HM3D) dataset is highlighted as the largest collection, encompassing 1,000 reconstructed buildings and covering 112.5k square meters of navigable area, with varying complexities across other datasets like Gibson and Matterport3D [9].
- Evaluation metrics for navigation tasks include success rate (SR), path-length-weighted success rate (SPL; see the formula after this summary), and distance-related metrics, which assess the efficiency and effectiveness of navigation strategies [14].

Group 4: Methodologies
- Explicit representation methods, such as ANM and LSP-UNet, construct and maintain environmental representations to support path planning, while implicit representation methods, like DD-PPO and IMN-RPG, encode spatial understanding without explicit mapping [15][16].
- Object navigation tasks are approached modularly, breaking the task down into mapping, strategy, and path planning, with methods like Sem-EXP and PEANUT focusing on semantic understanding [17].

Group 5: Challenges and Future Work
- Current challenges in multimodal navigation include the effective integration of sensory modalities, the transfer from simulation to real-world applications, and the development of robust multimodal representation learning methods [31][32].
- Future work is suggested to focus on enhancing human-robot interaction, developing balanced multimodal representation learning methods, and addressing the computational efficiency of navigation systems [32].
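For reference on the SPL metric cited in Group 3: it is commonly defined in the embodied-navigation literature (this is the standard formulation, not a formula quoted from the survey itself) as success weighted by path efficiency over N evaluation episodes, where S_i is the binary success indicator, ℓ_i the shortest-path distance from start to goal, and p_i the length of the path the agent actually took:

```latex
\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i,\ \ell_i)}
```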
Major Release! A0: The First General-Purpose Hierarchical Robot Model Based on Spatial Affordance Perception
具身智能之心· 2025-06-25 13:52
The A0 model, released by the 无界智慧 (Spatialtemporal AI) team, is the first general-purpose hierarchical diffusion model for robots based on spatial affordance perception. Through an Embodiment-Agnostic Affordance Representation, it achieves general manipulation capability across platforms; the model framework and code have been open-sourced.

Paper: https://arxiv.org/abs/2504.12636
Project page: https://a-embodied.github.io/A0/

The core challenge in robot manipulation
Even with today's rapid progress in robotics, generalized manipulation capability remains the key bottleneck holding the field back. Imagine asking a robot to "wipe the whiteboard clean": it must accurately understand where to apply force ("where") and how to move the cloth ("how"). This is precisely the core challenge facing robot manipulation today: insufficient perception and understanding of spatial affordances.

Existing methods fall into two main categories: modular approaches and end-to-end vision-language-action (VLA) large models. The former can leverage vision foundation models for spatial understanding but capture object affordances only to a limited extent; the latter can generate actions directly but lack spatial ...
What Will It Take to Out-Compete the Top Labs in This Year's Autumn Recruitment?
具身智能之心· 2025-06-25 08:24
Core Viewpoint
- The article highlights the rapid advancements in AI technologies, particularly in autonomous driving and embodied intelligence, which have significantly influenced the industry and investment landscape [1].

Group 1: AutoRobo Knowledge Community
- The AutoRobo Knowledge Community is established as a platform for job seekers in the fields of autonomous driving, embodied intelligence, and robotics, currently hosting nearly 1,000 members from various companies [2].
- The community provides resources such as interview questions, industry reports, salary negotiation tips, and resume optimization services to assist members in their job search [2][3].

Group 2: Recruitment Information
- The community regularly shares job openings in algorithms, development, and product roles, including positions for campus recruitment, social recruitment, and internships [3][4].

Group 3: Interview Preparation
- A compilation of 100 interview questions related to autonomous driving and embodied intelligence is available, covering essential topics for job seekers [6].
- Specific areas of focus include sensor fusion, lane detection algorithms, and various machine learning deployment techniques [7][12].

Group 4: Industry Reports
- The community offers access to numerous industry reports that provide insights into the current state, development trends, and market opportunities within the autonomous driving and embodied intelligence sectors [13][14].
- Reports include analyses of successful and failed interview experiences, which serve as valuable learning tools for members [15].

Group 5: Salary Negotiation and Professional Development
- The community emphasizes the importance of salary negotiation skills and provides resources to help members navigate this aspect of their job search [17].
- A collection of recommended books related to robotics, autonomous driving, and AI is also available to support professional development [18].
What Is Explicit End-to-End VLA, and What Methods Exist?
具身智能之心· 2025-06-25 08:24
What is explicit end-to-end VLA? "Explicit" here stands in opposition to "implicit". In the previous installment we covered the definition of implicit end-to-end models; an explicit end-to-end VLA model generates a video GOAL, explicitly producing images of how the robot arm will move in the future. See the figure below!

This also involves a fairly important concept: inverse kinematics.

Inverse kinematics
Inverse kinematics is used mainly in robotics, animation, and computer graphics, in contrast to classical (forward) kinematics. Its goal is, given a target position, to compute how each joint of an object (such as a robot arm or a skeletal system) should move so that the object reaches that target.

For example, in robotics, inverse kinematics answers practical questions such as: the end effector (gripper) of a robot arm needs to reach a specified position, so how should each joint rotate? (A minimal worked example follows at the end of this summary.)

Core steps of inverse kinematics:
- Known information:
- Solving: use matrices, trigonometry, or iterative methods to compute the angle or other unknowns of each joint so that the end effector can reach the target point.
- Multiple solutions: inverse kinematics usually has multiple solutions (or even none), so one must choose an optimal solution among the candidates (e.g., minimum energy consumption or the most natural motion).

Overview of major works
3) LAPA
1) Pioneering work: UniPi
UniPi recasts sequential decision-making as a text-conditioned video generation problem: given a text-encoded goal description, the planner synthesizes a set of future frames depicting the action sequence it plans to execute, and control actions are then extracted from the generated video. By using text as the underlying goal description, we can naturally ...
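To make the inverse-kinematics discussion above concrete, here is a minimal sketch of the simplest textbook case: a planar two-link arm solved analytically with the law of cosines. The function name, link lengths, and target point are illustrative assumptions, not anything from the article; it also shows the multiple-solutions point, since the elbow-up and elbow-down configurations reach the same target.

```python
import numpy as np

def two_link_ik(x, y, l1=1.0, l2=1.0, elbow_up=True):
    """Analytical IK for a planar 2-link arm.

    Given a target (x, y) for the end effector and link lengths l1, l2,
    return joint angles (theta1, theta2) in radians, or None if the
    target is out of reach.
    """
    r2 = x * x + y * y
    # Law of cosines gives the cosine of the elbow angle.
    c2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return None  # target lies outside the reachable workspace
    s2 = np.sqrt(1.0 - c2 * c2) if elbow_up else -np.sqrt(1.0 - c2 * c2)
    theta2 = np.arctan2(s2, c2)
    theta1 = np.arctan2(y, x) - np.arctan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2

# Both elbow configurations reach the same hypothetical target point.
target = (1.2, 0.8)
for elbow_up in (True, False):
    print(elbow_up, two_link_ik(*target, elbow_up=elbow_up))
```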
Hands-On Embodied Intelligence with MuJoCo: From Zero to Reinforcement Learning and Sim2Real
具身智能之心· 2025-06-24 14:29
Core Insights
- The article discusses the unprecedented turning point in AI development, highlighting the rise of embodied intelligence, which allows machines to understand language, navigate complex environments, and make intelligent decisions [1][2].

Group 1: Embodied Intelligence
- Embodied intelligence is defined as AI systems that not only possess a "brain" but also have a "body" capable of perceiving and interacting with the physical world [1].
- Major tech companies like Tesla, Boston Dynamics, OpenAI, and Google are competing in this transformative field, which is expected to revolutionize various industries including manufacturing, healthcare, and space exploration [1].

Group 2: Technical Challenges
- Achieving true embodied intelligence faces significant technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2][4].
- MuJoCo (Multi-Joint dynamics with Contact) is identified as a key technology in this domain, serving as a high-fidelity training environment for robot learning [4][8].

Group 3: MuJoCo's Role
- MuJoCo allows researchers to create realistic virtual robots and environments, enabling millions of trials and learning experiences without the risk of damaging expensive hardware (a minimal simulation loop is sketched after this summary) [6][4].
- The simulation speed can be hundreds of times faster than real-time, significantly accelerating the learning process [6].
- MuJoCo has become a standard tool in both academia and industry, with major companies utilizing it for robot research [8].

Group 4: Practical Training
- A comprehensive MuJoCo development course has been designed, focusing on practical applications and theoretical foundations, covering topics from physical simulation to deep reinforcement learning [9][10].
- The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a solid grasp of the technology stack [13][16].

Group 5: Project-Based Learning
- The course includes six progressively challenging projects, such as building a robotic arm control system and implementing vision-guided grasping [19][21].
- Each project is designed to reinforce theoretical concepts through hands-on experience, ensuring participants understand both the "how" and "why" of the technology [29][33].

Group 6: Target Audience and Outcomes
- The course is suitable for individuals with programming or algorithm backgrounds looking to enter the field of embodied robotics, as well as students and professionals interested in enhancing their practical skills [30][32].
- Upon completion, participants will have a complete technology stack in embodied intelligence, gaining advantages in technical, engineering, and innovation capabilities [32][33].
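For readers new to MuJoCo, here is a minimal sketch of the load-and-step workflow described above, assuming the official `mujoco` Python bindings are installed (`pip install mujoco`); the scene XML is a made-up toy example, not course material:

```python
import mujoco

# A toy scene: a free-falling box above a ground plane.
XML = """
<mujoco>
  <option timestep="0.002"/>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <body name="box" pos="0 0 0.5">
      <freejoint/>
      <geom type="box" size="0.05 0.05 0.05" mass="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Advance the physics for one simulated second (500 steps of 2 ms each).
for _ in range(500):
    mujoco.mj_step(model, data)

# Read back the box's world-frame height after it has fallen onto the plane.
print(f"box height after 1 s: {data.body('box').xpos[2]:.3f} m")
```

In a reinforcement-learning loop, the same `mj_step` call sits inside the environment's step function, with actions written to `data.ctrl` and observations read back from `data` at every step.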
Latest from AI Lab: InternSpatial, a VLM Spatial Reasoning Dataset That Significantly Boosts Model Capability
具身智能之心· 2025-06-24 14:09
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in spatial reasoning tasks, highlighting the need for improved datasets and methodologies to enhance performance in various scenarios [3][12].

Dataset Limitations
- Existing spatial reasoning datasets have three main limitations that InternSpatial is designed to address:
  1. Limited scene diversity, focusing primarily on indoor and outdoor environments and lacking diverse contexts like driving and embodied navigation [3].
  2. Restricted instruction formats, supporting only natural language or region masks, which do not cover the variety of queries found in real-world applications [3].
  3. Lack of multi-view supervision, with over 90% of data focusing on single-image reasoning and failing to model spatiotemporal relationships across views [3].

Evaluation Benchmark
- The InternSpatial-Bench evaluation benchmark includes 6,008 QA pairs across five tasks, assessing position comparison, size comparison, rotation estimation, object counting, and existence estimation [7].
- The benchmark also introduces 1,000 additional QA pairs for multi-view rotation angle prediction [7].

Data Engine Design
- The data engine employs a three-stage automated pipeline:
  1. Annotation generation using existing annotations or SAM2 for mask generation [9].
  2. View alignment to construct a standard 3D coordinate system [9].
  3. Template-based QA generation with predefined task templates (a hypothetical sketch of this step follows this summary) [9].

Experimental Results
- Spatial reasoning performance has improved, with InternVL-Spatial-8B showing a 1.8% increase in position comparison accuracy and a 17% increase in object counting accuracy compared to its predecessor [10].
- The model's performance across various tasks demonstrates significant enhancements, particularly in multi-view tasks [10].

Instruction Format Robustness
- Current models exhibit a 23% accuracy drop when using the <box> format, while training with InternSpatial reduces the gap between different formats to within 5% [12].
- However, the automated QA generation struggles to replicate the complexity of natural language, indicating a need for further refinement [12].
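The template-based QA generation stage can be pictured with a small sketch. Everything here (template strings, field names, annotation format) is a hypothetical illustration of the general technique, not InternSpatial's actual pipeline:

```python
import random

# Hypothetical spatial-reasoning question templates keyed by task type.
TEMPLATES = {
    "position_comparison": "Is the {a} to the left of the {b}?",
    "size_comparison": "Which object is larger, the {a} or the {b}?",
    "object_counting": "How many instances of {a} are visible?",
}

def generate_qa(task: str, annotations: list[dict]) -> dict:
    """Fill a predefined task template with object categories from scene annotations.

    The ground-truth answer would be derived separately from the underlying
    3D annotations (boxes, masks, poses); only the question side is sketched here.
    """
    categories = [ann["category"] for ann in annotations]
    a, b = random.sample(categories, 2)
    return {"task": task, "question": TEMPLATES[task].format(a=a, b=b)}

print(generate_qa("position_comparison",
                  [{"category": "chair"}, {"category": "table"}, {"category": "lamp"}]))
```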
What Exactly Is Goal Navigation in the Embodied Domain? What Routes Lead from Goal Search to Goal Reaching?
具身智能之心· 2025-06-24 14:09
Core Insights
- Goal-Oriented Navigation empowers robots to autonomously complete navigation tasks based on goal descriptions, marking a significant shift from traditional visual language navigation [2].
- The technology has been successfully implemented in various verticals, enhancing service efficiency in delivery, healthcare, and hospitality sectors [3].
- The evolution of Goal-Oriented Navigation can be categorized into three generations, each with distinct methodologies and advancements [5][7].

Group 1: Technology Overview
- Goal-Oriented Navigation is a key aspect of embodied navigation, relying on language understanding, environmental perception, and path planning [2].
- The transition from explicit instructions to autonomous decision-making involves semantic parsing, environmental modeling, and dynamic decision-making [2].
- The technology has been integrated into delivery robots, service robots in healthcare and hospitality, and humanoid robots for domestic and industrial applications [3].

Group 2: Technical Evolution
- The first generation focuses on end-to-end methods using reinforcement and imitation learning, achieving breakthroughs in Point Navigation and closed-set image navigation tasks [5].
- The second generation employs modular methods that explicitly construct semantic maps, enhancing performance in zero-shot object navigation tasks [5].
- The third generation integrates large language models (LLMs) and visual language models (VLMs) to improve exploration strategies and open-vocabulary target matching accuracy (see the sketch at the end of this summary) [7][8].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation requires knowledge across multiple domains, making it challenging for newcomers to grasp the necessary concepts [10].
- A new course has been developed to address these challenges, focusing on practical applications and theoretical foundations of Goal-Oriented Navigation [11][12][13].
- The course aims to build a comprehensive understanding of the technology stack, including end-to-end reinforcement learning, modular semantic map construction, and LLM/VLM integration methods [30].
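To make the third-generation idea of open-vocabulary target matching concrete, here is a minimal sketch that scores a camera observation against a free-form goal description using CLIP-style embeddings. It assumes the `open_clip` package and an arbitrarily chosen pretrained checkpoint; the image path and goal text are placeholders, and the sketch illustrates the general technique rather than any specific system from the article or course:

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP-style model (the checkpoint choice here is an arbitrary assumption).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def goal_match_score(image_path: str, goal_text: str) -> float:
    """Cosine similarity between an observation image and an open-vocabulary goal."""
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = tokenizer([goal_text])
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return float((img_feat @ txt_feat.T).item())

# e.g. steer toward whichever frontier view best matches the goal description.
score = goal_match_score("frontier_view.jpg", "a red armchair near a window")
print(score)
```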