自动驾驶之心
From perception upgrades to lightweight deployment, embodied intelligence still has a long road ahead
自动驾驶之心· 2025-07-02 02:05
Core Viewpoint
- The embodied intelligence industry is expected to experience explosive growth by 2025, driven by technological advancements and application traction, shaping both the technical roadmap and commercialization pathways [1].

Group 1: Technological Developments
- Upgrades in perception capabilities and multimodal integration are crucial for the development of embodied technologies, with a focus on tactile perception, particularly in dexterous hands, enhancing precision and feedback [1].
- Multimodal sensor fusion technology allows robots to process various types of information simultaneously, significantly improving the accuracy and comprehensiveness of environmental perception (a minimal fusion sketch follows this summary) [1].
- Large model-driven algorithms are enhancing robots' understanding of the world, particularly in humanoid robots, by improving perception, autonomous learning, and decision-making capabilities [1].
- Lightweight model design is becoming a pressing need for industry deployment, requiring low-computation, multimodal, and cross-platform models [1].

Group 2: Simulation and Data Ecosystem
- The continuous improvement of simulation environments and data ecosystems is vital for embodied intelligence, providing efficient training platforms for robots [1].
- Simulations based on physical-world principles help in modeling and analyzing various phenomena, aiding robots in understanding physical interactions and operations [1].
- Aligning simulation with real-world environments remains a key challenge that researchers are working to overcome [1].

Group 3: Community and Resources
- The "Embodied Intelligence Heart Knowledge Planet" serves as a technical exchange platform for stakeholders in the field, including members from renowned universities and leading robotics companies [6].
- The community has compiled over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms and various learning pathways [6][12].
- Members can access research reports, technical learning routes, and job opportunities in the embodied intelligence sector [11][14].
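To make the multimodal-fusion point concrete, here is a minimal late-fusion sketch in PyTorch. Each modality (camera, lidar, tactile) is assumed to already have its own encoder, and a fused embedding is produced by concatenation plus a small MLP. The feature dimensions and the `LateFusionHead` name are illustrative assumptions, not taken from any specific system described above.

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Concatenate per-modality feature vectors and map them to a shared embedding."""
    def __init__(self, cam_dim=256, lidar_dim=128, tactile_dim=32, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(cam_dim + lidar_dim + tactile_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, cam_feat, lidar_feat, tactile_feat):
        # Each input is a (batch, dim) feature vector from a modality-specific encoder.
        fused = torch.cat([cam_feat, lidar_feat, tactile_feat], dim=-1)
        return self.proj(fused)

# Toy usage: one sample per modality.
head = LateFusionHead()
fused = head(torch.randn(1, 256), torch.randn(1, 128), torch.randn(1, 32))
print(fused.shape)  # torch.Size([1, 256])
```

Late fusion is only one option; mid-level or attention-based fusion would follow the same interface with a different mixing module.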
Same idea, yet theirs got into CVPR while yours was rejected instantly?
自动驾驶之心· 2025-07-02 02:05
Rather than debating why someone else got the same idea into a top conference, ask what actually makes the accepted paper stronger under that same idea.

1. Is it just a point solution? With the same idea, a paper that merely pushes certain metrics very high will usually not make a top venue; that is a point solution with little influence of its own. Most top-conference results are not useful in only one specific setting; they represent at least a family of methods.

For those who want research results quickly, the most pressing question is how to get papers accepted efficiently, precisely, and fast, especially at top venues. In cutting-edge, complex areas such as autonomous driving, embodied intelligence, and robotics, publishing at a top venue without an experienced guide is genuinely hard. We therefore offer in-depth mentoring for those who need it, covering all of computer science and AI4Science, to raise acceptance rates all the way to a top-venue paper. Only an accepted paper is a good paper; scan the QR code to learn more. Who this is for / What we can provide

2. Is the paper's method hard to implement? With the same idea, if someone else's paper is trivial to implement and works remarkably well, or is complex to implement but easy to use, which papers get accepted if not those? From the idea, the experiment design, and the dataset choice to getting baselines running and writing the first draft, a small difference at any step can lead to a very different final venue. A clear research ...
Temporal fusion equivalent to gradient descent? GDFusion sets a new occupancy (OCC) SOTA while cutting GPU memory by roughly 70%
自动驾驶之心· 2025-07-01 12:58
Today 自动驾驶之心 shares the latest work from the University of Macau and Wuhan University: is temporal fusion equivalent to gradient descent? GDFusion sets a new SOTA on occupancy (OCC) prediction while cutting GPU memory by up to 72%.

Paper authors | Dubing Chen et al.  Editor | 自动驾驶之心

One-sentence summary: researchers from the University of Macau and collaborating institutions propose a new temporal fusion framework, GDFusion. Through a remarkably elegant perspective, reinterpreting the conventional RNN update as "gradient descent in feature space", it unifies the fusion of multiple heterogeneous sources of temporal information. GDFusion not only improves mIoU by 1.4%-4.8% on 3D occupancy-grid prediction, but also reduces inference GPU memory consumption by 27%-72%, a win for both performance and efficiency.

Paper title: Rethinking Temporal Fusion with a Unified Gradient Descent View for ...
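The "temporal fusion as gradient descent" reading can be illustrated with a toy sketch: treat the carried history feature as a variable and take one gradient step per frame on a simple consistency objective. This is not GDFusion's code; the quadratic objective, learning rate, and feature size are placeholder assumptions chosen so that the update reduces to a familiar exponential-moving-average style RNN step.

```python
import torch

def temporal_fuse_step(h, x, lr=0.1):
    """One fusion step viewed as a gradient-descent update on the history feature h.

    h: (C,) fused history feature carried across frames
    x: (C,) feature of the current frame
    The objective 0.5 * ||h - x||^2 is a placeholder; its gradient step recovers
    h <- (1 - lr) * h + lr * x, i.e. an EMA-style recurrent update.
    """
    h = h.clone().requires_grad_(True)
    loss = 0.5 * (h - x).pow(2).sum()
    (grad,) = torch.autograd.grad(loss, h)
    return (h - lr * grad).detach()

h = torch.zeros(16)
for t in range(3):
    x_t = torch.randn(16)          # per-frame feature from the perception backbone
    h = temporal_fuse_step(h, x_t)
print(h.shape)
```

Under this reading, other temporal cues could in principle be folded in by adding further loss terms to the same per-frame objective, which is the kind of unification the summary refers to.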
Black Warrior! A full-stack autonomous-driving vehicle for research and teaching is here
自动驾驶之心· 2025-07-01 12:58
Core Viewpoint
- The article announces the launch of the "Black Warrior Series 001," a lightweight autonomous driving solution aimed at research and education, with a promotional price of 34,999 yuan and a deposit scheme for early orders [1].

Group 1: Product Overview
- The "Black Warrior 001" is developed by the Autonomous Driving Heart team, featuring a comprehensive solution that supports perception, localization, fusion, navigation, and planning, built on an Ackermann chassis [2].
- The product is designed for educational and research applications, including undergraduate learning, graduate research, and use as a teaching tool in laboratories and vocational schools [5].

Group 2: Performance and Testing
- The product has been tested in indoor, outdoor, and parking environments, demonstrating its perception, localization, fusion, navigation, and planning capabilities [3].
- Specific tests include 3D point-cloud object detection, 2D and 3D laser mapping in indoor parking lots, and outdoor scene mapping, including night driving [7][9][11][15][17].

Group 3: Hardware Specifications
- Key hardware components:
  - 3D LiDAR: Mid 360
  - 2D LiDAR: Raysun lidar
  - Depth camera: Orbbec with IMU
  - Main compute: Nvidia Orin NX 16G
  - Display: 1080p [19]
- Vehicle specifications: weight 30 kg, battery power 50 W, voltage 24 V, maximum speed 2 m/s [21].

Group 4: Software and Functionality
- The software stack is built on ROS with C++ and Python, supports one-click startup, and ships with a development environment [23].
- The system supports 2D and 3D SLAM, vehicle navigation, and obstacle avoidance (a minimal ROS node sketch follows this summary) [24].

Group 5: After-Sales and Support
- The company offers one year of after-sales support for non-human damage, with free repairs during the warranty period for damage caused by operational errors or code modifications [46].
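As a flavor of what a ROS-based obstacle-avoidance node on such a platform might look like, here is a minimal rospy sketch that stops the vehicle when the 2D lidar reports an obstacle within half a meter. The topic names (/scan, /cmd_vel), the speed, and the threshold are assumptions for illustration, not the documented interface of the Black Warrior 001.

```python
#!/usr/bin/env python
# Minimal ROS 1 (rospy) sketch: stop when the 2D lidar sees an obstacle ahead.
import rospy
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

class StopOnObstacle:
    def __init__(self, min_range=0.5):
        self.min_range = min_range
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.on_scan)

    def on_scan(self, scan):
        cmd = Twist()
        # Drive forward slowly unless any valid beam is closer than the threshold.
        valid = [r for r in scan.ranges if scan.range_min < r < scan.range_max]
        cmd.linear.x = 0.0 if valid and min(valid) < self.min_range else 0.5
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("stop_on_obstacle")
    StopOnObstacle()
    rospy.spin()
```

A real deployment would of course route commands through the navigation stack rather than publishing raw velocities.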
Xiaomi experienced & campus hiring | Algorithm Researcher, Autonomous Driving and Embodied Intelligence (VLA/Embodied)
自动驾驶之心· 2025-07-01 12:58
Job Description

We are looking for an outstanding researcher/scientist to join our frontier research team and help define and build the "brain" of next-generation autonomous driving and robotics. You will work on a breakthrough Embodied Foundation Model that deeply integrates vision-language-action (VLA) capabilities with strong spatial perception and spatial reasoning.

- Multimodal scene understanding: fuse vision, language, radar, and other sources to achieve deep understanding and spatial awareness of dynamic, open environments.
- Complex semantic reasoning and decision-making: enable the model to interpret vague, abstract human instructions and, combined with spatial reasoning about the physical world, generate safe, reasonable, and interpretable action sequences.
- Learning and adaptation: research reinforcement learning (RL), imitation learning (IL), and self-supervised learning so the model can keep learning and improving from massive data and from interaction with the environment.
- Technical vision and roadmap: lead the construction of a generalizable, efficient embodied foundation model that anchors the technical evolution over the next 1-3 years, and explore its unified application across autonomous driving and general-purpose robotics.
- Academic impact and collaboration: work with top universities and research institutes on long-term topics such as representation learning, causal reasoning, and world models, publishing at CVPR, ...
Major livestream! Tsinghua & Bosch open-source a SOTA pure VLA: Impromptu-VLA says goodbye to dual systems
自动驾驶之心· 2025-07-01 12:58
Core Viewpoint
- The article discusses the advances and remaining challenges of autonomous driving systems in unstructured environments, and introduces the Impromptu VLA framework developed by Tsinghua AIR and Bosch Research to address the data gap in these scenarios [1].

Group 1: Advancements in Autonomous Driving
- Current autonomous driving systems have made significant progress in structured environments such as cities and highways, but still struggle in unstructured scenarios such as rural roads and construction zones [1].
- Existing large-scale autonomous driving datasets mostly cover conventional traffic conditions, leaving a shortage of specialized, large-scale, finely annotated data for complex unstructured environments [1].

Group 2: Impromptu VLA Framework
- Impromptu VLA aims to provide an open-weight, open-data driving vision-language-action model: a fully end-to-end system that extracts multimodal features directly from driving video segments (a toy sketch of this video-to-command mapping follows this summary) [1].
- Impromptu VLA generates driving commands in natural-language form without hand-designed perception modules or intermediate representations [1].
- In the NeuroNCAP closed-loop safety evaluation, Impromptu VLA shows strong decision robustness and generalization, clearly outperforming the recent BridgeAD system from CVPR 2025 (2.15 vs. 1.60) [1].
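The "fully end-to-end, video in, command out" idea can be sketched with a toy model: dummy frame and instruction encoders feed a classifier over a small set of command strings. Everything below (TinyDrivingVLA, the 32x32 frames, the 16-token vocabulary, the four commands) is made up for illustration and bears no relation to Impromptu VLA's actual architecture, which decodes free-form natural language rather than a fixed command set.

```python
import torch
import torch.nn as nn

class TinyDrivingVLA(nn.Module):
    """Toy end-to-end mapping from a video clip plus an instruction token to a command id."""
    def __init__(self, n_commands=4, d=64):
        super().__init__()
        self.frame_enc = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 32 * 32, d), nn.ReLU())
        self.instr_emb = nn.Embedding(16, d)   # toy "language" vocabulary of 16 tokens
        self.head = nn.Linear(2 * d, n_commands)

    def forward(self, frames, instr_token):
        # frames: (T, 3, 32, 32) clip; average per-frame features over time.
        clip = self.frame_enc(frames).mean(dim=0)
        instr = self.instr_emb(instr_token).squeeze(0)
        return self.head(torch.cat([clip, instr], dim=-1))

COMMANDS = ["keep lane", "slow down", "turn left", "turn right"]
model = TinyDrivingVLA()
logits = model(torch.randn(8, 3, 32, 32), torch.tensor([3]))
print(COMMANDS[logits.argmax().item()])
```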
What exactly is goal-oriented navigation? Are there deployment opportunities in autonomous driving?
自动驾驶之心· 2025-07-01 12:24
Core Viewpoint
- Goal-oriented navigation lets robots complete navigation tasks autonomously from a goal description alone, marking a significant shift from traditional visual-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, resting on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-oriented navigation allows robots to explore unfamiliar 3D environments and plan paths using only a goal description such as coordinates, an image, or natural language [2].
- The technology has been industrialized across verticals including delivery, healthcare, hospitality, and industrial logistics, showing its adaptability and effectiveness [3].

Group 2: Technological Evolution
- The evolution of goal-oriented navigation can be grouped into three generations:
  1. First generation: end-to-end methods built on reinforcement learning and imitation learning, with breakthroughs in point navigation and closed-set image-goal navigation [5].
  2. Second generation: modular methods that explicitly build semantic maps and split the task into exploration and goal-localization phases, with clear advantages in zero-shot object navigation (see the toy modular loop after this summary) [5].
  3. Third generation: integration of large language models (LLMs) and visual language models (VLMs) to strengthen knowledge reasoning and open-vocabulary target matching [7].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation, and goal-oriented navigation in particular, requires knowledge from multiple fields, including natural language processing, computer vision, and reinforcement learning [9].
- Fragmented knowledge and an abundance of literature make it hard for newcomers to extract the frameworks and follow the development trends [9].
- A new course has been developed to address these challenges, focusing on practical applications and theoretical foundations [10][11][12].

Group 4: Course Structure
- The course covers:
  1. Semantic navigation framework: theoretical foundations and technical lineage [14].
  2. Habitat simulation ecosystem: the technical architecture of the Habitat platform [15].
  3. End-to-end navigation methodology: core algorithms and their performance differences [16].
  4. Modular navigation architecture: semantic map construction and task decomposition strategies [17].
  5. LLM/VLM-driven navigation systems: integration paradigms and algorithm design [18].

Group 5: Practical Application
- The course includes a capstone project on reproducing the VLFM algorithm and deploying it in the real world, giving participants hands-on experience [18][22].
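To illustrate the modular decomposition (exploration, goal localization, path planning) described above, here is a toy loop on a 2D occupancy grid: a BFS planner stands in for path planning, a nearest-unvisited-cell rule stands in for frontier exploration, and an adjacency check stands in for the goal detector. The map, goal position, and detection range are made-up placeholders, not any published method such as VLFM.

```python
from collections import deque

GRID = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]  # 0 = free, 1 = wall
GOAL = (4, 4)

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 5 and 0 <= nc < 5 and GRID[nr][nc] == 0:
            yield (nr, nc)

def bfs_path(start, target):
    """Plain BFS standing in for the path-planning module."""
    prev, queue = {start: None}, deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in neighbors(cur):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None

def detect_goal(cell):
    # Goal-localization module: "sees" the goal when standing on it or next to it.
    return cell == GOAL or GOAL in set(neighbors(cell))

pose, visited = (0, 0), {(0, 0)}
while not detect_goal(pose):
    # Exploration module: head for the nearest reachable, not-yet-visited free cell.
    candidates = [(r, c) for r in range(5) for c in range(5)
                  if GRID[r][c] == 0 and (r, c) not in visited and bfs_path(pose, (r, c))]
    target = min(candidates, key=lambda c: len(bfs_path(pose, c)))
    for cell in bfs_path(pose, target)[1:]:
        pose = cell
        visited.add(cell)
        if detect_goal(pose):
            break
print("goal localized near", pose, "- final approach:", bfs_path(pose, GOAL))
```

Swapping the adjacency check for an open-vocabulary detector and the nearest-cell rule for a learned frontier scorer recovers the structure of the second- and third-generation systems described above.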
Landed an offer at a small company, and I'm content with it...
自动驾驶之心· 2025-07-01 04:04
Core Viewpoint
- The article discusses advances in AI technology, particularly autonomous driving and embodied intelligence, noting the saturation of the autonomous driving job market and the challenges faced by job seekers in this field [2].

Group 1: Industry Developments
- The autonomous driving sector has seen significant breakthroughs, with L2 to L4 functionality reaching mass production, alongside advances in humanoid and quadrupedal robots [2].
- The industry has a clear demand for technology and talent, as shown by the experiences shared by job seekers [2].

Group 2: Job-Seeking Platform
- The AutoRobo Knowledge Community was introduced to assist job seekers in autonomous driving, embodied intelligence, and robotics, providing a platform for job matching and networking [2][3].
- The community currently has nearly 1,000 members, including professionals from companies such as Horizon Robotics, Li Auto, Huawei, and Xiaomi [2].

Group 3: Resources and Support
- The community offers resources including interview questions, industry reports, salary negotiation tips, and resume optimization services [3][4].
- Interview preparation materials include a compilation of 100 questions on autonomous driving and embodied intelligence, covering a range of technical topics [6][7][11].

Group 4: Industry Reports
- The community provides access to numerous industry reports that help members understand the current state, development trends, and market opportunities in the autonomous driving and embodied intelligence sectors [12][15].
WorldVLA: a world model enables bidirectional vision-action enhancement, significantly improving grasping accuracy
自动驾驶之心· 2025-07-01 04:04
Core Viewpoint
- WorldVLA is introduced as an autoregressive action world model that unifies action and image understanding and generation, outperforming standalone action models and world models through mutual enhancement [4][7][9].

Group 1: Model Definition and Components
- WorldVLA combines a vision-language-action (VLA) model with a world model, predicting future images from actions and visual understanding [4][6].
- The model uses three separate tokenizers for images, text, and actions that share a single vocabulary, enabling unified cross-modal understanding [7][14].
- The action model generates subsequent actions from image observations, while the world model predicts future visual states, improving decision-making in the action model [6][29].

Group 2: Performance and Evaluation
- Experiments show WorldVLA achieves a 4% higher success rate on grasping tasks than conventional action models and reduces Fréchet Video Distance (FVD) by 10% compared with standard world models [8][27].
- The attention-mask strategy substantially mitigates performance degradation in action-sequence generation, improving grasping success rates by 4% to 23% [8][32].
- Performance correlates positively with image resolution, indicating that higher resolution gives the robot richer visual information [27].

Group 3: Training Strategy and Data
- WorldVLA is trained on a mix of action-model data and world-model data, enhancing action generation through an understanding of environmental physics [16][22].
- Training involves generating actions from text instructions and image observations, while the world model predicts the next image frame from the current observation and action [17][18].
- The loss function balances the contributions of action and world-model data, keeping training effective despite the disparity in token counts [22].

Group 4: Contributions and Innovations
- The attention-mask strategy allows actions to be generated independently, reducing error propagation in sequential action generation (a toy mask construction follows this summary) [19][20].
- WorldVLA generates longer video sequences better than a pure world model, highlighting the benefit of integrating an action model [31].
- The architecture and training strategy point to further task gains from pre-training on world-model data [36].
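The attention-mask idea, letting each action chunk be generated without attending to previously generated actions, can be sketched as a mask over a unified token sequence. The segment sizes and the two-chunk layout below are assumptions for illustration, not WorldVLA's actual configuration.

```python
import torch

# Unified token sequence: [image tokens | text tokens | action chunk 1 | action chunk 2].
# Start from an ordinary causal mask, then block attention from action tokens to any
# earlier action chunk, so each chunk conditions only on image and text tokens.
n_img, n_txt, n_act = 4, 3, 2
sizes = [n_img, n_txt, n_act, n_act]          # segments 2 and 3 are action chunks
labels = sum([[i] * s for i, s in enumerate(sizes)], [])
L = len(labels)

mask = torch.tril(torch.ones(L, L, dtype=torch.bool))   # causal baseline
for q in range(L):
    for k in range(L):
        # Query is an action token and key belongs to an earlier action chunk: block it.
        if labels[q] >= 2 and labels[k] >= 2 and labels[k] < labels[q]:
            mask[q, k] = False

print(mask.int())   # rows = queries, columns = keys; 1 = allowed
```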
Compete this summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge officially launches
自动驾驶之心· 2025-06-30 12:51
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, with visual perception as the key supporting technology for applications in autonomous driving, smart cities, and robotics [2][4].

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications across many fields [2].
- The competition seeks to promote efficient, high-quality research in spatial and embodied intelligence technologies [4].
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4].

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions such as Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5].
- It is sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd., which also provides technical support [5].

Group 3: Competition Data and Resources
- Participants have access to real and simulated datasets, including multi-view drone aerial images and dedicated simulation environments for the tasks [11].
- The sponsor provides free computing resources, including H800 GPU power, for validating and testing submitted algorithms [12][13].

Group 4: Task Settings
- The competition has two tracks, Spatial Intelligence and Embodied Intelligence, each with its own tasks and evaluation methods [17].
- The Spatial Intelligence track requires building a 3D reconstruction model from multi-view aerial images, while the Embodied Intelligence track involves completing tasks in dynamically occluded scenes [17].

Group 5: Evaluation Methods
- The Spatial Intelligence track is evaluated on rendering quality and geometric accuracy, with scores based on PSNR and F1-Score (a short PSNR sketch follows this summary) [19][20].
- The Embodied Intelligence track is evaluated on task completion and execution efficiency, with metrics such as success rate and average pose error [23][21].

Group 6: Submission and Awards
- Results must be submitted in a specified format, and the top-ranking teams' results will be reproduced for verification [24].
- Each track awards cash prizes and computing vouchers, with a total of 12 awards distributed among the top teams [25].
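For reference, PSNR, one of the rendering-quality metrics named for the Spatial Intelligence track, has a standard definition that fits in a few lines. The 8-bit dynamic range and the random test images below are placeholders; the challenge's actual evaluation script is not described here.

```python
import numpy as np

def psnr(rendered, reference, max_val=255.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE) between two images."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy check: a reference image versus a mildly perturbed rendering of it.
ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int32) + np.random.randint(-10, 11, ref.shape), 0, 255).astype(np.uint8)
print(round(psnr(noisy, ref), 2))
```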