自动驾驶之心
We are planning a 10,000-member autonomous driving & embodied intelligence technology community~
自动驾驶之心· 2025-06-25 09:54
Core Viewpoint
- The article announces the establishment of a comprehensive community for autonomous driving and embodied intelligence, aiming to gather industry professionals and enable rapid problem-solving and knowledge sharing within the sector [2][4].

Group 1: Community Development
- The goal is to build a 10,000-member community focused on intelligent driving and embodied intelligence within three years, welcoming contributions from talented individuals [2].
- The community will serve as a bridge connecting academia, products, and recruitment, forming a closed loop across teaching and research [2][4].
- It will provide the latest industry technology updates, technical discussions, and job-sharing opportunities [2][3].

Group 2: Knowledge Sharing and Resources
- The "Autonomous Driving Heart Knowledge Planet" is designed as a technical exchange platform for academic and engineering questions, attracting students and professionals from top universities and companies [4][11].
- The community has established recruitment connections with numerous companies, including Xiaomi, Horizon, and NIO, enabling direct resume submissions [4][11].
- Members have access to learning modules from basic to advanced, covering algorithm explanations and code implementations [4][11].

Group 3: Technical Focus Areas
- By 2025, the focus will be on advanced areas such as vision-language models (VLM), end-to-end trajectory prediction, and 3D generative simulation [6][10].
- The community has developed over 30 learning pathways covering subfields of autonomous driving, including perception, mapping, and AI model deployment [11][16].
- Regular live sessions will feature top researchers and industry experts discussing practical applications and research advances in autonomous driving [18][19].

Group 4: Engagement and Interaction
- The community encourages active participation, with weekly engagement metrics ranking among the top 20 in the country, fostering a collaborative learning environment [12].
- Members can freely ask questions and join discussions, enhancing their learning experience and networking opportunities [11][12].
- The platform offers member-exclusive benefits, including access to academic advances, expert Q&A, and discounts on paid courses [14].
How is a SOTA end-to-end algorithm designed? Technical sharing from the top 3 of the CVPR'25 WOD vision-only end-to-end challenge~
自动驾驶之心· 2025-06-25 09:54
Core Insights
- The article reviews the results of the 2025 Waymo Open Dataset End-to-End Driving Challenge, highlighting advances in end-to-end autonomous driving systems and the shift toward training models on large-scale public datasets [2][18].

Group 1: Competition Results
- The champion was the EPFL team, which combined the DiffusionDrive model, nuPlan data, and an ensembling strategy [1].
- The runner-up was a collaboration between the Nvidia and Tübingen teams, which also built on DiffusionDrive and SmartRefine and employed multiple datasets, demonstrating the importance of training-data quality [1][22].
- Third place went to Hanyang University of South Korea, which used a simplified architecture relying only on front-view input and vehicle state [1][3].

Group 2: Methodology
- The UniPlan framework leverages large-scale public driving datasets to improve generalization in rare long-tail scenarios, achieving competitive results without relying on expensive multimodal large language models [3][18].
- The model architecture is based on DiffusionDrive, which employs a truncated diffusion strategy for efficient and diverse trajectory generation [4][6].
- The diffusion decoder uses a cross-attention mechanism to refine trajectory predictions against scene context [5][6].

Group 3: Data Processing
- The nuPlan dataset was processed with a sliding-window approach to create a diverse training set of 90,000 samples [7].
- A similar filtering strategy applied to the WOD-E2E dataset yielded 35,000 training samples and 10,000 validation samples [8].
- The model was trained on four H100 GPUs, achieving high training efficiency [10].

Group 4: Experimental Results
- Performance was evaluated with the Rater Feedback Score (RFS) and Average Displacement Error (ADE) across various configurations [12][17].
- Combined training on the WOD-E2E and nuPlan datasets yielded slight improvements in average RFS, particularly in long-tail categories [23].
- The analysis showed that while additional datasets generally help, the quality of the data sources matters more than quantity [39].

Group 5: Conclusion
- The article underscores the potential of data-centric approaches for improving the robustness of autonomous driving systems, as demonstrated by the competitive results of the UniPlan framework [18][39].
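The ADE metric used above is simply the mean Euclidean distance between predicted and ground-truth waypoints. A minimal sketch, with made-up toy trajectories rather than competition data:

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """ADE: mean L2 distance between predicted and ground-truth
    waypoints over all timesteps. Both arrays have shape (T, 2)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: straight ground-truth path, prediction offset
# by 0.3 m laterally at every timestep.
gt = np.stack([np.arange(5, dtype=float), np.zeros(5)], axis=-1)
pred = gt + np.array([0.0, 0.3])
print(average_displacement_error(pred, gt))  # ~0.3
```

RFS, by contrast, is a human-rater-derived score specific to WOD-E2E and has no such closed-form definition.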
The RoboSense 2025 Robot Perception Challenge officially launches! Autonomous driving & embodied intelligence tracks~
自动驾驶之心· 2025-06-25 09:54
Core Viewpoint
- The RoboSense Challenge 2025 aims to systematically evaluate the perception and understanding capabilities of robots in real-world scenarios, addressing key challenges in the stability, robustness, and generalization of perception systems [2][43].

Group 1: Challenge Overview
- The challenge comprises five major tracks focused on real-world tasks: language-driven autonomous driving, social navigation, sensor placement optimization, cross-modal drone navigation, and cross-platform 3D object detection [8][9][29][35].
- The event is co-hosted by several prestigious institutions and will be officially recognized at the IROS 2025 conference in Hangzhou, China [5][43].

Group 2: Task Details
- **Language-Driven Autonomous Driving**: evaluates the ability of robots to understand and act on natural-language commands, aiming for a deep coupling of language, perception, and planning [10][11].
- **Social Navigation**: focuses on robots navigating spaces shared with humans, emphasizing social compliance and safety [17][18].
- **Sensor Placement Optimization**: assesses the robustness of perception models under varying sensor configurations, crucial for reliable deployment in autonomous systems [23][24].
- **Cross-Modal Drone Navigation**: trains models to retrieve aerial images from natural-language descriptions, improving the efficiency of urban inspection and disaster response [29][30].
- **Cross-Platform 3D Object Detection**: develops models that maintain high performance across different robotic platforms without extensive retraining [35][36].

Group 3: Evaluation and Performance Metrics
- Each task defines specific performance metrics and baseline models, with detailed requirements for training and evaluation [16][21][28][42].
- The challenge encourages innovative solutions and offers a prize pool of up to $10,000, shared across the five tracks [42].

Group 4: Timeline and Participation
- The challenge officially starts on June 15, 2025, with key submission and evaluation deadlines leading up to the award ceremony on October 19, 2025 [4][42].
- Participants are encouraged to join this global initiative to advance robotic perception technologies [43].
Black Warrior! A research- and teaching-grade full-stack autonomous driving vehicle has arrived~
自动驾驶之心· 2025-06-25 09:48
Core Viewpoint
- The article introduces the launch of the "Black Warrior 001," a lightweight autonomous driving platform aimed at education and research, highlighting its functionality across a range of scenarios [3][6].

Group 1: Product Overview
- The "Black Warrior 001" supports perception, localization, fusion, navigation, and planning, built on an Ackermann-steering chassis [3].
- The product is currently available for pre-sale at a discounted price, with a deposit option to secure orders [2].

Group 2: Functional Demonstrations
- The vehicle has been tested in indoor, outdoor, and parking environments, demonstrating its perception, localization, fusion, navigation, and planning capabilities [4].
- Target uses include undergraduate coursework, graduate research, and teaching in academic laboratories and vocational training institutions [6].

Group 3: Hardware Specifications
- 3D LiDAR: Mid 360
- 2D LiDAR: from Lidar Technology
- Depth camera: from Orbbec, with built-in IMU
- Main control chip: Nvidia Orin NX 16G
- Display: 1080p [10]
- The vehicle weighs 30 kg, has 50W battery power, and a maximum speed of 2 m/s [12].

Group 4: Software and Functionality
- The software stack includes ROS, C++, and Python, supports one-click startup, and ships with a development environment [14].
- Supported functionality includes 2D and 3D SLAM, vehicle navigation, and obstacle avoidance [15].

Group 5: After-Sales and Maintenance
- The product includes one year of after-sales support (excluding man-made damage); during the warranty period, damage caused by operational errors or code modifications is repaired free of charge [37].
Why does failing at 4D auto-annotation mean failing at mass-produced intelligent driving?
自动驾驶之心· 2025-06-25 09:48
Core Viewpoint
- The article emphasizes the importance of efficient automatic 4D data annotation in developing intelligent driving algorithms, highlighting the challenges and solutions for producing high-quality annotations of dynamic and static elements in autonomous driving systems [2][6].

Summary by Sections

4D Data Annotation Process
- The article outlines the complexity of automatically annotating dynamic obstacles, which involves multiple modules and requires high-quality data processing to improve 3D detection performance [2][4].
- Offline single-frame 3D detection results must be linked through tracking, while handling issues such as sensor occlusion and post-processing optimization [4].

Challenges in Automatic Annotation
- High spatiotemporal consistency is crucial: dynamic targets must be tracked precisely across frames to avoid annotation breaks caused by occlusions or interactions [6].
- Multi-modal data fusion is complex, requiring synchronization of data from sensors such as LiDAR and cameras, along with coordinate alignment and semantic unification [6].
- Dynamic scenes are difficult to generalize: unpredictable behaviors of traffic participants and environmental factors pose significant challenges to annotation models [6].
- Annotation efficiency conflicts with cost: high-precision 4D automatic annotation still relies on manual verification, leading to long cycles and high costs [6].

Educational Course on 4D Annotation
- The article promotes a course designed to lower the barrier to entering 4D automatic annotation, covering the entire pipeline and its core algorithms [7][8].
- The course provides practical training in dynamic obstacle detection, SLAM reconstruction, static element annotation, and end-to-end ground-truth generation [10][11][13][15].
- It emphasizes real-world applications and hands-on practice to strengthen algorithm skills [7][22].

Course Structure and Target Audience
- The course is organized into chapters covering different aspects of 4D automatic annotation, including foundational knowledge, dynamic obstacle annotation, and data closed-loop topics [8][10][12][16].
- It targets individuals with a background in deep learning and autonomous driving perception algorithms, including students, researchers, and professionals transitioning into the field [21][23].
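The tracking step described above — linking offline per-frame 3D detections into consistent tracks — can be sketched with a greedy nearest-neighbor association. The detections, distance threshold, and data layout below are invented for illustration, not the course's actual pipeline:

```python
import numpy as np

def link_detections(frames, max_dist=2.0):
    """Greedy nearest-neighbor linking of per-frame 3D detection
    centers into track IDs. `frames` is a list of (N_i, 3) arrays of
    box centers; returns a parallel list of int track-ID arrays."""
    next_id = 0
    prev_centers, prev_ids = np.empty((0, 3)), np.empty(0, dtype=int)
    all_ids = []
    for centers in frames:
        ids = np.full(len(centers), -1, dtype=int)
        used = set()
        for i, c in enumerate(centers):
            if len(prev_centers):
                d = np.linalg.norm(prev_centers - c, axis=1)
                j = int(np.argmin(d))
                if d[j] < max_dist and j not in used:
                    ids[i] = prev_ids[j]   # continue an existing track
                    used.add(j)
            if ids[i] == -1:               # start a new track
                ids[i] = next_id
                next_id += 1
        all_ids.append(ids)
        prev_centers, prev_ids = centers, ids
    return all_ids

# Two toy frames: the object near (10, 0, 0) keeps its track ID;
# the far-away detection in frame 1 opens a new track.
f0 = np.array([[10.0, 0.0, 0.0], [20.0, 5.0, 0.0]])
f1 = np.array([[10.5, 0.2, 0.0], [40.0, 0.0, 0.0]])
ids = link_detections([f0, f1])
```

Production annotation pipelines would add motion models, occlusion handling, and bidirectional (offline) smoothing on top of this association skeleton, which is exactly where the manual-verification cost discussed above comes from.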
A roundup of high-frequency BEV interview questions! (vision-only & multimodal fusion algorithms)
自动驾驶之心· 2025-06-25 02:30
Core Viewpoint
- The article discusses the rapid advances in BEV (Bird's Eye View) perception technology, highlighting its significance to the autonomous driving industry and the companies investing in its development [2].

Group 1: BEV Perception Technology
- BEV perception has become a competitive area of visual perception, with models such as BEVDet, PETR, and InternBEV gaining traction since the introduction of BEVFormer [2].
- The technology is being put into production by companies such as Horizon, WeRide, XPeng, BYD, and Haomo, indicating a shift toward practical applications in autonomous driving [2].

Group 2: Technical Insights
- In BEVFormer, the temporal and spatial self-attention modules use BEV queries, with keys and values derived from historical BEV information and image features [3].
- The grid_sample warp in BEVDet4D transforms coordinates based on camera parameters and a predefined BEV grid, mapping pixels from 2D images into BEV space [3].

Group 3: Algorithm and Performance
- Lightweight BEV algorithms such as fast-bev and the TRT versions of BEVDet and BEVDepth are noted for in-vehicle deployment [5].
- The physical space covered by a BEV matrix is typically around 50 meters, with vision-only solutions achieving stable performance up to this distance [6].

Group 4: Community and Collaboration
- The article also mentions a knowledge-sharing platform for the autonomous driving industry, aimed at fostering technical exchange among students and professionals from prestigious universities and companies [8].
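The grid_sample warp mentioned above can be illustrated with PyTorch. The feature sizes and the sampling grid below are toy values; in BEVDet4D the grid would be precomputed from camera intrinsics/extrinsics rather than drawn at random, with locations normalized to [-1, 1] as `grid_sample` expects:

```python
import torch
import torch.nn.functional as F

# Toy image feature map: batch 1, 8 channels, 16x16 spatial.
img_feat = torch.randn(1, 8, 16, 16)

# Stand-in for a precomputed BEV sampling grid: for each of the
# 32x32 BEV cells, an (x, y) location in the image feature map,
# normalized to [-1, 1].
bev_grid = torch.rand(1, 32, 32, 2) * 2 - 1

# Warp image features into BEV space via bilinear sampling.
bev_feat = F.grid_sample(img_feat, bev_grid, mode="bilinear",
                         align_corners=False)
print(bev_feat.shape)  # torch.Size([1, 8, 32, 32])
```

Because the grid is fixed once the camera setup is known, this warp is a cheap, differentiable lookup — one reason it appears so often in interview questions about BEV view transformation.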
Why does a single paper consume an entire graduate career?
自动驾驶之心· 2025-06-25 02:30
Which conferences and journals can we coach for?

We have received many requests from students for help with publishing. Some cannot earn a master's degree without a zone-3 journal paper; a PhD cannot graduate without three CCF-A papers; and advisors unfamiliar with a new direction cannot support the work. Students rack their brains over topic selection, hit bottlenecks in experiment design, struggle with muddled writing logic, and are rejected submission after submission! In cutting-edge, complex fields like autonomous driving, embodied intelligence, and robotics, it can feel truly overwhelming.

A paper often takes 1-2 years from preparation to publication; for a master's student, that spans essentially the entire academic career. Wrong methods, detours, and lack of guidance consume the most time! Publishing is hard, but not hopeless: with an experienced mentor leading the way, several papers a year is common. After long preparation, our paper-coaching service is officially launched, covering autonomous driving, embodied intelligence, and robotics.

Who are we? The largest AI technology self-media platform in China, whose IPs include 自动驾驶之心, 具身智能之心, and 3D视觉之心, with top domestic academic resources. We have worked in autonomous driving, embodied intelligence, and robotics for many years, deeply understand the challenges and opportunities of these interdisciplinary fields, and know how important a high-quality paper is to a student's (especially a graduate student's) studies and future. We have 300+ mentors dedicated to autonomous driving/embodied intelligence. ...
Latest from Mu Yao's team! RoboTwin 2.0: a scalable data benchmark for robust bimanual manipulation
自动驾驶之心· 2025-06-24 12:41
Core Insights
- The article discusses RoboTwin 2.0, a scalable data generation framework that enhances bimanual robotic manipulation through robust domain randomization and automated expert data generation [2][6][18].

Group 1: Motivation and Challenges
- Existing synthetic datasets for bimanual robotic manipulation are insufficient, lacking efficient data-generation methods for new tasks and relying on overly simplified simulation environments [2][5].
- RoboTwin 2.0 addresses these challenges with a scalable simulation framework that supports automatic, large-scale generation of diverse and realistic data [2][6].

Group 2: Key Components of RoboTwin 2.0
- RoboTwin 2.0 integrates three key components: an automated expert data generation pipeline, comprehensive domain randomization, and entity-aware adaptation for diverse robotic platforms [6][18].
- The automated expert data generation pipeline uses multimodal large language models (MLLMs) and simulation feedback to iteratively refine task-execution code [10][12].

Group 3: Domain Randomization
- Domain randomization is applied across five dimensions: clutter, background texture, lighting conditions, desktop height, and diverse language instructions, improving the robustness of policies to environmental variability [12][13].
- The framework provides a large object library (RoboTwin-OD) with 731 instances across 147 categories, each annotated with semantic and operational labels [3][18].

Group 4: Data Collection and Benchmarking
- Over 100,000 dual-arm manipulation trajectories were collected across 50 tasks, supporting extensive benchmarking and evaluation of robotic policies [24][22].
- The framework allows flexible entity configurations, ensuring compatibility with diverse hardware setups and scalability to future robotic platforms [20][22].

Group 5: Experimental Analysis
- Evaluations show that RoboTwin 2.0 significantly improves task success rates, particularly for low-degree-of-freedom platforms, with average gains of 8.3% [29][31].
- Its data also enhances the generalization capabilities of models, with substantial performance improvements in unseen scenarios [32][34].
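The five randomization dimensions listed above can be pictured as a simple per-scene sampler. All value ranges, texture names, and instructions below are invented for illustration and are not RoboTwin 2.0's actual parameters:

```python
import random

# Hypothetical randomization space over the five dimensions named in
# the article: clutter, background texture, lighting, desktop height,
# and language instruction.
RANDOMIZATION_SPACE = {
    "clutter_objects":    lambda: random.randint(0, 8),
    "background_texture": lambda: random.choice(["wood", "marble", "cloth"]),
    "light_intensity":    lambda: random.uniform(0.3, 1.5),
    "desktop_height_m":   lambda: random.uniform(0.70, 0.85),
    "instruction":        lambda: random.choice([
        "pick up the mug", "grab the cup and lift it"]),
}

def sample_scene(seed=None):
    """Draw one randomized scene configuration across all dimensions."""
    if seed is not None:
        random.seed(seed)
    return {name: draw() for name, draw in RANDOMIZATION_SPACE.items()}

scene = sample_scene(seed=0)
```

Drawing every episode's scene independently from such a space is what forces a policy to stop relying on any one background, lighting level, or phrasing — the mechanism behind the robustness gains reported above.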
Salary negotiation, avoiding pitfalls, switching fields? For autonomous driving/embodied intelligence job hunting, the AutoRobo Planet is your one-stop solution!
自动驾驶之心· 2025-06-24 12:41
Core Viewpoint
- The article emphasizes the rapid advances in AI technologies, particularly autonomous driving and embodied intelligence, which have significantly reshaped the job market and industry dynamics [2].

Group 1: Industry Developments
- Recent breakthroughs in AI, especially in autonomous driving (L2 to L4 functionality) and robotics, have led to a surge in technical routes and funding [2].
- The AutoRobo knowledge community was established to connect job seekers in autonomous driving, embodied intelligence, and robotics, facilitating better job matching and career development [2][3].

Group 2: Community Offerings
- The AutoRobo community provides extensive resources, including interview questions, industry reports, salary-negotiation tips, and resume-optimization services [3][4].
- A collection of 100 interview questions on autonomous driving and embodied intelligence, covering a range of technical topics, is available to members [6][7].

Group 3: Recruitment Information
- The community regularly shares openings for algorithm, development, and product roles, spanning campus recruitment, experienced hires, and internships [4].
- Members can access interview experiences and insights from successful candidates across companies in the industry [16].

Group 4: Industry Reports and Insights
- The community compiles industry reports covering the current state, development trends, and market opportunities in autonomous driving and embodied intelligence [12][15].
- Members can learn about the embodied intelligence industry landscape, including technology routes and investment opportunities [15].
LSD-based 4D point cloud base map generation - point cloud mapping for 4D annotation~
自动驾驶之心· 2025-06-24 12:41
Author | LiangWang  Editor | 自动驾驶之心

In recent years, with the development of deep learning, data-driven algorithms have gradually become mainstream in autonomous driving and robotics, and their demand for data keeps growing. Unlike traditional single-frame annotation, 4D annotation based on high-precision point cloud maps can effectively reduce annotation cost and improve ground-truth quality.

The "4D" in 4D annotation refers to 3D space plus the time dimension: 4D data can be projected to any timestamp to obtain single-frame ground truth for model training. Unlike large-scale HD map production, 4D annotation focuses only on the static and dynamic elements of a small area. Generating the base map required for annotation is a key step; depending on annotation requirements, key techniques such as "single-trip mapping," "multi-trip mapping," and "relocalization" usually need to be implemented, and both GNSS-available driving scenarios and GNSS-denied parking scenarios must be supported.

LSD (LiDAR SLAM & Detection) is an open-source environment perception framework for autonomous driving and robotics that supports data capture and replay, multi-sensor calibration, SLAM mapping and localization, and obstacle detection. This article will describe in detail ...
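At its core, single-trip base-map generation amounts to transforming each frame's points into a common world frame using its estimated pose and accumulating the result. A minimal sketch with made-up poses (in the real pipeline the poses come from LSD's SLAM front end and back end, and the cloud would be downsampled and filtered):

```python
import numpy as np

def accumulate_map(frames, poses):
    """Build a base-map point cloud by transforming each frame's
    (N, 3) points with its 4x4 world-from-sensor pose and stacking."""
    world_points = []
    for pts, T in zip(frames, poses):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 4)
        world_points.append((homo @ T.T)[:, :3])         # apply T to each point
    return np.vstack(world_points)

# Two toy frames observing the same relative point; the second
# frame's pose is translated 1 m forward along x.
frame0 = np.array([[1.0, 0.0, 0.0]])
frame1 = np.array([[1.0, 0.0, 0.0]])
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
cloud = accumulate_map([frame0, frame1], [T0, T1])
# cloud rows: [1, 0, 0] from frame 0 and [2, 0, 0] from frame 1
```

Multi-trip mapping and relocalization then reduce to estimating one additional transform per trip that registers each trip's accumulated cloud into the shared base-map frame.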