Tsinghua University's Latest Survey: Where Is Multi-Sensor Fusion in Intelligent Driving Headed?
自动驾驶之心· 2025-06-26 12:56
Group 1: Importance of Embodied AI and Multi-Sensor Fusion Perception
- Embodied AI is a crucial direction in AI development, enabling autonomous decision-making and action through real-time perception in dynamic environments, with applications in autonomous driving and robotics [2][3]
- Multi-sensor fusion perception (MSFP) is essential for robust perception and accurate decision-making in embodied AI, integrating data from various sensors like cameras, LiDAR, and radar to achieve comprehensive environmental awareness [2][3]

Group 2: Limitations of Current Research
- Existing AI-based MSFP methods have shown success in fields like autonomous driving but face inherent challenges in embodied AI, such as the heterogeneity of cross-modal data and temporal asynchrony between different sensors [3][4]
- Current reviews on MSFP often focus on single tasks or research areas, limiting their applicability to researchers in related fields [4]

Group 3: Overview of MSFP Research
- The paper discusses the background of MSFP, including various perception tasks, sensor data types, popular datasets, and evaluation standards [5]
- It reviews multi-modal fusion methods at different levels, including point-level, voxel-level, region-level, and multi-level fusion [5]

Group 4: Sensor Data and Datasets
- Various sensor data types are critical for perception tasks, including camera data, LiDAR data, and radar data, each with unique advantages and limitations [7][10]
- The paper presents several datasets used in MSFP research, such as KITTI, nuScenes, and Waymo Open, detailing their characteristics and the types of data they provide [12][13][14]

Group 5: Perception Tasks
- Key perception tasks include object detection, semantic segmentation, depth estimation, and occupancy prediction, each contributing to the overall understanding of the environment [16][17]

Group 6: Multi-Modal Fusion Methods
- Multi-modal fusion methods are categorized into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques to enhance perception robustness (a minimal point-level sketch follows this summary) [20][21][22][27]

Group 7: Multi-Agent Fusion Methods
- Collaborative perception techniques integrate data from multiple agents and infrastructure, addressing challenges like occlusion and sensor failures in complex environments [32][34]

Group 8: Time Series Fusion
- Time series fusion is a key component of MSFP systems, enhancing perception continuity across time and space, with methods categorized into dense, sparse, and hybrid queries [40][41]

Group 9: Multi-Modal Large Language Model (MM-LLM) Fusion
- MM-LLM fusion combines visual and textual data for complex tasks, with various methods designed to enhance the integration of perception, reasoning, and planning capabilities [53][54][57][59]
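The fusion taxonomy above is abstract, so here is a minimal sketch of what the simplest level, point-level fusion, typically means in practice (in the spirit of PointPainting): LiDAR points are projected into the image plane and decorated with the corresponding per-pixel camera features. The function name, calibration convention, and array shapes are illustrative assumptions, not code from the surveyed paper.

```python
import numpy as np

def point_level_fusion(points_lidar, image_feats, K, T_cam_from_lidar):
    """Decorate LiDAR points with camera features (PointPainting-style sketch).

    points_lidar:     (N, 3) xyz coordinates in the LiDAR frame.
    image_feats:      (H, W, C) per-pixel features (RGB, or semantic scores).
    K:                (3, 3) camera intrinsic matrix.
    T_cam_from_lidar: (4, 4) extrinsic transform from LiDAR to camera frame.
    Returns (M, 3 + C): the visible points concatenated with image features.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])       # homogeneous coords
    pts_cam = (T_cam_from_lidar @ homo.T).T[:, :3]          # points in camera frame
    uvw = (K @ pts_cam.T).T                                 # pinhole projection
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-6)         # perspective divide
    h, w = image_feats.shape[:2]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (pts_cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return np.hstack([points_lidar[valid], image_feats[v[valid], u[valid]]])
```

Voxel-, region-, and multi-level fusion differ mainly in where this association happens (voxel grids, proposal regions, or several stages at once) rather than in the projection step itself.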
After a Ten-Year Wait, Tesla's Robotaxi Is Finally Live! Musk: A Flat Fare of Just $4.20
自动驾驶之心· 2025-06-26 12:56
Core Viewpoint
- Tesla has officially launched its Robotaxi service in Austin, Texas, fulfilling Elon Musk's long-standing promise of autonomous ride-hailing, although the service is still in a limited trial phase and not fully open to the public [1][2][8]

Summary by Sections

Launch and Pricing
- The initial fare for the Robotaxi service is set at a fixed price of $4.20, with the option for passengers to leave tips [2][4]
- The service is currently limited to invited users, primarily supporters and influencers, raising questions about the objectivity of initial feedback [8]

Operational Details
- Approximately 10 to 20 Model Y vehicles marked as "Robotaxi" are being used for this limited trial [8]
- The operational area is strictly confined to a mapped geofenced region, with specific boundaries defined [8]
- The service operates daily from 6 AM to midnight, avoiding complex scenarios such as bad weather and highways [8]

User Experience
- Initial feedback indicates that Robotaxi rides are generally smooth and capable of handling typical urban driving situations, maintaining speeds below 40 miles per hour [17][18]
- The app interface is similar to ride-hailing services like Uber, featuring a "start trip" button and music app integration [19]
- However, some users reported issues such as slow app notifications and unclear pickup point locations, indicating room for improvement in user experience [22]

Safety Measures
- The current version of Robotaxi is not fully autonomous; it includes a safety monitor in the passenger seat to take control in emergencies [14]
- In certain situations, remote operators may also intervene, with an average response time of about two minutes [20]
- Tesla plans to enhance identity verification processes if the safety monitor is removed in the future [15]

Future Plans and Competition
- Tesla aims to expand the Robotaxi fleet to thousands of vehicles within months and plans to extend the service to California and other regions with stricter regulations [25]
- In contrast, competitors like Waymo are already operating over 1,500 autonomous vehicles in multiple cities and plan to increase their fleet to 2,000 by 2026 [25][26]
The "Multimodal Large Model" Discussion Group for Autonomous Driving Has Been Established!
自动驾驶之心· 2025-06-26 12:56
自动驾驶之心 is a leading technology exchange platform in China covering cutting-edge autonomous driving technology, the industry, and career growth. If your research direction is embodied intelligence, vision-language models, world models, end-to-end autonomous driving, diffusion models, lane detection, 2D/3D object tracking, 2D/3D object detection, BEV perception, multimodal perception, Occupancy, multi-sensor fusion, transformers, large models, point cloud processing, online mapping, SLAM, optical flow estimation, depth estimation, trajectory prediction, HD maps, NeRF, Gaussian Splatting, planning and control, model deployment, autonomous driving simulation and testing, product management, hardware configuration, or AI career development, you are welcome to join the 自动驾驶之心 family for discussion and exchange! Add the assistant on WeChat to join the group, noting your company/school + nickname + research direction ...
What to Do When You Can't Finish Your Master's Thesis?
自动驾驶之心· 2025-06-26 12:56
We have received many requests for help with publishing papers: some schools require at least one Zone-3 (Q3) journal paper for a master's degree, a PhD cannot graduate without three CCF-A papers, and advisors unfamiliar with a new direction cannot guide the work. Students rack their brains over topic selection, hit bottlenecks in experimental design, struggle with muddled writing logic, and get rejected submission after submission; in cutting-edge, complex fields such as autonomous driving, embodied intelligence, and robotics, this can feel truly overwhelming.

A paper often takes one to two years from preparation to publication, which for a master's student spans essentially the entire academic career. Wrong methods, detours, and a lack of guidance waste the most time. Publishing is hard, but not hopeless: with an experienced mentor leading the way, publishing several papers a year is entirely normal. After long preparation, we have officially launched our paper-coaching service for the autonomous driving, embodied intelligence, and robotics fields.

Who are we? The largest AI-focused technical media platform in China, with IPs including 自动驾驶之心, 具身智能之心, and 3D视觉之心, backed by top domestic academic resources. Having worked in autonomous driving, embodied intelligence, and robotics for years, we deeply understand the challenges and opportunities of these interdisciplinary fields, and we know how important a high-quality paper is to a student's (especially a graduate student's) studies and future career. We have 300+ mentors dedicated to autonomous driving and embodied intelligence, from universities ranked in the global QS top 100 ...
Recently, Some Autonomous Driving Companies Have Been Frantically "Transferring" Talent to the Front Lines...
自动驾驶之心· 2025-06-26 12:56
Core Viewpoint
- The article discusses current challenges in the autonomous driving industry, including layoffs and the shifting of roles from research and development to sales, indicating significant pressure on revenue and the need for companies to adapt to market demands [2][3][4]

Group 1: Industry Challenges
- Recent layoffs in the autonomous driving sector have affected not only existing employees but also recent graduates, highlighting the industry's struggle with revenue generation [2][4]
- Companies are increasingly moving employees from R&D roles to frontline sales positions as a strategy to cope with financial pressure, suggesting that sales roles are now prioritized for revenue generation [3][4]
- The pressure on sales performance is leading to a reevaluation of workforce allocation, with many companies facing the risk of further layoffs if sales targets are not met [3][4]

Group 2: Recommendations for Professionals
- For those facing layoffs, it is advised to refine resumes and consider learning new technical skills, as the job market may become competitive with many individuals seeking new positions simultaneously [5][6]
- Individuals transitioned to sales roles are cautioned against fully committing to these positions, as doing so may limit their future opportunities in more technical roles, particularly in algorithm development [7]
- The article encourages professionals to use this period for reflection and preparation, suggesting that networking and skill development are crucial during this transitional phase [6][7]

Group 3: Community and Resources
- The article promotes a community platform that offers resources for learning and job opportunities in the autonomous driving field, aiming to build a network of professionals and share industry insights [8]
- It highlights the availability of comprehensive learning materials, including courses and recruitment information, to support individuals in navigating their careers in the evolving landscape of autonomous driving [8]
Just Now: Kaiming He Officially Announces His Next Move
自动驾驶之心· 2025-06-26 10:41
Core Viewpoint
- The article highlights the significant impact of Kaiming He joining Google DeepMind as a distinguished scientist, emphasizing his dual role in academia and industry, which is expected to accelerate the development of Artificial General Intelligence (AGI) at DeepMind [1][5][8]

Group 1: Kaiming He's Background and Achievements
- Kaiming He is renowned for his contributions to computer vision and deep learning, particularly for introducing ResNet, whose residual connections fundamentally transformed deep learning (a minimal residual-block sketch follows this summary) [4][18]
- He has held prestigious positions, including research scientist roles at Microsoft Research Asia and Meta's FAIR, focusing on deep learning and computer vision [12][32]
- His academic credentials include a tenured associate professorship at MIT, where he has published influential papers with over 713,370 citations in total [18][19]

Group 2: Impact on Google DeepMind
- Kaiming He's expertise in computer vision and deep learning is expected to enhance DeepMind's capabilities, particularly toward achieving AGI within the next 5-10 years, as stated by Demis Hassabis [7][8]
- His arrival is seen as a significant boost for DeepMind, potentially accelerating the development of advanced AI models [5][39]

Group 3: Research Contributions
- Kaiming He has published several highly cited papers, including works on Faster R-CNN and Mask R-CNN, which are among the most referenced in their fields [21][24]
- His recent research includes innovative concepts such as fractal generative models and efficient one-step generative modeling frameworks, showcasing his continuous contribution to advancing AI technology [36][38]
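For context on why ResNet is singled out above: its core idea is the residual (skip) connection, which makes very deep networks trainable by having each block learn a residual F(x) and output F(x) + x. Below is a minimal PyTorch-style sketch of a basic residual block, purely illustrative and not tied to any code referenced in the article.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic ResNet block: output = ReLU(F(x) + x), identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection: gradients can flow straight through "+ x",
        # which is what lets networks with hundreds of layers train stably.
        return self.relu(out + x)
```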
Major Release! A0: The First Hierarchical General-Purpose Robot Model Based on Spatial Affordance Perception
自动驾驶之心· 2025-06-26 10:41
The A0 model, released by the 无界智慧 (Spatialtemporal AI) team, is the first hierarchical diffusion model for general-purpose robots built on spatial affordance perception. Through an Embodiment-Agnostic Affordance Representation, it achieves general manipulation capability across platforms; the model framework and code have been open-sourced.

Paper: https://arxiv.org/abs/2504.12636
Project page: https://a-embodied.github.io/A0/

The core challenge in robot manipulation: despite the rapid development of robotics, generalizable manipulation remains the key bottleneck holding the field back. Imagine asking a robot to "wipe the whiteboard clean": it must accurately understand where to apply force ("where") and how to move the cloth ("how"). This is precisely the core challenge of current robot manipulation: insufficient perception and understanding of spatial affordances (see the illustrative sketch after this summary). Existing methods fall into two main categories: modular approaches and end-to-end vision-language-action (VLA) large models. The former can leverage visual foundation models for spatial understanding but capture object affordances only to a limited extent; the latter can generate actions directly but lack spatial ...
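To make the "where"/"how" decomposition concrete, here is a toy sketch of a two-stage hierarchy in the same spirit: an affordance head picks a contact point, and a low-level generator produces a short motion around it. Both stand-ins (predict_affordance, generate_motion) are hypothetical, with random scores for illustration; this is not the released A0 code.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_affordance(rgb: np.ndarray, instruction: str) -> np.ndarray:
    """'Where': a stand-in affordance head. A real model would score pixels
    conditioned on the instruction; here the score map is random."""
    scores = rng.random(rgb.shape[:2])
    v, u = np.unravel_index(scores.argmax(), scores.shape)
    return np.array([u, v], dtype=float)

def generate_motion(contact_uv: np.ndarray, steps: int = 8) -> np.ndarray:
    """'How': a stand-in low-level generator emitting a short horizontal
    wiping stroke centered on the predicted contact point."""
    offsets = np.linspace(-20.0, 20.0, steps)
    return np.stack([contact_uv + np.array([dx, 0.0]) for dx in offsets])

rgb = rng.random((480, 640, 3))   # placeholder camera frame
where = predict_affordance(rgb, "wipe the whiteboard clean")
how = generate_motion(where)
print("contact point (u, v):", where)
print("first waypoints:\n", how[:3])
```

The appeal of this factorization is that the "where" stage can be embodiment-agnostic (a contact point on the image is meaningful for any robot), while only the "how" stage needs platform-specific grounding.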
Planning a 10,000-Member Autonomous Driving & Embodied Intelligence Technology Community
自动驾驶之心· 2025-06-25 09:54
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving and embodied intelligence, aiming to gather industry professionals and facilitate rapid problem-solving and knowledge sharing within the sector [2][4]

Group 1: Community Development
- The goal is to create a community of 10,000 members focused on intelligent driving and embodied intelligence within three years, welcoming contributions from talented individuals [2]
- The community will serve as a bridge connecting academia, products, and recruitment, forming a closed loop across teaching and research [2][4]
- The community will provide the latest industry technology updates, technical discussions, and job-sharing opportunities [2][3]

Group 2: Knowledge Sharing and Resources
- The "Autonomous Driving Heart Knowledge Planet" is designed as a technical exchange platform for academic and engineering issues, attracting students and professionals from top universities and companies [4][11]
- The community has established recruitment connections with numerous companies, including Xiaomi, Horizon, and NIO, facilitating direct resume submissions [4][11]
- Members have access to a variety of learning modules, from basic to advanced, covering algorithm explanations and code implementations [4][11]

Group 3: Technical Focus Areas
- By 2025, the focus will be on advanced technology areas such as vision-language models (VLM), end-to-end trajectory prediction, and 3D generative simulation [6][10]
- The community has developed over 30 learning pathways covering various subfields of autonomous driving, including perception, mapping, and AI model deployment [11][16]
- Regular live sessions will feature top researchers and industry experts discussing practical applications and research advances in autonomous driving [18][19]

Group 4: Engagement and Interaction
- The community encourages active participation, with weekly engagement metrics ranking among the top 20 in the country, fostering a collaborative learning environment [12]
- Members can freely ask questions and engage in discussions, enhancing their learning experience and networking opportunities [11][12]
- The platform offers exclusive member benefits, including access to academic advances, expert Q&A, and discounts on paid courses [14]
How to Design a SOTA End-to-End Algorithm? Top-3 Technical Share from the CVPR'25 WOD Vision-Only End-to-End Challenge
自动驾驶之心· 2025-06-25 09:54
Core Insights
- The article discusses the results of the 2025 Waymo Open Dataset End-to-End Driving Challenge, highlighting advances in end-to-end autonomous driving systems and the shift toward using large-scale public datasets for training models [2][18]

Group 1: Competition Results
- The champion was the EPFL team, which utilized the DiffusionDrive model, nuPlan data, and an ensembling strategy [1]
- The runner-up was a collaboration between the Nvidia and Tübingen teams, which also drew on DiffusionDrive and SmartRefine and employed multiple datasets to demonstrate the importance of training data quality [1][22]
- Third place went to Hanyang University from South Korea, which focused on a simplified structure using only front-view input and vehicle state [1][3]

Group 2: Methodology
- The UniPlan framework was introduced, leveraging large-scale public driving datasets to improve generalization in rare long-tail scenarios, achieving competitive results without relying on expensive multimodal large language models [3][18]
- The model architecture is based on DiffusionDrive, which employs a truncated diffusion strategy for efficient and diverse trajectory generation [4][6]
- The diffusion decoder uses a cross-attention mechanism to refine trajectory predictions based on scene context [5][6]

Group 3: Data Processing
- The nuPlan dataset was processed into a diverse training set of 90,000 samples using a sliding-window approach [7]
- A similar filtering strategy was applied to the WOD-E2E dataset, yielding 35,000 training samples and 10,000 validation samples [8]
- The model was trained on a setup with four H100 GPUs, achieving significant training efficiency [10]

Group 4: Experimental Results
- Performance was evaluated using the Rater Feedback Score (RFS) and Average Displacement Error (ADE), with various configurations tested (a minimal ADE computation is sketched after this summary) [12][17]
- Joint training on the WOD-E2E and nuPlan datasets led to slight improvements in average RFS, particularly in long-tail categories [23]
- The analysis showed that while additional datasets generally help, the quality of the data sources matters more than their quantity [39]

Group 5: Conclusion
- The article emphasizes the potential of data-centric approaches for improving the robustness of autonomous driving systems, as demonstrated by the competitive results achieved with the UniPlan framework [18][39]
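Of the two metrics, RFS is Waymo's human-rater-based score and cannot be reproduced in a few lines, but ADE has a standard closed form: the mean Euclidean distance between predicted and ground-truth waypoints over the prediction horizon. A minimal sketch, with illustrative shapes and a toy example:

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """ADE: mean L2 distance between predicted and ground-truth waypoints.

    pred, gt: (T, 2) trajectories of T future (x, y) waypoints in meters.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage: a prediction that drifts 0.5 m laterally off a straight path.
t = np.arange(1, 9, dtype=float)
gt = np.stack([t, np.zeros_like(t)], axis=-1)       # straight ground truth
pred = np.stack([t, np.full_like(t, 0.5)], axis=-1)  # constant 0.5 m offset
print(average_displacement_error(pred, gt))          # -> 0.5
```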
RoboSense 2025 Machine Perception Challenge Officially Launched! Autonomous Driving & Embodied Intelligence Tracks
自动驾驶之心· 2025-06-25 09:54
Core Viewpoint
- The RoboSense Challenge 2025 aims to systematically evaluate the perception and understanding capabilities of robots in real-world scenarios, addressing key challenges in the stability, robustness, and generalization of perception systems [2][43]

Group 1: Challenge Overview
- The challenge consists of five tracks focused on real-world tasks: language-driven autonomous driving, social navigation, sensor placement optimization, cross-modal drone navigation, and cross-platform 3D object detection [8][9][29][35]
- The event is co-hosted by several prestigious institutions and will be officially recognized at the IROS 2025 conference in Hangzhou, China [5][43]

Group 2: Task Details
- Language-Driven Autonomous Driving: evaluates the ability of robots to understand and act upon natural language commands, aiming for a deep coupling of language, perception, and planning [10][11]
- Social Navigation: focuses on robots navigating shared spaces with humans, emphasizing social compliance and safety [17][18]
- Sensor Placement Optimization: assesses the robustness of perception models under various sensor configurations, crucial for reliable deployment in autonomous systems [23][24]
- Cross-Modal Drone Navigation: involves training models to retrieve aerial images from natural language descriptions, improving the efficiency of urban inspection and disaster response (a minimal retrieval baseline is sketched after this summary) [29][30]
- Cross-Platform 3D Object Detection: aims to develop models that maintain high performance across different robotic platforms without extensive retraining [35][36]

Group 3: Evaluation and Performance Metrics
- Each task includes specific performance metrics and baseline models, with detailed requirements for training and evaluation [16][21][28][42]
- The challenge encourages innovative solutions and offers a prize pool of up to $10,000, shared across the five tracks [42]

Group 4: Timeline and Participation
- The challenge officially starts on June 15, 2025, with key submission and evaluation deadlines leading up to the award ceremony on October 19, 2025 [4][42]
- Participants are encouraged to join this global initiative to advance robotic perception technologies [43]
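As a concrete reading of the cross-modal drone navigation track: a common baseline for text-to-aerial-image retrieval is to embed both modalities (e.g., with a CLIP-style encoder) and rank gallery images by cosine similarity to the query embedding. The sketch below assumes the embeddings are already computed; all names and dimensions are illustrative, not the challenge's official baseline.

```python
import numpy as np

def retrieve_topk(text_emb: np.ndarray, image_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Rank aerial-image embeddings by cosine similarity to a text query.

    text_emb:   (D,)   embedding of the natural-language description.
    image_embs: (N, D) embeddings of the candidate aerial images.
    Returns the indices of the k best-matching images.
    """
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ t                      # cosine similarity per gallery image
    return np.argsort(-sims)[:k]         # highest similarity first

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
query = rng.random(512)
gallery = rng.random((1000, 512))
print(retrieve_topk(query, gallery))
```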