自动驾驶之心
AI Day Livestream | Champion Solution BridgeVLA (CVPR'25)
自动驾驶之心· 2025-06-30 12:33
Core Viewpoint
- The article emphasizes the significant shift in the automotive industry towards autonomous driving technology, highlighting its potential to transform transportation and mobility solutions [1]
Group 1: Industry Trends
- The automotive industry is experiencing rapid advancements in autonomous driving technology, with major players investing heavily in research and development [1]
- Consumer demand for safer and more efficient transportation options is driving the growth of autonomous vehicles [1]
- Regulatory frameworks are evolving to accommodate the testing and deployment of autonomous driving systems, which is crucial for industry growth [1]
Group 2: Company Insights
- Leading automotive companies are forming strategic partnerships with technology firms to enhance their autonomous driving capabilities [1]
- Investment in artificial intelligence and machine learning is critical for the development of reliable autonomous systems [1]
- Companies are focusing on building robust data ecosystems to support the functionality of autonomous vehicles [1]
ICCV 2025! Fudan's BezierGS: SOTA Driving-Scene Reconstruction with Minimal Annotation via Bézier Curves
自动驾驶之心· 2025-06-30 12:33
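Today 自动驾驶之心 shares the latest work from Fudan University accepted to ICCV 2025: BezierGS, dynamic urban scene reconstruction with Bézier curve Gaussian splatting.
Paper authors | Zipei Ma et al. Editor | 自动驾驶之心
1. Build a high-quality street-scene world in which autonomous driving models can train and explore, reducing the cost of data collection;
2. Reduce the dependence on precise bounding boxes: data collected by industry and in open-source autonomous driving datasets is currently not very accurate, and the bounding-box annotations are imprecise;
3. This work is a study and exploration of the autonomous driving world; a true autonomous driving world model will be explored in the future. The current work supports only trajectory interpolation, not trajectory extrapolation.
Paper: https://arxiv.org/abs/2506.22099
Code: https://github.com/fudan-zvg/BezierGS
With the rise of end-to-end autonomous driving systems that require real-time sensor feedback, ...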
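As background for the Bézier-curve idea in the BezierGS item, here is a minimal sketch of how a point on a Bézier curve is evaluated from its control points with the standard Bernstein-polynomial form. It is a generic textbook formulation, not the BezierGS implementation; the function name `bezier_point` and the 2D control points are hypothetical and chosen only for illustration.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])         # spatial dimension of each point
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        # Bernstein basis weight of the i-th control point at parameter t
        w = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        for d in range(dim):
            point[d] += w * p[d]
    return tuple(point)


# Hypothetical control points for one object's 2D trajectory (assumption).
control_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

print(bezier_point(control_points, 0.0))  # (0.0, 0.0), the first control point
print(bezier_point(control_points, 0.5))  # (2.0, 1.5)
print(bezier_point(control_points, 1.0))  # (4.0, 0.0), the last control point
```

Parameters inside [0, 1] give positions between the curve's endpoints, which is the interpolation regime the item describes; the curve carries no information about where an object goes beyond its last control point.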
Emergency Raises Plus a Company-Wide Break! After Losing 8 People in a Row to Poaching, OpenAI Is Truly Rattled
自动驾驶之心· 2025-06-30 12:33
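Author | 量子位 Source | 量子位
This article is shared for academic purposes only; in case of infringement, contact us for removal.
Facing Meta's aggressive poaching, the change inside OpenAI is unexpected: work has largely stopped this week, and employees get a week off! (Executives keep working.)
WIRED obtained an all-hands letter that OpenAI Chief Research Officer Mark Chen sent to employees, promising to go head-to-head with Meta.
Several people familiar with the matter said OpenAI will largely shut down for a week to give employees time to recharge.
Mark Chen said that he, Altman, and other company leaders are talking around the clock with the people who have received Meta offers.
OpenAI's countermeasures also include recalibrating compensation and exploring new ways to recognize and reward top talent, but he also stressed one principle: "While I will fight to keep every one of you, I won't do so at the price of fairness to others."
In just a few weeks, Meta has poached at least eight key researchers from OpenAI. Mark Chen said:
"I have a strong gut feeling right now, as if someone has broken into our home and stolen something. Please trust that we haven't been sitting idly by."
Working 80 hours a week: OpenAI is changing
In the all-hands letter, Ma ...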
"Leaving the Newbie Village": Lessons from Ten CV Conference Paper Submissions
自动驾驶之心· 2025-06-30 12:33
Core Insights
- The article provides a comprehensive guide for newcomers on how to improve the quality and acceptance rate of research papers in the field of deep learning, based on the author's personal experiences and reflections during the submission process [2][3].
Paper Production and Submission Process
- The typical process for producing and submitting a deep learning paper involves generating a good idea or experimental results, expanding on them, and writing a structured paper according to the conference's requirements [3][4].
- After submission, if there are no serious issues, the paper enters the review stage, where feedback is provided by three reviewers, and authors must respond to comments, often leading to a significant number of papers being withdrawn from consideration [4][5].
Importance of Writing Quality
- Writing a good paper is crucial as it serves as a vehicle for conveying ideas and can significantly impact an author's career; high-quality papers are more likely to be cited and recognized [7][8].
- The quality of a paper can reflect an author's research achievements, with a few outstanding papers often defining a scholar's career [7].
Innovation and Core Ideas
- The concept of novelty is central to deep learning papers, where innovation can be measured by the impact of the problem addressed, the effectiveness of the solution, and the novelty of the methods used [10][11].
- Authors should clearly define their core ideas and potential impact when selecting topics and writing papers, ensuring that their contributions are well-articulated [11].
Writing Techniques
- Effective writing in deep learning papers often follows a structured approach, where the title and abstract are critical for attracting readers and matching appropriate reviewers [13][14].
- The introduction should clearly present the importance of the problem and the proposed solution, while the experimental section should demonstrate the effectiveness of the approach [15][16].
Common Reviewer Feedback
- Common negative feedback from reviewers includes perceived lack of understanding of the field, unclear contributions, and failure to respect prior work [22][24].
- Authors are encouraged to address potential issues before submission by considering common criticisms and ensuring their papers are well-structured and clearly articulated [22][24].
A Graduate Student from a Non-985/211 University, Feeling Lost in This Year's Job Hunt...
自动驾驶之心· 2025-06-30 05:51
Core Viewpoint
- The article emphasizes the importance of advanced skills and knowledge in the fields of autonomous driving and embodied intelligence, highlighting the need for candidates with strong backgrounds to meet industry demands.
Group 1: Industry Trends
- The demand for talent in autonomous driving and embodied intelligence is increasing, with a focus on cutting-edge technologies such as SLAM, ROS, and large models [3][4].
- Many companies are transitioning from traditional methods to more advanced techniques, indicating a shift in the required skill sets for job seekers [3][4].
- The article notes that while there is a saturation of talent in certain areas, the growth of startups in robotics presents new opportunities for learning and development [3][4].
Group 2: Learning and Development
- The article encourages individuals to enhance their technical skills, particularly in areas related to robotics and embodied intelligence, which are seen as the forefront of technology [3][4].
- It mentions the availability of resources and community support for learning, including access to courses, hardware, and job information through platforms like Knowledge Planet [5][6].
- The community aims to create a comprehensive ecosystem for knowledge sharing and recruitment in the fields of intelligent driving and embodied intelligence [5][6].
Group 3: Technical Directions
- The article outlines four major technical directions in the industry: visual large language models, world models, diffusion models, and end-to-end autonomous driving [7].
- It highlights the importance of staying updated with the latest research and developments in these areas, providing links to various resources and papers for further exploration [8][9].
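CCF Conferences in the Second Half of the Year: Too Many Submitters, Too Few Slots? How to Get Accepted on the First Try? The Experts Figured It Out Long Ago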
自动驾驶之心· 2025-06-29 11:33
Core Viewpoint
- The article emphasizes the importance of timely submission and high-quality research papers for researchers in the field of autonomous driving, highlighting the challenges faced and the solutions offered through a specialized 1v1 guidance program for AI research papers [2].
Group 1: Pain Points Addressed
- The program addresses the lack of guidance for students, helping them establish a clear research framework and improve their practical skills [6].
- It assists students in developing innovative ideas and understanding the research process, ensuring their research direction is forward-looking and innovative [13].
- The program provides comprehensive support throughout the research paper process, from topic selection to submission [5][11].
Group 2: Course Content
- The guidance includes assistance in the topic selection phase, where mentors help students brainstorm ideas or provide direct suggestions [5].
- During the experimental phase, mentors guide students through experimental design, model building, and validation of ideas [7].
- In the writing phase, mentors help students craft compelling research papers that stand out to reviewers [9].
- The submission phase involves recommending suitable journals and assisting with precise submissions [11].
Group 3: Course Structure and Benefits
- The course is structured with a core guidance period followed by a maintenance period, with a total guidance cycle ranging from 3 to 18 months depending on the publication target [23].
- Students will learn to produce high-quality papers, master the research process, and enhance their coding and project implementation skills [22].
- The program includes personalized communication with mentors and a structured approach to addressing student queries [26].
CVPR 2025 WAD Vision-Only End-to-End | Champion Solution Technical Report
自动驾驶之心· 2025-06-29 11:33
Core Viewpoint
- The article discusses the advancements in end-to-end autonomous driving technology, highlighting the performance of the top competitor, Poutine, in a recent vision-based driving competition and emphasizing its robust training methodology and leading results [1][13].
Group 1: Technical Overview
- The leading solution, Poutine, uses a 3B-parameter Vision-Language Model (VLM) to address long-tail scenarios in vision-only end-to-end autonomous driving [1].
- The training process consists of two phases:
  - Phase one involves self-supervised pre-training on a combination of vision, language, and trajectory data, using 83 hours of CoVLA data and 11 hours of the Waymo long-tail dataset [2].
  - Phase two focuses on fine-tuning through reinforcement learning (RL), using 500 manually annotated segments from the Waymo validation set to enhance robustness [2][8].
- The Poutine model achieved a Rater-Feedback Score (RFS) of 7.99 on the Waymo test set, leading the competition [2][13].
Group 2: Data and Methodology
- The datasets used include CoVLA, which contains 10,000 front-view driving clips of 30 seconds each (about 83 hours, matching the pre-training figure above), and WOD-E2E, which provides 4,021 long-tail driving scenarios with trajectory information [11].
- The evaluation metric, RFS, is calculated based on the proximity of predicted trajectories to expert-rated trajectories, with a scoring range of 0 to 10 [11].
- The training details include a batch size of 64 and a learning rate of 1e-5 for the CoVLA dataset, while the WOD-E2E dataset used a batch size of 16 with similar training parameters [11].
Group 3: Results and Analysis
- Poutine ranked first with an RFS of 7.99; the second-best model scored 7.91, a margin of 0.08 points [13].
- The article notes that while the addition of RL did not drastically improve scores, it effectively addressed challenging scenarios [13].
- The results suggest that the combination of VLM and RL training enhances the model's ability to handle complex driving environments [18].
Group 4: Future Considerations
- The article raises questions about the mainstream applicability of VLMs and LLMs in trajectory prediction, particularly regarding their understanding of the physical world and 3D trajectory information [19].
- It suggests that for conventional evaluation datasets, the advantages of such models may not be as pronounced, indicating a need for further exploration [19].
- The potential integration of action models with VLMs for trajectory prediction is proposed as a more comprehensive approach [19].
NUS × SJTU Release RoboCerebra: A New Benchmark for Long-Horizon Robotic Manipulation Reasoning
自动驾驶之心· 2025-06-29 11:33
Core Insights
- The article discusses the development of RoboCerebra, a new benchmark designed to evaluate long-horizon robotic manipulation tasks, emphasizing the need for collaboration between high-level planning (VLM) and low-level control (VLA) models [6][8][10].
Group 1: Background and Motivation
- Recent advancements in vision-language models (VLMs) have enabled robots to execute commands based on visual inputs, but challenges arise when tasks become more complex, requiring long-term planning and memory management [6][7].
- Existing benchmarks often fail to assess the collaborative capabilities of VLM and VLA, leading to performance issues in dynamic environments [8].
Group 2: RoboCerebra Contributions
- RoboCerebra includes a large-scale dataset and a systematic benchmark for evaluating cognitive challenges related to planning, memory, and reflection in robotic tasks [10].
- The dataset construction process integrates automated generation and manual annotation to ensure high quality and scalability [10].
Group 3: Task Setting
- The benchmark features long task sequences averaging 2,972 steps, with dynamic disturbances introduced to challenge the models' planning and recovery abilities [11].
- A top-down data generation pipeline uses GPT to create high-level tasks, which are then broken down into sub-goals and validated for logical consistency and physical feasibility [11][13].
Group 4: Evaluation Protocol and Metrics
- RoboCerebra employs a four-dimensional evaluation framework assessing success rate, plan match accuracy, plan efficiency, and action completion accuracy to measure the collaboration between VLM and VLA [15][21].
- The framework includes anchor points to synchronize evaluations across different models, ensuring consistency in task execution [21].
Group 5: Experimental Results
- The hierarchical planning and execution framework significantly improves task success rates, particularly in memory execution scenarios, demonstrating the necessity of collaboration between VLM and VLA [27].
- The results indicate that using either the VLA or VLM alone is insufficient for stable performance in complex tasks, highlighting the importance of their integration [27][28].
Group 6: Memory Task Evaluation
- The evaluation of memory tasks shows that the VLM's reasoning capabilities are crucial for both memory exploration and execution, with GPT-4o outperforming other models in exploration success rates and decision accuracy [31][32].
Conference Preview! Conference on Technology and Industry Development of Autonomous Special-Purpose Vehicles
自动驾驶之心· 2025-06-29 11:33
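Under the guidance of the China Society of Automotive Engineers (中国汽车工程学会), hosted by its Automotive Intelligent Transportation Branch (汽车智能交通分会), and co-organized by the Shanghai Research Institute for Intelligent Autonomous Systems (上海自主智能无人系统科学中心) and the School of Automotive Studies, Tongji University (同济大学汽车学院), the Conference on Technology and Industry Development of Autonomous Special-Purpose Vehicles is scheduled to be held in Chongqing on October 23-24, 2025. The agenda follows. Colleagues from across the industry are welcome to attend and exchange ideas; if you are interested in cooperating or attending, please contact us at any time.
Guiding organization: 中国汽车工程学会
Hosts: 无人驾驶专用车标准研究工作组 (Standards Research Working Group for Autonomous Special-Purpose Vehicles), 同济汽车设计研究院有限公司 (Tongji Automotive Design and Research Institute Co., Ltd.) ...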
Given the Current State of Autonomous Driving Technology, What Applications Remain for Reconstruction?
自动驾驶之心· 2025-06-29 08:19
Core Viewpoint
- The article discusses the evolving landscape of 4D annotation in autonomous driving, emphasizing the shift from traditional SLAM techniques to more advanced methods for static element reconstruction and automatic labeling [1][4].
Group 1: Purpose and Applications of Reconstruction
- The primary purposes of reconstruction are to create 3D maps from lidar or multiple cameras and to output vectorized lane lines and their categories [5][6].
- The application of 4D annotation in static elements remains broad, with a focus on lane markings and static obstacles, which require 2D spatial annotations at each timestamp [1][6].
Group 2: Challenges in Automatic Annotation
- The challenges in 4D automatic annotation include high temporal consistency requirements, complex multi-modal data fusion, difficulties in generalizing to dynamic scenes, conflicts between annotation efficiency and cost, and high demands for scene generalization in production [8][9].
- These challenges slow the iteration of data loops in autonomous driving, impacting the system's generalization capabilities and safety [8].
Group 3: Course Structure and Content
- The course on 4D automatic annotation covers a comprehensive curriculum, including dynamic obstacle detection, SLAM reconstruction principles, static element annotation based on reconstruction graphs, and the end-to-end ground-truth generation process [9][10][17].
- Each chapter includes practical exercises to enhance understanding and application of the algorithms discussed [9][10].
Group 4: Instructor and Target Audience
- The course is led by an industry expert with extensive experience in multi-modal 3D perception and data loop algorithms, having participated in multiple production delivery projects [21].
- The target audience includes researchers, students, and professionals looking to transition into the data loop field, requiring a foundational understanding of deep learning and autonomous driving perception algorithms [24][25].