自动驾驶之心
Landed an offer at a small company, and I'm quite satisfied...
自动驾驶之心· 2025-07-01 04:04
Core Viewpoint
- The article discusses the advancements in AI technology, particularly in autonomous driving and embodied intelligence, highlighting the saturation of the autonomous driving industry and the challenges faced by job seekers in this field [2].

Group 1: Industry Developments
- The autonomous driving sector has seen significant breakthroughs, with L2 to L4 functionalities being mass-produced, alongside advancements in humanoid robots and quadrupedal robots [2].
- The industry has a clear demand for technology and talent, as evidenced by the experiences shared by job seekers [2].

Group 2: Job-Seeking Platform
- The introduction of the AutoRobo Knowledge Community aims to assist job seekers in the fields of autonomous driving, embodied intelligence, and robotics, providing a platform for job matching and networking [2][3].
- The community currently has nearly 1,000 members, including professionals from companies like Horizon Robotics, Li Auto, Huawei, and Xiaomi [2].

Group 3: Resources and Support
- The community offers a variety of resources, including interview questions, industry reports, salary negotiation tips, and resume optimization services [3][4].
- Specific interview preparation materials include a compilation of 100 questions related to autonomous driving and embodied intelligence, covering various technical aspects [6][7][11].

Group 4: Industry Reports
- The community provides access to numerous industry reports that help members understand the current state, development trends, and market opportunities within the autonomous driving and embodied intelligence sectors [12][15].
WorldVLA: a world model enabling bidirectional vision-action enhancement, with significantly improved grasping accuracy
自动驾驶之心· 2025-07-01 04:04
Core Viewpoint
- WorldVLA is introduced as an autoregressive action world model that unifies action and image understanding and generation, outperforming standalone action models and world models through mutual enhancement [4][7][9].

Group 1: Model Definition and Components
- WorldVLA combines a vision-language-action (VLA) model with a world model to predict future images based on actions and visual understanding [4][6].
- The model employs three independent tokenizers for images, text, and actions that share the same vocabulary, enabling unified cross-modal understanding [7][14].
- The action model generates subsequent actions from image observations, while the world model predicts future visual states, enhancing decision-making in the action model [6][29].

Group 2: Performance and Evaluation
- Experiments show that WorldVLA achieves a 4% higher success rate in grasping tasks compared to traditional action models and reduces Fréchet Video Distance (FVD) by 10% compared to standard world models [8][27].
- The attention-mask strategy significantly mitigates performance degradation in action-sequence generation, improving grasping success rates by 4% to 23% [8][32].
- The model's performance correlates positively with image resolution, indicating that higher resolution provides better visual information for robotic tasks [27].

Group 3: Training Strategy and Data
- WorldVLA is trained on a mix of action-model data and world-model data, enhancing action generation through an understanding of environmental physics [16][22].
- Training involves generating actions from text instructions and image observations, while the world model predicts the next image frame from the current observation and action [17][18].
- The loss function balances contributions from action-model and world-model data, ensuring effective training despite the disparity in token counts [22].

Group 4: Contributions and Innovations
- The attention-mask strategy allows actions to be generated independently of previously generated actions, reducing error propagation in sequential action generation (an illustrative mask construction follows this summary) [19][20].
- WorldVLA demonstrates superior performance in generating longer video sequences compared to pure world models, highlighting the benefits of integrating the action model [31].
- The model's architecture and training strategy suggest that pre-training with world-model data can further improve task performance [36].
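For readers who want to make the attention-mask idea concrete, here is a minimal sketch of one plausible reading of it: over a token sequence laid out as [image | text | action chunk], the mask stays causal everywhere except that each action token is blocked from attending to earlier action tokens, so every action is predicted from the observation and instruction rather than from previously generated (possibly erroneous) actions. The sequence layout, sizes, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_attention_mask(n_img: int, n_txt: int, n_act: int) -> np.ndarray:
    """Boolean mask (True = may attend) for a sequence laid out as
    [image tokens | text tokens | action-chunk tokens].

    Causal masking everywhere, except that action tokens are prevented from
    attending to earlier action tokens in the chunk, so each action is
    generated from the observation/instruction alone rather than from
    previously generated actions (illustrative reading, not the paper's code).
    """
    n = n_img + n_txt + n_act
    mask = np.tril(np.ones((n, n), dtype=bool))   # standard causal base
    act_start = n_img + n_txt
    for q in range(act_start, n):                 # each action query position
        mask[q, act_start:q] = False              # hide earlier action tokens
    return mask

if __name__ == "__main__":
    m = build_attention_mask(n_img=4, n_txt=2, n_act=3)
    print(m.astype(int))
```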
Compete this summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge officially kicks off
自动驾驶之心· 2025-06-30 12:51
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, focusing on visual perception as a key supporting technology for applications in autonomous driving, smart cities, and robotics [2][4].

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications in various fields [2].
- The competition seeks to promote high-efficiency and high-quality research in spatial and embodied intelligence technologies [4].
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4].

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions like Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5].
- The competition is sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd., which also provides technical support [5].

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and specific simulation environments for the tasks [11].
- The sponsor will provide free computing resources, including H800 GPU power, for validating and testing submitted algorithms [12][13].

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with specific tasks and evaluation methods [17].
- The Spatial Intelligence track requires building a 3D reconstruction model from multi-view aerial images, while the Embodied Intelligence track involves completing tasks in dynamic occlusion scenarios [17].

Group 5: Evaluation Methods
- Evaluation for Spatial Intelligence covers rendering quality and geometric accuracy, with scores based on PSNR and F1-Score metrics (a reference PSNR helper is sketched after this summary) [19][20].
- For Embodied Intelligence, evaluation focuses on task completion and execution efficiency, with metrics such as success rate and average pose error [23][21].

Group 6: Submission and Awards
- Results must be submitted in a specified format, and top-ranking teams will have their results reproduced for evaluation [24].
- Awards for each track include cash prizes and computing vouchers, with a total of 12 awards distributed among the top teams [25].
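PSNR, one of the rendering-quality metrics named for the Spatial Intelligence track, has a standard definition. The competition's exact scoring weights and image bit depth are not stated in this summary, so the helper below is only a generic reference implementation that assumes 8-bit images by default.

```python
import numpy as np

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a rendered view and a
    reference image; higher means the rendering is closer to ground truth."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```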
AI Day livestream | Champion solution BridgeVLA (CVPR'25)
自动驾驶之心· 2025-06-30 12:33
Core Viewpoint
- The article emphasizes the significant shift in the automotive industry towards autonomous driving technology, highlighting its potential to transform transportation and mobility solutions [1].

Group 1: Industry Trends
- The automotive industry is experiencing rapid advancements in autonomous driving technology, with major players investing heavily in research and development [1].
- Consumer demand for safer and more efficient transportation options is driving the growth of autonomous vehicles [1].
- Regulatory frameworks are evolving to accommodate the testing and deployment of autonomous driving systems, which is crucial for industry growth [1].

Group 2: Company Insights
- Leading automotive companies are forming strategic partnerships with technology firms to enhance their autonomous driving capabilities [1].
- Investment in artificial intelligence and machine learning is critical for the development of reliable autonomous systems [1].
- Companies are focusing on building robust data ecosystems to support the functionality of autonomous vehicles [1].
ICCV 2025! Fudan's BezierGS: SOTA driving-scene reconstruction with minimal annotation using Bézier curves
自动驾驶之心· 2025-06-30 12:33
Core Viewpoint
- The article presents recent work from Fudan University on BezierGS, a method that uses Bézier curves for dynamic urban scene reconstruction, which is crucial for building closed-loop simulations for autonomous driving [4][5].

Group 1: Methodology and Contributions
- BezierGS addresses the limitation of existing methods that rely on precise pose annotations for dynamic targets, which restricts large-scale scene reconstruction [4][7].
- The method represents the motion trajectories of dynamic targets with learnable Bézier curves, effectively exploiting temporal information and correcting pose errors (an illustrative curve-evaluation sketch follows this summary) [4][8].
- Extensive experiments on the Waymo Open Dataset and the nuPlan benchmark demonstrate that BezierGS outperforms state-of-the-art alternatives in both dynamic and static scene-element reconstruction and in novel view synthesis [4][14].

Group 2: Advantages and Future Directions
- The approach aims to build high-quality street scenes for training autonomous driving models, reducing data collection costs and the reliance on bounding-box annotations, which are often imprecise in current datasets [6].
- Future work will explore a true autonomous driving world model; the current method is limited to trajectory interpolation and cannot extrapolate beyond the observed trajectories [6].
- The introduction of additional supervision for dynamic-target rendering enhances the separation and reconstruction of scene elements, leading to more accurate simulations [8][49].

Group 3: Experimental Results
- The experiments on the Waymo and nuPlan datasets show significant improvements in reconstruction quality, with BezierGS achieving higher PSNR and SSIM scores than existing methods [36][41].
- On the Waymo dataset, BezierGS achieved a PSNR of 33.98 and an SSIM of 0.934, outperforming other methods by notable margins [36][37].
- On the nuPlan benchmark, BezierGS demonstrated a PSNR improvement of 3.04 dB and a 16.35% reduction in LPIPS, showcasing its effectiveness in handling complex dynamic scenes [41].
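To make the trajectory parameterization concrete: an order-n Bézier curve is evaluated from its control points through the Bernstein basis. The snippet below shows only this underlying primitive; the curve order BezierGS actually uses, whether control points are shared per object or per Gaussian, and how rotation is handled are not specified in the summary, so those details are assumptions here.

```python
import numpy as np
from math import comb

def bezier_point(control_points: np.ndarray, t: float) -> np.ndarray:
    """Evaluate an order-n Bézier curve at normalized time t in [0, 1].

    control_points: (n+1, 3) array of (learnable) 3D control points.
    Returns the 3D position at time t via the Bernstein-basis form
    B(t) = sum_i C(n, i) * (1 - t)^(n - i) * t^i * P_i.
    """
    n = len(control_points) - 1
    coeffs = np.array([comb(n, i) * (1 - t) ** (n - i) * t ** i for i in range(n + 1)])
    return coeffs @ control_points

if __name__ == "__main__":
    # Hypothetical cubic trajectory for one dynamic object.
    ctrl = np.array([[0, 0, 0], [2, 1, 0], [4, 1, 0], [6, 0, 0]], dtype=float)
    print(bezier_point(ctrl, 0.5))   # position at the trajectory midpoint
```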
Emergency raises plus a company-wide week off! After losing eight researchers in a row, OpenAI is genuinely rattled
自动驾驶之心· 2025-06-30 12:33
Author | 量子位 (QbitAI)  Source | 量子位 (QbitAI)

Facing Meta's aggressive poaching, OpenAI's internal response has been unexpected: the company has largely paused work this week and given employees a week off (executives keep working).

Wired obtained an all-hands memo sent to employees by OpenAI Chief Research Officer Mark Chen, promising to go head-to-head with Meta.

Multiple sources say OpenAI will essentially shut down for a week so that employees have time to recharge.

Mark Chen said that he, Sam Altman, and other senior leaders are talking around the clock with staff who have received Meta offers.

OpenAI's countermeasures also include recalibrating compensation and exploring new ways to recognize and reward top talent, though Chen also stressed one principle: "While I will work hard to retain every one of you, it will not come at the cost of fairness to others."

In just a few weeks, Meta has poached at least eight key researchers from OpenAI. Mark Chen said: "I feel a visceral sense right now, as if someone has broken into our home and stolen something. Please trust that we have not been standing idle."

80-hour work weeks: OpenAI is changing

In the all-hands memo, Ma ...
"Leaving the beginner's village": lessons learned from ten CV conference paper submissions
自动驾驶之心· 2025-06-30 12:33
Core Insights
- The article provides a comprehensive guide for newcomers on how to improve the quality and acceptance rate of research papers in the field of deep learning, based on the author's personal experiences and reflections during the submission process [2][3].

Paper Production and Submission Process
- The typical process for producing and submitting a deep learning paper involves generating a good idea or experimental results, expanding on them, and writing a structured paper according to the conference's requirements [3][4].
- After submission, if there are no serious issues, the paper enters the review stage, where feedback is provided by three reviewers and authors must respond to comments, a process that often leads to a significant number of papers being withdrawn from consideration [4][5].

Importance of Writing Quality
- Writing a good paper is crucial, as it serves as the vehicle for conveying ideas and can significantly impact an author's career; high-quality papers are more likely to be cited and recognized [7][8].
- The quality of a paper can reflect an author's research achievements, with a few outstanding papers often defining a scholar's career [7].

Innovation and Core Ideas
- The concept of novelty is central to deep learning papers: innovation can be measured by the impact of the problem addressed, the effectiveness of the solution, and the novelty of the methods used [10][11].
- Authors should clearly define their core ideas and potential impact when selecting topics and writing papers, ensuring that their contributions are well articulated [11].

Writing Techniques
- Effective writing in deep learning papers often follows a structured approach, where the title and abstract are critical for attracting readers and matching appropriate reviewers [13][14].
- The introduction should clearly present the importance of the problem and the proposed solution, while the experimental section should demonstrate the effectiveness of the approach [15][16].

Common Reviewer Feedback
- Common negative feedback from reviewers includes perceived lack of understanding of the field, unclear contributions, and failure to respect prior work [22][24].
- Authors are encouraged to address potential issues before submission by anticipating common criticisms and ensuring their papers are well structured and clearly articulated [22][24].
A master's student from a non-985/211 university, feeling a bit lost about this year's job search...
自动驾驶之心· 2025-06-30 05:51
Core Viewpoint
- The article emphasizes the importance of advanced skills and knowledge in the fields of autonomous driving and embodied intelligence, highlighting the need for candidates with strong backgrounds to meet industry demands.

Group 1: Industry Trends
- The demand for talent in autonomous driving and embodied intelligence is increasing, with a focus on cutting-edge technologies such as SLAM, ROS, and large models [3][4].
- Many companies are transitioning from traditional methods to more advanced techniques, indicating a shift in the required skill sets for job seekers [3][4].
- While there is a saturation of talent in certain areas, the growth of robotics startups presents new opportunities for learning and development [3][4].

Group 2: Learning and Development
- The article encourages individuals to enhance their technical skills, particularly in areas related to robotics and embodied intelligence, which are seen as the forefront of technology [3][4].
- Resources and community support for learning are available, including access to courses, hardware, and job information through platforms like Knowledge Planet [5][6].
- The community aims to create a comprehensive ecosystem for knowledge sharing and recruitment in the fields of intelligent driving and embodied intelligence [5][6].

Group 3: Technical Directions
- The article outlines four major technical directions in the industry: visual large language models, world models, diffusion models, and end-to-end autonomous driving [7].
- It highlights the importance of staying updated with the latest research and developments in these areas, providing links to various resources and papers for further exploration [8][9].
Too many authors, too few slots at second-half CCF conferences? How do you get accepted in one shot? The veterans figured it out long ago
自动驾驶之心· 2025-06-29 11:33
Core Viewpoint
- The article emphasizes the importance of timely submission and high-quality research papers for researchers in the field of autonomous driving, highlighting the challenges faced and the solutions offered through a specialized 1v1 guidance program for AI research papers [2].

Group 1: Pain Points Addressed
- The program addresses the lack of guidance for students, helping them establish a clear research framework and improve their practical skills [6].
- It assists students in developing innovative ideas and understanding the research process, ensuring their research direction is forward-looking and innovative [13].
- The program provides comprehensive support throughout the research-paper process, from topic selection to submission [5][11].

Group 2: Course Content
- The guidance includes assistance in the topic-selection phase, where mentors help students brainstorm ideas or provide direct suggestions [5].
- During the experimental phase, mentors guide students through experimental design, model building, and validation of ideas [7].
- In the writing phase, mentors help students craft compelling research papers that stand out to reviewers [9].
- The submission phase involves recommending suitable journals and assisting with precise submissions [11].

Group 3: Course Structure and Benefits
- The course is structured with a core guidance period followed by a maintenance period, with a total guidance cycle ranging from 3 to 18 months depending on the publication target [23].
- Students will learn to produce high-quality papers, master the research process, and enhance their coding and project implementation skills [22].
- The program includes personalized communication with mentors and a structured approach to addressing student queries [26].
CVPR 2025 WAD vision-only end-to-end track | Champion solution technical report
自动驾驶之心· 2025-06-29 11:33
Core Viewpoint
- The article discusses advances in end-to-end autonomous driving, highlighting the performance of the top competitor, Poutine, in a recent vision-based driving competition and emphasizing its robust training methodology and superior results [1][13].

Group 1: Technical Overview
- The leading solution, Poutine, uses a 3B-parameter vision-language model (VLM) to address long-tail scenarios in vision-only end-to-end autonomous driving [1].
- The training process consists of two phases:
  - Phase one involves self-supervised pre-training on a combination of vision, language, and trajectory data, totaling 83 hours of CoVLA data and 11 hours from the Waymo long-tail dataset [2].
  - Phase two fine-tunes the model with reinforcement learning (RL), using 500 manually annotated segments from the Waymo validation set to improve robustness [2][8].
- The Poutine model achieved a Rater-Feedback Score (RFS) of 7.99 on the Waymo test set, leading the competition [2][13].

Group 2: Data and Methodology
- The datasets include CoVLA, which contains 10,000 front-view images and 30 seconds of driving video, and WOD-E2E, which provides 4,021 long-tail driving scenarios with trajectory information [11].
- The evaluation metric, RFS, is computed from the proximity of predicted trajectories to expert-rated trajectories, on a scale of 0 to 10 (a generic trajectory-proximity sketch follows this summary) [11].
- Training details include a batch size of 64 and a learning rate of 1e-5 for the CoVLA dataset, while the WOD-E2E dataset used a batch size of 16 with otherwise similar parameters [11].

Group 3: Results and Analysis
- Poutine's performance significantly outperformed other entries, scoring 7.99 while the second-best model scored 7.91, indicating a substantial lead [13].
- While adding RL did not drastically improve the overall score, it effectively addressed challenging scenarios [13].
- The results suggest that combining VLM pre-training with RL fine-tuning enhances the model's ability to handle complex driving environments [18].

Group 4: Future Considerations
- The article raises questions about the mainstream applicability of VLMs and LLMs to trajectory prediction, particularly regarding their understanding of the physical world and 3D trajectory information [19].
- For conventional evaluation datasets, the advantages of such models may be less pronounced, indicating a need for further exploration [19].
- Integrating action models with VLMs for trajectory prediction is proposed as a more comprehensive approach [19].
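The summary describes RFS only as a 0-to-10 score reflecting how close predicted trajectories lie to expert-rated ones, without giving the official WOD-E2E formula. As a hedged illustration of trajectory-proximity scoring in general (not the benchmark's actual metric), the sketch below maps average displacement error onto a 0-to-10 range with a hypothetical 10 m cutoff.

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, expert: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and expert waypoints, both (T, 2)."""
    return float(np.mean(np.linalg.norm(pred - expert, axis=-1)))

def proximity_score(pred: np.ndarray, expert: np.ndarray, max_error_m: float = 10.0) -> float:
    """Map trajectory error onto a 0-10 scale: 10 for a perfect match, 0 once the
    average displacement reaches max_error_m (a hypothetical cutoff, not the RFS spec)."""
    ade = average_displacement_error(pred, expert)
    return 10.0 * max(0.0, 1.0 - ade / max_error_m)
```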