自动驾驶之心

Let's chat: what was the autonomous-driving paper that last took you several days to dig into?
自动驾驶之心· 2025-06-29 07:36
The planet was created to give the autonomous driving industry a platform for technical exchange on both academic and engineering problems. Members are mainly undergraduate, master's, and PhD students, plus algorithm engineers looking to switch into or advance in the field, from institutions including but not limited to Tsinghua University, Peking University, Fudan University, Texas A&M, Westlake University, Shanghai Jiao Tong University, Shanghai AI Laboratory, HKUST, HKU, CUHK, NTU, NUS, ETH, and Nanjing University. Beyond that, we have set up campus and experienced-hire referral channels with many well-known companies, including Xiaomi Auto, Horizon Robotics, Li Auto, XPeng, NVIDIA, BYD, Huawei, DJI, Bosch, Banma, Momenta, NIO, and Baidu! Founders, executives, product managers, and operations staff of autonomous driving and AI companies, as well as data and HD-map companies, are also very welcome to join; connecting and bringing in resources is something we keep pushing forward. We firmly believe autonomous driving will change the future of human mobility. For those who want to join the industry and push society forward, the planet offers modules from basics to advanced, with algorithm explanations plus code implementations, to make learning easy! 自动驾驶之心 Knowledge Planet. Click the card below to follow the "自动驾驶之心" official account; tap here to receive learning roadmaps for nearly 15 autonomous driving directions ...
2025: feeling a bit lost in the job hunt...
自动驾驶之心· 2025-06-28 13:34
AutoRobo Knowledge Planet: a place for students in autonomous driving, embodied intelligence, and robotics to exchange job-hunting experience, now close to 1,000 members. Membership includes working professionals hired through experienced-hire channels at companies such as Horizon Robotics, Li Auto, Huawei, Xiaomi Auto, Momenta, and 元戎启行, as well as candidates from the 2024 and 2025 autumn recruitment seasons, covering most areas of autonomous driving and embodied intelligence. In recent years, AI centered on autonomous driving and embodied intelligence has kept breaking through, accounting for nearly half of all technical roadmaps and funding. From the steady mass production of L2-L4 autonomous driving features to humanoid robots dancing and quadruped robot dogs leaping through deserts and jungles, we have been fortunate to experience this full development cycle, and we understand clearly what the industry needs in technology and talent. What's inside the planet? Building on our existing strengths, we have compiled interview questions, interview write-ups, industry research reports, salary-negotiation tips, company referrals, and resume-polishing services. Job postings: after more than three years as a technical media outlet in autonomous driving, embodied intelligence, 3D vision, and robotics, we have accumulated a large amount of content. Over time we kept receiving requests for help with employment: salary negotiation, avoiding pitfalls, choosing positions, and switching fields are all questions people care about. We have long wanted to give everyone such a platform, so that students seeking jobs can ...
Results are out! The latest ICCV 2025 roundup (autonomous driving / embodied AI / 3D vision / LLM / CV, etc.)
自动驾驶之心· 2025-06-28 13:34
Core Insights
- The article discusses the recent ICCV conference, highlighting the excitement around the release of various works related to autonomous driving and advances in the field [2].

Group 1: Autonomous Driving Innovations
- DriveArena is introduced as a controllable generative simulation platform aimed at enhancing autonomous driving capabilities [4].
- Epona presents an autoregressive diffusion world model designed specifically for autonomous driving [4].
- SynthDrive offers a scalable Real2Sim2Real sensor-simulation pipeline for high-fidelity asset generation and driving-data synthesis [4].
- StableDepth focuses on scene-consistent, scale-invariant monocular depth estimation, which is crucial for improving perception in autonomous vehicles [4].
- CoopTrack explores end-to-end learning for efficient cooperative sequential perception, enhancing the collaborative capabilities of autonomous systems [4].

Group 2: Image and Vision Technologies
- CycleVAR repurposes autoregressive models for unsupervised one-step image translation, which can benefit visual recognition in autonomous driving [5].
- CoST emphasizes efficient collaborative perception from a unified spatiotemporal perspective, essential for real-time decision-making in autonomous vehicles [5].
- Hi3DGen generates high-fidelity 3D geometry from images via normal bridging, improving spatial understanding of environments for autonomous systems [5].
- GS-Occ3D focuses on scaling vision-only occupancy reconstruction for autonomous driving using Gaussian splatting [5].

Group 3: Large Model Applications
- ETA introduces a dual approach to self-driving with large models, enhancing the efficiency and effectiveness of autonomous driving systems [5].
- Taming the Untamed discusses graph-based knowledge retrieval and reasoning for multimodal large language models (MLLMs), which can significantly improve decision-making in autonomous driving [7].
CAS & ByteDance propose BridgeVLA, taking first place in a CVPR 2025 workshop challenge~
自动驾驶之心· 2025-06-28 13:34
Core Viewpoint
- The article introduces BridgeVLA, a new paradigm for 3D Vision-Language-Action (VLA) models that improves data efficiency and operational effectiveness in robotic manipulation tasks [3][21].

Group 1: Introduction of BridgeVLA
- BridgeVLA combines the strengths of existing 2D and 3D models by aligning inputs and outputs in a unified 2D space, bridging the gap between Vision-Language Models (VLMs) and VLA [5][21].
- The model achieves an 88.2% success rate on the RLBench benchmark, outperforming all existing baseline methods [14][19].

Group 2: Pre-training and Fine-tuning
- Pre-training equips the VLM to predict 2D heatmaps from image-target-text pairs, strengthening its target-detection capability [8][10].
- During fine-tuning, BridgeVLA predicts actions from point clouds and instruction text, keeping the inputs aligned with the pre-training phase for consistency [11][12].

Group 3: Experimental Results
- On RLBench, BridgeVLA raised the average success rate from 81.4% to 88.2%, excelling particularly on high-precision tasks [14][15].
- On the COLOSSEUM benchmark, the model was robust across various perturbations, raising the average success rate from 56.7% to 64.0% [16][19].

Group 4: Real-World Testing
- In real-world evaluations, BridgeVLA outperformed the strongest baseline, RVT-2, in six of seven settings, showing robustness to visual disturbances [18][19].
- Its retention of pre-training knowledge after fine-tuning indicates effective learning and generalization [19].

Group 5: Future Directions
- Future research will explore more diverse pre-training tasks to broaden the model's general visual understanding, and will consider more expressive action-decoding methods to improve policy performance [21].
- There are plans to tackle long-horizon tasks by using large language models (LLMs) for task decomposition [21].
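The summary mentions predicting 2D heatmaps and decoding actions from them, but not the decoding step itself. A common way to turn a heatmap into sub-pixel image coordinates is a soft-argmax, a softmax-weighted expectation over grid positions; this is a generic sketch under that assumption, not confirmed as BridgeVLA's exact decoder:

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, temperature: float = 1.0) -> tuple:
    """Softmax-weighted expectation over grid coordinates: returns the
    sub-pixel (row, col) location of the heatmap's probability mass."""
    h, w = heatmap.shape
    logits = heatmap.reshape(-1) / temperature
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return (float(probs @ rows.reshape(-1)),
            float(probs @ cols.reshape(-1)))
```

A sharply peaked heatmap recovers the plain argmax; a flatter one interpolates between nearby peaks, which is useful when the target lies between pixels.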
An in-depth read of Kaiming He's CVPR 2025 talk: how do generative models move toward end-to-end?
自动驾驶之心· 2025-06-28 13:34
Core Viewpoint
- The article discusses the evolution of generative models in deep learning, drawing parallels to the revolutionary changes AlexNet brought to recognition models, and argues that generative models may be on the brink of a similar breakthrough with MeanFlow, which collapses generation from many steps to a single step [1][2][35].

Group 1: Evolution of Recognition Models
- Before AlexNet, layer-wise training dominated recognition models: each layer was optimized individually, making training complex and cumbersome [2][3].
- AlexNet's introduction in 2012 marked the shift to end-to-end training, letting the whole network be trained jointly, which greatly simplified model design and improved performance [3][7].

Group 2: Current State of Generative Models
- Generative models today resemble recognition models before AlexNet, relying on multi-step inference such as diffusion and autoregressive models, which raises the question of whether they sit in a similar "pre-AlexNet" phase [7][9].
- The article argues that generative models must move from multi-step inference to end-to-end generation to achieve a comparable breakthrough [7][35].

Group 3: Relationship Between Recognition and Generation
- Recognition and generation are two sides of the same coin: recognition abstracts semantic information out of data, while generation concretizes abstract representations into realistic data samples [13][15][16].
- The fundamental difference lies in the mapping: recognition maps data to labels deterministically, while generation maps noise to complex data distributions through a highly nonlinear mapping, presenting both opportunities and challenges [18][20].

Group 4: Flow Matching and Mean Flows
- Flow matching is a key direction for addressing these challenges, constructing a flow field over data distributions to drive generation [20][22].
- Mean Flows, a recent method from Kaiming He's group, aims at one-step generation by replacing costly integral computation with average-velocity computation, significantly improving generation efficiency [24][27][29].
- In experiments, Mean Flows performed impressively on ImageNet, reaching an FID of 3.43 with a single function evaluation, outperforming traditional multi-step models [31][32].

Group 5: Future Directions and Challenges
- The article outlines future directions, including consistency models, two-time-variable models, and revisiting normalizing flows, while asking whether generative models are still in the "pre-AlexNet" era [33][34].
- Despite the advances of Mean Flows, finding a truly effective formulation for end-to-end generative modeling remains an open and exciting research question [34][35].
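The "average velocity instead of integration" idea can be made concrete. As a sketch following the publicly described MeanFlow formulation (with $v$ the instantaneous velocity field of flow matching, $z_t$ the flowed sample, and $r < t$ a second time variable):

```latex
% Average velocity over [r, t], replacing the ODE integral:
u(z_t, r, t) \;\triangleq\; \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% Differentiating (t - r)\,u = \int_r^t v \, d\tau with respect to t
% gives the MeanFlow identity used as the training target:
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\, \frac{d}{dt}\, u(z_t, r, t)
```

A network that regresses $u$ directly enables one-step sampling: starting from noise $z_1$, a sample is produced as $z_0 = z_1 - u(z_1, 0, 1)$ with a single function evaluation, which is what makes the 1-NFE FID comparison meaningful.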
之心 is urgently hiring! 2025 business-partner recruitment, plenty of openings~
自动驾驶之心· 2025-06-27 09:34
Group 1
- The article announces the recruitment of 10 outstanding partners for the "自动驾驶之心" team, focusing on the development of autonomous-driving courses, thesis guidance, and hardware development [2][3].
- Expertise sought includes large models/multimodal large models, diffusion models, VLA, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation with 3DGS, and large-model deployment with quantization-aware inference [3].
- Candidates preferably hold a master's degree or higher from a university ranked within the QS 200, with priority for those with significant contributions at top conferences [4].

Group 2
- The company offers various benefits, including resource sharing for job seeking, doctoral studies, and study-abroad recommendations, along with substantial cash incentives and opportunities for entrepreneurial project collaboration [5][6].
- Interested parties are encouraged to contact the company via WeChat to discuss institutional or corporate collaboration in autonomous driving [7].
The head of world models at SenseTime's 绝影 (SenseAuto) division has resigned...
自动驾驶之心· 2025-06-27 09:15
Core Viewpoint
- The article discusses the challenges and opportunities facing SenseTime's autonomous driving division amid market competition and technological change.

Group 1: Leadership Changes
- The head of world-model development at SenseTime's autonomous driving division has left the company, raising concerns about the continuity of its cloud technology stack and R-UniAD development [2][3].

Group 2: Market Dynamics
- 2025 is expected to be a challenging year for the division, with significant upgrades in the mid-tier market transitioning from highway NOA to full-scenario NOA [4].
- Algorithm upgrades in the mid-tier market could reshape the competitive landscape: companies that keep pace remain viable while others may be pushed out [4].

Group 3: High-End Strategy
- The focus for high-end projects this year is a one-stage end-to-end solution, whose strong performance has led manufacturers to prioritize it in their tenders [5].
- SenseTime's one-stage end-to-end UniAD solution has partnered with Dongfeng Motor, targeting mass production in Q4 2025, which is crucial for establishing a foothold in the high-end market [5][6].

Group 4: Competitive Positioning
- Delivering a benchmark project in the high-end segment is critical for SenseTime to gain credibility with manufacturers and secure further projects [6][7].
- The current window of opportunity in the high-end market is limited: many manufacturers are testing high-end models this year, which could saturate the available opportunities [6][8].
VLM-based fast-slow dual-system autonomous driving: a DriveVLM walkthrough~
自动驾驶之心· 2025-06-27 09:15
Over the past year, large models have advanced by leaps and bounds, and work applying them to downstream tasks keeps appearing. Today we share an attempt by Tsinghua and Li Auto to bring large models into autonomous driving: the core algorithm behind Li Auto's fast-slow dual system (E2E + VLM) from last year, which leverages the strong few-shot ability of large models in the hope of solving long-tail problems in real driving scenes and improving the cognition and reasoning ability of the autonomous driving system. DriveVLM's starting point is the practical difficulty the industry faces today: as intelligent driving iterates from L2 toward L4, all kinds of long-tail problems show up in real scenarios. Data-driven development gradually shrinks these long-tail cases, and that is the mainstream approach, hoping to close in on L4 through data. But as research deepened, it became clear that long-tail problems in the real world are endless; case-by-case data-driven iteration can hardly evolve into truly driverless L4. Industry and academia therefore need to think further about the next generation of autonomous driving. On this basis, DriveVLM's main innovations are the following. Dataset construction is arguably the core of this work, focused on five dimensions that autonomous driving scenarios care about, introduced one by one below: Ch ...
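The fast-slow idea above can be sketched as a toy dual-rate loop, where a low-frequency VLM branch produces coarse advice that a high-frequency end-to-end planner blends into its per-frame plan. All names, rates, and the blending rule here are hypothetical illustrations, not DriveVLM's actual interfaces:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the ego frame, meters

@dataclass
class SlowAdvice:
    """Hypothetical low-frequency output of the slow (VLM) branch."""
    coarse_waypoints: List[Waypoint]
    scene_tag: str

def slow_vlm_branch(tick: int) -> SlowAdvice:
    """Stub for the slow VLM: the real system would do scene description
    and analysis; here it just returns fixed coarse advice."""
    return SlowAdvice(coarse_waypoints=[(i * 2.0, 1.0) for i in range(1, 6)],
                      scene_tag="normal")

def fast_planner(advice: Optional[SlowAdvice]) -> List[Waypoint]:
    """Stub for the fast end-to-end planner: a nominal straight rollout,
    nudged toward the latest coarse VLM waypoints when available."""
    base = [(i * 2.0, 0.0) for i in range(1, 6)]  # 10 m straight rollout
    if advice is None:
        return base
    return [(0.7 * bx + 0.3 * cx, 0.7 * by + 0.3 * cy)
            for (bx, by), (cx, cy) in zip(base, advice.coarse_waypoints)]

def drive_loop(n_ticks: int, slow_period: int = 10) -> List[List[Waypoint]]:
    """Dual-rate loop: the slow branch fires every `slow_period` ticks,
    the fast planner fires every tick using the latest advice."""
    advice: Optional[SlowAdvice] = None
    plans = []
    for tick in range(n_ticks):
        if tick % slow_period == 0:
            advice = slow_vlm_branch(tick)
        plans.append(fast_planner(advice))
    return plans
```

The point of the structure is that the expensive VLM call amortizes over many cheap planner ticks, which is the usual motivation given for fast-slow designs.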
How should we view today's VLA embodied-intelligence technology? Is VLA still basically dim-witted?
自动驾驶之心· 2025-06-27 09:15
Core Viewpoint
- The article critiques the VLA (Vision-Language-Action) framework, arguing that it is fundamentally flawed and overly simplistic, focused mainly on trivial tasks that do not reflect real-world complexity [1][18].

Group 1: VLA Framework Limitations
- VLA is essentially an upgraded version of behavior cloning (BC) with minimal innovation, leading to misleading success rates [1][2].
- The tasks chosen for VLA are overly simplistic, often limited to basic pick-and-place actions, and do not demonstrate true versatility or effectiveness [3][4].
- The framework's reliance on 2D scenarios fails to account for the 3D nature of real-world environments, limiting its applicability [10][11].

Group 2: Data and Performance Issues
- VLA requires an excessive amount of data for simple tasks, undermining its efficiency and practicality [14][15].
- Reported success rates are artificially inflated by the simplicity of the chosen tasks; claims of 100% success are misleading [5][6].
- The framework is unclear about its own capabilities, making it hard to determine what tasks it can perform at each stage of development [16][17].

Group 3: Overall Critique
- The article argues that VLA represents a superficial approach to AI, lacking depth in understanding and modeling real-world tasks and environments [18][19].
- The author expresses frustration with the lack of meaningful progress in VLA, suggesting it is a product of laziness and opportunism within the AI community [18][20].
The core of the data closed loop: sharing an auto-annotation scheme for static elements (lane lines and static obstacles)
自动驾驶之心· 2025-06-26 13:33
Core Viewpoint
- The article emphasizes the importance of 4D automatic annotation in the autonomous driving industry, highlighting the shift from traditional 2D static-element annotation to more efficient 3D scene-reconstruction methods [2][3][4].

Group 1: Deficiencies of Traditional 2D Annotation
- Traditional 2D static-element annotation is time-consuming and labor-intensive, requiring repeated work at every timestamp [2].
- With 3D scene reconstruction, static elements need to be annotated only once, significantly improving efficiency [2][3].

Group 2: The 4D Automatic Annotation Process
- The process involves several steps, including converting 3D scenes to BEV views and training cloud-side models for automatic annotation [6].
- The cloud-side pipeline is distinct from the on-vehicle model, focusing on high-quality automated annotation that can be used for training vehicle-side models [6].

Group 3: Challenges in Automatic Annotation
- Key challenges include strict temporal-consistency requirements, complex multi-modal data fusion, and the difficulty of generalizing to dynamic scenes [7].
- The industry struggles with annotation efficiency and cost: high-precision 4D automatic annotation often requires manual verification, leading to long cycles and high costs [7].

Group 4: Course Offerings and Learning Opportunities
- The article promotes a course on 4D automatic annotation, covering dynamic and static elements, OCC, and end-to-end automation pipelines [8][9].
- The course aims to provide a comprehensive understanding of the algorithms and their practical applications in autonomous driving [8][9].

Group 5: Course Structure and Target Audience
- The course is structured into multiple chapters covering different aspects of 4D automatic annotation, including dynamic-obstacle annotation, SLAM reconstruction, and end-to-end ground-truth generation [9][11][12][16].
- It is designed for a diverse audience, including researchers, students, and professionals looking to transition into the data closed-loop field [22][24].
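One concrete step mentioned above, converting reconstructed 3D scenes into BEV views, can be sketched as a simple point rasterization: aggregated points in the ego frame are binned into a top-down occupancy grid. The ranges, resolution, and function name below are illustrative assumptions, not the course's actual pipeline:

```python
import numpy as np

def points_to_bev(points: np.ndarray,
                  x_range=(0.0, 40.0),
                  y_range=(-20.0, 20.0),
                  res=0.5) -> np.ndarray:
    """Rasterize aggregated 3D points (N, 3) in the ego frame into a
    binary BEV occupancy grid with cell size `res` meters."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((nx, ny), dtype=np.uint8)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop out-of-range
    grid[ix[ok], iy[ok]] = 1
    return grid
```

Static elements such as lane lines then only need to be labeled once on this grid and can be re-projected into every frame, which is the efficiency argument made above.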