自动驾驶之心
Why define 2000 TOPS + VLA + VLM as L3-level computing power?
自动驾驶之心· 2025-06-20 14:06
Core Viewpoint
- The article discusses advances in autonomous driving technology, focusing on Xiaopeng Motors' paper presented at CVPR 2025, which validates scaling laws in the autonomous driving setting and introduces new computing-power standards for Level 3 (L3) vehicles [4][6][22].

Group 1: Scaling Laws and Model Performance
- Xiaopeng Motors' paper systematically verifies the effectiveness of scaling laws in autonomous driving, showing that larger model parameter counts lead to better performance [4][6].
- The research establishes a clear power-law relationship between model performance and parameter scale, data scale, and computational power, as originally proposed by OpenAI [4][6].

Group 2: Computing Power Standards
- The paper introduces a new computing-power standard of 2000 TOPS for L3 autonomous driving, highlighting the exponential growth in computational requirements as the driving level advances [8][20].
- L2 systems require 80 to 300 TOPS, while L3 systems need thousands of TOPS due to the complexity of urban driving scenarios [8][20].

Group 3: VLA and VLM Model Architecture
- Xiaopeng's VLA (Vision-Language-Action) model architecture integrates visual understanding, reasoning, and action generation, requiring substantial computational resources [10][12].
- The architecture's visual processing module alone demands hundreds of TOPS for real-time fusion of data from multiple sensors [10][12].

Group 4: Onboard vs. Data Center Computing Power
- Onboard computing power serves real-time data processing for driving decisions, while data center computing power is used for offline training and model optimization [12][15].
- Onboard systems must balance real-time performance against power consumption, while data centers can apply far greater computational capability to complex model training [12][15].

Group 5: Market Dynamics and Competitive Landscape
- The market for AI chips in autonomous driving is dominated by a few key players, with NVIDIA holding a 36% market share, followed by Tesla and Huawei [20].
- The competitive landscape has shifted significantly since 2020, shaping the development of AI chips and their applications in autonomous driving [17][20].
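The power-law relationship among performance, parameter scale, data scale, and compute that the paper builds on can be illustrated with a log-log fit. The sketch below is illustrative only: the coefficient and exponent are made-up values, not numbers from the paper.

```python
import numpy as np

def fit_power_law(n, y):
    """Fit y = a * n**b by linear regression in log-log space.

    Assumes strictly positive n and y; returns (a, b).
    """
    logn, logy = np.log(n), np.log(y)
    b, loga = np.polyfit(logn, logy, 1)
    return np.exp(loga), b

# Illustrative data: loss shrinking as parameter count grows,
# generated from loss = 2.5 * N**-0.07 (made-up coefficients).
params = np.array([1e8, 1e9, 1e10, 1e11])
loss = 2.5 * params ** -0.07

a, b = fit_power_law(params, loss)
print(round(a, 2), round(b, 3))
```

On clean power-law data the fit recovers the generating coefficients; on real benchmark numbers the same regression gives the empirical scaling exponent.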
[Large-Model Practice] Deep learning lessons for the era when GPUs cost more than people
自动驾驶之心· 2025-06-20 14:06
Core Viewpoint
- The article emphasizes the importance of new methodologies for large-model experiments: focusing on key indicators, identifying true bottlenecks, balancing large and small experiments, and enhancing team collaboration [1].

Group 1: Key Indicators
- Identifying key indicators is crucial, as they should clearly differentiate state-of-the-art (SoTA) models from others and guide the direction of model iterations [4].
- Good indicators must objectively reflect performance levels and accurately point toward model improvements, avoiding the pitfall of chasing misleading metrics [4].

Group 2: Experimentation Methodologies
- The cost of experiments has risen significantly, making it essential to run meaningful experiments rather than low-value ones [5].
- It is advised to run large experiments to surface significant issues while using small experiments to filter out incorrect ideas [6].

Group 3: Team Collaboration
- Given the complexity of large-model experiments, team members should understand their comparative advantages and roles within the team [8].
- Collaboration can be improved by observing and documenting experiments together and increasing communication frequency [8].
Building a 10,000-member "Whampoa Academy" of autonomous driving, a place obsessed with technology~
自动驾驶之心· 2025-06-20 14:06
Core Viewpoint
- The article describes the establishment of a comprehensive community for autonomous driving and embodied intelligence, aiming to gather industry professionals and respond quickly to challenges in the sector. The goal is a community of 10,000 members within three years, connecting academia, products, and recruitment in the field [2][4].

Group 1: Community Development
- The community provides a platform for industry professionals to share the latest technological developments, join discussions, and access job opportunities [2][3].
- The initiative has already attracted notable figures from companies such as Huawei, along with leading researchers in autonomous driving [2].
- The community supports newcomers with structured learning paths and resources to build technical knowledge quickly [2].

Group 2: Knowledge Sharing and Resources
- The "Autonomous Driving Heart Knowledge Planet" serves as a technical exchange platform, primarily for students and professionals transitioning into the autonomous driving sector [4][11].
- The community has established recruitment connections with numerous companies, including well-known names such as Xiaomi, NIO, and NVIDIA [4][11].
- Members have access to over 5,000 pieces of content, live sessions with industry experts, and discounts on paid courses [14][18].

Group 3: Technological Focus Areas
- Key technological areas to focus on by 2025 include visual large language models (VLM), end-to-end trajectory prediction, and 3D generative simulation techniques [6][10].
- Learning paths cover subfields such as perception, mapping, and AI model deployment, spanning the full autonomous driving technology stack [11][16].
- Regular live sessions will focus on cutting-edge topics such as VLA, large models, and embodied intelligence, offering insights into practical applications and research advances [19][18].

Group 4: Engagement and Interaction
- The community encourages active participation, with weekly discussions and Q&A sessions to foster engagement among members [12][14].
- It aims to support both beginners and advanced professionals, facilitating networking and collaboration opportunities [12][11].
- The platform is designed as a dynamic space where members can freely ask questions and share knowledge, enhancing the overall learning experience [12][11].
Learning end-to-end large models, but still not clear on the difference between VLM and VLA...
自动驾驶之心· 2025-06-19 11:54
Core Insights
- The article emphasizes the growing importance of large models (VLM) in intelligent driving, highlighting their potential for practical applications and production deployment [2][4].

Group 1: VLM and VLA
- VLM (Vision-Language Model) focuses on foundational capabilities such as detection, question answering, spatial understanding, and reasoning [4].
- VLA (Vision-Language-Action) is more action-oriented, aimed at trajectory prediction in autonomous driving, and requires deeper, human-like reasoning and perception [4].
- It is recommended to learn VLM first before expanding to VLA; VLM can predict trajectories through diffusion models, enhancing action capabilities in uncertain environments [4].

Group 2: Community and Resources
- Readers are invited to join a knowledge-sharing community offering comprehensive resources, including video courses, hardware, and coding materials related to autonomous driving [4].
- The community aims to build a network of professionals in intelligent driving and embodied intelligence, targeting 10,000 members within three years [4].

Group 3: Technical Directions
- Four cutting-edge technical directions are outlined: visual language models, world models, diffusion models, and end-to-end autonomous driving [5].
- Links to resources and papers covering advances in these areas indicate a robust framework for ongoing research and development [6][31].

Group 4: Datasets and Applications
- A variety of datasets crucial for training and evaluating autonomous driving models are mentioned, covering pedestrian detection, object tracking, and scene understanding [19][20].
- Language-enhanced systems in autonomous driving are discussed, showing how natural language processing can improve vehicle navigation and interaction [20][21].

Group 5: Future Trends
- Large models could significantly shape the future of autonomous driving, particularly in enhancing decision-making and control systems [24][25].
- Integrating language models with driving systems could lead to more intuitive, human-like vehicle behavior [24][25].
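The point about predicting trajectories through diffusion models can be made concrete with a toy reverse-diffusion loop. Everything here is an illustrative assumption: the function name, the straight-line prior, and the simple blend that stands in for a learned, vision-language-conditioned denoiser.

```python
import numpy as np

def denoise_trajectory(steps=50, horizon=8, seed=0):
    """Toy reverse-diffusion loop: start an (horizon, 2) array of
    (x, y) waypoints as pure Gaussian noise, then iteratively nudge
    it toward a prior path. In a real model, a trained network
    conditioned on VLM features would predict each denoising step."""
    rng = np.random.default_rng(seed)
    # Straight-ahead path used as the stand-in "clean" trajectory.
    prior = np.stack([np.linspace(0, 10, horizon),
                      np.zeros(horizon)], axis=1)
    traj = rng.standard_normal((horizon, 2))  # pure noise
    for _ in range(steps):
        # Stub denoiser: blend the noisy sample toward the prior.
        traj = traj + 0.2 * (prior - traj)
    return traj

traj = denoise_trajectory()
print(traj.shape)  # (8, 2)
```

After enough steps the sample converges to the prior; swapping the blend for a trained noise predictor turns this skeleton into an actual diffusion planner.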
Class-of-2026 autonomous driving algorithm hiring: the trends have shifted quite a bit...
自动驾驶之心· 2025-06-19 10:47
Click the card below and follow the "自动驾驶之心" official account. Tap me -> to get learning roadmaps for nearly 15 autonomous driving directions.

This article is aimed at class-of-2026 students job-hunting in autonomous driving and the internet industry, walking through this year's major hiring-trend changes and the progress of autumn recruitment. Feel free to save and share.

Changes in the Big Trends
Last year both autonomous driving and the internet sector were sluggish: many companies announced layoffs or shut down, and many class-of-2025 students hit a wall in campus recruiting. This year quite a few companies in the industry have caught their breath and resumed large-scale hiring (Xiaomi, BYD, Xpeng, and others), so the outlook for the class of 2026 should be decent overall, with a chance of matching the class of 2024.

Another difference: early-batch recruitment is losing force year by year, to the point of existing in name only. Apart from top-tier talents who can genuinely land super offers in the early batch, most students receive autumn-recruitment offers between late July and late November, and late November through Chinese New Year is the supplementary autumn-recruitment phase.

For large companies, summer internships really matter. Broadly defined return-offer summer internships run from late February through late October, and even during autumn recruitment there is still headcount for convert-to-full-time interns, because big companies now prefer hiring interns who convert: first, they come with real working experience, and second, of course, it is easier to hold salaries down, since converted interns' pay is generally not opened very high. Please fully recognize the importance of summer internships; those who can should do their best to secure one, and reach out to graduated seniors ...
Latest from Stanford! Analyzing hallucinations in large models: does obsessing over reasoning make the truth disappear?
自动驾驶之心· 2025-06-19 10:47
Core Viewpoint
- The paper explores the relationship between reasoning capability and hallucination in multimodal reasoning models, asking whether more reasoning comes at the cost of visual perception accuracy [2][3][37].

Group 1: Reasoning Models and Hallucinations
- Multimodal reasoning models tend to amplify hallucinations as their reasoning capability improves, leading to misinterpretations of visual data [2][3][5].
- The study introduces a new metric, RH-AUC, to assess the balance between reasoning length and perception accuracy, indicating that longer reasoning chains may bring more hallucination [4][30].

Group 2: Attention Mechanism and Performance
- The attention of reasoning models drops sharply on visual elements, so they lean on language-based assumptions rather than visual evidence [5][18].
- Experiments reveal that reasoning models perform worse than non-reasoning models on perception tasks, with higher hallucination rates regardless of model size [8][37].

Group 3: Training Paradigms and Data Quality
- Two main training paradigms are identified: pure reinforcement learning (RL-only) and supervised fine-tuning combined with reinforcement learning (SFT+RL); RL-only models generally balance reasoning and perception better [10][35].
- Data quality is emphasized over quantity: models trained on high-quality, domain-specific data better maintain the reasoning-hallucination balance [39][42].

Group 4: Evaluation Metrics and Future Directions
- The RH-Bench benchmark, consisting of 1,000 multimodal tasks, is introduced to evaluate reasoning and perception capabilities comprehensively [30][32].
- Future directions include exploring broader model architectures and developing mechanisms for dynamically adjusting reasoning length to enhance model reliability [44].
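A metric trading reasoning length against perception accuracy can be sketched as the area under an accuracy-versus-normalized-reasoning-length curve. This is only a guess at the spirit of RH-AUC; the paper's exact definition may differ, and the numbers below are illustrative, not from the paper.

```python
import numpy as np

def rh_auc(lengths, accuracies):
    """Trapezoidal area under the accuracy curve, with reasoning
    lengths normalized to [0, 1]. An illustrative stand-in for the
    paper's RH-AUC, not its published formula."""
    order = np.argsort(lengths)
    x = np.asarray(lengths, float)[order]
    x = (x - x[0]) / (x[-1] - x[0])  # normalize lengths to [0, 1]
    y = np.asarray(accuracies, float)[order]
    # Trapezoid rule: average adjacent accuracies times segment width.
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

# Illustrative numbers: perception accuracy dropping as chains grow.
score = rh_auc([100, 400, 800, 1600], [0.82, 0.78, 0.70, 0.55])
print(round(score, 3))
```

Under this reading, a model whose accuracy holds up at long reasoning lengths scores closer to its base accuracy, while one that degrades quickly scores lower.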
High-quality 3DGS representation! 𝒳-Scene: a novel large-scale driving-scene generation framework~
自动驾驶之心· 2025-06-19 10:47
The following article comes from 3D视觉之心, by the author 3D视觉之心 (sharing content on 3D vision, SLAM, and point clouds). Click the card below and follow the "3D视觉之心" official account to get 3D vision insights first. >> Click to enter → the 具身智能之心 technical exchange group. For more resources, join China's first full-stack embodied intelligence learning community: 具身智能之心知识星球, which has everything you want.

Challenges of Large-Scale Scene Generation
In recent years, advances in generative AI have profoundly influenced autonomous driving, with diffusion models becoming key tools for data synthesis and driving simulation. Some approaches use diffusion models as data-generation engines, producing high-fidelity driving videos or multimodal synthetic data to enhance perception tasks and generating critical but rare cases, such as vehicle cut-ins, to enrich planning data. Beyond that, other approaches treat diffusion models as world models that predict future driving states, enabling end-to-end planning and closed-loop simulation. These studies mainly emphasize long-horizon video generation through temporal recursion, encouraging diffusion models to output temporally consistent video sequences for downstream tasks.

However, large-scale scene generation with spatial extensibility remains an emerging but under-explored direction, aiming to build vast, immersive 3D environments usable for arbitrary driving simulation. Some pioneering works have explored large-scale 3D driving-scene generation. For example, some methods leverage diffusion ...
CVPR'25 end-to-end champion solution! GTRS: generalizable multimodal end-to-end trajectory planning (NVIDIA & Fudan)
自动驾驶之心· 2025-06-19 10:47
Today 自动驾驶之心 shares the latest work from NVIDIA and Fudan University: GTRS, generalizable multimodal end-to-end trajectory planning! If you have related work to share, contact us at the end of the article. For autonomous driving courses and technical exchange groups, add the assistant on WeChat (AIDriver004) for further inquiries. >> Click to enter → the 自动驾驶之心 "end-to-end autonomous driving" technical exchange group.

Paper authors | Zhenxin Li et al.
Editor | 自动驾驶之心
Paper: https://arxiv.org/abs/2506.06664
GitHub: https://github.com/NVlabs/GTRS
NVIDIA tech blog: https://blogs.nvidia.com/blog/auto-research-cvpr-2025/?ncid=so-nvsh-677066
CVPR 2025 Autonomous Grand Challenge: https://opendrivelab.com/legacy/challenge2025/index.html

Background of the End-to-End Autonomous Driving Challenge
NAVSIM v2 ...
Having surveyed the field, I still want to work on autonomous driving!
自动驾驶之心· 2025-06-19 06:30
Core Viewpoint
- The company has opened pre-sales of the "Black Warrior Series 001," an all-in-one autonomous driving vehicle for research and education, priced at 36,999 yuan, with additional courses offered for early orders [1].

Group 1: Product Overview
- The "Black Warrior 001" is a lightweight teaching and research solution supporting perception, positioning, fusion, navigation, and planning, built on an Ackermann chassis [5].
- The vehicle allows secondary development and modification, with numerous mounting positions and interfaces for adding cameras, millimeter-wave radars, and other sensors [6].

Group 2: Performance and Testing
- The vehicle has been tested indoors, outdoors, and in basements, demonstrating its perception, positioning, fusion, navigation, and planning capabilities [8].
- Specific tests include outdoor park driving, point cloud 3D target detection, 2D and 3D laser mapping in indoor basements, slope tests, and outdoor large-scene 3D mapping [11][13][15][17][19][21][23].

Group 3: Hardware Specifications
- Key sensors include a Mid-360 3D LiDAR, a 2D LiDAR, a depth camera from Orbbec, and an NVIDIA Orin NX 16G main control chip [25].
- The vehicle weighs 30 kg, has 50 W of battery power at 24 V, runs for over 4 hours, and reaches a top speed of 2 m/s [27][28].

Group 4: Software and Functionality
- The software stack includes ROS, C++, and Python, supports one-click startup, and ships with a development environment [30].
- Supported functionality includes 2D and 3D SLAM, depth estimation, vehicle navigation, and obstacle avoidance [32].

Group 5: After-Sales and Support
- The company offers one year of after-sales support (excluding human-caused damage), with free repairs during the warranty period for damage caused by operational errors or code modifications [55].
What exactly is the VLA so often mentioned in autonomous driving?
自动驾驶之心· 2025-06-18 13:37
Core Viewpoint
- The article discusses the Vision-Language-Action (VLA) model, which integrates visual perception, language understanding, and action decision-making into a unified framework for autonomous driving, enhancing system generalization and adaptability [2][4][12].

Summary by Sections

Introduction to VLA
- VLA stands for Vision-Language-Action and aims to unify environmental observation and control-command output in autonomous driving [2].
- The model represents a shift from traditional modular approaches to an end-to-end system driven by large-scale data [2][4].

Technical Framework of VLA
- The VLA model consists of four key components:
  1. Visual encoder: extracts features from images and point cloud data [8].
  2. Language encoder: uses pre-trained language models to understand navigation instructions and traffic rules [11].
  3. Cross-modal fusion layer: aligns and integrates visual and language features into a unified environmental understanding [11].
  4. Action decoder: generates control commands from the fused multimodal representation [8][11].

Advantages of VLA
- VLA enhances scene generalization and contextual reasoning, enabling quicker and more reasonable decisions in complex scenarios [12].
- Integrated language understanding allows more flexible driving strategies and improved human-vehicle interaction [12].

Industry Applications
- Companies including DeepMind and Yuanrong Qixing are applying VLA concepts in their autonomous driving research, showcasing its potential in real-world applications [13].
- DeepMind's RT-2 model and Yuanrong Qixing's "end-to-end 2.0 version" highlight advances in intelligent driving systems [13].

Challenges and Future Directions
- Despite its advantages, VLA faces challenges such as limited interpretability, high data-quality requirements, and heavy computational demands [13][15].
- Solutions being explored include integrating interpretability modules, optimizing trajectory generation, and combining VLA with traditional control methods to enhance safety and robustness [15][16].
- With continued advances in large models and edge computing, VLA is expected to become a foundational technology for autonomous driving [16].
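The four-component dataflow (visual encoder → language encoder → cross-modal fusion → action decoder) can be sketched end to end with fixed random projections standing in for trained networks. All dimensions, names, and the flattened-image input are illustrative assumptions, not details from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Stand-in for a trained layer: a fixed random projection."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: x @ w

# The four components named above, as untrained stand-ins.
visual_encoder = linear(3 * 32 * 32, 64)   # image pixels  -> visual features
language_encoder = linear(128, 64)         # text embedding -> language features
fusion_layer = linear(128, 64)             # concat features -> joint state
action_decoder = linear(64, 2)             # joint state   -> (steer, accel)

image = rng.standard_normal(3 * 32 * 32)   # a flattened camera frame
instruction = rng.standard_normal(128)     # an embedded navigation instruction

fused = fusion_layer(np.concatenate([visual_encoder(image),
                                     language_encoder(instruction)]))
steer, accel = action_decoder(fused)
print(fused.shape, float(steer), float(accel))
```

The point of the sketch is the wiring, not the weights: a real VLA replaces each projection with a large pretrained network but keeps the same encode-fuse-decode flow.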