自动驾驶之心
7DGS Arrives with a Bang: Igniting Dynamic Worlds in a Second! Photorealistic Real-Time Rendering Goes "Full Seven Dimensions" for the First Time
自动驾驶之心· 2025-08-23 16:03
This article originally appeared on the 3D视觉之心 public account. Photorealistic rendering of dynamic scenes with complex view-dependent effects remains challenging in computer vision and graphics. Examples include visualizing a dynamic heartbeat from real CT scans and cloud transitions over a daylight cycle with absorption and scattering effects. Synthesizing novel views of dynamic scenes is essential for many applications, including virtual reality, augmented reality, content creation, and digital twins. Although static-scene reconstruction and rendering have advanced markedly through Neural Radiance Fields (NeRF) and, more recently, 3D Gaussian Splatting (3DGS), achieving high-quality, real-time rendering of dynamic scenes with view-dependent effects still faces major computational and representational challenges. The core difficulty lies in jointly modeling three fundamental aspects: 1) spatial geometry, 2) temporal dynamics, and 3) view-dependent appearance. Each dimension poses its own challenges: spatial modeling must capture complex scene geometry across scales; temporal modeling must represent both rigid and non-rigid motion, possibly involving complex deformation; and view-dependent modeling must capture intricate light-transport effects such as scattering, anisotropic reflection, and translucency. When considered jointly, their ...
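The name "7DGS" suggests a seven-dimensional Gaussian representation covering the three aspects above; one plausible split is 3 spatial + 1 temporal + 3 view-direction dimensions, in which case rendering at a given time and view direction amounts to slicing the joint Gaussian with the standard conditional-Gaussian identity. The sketch below illustrates only that identity; the dimension split, function name, and parameterization are assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch: conditioning a joint 7D Gaussian (assumed 3 spatial
# + 1 temporal + 3 directional dims) on time and view direction to obtain
# a 3D spatial Gaussian, via the standard conditional-Gaussian formulas.
import numpy as np

def condition_gaussian(mu, cov, cond_idx, cond_val):
    """Condition an N-D Gaussian on the dimensions listed in cond_idx."""
    keep_idx = [i for i in range(len(mu)) if i not in cond_idx]
    mu_a, mu_b = mu[keep_idx], mu[cond_idx]
    S_aa = cov[np.ix_(keep_idx, keep_idx)]
    S_ab = cov[np.ix_(keep_idx, cond_idx)]
    S_bb = cov[np.ix_(cond_idx, cond_idx)]
    S_bb_inv = np.linalg.inv(S_bb)
    mu_cond = mu_a + S_ab @ S_bb_inv @ (cond_val - mu_b)
    cov_cond = S_aa - S_ab @ S_bb_inv @ S_ab.T  # Schur complement
    return mu_cond, cov_cond

rng = np.random.default_rng(0)
A = rng.normal(size=(7, 7))
cov7 = A @ A.T + 7 * np.eye(7)  # a valid 7x7 covariance
mu7 = np.zeros(7)

# Condition on dim 3 (time) and dims 4-6 (view direction) -> 3D spatial Gaussian
mu3, cov3 = condition_gaussian(mu7, cov7, [3, 4, 5, 6],
                               np.array([0.5, 0.0, 0.0, 1.0]))
print(mu3.shape, cov3.shape)  # (3,) (3, 3)
```

Because the Schur complement of a positive-definite covariance is itself positive definite, the sliced 3D Gaussian is always a valid Gaussian to splat.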
The Last Remaining Senior Executive of a New-Force Automaker's Intelligent Driving Team Has Departed
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint
- The departure of key personnel from a leading new-force car company's intelligent driving team may significantly impact its R&D progress, team stability, and sales momentum in the second half of the year [1][2][3].

Group 1: Company Developments
- The company's intelligent driving team has experienced significant turnover, with a reported attrition rate exceeding 50% in some teams this year [1].
- The company has rolled out blanket non-compete agreements to retain talent, even requiring recent graduates to sign them [1].
- The departure of the R&D head, a core member of the team, raises concerns about the company's ability to achieve its ambitious goals for 2024 [2].

Group 2: Industry Trends
- The movement of core intelligent driving talent across the industry may open new opportunities for technological advancement [3].
- The intelligent driving landscape is evolving, with technology routes converging under competitive pricing pressure [3].
- Departures of key figures from various intelligent driving teams, including those at Xiaopeng and NIO, point to a broader industry shift and a new cycle of turnover within intelligent driving teams [3].

Group 3: Strategic Implications
- The company is expected to launch a new intelligent-driving paradigm, which could significantly influence sales of new models [2].
- The loss of three senior executives responsible for critical aspects of intelligent driving may disrupt the company's overall R&D timeline and stability [2].
Recommending a Large-Model AI "Private Kitchen"!
自动驾驶之心· 2025-08-23 16:03
Group 1
- The article emphasizes the growing interest in large-model technologies, particularly RAG (Retrieval-Augmented Generation), AI Agents, multimodal large models (pre-training, fine-tuning, reinforcement learning), and deployment and inference optimization [1].
- A community named "Large Model Heart Tech" is being established to focus on these technologies and aims to become the largest domestic community for large-model technology [1].
- The community is also building a knowledge platform to provide industry and academic information and to cultivate talent in the large-model field [1].

Group 2
- The article describes the community as a serious, content-driven platform aimed at nurturing future leaders [2].
Toward Production-Ready VLA! FastDriveVLA: A Plug-and-Play Pruning Module with Nearly 4× Inference Speedup
自动驾驶之心· 2025-08-23 16:03
Core Viewpoint
- The article presents FastDriveVLA, a novel visual token pruning framework for autonomous driving that achieves a 50% compression rate while retaining 97.3% of performance [3][13][43].

Group 1: End-to-End Autonomous Driving
- Recent end-to-end autonomous driving research has adopted vision-language-action (VLA) models, which outperform traditional modular approaches in complex scene understanding and decision-making [3][10].
- The VLA model integrates perception, action generation, and planning into a single framework, reducing information loss between modules [3][4].

Group 2: Visual Token Pruning Techniques
- Existing VLM/VLA models incur high computational costs from encoding images into large numbers of visual tokens, motivating research into visual token pruning [4][11].
- The two main approaches, attention-based and similarity-based pruning, both have limitations in driving tasks [4][14].
- FastDriveVLA introduces a reconstruction-based pruning framework that retains tokens tied to the foreground regions critical for driving decisions [5][13].

Group 3: FastDriveVLA Framework
- FastDriveVLA employs a plug-and-play pruner called ReconPruner, trained with a pixel-reconstruction task that emphasizes foreground information [6][17].
- An adversarial foreground-background reconstruction strategy strengthens the model's ability to distinguish foreground from background tokens [20][21].
- A large-scale dataset, nuScenes-FG, containing 241,000 image-mask pairs, was built to train ReconPruner for effective foreground segmentation [6][12][13].

Group 4: Experimental Results
- FastDriveVLA achieved state-of-the-art results on the nuScenes closed-loop planning benchmark, demonstrating its effectiveness and practicality [13][28].
- Evaluated at pruning ratios of 25%, 50%, and 75%, it consistently outperformed existing methods on key metrics such as L2 error and collision rate [30][34].
- Efficiency analysis showed that FastDriveVLA substantially reduces FLOPs and CUDA latency compared with other methods, improving real-time deployability [36][40].

Group 5: Contributions and Implications
- FastDriveVLA offers a new paradigm for efficient VLA inference and insights into task-specific token pruning strategies [43].
- The work highlights the value of focusing on foreground information in driving tasks, yielding better performance at lower computational cost [5][43].
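The summary describes pruning visual tokens by a learned foreground-relevance score. A minimal sketch of score-based top-k token pruning is shown below; the random scorer stands in for a learned head such as ReconPruner, and all names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Sketch of score-based visual token pruning: keep the top-scoring
# fraction of tokens, preserving their original order. The scorer here
# is a random stand-in for a learned foreground-relevance head.
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top-scoring fraction of visual tokens, preserving order."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])  # top-k, original order
    return tokens[keep_idx], keep_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))  # e.g. 14x14 patches, 768-dim each
scores = rng.uniform(size=196)        # stand-in for learned foreground scores

kept, idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept.shape)  # (98, 768)
```

Because pruning happens before the tokens enter the language model, attention cost over the visual sequence drops roughly with the square of the keep ratio, which is where the reported FLOPs and latency savings come from.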
Helped Another Student Land an Autonomous Driving Algorithm Role......
自动驾驶之心· 2025-08-23 14:44
Core Viewpoint
- The article emphasizes continuous learning and adaptation in autonomous driving, particularly amid the industry shift toward intelligent models and large models, and highlights the value of community support for knowledge sharing and job opportunities [1][2].

Group 1: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" is a comprehensive community platform integrating video, text, learning paths, Q&A, and job exchange, aiming to grow from over 4,000 to nearly 10,000 members in two years [1][2].
- The community provides practical guidance on topics such as entry points for end-to-end models, learning paths for multimodal large models, and engineering practice for data closed-loop 4D annotation [2][3].
- Members have access to over 40 technical routes, including industry applications, VLA benchmarks, and beginner learning routes, significantly reducing the time spent searching for relevant information [2][3].

Group 2: Job Opportunities and Networking
- The community has established internal-referral channels with multiple autonomous driving companies, allowing job applications and resumes to be submitted directly to target companies [7].
- Regular job-sharing posts and updates on open positions create a complete ecosystem for autonomous driving professionals [15][30].

Group 3: Technical Learning and Development
- The community offers a well-structured technical stack and roadmap for beginners, covering mathematics, computer vision, deep learning, and programming [11][32].
- Learning routes for advanced topics, including end-to-end autonomous driving, 3DGS principles, and multimodal large models, serve both newcomers and experienced professionals [16][34][40].
- The platform also hosts live sessions with industry leaders, offering insight into cutting-edge research and practical applications in autonomous driving [58][66].
A Discussion of Cross-Attention in Multimodal Models
自动驾驶之心· 2025-08-22 16:04
Author | Trancy Wang  Editor | 大模型之心Tech
Original link: https://zhuanlan.zhihu.com/p/1939104588109156480

I. Where Cross-Attention Sits in Multimodal Models

In multimodal tasks (image-text matching, VQA, video understanding, speech-image fusion, etc.), simply concatenating features from different modalities is not enough. We want one modality to actively ask questions of the other, while the other modality supplies the relevant contextual cues. Cross-attention is exactly this mechanism. Its core idea:

- Query (Q): the active side, which fetches information from the other modality;
- Key/Value (K/V): the providing side, which supplies the contextual cues.

The formula is identical to standard Transformer attention:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

except that in the multimodal version, Q and K/V come from different sources.

II. Four Common Design Patterns

1. Single-direction Cross-Attention
Characteristic: only one ...
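A minimal single-head NumPy sketch of single-direction cross-attention, where queries come from one modality (here, text tokens) and keys/values from another (image patches). Projection matrices and multi-head logic are omitted for clarity; all shapes are illustrative.

```python
# Single-direction cross-attention: the query modality attends over the
# key/value modality. Single head, no learned projections.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """softmax(QK^T / sqrt(d_k)) V, with Q and K/V from different modalities."""
    d_k = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d_k))
    return attn @ v, attn

rng = np.random.default_rng(0)
text_q = rng.normal(size=(8, 64))   # 8 text tokens as queries
img_kv = rng.normal(size=(49, 64))  # 49 image patches as keys/values

out, attn = cross_attention(text_q, img_kv, img_kv)
print(out.shape, attn.shape)  # (8, 64) (8, 49)
```

Each of the 8 text tokens receives a weighted mixture of the 49 image patches; swapping the roles of the two inputs gives the attention in the opposite direction, which is the building block of the bidirectional variants the article goes on to discuss.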
ICCV'25! Tsinghua's GS-Occ3D: Scalable Pure-Vision Occupancy Reconstruction, a New Paradigm for Auto-Labeling
自动驾驶之心· 2025-08-22 16:04
Core Viewpoint
- The article presents GS-Occ3D, a new pure-vision paradigm for occupancy reconstruction that addresses the high cost and limited scalability of traditional LiDAR-based labeling in autonomous driving [3][10].

Group 1: Research Motivation and Contributions
- Existing occupancy-labeling methods rely heavily on LiDAR, which requires expensive specialized mapping vehicles and limits scalability [6].
- GS-Occ3D introduces a low-cost, scalable occupancy-labeling framework that can exploit large volumes of crowd-sourced data from consumer vehicles [7].
- The method achieves state-of-the-art (SOTA) geometric reconstruction on the Waymo dataset and shows strong zero-shot generalization on Occ3D-nuScenes [10][36].

Group 2: Methodology Overview
- GS-Occ3D uses an octree-based Gaussian surface representation to optimize an explicit geometric representation, enabling low-cost, efficient, large-scale automatic labeling [10][13].
- The pipeline generates sparse point clouds and ground surface elements from surround-view street imagery, followed by a label-generation workflow that densifies the point cloud and explicitly handles occlusion [13][32].
- The resulting pure-vision labels can train downstream occupancy models that generalize to unseen scenarios and exhibit geometric reasoning ability [13][10].

Group 3: Quantitative Results
- On the Waymo dataset, the method achieved a Chamfer Distance (CD) of 0.56 and a Peak Signal-to-Noise Ratio (PSNR) of 26.89, outperforming several existing methods [15].
- On Occ3D-Val (Waymo), it reached an Intersection over Union (IoU) of 44.7 and an F1 score of 61.8, indicating competitive generalization and fitting performance [16].
- Its zero-shot generalization was highlighted, with better performance than LiDAR-based methods in complex scenarios [24][32].

Group 4: Advantages of the Pure-Vision Method
- Cameras offer broader coverage than LiDAR, especially over large areas, and can outperform LiDAR in specific cases such as reconstructing tall buildings [32].
- Models trained on pure-vision labels generalize across a wider range of geometries, demonstrating superior zero-shot capability [32].
- The method provides rich semantic information at lower cost, reconstructing 3D labels with up to 66 categories versus only 16 in Occ3D [32][33].

Group 5: Challenges and Limitations
- Inherent camera-coverage limits, such as the lack of rear visibility in the Waymo dataset, cause unavoidable information loss [34].
- Performance degrades significantly under poor lighting, particularly at night or with exposure anomalies [34].
- The method can struggle in static scenes where the vehicle is stationary, requiring prior knowledge for effective geometric reconstruction [34].
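For reference, the two geometry metrics quoted in the results (Chamfer Distance and occupancy IoU) can be sketched as below. This brute-force version illustrates the definitions only; it is not the paper's evaluation code, and real pipelines use spatial indexing rather than dense pairwise distances.

```python
# Definitions of symmetric Chamfer Distance between point clouds and
# intersection-over-union between boolean occupancy grids.
import numpy as np

def chamfer_distance(p, q):
    """Mean nearest-neighbor distance, symmetrized over both clouds."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def occupancy_iou(pred, gt):
    """Intersection-over-union of two boolean occupancy grids."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

p = np.array([[0.0, 0, 0], [1, 0, 0]])
q = np.array([[0.0, 0, 0], [1, 0, 0]])
print(chamfer_distance(p, q))  # 0.0 for identical clouds

pred = np.zeros((4, 4, 4), bool); pred[:2] = True
gt = np.zeros((4, 4, 4), bool); gt[1:3] = True
print(round(occupancy_iou(pred, gt), 3))  # 0.333
```

Lower Chamfer Distance means tighter geometric agreement (the reported 0.56 is in the dataset's metric units), while IoU and F1 measure how well predicted occupied voxels overlap the ground-truth grid.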
The 自动驾驶之心 VLA Technical Exchange Group Is Established (Data / Models / Deployment)
自动驾驶之心· 2025-08-22 16:04
The 自动驾驶之心 large-model VLA technical exchange group has been established. Everyone is welcome to join and discuss VLA-related topics, including VLA dataset construction, single-stage VLA, hierarchical VLA, end-to-end approaches based on large models, VLM+DP approaches, production deployment, and job hunting. Interested readers can add the assistant on WeChat to join: AIDriver005, with the note "nickname + VLA". ...
Bosch Secures an Intelligent-Driving Design Win from a New-Energy Heavy-Truck Maker
自动驾驶之心· 2025-08-22 16:04
Core Viewpoint
- Bosch has secured a major intelligent-driving order from a leading new-energy heavy-truck manufacturer, marking a significant breakthrough in the commercial vehicle sector [5][6].

Group 1: Order Acquisition
- The contract covers three major vehicle platforms and more than a hundred domestic and international models, including tractors, mixers, and dump trucks [5].
- The bidding process was intensely competitive, with many domestic companies participating, and Bosch won despite an initial disadvantage [5].

Group 2: Market Dynamics
- Demand for intelligent-driving systems in commercial vehicles has surged, driven by policy changes mandating Advanced Emergency Braking (AEB) systems [6].
- Competition is fierce for low-level Advanced Driver Assistance Systems (ADAS), while high-level systems such as Navigation on Automated Driving (NOA) see less competition due to higher technical barriers [6].
Not Sure How to Start a VLA Paper? Some Students Already Have CCF-A Results......
自动驾驶之心· 2025-08-22 12:00
Core Insights
- The article discusses advances in Li Auto's VLA driver model, highlighting its improved semantic understanding, reasoning, and trajectory planning, all crucial for autonomous driving [1][3][5].

Group 1: VLA Model Capabilities
- The VLA model shows enhanced semantic understanding through multimodal input, improved reasoning via chains of thought, and trajectory planning that more closely approximates human driving intuition [1].
- Four core abilities are showcased: spatial understanding, reasoning, communication and memory, and behavioral capability [1][3].

Group 2: Research and Development Trends
- The VLA model evolved from VLM+E2E, integrating cutting-edge techniques such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5].
- While industry is still optimizing traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for exploration [5].

Group 3: VLA Research Guidance Program
- A VLA research-paper guidance program has been launched to positive feedback, aiming to help participants systematically master key theory and develop their own research ideas [6].
- The program runs a structured 14-week curriculum, from traditional end-to-end autonomous driving through paper-writing methodology [9][11][30].

Group 4: Course Structure and Requirements
- Each session is capped at 8 participants and targets individuals with a VLA and autonomous-driving background at various academic levels [12][15].
- Participants are expected to have a foundation in deep learning, Python programming, and PyTorch, with specific hardware recommendations for optimal performance [21][22].

Group 5: Expected Outcomes
- Participants will study classic and cutting-edge papers, build coding skills, and learn paper-writing and submission methodology, culminating in a draft paper [20][34].
- The program aims to deepen understanding of algorithms and their trade-offs and to stimulate research ideas through structured guidance [20][34].