The Top Technical Leaders Behind China's Large Models
自动驾驶之心· 2025-09-18 03:40
Core Viewpoint
- The article discusses the rapid development and competitive landscape of AI in China, highlighting key leaders and their contributions to advancing AI technologies and applications across industries [2][37].

Group 1: Key Leaders and Their Contributions
- Liang Wenfeng, founder of DeepSeek, demonstrated the potential of Chinese AI startups, reaching 30 million daily active users within 20 days of product launch [4][5].
- Lin Junyang, head of Tongyi Qianwen at Alibaba Cloud, led the team in adapting AI models for over 100,000 enterprise clients across 20 industries, underscoring the importance of industry-specific applications [9][10].
- Wu Yonghui, head of ByteDance's Seed team, focused on user-centric AI applications, surpassing 10 million daily active users by addressing everyday needs across scenarios [12][14].
- Bo Liefeng, core leader of Tencent's Hunyuan model, integrated AI capabilities into over 200,000 enterprise clients, improving efficiency in sectors such as finance and manufacturing [16][17].
- Xu Li, chairman of SenseTime, built the SenseCore AI infrastructure, enabling deployment of the SenseNova (Ririxin) models across multiple sectors and serving over 1,000 large enterprises worldwide [21][23].
- Yan Junjie, founder of MiniMax, introduced the first commercial trillion-parameter MoE-architecture model, iterating rapidly to meet diverse enterprise needs and achieving strong user engagement [25][27].
- Yang Zhilin, founder of Moonshot AI, focused on long-context processing, leading to the launch of Kimi Chat, which gained millions of users in specialized fields [29][32].
- Wang Haifeng, CTO of Baidu, established the PaddlePaddle deep learning platform and led development of the Wenxin models, solidifying Baidu's leadership in the Chinese AI landscape [33][35].

Group 2: Industry Impact
- The success of these leaders and their companies illustrates the growing strength of China's AI sector, pushing the boundaries of technology and application across industries [2][37].
- Advances in AI technology are not only improving operational efficiency but also driving digital transformation in traditional sectors, increasing the global competitiveness of Chinese enterprises [10][23].
- Collaboration among these companies is fostering a robust AI ecosystem, promoting innovation and practical applications that address real-world challenges [21][27].
Seven Years of Hard Fighting, Three Generations of Models! The Evolution of BEV: A New Survey from Harbin Institute of Technology & Tsinghua
自动驾驶之心· 2025-09-17 23:33
As the mass-production cornerstone of intelligent driving, how has BEV developed? This article reviews BEV's three generations of evolution. BEV perception has become a foundational paradigm in autonomous driving, providing a unified spatial representation that supports robust multi-sensor fusion and multi-agent collaboration. As autonomous vehicles move from controlled environments to real-world deployment, ensuring the safety and reliability of BEV perception in complex scenarios (such as occlusion, adverse weather, and dynamic traffic) remains a key challenge. This paper presents the first comprehensive survey of BEV perception from a safety-critical perspective, systematically analyzing current mainstream frameworks and implementation strategies and dividing them into three progressive stages: single-modality on-vehicle perception, multi-modality on-vehicle perception, and multi-agent collaborative perception. It also examines public datasets covering on-vehicle, roadside, and collaborative scenarios, evaluating their suitability in terms of safety and robustness. The paper further identifies key challenges in open-world settings (including open-set recognition, large-scale unlabeled data, sensor degradation, and inter-agent communication latency) and outlines future research directions, such as integration with end-to-end autonomous driving systems, embodied intelligence, and applications of large language models.

Paper link: https://arxiv.org/abs/2508.07560v1
Paper title: Progressive Bird's Eye View Perception for Safety-Critical Autonomou ...
XPeng and Li Auto Are Going All In on the VLA Route: What Are Its Research Directions?
自动驾驶之心· 2025-09-17 23:33
Core Viewpoint
- The article discusses the transition in intelligent driving technology from rule-driven to data-driven approaches, highlighting the limitations of end-to-end models in complex scenarios and the potential of VLA (Vision-Language-Action) as a more streamlined solution [1][2].

Group 1: Challenges in Learning and Research
- The technical stack for autonomous driving VLA has not yet converged, leading to a proliferation of algorithms and making it difficult for newcomers to enter the field [2].
- A lack of high-quality documentation and fragmented knowledge across domains raise the entry barrier for beginners in autonomous driving VLA research [2].

Group 2: Course Development
- A new course, "Autonomous Driving VLA Practical Course", has been developed to address the challenges learners face, focusing on a comprehensive understanding of the VLA technical stack [3][4].
- The course aims to provide a one-stop opportunity to build knowledge across multiple fields, including visual perception, language modules, and action modules, while integrating cutting-edge technologies [2][3].

Group 3: Course Features
- The course emphasizes quick entry through a just-in-time learning approach, using plain language and case studies to help students grasp core technologies rapidly [3].
- It aims to build research capability, enabling students to categorize papers and extract innovative points to form their own research systems [4].
- Practical application is a key focus, with hands-on sessions designed to close the theory-to-practice loop [5].

Group 4: Course Outline
- The course covers the origins of autonomous driving VLA, foundational algorithms, and the differences between modular and integrated VLA [6][10][12].
- It includes practical sessions on dataset creation, model training, and performance tuning, providing a comprehensive learning experience [12][14][16].

Group 5: Instructor Background
- The instructors have extensive experience in multimodal perception, autonomous driving VLA, and large-model frameworks, with numerous publications at top-tier conferences [22].

Group 6: Learning Outcomes
- Upon completion, students are expected to thoroughly understand the current state of autonomous driving VLA and master its core algorithms [23][24].
- The course is designed to benefit students in internships, job recruitment, and further academic pursuits in the field [26].

Group 7: Course Schedule
- The course begins on October 20, with a structured timeline for unlocking chapters and support through online Q&A sessions [27].
Revealing XPeng's Autonomous Driving "Foundation Model" and "VLA Large Model"
自动驾驶之心· 2025-09-17 23:33
Author | Pirate Jack
Source | Vehicle

At the CVPR 2025 Autonomous Driving Workshop, Liu Xianming of XPeng Motors gave a talk titled "Scaling up Autonomous Driving via Large Foundation Models". A fair amount of information about XPeng's VLA talk at this CVPR has circulated online, but those posts were advertisements shaped by what others wanted you to see. This article is based on Liu Xianming ...
An Ultra Cost-Effective 3D Scanner! Centimeter-Level Reconstruction for Point-Cloud and Vision Scenarios
自动驾驶之心· 2025-09-17 23:33
The most cost-effective 3D laser scanner for industrial and research/teaching scenarios is here. The GeoScan S1 is currently China's best-value real-scene 3D laser scanner: lightweight, one-button startup, and an efficient, practical 3D solution out of the box. Built around multi-modal sensor-fusion algorithms, it reconstructs 3D scenes in real time with centimeter-level accuracy and can be used across a wide range of applications.

It maps at 200,000 points per second with a 70 m measurement range and 360° coverage, and supports large scenes of over 200,000 square meters. An optional 3D Gaussian data-acquisition module enables high-fidelity real-scene reconstruction. The scanner supports cross-platform integration and provides a high-bandwidth Ethernet port and dual USB 3.0 interfaces, giving research experiments room to extend flexibly, lowering the development threshold, and helping developers build R&D capability quickly.

The GeoScan S1 ships with a handheld Ubuntu system and multiple sensors. The handle integrates a power supply, which feeds the GeoScan S1 body through a D-TAP to XT30 female connector, powering the LiDAR, cameras, and main control board.

A look at the base-model reconstruction results:
- Low barrier to use: simple, intuitive operation; scanning starts with one button
- Ready-to-use exports: no complex deployment or tedious processing; exported scan results are ready to use
- Efficient, high-precision mapping: high model accuracy; large scenes can be scanned while walking
- Best price in the industry: highly cost-effective, with tightly integrated multi-sensor hardware

Big news! 3DG ...
Former Li Auto CTO Crosses Into Embodied Intelligence Entrepreneurship, Backed by Multiple Investors...
自动驾驶之心· 2025-09-17 03:26
Core Viewpoint
- The article highlights the growing interest and investment in embodied intelligence, particularly with the involvement of key industry figures such as Wang Kai, who has transitioned from a CTO role at Li Auto to an investment partner at Yuanjing Capital, indicating a shift towards commercialization in this sector [2][3].

Group 1: Investment and Industry Dynamics
- Wang Kai, a former CTO of Li Auto, is now engaged in embodied intelligence entrepreneurship, attracting attention from various investment institutions [2][3].
- The startup has garnered significant investment interest, with firms like Sequoia Capital and BlueRun Ventures contributing a total of $50 million, reflecting the potential seen in the embodied intelligence sector [3].
- The emphasis on the founder's production capabilities is a key factor for investors, as the industry requires strong expertise in mass production to advance commercialization [3].

Group 2: Key Personnel and Contributions
- Wang Kai's previous experience at Li Auto involved overseeing smart-driving research, including cockpit systems, autonomous driving, and platform development, which positions him as a valuable asset in the new venture [3].
- Another high-ranking executive from the autonomous driving sector is also set to participate in a leading new-force automaker's end-to-end and vehicle-level mass-production efforts, highlighting the need for experienced professionals in the embodied intelligence field [3].
自动驾驶之心 Enterprise Cooperation Invitation
自动驾驶之心· 2025-09-17 02:01
自动驾驶之心 is a leading media platform for content creation and promotion in the embodied intelligence field. Over the past year, we have signed long-term cooperation agreements with a number of autonomous driving companies, covering brand promotion, product promotion, joint operations, and more. As our team continues to grow, we hope to connect with more outstanding companies in these areas and help drive the rapid development of the autonomous driving field. Companies or teams with relevant business needs are welcome to contact us. We look forward to further cooperation!

Contact: add our business WeChat, oooops-life, for further discussion. ...
Those Who Claim End-to-End Cures All Ills Have Never Done PnC...
自动驾驶之心· 2025-09-16 23:33
Core Viewpoint
- The article discusses the current state and future potential of end-to-end (E2E) autonomous driving systems, emphasizing the need for a shift from modular to E2E approaches in the industry, while acknowledging the challenges and limitations that remain before this technology matures [3][5].

Group 1: End-to-End Autonomous Driving
- End-to-end systems process raw sensor data directly into control signals for the vehicle, a significant shift from traditional modular approaches [3][4].
- E2E systems are seen as a way to provide a comprehensive representation of the information affecting vehicle behavior, which is crucial for handling the open-set scenarios of autonomous driving [4].
- The industry is currently divided, with some companies focusing on VLA (Vision-Language-Action) approaches and others on traditional methods, but there is a consensus that E2E systems are the future [2][5].

Group 2: Industry Trends and Challenges
- There is a growing recognition that autonomous driving is transitioning from rule-based to knowledge-driven systems, which necessitates a deeper understanding of E2E methodologies [5].
- Despite the high potential of E2E systems, significant challenges remain before they can fully replace traditional planning and control methods [5].
- The article suggests that companies should allow more time for E2E systems to mature rather than rushing to deploy them without adequate understanding [5].

Group 3: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community aims to provide a platform for sharing knowledge and resources related to autonomous driving, including technical roadmaps and job opportunities [8][18].
- The community has gathered over 4,000 members and aims to grow to nearly 10,000 within two years, offering a space for both beginners and advanced learners to engage with industry experts [8][18].
- Various learning resources, including video tutorials and technical discussions, are available to help members navigate the complexities of autonomous driving technologies [12][18].
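To make the end-to-end idea described above concrete, here is a toy sketch: a single learned function maps a raw camera frame directly to control commands, with no hand-written perception, prediction, or planning modules in between. All shapes, layer sizes, and the random (untrained) weights are illustrative assumptions, not any company's actual stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, untrained weights: in a real E2E system these would be
# learned from large-scale driving data.
W1 = rng.normal(0, 0.01, (64, 3 * 32 * 32))   # camera pixels -> hidden features
W2 = rng.normal(0, 0.01, (2, 64))             # hidden -> [steering, throttle]

def e2e_policy(image: np.ndarray) -> np.ndarray:
    """Map a raw 3x32x32 camera frame directly to [steering, throttle]."""
    h = np.tanh(W1 @ image.reshape(-1))        # learned feature extraction
    return np.tanh(W2 @ h)                     # bounded control outputs

frame = rng.random((3, 32, 32))                # stand-in for a camera frame
controls = e2e_policy(frame)
print(controls.shape)                          # (2,)
```

The point of the sketch is structural: the whole pipeline is one differentiable function, so it can be trained end to end, but by the same token there is no intermediate module to inspect when behavior goes wrong, which is exactly the maturity concern the article raises.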
Foundation Models for Autonomous Driving Should Be Capability-Oriented, Not Just Focused on the Methods Themselves
自动驾驶之心· 2025-09-16 23:33
Core Insights
- The article discusses the transformative impact of foundational models on the autonomous driving perception domain, shifting from task-specific deep learning models to versatile architectures trained on vast and diverse datasets [2][4].
- It introduces a new classification framework focusing on four core capabilities essential for robust performance in dynamic driving environments: general knowledge, spatial understanding, multi-sensor robustness, and temporal reasoning [2][5].

Group 1: Introduction and Background
- Autonomous driving perception is crucial for enabling vehicles to interpret their surroundings in real time, involving key tasks such as object detection, semantic segmentation, and tracking [3].
- Traditional models, designed for specific tasks, exhibit limited scalability and poor generalization, particularly in "long-tail scenarios" where rare but critical events occur [3][4].

Group 2: Foundational Models
- Foundational models, developed through self-supervised or unsupervised learning strategies, leverage large-scale datasets to learn general representations applicable across various downstream tasks [4][5].
- These models demonstrate significant advantages in autonomous driving due to their inherent generalization capabilities, efficient transfer learning, and reduced reliance on labeled datasets [4][5].

Group 3: Key Capabilities
- The four key dimensions for designing foundational models tailored for autonomous driving perception are:
  1. General Knowledge: the ability to adapt to a wide range of driving scenarios, including rare situations [5][6]
  2. Spatial Understanding: deep comprehension of 3D spatial structures and relationships [5][6]
  3. Multi-Sensor Robustness: maintaining high performance under varying environmental conditions and sensor failures [5][6]
  4. Temporal Reasoning: capturing temporal dependencies and predicting future states of the environment [6]

Group 4: Integration and Challenges
- The article outlines three mechanisms for integrating foundational models into autonomous driving technology stacks: feature-level distillation, pseudo-label supervision, and direct integration [37][40].
- It highlights the challenges of deploying these models, including the need for effective domain adaptation, addressing hallucination risks, and ensuring efficiency in real-time applications [58][61].

Group 5: Future Directions
- The article emphasizes the importance of advancing research on foundational models to enhance their safety and effectiveness in autonomous driving systems, addressing current limitations and exploring new methodologies [2][5][58].
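Of the three integration mechanisms mentioned above, feature-level distillation is the easiest to sketch: a compact on-vehicle "student" is trained so its intermediate features match those of a large foundation-model "teacher". The dimensions, the random stand-in features, and the learned projection below are illustrative assumptions, not taken from the surveyed systems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for intermediate features on a batch of 4 samples:
# the frozen teacher produces 256-d features, the small student 64-d.
teacher_feat = rng.normal(size=(4, 256))       # foundation-model features (frozen)
student_feat = rng.normal(size=(4, 64))        # compact on-vehicle features
proj = rng.normal(0, 0.1, size=(64, 256))      # learned projection: 64-d -> 256-d

def distill_loss(student: np.ndarray, teacher: np.ndarray,
                 projection: np.ndarray) -> float:
    """Mean-squared error between projected student and teacher features."""
    aligned = student @ projection             # (4, 64) @ (64, 256) -> (4, 256)
    return float(np.mean((aligned - teacher) ** 2))

loss = distill_loss(student_feat, teacher_feat, proj)
print(loss >= 0.0)
```

In practice the student and the projection would be optimized by gradient descent to drive this loss down alongside the task loss; here the loss is only evaluated once to show the shape of the objective.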
A Summary of and Reflections on Recent Developments in 3D/4D World Models (WM)
自动驾驶之心· 2025-09-16 23:33
Core Viewpoint
- The article discusses the current state of embodied intelligence, focusing on data collection and utilization, and emphasizes the importance of 3D/4D world models in enhancing spatial understanding and interaction capabilities in autonomous driving and related fields [3][4].

Group 1: 3D/4D World Models
- The development of 3D/4D world models has diverged into two main approaches, implicit and explicit models, each with its own limitations [4][7].
- Implicit models enhance spatial understanding by extracting 3D/4D content, while explicit models require detailed structural information to ensure system stability and usability [7][8].
- Current research primarily focuses on static 3D scenes, with methods for constructing and enriching environments being well established and ready for practical application [8].

Group 2: Challenges and Solutions
- Existing challenges in 3D geometry modeling include the rough optimization of physical surfaces and the visual gap between generated meshes and real-world applications [9][10].
- The integration of mesh supervision and structured processing is being explored to improve surface quality in 3D reconstruction [10].
- The need for cross-physics-simulator deployment is highlighted, as existing solutions often rely on platform-specific physics parameters, for example from Mujoco [10].

Group 3: Video Generation and Motion Understanding
- Large-scale data cleaning and annotation have improved motion-prediction capabilities in 3D models, with advances in 3DGS/4DGS and world-model integration [11].
- Current video-generation techniques struggle to understand physical interactions and environmental changes, indicating a gap in the ability to simulate realistic motion [15].
- Future work may combine simulation and video generation to improve the understanding of physical properties and interactions [15].

Group 4: Future Directions
- The article predicts that future work will increasingly incorporate physical knowledge into 3D/4D models, aiming for better direct physical understanding and visual reasoning capabilities [16].
- World models are expected to become modular components within embodied intelligence frameworks, depending on ongoing research and the simplification of world-model definitions [16].