自动驾驶之心
Large-Model Interview Notes - Kuaishou 快Star
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- The article discusses the advancements and opportunities in the field of autonomous driving, emphasizing the importance of multi-modal large models and their applications across the industry [5][6].

Group 1: Interview Insights
- The interview process for positions related to multi-modal large models involves detailed discussions of candidates' research papers, focusing in particular on methodologies and results [4][5].
- Candidates are expected to demonstrate knowledge of current multi-modal large models and their paradigms, including specific models such as BLIP-2 and Qwen-VL [5].
- Technical questions cover topics such as Learnable Query, KV Cache, and the training and fine-tuning of large models [5][6].

Group 2: Community and Resources
- The article highlights a community of nearly 4,000 members, including over 300 companies and research institutions in the autonomous driving sector, providing a platform for knowledge exchange [7].
- It mentions a comprehensive learning path covering more than 30 areas of autonomous driving technology, from perception to planning and control [7].
- The community offers resources on technical solutions and industry dynamics, aiming to support newcomers to the field of autonomous driving [7].
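One of the interview topics listed above, the KV Cache, lends itself to a quick sketch. The idea: during autoregressive decoding, each token's key and value projections are computed once and appended to a cache, instead of re-projecting the entire prefix at every step. Below is a minimal single-head NumPy sketch with toy dimensions and random weights (not any specific model's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                    # toy head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # scaled dot-product attention for a single query vector
    scores = q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

# Decoding WITH a KV cache: each step projects only the newest token,
# then attends over the accumulated keys/values.
tokens = rng.standard_normal((5, d))     # stand-in hidden states, 5 steps
K_cache, V_cache, cached_out = [], [], []
for x in tokens:
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    cached_out.append(attend(x @ Wq, np.array(K_cache), np.array(V_cache)))

# Reference WITHOUT a cache: re-project the whole prefix at every step.
full_out = [attend(tokens[t] @ Wq, tokens[:t + 1] @ Wk, tokens[:t + 1] @ Wv)
            for t in range(len(tokens))]

# Same outputs; the cache only removes redundant K/V recomputation.
assert np.allclose(cached_out, full_out)
```

The saving is in compute and memory traffic, not in the result: per decoding step, the cached variant does O(1) projections instead of O(t), which is why the question comes up in large-model inference interviews.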
What Exactly Is the "Action" in VLA? A Look at Diffusion: From Image Generation to End-to-End Trajectory Planning
自动驾驶之心· 2025-07-19 10:19
Core Viewpoint
- The article discusses the principles and applications of diffusion models in the context of autonomous driving, highlighting their advantages over generative adversarial networks (GANs) and detailing specific use cases in the industry.

Group 1: Diffusion Model Principles
- Diffusion models are denoising-based generative models that learn and simulate data distributions through a forward diffusion process and a reverse generation process [2][4].
- The forward diffusion process gradually adds noise to the initial data distribution, while the reverse generation process removes noise to recover the original data [5][6].
- The models typically use a Markov chain to describe state transitions during noise addition and removal [8].

Group 2: Comparison with Generative Adversarial Networks
- Both diffusion models and GANs involve adding and removing noise, but their core mechanisms differ: diffusion models rely on probabilistic modeling, while GANs use adversarial training between a generator and a discriminator [20][27].
- Diffusion models are generally more stable to train and produce higher-quality samples, especially at high resolutions, whereas GANs can suffer from mode collapse and require training multiple networks [27][28].

Group 3: Applications in Autonomous Driving
- Diffusion models are applied across autonomous driving, including synthetic data generation, scene prediction, perception enhancement, and path planning [29].
- They can generate realistic driving-scene data to address data scarcity and high annotation costs, particularly for rare scenarios such as extreme weather [30][31].
- In scene prediction, diffusion models can forecast dynamic changes in driving environments and generate the potential behaviors of traffic participants [33].
- For perception tasks, diffusion models improve data quality by denoising bird's-eye-view (BEV) images and improving sensor-data consistency [34][35].
- In path planning, diffusion models support multimodal path generation, enhancing safety and adaptability in complex driving conditions [36].

Group 4: Notable Industry Implementations
- Companies such as Haomo Technology and Horizon Robotics are developing advanced diffusion-based algorithms for real-world applications, achieving state-of-the-art performance across driving scenarios [47][48].
- The integration of diffusion models with large language models (LLMs) and other technologies is expected to drive further innovation in the autonomous driving sector [46].
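The forward/reverse structure described in Group 1 can be made concrete. The sketch below shows the DDPM-style forward (noising) process in closed form, under an assumed linear beta schedule; it is illustrative only and not tied to any model or paper the article cites. A trained reverse model would learn to undo exactly this corruption:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating t Markov steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(64)              # stand-in for an image or trajectory
x_early = q_sample(x0, 10, rng)           # lightly corrupted
x_late = q_sample(x0, T - 1, rng)         # almost pure Gaussian noise

# Early timesteps stay close to the data; late ones lose nearly all signal.
print(np.corrcoef(x0, x_early)[0, 1])     # close to 1
print(np.corrcoef(x0, x_late)[0, 1])      # near 0
```

Although the forward process is defined as a Markov chain of small Gaussian steps, the closed form above is equivalent, which is what makes training efficient: any timestep can be sampled in one shot. For trajectory planning, `x0` would be a driving trajectory rather than an image, but the mechanism is the same.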
Roundup | Hardcore UAV Results from Fei Gao's Team at Zhejiang University in H1 2025
自动驾驶之心· 2025-07-19 10:19
Source: 深蓝AI (by 深蓝学院), a learning platform focused on artificial intelligence, robotics, and autonomous driving.

FIRI: Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

As a distinguished scholar in robotics and UAVs, Prof. Fei Gao has consistently worked at the research frontier. Fei Gao: recipient of the National Excellent Young Scientists Fund; tenured associate professor, researcher, and doctoral supervisor at the College of Control Science and Engineering, Zhejiang University. Main research directions: aerial robotics, trajectory planning, autonomous navigation, swarm coordination, and localization and perception. In recent years he has published more than 70 papers as first or corresponding author in leading robotics journals and conferences; his honors include the IEEE T-RO 2020 Best Paper Honorable Mention, the Frontiers of Science Award at the International Congress of Basic Science (ICBS) 2024, and an IEEE ICRA 2024 Best Paper nomination; he was listed in Elsevier's 2023/24 ranking of the world's top 2% scientists. ...
DeepSeek Has Finally Lost the Open-Source No. 1 Throne...
自动驾驶之心· 2025-07-19 10:19
Core Viewpoint
- Kimi K2 has surpassed DeepSeek to become the top open-source model globally, ranking fifth overall and closely trailing top proprietary models such as Musk's Grok 4 [3][4].

Group 1: Ranking and Performance
- Kimi K2 achieved a score of 1420, placing it fifth overall, with notable results across capabilities: tied for first in multi-turn dialogue and second in programming ability [4][7].
- The top ten models now all score above 1400, indicating that the performance gap between open-source and proprietary models is narrowing [22][24].

Group 2: Community Engagement and Adoption
- Kimi K2 has attracted significant attention in the open-source community, gaining 5.6K stars on GitHub and nearly 100,000 downloads on Hugging Face within a week of its release [6][5].
- The CEO of Perplexity has publicly endorsed Kimi K2 and indicated plans to use the model for further training, showcasing its potential in practical applications [8].

Group 3: Architectural Decisions
- Kimi K2 inherits the architecture of DeepSeek V3, with targeted parameter adjustments to optimize performance while managing cost [10][14].
- The adjustments include increasing the number of experts while reducing the number of attention heads, which maintains efficiency without significantly hurting performance [15][18].

Group 4: Industry Trends
- The perception that open-source models are inherently inferior is being challenged, with industry experts predicting that open-source will increasingly rival proprietary models in performance [22][27].
- Tim Dettmers of the Allen Institute for AI suggests that open-source models beating proprietary ones will become more common, signaling a shift in the AI landscape [28].
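The expert-count vs. attention-head tradeoff mentioned under Group 3 is easy to see with back-of-the-envelope parameter counting. All dimensions below are made up for illustration; they are not Kimi K2's or DeepSeek V3's actual configuration:

```python
# Rough per-layer parameter counts for a hypothetical transformer block
# with multi-head attention and a mixture-of-experts (MoE) feed-forward.

def attn_params(d_model, n_heads, d_head):
    # Q, K, V projections plus the output projection
    return 4 * d_model * n_heads * d_head

def moe_ffn_params(d_model, d_ff, n_experts):
    # each expert is a 2-layer FFN; only a few experts fire per token,
    # so TOTAL parameters grow with n_experts while per-token compute
    # stays roughly constant
    return n_experts * 2 * d_model * d_ff

d_model, d_head, d_ff = 1024, 64, 4096   # hypothetical sizes

base  = attn_params(d_model, 16, d_head) + moe_ffn_params(d_model, d_ff, 8)
tweak = attn_params(d_model, 8,  d_head) + moe_ffn_params(d_model, d_ff, 16)

print(base, tweak)   # halving heads saves far less than doubling experts adds
```

Under these toy numbers, attention is a small slice of the layer, so halving the heads barely dents the total while doubling the experts nearly doubles sparse capacity. This is one plausible reading of why such an adjustment can raise capacity without proportionally raising inference cost.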
Real-Time 3D Scene Reconstruction at Centimeter-Level Accuracy! This 3D Laser Scanner Is a Pleasure to Use
自动驾驶之心· 2025-07-19 10:19
Core Viewpoint
- GeoScan S1 is presented as a highly cost-effective handheld 3D laser scanner designed for a range of field operations, featuring a lightweight design, one-button startup, and centimeter-level precision in real-time 3D scene reconstruction [1][4].

Group 1: Product Features
- The GeoScan S1 generates point clouds at 200,000 points per second, with a maximum measurement range of 70 meters and 360° coverage, supporting large scenes of over 200,000 square meters [1][23][24].
- It integrates multiple sensors, including RTK, 3D LiDAR, and dual wide-angle cameras, enabling high precision and efficiency in mapping [7][28].
- The device runs a handheld Ubuntu system, houses a suite of sensor devices, and integrates the power supply into the handle [2][16].

Group 2: Usability and Efficiency
- The device is designed for ease of use, allowing simple operation and immediate export of scan results without complex deployment [3][4].
- Its small, integrated design maximizes hardware performance, making it suitable for complex indoor and outdoor environments [7][32].
- The scanner supports real-time modeling and high-fidelity scene restoration using advanced multi-sensor SLAM algorithms [21][28].

Group 3: Market Position and Pricing
- GeoScan S1 is marketed as the most affordable option in its class, with the basic version starting at 19,800 yuan [4][51].
- The product has been validated across numerous projects, backed by collaborations with academic institutions [3][4].

Group 4: Application Scenarios
- The scanner can accurately construct 3D scene maps in varied environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines [32][40].
- It can be integrated with unmanned platforms such as drones and robotic vehicles to enable automated operations [38].
Graduating with a PhD Now Starts at Five Top-Conference Papers...
自动驾驶之心· 2025-07-19 06:49
Core Viewpoint
- The article emphasizes the importance of timely submission and high-quality research papers for academic success, particularly in autonomous driving and AI research, and introduces a structured 1v1 guidance program to help researchers improve paper quality and submission strategy [2].

Group 1: Pain Points Addressed
- The program addresses the lack of guidance for students, helping them navigate the research process and develop a clear understanding of research methodologies [6].
- It aims to help students build a systematic approach to research, combining theoretical models with practical coding skills [6].
- The service targets computer science students at various academic levels who want to strengthen their research capabilities and academic credentials [6].

Group 2: Course Content
- The 1v1 research paper guidance covers several stages: topic selection, experimental design, writing, and submission [5][9][11][12].
- In the topic-selection phase, mentors help students brainstorm ideas or provide direct suggestions based on their needs [7].
- During the experimental phase, mentors guide students through the entire process, ensuring the feasibility and quality of their experiments [9][14].
- The writing phase focuses on crafting a compelling research paper that meets high standards [11][15].
- In the submission phase, mentors recommend suitable journals and assist with the submission process [12][16].

Group 3: Course Outcomes
- Participants can expect to produce a high-quality paper suitable for their target publication [23].
- The program deepens participants' understanding of the research process and improves their research skills [23].
- Students learn effective writing and submission strategies tailored to their specific research areas [23][24].

Group 4: Course Structure
- The total guidance period runs from 3 to 18 months depending on the target publication level, with specific course hours allocated to each category [24].
- The core guidance period consists of weekly 1-on-1 meetings, while a maintenance period provides ongoing support for revisions and feedback [26].

Group 5: Communication and Support
- The course uses online platforms for live sessions and provides a dedicated communication group for ongoing support and queries [22][27].
- Each student has access to a private discussion group with mentors for exploring ideas and course-related questions [23].
The Tech-Obsessed "Whampoa Academy" of Autonomous Driving Turns Three~
自动驾驶之心· 2025-07-19 06:32
Core Viewpoint
- The article reviews the significant progress made in autonomous driving and embodied intelligence over the past year, highlighting the platforms and services established to expand education and employment opportunities in these sectors [2].

Group 1: Company Developments
- The company has developed four key IPs: "Autonomous Driving Heart," "Embodied Intelligence Heart," "3D Vision Heart," and "Large Model Heart," expanding its reach across platforms for knowledge sharing and community engagement [2].
- It has shifted from purely online education to a comprehensive service platform spanning hardware, offline training, and job-placement services, a strategic change in its operations [2].
- The opening of a physical office in Hangzhou and the recruitment of new talent signal the company's commitment to growth and industry engagement [2].

Group 2: Community and Educational Initiatives
- The "Autonomous Driving Heart Knowledge Planet" has become China's largest community for autonomous driving learning, with nearly 4,000 members and over 100 industry experts contributing to discussions and knowledge sharing [4].
- The community has compiled over 30 learning pathways covering aspects of autonomous driving technology from perception and mapping to AI-model deployment, serving both newcomers and experienced professionals [4].
- The platform encourages active participation and problem-solving among members, fostering a collaborative environment for learning and professional development [4].

Group 3: Technological Focus Areas
- The article highlights four major technical directions within the community: visual large language models (VLM), world models, diffusion models, and end-to-end autonomous driving, with resources and discussions centered on these topics [6][33].
- The community provides access to cutting-edge research, datasets, and application examples, keeping members informed about the latest advances in autonomous driving and related fields [6][33].
- The focus on embodied intelligence and large models reflects the industry's shift toward integrating advanced AI capabilities into autonomous systems, pointing to more sophisticated and capable driving solutions [2].
The Tech-Obsessed "Whampoa Academy" of Autonomous Driving Turns Three...
自动驾驶之心· 2025-07-19 03:04
Core Insights
- The article describes the transition of autonomous driving technology from Level 2/3 (assisted driving) toward Level 4/5 (fully autonomous driving) by 2025, highlighting the competitive landscape in AI across autonomous driving, embodied intelligence, and large-model agents [2][4].

Group 1: Autonomous Driving Community
- The "Autonomous Driving Heart Knowledge Planet" has established itself as China's largest community for autonomous driving technology, aiming to serve as a training ground for industry professionals [4][6].
- The community has nearly 4,000 members and over 100 industry experts, offering discussions, learning routes, and job referrals [4][6].
- It covers various subfields of autonomous driving, including end-to-end driving, world models, and multi-sensor fusion, among others [4][6].

Group 2: Learning Modules and Resources
- The knowledge community spans four main technical areas: visual large language models, world models, diffusion models, and end-to-end autonomous driving [6][7].
- It offers a comprehensive collection of resources, including cutting-edge articles, datasets, and application summaries relevant to the autonomous driving sector [6][7].

Group 3: Job Opportunities and Networking
- The community has established direct referral channels with numerous autonomous driving companies, facilitating job placements for members [4][6].
- Active participation is encouraged, with a focus on fostering a collaborative environment for both newcomers and experienced professionals [4][6].

Group 4: Technical Insights
- The article outlines learning paths and technical insights into autonomous driving, emphasizing the importance of understanding perception, mapping, planning, and control in the development of autonomous systems [4][6][24].
- It highlights the role of large language models and their integration into autonomous driving applications, enhancing decision-making and navigation capabilities [25][26].
ICCV'25 | Nankai's AD-GS: Self-Supervised High-Quality Closed-Loop Simulation for Autonomous Driving, with a 2-Point PSNR Jump!
自动驾驶之心· 2025-07-18 10:32
Today 自动驾驶之心 shares the latest work from a Nankai University team, accepted at ICCV'25: AD-GS, self-supervised high-quality closed-loop simulation for autonomous driving, with PSNR up by 2 points.

Paper authors: Jiawei Xu et al.

Background and Challenges
Dynamic modeling and rendering of autonomous driving scenes is critical for simulation systems, but existing methods have clear limitations: approaches that rely on manual 3D annotations are costly and hard to scale, while self-supervised methods suffer from inaccurate capture of dynamic-object motion and coarse scene decomposition that produces rendering artifacts.

High-quality rendering of dynamic urban driving scenes requires accurately capturing the motion of vehicles, pedestrians, and other dynamic objects while decomposing the scene effectively. In traditional self-supervised methods, neural-network motion modeling is computationally heavy and misses local detail; predefined functions such as trigonometric bases are faster but struggle with local motion; and scene decomposition depends on complex semantic annotations that are noise-prone, degrading reconstruction quality.

Core Innovation
AD-GS proposes a new self-supervised framework based on ...
Unitree Robotics Begins Pre-IPO Tutoring
自动驾驶之心· 2025-07-18 10:32
Source: 财联社

According to the China Securities Regulatory Commission (CSRC) website, 宇树科技 (Unitree Robotics) has begun pre-IPO tutoring, with CITIC Securities (中信证券) as the tutoring institution.

The tutoring filing report shows that Unitree's controlling shareholder and actual controller is Wang Xingxing (王兴兴), who directly holds 23.8216% of the company's equity and controls a further 10.9414% through Shanghai Yuyi Enterprise Management Consulting Partnership (Limited Partnership), for combined control of 34.7630% of the company.

| Item | Detail |
| --- | --- |
| Tutoring agreement signed | July 7, 2025 |
| Tutoring institution | CITIC Securities Co., Ltd. ("中信证券") |
| Law firm | Beijing DeHeng Law Offices ("德恒律师") |
| Tutoring target | Hangzhou Unitree Technology Co., Ltd. ("宇树科技", "the Company") |
| Date of establishment | August 26, 2016 |
| Registered capital | 36,401.7906 万元 (RMB 364.017906 million) |
| Legal representative | Wang Xingxing (王兴兴) |
| Registered address | Zhejiang ... |