自动驾驶之心
A Certain Embodied Intelligence Founder "Wears Many Hats"
自动驾驶之心· 2025-08-09 16:03
Core Viewpoint
- The article discusses the current state of investment in embodied intelligence companies, highlighting the dual roles of founders who often maintain academic positions while running startups, raising questions about their commitment to the entrepreneurial venture [5][6].

Group 1: Investment Trends
- This year has seen a surge in investment in embodied intelligence, with significant capital flowing into these companies, often amounting to hundreds of millions [5][6].
- Founders of these companies frequently hold dual roles, such as being assistant professors at prestigious universities, which raises concerns about their full commitment to their startups [6].

Group 2: Founder Dynamics
- Many founders are described as "multi-tasking," engaging in various roles including consulting for automotive companies and publishing papers, which can lead to a lack of focus on their primary business [5][6].
- The article notes that some founders, despite their academic accolades, may lack the practical experience necessary for the high-pressure environment of production, leading to a disconnect between their academic background and industry demands [7].

Group 3: Industry Challenges
- The transition from academia to industry can be challenging, with some academics struggling to adapt to the rigorous demands of production, resulting in a shift in their professional demeanor [7].
- The article suggests that the current phase of embodied intelligence is still in its early stages, characterized by storytelling and presentations rather than tangible product development [7].
Autonomous Driving Paper Express | End-to-End, Segmentation, Trajectory Planning, Simulation, and More
自动驾驶之心· 2025-08-09 13:26
Core Insights
- The article discusses advancements in autonomous driving technologies, highlighting various frameworks and their contributions to improving safety, efficiency, and robustness in real-world scenarios.

Group 1: DRIVE Framework
- The DRIVE framework proposed by Stanford University and Microsoft integrates dynamic rule inference and verified evaluation for constraint-aware autonomous driving, achieving a 0.0% soft constraint violation rate and enhancing trajectory smoothness and generalization capabilities [2][6].

Group 2: Hybrid Learning-Optimization Framework
- A hybrid learning-optimization trajectory planning framework developed by Beijing Jiaotong University and Hainan University achieves a 97% success rate and real-time planning performance of 54 milliseconds in highway scenarios [11][12].

Group 3: RoboTron-Sim
- The RoboTron-Sim framework, developed by Meituan and Sun Yat-sen University, enhances the robustness of autonomous driving in extreme scenarios, achieving a 51.3% reduction in collision rates and a 51.5% improvement in trajectory accuracy on the nuScenes test [18][20].

Group 4: SAV Framework
- The SAV framework proposed by Anhui University achieves high-precision vehicle part segmentation with an 81.23% mean Intersection over Union (mIoU) on the VehicleSeg10K dataset, surpassing previous best methods by 4.33% [34][40].
2,000 Members: What Does This "Whampoa Academy" of the Embodied Intelligence Field Have to Offer?
自动驾驶之心· 2025-08-09 08:21
Yesterday afternoon a student came to Feng Ge to vent: his company asked him to debug a robot, but he had no idea how to do data collection and debugging, and there were too many degrees of freedom. He was equally at a loss about how to analyze problems. Running demos at school was fine, but once you get hands-on with a real machine, there are many pitfalls. We have run into questions like these many times in our embodied community: how do you use the equipment? How do you collect data effectively? How do you deploy VA and VLA models? Is the capture background too complex, or is the data too dirty? We soon gave him an answer, which he quickly applied to his project.

A community that can solve problems when people need help most is undoubtedly valuable. The Embodied Intelligence Heart Knowledge Planet (the first full-stack embodied technology community in China) has already closed the loop across industry, academia, job hunting, and Q&A. Whatever problem comes up, we share a solution; whatever research area is at the frontier, we keep providing ideas; and job openings are relayed to members immediately! Beyond the questions above, we have also organized much more content for everyone:

Which platforms exist for robot simulation and data collection?
How do humanoid robots do imitation learning? Why is VLA hard to do?
How is VLA used in robot grasping and planning tasks?
How is VLA+RL done? Why does it work?
What if sim2real results are poor? How does real2sim2real work?
How is hierarchical decision-making typically done? What are its advantages and disadvantages compared to end-to-end?
Embodied robot res ...
A Crash Course in Planning for Autonomous Driving Perception Engineers
自动驾驶之心· 2025-08-08 16:04
Core Insights
- The article discusses the evolution and importance of planning modules in autonomous driving, emphasizing the need for engineers to understand both traditional and machine learning-based approaches to effectively address challenges in the field [5][8][10].

Group 1: Importance of Planning
- Understanding planning is crucial for engineers, especially in the context of autonomous driving, as it allows for better service to downstream customers and enhances problem-solving capabilities [8][10].
- The transition from rule-based systems to machine learning systems in planning will likely see a coexistence of both methods for an extended period, with a gradual shift in their usage ratio from 8:2 to 2:8 [8][10].

Group 2: Planning System Overview
- The planning system in autonomous vehicles is essential for generating safe, comfortable, and efficient driving trajectories, relying on inputs from perception outputs [11][12].
- Traditional planning modules consist of global path planning, behavior planning, and trajectory planning, with behavior and trajectory planning often working in tandem [12].

Group 3: Challenges in Planning
- A significant challenge in the planning technology stack is the lack of standardized terminology, leading to confusion in both academic and industrial contexts [15].
- The article highlights the need for a unified approach to behavior planning, as the current lack of consensus on semantic actions limits the effectiveness of planning systems [18].

Group 4: Planning Techniques
- The article outlines three primary tools used in planning: search, sampling, and optimization, each with its own methodologies and applications in autonomous driving [24][41].
- Search methods, such as Dijkstra and A* algorithms, are popular for path planning, while sampling methods like Monte Carlo are used for evaluating numerous options quickly [25][32].
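To make the search methods above concrete, here is a minimal A* sketch on a toy occupancy grid. The grid, unit step cost, and Manhattan heuristic are illustrative assumptions for this digest, not the article's implementation.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 4-connected occupancy grid (0 = free, 1 = blocked)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path so far)
    seen = set()
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists

# A 3x3 grid with one obstacle; plan from top-left to bottom-right.
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 2))
```

With an admissible heuristic such as Manhattan distance, A* returns a shortest path while expanding fewer nodes than Dijkstra (which is A* with a zero heuristic).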
Group 5: Industrial Practices
- The article discusses the distinction between decoupled and joint spatiotemporal planning methods, with decoupled solutions being easier to implement but potentially less optimal in complex scenarios [52][54].
- The Apollo EM planner is presented as an example of a decoupled planning approach, which simplifies the problem by breaking it into two-dimensional issues [56][58].

Group 6: Decision-Making in Autonomous Driving
- Decision-making in autonomous driving focuses on interactions with other road users, addressing uncertainties and dynamic behaviors that complicate planning [68][69].
- The use of Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks is essential for handling the probabilistic nature of interactions in driving scenarios [70][74].
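The MDP framing mentioned above can be sketched with value iteration on a toy lane-change problem. The states, transition probabilities, and rewards below are illustrative assumptions, not taken from the article.

```python
# Toy driving MDP: states {"follow", "merge", "goal"}, actions {"keep", "change"}.
# P[s][a] is a list of (probability, next_state, reward) tuples; all numbers are
# illustrative assumptions (e.g. a lane change succeeds with probability 0.8).
states = ["follow", "merge", "goal"]
actions = ["keep", "change"]
P = {
    "follow": {
        "keep":   [(1.0, "follow", 0.0)],
        "change": [(0.8, "merge", -0.1), (0.2, "follow", -0.5)],  # change may fail
    },
    "merge": {
        "keep":   [(1.0, "goal", 1.0)],     # completing the merge is rewarded
        "change": [(1.0, "follow", -0.2)],  # aborting the merge is penalized
    },
    "goal": {
        "keep":   [(1.0, "goal", 0.0)],
        "change": [(1.0, "goal", 0.0)],
    },
}
gamma = 0.9
V = {s: 0.0 for s in states}
for _ in range(100):  # Bellman optimality backups until (effective) convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in actions)
         for s in states}
# Greedy policy with respect to the converged value function.
policy = {s: max(actions,
                 key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
          for s in states}
```

Even in this tiny example, the optimal policy trades off the risk of a failed lane change against the discounted reward of reaching the goal; a POMDP adds a belief state over the other agents' intentions on top of this machinery.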
What Is the VLM Often Mentioned in Autonomous Driving? How Does It Differ from VLA?
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses the significance of Vision-Language Models (VLM) in the context of autonomous driving, highlighting their ability to integrate visual perception and natural language processing to enhance vehicle understanding and interaction with complex road environments [4][19].

Summary by Sections

What is VLM?
- VLM stands for Vision-Language Model, which combines the capabilities of understanding images and text within a single AI system. It enables deep comprehension of visual content and natural language interaction, enhancing applications like image retrieval, writing assistance, and robotic navigation [6].

How to Make VLM Work Efficiently?
- VLM processes raw road images into feature representations using visual encoders, such as Convolutional Neural Networks (CNN) and Vision Transformers (ViT). Language encoders and decoders handle natural language input and output, learning semantic relationships between tokens [8].

Key Mechanism of VLM
- The alignment of visual features and language modules is crucial for VLM. Cross-attention mechanisms allow the language decoder to focus on relevant image areas when generating text, ensuring high consistency between generated language and actual scenes [9].

Training Process of VLM
- The training process for VLM typically involves pre-training on large datasets followed by fine-tuning with specific datasets related to autonomous driving scenarios, ensuring the model can accurately recognize and respond to traffic signs and conditions [11].

Applications of VLM
- VLM supports various intelligent functions, including real-time scene alerts, interactive semantic Q&A, and recognition of road signs and text. It can generate natural language prompts based on visual inputs, enhancing driver awareness and decision-making [12].

Real-time Operation of VLM
- VLM operates in a "cloud-edge collaboration" architecture, where large-scale pre-training occurs in the cloud, and optimized lightweight models are deployed in vehicles for real-time processing. This setup allows for quick responses to safety alerts and complex analyses [14].

Data Annotation and Quality Assurance
- Data annotation is critical for VLM deployment, requiring detailed labeling of images under various conditions. This process ensures high-quality training data, which is essential for the model's performance in real-world scenarios [14].

Safety and Robustness
- Safety and robustness are paramount in autonomous driving. VLM must quickly assess uncertainties and implement fallback measures when recognition errors occur, ensuring reliable operation under adverse conditions [15].

Differences Between VLA and VLM
- VLA (Vision-Language-Action) extends VLM by integrating action decision-making capabilities. While VLM focuses on understanding and expressing visual information, VLA encompasses perception, cognition, and execution, making it essential for real-world applications like autonomous driving [18].

Future Developments
- The continuous evolution of large language models (LLM) and large vision models (LVM) will enhance VLM's capabilities in multi-modal integration, knowledge updates, and human-machine collaboration, leading to safer and more comfortable autonomous driving experiences [16][19].
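The cross-attention alignment described above (text tokens querying image patch features) can be sketched in a few lines of NumPy. All dimensions, the single attention head, and the random projections are illustrative assumptions; real VLMs use learned multi-head attention inside a transformer.

```python
import numpy as np

def cross_attention(text_tokens, image_patches, d_k=16, seed=0):
    """Single-head cross-attention: text tokens (queries) attend to image patches (keys/values)."""
    rng = np.random.default_rng(seed)
    d_text, d_img = text_tokens.shape[1], image_patches.shape[1]
    # Random stand-ins for learned projection matrices.
    Wq = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)
    Wk = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Wv = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Q, K, V = text_tokens @ Wq, image_patches @ Wk, image_patches @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                        # (n_tokens, n_patches)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over patches
    return weights @ V, weights                            # attended features, attention map

tokens = np.random.default_rng(1).standard_normal((4, 32))   # 4 text tokens
patches = np.random.default_rng(2).standard_normal((9, 64))  # 9 image patches
out, attn = cross_attention(tokens, patches)
```

Each row of the attention map is a distribution over image patches, which is exactly what lets the language decoder "look at" the relevant region of the road scene while generating each token.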
From Autonomous Driving to Embodied Intelligence: These Communities Hold Up Half the Sky!
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The embodied intelligence and autonomous driving industries are experiencing significant growth in production, financing, and recruitment, leading to a highly competitive job market where skilled professionals are in high demand [1].

Group 1: Industry Trends
- The industry is focusing on practical technologies, with companies competing to secure talent with relevant skills [1].
- The job market is described as "highly competitive," making it difficult for candidates to secure positions despite the availability of openings [1].

Group 2: Recommended Learning Communities
- "Smart Driving Frontier" is a comprehensive media platform dedicated to the autonomous driving sector, providing technical insights and industry news [1].
- "Computer Vision Research Institute" focuses on AI research and practical applications, sharing the latest algorithms and project experiences [3].
- "Visual Language Navigation" aims to create a professional platform for navigation technologies, sharing technical insights and industry news [5].
- "Embodied Intelligence Research Lab" emphasizes core areas such as reinforcement learning and multi-agent collaboration, providing research updates and practical case studies [6].
- "Embodied Intelligence Heart" is the largest community for embodied intelligence, covering various technical directions and encouraging collaboration among developers [7].
- "arXiv Daily Academic Express" offers daily updates on academic papers across multiple fields, including AI and robotics, facilitating quick access to relevant research [8].
- "Autonomous Driving Heart" is a community for developers in the autonomous driving field, focusing on various technical aspects and job opportunities [10].
Fine-Tuning an Autonomous Driving VLM Based on the Open-Source Qwen2.5-VL
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses advancements in autonomous driving technology, particularly focusing on the LLaMA Factory framework and the Qwen2.5-VL model, which enhance the capabilities of vision-language-action models for autonomous driving applications [4][5].

Group 1: LLaMA Factory Overview
- LLaMA Factory is an open-source low-code framework for fine-tuning large models, gaining popularity in the open-source community with over 40,000 stars on GitHub [3].
- The framework integrates widely used fine-tuning techniques, making it suitable for developing autonomous driving assistants that can interpret traffic conditions through natural language [3].

Group 2: Qwen2.5-VL Model
- The Qwen2.5-VL model serves as the foundational model for the project, achieving significant breakthroughs in visual recognition, object localization, document parsing, and long video understanding [4].
- It offers three model sizes, with the flagship Qwen2.5-VL-72B performing comparably to advanced models like GPT-4o and Claude 3.5 Sonnet, while smaller versions excel in resource-constrained environments [4].

Group 3: CoVLA Dataset
- The CoVLA dataset, comprising 10,000 real driving scenes and over 80 hours of video, is utilized for training and evaluating vision-language-action models [5].
- This dataset surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for developing safer and more reliable autonomous driving systems [5].

Group 4: Model Training and Testing
- Instructions for downloading and installing LLaMA Factory and the Qwen2.5-VL model are provided, including commands for setting up the environment and testing the model [6][7].
- The article details the process of fine-tuning the model using the SwanLab tool for visual tracking of the training process, emphasizing the importance of adjusting parameters to avoid memory issues [11][17].
- After training, the fine-tuned model demonstrates improved response quality in dialogue scenarios related to autonomous driving risks compared to the original model [19].
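As a rough sketch of what such a run might look like, here is a hypothetical LLaMA Factory-style YAML training configuration. The key names follow LLaMA Factory's config conventions, but the dataset name, model size, and all hyperparameter values are assumptions for illustration, not the article's actual setup; check them against the LLaMA Factory documentation before use.

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft                 # supervised fine-tuning
finetuning_type: lora      # LoRA keeps memory usage low
lora_rank: 8

### dataset (a CoVLA-style driving-scene instruction set; the name is illustrative)
dataset: covla_driving_sft
template: qwen2_vl
cutoff_len: 2048

### training
per_device_train_batch_size: 1
gradient_accumulation_steps: 8   # keep effective batch size while limiting memory
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
output_dir: saves/qwen2_5_vl-covla-lora
```

A run is then typically launched with `llamafactory-cli train <config>.yaml`; lowering the batch size and raising gradient accumulation, as the article suggests, is the usual way to avoid out-of-memory errors on a single GPU.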
We're Expanding Our Autonomous Driving Team. Welcome to Join Us!
自动驾驶之心· 2025-08-08 03:20
In both the autonomous driving and embodied intelligence directions, we have already established deep cooperation with mainstream companies in the industry and related universities, and the large-model direction is being built up rapidly. We focus not only on the technology itself; we also want to co-create the whole AI field with everyone and share the joy of growing in understanding. For trending events, we likewise hope to provide content value found nowhere else on the web. A journey of a thousand miles is made of single steps; we know that one person's strength is limited, so we look forward to more outstanding partners walking alongside us.

Content Operations - Intern
Responsibilities:
Requirements:
1. Research background in autonomous driving, large models, or embodied intelligence; bachelor's degree or above, master's preferred;

Hello everyone, we are the Autonomous Driving Heart / Embodied Intelligence / Large Model Heart Tech team. We are very glad to meet you here. If you also believe that technical content can change the world, you may be exactly who we are looking for!

What are we doing? We hope to connect academia and industry through technical content, serving as a bridge between companies and universities and reaching hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing everyone the latest and most authoritative technical information on the web. The team focuses on the most cutting-edge AI fields, such as autonomous driving, embodied intelligence, and large models, covering academic paper interpretations, analysis of industry mass-production solutions, large-model evaluations, business news, industry recruitment, open-source projects, and more, and through our official ...
The Tech-Obsessed "Whampoa Academy" of Autonomous Driving Has Reached 4,000 Members!
自动驾驶之心· 2025-08-08 03:20
Autonomous Driving Heart's Knowledge Planet has officially passed 4,000 members; three years in, it has not been easy. The Knowledge Planet has so far closed the loop across industry, academia, job hunting, and Q&A. Our operations team reviews things every day: what kind of community do people actually need? Is there anything we haven't considered? All show and no substance won't do, no one to talk to won't do, and failing to help people find jobs certainly won't do.

Going forward, we plan to keep improving the planet's content, and today we report our plans to everyone. We intend to launch a member face-to-face module, aiming to chat with everyone online each month and discuss common problems together. We will also keep inviting big names from academia and industry for in-depth roundtable interviews!

We are a community that takes content seriously, a place that cultivates future leaders. Autonomous Driving Heart has always worked to advance the industry and serve as a bridge between companies and universities. Our vision is to bring AI and autonomous driving to every student who needs them!

The planet has currently organized 40+ technical roadmaps for members. Whether you are asking about industry applications or looking for the latest VLA benchmarks, surveys, and beginner learning routes, it greatly shortens search time. The planet has also invited dozens of guests from the autonomous driving field, all active front-line figures in industry (they frequently appear at top conferences and in various interviews). Feel free to ask questions at any time; they ...
Reconstruct Point Clouds Online in Real Time with a Handheld LiDAR! An Ultra-Cost-Effective 3D Scanner Is Here
自动驾驶之心· 2025-08-07 23:32
Core Viewpoint
- The GeoScan S1 is presented as the most cost-effective 3D laser scanner in China, designed for various applications such as campus and indoor scene reconstruction, featuring a lightweight design and user-friendly operation [1][7].

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in real-time 3D scene reconstruction using a multi-modal sensor fusion algorithm [1].
- It can generate point clouds at a rate of 200,000 points per second, with a maximum measurement distance of 70 meters and 360° coverage, supporting large scenes over 200,000 square meters [1][27].
- The device is equipped with a built-in Ubuntu system and various sensor devices, allowing for flexible power supply and integration with other equipment [3][10].
- It supports real-time mapping and high-precision modeling, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [20].

Group 2: User Experience
- The device is designed for ease of use, allowing users to start scanning with a single button and export results without complex setups [5].
- It features a compact design with integrated sensors and expandable interfaces, enhancing hardware performance [10][36].
- The GeoScan S1 supports offline and online rendering, providing users with immediate visualization of scanning results [6].

Group 3: Market Positioning
- The product is marketed at a competitive price, starting from 19,800 yuan for the basic version, with additional versions available to meet various needs [7][56].
- The company emphasizes its strong background and project validation through collaborations with academic institutions, showcasing its credibility in the industry [7].

Group 4: Application Scenarios
- The GeoScan S1 is suitable for a wide range of environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines, effectively completing 3D scene mapping [36][45].
- It can be integrated with various platforms such as drones, unmanned vehicles, and robots, facilitating unmanned operations [42].