自动驾驶之心

Now at 2,000 members: what does this "Huangpu Military Academy" of embodied intelligence have in store?
自动驾驶之心· 2025-08-09 08:21
Yesterday afternoon a student came to Feng Ge to vent: his company asked him to debug a robot, and he had no idea how to handle data collection and debugging, since there were far too many degrees of freedom. He was equally lost on how to analyze problems. Running demos at school was fine, but once he got his hands on a real robot there were still plenty of pitfalls. We have run into this kind of question many times in our embodied intelligence community: how to use the hardware, how to collect data effectively, how to deploy VA and VLA models, and whether the problem is a capture background that is too complex or data that is simply too dirty. We soon gave him the relevant answers, and he quickly applied them to his project. A community that can solve problems at the moment people need help most is clearly valuable. The Embodied Intelligence Heart Knowledge Planet (the first full-stack embodied intelligence technical community in China) has by now closed the loop across industry, academia, job hunting, and Q&A exchange. Whatever problem comes up, a solution gets shared; wherever research is most cutting-edge, we keep supplying ideas, and job openings are passed on to members as soon as they appear. Beyond the questions above, we have also organized plenty of other topics for everyone:
Which platforms are there for robot simulation and data collection?
How do humanoid robots do imitation learning, and why is VLA hard to get right?
How is VLA used in robotic grasping and planning tasks?
How is VLA+RL done, and why does it work?
What to do when sim2real performs poorly, and how does real2sim2real work?
How is hierarchical decision-making typically done, and what are its advantages and disadvantages compared with end-to-end?
Research on embodied robots ...
A Crash Course in Planning for Autonomous Driving Perception Engineers
自动驾驶之心· 2025-08-08 16:04
Core Insights
- The article discusses the evolution and importance of planning modules in autonomous driving, emphasizing the need for engineers to understand both traditional and machine learning-based approaches to effectively address challenges in the field [5][8][10].

Group 1: Importance of Planning
- Understanding planning is crucial for engineers, especially in the context of autonomous driving, as it allows for better service to downstream customers and enhances problem-solving capabilities [8][10].
- The transition from rule-based systems to machine learning systems in planning will likely see a coexistence of both methods for an extended period, with a gradual shift in their usage ratio from 8:2 to 2:8 [8][10].

Group 2: Planning System Overview
- The planning system in autonomous vehicles is essential for generating safe, comfortable, and efficient driving trajectories, relying on inputs from perception outputs [11][12].
- Traditional planning modules consist of global path planning, behavior planning, and trajectory planning, with behavior and trajectory planning often working in tandem [12].

Group 3: Challenges in Planning
- A significant challenge in the planning technology stack is the lack of standardized terminology, leading to confusion in both academic and industrial contexts [15].
- The article highlights the need for a unified approach to behavior planning, as the current lack of consensus on semantic actions limits the effectiveness of planning systems [18].

Group 4: Planning Techniques
- The article outlines three primary tools used in planning: search, sampling, and optimization, each with its own methodologies and applications in autonomous driving [24][41].
- Search methods, such as Dijkstra and A* algorithms, are popular for path planning, while sampling methods like Monte Carlo are used for evaluating numerous options quickly [25][32] (a minimal search sketch follows this summary).

Group 5: Industrial Practices
- The article discusses the distinction between decoupled and joint spatiotemporal planning methods, with decoupled solutions being easier to implement but potentially less optimal in complex scenarios [52][54].
- The Apollo EM planner is presented as an example of a decoupled planning approach, which simplifies the problem by breaking it into two-dimensional issues [56][58].

Group 6: Decision-Making in Autonomous Driving
- Decision-making in autonomous driving focuses on interactions with other road users, addressing uncertainties and dynamic behaviors that complicate planning [68][69].
- The use of Markov Decision Processes (MDP) and Partially Observable Markov Decision Processes (POMDP) frameworks is essential for handling the probabilistic nature of interactions in driving scenarios [70][74].
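Since Group 4 names search (Dijkstra, A*) as one of the three planning tools, a minimal sketch may help make the idea concrete. The code below is a generic A* on a 2D occupancy grid, included only for illustration; the grid representation, uniform step cost, and Manhattan heuristic are assumptions of this sketch, not the article's implementation.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid (0 = free, 1 = blocked).

    grid: list of lists; start/goal: (row, col) tuples.
    Returns a list of cells from start to goal, or None if unreachable.
    """
    def h(cell):  # admissible heuristic: Manhattan distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]          # entries are (f = g + h, g, cell)
    came_from, g_score = {}, {start: 0}

    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:                          # reconstruct the path backwards
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1                       # uniform step cost
                if ng < g_score.get(nxt, float("inf")):
                    g_score[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None

# Example: plan around a blocked row segment.
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))
```

Swapping the heuristic for a constant zero turns the same loop into Dijkstra, which is why the two are usually taught together.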
What is the VLM so often mentioned in autonomous driving, and how does it differ from VLA?
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses the significance of Vision-Language Models (VLM) in the context of autonomous driving, highlighting their ability to integrate visual perception and natural language processing to enhance vehicle understanding and interaction with complex road environments [4][19].

Summary by Sections

What is VLM?
- VLM stands for Vision-Language Model, which combines the capabilities of understanding images and text within a single AI system. It enables deep comprehension of visual content and natural language interaction, enhancing applications like image retrieval, writing assistance, and robotic navigation [6].

How to Make VLM Work Efficiently?
- VLM processes raw road images into feature representations using visual encoders, such as Convolutional Neural Networks (CNN) and Vision Transformers (ViT). Language encoders and decoders handle natural language input and output, learning semantic relationships between tokens [8].

Key Mechanism of VLM
- The alignment of visual features and language modules is crucial for VLM. Cross-attention mechanisms allow the language decoder to focus on relevant image areas when generating text, ensuring high consistency between generated language and actual scenes [9] (a toy sketch of this cross-attention follows the summary).

Training Process of VLM
- The training process for VLM typically involves pre-training on large datasets followed by fine-tuning with specific datasets related to autonomous driving scenarios, ensuring the model can accurately recognize and respond to traffic signs and conditions [11].

Applications of VLM
- VLM supports various intelligent functions, including real-time scene alerts, interactive semantic Q&A, and recognition of road signs and text. It can generate natural language prompts based on visual inputs, enhancing driver awareness and decision-making [12].

Real-time Operation of VLM
- VLM operates in a "cloud-edge collaboration" architecture, where large-scale pre-training occurs in the cloud, and optimized lightweight models are deployed in vehicles for real-time processing. This setup allows for quick responses to safety alerts and complex analyses [14].

Data Annotation and Quality Assurance
- Data annotation is critical for VLM deployment, requiring detailed labeling of images under various conditions. This process ensures high-quality training data, which is essential for the model's performance in real-world scenarios [14].

Safety and Robustness
- Safety and robustness are paramount in autonomous driving. VLM must quickly assess uncertainties and implement fallback measures when recognition errors occur, ensuring reliable operation under adverse conditions [15].

Differences Between VLA and VLM
- VLA (Vision-Language-Action) extends VLM by integrating action decision-making capabilities. While VLM focuses on understanding and expressing visual information, VLA encompasses perception, cognition, and execution, making it essential for real-world applications like autonomous driving [18].

Future Developments
- The continuous evolution of large language models (LLM) and large vision models (LVM) will enhance VLM's capabilities in multi-modal integration, knowledge updates, and human-machine collaboration, leading to safer and more comfortable autonomous driving experiences [16][19].
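The cross-attention alignment described under "Key Mechanism of VLM" is the part most easily shown in code. Below is a toy PyTorch sketch, not any production VLM: a single cross-attention block in which text-token queries attend over image-patch features. The dimensions, class name, and patch counts are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TextToImageCrossAttention(nn.Module):
    """Toy cross-attention block: text tokens (queries) attend over image patches (keys/values)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # text_tokens:   (batch, num_text_tokens, dim) from the language decoder
        # image_patches: (batch, num_patches,     dim) from the visual encoder (e.g. a ViT)
        attended, attn_weights = self.attn(
            query=text_tokens, key=image_patches, value=image_patches
        )
        # Residual connection plus layer norm, as in a standard transformer block.
        return self.norm(text_tokens + attended), attn_weights

# Example: one image split into 196 patches (14x14), a 12-token caption prefix.
block = TextToImageCrossAttention()
text = torch.randn(1, 12, 256)
patches = torch.randn(1, 196, 256)
out, weights = block(text, patches)
print(out.shape, weights.shape)  # torch.Size([1, 12, 256]) torch.Size([1, 12, 196])
```

The attention-weight tensor is what gives the "which image region did this word look at" interpretability the summary alludes to.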
From autonomous driving to embodied intelligence: these communities hold up half the sky!
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The embodied intelligence and autonomous driving industries are experiencing significant growth in production, financing, and recruitment, leading to a highly competitive job market where skilled professionals are in high demand [1].

Group 1: Industry Trends
- The industry is focusing on practical technologies, with companies competing to secure talent with relevant skills [1].
- The job market is described as "highly competitive," making it difficult for candidates to secure positions despite the availability of openings [1].

Group 2: Recommended Learning Communities
- "Smart Driving Frontier" is a comprehensive media platform dedicated to the autonomous driving sector, providing technical insights and industry news [1].
- "Computer Vision Research Institute" focuses on AI research and practical applications, sharing the latest algorithms and project experiences [3].
- "Visual Language Navigation" aims to create a professional platform for navigation technologies, sharing technical insights and industry news [5].
- "Embodied Intelligence Research Lab" emphasizes core areas such as reinforcement learning and multi-agent collaboration, providing research updates and practical case studies [6].
- "Embodied Intelligence Heart" is the largest community for embodied intelligence, covering various technical directions and encouraging collaboration among developers [7].
- "arXiv Daily Academic Express" offers daily updates on academic papers across multiple fields, including AI and robotics, facilitating quick access to relevant research [8].
- "Autonomous Driving Heart" is a community for developers in the autonomous driving field, focusing on various technical aspects and job opportunities [10].
Fine-tuning an autonomous driving VLM based on the open-source Qwen2.5-VL
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses advancements in autonomous driving technology, particularly focusing on the LLaMA Factory framework and the Qwen2.5-VL model, which enhance the capabilities of vision-language-action models for autonomous driving applications [4][5].

Group 1: LLaMA Factory Overview
- LLaMA Factory is an open-source low-code framework for fine-tuning large models, gaining popularity in the open-source community with over 40,000 stars on GitHub [3].
- The framework integrates widely used fine-tuning techniques, making it suitable for developing autonomous driving assistants that can interpret traffic conditions through natural language [3].

Group 2: Qwen2.5-VL Model
- The Qwen2.5-VL model serves as the foundational model for the project, achieving significant breakthroughs in visual recognition, object localization, document parsing, and long video understanding [4].
- It offers three model sizes, with the flagship Qwen2.5-VL-72B performing comparably to advanced models like GPT-4o and Claude 3.5 Sonnet, while smaller versions excel in resource-constrained environments [4].

Group 3: CoVLA Dataset
- The CoVLA dataset, comprising 10,000 real driving scenes and over 80 hours of video, is utilized for training and evaluating vision-language-action models [5].
- This dataset surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for developing safer and more reliable autonomous driving systems [5].

Group 4: Model Training and Testing
- Instructions for downloading and installing LLaMA Factory and the Qwen2.5-VL model are provided, including commands for setting up the environment and testing the model [6][7] (a minimal loading and test sketch follows this summary).
- The article details the process of fine-tuning the model using the SwanLab tool for visual tracking of the training process, emphasizing the importance of adjusting parameters to avoid memory issues [11][17].
- After training, the fine-tuned model demonstrates improved response quality in dialogue scenarios related to autonomous driving risks compared to the original model [19].
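Group 4 boils down to "install, load, and smoke-test the model before fine-tuning." The sketch below follows the publicly documented Qwen2.5-VL usage pattern in transformers rather than the article's exact script: the chosen model size, image path, and prompt are placeholders, and it assumes a recent transformers release plus the qwen-vl-utils helper package.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # one of the published sizes; pick to fit your GPU
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Hypothetical driving frame and prompt; replace with a CoVLA-style image of your own.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///path/to/driving_frame.jpg"},
        {"type": "text", "text": "Describe the traffic scene and point out any risks."},
    ],
}]

# Build the chat prompt, pack image tensors, and run a short generation as a smoke test.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding the model's answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Once this baseline responds sensibly, LLaMA Factory's fine-tuning configs can point at the same checkpoint, with SwanLab tracking the run as the article describes.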
We're getting ready to expand our autonomous driving team, come join us!
自动驾驶之心· 2025-08-08 03:20
In both the autonomous driving and embodied intelligence directions we have already built deep collaborations with the mainstream companies in the industry and with the relevant universities, and the large-model direction is being built up quickly as well. We do not focus only on the technology itself; we would rather co-create the whole AI field with everyone and share the joy of growing in understanding. For trending events, we likewise want to offer content value you will not find anywhere else on the web.

A journey of a thousand miles is made of single steps. We know very well that one person's strength is limited, which is why we hope more outstanding people will walk this road with us.

Content Operations - Intern
Job duties:
Requirements:
1. Research background related to autonomous driving, large models, or embodied intelligence; bachelor's degree or above, master's preferred;

Hello everyone, we are the Autonomous Driving Heart / Embodied Intelligence / LLM Heart Tech team. We are very glad to meet you here. If you also believe that technical content can change the world, you may be exactly who we are looking for!

What do we do? We hope to connect academia and industry through technical content, to serve as the bridge between companies and universities, and beyond that to reach the hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the newest and most authoritative technical information on the web. The team focuses on the most cutting-edge AI areas, including autonomous driving, embodied intelligence, and large models, covering paper interpretations, analyses of production solutions in industry, large-model evaluations, business news, industry recruiting, open-source projects, and more, and through our public ...
The technology-obsessed "Huangpu Military Academy" of autonomous driving has hit 4,000 members!
自动驾驶之心· 2025-08-08 03:20
The Autonomous Driving Heart Knowledge Planet has officially passed 4,000 members; three years in, it has not been easy. So far the Knowledge Planet has closed the loop across industry, academia, job hunting, and Q&A exchange. The few people running it review things every day: what kind of community do members actually need? Is there anything we have failed to consider? Flashy but empty will not do, a place where nobody talks will not do, and a place where people cannot find jobs certainly will not do.

Going forward we plan to keep improving the Planet's content, and today we would like to report on that. We intend to launch a face-to-face module for members, aiming to chat with everyone online every month and discuss common questions together. We will also keep inviting leading figures from academia and industry for in-depth roundtable interviews!

We are a community that takes content seriously, a place for cultivating future leaders. Autonomous Driving Heart has always worked to push the industry forward and to serve as a bridge between companies and universities. Our vision is to bring AI and autonomous driving to every student who needs them!

The Planet currently organizes roughly 40+ technical roadmaps for members. Whether you are asking about industry applications or looking for the latest VLA benchmarks, surveys, and beginner learning paths, it can greatly cut down your search time. The Planet has also invited dozens of guests from the autonomous driving field, all big names active on the front lines of industry and academia (the kind who frequently appear at top conferences and in various interviews). Feel free to ask questions at any time; they ...
Reconstruct point clouds online and in real time with a handheld LiDAR! An ultra cost-effective 3D scanner has arrived
自动驾驶之心· 2025-08-07 23:32
Core Viewpoint
- The GeoScan S1 is presented as the most cost-effective 3D laser scanner in China, designed for applications such as campus and indoor scene reconstruction, featuring a lightweight design and user-friendly operation [1][7].

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in real-time 3D scene reconstruction using a multi-modal sensor fusion algorithm [1].
- It can generate point clouds at a rate of 200,000 points per second, with a maximum measurement distance of 70 meters and 360° coverage, supporting large scenes of over 200,000 square meters [1][27].
- The device is equipped with a built-in Ubuntu system and various sensor devices, allowing for flexible power supply and integration with other equipment [3][10].
- It supports real-time mapping and high-precision modeling, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [20].

Group 2: User Experience
- The device is designed for ease of use, allowing users to start scanning with a single button and export results without complex setups [5].
- It features a compact design with integrated sensors and expandable interfaces, enhancing hardware performance [10][36].
- The GeoScan S1 supports offline and online rendering, providing users with immediate visualization of scanning results [6].

Group 3: Market Positioning
- The product is marketed at a competitive price, starting from 19,800 yuan for the basic version, with additional versions available to meet various needs [7][56].
- The company emphasizes its strong background and project validation through collaborations with academic institutions, showcasing its credibility in the industry [7].

Group 4: Application Scenarios
- The GeoScan S1 is suitable for a wide range of environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines, effectively completing 3D scene mapping [36][45].
- It can be integrated with various platforms such as drones, unmanned vehicles, and robots, facilitating unmanned operations [42].
DriveBench: Are VLMs in autonomous driving really reliable? (ICCV'25)
自动驾驶之心· 2025-08-07 23:32
Core Insights
- The article discusses the advancements in Visual Language Models (VLMs) and their potential application in autonomous driving, particularly focusing on the reliability and interpretability of driving decisions generated by VLMs [3][5].

Group 1: DriveBench Overview
- DriveBench is introduced as a benchmark dataset designed to evaluate the reliability of VLMs in 17 different settings, comprising 19,200 frames and 20,498 question-answer pairs [3].
- The framework covers four core tasks in autonomous driving: perception, prediction, planning, and behavior, and incorporates 15 types of Out-of-Distribution (OoD) scenarios to systematically test VLMs in complex driving environments [7][9] (a toy evaluation-slicing sketch follows this summary).

Group 2: Presentation Details
- The article highlights a live presentation by Shaoyuan Xie, a PhD student at the University of California, Irvine, who will discuss the empirical study on VLMs and their readiness for autonomous driving [9].
- The presentation will cover an overview of VLMs in autonomous driving, the reliability assessment of DriveBench, and future prospects for VLM applications in the industry [9].
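To make the benchmark's shape easier to picture, here is a toy sketch of how a question-answer benchmark sliced by task and OoD condition would typically be consumed. The record fields and the exact-match metric are invented for the example; they are not DriveBench's actual schema or scoring protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DriveQA:
    """Hypothetical per-sample record, only to illustrate the kind of slicing involved."""
    frame_id: str
    task: str          # e.g. "perception", "prediction", "planning", "behavior"
    corruption: str    # an OoD condition such as "fog", or "clean"
    question: str
    reference: str     # ground-truth answer
    model_answer: str  # answer produced by the VLM under evaluation

def accuracy_by_condition(samples: List[DriveQA]) -> Dict[Tuple[str, str], float]:
    """Group samples by (task, corruption) and report exact-match accuracy per slice."""
    stats: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for s in samples:
        key = (s.task, s.corruption)
        hit, total = stats.get(key, (0, 0))
        stats[key] = (hit + (s.model_answer.strip() == s.reference.strip()), total + 1)
    return {k: hit / total for k, (hit, total) in stats.items()}

# Tiny demo: the same question answered correctly on a clean frame and wrongly under fog.
demo = [
    DriveQA("f001", "perception", "clean", "Is the traffic light red?", "yes", "yes"),
    DriveQA("f001", "perception", "fog",   "Is the traffic light red?", "yes", "no"),
]
print(accuracy_by_condition(demo))
```

Comparing the clean and corrupted slices is what surfaces the reliability gaps the benchmark is built to expose.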
Fast-slow dual-system evaluation! Bench2ADVLM: designed specifically for autonomous driving VLMs (Nanyang Technological University)
自动驾驶之心· 2025-08-07 23:32
Today Autonomous Driving Heart shares the latest work from XX! If you have related work you would like to share, please contact us at the end of the article.

Paper authors | Tianyuan Zhang et al. Editor | Autonomous Driving Heart

Preface & the author's personal take: Vision-language models (VLMs) have recently emerged as a promising paradigm for autonomous driving (AD). However, current protocols for evaluating the performance of VLM-based autonomous driving systems (ADVLMs) are largely confined to open-loop settings with static inputs, overlooking the more realistic and informative closed-loop setting, which can capture interactive behavior, feedback resilience, and real-world safety. To address this, we introduce BENCH2ADVLM, a unified hierarchical closed-loop evaluation framework for real-time, interactive evaluation of ADVLMs on both simulation and physical platforms. Inspired by the dual-process theory of cognition, we first adapt a variety of ADVLMs to the simulation environment through a dual-system adaptation architecture. In this design, heterogeneous high-level driving commands generated by the target ADVLMs (the fast system) are interpreted by a general-purpose VLM (the slow system) into a form suitable for exec ...
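To make the fast/slow split above easier to picture, here is a toy sketch of the pattern. Every class, method, and control-signal field below is invented for illustration; Bench2ADVLM's actual interfaces are not shown in this excerpt, and real systems would replace both stand-ins with model inference.

```python
from dataclasses import dataclass

@dataclass
class ControlSignal:
    """Low-level command a simulator could execute directly (hypothetical format)."""
    steer: float      # [-1, 1]
    throttle: float   # [0, 1]
    brake: float      # [0, 1]

class FastSystemADVLM:
    """Stand-in for a target ADVLM: maps a sensor frame to a free-form high-level command."""
    def drive_command(self, camera_frame) -> str:
        # A real ADVLM would run inference here; a fixed string keeps the sketch runnable.
        return "slow down and yield to the pedestrian crossing ahead"

class SlowSystemInterpreter:
    """Stand-in for the general-purpose VLM that translates heterogeneous high-level
    commands into simulator-executable low-level control."""
    def interpret(self, command: str) -> ControlSignal:
        # Real interpretation would itself be a VLM call; here a toy keyword rule suffices.
        if "slow down" in command or "yield" in command:
            return ControlSignal(steer=0.0, throttle=0.0, brake=0.4)
        return ControlSignal(steer=0.0, throttle=0.3, brake=0.0)

def closed_loop_step(fast: FastSystemADVLM, slow: SlowSystemInterpreter, frame) -> ControlSignal:
    """One tick of the hierarchical closed loop: the fast system proposes, the slow system grounds."""
    return slow.interpret(fast.drive_command(frame))

print(closed_loop_step(FastSystemADVLM(), SlowSystemInterpreter(), frame=None))
```

The point of the split is that the evaluated model only needs to emit high-level intent, while a single shared interpreter makes heterogeneous ADVLMs comparable inside the same closed-loop simulator.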