自动驾驶之心
When We Talk About Large-Model and VLA Positions, What Do They Actually Involve? (Job Listings Included)
自动驾驶之心 · 2025-07-11 11:23
Core Viewpoint
- The article discusses the differences between VLA (Vision-Language-Action) and end-to-end models in the context of autonomous driving, emphasizing the importance of large models and their applications in the industry [2]

Group 1: Job Descriptions and Requirements
- Positions related to large-model development, including VLA and end-to-end roles, are highlighted, with a focus on skills in fine-tuning, lightweight models, and deployment [2]
- The job of an end-to-end/VLA engineer involves developing and implementing driving systems, optimizing model structures, and constructing high-quality training datasets [6]
- The VLA/VLM algorithm position requires a master's degree in computer science or AI, with 3-5 years of experience in autonomous driving or AI algorithms, and proficiency in VLA/VLM architectures [8][10]

Group 2: Technical Skills and Experience
- Candidates are expected to have experience with multimodal large language models, fine-tuning existing models for specific business scenarios, and familiarity with Transformer and multimodal technologies [5]
- Experience in computer vision, trajectory prediction, and decision planning is essential, along with a strong foundation in mainstream technologies and frameworks like PyTorch [9]
- The article emphasizes the need for candidates to have published papers in top conferences or achieved notable results in international competitions [9][11]
Compete This Summer! The RealADSim Workshop Autonomous Driving Challenge Is Officially Open, with a Total Prize Pool Over 300,000 RMB (ICCV'25)
自动驾驶之心 · 2025-07-11 09:42
Core Viewpoint
- The article emphasizes the significance of high-fidelity simulation technology in overcoming the challenges of testing autonomous driving algorithms, particularly through Novel View Synthesis (NVS), which allows closed-loop driving simulation environments to be built from real-world data [1][2]

Group 1: Challenges and Tasks
- The workshop addresses two main challenges in applying NVS technology: improving rendering quality in extrapolated views and evaluating driving algorithms in closed-loop simulation environments [2][3]
- The first track, "Extrapolated Novel View Synthesis," aims to enhance rendering quality under sparse input views, which is crucial for evaluating autonomous driving algorithms across scenarios (a minimal scoring sketch follows this summary) [3][4]
- The second track, "Closed-Loop Simulation Evaluation," highlights the importance of high-fidelity simulation environments that bridge the gap between real-world data and interactive assessment, overcoming the limitations of traditional static datasets [5][6]

Group 2: Competition Details
- Each track offers awards, including a $9,000 Creative Award; the competition opens on June 30, 2025, with submissions due by August 31, 2025 [8][9]
- The workshop encourages global participation to advance autonomous driving technology, providing a platform for challenging and valuable research [10][11]
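The first track scores how well methods render views away from the original camera trajectory. As background, here is a minimal sketch of the kind of image-fidelity scoring commonly used for NVS (PSNR against held-out real frames); the article does not specify the track's actual evaluation protocol, so treat the metric choice and array shapes as illustrative.

```python
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered view and a held-out real frame."""
    mse = np.mean((rendered.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: score a batch of extrapolated renderings against held-out captures.
rng = np.random.default_rng(0)
gt = rng.random((4, 64, 64, 3))                                  # stand-in camera frames in [0, 1]
pred = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)  # noisy "renderings"
scores = [psnr(p, g) for p, g in zip(pred, gt)]
print(f"mean PSNR: {np.mean(scores):.2f} dB")
```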
From Nearly 30 Embodied-Intelligence Surveys: Tracking the Field's Rise and Fall (VLA, VLN, Reinforcement Learning, Diffusion Policy, and More)
自动驾驶之心 · 2025-07-11 06:46
Core Insights
- The article provides a comprehensive overview of surveys and research papers on embodied intelligence, covering vision-language-action models, reinforcement learning, and robotics applications [1][2][3][4][5][6][7][8][9]

Group 1: Vision-Language-Action Models
- A survey on Vision-Language-Action (VLA) models highlights their significance in autonomous driving and human motor learning, discussing progress, challenges, and future trends [2][3][8]
- The exploration of VLA models emphasizes their applications in embodied AI, showcasing various datasets and methodologies [8][9]

Group 2: Robotics and Reinforcement Learning
- Research on foundation models in robotics addresses applications, challenges, and future directions, indicating growing interest in integrating AI with robotic systems [3][4]
- Deep reinforcement learning is identified as a key area with real-world successes, suggesting its potential for enhancing robotic capabilities [3]

Group 3: Multimodal and Generative Approaches
- The article discusses multimodal fusion and vision-language models, which are crucial for improving robot vision and interaction with the environment [6]
- Generative artificial intelligence in robotic manipulation is highlighted as an emerging field, indicating a shift towards more sophisticated AI-driven robotic systems [6]

Group 4: Datasets and Community Engagement
- The article encourages engagement with a community focused on embodied intelligence, offering access to resources including datasets and collaborative projects [9]
Traditional Planning and Control Jobs Are Getting Harder to Find...
自动驾驶之心 · 2025-07-11 06:46
Core Viewpoint
- The article emphasizes the evolving landscape of autonomous driving, particularly the integration of traditional planning and control (PnC) with end-to-end systems, stressing that professionals must adapt to these changes to remain competitive in the job market [2][4][29]

Group 1: Industry Trends
- The shift towards end-to-end and VLA (Vision-Language-Action) systems is reshaping traditional PnC roles, which now require more advanced algorithms and frameworks [2][4]
- As of 2025, end-to-end systems are expected to become more prevalent, yet traditional PnC methods will still play a crucial role, especially in safety-critical applications such as Level 4 autonomous driving [4][29]
- Understanding both traditional and modern approaches to planning and control matters because the two are increasingly integrated in practical applications [4][29]

Group 2: Educational Offerings
- The company has launched specialized courses aimed at bridging theory and practice in autonomous driving, focusing on real-world challenges and interview preparation [5][7]
- The courses provide hands-on experience with current industry practice, covering classic and innovative PnC solutions, and are tailored for individuals with some background in the field [8][12]
- The curriculum includes modules on foundational algorithms, decision-making frameworks, and advanced topics such as contingency planning and interactive planning, which are critical for modern autonomous driving systems (a minimal example of one such foundational algorithm follows this summary) [20][21][24][26][29]

Group 3: Career Development
- Beyond technical skills, the courses offer support with job applications, including resume reviews and mock interviews, to improve employability [9][10][31]
- Previous participants have secured positions at major companies in the autonomous driving sector, indicating the effectiveness of the training [10][12]
- The program aims to equip participants to build decision-making systems and tackle real-world challenges in autonomous driving, enhancing their career prospects [13][29]
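To ground the "foundational algorithms" mentioned in the curriculum, here is a minimal sketch of pure pursuit, one of the classic PnC lateral-control methods. The vehicle state, wheelbase, and target point are invented for illustration; this is background material, not the course's actual content.

```python
import math

def pure_pursuit_steering(x, y, yaw, wheelbase, target):
    """Classic pure-pursuit steering: aim the vehicle at a lookahead point."""
    dx, dy = target[0] - x, target[1] - y
    # Transform the lookahead point into the vehicle frame.
    local_x = math.cos(-yaw) * dx - math.sin(-yaw) * dy
    local_y = math.sin(-yaw) * dx + math.cos(-yaw) * dy
    lookahead = math.hypot(local_x, local_y)
    alpha = math.atan2(local_y, local_x)  # heading error to the lookahead point
    # Bicycle-model steering law: delta = atan(2 * L * sin(alpha) / lookahead).
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

# Example: vehicle at the origin heading along +x, chasing a point ahead and to the left.
delta = pure_pursuit_steering(0.0, 0.0, 0.0, wheelbase=2.8, target=(8.0, 2.0))
print(f"steering angle: {math.degrees(delta):.1f} deg")  # positive = steer left
```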
Don't Brute-Force Your Autonomous Driving Research! Use the Right Playbook to Overtake on the Curve
自动驾驶之心 · 2025-07-11 01:14
Core Viewpoint
- The article emphasizes the importance of learning from experienced mentors in research, particularly in LLM/MLLM, to accelerate the research process and achieve results more efficiently [1]

Group 1: Course Offerings
- The program offers a 1v6 elite small-class format, allowing personalized guidance from a mentor throughout the research process [5]
- The course covers everything from model theory to practical coding, helping participants build their own knowledge systems and understand algorithm design and innovation in LLM/MLLM [1][10]
- Participants will receive tailored ideas from the mentor to kickstart their research, even if they lack a clear direction initially [7]

Group 2: Instructor Background
- The instructor has a strong academic background, having graduated from a prestigious computer-science program and worked as an algorithm researcher at several companies [2]
- The instructor's research spans computer vision, efficient model-compression algorithms, and multimodal large language models, with a focus on lightweight models and efficient fine-tuning techniques (a minimal fine-tuning sketch follows this summary) [2][3]

Group 3: Target Audience
- The program suits graduate students and professionals in autonomous driving and AI, as well as anyone looking to strengthen their algorithmic knowledge and research skills [11]
- It caters to individuals who need to publish papers for academic recognition or who want to systematically master model compression and multimodal reasoning [11]

Group 4: Course Structure and Requirements
- The course accommodates students with varying levels of foundational knowledge, adjusting the depth of instruction to participants' backgrounds [14]
- Participants are expected to have a basic understanding of deep learning and machine learning, familiarity with Python and PyTorch, and a willingness to engage actively in the learning process [16][19]
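The "efficient fine-tuning techniques" mentioned above typically include parameter-efficient methods such as LoRA. Below is a minimal sketch using the Hugging Face transformers and peft libraries; the article does not name a specific toolchain, and the base model ("gpt2") and hyperparameters are stand-ins for illustration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a small causal LM with LoRA adapters so that only a tiny fraction
# of the parameters is trained instead of the full model.
base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank adapter matrices
    lora_alpha=16,              # scaling factor applied to the adapter output
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports only a small % as trainable
# `model` now trains like any transformers model, but gradients flow only
# through the adapter weights, keeping memory and compute requirements low.
```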
An Overview of Embodied Data Collection: Teleoperation and Motion-Capture Approaches, Difficulties, and Challenges (a 20,000-Character Deep Dive)
自动驾驶之心 · 2025-07-10 12:40
Core Viewpoint
- The article discusses the significance of teleoperation (遥操作) in the context of embodied intelligence, emphasizing its historical roots and contemporary relevance in robotics and data collection [3][15][17]

Group 1: Understanding Teleoperation
- Teleoperation is not a new concept; it has existed for decades, primarily in military and aerospace applications [8][10]
- Examples include surgical robots and remote-controlled excavators, showcasing its practical applications [8][10]
- Ideal teleoperation involves spatial separation, allowing operators to control robots from a distance; that separation is precisely where the value is created [10][15]

Group 2: The Teleoperation Experience
- Various teleoperation setups were compared, with a focus on the comfort of different methods [19][20]
- The most comfortable method identified is pure visual inverse kinematics (IK), which allows greater freedom of movement than rigid control systems (a toy IK example follows this summary) [30][28]

Group 3: The Future of Teleoperation
- Visions for future systems highlight the need for a complete control loop covering both human-to-machine and machine-to-human interactions [33][34]
- Purely virtual and purely physical solutions were both explored, suggesting that future systems may combine the two approaches for the best user experience [37][39]

Group 4: Data Collection and Its Importance
- Teleoperation is crucial for data collection, which is essential for training robots to mimic human actions [55][64]
- The concept of "borrowing the false to cultivate the true" was introduced, indicating that advances in teleoperation are driven by the need for better data collection in robotics [64][65]

Group 5: Implications for Robotics
- The emerging "robot cockpit" concept points towards more intuitive robot control systems that integrate multiple functions into a cohesive interface [67][70]
- The difficulty of controlling a robot's many joints was discussed, underscoring the need for innovative hardware and interaction design to manage complex operations [68][70]

Group 6: Motion Capture and Its Challenges
- Motion-capture systems are essential for teleoperation but face challenges such as limited precision and complex setup requirements [93][95]
- Human adaptability matters: the discussion suggested users can adjust effectively to a variety of input methods [80][81]

Group 7: ALOHA System Innovations
- The ALOHA system represents a significant innovation in teleoperation, built around a minimal hardware configuration and an end-to-end algorithm framework [102][104]
- The system has prompted the industry to rethink robot design and operating paradigms, indicating its potential long-term impact [103][104]
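For readers unfamiliar with the inverse kinematics mentioned in Group 2: IK computes the joint angles that place an end effector at a desired pose. Here is a toy closed-form solution for a planar two-link arm; real teleoperation rigs solve a harder 6- or 7-DoF version of the same problem, and this example is background material rather than anything from the article.

```python
import math

def two_link_ik(x, y, l1, l2, elbow_up=True):
    """Closed-form IK for a planar 2-link arm with link lengths l1, l2.

    Returns joint angles (theta1, theta2) placing the end effector at (x, y);
    raises ValueError if the target is out of reach.
    """
    d2 = x * x + y * y
    cos_t2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)  # law of cosines
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target out of reach")
    t2 = math.acos(cos_t2) * (1.0 if elbow_up else -1.0)
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2), l1 + l2 * math.cos(t2))
    return t1, t2

# Example: reach the point (1.2, 0.5) with two 1 m links.
t1, t2 = two_link_ik(1.2, 0.5, 1.0, 1.0)
print(f"theta1 = {math.degrees(t1):.1f} deg, theta2 = {math.degrees(t2):.1f} deg")
```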
These End-to-End VLA Salaries Have Me Tempted...
自动驾驶之心 · 2025-07-10 12:40
Core Viewpoint
- End-to-end (E2E) autonomous driving is the core algorithm behind mass-produced intelligent driving, marking a new phase in the industry, with significant advances and competition following the recognition of UniAD at CVPR [2]

Group 1: E2E Autonomous Driving Overview
- E2E approaches fall into single-stage and two-stage categories, modeling directly from sensor data to vehicle-control outputs and thereby avoiding the error accumulation seen in modular methods (a toy single-stage sketch follows this summary) [2]
- The emergence of BEV perception bridged the gaps between modular methods, leading to a significant technological leap [2]
- The rapid development of E2E has driven a surge in demand for VLM/VLA expertise, with potential salaries reaching millions annually [2]

Group 2: Learning Challenges
- The fast pace of E2E development has made earlier learning materials outdated; mastering the area now requires a comprehensive understanding of multimodal large models, BEV perception, reinforcement learning, and more [3]
- Beginners struggle to synthesize knowledge from numerous fragmented papers and to move from theory to practice, given the lack of high-quality documentation [3]

Group 3: Course Development
- A new course, "End-to-End and VLA Autonomous Driving," has been developed to address these challenges, using Just-in-Time Learning to help students quickly grasp the core technologies [4]
- The course aims to build research capability, teaching students to categorize papers and extract their innovations [5]
- Practical applications are integrated throughout to close the learning loop from theory to practice [6]

Group 4: Course Structure
- The course spans multiple chapters covering the history and evolution of E2E algorithms, background knowledge, two-stage and one-stage E2E methods, and the latest advances in VLA [8][9][10]
- Key topics include an introduction to E2E algorithms, background knowledge on VLA, and practical applications of diffusion models and reinforcement learning [11][12]

Group 5: Target Audience and Outcomes
- The course targets individuals with a foundational understanding of autonomous driving and aims to bring participants to a level comparable to one year of experience as an E2E algorithm engineer [19]
- Participants will gain a deep understanding of key technologies such as BEV perception, multimodal large models, and reinforcement learning, enabling them to apply what they learn to real-world projects [19]
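To make the single-stage idea in Group 1 concrete, here is a toy PyTorch sketch that maps camera pixels straight to future waypoints. The architecture, dimensions, and waypoint parameterization are invented for illustration; this is neither UniAD's design nor the course's material.

```python
import torch
import torch.nn as nn

class TinyE2EPlanner(nn.Module):
    """Toy single-stage end-to-end model: image pixels -> future waypoints.

    Illustrative only; production systems use BEV encoders, temporal fusion,
    multi-camera inputs, and far larger backbones.
    """
    def __init__(self, feat_dim=256, num_waypoints=6):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in for a camera encoder
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(feat_dim, num_waypoints * 2)  # (x, y) per waypoint
        self.num_waypoints = num_waypoints

    def forward(self, images):
        feats = self.backbone(images)
        return self.head(feats).view(-1, self.num_waypoints, 2)

waypoints = TinyE2EPlanner()(torch.randn(1, 3, 224, 224))
print(waypoints.shape)  # torch.Size([1, 6, 2])
```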
ICCV'25! SJTU & CAS Unveil MambaFusion: The First SOTA Mamba-Based Multi-Modal 3D Detection
自动驾驶之心 · 2025-07-10 12:40
Core Viewpoint
- The article presents MambaFusion, a state-of-the-art (SOTA) framework for multi-modal 3D object detection that uses a pure Mamba module for efficient dense global fusion, achieving significant performance gains in camera-LiDAR integration [1][3][30]

Summary by Sections

Introduction
- 3D object detection is essential for modern autonomous driving, providing the environmental understanding required by downstream tasks such as perception and motion planning; multi-sensor fusion, particularly between LiDAR and cameras, improves detection accuracy and robustness thanks to their complementary strengths [4]

Methodology
- The proposed method includes a high-fidelity LiDAR encoding that compresses voxel data in continuous space, preserving precise height information and improving feature alignment between camera and LiDAR [2][18]
- The Hybrid Mamba Block (HMB) combines local and global context learning to enhance multi-modal 3D detection performance (a generic linear-attention sketch follows this summary) [15][11]

Key Contributions
1. The Hybrid Mamba Block, the first dense global fusion module supporting pure linear attention, balancing efficiency and global perception [11]
2. A high-fidelity LiDAR encoding that significantly improves multi-modal alignment accuracy [11][18]
3. Validation of the feasibility of pure linear fusion, achieving SOTA performance in camera-LiDAR 3D object detection [11][30]

Experimental Results
- The method achieved an NDS of 75.0 on the nuScenes validation set, outperforming various top-tier methods while also delivering superior inference speed [2][24]
- Compared to IS-FUSION, MambaFusion ran 50% faster at inference while maintaining competitive detection accuracy [24][30]

Conclusion
- MambaFusion represents a significant advance in multi-modal 3D object detection, demonstrating effective dense global fusion and precise cross-modal feature alignment, with implications for further research in the field [30]
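The HMB is described as a dense global fusion module with pure linear attention, i.e., attention whose cost grows linearly rather than quadratically with the number of tokens. As background, here is a generic linear-attention sketch in the style of Katharopoulos et al. (2020); it is not the paper's actual module, and the token counts below are made up.

```python
import torch

def linear_attention(q, k, v):
    """Generic linear attention: O(N) in sequence length.

    Replaces softmax(q k^T) v with phi(q) (phi(k)^T v) using the feature
    map phi = elu + 1, so keys/values are summarized once into a D x E state.
    """
    phi_q = torch.nn.functional.elu(q) + 1.0   # (B, N, D), all-positive features
    phi_k = torch.nn.functional.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", phi_k, v)                       # (B, D, E)
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)

# Example: attend over 2000 concatenated LiDAR + camera tokens at linear cost.
B, N, D = 2, 2000, 64
q, k, v = (torch.randn(B, N, D) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([2, 2000, 64])
```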
自动驾驶之心 Course Renewals Are Here! Join Us and Keep Growing
自动驾驶之心 · 2025-07-10 12:40
Group 1
- The core message is that existing students can renew their courses at discounted rates rather than paying full price [1]
- The company offers four renewal options: 1 month, 3 months, 6 months, and 12 months, with deeper discounts for longer durations [2]
- The renewal pricing structure is as follows (a worked example follows this list) [2]:
  - 1 month: (Original Price / 12) × 1 × 100%
  - 3 months: (Original Price / 12) × 3 × 70%
  - 6 months: (Original Price / 12) × 6 × 50%
  - 12 months: (Original Price / 12) × 12 × 30%
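The formula prorates the original annual price by month and applies the duration's discount multiplier. A small sketch with a hypothetical 1,200 RMB annual course (the article does not state actual prices):

```python
def renewal_price(original_price: float, months: int) -> float:
    """Renewal cost: (original annual price / 12) x months x discount multiplier."""
    multiplier = {1: 1.00, 3: 0.70, 6: 0.50, 12: 0.30}[months]
    return original_price / 12 * months * multiplier

# Worked example for a hypothetical 1,200 RMB annual course:
for m in (1, 3, 6, 12):
    print(f"{m:>2} months: {renewal_price(1200, m):6.2f} RMB")
# ->  1 months: 100.00 RMB
# ->  3 months: 210.00 RMB
# ->  6 months: 300.00 RMB
# -> 12 months: 360.00 RMB
```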
Recently Studied AI Agents: Sharing My Notes with Everyone
自动驾驶之心 · 2025-07-10 10:05
Core Insights
- The article reviews the evolution of AI over the past decade, from traditional machine learning to deep learning and now to the emerging paradigm of Agentic AI, ultimately aiming towards Physical AI [2]

Group 1: Evolution of AI
- The acceleration of AI technology is described as exponential: breakthroughs in deep learning over the past decade surpassed the cumulative advances of thirty years of traditional machine learning [2]
- Since the emergence of ChatGPT, progress in just two and a half years has outpaced the entire deep learning era [2]

Group 2: Stages of AI Development
- The article outlines the current milestones in Agentic AI, marking a fundamental shift in AI capabilities [3]
- The first stage of the large-model phase is represented by OpenAI's o1 and DeepSeek-R1, expected to mature by fall 2024 [5]
- The second stage will see the launch of the o3 model and the emergence of various intelligent applications by early 2025 [5]

Group 3: Agentic AI Capabilities
- Agentic AI introduces task planning and tool invocation, allowing AI to understand and execute high-level goal-oriented tasks, effectively becoming an Auto-Pilot system (a minimal agent-loop sketch follows this summary) [10]
- The core definition of Agentic AI includes autonomous understanding, planning, memory, and tool-invocation abilities, enabling the automation of complex tasks [10]

Group 4: Learning Mechanisms
- Solutions have evolved through prompt-engineering techniques such as Chain of Thought (CoT) and Tree of Thought (ToT) to elicit in-context learning from models [14]
- Supervised learning provides standard solution paths, while reinforcement learning allows autonomous exploration of optimal ones [15]

Group 5: Product Milestones
- The o1 model validated the feasibility of reasoning models, while R1 optimized efficiency and lowered the technical barriers to application [18]
- A dual-path invocation mechanism combines preset processes for high determinism with prompt-triggered responses for adaptability in dynamic environments [19]

Group 6: Future Directions and Applications
- The article discusses the integration of various agent types, including Operator agents for environmental interaction and Deep Research agents for knowledge integration [28]
- The development trend emphasizes the need for a foundational Agent OS to overcome the limitations of current memory mechanisms and to drive continuous model evolution through user-behavior data [30]
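As a rough illustration of the tool-invocation capability described in Group 3, here is a minimal sketch of an agent control loop that dispatches plan steps to registered tools. The tool, the hard-coded plan, and the function names are all invented for illustration; a real agent would have an LLM produce and revise the plan.

```python
import ast
import operator

def calculator(expression: str) -> str:
    """A deliberately tiny tool: safely evaluate arithmetic like '3 * (2 + 4)'."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))

TOOLS = {"calculator": calculator}

def run_agent(plan):
    """Execute a plan of (tool_name, argument) steps and collect observations.

    The 'planning' here is hard-coded so the control loop itself stays visible.
    """
    observations = []
    for tool_name, argument in plan:
        result = TOOLS[tool_name](argument)   # tool invocation
        observations.append(f"{tool_name}({argument!r}) -> {result}")
    return observations

for step in run_agent([("calculator", "3 * (2 + 4)")]):
    print(step)  # calculator('3 * (2 + 4)') -> 18
```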