自动驾驶之心
Next-generation efficient large-model computing: the paper-guidance class on parameter compression, hardware adaptation, multimodal reasoning, CoT, and more is here!
自动驾驶之心· 2025-07-04 07:13
Core Insights
- The article discusses the rapid development of large language models (LLMs) and multimodal models, identifying model efficiency, knowledge expansion, and reasoning performance as core issues in current AI research [1][2].

Course Overview
- The course systematically explores cutting-edge optimization methods for large models, emphasizing three key areas: parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [1].
- It addresses core challenges in model optimization: lightweight methods such as pruning, sparsification, and quantization for parameter compression; dynamic knowledge-injection techniques such as retrieval-augmented generation (RAG) and parameter-efficient fine-tuning (PEFT) for knowledge expansion; and advanced reasoning paradigms such as chain-of-thought (CoT) prompting and reinforcement-learning optimization (e.g., GRPO, Group Relative Policy Optimization) for reasoning enhancement [1].

Course Objectives
- The course aims to help students systematically master the key theory in their chosen directions and develop a clearer understanding of the material [5].
- It seeks to bridge the gap for students who lack direction and practical skills, enabling them to combine theory with coding practice and lay the groundwork for developing new models [5].
- It also works on students' academic writing, with guidance on manuscript preparation and submission [5].

Target Audience
- The course is designed for master's and doctoral students working on large models, applicants strengthening their profiles for graduate study abroad, and AI professionals looking to systematically improve their algorithmic theory and writing skills [6].

Admission Requirements
- A foundational understanding of deep learning/machine learning, familiarity with Python syntax, and experience with PyTorch [7].
Course Structure
- The course consists of 12 weeks of online group research followed by 2 weeks of paper guidance, culminating in a 10-week paper maintenance period [11].
- Students will analyze classic and cutting-edge papers, understand key algorithms and principles, and develop their own research ideas [11].

Weekly Breakdown
- The course covers topics including model pruning, quantization, dynamic knowledge expansion, advanced reasoning techniques, and multimodal understanding [16][18].
- Each week has a specific theme and output, such as settling on a research idea, optimizing model size and performance, or strengthening coding skills [16][18].

Additional Resources
- The course provides access to datasets from public sources and baseline code tailored to specific applications [13][14].
- Essential papers and resources are recommended for both foundational knowledge and advanced model-optimization techniques [15][17].
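The parameter-compression techniques the syllabus names (pruning, sparsification, quantization) can be illustrated with a minimal magnitude-pruning sketch in NumPy; the function name and threshold rule below are illustrative, not taken from the course materials:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` fraction of entries become zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, -0.3]])
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)  # three smallest-magnitude entries are zeroed
```

Real pipelines typically prune per layer or per channel and fine-tune afterwards to recover accuracy; this one-shot global variant is only a sketch of the idea.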
Got an OEM internship offer, hesitating whether to take it...
自动驾驶之心· 2025-07-04 04:27
Core Insights
- The article emphasizes the importance of internships for recent graduates, particularly in autonomous driving and related technologies, arguing that any internship experience is better than none [2][3].
- The AutoRobo Knowledge Community was established to provide a platform for job seekers in the autonomous driving and robotics sectors, facilitating connections and sharing valuable resources [4].

Group 1: Internship Guidance
- The article advises graduates to pursue internships even when the positions are not ideal, as the experience is crucial for future job applications [2][3].
- It highlights the competitive job market, where companies prefer candidates with relevant internship experience [3].

Group 2: AutoRobo Knowledge Community
- The community assists job seekers in autonomous driving, embodied intelligence, and robotics, with nearly 1,000 members including professionals and students [4].
- It offers resources such as interview questions, industry reports, salary-negotiation tips, and resume-optimization services [4][6][11].

Group 3: Industry Insights
- The community provides access to industry reports that help members understand the current state and future prospects of the autonomous driving and robotics sectors [17][21].
- Specific topics covered include trajectory prediction, occupancy perception, and end-to-end autonomous driving, among others [12][16].
A review of the latest progress in RL for VLA models
自动驾驶之心· 2025-07-03 12:41
Core Viewpoint
- The article discusses recent advancements in Vision-Language-Action (VLA) models, focusing on the integration of reinforcement learning (RL) techniques to enhance their performance and stability across tasks [1].

Group 1: Early Exploration with iRe-VLA
- The core algorithm of iRe-VLA is PPO, with a two-stage training paradigm introduced to address instability in online reinforcement learning [2].
- The implementation uses BLIP-2 3B as the VLM backbone, replacing the final fully connected layer with an action head consisting of a token learner and an MLP [2].
- The experimental setup uses simulation environments such as Meta-World and Franka Kitchen, with tasks divided into three categories for evaluation [2].

Group 2: Preference Alignment with GRAPE
- GRAPE introduces preference alignment into VLA training, designed specifically around VLA characteristics [6].
- The reward for each trajectory combines three parts: a success reward, a self-reward, and an external reward based on a custom cost function [8].
- The external reward is computed by decomposing trajectories into stages and evaluating them with a VLM task decomposer [9].

Group 3: LOOP and RIPT-VLA
- LOOP combines RLOO and PPO to address sparse rewards and long sequences in multi-task scenarios [11].
- RIPT-VLA employs the LOOP algorithm for online RL and provides open-source code [13].
- The approach includes several tricks to improve training efficiency, such as dynamic rejection mechanisms and multi-task sampling [15].

Group 4: System and Algorithm Innovations in RL4VLA
- RL4VLA models action generation as a multimodal dialogue, using PPO training with dense pseudo-rewards to guide learning [18].
- Training involves a Robotic Process Reward Model that predicts the likelihood of action sequences, enriching the reward signal [20].
- Adaptive curriculum selection strategies are emphasized to improve sample efficiency and generalization [21][23].

Group 5: Engineering Challenges and Future Directions
- The article highlights the need for new RL algorithms suited to VLA-RL, particularly for sparse rewards and sample efficiency [30].
- It points out engineering challenges in improving sampling efficiency and managing memory costs in VLA scenarios [30].
- Effective reward design and applying RL to non-autoregressive VLA architectures are identified as key directions for future research [30].
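As context for Group 3, RLOO's leave-one-out baseline (which LOOP combines with PPO-style updates) can be sketched in a few lines; the function name and the example rewards are illustrative, not from any of the papers:

```python
import numpy as np

def rloo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out baseline (RLOO): each rollout's advantage is its
    reward minus the mean reward of the *other* rollouts in the group."""
    n = rewards.size
    total = rewards.sum()
    baseline = (total - rewards) / (n - 1)  # mean of the other n-1 rewards
    return rewards - baseline

# Four rollouts of the same task prompt with sparse success rewards.
r = np.array([1.0, 0.0, 0.0, 1.0])
print(rloo_advantages(r))
```

Because the baseline is computed from sibling rollouts rather than a learned value network, the estimator stays unbiased and is cheap for the sparse-reward, long-horizon settings the article describes.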
After months of grinding, the new end-to-end closed-loop simulation system is finally in use.
自动驾驶之心· 2025-07-03 12:41
Core Viewpoint
- The article discusses the development and implementation of the Street Gaussians algorithm for dynamic scene representation in autonomous driving, highlighting its training and rendering efficiency compared to previous methods [2][3].

Group 1: Background and Challenges
- Previous methods suffered from slow training and rendering as well as inaccurate vehicle-pose tracking [3].
- Street Gaussians aims to generate realistic images for view synthesis in dynamic urban street scenes by modeling them as a combination of foreground moving vehicles and a static background [3][4].

Group 2: Technical Implementation
- The background model is a set of points in world coordinates, each assigned a 3D Gaussian representing geometry and color, with parameters optimized to avoid invalid values [8].
- The object model for moving vehicles includes a set of optimizable tracking poses and point clouds, with Gaussian attributes similar to the background model but defined in local object coordinates [11].
- A 4D spherical-harmonics model encodes temporal information into the appearance of moving vehicles without high storage cost [12].

Group 3: Initialization and Data Handling
- Street Gaussians initializes from aggregated LiDAR point clouds, addressing the limitations of traditional SfM point clouds in urban environments [17].
- For objects with fewer than 2,000 LiDAR points, random sampling is employed to ensure sufficient data for model initialization [17].

Group 4: Course and Learning Opportunities
- The article promotes a specialized course on 3D Gaussian Splatting (3DGS), covering its subfields and practical applications in autonomous driving, aimed at building both understanding and implementation skills [26][35].
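The random-sampling fallback for sparsely observed objects (Group 3) might look like the following; the article does not specify the exact scheme, so the resampling strategy, target count, and names here are assumptions:

```python
import numpy as np

MIN_POINTS = 2000  # threshold quoted in the article

def init_object_points(lidar_pts: np.ndarray, seed: int = 0) -> np.ndarray:
    """Pad an object's aggregated LiDAR points by randomly resampling
    existing points until at least MIN_POINTS are available for
    initializing the per-object Gaussians."""
    rng = np.random.default_rng(seed)
    n = lidar_pts.shape[0]
    if n >= MIN_POINTS:
        return lidar_pts
    extra_idx = rng.integers(0, n, size=MIN_POINTS - n)
    return np.concatenate([lidar_pts, lidar_pts[extra_idx]], axis=0)

sparse = np.random.default_rng(1).random((500, 3))  # a vehicle with only 500 returns
dense = init_object_points(sparse)
print(dense.shape)  # (2000, 3)
```

An alternative reading of "random sampling" is drawing fresh points uniformly inside the object's 3D bounding box rather than duplicating observed returns; either way the goal is enough seed points for stable Gaussian optimization.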
Have you ever been stuck for over a week on a bug you only later realized was fatal?
自动驾驶之心· 2025-07-03 12:41
Core Insights
- The article discusses the challenges and experiences of training AI models with reinforcement learning, highlighting the importance of reward design and the pitfalls that can arise along the way [1][2].

Group 1: Reinforcement Learning Challenges
- The author recounts a project training a robot to run, where different reward structures led to unexpected behaviors, such as jumping too far and falling [1].
- The design of learning objectives is crucial: poorly defined goals produce models that do not behave as intended, for example generating repetitive outputs or failing to learn at all [2].

Group 2: AI Model Training Insights
- The robustness of neural networks lets them keep improving despite bugs in the code, which can lead to unexpected gains once the bugs are finally removed [2].
- The article emphasizes the collaborative nature of deep-learning projects, where even introduced bugs can inspire creative fixes from teammates [2].

Group 3: Community and Learning Resources
- The article mentions a community of nearly 4,000 members, including over 300 companies and research institutions in the autonomous driving sector, as a platform for learning and knowledge sharing [3].
- The community covers technical areas across autonomous driving, including perception, mapping, and control, reflecting a comprehensive educational approach [3].
Autonomous driving paper express | Latest ICCV papers, end-to-end, HD maps, world models, and more
自动驾驶之心· 2025-07-03 11:53
Core Insights
- The article covers advances in autonomous driving frameworks, highlighting the World4Drive, SafeMap, TopoStreamer, and BEV-VAE models, each improving a different aspect of the autonomous driving stack.

Group 1: World4Drive Framework
- World4Drive, developed by CASIA and Li Auto, integrates spatial semantic priors with multimodal driving-intention modeling, achieving an 18.1% reduction in L2 error (0.61 m to 0.50 m) and a 46.7% decrease in collision rate (0.30% to 0.16%) [2][3].
- It introduces an intention-aware latent world model that simulates how the physical world evolves under different driving intentions, closely mirroring human decision-making [3].
- The framework reaches state-of-the-art planning performance without perception annotations, with a 3.75x speedup in training convergence [3].

Group 2: SafeMap Framework
- SafeMap, proposed by Tsinghua University and collaborators, uses dynamic Gaussian sampling and panoramic feature distillation to build robust high-definition maps from incomplete observations, achieving an 11.1% mAP improvement when key views are missing [9][10].
- It features two innovative modules: G-PVR for perspective-view reconstruction and D-BEVC for correcting bird's-eye-view features, maintaining map accuracy even with missing camera views [10].
- Experiments show SafeMap significantly outperforms existing methods, offering a plug-and-play way to improve map robustness [10].

Group 3: TopoStreamer Model
- TopoStreamer, developed by CUHK and Tencent, addresses temporal-consistency challenges in lane-topology reasoning, achieving a 3.4% improvement in lane-segment perception mAP (to 36.6%) and a 2.1% gain in centerline perception OLS (to 44.4%) [18][21].
- It introduces three modules that enforce temporal consistency of lane attributes and improve feature representation learning [21].
- TopoStreamer achieves state-of-the-art lane-segment topology reasoning on the OpenLane-V2 benchmark [21].

Group 4: BEV-VAE Framework
- BEV-VAE, proposed by the Shanghai Qi Zhi Institute and Tsinghua University, constructs a bird's-eye-view latent space for multi-view image generation with precise 3D layout control, achieving a multi-view spatial consistency (MVSC) score of 0.9505 on the Argoverse 2 dataset [29][31].
- It supports novel-view synthesis by adjusting camera poses and demonstrates strong cross-view consistency [34].
- The framework enables controllable synthesis conditioned on 3D object layouts, extending the capabilities of autonomous driving scene generation [34].
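As a sanity check on the World4Drive numbers in Group 1: the quoted 18.1% presumably comes from unrounded error values, since the rounded 0.61 m → 0.50 m figures give roughly 18.0%; the helper below is plain arithmetic:

```python
def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100

print(round(relative_change(0.61, 0.50), 1))  # L2 error: about -18.0
print(round(relative_change(0.30, 0.16), 1))  # collision rate: -46.7
```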
Tsinghua's latest RoboScape: a physics-informed embodied world model
自动驾驶之心· 2025-07-03 06:34
Core Viewpoint
- The article discusses RoboScape, a physics-informed embodied world model that improves video-generation quality by integrating physical knowledge into the modeling process, addressing existing models' limitations in physical perception and object manipulation [2][22].

Research Background and Core Issues
- Existing embodied-intelligence models have significant limitations in physical perception, particularly in contact-rich robot scenarios, leading to unrealistic object deformation and motion discontinuities [2].
- Current attempts to integrate physical knowledge fall into three categories: physical-prior regularization, knowledge distillation from physics simulators, and material-field modeling, each with its own limitations [2].

Core Method
- The focus is on learning an embodied world model as a dynamics function that predicts the next visual observation from past observations and robot actions [4].

Data Processing Pipeline
- A four-step pipeline constructs a multimodal embodied dataset with physical priors, built on the AGIBOT-World dataset [5].

RoboScape Model Architecture
- The architecture uses an autoregressive Transformer framework to generate controllable robot videos, integrating physical knowledge through two auxiliary tasks: physical attribute labeling and video slicing [7].

Temporal Depth Prediction
- To improve 3D geometric consistency, a temporal depth-prediction branch is added alongside the RGB prediction backbone, forming a dual-branch cooperative autoregressive Transformer [9].

Adaptive Keypoint Dynamics Learning
- The model self-supervises the tracking of contact-driven keypoints to implicitly encode material properties, improving the modeling of object deformation and motion patterns [10].

Joint Training Objectives
- The overall training objective integrates the individual loss functions, balancing the contributions of the different components [12].

Experimental Validation
- Performance is evaluated along three dimensions: appearance fidelity, geometric consistency, and action controllability, with results superior to baseline models [14][20].

Dataset and Implementation Details
- The dataset comprises 50,000 video segments covering 147 tasks and 72 skills, with training conducted on 32 NVIDIA A800 GPUs over five epochs [15].

Downstream Application Validation
- In robot policy training, the model performs close to training on real data, indicating the effectiveness of its synthetic data for complex tasks [18].

Conclusion and Future Plans
- RoboScape integrates physical knowledge into video generation without relying on external physics engines; future plans include combining generative world models with real robots for validation in practical scenarios [22][23].
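The joint objective that "integrates various loss functions to balance the contributions of different components" is, in its simplest form, a weighted sum over the RGB, depth, and keypoint branches; the weights and names below are assumptions, not RoboScape's actual values:

```python
def joint_loss(l_rgb: float, l_depth: float, l_keypoint: float,
               w_depth: float = 0.5, w_keypoint: float = 0.1) -> float:
    """Weighted sum of the per-branch losses; the weights are
    illustrative placeholders balancing the three components."""
    return l_rgb + w_depth * l_depth + w_keypoint * l_keypoint

print(joint_loss(1.0, 0.4, 2.0))  # 1.0 + 0.5*0.4 + 0.1*2.0 = 1.4
```

In practice such weights are tuned (or annealed during training) so that the auxiliary depth and keypoint signals regularize the RGB prediction without dominating it.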
Graduating with a PhD now starts at five top-conference papers...
自动驾驶之心· 2025-07-03 06:34
Core Viewpoint
- The article emphasizes that timely submission of high-quality research papers is key to academic success, particularly in autonomous driving and AI research, and offers a structured 1v1 guidance program to help researchers navigate the research and publication process [2][3].

Group 1: Pain Points Addressed
- The program addresses the lack of guidance and structured support for researchers, particularly those left to navigate their research independently [6].
- It helps students establish a clear research framework and improve practical skills by combining theoretical models with coding practice [6][13].
- The service targets computer-science students at all academic levels who want to strengthen their research capability and academic record [6][13].

Group 2: Course Content
- The 1v1 paper guidance covers every stage: topic selection, experimental design, writing, and submission [5][9][11][12].
- In the topic-selection phase, mentors help students brainstorm ideas or suggest topics directly based on their needs [7].
- In the experimental phase, mentors guide students through the full process, ensuring the feasibility and quality of their experiments [9][14].
- The writing phase focuses on crafting compelling papers that meet high standards [11][15].
- In the submission phase, mentors recommend suitable venues and assist with the submission process [12][16].

Group 3: Course Outcomes
- Participants can expect to produce high-quality papers tailored to their target publication venues [23].
- The program deepens participants' understanding of the research process, writing techniques, and publication strategy [23][24].
- Students also gain exposure to cutting-edge technologies and research trends in their fields [23][24].

Group 4: Course Structure and Duration
- The total guidance period runs from 3 to 18 months, depending on the target publication level [24].
- The core guidance period includes weekly 1-on-1 sessions, and a maintenance period provides ongoing support after paper submission [26].
- Course hours are allocated by publication tier, with different session counts for different categories [24].
After gritting my teeth for six months, I'm content to have landed a job at a small company...
自动驾驶之心· 2025-07-02 13:54
Core Viewpoint
- The article discusses advances in AI technology, particularly autonomous driving and embodied intelligence, noting the saturation of the autonomous driving job market and the challenges facing job seekers in the field [2].

Group 1: Industry Overview
- The autonomous driving sector has seen major breakthroughs, with L2 to L4 functionality entering mass production alongside progress in humanoid and quadrupedal robots [2].
- Demand for technology and talent remains high, reflected in the founding of AutoRobo, a job-seeking community focused on autonomous driving, embodied intelligence, and robotics [2][3].

Group 2: Community and Resources
- The AutoRobo knowledge community has nearly 1,000 members, including professionals from companies such as Horizon Robotics, Li Auto, Huawei, and Xiaomi, as well as students preparing for upcoming job fairs [2][4].
- It provides resources such as interview questions, industry reports, salary-negotiation tips, and job referrals to help members run an effective job search [3][4].

Group 3: Interview Preparation
- The community has compiled a comprehensive list of interview questions across autonomous driving and embodied-intelligence topics, spanning algorithm, development, and product roles [9][10][11].
- Specific areas include multi-sensor fusion, perception algorithms, and decision-making processes, giving members practical insights for their applications [10][14].

Group 4: Industry Reports and Insights
- The community offers industry reports detailing the current state, development trends, and market opportunities in the autonomous driving and embodied-intelligence sectors [15][19].
- Reports cover trajectory prediction, occupancy perception, and the overall humanoid-robotics market landscape, helping members understand the industry's dynamics [15][19].
How do you find a job in traditional planning and control this year?
自动驾驶之心· 2025-07-02 13:54
Core Viewpoint
- The article emphasizes the evolving autonomous driving landscape, highlighting the integration of traditional planning and control with end-to-end systems and the importance of tracking industry trends for job seekers in the field [2][4][29].

Group 1: Industry Trends
- The shift toward end-to-end and VLA (Vision-Language-Action) systems is reshaping traditional planning-and-control roles, which remain essential for safety-critical applications such as L4 autonomous driving [2][4][29].
- Interviews increasingly emphasize combining rule-based algorithms with end-to-end approaches, so candidates need proficiency in both [3][4].

Group 2: Educational Offerings
- The company has launched specialized courses addressing real-world challenges in autonomous driving planning and control, with a focus on practical applications and interview preparation [5][7][10].
- The courses provide hands-on experience with industry-relevant projects, strengthening participants' resumes and job prospects [8][10][12].

Group 3: Course Structure
- The curriculum covers foundational algorithms, decision-making frameworks, and advanced topics such as contingency planning and interactive planning, for a comprehensive grounding in the field [20][21][24][26][29].
- The course also includes interview coaching, resume polishing, and personalized guidance from industry experts aimed at improving employability [31][34][36].

Group 4: Target Audience
- The courses are tailored to people with backgrounds in vehicle engineering, automation, computer science, and related fields, as well as those transitioning into autonomous driving roles [37][39].
- Participants should have basic programming skills and the relevant mathematical background to benefit fully from the training [38][39].