自动驾驶之心
From nine NeurIPS papers, we read three clear directions in 3D rendering and reconstruction
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article discusses advancements in 3D rendering and reconstruction, particularly dynamic scene reconstruction and the integration of generative and editable 3D assets. It highlights the shift from merely rendering to creating and manipulating 3D environments, emphasizing efficiency, stability, and usability in real-world applications [2][60]

Group 1: Dynamic Scene and Temporal Reconstruction
- Research in dynamic scene reconstruction aims not only to rebuild static geometry but also to express, compress, and render changes over time, effectively creating a 4D representation [2][4]
- The ReCon-GS framework improves training efficiency by approximately 15%, halves memory usage at the same visual quality, and enhances the stability and robustness of free-viewpoint video (FVV) synthesis [5][6]
- ProDyG introduces a closed-loop system for tracking, mapping, and rendering, achieving dynamic SLAM-level camera tracking and improved stability on long sequences [10][12]

Group 2: Structural Innovations in Gaussian Splatting
- The research focuses on making 3D Gaussian Splatting (3DGS) deployable and maintainable, ensuring that large scenes do not exceed memory limits and can run on mobile devices [20][21]
- The LODGE framework improves the usability of large-scale 3DGS rendering by integrating Level-of-Detail (LOD) techniques, lowering latency and memory usage [23][24]
- The Gaussian Herding across Pens method achieves near-lossless quality while retaining only about 10% of the original Gaussians, providing a mathematically grounded approach to global compression [28][29]

Group 3: Generative and Editable 3D
- The focus of generative and editable 3D research is not only to recreate real-world scenes but also to generate new assets, allowing component splitting, rigging, animation, and material modification [42][44]
- The PhysX-3D framework emphasizes generating 3D assets that are not only visually appealing but also functional for physical simulation and robotics applications [46][47]
- The PartCrafter model enables the generation of modular 3D meshes that can be easily edited and rearranged, improving the efficiency of asset creation [48][50]

Group 4: Current Trends and Future Directions
- Current research trends point clearly toward making dynamic reconstruction more efficient and stable, refining Gaussian methods for practical deployment, and expanding the capabilities of 3D asset generation and editing [60]
- Evaluation criteria for these technologies are evolving to include not just clarity or scores but also latency, bandwidth, energy consumption, stability, and editability, which are crucial for real-world applications [60]
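The compression results above hinge on discarding Gaussians that contribute little to the rendered image. The article does not describe the actual algorithm behind the ~10% retention figure, but the general idea can be sketched as an importance-based pruning pass; the score used here (opacity weighted by mean scale) and all function names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def prune_gaussians(opacity, scales, keep_ratio=0.1):
    """Rank Gaussians by a crude importance proxy and keep the top fraction.

    opacity: (N,) per-Gaussian opacity in [0, 1]
    scales:  (N, 3) per-Gaussian axis scales
    Returns the indices of the Gaussians to retain.
    """
    # Proxy for visual contribution: opacity times a mean-scale term.
    # Real pipelines typically accumulate per-pixel blending weights
    # over training views instead of this closed-form guess.
    importance = opacity * scales.prod(axis=1) ** (1.0 / 3.0)
    n_keep = max(1, int(len(opacity) * keep_ratio))
    return np.argsort(importance)[-n_keep:]

# Toy example: 1000 random Gaussians, keep ~10%
rng = np.random.default_rng(0)
kept = prune_gaussians(rng.random(1000), rng.random((1000, 3)))
print(len(kept))  # 100
```

A global method like the one the article describes would additionally compensate the remaining Gaussians so the rendered image stays near-lossless; simple top-k pruning alone does not do that.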
A 4,000-member autonomous driving technology community, and the consulting it provides every day...
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article emphasizes the importance of making learning engaging and serving as a bridge between industry and educational institutions, particularly in AI and autonomous driving [1]

Group 1: Community and Resources
- The community has created a comprehensive platform for academic and industrial exchange, providing access to cutting-edge content, industry insights, and job opportunities [2][12]
- The platform has compiled over 40 technical routes and invited numerous industry experts to answer questions and provide guidance [2][15]
- Members can access a variety of resources, including open-source projects, datasets, and learning paths tailored to different levels of expertise [15][30][32]

Group 2: Learning Pathways
- The community offers structured learning pathways for beginner, intermediate, and advanced learners in autonomous driving technologies [8][10][16]
- Specific learning routes cover areas such as perception, simulation, and planning and control, catering to both academic and practical applications [15][34]
- The platform also provides a detailed overview of the latest trends and technologies in autonomous driving, including VLA (Vision-Language-Action) models and world models [42][38]

Group 3: Networking and Collaboration
- The community facilitates networking among members from prestigious universities and leading companies in the autonomous driving sector [15][26]
- Regular live sessions and discussions with industry leaders are organized to enhance knowledge sharing and collaboration [79][80]
- Members are encouraged to discuss career choices and research directions, fostering a supportive environment for professional growth [80][82]
Li Xiang: Tesla's V14 also uses the same technology as VLA
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [17][18]

Group 1: Stages of AI Development
- The first stage is Chatbots: foundational models that compress human knowledge, akin to a person completing their education [19][4]
- The second stage is Reasoners, which use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform continuous reasoning tasks, similar to advanced academic training [20][21]
- The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of professionalism and reliability, comparable to a person in a specialized job [22][23]
- The fourth stage is Innovators, focused on the ability to pose and solve problems through real-world training and feedback, which is essential for advancing AI capabilities [25][26]
- The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, much as businesses manage human resources [27][28]

Group 2: Computational Needs
- Demand for inference compute is expected to grow 100-fold over the next five years, while training compute may expand 10-fold [10][29]
- The article highlights the need for both edge computing and cloud processing to support the various stages of AI development [28][29]

Group 3: Li Auto's Applications
- The company is developing its own reasoning models (MindVLA/MindGPT) and agents (Driver Agent/Ideal Classmate Agent) to enhance its autonomous driving capabilities [31][33]
- By 2026, the company plans to equip its autonomous vehicles with self-developed advanced edge chips for deeper integration with AI [12][33]

Group 4: Training and Skill Development
- Effective training for AI involves enhancing three key abilities: information processing, problem formulation and solving, and resource allocation [39][40][41]
- The article emphasizes that successful AI applications require extensive training, akin to the 10,000 hours of practice needed for mastery of a profession [36][42]
A month of intensive RL practice and reflection: how do you actually improve scores?
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article discusses recent advances and challenges in reinforcement learning (RL) for vision-language models (VLMs), emphasizing the importance of foundational work and iterative improvement in achieving performance gains [2][4]

RL Goals
- The primary objectives for RL on VLMs are a 1-2 point increase in overall performance over the SFT model version, and gains exceeding 1-2 points on specific benchmarks such as mathematics and instruction following [5]

RL Overall Approach
- The essence of RL is to improve sampling efficiency rather than teach the base model new knowledge; given unlimited attempts, the base model can exceed the RL model's probability of producing a correct response [7][8]

Challenges in VLM RL
- Key challenges include selecting an efficient RL algorithm, high infrastructure requirements, and RL's sensitivity to data quality and organization [10][12]

Data Organization
- Effective data organization is crucial, requiring a balanced mix of tasks and high-quality input data. Output length is also strongly tied to the RL algorithm used, so the characteristics of the training data need careful consideration [13][14]

Key Findings and Conclusions
- Short responses hurt training effectiveness, and it is essential to construct response pairs with a clear distinction between acceptable and rejectable outputs. The article stresses meticulous data checking and the absence of a "silver bullet" solution [19][24]
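Two of the findings above are that very short responses hurt training and that preference pairs need a clear distinction between acceptable and rejectable outputs. A minimal sketch of such a filtering step (the thresholds, function name, and reward scale are assumptions for illustration, not from the article):

```python
def build_preference_pair(samples, min_len=30, min_gap=0.3):
    """Turn sampled responses for one prompt into a (chosen, rejected) pair.

    samples: list of (response_text, reward) tuples for a single prompt
    min_len: drop very short responses, which the article notes hurt training
    min_gap: require a clear reward margin so the pair is genuinely distinct
    Returns (chosen, rejected) or None if no clean pair exists.
    """
    usable = [(r, s) for r, s in samples if len(r) >= min_len]
    if len(usable) < 2:
        return None
    usable.sort(key=lambda x: x[1])       # ascending by reward
    worst, best = usable[0], usable[-1]
    if best[1] - worst[1] < min_gap:
        return None                        # near-tie: no clear distinction
    return best[0], worst[0]

# Toy usage: the short response is dropped, the clear best/worst pair kept
pair = build_preference_pair([("a" * 40, 0.9), ("b" * 40, 0.1), ("short", 1.0)])
```

Prompts that yield no pair are simply skipped, which matches the article's emphasis on checking data rather than forcing every sample into training.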
From a contrastive learning perspective, is GRPO just DPO?
自动驾驶之心· 2025-10-18 16:03
Core Insights
- The article discusses the development of efficient GRPO (Group Relative Policy Optimization) and its implications for reinforcement learning, highlighting the challenges and breakthroughs encountered during the research process [1][2]

Group 1: Research Development
- The initial focus was on improving GRPO's speed, with an emphasis on sampling efficiency, a common challenge in reinforcement learning [2][3]
- The author experimented with tree-based sampling methods but found they did not yield the expected efficiency improvements [3]
- A second approach, "speculative sampling," aimed to exit early once a correct sample was obtained, but implementation challenges hindered its performance [3][4]

Group 2: Methodological Innovations
- The third approach used historical data to estimate the probability that a prompt would be answered correctly, leading to a more efficient, Bayesian sampling strategy [4]
- Experiments showed that reducing the number of rollouts per prompt did not significantly hurt performance, indicating the method's robustness [4][5]
- Exploring contrastive learning principles yielded insights into the relationship between DPO (Direct Preference Optimization) and GRPO, suggesting avenues for further research [5]

Group 3: Community and Collaboration
- The article emphasizes the importance of community engagement in advancing research, highlighting the role of discussion and collaboration in refining ideas and methodologies [8][10]
- The establishment of a comprehensive community focused on large-model technologies aims to facilitate knowledge sharing and collaboration across domains, from academic research to practical applications [9][10]
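The Bayesian idea in Group 2 can be sketched with a Beta posterior over each prompt's pass rate, estimated from historical rollouts: prompts the model always or never solves give a GRPO group zero advantage (every reward in the group is identical), so rollouts are better spent on prompts in between. This is an illustrative reconstruction under assumed priors and thresholds, not the author's code:

```python
def expected_pass_rate(successes, failures, alpha=1.0, beta=1.0):
    """Posterior mean of a prompt's pass rate under a Beta(alpha, beta) prior."""
    return (successes + alpha) / (successes + failures + alpha + beta)

def select_prompts(history, low=0.1, high=0.9):
    """Keep prompts whose estimated pass rate is informative for GRPO.

    history: dict mapping prompt -> (successes, failures) from past rollouts
    Prompts near pass rate 0 or 1 tend to produce all-identical rewards,
    hence zero group-relative advantage, so sampling them is wasted compute.
    """
    return [p for p, (s, f) in history.items()
            if low < expected_pass_rate(s, f) < high]

history = {"p_easy": (10, 0), "p_mid": (3, 3), "p_hard": (0, 10)}
print(select_prompts(history))  # ['p_mid']
```

With a uniform Beta(1, 1) prior, a prompt with 10/10 successes still has posterior mean 11/12, so it is pruned but not assumed perfectly solved; the prior keeps sparse histories from being over-trusted.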
Multiple intelligent-driving executives depart a new-force automaker...
自动驾驶之心· 2025-10-18 16:03
Core Insights
- Multiple high-level executives have recently left NIO's autonomous driving division, indicating potential instability within the company [4][9]
- The departures include key figures responsible for product development, technology platforms, and future innovation, which could affect NIO's strategic direction [5][9]
- NIO says the changes are part of an "active organizational restructuring" aimed at better integrating general artificial intelligence technologies into its autonomous driving experience [11]

Executive Departures
- Huang Xin, a senior product manager in the autonomous driving field, previously worked at XPeng Motors and joined NIO in 2022 as Vice President [6]
- Bai Yuli, who joined NIO in 2020, was responsible for the artificial intelligence platform and also led the cloud engineering department [7]
- Ma Ningning, who played a crucial role in developing NIO's core technology concept, the world model, has also left [8]

Impact on Autonomous Driving Strategy
- The exits affect four core areas of NIO's autonomous driving business: product, platform, algorithms, and future development [11]
- NIO is restructuring its autonomous driving department to align with advances in general artificial intelligence, aiming to improve the development and delivery of its autonomous driving experience [11]

Future Developments
- NIO plans to roll out iterations of world model 2.0 from late this year through the first quarter of next year, indicating a continued commitment to innovation despite the leadership changes [13]
- The ambition behind the world model is to enable the system to learn spatial and physical laws, enhancing its understanding of the environment [11]

Industry Trends
- Significant organizational changes across companies in the automotive sector suggest a potential shift in the landscape of autonomous driving technology [14]
Class starts tomorrow! A learning roadmap for the three VLA systems in autonomous driving: algorithms + practice
自动驾驶之心· 2025-10-18 16:03
Core Insights
- The focus of academia and industry is shifting toward VLA (Vision-Language-Action) models for enhancing autonomous driving capabilities, providing human-like reasoning in vehicle decision-making [1][4]
- Traditional methods in perception and lane detection are maturing, leading to declining interest, while VLA is seen as a critical development area by major players in the autonomous driving sector [4]

Summary by Sections

Introduction to VLA
- VLA is categorized into modular VLA, integrated VLA, and reasoning-enhanced VLA, which are essential for improving the reliability and safety of autonomous driving [1][4]

Course Overview
- A comprehensive learning roadmap for VLA has been designed, covering principles through practical applications, with a focus on core areas such as visual perception, large language models, action modeling, and dataset creation [6]

Course Content
- The course includes detailed explanations of cutting-edge techniques such as CoT, MoE, RAG, and reinforcement learning, aimed at deepening understanding of autonomous driving perception systems [6]

Course Structure
- The course is structured into six chapters, each focusing on a different aspect of VLA: algorithm introduction, foundational algorithms, VLM as an interpreter, modular and integrated VLA, reasoning-enhanced VLA, and a final project [12][20]

Chapter Highlights
- Chapter 1 provides an overview of VLA algorithms and their development history, along with benchmarks and evaluation metrics [13]
- Chapter 2 delves into foundational algorithms for Vision, Language, and Action, and discusses deploying large models [14]
- Chapter 3 focuses on VLM's role as an interpreter in autonomous driving, covering classic and recent algorithms [15]
- Chapter 4 discusses modular and integrated VLA, emphasizing the evolution of language models in planning and control [16]
- Chapter 5 explores reasoning-enhanced VLA, introducing new modules for decision-making and action output [17]
- Chapter 6 is a hands-on project in which participants build and fine-tune their own VLA models [20]

Learning Outcomes
- The course aims to provide a deep understanding of current advances in VLA across three main subfields: VLM as an interpreter, modular and integrated VLA, and reasoning-enhanced VLA [24]
- Participants will gain insight into key AI technologies such as visual perception, multimodal large models, and reinforcement learning, enabling them to apply their knowledge in practical projects [24]
Xiaomi's latest large-model work! Luo Fuli makes an appearance
自动驾驶之心· 2025-10-18 16:03
Core Insights
- Xiaomi's AI team, in collaboration with Peking University, has published a paper on MoE (Mixture of Experts) and reinforcement learning, presenting new advances in large-model training [2][8]

Group 1: Research Findings
- The paper proposes a novel approach to improving the stability and efficiency of large-model reinforcement learning within the MoE framework [8][10]
- Current reinforcement learning methods struggle to balance efficiency and stability, often leading to catastrophic failures during training [14][24]
- The research introduces Rollout Routing Replay (R3), which locks the routing distribution during inference and reuses it during training, ensuring consistency between the two phases [30][31]

Group 2: Experimental Results
- Experiments on the Qwen3-30B-A3B model show that R3 consistently outperforms other methods across various metrics, achieving higher scores in multiple scenarios [41][42]
- R3 significantly reduces training crashes, maintaining a stable performance curve even after extended training [44][48]
- R3 not only stabilizes the model but also accelerates optimization, allowing effective strategies to be identified more quickly [50]

Group 3: Team and Contributors
- The research team includes notable contributors such as Wenhan Ma, a researcher on Xiaomi's LLM-Core team, and Luo Fuli, who has a strong academic background and has previously worked on major AI projects [52][59]
- The paper also acknowledges contributions from Professor Sui Zhifang of Peking University, who has extensive experience in computational linguistics and AI research [62][66]
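As described above, R3 records the routing decisions made during rollout and replays them during training so the two phases agree. A toy sketch of that replay mechanism for a top-k MoE router follows; the class, its API, and the numpy gate are invented for illustration and are not the paper's implementation:

```python
import numpy as np

class ReplayRouter:
    """Toy top-k MoE router illustrating the replay idea behind R3:
    record which experts each token was routed to during rollout, then
    reuse those choices at training time so inference and training see
    identical routing. Names and shapes are illustrative assumptions."""

    def __init__(self, dim, n_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((dim, n_experts))  # gate weights
        self.top_k = top_k
        self.cache = {}  # seq_id -> recorded expert indices

    def route(self, x, seq_id, replay=False):
        logits = x @ self.W                       # (tokens, n_experts)
        if replay and seq_id in self.cache:
            idx = self.cache[seq_id]              # replay rollout routing
        else:
            idx = np.argsort(logits, axis=-1)[:, -self.top_k:]
            self.cache[seq_id] = idx              # record during rollout
        # Gate weights over the chosen experts are recomputed from fresh
        # logits, so a training step can still update the gate even while
        # the expert *choice* itself is replayed.
        chosen = np.take_along_axis(logits, idx, axis=-1)
        weights = np.exp(chosen) / np.exp(chosen).sum(-1, keepdims=True)
        return idx, weights

router = ReplayRouter(dim=8, n_experts=4)
x = np.random.default_rng(1).standard_normal((3, 8))
idx_rollout, _ = router.route(x, seq_id="s0")             # rollout pass
idx_train, _ = router.route(x, seq_id="s0", replay=True)  # training pass
print(np.array_equal(idx_rollout, idx_train))  # True
```

Without replay, numerical drift or parameter updates between rollout and training can route the same token to different experts in the two phases; pinning the indices removes that source of train/inference mismatch, which matches the stability motivation described above.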
A DJI Zhuoyu perception algorithm engineer interview
自动驾驶之心· 2025-10-18 16:03
Core Viewpoint
- The article discusses the recruitment process and qualifications for a dynamic-target perception algorithm engineer in the autonomous driving industry, highlighting the importance of a range of technical skills and experience in sensor fusion and deep learning [4][6][8]

Group 1: Job Responsibilities
- The role involves processing large amounts of autonomous driving data, building automated ground-truth labeling systems, and designing cutting-edge AI and vision technologies [6]
- Responsibilities include detecting static scene elements such as lane lines and traffic signs, tracking dynamic targets, and predicting the future trajectories and intentions of moving objects [8]
- The engineer will work on multi-sensor fusion, depth estimation, and developing calibration methods for various sensors [8]

Group 2: Qualifications
- Candidates should hold a master's degree in computer science, automation, mathematics, or a related field; experience with perception algorithms for autonomous driving or ADAS systems is a plus [6]
- Proficiency in C++ or Python, along with solid knowledge of algorithms and data structures, is required [8]
- Familiarity with multi-view geometry, computer vision, deep learning, and filtering and optimization algorithms is essential [8]

Group 3: Community and Learning Resources
- The article mentions a community of nearly 4,000 members spanning over 300 autonomous driving companies and research institutions, providing a comprehensive learning path across autonomous driving technologies [9]
- Topics covered include large models, end-to-end autonomous driving, sensor calibration, and multi-sensor fusion [9]
How much innovation is there in AI Agents, really?
自动驾驶之心· 2025-10-18 04:00
Core Insights
- The article discusses the current limitations and challenges of AI agent technologies, particularly in comparison to traditional task bots, arguing that the user experience has not improved significantly over the past decade [1][2]

Group 1: Planning Challenges
- The planning phase is time-consuming, and as the number of tools grows, the accuracy of turbo-class models declines, forcing a fallback to flagship models that further increases latency [2][5]
- Planning quality is insufficient: the workflows models generate are less effective than those designed by humans, particularly in complex scenarios [2][8]
- The core issue behind slow planning is underestimating the cost of tool discovery and parameter alignment, which turns dynamic tool selection into a complex optimization problem [5][21]

Group 2: Reflection Issues
- Reflection can fall into self-reinforcing cycles of inefficiency due to the lack of fine-grained computable signals and clear stopping conditions [3][15]
- Current models rely on weak feedback mechanisms, which can reinforce incorrect assumptions rather than correct errors [15][20]
- Proposed solutions include structured reflection processes that let models learn from mistakes and improve through reinforcement learning [18][20]

Group 3: Engineering Solutions
- Suggestions for improving planning quality include decomposing plans into milestones and local prompts, which improves stability and reusability [8][10]
- Executing non-dependent tool calls in parallel reduces overall processing time, with evidence showing a 20% reduction [6][21]
- Routing strategies can streamline execution by directing simpler tasks to specialized executors, reserving complex planning for stronger reasoning models [6][21]
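The parallel-execution suggestion in Group 3 is straightforward to sketch: tool calls with no data dependencies can be issued concurrently, so wall time approaches the slowest single call rather than the sum of all calls. The tool names and delays below are made up for illustration:

```python
import asyncio

async def call_tool(name, delay):
    # Stand-in for a real tool call (HTTP request, function call, etc.);
    # the sleep simulates the tool's latency.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_plan():
    # These three calls have no data dependencies on one another, so
    # gather() runs them concurrently: total wall time is ~0.2s (the
    # slowest call) instead of ~0.45s (the sum).
    return await asyncio.gather(
        call_tool("search", 0.2),
        call_tool("weather", 0.1),
        call_tool("calendar", 0.15),
    )

results = asyncio.run(run_plan())
print(results)
```

Calls that do depend on an earlier result still have to be sequenced after it, which is why the article pairs this technique with milestone decomposition: the planner's job becomes identifying which steps are independent.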
Group 4: Future Directions
- The article emphasizes combining reinforcement learning with agent models to enhance their reasoning and execution capabilities, indicating a trend toward end-to-end learning approaches [20][21]
- The potential for AI agents to become valuable applications of large language models (LLMs) in real-world scenarios is highlighted, with continued improvement expected as models evolve [21]