自动驾驶之心 - filings, earnings calls, financial reports, news

Share your insights! Seeking autonomous driving enthusiasts scattered everywhere (product / 4D annotation / world models, etc.)

自动驾驶之心· 2025-10-27 09:14

Core Viewpoint - The article emphasizes the need for collaboration in the autonomous driving industry, inviting professionals to take part in training, course development, and research support to drive industry progress [2].
Group 1: Collaboration and Opportunities
- The company is seeking partners among autonomous driving professionals to strengthen its training and job-assistance initiatives [2].
- Collaborators will receive high compensation and access to abundant industry resources [3].
- The main focus areas include autonomous driving product management, 4D annotation/data loop, world models, VLA, autonomous driving large models, reinforcement learning, and end-to-end systems [4].
Group 2: Training and Development
- Training collaborations target both B-end audiences (enterprises, universities, research institutes) and C-end audiences (students, job seekers) [5].
- The company is also interested in course development and original article writing as part of its training initiatives [5].
Group 3: Contact Information
- Interested parties can reach out via WeChat for further consultation [6].

Today's hot take: DeepSeek-OCR beats every architecture

自动驾驶之心· 2025-10-27 00:03

Core Viewpoint - DeepSeek has introduced a new model, DeepSeek-OCR, which sharply reduces the number of tokens needed to store and process information by using images, rather than text tokens alone, as the carrier of memory [3][6][12].
Group 1: Model Capabilities
- DeepSeek-OCR can store nearly the same amount of information as traditional models while using only one-tenth of the tokens [40][41].
- In tests, it used only 100 visual tokens to surpass GOT-OCR 2.0, which requires 256 tokens, and fewer than 800 visual tokens to outperform MinerU 2.0, which typically requires more than 6,000 tokens [13][14].
- The model supports multiple resolutions and compression modes, so it can adapt to documents of different complexity; simple documents need as few as 64 visual tokens [18][21].
Group 2: Data Collection and Utilization
- DeepSeek-OCR can extract information from two-dimensional content, such as charts and figures in academic papers, that traditional models could not interpret and that therefore went uncollected [32][33].
- It can generate more than 200,000 pages of training data per day on an A100 GPU, making it an efficient data-collection tool [35].
Group 3: Resource Efficiency
- Using images as memory lowers the computational load, cutting token usage substantially without sacrificing performance [40][41].
- The model maintains 96.5% accuracy at one-tenth of the original token count [41][42].
Group 4: Open Source and Community Contributions
- DeepSeek-OCR builds on a range of open-source resources, including Huawei's Wukong dataset and Meta's SAM for image feature extraction [51][53].
- Combining these open-source components allowed DeepSeek to build an AI that can "think in images," illustrating the power of community-driven innovation [53].
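The token-budget claims above reduce to simple ratios. The sketch below is illustrative only: the function names are made up, the 1,000-token page is hypothetical, and the 256/100 and 6,000/800 pairs are the figures quoted in the summary. The MinerU pair is a bound, since the article says "fewer than 800" and "more than 6,000", so the real saving is larger than computed.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (10.0 means '1/10 of the tokens')."""
    return text_tokens / vision_tokens


def token_savings(baseline_tokens: int, model_tokens: int) -> float:
    """Fraction of a baseline's token budget that the model avoids using."""
    return 1 - model_tokens / baseline_tokens


if __name__ == "__main__":
    # Hypothetical page holding 1,000 text tokens, stored as 100 vision tokens.
    print(f"compression: {compression_ratio(1000, 100):.0f}x")
    # Comparisons quoted in the article.
    print(f"{token_savings(256, 100):.1%} fewer tokens than GOT-OCR 2.0")
    print(f"more than {token_savings(6000, 800):.1%} fewer tokens than MinerU 2.0")
```

Expressed this way, "one-tenth of the tokens" is a 10x compression ratio, and the GOT-OCR 2.0 comparison is a saving of about 61% of the baseline's visual-token budget.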

Large models

Image memory

Open-source community

Artificial Intelligence

DeepSeek-OCR

SAM

Peking University's World-in-World: a closed-loop evaluation framework for embodied world models!

自动驾驶之心· 2025-10-27 00:03

Core Insights
- The article argues that the evaluation of world models in embodied intelligence needs to be redefined, because visual quality does not equal task effectiveness [5][26].
- The "World-in-World" platform assesses world models through closed-loop interaction, focusing on their practical utility rather than on visual fidelity alone [6][26].
Evaluation of World Models
- Current evaluation systems prioritize visual clarity and scene plausibility while neglecting whether a model actually helps an agent make decisions in real tasks [5][6].
- The platform introduces a closed loop of observation, decision, execution, and re-observation, enabling fair and practical assessment [6][7].
Model Compatibility and Decision-Making
- A unified action API standardizes the input to different world models so that all of them can be run on the same tasks [7].
- Decision-making proceeds in three phases: generating candidate proposals, simulating their outcomes, and selecting the best action for the task goal [8][13].
Experimental Findings
- Experiments with 12 mainstream world models showed that visual realism does not guarantee task success; how faithfully a model follows the input actions (action alignment) is what matters [18][20].
- Fine-tuning smaller models on task-specific data proved more effective than simply using larger pre-trained models, a cost-effective optimization strategy [21][23].
- Spending more compute on simulation significantly improved task success rates, suggesting that more extensive predictive modeling leads to better decisions [23].
Limitations and Future Directions
- Although the models do well in perception and navigation, they struggle with physical manipulation tasks because they do not model physics [25].
- The article concludes that future work should improve controllability, use task data for fine-tuning, and incorporate physical modeling to make world models more useful in robotics [26].
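The three-phase decision process behind a unified action interface can be sketched as follows. This is a toy under stated assumptions, not the World-in-World code: `WorldModel`, `ToyLineWorld`, `propose`, and `choose_plan` are hypothetical names, the "world model" is a number line rather than a video generator, and the scorer stands in for a task-specific goal check.

```python
from itertools import product
from typing import Callable, Dict, List, Protocol, Sequence, Tuple

Action = str               # shared action vocabulary (assumed)
Plan = Tuple[Action, ...]  # a short candidate action sequence
Obs = Dict[str, int]       # toy observation: the agent's position


class WorldModel(Protocol):
    """Unified action interface each world model is wrapped behind (hypothetical)."""
    def rollout(self, obs: Obs, plan: Sequence[Action]) -> Obs: ...


class ToyLineWorld:
    """Stand-in world model: predicts where an agent on a number line ends up."""
    MOVES = {"left": -1, "stay": 0, "right": 1}

    def rollout(self, obs: Obs, plan: Sequence[Action]) -> Obs:
        pos = obs["pos"]
        for action in plan:
            pos += self.MOVES[action]
        return {"pos": pos}


def propose(horizon: int) -> List[Plan]:
    """Phase 1: generate candidate plans (here, every sequence of length `horizon`)."""
    return list(product(ToyLineWorld.MOVES, repeat=horizon))


def choose_plan(obs: Obs, model: WorldModel, proposals: List[Plan],
                score: Callable[[Obs], float]) -> Plan:
    """Phases 2 and 3: imagine each plan's outcome, then pick the best-scoring one."""
    outcomes = [model.rollout(obs, plan) for plan in proposals]
    best = max(range(len(proposals)), key=lambda i: score(outcomes[i]))
    return proposals[best]


if __name__ == "__main__":
    goal = 3
    plan = choose_plan({"pos": 0}, ToyLineWorld(), propose(3),
                       score=lambda o: -abs(o["pos"] - goal))
    print(plan)
```

Because every model is called only through `rollout`, any world model wrapped behind that interface can be swapped in without changing the planner. Widening `propose` (more candidates or longer horizons) is where extra simulation compute would go in this sketch.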

Course officially wrapped up! Industry veterans lead a three-month program to master end-to-end autonomous driving

自动驾驶之心· 2025-10-27 00:03

Core Viewpoint - 2023 marked the first year of end-to-end mass production, and 2024 was expected to be a breakout year for end-to-end production in the automotive industry, as leading new EV makers and established manufacturers had already shipped end-to-end systems [1][3].
Group 1: End-to-End Production Development
- End-to-end methods are developing rapidly, particularly the one-stage approach exemplified by UniAD, which models vehicle trajectories directly from sensor inputs [1][3].
- The industry follows two main paradigms, one-stage and two-stage; the one-stage approach is gaining traction and has spawned derivatives based on perception, world models, diffusion models, and VLA [3][5].
Group 2: Course Overview
- The course "End-to-End and VLA Autonomous Driving" covers cutting-edge one-stage and two-stage end-to-end algorithms and aims to bridge academic and industrial advances [5][15].
- It is organized into chapters covering the history and evolution of end-to-end methods, background knowledge for VLA, and detailed treatments of the one-stage and two-stage approaches [9][10][12].
Group 3: Key Technologies
- The course emphasizes key technologies such as BEV perception, vision-language models (VLMs), diffusion models, and reinforcement learning, which are essential for mastering the latest advances in autonomous driving [5][11][19].
- The second chapter is billed as covering the technical keywords most likely to come up in job interviews over the next two years [10].
Group 4: Practical Applications
- The course includes hands-on assignments such as RLHF fine-tuning, in which participants apply what they have learned by building and experimenting with pre-training and reinforcement learning modules [13][19].
- The curriculum also covers the subfields of one-stage end-to-end methods, including those based on perception, world models, diffusion models, and VLA, giving a comprehensive view of the current landscape of autonomous driving technology [14][19].

Li Auto keeps breaking new ground in 2025: a roundup of the year's results...

自动驾驶之心· 2025-10-27 00:03

Core Insights
- Since mass-producing its end-to-end + VLM dual system last year, Li Auto has joined the top tier of China's smart driving players and leads in both academic work and mass-production solutions [3][4].
- The company is transforming from a new energy vehicle brand into an AI company, driven by advances in embodied intelligence and large models [3].
- Its VLA driver model uses a new architecture that strengthens spatial understanding, reasoning, communication, memory, and behavior [3][4].
VLA & VLM
- ReflectDrive applies discrete diffusion to build a reflective vision-language-action model for autonomous driving, aiming at scalable and efficient trajectory generation [8][13].
- OmniReason provides a temporally guided VLA framework that emphasizes causal reasoning across diverse driving scenarios [11][16].
- LightVLA introduces a differentiable token-pruning framework for VLA models that significantly reduces computation while improving success rates [14][17].
- DriveAgent-R1 targets human-like driving decisions with a hybrid thinking architecture that adapts to complex environments [19].
End-to-End Trajectory Generation
- World4Drive is an open-source VLA dataset covering diverse driving scenarios across 148 cities in China, providing high-quality, representative data [21][25].
- TransDiffuser improves trajectory generation with an end-to-end framework that integrates multimodal driving intentions without relying on perception annotations [23][26].
World Models
- RLGF proposes a reinforcement learning framework for generating driving videos that addresses geometric distortion in autonomous driving scenes [29][34].
- GeoDrive incorporates 3D point cloud rendering into the generation paradigm, improving spatial consistency and controllability [40].
Other Innovations
- TokenFLEX introduces a unified training framework for inference with dynamic visual token counts, making models robust across different token budgets [50].
- RuscaRL tackles exploration bottlenecks in reinforcement learning, promoting independent learning through structured external support [56].
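Visual token pruning, the idea behind LightVLA's efficiency gains, can be illustrated with a plain score-then-top-k selection. This is a swapped-in, simplified technique, not LightVLA's method: the article describes a differentiable framework, whereas this sketch uses a hard, non-differentiable top-k, and the dot-product `relevance` scorer and instruction vector are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def relevance(tokens: Sequence[Token], query: Token) -> List[float]:
    """Score each visual token by its dot product with a query vector (assumed scorer)."""
    return [sum(t * q for t, q in zip(tok, query)) for tok in tokens]


def prune_tokens(tokens: Sequence[Token], scores: Sequence[float],
                 keep_ratio: float) -> Tuple[List[Token], List[int]]:
    """Keep the top-scoring fraction of tokens, preserving their original order."""
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices sorted by score, highest first; ties keep their original order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore spatial order for downstream layers
    return [list(tokens[i]) for i in kept], kept


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    instruction = [1.0, 0.0]  # hypothetical instruction embedding
    kept_tokens, kept_idx = prune_tokens(patches, relevance(patches, instruction), 0.5)
    print(kept_idx)
```

Since self-attention cost grows quadratically with sequence length, keeping half of the visual tokens cuts the attention cost over them to roughly a quarter. A differentiable variant would replace the hard top-k with a relaxation (for example Gumbel-softmax or a straight-through estimator) so the scorer can be trained end to end.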

Calling all hands! Seeking autonomous driving enthusiasts scattered everywhere (product / 4D annotation / world models, etc.)

自动驾驶之心· 2025-10-25 16:03

Core Viewpoint - The article emphasizes the need for collaboration in the autonomous driving industry, inviting professionals to take part in training, course development, and research support to drive industry progress [2].
Group 1: Collaboration and Opportunities
- The company is seeking partners among autonomous driving professionals to strengthen its training and job-guidance services [2].
- Collaborators will receive high compensation and access to abundant industry resources [3].
- The main focus areas include autonomous driving product management, 4D annotation/data loop, world models, VLA, autonomous driving large models, reinforcement learning, and end-to-end systems [4].
Group 2: Training and Development
- The positions mainly serve business-facing (B-end) training for enterprises, universities, and research institutions, as well as consumer-facing (C-end) training for students and job seekers [5].
- Interested individuals are encouraged to reach out via WeChat for further consultation [6].

The world's first "million-citation" scholar is here! Bengio ascends to legend, with Hinton and Kaiming He close behind

自动驾驶之心· 2025-10-25 16:03

Core Insights
- Yoshua Bengio has become the first scholar in the world to surpass one million citations on Google Scholar, a milestone for AI's academic influence [3][5][6].
- Geoffrey Hinton follows closely with about 970,000 citations, making him the second most-cited scholar [5][6].
- Citations of AI papers have surged, reflecting the prominence of the current AI era [19][30].
Citation Rankings
- Bengio ranks first worldwide in total citations, and his citations rose sharply after 2018, when he received the Turing Award [6][9][38].
- Hinton ranks second with 972,944 citations, reflecting his enduring impact on the field [5][8].
- Yann LeCun, another Turing Award winner, has more than 430,000 citations, well below both Bengio and Hinton [13][18].
AI Research Growth
- The number of AI papers nearly tripled, from about 88,000 in 2010 to more than 240,000 in 2022, a massive increase in research output [30].
- By 2023, AI papers made up 41.8% of all computer science papers, up from 21.6% in 2013, highlighting AI's growing dominance of the field [31][32].
- The foundational works of AI pioneers have become standard references in later research, which drives their citation growth [22][33].
Key Contributions
- AlexNet, introduced in 2012, is considered a pivotal moment that significantly advanced deep learning methods [20].
- The Transformer in 2017 and later innovations such as BERT further accelerated AI research and citations [24][27].
- Rising numbers of AI-related submissions to top conferences reflect the field's rapid evolution and growing interest in AI research [36].
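The growth claims above can be checked against the article's own figures with a line of arithmetic each. This is a minimal sketch; `growth_factor` is an illustrative helper, and because "more than 240,000" is a lower bound, the computed paper-count ratio is a lower bound too.

```python
def growth_factor(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


# Figures quoted in the article (approximate; 240,000 is a lower bound).
papers = growth_factor(88_000, 240_000)  # AI papers, 2010 -> 2022
share = growth_factor(21.6, 41.8)        # AI share of CS papers (%), 2013 -> 2023

if __name__ == "__main__":
    print(f"AI paper count grew {papers:.2f}x; AI share of CS papers grew {share:.2f}x")
```

The paper count grew about 2.7x, consistent with "nearly tripled", while AI's share of computer science papers roughly doubled.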

Artificial Intelligence

Deep Learning

Transformer

Generative AI

ChatGPT

Tesla finally shares something: its world model and closed-loop evaluation are frighteningly strong...

自动驾驶之心· 2025-10-25 16:03

Core Insights - Tesla has shared details of its architecture, emphasizing a large model trained on extensive data, which gives its Full Self-Driving (FSD) system a fixed computation time and high-frequency actions [5][6].
Group 1: Reasons for an End-to-End Approach
- Human driving behavior is so complex that no single evaluation function captures it, which makes rule-based optimization difficult [8].
- The interfaces between perception, prediction, and planning are hard to define and lose information [8].
- An end-to-end approach scales better and is better suited to long-tail problems [8].
- A neural network runs in fixed computation time, which reduces latency compared with traditional methods [8].
- Philosophically, Tesla prefers to rely on compute and data rather than on human experience [8].
Group 2: Challenges of End-to-End Systems
- The three main challenges are evaluation, the curse of dimensionality, and ensuring interpretability and safety [19][20].
- Because of the curse of dimensionality, mapping high-dimensional inputs to low-dimensional outputs provides too little supervisory signal [21].
- Interpretability and safety are crucial: the model must genuinely understand driving behavior rather than fit shortcuts [23].
Group 3: Evaluation Challenges
- Loss metrics alone cannot describe performance, even on high-quality datasets, so more comprehensive evaluation methods are needed [39].
- Open-loop evaluation cannot replace closed-loop assessment, which makes real-world testing necessary [39].
- Driving behavior is multimodal, so evaluation metrics must cover a range of valid driving actions [39].
- One proposed method is to predict the consequences of actions, possibly using a critic to assess model performance [39].
- The evaluation dataset must be balanced for assessments to be accurate [39].
Group 4: World Model Simulator
- Tesla introduced a world model simulator that generates follow-on video from real scenarios, a technology with a high barrier to entry [41].
- The simulator can replay earlier problem cases to check whether they have improved, similar to two-stage simulation [44].
- The same technology can be applied to humanoid robots for reinforcement training and simulation [46].
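The point that open-loop scores cannot replace closed-loop testing can be illustrated with a toy. This is a hypothetical 1D lane-keeping setup, not Tesla's evaluation method: the drift, the gain, and both policies are invented for illustration. Policy B matches the logged expert actions more closely, yet once its own actions determine the states it sees, it drifts far from the lane center.

```python
DRIFT = 0.02  # constant crosswind pushing the car off-center each step (assumed)
GAIN = 0.5    # expert's proportional correction gain (assumed)


def step(x: float, a: float) -> float:
    """Toy lateral dynamics: next offset = offset + steering + drift."""
    return x + a + DRIFT


def expert(x: float) -> float:
    """Cancels the drift and steers back toward the lane center."""
    return -GAIN * x - DRIFT


def policy_a(x: float) -> float:
    """Reacts to the state but carries a constant 0.05 steering bias."""
    return expert(x) + 0.05


def policy_b(x: float) -> float:
    """Ignores the state and replays an action close to the logged ones."""
    return -0.01


def log_expert(x0: float = 0.0, horizon: int = 50):
    """Record (state, action) pairs from the expert, like a driving log."""
    states, actions, x = [], [], x0
    for _ in range(horizon):
        a = expert(x)
        states.append(x)
        actions.append(a)
        x = step(x, a)
    return states, actions


def open_loop_error(policy, states, actions) -> float:
    """Mean |policy action - expert action| over the logged expert states."""
    return sum(abs(policy(s) - a) for s, a in zip(states, actions)) / len(states)


def closed_loop_max_offset(policy, x0: float = 0.0, horizon: int = 50) -> float:
    """Let the policy drive and report the largest lateral offset it reaches."""
    x, worst = x0, abs(x0)
    for _ in range(horizon):
        x = step(x, policy(x))
        worst = max(worst, abs(x))
    return worst


if __name__ == "__main__":
    states, actions = log_expert()
    for name, policy in (("A", policy_a), ("B", policy_b)):
        print(name, round(open_loop_error(policy, states, actions), 3),
              round(closed_loop_max_offset(policy), 3))
```

Policy A's bias is corrected by its own feedback, so its offset settles near 0.1. Policy B has no feedback, so its small per-step error compounds into an offset of about 0.5 over 50 steps, which no open-loop metric computed on the expert's states would reveal.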

A stunning project page in one click for $0.1! Paper2Page arrives, so researchers no longer stay up late and go bald

自动驾驶之心· 2025-10-25 16:03

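Every year, tens of thousands of papers appear in the AI field, yet most researchers face the same question: how can my work stand out? A polished project page is often the first step toward a paper reaching a wider audience. It is not only a window for showcasing results but also an important channel for attracting collaborators and citations. Getting from a paper PDF to an interactive web page, however, is full of repetitive, tedious, inefficient work: choosing a template, picking text out of the paper, copying and pasting, placing and laying out figures, writing HTML and CSS... enough to make any researcher go bald.

Paper: https://arxiv.org/abs/2510.19600
Project page: https://mqleet.github.io/AutoPage_ProjectPage (refresh the page to see project pages in different styles!)
Code: https://github.com/AutoLab-SAI-SJTU/AutoPage (stars appreciated ⭐️)
Hugging Face Space: https://huggingface.co/spaces/Mqleet/AutoPage

- Automatically parses section structure and figure/table information
- Intelligently generates narrative text and modular content blocks
- Automatically resizes images and adjusts the layout
- Renders an interactive web page in one click

These pages not only faithfully present the paper's core ideas but can also be quickly fine-tuned according to user instructions ...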
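The final rendering step of the pipeline listed above, turning parsed sections into a page, can be sketched with the standard library alone. This is not AutoPage's API: `Section` and `render_page` are hypothetical, the narrative text is assumed to be already generated, and a CSS `max-width` rule stands in for automatic image resizing.

```python
import html
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    """One parsed paper section (hypothetical structure)."""
    title: str
    text: str
    figure: Optional[str] = None  # path or URL of a figure, if any


def render_page(paper_title: str, sections: List[Section]) -> str:
    """Render parsed sections into a minimal static HTML project page."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(paper_title)}</title>",
        "<style>img{max-width:100%;height:auto}</style>",  # scale figures to fit
        "</head><body>",
        f"<h1>{html.escape(paper_title)}</h1>",
    ]
    for sec in sections:
        parts.append("<section>")
        parts.append(f"<h2>{html.escape(sec.title)}</h2>")
        # Escape extracted text so characters like '<' in the paper stay literal.
        parts.append(f"<p>{html.escape(sec.text)}</p>")
        if sec.figure:
            parts.append(f'<img src="{html.escape(sec.figure)}" alt="{html.escape(sec.title)}">')
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    page = render_page("Example Paper", [Section("Method", "We use < 100 tokens.", "fig1.png")])
    print(page)
```

Escaping every extracted string is the main design choice here: text pulled from a PDF can contain characters that would otherwise break the generated HTML.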

CVPR 2026 countdown, Day 21: targeting this direction gives you an overwhelming edge!

自动驾驶之心· 2025-10-24 16:03

Core Viewpoint - The article stresses the value of targeted guidance and mentorship for students aiming to publish high-quality papers at top conferences such as CVPR and ICRA, especially strategic focus in the final stage of the submission process [2][3].
Group 1: Submission Insights
- CVPR 2026 submissions have already exceeded 2,000, pointing to a competitive landscape similar to ICLR's [1].
- Historically, successful submissions focus on specific breakthroughs and verifiable improvements rather than broad themes, and align closely with the conference's main topics [1].
- The expected main theme of CVPR 2026 is "world models," which suggests a strategic direction for submissions [1].
Group 2: Mentorship and Guidance
- The organization offers specialized mentorship programs to help students through research paper writing and submission, particularly in autonomous driving and AI [2][3].
- More than 300 dedicated instructors from top universities worldwide provide academic resources and expertise to help students produce high-quality research [3].
- The program offers personalized guidance through the entire research process, from topic selection to submission, to prepare students for the demands of top-tier conferences [11].
Group 3: Student Support and Outcomes
- The organization addresses common student challenges such as lack of guidance, fragmented knowledge, and difficulty understanding the research process [5].
- Students are encouraged to build a systematic understanding of both classic and cutting-edge algorithms, strengthening their practical and research skills [5].
- Successful participants may receive recommendations from prestigious institutions and direct job placements at leading tech companies, which can shape their academic and professional trajectories [16].