自动驾驶之心
Just in: Fei-Fei Li's latest spatial intelligence result! 3D world generation enters the era of "unlimited exploration"
自动驾驶之心· 2025-09-19 16:03
Core Viewpoint - The article discusses the launch of Marble, a new spatial intelligence model by World Labs, which allows users to generate persistent, navigable 3D worlds from a single image or text prompt, marking a significant advancement in large-scale 3D generation technology [4][5][21].

Group 1: Product Features
- Marble enables the creation of expansive 3D environments that are permanent and free to explore, distinguishing it from other models like Google's Genie [9][21].
- Users can generate 3D worlds with improved geometric structure and style diversity, allowing for a richer and more complex 3D experience compared to previous technologies [21][24].
- The model supports seamless integration of generated worlds into web-based 3D experiences, utilizing the open-source rendering library Spark for efficient performance across various devices, including VR headsets [21][24].

Group 2: User Experience
- The generated 3D worlds allow for free navigation in a browser without any cost, providing a more immersive experience than traditional depth maps or point clouds [24].
- Users can combine multiple generated results to create larger, cohesive environments, enhancing the potential for creative applications [22][31].
- The model's ability to transform various styles into 3D worlds enables users to iterate on appearance and style, catering to diverse creative needs [25][26].

Group 3: Community Feedback
- Initial user tests have shown positive results, with suggestions for improvements such as connecting different generated worlds more easily [14][21].
- The community's engagement highlights the excitement around the potential applications of Marble in various creative and technical fields [10][14].
A 2025 overview of autonomous driving companies
自动驾驶之心· 2025-09-19 16:03
Core Viewpoint - The autonomous driving industry is undergoing a new round of reshuffling and resource integration, with various companies striving to achieve Level 3 (L3) automation as the next technological breakthrough [1].

New Forces - Companies such as NIO, Xpeng, Li Auto, Xiaomi, Leap Motor, Didi, WM Motor, and others are emerging as new players in the autonomous driving sector [2][3].
Tier 1 Suppliers - Major Tier 1 suppliers include Huawei, Baidu, DJI, ZTE, Tencent, and others, focusing on smart cockpits, high-precision maps, and simulation toolchains [5].
Robotaxi - Key players in the Robotaxi segment include Baidu, Pony.ai, and Didi, among others, indicating a competitive landscape in autonomous ride-hailing services [7].
Robotruck - Companies involved in the Robotruck sector include Zhijia Technology, Yincheng Technology, and others, highlighting the growth of autonomous trucking solutions [9].
Robobus - Notable companies in the Robobus category include Baidu, Pony.ai, and SenseTime, showcasing advancements in autonomous bus services [11].
Logistics and Delivery - Major players in logistics and delivery automation include Meituan, Alibaba Damo Academy, JD.com, and others, reflecting the integration of autonomous technology in supply chain operations [13].
Traditional OEMs - Established original equipment manufacturers (OEMs) such as SAIC, GAC, BYD, and others are also investing in autonomous driving technologies [15].
Agricultural Autonomous Driving - Companies like Fengjiang Intelligent and Zhonglian Heavy Industry are focusing on agricultural applications of autonomous driving [17].
Mining Autonomous Driving - Key players in mining automation include Yikong Zhijia and Taga Zhixing, indicating the sector's interest in autonomous solutions [19].
Sanitation Autonomous Driving - Companies such as Zhixingzhe and Koo Wah are developing autonomous solutions for sanitation services [21].
Intelligent Parking - Major players in intelligent parking solutions include Baidu, Desay SV, and others, reflecting the growing need for automated parking systems [23].
Computing Platforms - Companies like Huawei and Horizon Robotics are providing computing platforms essential for autonomous driving technologies [24].
High-Precision Mapping - Key players in high-precision mapping include Baidu, AutoNavi, and Tencent, which are crucial for the development of autonomous navigation systems [25].

Conclusion - The autonomous driving industry is characterized by continuous technological evolution and the collaborative efforts of numerous stakeholders, with the journey towards L3 automation being a collective endeavor [26].
After all, end-to-end/VLA without a data closed loop is only a half-finished product
自动驾驶之心· 2025-09-19 11:24
Core Viewpoint - The future of autonomous driving technology will focus on safer driving, better user experience, and comprehensive scenario coverage, necessitating a robust operational model from both manufacturers and suppliers [1].

Group 1: Data-Driven Technology
- Future autonomous driving companies are expected to resemble "data-driven technology companies," where competition will shift from algorithms to the efficiency of data loops [2].
- The ability to quickly collect, clean, label, train, and validate data will be crucial for gaining a competitive edge, requiring advanced automation tools and AI-driven data pipelines [2].
- The architecture involving VLA/VLM will be essential for enhancing user experience, with a focus on building robust, efficient, and low-cost closed-loop simulations [2].

Group 2: Algorithm and Data Services
- When considering algorithms, the supporting data services and automated labeling infrastructure must also be taken into account, especially for companies under profit pressure [3].
- The industry is exploring solutions like DiffVLA to transition smoothly into the VLA era while leveraging existing data and tools [3].
- Current research focuses on introducing new data sources and learning paradigms, indicating that the field remains open for exploration and innovation [3].

Group 3: Simulation and Training
- There is a consensus in academia and industry on the importance of closed-loop systems involving agent simulators, sensor simulators, and driving policies [4].
- Companies that can effectively address the sim-to-real domain gap and build efficient closed-loop training systems will likely lead the autonomous driving market [4].
- Without a data loop, end-to-end/VLA systems are considered incomplete [5].

Group 4: Community and Knowledge Sharing
- The "Autonomous Driving Knowledge Planet" community aims to provide a platform for technical exchange and problem-solving among members from leading universities and companies in the autonomous driving sector [12].
- The community has compiled extensive resources, including over 40 technical routes and numerous datasets, to facilitate learning and application in projects [12].
- Regular discussions with industry leaders on trends and challenges in autonomous driving are part of the community's offerings [12].
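
The collect, clean, label, train, validate cycle named in Group 1 can be pictured with a toy sketch. Everything here is an illustrative assumption rather than anything from the article: the `Sample` fields, the speed sanity bound, and the speed-based labeling rule are hypothetical stand-ins, and the training and validation stages are left out entirely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    scene_id: str
    speed_mps: float              # hypothetical logged ego speed
    label: Optional[str] = None   # filled in by the auto-labeling step


def clean(samples: List[Sample]) -> List[Sample]:
    """Drop samples whose speed is physically implausible (assumed rule)."""
    return [s for s in samples if 0.0 <= s.speed_mps <= 70.0]


def auto_label(samples: List[Sample]) -> List[Sample]:
    """Toy stand-in for an automated labeling service."""
    return [
        Sample(s.scene_id, s.speed_mps, "highway" if s.speed_mps > 22.0 else "urban")
        for s in samples
    ]


def run_data_loop(raw: List[Sample]) -> dict:
    """One pass of collect -> clean -> label, returning simple loop statistics."""
    cleaned = clean(raw)
    labeled = auto_label(cleaned)
    return {
        "collected": len(raw),
        "kept": len(labeled),
        "dropped": len(raw) - len(labeled),
        "label_counts": dict(Counter(s.label for s in labeled)),
    }


raw_log = [
    Sample("a", 12.0),
    Sample("b", 31.5),
    Sample("c", -4.0),   # corrupted reading, removed by clean()
    Sample("d", 25.0),
]
report = run_data_loop(raw_log)
print(report)
```

In a real loop, the statistics returned here would feed back into what gets collected next; the sketch only shows that each stage is a plain function that can be automated and measured.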
A P7's advice on switching from autonomous driving to embodied AI......
自动驾驶之心· 2025-09-19 00:30
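A P7's thinking on switching from autonomous driving to embodied AI......

I recently chatted with a friend at the P7 level who has gone to lead the embodied AI lab at a major tech company. Because the lab has only just been set up, a lot is still immature, much like when the autonomous driving team was first formed: short on data, compute, and equipment. Looking back on his autonomous driving experience, he finds that many of the problems are still similar now that he has moved to embodied AI, and the methodology used to optimize autonomous driving can even be applied directly; only the target objects and factors have changed. He raised several interesting points that will hopefully offer some inspiration.

On data
With no data or little data, the first things that came to mind were the real2sim2real and sim2real approaches. If the robot body is available but data is scarce and costly to collect, can self-collection be used? Let the robot collect and record data on its own, and use algorithms to filter and remove dirty data. This is quite similar to the data closed loop and auto-labeling in autonomous driving.

On algorithms
For commercialization, the newest techniques should be held back until the technology matures. Techniques that have already been validated should be deployed first to solve part of the problem and meet the needs of some scenarios and functions. Take VLA: it works reasonably well for assisted driving and robotic arms, but on humanoids the difficulty is very high. Reinforcement learning still works, so that is the approach that should be used. Once both algorithms and data are smoother, it will be time to put VLA on humanoids.

Some thoughts on deployment
There is no need to worry too much about deployment. We are very good at lightweighting and deployment, and in terms of compute I think Thor is basically enough ...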
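Yan Junchi's team at Shanghai Jiao Tong University: a roundup of the past year's hard-core results at top conferences and journals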
自动驾驶之心· 2025-09-18 23:33
Core Insights
- The article discusses the groundbreaking research conducted by Professor Yan Junchi's team at Shanghai Jiao Tong University, focusing on advancements in AI, robotics, and autonomous driving [2][32].
- The team's recent publications in top conferences like CVPR, ICLR, and NeurIPS highlight key trends in AI research, emphasizing the integration of theory and practice, the transformative impact of AI on traditional scientific computing, and the development of more robust, efficient, and autonomous intelligent systems [32].

Group 1: Recent Research Highlights
- The paper "Grounding and Enhancing Grid-based Models for Neural Fields" introduces a systematic theoretical framework for grid-based neural field models, leading to the development of the MulFAGrid model, which achieves superior performance in various tasks [4][5].
- The "CR2PQ" method addresses the challenge of cross-view pixel correspondence in dense visual representation learning, demonstrating significant performance improvements over previous methods [6][7].
- The "BTBS-LNS" method effectively tackles the limitations of policy learning in large neighborhood search for mixed-integer programming (MIP), showing competitive performance against commercial solvers like Gurobi [8][10][11].

Group 2: Performance Metrics
- The MulFAGrid model achieved a PSNR of 56.19 in 2D image fitting tasks and an IoU of 0.9995 in 3D signed distance field reconstruction tasks, outperforming previous grid-based models [5].
- The CR2PQ method demonstrated a 10.4% mAP^bb and 7.9% mAP^mk improvement over state-of-the-art methods after only 40 pre-training epochs [7].
- The BTBS-LNS method outperformed Gurobi by providing a 10% better primal gap in benchmark tests within a 300-second cutoff time [11].

Group 3: Future Trends in AI Research
- The research indicates a shift towards a deeper integration of theoretical foundations with practical applications in AI, suggesting a future where AI technologies are more robust and capable of real-world applications [32].
- The advancements in AI research are expected to lead to smarter robots, more powerful design tools, and more efficient business solutions in the near future [32].
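
For readers unfamiliar with the two metrics quoted for MulFAGrid in Group 2, the sketch below gives their standard definitions. It is a minimal illustration, not the paper's evaluation code: the function names are made up here, and the assumption that signal values are normalized to a peak of 1.0 is ours, since the summary does not state the normalization.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for signals with values in [0, peak]."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_from_psnr(psnr_db: float, peak: float = 1.0) -> float:
    """Invert the PSNR formula to recover the mean squared error."""
    return peak ** 2 / (10.0 ** (psnr_db / 10.0))


def iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection over union of two binary occupancy masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0


# Every pixel is off by 0.1, so the MSE is 0.01 and the PSNR is about 20 dB.
example_psnr = psnr([0.0, 0.5, 1.0, 0.25], [0.1, 0.4, 0.9, 0.35])

# One shared cell out of three occupied cells gives an IoU of 1/3.
example_iou = iou([True, True, False, False], [True, False, True, False])

print(round(example_psnr, 3), round(example_iou, 3))
```

Under the peak-1.0 assumption, `mse_from_psnr(56.19)` comes out at roughly 2.4e-6, which gives a feel for how small the per-pixel error is at that PSNR.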
New vision-only SOTA! AdaThinkDrive: a more flexible chain of thought for autonomous-driving VLA (Tsinghua & Xiaomi)
自动驾驶之心· 2025-09-18 23:33
Core Viewpoint - The article discusses the limitations of existing Chain-of-Thought (CoT) reasoning methods in Vision-Language-Action (VLA) models for autonomous driving, particularly in simple scenarios where they do not improve decision quality and introduce unnecessary computational overhead. It introduces AdaThinkDrive, a new VLA framework that employs a dual-mode reasoning mechanism inspired by the "fast and slow thinking" theory, allowing the model to adaptively choose when to reason based on scene complexity [3][4][10].

Group 1: Introduction and Background
- The shift from traditional modular approaches to end-to-end architectures in autonomous driving systems is highlighted, noting that while modular methods offer flexibility, they suffer from information loss between components, leading to cumulative errors in complex scenarios. End-to-end methods mitigate this issue but are still limited by their reliance on supervised data [7].
- The article categorizes current VLA methods into two paradigms: meta-action methods focusing on high-level guidance and planning-based methods that predict trajectories directly from raw inputs. The application of CoT techniques is becoming more prevalent, particularly in complex scenarios, but their effectiveness in simple scenarios is questioned [14][15].

Group 2: AdaThinkDrive Framework
- AdaThinkDrive is proposed as an end-to-end VLA framework that incorporates a "fast answer/slow thinking" mechanism, allowing the model to switch adaptively between direct prediction and explicit reasoning based on scene complexity. This is achieved through a three-stage adaptive reasoning strategy [11][18].
- The framework's performance is validated through extensive experiments on the NAVSIM benchmark, achieving a Predictive Driver Model Score (PDMS) of 90.3, which is 1.7 points higher than the best vision-only baseline model. The model demonstrates superior adaptive reasoning capabilities, selectively enabling CoT in 96% of complex scenarios and defaulting to direct trajectory prediction in 84% of simple scenarios [4][18][50].

Group 3: Experimental Results and Analysis
- The article presents a comprehensive evaluation of AdaThinkDrive against existing models, showing that it outperforms both "always think" and "never think" baseline models, with PDMS improvements of 2.0 and 1.4 points, respectively. Additionally, the reasoning time is reduced by 14% compared to the "always think" baseline, indicating a balance between accuracy and efficiency [4][18][58].
- The results indicate that the optimal reasoning strategy is not universal but depends on scene complexity, emphasizing the need for models to adaptively enable reasoning based on the context [10][18].

Group 4: Conclusion
- The article concludes that reasoning in simple scenarios often increases computational costs without enhancing decision quality. AdaThinkDrive addresses this by allowing agents to learn when to think, guided by an adaptive thinking reward mechanism. The experimental results on the NAVSIM benchmark demonstrate that AdaThinkDrive achieves state-of-the-art performance, underscoring the importance of adaptive thinking for accurate and efficient decision-making in autonomous driving systems [66].
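
The idea of switching between direct prediction and explicit reasoning can be illustrated with a toy gate. This is only a sketch of the control flow, not AdaThinkDrive's method: in the paper the model learns when to think, whereas here a hand-written complexity heuristic and a fixed threshold stand in for that learned decision, and the `Scene` fields, weights, and threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    name: str
    num_agents: int          # hypothetical count of nearby road users
    has_intersection: bool   # hypothetical map flag


def complexity(scene: Scene) -> float:
    """Hand-written complexity heuristic (an assumption, not a learned score)."""
    return 0.1 * scene.num_agents + (0.5 if scene.has_intersection else 0.0)


def choose_mode(scene: Scene, threshold: float = 0.75) -> str:
    """Return 'think' for the slow CoT path, 'direct' for the fast path."""
    return "think" if complexity(scene) >= threshold else "direct"


def think_rate(scenes: List[Scene]) -> float:
    """Fraction of scenes routed to explicit reasoning."""
    return sum(choose_mode(s) == "think" for s in scenes) / len(scenes)


scenes = [
    Scene("empty straight road", num_agents=1, has_intersection=False),
    Scene("busy unprotected turn", num_agents=8, has_intersection=True),
    Scene("quiet junction", num_agents=2, has_intersection=True),
    Scene("light highway traffic", num_agents=3, has_intersection=False),
]
modes = {s.name: choose_mode(s) for s in scenes}
print(modes, think_rate(scenes))
```

The point of the sketch is the cost structure: only scenes routed to "think" pay for the longer reasoning output, which is why a selective policy can reduce average inference time relative to always reasoning.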
Today's autonomous-driving VLA still has many modules to optimize...
自动驾驶之心· 2025-09-18 11:00
Core Viewpoint - VLA (Vision-Language-Action) is emerging as a mainstream keyword in autonomous driving, with rapid advancements in both academia and industry, aiming to overcome the limitations of traditional modular architectures and enhance the capabilities of autonomous systems [1][5].

Summary by Sections

VLA Research and Development
- The transition from traditional modular architectures to end-to-end models is marked by the introduction of VLA, which aims to unify sensor inputs directly into driving commands, addressing previous bottlenecks in the development of autonomous driving systems [2][5].
- The VLA model leverages large language models (LLMs) to enhance reasoning, explanation, and interaction capabilities, making it a significant advancement in the field [5].

Traditional Modular Architecture
- Early autonomous driving systems (L2-L4) utilized a modular design, where each module (e.g., object detection, trajectory prediction) was developed independently, leading to issues such as error accumulation and information loss [3].
- The limitations of traditional architectures include reliance on manually designed rules, making it difficult to handle complex traffic scenarios [3][4].

Emergence of Pure Vision End-to-End Models
- The rise of pure vision end-to-end models, exemplified by NVIDIA's DAVE-2 and Wayve, aimed to simplify system architecture through imitation learning, but faced challenges related to transparency and generalization in unseen scenarios [4][5].

VLA Paradigm
- The VLA paradigm introduces a new approach where language serves as a bridge between perception and action, enhancing the model's interpretability and trustworthiness [5].
- VLA models can utilize pre-trained knowledge from LLMs to better understand complex traffic situations and make logical decisions, improving generalization to novel scenarios [5].

Course Objectives and Structure
- The course aims to provide a systematic understanding of VLA, addressing gaps in knowledge and practical skills, and includes a comprehensive curriculum covering various aspects of VLA research [6][12].
- The program consists of 12 weeks of online group research, followed by 2 weeks of paper guidance, and an additional 10 weeks for paper maintenance, focusing on both theoretical and practical applications [7][30].

Enrollment and Requirements
- The course is designed for individuals with a background in deep learning and basic knowledge of autonomous driving algorithms, requiring familiarity with Python and PyTorch [16][19].
- The class size is limited to 6-8 participants to ensure personalized attention and effective learning [11].

Course Highlights
- Participants will gain insights into classic and cutting-edge papers, coding skills, and methodologies for writing and submitting research papers, enhancing their academic and professional profiles [12][15][30].
Ten-million-dollar prize pool! 2077AI launches Project EVA, inviting the world's "superhumans" to challenge the cognitive limits of AI
自动驾驶之心· 2025-09-18 11:00
Core Insights
- The 2077AI Open Source Foundation has launched Project EVA, a global AI evaluation challenge with a total prize pool of $10.24 million, aimed at exploring the true capabilities of large language models (LLMs) [1][2].
- The project seeks to move beyond traditional AI benchmarks to a new paradigm that tests AI's limits in complex logic, deep causality, counterfactual reasoning, and ethical dilemmas [1].
- Participants are encouraged to design insightful "extreme problems" to challenge the cognitive blind spots of current leading AI models [1][2].

Group 1
- Project EVA is not a programming competition but a trial of wisdom and creativity, focusing on defining the future of AI through innovative problem design [1][2].
- The initiative invites top AI researchers, algorithm engineers, and cross-disciplinary experts from fields like philosophy, linguistics, and art to participate [2].
- The project emphasizes the importance of a global community in driving disruptive ideas and advancing AI technology [2][3].

Group 2
- The registration for Project EVA is now open, allowing participants to secure their spots and receive updates on competition rules, evaluation standards, and schedules [2].
- The 2077AI Open Source Foundation is a non-profit organization dedicated to promoting high-quality data openness and cutting-edge AI research [3].
- The foundation believes that openness, collaboration, and sharing are essential for the healthy development of AI technology [3].
With research papers, the penny always drops too late......
自动驾驶之心· 2025-09-18 03:40
Core Viewpoint - The article emphasizes the importance of early action in academic research, particularly for master's students, to avoid delays in thesis completion and potential extensions of study periods [1][2].

Group 1: Types of Delays
- "Waiting for Guidance" Type: Students feel lost without clear direction from their advisors and end up passively waiting, wasting time [2].
- "Perfectionist" Type: Students aim to master all knowledge and produce perfect results before starting their writing, leading to endless delays [2].
- "Procrastinator" Type: Students avoid the daunting tasks of literature review and writing, distracting themselves with other activities [2].
- "Underestimating Time" Type: Students mistakenly believe that the process from idea to publication is quick, not realizing it can take months or even years [2].

Group 2: Importance of Early Action
- The core message is to take action as early as possible, treating paper writing as a continuous goal throughout the master's program rather than a last-minute task [2][3].
- Starting research during the summer after the first year provides nearly two years to refine 1-2 high-quality papers, while waiting until the second year leaves less than a year with high pressure from other commitments [3].

Group 3: Actionable Guidelines
- Establish a "paper awareness" from the first semester, understanding graduation requirements and familiarizing with key journals and conferences [4][5].
- Proactively communicate with advisors about research directions, even if ideas are not fully formed, to utilize the summer after the first year effectively [5].
- Embrace iterative research: focus on completing initial drafts rather than achieving perfection, starting with small goals like replicating a classic paper or running a baseline model [5].
- Quick trial and error is encouraged; initial results, even if not ideal, should be organized into drafts for feedback, which is crucial for improving research and writing skills [5].
The technical leads of China's large models
自动驾驶之心· 2025-09-18 03:40
Core Viewpoint - The article discusses the rapid development and competitive landscape of AI in China, highlighting key leaders and their contributions to the advancement of AI technologies and applications in various industries [2][37].

Group 1: Key Leaders and Their Contributions
- Liang Wenfeng, founder of DeepSeek, demonstrated the potential of Chinese AI startups by achieving 30 million daily active users within 20 days of product launch, showcasing rapid development and market impact [4][5].
- Lin Junyang, head of Tongyi Qianwen at Alibaba Cloud, led the team to adapt AI models for over 100,000 enterprise clients across 20 industries, emphasizing the importance of industry-specific applications [9][10].
- Wu Yonghui, head of ByteDance's Seed team, focused on user-centric AI applications, achieving over 10 million daily active users by addressing everyday needs in various scenarios [12][14].
- Bo Liefeng, core leader of Tencent's Hunyuan model, successfully integrated AI capabilities into over 200,000 enterprise clients, enhancing efficiency in sectors like finance and manufacturing [16][17].
- Xu Li, chairman of SenseTime, developed the SenseCore AI infrastructure, enabling the deployment of the SenseNova model across multiple sectors, serving over 1,000 large enterprises globally [21][23].
- Yan Junjie, founder of MiniMax, introduced the first commercial trillion-parameter MoE architecture model, rapidly iterating to meet diverse enterprise needs and achieving significant user engagement [25][27].
- Yang Zhilin, founder of Moonshot AI, focused on long-context processing capabilities, leading to the successful launch of Kimi Chat, which gained millions of users in specialized fields [29][32].
- Wang Haifeng, CTO of Baidu, established the PaddlePaddle deep learning platform and led the development of the Wenxin model, solidifying Baidu's leadership in the Chinese AI landscape [33][35].

Group 2: Industry Impact
- The success of these leaders and their companies illustrates the growing strength of China's AI sector, pushing the boundaries of technology and application across various industries [2][37].
- The advancements in AI technology are not only enhancing operational efficiencies but also driving digital transformation in traditional sectors, thereby increasing the competitiveness of Chinese enterprises on a global scale [10][23].
- The collaborative efforts among these companies are fostering a robust AI ecosystem, promoting innovation and practical applications that address real-world challenges [21][27].