自动驾驶之心
Compete this summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge launches
自动驾驶之心· 2025-07-17 07:29
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, focusing on visual perception as a key technology for applications in autonomous driving, smart cities, and robotics [2][4].

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications across many fields [2].
- The competition seeks to promote efficient, high-quality research in spatial and embodied intelligence technologies [4].
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4].

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions including Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5].
- It is supported by sponsors and technical support units, including Beijing Jiuzhang Yunjing Technology Co., Ltd. [5].

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and task-specific simulation environments [11].
- The sponsor will provide free computing resources, including H800 GPU compute, for validating and testing submitted algorithms [12][13].

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with its own tasks and evaluation methods [17].
- The Spatial Intelligence track involves constructing a 3D reconstruction model from multi-view aerial images [17].
- The Embodied Intelligence track focuses on completing tasks in simulation environments with dynamic occlusion [17].

Group 5: Evaluation Methods
- Spatial Intelligence submissions are evaluated on rendering quality and geometric accuracy, using metrics such as PSNR and F1-Score [19][20].
- Embodied Intelligence submissions are evaluated on task completion and execution efficiency, using metrics such as success rate and average pose error [23][21].

Group 6: Awards and Recognition
- Each track offers awards, including cash prizes and computing vouchers, sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd. [25].
- The first prize is 6,000 RMB plus 500 computing vouchers, with additional prizes for second and third places [25].

Group 7: Intellectual Property and Data Usage
- Participants must sign a data usage agreement, ensuring that the provided datasets are used solely for the competition and deleted afterward [29].
- Teams must guarantee that their submitted results are reproducible and that all algorithms and related intellectual property belong to them [29].

Group 8: Conference Information
- The 8th China Conference on Pattern Recognition and Computer Vision (PRCV 2025) will be held October 15-18, 2025, in Shanghai [27].
- The conference will feature keynote speeches from leading experts and forums promoting collaboration between academia and industry [28].
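The announcement names PSNR and F1-Score as evaluation metrics but provides no evaluation code. As a rough illustration only (function names and signatures are mine, not the organizers'), the two metrics can be computed like this:

```python
import numpy as np

def psnr(ref: np.ndarray, img: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered view and its reference."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, a common geometric-accuracy score."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Higher PSNR means rendered views match the references more closely; for geometric accuracy, F1 would typically be computed over reconstructed points that fall within a distance threshold of the ground-truth surface.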
Not easy: I successfully argued my way to my expected salary during negotiations
自动驾驶之心· 2025-07-17 07:29
Core Viewpoint
- The article summarizes the key attributes HR looks for during interviews in the autonomous driving sector: stability, communication skills, and a positive attitude.

Group 1: Key Attributes HR Values
- Stability: HR prefers candidates with a stable work history and a sense of responsibility, and avoids those who change jobs frequently [1]
- Thinking Ability: Candidates should demonstrate logical reasoning, situational responsiveness, and emotional intelligence [1]
- Personality Traits: A positive attitude, teamwork spirit, and emotional stability are crucial for comfortable collaboration [1]
- Stress Resistance: Candidates should show that they can handle pressure and are willing to start over after setbacks [1]
- Communication Skills: HR values candidates who keep the bigger picture in mind, communicate proactively, and express their viewpoints confidently [1]

Group 2: Common Interview Questions
- Self-Introduction: Candidates should present themselves with humility and confidence, using a clear structure to highlight their strengths [2]
- Stability Questions: When asked about leaving previous jobs, candidates should give objective reasons without negativity, focusing on growth opportunities in the new role [3]
- Conflict Resolution: When discussing conflicts with supervisors, candidates should reflect on their own perspective and emphasize a collaborative approach [4]
- Supervisor Expectations: Candidates should prioritize company interests, holding firm on major issues while staying flexible on minor ones [5]

Group 3: Salary and Other Considerations
- Offers: Candidates should aim to hold multiple offers to strengthen their negotiating position, ideally quoting a figure slightly above the top of their expected range [6]
- Salary Expectations: Candidates should research the salary range for the target role and aim for a reasonable increase [6]
- Questions for HR: Candidates should show eagerness by asking about the specific role, business direction, and promotion rules, while also clarifying salary structure and benefits [6]
If only I had published a few more papers in my second year of grad school, I wouldn't be in this position now...
自动驾驶之心· 2025-07-17 02:19
Core Viewpoint
- The article emphasizes the importance of high-quality research papers for students, especially those pursuing master's or doctoral degrees, in enhancing their academic and career prospects [1].

Group 1: Challenges Faced by Students
- Many students struggle to secure jobs because of average research outcomes, and some consider pursuing doctoral studies to relieve employment pressure [1].
- Students often face difficulties in research paper writing, including topic selection, framework confusion, and weak argumentation, especially when lacking guidance from supervisors [1].

Group 2: Services Offered
- The company provides professional assistance to students writing research papers, particularly in autonomous driving and artificial intelligence [3][4].
- The guidance process covers defining research directions, literature review, experimental design, data collection, drafting, and submission to journals [4].

Group 3: Target Audience
- The services suit students whose supervisors provide little hands-on guidance, who need to accumulate research experience, or who aim to strengthen their academic credentials for job applications or further studies [11].

Group 4: Unique Selling Points
- The company reports a 96% acceptance rate for students it has guided, with over 400 students assisted in the past three years [3].
- It offers personalized guidance from a team of over 300 instructors at top global universities, tailored to each student's needs [3][10].

Group 5: Additional Benefits
- Outstanding students may receive recommendation letters from prestigious institutions and internship opportunities at leading tech companies [14].
- A matching system pairs students with suitable mentors based on their research interests and goals [13].
Small models strike back! Fudan & 创智's Xipeng Qiu team builds a "world-aware" embodied agent
自动驾驶之心· 2025-07-17 02:19
Core Viewpoint
- The article introduces the World-Aware Planning Narrative Enhancement (WAP) framework, which significantly improves the performance of large vision-language models (LVLMs) on embodied planning tasks by integrating four-dimensional cognitive narratives and closed-loop observation [3][16].

Group 1: Introduction
- LVLMs are becoming central to embodied planning, but existing methods often rely on environment-agnostic imitation learning, leading to poor performance in unfamiliar scenarios [3].
- WAP injects four-dimensional cognitive narratives (visual, spatial, functional, syntactic) into the data layer, allowing models to understand their environment before reasoning [3][4].

Group 2: Technical Methodology
- WAP's main distinction is its explicit binding of instructions to environmental context, relying solely on visual closed-loop feedback without privileged information [6].
- The framework trains the model with a three-stage curriculum learning approach, using only RGB observations and no privileged feedback [12].

Group 3: Experimental Results
- The Qwen2.5-VL model's success rate on the EB-ALFRED benchmark rose from 2% to 62.7% (+60.7 percentage points), surpassing models such as GPT-4o and Claude-3.5 [4][14].
- Long-horizon task success improved from 0% to 70%, demonstrating the framework's effectiveness in complex planning scenarios [14].
- A case study showed WAP decomposing complex instructions into manageable steps, outperforming baseline models that failed to account for implicit conditions [15].

Group 4: Conclusion and Future Work
- WAP integrates "world knowledge" into data and reasoning chains, allowing small open-source LVLMs to outperform commercial models in purely visual closed-loop settings [16].
- Future work includes enhancing continuous control, extending to dynamic industrial and outdoor environments, and exploring self-supervised narrative evolution for iterative data-model improvement [17].
Nearly 40% beyond SOTA! XJTU's I2-World: a powerful OCC world model with 3 GB training memory and 37 FPS inference
自动驾驶之心· 2025-07-16 11:11
Core Insights
- The article introduces I2-World, a new framework for 4D occupancy (OCC) prediction that improves on existing models by nearly 40% [1][9][28].
- I2-World uses a dual-tokenization approach, splitting the scene into intra-scene and inter-scene tokenizers to capture both spatial detail and temporal dynamics [5][6][14].
- The framework achieves state-of-the-art mIoU and IoU, with improvements of 25.1% and 36.9% respectively, while maintaining high computational efficiency [9][28].

Group 1: Introduction and Background
- 3D occupancy provides richer geometric and detail information about 3D scenes, making it better suited to autonomous driving systems than traditional representations [4].
- The rise of generative AI has highlighted the potential of occupancy-based world models to simulate complex traffic scenarios and address corner cases [4].
- Existing tokenization methods struggle to compress 3D scenes efficiently while retaining temporal dynamics [4][14].

Group 2: I2-World Framework
- I2-World consists of two main components, the I2-Scene Tokenizer and I2-Former, which together improve the efficiency and accuracy of 4D OCC prediction [5][6].
- The I2-Scene Tokenizer decouples tokenization into two complementary components, one capturing fine-grained detail and the other modeling dynamic motion [5][6][14].
- I2-Former employs a mixed architecture that integrates encoding and decoding, enabling high-fidelity scene generation [6][9].

Group 3: Performance Metrics
- I2-World sets a new state of the art on the Occ3D benchmark, with a 25.1% improvement in mIoU and a 36.9% improvement in IoU [9][28].
- The model trains with only 2.9 GB of GPU memory and runs real-time inference at 37 FPS [9][28].
- The end-to-end variant, I2-World-STC, is even stronger, with a 50.9% improvement in mIoU [28].

Group 4: Experimental Results
- A comprehensive evaluation across multiple metrics demonstrates I2-World's effectiveness in 4D occupancy prediction [28][31].
- The framework generalizes across datasets, showing potential as an automated labeling solution [31].
- Ablation studies confirm the contribution of each component of the I2-Scene Tokenizer and I2-Former, validating the design choices [33][35].

Group 5: Conclusion
- I2-World represents a significant advance in 3D scene tokenization for autonomous driving, achieving efficient compression and high-fidelity generation [42].
- Its design allows fine-grained control over scene predictions, making it adaptable to varied driving scenarios [24][42].
- The experimental results support the framework as a robust solution for dynamic scene understanding in autonomous systems [42].
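The article reports mIoU and IoU gains without defining them. For reference, mIoU over an occupancy grid is the per-class intersection-over-union averaged across the classes that appear in either the prediction or the ground truth. A minimal sketch (my own helper, not I2-World code):

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over semantic classes, skipping classes absent from both grids."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # ignore classes that appear in neither grid
            ious.append(inter / union)
    return float(np.mean(ious))
```

The class-agnostic IoU reported alongside mIoU is typically the same computation with all occupied voxels collapsed into a single "occupied" class.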
ICML'25 | Unified multi-modal 3D panoptic segmentation: how can images and LiDAR be aligned and made complementary?
自动驾驶之心· 2025-07-16 11:11
Core Insights
- The article presents IAL (Image-Assists-LiDAR), a framework that improves multi-modal 3D panoptic segmentation by effectively combining LiDAR and camera data [2][3].

Technical Innovations
- IAL introduces three core breakthroughs:
  1. An end-to-end framework that outputs panoptic segmentation results directly, without complex post-processing [7].
  2. PieAug, a novel paradigm for synchronized multi-modal augmentation that improves training efficiency and generalization [7].
  3. Precise feature fusion via Geometric-guided Token Fusion (GTF) and Prior-driven Query Generation (PQG), achieving accurate alignment and complementarity between LiDAR and image features [7].

Problem Identification and Solutions
- Existing multi-modal segmentation methods often augment only the LiDAR data, causing misalignment with camera images that degrades feature fusion [9].
- The "cake-cutting" strategy segments scenes into fan-shaped slices along the angle and height axes, creating paired point-cloud and multi-view image units [9].
- PieAug remains compatible with existing LiDAR-only augmentation methods while achieving cross-modal alignment [9].

Feature Fusion Module
- The GTF module aggregates image features accurately by projecting physical points, addressing the large positional biases of voxel-level projection [10].
- Traditional methods overlook receptive-field differences between sensors, limiting feature expressiveness [10].

Query Initialization
- PQG uses a three-pronged query-generation mechanism, combining geometric-prior, texture-prior, and no-prior queries, to improve recall on distant small objects and other hard samples [12].

Model Performance
- IAL achieves state-of-the-art (SOTA) performance on the nuScenes and SemanticKITTI datasets, surpassing previous methods by up to 5.1% in PQ [16].
- Reported metrics include a PQ of 82.0, RQ of 91.6, and mIoU of 79.9, significant improvements over competitors [14].

Visualization Results
- IAL is notably better at distinguishing adjacent targets, detecting distant targets, and suppressing false positives and false negatives [17].
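PQ (panoptic quality), the headline metric here, is the standard panoptic-segmentation score: predicted and ground-truth segments with IoU above 0.5 count as matched true positives, and PQ divides the summed IoUs of those matches by TP + FP/2 + FN/2. A minimal sketch of that standard formula (helper name is mine, not from the IAL paper):

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """PQ = (sum of matched-pair IoUs) / (TP + FP/2 + FN/2)."""
    tp = len(tp_ious)  # each entry is the IoU of one matched segment pair
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denom if denom > 0 else 0.0
```

Under this decomposition, RQ = TP / (TP + FP/2 + FN/2) and SQ = mean matched IoU, so PQ = SQ × RQ, which is why PQ is conventionally reported alongside RQ.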
Excellent value! Black Warrior 001: your first full-stack autonomous driving vehicle
自动驾驶之心· 2025-07-16 11:11
Core Viewpoint
- The article announces the launch of the "Black Warrior Series 001," a full-stack autonomous driving vehicle for research and education, at a promotional price of 34,999 yuan with a deposit scheme to encourage early orders [1].

Group 1: Product Overview
- Black Warrior 001 is a lightweight platform developed by the Autonomous Driving Heart team, supporting perception, localization, fusion, navigation, and planning on an Ackermann chassis [2].
- It targets multiple audiences, including undergraduates, graduate researchers, and educational institutions and training companies needing teaching hardware [5].

Group 2: Performance and Testing
- The vehicle has been tested indoors, outdoors, and in parking scenarios, demonstrating its perception, localization, fusion, navigation, and planning capabilities [3].
- Specific tests include 3D point-cloud target detection, 2D and 3D laser mapping in indoor parking lots, slope tests, and large-scale outdoor 3D mapping [6][7][8][9][10].

Group 3: Hardware Specifications
- Key hardware components [12]:
  - 3D LiDAR: Mid 360
  - 2D LiDAR: from Lidar Technology
  - Depth camera: from Orbbec, with built-in IMU
  - Main control chip: Nvidia Orin NX 16G
  - Display: 1080p
- The vehicle weighs 30 kg, runs on a 24V, 50W battery for over 4 hours, and reaches a top speed of 2 m/s [14].

Group 4: Software and Functionality
- The software stack is built on ROS with C++ and Python, supports one-click startup, and ships with a development environment [16].
- Supported functions include 2D and 3D SLAM, target detection, and navigation with obstacle avoidance [17].

Group 5: After-Sales and Support
- The company offers one year of after-sales support (excluding man-made damage), with free repairs during the warranty period for damage caused by operational errors or code modifications [39].
ICML 2025 outstanding papers announced: 8 winners, including Nanjing University researchers
自动驾驶之心· 2025-07-16 11:11
Core Insights
- The article covers the ICML 2025 conference, highlighting the award-winning papers and the growing interest in AI research, evidenced by the rise in submissions and acceptances [3][5].

Group 1: Award-Winning Papers
- Eight papers were awarded this year: six outstanding papers and two outstanding position papers [3].
- The conference received 12,107 valid submissions and accepted 3,260, an acceptance rate of 26.9%, up significantly from 9,653 submissions in 2024 [5].

Group 2: Outstanding Papers
- Paper 1: Explores masked diffusion models (MDMs) and adaptive token-decoding strategies, raising solution accuracy on logic puzzles from under 7% to roughly 90% [10].
- Paper 2: Investigates predictive technologies for identifying vulnerable populations eligible for government assistance, providing a framework for policymakers [14].
- Paper 3: Introduces CollabLLM, a framework for human-LLM collaboration that improves task performance by 18.5% and user satisfaction by 17.6% [19].
- Paper 4: Discusses the limitations of next-token prediction for creative tasks and proposes new methods for enhancing creativity in language models [22][23].
- Paper 5: Reassesses conformal prediction from a Bayesian perspective, offering a practical alternative for uncertainty quantification in high-risk scenarios [27].
- Paper 6: Addresses score matching with incomplete data, providing methods that perform well in both low- and high-dimensional settings [31].

Group 3: Outstanding Position Papers
- Position Paper 1: Proposes a dual feedback mechanism for peer review at AI conferences to improve accountability and quality [39].
- Position Paper 2: Argues that AI safety must consider the future of work, advocating a human-centered approach to AI governance [44].
Two months into my job at Xiaomi, and I still haven't touched any algorithm code...
自动驾驶之心· 2025-07-16 08:46
Core Viewpoint
- The article discusses current trends and opportunities in the autonomous driving industry, emphasizing skill development and networking for job seekers in the field [4][7][8].

Group 1: Job Market Insights
- Recent graduates often find that their roles do not match their expectations, particularly in internships and entry-level positions [2][4].
- Candidates should highlight relevant experience even when their current role does not directly serve their career goals, and showcase all relevant skills on their resumes [6][7].

Group 2: Skill Development and Learning Resources
- Individuals are encouraged to keep building skills in autonomous driving, particularly in in-demand areas such as large models and data processing [6][8].
- Various resources, including online courses and community support, are available to deepen knowledge and skills in the autonomous driving sector [8][10].

Group 3: Community and Networking
- Joining communities focused on autonomous driving and embodied intelligence provides valuable networking opportunities and access to industry insights [8][10].
- Collaboration and knowledge sharing within these communities help members stay current on the latest trends and technologies [8][10].
Three years in! From autonomous driving to embodied intelligence: an AI education platform's breakthroughs and perseverance
自动驾驶之心· 2025-07-16 08:14
Core Viewpoint
- The article highlights the significant progress made in the autonomous driving sector over the past year, emphasizing the transition from end-to-end solutions to more advanced models such as VLM and VLA, and the importance of innovation and execution for growth and survival in the industry [2][7].

Summary by Sections

Company Progress
- The company has built four key intellectual properties (IPs): Autonomous Driving Heart, Embodied Intelligence Heart, 3D Vision Heart, and Large Model Heart, expanding its reach across platforms such as WeChat, Bilibili, and Zhihu [2].
- It has begun shifting from purely online education to a comprehensive service platform covering hardware, offline training, and job placement, with a new office established in Hangzhou [2].

Insights on Business Strategy
- The article stresses understanding market needs and business pain points, noting that many businesses fail to recognize the long-term value of their endeavors [4].
- The company's strategy is to "maintain a broad perspective while achieving incremental progress," balancing long-term value creation with immediate commercial opportunities [4].

Challenges and Solutions
- The company acknowledges the challenge of maintaining course quality as the platform grows, and the need for rigorous management to uphold standards [5][6].
- In response to feedback on course quality, it has committed to re-recording and supplementing materials, showing a willingness to adapt and improve based on user input [6].

Innovation and Execution
- True innovation is essential for survival in the competitive AI education and self-media landscape, with execution as the key differentiator [6][7].
- The company aims to evolve from a purely educational entity into a technology company, with plans to stabilize operations by the second half of 2025 [8].

Future Goals
- The overarching goal is to make AI education accessible to every student who needs it, making AI easier to learn and engage with [9].
- A third-anniversary promotional campaign is underway, offering significant discounts on courses related to autonomous driving and AI [10].