VLA
People With Autonomous Driving Experience Are Still in High Demand in Other Fields
自动驾驶之心· 2025-12-28 03:30
Core Insights
- The autonomous driving industry has seen significant developments this year, focusing on technology, cost, and efficiency improvements as it matures [1]
- There has been a notable shift in talent, with many professionals moving to other sectors such as L4, embodied intelligence, and drones, while algorithm talent in autonomous driving remains highly sought after [1][2]
- Major technological advancements in autonomous driving have consolidated around key areas such as end-to-end systems, VLA, world models, and reinforcement learning, with many midstream companies actively hiring [3]
Industry Trends
- The autonomous driving sector is seeing an increase in B-end clients and a movement towards offline engagement, while C-end services are becoming more specialized [1]
- The community of paid members in the autonomous driving sector has surpassed 4,000, indicating growing interest and engagement in technology development and job opportunities [3]
- The industry is characterized by strong collaboration capabilities among professionals who have experience with large clusters and corner cases, which are lacking in other sectors [2]
We Have Received Many Questions From Students About Choosing a Direction in Autonomous Driving......
自动驾驶之心· 2025-12-26 09:18
Core Insights
- The article discusses various cutting-edge directions in autonomous driving research, emphasizing the importance of deep learning and traditional methods for students in related fields [2][3].
Group 1: Research Directions
- Key areas of focus include VLA, end-to-end learning, reinforcement learning, 3D object detection, and occupancy networks, which are recommended for students in computer science and automation [2][3].
- For mechanical and vehicle engineering students, traditional methods like PnC and 3DGS are suggested as they require lower computational power and are easier to start with [2].
Group 2: Guidance and Support
- The article announces the launch of a paper guidance service that offers support in various research areas, including multi-sensor fusion, trajectory prediction, and semantic segmentation [3][6].
- Services provided include topic selection, full process guidance, and experimental support, aimed at enhancing the research capabilities of students [6][7].
Group 3: Publication Opportunities
- The guidance service has a high acceptance rate for papers submitted to top conferences and journals, including CVPR, AAAI, and ICLR [7].
- The article highlights the availability of support for various publication levels, including CCF-A, CCF-B, and SCI indexed journals [10].
A Sober Look at VLA: Neither a Savior nor "Garbage"
自动驾驶之心· 2025-12-26 09:18
Core Viewpoint
- The article critiques the VLA (Vision-Language-Action) approach, emphasizing that while it has merits, it also has significant limitations that need to be addressed for better performance in complex environments [1].
Group 1: Challenges and Limitations
- The main challenge lies in enabling models to generalize effectively [2].
- Current models struggle in complex environments because they are trained on simplistic task settings, often limited to "grab-and-drop" scenarios with minimal obstacles [6].
- The reliance on large datasets and the black-box nature of systems hinder understanding of model capabilities [6].
Group 2: Proposed Solutions
- A focus on designing effective subgoal embeddings is crucial for ensuring generalization, potentially using cross-attention mechanisms to link task text tokens with image patch tokens [3][4].
- The article suggests that learning-based methods may outperform traditional methods in complex environments, as they can adapt to visual observation errors and continuously correct actions [4].
- An explicit VLA approach is recommended, where large models break down tasks into subgoals, allowing for clearer structure and reduced training requirements [8].
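The cross-attention idea mentioned in the summary above can be illustrated with a minimal sketch. This is not the article's implementation: the function names, the toy embeddings, and the omission of learned query/key/value projections are all assumptions made for illustration. It only shows how each task text token can weight image patch tokens to form a visually grounded embedding.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_attention(text_tokens, patch_tokens):
    """Each text token (query) attends over all image patch tokens (keys/values).

    Assumption: learned projections are omitted, so raw embeddings are used
    directly. Returns (fused_tokens, attention_weights).
    """
    dim = len(patch_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for query in text_tokens:
        row = softmax([dot(query, key) * scale for key in patch_tokens])
        # Weighted sum of patch embeddings: the visual context for this token.
        context = [sum(w * patch[d] for w, patch in zip(row, patch_tokens))
                   for d in range(dim)]
        weights.append(row)
        fused.append(context)
    return fused, weights

# Hypothetical toy data: 2 task-text tokens, 3 image patches, dimension 4.
text_tokens = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]]
patch_tokens = [[4.0, 0.0, 0.0, 0.0],
                [0.0, 4.0, 0.0, 0.0],
                [0.0, 0.0, 4.0, 0.0]]

subgoal_embedding, attn = cross_attention(text_tokens, patch_tokens)
```

In this toy setup the first text token places most of its attention on the first patch and the second token on the second patch, which is the kind of text-to-image linkage the summary describes.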
Xiaomi's Chen Guang: We No Longer Want to Create Technology Anxiety
21 Shi Ji Jing Ji Bao Dao· 2025-12-25 08:24
Core Viewpoint
- The smart driving industry is experiencing a "term overload" phenomenon, with various factions emerging around different models such as VLA (Vision Language Action), VA (Vision Action), and WA (World Action) [2]
Group 1: Industry Trends
- The industry is divided between proponents of VLA, like Li Auto and Yuanrong Qixing, and opponents like Huawei and Xiaopeng, who prefer WA [2]
- Xiaomi is focusing on end-to-end development, showcasing significant potential in this area, despite starting later than competitors like Li Auto and NIO [3][6]
- Xiaomi's end-to-end algorithm has evolved rapidly, with multiple versions released within a year, indicating a fast-paced development cycle [6]
Group 2: Technological Development
- Xiaomi's latest version of its HAD (Highly Automated Driving) system incorporates world models and reinforcement learning, enhancing its cognitive capabilities [3][4]
- The introduction of world models and reinforcement learning is seen as a necessary evolution from simple data-driven approaches to more complex cognitive-driven methodologies [9][10]
- Xiaomi's approach emphasizes maximizing the model's intelligence density within limited computational resources [8][15]
Group 3: Team Structure and Strategy
- Xiaomi's smart driving team has grown to over 1,800 members, reflecting a rapid scaling compared to competitors [6][12]
- The team is divided into three groups focusing on different technological routes, including end-to-end, VLA, and other exploratory research [4][13]
- Xiaomi's strategy is characterized by a gradual introduction of new technologies, prioritizing user experience over merely adopting the latest advancements [5][10]
Group 4: Challenges and Responses
- The integration of reinforcement learning faces challenges, such as ensuring the fidelity of world models and managing computational efficiency [4][33]
- Xiaomi's team has encountered external criticism, which they view as a necessary part of their growth and development process [25][26]
- The company aims to balance the introduction of new technologies with the need for practical, user-friendly solutions [10][11]
Interview With Horizon Vice President Lü Peng: If You Can't Do End-to-End Well, You Can't Do VLA Well
21 Shi Ji Jing Ji Bao Dao· 2025-12-23 00:45
Core Insights
- Passenger cars priced above 200,000 yuan account for 30% of the domestic market, while those priced below 130,000 yuan hold 50%, indicating a vast opportunity for companies like Horizon and Momenta to capture share in the autonomous driving sector [1][13]
- Horizon has launched its Horizon SuperDrive (HSD) solution based on the Journey 6 series chip, entering mass production with significant activation numbers shortly after the launch of new models [1][14]
- The company aims to make urban assisted driving technology accessible to vehicles priced at 100,000 yuan, targeting a production scale of 10 million units within the next 3-5 years [2][14]
Market Dynamics
- The market for vehicles priced below 130,000 yuan is largely untapped in terms of urban assisted driving features, attracting various autonomous driving companies to accelerate their market strategies [1][13]
- Horizon's HSD solution has seen rapid adoption, with over 12,000 activations within two weeks of launching two new models, indicating strong market demand [1][14]
Technological Development
- Horizon is focusing 90% of its R&D resources on end-to-end technology, which is seen as crucial for the future of autonomous driving [2][14]
- The company believes that a solid end-to-end foundation is essential for integrating new modalities and enhancing product performance [15][21]
Competitive Landscape
- Companies lacking chip development capabilities are increasingly collaborating with Horizon, highlighting the company's strong position in the market [2][14]
- Horizon's commitment to an end-to-end approach distinguishes it from competitors who are exploring various models, such as VLA [2][21]
Technical Insights
- The end-to-end system developed by Horizon is one of the few complete systems available, with a focus on seamless information transfer and high-dimensional feature integration [16][17]
- The distinction between one-stage and two-stage end-to-end systems is critical, with the former providing a more cohesive and intuitive driving experience [18][19]
Future Directions
- Horizon plans to enhance its product experience and safety, emphasizing the importance of market acceptance over new terminologies and concepts [11][22]
- The company is open to integrating VLA technology in the future but maintains that a robust end-to-end system is foundational for success [24]
Horizon's Lü Peng: End-to-End Is the Cornerstone; If You Can't Do End-to-End Well, You Can't Do VLA Well
21 Shi Ji Jing Ji Bao Dao· 2025-12-22 13:23
Core Viewpoint
- The article emphasizes the importance of end-to-end technology in the development of autonomous driving solutions, highlighting Horizon's commitment to this approach as a foundation for future advancements in the industry.
Market Overview
- In the first three quarters of this year, passenger cars priced above 200,000 yuan accounted for 30% of the market, while those below 130,000 yuan reached 50%, with many lower-priced models lacking urban assisted driving features [1].
- This gap in the market is attracting companies like Horizon and Momenta to accelerate their strategies to capture market opportunities [1].
Product Development
- Horizon launched its Horizon SuperDrive (HSD) solution based on the Journey 6 series chips in April, entering mass production by November with the launch of the Exeed ET5 and Deep Blue L06 models, achieving over 12,000 activations within two weeks [1][2].
- The company aims to make urban assisted driving features available in vehicles priced around 100,000 yuan, targeting a production scale of ten million units in the next 3-5 years [2].
Technological Strategy
- Horizon is one of the few companies firmly committed to the end-to-end approach in autonomous driving, believing that a solid end-to-end foundation is essential for integrating new modalities and enhancing product performance [3][7].
- The company has invested 90% of its R&D resources into developing and implementing end-to-end technology since the end of 2024 [2].
Technical Insights
- Horizon's end-to-end system is described as a complete solution, contrasting with two-stage systems that may lose information during processing [4][5].
- The company believes that a robust end-to-end model is crucial for achieving high performance and seamless driving experiences, akin to human driving instincts [6][9].
Future Directions
- Horizon's future plans include enhancing its end-to-end technology while exploring the integration of world models and reinforcement learning as auxiliary components to improve overall system performance [9][10].
- The focus remains on product experience and safety, with an emphasis on market acceptance rather than getting caught up in new terminologies or concepts [9].
How Far Along Should a Graduate Student's Experiments Be Before Writing a Short Paper?
自动驾驶之心· 2025-12-22 03:23
Core Viewpoint
- The article emphasizes the importance of timely submission of academic papers, particularly for graduate students, highlighting that a complete story in research is more valuable than novelty [1].
Group 1: Academic Guidance Services
- The company offers a paper guidance service aimed at efficiently producing research results within a limited timeframe, helping students avoid common pitfalls in self-writing [2].
- The guidance covers various advanced topics such as reinforcement learning, 3D object detection, and multi-sensor fusion, among others, providing tailored advice based on individual research directions [3].
- The service is designed to assist students who face challenges such as unclear direction, difficulty in code reproduction, and lack of systematic research training [5].
Group 2: Instructor Qualifications
- All instructors associated with the service are from globally recognized universities ranked in the top 100 by QS, with multiple publications in A-level conferences and extensive project experience [6].
Group 3: Comprehensive Academic Support
- The company provides a wide range of academic support services, including assistance with journal papers, conference papers, and thesis projects, ensuring a comprehensive approach to academic success [8].
- The service is results-oriented, offering continuous support until the paper is submitted, with a focus on enhancing coding skills alongside research guidance [8].
Group 4: FAQs and Additional Information
- The company assures that even students with no prior experience can publish papers by following structured courses, with the potential to produce a small paper within six months [11].
- Outstanding students may receive recommendation letters from prestigious institutions and opportunities for internships in leading companies, indicating that publishing papers is just the beginning of their academic journey [11].
- Pricing for the services varies based on the publication target, with detailed consultations provided to tailor support to individual needs [11].
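"One Brain, Many Forms" Roundtable: What Concrete Progress Have World Models and Spatial Intelligence Made in Embodied Intelligence? | GAIR 2025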
雷峰网· 2025-12-20 04:07
Core Viewpoint
- The article discusses the current state and future potential of embodied intelligence, focusing on the challenges and opportunities presented by world models and spatial intelligence in the field of robotics and AI [2][4][10].
Group 1: Development of Embodied Intelligence
- The technology route for embodied intelligence is still in an exploratory phase, with no convergence yet, which is seen as a positive sign for innovation [4][3].
- There is a consensus among experts that the core issues of embodied intelligence, such as interaction and human-machine collaboration, should be addressed by academic institutions, while industries focus on practical applications [4][5].
- The integration of AI with physical entities is expected to lead to significant advancements in intelligence, but the field must avoid reverting to industrial automation without achieving generalized intelligence [4][5][30].
Group 2: World Models in Autonomous Driving
- World models are currently being utilized by leading companies like Tesla to enhance data generation and improve decision-making processes through closed-loop testing [11][12].
- The concept of world models has gained traction in autonomous driving due to the simplicity of generating scenarios compared to robotics, with advancements in generative AI enabling the creation of realistic training samples [12][13].
- There is ongoing debate regarding the definition and application of world models in both autonomous driving and robotics, with differing opinions on the necessity of pixel-level reconstruction versus latent state representation [12][13][14].
Group 3: Spatial Intelligence in Robotics
- Spatial intelligence is a critical aspect of robotics, with a focus on perception and understanding spatial relationships, which has evolved from traditional SLAM techniques to more learning-based approaches [20][21].
- The current challenges in spatial intelligence include the need for better data representation and understanding of complex spatial relationships, which are still underdeveloped in robotic systems [22][23].
- The integration of visual and semantic information is essential for enhancing robots' spatial capabilities, but the field is still in its early stages [22][23][24].
Group 4: Commercialization and Future Applications
- The future of drone applications is expected to expand significantly, with potential uses in various sectors, but the timeline for widespread adoption remains uncertain [26][27].
- The gap between technological capabilities and market needs poses challenges for entrepreneurs, as there is often a mismatch between innovative ideas and practical industrial requirements [30][31].
- The shift towards learning-based control paradigms is anticipated to increase the applicability of drones and robots in real-world scenarios, moving beyond traditional automation [28][29].
Recently We Have Received Many Questions From Students About Choosing a Direction in Autonomous Driving......
自动驾驶之心· 2025-12-19 09:25
Core Insights
- The article discusses various advanced directions in autonomous driving research, emphasizing the importance of deep learning and traditional methods for different academic backgrounds [2][3].
Group 1: Research Directions
- Key areas of focus include VLA, end-to-end learning, reinforcement learning, 3DGS, and world models, which are recommended for students in computer science and automation [2].
- For mechanical and vehicle engineering students, traditional methods like PnC and 3DGS are suggested due to their lower computational requirements and ease of entry [2].
Group 2: Paper Guidance Services
- The article announces the launch of a paper guidance service that covers various topics such as end-to-end learning, multi-sensor fusion, and trajectory prediction [3][6].
- The service includes support for topic selection, full process guidance, and experimental assistance [6].
Group 3: Publication Success
- The guidance service has a high acceptance rate for papers submitted to top conferences and journals, including CVPR, AAAI, and ICLR [7].
- The article highlights the range of publication venues, including CCF-A, CCF-B, and various SCI categories [10].
Tesla Once Again Anticipates the Direction of the Tide
自动驾驶之心· 2025-12-18 09:35
Core Viewpoint
- Tesla's AI leader Ashok Elluswamy revealed the technical methodology behind Tesla's Full Self-Driving (FSD) in a recent article, emphasizing the choice of an end-to-end neural network model and addressing the challenges faced in practice [4][6].
Group 1: End-to-End Neural Network Model
- Tesla's decision to adopt an end-to-end neural network model is driven by the need to address complex driving scenarios that cannot be pre-defined by rules, such as the "trolley problem" and second-order effects [6][10].
- The end-to-end model is described as a complete overhaul of previous architectures, fundamentally changing design, coding, and validation processes, leading to a more human-like driving experience [11][19].
- The model outputs driving instructions alongside interpretable "intermediate results," utilizing technologies like generative Gaussian splatting to create dynamic 3D models of the environment in real-time [8][17].
Group 2: VLA and World Model Concepts
- VLA (Vision-Language-Action) is an extension of the end-to-end model that incorporates language information, allowing for a more visual representation of driving behavior [12][14].
- The world model aims to establish a high-bandwidth cognitive system based on video/image data, addressing the limitations of language models in understanding complex, dynamic environments [15][19].
- The relationship between end-to-end, VLA, and world models is clarified, with end-to-end serving as the foundation, VLA as an upgrade, and the world model as the ultimate form of understanding spatial dynamics [12][19].
Group 3: Industry Perspectives and Trends
- The industry is divided into three main technical routes: end-to-end, VLA, and world model, with companies like Horizon Robotics and Bosch primarily adopting end-to-end due to lower costs and higher stability [13][19].
- VLA has faced criticism from industry leaders who argue that its reliance on language models may not be essential for effective autonomous driving, emphasizing the need for spatial understanding instead [16][19].
- Tesla's recent publication has reignited discussions in the industry, positioning the company at the forefront of current technological directions and providing a systematic analysis of practical applications [20].