VLA
Led by industry veterans! Master end-to-end autonomous driving in three months
自动驾驶之心· 2025-09-29 08:45
Core Viewpoint - 2023 is identified as the year of end-to-end production, with 2024 expected to be a significant year for this development in the automotive industry, particularly in autonomous driving technology [1][3].
Group 1: End-to-End Production
- Leading new-force automakers and established manufacturers have already achieved end-to-end production [1].
- There are two main paradigms in the industry: one-stage and two-stage approaches, with UniAD being a representative of the one-stage method [1].
Group 2: Development Trends
- Since last year, the one-stage end-to-end approach has evolved rapidly, leading to various derivatives such as perception-based, world model-based, diffusion model-based, and VLA-based one-stage methods [3].
- Major autonomous driving companies are focusing on in-house development and mass production of end-to-end autonomous driving solutions [3].
Group 3: Course Offerings
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, covering cutting-edge algorithms in both one-stage and two-stage end-to-end approaches [5].
- The course aims to provide insights into the latest technologies in the field, including BEV perception, visual language models, diffusion models, and reinforcement learning [5].
Group 4: Course Structure
- The course consists of several chapters, starting with an introduction to end-to-end algorithms, followed by background knowledge essential for understanding the technology stack [9][10].
- The second chapter focuses on the most frequently asked technical keywords in job interviews over the next two years [10].
- Subsequent chapters delve into two-stage end-to-end methods, one-stage end-to-end methods, and practical assignments involving RLHF fine-tuning [12][13].
Group 5: Learning Outcomes
- Upon completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer [19].
- The course aims to deepen understanding of key technologies such as BEV perception, multimodal large models, and reinforcement learning, enabling participants to apply learned concepts to real projects [19].
At the crossroads of embodied intelligence, this forum thoroughly covered data, models, and infra
机器之心· 2025-09-29 02:52
Core Viewpoint - The field of embodied intelligence is experiencing unprecedented attention, yet key issues remain unresolved, including data scarcity and differing technical approaches [1][2][3]
Group 1: Data and Technical Approaches
- The industry is divided into two factions: the "real machine" faction, which relies on real-world data collection, and the "synthetic" faction, which believes in the feasibility of synthetic data for model training [5][12]
- Galaxy General, representing the synthetic faction, argues that achieving generalization in embodied intelligence models requires trillions of data points, which is unsustainable through real-world data alone [8][9]
- The "real machine" faction challenges the notion that real-world data is prohibitively expensive, suggesting that with sufficient investment, data collection can be scaled effectively [12][14]
Group 2: Model Architecture
- Discussions around the architecture of embodied intelligence models highlight a divide between end-to-end and layered approaches, with some experts advocating for a unified model while others support a hierarchical structure [15][19]
- The layered architecture is seen as more aligned with biological evolution, while the end-to-end approach is criticized for potential error amplification [19][20]
- The debate extends to the relevance of VLA (Vision-Language-Action) versus world models, with some experts arguing that VLA is currently more promising due to its data efficiency [21][22]
Group 3: Industry Trends and Infrastructure
- The scaling law in embodied intelligence is beginning to emerge, indicating that expanding model and data scales could be effective [24]
- The industry is witnessing an acceleration in the deployment of embodied intelligence technologies, with various companies sharing their experiences in human-robot interaction and industrial applications [24][29]
- Cloud service providers, particularly Alibaba Cloud, are emphasized as crucial players in supporting the infrastructure needs of embodied intelligence companies, especially as they transition to mass production [29][31]
Group 4: Alibaba Cloud's Role
- Alibaba Cloud has been preparing for the exponential growth in data and computational needs associated with embodied intelligence, having developed capabilities to handle large-scale data processing and model training [33][35]
- The company offers a comprehensive suite of cloud-based solutions to support both real and synthetic data production, enhancing efficiency and reducing costs [35][36]
- Alibaba Cloud's unique position as a model provider and its engineering capabilities are seen as significant advantages in the rapidly evolving embodied intelligence landscape [37][41]
Without an advisor's guidance, how quickly can you produce a paper in the embodied intelligence field?
具身智能之心· 2025-09-28 07:00
Core Insights
- The article emphasizes the importance of building a solid foundation in research before diving into complex topics like VLA (Vision-Language-Action) in embodied intelligence [1][6]
- VLA is highlighted as a transformative model that allows robots to perform tasks based on language instructions, breaking the limitations of traditional single-task training [4][7]
- The article discusses the rapid development of the embodied intelligence sector, with various teams transitioning from research to commercialization, and major tech companies actively investing in this field [6]
Summary by Sections
VLA Overview
- VLA enables robots to autonomously make decisions in diverse environments, significantly enhancing their adaptability and application across industries such as manufacturing and logistics [4][6]
- The model has become a research hotspot, fostering collaboration between academia and industry through various projects like pi0, RT-2, and OpenVLA [4][7]
Industry Development
- The embodied intelligence field is experiencing robust growth, with companies like Unitree, Zhiyuan, and major tech players like Huawei and Tencent making significant strides [6]
- There is a growing interest in VLA-related research, with many seeking guidance to quickly enter or transition within this domain [6]
Course Offerings
- A specialized course on VLA research is introduced, focusing on the theoretical and practical aspects of embodied intelligence, including simulation environment setup and experimental design [10][12]
- The course aims to cultivate independent research capabilities, guiding students from idea generation to the completion of a research paper [12][17]
Learning Outcomes
- Participants will gain comprehensive knowledge of VLA models, practical experience in simulation, and skills in academic writing and research methodology [17]
- The course is designed to help students identify research opportunities and navigate the complexities of the embodied intelligence landscape [12][16]
The VLA direction really is producing a lot of papers......
具身智能之心· 2025-09-26 00:04
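Imagine being able to give an instruction in natural language and have any action you want carried out smoothly. What a joy that would be! And if long sequences of actions could be completed continuously, it would be very convenient. So what exactly is VLA?
VLA breaks the single-task limitation of traditional methods, enabling robots to make decisions autonomously in diverse scenarios and respond flexibly to unseen environments, with broad applications in manufacturing, logistics, and home services. In addition, VLA models have become a research hotspot, driving a number of frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA, and this research has fostered collaboration between academia and industry. Its adaptability shows in its applicability to platforms such as robot arms, quadruped robots, and humanoid robots, offering broad potential and practical value for the development of all kinds of intelligent robots and making it a key driving force in intelligent robotics.
Judging from this year's top robotics and AI conferences, VLA and its derivative directions account for nearly half of the embodied intelligence output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.
From an industry perspective, embodied intelligence is booming both in China and abroad. Teams such as Unitree, Zhiyuan (智元), Galaxea (星海图), Galaxy General (银河通用), and LimX Dynamics (逐际动力) are moving from the lab to commercialization, and tech giants such as Huawei, JD.com, and Tencent are actively positioning themselves, advancing the field together with overseas companies such as Tesla and Figure AI.
Many readers have left us messages asking ...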
VLA and related directions account for nearly half of the embodied work at top conferences, especially these......
具身智能之心· 2025-09-23 04:00
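Judging from this year's top robotics and AI conferences, VLA and its derivative directions account for nearly half of the embodied intelligence output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work.
Imagine being able to give an instruction in natural language and have any action you want carried out smoothly. What a joy that would be! And if long sequences of actions could be completed continuously, it would be very convenient. So what exactly is VLA?
VLA breaks the single-task limitation of traditional methods, enabling robots to make decisions autonomously in diverse scenarios and respond flexibly to unseen environments, with broad applications in manufacturing, logistics, and home services. In addition, VLA models have become a research hotspot, driving a number of frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA, and this research has fostered collaboration between academia and industry. Its adaptability shows in its applicability to platforms such as robot arms, quadruped robots, and humanoid robots, offering broad potential and practical value for the development of all kinds of intelligent robots and making it a key driving force in intelligent robotics.
From an industry perspective, embodied intelligence is booming both in China and abroad. Teams such as Unitree, Zhiyuan (智元), Galaxea (星海图), Galaxy General (银河通用), and LimX Dynamics (逐际动力) are moving from the lab to commercialization, and tech giants such as Huawei, JD.com, and Tencent are actively positioning themselves, advancing the field together with overseas companies such as Tesla and Figure AI.
Many readers have left us messages asking ...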
Looking to recruit several experts to co-build the platform (world models, VLA, and related directions)
自动驾驶之心· 2025-09-21 06:59
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2]
- The recruitment targets individuals with expertise in various advanced technologies such as large models, multimodal models, and 3D object detection [3]
- Candidates from QS top-200 universities with a master's degree or higher are preferred, especially those with significant conference contributions [4]
Group 2
- The compensation package includes resource sharing for job seeking, PhD recommendations, and study abroad opportunities, along with substantial cash incentives [5]
- The company encourages potential partners to reach out via WeChat for collaboration inquiries, specifying the need to mention their organization or company [6]
Opening several autonomous driving technical discussion groups (world models / end-to-end / VLA)
自动驾驶之心· 2025-09-20 16:03
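Everyone is welcome to join and discuss related topics. If you are interested, add the assistant on WeChat to join a group: AIDriver005, with the note: nickname + direction. The 自动驾驶之心 technical discussion groups have been set up. For the back-to-school and autumn recruitment season we have opened several technical discussion groups (world models, end-to-end, VLA, and related directions). ...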
Humanoid Robot tour takeaways: market outlook, components and embodied AI
2025-09-18 13:09
Summary of Conference Call Notes on Greater China Industrials (Humanoid Robots and Autonomous Driving)
Industry Overview
- The humanoid robot and autonomous driving (AD) sectors in China are expected to expand rapidly over the next decade, with significant growth anticipated in factory settings within 2-3 years and further opportunities in commercial and household applications in the long term [1]
- The current bill of materials (BOM) cost for a fully functional humanoid robot is approximately US$50-60k, with rapid cost reductions expected over the next five years due to improved product design and economies of scale [1]
- Stricter regulations in the AD sector are anticipated to create more opportunities for AD components, particularly LiDAR, which will benefit from new long-distance object detection requirements [1]
Key Players and Developments
Dobot
- Dobot is a leading global collaborative robot (COBOT) brand, achieving 47% year-over-year growth in 6-axis COBOT sales in the first half of 2025, indicating market share gains [8]
- The company has entered the humanoid robot market, launching its first prototype in early 2025 and planning deployment in manufacturing and business scenarios [9]
RoboSense
- RoboSense is focusing on its new EMX LiDAR products, which offer superior precision and detection distance compared to competitors, with expectations to ship 600-700k units in 2025 and 1.5 million units in 2026 [10]
- The company is also exploring opportunities in the lawn mower, unmanned delivery, and robotaxi industries, with significant partnerships established [11]
Zhaowei Machinery & Electronics
- Zhaowei has launched new dexterous hand models for humanoid robots and aims for a 10-15% global market share in this segment [12][13]
- The BOM cost of the dexterous hand is estimated to account for 20-30% of the total BOM cost of a humanoid robot [13]
Googol Technology
- Googol Technology specializes in high-end control systems for advanced manufacturing and sees strong growth potential in humanoid robots due to its expertise in multi-degree-of-freedom (DoF) control [14][15]
Minieye
- Minieye is making progress with its smart driving solutions, including iPilot and iRobo, and anticipates significant growth in the penetration of front-view camera modules and driver monitoring systems due to new safety regulations [16][17]
Leju Robotics
- Leju aims to deliver over 1,000 robots in 2025, focusing on stability and durability for large-scale applications [18]
Orbbec
- Orbbec is a leading player in robot vision systems, holding over 70% market share in 3D vision systems for service robots in China [21][22]
UBTECH
- UBTECH aims to ship 500 humanoid robots in 2025 and 2,000-3,000 units in 2026, with expectations for BOM cost reductions in the coming years [23][24]
LK Tech
- LK Tech is focusing on magnesium alloy technology for humanoid robots, which offers lightweighting and other advantages, and has signed cooperation agreements for R&D projects [25][26]
Technology Insights
- The competition between VLA (Vision-Language-Action) and world model technologies for embodied AI is highlighted, with data availability being a key bottleneck [3]
- The vision system of humanoid robots is evolving, with depth cameras becoming the mainstream choice for enhancing sensing and navigation capabilities [22]
Market Outlook
- The humanoid robot market is expected to grow significantly, with projections of 3 million units shipped by 2030, leading to substantial opportunities for component suppliers [13]
- The average selling price (ASP) of humanoid robots is expected to decline to approximately RMB150k (~US$20k) by 2026-2028 due to scale effects [20]
Conclusion
- The humanoid robot and AD sectors in Greater China are poised for significant growth, driven by technological advancements, regulatory changes, and increasing market demand. Key players are actively innovating and expanding their product offerings to capture market share in this rapidly evolving landscape.
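As a rough worked example of the cost figures in the notes above, the quoted BOM range (US$50-60k) and the dexterous-hand share (20-30%) can be combined into an implied cost range for the hand. This is only a sketch: the function name `component_cost_range` is hypothetical, and pairing the low BOM with the low share and the high BOM with the high share is an assumption, since the notes do not say the two ranges were estimated for the same robot or the same point in time.

```python
def component_cost_range(bom_low, bom_high, share_low_pct, share_high_pct):
    """Return (low, high) implied component cost in the same currency as the BOM.

    Assumption (not from the call notes): the lowest cost pairs the low BOM
    with the low share, and the highest cost pairs the high BOM with the
    high share.
    """
    low = bom_low * share_low_pct / 100
    high = bom_high * share_high_pct / 100
    return low, high


# Figures quoted in the notes: BOM of US$50-60k, dexterous hand at 20-30% of BOM.
hand_low, hand_high = component_cost_range(50_000, 60_000, 20, 30)
print(f"Implied dexterous-hand cost: US${hand_low:,.0f} to US${hand_high:,.0f}")
```

Under these assumptions the implied range is roughly US$10k to US$18k per robot at today's BOM. It would shrink as the BOM falls, if the share stays in the same band.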
XPeng and Li Auto are going all-in on the VLA route: what research directions does it actually include?
自动驾驶之心· 2025-09-17 23:33
Core Viewpoint - The article discusses the transition in intelligent driving technology from rule-driven to data-driven approaches, highlighting the limitations of end-to-end models in complex scenarios and the potential of VLA (Vision-Language-Action) as a more streamlined solution [1][2].
Group 1: Challenges in Learning and Research
- The technical stack for autonomous driving VLA has not yet converged, leading to a proliferation of algorithms and making it difficult for newcomers to enter the field [2].
- A lack of high-quality documentation and fragmented knowledge across domains raises the entry barrier for beginners in autonomous driving VLA research [2].
Group 2: Course Development
- A new course titled "Autonomous Driving VLA Practical Course" has been developed to address the challenges faced by learners, focusing on a comprehensive understanding of the VLA technical stack [3][4].
- The course aims to provide a one-stop opportunity to build knowledge across multiple fields, including visual perception, language modules, and action modules, while integrating cutting-edge technologies [2][3].
Group 3: Course Features
- The course emphasizes quick entry into the subject matter through a Just-in-Time Learning approach, using simple language and case studies to help students grasp core technologies rapidly [3].
- It aims to build a framework for research capabilities, enabling students to categorize papers and extract innovative points to form their own research systems [4].
- Practical application is a key focus, with hands-on sessions designed to close the loop from theory to practice [5].
Group 4: Course Outline
- The course covers the origins of autonomous driving VLA, foundational algorithms, and the differences between modular and integrated VLA [6][10][12].
- It includes practical sessions on dataset creation, model training, and performance enhancement, providing a comprehensive learning experience [12][14][16].
Group 5: Instructor Background
- The instructors have extensive experience in multimodal perception, autonomous driving VLA, and large model frameworks, with numerous publications in top-tier conferences [22].
Group 6: Learning Outcomes
- Upon completion, students are expected to thoroughly understand the current advancements in autonomous driving VLA and master core algorithms [23][24].
- The course is designed to benefit students in internships, job recruitment, and further academic pursuits in the field [26].
Group 7: Course Schedule
- The course is set to begin on October 20, with a structured timeline for unlocking chapters and providing support through online Q&A sessions [27].
Paper walkthrough: HKUST's PLUTO, the first planner to surpass rule-based ones!
自动驾驶之心· 2025-09-15 23:33
Core Viewpoint - The article discusses the development and features of the PLUTO model within the end-to-end autonomous driving domain, emphasizing its unique two-stage architecture and its direct encoding of structured perception outputs for downstream control tasks [1][2].
Summary by Sections
Overview of PLUTO
- PLUTO is characterized by its three main losses: regression loss, classification loss, and imitation learning loss, which collectively contribute to the model's performance [7].
- Additional auxiliary losses are incorporated to aid model convergence [9].
Course Introduction
- The article introduces a new course titled "End-to-End and VLA Autonomous Driving," developed in collaboration with top algorithm experts from leading domestic manufacturers, aimed at addressing the challenges faced by learners in this rapidly evolving field [12][15].
Learning Challenges
- The course addresses the difficulties learners face due to the fast pace of technological development and the fragmented nature of knowledge across various domains, which make it hard for beginners to grasp the necessary concepts [13].
Course Features
- The course is designed to provide quick entry into the field, build a framework for research capabilities, and combine theory with practical applications [15][16][17].
Course Outline
- The course consists of several chapters covering topics such as the history and evolution of end-to-end algorithms, background knowledge on various technologies, and detailed discussions on both one-stage and two-stage end-to-end methods [20][21][22][29].
Practical Application
- The course includes practical assignments, such as RLHF fine-tuning, allowing students to apply their theoretical knowledge in real-world scenarios [31].
Instructor Background
- The instructor, Jason, has a strong academic and practical background in cutting-edge algorithms related to end-to-end and large models, contributing to the course's credibility [32].
Target Audience and Expected Outcomes
- The course is aimed at individuals with a foundational understanding of autonomous driving and related technologies, with the goal of raising their skills to a level comparable to an end-to-end autonomous driving algorithm engineer with about one year of experience [36].