VLA is already shipping in cars, and you still don't know your research direction???
自动驾驶之心· 2025-08-16 16:04
Core Viewpoint
- The article discusses the advancements of the Li Auto VLA driver model, highlighting its enhanced capabilities in understanding semantics, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3].

Summary by Sections

VLA Model Capabilities
- The VLA model has improved in three main areas: better semantic understanding through multimodal input, enhanced reasoning abilities via thinking chains, and closer alignment with human driving intuition through trajectory planning [1].
- Four core capabilities of the VLA model are showcased: spatial understanding, reasoning, communication and memory, and behavioral capabilities [1][3].

Development and Research Trends
- The VLA model has evolved from VLM+E2E, incorporating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5].
- While industry continues to optimize traditional perception and planning tasks, the academic community is increasingly shifting its focus toward large models and VLA, leaving many subfields open for research [5].

VLA Research Guidance Program
- A VLA research paper guidance program has been launched to positive feedback, with many students eager for a second session. The program aims to help participants systematically grasp key theoretical knowledge and develop their own research ideas [6].
- The program follows a structured 14-week curriculum, covering topics from traditional end-to-end autonomous driving to research paper writing methodology [9][11].

Enrollment and Course Structure
- The program is limited to 6-8 participants per session, targeting students at various academic levels interested in VLA and autonomous driving [12].
- Participants will gain insights into classic and cutting-edge papers, coding implementations, and methods for selecting research topics and writing papers [13][14].

Course Highlights
- The course emphasizes a comprehensive learning experience with a "2+1" teaching model, in which main instructors and experienced research assistants support students throughout the program [22].
- Students receive guidance on coding, research ideas, and writing methodology, culminating in a research paper draft [31][32].

Required Skills and Resources
- Participants are expected to have a foundational understanding of deep learning, basic Python programming skills, and familiarity with PyTorch [19].
- The program encourages the use of high-performance computing resources, ideally with multiple GPUs, to facilitate research and experimentation [19].

Conclusion
- The VLA model represents a significant advancement in autonomous driving technology, with ongoing research and educational initiatives aimed at fostering innovation in this field [1][5][31].
The second session of VLA and autonomous driving research paper mentoring is here~
自动驾驶之心· 2025-08-16 12:00
Core Insights
- The article discusses the recent advancements in the Li Auto VLA driver model, highlighting its improved capabilities in understanding semantics, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3].

Group 1: VLA Model Capabilities
- The VLA model's enhancements focus on four core abilities: spatial understanding, reasoning, communication and memory, and behavioral capabilities [1].
- The reasoning and communication abilities derive from language models, with memory implemented via RAG (retrieval-augmented generation) [3].

Group 2: Research and Development Trends
- The VLA model has evolved from VLM+E2E, incorporating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5].
- While industry continues to optimize traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for research [5].

Group 3: VLA Research Guidance Program
- A VLA research paper guidance program has been launched to help participants systematically grasp key theoretical knowledge and develop their own research ideas [6].
- The program comprises a structured 12-week online group research course, followed by 2 weeks of paper guidance and a 10-week maintenance period for paper development [14][34].

Group 4: Course Structure and Content
- The course spans 14 weeks and covers topics including traditional end-to-end autonomous driving, VLA end-to-end models, and research paper writing methodology [9][11][35].
- Participants will gain insights into classic and cutting-edge papers, coding skills, and methods for writing and submitting research papers [20][34].

Group 5: Enrollment and Requirements
- The program is limited to 6-8 participants per session, targeting individuals with a background in deep learning and basic knowledge of autonomous driving algorithms [12][15].
- Participants are expected to have a foundational understanding of Python and PyTorch, with access to high-performance computing resources recommended [21].
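The summary above notes that the VLA model's memory capability builds on RAG (retrieval-augmented generation). As a rough illustration of the retrieval half of that idea, the sketch below stores a few driving "memories" as embedding vectors and fetches the closest one by cosine similarity. The memory texts, random embeddings, and function names are all hypothetical stand-ins; a production system would use a learned encoder and a vector database rather than random vectors.

```python
import numpy as np

# Toy driving-memory store: each entry pairs an embedding with a remembered fact.
# Embeddings are random stand-ins for the output of a learned encoder.
rng = np.random.default_rng(0)

memories = [
    "school zone ahead, keep speed under 30 km/h",
    "narrow bridge, yield to oncoming traffic",
    "speed bump near the parking entrance",
]
memory_vecs = rng.normal(size=(len(memories), 8))

def retrieve(query_vec, vecs, texts, k=1):
    """Return the k stored memories most similar to the query (cosine similarity)."""
    sims = vecs @ query_vec / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(sims)[::-1][:k]
    return [texts[i] for i in top]

# A query near the first memory's embedding should retrieve that same memory.
query = memory_vecs[0] + 0.01 * rng.normal(size=8)
print(retrieve(query, memory_vecs, memories))
```

In a real RAG loop, the retrieved text would then be injected into the language model's context so past experience can condition the current driving decision.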
Recruiting for paper mentoring in the VLA / reinforcement learning / VLN directions!
具身智能之心· 2025-08-14 12:00
Group 1
- The article announces 1v1 paper guidance in the field of embodied intelligence, offering three slots focused on the VLA, reinforcement learning, and sim2real directions, primarily targeting A- and B-tier conferences [1]
- Major conferences mentioned include CVPR, ICCV, ECCV, ICLR, CoRL, ICML, and ICRA, indicating the guidance's relevance to prominent academic venues [2]
- Interested individuals can add the designated WeChat contact or scan a QR code to inquire about the embodied paper guidance [3]
The second session of the autonomous driving VLA paper guidance class is here; spots are limited...
自动驾驶之心· 2025-08-14 06:49
Core Insights
- The article discusses the advancements of the Li Auto VLA driver model, highlighting its improved capabilities in understanding semantics, reasoning, and trajectory planning, which are crucial for autonomous driving [1][3][5]

Group 1: VLA Model Capabilities
- The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via thinking chains, and a closer approximation to human driving intuition through trajectory planning [1]
- Four core abilities are showcased: spatial understanding, reasoning, communication and memory, and behavioral ability [1][3]

Group 2: Research and Development Trends
- The VLA model has evolved from VLM+E2E, integrating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5]
- While industry continues to optimize traditional perception and planning tasks, academia is increasingly shifting focus toward large models and VLA, leaving many subfields open for exploration [5]

Group 3: VLA Research Guidance Program
- A second session of the VLA research paper guidance program is launching, aimed at helping participants systematically grasp key theoretical knowledge and develop their own research ideas [6][31]
- The program comprises 12 weeks of structured online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period for paper development [14][31]

Group 4: Course Structure and Requirements
- The course is capped at 8 participants, focusing on master's and doctoral students in VLA and autonomous driving, as well as AI professionals seeking to strengthen their algorithmic knowledge [12][13]
- Participants are expected to have a foundational understanding of deep learning, basic Python programming skills, and familiarity with PyTorch [19][20]

Group 5: Course Outcomes
- Participants will gain insights into classic and cutting-edge papers, coding implementations, and methodologies for selecting research topics, conducting experiments, and writing papers [14][31]
- The program aims to produce a research paper draft, enhancing participants' academic profiles for further studies or employment [14][31]
Classes officially begin! End-to-end and VLA autonomous driving small-group course; the discount ends today~
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The article emphasizes the significance of VLA (Vision-Language-Action) as a new milestone in the mass production of autonomous driving technology, highlighting the progression from E2E (End-to-End) to VLA and the growing interest from professionals in transitioning into this field [1][11].

Course Overview
- The course "End-to-End and VLA Autonomous Driving Small Class" aims to provide in-depth knowledge of E2E and VLA algorithms, addressing the challenges faced by individuals transitioning into this area [1][12].
- The curriculum covers foundational knowledge, advanced models, and practical applications of autonomous driving technology [5][15].

Course Structure
- **Chapter 1**: Introduction to end-to-end algorithms, covering the historical development and the transition from modular to end-to-end approaches, including the advantages and challenges of each paradigm [17].
- **Chapter 2**: Background knowledge on E2E technology stacks, focusing on key areas such as VLA, diffusion models, and reinforcement learning, which are crucial for future job interviews [18].
- **Chapter 3**: Exploration of two-stage end-to-end methods, discussing notable algorithms and their advantages compared with one-stage methods [18].
- **Chapter 4**: In-depth analysis of one-stage end-to-end methods, including subfields such as perception-based and world-model-based approaches, culminating in the latest VLA techniques [19].
- **Chapter 5**: Practical assignment on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, providing hands-on experience with pre-training and reinforcement learning modules [21].

Target Audience and Learning Outcomes
- The course targets individuals with a foundational understanding of autonomous driving and related technologies, such as transformer models and reinforcement learning [28].
- Upon completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering multiple methodologies and applying learned concepts to real-world projects [28].
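The RLHF assignment mentioned above centers on learning from human preferences. One common formulation (a general technique, not necessarily the course's exact exercise) is the Bradley-Terry pairwise loss used to train a reward model: for a (chosen, rejected) pair, the loss is the negative log-sigmoid of the reward margin. A minimal numeric sketch:

```python
import numpy as np

# Bradley-Terry preference loss, as commonly used for RLHF reward models:
# loss = -log(sigmoid(r_chosen - r_rejected)).
def preference_loss(r_chosen, r_rejected):
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# When the reward model already ranks the chosen output higher, the loss is
# small; when the ranking is inverted, the loss is large, pushing the model
# to widen the margin in the correct direction.
print(round(preference_loss(2.0, 0.0), 4))  # correct ranking -> 0.1269
print(round(preference_loss(0.0, 2.0), 4))  # inverted ranking -> 2.1269
```

Minimizing this loss over many human-labeled pairs yields a scalar reward model, which a policy-optimization stage (e.g. PPO-style fine-tuning) can then maximize.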
Traditional perception is gradually falling out of favor; VLA is already in cars?!
自动驾驶之心· 2025-08-13 06:04
Core Viewpoint
- The article discusses the launch of the Li Auto i8, the first model equipped with the VLA driver model, highlighting its advancements in semantic understanding, reasoning, and human-like driving intuition [2][7].

Summary by Sections

VLA Driver Model Capabilities
- The VLA model enhances four core capabilities: spatial understanding, reasoning, communication and memory, and behavioral ability [2].
- It can comprehend natural language commands while driving, set specific speeds based on past memories, and navigate complex road conditions while avoiding obstacles [5].

Industry Trends and Educational Initiatives
- The VLA model represents a new milestone in the mass production of autonomous driving technology, prompting many professionals from traditional fields to seek transitions into VLA-related roles [7].
- The article introduces a new course, "End-to-End and VLA Autonomous Driving," designed to help individuals transition into the field with in-depth knowledge and practical skills [21][22].

Course Structure and Content
- The course covers end-to-end background knowledge, large language models, BEV perception, diffusion model theory, and reinforcement learning [12][26].
- It aims to build a comprehensive understanding of the autonomous driving research landscape, spanning both theory and practice [22][23].

Job Market and Salary Insights
- Demand for VLA/VLM algorithm experts is high, with salaries for positions such as VLA model quantization deployment engineer and VLM algorithm engineer ranging from 40K to 120K [15].
- The course is tailored for individuals looking to upskill or transition into autonomous driving, emphasizing mastery of multiple technical domains [19][41].
VLA R&D Progress at Automakers and Tech Companies
Group 1: Li Auto
- Li Auto's i8 features the VLA "driver model," marking a significant advancement in intelligent driving following the earlier VLM introduction [1]
- The VLA model includes a newly designed spatial encoder and uses language models and logical reasoning to produce driving decisions, predicting the trajectories of other vehicles and pedestrians with a diffusion model [1]
- The VLA's inference frame rate is approximately 10 Hz, more than triple the previous VLM's 3 Hz [1]

Group 2: XPeng Motors
- The XPeng G7 officially began deliveries on July 7, with a clear timeline for the Ultra version's VLA and VLM software updates [2]
- The VLA software OTA update is scheduled for September 2025, with VLM software upgrades following in November 2025 and personalized recommendations by December 2025 [2]
- The G7 Ultra is equipped with three self-developed Turing AI chips delivering a total of 2250 TOPS of computing power, positioning it as a leader among mass-produced models [2]

Group 3: Chery Automobile
- Chery plans to bring VLA and world-model technology to fuel vehicles by 2025 through its Falcon 900 intelligent driving system, aiming to set a new benchmark for "oil-electric intelligence" [3]
- The Falcon 900 system uses a self-developed VLA model that integrates visual perception, language understanding, and action execution [3]
- The model has been trained on 20 million kilometers of real-world data, can understand over 5000 traffic scenarios, and achieves 92% accuracy in recognizing non-standard traffic signals in complex urban conditions, a 37% improvement over traditional systems [3]

Group 4: Geely Automobile
- Geely is actively developing VLA technology, integrating it with world models to create a comprehensive world-model system [4]
- The Qianli Haohan system features a "dual end-to-end model" design, allowing a multimodal VLA general scene model and an end-to-end model to back each other up [4]
- The system runs on dual NVIDIA Thor chips with a total of 1400 TOPS of computing power and more than 40 perception units capable of detecting objects 0.75 meters in size from 300 meters away [4]

Group 5: Yuanrong Qihang
- Yuanrong Qihang is also investing in the VLA model, with five models expected to feature it by the third quarter of this year [5]
- The company was among the earliest to publicly announce VLA development, in June of last year [5]
- Its VLA model focuses on defensive driving with four core functions: spatial semantic understanding, recognition of irregular obstacles, comprehension of text-based guide signs, and voice control of the vehicle, to be released gradually with mass production [5]
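Several entries above mention diffusion models for trajectory prediction. The toy sketch below illustrates only the reverse-process intuition: starting from pure noise, a candidate trajectory is iteratively pulled toward a model's clean estimate. The straight-line "prediction" stub, step size, and step count are illustrative assumptions, not any automaker's actual method (real diffusion planners learn a noise-conditioned denoiser over many timesteps).

```python
import numpy as np

# Toy reverse-diffusion-style refinement of a 2D waypoint trajectory.
rng = np.random.default_rng(0)

# Hypothetical "clean" trajectory: six waypoints straight ahead over 10 m.
target = np.stack([np.linspace(0, 10, 6), np.zeros(6)], axis=1)

def denoise_step(x, predict, alpha=0.3):
    """One refinement update: nudge the sample toward the model's estimate."""
    return x + alpha * (predict(x) - x)

trajectory = rng.normal(size=target.shape)  # start from pure noise
for _ in range(30):
    # Stub "model" that always predicts the clean path; a learned denoiser
    # would condition on the scene and the current noise level instead.
    trajectory = denoise_step(trajectory, lambda x: target)

# After enough steps, the sample converges to the clean trajectory.
print(np.abs(trajectory - target).max() < 1e-2)
```

The appeal of the diffusion formulation in planning is that sampling different initial noise yields different plausible trajectories, giving a multimodal distribution over futures rather than a single deterministic path.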
VLA or VTLA? This company is disrupting the future of robotics with "superhuman tactile" technology!
具身智能之心· 2025-08-13 00:04
Core Insights
- The article highlights significant advancements in robotics hardware, particularly in tactile sensing, which is crucial for precise physical interaction across applications [1][3][10]
- Daimon Robotics has achieved a breakthrough in tactile sensor technology, addressing resolution, real-time performance, and durability, all critical for the industry's growth [2][9]

Group 1: Technology Advancements
- The VLA (Vision-Language-Action) model is a focus for many companies, but its physical interaction capabilities are limited, necessitating the integration of tactile sensing [1]
- Daimon Robotics has developed a high-resolution visual-tactile sensing technology that captures minute optical changes, giving robots human-like tactile perception [4][10]
- The pioneering DM-Tac W sensor features 40,000 sensing units per square centimeter, significantly surpassing human skin and traditional sensors [4][9]

Group 2: Product Development
- Daimon Robotics has introduced the DM-Hand1, a dexterous robotic hand with integrated ultra-thin visual-tactile sensors, enhancing flexibility and precision in tasks such as delicate handling and assembly [6]
- The company showcased its products at the World Robot Conference (WRC), demonstrating practical applications and attracting significant interest [8]

Group 3: Market Position and Future Outlook
- Daimon Robotics has closed a financing round worth hundreds of millions, which will fund further development and commercialization of its tactile sensing technologies [3][10]
- The company has moved from prototype development to large-scale production, earning certifications and passing extensive durability tests, positioning itself for commercial success in the tactile sensing market [9][10]
具身智能之心's technical exchange group has been established!
具身智能之心· 2025-08-11 06:01
Group 1
- A technical exchange group has been established focusing on embodied intelligence technologies, including VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested individuals can add the assistant's WeChat, AIDriver005, to join the community [2]
- To expedite approval, include your organization/school, name, and research direction in the request note [3]
A conversation with 千寻智能's Gao Yang: scientists founding startups isn't very "reliable," but entrepreneurship is like a game
36氪· 2025-08-08 09:28
智能涌现: covering the industrial revolution emerging in the new AI era. An account under 36氪.

In embodied intelligence, a startup should aim to be Apple, not Android.

By Qiu Xiaofen | Editor: Su Jianxun | Source: 智能涌现 (ID: AIEmergence) | Cover image: 视觉中国

Whether at the just-concluded WAIC (World Artificial Intelligence Conference) or the WRC (World Robot Conference) opening this week, how can you tell a robot's true capability on the show floor? Gao Yang, co-founder of the embodied intelligence company 千寻智能, offers a few tips:

For a robot that claims to fold clothes, try crumpling a garment into a ball and tossing it casually onto the table to see whether it can still complete the task; or hand it trousers and a jacket to see whether it generalizes across garment categories.

While a robot operates, watch whether its motions are smooth and fluid rather than stuttering; this reflects the coordination between its thinking and its actions...

Gao Yang, who offered this guidance, is one of the hottest entrepreneurs in embodied intelligence today. After completing his PhD at UC Berkeley, he returned to China to become an assistant professor at Tsinghua University's Institute for Interdisciplinary Information Sciences. In 2023, together with former 珞石机器人 CTO Han Fengtao, he founded the embodied intelligence company 千寻智能. Han brings deep hardware experience, having overseen the mass production and shipment of tens of thousands of robots, while Gao brings an AI research foundation; this pairing of academia and industry gives 千寻 ...