VLA
From Technical Routes to Personnel Turnover: Why Is Smart Driving Coining New Terms Again? | 电厂
Sina Finance · 2025-11-19 10:20
Core Insights
- The automotive and smart driving industry is iterating rapidly on technology, producing new terminology and concepts that make it harder for users to understand and accept the products [1]
- The transition from rule-based systems to end-to-end and world-model architectures is reshaping the industry, with significant implications for company strategies and personnel [2][6]
Group 1: Technological Evolution
- The shift from rule-based to end-to-end systems has highlighted the limitations of modular approaches, particularly latency and information loss [2]
- Tesla's introduction of the end-to-end FSD V12 has sparked interest among other companies such as Huawei, Xpeng, and NIO, which are developing similar solutions [2][5]
- The industry is moving towards VLA (vision-language-action) models, which aim to better understand the physical world and improve driving actions [8][12]
Group 2: Challenges in Implementation
- Current systems, whether rule-based or end-to-end, rely heavily on passive learning from vast amounts of driving data, which limits their ability to adapt to new scenarios [5][6]
- VLA models face challenges such as multi-modal feature alignment and the inherent limitations of language models in processing complex real-world situations [11][15]
- Companies like Li Auto and Xpeng are exploring innovative VLA approaches to enhance their systems' capabilities and efficiency [8][12]
Group 3: Organizational Adjustments
- The transition to new technological routes has led to significant organizational restructuring within companies like Xpeng, Li Auto, and NIO, reflecting a shift in focus towards foundation models [13][14]
- Xpeng's leadership changes indicate a strategic pivot from traditional VLA to innovative VLA, emphasizing the need for a robust foundation model [14]
- NIO and Li Auto have also undergone multiple organizational adjustments to align their resources with the evolving technological landscape [15][17]
Group 4: Competitive Landscape
- Automakers are shifting from in-house development of autonomous driving technology towards partnerships with specialized suppliers, as seen with Chery and Great Wall [18][19]
- Suppliers are more flexible and iterate faster than traditional automakers, which face constraints in their development processes [21]
- Competition is intensifying, and suppliers are expected to play a more dominant role in the market as they advance their solutions [18][22]
Judging by Submissions, Embodied-AI Papers Are Already Piling Up.......
具身智能之心· 2025-11-18 10:00
Core Insights
- The article discusses the growing number of conference submissions and researchers' concerns about which conferences suit their work and what reviewers prefer [1]
- It highlights the active research directions in embodied intelligence, including VLN, VLA, reinforcement learning, and real2sim2real, and provides guidance for newcomers on how to choose their research focus [1][3]
- The article promotes a customized paper mentoring service aimed at helping researchers navigate the complexities of paper writing and submission [3][4][5]
Group 1
- Many researchers are anxious about selecting the right conference and understanding which research directions reviewers favor [1]
- Humanoid robots are particularly active in reinforcement learning and sim2real/real2sim2real research, so labs with such embodiments are advised to explore these areas [1]
- Robotic arm embodiments are suitable for VLA, VLA+RL, and diffusion policy research, with VLA requiring substantial computing power [1]
Group 2
- Quadrupedal robots are also suitable for reinforcement learning research, although extensive prior work may leave fewer opportunities for novelty [2]
- Combining VLN and VLA with mobile manipulation could be a promising research direction [3]
- The article introduces a paper mentoring service that offers one-on-one customized guidance across various top-tier conference topics, emphasizing the importance of having a good idea and of helping new researchers avoid pitfalls [3][4]
Group 3
- The mentoring service covers the full process from topic innovation to experimental design, code debugging, paper writing, and submission strategy, aimed at producing high-quality results quickly [4]
- It highlights the dual perspective of industrial and academic value, focusing not only on publishing papers but also on practical applications [5]
- The article offers a free matching service for the first ten inquiries, allowing researchers to have in-depth meetings with mentors based on their research direction and academic background [6]
From Toddling to Catwalk Strides: What Large Models Have Done for Humanoid Robots
新财富· 2025-11-18 08:06
Group 1
- The article highlights advancements in humanoid robots, particularly the release of models such as Figure 03 and 1X NEO, despite the delay of Tesla's Optimus Gen3 until 2026 [2]
- It emphasizes the significant improvement in humanoid robots' movement, which has evolved from awkward motion to more natural and graceful actions, largely due to the development of large models for humanoid robots [2]
- It traces the transition from Large Language Models (LLM) to Vision-Language Models (VLM) and finally to Vision-Language-Action models (VLA), which integrate perception, understanding, and action in a unified framework [6][8]
Group 2
- Google DeepMind introduced VLA with RT-2, which enhances robotic control by integrating visual and language information with action tokens, raising the success rate from 32% for its predecessor RT-1 to 62% [10]
- Tesla's Optimus leverages its Full Self-Driving (FSD) model, transitioning to an end-to-end approach that simplifies input complexity while managing a vast amount of training data [13][15]
- NVIDIA's GR00T N1 model represents a comprehensive approach to humanoid robotics, combining hardware, software, and ecosystem development, and emphasizing the importance of virtual environments for data collection and training [19][22]
Group 3
- Various startups are using NVIDIA's large models and Cosmos in their robotic solutions, highlighting the competitive landscape in the humanoid robotics sector [24]
- Wang Xingxing expresses skepticism about the VLA architecture, pointing out that existing data is inadequate in both quality and quantity for effective real-world interaction and suggesting a need for better model architecture [26][27]
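The "action tokens" mentioned for RT-2 above can be illustrated with a minimal sketch of action discretization: each continuous action dimension is mapped to one of a fixed number of integer bins, so a language-model-style decoder can emit actions as tokens. The bin count of 256, the value range of [-1, 1], and the function names `action_to_tokens` and `tokens_to_action` are assumptions chosen for illustration, not details taken from the article.

```python
NUM_BINS = 256  # assumed bin count per action dimension (illustrative)


def action_to_tokens(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action value to an integer token in [0, num_bins - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)      # keep the value inside [low, high]
        scaled = (clipped - low) / (high - low)   # normalize to [0, 1]
        tokens.append(min(int(scaled * num_bins), num_bins - 1))
    return tokens


def tokens_to_action(tokens, low, high, num_bins=NUM_BINS):
    """Map integer tokens back to the center value of their bins."""
    return [low + (t + 0.5) / num_bins * (high - low) for t in tokens]


# Hypothetical 3-dimensional action, e.g. normalized end-effector deltas.
example_action = [-1.0, 0.0, 1.0]
example_tokens = action_to_tokens(example_action, -1.0, 1.0)
recovered = tokens_to_action(example_tokens, -1.0, 1.0)
```

Decoding returns bin centers, so the round trip loses at most half a bin width per dimension; a finer bin count trades a larger token vocabulary for lower quantization error.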
Xpeng's Liu Xianming: The "Emergence" of VLA 2.0 Was Extremely Sudden......
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses the emergence of advanced capabilities in autonomous driving and robotics, focusing on Xpeng's developments in VLA (vision-language-action) models and humanoid robots [5][10][28].
Group 1: Technological Advancements
- Xpeng has invested heavily in computing power, using 30,000 compute cards and spending over 2 billion on training, which led to a breakthrough in its technology [7].
- The emergence of capabilities in its second-generation VLA and humanoid robot IRON was unexpected: months of failures suddenly gave way to significant progress [5][8].
- The core logic of the second-generation VLA is to eliminate the translation from vision to language, which improves efficiency and enables self-supervised learning [10][19].
Group 2: Challenges and Solutions
- The transition from structured text data to continuous video signals presents challenges, including information loss and the need for real-time feedback from the physical world [14][15][17].
- Xpeng's approach simplifies training by removing complex intermediate steps, taking multimodal data directly as input and producing physical actions as output [20][22].
- The company is optimizing on-device deployment to achieve low latency and high frame rates, ensuring real-time performance on its hardware [24].
Group 3: Robotics Development
- Xpeng's robotics team collaborates closely with the automotive division and emphasizes in-house development to reduce costs and accelerate iteration [28][29].
- The humanoid robot IRON has shown significant improvements in movement, achieving a human-like gait through innovative design and control systems [36][39].
- A universal generative controller allows the robot to perform complex movements, such as Tai Chi, by directly taking recorded trajectories as input [46].
Group 4: Future Prospects
- The company envisions a future where robots can establish deeper emotional connections with humans, potentially with designs personalized to individual preferences [48].
- Advances in robotics are expected to arrive as sudden breakthroughs, similar to those seen in autonomous driving [32].
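The low-latency, high-frame-rate requirement described above amounts to a per-frame time budget: at a given frame rate, all on-device processing for one frame must finish within 1000 / fps milliseconds. The sketch below checks that arithmetic under the assumption that stages run sequentially with no pipelining. The stage names and latency figures are hypothetical and are not Xpeng measurements.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def fits_budget(stage_latencies_ms, fps):
    """Return (fits, headroom_ms) for sequential stages at the given frame rate."""
    total = sum(stage_latencies_ms.values())
    budget = frame_budget_ms(fps)
    return total <= budget, budget - total


# Hypothetical per-stage latencies in milliseconds (illustrative only).
stages = {
    "sensor_preprocessing": 8.0,
    "model_inference": 30.0,
    "control_output": 5.0,
}

ok_at_20, headroom_at_20 = fits_budget(stages, 20)  # budget is 50 ms per frame
ok_at_30, headroom_at_30 = fits_budget(stages, 30)  # budget is about 33.3 ms per frame
```

With these made-up numbers the 43 ms pipeline fits a 20 fps budget with 7 ms to spare but overruns a 30 fps budget, which is the kind of gap that motivates on-device optimization.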
VLA Track: Recruiting a Few Students for Mentoring~
具身智能之心· 2025-11-12 04:00
Group 1
- The company is recruiting 3 students for paper guidance in the VLA direction, limiting the number of spots to ensure quality [1]
- The main research directions include VLA models, lightweight solutions, VLA combined with tactile feedback, VLA with world models, and VLA with reinforcement learning [1]
Group 2
- The company has already submitted several papers to conferences and is hoping for positive outcomes [1]
- Students interested in guidance can contact the assistant via WeChat with a specific note [2]
VLA Track: Looking to Mentor a Few More Students Aiming for A-Rank Embodied-AI Conferences......
具身智能之心· 2025-11-10 10:00
Group 1
- The article highlights the urgency for students to prepare for upcoming conferences after CVPR, indicating a competitive academic environment in the field of embodied intelligence [2]
- The organization is recruiting three students for guidance in the VLA (vision-language-action) direction, emphasizing the importance of quality in mentorship [2]
- Key research areas for the VLA direction include VLA models, lightweight models, VLA combined with tactile feedback, VLA with world models, and VLA with reinforcement learning [2]
From a Career-Switch and Research Perspective, Which Direction Is Better Suited to a First Paper?
具身智能之心· 2025-11-06 11:47
Group 1
- The article discusses suitable research directions for publishing papers in embodied intelligence, including VLN, VLA, reinforcement learning, and real2sim2real [1]
- For researchers currently working on SLAM, VLN and VLA are recommended as good entry points, especially for those with robotic arms [1]
- The article emphasizes the importance of having a good idea for research, noting that new researchers may need to work through various challenges before arriving at innovative concepts [1]
Group 2
- A new paper guidance service has been launched, offering customized one-on-one mentoring in advanced topics such as multimodal large models, VLA, and reinforcement learning [2]
- The mentoring team consists of PhD holders and researchers from top universities and companies, providing comprehensive support from topic selection to publication strategy [2]
- The service aims to bridge the gap between academia and industry, focusing not only on paper publication but also on practical application value [3]
Group 3
- The article promotes a free matching service for the first ten inquiries, allowing students to have in-depth meetings with mentors based on their research direction and academic background [5]
Prices Driven to Rock Bottom, 20,000 Units Sold a Year: Robot 4S Stores Move into Shopping Malls
36Kr · 2025-11-05 10:35
Core Insights
- The integration of AI and robotics is transforming daily life, with embodied intelligence expected to become commonplace, leading to a future where various robots assist in household tasks and privacy is maintained through local data processing [1][4][8]
Industry Overview
- The global robotics market is experiencing a significant shift, with humanoid robot sales projected to increase from a few hundred units last year to an estimated 20,000 units this year, indicating the onset of a price war as companies compete for market share [1][2]
- Humanoid robot prices are nearing raw material costs, which is straining innovative companies and limiting their ability to explore new applications [2][4]
Market Dynamics
- Robot penetration in manufacturing remains low, at approximately 400 robots per 10,000 workers, or about 4%, highlighting the potential for growth in this area [2]
- The emergence of robot 4S stores in cities like Shenzhen and Beijing signifies a growing consumer market for robots, with options for rental, purchase, and customization [5][7]
Technological Challenges
- The industry faces a dual challenge of data and standardization, with two main development paths: the VLA (Vision-Language-Action) approach, which relies heavily on pre-collected data, and a small-sample, high-generalization method [9][11]
- The lack of standardized data collection methods and companies' reluctance to share data are significant barriers to progress in the robotics sector [11][12]
Future Prospects
- Humanoid robots are anticipated to become commonplace in households within the next decade, particularly in applications such as elder care and assistance with hazardous tasks [7][8]
- The establishment of standardized data platforms, such as the "Pavilion X Embodied Intelligence Standardized Data Set Platform," aims to address the challenges of data collection and standardization, facilitating better integration of robotics into various industries [12][13]
While You're Still Agonizing over a Research Direction, Other Students Already Have CCF-A Papers......
具身智能之心· 2025-11-04 00:05
Group 1
- The article introduces a new research guidance service focused on embodied intelligence, addressing common challenges newcomers face in selecting research topics and methodologies [1][2]
- The guidance covers advanced topics such as multimodal large models, reinforcement learning, and robot simulation, providing tailored one-on-one support [2][3]
- The service is backed by a team of experienced mentors from prestigious institutions and leading companies, ensuring high-quality assistance throughout the research process [2][3]
Group 2
- The program emphasizes a dual perspective from both industry and academia, aiming not only for publication but also for practical application and value [3]
- An introductory offer is available for the first ten inquiries, allowing students to receive personalized mentorship and tailored advice on suitable conferences and journals [4]
Zhan Kun Concurrently Heads Li Auto's Silicon Valley R&D Center in the US and Will Discuss World Models and VLA in a Livestream
理想TOP2· 2025-11-03 07:33
Core Viewpoint
- The article discusses the advancements in Tesla's FSD v14 and explores the potential of VLA (vision-language-action) models to define the next generation of autonomous driving solutions, comparing them with WA (World Model Architecture) [1].
Group 1: Technology Discussion
- The article highlights the exploration of world models and the future development direction of VLA, asking whether the two could converge into a unified approach [3].
- It emphasizes that the high demand for data and computing power is making it increasingly difficult for academia to participate in the intelligent driving sector, and considers what opportunities remain for academic involvement [3].
Group 2: Expert Insights
- The article features insights from several experts, including a senior director from Li Auto's VLA team, a senior algorithm scientist from Bosch, and a parking team leader from Changan Automobile, representing a range of industry perspectives [4].
- The discussion is moderated by a professor from Shanghai Jiao Tong University, showcasing academic interest in advances in autonomous driving technology [6].