VLA

End-to-end VLA salaries like this have me tempted...
自动驾驶之心· 2025-07-10 12:40
Core Viewpoint
- End-to-end (E2E) autonomous driving is the core algorithm for intelligent-driving mass production; since UniAD's recognition at CVPR, the industry has entered a new phase of rapid advancement and competition [2]

Group 1: E2E Autonomous Driving Overview
- E2E approaches can be categorized as single-stage or two-stage; both model directly from sensor data to vehicle control, avoiding the error accumulation of modular methods [2]
- The emergence of BEV perception bridged the gaps between modules, producing a significant technological leap [2]
- The rapid development of E2E has driven a surge in demand for VLM/VLA expertise, with salaries potentially reaching millions annually [2]

Group 2: Learning Challenges
- The fast pace of E2E evolution has made earlier learning materials outdated, so practitioners now need a comprehensive grasp of multi-modal large models, BEV perception, reinforcement learning, and more [3]
- Beginners struggle to synthesize knowledge from fragmented papers and to move from theory to practice, given the lack of high-quality documentation [3]

Group 3: Course Development
- A new course, "End-to-End and VLA Autonomous Driving," was developed to address these challenges, using Just-in-Time Learning to help students quickly grasp the core technologies [4]
- The course builds a framework for research skills, enabling students to categorize papers and extract their innovations [5]
- Practical projects are integrated throughout to close the loop from theory to practice [6]

Group 4: Course Structure
- Chapters cover the history and evolution of E2E algorithms, background knowledge, two-stage and one-stage E2E methods, and the latest advances in VLA [8][9][10]
- Key topics include an introduction to E2E algorithms, VLA background knowledge, and practical applications of diffusion models and reinforcement learning [11][12]

Group 5: Target Audience and Outcomes
- The course targets individuals with a foundational understanding of autonomous driving and aims to bring participants to a level comparable to one year of experience as an E2E algorithm engineer [19]
- Participants gain a deep understanding of key technologies such as BEV perception, multi-modal large models, and reinforcement learning, and can apply them to real-world projects [19]
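The single-stage idea described above, mapping sensor data straight to control with no modular hand-offs, can be sketched in toy form. Everything below is illustrative (the functions, the fixed "weights", and the lossy-perception factor are invented for the sketch, not taken from any real driving stack):

```python
def modular_pipeline(sensor_features):
    """Modular approach: perception -> prediction -> planning/control.
    Each hand-off is a potential source of accumulated error."""
    detections = [f * 0.9 for f in sensor_features]   # perception (lossy by assumption)
    trajectory = sum(detections) / len(detections)    # prediction aggregates detections
    steering = max(-1.0, min(1.0, trajectory))        # planning clamps to a control range
    return steering

def end_to_end(sensor_features, weights):
    """Single-stage E2E approach: one learned mapping from raw sensor
    features to a control command, with no intermediate interfaces."""
    score = sum(w * f for w, f in zip(weights, sensor_features))
    return max(-1.0, min(1.0, score))

features = [0.2, -0.1, 0.4]   # stand-in for encoded sensor input
weights = [0.5, 0.3, 0.2]     # stand-in for learned parameters

print(modular_pipeline(features))
print(end_to_end(features, weights))
```

The point of the contrast is structural: in the modular version, errors introduced at the perception stage propagate through every later stage, whereas the E2E version is optimized as one function end to end.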
What do the directions of 2025's top-conference papers suggest about upcoming research hotspots?
自动驾驶之心· 2025-07-06 08:44
Core Insights
- The article surveys the key research directions in computer vision and autonomous driving presented at the major conferences CVPR and ICCV, organized around four areas: general computer vision, autonomous driving, embodied intelligence, and 3D vision [2][3]

Group 1: Research Directions
- In computer vision and image processing, the main topics include diffusion models, image quality assessment, semi-supervised learning, zero-shot learning, and open-world detection [3]
- Autonomous driving research concentrates on end-to-end systems, closed-loop simulation, 3D Gaussian Splatting (3DGS), multimodal large models, diffusion models, world models, and trajectory prediction [3]
- Embodied intelligence focuses on VLA, visual language navigation, zero-shot learning, robotic manipulation, end-to-end systems, sim-to-real transfer, and dexterous grasping [3]
- 3D vision emphasizes point cloud completion, single-view reconstruction, 3D Gaussian Splatting (3DGS), 3D matching, video compression, and Neural Radiance Fields (NeRF) [3]

Group 2: Research Support and Collaboration
- The article offers support for research needs in autonomous driving, including large models, VLA, end-to-end autonomous driving, 3DGS, BEV perception, target tracking, and multi-sensor fusion [4]
- In embodied intelligence, support covers VLA, visual language navigation, end-to-end systems, reinforcement learning, diffusion policy, sim-to-real, embodied interaction, and robotic decision-making [4]
- For 3D vision, the focus is on point cloud processing, 3DGS, and SLAM [4]
- General computer vision support includes diffusion models, image quality assessment, semi-supervised learning, and zero-shot learning [4]
Four embodied-intelligence companies gather: who will reach the finals in a trillion-yuan race where hot money and bubbles coexist?
Bei Ke Cai Jing· 2025-06-29 08:26
Core Insights
- The embodied intelligence sector is drawing unprecedented investment and interest, with debate over whether a bubble has formed and which applications will mature first [1][3]

Investment Landscape
- Current investment in embodied intelligence is significantly smaller than in the smart automotive sector, suggesting room for growth once scalable commercial applications are identified [3][4]
- Companies argue that more capital is needed to close the financing gap with international players: leading domestic companies operate at a scale of tens of billions of RMB versus tens of billions of USD for their US counterparts [3][4]

Market Applications
- B-end applications are seen as the most suitable for initial deployment, particularly logistics, quality inspection, and manufacturing processes [6][7]
- The industry is exploring strategies such as replacing human labor in hard-to-fill positions, with gradual expansion into more complex scenarios over the next few years [6][7]

Technological Development
- The VLA (vision-language-action) model is considered a key framework for the future of robotics, with ongoing improvements in data collection and model training methodologies [7][8]
- The industry is moving toward a unified model paradigm, emphasizing the integration of visual, linguistic, and action capabilities in robotic systems [8]

Competitive Landscape
- The sector is expected to evolve much as the smartphone and automotive industries did, with a diverse range of players including hardware manufacturers and AI developers [9][10]
- The market is anticipated to consolidate around a limited number of major players that maintain technological barriers and establish closed-loop commercial applications [10][11]
PKU's Lu Zongqing: at this stage, neither world models nor VLA touch the essence | Embodied Pioneers: Ten Conversations
雷峰网· 2025-06-20 11:54
"Internet video data is the only path that can scale up."

Author: Guo Haiwei (郭海惟) | Editor: Chen Caixian (陈彩娴)

As a founder building an embodied "brain," Lu Zongqing (卢宗青) has a glittering résumé: he is one of China's new generation of reinforcement learning researchers, following close behind DeepMind. He is a tenured associate professor at Peking University's School of Computer Science, previously led the Multimodal Interaction Research Center at the Beijing Academy of Artificial Intelligence (智源研究院), headed the first National Natural Science Foundation "Original Exploration Program" project on general agents, and serves as an area chair at top international machine learning conferences including NeurIPS, ICLR, and ICML.

As early as 2023, his team was already using multimodal models to study general agents, having an agent play Red Dead Redemption 2 and perform office work, making it the first LLM agent to complete concrete tasks from scratch in a AAA game. After several setbacks, the paper was finally accepted to ICML 2025 this year. Even so, he says he is not actually satisfied with that work, because its "generalization was insufficient."

After completing that research, Lu Zongqing realized that "current multimodal models lack the ability to interact with the world." Because models lack data for learning physical interaction, the generalization we observe is essentially "abstract": the model ultimately cannot understand the relationship between actions and the world, and therefore cannot predict the world.

This has become the starting point of his embodied-intelligence startup: developing a general embodied AI model. Lu ...
A conversation with Lingchu Intelligent CEO Wang Qibin: putting robots in factories is meaningful, and so is teaching robots to play Mahjong
Sou Hu Cai Jing· 2025-06-11 08:47
Core Viewpoint
- The article discusses advances in embodied intelligence, focusing on Lingchu Intelligent's Psi R1 model, which enables robots to perform complex tasks in dynamic environments, such as playing Mahjong with humans [3][6][17]

Company Overview
- Lingchu Intelligent was founded in 2024 by a team with extensive experience in robotics and artificial intelligence, including CEO Wang Qibin, who has a product-management background, and other notable figures from Stanford University and the robotics field [5][6]
- The company has established a joint laboratory with Peking University to strengthen its research in embodied intelligence [5]

Technology and Innovation
- The Psi R1 model represents a significant advance in robot capability, enabling a closed loop of action perception, environment feedback, and dynamic decision-making [3][6]
- The transition from vision language models (VLM) to vision-language-action models (VLA) is highlighted, with VLA enabling robots to understand and execute physical actions based on visual and textual information [7][14]
- The company aims to tackle long-horizon operations in semi-open environments, which are crucial for practical applications in logistics and retail [8][14]

Market Position and Strategy
- Lingchu Intelligent positions itself as a provider of stable, cost-effective robotic solutions, focusing on practical applications rather than superficial demonstrations [5][10]
- The company plans to deliver products to overseas logistics clients within six months, indicating a clear market strategy [7][21]
- Target markets include manufacturing processes and logistics operations, with a focus on tasks such as material inspection and handling [21]

Financial Outlook
- The company anticipates sales of several hundred million by the end of 2026, reflecting a strong growth trajectory [22]
- Pricing is designed to be competitive, keeping a robot's cost below two years of labor costs for a comparable position [23]

Industry Trends
- Investors now expect clear commercialization pathways in embodied intelligence, in contrast with previous years [8][25]
- While investment in the sector remains significant, the focus is shifting toward sustainable, viable technological advancement [25][26]
Galbot founder Wang He: get VLA right, and we will witness the arrival of embodied intelligence's first true peak
Mei Ri Jing Ji Xin Wen· 2025-06-06 15:28
GALBOT G1 picking items from a shelf. Photo: NBD reporter Li Yuting (李宇彤)

Reporter: Li Yuting | Editor: Ma Ziqing (马子卿)

"When we talk about embodied intelligence today, it has an immediate goal: we absolutely must push embodied intelligence toward industrialization." So said Wang He (王鹤), founder and CTO of Beijing Galaxy General Robot Co., Ltd. ("Galbot", 银河通用), at the 2025 BAAI Conference (智源大会) on June 6.

Galbot's wheeled dual-arm robot GALBOT G1 also appeared on site. In the demonstration, after hearing an instruction, GALBOT G1 accurately picked the corresponding items from densely stocked shelves set up at the venue.

Galbot founder and CTO Wang He. Photo: NBD reporter Li Yuting

Galbot was founded in Haidian, Beijing in May 2023 and focuses on humanoid robot hardware and embodied-intelligence large models. In just over a year it has completed more than 1.2 billion RMB in financing, with backers including strategic and industrial investors such as Meituan Strategic Investment, BAIC Capital, and SenseTime's Guoxiang Fund, as well as star institutions such as Qiming Venture Partners, Lanchi Ventures, and IDG Capital.

On June 1, Galbot officially released TrackVLA, a self-developed, product-grade end-to-end navigation large model: an embodied large model with purely visual environment perception, language-instruction driving, autonomous reasoning, and zero-shot generalization.

In Galbot's demo video, a robot dog, guided by the large model ...
2025 China Advanced Intelligent Driver-Assistance Technology Insights: Compute Leaps, Data Closed Loops, VLA, and World Models
EqualOcean· 2025-06-05 05:42
Investment Rating
- The report does not explicitly state an investment rating for the industry

Core Insights
- The report highlights the evolution of advanced driver-assistance systems (ADAS) in China, focusing on the expansion of operational design domains (ODD), technological equity, safety concerns, and supportive policies [4][21][23]
- It emphasizes the need for algorithm, data, and computing-power upgrades to address safety shortcomings in high-level ADAS technologies [23][66]
- It discusses the transition from modular to end-to-end vehicle algorithm architectures, aiming for human-like driving capability [66][68]

Summary by Sections
1. Market Background
- High-level ADAS ODD is expanding, with a focus on technological inclusivity and on addressing accident anxiety through safety redundancies [4][21]
- Policy support is highlighted as crucial for the rational promotion of ADAS technologies [4][21]
2. Technology Insights
- The report decodes the underlying logic of data, algorithms, and computing power in high-level ADAS [4][28]
- It notes the shift toward higher TOPS (trillions of operations per second) capability in both vehicle and cloud computing [42][44]
- Data challenges, including collection and positioning technologies, are identified as critical areas for development [4][28]
3. Competitive Analysis
- The competitive landscape is analyzed, detailing the tiered structure of companies and their development strategies [29][30]
- The report outlines collaboration models between automotive manufacturers and technology providers, emphasizing the balance between in-house research and external sourcing [83]
4. Trend Insights
- The report notes the commercialization progress of passenger-vehicle L3 systems, indicating a growing market for advanced ADAS [31][32]
- It highlights continuous upgrades and iterations of ADAS functionality to meet evolving consumer expectations and safety standards [82][83]
How AI Became Li Auto's No. 1 Project
晚点LatePost· 2025-05-23 07:41
Core Viewpoint
- The article discusses Li Auto's strategic focus on artificial intelligence and the evolution of its AI assistant from a vehicle-centric tool into a multi-platform intelligent application, emphasizing AI's importance to future competitiveness [4][5][6]

Group 1: Strategic Meetings and AI Prioritization
- Li Auto holds biannual closed-door strategy meetings to discuss future directions, with significant participation from top executives and industry leaders [3]
- After one strategic meeting, Li Auto adjusted its AI-related business priorities, placing intelligent driving above other AI applications in strategic importance [4][5]
- The company aims to become a global AI leader by 2030, with a clear focus on enhancing its AI capabilities and applications [5][6]

Group 2: Development of AI Capabilities
- Li Auto has extended its AI assistant, "Li Xiang," from a vehicle-only application to a multi-platform tool spanning mobile and web [7]
- The company invested in self-developed algorithms, completing a full switch to in-house technology for its AI functionality by March 2023 [7][8]
- The introduction of the multi-modal cognitive model Mind GPT 1.0 marks a significant advance in Li Auto's AI capabilities [7]

Group 3: Intelligent Driving and Technological Advancements
- Li Auto launched its intelligent driving system, AD Max, to address product shortcomings and strengthen its competitive position [10][11]
- The company began a large-scale recruitment drive for its intelligent driving team, reflecting its commitment to the technology [10]
- The shift to an "end-to-end" model for intelligent driving aims to streamline the pipeline and improve system performance through better data utilization [10][11]

Group 4: Organizational Changes and AI Integration
- Li Auto established an AI Technical Committee to integrate AI capabilities across business lines, enhancing collaboration and execution [15][16]
- The committee includes leaders from key departments, keeping AI at the core of strategic decision-making [16][17]
- The company aims to build a foundational model that serves as the core capability for all AI projects, positioning itself as a leader in automotive AI [17]
How AI Became Li Auto's No. 1 Project
晚点Auto· 2025-05-22 07:16
Core Viewpoint
- The article discusses Li Auto's strategic shift toward AI and intelligent driving, emphasizing AI's importance to the company's long-term competitiveness and product development [3][10][12]

Group 1: AI Strategy and Development
- At a strategic meeting in October 2022, Li Auto adjusted the priority of its AI-related business, underscoring the strategic importance of intelligent driving [3][5]
- The company aims to become a global AI leader by 2030, with significant investment in AI talent and technology [5][10]
- Li Auto's AI assistant, "Li Xiang," has evolved from a car-mounted system into a multi-platform application, indicating a broader vision for AI beyond the vehicle [7][8]

Group 2: Intelligent Driving Focus
- Intelligent driving was designated the company's primary strategy in 2023, with plans for heavy investment to compete with major players such as Huawei [10][12]
- The intelligent driving team expanded significantly, with over 50 new positions created in late 2023, reflecting a strong commitment to the technology [10][11]
- Li Auto is transitioning its intelligent driving technology from a modular approach to an "end-to-end" model, expected to improve performance and user experience [11][12]

Group 3: Organizational Changes and AI Integration
- An AI Technical Committee was established to integrate AI capabilities across business lines, signaling a strategic focus on AI as a core business direction [14][15]
- The committee includes leaders from product development and research departments, aligning AI applications with the company's overall strategy [15][16]
- Li Auto views its foundational AI model as a critical component of future development, aspiring to rank among the industry's top three [17][18]
TransDiffuser: the architecture Li Auto's VLA uses to generate trajectories via diffusion
理想TOP2· 2025-05-18 13:08
Core Viewpoint
- The article discusses advances in autonomous driving, focusing on the diffusion model and its application to generating driving trajectories, and highlighting the differences between VLM and VLA systems [1][4]

Group 1: Diffusion Model Explanation
- Diffusion is a generative model that learns a data distribution through a forward process of adding noise and a reverse process of removing it, akin to solving a puzzle in reverse [4]
- The denoising process trains a neural network to predict and remove the noise, ultimately generating the target data [4]
- Diffusion generates not only the ego vehicle's trajectory but also predicted trajectories for other vehicles and pedestrians, improving decision-making in complex traffic environments [5]

Group 2: VLM and VLA Systems
- The VLM approach consists of two systems: System 1 imitates learned behavior to output trajectories without semantic understanding, while System 2 has semantic understanding but only provides suggestions [2]
- VLA is a single system with both fast and slow thinking, inherently capable of semantic reasoning [2]
- VLA outputs action tokens that encode the vehicle's driving behavior and surrounding environment; these tokens are then decoded into driving trajectories by the diffusion model [4][5]

Group 3: TransDiffuser Architecture
- TransDiffuser is an end-to-end trajectory generation model that fuses multi-modal perception information to produce high-quality, diverse trajectories [6][7]
- The architecture comprises a Scene Encoder that processes multi-modal data and a Denoising Decoder built on the DDPM framework for trajectory generation [7][9]
- During denoising, a multi-head cross-attention mechanism fuses scene and motion features [9]

Group 4: Performance and Innovations
- The model achieves a Predictive Driver Model Score (PDMS) of 94.85, outperforming existing methods [11]
- Key innovations include anchor-free trajectory generation and a multi-modal representation decorrelation optimization mechanism that enhances trajectory diversity and reduces redundancy [11][12]

Group 5: Limitations and Future Directions
- The authors note challenges in fine-tuning the model, particularly the perception encoder [13]
- Future directions involve integrating reinforcement learning and referencing models such as OpenVLA [13]
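The forward-noising and reverse-denoising idea described above can be sketched in a toy one-dimensional form. This is illustrative only, not the TransDiffuser implementation: in the real model, the noise estimate comes from a trained Denoising Decoder conditioned on Scene Encoder features via cross-attention, whereas here the exact injected noise stands in as a perfect "oracle" denoiser; the schedule values are also just a common linear choice, not taken from the paper.

```python
import math
import random

T = 50  # number of diffusion steps (illustrative)
# Linear beta schedule from 1e-4 to 0.02, a common DDPM default.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = product of (1 - beta_s) for s <= t.
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def forward_noise(x0, t, eps):
    """Forward process q(x_t | x_0): blend the clean value with Gaussian noise."""
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps

def recover_clean(xt, t, eps_pred):
    """Invert the forward process given a noise prediction (the denoiser's job)."""
    return (xt - math.sqrt(1.0 - alpha_bars[t]) * eps_pred) / math.sqrt(alpha_bars[t])

random.seed(0)
x0 = 0.8                      # one clean trajectory waypoint (illustrative)
eps = random.gauss(0.0, 1.0)  # the noise actually injected
xt = forward_noise(x0, T - 1, eps)       # heavily noised sample
x0_hat = recover_clean(xt, T - 1, eps)   # denoise with a perfect noise estimate

print(xt)      # noised value, far from x0
print(x0_hat)  # recovered value, matching x0 up to rounding
```

The algebra shows why a diffusion model is trained to predict noise: if the network's noise estimate is accurate, the clean trajectory falls out of the inversion directly; in practice the reverse process takes many small denoising steps instead of this single oracle jump.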