VLA: Some Hail It as the "Strongest Solution," Others Say It "Can't Run"
36Kr· 2025-09-11 08:17
Core Viewpoint
- The intelligent-driving industry is at a critical juncture with the emergence of VLA (Vision-Language-Action) technology, which has divided key players over its potential and implementation [1][2][3]

Group 1: VLA Technology and Its Implications
- VLA is seen as a potential answer to the limitations of end-to-end systems in intelligent driving, which are estimated to handle only about 90% of driving challenges [6][10]
- Introducing language as a bridge in the VLA model aims to strengthen the system's understanding and decision-making, allowing more complex and nuanced driving actions [12][14][18] (see the pipeline sketch after this summary)
- VLA is expected to improve three key areas: understanding dynamic traffic signals, enabling natural voice interaction, and strengthening risk prediction [19][20][21]

Group 2: Challenges and Criticisms of VLA
- Despite its potential advantages, VLA faces significant challenges, including heavy financial investment and the technical difficulty of aligning multimodal data [31][32]
- Critics argue that VLA may not be necessary for reaching higher levels of autonomous driving, viewing it as a supplementary enhancement rather than a fundamental solution [35][36]
- Current intelligent-driving chips lack the compute to deploy VLA models effectively, raising doubts about their practical application in real-world scenarios [31][32]

Group 3: Industry Perspectives and Strategies
- Li Auto, Yuanrong, and Xiaopeng are betting on VLA, accepting high investment and computational intensity to pursue its development [41][42]
- Huawei and Horizon, by contrast, favor structured approaches and world models, arguing these may offer more reliable paths to advanced autonomous driving [43][46]
- The ongoing debate over VLA reflects broader strategic choices within the industry, with companies prioritizing different technology paths based on their resources and market positioning [47]
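To make the "language as a bridge" idea concrete, below is a minimal Python/PyTorch sketch of how a VLA pipeline is commonly structured: visual tokens and instruction tokens are fused in a shared transformer, and an action head decodes a driving command. All module names, dimensions, and the single-vector action output are illustrative assumptions, not the architecture of any company's production VLA model.

```python
# Minimal sketch of a Vision-Language-Action (VLA) pipeline: camera frames are
# encoded, fused with a language instruction through a shared transformer, and
# decoded into a driving action. All module sizes and names are illustrative
# assumptions, not the design of any specific vendor's VLA model.
import torch
import torch.nn as nn

class VLAPolicy(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_actions=3):
        super().__init__()
        # Vision encoder: turns an image into a short sequence of visual tokens.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=8),  # coarse patch embedding
            nn.ReLU(),
            nn.Conv2d(32, d_model, kernel_size=4, stride=4),
        )
        # Language embedding for the tokenized instruction (e.g. a voice command).
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Shared transformer fuses visual and language tokens.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: here, [steering, acceleration, brake] as a continuous vector.
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, image, instruction_ids):
        vis = self.vision(image).flatten(2).transpose(1, 2)  # (B, T_vis, D)
        txt = self.text_embed(instruction_ids)               # (B, T_txt, D)
        fused = self.fusion(torch.cat([vis, txt], dim=1))    # joint sequence
        # Pool the fused sequence and decode a single action.
        return self.action_head(fused.mean(dim=1))

policy = VLAPolicy()
frame = torch.randn(1, 3, 128, 128)            # one camera frame
instruction = torch.randint(0, 32000, (1, 8))  # tokenized voice command
action = policy(frame, instruction)
print(action.shape)  # torch.Size([1, 3])
```

In practice the language backbone is usually a pretrained LLM and the output is often a trajectory or a sequence of action tokens rather than a single control vector; the sketch only shows the fusion pattern the article describes.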
Beyond VLA: A Roundup of Embodied + VA Work
具身智能之心· 2025-07-14 02:21
Core Insights
- The article surveys advances in embodied intelligence and robotic manipulation, highlighting research projects and methodologies aimed at improving robot capabilities in real-world applications [2][3][4]

Group 1: 2025 Research Initiatives
- Numerous 2025 projects are outlined, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation," both aimed at improving manipulation through advanced learning techniques [2][3] (see the decoding sketch after this summary)
- The "BEHAVIOR Robot Suite" is designed to streamline real-world whole-body manipulation for everyday household activities, signaling a focus on practical applications of robotics [2]
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" points to more sample-efficient methods for robot training [2][3]

Group 2: Methodologies and Techniques
- Methodologies such as "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning" and "Learning the RoPEs: Better 2D and 3D Position Encodings with STRING" aim to improve the adaptability and efficiency of robotic systems [2][3][4]
- "RoboGrasp: A Universal Grasping Policy for Robust Robotic Control" develops a versatile grasping policy applicable across different robotic platforms [2][3]
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" advances the fine motor skills robots need for complex tasks [4]

Group 3: Future Directions
- Projects like "Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation" underscore the importance of integrating visual and tactile feedback in robotic systems [7]
- "Zero-Shot Visual Generalization in Robot Manipulation" reflects a trend toward robots that generalize learned skills to new, unseen scenarios without additional training [7]
- "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" suggests a shift toward leveraging human demonstrations to scale up robot learning [7]
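Several of the titles above (notably "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation") center on decoding a manipulation trajectory autoregressively. The Python sketch below shows that generic pattern, not the published method: a transformer decoder emits end-effector waypoints one at a time, each conditioned on the observation and the waypoints already produced. The class name, dimensions, greedy rollout, and waypoint format are assumptions for illustration; the paper's specifics (tokenization, decoding order, training losses) differ.

```python
# Minimal sketch of autoregressive trajectory decoding: a decoder emits
# end-effector waypoints step by step, each conditioned on the observation
# and the waypoints generated so far. All names and sizes are illustrative
# assumptions, not the published Chain-of-Action implementation.
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    def __init__(self, obs_dim=512, d_model=128, waypoint_dim=7, horizon=16):
        super().__init__()
        self.horizon = horizon
        self.obs_proj = nn.Linear(obs_dim, d_model)    # observation -> memory token
        self.wp_in = nn.Linear(waypoint_dim, d_model)  # embed previous waypoints
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))  # learned start token
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.wp_out = nn.Linear(d_model, waypoint_dim)  # next waypoint (xyz + quat)

    @torch.no_grad()
    def rollout(self, obs):
        """Greedily decode a full trajectory from one observation embedding."""
        b = obs.shape[0]
        memory = self.obs_proj(obs).unsqueeze(1)       # (B, 1, D)
        tokens = self.start.expand(b, 1, -1)
        waypoints = []
        for _ in range(self.horizon):
            h = self.decoder(tokens, memory)           # attend to obs + history
            wp = self.wp_out(h[:, -1])                 # predict the next waypoint
            waypoints.append(wp)
            tokens = torch.cat([tokens, self.wp_in(wp).unsqueeze(1)], dim=1)
        return torch.stack(waypoints, dim=1)           # (B, horizon, waypoint_dim)

decoder = TrajectoryDecoder()
obs = torch.randn(2, 512)   # e.g. pooled visual features for two scenes
traj = decoder.rollout(obs)
print(traj.shape)           # torch.Size([2, 16, 7])
```

The same skeleton underlies many trajectory-token approaches: swapping the continuous waypoint head for a discretized action vocabulary turns this into token-level autoregression.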