VLA (Vision-Language-Action Model)
ZTE's Cui Li: AI Applications Reach the Industry's Deep Waters as the Value Loop Nears Completion
21st Century Business Herald · 2025-12-30 10:25
Core Insights
- The rapid development of large AI models is becoming a key factor in the new round of technological competition. The number of foundational large models is expected to converge to a single-digit figure, while numerous specialized models and applications emerge across industries [2]
- Physical AI is highlighted as a major area of focus, accelerating progress in fields such as embodied intelligence and autonomous driving, which are expected to profoundly change how society operates [2][3]
- The transition from generative models to world models and vision-language-action (VLA) models represents a paradigm shift in AI, from prediction alone to simulation and physical alignment [3][4]

Industry Trends
- The emergence of Sora has sparked discussion of world models, indicating a shift in AI capabilities from predictor to simulator [3]
- Diverging approaches have split world models into "generative" and "representational" camps, each with distinct applications and strengths [4][5]
- The integration of VLA and world models is seen as a trend: VLA focuses on sequence modeling for robot control, while world models emphasize internal modeling of the environment for efficient learning [5]

Challenges and Solutions
- Three major challenges remain for world models: understanding causality, building effective simulators, and addressing data scarcity [6]
- The competition for high-quality synthetic data is crucial to the next phase of AI development, particularly in data-driven applications such as autonomous driving [6]
- The projected timeline for world models runs from visual simulation in 2024-2025 to general embodied intelligence in 2028-2030 [6]

Technological Evolution
- Network architecture is evolving from "cloud-native" to "AI-native," which requires a focus on performance and on collaboration between computing and networking [7]
- ZTE has progressively advanced its hardware and software integration from 2G to 5G and is now incorporating large models into its development paradigm [8]
- The integration of AI into core business processes is expected to transform industries, with a shift from content generation to autonomous action [9]

Implementation and Applications
- ZTE's "Co-Sight Intelligent Agent Factory" aims to enhance reasoning capabilities and ensure reliable decision-making through advanced verification mechanisms [11][12]
- Successful application of AI requires a combination of robust infrastructure, effective methodologies, and deep industry engagement [17]
- Education, healthcare, software development, and smart manufacturing are identified as likely candidates for early AI value realization because of their structured data environments and feedback mechanisms [14][13]

Future Directions
- A hybrid "cloud-edge collaboration" approach is recommended for combining general foundational models with industry-specific enhancements [15]
- Specialized models are needed for scenarios involving non-natural-language data, particularly in high-stakes environments such as finance [16]
- The overarching narrative of AI is shifting from technological showcases toward practical applications and tangible value creation across sectors [18]
After Reading 40 VLA+RL Papers...
具身智能之心 (Heart of Embodied Intelligence) · 2025-11-28 00:04
Core Insights
- The article discusses the shift in research toward incorporating Reinforcement Learning (RL) into vision-language-action (VLA) models, moving beyond Supervised Fine-Tuning (SFT) to improve model performance and adaptability [1][2].

Group 1: RL Methodologies
- RL methodologies are grouped into online RL, offline RL, iterative RL, and inference-time improvement, but the author stresses that whether a method works matters more than how it is classified [1].
- Real-world applicability of RL is crucial, with safety and efficiency being key concerns during data collection and model deployment [2].

Group 2: Task Performance and Challenges
- Current RL implementations show promising single-task results; for example, Pi-star-0.6 requires around 1,000 trajectories for complex tasks such as folding clothes [3].
- A significant open challenge is getting RL to handle multiple tasks effectively, so that tasks reinforce each other rather than degrade overall performance [3].

Group 3: Reward Functions and Research Directions
- Whether reward functions or value functions need to be learned is debated. The key benefit is reduced variance in optimization, although this need may diminish as pre-trained VLA models improve [4][5].
- The research directions identified concern sparse rewards, the scale of policy networks, and the multi-task capabilities of RL [5].

Group 4: Literature and Keywords
- A list of relevant literature and keywords is provided for further exploration of RL and VLA [6].
Lou Tiancheng: VLA Can't Help L4
自动驾驶之心 (Heart of Autonomous Driving) · 2025-11-15 16:04
Core Insights
- The article discusses advances in autonomous driving technology, focusing on the transition from Level 2 (L2) to Level 4 (L4) vehicles and on the complexity and safety challenges involved in achieving L4 autonomy [5][19][21].

Group 1: Technological Advancements
- PonyWorld, a world model technology, enhances Robotaxi safety, making it ten times safer than human drivers [9].
- The cost of the autonomous driving kit has fallen by 70% compared with previous generations, and all components are now vehicle-grade [8][30].
- Perception, prediction, and control have been integrated into an end-to-end model, which is now standard for L4 vehicles and a requirement for L2 vehicles [15][16].

Group 2: Learning Models
- The article highlights two learning modes: imitation learning, which is quick but caps the learner's potential, and reinforcement learning, which allows exploration and lets the learner surpass the teacher [12].
- L4 companies are evolving through reinforcement learning, while L2 remains within the bounds of imitation learning [12][21].

Group 3: Market and Product Development
- The transition to L4 technology for personal vehicles is expected to take longer than anticipated, with significant operational and regulatory challenges still to be addressed [22].
- The Robotaxi fleet has accumulated over 500,000 hours of operation, a significant step toward practical deployment [29].
- The company aims to reduce costs through vehicle-grade components and by eliminating the need for human drivers, a significant milestone in the development of autonomous vehicles [33].

Group 4: Industry Perspectives
- The article discusses the limitations of vision-language-action (VLA) models in L4 applications, suggesting that specialized models are necessary to meet the extreme safety requirements of autonomous driving [17].
- The author compares the current state of embodied intelligence to that of autonomous driving in 2018, indicating a similar need for patience and long-term development [26].