VLA Models

2025 World Robot Conference closes: four major trends sketch a new picture of the robotics industry
Shen Zhen Shang Bao· 2025-08-12 22:52
Core Insights
- The 2025 World Robot Conference (WRC) in Beijing marked a shift in the robotics industry from showcasing technology to practical application, with a focus on commercial viability and real-world scenarios [1]
Group 1: Trends in Robotics
- Trend 1: Increased product density and more complete supply chains. More than 200 companies showcased over 1,500 exhibits, including over 100 new products, nearly double last year's figure [2]
- Trend 2: Robots are moving from mere demonstrations to practical work in factories, with many humanoid robots now able to perform complex tasks in simulated real-world environments [3]
- Trend 3: A price war is emerging, with humanoid robot prices dropping significantly (the Yushu R1, for example, starts at 39,900 yuan), but low prices do not equate to low capabilities [4][5]
Group 2: Technological Innovations
- Trend 4: The VLA (Vision-Language-Action) model is gaining traction, enabling robots to understand and interact with their environment more effectively, as demonstrated by the Galbot in a supermarket setting [6]
Wang Xingxing talks models
36Kr · 2025-08-12 08:05
Core Insights
- Wang Xingxing, founder of Yushu Technology, challenges the perception that the company focuses solely on robot hardware, emphasizing the importance of models, algorithms, and data in robotics [1][2]
- Wang is skeptical of the current VLA (Vision-Language-Action) approach, arguing that the quality and quantity of existing data are insufficient for effective real-world interaction [1][2]
- Yushu is exploring video-driven models for robotics, which Wang believes may develop faster and have a higher probability of converging than the VLA approach [3]
Group 1: Model and Algorithm Focus
- Yushu's model team is large relative to the company's size but still smaller than those of major AI companies, indicating a cautious yet significant investment in model development [2]
- Wang believes that headcount in model development does not directly determine the quality of outcomes, suggesting that smaller teams can also innovate effectively [2]
- The company is not dismissing the VLA model entirely but is cautious about over-relying on data accumulation for training [2]
Group 2: Robotics Application and Future Vision
- Public perception may suggest that Yushu's robots are primarily for entertainment, but internally the focus is on developing robots capable of practical tasks [5][6]
- Wang argues that practical deployment of robots in factories and homes is currently unrealistic, and that performance demonstrations are more feasible for now [6]
- The vision for future robotics is multifunctional capability rather than single-task operation, with a possible timeline of 2-5 years for a "ChatGPT moment" in robotics [7][8]
Group 3: Computational Needs
- Wang anticipates that the robotics field will need low-cost, large-scale, distributed computing clusters to address its computational challenges [4]
- He suggests that factories with multiple robots could benefit from establishing distributed server clusters to reduce communication latency [4]
WRC 2025 Focus (2): Humanoid robots approach a "ChatGPT moment"; model architecture becomes the core breakthrough point
Xin Lang Cai Jing· 2025-08-12 06:33
Core Insights
- The humanoid robot industry is on the brink of a "ChatGPT moment," with significant breakthroughs expected within 1-2 years, driven by policy and demand [1]
- The average growth rate for domestic humanoid robot manufacturers and component suppliers is projected to be between 50% and 100% in the first half of 2025 [1]
- The main challenge in the industry is not hardware but the architecture of embodied-intelligence AI models, with the VLA model having inherent limitations [1][4]
Short-term Outlook (1-2 years)
- The domestic market is expected to maintain rapid growth due to policy subsidies and the expansion of application scenarios, with high order visibility for complete machines and core components [2]
- Key players like Tesla and Figure AI could accelerate the division of labor and standardization in the global supply chain once they achieve mass production [2]
Mid-term Outlook (2-5 years)
- The integration of end-to-end embodied intelligence models with world models and an RL scaling law could become the mainstream architecture, facilitating the transition from prototype to large-scale commercialization [2]
- Distributed computing is anticipated to become critical supporting infrastructure, working together with 5G/6G and edge computing providers [2]
- Investment opportunities include hardware manufacturers entering the mass production phase, AI companies with video-generation world model capabilities, and distributed computing centers and edge cloud service providers [2]
Long-term Outlook (5+ years)
- If end-to-end embodied intelligence and low-latency distributed computing are realized, the market for household and industrial humanoid robots could expand rapidly, potentially reaching annual shipment volumes in the millions [2]
- The focus of competition is expected to shift from technological breakthroughs to cost control and ecosystem development [2]
Hardware Status
- Current humanoid robot hardware can meet most application needs, although optimization is still required in mass production and engineering [3]
AI Model Challenges
- The VLA model is described as a relatively simplistic ("dumb") architecture that struggles with real-world interaction because of insufficient data, and its effectiveness remains limited even after reinforcement learning training [4]
- The video-generation/world model approach is seen as more promising, allowing tasks to be simulated before real-world execution, which may lead to faster convergence [4]
RL Scaling Law
- Current reinforcement learning training lacks transferability: each new task must be trained from scratch, which is inefficient [5]
- Achieving a scaling law similar to that of language models could significantly accelerate the learning of new skills [5]
Distributed Computing Trends
- Humanoid robots are limited by size and power consumption, with onboard computing equivalent to a few smartphones [6]
- Future developments will rely on localized distributed servers to reduce latency, ensure safety, and lower the cost of individual computing units [6]
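A conversation with Star Motion Era's Chen Jianyu: the world model is one path for VLA, and home robots will take off in the next five years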
Tai Mei Ti APP · 2025-08-12 02:00
Core Insights
- The future trend in AI technology is the development of general-purpose humanoid robots, which will significantly enhance productivity and social service capabilities [2][4]
- VLA is a broad concept covering the various ways visual perception, language, and action are combined in robotics; the world model is one pathway within this framework [3][4]
Company Overview
- Star Motion Era was established in August 2023 as a project incubated at Tsinghua University's Institute for Interdisciplinary Information Sciences, focusing on creating general intelligent agents in the physical world [5]
- The company has completed three rounds of financing within two years, raising nearly 500 million yuan in Series A funding led by Dinghui VGC and Haier Capital [5]
Product Development
- Star Motion Era is developing embodied intelligent robots that integrate a general-purpose "brain" with the robot body; its VLA model ERA-42 unifies functions such as vision, understanding, prediction, and action in a single end-to-end model [5][6]
- The company has introduced the Star Motion L7, a full-size bipedal humanoid robot, and the Star Motion Q5, designed for service industries, showcasing capabilities in logistics and daily tasks [6]
Market Potential
- The next five years are anticipated to be a breakthrough period for household robots, with simpler forms entering homes and high-net-worth individuals potentially using more advanced humanoid robots [4][9]
- The ultimate application of humanoid robots is expected to be in households, although initial deployments will focus on B2B scenarios to refine the technology and accumulate data [9][10]
Industry Insights
- Current intelligent robots achieve about 70% of human efficiency, with projections of 90% within the coming year, indicating significant advances in software and hardware [8]
- The industry has not yet reached a "bubble" phase, as valuations have not matched those of sectors like smart vehicles; a surge of capital is possible once leading companies achieve scalable commercial applications [8]
Yushu Technology's Wang Xingxing: advancing a compliant, steady listing process; VLA is a relatively "dumb" architecture
Robot猎场备忘录· 2025-08-12 00:03
Core Viewpoints
- The humanoid robot industry is currently at a stage where the technology is not yet mature enough for large-scale, complex tasks, but annual shipments of humanoid robots are expected to double, with potential breakthroughs leading to significant increases in output in the next 2-3 years [4][5][6]
- Competition in the humanoid robot sector extends beyond products and markets to include founder interviews and public speaking engagements [4]
- The race to complete an IPO is critical for companies like Yushu Technology and Zhiyuan Robotics, as being the first to go public can provide substantial funding support [5][6]
Industry Insights
- Hardware for humanoid robots is currently adequate but requires further improvement for larger scale, lower cost, and higher reliability [7]
- The biggest challenge in the humanoid robot sector is the AI model rather than data, with a need for better model architecture to enhance performance [7]
- The commercial viability of humanoid robots is questioned, as many companies focus on entertainment rather than practical applications [10][11]
Company Strategies
- Yushu Technology focuses on educational and research applications, while Zhiyuan Robotics and others emphasize strong AI capabilities [10][11]
- Yushu Technology's commercial logic is to use impressive robot performances and low pricing to secure orders quickly, but sustainability remains a concern [10][15]
- Software-focused companies often announce high revenue figures but lack transparency about order numbers and actual product deliveries [11]
Market Dynamics
- The humanoid robot market is divided between "hardware-focused" companies like Yushu Technology and "software-focused" companies like Zhiyuan Robotics, leading to different commercialization strategies [10][12]
- Many humanoid robot startups are struggling with effective commercialization and face challenges in scaling production and real-world application [12][15]
- The industry is shifting towards self-developed foundation models, with startups like Figure AI taking the lead [13]
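One suite covers all of VLA R&D! "Tencent-affiliated" humanoid robot startup achieves another major technical breakthrough, opening the door to general-purpose robots!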
Robot猎场备忘录· 2025-08-08 09:33
Core Viewpoint
- The article highlights the technological advances made by the humanoid robot startup Stardust Intelligence, particularly its self-developed AI system DuoCore and the launch of DuoCore-WB, described as the first full-body mobile manipulation model, which improves the practical usability of humanoid robots in real-world scenarios [2][3].
Group 1: Technological Breakthroughs
- Stardust Intelligence's DuoCore system has received a major update that gives robots a dual intelligence mode combining instinctive responses with deliberate reasoning, allowing intelligent planning and operation in complex environments [3].
- The DuoCore system employs a highly human-like knowledge transfer mechanism, improving learning efficiency and enabling skills to be transferred across scenarios without starting from scratch [4].
- The DuoCore-WB model uses a simplified imitation learning framework, allowing robots to learn complex tasks from a small number of high-quality demonstrations, with an average task success rate of 80% on challenging household tasks [16][24].
Group 2: Product Overview
- The Astribot Suite is a comprehensive robot learning kit that includes a high-performance robot platform (Astribot S1), an intuitive teleoperation scheme, and an efficient full-body manipulation policy [8].
- The Astribot S1 robot is designed for general-purpose tasks, featuring a distinctive rope-driven design that mimics human muscle tissue and allows flexible, precise movements [11].
- The S1's specifications include 7 degrees of freedom per arm, a maximum speed exceeding 10 m/s, and a payload of 10 kg, surpassing typical adult male capabilities [13].
Group 3: Market Position and Future Prospects
- Stardust Intelligence aims to become a leading provider of AI robot assistants, with a vision of enabling billions of people to have one, focusing on human-machine coexistence and collaboration [25].
- The company has completed five rounds of financing, with the latest round raising several hundred million yuan, indicating strong investor confidence, particularly from major tech firms [31][32].
- The company is actively pursuing commercialization, having announced pre-sales of the Astribot S1 and collaborations with leading universities and enterprises on practical applications [33].
Success rate up 57%, the latest in VLA+RL! CO-RFT: efficient fine-tuning of VLA models (Beihang, Tsinghua, et al.)
具身智能之心· 2025-08-07 00:03
Core Insights
- The article discusses the development of a new reinforcement learning framework called Chunked RL, specifically designed for fine-tuning Vision-Language-Action (VLA) models, which show great potential in real-world robotic control [4][8].
- The proposed CO-RFT algorithm demonstrates significant improvements over traditional supervised fine-tuning methods, achieving a 57% increase in success rate and a 22.3% reduction in cycle time in real-world environments [4][29].
Section Summaries
Introduction
- VLA models integrate perception and language understanding for embodied control, showing promise in developing general strategies for real-world robotic control [6].
- The challenges faced in fine-tuning VLA models primarily stem from the dependency on the quality and quantity of task-specific data, which limits generalization to out-of-distribution (OOD) scenarios [6][7].
Methodology
- The article introduces Chunked RL, a novel reinforcement learning framework that incorporates action chunking to enhance sample efficiency and stability, particularly suited for VLA models [8][12].
- The CO-RFT algorithm consists of two phases: imitation learning for initializing the backbone network and policy, followed by offline RL with action chunking to optimize the pre-trained policy [16][18].
Experimental Analysis
- The experiments were conducted on a robotic platform with six dexterous manipulation tasks, evaluating the performance of the CO-RFT algorithm against traditional methods [20][23].
- Results indicate that CO-RFT significantly outperforms supervised fine-tuning (SFT), achieving a 57% increase in success rate and a 22.3% decrease in average cycle time across various tasks [29][30].
Position Generalization
- CO-RFT exhibits strong position generalization capabilities, achieving a 44.3% success rate in previously unseen locations, outperforming SFT by 38% in OOD scenarios [4][29].
Importance of Data Diversity
- Data diversity plays a crucial role in the performance of CO-RFT, with models trained on diverse datasets showing significantly better generalization capabilities compared to those trained on fixed datasets [32][33].
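The action-chunking idea in the CO-RFT summary can be illustrated with a minimal sketch. It assumes, beyond what the summary states, that a value estimate is attached to a whole chunk of consecutive actions and that the learning target sums the discounted rewards collected inside the chunk before bootstrapping from the state that follows it. The function names are hypothetical, and this is not the CO-RFT implementation.

```python
from typing import List, Sequence


def split_into_chunks(actions: Sequence, chunk_len: int) -> List[list]:
    """Group a flat action sequence into consecutive chunks of chunk_len.

    A trailing remainder shorter than chunk_len is dropped.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    n_full = len(actions) // chunk_len
    return [list(actions[i * chunk_len:(i + 1) * chunk_len]) for i in range(n_full)]


def chunked_td_target(rewards: Sequence[float], gamma: float,
                      bootstrap_value: float) -> float:
    """Discounted reward sum over one chunk plus a bootstrapped tail.

    rewards: rewards observed while executing one action chunk (assumed layout).
    bootstrap_value: value estimate for the state reached after the chunk.
    """
    target = 0.0
    for i, reward in enumerate(rewards):
        target += (gamma ** i) * reward
    # Bootstrap once per chunk rather than once per single action.
    target += (gamma ** len(rewards)) * bootstrap_value
    return target
```

With a chunk of length h, the critic bootstraps once every h steps instead of every step, which is one plausible reason chunking can help stability when rewards are sparse.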
VLA-OS: Lin Shao's team at NUS explores the secrets of task reasoning in robot VLA models
具身智能之心· 2025-08-01 16:02
Core Viewpoint
- The article discusses a groundbreaking research study by a team from the National University of Singapore, focusing on the VLA-OS framework, which systematically analyzes and dissects task planning and reasoning in Vision-Language-Action (VLA) models, aiming to provide insights for the next generation of general-purpose robotic VLA models [2][4].
Group 1: VLA-OS Overview
- VLA-OS is a structured framework that includes a clear codebase, multimodal task planning datasets, and standardized training processes for VLA models [4][5].
- The framework aims to unify various VLA paradigms and facilitate controlled experiments to identify effective task planning representations and paradigms [19][20].
Group 2: VLA Model Paradigms
- The article outlines two main approaches for integrating task reasoning into VLA models: Integrated-VLA, which combines task planning and policy learning, and Hierarchical-VLA, which separates these functions into different models [10][12].
- Current VLA models exhibit significant variability in architecture, training methods, and task planning representations, complicating performance assessments [13][15].
Group 3: Experimental Findings
- The research identifies 14 key findings from over 100 experiments, highlighting the advantages of visual planning representations over language-based ones and the superior performance of Hierarchical-VLA compared to Integrated-VLA [34][35].
- Findings indicate that Integrated-VLA benefits from implicit task planning, while Hierarchical-VLA demonstrates better generalization capabilities [51][52].
Group 4: Recommendations for Future Research
- The article suggests prioritizing visual representation planning and goal image planning, with language planning as a supplementary approach [68].
- It emphasizes the importance of task planning pre-training and the need for efficient training mechanisms to avoid gradient conflicts between planning and action outputs [73].
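The difference between the two paradigms named in the VLA-OS summary can be sketched structurally. The sketch below is only an illustration of the interfaces: all type aliases, function names, and the toy stand-in models are hypothetical and are not taken from the VLA-OS codebase.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, object]   # assumed: whatever the robot perceives
Plan = str                        # assumed: a plan rendered as text
Action = List[float]              # assumed: a low-level command vector


def integrated_step(model: Callable[[Observation, str], Tuple[Plan, Action]],
                    obs: Observation, instruction: str) -> Tuple[Plan, Action]:
    """Integrated style: one model produces both the plan and the action."""
    return model(obs, instruction)


def hierarchical_step(planner: Callable[[Observation, str], Plan],
                      controller: Callable[[Observation, Plan], Action],
                      obs: Observation, instruction: str) -> Tuple[Plan, Action]:
    """Hierarchical style: a planner model feeds a separate controller model."""
    plan = planner(obs, instruction)
    action = controller(obs, plan)
    return plan, action


# Toy stand-ins so the sketch runs; real systems would use learned networks.
def toy_planner(obs: Observation, instruction: str) -> Plan:
    return f"reach {instruction}"


def toy_controller(obs: Observation, plan: Plan) -> Action:
    return [0.1, 0.0] if plan.startswith("reach") else [0.0, 0.0]


def toy_integrated(obs: Observation, instruction: str) -> Tuple[Plan, Action]:
    plan = toy_planner(obs, instruction)
    return plan, toy_controller(obs, plan)
```

In the hierarchical form the plan is an explicit interface between two separately trained components, which is what makes it possible to swap planning representations (language, visual, goal image) in controlled experiments.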
VLA + reinforcement learning will give rise to more powerful systems!
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article discusses advances in robotic models, focusing on the RT-2 and RT-X models, which enhance robots' ability to execute tasks by building on vision-language models and diverse datasets [5][10][11].
Group 1: RT-2 and Its Capabilities
- RT-2 is introduced as a foundational robot model that can answer visual questions and execute tasks based on language instructions, showcasing the potential of remote-accessible robotic models [5][7].
- By converting robot control tasks into a question-answer format, the model can carry out a variety of basic language instructions effectively [7][8].
Group 2: RT-X Dataset and Its Impact
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, providing a diverse training ground for robotic models [10].
- Models trained on the RT-X dataset outperform specialized models by approximately 50% on various tasks, indicating the advantages of cross-embodiment models [11].
Group 3: Evolution of VLA Models
- The first-generation VLA model, RT-2, is noted for its simplicity, while second-generation models use continuous action distributions for better performance on complex tasks [14][15].
- Second-generation VLA models incorporate specialized mechanisms for generating continuous actions, enhancing their control capabilities [17][18].
Group 4: π0 and π0.5 Models
- The π0 model, based on a large language model with 3 billion parameters, is designed to handle a variety of tasks, including folding clothes, demonstrating its adaptability across environments [18][23].
- The latest π0.5 model is aimed at executing long-horizon tasks in new environments, integrating high-level reasoning to manage complex instructions [28][30].
Group 5: Future Directions and Reinforcement Learning
- Future VLA models are expected to integrate reinforcement learning techniques to improve robustness and performance, moving beyond imitation learning [34][39].
- The combination of VLA and DLA (Deep Learning Architecture) is proposed to create a more effective system, leveraging expert data to improve generalist capabilities [44][46].
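The contrast drawn above between a question-answer style first-generation model and later continuous-action models can be made concrete with a minimal sketch of discrete action tokenization. It assumes each action dimension is clipped to a fixed range and mapped to one of a fixed number of bins that a language model can emit as text; the range [-1, 1], the 256 bins, and the function names are illustrative assumptions, not details given in the summary.

```python
from typing import List, Sequence


def action_to_tokens(action: Sequence[float], low: float = -1.0,
                     high: float = 1.0, n_bins: int = 256) -> str:
    """Map each continuous action dimension to a bin index, joined as text."""
    tokens = []
    for x in action:
        clipped = min(max(x, low), high)
        # Scale to [0, n_bins - 1] and round to the nearest bin.
        idx = int((clipped - low) / (high - low) * (n_bins - 1) + 0.5)
        tokens.append(str(idx))
    return " ".join(tokens)


def tokens_to_action(text: str, low: float = -1.0, high: float = 1.0,
                     n_bins: int = 256) -> List[float]:
    """Invert action_to_tokens up to quantization error."""
    return [low + int(tok) / (n_bins - 1) * (high - low) for tok in text.split()]
```

The round trip loses at most half a bin width per dimension, which is the kind of resolution limit that motivates generating continuous actions directly in later models.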
Auto Viewpoint | From 321,800 yuan! Li Auto's first pure-electric SUV launches: can large models build a "moat"?
Xin Hua Cai Jing· 2025-07-30 07:59
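Xinhua Finance, Shanghai, July 30 (Li Yifan) - On the evening of July 29, Li Auto's first pure-electric SUV, the Li i8, officially went on sale with a guide price of 321,800 to 369,800 yuan, 40,000 to 50,000 yuan below the pre-sale prices across the lineup. Whether the Li i8 can help Li Auto formally open up the pure-electric market and reverse the sluggish sales it has seen since the start of 2025 has become a focus of attention inside and outside the industry.
Configuration below expectations, muted capital-market reaction
2025 marks the 10th anniversary of Li Auto's founding. Over the past 10 years, Li Auto has gained 1.36 million owners, opened up the extended-range segment, and, with its innovative "refrigerator, TV, and big sofa" configuration, become a standout that leads the pack of new-force brands.
However, since the start of 2025, as sales of the HarmonyOS Intelligent Mobility (鸿蒙智行) lineup have climbed steadily in the extended-range segment, Li Auto's extended-range dividend has become less evident.
In the first half of 2025, Li Auto delivered a cumulative 203,900 new vehicles, up 7.91% year on year, but growth slowed markedly and the company completed only 31.87% of its full-year sales target of 640,000 vehicles. In June it delivered 36,300 vehicles, down 24.1% year on year and 11.20% month on month.
Wang Wei, an analyst at Xiangcai Securities, believes this reflects Li Auto's weakening advantage in extended-range technology, declining product appeal, and short-term disruption from adjustments to its sales system.
The Li i8, as Li Auto's first pure-electric SUV, is therefore seen as the company's transitional product for its push into pure electric, and as the "culmination" of many of Li Auto's new technologies.
At the launch event, Li Xiang, founder, chairman and CEO of Li Auto, for the i ...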