VLA Models

PI co-founder and robotics luminary explains VLA + reinforcement learning and how it yields more powerful systems
具身智能之心· 2025-07-30 06:03
Core Viewpoint - The article discusses advances in robot foundation models, focusing on the RT-2 and RT-X models, which improve robots' ability to execute complex tasks through better datasets and model architectures [6][12][44].
Group 1: RT-2 and RT-X Models
- RT-2 is introduced as a foundational robot model that uses a vision-language model to process image-based commands and execute tasks [8][10].
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, covering a diverse range of robotic capabilities [13][26].
- Cross-embodiment models trained on the RT-X dataset outperform specialized models by approximately 50% across various tasks, indicating the advantage of generalization in robot learning [13][29].
Group 2: Evolution of VLA Models
- First-generation VLA models, such as RT-2, control the robot through a simple question-answer structure, while the second generation models continuous action distributions for better performance [16][19].
- Second-generation VLA models, such as π0, pair a large language model with an action expert module to handle complex tasks, generating action sequences over time [22][24].
- The π0.5 model is designed for long-horizon tasks, integrating high-level reasoning to execute complex instructions in new environments [36][40].
Group 3: Integration of Reinforcement Learning
- Future VLA models are expected to incorporate reinforcement learning to improve robustness and performance, moving beyond imitation learning [44][49].
- Integrating reinforcement learning with VLA aims to create a more effective training process in which robots learn from both expert data and real-world interaction [56][60].
- Current research focuses on stable, effective end-to-end training processes that use reinforcement learning to improve VLA capabilities [60].
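To make the first-generation design in Group 2 more concrete, where the robot action is emitted much like the answer to a question, the sketch below turns a continuous action vector into discrete token ids and back. The bin count of 256, the normalized range of [-1, 1], and the function names are assumptions chosen for illustration; the summary itself does not specify them.

```python
NUM_BINS = 256          # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized action range

def action_to_tokens(action):
    """Map each continuous action dimension to an integer bin id."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)
        scaled = (clipped - LOW) / (HIGH - LOW)      # now in [0, 1]
        tokens.append(min(int(scaled * NUM_BINS), NUM_BINS - 1))
    return tokens

def tokens_to_action(tokens):
    """Decode bin ids back to the centre of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t + 0.5) * width for t in tokens]

action = [0.10, -0.52, 0.99, -1.0]
tokens = action_to_tokens(action)
recovered = tokens_to_action(tokens)
# Worst-case rounding error introduced by the discretization.
max_error = max(abs(r - a) for r, a in zip(recovered, action))
```

The round trip loses at most half a bin width per dimension, which is one way to see why the second-generation models described above predict continuous actions instead.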
Domestic humanoid robot hardware and applications accelerate toward deployment
2025-07-14 00:36
Summary of the Conference Call on the Domestic Humanoid Robot Industry
Industry Overview
- Deployment in the domestic humanoid robot industry is accelerating: Zhiyuan and Yushu won orders totaling 124 million yuan, indicating growing market demand for humanoid robot applications [1][2]
- The humanoid robot supply chain is advancing steadily, with more than 80 domestic companies, mostly university spin-off startups, focusing on application scenarios such as logistics, household chores, inspection, and textiles [1][3][4]
Key Developments
- Zhiyuan and Yushu won a China Mobile Hangzhou procurement project for humanoid and biped robots with a total contract value of 124 million yuan, highlighting how quickly robots are being deployed in the domestic market [2]
- The standard version of Tiangong Walker is priced at approximately 300,000 yuan, with production and orders expected to exceed 1,000 units in 2025 [2]
Application Scenarios
- Humanoid robots show promise in inspection, logistics, and textiles; they can replace human labor in high-risk tasks such as high-altitude inspections, improving safety [3][10][11]
- In logistics, humanoid robots are expected to work alongside unmanned logistics vehicles to automate factories, raising efficiency and reducing human error [12][14]
Company Highlights
- UBTECH showcased the Walker S Two, which features a replaceable battery, and has begun taking small-scale industrial orders, indicating high market acceptance [5]
- Yushu demonstrated advanced motion control, including climbing and dancing, with its products reaching world-leading standards [6]
- Zhiyuan introduced multiple commercial products and is actively collecting data to iterate on its technology, planning to gather 500,000 data points weekly for comprehensive deployment [7]
Competitive Landscape
- Domestic companies are making significant progress in VRA and VLA model development, establishing a data commonality layer and collaborating with partners to build resource platforms [8]
- The domestic humanoid robot supply chain is outperforming international competitors in application depth and capital expenditure, with a focus on practical applications [9]
Future Prospects
- Humanoid robots have promising prospects in the textile industry, where they can replace manual operations in labor-intensive tasks as technology improves the handling of flexible materials [16]
- The overall humanoid robot market is expected to grow, with applications expanding in sectors including logistics and inspection as companies continue to innovate and improve their products [10][17]
Latest from EmbodyX! VOTE: a general framework for ensemble voting and optimized acceleration of VLA models, with 35x higher throughput
具身智能之心· 2025-07-13 09:48
Core Insights - The article discusses the limited ability of existing VLA models to generalize to new objects and unfamiliar environments, which motivated a more efficient action prediction method called VOTE [4][6][9].
Group 1: Background and Motivation
- Building a general-purpose robot policy that can handle diverse tasks and real-world interactions has been a core goal of robotics research [6].
- VLA models perform very well in familiar environments but generalize poorly to unseen scenarios, which has led researchers to explore methods for improving robustness [7][8].
Group 2: VOTE Methodology
- VOTE is introduced as a lightweight VLA model that optimizes trajectories with an ensemble voting strategy, significantly improving inference speed and reducing computational cost [9][14].
- The model needs no additional visual modules or diffusion techniques; it relies solely on the VLM backbone and introduces a special token <ACT> to streamline action prediction [9][18].
- Action sampling uses an ensemble voting mechanism that aggregates predictions from previous steps, improving stability and robustness [22][23].
Group 3: Performance and Evaluation
- Experiments indicate that VOTE achieves state-of-the-art performance, with a 20% higher average success rate on the LIBERO task suite and a 3% improvement over CogACT on the SimplerEnv WidowX robot [9][28].
- The model delivers a 35-fold increase in throughput on edge devices such as NVIDIA Jetson Orin, making it suitable for real-time applications [9][31].
- VOTE reaches a throughput of 42 Hz on edge platforms with minimal memory overhead, outperforming existing models [31][32].
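The voting idea in Group 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not VOTE's published implementation: it assumes that several predictions for the current timestep are available (the newest one plus those left over from earlier steps), splits them by cosine similarity to the newest prediction, and averages the larger group. The threshold of 0.5 and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two action vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def vote(candidates, threshold=0.5):
    """candidates[0] is the newest prediction for the current timestep."""
    newest = candidates[0]
    agree = [c for c in candidates if cosine(newest, c) >= threshold]
    disagree = [c for c in candidates if cosine(newest, c) < threshold]
    # The larger group wins; ties go to the group containing the newest prediction.
    winners = agree if len(agree) >= len(disagree) else disagree
    return mean(winners)

candidates = [
    [1.0, 0.0],    # newest prediction
    [0.9, 0.1],    # earlier prediction that points the same way
    [0.0, -1.0],   # earlier prediction that disagrees
]
action = vote(candidates)
```

Averaging only the majority group keeps a single stale or erroneous prediction from dragging the executed action off course, which is the stability benefit the summary attributes to voting.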
A new paradigm for VLA inference! The consistency model CEED-VLA achieves 4x acceleration
机器之心· 2025-07-13 04:58
Core Viewpoint - The article discusses advances in Vision-Language-Action (VLA) models, focusing on CEED-VLA, which significantly improves inference speed while maintaining high task success rates in robotic applications [2][8][24].
Group 1: VLA Model Overview
- VLA models have become a key research direction in robotics because of their strong multimodal understanding and generalization capabilities [2].
- Despite this progress, VLA models face a significant inference speed bottleneck, especially in high-frequency, high-precision tasks [2].
Group 2: Proposed Solutions
- The article introduces a consistency distillation training strategy that lets the model predict multiple correct action tokens at once, speeding up decoding [4].
- A mixed-label supervision mechanism mitigates potential error accumulation during distillation [4][9].
- An early-exit decoding strategy addresses inefficiencies in Jacobi decoding, improving average inference efficiency by relaxing the convergence condition [5][10].
Group 3: Experimental Results
- The proposed methods deliver more than 4x inference acceleration across multiple baseline models while maintaining high task success rates in both simulated and real-world robotic tasks [8][18].
- Thanks to higher inference speed and control frequency, CEED-VLA raises manipulation task success rates to above 70% [24].
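For readers unfamiliar with Jacobi decoding, the toy sketch below shows the idea behind Group 2: all positions of a draft are refreshed in parallel until the draft stops changing, and an iteration cap stands in for the relaxed convergence condition. The `next_token` function is a hypothetical deterministic stand-in for a greedy decoder step, and capping the iteration count is an assumed, simple form of early exit; the summary does not state the exact exit rule CEED-VLA uses.

```python
def next_token(prefix):
    """Toy stand-in for one greedy decoder step (hypothetical model)."""
    return (7 * sum(prefix) + len(prefix)) % 11

def autoregressive(prompt, n):
    """Reference decoding: one token per step, n sequential steps."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n, max_iters=None):
    """Refine an n-token draft in parallel; stop at a fixed point or at the cap."""
    guess = [0] * n
    limit = n if max_iters is None else max_iters
    iters = 0
    for iters in range(1, limit + 1):
        # Every position is recomputed from the previous draft, so in a real
        # model these n calls could be batched into a single forward pass.
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:      # fixed point reached: draft equals greedy output
            break
        guess = new
    return guess, iters

prompt = [3, 1, 4]
reference = autoregressive(prompt, 8)
full, full_iters = jacobi_decode(prompt, 8)
early, early_iters = jacobi_decode(prompt, 8, max_iters=3)
```

After k iterations the first k tokens of the draft are guaranteed to match greedy decoding, so an early exit trades exactness in the tail of the draft for fewer passes.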
VLA takes off! From America's RT-2 to China's FiS-VLA, the ultimate evolution of robots
具身智能之心· 2025-07-09 14:38
Core Viewpoint - The article emphasizes the rapid evolution and significance of Vision-Language-Action (VLA) models in embodied intelligence, highlighting their potential to transform human-robot interaction and the robotics industry as a whole [4][6][17].
Group 1: VLA Model Development
- VLA models are becoming the core driving force in embodied intelligence and are gaining traction among researchers and companies worldwide [7][8].
- Google recently released the first offline VLA model, which lets robots perform tasks without internet connectivity [9].
- The Fast-in-Slow (FiS-VLA) model from China is a significant advance: it integrates fast and slow systems to improve both control efficiency and reasoning capability [10][12].
Group 2: Academic and Industry Trends
- Academic output on VLA has grown explosively, with 1,390 papers published this year alone, nearly half of all related research [14].
- VLA technology is helping robots move from laboratory settings to real-world applications, indicating its vast potential [16][17].
Group 3: Key Innovations and Breakthroughs
- Google's RT-2 marked a pivotal moment in VLA development, introducing a unified architecture that integrates the visual, language, and action modalities [38][40].
- RoboMamba, developed in China, significantly improved the efficiency and reasoning capability of VLA models, with inference three times faster than mainstream models [52][48].
- OpenVLA, another significant model, outperformed earlier models on various tasks while being more efficient, achieving a 16.5% higher success rate than RT-2 [57][58].
Group 4: Future Directions and Implications
- The π series of models aims to improve VLA generalization, allowing robots to perform complex tasks with minimal training [62][70].
- FiS-VLA represents a breakthrough in real-time control, improving success rates in real environments by 11% over existing methods [114].
- Advances in VLA technology are paving the way for robots to operate effectively in diverse environments, a significant step toward Artificial General Intelligence (AGI) [127][123].
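The fast/slow split mentioned in Group 1 can be pictured as a dual-rate control loop. The sketch below is a generic illustration of that pattern only, not FiS-VLA's architecture: a slow module picks a new waypoint every few steps while a fast module issues a command on every step. The planner, the controller, the one-dimensional state, and every constant are hypothetical.

```python
GOAL = 10.0  # hypothetical final target position

def slow_planner(state):
    """Hypothetical slow module: choose the next waypoint toward the goal."""
    return min(state + 4.0, GOAL)

def fast_controller(state, waypoint, gain=0.5):
    """Hypothetical fast module: proportional step toward the waypoint."""
    return gain * (waypoint - state)

def run(steps, slow_every=4):
    state, waypoint, slow_calls, trace = 0.0, 0.0, 0, []
    for t in range(steps):
        if t % slow_every == 0:        # slow system runs once per slow_every steps
            waypoint = slow_planner(state)
            slow_calls += 1
        state += fast_controller(state, waypoint)   # fast system runs every step
        trace.append(state)
    return trace, slow_calls

trace, slow_calls = run(steps=8)
```

With these settings the slow module is invoked twice in eight steps while the fast module acts on all eight, which is the efficiency argument for running the expensive reasoning at a lower rate.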
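Intelligent Connected Vehicle ETF (159872): policy and technology move in step; vehicle-networking infrastructure and high-level autonomous driving emerge as twin themes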
Xin Lang Cai Jing· 2025-06-17 02:25
Group 1
- The smart connected vehicle ETF (159872.SZ) was flat (0.00%), while its associated index, CS Vehicle Networking (930725.CSI), rose by 0.15% [1]
- Major constituent stocks rose, including SAIC Motor Corporation (up 0.63%), Wanma Technology (up 5.39%), and Qianfang Technology (up 1.36%), indicating positive market sentiment [1]
- A meeting held by the trading association on June 16 focused on supporting high-quality development in the automotive sector, with representatives from nine major automakers discussing financing needs and optimization suggestions [1]
Group 2
- The trading association emphasized the need for bond-market innovation to support automakers' transition toward intelligent and green technologies [1]
- Research from Shenwan Hongyuan highlighted the VLA model's significant improvement in autonomous driving performance over traditional solutions, with an average no-takeover mileage of 50-100 kilometers [2]
- Deploying the VLA model requires substantial computing power, as seen in Li Auto running a 4-billion-parameter model on the OrinX chip, underscoring the importance of computing hardware in the smart connected vehicle industry [2]
Group 3
- Citic Securities noted Haige Communication's involvement in smart transportation, emphasizing its "Beidou + 5G + C-V2X" communication network, which is part of a national vehicle networking pilot project [2]
- Haige Communication's technology is expected to directly support high-level autonomous driving scenarios, reflecting the trend toward joint development of vehicle networking infrastructure and intelligent driving [2]
Doing real work is the future! Five pioneering companies debate the leap from laboratory to industrialization
机器人圈· 2025-06-11 11:43
Core Insights
- The article emphasizes the rapid advance of Embodied AI as a central focus of global technology, as showcased at the 2025 Beijing Zhiyuan Conference, which highlighted breakthroughs in key technologies such as motion control and environmental interaction [1]
- It underscores the shift from showcasing technology to practical application, with various companies demonstrating their robots on real-world tasks [12]
Group 1: Company Innovations
- Yushu Technology's G1 robot, billed as "the world's most capable fighting robot," won the CMG World Robot Competition, demonstrating autonomous decision-making and highly dynamic motion control [2]
- Beijing Humanoid Robot Innovation Center's Tiangong 2.0 completed a half marathon in 2 hours and 40 minutes and showed improved upper-limb dexterity and load-bearing capability [3]
- Galaxy General's Galbot robot achieved high recognition and grasping success rates in complex retail environments using its self-developed VLA model [6]
- Qunche Intelligent's robot demonstrated fine manipulation skills such as shaving and ice cream scooping, pointing to applications in the food processing industry [7]
- Physical Intelligence's π0.5 model, trained in 100 different household scenarios, generalized effectively across tasks, emphasizing the importance of algorithm optimization over sheer data volume [8]
Group 2: Industry Trends and Perspectives
- The article discusses robot competitions as catalysts for industrial progress, providing a platform for technology demonstration and for connecting companies with potential customers [12]
- It introduces the concept of "shape decoupling": humanoid robots are not the only solution, but their ergonomic design keeps them the ideal form for household environments [10]
- It acknowledges the limitations of current models such as VLA, particularly in complex, long-sequence tasks, where further development is needed to reach success rates suitable for practical use [11]
- Industry leaders agree that robots must prove they can do work and create value, marking a shift toward practical applications of embodied intelligence [12]
Zhiyuan Conference debates humanoid robots: technology trends and commercial reality
Zhong Guo Jing Ying Bao· 2025-06-08 13:39
Core Insights
- Embodied intelligence has grown explosively and has become a core area where AI and robotics converge [1]
- The 2025 Beijing Zhiyuan Conference featured discussions on the current state and future trends of embodied intelligence, highlighting the importance of humanoid robots [1]
Group 1: Industry Developments
- Humanoid robot competitions have become popular, raising the question of whether companies are merely showing off capabilities to attract attention [2]
- Companies such as Yushu Technology and Tiangong Robotics have taken part in various events to demonstrate their robots' capabilities and generate commercial value [2][3]
- The VLA model, a key breakthrough in embodied intelligence, lets robots learn from internet data without having to experience every scenario themselves, improving their performance [4]
Group 2: Technical Challenges
- The VLA model, short for Vision-Language-Action model, is central to the development of multimodal large models in robotics [4]
- Generalization and stability remain challenges; the goal is to eventually complete tasks with 100% stability [4]
- Synthetic training data is advocated as a way around data bottlenecks, with high-quality simulation data being essential for zero-shot generalization [5][6]
Group 3: Commercialization Pathways
- The foundational capabilities of humanoid robots are still insufficient; terrain adaptability and stability must improve before higher-level applications are feasible [7]
- Yushu Technology has seen success in the humanoid robot rental market, indicating growing industrial value [7]
- Companies such as Galaxy General Robotics are expanding operations, with plans to open 100 pharmacies in major cities that use humanoid robots for tasks such as dispensing medication [7]
Group 4: Future Directions
- Embodied intelligence is expected to cross several "chasms": the first phase focuses on innovative products and the second on B2B applications [8]
- The eventual goal is to reach the consumer market, leading to widespread household applications [8]
- The Zhiyuan Research Institute aims to explore distinctive development paths, focusing on giving digital intelligence a physical form and on cost-effective functionality for small-scale robots [8]
In the third year of the large-model boom, the "AI Spring Festival Gala" has a new star again: why embodied intelligence?
Mei Ri Jing Ji Xin Wen· 2025-06-06 13:20
Group 1
- The core theme is the evolution of AI from large language models to embodied intelligence and robotics, marking an industry shift toward practical applications [1][3][4]
- The 2023 Beijing Zhiyuan Conference highlighted the prominence of embodied intelligence, with key figures such as Sam Altman and Geoffrey Hinton participating, indicating a significant shift in industry focus [3][4]
- The article notes the emergence of domestic AI companies such as Moonshot AI (月之暗面) and Zhipu AI, reflecting the competitive landscape in language and multimodal models [3][7]
Group 2
- Embodied intelligence is gaining traction, with robots appearing at various public events, indicating growing interest in their practical applications [7][8]
- The upcoming "World Humanoid Robot Sports Competition" will feature real-life scenarios, emphasizing the need for robots to prove their capabilities in practical environments [8][11]
- Industry leaders stress the importance of building robots that can perform real tasks, moving beyond demonstrations to achieve commercial viability [8][12]
Group 3
- The debate over robot form, humanoid versus non-humanoid, is ongoing; humanoid robots are currently favored for their advantages in data collection and model training [11][12][15]
- The VLA (Vision-Language-Action) model is highlighted as a key research area, with discussion of its applicability and limitations in embodied intelligence [15][16]
- A better understanding of the physical world is crucial for advancing embodied intelligence, and companies are exploring new data generation methods to improve training [17]
Li Auto-W (2015.HK): net margin up year-on-year; watch the new battery-electric vehicle cycle
Ge Long Hui· 2025-06-05 01:59
Core Viewpoint - The company reported Q1 2025 revenue of 25.9 billion yuan, up 1% year-on-year, and net profit attributable to shareholders of 0.65 billion yuan, up 9% year-on-year. The report is optimistic about the company's AI capabilities and its new electric vehicle cycle, and maintains a "buy" rating [1][2].
Financial Performance
- In Q1 2025 the company delivered 93,000 new vehicles, up 16% year-on-year but down 41% quarter-on-quarter [2].
- Q1 2025 revenue was 25.9 billion yuan, up 1% year-on-year and down 41% quarter-on-quarter [2].
- Q1 2025 net profit was 0.65 billion yuan, up 9% year-on-year but down 82% quarter-on-quarter [2].
- Estimated revenue per vehicle in Q1 2025 was approximately 266,000 yuan, down 3,600 yuan year-on-year and 300 yuan quarter-on-quarter [2].
- Estimated net profit per vehicle was about 7,000 yuan, flat year-on-year and down 1,500 yuan quarter-on-quarter [2].
Future Outlook
- For Q2 2025 the company expects to deliver between 123,000 and 128,000 vehicles, a year-on-year increase of 13.3% to 17.9% [2].
- Q2 2025 total revenue is projected at between 32.5 billion and 33.8 billion yuan, a year-on-year increase of 2.5% to 6.7% [2].
Profitability Metrics
- The Q1 2025 net profit margin was 2.5%, up 0.2 percentage points year-on-year [2].
- The Q1 2025 vehicle gross margin was 19.8%, up 0.4 percentage points year-on-year, driven by cost reductions and pricing strategy changes and partially offset by changes in product mix [2].
- The SG&A expense ratio fell 1.9 percentage points year-on-year, mainly because of lower employee compensation, improved operational efficiency, and fewer marketing activities [2].
- The R&D expense ratio fell 2.2 percentage points year-on-year, reflecting lower employee compensation and the pacing of new model projects [2].
Product Development and Innovation
- The report is optimistic about AI integration, particularly the upcoming launch of the pure electric i8, which will feature the new VLA model for advanced driver assistance [3].
- The VLA model integrates spatial intelligence, language intelligence, and behavioral intelligence, enabling seamless interaction between vehicles and users [3].
- The i8, positioned as a mid-to-large SUV, is scheduled for official release in July 2025, with plans for more than 2,500 charging stations nationwide by launch [3].
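Several of the ratios quoted under Financial Performance, Future Outlook, and Profitability Metrics can be cross-checked from the headline figures. The short calculation below uses only numbers stated in the summary; the variable names are illustrative, and the implied prior-year values are derived here rather than reported.

```python
revenue = 25.9e9        # Q1 2025 revenue, yuan
net_profit = 0.65e9     # Q1 2025 net profit, yuan
deliveries = 93_000     # Q1 2025 vehicle deliveries

# Net profit margin and net profit per vehicle from the headline figures.
net_margin = net_profit / revenue
profit_per_vehicle = net_profit / deliveries

# Q2 2025 guidance: both ends of each range should imply the same prior-year base.
base_deliveries_low = 123_000 / (1 + 0.133)
base_deliveries_high = 128_000 / (1 + 0.179)
base_revenue_low = 32.5e9 / (1 + 0.025)
base_revenue_high = 33.8e9 / (1 + 0.067)

print(f"net margin: {net_margin:.1%}")
print(f"net profit per vehicle: {profit_per_vehicle:,.0f} yuan")
print(f"implied Q2 2024 deliveries: {base_deliveries_low:,.0f} to {base_deliveries_high:,.0f}")
print(f"implied Q2 2024 revenue: {base_revenue_low / 1e9:.1f} to {base_revenue_high / 1e9:.1f} billion yuan")
```

The margin and per-vehicle profit agree with the quoted 2.5% and roughly 7,000 yuan, and both ends of each guidance range point to a single prior-year base, so the growth percentages are internally consistent.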