VLA Models
Auto Viewpoint | From 321,800 yuan! Li Auto's first battery-electric SUV launches; can large models build a "moat"?
Xin Hua Cai Jing· 2025-07-30 07:59
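Xinhua Finance, Shanghai, July 30 (Li Yifan): On the evening of July 29, Li Auto's first battery-electric SUV, the Li i8, officially went on sale at a guide price of 321,800 to 369,800 yuan, 40,000 to 50,000 yuan below the pre-sale prices across the lineup. Whether the Li i8 can help Li Auto formally break into the battery-electric market and reverse the sluggish sales it has seen since the start of 2025 has become a focus of attention inside and outside the industry. Specifications fall short of expectations; capital markets react coolly. 2025 marks the 10th anniversary of Li Auto's founding. Over the past 10 years, Li Auto has gained 1.36 million owners, opened up the range-extender segment, and become a standout thanks to its innovative "fridge, TV and big sofa" configuration, leading the pack of new-force brands. Since the start of 2025, however, as sales of Harmony Intelligent Mobility (HIMA) models in the range-extender segment have climbed steadily, Li Auto's range-extender dividend is no longer evident. In the first half of 2025, Li Auto delivered a cumulative 203,900 new vehicles, up 7.91% year on year, but growth slowed markedly and only 31.87% of the full-year sales target of 640,000 units was completed. June deliveries were 36,300 units, down 24.1% year on year and 11.20% month on month. Xiangcai Securities analyst Wang Wei believes this reflects the short-term disruption caused by Li Auto's weakening range-extender technology advantage, declining product appeal, and sales-system adjustments. The Li i8, as Li Auto's first battery-electric SUV, is therefore seen as the company's transitional model for its push into battery-electric vehicles, and as the "culmination" of many of Li Auto's new technologies. At the launch event, Li Auto founder, chairman and CEO Li Xiang, for the i ...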
PI co-founder and robotics guru explains VLA + reinforcement learning in detail, giving rise to more powerful systems
具身智能之心· 2025-07-30 06:03
Core Viewpoint - The article discusses advances in robot foundation models, focusing on the RT-2 and RT-X models, which improve robots' ability to execute complex tasks through better datasets and model architectures [6][12][44].
Group 1: RT-2 and RT-X Models
- RT-2 is introduced as a foundational robot model that uses a vision-language model to process image-based commands and execute tasks [8][10].
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, covering a diverse range of robotic capabilities [13][26].
- Cross-embodiment models trained on the RT-X dataset outperform specialized models by approximately 50% across various tasks, indicating the advantage of generalization in robot learning [13][29].
Group 2: Evolution of VLA Models
- First-generation VLA models, such as RT-2, control the robot through a simple question-answer structure, while second-generation models use continuous action distributions for better performance [16][19].
- Second-generation VLA models, such as π0, pair a large language model with an action expert module to handle complex tasks, generating action sequences over time [22][24].
- The π0.5 model is designed for long-horizon tasks, integrating high-level reasoning to execute complex instructions in new environments [36][40].
Group 3: Integration of Reinforcement Learning
- Future VLA models are expected to incorporate reinforcement learning to improve robustness and performance, moving beyond imitation learning [44][49].
- Integrating reinforcement learning with VLA aims to create a more effective training process in which robots learn from both expert data and real-world interaction [56][60].
- Current research focuses on stable, effective end-to-end training processes that use reinforcement learning to improve VLA capabilities [60].
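One way to picture the difference between the two generations described above is how an action is represented. The sketch below is illustrative only: it assumes a first-generation style in which each continuous action dimension is quantized into a fixed number of bins so that a language model can emit it as a token, whereas a second-generation model would output the continuous values directly. The bin count, the action range, and the function names `tokenize_action` and `detokenize_action` are assumptions made for this example and are not taken from the article.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: number of discrete bins per action dimension


def tokenize_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action dimension to an integer bin in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)        # keep the value inside the range
        fraction = (clipped - low) / (high - low)   # normalize to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map bin indices back to the center of each bin."""
    width = (high - low) / NUM_BINS
    return [low + (token + 0.5) * width for token in tokens]


if __name__ == "__main__":
    action = [-1.0, 0.0, 0.37, 1.0]
    tokens = tokenize_action(action)
    print(tokens)
    print(detokenize_action(tokens))
```

The round trip loses at most half a bin width per dimension, which is one intuition for why continuous action outputs can help on fine-grained control.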
Domestic humanoid robot hardware and applications accelerate deployment
2025-07-14 00:36
Summary of the Conference Call on the Domestic Humanoid Robot Industry
Industry Overview
- The domestic humanoid robot industry is accelerating deployment: Zhiyuan and Yushu won procurement orders totaling 124 million yuan, indicating growing market demand for humanoid robot applications [1][2]
- The humanoid robot supply chain is advancing steadily, with more than 80 domestic companies, mostly university startups, focusing on application scenarios such as logistics, household chores, inspection, and textiles [1][3][4]
Key Developments
- Zhiyuan and Yushu won a procurement project for humanoid and biped robots from China Mobile Hangzhou, with a total contract value of 124 million yuan, highlighting the rapid deployment of robots in the domestic market [2]
- The standard version of Tiangong Walker is priced at approximately 300,000 yuan, with production and orders expected to exceed 1,000 units in 2025 [2]
Application Scenarios
- Humanoid robots show promise in inspection, logistics, and textiles; they can replace human labor in high-risk tasks such as high-altitude inspection, improving safety [3][10][11]
- In logistics, humanoid robots are expected to work with unmanned logistics vehicles to automate factories, improving efficiency and reducing human error [12][14]
Company Highlights
- UBTECH showcased the Walker S2, which features a swappable battery, and has begun taking small-scale industrial orders, indicating high market acceptance [5]
- Yushu demonstrated advanced motion control, including climbing and dancing, with its products reaching world-leading standards [6]
- Zhiyuan introduced multiple commercial products and is actively collecting data to iterate on its technology, planning to gather 500,000 data points weekly for comprehensive deployment [7]
Competitive Landscape
- Domestic companies are making significant progress in VRA and VLA model development, establishing a data commonality layer and working with partners to build resource platforms [8]
- The domestic humanoid robot supply chain is outperforming international competitors in application depth and capital expenditure, with a focus on practical applications [9]
Future Prospects
- Humanoid robots have a promising future in the textile industry, where they can replace manual work in labor-intensive tasks as technical advances allow better handling of flexible materials [16]
- The overall humanoid robot market is expected to grow, with applications expanding in sectors including logistics and inspection as companies continue to innovate and improve their products [10][17]
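Latest from EmbodyX! VOTE: a general framework that accelerates VLA models with ensemble voting and optimization, with a 35x throughput speedup!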
具身智能之心· 2025-07-13 09:48
Core Insights - The article discusses the limitations of existing VLA models in generalizing to new objects and unfamiliar environments, prompting the development of a more efficient action prediction method called VOTE [4][6][9].
Group 1: Background and Motivation
- The challenge of creating a universal robotic strategy that can handle diverse tasks and real-world interactions has been a core focus in robotics research [6].
- VLA models have shown excellent performance in familiar environments but struggle with generalization in unseen scenarios, leading to the exploration of methods to enhance robustness [7][8].
Group 2: VOTE Methodology
- VOTE is introduced as a lightweight VLA model that optimizes trajectory using an ensemble voting strategy, significantly improving inference speed and reducing computational costs [9][14].
- The model eliminates the need for additional visual modules and diffusion techniques, relying solely on the VLM backbone and introducing a special token <ACT> to streamline action prediction [9][18].
- The action sampling technique employs an ensemble voting mechanism to enhance model performance by aggregating predictions from previous steps, thus improving stability and robustness [22][23].
Group 3: Performance and Evaluation
- Experimental results indicate that VOTE achieves state-of-the-art performance, with a 20% increase in average success rate on the LIBERO task suite and a 3% improvement over CogACT on the SimplerEnv WidowX robot [9][28].
- The model demonstrates a 35-fold increase in throughput on edge devices like NVIDIA Jetson Orin, showcasing its efficiency for real-time applications [9][31].
- VOTE's performance is superior to existing models, achieving a throughput of 42Hz on edge platforms while maintaining minimal memory overhead [31][32].
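The ensemble voting idea in Group 2 can be sketched roughly as follows. This is a minimal illustration, not VOTE's published implementation: it assumes that earlier steps have each produced a prediction for the current timestep, that agreement is measured by cosine similarity against the newest prediction, and that the larger group is averaged. The function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def vote_action(
    current: Sequence[float],
    previous: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical agreement cutoff
) -> List[float]:
    """Average the majority group among all predictions for one timestep.

    `current` is the newest prediction; `previous` holds predictions for the
    same timestep made at earlier steps.
    """
    candidates = [list(current)] + [list(p) for p in previous]
    agree = [c for c in candidates if cosine_similarity(current, c) >= threshold]
    disagree = [c for c in candidates if cosine_similarity(current, c) < threshold]
    # Ties go to the group that contains the newest prediction.
    majority = agree if len(agree) >= len(disagree) else disagree
    return [sum(c[i] for c in majority) / len(majority) for i in range(len(current))]


if __name__ == "__main__":
    print(vote_action([1.0, 0.0], [[0.8, 0.2], [-1.0, 0.0]]))
```

Averaging only the majority group keeps a single outlier prediction from dragging the executed action toward it, which is the stability benefit the summary describes.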
A new paradigm for VLA inference! Consistency model CEED-VLA achieves a 4x speedup!
机器之心· 2025-07-13 04:58
Core Viewpoint - The article discusses the advancements in Vision-Language-Action (VLA) models, particularly focusing on the CEED-VLA model, which significantly improves inference speed while maintaining high task success rates in robotic applications [2][8][24].
Group 1: VLA Model Overview
- VLA models have become a crucial research direction in robotics due to their strong multimodal understanding and generalization capabilities [2].
- Despite advancements, VLA models face significant inference speed bottlenecks, especially in high-frequency and precise tasks [2].
Group 2: Proposed Solutions
- The article introduces a consistency distillation training strategy that allows the model to predict multiple correct action tokens simultaneously, enhancing decoding speed [4].
- A mixed-label supervision mechanism is designed to mitigate potential error accumulation during the distillation process [4][9].
- An early-exit decoding strategy is proposed to address inefficiencies in Jacobi decoding, allowing for improved average inference efficiency by relaxing convergence conditions [5][10].
Group 3: Experimental Results
- The proposed methods achieved over 4 times inference acceleration across multiple baseline models while maintaining high task success rates in both simulated and real-world robotic tasks [8][18].
- The CEED-VLA model demonstrated a significant increase in manipulation task success rates, exceeding 70%, due to enhanced inference speed and control frequency [24].
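The Jacobi decoding and early-exit ideas in Group 2 can be illustrated with a toy example. This is a sketch under stated assumptions, not CEED-VLA's code: `toy_next_token` is a made-up deterministic stand-in for a greedy language-model step, the list comprehension simulates what would be one parallel forward pass, and "early exit" is modeled simply as a cap on the number of iterations (`max_iters`), which is an assumption about what relaxing the convergence condition means.

```python
from typing import Callable, List, Optional, Tuple

NextToken = Callable[[List[int]], int]


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical stand-in for a greedy model step (not a real model)."""
    return (3 * sum(prefix) + len(prefix)) % 11


def greedy_decode(next_token: NextToken, prompt: List[int], n: int) -> List[int]:
    """Ordinary autoregressive decoding: one token per step."""
    out: List[int] = []
    for _ in range(n):
        out.append(next_token(prompt + out))
    return out


def jacobi_decode(
    next_token: NextToken,
    prompt: List[int],
    n: int,
    max_iters: Optional[int] = None,
    init_token: int = 0,
) -> Tuple[List[int], int]:
    """Refine all n tokens at once until a fixed point or the iteration cap."""
    guess = [init_token] * n
    iterations = 0
    while True:
        # Every position is updated from the previous iterate; in a real
        # model this would be a single batched forward pass.
        new = [next_token(prompt + guess[:i]) for i in range(n)]
        iterations += 1
        if new == guess:
            break  # fixed point: identical to the greedy output
        guess = new
        if max_iters is not None and iterations >= max_iters:
            break  # early exit: accept the current iterate as-is
    return guess, iterations


if __name__ == "__main__":
    prompt = [2, 5]
    print(greedy_decode(toy_next_token, prompt, 6))
    print(jacobi_decode(toy_next_token, prompt, 6))
    print(jacobi_decode(toy_next_token, prompt, 6, max_iters=2))
```

After k iterations the first k tokens already match the greedy output, so a capped run returns a correct prefix followed by tokens that may still be unconverged. Training the model so that later positions also settle within a few iterations is where the consistency distillation described above would come in.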
VLA takes off! From America's RT-2 to China's FiS-VLA, the ultimate evolution of robots
具身智能之心· 2025-07-09 14:38
Core Viewpoint - The article emphasizes the rapid evolution and significance of Vision-Language-Action (VLA) models in the field of embodied intelligence, highlighting their potential to revolutionize human-robot interaction and the robotics industry as a whole [4][6][17].
Group 1: VLA Model Development
- VLA models are becoming the core driving force in embodied intelligence, gaining traction among researchers and companies globally [7][8].
- Google recently released the first offline VLA model, enabling robots to perform tasks without internet connectivity [9].
- The emergence of the Fast-in-Slow (FiS-VLA) model in China represents a significant advancement, integrating fast and slow systems to enhance robotic control efficiency and reasoning capabilities [10][12].
Group 2: Academic and Industry Trends
- There has been an explosive growth in academic papers related to VLA, with 1,390 papers published this year alone, accounting for nearly half of all related research [14].
- The VLA technology is facilitating the transition of robots from laboratory settings to real-world applications, indicating its vast potential [16][17].
Group 3: Key Innovations and Breakthroughs
- The RT-2 model from Google marked a pivotal moment in VLA development, introducing a unified model architecture that integrates visual, language, and action modalities [38][40].
- The RoboMamba model, developed in China, significantly improved efficiency and reasoning capabilities in VLA models, achieving a threefold increase in inference speed compared to mainstream models [52][48].
- OpenVLA, another significant model, demonstrated superior performance in various tasks while being more efficient than previous models, achieving a 16.5% higher success rate than RT-2 [57][58].
Group 4: Future Directions and Implications
- The introduction of the π series models aims to enhance VLA's generalization capabilities, allowing robots to perform complex tasks with minimal training [62][70].
- The FiS-VLA model represents a breakthrough in real-time control, achieving an 11% improvement in success rates in real environments compared to existing methods [114].
- The advancements in VLA technology are paving the way for robots to operate effectively in diverse environments, marking a significant step towards achieving Artificial General Intelligence (AGI) [127][123].
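The general pattern of combining a slow reasoning system with a fast control system can be sketched as a two-rate loop. The code below is a generic illustration of that pattern only and does not reproduce FiS-VLA's actual architecture; `run_dual_rate`, `slow_policy`, `fast_policy`, and `slow_every` are hypothetical names chosen for this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")
Latent = TypeVar("Latent")
Action = TypeVar("Action")


def run_dual_rate(
    observations: Sequence[Obs],
    slow_policy: Callable[[Obs], Latent],
    fast_policy: Callable[[Obs, Latent], Action],
    slow_every: int = 4,  # hypothetical ratio between the two rates
) -> Tuple[List[Action], int]:
    """Run the slow system every `slow_every` steps and the fast system every step."""
    actions: List[Action] = []
    slow_calls = 0
    latent = None
    for step, obs in enumerate(observations):
        if step % slow_every == 0:
            latent = slow_policy(obs)  # expensive reasoning, refreshed rarely
            slow_calls += 1
        # The fast system reuses the most recent slow output at every step.
        actions.append(fast_policy(obs, latent))
    return actions, slow_calls


if __name__ == "__main__":
    acts, calls = run_dual_rate(
        [1, 2, 3, 4, 5],
        slow_policy=lambda obs: obs * 10,
        fast_policy=lambda obs, latent: obs + latent,
        slow_every=2,
    )
    print(acts, calls)
```

The control frequency is set by the fast path, while the cost of the slow path is amortized over `slow_every` steps.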
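Intelligent connected vehicle ETF (159872): policy and technology resonate, with the twin themes of vehicle-networking infrastructure and high-level autonomous driving coming to the fore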
Xin Lang Cai Jing· 2025-06-17 02:25
Group 1
- The intelligent connected vehicle ETF (159872.SZ) was flat at 0.00%, while its associated index, CS Vehicle Networking (930725.CSI), rose 0.15% [1]
- Major constituent stocks gained, with SAIC Motor Corporation up 0.63%, Wanma Technology up 5.39%, and Qianfang Technology up 1.36%, indicating positive market sentiment [1]
- A meeting held by the trading association on June 16 focused on supporting high-quality development in the automotive sector, with representatives from nine major automakers discussing financing needs and suggestions for improvement [1]
Group 2
- The trading association emphasized the need for bond-market innovation to support automakers' transition to intelligent and green technologies [1]
- Research from Shenwan Hongyuan highlighted that the VLA model significantly improves autonomous driving performance over traditional solutions, reaching an average of 50-100 kilometers between driver takeovers [2]
- Deploying the VLA model requires substantial computing power, as seen in Li Auto's 4-billion-parameter model running on the OrinX chip, underscoring the importance of computing hardware in the intelligent connected vehicle industry [2]
Group 3
- CITIC Securities noted Haige Communication's involvement in smart transportation, emphasizing its "Beidou + 5G + C-V2X" communication network, which is part of a national vehicle networking pilot project [2]
- The technology developed by Haige Communication is expected to directly support high-level autonomous driving scenarios, reflecting the trend of vehicle networking infrastructure and intelligent driving developing together [2]
Doing real work is the future! Five pioneering companies debate the leap from lab to industrialization
机器人圈· 2025-06-11 11:43
Core Insights
- The article emphasizes the rapid advancement of Embodied AI as a central focus in global technology, showcased during the 2025 Beijing Zhiyuan Conference, highlighting breakthroughs in key technologies such as motion control and environmental interaction [1]
- It underscores the transition from showcasing technology to practical applications, with various companies demonstrating their robots' capabilities in real-world tasks [12]
Group 1: Company Innovations
- Yushu Technology's G1 robot, labeled "the world's most capable fighting robot," won the CMG World Robot Competition, demonstrating autonomous decision-making and highly dynamic motion control [2]
- The Beijing Humanoid Robot Innovation Center's Tiangong 2.0 completed a half marathon in 2 hours and 40 minutes and showed improved upper-limb dexterity and load-bearing capability [3]
- Galaxy General's Galbot robot achieved high recognition and grasping success rates in complex retail environments using its self-developed VLA model [6]
- Qunche Intelligent's robot demonstrated fine manipulation skills, such as shaving and ice cream scooping, pointing to applications in the food processing industry [7]
- Physical Intelligence's π0.5 model, trained in 100 different household scenarios, generalized effectively across tasks, emphasizing the importance of algorithm optimization over sheer data volume [8]
Group 2: Industry Trends and Perspectives
- The article discusses robot competitions as catalysts for industrial progress, providing a platform for demonstrating technology and connecting the industry with potential customers [12]
- The concept of "shape decoupling" is introduced, suggesting that humanoid robots are not the only solution but remain ideal for household environments, which are designed around the human body [10]
- The limitations of current models such as the VLA model are acknowledged, particularly in complex, long-sequence tasks, indicating that further development is needed to reach success rates suitable for practical use [11]
- Industry leaders agree that robots must demonstrate that they can do real work and create value, marking a shift towards practical applications of embodied intelligence [12]
Zhiyuan Conference debates humanoid robots: technology trends and commercial reality
Zhong Guo Jing Ying Bao· 2025-06-08 13:39
Core Insights
- The field of embodied intelligence has experienced explosive growth, becoming a core area for the integration of AI and robotics technology [1]
- The 2025 Beijing Zhiyuan Conference featured discussions on the current state and future trends of embodied intelligence, highlighting the importance of humanoid robots [1]
Group 1: Industry Developments
- Humanoid robot competitions have gained popularity, raising the question of whether companies are merely showing off their capabilities to attract attention [2]
- Companies such as Yushu Technology and Tiangong Robotics have taken part in various events to demonstrate their robots' capabilities and generate commercial value [2][3]
- The VLA model, a key breakthrough in embodied intelligence, allows robots to learn from internet data without having to experience every scenario, improving their performance [4]
Group 2: Technical Challenges
- The VLA (Vision-Language-Action) model is crucial for the development of multimodal large models in robotics [4]
- Challenges remain in generalization and stability, with the goal of eventually achieving 100% stable task completion [4]
- Training on synthetic data is advocated as a way to overcome data bottlenecks, with high-quality simulation data being essential for zero-shot generalization [5][6]
Group 3: Commercialization Pathways
- The foundational capabilities of humanoid robots are still insufficient; terrain adaptability and stability must improve before higher-level applications can be pursued [7]
- Yushu Technology has seen success in the humanoid robot rental market, indicating growing industrial value [7]
- Companies such as Galaxy General Robotics are expanding their operations, with plans to open 100 pharmacies in major cities that use humanoid robots for tasks such as medication dispensing [7]
Group 4: Future Directions
- The development of embodied intelligence is expected to cross several "chasms," with the first phase focusing on innovative products and the second phase targeting B2B applications [8]
- The goal is to eventually reach the consumer market, leading to widespread household applications [8]
- The Zhiyuan Research Institute aims to explore distinctive development paths, focusing on digital intelligence physicalization and cost-effective functionality for small-scale robots [8]
In the third year of the large-model boom, the "AI Spring Festival Gala" changes its lead again: why embodied intelligence?
Mei Ri Jing Ji Xin Wen· 2025-06-06 13:20
Group 1
- The core theme of the news is the evolution of AI from large language models to embodied intelligence and robotics, marking a shift towards practical applications in the industry [1][3][4]
- The 2023 Beijing Zhiyuan Conference highlighted the prominence of embodied intelligence, with key figures like Sam Altman and Geoffrey Hinton participating, indicating a significant shift in industry focus [3][4]
- The emergence of domestic AI companies such as Moonshot AI and Zhipu AI is noted, showing the competitive landscape in language and multimodal models [3][7]
Group 2
- The concept of embodied intelligence is gaining traction, with robots appearing at various public events, indicating growing interest in their practical applications [7][8]
- The upcoming "World Humanoid Robot Sports Competition" will feature real-life scenarios, emphasizing the need for robots to demonstrate their capabilities in practical environments [8][11]
- Industry leaders emphasize the importance of developing robots that can perform real tasks, moving beyond demonstrations to achieve commercial viability [8][12]
Group 3
- The debate over robot form, particularly humanoid versus non-humanoid, is ongoing, with humanoid robots currently favored for their advantages in data collection and model training [11][12][15]
- The VLA (Vision-Language-Action) model is highlighted as a key research area, with discussion of its applicability and limitations for embodied intelligence [15][16]
- Better understanding of the physical world is crucial for advancing embodied intelligence, and companies are exploring innovative data generation methods to improve training [17]