Vision-Language-Action Models
 Li Auto: How VLM and VLA Differ in Blind-Zone Deceleration
 理想TOP2· 2025-10-18 08:44
I am writing up the scene-level differences between VLM and VLA; here is the simplest example: blind-zone deceleration. Original author: Weibo user 大懒货. Original link: https://weibo.com/2062985282/Q95d6BJkn Original content: What you can feel here is that the end-to-end model decelerates only after receiving a deceleration command from the VLM, so the behavior feels disjointed and rule-like [it always decelerates to 8-12 km/h, regardless of differences between intersection scenes], etc. VLA follows a different logic: it uses an in-house foundation model to understand the scene, so it directly builds a scene-level understanding of blind zones. The workflow is: video is encoded into the LLM; the LLM jointly judges the road scene, its width, traffic flow, etc.; and it then directly outputs an Action. That is why, by feel, VLA has far more blind-zone deceleration levels (close to continuous), and in particular the deceleration G-value varies a lot across different roads, matching the local traffic flow much better. It no longer feels like the old E2E obeying a VLM. This is a "native" deceleration Action, not the command-driven feel of a dual system. How is the E2E+VLM strategy built? First, the VLM is a vision-language model, so the R&D team collects a large number [actually not that many; this is just an LLM trait] of T-junction scene videos and images, so that the Qwen base model acquires the ability to understand T-junction scenes. The VLM's working logic is then: when it perceives no ...
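The contrast the post describes, one coarse VLM command versus a scene-conditioned near-continuous action, can be sketched as a toy model. All numbers, feature names, and formulas below are illustrative assumptions, not Li Auto's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scene:
    """Toy scene descriptors the post mentions: road width, traffic flow."""
    road_width_m: float
    traffic_flow: float  # 0.0 (empty) .. 1.0 (dense)
    has_blind_zone: bool

def vlm_command_speed(scene: Scene) -> Optional[float]:
    """Dual-system style: the VLM emits one coarse, rule-like command.
    Every blind zone maps to the same 8-12 km/h band, regardless of context."""
    if scene.has_blind_zone:
        return 10.0  # fixed target speed in km/h -> the 'rule-like' feel
    return None      # no command; the E2E planner drives normally

def vla_action_speed(scene: Scene) -> Optional[float]:
    """Single-system style: the model outputs a near-continuous action
    conditioned on the whole scene, so deceleration varies road by road."""
    if not scene.has_blind_zone:
        return None
    base = 8.0
    width_bonus = 0.8 * max(scene.road_width_m - 3.5, 0.0)  # wider road -> less caution
    flow_penalty = 6.0 * scene.traffic_flow                  # denser traffic -> slower
    return max(base + width_bonus - flow_penalty, 5.0)

narrow_busy = Scene(road_width_m=3.5, traffic_flow=0.9, has_blind_zone=True)
wide_quiet = Scene(road_width_m=7.0, traffic_flow=0.1, has_blind_zone=True)

# The VLM command is identical in both scenes; the VLA action differs per scene.
print(vlm_command_speed(narrow_busy), vlm_command_speed(wide_quiet))  # 10.0 10.0
print(vla_action_speed(narrow_busy), vla_action_speed(wide_quiet))
```

The point of the sketch is the shape of the output space: the dual-system path collapses every blind zone to one band, while the single-system path maps scene features to a continuous value, which is the "more levels, different G per road" feel the post describes.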
 A Major Upgrade for Robot Perception: Lightweight Injection of Geometric Priors Lifts Success Rates by 31%
 量子位· 2025-09-28 11:54
Current enhancement schemes based on explicit depth input are effective, but they rely on extra sensors or depth-estimation networks, which brings deployment difficulty and precision noise. Contributed by the Evo-0 team to 量子位 | WeChat account QbitAI. In robot learning, getting AI to truly "understand" the three-dimensional world has long been a hard problem. VLA models are usually built on pretrained vision-language models (VLMs) trained only on 2D image-text data, so they lack the 3D spatial understanding that real-world manipulation requires. To address this, Shanghai Jiao Tong University and the University of Cambridge propose Evo-0, a lightweight method for strengthening the spatial understanding of vision-language-action (VLA) models by implicitly injecting 3D geometric priors, with no explicit depth input and no extra sensors. The method uses the visual geometry foundation model VGGT to extract 3D structural information from multi-view RGB images and fuses it into the original vision-language model, yielding a marked gain in spatial perception. In RLBench simulation experiments, on 5 tasks requiring fine manipulation, Evo-0's average success rate exceeds the pi0 baseline by 15% and OpenVLA-OFT by 31%. Evo-0: fusing 2D and 3D representations. Evo-0 uses VGGT as a spatial encoder, taking the 3D tokens that VGGT extracts for 3D-structure tasks during its training. These tokens carry geometric information such as depth context and cross-view spatial correspondences. The model introduces a cross- ...
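The excerpt is cut off at "cross-", but the fusion it is describing is a cross-attention layer in which the VLM's visual tokens query VGGT's 3D tokens. A minimal numpy sketch of such a layer follows; the token counts, dimensions, and residual placement are illustrative assumptions, not Evo-0's published architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(vlm_tokens, geo_tokens, Wq, Wk, Wv):
    """VLM visual tokens attend (as queries) to VGGT 3D tokens (keys/values),
    injecting geometric context without any explicit depth input."""
    Q = vlm_tokens @ Wq                                       # (N, d) queries
    K = geo_tokens @ Wk                                       # (M, d) keys
    V = geo_tokens @ Wv                                       # (M, d) values
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)   # (N, M) weights
    fused = attn @ V                                          # (N, d) geometry-aware update
    return vlm_tokens + fused  # residual add keeps the original 2D pathway intact

rng = np.random.default_rng(0)
d = 16
vlm_tokens = rng.standard_normal((8, d))  # 8 image-patch tokens from the VLM
geo_tokens = rng.standard_normal((4, d))  # 4 3D tokens from VGGT (hypothetical count)
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out = cross_attention_fuse(vlm_tokens, geo_tokens, Wq, Wk, Wv)
print(out.shape)  # (8, 16): token count and width unchanged, content enriched
```

Because the output keeps the same shape as the VLM's token stream, a module like this can be dropped between existing layers, which is what makes the approach "lightweight" relative to retraining on explicit depth.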
 Humanoid Robots: Faster, Higher, Stronger
 Ren Min Ri Bao· 2025-09-01 01:03
 Core Insights - The sales of humanoid robots in China are expected to exceed 10,000 units this year, representing a year-on-year growth of 125% [1] - The development of the humanoid robot industry is characterized by rapid innovation and application across various sectors, including industrial manufacturing, retail delivery, and restaurant services [1][2]   Trend 1: Faster Innovation and Application - The Chinese government has included "embodied intelligence" in its work report, emphasizing the importance of humanoid robots as a typical application in the "Artificial Intelligence+" initiative [3] - Various local policies are being implemented to support humanoid robot development, with significant funding and investment initiatives announced in cities like Beijing, Shanghai, and Hangzhou [3] - Experts indicate that the industry has reached a "turning point" for large-scale production, with improvements in hardware and intelligence capabilities [3][4]   Trend 2: Higher Technical Standards - The development of humanoid robots relies on the synergy of hardware innovation, advanced algorithms, and high-quality data accumulation [7] - The industry is witnessing rapid advancements in core components, such as actuators and sensors, which are becoming more standardized and cost-effective [4][7] - The integration of technologies like satellite navigation and 5G communication is enhancing the capabilities of humanoid robots [8][9]   Trend 3: Stronger Comprehensive Performance - Humanoid robots are evolving towards full autonomy, moving away from remote control operations to self-sufficient decision-making and execution [11] - The complexity of humanoid robot development involves multiple fields, including mechanical structure, drive systems, and artificial intelligence [12] - The potential applications of humanoid robots are expanding, with roles in production, service industries, and even family settings, addressing diverse needs [12]
 Yuanrong Qixing's VLA Model Heads for Mass Production in Q3: Can It Break Through the Market and Technical Barriers?
 Nan Fang Du Shi Bao· 2025-06-13 15:04
 Core Insights - Yuanrong Qixing announced its VLA model will be launched to consumers in Q3 2025, with five vehicle models expected to be on the road within the year [1] - The VLA model features four key capabilities: blind spot detection, obstacle recognition, road sign interpretation, and voice control, generating significant interest in the industry [1][3]   Company Overview - Yuanrong Qixing, established in 2018 and based in Shenzhen, has focused on autonomous driving and vehicle networking technologies [3] - The VLA model, or Vision Language Action Model, is considered the company's "secret weapon" and offers a differentiating factor compared to traditional end-to-end models by addressing the "black box problem" [3][4]   Technology and Innovation - The VLA model enhances transparency by clearly displaying the reasoning process behind its decisions, which increases user trust in the autonomous driving system [4] - In Q4 2024, Yuanrong Qixing captured over 15% market share in the high-level intelligent driving assistance sector with a single mass-produced model [6] - The company has optimized costs through collaboration with Qualcomm, achieving complex scenario operations on a 100 TOPS platform, significantly reducing the price of its intelligent driving solutions [7]   Market Challenges - The intelligent driving sector is highly competitive, with many players already established and partnerships formed with automotive manufacturers [8] - Yuanrong Qixing faces challenges in gaining market recognition and acceptance for the VLA model amidst increasing consumer caution due to recent accidents and stringent regulations [8] - The company has successfully raised $100 million in its C1 financing round in November 2024, but still faces financial pressures in a cooling investment environment [8][9]   Strategic Considerations - The push to market the VLA model represents both a technological showcase and a market challenge, as the company shifts focus from L4 to L2 capabilities, potentially sacrificing some advanced technology for mass production [9] - The need for ongoing funding is critical to avoid disruptions in technology development and to maintain competitive positioning in the market [9]