Visual Language Action Models
XPeng Inc. (09868) - Voluntary Announcement: Smart Vehicle Delivery Data for February 2026
2026-03-02 04:10
Voluntary Announcement

Hong Kong Exchanges and Clearing Limited and The Stock Exchange of Hong Kong Limited take no responsibility for the contents of this announcement, make no representation as to its accuracy or completeness, and expressly disclaim any liability whatsoever for any loss howsoever arising from or in reliance upon the whole or any part of the contents of this announcement.

Shareholders of the Company and potential investors are advised to exercise caution when dealing in the securities of the Company.

By order of the Board, XPeng Inc., He Xiaopeng (何小鵬), Chairman. Hong Kong, March 2, 2026 (Monday).

This announcement contains forward-looking statements. Forward-looking statements involve inherent risks and uncertainties. A number of factors could cause actual results to differ materially from those contained in any forward-looking statement, including but not limited to: the Company's goals and strategies; its expansion plans; its future business development, financial condition, and results of operations; the trends in and size of China's smart electric vehicle market; the Company's expectations regarding demand for, and market acceptance of, its products and services; the Company's expectations regarding its relationships with customers, contract manufacturers, suppliers, third-party service providers, strategic partners, and other stakeholders; general economic and business conditions; and assumptions underlying or related to any of the foregoing. All information in this announcement is as of the date of this announcement, and the Company undertakes no obligation to update any forward-looking statement except as required by applicable law.

As of the date of this announcement, the board of directors of the Company comprises Mr. He Xiaopeng (何小鵬) as executive director, Mr. Fu Jixun (符績勳) as non-executive director, and independent non-executive director Yang ...
Yuejiang Robotics Launches Third Batch of Full-Size Industrial Humanoid Robots, Starting 2026 Mass-Production Deliveries
Xin Lang Cai Jing· 2026-02-03 10:32
Core Insights
- The company Yuejiang has initiated mass production and delivery of its full-size industrial humanoid robot ATOM, a significant milestone toward its 2026 large-scale delivery goal [2][5]
- This development marks a new phase in the application of humanoid robots, as Yuejiang becomes one of the few companies to successfully establish a complete closed loop from mass production to delivered applications [2][5]

Production and Quality Control
- The ATOM robot stands 165 cm tall and undergoes standardized full-process control, from core component selection to assembly and testing [2][5]
- Yuejiang has implemented a multi-level quality inspection system covering incoming component inspection, assembly process monitoring, and overall performance testing [2][5]
- Each ATOM robot must pass strict standardized tests on key metrics such as dynamic balance and motion accuracy to ensure performance consistency and high reliability [2][5]

Technological Advancements
- The ATOM is equipped with Yuejiang's self-developed DOBOT-VLA (Visual Language Action) model, which integrates visual perception, natural language understanding, and action generation [2][5]
- This model allows the robot to convert abstract instructions into structured task chains and produce continuous, generalizable action trajectories [2][5]
- By combining reinforcement learning with real-world data alignment, the robot not only "understands requests" but also has the autonomous adaptability to "respond to changes" [2][5]
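The summary above describes a VLA model turning an abstract instruction into a structured task chain that is then executed as action trajectories. A minimal sketch of that instruction-to-task-chain idea follows; all class names, action vocabulary, and the keyword-matching "planner" are illustrative stand-ins, not Yuejiang's DOBOT-VLA API (a real model would condition on camera observations, not string matching):

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    """One step in a structured task chain (hypothetical schema)."""
    action: str   # e.g. "grasp", "move_to", "fill", "hand_over"
    target: str   # object or location the action applies to

def plan_task_chain(instruction: str) -> list[Subtask]:
    """Toy stand-in for the language side of a VLA model: map an
    abstract instruction to an ordered chain of concrete subtasks."""
    if "popcorn" in instruction:
        return [
            Subtask("grasp", "empty_cup"),
            Subtask("move_to", "dispenser"),
            Subtask("fill", "empty_cup"),
            Subtask("hand_over", "customer"),
        ]
    return []  # unknown instruction: no executable chain

chain = plan_task_chain("sell one cup of popcorn")
print([s.action for s in chain])  # → ['grasp', 'move_to', 'fill', 'hand_over']
```

Each subtask would then be expanded by the action head into a continuous trajectory; the point of the sketch is only the two-stage structure (instruction → task chain → motion).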
AAAI 2026 Latest! OC-VLA: Solving the Misalignment Between Perception and Action
具身智能之心· 2026-01-19 00:49
Background and Motivation of OC-VLA

In VLA models, a common practice is to apply pretrained vision-language models or vision encoders to downstream robot tasks to improve generalization. However, these vision models are annotated, trained, and supervised mainly in the camera coordinate frame, so their latent representations are aligned to camera space. In contrast, most robot control signals are defined and collected in the robot base frame. This discrepancy creates a misalignment between the perception space and the action space, which hinders effective learning of robot policies, especially when transferring pretrained vision models to robot control tasks.

Robot data are typically collected under diverse camera viewpoints and heterogeneous hardware configurations, so the same action executed in the robot coordinate frame must be predicted from different third-person camera views. This implicitly requires the model to reconstruct or infer consistent 3D actions from limited 2D observations. The inconsistency is especially harmful during large-scale pretraining, because the training data often contain observations from different camera viewpoints: captured from different angles ...
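The camera-frame vs. robot-base-frame misalignment described above is, at its core, a coordinate transformation problem. A minimal sketch of expressing a camera-frame point in the robot base frame using known extrinsics (the rotation, translation, and example values are illustrative; OC-VLA's actual mechanism operates on latent representations, not raw points):

```python
import numpy as np

def camera_to_base(p_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Express a 3D point observed in the camera frame in the robot
    base frame, given camera extrinsics (R, t): p_base = R @ p_cam + t."""
    return R @ p_cam + t

# Example extrinsics: camera rotated 90° about the base z-axis, mounted 0.5 m up.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 0.5])

p_cam = np.array([0.2, 0.0, 1.0])     # point as seen by the camera
p_base = camera_to_base(p_cam, R, t)  # same point in the robot base frame
print(p_base)  # → [0.  0.2 1.5]
```

When the extrinsics vary across a heterogeneous dataset but the model never sees them, the same base-frame action corresponds to many different camera-frame observations, which is exactly the inconsistency the passage describes.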
A Humanoid Robot "On the Job" at a Cinema! 14 Hours of Work Without Fatigue, 1,000 Cups Sold with Zero Errors
Sou Hu Wang· 2026-01-06 09:17
Core Insights
- The humanoid robot ATOM from Yuejiang Robotics has completed the world's first public commercial demonstration, selling 1,000 cups of popcorn daily and generating over 20,000 yuan in revenue [1][7]

Group 1: Technology and Implementation
- The demonstration showcases the robot's capability as a "fully autonomous reliable workforce" in a real commercial environment, with popcorn selling serving as a high-frequency, standardized task to test operational effectiveness [3][4]
- ATOM's operation involves complex tasks requiring high levels of perception, decision-making, and execution, including recognizing customers, handling cups, and managing various environmental interferences [4][5]
- The robot operates without pre-set trajectories or remote control, using an end-to-end Visual Language Action (VLA) model for real-time environmental perception and task execution [4][9]

Group 2: Performance and Results
- The robot achieved 14 hours of continuous operation with zero errors, demonstrating long-duration task execution capability and the reliability of its mechanical structure and control systems [12][16]
- The popcorn sales not only met market demand but also illustrate the potential for robots to improve efficiency, extend service hours, and maintain service consistency, delivering quantifiable commercial value [7][16]

Group 3: Future Implications
- The success in the cinema environment opens opportunities for humanoid robots in other service sectors such as restaurants, cafes, hotels, and retail, addressing labor shortages in those industries [16]
- The practical validation of the VLA model points to a shift toward integrated intelligent architectures better suited to complex tasks in open environments, guiding the future development of humanoid robots [16]
Looking Back at 2025 · Seeing Change Through Objects | New Experiences on Wheels: Technology Leaps in the 2025 Automotive "Intelligent Transformation"
Xin Hua She· 2025-12-22 01:37
Core Insights
- The article discusses the rapid adoption of intelligent driving technologies in the Chinese automotive industry, highlighting the shift from traditional driving to smart driving experiences by 2025 [1][2].

Group 1: Market Trends
- By the third quarter of 2025, sales of new passenger cars with Level 2 (L2) driving-assistance features rose 21.2% year-on-year, reaching a 64% penetration rate: more than 6 of every 10 new cars sold have basic smart driving capabilities [1].
- Consumer focus is shifting from single highway scenarios to complex urban environments, with a growing preference for driving-assistance systems that can handle city traffic and intersections [2].

Group 2: Technological Advancements
- Continuous technological breakthroughs and rapidly falling costs are driving the smart driving revolution, with hardware costs halving every two years and user experience expected to improve tenfold over the same period [3].
- The Chinese smart driving market is at a critical turning point in 2025, transitioning from "technology validation" to "scene implementation," with L2 features becoming standard across vehicle models [3].

Group 3: Industry Dynamics
- Intense competition is driving a significant industry reshuffle in which only companies with technical strength and mass-production experience will survive [4].
- Market competition is shifting toward user experience, cost control, and product ecosystem, with the future market structure expected to show significant stratification and specialization [5].
XPeng Motors-W Delivered 36,728 Smart Electric Vehicles in November, Up 19% Year-on-Year
Zhi Tong Cai Jing· 2025-12-01 10:11
Core Insights
- Xpeng Motors delivered 36,728 smart electric vehicles in November 2025, a year-on-year increase of 19% [1]
- Cumulative deliveries from January to November 2025 reached 391,937 vehicles, a 156% year-on-year increase [1]
- Overseas deliveries over the same period reached 39,773 vehicles, up 95% year-on-year [1]

Product and Technology Developments
- On November 5, 2025, Xpeng Motors held the 2025 Xpeng Technology Day, unveiling a series of groundbreaking "physical AI" applications [1]
- The new AI applications include the second-generation visual language action model (XPENG VLA2.0), an autonomous taxi (Robotaxi), and the next-generation IRON humanoid robot, all expected to enter mass production in 2026 [1]
- As of November 2025, the monthly active user penetration rate of Xpeng's intelligent navigation assisted driving (XNGP) reached 84% in urban driving scenarios [1]
- In late December 2025, Xpeng Motors plans to invite Chinese users to a pilot program for the second-generation visual language action model [1]
Li Auto: VLM/VLA Differences in Blind-Spot Deceleration
理想TOP2· 2025-10-18 08:44
Core Insights
- The article discusses the differences between VLM (Visual Language Model) and VLA (Visual Language Action) in autonomous driving, focusing on scenarios such as blind-spot deceleration [1][2].

Group 1: VLM and VLA Differences
- The VLM perceives scenarios such as uncontrolled intersections and outputs a deceleration request to the E2E (end-to-end) model, which then reduces speed to 8-12 km/h, creating a sense of disconnection in the response [2].
- The VLA instead uses a self-developed base model to understand the scene directly, allowing a more nuanced approach to blind-spot deceleration and a smoother, more contextually appropriate response based on varying road conditions [2].

Group 2: Action Mechanism
- The action generated by the VLA is described as a more native deceleration action rather than a dual-system command, indicating a more integrated approach to scene understanding and response [3].
- Comments raise concerns about the VLM's reliability as an external module, questioning its ability to accurately interpret 3D space and the stability of its triggering mechanisms [3].
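The dual-system vs. native-action contrast above can be sketched in a few lines. This is a toy illustration of the two control structures only; the scene dictionary, the occlusion score, and the 0.6 scaling factor are assumptions for the example, not Li Auto's parameters:

```python
def vlm_pipeline(scene: dict) -> float:
    """Dual-system path: the VLM merely flags the hazard, and a
    separate E2E planner clamps speed into a fixed 8-12 km/h band,
    which is what produces the 'disconnected' feel."""
    if scene.get("blind_spot"):
        return 10.0  # fixed target inside the 8-12 km/h band
    return scene["cruise_kph"]

def vla_policy(scene: dict) -> float:
    """Single-model path: the target speed is a native output
    conditioned on scene context, so the slowdown scales smoothly
    with how occluded the view is (0 = clear, 1 = fully blocked)."""
    occlusion = scene.get("occlusion", 0.0)
    return scene["cruise_kph"] * (1.0 - 0.6 * occlusion)

scene = {"cruise_kph": 40.0, "blind_spot": True, "occlusion": 0.5}
print(vlm_pipeline(scene))  # → 10.0  (hard clamp regardless of context)
print(vla_policy(scene))    # → 28.0  (graded response to the same scene)
```

The difference in outputs for the same scene is the crux of the article: one path snaps to a fixed band, the other produces a context-proportional action.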
A Major Upgrade for Robot Perception! Lightweight Injection of Geometric Priors Lifts Success Rate by 31%
量子位· 2025-09-28 11:54
Core Viewpoint
- The article discusses the Evo-0 model, which improves the spatial understanding of visual language action (VLA) models by injecting 3D geometric priors, without explicit depth input or additional sensors [4][18].

Group 1: Model Development
- Evo-0 builds on the VGGT visual geometry foundation model, which extracts 3D structural information from multi-view RGB images and integrates it into existing visual language models [4].
- Evo-0 uses a cross-attention fusion module that combines 2D visual tokens with 3D tokens to improve understanding of spatial structure and object layout [6].

Group 2: Experimental Results
- In RLBench simulation experiments, Evo-0's average success rate exceeded the pi0 baseline by 15% and openvla-oft by 31% across five tasks requiring fine manipulation [5].
- In real-world experiments on five spatially demanding tasks, Evo-0 beat the pi0 baseline by an average success-rate margin of 28.88%, excelling especially on tasks involving complex spatial relationships [12][10].

Group 3: Robustness Evaluation
- Robustness was tested under five interference conditions, including unseen distractor objects and variations in background color, target position, height, and camera angle; Evo-0 consistently outperformed the pi0 baseline [14][15].
- With unseen distractor objects, the model achieved a 100% correct pick rate and a 70% overall success rate, indicating robustness in challenging scenarios [15].

Group 4: Training Efficiency
- Evo-0 reached better performance with only 15,000 training steps, versus the 20,000 steps required by the pi0 baseline, indicating higher training efficiency [8].
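The cross-attention fusion described in Group 1 can be sketched as a single-head attention step where 2D visual tokens query 3D geometry tokens, followed by a residual add. This is a minimal NumPy illustration of the mechanism only; the token counts, dimensions, and the omission of learned Q/K/V projections are simplifications, not Evo-0's actual module:

```python
import numpy as np

def cross_attention(q_2d: np.ndarray, kv_3d: np.ndarray) -> np.ndarray:
    """Single-head cross-attention: 2D visual tokens (queries) attend
    to 3D geometry tokens (keys/values). Shapes: q_2d (n_2d, d),
    kv_3d (n_3d, d). Learned projection matrices are omitted."""
    d = q_2d.shape[-1]
    scores = q_2d @ kv_3d.T / np.sqrt(d)          # (n_2d, n_3d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over 3D tokens
    return weights @ kv_3d                          # (n_2d, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))    # 8 visual tokens, dim 16
kv = rng.standard_normal((4, 16))   # 4 geometry tokens (e.g. from VGGT)
fused = q + cross_attention(q, kv)  # residual fusion of 2D and 3D features
print(fused.shape)  # → (8, 16)
```

The appeal of this design is that the geometry branch adds information without changing the shape of the visual token stream, so the fused tokens can feed the downstream policy unchanged.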
Humanoid Robots: Faster, Higher, Stronger
Ren Min Ri Bao· 2025-09-01 01:03
Core Insights
- Sales of humanoid robots in China are expected to exceed 10,000 units this year, a year-on-year increase of 125% [1]
- The humanoid robot industry is developing through rapid innovation and application across sectors including industrial manufacturing, retail delivery, and restaurant services [1][2]

Trend 1: Faster Innovation and Application
- The Chinese government has included "embodied intelligence" in its work report, emphasizing humanoid robots as a typical application in the "Artificial Intelligence+" initiative [3]
- Local policies supporting humanoid robot development are being implemented, with significant funding and investment initiatives announced in cities such as Beijing, Shanghai, and Hangzhou [3]
- Experts indicate the industry has reached a "turning point" for large-scale production, with improvements in hardware and intelligence capabilities [3][4]

Trend 2: Higher Technical Standards
- Humanoid robot development relies on the synergy of hardware innovation, advanced algorithms, and high-quality data accumulation [7]
- Core components such as actuators and sensors are advancing rapidly, becoming more standardized and cost-effective [4][7]
- The integration of technologies such as satellite navigation and 5G communication is enhancing the capabilities of humanoid robots [8][9]

Trend 3: Stronger Comprehensive Performance
- Humanoid robots are evolving toward full autonomy, moving away from remote-controlled operation to self-sufficient decision-making and execution [11]
- Humanoid robot development spans multiple fields, including mechanical structure, drive systems, and artificial intelligence [12]
- Potential applications are expanding into production, service industries, and even family settings, addressing diverse needs [12]
Yuanrong Qixing's VLA Model Targets Q3 Mass Production: Can It Break Through Market and Technical Barriers?
Nan Fang Du Shi Bao· 2025-06-13 15:04
Core Insights
- Yuanrong Qixing announced its VLA model will launch to consumers in Q3 2025, with five vehicle models expected on the road within the year [1]
- The VLA model features four key capabilities: blind-spot detection, obstacle recognition, road-sign interpretation, and voice control, generating significant interest in the industry [1][3]

Company Overview
- Founded in 2018 and based in Shenzhen, Yuanrong Qixing has focused on autonomous driving and vehicle networking technologies [3]
- The VLA model, or Vision Language Action Model, is considered the company's "secret weapon" and differentiates it from traditional end-to-end models by addressing the "black box problem" [3][4]

Technology and Innovation
- The VLA model improves transparency by clearly displaying the reasoning behind its decisions, which increases user trust in the autonomous driving system [4]
- In Q4 2024, Yuanrong Qixing captured over 15% market share in the high-level intelligent driving assistance sector with a single mass-produced model [6]
- Through collaboration with Qualcomm, the company has optimized costs, achieving complex-scenario operation on a 100 TOPS platform and significantly reducing the price of its intelligent driving solutions [7]

Market Challenges
- The intelligent driving sector is highly competitive, with many established players and existing partnerships with automotive manufacturers [8]
- Yuanrong Qixing faces challenges in gaining market recognition and acceptance for the VLA model amid increasing consumer caution driven by recent accidents and stringent regulations [8]
- The company raised $100 million in its C1 financing round in November 2024, but still faces financial pressure in a cooling investment environment [8][9]

Strategic Considerations
- The push to market the VLA model is both a technology showcase and a market test, as the company shifts focus from L4 to L2 capabilities, potentially sacrificing some advanced technology for mass production [9]
- Ongoing funding is critical to avoid disruptions in technology development and to maintain competitive positioning in the market [9]