Vision-Language-Action Models
Yuejiang (Dobot) Robotics Begins 2026 Mass-Production Delivery of Its Third Batch of Full-Size Industrial Humanoid Robots
Xin Lang Cai Jing· 2026-02-03 10:32
Sina Tech, February 3 evening: Yuejiang has recently begun delivering the third mass-production batch of its full-size industrial humanoid robot ATOM for 2026, with units deployed in phases to front-line industrial applications. This marks not only a milestone toward Yuejiang's large-scale 2026 delivery target, but also the company's acceleration into a new stage of large-scale deployment, making it one of the few firms to have genuinely closed the full "production, delivery, application" loop.

The robots in this delivery batch are Yuejiang's 165 cm full-size humanoid ATOM. From core-component screening to whole-machine assembly and tuning, every unit passes through a standardized, end-to-end controlled process. For quality control in particular, Yuejiang has built a multi-level inspection system covering incoming component inspection, in-process assembly monitoring, and whole-machine performance testing. On key metrics such as dynamic balance and motion precision, every ATOM leaving the factory must pass uniform, rigorous standardized tests to ensure consistent performance and high reliability. In addition, ATOM is also equipped with ...
New at AAAI 2026! OC-VLA: Resolving the Misalignment Between Perception and Action
具身智能之心· 2026-01-19 00:49
Background and Motivation for OC-VLA

In VLA models, a common practice is to apply a pretrained vision-language model or visual encoder to downstream robot tasks to strengthen generalization. These visual models, however, are annotated, trained, and supervised mainly in the camera coordinate frame, so their latent representations are aligned to camera space. Most robot control signals, by contrast, are defined and collected in the robot's base coordinate frame. This discrepancy creates a misalignment between the perception space and the action space that hinders effective learning of robot policies, especially when transferring pretrained visual models to robot control tasks.

Robot data are typically collected across diverse camera viewpoints and heterogeneous hardware configurations, so the same action, executed in the robot frame, must be predicted from different third-person camera views. This implicitly requires the model to reconstruct or infer consistent 3D actions from limited 2D observations. The inconsistency is especially harmful during large-scale pretraining, because the training data often contain observations from many different camera viewpoints: captured from different angles ...
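The camera-to-base misalignment described above comes down to a rigid-body transform between coordinate frames: the same physical point has different coordinates depending on where the camera is mounted. A minimal NumPy sketch of that transform (the mounting pose and numbers below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def camera_to_base(p_cam, R_bc, t_bc):
    """Map a 3D point from the camera frame to the robot base frame.

    R_bc, t_bc are the camera extrinsics expressed in the base frame:
    p_base = R_bc @ p_cam + t_bc.
    """
    return R_bc @ np.asarray(p_cam, dtype=float) + t_bc

# Hypothetical setup: camera mounted 1 m above the base, looking straight
# down, its y/z axes flipped relative to the base (a 180-degree roll about x).
R_bc = np.array([[1.0,  0.0,  0.0],
                 [0.0, -1.0,  0.0],
                 [0.0,  0.0, -1.0]])
t_bc = np.array([0.0, 0.0, 1.0])

# A point seen at (0.2, 0.1, 0.5) in the camera frame lands at a
# different location once expressed in the robot's base frame.
p_base = camera_to_base([0.2, 0.1, 0.5], R_bc, t_bc)
print(p_base)  # [ 0.2 -0.1  0.5]
```

A policy trained on camera-frame features but supervised with base-frame actions must implicitly learn this transform per viewpoint, which is exactly the burden OC-VLA aims to remove.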
A Humanoid Robot "Works" at the Cinema: 14-Hour Shifts Without Fatigue, 1,000 Cups Sold, Zero Errors
Sou Hu Wang· 2026-01-06 09:17
Core Insights - The humanoid robot Atom from Yujian Robotics has successfully completed the world's first public commercial demonstration, selling 1,000 cups of popcorn daily and generating over 20,000 yuan in revenue [1][7] Group 1: Technology and Implementation - The demonstration showcases the robot's capability as a "fully autonomous reliable workforce" in a real commercial environment, with popcorn selling serving as a high-frequency, standardized task to test the robot's operational effectiveness [3][4] - Atom's operation involves complex tasks requiring high levels of perception, decision-making, and execution, including recognizing customers, handling cups, and managing various environmental interferences [4][5] - The robot operates without relying on pre-set trajectories or remote control, utilizing an end-to-end Visual Language Action (VLA) model for real-time environmental perception and task execution [4][9] Group 2: Performance and Results - The robot achieved a record of 14 hours of continuous operation with zero errors, demonstrating its long-duration task execution capability and the reliability of its mechanical structure and control systems [12][16] - The successful sale of popcorn not only meets market demand but also illustrates the potential for robots to enhance efficiency, extend service hours, and maintain service consistency, providing quantifiable commercial value [7][16] Group 3: Future Implications - The success in the cinema environment opens opportunities for humanoid robots in other service sectors such as restaurants, cafes, hotels, and retail, addressing labor shortages in these industries [16] - The practical validation of the VLA model indicates a shift towards integrated intelligent architectures that are better suited for handling complex tasks in open environments, guiding the future development of humanoid robots [16]
Looking Back at 2025 · Seeing Change in Objects | A New Experience on Wheels: The Tech Leap in 2025's Automotive "Intelligent Transformation"
Xin Hua She· 2025-12-22 01:37
Core Insights - The article discusses the rapid adoption of intelligent driving technologies in the Chinese automotive industry, highlighting the shift from traditional driving to smart driving experiences by 2025 [1][2]. Group 1: Market Trends - By the third quarter of 2025, new passenger cars equipped with Level 2 (L2) driving assistance features saw a year-on-year sales increase of 21.2%, with a penetration rate of 64%, indicating that over 6 out of every 10 new cars sold have basic smart driving capabilities [1]. - The focus of consumers is shifting from single highway scenarios to complex urban environments, with a growing preference for driving assistance systems that can handle city traffic and intersections [2]. Group 2: Technological Advancements - Continuous technological breakthroughs and rapidly decreasing costs are driving the smart driving revolution, with hardware costs halving every two years and user experience expected to improve tenfold in the same period [3]. - The Chinese smart driving market is at a critical turning point in 2025, transitioning from "technology validation" to "scene implementation," with L2 features becoming standard across all vehicle models [3]. Group 3: Industry Dynamics - The market is experiencing intense competition, leading to a significant industry reshuffle where only companies with technical strength and mass production experience will survive [4]. - The focus of market competition is shifting towards user experience, cost control, and product ecosystem, with a predicted market structure that will be characterized by significant stratification and specialization [5].
Xpeng Motors-W Delivered 36,728 Smart Electric Vehicles in November, Up 19% Year-on-Year
Zhi Tong Cai Jing· 2025-12-01 10:11
Core Insights - Xpeng Motors delivered a total of 36,728 smart electric vehicles in November 2025, representing a year-on-year growth of 19% [1] - Cumulative deliveries from January to November 2025 reached 391,937 vehicles, marking a significant year-on-year increase of 156% [1] - Overseas deliveries during the same period amounted to 39,773 vehicles, reflecting a year-on-year growth of 95% [1] Product and Technology Developments - On November 5, 2025, Xpeng Motors successfully held the 2025 Xpeng Technology Day, unveiling a series of groundbreaking "physical AI" applications [1] - New AI applications include the second-generation visual language action model (XPENG VLA2.0), autonomous taxi (Robotaxi), and the next-generation IRON humanoid robot, which are expected to enter mass production in 2026 [1] - The monthly active user penetration rate for Xpeng's intelligent navigation assisted driving (XNGP) reached 84% in urban driving scenarios as of November 2025 [1] - In late December 2025, Xpeng Motors plans to invite Chinese users to participate in a pilot program for the second-generation visual language action model [1]
Li Auto: How VLM and VLA Differ on Blind-Spot Deceleration
理想TOP2· 2025-10-18 08:44
Core Insights - The article discusses the differences between VLM (Visual Language Model) and VLA (Visual Language Action) in the context of autonomous driving, particularly focusing on scenarios like blind spot deceleration [1][2]. Group 1: VLM and VLA Differences - VLM operates by perceiving scenarios such as uncontrolled intersections and outputs a deceleration request to the E2E (End-to-End) model, which then reduces speed to 8-12 km/h, creating a sense of disconnection in the response [2]. - VLA, on the other hand, utilizes a self-developed base model to understand the scene directly, allowing for a more nuanced approach to blind spot deceleration, resulting in a smoother and more contextually appropriate response based on various road conditions [2]. Group 2: Action Mechanism - The action generated by VLA is described as a more native deceleration action rather than a dual-system command, indicating a more integrated approach to scene understanding and response [3]. - There are concerns raised in the comments regarding VLM's reliability as an external module, questioning its ability to accurately interpret 3D space and the stability of its triggering mechanisms [3].
A Major Upgrade for Robot Perception! Lightweight Injection of Geometric Priors Boosts Success Rates by 31%
量子位· 2025-09-28 11:54
Core Viewpoint - The article discusses the development of the Evo-0 model, which enhances the spatial understanding capabilities of visual language action (VLA) models by integrating 3D geometric priors without the need for explicit depth input or additional sensors [4][18]. Group 1: Model Development - The Evo-0 model is based on the VGGT visual geometry foundation model, which extracts 3D structural information from multi-view RGB images and integrates it into existing visual language models [4]. - Evo-0 employs a cross-attention fusion module that combines 2D visual tokens with 3D tokens to improve understanding of spatial structures and object layouts [6]. Group 2: Experimental Results - In RLBench simulation experiments, Evo-0 achieved an average success rate exceeding the baseline pi0 by 15% and surpassed openvla-oft by 31% across five tasks requiring fine manipulation [5]. - In real-world experiments involving five spatially demanding tasks, Evo-0 outperformed the baseline model pi0 with an average success rate improvement of 28.88%, particularly excelling in tasks involving complex spatial relationships [12][10]. Group 3: Robustness Evaluation - The robustness of Evo-0 was tested under five types of interference conditions, including unseen distractor objects and variations in background color, target position, height, and camera angle, consistently showing superior performance compared to the baseline pi0 [14][15]. - The model demonstrated a 100% correct pick rate and a 70% overall correct rate when faced with unseen distractor objects, indicating its robustness in challenging scenarios [15]. Group 4: Training Efficiency - Evo-0 achieved better performance with only 15,000 training steps compared to the 20,000 steps required for the baseline model pi0, highlighting its higher training efficiency [8].
Humanoid Robots: Faster, Higher, Stronger
Ren Min Ri Bao· 2025-09-01 01:03
Core Insights - The sales of humanoid robots in China are expected to exceed 10,000 units this year, representing a year-on-year growth of 125% [1] - The development of the humanoid robot industry is characterized by rapid innovation and application across various sectors, including industrial manufacturing, retail delivery, and restaurant services [1][2] Trend 1: Faster Innovation and Application - The Chinese government has included "embodied intelligence" in its work report, emphasizing the importance of humanoid robots as a typical application in the "Artificial Intelligence+" initiative [3] - Various local policies are being implemented to support humanoid robot development, with significant funding and investment initiatives announced in cities like Beijing, Shanghai, and Hangzhou [3] - Experts indicate that the industry has reached a "turning point" for large-scale production, with improvements in hardware and intelligence capabilities [3][4] Trend 2: Higher Technical Standards - The development of humanoid robots relies on the synergy of hardware innovation, advanced algorithms, and high-quality data accumulation [7] - The industry is witnessing rapid advancements in core components, such as actuators and sensors, which are becoming more standardized and cost-effective [4][7] - The integration of technologies like satellite navigation and 5G communication is enhancing the capabilities of humanoid robots [8][9] Trend 3: Stronger Comprehensive Performance - Humanoid robots are evolving towards full autonomy, moving away from remote control operations to self-sufficient decision-making and execution [11] - The complexity of humanoid robot development involves multiple fields, including mechanical structure, drive systems, and artificial intelligence [12] - The potential applications of humanoid robots are expanding, with roles in production, service industries, and even family settings, addressing diverse needs [12]
DeepRoute.ai (Yuanrong Qixing) Targets Q3 Mass Production for Its VLA Model: Can It Break Through Market and Technical Barriers?
Nan Fang Du Shi Bao· 2025-06-13 15:04
Core Insights - Yuanrong Qixing announced its VLA model will be launched to consumers in Q3 2025, with five vehicle models expected to be on the road within the year [1] - The VLA model features four key capabilities: blind spot detection, obstacle recognition, road sign interpretation, and voice control, generating significant interest in the industry [1][3] Company Overview - Yuanrong Qixing, established in 2018 and based in Shenzhen, has focused on autonomous driving and vehicle networking technologies [3] - The VLA model, or Vision Language Action Model, is considered the company's "secret weapon" and offers a differentiating factor compared to traditional end-to-end models by addressing the "black box problem" [3][4] Technology and Innovation - The VLA model enhances transparency by clearly displaying the reasoning process behind its decisions, which increases user trust in the autonomous driving system [4] - In Q4 2024, Yuanrong Qixing captured over 15% market share in the high-level intelligent driving assistance sector with a single mass-produced model [6] - The company has optimized costs through collaboration with Qualcomm, achieving complex scenario operations on a 100 TOPS platform, significantly reducing the price of its intelligent driving solutions [7] Market Challenges - The intelligent driving sector is highly competitive, with many players already established and partnerships formed with automotive manufacturers [8] - Yuanrong Qixing faces challenges in gaining market recognition and acceptance for the VLA model amidst increasing consumer caution due to recent accidents and stringent regulations [8] - The company has successfully raised $100 million in its C1 financing round in November 2024, but still faces financial pressures in a cooling investment environment [8][9] Strategic Considerations - The push to market the VLA model represents both a technological showcase and a market challenge, as the company shifts focus from L4 to L2 capabilities, potentially sacrificing some advanced technology for mass production [9] - The need for ongoing funding is critical to avoid disruptions in technology development and to maintain competitive positioning in the market [9]