UniVLA
"CV Top-Conference King" Li Hongyang Moves into Embodied Intelligence!
自动驾驶之心· 2025-12-15 00:04
Core Insights
- RoboX is focusing on embodied intelligence and has entered the robotics manipulation sector, led by researcher Li Hongyang of the University of Hong Kong and Shanghai AI Laboratory [3][4]
- The company has formed a research team of several dozen members, covering VLA, robotics, autonomous driving, and edge computing chips [4]
- Li Hongyang's research has significantly advanced autonomous driving technology, particularly through the UniAD framework, which integrates various tasks into a single end-to-end network [6][7]

Research Achievements
- The UniAD framework has outperformed state-of-the-art methods on the nuScenes dataset, demonstrating its effectiveness in autonomous driving [6]
- His earlier method, BEVFormer, was recognized as one of the top 100 AI papers of 2022 and became an industry benchmark for visual detection [7]
- The team has built "AgiBot World," a large-scale real-robot manipulation dataset spanning multiple industry scenarios [7]

Future Directions
- The upcoming paper "UniVLA: Learning to Act Anywhere with Task-centric Latent Actions" introduces a task-centric latent action framework that improves robot policy learning across environments [9][10]
- UniVLA reduces reliance on labeled data, achieving top performance with minimal data on multi-task benchmarks and supporting efficient transfer from internet videos to real robots [10]
- The company aims to establish a full-stack in-house research route, with a vision of improving few-shot generalization for humanoid robots across applications [10][11]

Industry Trends
- Embodied intelligence is gaining traction, with several academic experts transitioning into the field, indicating growing interest and investment in the sector [11]
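The task-centric latent action idea can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the names (`encode_transition`, the codebook size, the latent dimension) are all invented, and a real system would use trained networks for both the inverse-dynamics encoder and the codebook. The point it shows is that discrete "latent actions" can be mined from raw video frame pairs with no action labels.

```python
import numpy as np

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 16   # number of discrete latent actions (illustrative)
LATENT_DIM = 8

# Stand-in for a learned codebook of latent action prototypes.
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))

def encode_transition(frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
    """Toy inverse-dynamics encoder: summarizes the change between frames.
    A real system would use a trained network here."""
    diff = (frame_t1 - frame_t).ravel()
    # Crude projection of the frame difference down to the latent dimension.
    return diff[:LATENT_DIM]

def latent_action_token(frame_t, frame_t1) -> int:
    """Snap the transition to its nearest codebook entry (vector quantization),
    yielding a discrete latent-action token."""
    z = encode_transition(frame_t, frame_t1)
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

# Unlabeled video: tokens can be extracted from any consecutive frame pair.
frames = rng.normal(size=(5, 4, 4))  # five tiny synthetic "frames"
tokens = [latent_action_token(frames[i], frames[i + 1]) for i in range(4)]
print(tokens)  # four discrete latent actions, one per transition
```

Once such tokens exist, a small amount of labeled data is enough to decode them into real robot actions, which is consistent with the "minimal labeled data" claim above.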
Sim2Real Can't Solve Embodied Intelligence's Data Dilemma
自动驾驶之心· 2025-10-03 03:32
Core Viewpoint
- The article covers the ongoing debate in embodied intelligence between relying on simulation efficiency and real-world data, and the potential of world models to redefine how data is used in the field [4][8]

Group 1: Understanding the Sim-to-Real Gap
- The "Sim-to-Real gap" refers to discrepancies between simulated environments and real-world scenarios, caused primarily by incomplete simulations that fail to reproduce visual and physical detail [8]
- The gap persists because simulation models do not capture the full complexity of the real world, leading to limited generalization and overfitting to specific scenarios [8][11]
- Proposed mitigations center on the data side: designing virtual-to-real data ratios and using AIGC to generate diverse datasets that balance volume and authenticity [11][12]

Group 2: Data Utilization in Embodied Intelligence
- Experts agree that while real data is ideal for training, the scarcity of high-quality real-world datasets currently forces a reliance on simulation data [20][21]
- Simulation data is crucial for foundational model iteration and testing, allowing safe and efficient algorithm validation before deployment on real machines [21][24]
- Well-built simulators can also scale reinforcement learning through large-scale parallel training, letting models learn from scenarios that are hard to capture in real life [24][26]

Group 3: World Models and Future Directions
- The article highlights world models as a key direction for future research in autonomous driving and embodied intelligence, given their potential for general visual understanding and long-horizon planning [30][32]
- Challenges remain in automating the generation of simulation data and ensuring the diversity and generalization of actions within simulations, both critical to advancing the field [28][29]
- Adding new modalities such as force and touch to world models is a promising direction, despite current limits on computational resources [30][31]

Group 4: Reaction to Boston Dynamics Technology
- Experts acknowledge the advanced capabilities of Boston Dynamics robots, particularly their smooth execution of complex tasks requiring sophisticated motion control [33][37]
- The discussion underscores the importance of hardware and data in embodied intelligence, with Boston Dynamics' approach serving as a benchmark for future development [37][39]
- The consensus is that the robots' seamless performance stems not only from hardware differences but also from superior motion control techniques that could inform future embodied-intelligence research [39][41]
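The "designing virtual and real data ratios" mitigation mentioned above can be sketched as a simple batch sampler. This is a hypothetical illustration, not code from the article: the function name and the 25% real fraction are invented, and real pipelines would weight sampling by task and quality rather than a single global ratio.

```python
import random

def mixed_batch(sim_data, real_data, batch_size, real_fraction=0.25, seed=0):
    """Draw a training batch with a fixed fraction of real samples
    (sampled with replacement, since real data is scarce)."""
    rng = random.Random(seed)
    n_real = int(batch_size * real_fraction)
    batch = [rng.choice(real_data) for _ in range(n_real)]
    batch += [rng.choice(sim_data) for _ in range(batch_size - n_real)]
    rng.shuffle(batch)  # avoid the model seeing real/sim blocks in order
    return batch

sim = [("sim", i) for i in range(1000)]   # cheap and plentiful
real = [("real", i) for i in range(20)]   # expensive and scarce
batch = mixed_batch(sim, real, batch_size=32)
print(sum(1 for src, _ in batch if src == "real"))  # 8 real samples per batch
```

Fixing the ratio lets the policy exploit abundant simulation data without drifting too far from the scarce real-world distribution.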
FlowVLA: Cracking the "Physical Distortion" Problem in VLA Models, Another Upgrade for Robot World Modeling
具身智能之心· 2025-08-29 00:03
Core Viewpoint
- The article discusses the limitations of traditional Vision-Language-Action (VLA) models and introduces FlowVLA, a framework that applies a Visual Chain of Thought (Visual CoT) principle so the model predicts future frames through structured physical reasoning rather than mere pixel replication [5][8][36]

Group 1: Background and Current State
- VLA models pre-trained as world models show significant potential for general robotics, chiefly via large autoregressive Transformers that learn environmental dynamics from vast video data [6][7]
- Existing models have critical flaws: task confusion that causes prediction failures, inefficient knowledge transfer between passive observation and active control, and entangled learning of dynamics and appearance [7]

Group 2: Contributions of FlowVLA
- FlowVLA introduces a learning framework built on structured physical reasoning, requiring the model to infer motion dynamics before predicting future frames [8][10]
- It unifies appearance and motion reasoning within a single autoregressive Transformer, preserving parameter efficiency and architectural simplicity [9][10]
- Experiments validate FlowVLA's superior performance across robotic manipulation benchmarks, with better sample efficiency and a narrower gap between pre-training and policy fine-tuning [10][20]

Group 3: Research Content
- The Visual CoT reasoning process decomposes frame prediction into a causal chain of "current frame → optical flow → future frame," letting the model disentangle dynamics from appearance [12][14]
- Training follows a two-phase paradigm: a pre-training phase for world-model learning and a fine-tuning phase for adapting to control tasks [15][16]
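The "current frame → optical flow → future frame" chain implies an interleaved training sequence. The sketch below is a guess at what that format could look like; the token names and helper function are invented for illustration and are not FlowVLA's code. It shows the key structural idea: motion (flow) tokens are placed between frame tokens, so an autoregressive model must emit flow before the next frame.

```python
def interleave_frames_and_flow(frame_tokens, flow_tokens):
    """frame_tokens: one token list per frame; flow_tokens: one token list per
    transition (so len(flow_tokens) == len(frame_tokens) - 1).
    Returns a single flat sequence [frame_0, flow_0, frame_1, flow_1, ...]."""
    assert len(flow_tokens) == len(frame_tokens) - 1
    seq = []
    for i, frame in enumerate(frame_tokens):
        seq.extend(frame)
        if i < len(flow_tokens):      # no flow after the final frame
            seq.extend(flow_tokens[i])
    return seq

# Toy tokens: two tokens per frame, one per flow field.
frames = [["f0a", "f0b"], ["f1a", "f1b"], ["f2a", "f2b"]]
flows = [["w0"], ["w1"]]
print(interleave_frames_and_flow(frames, flows))
# -> ['f0a', 'f0b', 'w0', 'f1a', 'f1b', 'w1', 'f2a', 'f2b']
```

Under next-token training on such sequences, predicting frame t+1 is conditioned on the flow the model itself just produced, which is one way to force dynamics to be reasoned about before appearance.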
Group 4: Experimental Analysis
- FlowVLA outperforms existing methods on the LIBERO dataset across all task sets, excelling particularly at long-horizon tasks and showing a robust understanding of physical dynamics [20][21]
- On the SimplerEnv dataset, FlowVLA adapts well to visual domain shifts, with large performance gains on tasks where other models struggle [22][23]
- Its sample efficiency is validated: it needs only one-third of the training steps of baseline models to reach peak performance, with a 55% higher peak success rate in low-data regimes [30][32]

Group 5: Key Component Validation
- Ablation studies on the LIBERO-10 benchmark confirm the critical roles of the Visual CoT structure, the flow loss, and the interleaved sequence format [33][34]

Group 6: Comparison with Related Work
- FlowVLA differs from traditional VLA models by prioritizing dynamic understanding and building a robust world model before adapting to control tasks, laying a solid foundation of physical knowledge [35]
Major Livestream! RoboTwin2.0: A Dual-Arm Manipulation Data Generator and Benchmark Suite with Strong Domain Randomization
具身智能之心· 2025-07-15 13:49
Core Viewpoint
- The article discusses the challenges and advances in training dual-arm robots for complex tasks, stressing the need for efficient data collection and simulation methods to improve their operational capabilities [2]

Group 1: Challenges in Dual-Arm Robot Training
- Dual-arm robots are crucial for collaborative assembly, tool use, and object handover in complex scenarios, but training them for general manipulation with VLA models faces multiple bottlenecks [2]
- Scaling the collection of real demonstration data is costly and slow, making it hard to cover a wide range of tasks, object geometries, and hardware variations [2]
- Existing simulation methods lack efficient, scalable expert-data generation for new tasks, and their domain randomization designs are too superficial to capture the complexity of real environments [2]

Group 2: Advancements and Solutions
- The article highlights UniVLA, which efficiently uses multi-source heterogeneous data to build a general, scalable action space for robots [5]
- The CVPR champion solution BridgeVLA reportedly improves real-machine performance by 32%, marking progress in robot navigation and motion control in real-world scenarios [4]
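Domain randomization, which the summary above says existing simulators implement too superficially, amounts to resampling physical and visual parameters for every generated episode. A minimal sketch with invented parameter names and ranges (not RoboTwin2.0's actual configuration):

```python
import random

def randomize_episode(rng: random.Random) -> dict:
    """Sample one episode's simulation parameters so the policy cannot
    overfit to a single simulated configuration."""
    return {
        "table_height_m": rng.uniform(0.70, 0.85),
        "object_mass_kg": rng.uniform(0.05, 0.50),
        "friction": rng.uniform(0.4, 1.2),
        "light_intensity": rng.uniform(0.3, 1.0),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
    }

rng = random.Random(42)
configs = [randomize_episode(rng) for _ in range(3)]
print(len(configs))  # three independently randomized episode configs
```

"Strong" randomization, in the spirit of the title, would extend such ranges beyond appearance to contact dynamics, object geometry, and controller latency, which flat visual-only randomization misses.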
A New Breakthrough in Unified VLA Architecture: Autoregressive World Models Lead Embodied Intelligence
机器之心· 2025-07-10 04:26
Core Viewpoint
- The article discusses UniVLA, a new unified Vision-Language-Action (VLA) model architecture that tightly integrates visual, language, and action signals for better decision-making in embodied intelligence tasks [4][5][13]

Group 1: Model Architecture and Mechanism
- UniVLA is built on a fully discrete, autoregressive mechanism that natively models visual, language, and action signals, with world-model training that learns temporal information and causal logic from large-scale video [5][9][14]
- The framework converts visual, language, and action signals into discrete tokens and arranges them as interleaved multimodal temporal sequences for unified modeling [9][10]

Group 2: Performance and Benchmarking
- UniVLA sets new state-of-the-art (SOTA) results on major embodied intelligence benchmarks including CALVIN, LIBERO, and SimplerEnv, demonstrating strong performance advantages [18][21]
- On the CALVIN benchmark, UniVLA achieves an average score of 95.5%, significantly outperforming previous models [19]

Group 3: Training Efficiency and Generalization
- The world-model post-training stage substantially improves downstream decision-making without relying on extensive action data, learning efficiently from large amounts of video alone [14][15]
- The model supports unified training across visual understanding, video generation, and action prediction, demonstrating its versatility and data scalability [10][24]

Group 4: Future Directions
- The article suggests deeper integration of the UniVLA framework with multimodal reinforcement learning to strengthen its perception, understanding, and decision-making in open-world scenarios [24]
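The interleaved multimodal sequence described above can be sketched as follows. The special tokens and helper are invented for illustration and are not UniVLA's actual tokenization; the sketch only shows the idea of one autoregressive stream over discretized language, vision, and action.

```python
# Invented modality-marker tokens for illustration.
BOS, LANG, VIS, ACT = "<bos>", "<lang>", "<vis>", "<act>"

def build_sequence(instruction_tokens, steps):
    """steps: list of (vision_tokens, action_tokens) pairs over time.
    Returns one flat token stream: the instruction once, then an
    observe-then-act pattern per timestep, all in a single sequence a
    standard autoregressive Transformer can model."""
    seq = [BOS, LANG, *instruction_tokens]
    for vis, act in steps:
        seq += [VIS, *vis, ACT, *act]
    return seq

seq = build_sequence(
    ["pick", "cup"],
    [(["v1", "v2"], ["a1"]), (["v3", "v4"], ["a2"])],
)
print(seq)
# -> ['<bos>', '<lang>', 'pick', 'cup', '<vis>', 'v1', 'v2', '<act>', 'a1',
#     '<vis>', 'v3', 'v4', '<act>', 'a2']
```

One consequence of this layout is that world-model pre-training (predicting vision tokens from video alone) and policy learning (predicting action tokens) share the same objective and weights, which is consistent with the post-training efficiency claim above.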
UniVLA, Jointly Launched by Zhiyuan Robotics and the University of Hong Kong, Accepted to RSS | Investment Research Report
Zhong Guo Neng Yuan Wang· 2025-05-16 01:43
Market Performance
- On May 14, 2025, the CSI 300 index rose 1.21%, while the machinery sector fell 0.43%, ranking 29th among all primary industries [2][1]
- Among sub-sectors, semiconductor equipment gained the most at 0.79%, while engineering machinery fell the most at 1.96% [2][1]
- The top three individual gainers were Heng Er Da (+20.00%), Zhong Ji Huan Ke (+19.97%), and Da Ye Co. (+12.98%); the top three losers were Magnetic Valley Technology (-8.20%), Xin Yu Ren (-7.46%), and De Ma Technology (-6.19%) [2][1]

Company Announcements
- New Era's shareholder Wang Chunxiang plans to reduce his stake by 0.15% through block trading or centralized bidding, having previously held 2.12% [3]
- Guangge Technology's major shareholder Beijing Jishi Chuangye Investment Fund cut its stake by 0.27% from 5.00% between May 7 and May 13, 2025 [3]
- Fengxing Co.'s major shareholder Jiangxi Taihao Technology Development Co. reduced its stake by 1.02% from 7.92% through centralized bidding [3]
- Zhuozhao Point Glue's shareholder Yinghao (Hainan) Venture Capital Co. reduced its stake by 0.2914% from 1.2230% through centralized bidding [3]

Industry News
- Zhiyuan Robotics and the University of Hong Kong launched UniVLA, a framework for general robot policy learning with cross-domain, cross-scenario, and cross-task capabilities [6]
- UniVLA's core innovation is a task-centric latent action space that enables efficient learning from vast amounts of unlabeled video, reaching state-of-the-art performance at significantly lower computational cost [6]
- The model showed an average success-rate improvement of 18.5% across four evaluation benchmarks and achieved state-of-the-art results with only 10% of the data on specific tasks [6]
- Zhengzhou Xinda Yimi Technology Co. released "Mi Xin PQC01," the first practical quantum-resistant chip, featuring fully domestic production and core technology [7][8]
- The chip supports dynamic switching between quantum-resistant and classical algorithms, is built on a 28nm process, and cuts power consumption by 60%, making it suitable for IoT and mobile devices [8]