UniVLA
FlowVLA: Tackling the "Physical Distortion" Problem in VLA Models and Upgrading Robot World Modeling
具身智能之心· 2025-08-29 00:03
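Edited by 具身智能之心.

Traditional Vision-Language-Action (VLA) world models rely on a "direct next-frame prediction" paradigm and often fall into a "pixel-copying trap" because they conflate static appearance with dynamic motion. Long-horizon predictions then show physical distortions such as disappearing robot arms and abnormal object motion. In addition, the "passive observation knowledge" acquired in pre-training is disconnected from the "active control knowledge" needed for policy learning, which leads to slow convergence and low sample efficiency on downstream tasks.

To address this core problem, FlowVLA follows the Visual Chain of Thought (Visual CoT) principle and unifies appearance and motion reasoning within a single autoregressive Transformer: it first predicts an intermediate optical flow from the current frame to encode motion dynamics, then generates the future frame based on that flow. This structured "frame → flow → frame" reasoning decouples the learning of dynamics from the learning of appearance.

A two-stage training paradigm further strengthens performance: the pre-training stage learns general physical laws from action-free videos, and the fine-tuning stage adapts the model to robot control tasks. Experiments show that FlowVLA, on the full LIBERO task suite (especially long-horizon tasks), Simple ...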
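The "frame → flow → frame" ordering can be illustrated with a small sketch of how such a training sequence might be laid out for an autoregressive model. This is only an illustration under assumptions: the marker strings `<frame>` and `<flow>`, the function name, and the use of plain integer token lists are all hypothetical and are not taken from FlowVLA itself.

```python
from typing import List, Sequence, Union

# Hypothetical marker tokens; the real model's special tokens are not given in the source.
FRAME_MARK = "<frame>"
FLOW_MARK = "<flow>"

Token = Union[int, str]


def build_visual_cot_sequence(
    frame_tokens: Sequence[Sequence[int]],
    flow_tokens: Sequence[Sequence[int]],
) -> List[Token]:
    """Lay out tokens as frame_0, flow_0, frame_1, flow_1, ..., frame_T.

    flow_tokens[t] is assumed to encode the motion between frame t and frame t+1,
    so an autoregressive model reading this sequence must emit the flow before
    it emits the next frame.
    """
    if len(frame_tokens) != len(flow_tokens) + 1:
        raise ValueError("expected exactly one flow between each pair of frames")

    sequence: List[Token] = []
    for t, flow in enumerate(flow_tokens):
        sequence.append(FRAME_MARK)
        sequence.extend(frame_tokens[t])
        sequence.append(FLOW_MARK)
        sequence.extend(flow)
    # The final frame closes the chain: it is predicted from the last flow.
    sequence.append(FRAME_MARK)
    sequence.extend(frame_tokens[-1])
    return sequence
```

Under this layout, standard next-token training makes every future frame conditional on an explicit motion prediction, which is the decoupling of dynamics and appearance described above.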
Featured Livestream! RoboTwin2.0: A Dual-Arm Manipulation Data Generator and Benchmark Suite with Strong Domain Randomization
具身智能之心· 2025-07-15 13:49
Core Viewpoint
- The article discusses the challenges and advancements in training dual-arm robots for complex tasks, emphasizing the need for efficient data collection and simulation methods to enhance their operational capabilities [2].

Group 1: Challenges in Dual-Arm Robot Training
- Dual-arm robots play a crucial role in collaborative assembly, tool use, and object handover in complex scenarios, but training them for general-purpose manipulation, as with VLA models, faces multiple bottlenecks [2].
- Collecting real demonstration data at scale is costly and time-consuming, making it difficult to cover a wide range of tasks, object shapes, and hardware variations [2].
- Existing simulation methods lack efficient and scalable ways to generate expert data for new tasks, and their domain randomization is too shallow to capture the complexity of real environments [2].

Group 2: Advancements and Solutions
- The article highlights the introduction of UniVLA, which efficiently uses multi-source heterogeneous data to construct a general and scalable action space for robots [5].
- The CVPR champion solution, BridgeVLA, reportedly improves real-robot performance by 32%, showcasing advancements in robot navigation and motion control in real-world scenarios [4].
A New Breakthrough in Unified VLA Architecture: Autoregressive World Models Lead Embodied Intelligence
机器之心· 2025-07-10 04:26
Core Viewpoint
- The article discusses the development of a new unified Vision-Language-Action (VLA) model architecture called UniVLA, which enhances the integration of visual, language, and action signals for improved decision-making in embodied intelligence tasks [4][5][13].

Group 1: Model Architecture and Mechanism
- UniVLA is based on a fully discrete, autoregressive mechanism that models visual, language, and action signals natively, incorporating world model training to learn temporal information and causal logic from large-scale videos [5][9][14].
- The framework transforms visual, language, and action signals into discrete tokens, creating interleaved multimodal temporal sequences for unified modeling [9][10].

Group 2: Performance and Benchmarking
- UniVLA has set new state-of-the-art (SOTA) records across major embodied intelligence benchmarks such as CALVIN, LIBERO, and SimplerEnv, demonstrating its strong performance advantages [18][21].
- In the CALVIN benchmark, UniVLA achieved an average score of 95.5%, significantly outperforming previous models [19].

Group 3: Training Efficiency and Generalization
- The post-training stage of the world model significantly enhances downstream decision-making performance without relying on extensive action data, using only large amounts of video data for efficient learning [14][15].
- The model supports unified training for various tasks, including visual understanding, video generation, and action prediction, showcasing its versatility and data scalability [10][24].

Group 4: Future Directions
- The article suggests exploring deeper integration of the UniVLA framework with multimodal reinforcement learning to enhance its perception, understanding, and decision-making capabilities in open-world scenarios [24].
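The idea of turning visual, language, and action signals into discrete tokens and interleaving them into one temporal sequence can be sketched as follows. This is a minimal illustration, not UniVLA's actual tokenizer: the codebook sizes in `VOCAB_SIZES`, the offset scheme for a shared vocabulary, and both function names are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-modality codebook sizes; the source does not state the real ones.
VOCAB_SIZES: Dict[str, int] = {"text": 1000, "vision": 512, "action": 256}


def modality_offsets(sizes: Dict[str, int]) -> Dict[str, int]:
    """Give each modality its own contiguous id range in one shared vocabulary."""
    offsets: Dict[str, int] = {}
    start = 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets


def interleave_episode(
    instruction: Sequence[int],
    observations: Sequence[Sequence[int]],
    actions: Sequence[Sequence[int]],
    sizes: Dict[str, int] = VOCAB_SIZES,
) -> List[Tuple[str, int]]:
    """Build: instruction, then (vision_t, action_t) for each timestep t.

    Inputs are modality-local token ids; the output pairs each shared-vocabulary
    id with its modality tag so a loss mask can be derived later.
    """
    if len(observations) != len(actions):
        raise ValueError("need one action chunk per observation")
    offsets = modality_offsets(sizes)
    sequence: List[Tuple[str, int]] = []

    def push(modality: str, local_ids: Sequence[int]) -> None:
        for local_id in local_ids:
            if not 0 <= local_id < sizes[modality]:
                raise ValueError(f"{modality} token {local_id} out of range")
            sequence.append((modality, offsets[modality] + local_id))

    push("text", instruction)
    for obs, act in zip(observations, actions):
        push("vision", obs)
        push("action", act)
    return sequence
```

Keeping the modality tag alongside each id is one simple way to supervise only a subset of tokens, for example only vision tokens when training on action-free video.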
UniVLA, Launched by Zhiyuan Robotics and the University of Hong Kong, Accepted to RSS | Investment Research Report
Market Performance
- On May 14, 2025, the CSI 300 index rose by 1.21%, while the machinery sector declined by 0.43%, ranking 29th among all primary industries [2][1].
- Among sub-sectors, semiconductor equipment had the largest increase at 0.79%, whereas engineering machinery had the largest drop at 1.96% [2][1].
- The top three gainers among individual stocks were Heng Er Da (+20.00%), Zhong Ji Huan Ke (+19.97%), and Da Ye Co. (+12.98%); the top three losers were Magnetic Valley Technology (-8.20%), Xin Yu Ren (-7.46%), and De Ma Technology (-6.19%) [2][1].

Company Announcements
- New Era's shareholder Wang Chunxiang plans to reduce his stake by 0.15% through block trading or centralized bidding, having previously held 2.12% [3].
- Guangge Technology's major shareholder Beijing Jishi Chuangye Investment Fund reduced its stake by 0.27% from 5.00% between May 7 and May 13, 2025 [3].
- Fengxing Co.'s major shareholder Jiangxi Taihao Technology Development Co. has reduced its stake by 1.02% from 7.92% through centralized bidding [3].
- Zhuozhao Point Glue's shareholder Yinghao (Hainan) Venture Capital Co. has reduced its stake by 0.2914% from 1.2230% through centralized bidding [3].

Industry News
- Zhiyuan Robotics and the University of Hong Kong launched UniVLA, a new framework for general-purpose policy learning in robotics, which supports cross-domain, cross-scenario, and cross-task capabilities [6].
- UniVLA's core innovation is a task-centric latent action space that enables efficient learning from vast amounts of unlabeled video data, achieving state-of-the-art performance with significantly lower computational resources [6].
- The model demonstrated an average success rate improvement of 18.5% across four evaluation metrics and achieved state-of-the-art results with only 10% of the data in specific tasks [6].
- The first practical quantum-resistant chip, "Mi Xin PQC01", was released by Zhengzhou Xinda Yimi Technology Co., featuring 100% domestic production and core technology [7][8].
- The chip supports dynamic switching between quantum-resistant and classical algorithms, is built on a 28nm process, and reduces power consumption by 60%, making it suitable for IoT and mobile devices [8].