General Embodied Intelligence
Judging from 300+ recent works, is VLA the necessary path to general embodied intelligence?
具身智能之心· 2025-10-17 16:02
The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control toward general-purpose robotics, recasting Vision Language Models (VLMs) from passive sequence generators into active agents that can manipulate and make decisions in complex, dynamic environments. Today we are hosting a survey-style livestream on Pure Vision Language Action (VLA) Models: A Comprehensive Survey. The survey examines advanced VLA methods in depth, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. It analyzes VLA applications across different scenarios and divides VLA methods into several major paradigms: autoregressive, diffusion-based, reinforcement-based, hybrid, and specialized approaches, while examining their motivations, core strategies, and implementations in detail. It also introduces foundational datasets, benchmarks, and simulation platforms. Based on the current state of VLA development, the survey further offers insights into key challenges and future directions to advance research on VLA models and generalizable robotics. By synthesizing insights from more than three hundred recent studies, it sketches the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose VLA methods.
魔法原子 CEO Wu Changzheng: gearing up for 1,000 humanoid robots deployed in real application scenarios
Sou Hu Cai Jing· 2025-10-16 07:05
Among the countless roads that lead to "Rome," Wu Changzheng cares most about building the capability to land the technology and commercialize it. By Wang Yadi | ID: BMR2004
魔法原子 is one of the earliest companies in China to work on general-purpose humanoid robots, and one of the few in the world with full-stack in-house R&D capability for general-purpose humanoids. It can already deliver complete robot solutions for scenarios such as assembly-line work and industrial handling, and it is among the earliest embodied-intelligence companies in China's humanoid robot industry to connect the full chain from scenarios and products through R&D, manufacturing, and delivery.
Wu Changzheng firmly believes that only generality can truly unleash robots' potential across thousands of industries and avoid the application ceiling caused by fragmented scenarios.
Not long ago, the embodied-intelligence company 魔法原子 released its new bipedal humanoid robot MagicBot Z1 (hereinafter "Z1") and brought it to the 2025 WAIC World Artificial Intelligence Conference. On site, Z1 not only demonstrated explosive, large-amplitude moves such as repeatedly getting up from the ground and bending backwards, but also interacted with visitors: a light touch on its head prompted it to turn its head or spin around, drawing crowds and making the booth a popular stop.
Recently, 魔法原子 president Wu Changzheng told Business School (《商学院》) in an interview that, unlike the "special-purpose" service robots emphasized elsewhere in the market, 魔法原子's strategic choice is to use deployment to bring different robots into thousands of industries. The company pursues generality in its technical roadmap while focusing on scaled deployment, so that robots are "useful, ...
A pure VLA survey has arrived! From VLMs to diffusion, and on to reinforcement learning approaches
自动驾驶之心· 2025-09-30 16:04
1. Introduction
Robotics has long been an important area of scientific research. Early robots relied mainly on pre-programmed instructions and hand-designed control policies to decompose and execute tasks. Such methods were typically applied to simple, repetitive tasks such as factory assembly lines and logistics sorting. In recent years, the rapid progress of artificial intelligence has allowed researchers to exploit deep learning's feature extraction and trajectory prediction capabilities on multimodal data such as images, text, and point clouds. By combining perception, detection, tracking, and localization, researchers decompose robot tasks into multiple stages to meet execution requirements, driving progress in embodied intelligence and autonomous driving. However, most robots still exist as isolated agents: they are usually designed for specific tasks and lack effective interaction with humans and the external environment.
To overcome these limitations, researchers have begun introducing large language models (LLMs) and vision language models (VLMs) into robotic manipulation to achieve more precise and flexible control. Modern manipulation methods typically rely on vision-language generation paradigms (such as autoregressive or diffusion models), combined with large-scale datasets and advanced fine-tuning strategies. We refer to these methods as VLA foundation models; they have significantly improved the quality of robotic manipulation. Fine-grained action control over the generated outputs gives users greater flexibility, unlocking VLA's practical potential in task execution.
Title: Pure Vision Language Action (VLA) ...
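The introduction above describes VLA foundation models built on an autoregressive vision-language generation paradigm. As a purely illustrative sketch (not taken from the survey or any specific model), the code below shows how an autoregressive action head might discretize continuous robot actions into tokens and decode them one dimension at a time, conditioned on fused vision-language features; all module names, dimensions, and the binning scheme are assumptions.

```python
import torch
import torch.nn as nn

NUM_BINS = 256        # per-dimension discretization of the continuous action space (assumed)
ACTION_DIM = 7        # e.g. 6-DoF end-effector delta + 1 gripper value (assumed)

class AutoregressiveActionHead(nn.Module):
    """Decodes one discrete action token per action dimension, conditioned on
    fused vision-language features produced upstream by a VLM (not shown)."""

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.token_embed = nn.Embedding(NUM_BINS + 1, d_model)  # +1 for a BOS token
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.to_logits = nn.Linear(d_model, NUM_BINS)

    @torch.no_grad()
    def generate(self, vl_context: torch.Tensor) -> torch.Tensor:
        """vl_context: (B, T, d_model) vision-language features; returns (B, ACTION_DIM)."""
        B, device = vl_context.size(0), vl_context.device
        tokens = torch.full((B, 1), NUM_BINS, dtype=torch.long, device=device)  # BOS
        for _ in range(ACTION_DIM):
            h = self.decoder(self.token_embed(tokens), vl_context)
            next_tok = self.to_logits(h[:, -1]).argmax(dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_tok], dim=1)
        # Map discrete bins back to continuous actions in [-1, 1].
        return tokens[:, 1:].float() / (NUM_BINS - 1) * 2.0 - 1.0

# Usage sketch with random features standing in for real VLM outputs.
head = AutoregressiveActionHead()
fake_context = torch.randn(2, 10, 512)
print(head.generate(fake_context).shape)  # torch.Size([2, 7])
```

The step-by-step decoding loop is what makes the action generation "autoregressive": each predicted token becomes part of the context for the next one.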
A survey of 313 VLA papers, plus a 1,661-character condensed version
理想TOP2· 2025-09-25 13:33
Core Insights
- The emergence of Vision Language Action (VLA) models signifies a paradigm shift in robotics from traditional strategy-based control to general robotic technology, enabling active decision-making in complex environments [12][22]
- The review categorizes VLA methods into five paradigms: autoregressive, diffusion-based, reinforcement learning, hybrid, and specialized methods, providing a comprehensive overview of their design motivations and core strategies [17][20]

Summary by Categories

Autoregressive Models
- Autoregressive models generate action sequences as time-dependent processes, leveraging historical context and sensory inputs to produce actions step-by-step [44][46]
- Key innovations include unified multimodal Transformers that tokenize various modalities, enhancing cross-task action generation [48][49]
- Challenges include safety, interpretability, and alignment with human values [47][56]

Diffusion-Based Models
- Diffusion models frame action generation as a conditional denoising process, allowing for probabilistic action generation and modeling multimodal action distributions [59][60]
- Innovations include modular optimization and dynamic adaptive reasoning to improve efficiency and reduce computational costs [61][62]
- Limitations involve maintaining temporal consistency in dynamic environments and high computational resource demands [5][60]

Reinforcement Learning Models
- Reinforcement learning models integrate VLMs with reinforcement learning to generate context-aware actions in interactive environments [6]
- Innovations focus on reward function design and safety alignment mechanisms to prevent high-risk behaviors while maintaining task performance [6][7]
- Challenges include the complexity of reward engineering and the high computational costs associated with scaling to high-dimensional real-world environments [6][9]

Hybrid and Specialized Methods
- Hybrid methods combine different paradigms to leverage the strengths of each, such as using diffusion for smooth trajectory generation while retaining autoregressive reasoning capabilities [7]
- Specialized methods adapt VLA frameworks to specific domains like autonomous driving and humanoid robot control, enhancing practical applications [7][8]
- The focus is on efficiency, safety, and human-robot collaboration in real-time inference and interactive learning [7][8]

Data and Simulation Support
- The development of VLA models heavily relies on high-quality datasets and simulation platforms to address data scarcity and testing risks [8][34]
- Real-world datasets like Open X-Embodiment and simulation tools such as MuJoCo and CARLA are crucial for training and evaluating VLA models [8][36]
- Challenges include high annotation costs and insufficient coverage of rare scenarios, which limit the generalization capabilities of VLA models [8][35]

Future Opportunities
- The integration of world models and cross-modal unification aims to evolve VLA into a comprehensive framework for environment modeling, reasoning, and interaction [10]
- Causal reasoning and real interaction models are expected to overcome limitations of "pseudo-interaction" [10]
- Establishing standardized frameworks for risk assessment and accountability will transition VLA from experimental tools to trusted partners in society [10]
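The diffusion-based entries above frame action generation as a conditional denoising process. The following is a minimal, hedged sketch of what such a sampling loop can look like in the style of a diffusion policy; the noise-prediction network `eps_model`, the schedule, and the action-chunk shape are placeholder assumptions rather than any surveyed model's implementation.

```python
import torch

T_STEPS = 50                                   # number of denoising steps (assumed)
betas = torch.linspace(1e-4, 0.02, T_STEPS)    # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_action_chunk(eps_model, vl_context, horizon=16, action_dim=7):
    """Denoise a Gaussian action chunk conditioned on vision-language features.

    eps_model(noisy_actions, timestep, vl_context) -> predicted noise, same shape
    as noisy_actions; it stands in for a learned network.
    """
    B = vl_context.size(0)
    a_t = torch.randn(B, horizon, action_dim)   # start from pure noise
    for t in reversed(range(T_STEPS)):
        t_batch = torch.full((B,), t, dtype=torch.long)
        eps = eps_model(a_t, t_batch, vl_context)
        # DDPM posterior mean: subtract the predicted noise component.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (a_t - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(a_t) if t > 0 else torch.zeros_like(a_t)
        a_t = mean + torch.sqrt(betas[t]) * noise
    return a_t                                   # (B, horizon, action_dim)

# Usage sketch with a dummy noise predictor standing in for a trained network.
dummy_eps = lambda a, t, ctx: torch.zeros_like(a)
print(sample_action_chunk(dummy_eps, torch.randn(2, 10, 512)).shape)  # torch.Size([2, 16, 7])
```

Because the chunk is denoised jointly rather than token by token, this style can represent multimodal action distributions, at the cost of running many denoising steps per control cycle.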
Across 300+ works: VLA applications and implementations in different scenarios...
具身智能之心· 2025-09-25 04:00
A new survey jointly produced by Lanzhou University, the Chinese Academy of Sciences, the National University of Singapore, and other institutions: Pure Vision Language Action (VLA) Models: A Comprehensive Survey
Paper link: https://arxiv.org/pdf/2509.19012
The emergence of Vision Language Action (VLA) models marks a paradigm shift in robotics from traditional policy-based control toward general-purpose robotics, and it repositions Vision Language Models (VLMs) from passive sequence generators into active agents that perform manipulation and decision-making in complex, dynamic environments.
Robotics has long been an important field of scientific research. Historically, robots have relied mainly on pre-programmed instructions and hand-designed control policies to decompose and execute tasks. These methods are typically applied to simple, repetitive tasks such as factory ...
In-depth survey | 300+ papers explained: how pure vision pushes VLA to the summit of autonomous driving and embodied intelligence!
自动驾驶之心· 2025-09-24 23:33
The emergence of Vision Language Action (VLA) models marks a paradigm shift in robotics from traditional policy-based control toward general-purpose robotics, and it repositions Vision Language Models (VLMs) from passive sequence generators into active agents that perform manipulation and decision-making in complex, dynamic environments.
To this end, a team from Lanzhou University, the Chinese Academy of Sciences, and the National University of Singapore takes a deep look at advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. The paper analyzes VLA applications across different scenarios and divides VLA methods into several paradigms: autoregressive, diffusion-based, reinforcement learning, hybrid, and specialized methods, and it discusses their design motivations, core strategies, and implementations in detail.
The paper also introduces the foundational datasets, benchmarks, and simulation platforms needed for VLA research. Based on the current state of VLA research, the survey further lays out the key challenges facing the field and future directions, to advance research on VLA models and general-purpose robotics. By synthesizing insights from more than 300 recent studies, the survey outlines this fast-moving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose VLA methods.
Paper title: Pure Vision Language Action (VLA) M ...
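The reinforcement-learning paradigm named above is repeatedly described in these summaries as hinging on reward function design and safety alignment. As an illustration only (not the survey's formulation), the sketch below shapes a per-step reward from task progress minus safety penalties; the fields, weights, and thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StepOutcome:
    goal_distance: float          # end-effector-to-target distance in meters
    collision: bool               # did this step cause a collision?
    joint_limit_violation: bool   # did any joint exceed its limits?

def shaped_reward(prev: StepOutcome, curr: StepOutcome,
                  progress_weight: float = 1.0, safety_penalty: float = 5.0) -> float:
    """Task progress minus penalties for unsafe events (all weights are assumptions)."""
    progress = prev.goal_distance - curr.goal_distance   # positive when the arm gets closer
    penalty = 0.0
    if curr.collision:
        penalty += safety_penalty
    if curr.joint_limit_violation:
        penalty += 0.5 * safety_penalty
    return progress_weight * progress - penalty

# Usage sketch: a step that moves 5 cm closer but grazes an obstacle is net negative.
print(shaped_reward(StepOutcome(0.30, False, False), StepOutcome(0.25, True, False)))  # -4.95
```

Weighting safety penalties well above per-step progress is one simple way to discourage high-risk behaviors without removing the incentive to complete the task.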
CICC: robot foundation models are the key to an embodied-intelligence breakthrough; industry focus shifts to "cerebellum + cerebrum" system R&D
Zhi Tong Cai Jing· 2025-09-19 02:05
CICC published a research report stating that robot foundation models are the key path to breaking through traditional robot control bottlenecks and moving toward general embodied intelligence. The industry is currently exploring directions based mainly on large language models, autonomous-driving foundation models, and multimodal foundation models, and its focus has shifted to "cerebellum + cerebrum" system R&D, while companies differ in their R&D and commercialization paths. CICC believes that only a very small number of companies with full-stack technical capability, resource-integration advantages, and a long-term strategy will eventually converge the technical path and define the core standards of "embodied intelligence."
CICC's main points are as follows:
Robot foundation models drive the development of general embodied intelligence. Traditional robots are highly specialized in tasks, scenarios, and data, generalize poorly, and struggle in complex environments, leaning more toward "humanoid machines." Compared with human learning, robots have an advantage in collective learning efficiency. The industry has reached a consensus that robot foundation models, by fusing multimodal information such as vision and touch, can compensate for robots' lack of "physical common sense" and are one of the important paths toward general embodied intelligence.
Commercialization paths and the boundaries of corporate capability remain to be clarified. In commercialization, the "hardware-first" path (led by automakers and robot companies) and the "model-first" path (led by AI companies) each have their own characteristics and advantages. Constrained by scenario complexity, technical barriers, and the length of the commercial payback period, most companies will likely focus on specific verticals and achieve "general/flexible" applications within those scenarios; CICC believes that only a few companies with full ...
Zibian Robotics (自变量机器人) raises nearly 1 billion yuan in A+ round financing
Bei Jing Shang Bao· 2025-09-08 02:08
Group 1
- The company, Zibian Robotics, announced the completion of nearly 1 billion yuan in A+ round financing on September 8 [1]
- The financing round was led by Alibaba Cloud and Guoke Investment, with participation from Guokai Financial, Sequoia China, and Yongce Capital [1]
- Existing shareholder Meituan's strategic investment exceeded expectations, while Lenovo Star and Junlian Capital continued to invest [1]

Group 2
- The funds will be used for the continuous training of Zibian's self-developed general embodied intelligence foundational model and the iterative development of hardware products [1]
- Since its establishment at the end of 2023, Zibian has established a technical path to achieve general embodied intelligence through an end-to-end unified large model [1]
- Recently, the company released the Quanta X2, a self-developed wheeled dual-arm humanoid robot that is compatible with multimodal large model control [1]
Humanoid robot makers start competing on delivered orders: Songyan Power (松延动力) says mass-production deliveries topped 100 units in July
Core Insights
- The humanoid robot company Songyan Power achieved a significant milestone by delivering 105 humanoid robots in July, marking a 176% month-on-month increase and the highest delivery record since its establishment [1][2]
- The company has received over 2,500 orders totaling more than 100 million yuan, positioning it as a leading player in the humanoid robot market [2][4]
- Songyan Power aims to enhance its production and delivery capabilities, with a target of delivering 10,000 robots next year [2][5]

Company Overview
- Songyan Power was founded in 2023, with a team from prestigious universities such as Tsinghua University and Zhejiang University [2]
- The company has completed five rounds of financing, attracting investments from various funds, including Inno Angel Fund and SEE Fund [2][4]
- The company has established production bases in Beijing, Changzhou, and Dongguan to ensure stable and reliable delivery of humanoid robots [2]

Industry Context
- The humanoid robot sector is currently a hot topic for investment, with several companies, including Yushutech and TARS, securing significant funding [4][5]
- The industry is still in its early stages of commercial development, with a focus on achieving a comprehensive commercialization loop from R&D to sales and after-sales service [5][6]
- There is a noted concern regarding homogeneous competition in application scenarios, emphasizing the need for high product quality and valuable use cases to achieve scalable commercialization [6]
Sichuan releases its first batch of robot-industry opportunity lists
Xin Hua Cai Jing· 2025-07-31 09:08
Group 1
- The core viewpoint of the news is the launch of the first batch of opportunity lists for the robot industry in Sichuan, aimed at fostering collaboration and mutual benefits among local robot enterprises [1][2]
- The opportunity list includes four sub-lists: application scenarios, key products, technical requirements, and innovation platforms, focusing on the development needs of local robot companies [1][2]
- A total of 194 application scenarios were collected, categorized into six demand types, including manufacturing and logistics, life and services, medical and rehabilitation, guidance and interaction, emergency and inspection, and special operations [1][2]

Group 2
- The key products list consists of 120 products selected through voluntary declaration and competitive selection, reflecting the industrial distribution centered around Chengdu and Mianyang [1][2]
- The technical requirements list includes 35 items covering various aspects such as intelligent algorithms, key components, design, system integration, and product optimization, involving over 20 robot enterprises [2]
- The innovation platforms list comprises 10 entities, including the Sichuan Robot and Intelligent Equipment Innovation Center and the Mianyang Science and Technology City Robot Industry Technology Research Institute, primarily located in Chengdu, Deyang, and Mianyang [2]