具身智能之心
NTU Proposes NORA-1.5: A VLA Model Built on World Models and Action-Based Rewards
具身智能之心· 2025-11-21 00:04
Author: Chia-Yu Hung et al. | Editor: 具身智能之心. This article is shared for academic purposes only.

Nanyang Technological University and collaborating institutions propose NORA-1.5, which integrates a flow-matching action expert with reward-driven direct preference optimization (DPO) post-training. It addresses the limited generalization and reliability of existing vision-language-action (VLA) models, achieving state-of-the-art performance in both simulation and real-robot settings. (A generic flow-matching sketch follows below.)

Section overview: core positioning and the key problems addressed; architecture design: co-optimization of the flow-matching expert and the VLA backbone; VLA backbone fundamentals.

Paper title: NORA-1.5: A Vision-Language-Action Model Trained using World Model and Action-based Preference Rewards
Paper link: https://arxiv.org/pdf/2511.14659
Project page: https://declare-lab.github.io/nora-1.5
Code ...
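For readers unfamiliar with flow-matching action experts, the sketch below shows the general sampling idea: a learned velocity field is integrated from Gaussian noise to an action over a few Euler steps. This is a minimal generic sketch, not NORA-1.5's implementation; the class `VelocityField`, the context tensor `vlm_features`, and the constants `ACTION_DIM` and `NUM_STEPS` are all assumptions made for illustration.

```python
# Generic flow-matching action sampling, in the spirit of VLA action experts.
# All names and sizes here are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn

ACTION_DIM = 7   # e.g., 6-DoF end-effector delta + gripper (assumed)
NUM_STEPS = 10   # Euler integration steps (assumed)

class VelocityField(nn.Module):
    """Predicts the flow velocity v(a_t, t | context) used for denoising."""
    def __init__(self, ctx_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACTION_DIM + 1 + ctx_dim, 256), nn.GELU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, a_t, t, ctx):
        return self.net(torch.cat([a_t, t, ctx], dim=-1))

@torch.no_grad()
def sample_action(field: VelocityField, vlm_features: torch.Tensor):
    """Integrate from Gaussian noise (t=0) to an action (t=1) with Euler steps."""
    a = torch.randn(vlm_features.shape[0], ACTION_DIM)
    dt = 1.0 / NUM_STEPS
    for i in range(NUM_STEPS):
        t = torch.full((a.shape[0], 1), i * dt)
        a = a + dt * field(a, t, vlm_features)  # one Euler step along the flow
    return a
```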
Committed to Software-Hardware Integration of the "Embodied Brain" and "Humanoid Body": This Embodied AI Company Raises Another 1 Billion Yuan
具身智能之心· 2025-11-20 10:52
Core Viewpoint
- The article highlights the successful A+ round financing of Star Motion Era, amounting to nearly 1 billion yuan, led by Geely Capital and supported by several strategic investors, which will enhance the development and application of their embodied AI model, ERA-42 [1][14].

Group 1: Financing and Business Growth
- Star Motion Era completed an A+ round financing of nearly 1 billion yuan, with Geely Capital leading the investment [1].
- The company has achieved a total order value exceeding 500 million yuan, with the largest single order in logistics nearing 50 million yuan [2][3].
- The business strategy focuses on domestic applications of embodied intelligence solutions while expanding into international markets, resulting in a diversified business landscape [2][3].

Group 2: Technological Advancements
- The ERA-42 model has achieved precise control over full-sized humanoid robots and dexterous hands, with applications in logistics and commercial services [1][7].
- Star Motion Era has developed the world's first integrated world-model VLA, enhancing the intelligence of their models through a positive feedback loop of "model - entity - scene data" [4][6].
- The company plans to release a new algorithm framework, VPP, which will allow robots to understand the physical world using vast amounts of internet video data [6].

Group 3: Product Development and Applications
- The company has established three major product lines covering various scenarios, with over 95% of hardware developed in-house [8][12].
- Star Motion Era's humanoid robot, L7, has achieved significant performance milestones, including winning a high-jump championship and setting a long-jump world record [13].
- The service robot Q5 is being used by various enterprises and at events for tasks such as guiding, delivery, and customer service [13].
Latest from CUHK: Deploying VLA Models Without Fine-Tuning
具身智能之心· 2025-11-20 04:02
Core Insights
- The article introduces VLA-Pilot, a plug-and-play inference-time steering strategy that enables deployment of pre-trained VLA models in real-world robotic tasks without additional fine-tuning or data collection [4][6][35].
- VLA-Pilot significantly improves the success rate of pre-trained VLA policies across diverse tasks and robot embodiments, demonstrating robust zero-shot generalization [4][6].

Current Issues
- Pre-trained VLA policies often degrade when deployed on downstream tasks; fine-tuning mitigates this but is costly and impractical in real-world scenarios [4][5].
- Existing inference-time guidance methods have limitations, including the need for additional training and reliance on fixed candidate action sets that may not align with task contexts [5][6].

Innovations and Methodology
- VLA-Pilot uses a multi-modal large language model (MLLM) as an open-world verifier to improve generalization and employs an evolutionary diffusion process for action optimization, improving task alignment (a generic sketch follows below) [6][10].
- An embodied policy steering chain-of-thought (EPS-CoT) module infers steering reward targets from the task context without task-specific training [11][12].
- An iterative steering optimization mechanism provides closed-loop corrections, enhancing the precision and contextual relevance of the steering process [20][21].

Experimental Analysis
- VLA-Pilot was evaluated on a dual-arm system against six baseline methods, on both in-distribution and out-of-distribution tasks [23][24].
- The experiments covered six downstream tasks, using manipulation success rate (MSR) and steering objective alignment (SOA) as metrics [26][27].
- Results showed that VLA-Pilot outperformed all baselines on in-distribution tasks and generalized robustly to out-of-distribution tasks [28][31].

Comparative Performance
- On in-distribution tasks, VLA-Pilot achieved an overall MSR of 0.62 and SOA of 0.73, outperforming all baseline methods [30].
- On out-of-distribution tasks, it reached a success rate of 0.50, indicating strong adaptability to unseen scenarios [32].

Conclusion
- VLA-Pilot maximizes the utility of existing VLA models at inference time, providing a scalable and data-efficient solution for robotic manipulation [35].
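To make the inference-time steering idea concrete, here is a minimal sketch of reward-guided action selection: sample candidates from a frozen policy, score them with a verifier reward, keep elites, and perturb them for the next round. The callables `policy_sample` and `verifier_reward` are assumed stand-ins for the pre-trained VLA and the MLLM-based verifier, and the Gaussian-perturbation step is a simple stand-in for the paper's diffusion-based refinement, not its actual algorithm.

```python
# Reward-guided, inference-time action steering: a generic evolutionary
# best-of-N loop, not VLA-Pilot's implementation.
import numpy as np

def steer_actions(policy_sample, verifier_reward, obs,
                  num_candidates=32, num_rounds=3, elite_frac=0.25, noise=0.05):
    # Round 0: draw candidates directly from the frozen pre-trained policy.
    candidates = np.stack([policy_sample(obs) for _ in range(num_candidates)])
    for _ in range(num_rounds):
        # Score every candidate with the verifier reward.
        scores = np.array([verifier_reward(obs, a) for a in candidates])
        elite_idx = scores.argsort()[-int(num_candidates * elite_frac):]
        elites = candidates[elite_idx]
        # Resample around the elites with small Gaussian noise, standing in
        # for the diffusion-based refinement used in the paper.
        repeats = int(np.ceil(num_candidates / len(elites)))
        candidates = np.concatenate([elites] * repeats)[:num_candidates]
        candidates = candidates + noise * np.random.randn(*candidates.shape)
    scores = np.array([verifier_reward(obs, a) for a in candidates])
    return candidates[scores.argmax()]  # best action under the verifier
```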
From Complete Beginner to Embodied AI Algorithm Engineer: A Level-Up Journey
具身智能之心· 2025-11-20 04:02
Core Insights
- The article surveys the evolution and research directions of vision-language-action (VLA) models, vision-language navigation (VLN), and reinforcement learning in robotics, highlighting their role in advancing robot capability and performance [1][2][5][9].

VLA Direction
- VLA systems consist of visual perception, language instruction understanding, and action policy networks, and fall into three paradigms: explicit end-to-end VLA, implicit end-to-end VLA, and hierarchical end-to-end VLA [1][2].
- Explicit end-to-end VLA compresses visual and language information into a joint representation that is mapped directly to the action space, leveraging various architectures and models to achieve strong performance [1].
- Implicit end-to-end VLA focuses on interpretability by predicting future states with video diffusion models, enhancing the potential for scaling VLA models [2].
- Hierarchical end-to-end VLA exploits the strengths of large models to improve generalization while keeping downstream execution efficient [2].

VLN Direction
- VLN systems combine a visual-language encoder, environmental history representation, and an action policy, and must compress information from visual and language inputs effectively [5][6].
- Encoder choice, and whether to project visual and language representations into a common space, are central design questions; current trends favor models pre-trained on large datasets and the use of large language models (LLMs) for instruction decomposition [6].
- VLN is a sequential decision-making task in which robots accumulate historical information to inform future actions; implicit methods represent past information as latent variables [6].
- Object Navigation, a VLN subtask, identifies target objects from category information alone, reducing the need for detailed instructions and strengthening exploration [7].

Reinforcement Learning & Legged Robots
- Reinforcement learning is crucial for legged robots, spanning kinematics, dynamics, multi-modal sensor fusion, and advanced algorithms for task adaptation [9][10].
- Key areas include gait planning, balance control for bipedal robots, and the application of deep reinforcement learning and imitation learning to multi-task training [10].
- Techniques such as domain randomization and safety mechanisms are essential for successful real-world deployment of robotic systems [10].

Diffusion Policy
- Diffusion models have driven significant advances in robotics; Diffusion Policy achieves an average performance improvement of 46.9% across simulation environments (see the denoising sketch after this section) [21][22].
- The Robotic Diffusion Transformer (RDT), with 1.2 billion parameters, shows strong zero-shot generalization and can learn new skills from minimal examples [22].
- Diffusion policies are expanding beyond robotic manipulation to autonomous navigation and dexterous grasping, improving task success rates through real-time environmental adaptation [22][23].
- Recent developments include advances in 3D applications and the integration of safety and online reinforcement learning, opening new research avenues [23].
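As background on the Diffusion Policy line of work mentioned above, the toy sketch below shows DDPM-style reverse denoising over an action chunk: start from Gaussian noise and iteratively remove predicted noise conditioned on observations. It assumes a noise-prediction network `eps_model` and a generic linear beta schedule; it is a conceptual illustration, not the Diffusion Policy or RDT codebase.

```python
# Toy DDPM reverse process over an action chunk (horizon x action_dim).
# Schedule constants and network interface are assumptions for illustration.
import torch

T = 50                                    # diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def denoise_action_chunk(eps_model, obs_feat, horizon=16, action_dim=7):
    """Start from pure noise and denoise step by step into an action chunk."""
    a = torch.randn(1, horizon, action_dim)
    for t in reversed(range(T)):
        eps = eps_model(a, torch.tensor([t]), obs_feat)   # predict added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        a = (a - coef * eps) / torch.sqrt(alphas[t])      # posterior mean
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)  # add noise
    return a
```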
The World's No. 1 "Spatial Intelligence" Engine: A Post-95 PhD Pulled Off What Apple Couldn't
具身智能之心· 2025-11-20 00:03
Editor: 新智元

[Introduction] Since 2024, from Apple's Vision Pro pushing "spatial computing" to its peak, to "AI camera" hardware such as the Peloton Guide and Nex Playground validating feasibility in niche markets, the trend is clearly emerging. A consensus is quietly forming across the global tech industry: AI-driven motion-sensing interaction is the next wave.

But a fundamental contradiction has been exposed along with it: the former is expensive, often thousands of dollars, shutting out ordinary consumers; the latter, though somewhat cheaper, still requires users to buy a dedicated hardware box and never truly escapes the constraint of being a "peripheral". The market is calling for a lighter, more accessible solution.

Looking at real home-entertainment scenarios, the user experience is deeply fragmented. Either a game console bought at great expense ends up gathering dust in a corner once the novelty wears off, or the console's content is heavily homogenized and lacks real-time feedback. This is a market where consumers crave interaction but are doubly constrained by costly hardware and thin content.

Against this backdrop, a Chinese company named 飞拓星驰 ("FitX" hereafter) ...
Deploying π0.5 on an Embodied Robotic Arm from Scratch!
具身智能之心· 2025-11-20 00:03
Core Viewpoint
- The article discusses the launch of the Imeta-Y1, a lightweight, cost-effective robotic arm designed for beginners and researchers in embodied intelligence, emphasizing its open-source capabilities and user-friendly features [2][3][20].

Group 1: Product Features
- The Imeta-Y1 targets novices and researchers, providing a low-cost, efficient platform for algorithm validation and project development [3].
- It ships with a complete open-source toolchain and code examples, covering the full pipeline from data collection to model deployment [4][20].
- It offers dual-language interfaces (Python and C++) and is compatible with ROS1 and ROS2, so users can get started quickly regardless of programming background [4][21].

Group 2: Technical Specifications
- The arm weighs 4.2 kg, carries a rated load of 3 kg, and provides 6 degrees of freedom, a working radius of 612.5 mm, and a repeatability of ±0.1 mm [9][22].
- It runs on a 24 V supply and communicates over CAN, with a compact design suited to embedded AI and robot-learning platform development [6][9].

Group 3: Development and Support
- A comprehensive SDK includes drivers, API interfaces, sample code, and documentation, supporting rapid application development [33].
- A URDF model enables real-time interaction between simulation environments such as Gazebo and the physical device, significantly reducing development risk and debugging cost (a simulation sketch follows below) [25][39].
- After-sales support commits to responses within 24 hours, with a six-month warranty covering non-human damage [51][52].
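To illustrate the URDF-first workflow described above, here is a minimal sketch of validating a 6-DoF arm's URDF in simulation before touching hardware. PyBullet is used purely for illustration; the file name "imeta_y1.urdf" and the joint test angles are assumptions, not vendor artifacts, and the real product workflow uses Gazebo per the article.

```python
# Load an arm URDF headlessly, command joint targets, and read back state.
# "imeta_y1.urdf" is an assumed file name for illustration only.
import pybullet as p
import pybullet_data

client = p.connect(p.DIRECT)                     # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")

arm = p.loadURDF("imeta_y1.urdf", useFixedBase=True)  # assumed file name
num_joints = p.getNumJoints(arm)

# Command each joint to a small test angle via position control.
target = [0.3, -0.5, 0.8, 0.0, 0.4, 0.0][:num_joints]
for j, q in enumerate(target):
    p.setJointMotorControl2(arm, j, p.POSITION_CONTROL, targetPosition=q)

for _ in range(240):                             # ~1 s at the default 240 Hz
    p.stepSimulation()

states = [p.getJointState(arm, j)[0] for j in range(len(target))]
print("joint positions after 1 s:", [round(s, 3) for s in states])
p.disconnect(client)
```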
Solving Tesla's "Sparse Supervision" Problem: Using World Models to Amplify the Scaling Law in Autonomous Driving
具身智能之心· 2025-11-20 00:03
Core Insights
- The article examines a core challenge for VLA models in autonomous driving: a "supervision deficit," in which supervisory signals are sparse relative to the high-dimensional visual input [3][7][8].
- A new research paper, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving," proposes introducing world models to supply dense self-supervised signals that strengthen the model's learning [3][9][16].

Group 1: Supervision Deficit
- VLA models ingest dense visual information but receive only sparse supervisory signals, wasting representational capacity [7][8].
- Under sparse supervision, VLA performance saturates quickly as data grows, diminishing the returns of the data scaling law [8][22].

Group 2: Solution through World Models
- The proposed solution uses world models to create dense self-supervised training tasks, such as predicting future images, which compels the model to learn environment dynamics (a minimal loss sketch follows below) [10][14][15].
- This yields far richer learning signals than sparse action supervision alone, directly addressing the supervision deficit [15][16].

Group 3: Amplification of the Data Scaling Law
- The core contribution is the finding that world models significantly amplify the data scaling law, with performance continuing to improve as data scales up [17][21].
- Experimental results show that DriveVLA-W0 outperforms baseline models, with gains growing as data increases from 700K to 70M frames [21][23].

Group 4: Performance and Efficiency
- To tame the high inference latency typical of VLA models, DriveVLA-W0 adds a lightweight MoE "action expert" architecture, reducing inference latency to 63.1% of the baseline VLA [26][27].
- Integrating world models reduced collision rates by 20.4% at 70M frames, a qualitative improvement beyond merely adding action data [24][29].
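The sketch below shows the general training idea described above: pair the sparse action-prediction loss with a dense world-model loss that reconstructs future frames, so every pixel contributes a learning signal about environment dynamics. The module interfaces, loss choices, and the weight `w_world` are assumptions for illustration, not DriveVLA-W0's actual implementation.

```python
# Joint objective: sparse action supervision + dense future-frame prediction.
# Module names and loss forms are assumed; this is a conceptual sketch.
import torch
import torch.nn.functional as F

def joint_loss(backbone, action_head, world_model_head,
               frames, future_frames, expert_actions, w_world=1.0):
    feats = backbone(frames)                    # dense visual features
    # Sparse supervision: a handful of action values per multi-frame input.
    pred_actions = action_head(feats)
    loss_action = F.l1_loss(pred_actions, expert_actions)
    # Dense self-supervision: reconstruct future frames so the backbone
    # must model how the scene evolves, not just imitate actions.
    pred_future = world_model_head(feats)
    loss_world = F.mse_loss(pred_future, future_frames)
    return loss_action + w_world * loss_world
```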
How to Build a General-Purpose Embodied Navigation Foundation Model?
具身智能之心· 2025-11-20 00:03
Tonight we host Zhang Jiazhao, a PhD student at Peking University, live on 具身智能之心 to share his team's frontier work on general-purpose navigation foundation models.

Most embodied navigation research is constrained to a specific task and robot platform. To break this limitation, the team's work progressed from the cross-task navigation model Uni-NaVid to the cross-embodiment navigation model NavFoM, successfully applied to real scenarios including visual obstacle avoidance, urban micro-mobility, and intelligent following.

Highlights:
1. Cross-task navigation foundation model: Uni-NaVid
2. Cross-task, cross-embodiment navigation foundation model: NavFoM
3. Applications of navigation foundation models: TrackVLA++, UrbanVLA, MM-Nav

Traditional navigation systems struggle with unstructured, highly dynamic environments and complex tasks requiring language understanding. Navigation foundation models extend navigation algorithms from specialized capabilities to general intelligent mobility, opening a new path toward deployable embodied intelligence. Join us to discuss the future of general-purpose navigation.

References:
Uni-NaVid: https://pku-epic.github.io/Uni-NaVid/
NavFoM: https://pku-ep ...
Simple Adaptation, High Efficiency! U-Arm: Your Universal Embodied Teleoperation Arm Is Here
具身智能之心· 2025-11-19 10:00
Group 1
- The core concept of U-Arm is to address the pain points of traditional teleoperation devices (high cost, low efficiency, and compatibility issues) by providing a high-performance, cost-effective, open-source solution [1][4].
- U-Arm's core advantages fall into four dimensions: stability, universality, cost-effectiveness, and openness [2][5].

Group 2
- U-Arm is designed specifically for embodied-intelligence research and multi-scenario teleoperation needs, breaking through the limitations of traditional teleoperation devices [4].
- It features a dual-axis fixed-joint design for stability, a lightweight yet impact-resistant body in 4 mm thick resin, and compatibility with 95% of commercial robotic arms [7][8].

Group 3
- U-Arm sharply lowers the entry cost of teleoperation, priced at 1,999 yuan per unit, compared with traditional devices that can cost tens of thousands of dollars [17][18].
- The product includes a complete set of accessories, with no hidden costs for users [8].

Group 4
- A modular design adapts easily to various robotic arms, eliminating the need for a separate teleoperation device per arm (see the retargeting sketch after this list) [10][15].
- Three core configurations cover a wide range of mainstream robotic arms, ensuring compatibility and ease of use [11][12].

Group 5
- U-Arm improves data-collection efficiency by 39% over traditional methods, providing high-quality data for model training [11].
- Its open-source nature allows customization and supports educational practice, making it suitable for research and teaching [8][17].

Group 6
- Assembly is straightforward, requires no specialized technical skills, and includes clear steps for setup and first operation [25][27].
- Available examples and resources give a user-friendly experience, letting beginners quickly learn and operate the device [28].

Group 7
- Product quality is covered by a warranty, with returns for structural issues accepted within two weeks, ensuring customer satisfaction [29].
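The sketch below shows the kind of leader-follower joint retargeting a universal teleoperation arm needs in order to drive follower arms with different joint limits: normalize each leader joint into [0, 1], then map it into the follower's range and stream at a fixed rate. The joint ranges, the 50 Hz loop, and the callables `read_leader` and `send_follower` are all assumptions for illustration, not U-Arm's actual software.

```python
# Generic leader-follower joint retargeting for teleoperation.
# All ranges and the loop rate are illustrative assumptions.
import time

LEADER_RANGE = [(-3.14, 3.14)] * 6    # leader joint limits, radians (assumed)
FOLLOWER_RANGE = [(-2.9, 2.9)] * 6    # follower joint limits, radians (assumed)

def retarget(leader_joints):
    """Linearly map each leader joint into the follower's joint range."""
    out = []
    for q, (llo, lhi), (flo, fhi) in zip(leader_joints, LEADER_RANGE,
                                         FOLLOWER_RANGE):
        ratio = (q - llo) / (lhi - llo)      # normalize to [0, 1]
        out.append(flo + ratio * (fhi - flo))
    return out

def teleop_loop(read_leader, send_follower, hz=50):
    """Poll the leader arm and stream retargeted joints to the follower."""
    period = 1.0 / hz
    while True:
        send_follower(retarget(read_leader()))
        time.sleep(period)
```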
A Quick Survey: Which Embodied AI Direction Do You Most Want to Follow?
具身智能之心· 2025-11-19 04:01
We are preparing an in-depth research report on the embodied AI industry, expected to be released in the first quarter of next year. Because it spans many modules, including embodied-company financing, industry landscape, policy, algorithms, deployment, and exports, we would like to know which topics you follow most and where the emphasis should fall.

To serve you better, here is a short survey covering the sections below (multiple selections supported). Scan the QR code on WeChat; it takes only 10 seconds.

- Domestic embodied AI industry and policy
- Overseas embodied AI industry
- Embodied AI company financing and business
- Embodied data collection
- Embodied algorithm optimization and deployment
- Robot edge chips
- Downstream embodied AI industries
- Talent structure and demand in the embodied AI industry
- IPO coaching for embodied AI companies
- Other
...