Making robots do more than "just walk": Nav-R1 ushers in a new era of reasoning-driven navigation
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article introduces Nav-R1, a new embodied foundation model designed to enhance robots' reasoning and navigation capabilities in 3D environments by effectively integrating perception, reasoning, and action [5][30].

Group 1: Key Innovations
- Nav-R1 is trained on Nav-CoT-110K, a large-scale dataset of roughly 110,000 Chain-of-Thought trajectories, establishing a stable reasoning and action foundation before reinforcement-learning optimization [8][6].
- The model incorporates three reward types: a Format Reward for structured output, an Understanding Reward for semantic understanding, and a Navigation Reward for path fidelity [10][15].
- The Fast-in-Slow reasoning paradigm is inspired by human cognition: a fast system handles immediate responses while a slow system manages long-term planning and semantic consistency [11][16].

Group 2: Experimental Results
- Nav-R1 improved success rates and path efficiency by roughly 8% or more over other advanced methods across a range of navigation tasks [14].
- In real-world deployment on a mobile robot platform, Nav-R1 navigated complex indoor environments robustly [19][26].

Group 3: Applications and Implications
- Service robots and home assistants: the model lets robots navigate cluttered environments and understand commands, improving user experience [31].
- Healthcare: Nav-R1 can navigate complex environments safely and reliably, which is crucial for elderly care and medical facilities [32].
- Augmented and virtual reality: virtual agents can use the technology to navigate physical spaces effectively [33].
- Industrial and hazardous environments: Nav-R1's robustness and generalization make it suitable for executing tasks in unknown or dangerous settings [34].
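The reward design summarized above (Format, Understanding, Navigation) can be sketched as a weighted sum of three scalar terms. The Python below is a toy illustration only: the output template, the token-overlap metric, the path-closeness measure, and the weights are all assumptions made for this sketch, not Nav-R1's actual definitions.

```python
import re

def format_reward(output: str) -> float:
    """1.0 if the output follows a <think>...</think><action>...</action>
    template, else 0.0. The exact template is a made-up placeholder."""
    pattern = r"<think>.+</think><action>.+</action>"
    return 1.0 if re.fullmatch(pattern, output, re.S) else 0.0

def understanding_reward(pred_answer: str, gold_answer: str) -> float:
    """Crude semantic-understanding proxy: token overlap with the reference."""
    pred = set(pred_answer.lower().split())
    gold = set(gold_answer.lower().split())
    return len(pred & gold) / max(len(gold), 1)

def navigation_reward(path, gold_path, reached_goal: bool) -> float:
    """Path-fidelity proxy: success bonus plus per-step closeness to the
    reference path (waypoints are (x, y) tuples)."""
    if not path or not gold_path:
        return 0.0
    n = min(len(path), len(gold_path))
    closeness = sum(
        1.0 / (1.0 + abs(path[i][0] - gold_path[i][0])
                   + abs(path[i][1] - gold_path[i][1]))
        for i in range(n)
    ) / n
    return (1.0 if reached_goal else 0.0) + closeness

def total_reward(output, pred_answer, gold_answer, path, gold_path,
                 reached_goal, w=(0.2, 0.3, 0.5)):
    """Weighted sum of the three terms; the weights w are placeholders."""
    return (w[0] * format_reward(output)
            + w[1] * understanding_reward(pred_answer, gold_answer)
            + w[2] * navigation_reward(path, gold_path, reached_goal))
```

In an RL fine-tuning loop, a scalar of this kind would score each sampled rollout before a policy-gradient update.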
These embodied-AI directions together form the so-called "large-brain and small-brain" (high-level planning and low-level control) algorithm stack
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article surveys the evolution and current trends of embodied-intelligence technology, emphasizing the integration of multiple models and techniques to enhance robotic capabilities in real-world environments [3][10].

Group 1: Technology Development Stages
- Embodied intelligence has progressed through several stages, from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capability [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations, but faced challenges with generalization and error accumulation [7].
- The third stage, marked by the introduction of diffusion-policy methods, improved stability and generalization by modeling action sequences [8].
- The fourth stage, beginning in 2025, explores integrating VLA models with reinforcement learning and world models to enhance predictive capability and multi-modal perception [9][10].

Group 2: Key Technologies and Techniques
- Key technologies include VLA, diffusion policy, and reinforcement learning, which together enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- Integrating tactile sensing with VLA models expands robots' sensory capabilities, allowing more precise operation in unstructured environments [10].

Group 3: Industry Implications and Opportunities
- Advances in embodied intelligence are raising demand for engineering and systems capability as the field transitions from theoretical research to practical deployment [10][14].
- There is growing interest in training and deploying models such as diffusion policy and VLA on platforms like Mujoco and IsaacGym [14].
- The industry is seeing a surge in job opportunities and research interest, prompting many professionals to shift their focus toward embodied intelligence [10].
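The generalization weakness of behavior cloning noted in the second stage has a classic illustration: when expert demonstrations are multimodal, a mean-squared-error regressor averages the modes into an action no expert ever took, which is one motivation for the diffusion policies of the third stage, since they model the full action distribution. The toy function below is an illustration of that failure mode, not any system's actual implementation.

```python
def bc_mean_action(demo_actions):
    """Behavior cloning trained with an MSE loss converges to the mean
    demonstration action for a given state."""
    return sum(demo_actions) / len(demo_actions)

# Hypothetical steering demos at one obstacle: half the experts steer
# left (-1.0), half steer right (+1.0).
demos = [-1.0, -1.0, 1.0, 1.0]

# The MSE-optimal action is 0.0 -- straight into the obstacle, a mode
# that no expert ever demonstrated.
assert bc_mean_action(demos) == 0.0
```

A diffusion policy sidesteps this by sampling from the learned action distribution, so either mode can be produced instead of their average.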
$39 billion: the world's highest embodied-AI valuation is here, and Nvidia keeps doubling down
具身智能之心· 2025-09-19 00:03
Core Insights
- Figure has raised over $1 billion in Series C funding at a post-money valuation of $39 billion, a record in the embodied-intelligence field [3][33].
- The round was led by Parkway Venture Capital, with participation from major investors including Nvidia, Brookfield Asset Management, and Intel Capital [5].
- The company aims to expand humanoid-robot manufacturing and deployment in both household and commercial settings [10][22].

Funding and Valuation
- The Series C round raised over $1 billion at a $39 billion valuation, the highest in currently public information on the embodied-intelligence sector [3][33].
- Previous rounds include a $675 million Series B in February 2024, which valued the company at $2.6 billion [23].

Technological Advancements
- Figure has developed the Helix architecture, a vision-language-action model that lets robots perceive, understand, and act like humans [18][22].
- Helix consists of two components that communicate with each other, enabling the robot to perform varied tasks with a unified model [19].
- The latest funding will support Helix development, including next-generation GPU infrastructure and advanced data-collection projects [22][21].

Recruitment and Expansion
- Figure is actively recruiting across 13 areas, including AI-Helix and BotQ manufacturing, to support its growth and technology roadmap [6].
- The company is expanding humanoid-robot production to assist with household chores and commercial labor tasks [10][22].

Market Position
- Figure has positioned itself as a leading player in humanoid robotics, especially after parting ways with OpenAI to focus on its own proprietary AI models [29][31].
- Rapid advances in technology and funding have made the company a notable competitor in the embodied-intelligence landscape [32][33].
VLA papers now account for nearly half of embodied-AI output...
具身智能之心· 2025-09-18 04:00
Judging from this year's major robotics and AI conferences, VLA and its derivative directions account for nearly half of embodied-AI output, especially long-horizon manipulation, generalization, few-shot learning, VLA+RL, and humanoid-related work. Imagine being able to issue a command in natural language and have a robot smoothly execute whatever action you want, and even sustain long sequences of continuous actions. So what exactly is VLA?

VLA breaks the single-task limitation of traditional methods: robots can make autonomous decisions across diverse scenarios, flexibly handle unseen environments, and find broad application in manufacturing, logistics, and home services. VLA models have become a research hotspot, driving frontier projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA and fostering collaboration between academia and industry. Their adaptability spans robot arms, quadrupeds, and humanoid robots, giving them wide potential and practical value across intelligent robotics and making them a key driving force in the field.

From an industry perspective, embodied AI is booming at home and abroad: teams such as Unitree, 智元, 星海图, 银河通用, and 逐际动力 are moving from the lab to commercialization, tech giants such as Huawei, JD.com, and Tencent are actively entering the field, and they are pushing it forward together with companies abroad like Tesla and Figure AI.

This course focuses on how agents ...
10,000 units: Tesla Optimus Gen3 just landed the world's largest order!
具身智能之心· 2025-09-18 01:23
On September 12, Musk personally put up $1 billion to buy Tesla stock, tied to a "crazy compensation plan": if he can deliver 1 million Optimus units in the coming years, he unlocks a stock award worth $1.2 trillion.

10,000 units! Tesla's robot has landed the largest order in its history. Tesla Optimus Gen3 has just won the world's first external order for the robot: a full 10,000 units. The buyer, pharmaceutical company PharmAGRI, plans to use the robots to automate its drug-production workflow, ensuring precise control and high efficiency.

Optimus Gen3+ has already been validated in Tesla's own factories, where it runs 30% more efficiently than human labor, and its cost may eventually be pushed below $20,000. In other words, it is not just capable, it is astonishingly cheap.

Sources: 机器人领航员, 硬核调研 ...
Embodied-AI capabilities are soaring while safety lags behind? The first safe and trustworthy EAI framework and roadmap!
具身智能之心· 2025-09-18 00:03
Editor: 机器之心

In recent years, embodied artificial intelligence (Embodied Artificial Intelligence, EAI), exemplified by humanoid robots and autonomous driving, has been developing at an unprecedented pace, striding from the digital world into physical reality. Yet when the cost of an error is no longer a line of garbled text on a screen but potential physical harm in the real world, an urgent question confronts us: how can we ensure that these increasingly capable embodied agents are safe and trustworthy?

The reality is that capability and safety, two tracks that should advance in lockstep, are decoupling in a worrying way. As shown in Figure 1, industry foundation models are iterating rapidly on capability while largely neglecting matching safety-alignment mechanisms, and although academia has explored the problem, the results are often scattered and unsystematic.

To bridge this critical gap, a research team from the Shanghai AI Laboratory and East China Normal University wrote this position paper, aiming to establish a systematic theoretical framework and development blueprint for the emerging field of "safe and trustworthy embodied AI" and to move the field from fragmented research toward holistic construction.

Figure 1: EA ...
TrajBooster: the first whole-body humanoid manipulation VLA solution, solving the data problem across embodiments (code fully open-sourced)
具身智能之心· 2025-09-18 00:03
Core Insights
- The article presents the TrajBooster framework, which enhances humanoid robots through a trajectory-centric learning approach, enabling them to perform complex household tasks with minimal training data [2][40].

Group 1: Research Background and Challenges
- Humanoid-robot development faces two main challenges: maintaining dynamic balance while performing upper-body tasks, and the scarcity of high-quality training data needed for effective VLA models [3][4].
- Existing methods rely on expensive equipment and expert operators, yielding limited datasets that do not adequately cover the diverse action spaces humanoid robots require [4].

Group 2: TrajBooster Framework
- TrajBooster uses a three-step process of real-trajectory extraction, simulation retargeting, and dual-stage fine-tuning, converting abundant wheeled-robot data into effective training resources for bipedal robots [5][40].
- The framework greatly reduces dependence on costly same-embodiment data, enabling zero-shot skill transfer and improving the robustness and generalization of VLA models [2][5].

Group 3: Methodology
- The pipeline first extracts real trajectories from the Agibot-World Beta dataset, which contains over 1 million real robot trajectories, and then maps this data into the Unitree G1 robot's operational space [7][9].
- A hierarchical composite model decouples control into upper-body and lower-body systems, improving the efficiency of whole-body manipulation [11][12].

Group 4: Experimental Results
- TrajBooster achieved the lowest position error (2.851 cm) and rotation error (6.231 degrees) in mobile scenarios, validating the advantages of hierarchical training and coordinated online DAgger [27].
- Success on an unseen "water transfer" task, absent from the training data, demonstrated the framework's improved generalization [39][40].

Group 5: Limitations and Future Directions
- The current implementation is limited by the precision of the Unitree Dex-3 hand, which supports only simple grasping; future work will integrate dexterous hands with tactile sensing for more complex manipulation [41].
- Visual-input discrepancies remain to be addressed, and the framework should be extended to mobile-manipulation data, since current research focuses primarily on static tasks [43][44].
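The "simulation retargeting" step above can be pictured as mapping waypoints from one robot's reachable workspace into another's. The sketch below uses a simple per-axis affine map as a stand-in; TrajBooster's actual retargeting is far more involved, and the workspace boxes here are invented for illustration.

```python
def retarget_trajectory(src_traj, src_workspace, tgt_workspace):
    """Per-axis affine retargeting: normalize each waypoint coordinate
    into the source workspace box, then map it into the target box.

    src_traj:      list of waypoints, each a tuple of coordinates
    src_workspace: per-axis (low, high) bounds of the source robot
    tgt_workspace: per-axis (low, high) bounds of the target robot
    """
    retargeted = []
    for point in src_traj:
        new_point = []
        for axis, x in enumerate(point):
            lo_s, hi_s = src_workspace[axis]
            lo_t, hi_t = tgt_workspace[axis]
            t = (x - lo_s) / (hi_s - lo_s)          # normalize to [0, 1]
            new_point.append(lo_t + t * (hi_t - lo_t))  # map into target box
        retargeted.append(tuple(new_point))
    return retargeted
```

A (much richer) retargeting of this kind is what turns the wheeled-robot trajectories mentioned above into supervision expressed in the target humanoid's operational space.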
A summary of and reflections on recent developments in 3D/4D World Models (WM)
具身智能之心· 2025-09-18 00:03
Core Viewpoint
- The article reviews the current state and future directions of embodied intelligence, focusing on the development and optimization of 3D/4D world models and emphasizing the importance of data collection and utilization in training effective models [3][4].

Group 1: Current Research Focus
- Most work in the first three quarters of the year centered on data collection and utilization, specifically how to use video example data efficiently to train robust foundation models [3].
- Growing concern about the clarity and reliability of data-collection methods is prompting a reevaluation of data-analysis approaches and of 3D/4D world-model development [3][4].

Group 2: Approaches to 3D/4D World Models
- Two main research approaches have emerged: implicit and explicit methods, each revealing limitations that have yet to be effectively addressed [4][7].
- Current research on explicit world models remains focused on static 3D scenes, with methods for constructing and enriching these scenes well established and ready for practical application [5].

Group 3: Challenges and Limitations
- Existing 3D geometry methods such as 3DGS face challenges in surface optimization, producing rough results despite attempts to improve through structured modifications [8].
- Lighting and surface-quality issues in 3D reconstruction are gradually being optimized, but significant hurdles remain in the overall design, particularly in deployment across physics simulators [9].

Group 4: Future Directions
- Future work is expected to increasingly integrate physical knowledge into 3D/4D models to strengthen models' direct physical understanding and reasoning [15].
- New research combining simulation and video generation is expected to address existing gaps in modeling physical interactions and motion [14][15].
Tsinghua and Li Auto propose LightVLA: prune redundant tokens for a 38% inference speedup!
具身智能之心· 2025-09-18 00:03
Authors: Titong Jiang et al.  Editor: 具身智能之心

Research background and core challenges

Vision-Language-Action (VLA) models are a core technology for embodied intelligence in robotics: they translate visual information and language instructions directly into executable robot actions and show strong capability in complex manipulation (e.g., object grasping, long-horizon planning). But these models have a key bottleneck: computational redundancy in visual tokens. VLA models typically process hundreds of visual tokens (OpenVLA-OFT uses 512), and the computational complexity of the attention mechanism grows quadratically with token count, making real-time deployment on edge devices (e.g., home robots, autonomous driving) difficult.

Existing optimization schemes have clear limitations:
1. An efficiency-performance trade-off: most token-pruning methods (e.g., EfficientVLA, VLA-Cache) keep a fixed number of tokens to improve efficiency, which discards key semantic information and ultimately sacrifices performance;
2. VLM pruning schemes do not ...
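The token-pruning idea behind this line of work can be illustrated with a fixed-ratio top-k filter over visual tokens. Note the hedge: a fixed keep-count is precisely the strategy the article says methods like EfficientVLA use and LightVLA improves on with adaptive pruning, so the sketch below shows only the general idea, and all names and scores in it are hypothetical.

```python
def prune_visual_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top `keep_ratio` fraction of visual tokens by importance
    score, preserving the original (spatial) order of the survivors."""
    k = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])          # restore original token order
    return [tokens[i] for i in keep]
```

Because attention cost grows with the square of token count, keeping 128 of 512 tokens shrinks the pairwise-interaction term from 512^2 = 262,144 to 128^2 = 16,384, a 16x reduction in that term.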
Business cooperation invitation from 具身智能之心
具身智能之心· 2025-09-17 03:14
具身智能之心 is a leading media platform for creation and promotion in the embodied-intelligence field. Over the past year we have signed long-term cooperation agreements with a number of embodied-AI companies, covering (but not limited to) product promotion, brand promotion, hardware distribution, joint operations, and educational-product development.

As our team grows, we hope to build relationships with more outstanding companies in these areas and accelerate the development of the embodied-AI field. Companies or teams with relevant business needs are welcome to contact us.

Contact: add the business WeChat account oooops-life for further discussion.

We look forward to working with you!