具身智能之心
LightVLA: Your VLA Really Can Be Both Strong and Fast!
具身智能之心· 2025-10-14 00:02
Core Insights
- LightVLA is a differentiable token pruning framework for vision-language-action (VLA) models, enabling them to focus on critical visual information while significantly reducing computational cost and improving performance [2][8].

Group 1: LightVLA Overview
- LightVLA addresses the computational challenges VLA models face on resource-constrained platforms by implementing adaptive, performance-driven visual token pruning [2].
- The framework generates dynamic queries to assess the importance of visual tokens and employs Gumbel softmax for differentiable token selection, retaining the most informative tokens while discarding irrelevant ones [2][3].

Group 2: Performance Metrics
- Experimental results show that LightVLA outperforms various VLA models and existing token pruning methods across multiple tasks in the LIBERO benchmark, achieving a 59.1% reduction in computation (FLOPs) and a 38.2% decrease in latency while increasing task success rate by 2.6% [3][8].
- LightVLA reaches a 97.4% success rate, a significant improvement in both efficiency and performance [8].

Group 3: Research Significance
- LightVLA is the first framework to apply adaptive visual token pruning to VLA tasks while simultaneously optimizing efficiency and performance, a critical step toward efficient, powerful, and practical real-time robotic systems [3].
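The pruning mechanism described above lends itself to a short sketch. The PyTorch fragment below illustrates differentiable token selection via straight-through Gumbel softmax; the shapes, the mean-pooled dynamic query, and names like `query_proj` and `keep_k` are assumptions for illustration, not LightVLA's published implementation.

```python
import torch
import torch.nn.functional as F

def prune_visual_tokens(tokens, query_proj, keep_k, tau=1.0, hard=True):
    """Differentiable token pruning sketch (illustrative, not LightVLA's code).

    tokens:     (B, N, D) visual token embeddings
    query_proj: nn.Linear(D, D) producing one dynamic query per sample
    keep_k:     number of tokens to retain
    """
    # Dynamic query from the pooled visual context, one per sample.
    query = query_proj(tokens.mean(dim=1))               # (B, D)
    # Importance score of each token with respect to the query.
    scores = torch.einsum("bnd,bd->bn", tokens, query)   # (B, N)
    # Straight-through Gumbel softmax: hard one-hot in the forward pass,
    # soft gradients in the backward pass, drawn keep_k times.
    masks = [F.gumbel_softmax(scores, tau=tau, hard=hard) for _ in range(keep_k)]
    mask = torch.stack(masks, dim=1)                     # (B, keep_k, N)
    # Gather the selected tokens as soft mixtures (exact picks when hard=True).
    return torch.einsum("bkn,bnd->bkd", mask, tokens)    # (B, keep_k, D)
```

One caveat of this naive sketch: the `keep_k` draws are independent, so the same token can be picked twice; a faithful implementation would sample without replacement or mask already-selected tokens.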
Project collaboration opportunities, compensation open~
具身智能之心· 2025-10-13 04:02
Core Insights
- The company aims to empower partners and small businesses in various areas such as solution development, data collection, technology upgrades, and corporate training [1]
- The company is inviting global practitioners in the embodied intelligence field to collaborate on technical services, training, course development, and research guidance [1]

Company Overview
- The company, "Embodied Intelligence Heart," is a leading creative platform in the domestic embodied intelligence sector, offering services that include online education, offline training, corporate consulting, promotional services, hardware R&D, and solution provision [3]

Main Directions
- Focus areas include but are not limited to: VLA, VLN, Diffusion Policy, Reinforcement Learning, VLA+RL, remote operation, motion capture, sim2real, multimodal large models, simulation, motion control, end-to-end systems, and 3D perception [5]

Job Description
- The positions primarily target embodied course development, solution R&D, hardware development, and training collaboration, serving B-end clients such as enterprises, universities, and research institutes, as well as C-end clients including students and job seekers [6]

Contact Information
- Interested parties can add WeChat oooops-life for further inquiries [7]
Your First Embodied-AI Research Platform Is Here: High Value for Money, Easy Code Development
具身智能之心· 2025-10-13 04:02
Core Viewpoint
- Imeta-Y1 is a lightweight, cost-effective robotic arm designed specifically for beginners and researchers in the field of embodied intelligence, enabling low-cost and efficient algorithm validation and project development [2][5].

Group 1: Product Features
- The robotic arm offers a complete open-source toolchain and code examples, facilitating a seamless process from data collection to model deployment [3][17].
- It supports dual-language interfaces in Python and C++, allowing users to get started quickly regardless of programming background [3][18].
- Compatibility with ROS1 and ROS2 is provided, along with URDF models for smooth transitions between simulation and real-world applications [3][19].
- The arm features high-precision motion control, low power consumption, and an open hardware architecture, supporting seamless migration from simulation to the real machine [5][6].

Group 2: Technical Specifications
- The arm weighs 4.2 kg, carries a rated load of 3 kg, and offers 6 degrees of freedom, with a working radius of 612.5 mm and a repeatability of ±0.1 mm [8][19].
- It operates at a supply voltage of 24 V and communicates via CAN, with external interfaces for power and CAN connections [8][19].
- Joint motion ranges and maximum speeds are specified, ensuring versatility across applications [8][19].

Group 3: Development and Support
- The company provides a comprehensive open-source SDK, including drivers, API interfaces, sample code, and documentation, supporting rapid application development [26][32].
- Users can leverage multimodal data fusion capabilities, compatible with mainstream frameworks such as TensorFlow and PyTorch, to implement end-to-end intelligent algorithms [32][29].
- The company offers timely after-sales support with a 24-hour response guarantee, plus bulk purchase discounts for education and project development [19][44].
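Given the Python interface and open SDK mentioned above, a first control script would plausibly look like the sketch below. Everything in it is hypothetical: the `imeta_y1` module, class, and method names are invented for illustration, and the real API should be taken from the vendor's documentation and sample code.

```python
# Hypothetical usage sketch: the `imeta_y1` module, class names, and method
# signatures below are invented for illustration; the actual SDK may differ.
from imeta_y1 import Arm  # hypothetical package name

def main():
    # The spec sheet lists CAN as the control bus; assume the SDK wraps it.
    arm = Arm(can_interface="can0")      # hypothetical constructor
    arm.enable()                         # power on and release brakes

    # Move the 6 joints to a home pose (radians), within the spec's limits.
    home = [0.0, -0.5, 0.8, 0.0, 0.6, 0.0]
    arm.move_joints(home, speed=0.2)     # hypothetical blocking call

    # Read back the joint state, e.g. to log a teleoperation dataset.
    print(arm.get_joint_positions())     # hypothetical getter

    arm.disable()

if __name__ == "__main__":
    main()
```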
RLinf-VLA: A Unified, Efficient VLA+RL Training Platform!
具身智能之心· 2025-10-13 00:02
Core Insights
- The article introduces RLinf, a large-scale reinforcement learning framework for embodied intelligence, highlighting the flexibility and efficiency of its system design [2][3].

Group 1: System Design
- RLinf-VLA provides a unified, efficient platform for VLA+RL research, achieving a 2.27x throughput improvement over baseline platforms [2][5].
- It supports multiple simulators (LIBERO and ManiSkill), allowing integrated training across different environments [5].
- The system allows easy switching between VLA models and RL algorithms, reducing the workload of model adaptation [5].

Group 2: Performance Overview
- A single unified model achieved a 98.11% success rate across 130 LIBERO tasks and 97.66% on 25 pick & place tasks in ManiSkill [6].
- When deployed on real robotic systems, RLinf-VLA shows stronger zero-shot generalization than policies trained with SFT [6][45].

Group 3: Algorithm Design
- The framework introduces several design optimizations, including lightweight critics and trajectory length normalization, which significantly improve training efficiency [9][21][25].
- It supports three levels of output granularity (token-level, action-level, chunk-level) for both advantage and log-probability calculations, allowing flexible training strategies [12][14][22].

Group 4: Experimental Results
- In multi-task experiments, the OpenVLA model showed performance improvements of 45% to 70% over baseline models on ManiSkill tasks [31].
- RLinf-VLA trains efficiently, with significant reductions in training time compared to baseline methods [43][44].

Group 5: Real-World Application
- RLinf-VLA was successfully deployed on a Franka Panda robotic arm, demonstrating generalization from simulation to real-world tasks [45].
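The three output granularities are easiest to see as different aggregations of the same per-token log-probabilities. The sketch below is a schematic reading of that design and of "trajectory length normalization", with assumed tensor shapes and names; it is not RLinf-VLA's actual code.

```python
import torch

def policy_logprob(token_logps, level):
    """Aggregate per-token log-probs at three granularities (schematic only).

    token_logps: (B, chunk_len, tokens_per_action) log-probs of sampled tokens
    level:       "token" | "action" | "chunk"
    """
    if level == "token":
        return token_logps                    # (B, C, T): one term per token
    if level == "action":
        return token_logps.sum(dim=-1)        # (B, C): one term per action
    if level == "chunk":
        return token_logps.sum(dim=(-1, -2))  # (B,): one term per chunk
    raise ValueError(level)

def length_normalized_advantage(returns, values, traj_lens):
    """Scale each trajectory's advantage by its length so long and short
    rollouts contribute comparably to the gradient (an assumed reading of
    'trajectory length normalization')."""
    adv = returns - values                    # (B,)
    return adv / traj_lens.clamp(min=1)       # (B,)
```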
Multi-Robot Collaboration No Longer "Half a Beat Behind"! ReCA Cracks the Efficiency Bottleneck of Deploying Embodied Intelligence
具身智能之心· 2025-10-13 00:02
Core Insights
- The article discusses the limitations of current embodied intelligent systems, arguing that deployment requires real-time, efficient task completion rather than mere task success [2][5][33]

Group 1: Current Challenges in Embodied Intelligence
- Current robots exhibit significant delays and inefficiencies, often completing tasks much more slowly than humans, which hinders their integration into daily life [2][4]
- Three major performance bottlenecks are identified: high planning and communication latency, limited scalability, and sensitivity of low-level execution [7][9][11]

Group 2: ReCA Framework
- The ReCA framework improves the efficiency and scalability of cooperative embodied systems through a cross-layer collaborative design spanning algorithms, systems, and hardware [13][33]
- Key innovations include localized model processing to eliminate network latency, multi-step execution planning to reduce API calls, and a dual memory structure for improved task management [15][20][21]

Group 3: Performance Improvements
- ReCA delivers a 5-10x speedup in task completion while improving success rates by an average of 4.3% [25][28]
- Even in large-scale scenarios with 12 agents, ReCA maintains a success rate of 80-90%, versus below 70% for baseline systems [29]

Group 4: Future Implications
- ReCA lays a foundation for future embodied intelligence, emphasizing the transition from merely functional robots to efficient, effective ones [33]
- Its software-hardware co-design approach could reshape how future intelligent systems are built, enabling more complex and capable robotic applications across fields [34]
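Of these innovations, multi-step execution planning is the most straightforward to illustrate: query the planner for a short horizon of actions at a time and replan only when execution diverges, which amortizes each planner or API call over several steps. The sketch below is a generic illustration with invented names (`planner.plan`, `world.state_diverged`), not ReCA's implementation.

```python
# Generic sketch of multi-step planning to amortize planner/API calls
# (illustrative only; names and structure are invented, not ReCA's code).

def run_episode(planner, world, max_steps=200, horizon=5):
    """Query the planner for `horizon` actions at a time instead of one,
    cutting planner calls by roughly a factor of `horizon`."""
    plan = []                                   # pending actions from the last query
    planner_calls = 0
    for _ in range(max_steps):
        if not plan or world.state_diverged():  # replan lazily, only on surprise
            plan = planner.plan(world.observe(), horizon)
            planner_calls += 1
        world.step(plan.pop(0))
        if world.task_done():
            break
    return planner_calls
```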
An Arm, a Hand, and Active Vision? BeingBeyond D1, the World's First Desktop-Class Dexterous-Hand Robotic Arm, Is Released
具身智能之心· 2025-10-13 00:02
As embodied intelligence booms, universities and research institutions have an increasingly urgent need for robot platforms that combine performance with affordability. Traditional industrial robotic arms are not only expensive, often costing several hundred thousand yuan, but also suffer from complex development, difficult maintenance, and missing companion algorithms and models, all of which severely limit how quickly research innovation can land.

To break through these limits, BeingBeyond has officially released D1, the world's first desktop-class dexterous-hand robotic arm. It integrates three core capabilities (robotic arm + dexterous hand + active vision system) into one highly integrated, cost-effective platform, making embodied intelligence truly ready to use from day one.

D1 offers more than strong hardware: it also ships with Being-H0, the company's in-house VLA foundation model, covering the complete chain from data collection and model training to deployment. Out of the box, open source, and flexible, it gives researchers a one-stop, low-barrier platform for embodied intelligence research.

Flexible modular design: powerful and endlessly extensible

The D1 arm is not just "dexterous"; it is an all-round platform built for research. Its highly modular architecture provides 19 degrees of freedom (6 arm + 2 head + 11 hand), 14 of them active and 5 passively linked, covering the full pipeline from perception to manipulation.

Decoupled modules and standardized interfaces make components plug-and-play and easy to swap, an ideal choice for research and teaching.

Arm module ...
Unitree's R1 Humanoid Robot, Released in 2025, Named One of TIME Magazine's Best Inventions of 2025
具身智能之心· 2025-10-11 16:02
Core Insights
- Unitree Robotics' R1 humanoid robot was recognized as one of TIME magazine's Best Inventions of 2025, marking a significant advancement in humanoid robotics [4]
- Wang Xingxing, founder of Unitree Robotics, was named one of the 100 most influential people in AI for 2025, and the company itself was listed among the 100 most influential global enterprises [4]

Group 1: Community and Resources
- The "Embodied Intelligence Heart" knowledge community is the first of its kind in China, focusing on various aspects of embodied intelligence, including datasets, simulation platforms, and advanced learning models [8]
- The community offers over 30 learning paths and nearly 60 datasets related to embodied intelligence, providing a comprehensive resource for developers and researchers [8]

Group 2: Academic Support
- The community provides extensive academic support for research papers in embodied intelligence, including guidance for top conferences and journals [6]
- Services include assistance with graduation theses and competition preparation, catering to various academic needs [6]
"Seeing" and "Speaking" Aren't Enough; Robots Must Also "Compute"! Tool-Use + Reinforcement Learning: TIGeR Enables Precise Robotic Manipulation
具身智能之心· 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5].

Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework enhances VLMs by integrating tool usage and reinforcement learning, improving their ability to perform precise calculations in three-dimensional space [2][6].
- TIGeR lets models move from qualitative perception to quantitative computation, addressing the core pain points of existing VLMs [2][7].

Group 2: Advantages of TIGeR
- TIGeR provides precise localization by integrating depth information and camera parameters, enabling accurate conversion of commands like "10 centimeters above" into three-dimensional coordinates [7].
- The framework supports multi-view unified reasoning, allowing information from different perspectives to be merged and reasoned over in a consistent world coordinate system [7].
- The model's reasoning process is transparent: the tools used, the parameters passed in, and the results obtained are all explicit, making it easier to debug and optimize [7].

Group 3: Training Process
- Training proceeds in two phases: supervised learning first teaches basic tool usage and reasoning chains, then reinforcement learning refines the model's tool-use skills through a hierarchical reward mechanism [8][10].
- The hierarchical reward evaluates not only the correctness of the final answer but also the quality of the process, including tool selection and parameter precision [8].

Group 4: Data Utilization
- The TIGeR-300K dataset, consisting of 300,000 samples, was created to train the model on geometric problems, ensuring both accuracy and diversity in the tasks covered [10][13].
- Dataset construction combined template-based generation with large-model rewriting to improve generalization and flexibility, ensuring the model can handle complex real-world instructions [13].

Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial-understanding benchmarks, scoring 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14].
- Its performance across spatial-reasoning tasks demonstrates the ability to execute operations requiring precise three-dimensional positioning, which other models struggle to achieve [16].
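The "precise localization" capability above rests on standard pinhole back-projection: a pixel, its depth, and the camera intrinsics determine a 3D point, to which a metric offset such as "10 centimeters above" can be added. The sketch below illustrates that geometry; the function name and the camera-frame treatment of "above" are illustrative assumptions, not TIGeR's actual tool definitions.

```python
import numpy as np

def pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (meters) into camera-frame 3D
    coordinates via pinhole intrinsics. Standard geometry, shown here to
    illustrate the kind of tool TIGeR invokes; not its actual code."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# "10 centimeters above the cup": back-project the cup's pixel, then add a
# metric offset. Mapping "above" to -y assumes an OpenCV-style camera frame
# (y points down); a real system would apply the offset in the world frame.
cup = pixel_to_point(u=320, v=240, depth=0.85, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
target = cup + np.array([0.0, -0.10, 0.0])
```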
For Newcomers to Embodied AI, the Trial-and-Error Cost Really Is a Bit High...
具身智能之心· 2025-10-11 16:02
Core Insights
- The article emphasizes the importance of creating a comprehensive community for embodied intelligence and robotics, catering to both beginners and advanced learners in the field [1][3][14].

Group 1: Community Development
- The community aims to provide a platform for knowledge sharing, job referrals, and academic guidance, addressing the high trial-and-error costs faced by newcomers [1][6].
- It has established a closed-loop system across various domains, including industry, academia, and job opportunities, facilitating real-time problem-solving and research updates [3][20].
- The community has invited industry experts to engage in discussions and answer questions, enhancing the learning experience [4][12].

Group 2: Educational Resources
- A compilation of over 30 technical routes has been created to assist users in finding benchmarks, reviews, and learning pathways, significantly reducing search time [4][10].
- The community offers a variety of learning paths for beginners, including technical stacks and routes tailored for newcomers [8][15].
- For those already engaged in research, valuable industry frameworks and project proposals are provided to further their knowledge [10][20].

Group 3: Job Opportunities
- The community has established a job referral mechanism with multiple leading companies in the embodied intelligence sector, ensuring timely resume submissions to desired employers [6][7].
- Job postings and recruitment opportunities from top companies in the field are shared regularly, helping members navigate their career paths [7][20].

Group 4: Research and Development
- The community has compiled a list of over 40 open-source projects and nearly 60 datasets related to embodied intelligence, aiding researchers in their work [14][31][37].
- Various research directions and notable laboratories in the field have been summarized for members considering further academic pursuits [19][22].
- The community provides insights into the latest advancements in embodied intelligence, including industrial applications and research reports [24][20].
Professor Ji Xiaoqiang's Lab at CUHK-Shenzhen Offers Fully Funded PhD and Postdoc Positions
具身智能之心· 2025-10-11 16:02
Core Viewpoint
- The article emphasizes the opportunities in the field of embodied intelligence, highlighting the need for skilled researchers and the benefits of joining a collaborative academic environment focused on artificial intelligence and robotics.

Research Content
- The research focuses on interdisciplinary areas such as artificial intelligence control theory, embodied intelligence control, and reinforcement learning control [11].
- Candidates are expected to have a deep understanding of and interest in the core research directions, with the ability to conduct theoretical innovation and experimental validation independently [2].

Candidate Requirements
- **Postdoctoral Researchers**: Must hold a PhD in a relevant field from a prestigious institution, with a strong publication record in top-tier journals or conferences [2].
- **PhD Candidates**: Should hold a master's degree or an outstanding bachelor's degree in a related discipline [3].
- **Master's Candidates**: Expected to hold a bachelor's degree in a relevant field from a recognized university [5].
- Candidates should demonstrate a solid foundation in mathematics and programming, with a keen interest in control theory, AI, and robotics [4].

Skills and Experience
- Familiarity with deep learning and AI models such as CLIP, BLIP, and LLaVA is essential [6].
- Experience with classic models such as VAE, Transformer, and BERT, along with strong algorithm design and programming skills, particularly in high-performance languages like C++ or Rust, is preferred [7][8].
- Practical experience in training, tuning, and deploying deep learning models is highly valued [12].

Mentor Introduction
- Professor Ji Xiaoqiang, who holds a PhD from Columbia University, leads the AI Control and Decision Laboratory at The Chinese University of Hong Kong (Shenzhen) [13].
- His research focuses on intelligent control systems, and he has published over 50 papers in top international journals and conferences [13].

Benefits and Compensation
- **Postdoctoral Researchers**: Eligible for an annual pre-tax living allowance of 210,000 CNY, with additional subsidies and potential for significant research funding [14].
- **PhD Candidates**: Full or half scholarships are available, with top candidates eligible for a principal's scholarship of 180,000 CNY per year [15].
- **Master's Candidates**: Opportunities to transition into the PhD program, plus additional living stipends for outstanding candidates [16].

Application Materials
- Applicants must submit a complete CV in both Chinese and English, along with any published papers and evidence of research capability [19].