具身智能之心
Major Release! Tsinghua × Shengshu Unveil Vidar, a General-Purpose Robot Model Achieving SOTA Generalization on Complex Physical Manipulation
具身智能之心· 2025-07-27 09:37
Core Insights
- The collaboration between Tsinghua University and Shengshu Technology has produced the Vidar model, a breakthrough in embodied intelligence that bridges virtual video understanding and real-world physical execution through few-shot generalization [2][4].

Group 1: Vidar Model Overview
- Vidar is the world's first multi-view embodied base model to systematically transfer video understanding capabilities to physical decision-making, sharply reducing the data required for robot generalization [4][8].
- The model generalizes to new robot bodies with only 20 minutes of real-machine data, roughly 1/80 of what the leading industry baseline RDT requires and 1/1200 of π0.5, lowering the data threshold for large-scale generalization [4][8].

Group 2: Data Pyramid and Training Methodology
- Vidar's architecture uses a three-tier data pyramid of vast general video data, medium-scale embodied video data, and a small amount of robot-specific data, enabling effective training and generalization [8][12].
- A unified observation space built on multi-view video stitching lets massive internet data and specific robot tasks be trained within one representation, achieving true multi-dimensional integration [14].

Group 3: Performance Metrics and Results
- After embodied pre-training, the Vidu model showed significant improvements in subject consistency, background consistency, and imaging quality, which underpin few-shot generalization [13].
- Vidar achieved superior success rates across 16 common robotic tasks, excelling in generalization to unseen tasks and backgrounds and adhering closely to task instructions [27][29].

Group 4: Automation and Efficiency
- The Automated Task-Agnostic Random Actions (ATARA) method collects task-agnostic action data automatically; only 10 hours of automated collection achieves full action-space generalization for a new robot [16].
- The AnyPos model uses high-precision prediction to improve action execution accuracy, reaching success rates close to 100% in real-world task trajectory replay tests and surpassing baselines by 33-44% [18][22].
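The three-tier data pyramid described above can be sketched as a weighted mixture sampler. This is a minimal illustration of the idea, not Vidar's actual training code; the tier names, weights, and `sample_tier` interface are assumptions.

```python
import random

# Hypothetical tier weights: the pyramid trains mostly on broad video data,
# less on embodied video, and only a sliver on robot-specific demonstrations.
PYRAMID = [
    ("general_video",  0.80),  # vast internet-scale video
    ("embodied_video", 0.18),  # medium-scale embodied video
    ("robot_specific", 0.02),  # ~20 minutes of real-machine data
]

def sample_tier(rng: random.Random) -> str:
    """Pick a data tier in proportion to its pyramid weight."""
    r, acc = rng.random(), 0.0
    for name, weight in PYRAMID:
        acc += weight
        if r < acc:
            return name
    return PYRAMID[-1][0]

rng = random.Random(0)
counts = {name: 0 for name, _ in PYRAMID}
for _ in range(10_000):
    counts[sample_tier(rng)] += 1
# Tier frequencies roughly track the weights above: the base of the
# pyramid dominates each batch, while robot data stays a thin apex.
```

In practice a curriculum would also vary these weights over training stages, but the key point the pyramid makes is that the scarce robot-specific tier is a small fraction of every batch.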
Qunhe Technology Releases a 3D Gaussian Semantic Dataset, Giving Robots a "Spatial Brain"
具身智能之心· 2025-07-26 10:45
Core Viewpoint
- Qunhe Technology's release of the InteriorGS dataset aims to strengthen spatial perception for robots and AI agents, marking a significant advance in AI training [2][5].

Group 1: InteriorGS Dataset
- InteriorGS contains 1,000 3D Gaussian semantic scenes spanning more than 80 types of indoor environments, giving AI agents a "spatial brain" that improves environmental understanding and interaction [2][5].
- The dataset is claimed to be the world's first large-scale 3D dataset suitable for the free movement of intelligent agents [2][5].

Group 2: Technological Advancements
- Qunhe Technology has applied 3D Gaussian technology across fields including cultural heritage preservation and spatial design, with notable projects such as the digital restoration of a 60-year-old photo studio in Hangzhou [4][6].
- InteriorGS combines the efficiency and cost advantages of 3D Gaussian scene reconstruction with the company's self-developed spatial large model, yielding a dataset that balances realism and semantic understanding [5][6].

Group 3: Industry Impact and Collaboration
- Qunhe Technology's SpatialVerse platform has accumulated a large store of interactive 3D data and physical simulation tools, aiming to become the "ImageNet" of spatial intelligence, much as ImageNet propelled the explosion of computer vision [7].
- The company has partnered with several embodied intelligence firms, including Zhiyuan Robotics and Galaxy General, signaling its growing influence in the industry [7].

Group 4: Future Directions
- The company regards the Sim2Real paradigm as the most efficient training method for embodied intelligence and aims to promote a "real-virtual-real" framework in collaboration with industry players [8].
The 具身智能之心 Job-Seeking Community Is Here!!!
具身智能之心· 2025-07-26 10:45
Group 1
- The company has officially launched a job-seeking community focused on the embodied industry, responding to requests from fans [1]
- The community will primarily discuss topics related to the embodied industry, including companies, product development, and job opportunities [1]
- Members are encouraged to join to connect with industry peers and stay updated on industry developments [1]
Open Source! Zhiyuan Robotics Officially Releases "Lingqu OS", the First Reference Framework for an Embodied-Intelligence Operating System
具身智能之心· 2025-07-26 10:45
Core Insights
- The article covers the launch of the "Lingqu OS" open-source initiative by Zhiyuan Robotics at WAIC 2025, which aims to build an open ecosystem for embodied intelligence [1][3][4]

Group 1: Event Overview
- WAIC 2025 opened on July 26 at the Shanghai Expo Center, focusing on the themes of technology, cooperation, and inclusivity in AI development [1]
- Zhiyuan Robotics' CTO, Peng Zhihui, represented embodied intelligence and showcased the Lingxi X2 humanoid robot, emphasizing the transition from tools to partners in human-robot collaboration [2][3]

Group 2: Human-Robot Interaction
- The on-stage dialogue between Zhiyuan Robotics and Lingxi X2 addressed whether robots are tools or partners, highlighting the role of mutual understanding in human-robot collaboration [2][3]
- Lingxi X2 demonstrated smooth movements and high-quality autonomous responses, showcasing the potential for deeper human-robot interaction [2]

Group 3: Open-Source Initiative
- The "Lingqu OS" open-source plan aims to improve integration across current robotic systems and drive breakthroughs in new embodied-intelligence technologies [3][4]
- The initiative adopts a "layered open-source, co-build and share" model, providing a stable, efficient framework for distributed real-time communication and hardware abstraction [4]
- Open-sourcing is scheduled to begin in Q4 of this year, with the goal of fostering industry collaboration to overcome challenges in intelligent enhancement and cloud-edge integration [4]

Group 4: Industry Impact
- Zhiyuan Robotics aims to lead the industry toward collaborative development and commercial scalability for embodied intelligence, leveraging the WAIC 2025 platform to showcase its capabilities [5]
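A "distributed real-time communication and hardware abstraction" layer of the kind described above is typically built around topic-based publish/subscribe with uniform driver interfaces. The sketch below is purely illustrative: the class names, topic name, and message shape are assumptions, not the actual Lingqu OS API.

```python
from collections import defaultdict
from typing import Callable

class MessageBus:
    """Toy in-process pub/sub bus standing in for a robot middleware layer."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: object) -> None:
        # A real middleware would serialize and route across processes/machines.
        for callback in self._subscribers[topic]:
            callback(message)

class JointDriver:
    """Hardware-abstraction stub: exposes a uniform command interface,
    hiding the vendor-specific motor protocol behind a topic."""
    def __init__(self, bus: MessageBus) -> None:
        self.last_command = None
        bus.subscribe("joint_cmd", self.on_command)

    def on_command(self, message: object) -> None:
        self.last_command = message  # a real driver would write to hardware here

bus = MessageBus()
driver = JointDriver(bus)
bus.publish("joint_cmd", {"joint": "elbow", "position_rad": 0.5})
# driver.last_command now holds the command dict published above
```

The value of this layering is that planners publish commands without knowing which motor vendor is on the other end, which is what lets a "layered open-source" model swap hardware under a stable interface.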
University of Virginia Proposes Moving Out: Toward Seamless Human-Robot Collaboration in the Physical World!
具身智能之心· 2025-07-25 07:11
Core Insights
- The article argues for a benchmark that simulates physical interactions and diverse collaboration scenarios, in order to improve the adaptability and generalization of intelligent agents in human-robot collaboration [3][6].

Group 1: Key Innovations
- Introduction of the Moving Out benchmark, a physically grounded human-robot collaboration environment that simulates collaborative modes shaped by physical properties and constraints [8].
- Two evaluation tasks assess agents' adaptability to human behavioral diversity and their ability to generalize to unknown physical properties [10][11].
- Proposal of the BASS method, which improves collaboration performance in physical environments through behavior augmentation, simulation, and action selection [13][14].

Group 2: Experimental Results
- BASS outperformed baseline methods such as MLP, GRU, and Diffusion Policy in both AI-AI and human-robot collaboration [15][18].
- Evaluation metrics included Task Completion Rate (TCR), Normalized Final Distance (NFD), Waiting Time (WT), and Action Consistency (AC); BASS showed significant improvements on all of them [16][17].
- User studies found BASS significantly better than Diffusion Policy in usefulness and physical understanding, reducing issues such as object-handover failures and delayed assistance [18].

Group 3: Related Work
- Existing human-AI collaboration research has limitations that Moving Out addresses with a physically grounded environment, diverse collaboration modes, and continuous state-action spaces [19][21].
- Previous works often used discrete environments with limited physical attributes or lacked independent task division, underscoring the need for evaluation methods that account for physical interaction [21].
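The four evaluation metrics named above can be sketched as simple ratio functions. The formulas are plausible readings of the metric names, not necessarily the paper's exact definitions, and the example numbers are invented for illustration.

```python
def task_completion_rate(completed: int, total: int) -> float:
    """TCR: fraction of subtasks (e.g., items moved) that were completed."""
    return completed / total

def normalized_final_distance(final_dist: float, initial_dist: float) -> float:
    """NFD: remaining distance to the goal, normalized by the starting
    distance (0.0 = delivered, 1.0 = no progress)."""
    return final_dist / initial_dist

def waiting_time(idle_steps: int, total_steps: int) -> float:
    """WT: fraction of steps an agent spends idle waiting for its partner."""
    return idle_steps / total_steps

def action_consistency(agreeing_steps: int, total_steps: int) -> float:
    """AC: fraction of steps where the two agents' actions are coordinated."""
    return agreeing_steps / total_steps

# Example episode: 3 of 4 items moved; goal distance shrank from 10 m to 2 m.
tcr = task_completion_rate(3, 4)            # 0.75
nfd = normalized_final_distance(2.0, 10.0)  # 0.2
```

Lower is better for NFD and WT, higher for TCR and AC, so a single leaderboard typically reports them side by side rather than combining them into one score.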
Now Live! A Hands-On Course on Goal-Oriented Navigation Algorithms for Embodied Intelligence
具身智能之心· 2025-07-25 07:11
Core Viewpoint
- Goal-Oriented Navigation empowers robots to complete navigation tasks autonomously from goal descriptions alone, marking a significant shift from traditional visual-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, resting on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-Oriented Navigation requires robots to explore and plan paths in unfamiliar 3D environments using only goal descriptions such as coordinates, images, or natural language [2].
- The technology has been industrialized across verticals including delivery, healthcare, and hospitality, with companies like Meituan and Aethon deploying autonomous delivery robots [3].

Group 2: Technological Evolution
- The evolution of Goal-Oriented Navigation can be categorized into three generations:
  1. First generation: end-to-end methods built on reinforcement learning and imitation learning, achieving breakthroughs in point navigation and closed-set image navigation tasks [5].
  2. Second generation: modular methods that explicitly construct semantic maps, decomposing the task into exploration and goal localization [5].
  3. Third generation: integration of large language models (LLMs) and visual language models (VLMs) to enhance knowledge reasoning and open-vocabulary target matching [7].

Group 3: Challenges in Learning
- Learning Goal-Oriented Navigation is challenging because it spans multiple domains, including natural language processing, computer vision, and reinforcement learning [9].
- Fragmented knowledge and an abundance of literature can overwhelm beginners, making it difficult to extract frameworks and track development trends [9].
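The second-generation modular pipeline described above, which builds an explicit semantic map and alternates between exploration and goal localization, can be sketched as a control loop. The environment interface (`observe`, `frontier`, `step_toward`) and the toy 1-D world are illustrative assumptions, not any specific system's API.

```python
def modular_navigation(env, goal_label: str, max_steps: int = 500) -> bool:
    """Skeleton of a modular goal-oriented navigation loop:
    perceive -> update semantic map -> localize goal or explore -> plan -> act."""
    semantic_map: dict[int, set[str]] = {}  # cell -> semantic labels seen there

    for _ in range(max_steps):
        obs = env.observe()
        # 1. Perception: fuse new detections into the semantic map.
        for cell, labels in obs["detections"].items():
            semantic_map.setdefault(cell, set()).update(labels)
        # 2. Goal localization: has the target label been mapped yet?
        goal_cells = [c for c, ls in semantic_map.items() if goal_label in ls]
        if goal_cells:
            target = goal_cells[0]           # head for the mapped goal
            if obs["position"] == target:
                return True                  # goal reached
        else:
            target = env.frontier()          # 3. otherwise explore a frontier
        env.step_toward(target)              # 4. path planning + control
    return False

class ToyEnv:
    """1-D toy world: the agent starts at cell 0; a 'mug' sits at cell 3
    and becomes visible once the agent is within one cell of it."""
    def __init__(self) -> None:
        self.pos = 0

    def observe(self):
        detections = {3: {"mug"}} if self.pos >= 2 else {}
        return {"position": self.pos, "detections": detections}

    def frontier(self) -> int:
        return self.pos + 1  # always explore to the right

    def step_toward(self, target: int) -> None:
        self.pos += 1 if target > self.pos else (-1 if target < self.pos else 0)

found = modular_navigation(ToyEnv(), "mug")
# The agent explores rightward, maps the mug once in range, then homes in on it.
```

The split into these four stages is exactly what lets third-generation systems swap stage 2 for an LLM/VLM that does open-vocabulary matching while leaving mapping and planning untouched.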
Group 4: Course Offering
- A new course has been developed to address these learning challenges, focusing on quick entry, building a domain framework, and combining theory with practice [10][11][12].
- The curriculum covers semantic navigation frameworks, the Habitat simulation ecosystem, end-to-end navigation methodologies, modular navigation architectures, and LLM/VLM-driven navigation systems [13][16][17][19][20][23].
Developer Bonus! One Machine Handles Humanoid Locomotion Control, Reinforcement Learning, and VLN/VLA
具身智能之心· 2025-07-25 07:11
Core Viewpoint
- TRON1 is a research platform designed for educational and scientific use, featuring a modular design that supports multiple robotic forms and algorithms to serve diverse research needs [1].

Group 1: Product Features
- TRON1 supports humanoid gait development and is well suited to reinforcement learning research; the EDU version allows external camera integration for navigation and perception tasks [6][24].
- The platform supports development in both C++ and Python, making it accessible to users without C++ experience [6].
- A "three-in-one" modular design allows quick switching between bipedal, point-foot, and wheeled locomotion [1].

Group 2: Technical Specifications
- The platform is compatible with major simulation platforms such as NVIDIA Isaac, MuJoCo, and Gazebo, improving validation efficiency and lowering research barriers [9].
- TRON1 can be equipped with a robotic arm for various mobile manipulation tasks, supporting both single-arm and dual-foot configurations [11].
- It integrates LiDAR and depth cameras for 3D mapping, localization, navigation, and dynamic obstacle avoidance [13].

Group 3: Hardware and Performance
- The TRON1 standard and EDU versions share similar mechanical parameters, with a load limit of approximately 10 kg and a maximum speed of 5 m/s in wheeled locomotion [26].
- The platform is powered by an 8-core Arm Cortex-A78AE CPU and an NVIDIA Ampere-architecture GPU delivering 157 TOPS (sparse) / 78 TOPS (dense) of AI compute [16][19].
- The battery supports a maximum power draw of 1000 W, with a runtime of over 2 hours under rated conditions [26].

Group 4: User Support and Development
- Comprehensive user manuals and development guides are provided, ensuring ease of use and support for new users [29][33].
- The platform includes one year of after-sales service post-acceptance, with paid maintenance and parts support available thereafter [40].
Preparing to Expand the Embodied Team: Recruiting People to Build Something.......
具身智能之心· 2025-07-25 07:11
Core Insights
- The field of embodied intelligence is developing rapidly, with several star companies preparing for IPOs, indicating a growing market and investment opportunities [1]
- Collaboration and communication within the industry are essential for overcoming early-stage challenges and fostering overall development [1]
- The company aims to create a platform that gathers talent from across the industry to promote progress [1]

Group 1: Project Collaboration
- The company is establishing research teams in major cities including Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou, and Wuhan, recruiting around 10 people per city [3]
- Candidates are expected to have over 2 years of experience in embodied algorithms and robotics research [3]

Group 2: Education and Consulting Services
- The company invites industry experts to develop online courses and consulting services in the field of embodied intelligence [4]
- Focus areas include large models, multi-modal models, reinforcement learning, and robotics, among others [4][5]

Group 3: Compensation and Recruitment
- The company offers significant profit-sharing and industry-wide resource sharing, with part-time and full-time options [6]
- Candidates with a PhD or equivalent research and development experience are preferred [5]
NVIDIA's Latest! ThinkAct: Few-Shot Adaptation and Long-Horizon Planning for Complex Embodied Tasks
具身智能之心· 2025-07-24 09:53
Core Insights
- The article introduces ThinkAct, a dual-system framework designed to enhance the reasoning capabilities of multi-modal large language models (MLLMs) in physical environments by connecting high-level reasoning with low-level action execution [4][9][12]
- ThinkAct addresses the limitations of existing VLA models, which struggle with long-term planning and adaptation to complex tasks, through reinforced visual latent planning [4][6][9]

Group 1: Framework and Methodology
- ThinkAct structures VLA reasoning so the model receives visual observations and textual instructions and predicts actions, effectively linking abstract planning with low-level control [12][21]
- The framework uses reinforcement learning to strengthen MLLM reasoning, encouraging the model to generate low-level actions after reasoning through the task [13][19]
- A novel action-aligned visual feedback mechanism captures long-term goals and encourages visual associations during planning [14][18]

Group 2: Performance Evaluation
- ThinkAct achieves a top success rate of 84.4% on the LIBERO benchmark, outperforming models such as DiT-Policy and CoT-VLA [25][26]
- In the SimplerEnv evaluation, ThinkAct outperformed baseline action models by significant margins, with overall scores of 71.5%, 65.1%, and 43.8% across different settings [25]
- The framework also excels at embodied reasoning tasks, showing advantages in long-term and multi-step planning on the EgoPlan-Bench2 and RoboVQA benchmarks [26][27]

Group 3: Qualitative Insights
- Qualitative examples illustrate ThinkAct's reasoning process and execution, showing how it decomposes instructions into meaningful sub-goals and visualizes planned trajectories [30][31]
- Reinforcement learning fine-tuning significantly enhances reasoning, allowing the model to understand tasks and environments better than cold-start variants [31][32]

Group 4: Adaptability and Error Correction
- ThinkAct demonstrates effective few-shot adaptation, generalizing to unseen environments and new skills from minimal demonstration samples [35][37]
- The framework can detect execution errors and perform self-correction, using structured reasoning to reconsider the task and generate corrective plans when execution fails [37][38]
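The dual-system design described above, slow multimodal reasoning producing a sub-goal plan that conditions fast low-level control, can be sketched schematically. The `reason`/`act` split, the toy environment, and all names below are illustrative assumptions, not NVIDIA's implementation.

```python
from typing import Callable

def think_act_loop(reason: Callable, act: Callable, env, instruction: str,
                   replan_every: int = 5, max_steps: int = 20):
    """Slow system (reason) runs at a low frequency to refresh the plan;
    fast system (act) emits a low-level action at every step."""
    plan: list[str] = []
    trace = []
    for step in range(max_steps):
        obs = env.observe()
        if step % replan_every == 0 or not plan:
            plan = reason(obs, instruction)      # slow: refresh sub-goal plan
        sub_goal = plan[0]
        action = act(obs, sub_goal)              # fast: control under the plan
        trace.append((sub_goal, action))
        if env.step(action):
            break                                # task finished
        if env.sub_goal_done(sub_goal):
            plan.pop(0)                          # advance to the next sub-goal
    return trace

class ToyKitchen:
    """Toy task with two sub-goals, 'pick' then 'place'."""
    def __init__(self) -> None:
        self.done_subgoals = 0

    def observe(self) -> int:
        return self.done_subgoals

    def step(self, action: str) -> bool:
        if action == "execute":
            self.done_subgoals += 1
        return self.done_subgoals >= 2           # done after pick and place

    def sub_goal_done(self, sub_goal: str) -> bool:
        return self.done_subgoals >= {"pick": 1, "place": 2}[sub_goal]

def toy_reason(obs: int, instruction: str) -> list[str]:
    return ["pick", "place"][obs:]               # remaining sub-goals

def toy_act(obs: int, sub_goal: str) -> str:
    return "execute"

trace = think_act_loop(toy_reason, toy_act, ToyKitchen(),
                       "put the cup on the shelf")
```

Running the slow reasoner only every few steps is what makes the scheme practical: the expensive MLLM call amortizes over many cheap control steps, while replanning on a schedule gives it a chance to correct course, mirroring the error-correction behavior described above.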
Embodied Companies Pulled in Two Directions! Big Funding Rounds on One Side, Unfilled Roles on the Other.......
具身智能之心· 2025-07-24 09:53
It's surreal! Embodied intelligence has a flood of open positions on one side and companies that can't hire on the other......

Recently a member of our community came to me to vent: Feng-ge, why do so many embodied companies, clearly well funded with more capital than they can spend and plenty of openings posted, keep interviewing without extending offers, while publicly insisting they can't find people???

Having lived through the full autonomous-driving development cycle, I find the answer simple. Companies have money in their pockets but no longer dare to spend it freely; they stay cautious and budget carefully for the long haul. This industry cycle will still be a long one: spend recklessly and without a plan and you die fast, and the shakeout will play out within the next 2-3 years.

Many embodied companies' products (robot bodies, algorithms, and data alike) are still immature, something we have analyzed in detail in the 具身智能之心 Knowledge Planet. So the researchers with genuinely strong results are the ones every company races to recruit, in directions such as humanoid robot stability, data scaling, effective data use, and generalization. With no inflection point for foundational technology breakthroughs in sight, everyone wants to stockpile provisions for the coming winter. For job seekers, that means both solid technical skills and a research direction that closely matches embodied intelligence.

As the largest embodied-technology community in China, the 具身智能之心 Knowledge Planet has continuously supplied the industry and individuals with talent and with industry and academic information. It now covers nearly all mainstream embodied companies and most well-known research institutions at home and abroad. If you want first-hand insight into the industry, job hunting, and the field's pain points, you are welcome to join us. A serious ...