Arm, hand, and active vision in one? BeingBeyond launches D1, the world's first desktop-grade dexterous-hand robotic arm
具身智能之心· 2025-10-13 00:02
As embodied intelligence flourishes, universities and research institutes increasingly need robot platforms that combine performance with affordability. Traditional industrial arms are not only expensive, often costing hundreds of thousands of RMB, but also hard to develop for and maintain, and they lack companion algorithms and models, all of which severely limits the pace of research innovation. To break through these constraints, BeingBeyond has officially released the world's first desktop-grade dexterous-hand robotic arm, D1. It combines three core capabilities (robotic arm, dexterous hand, and active vision system) in a single highly integrated, cost-effective platform, making embodied intelligence truly ready to use out of the box. Beyond strong hardware, D1 ships with the in-house VLA foundation model Being-H0 and covers the complete chain from data collection through model training to deployment: open-box usable, open source, and flexible, giving researchers a one-stop, low-barrier platform for embodied intelligence research.

Flexible modular design: powerful and endlessly extensible. The D1 arm is more than just "dexterous"; it is an all-round platform built for research. It adopts a highly modular architecture with 19 degrees of freedom (6 arm + 2 head + 11 hand), of which 14 are active and 5 are passively coupled, covering the full pipeline from perception to manipulation. Decoupled modules with standard interfaces can be mounted, removed, and swapped at will, making it an ideal choice for research and teaching. Arm module ...
Unitree Robotics' R1 humanoid robot, released in 2025, named one of TIME magazine's Best Inventions of 2025
具身智能之心· 2025-10-11 16:02
Core Insights
- Unitree Robotics' R1 humanoid robot was recognized as one of the best inventions of 2025 by TIME magazine, marking a significant advancement in humanoid robotics [4]
- Wang Xingxing, the founder of Unitree Robotics, was named one of the 100 most influential people in AI for 2025, and the company itself was listed among the 100 most influential global enterprises [4]

Group 1: Community and Resources
- The "Embodied Intelligence Heart" knowledge community is the first of its kind in China, focusing on various aspects of embodied intelligence, including datasets, simulation platforms, and advanced learning models [8]
- The community offers over 30 learning paths and nearly 60 datasets related to embodied intelligence, providing a comprehensive resource for developers and researchers [8]

Group 2: Academic Support
- The community provides extensive academic support for research papers in the field of embodied intelligence, including guidance for top conferences and journals [6]
- Services include assistance with graduation theses and competition preparation, catering to various academic needs [6]
Seeing and speaking are not enough; robots must also compute! Tool use plus reinforcement learning: TIGeR enables precise robotic manipulation
具身智能之心· 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5]

Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework enhances VLMs by integrating tool usage and reinforcement learning to improve their ability to perform precise calculations in three-dimensional space [2][6]
- TIGeR allows AI models to transition from qualitative perception to quantitative computation, addressing the core pain points of existing VLMs [2][7]

Group 2: Advantages of TIGeR
- TIGeR provides precise localization by integrating depth information and camera parameters, enabling the accurate conversion of commands like "10 centimeters above" into three-dimensional coordinates [7]
- The framework supports multi-view unified reasoning, allowing information from various perspectives to be merged and reasoned within a consistent world coordinate system [7]
- The model's reasoning process is transparent, making it easier to debug and optimize by clearly showing the tools used, the parameters passed in, and the results obtained [7]

Group 3: Training Process
- Training involves a two-phase process: first, supervised learning teaches basic tool usage and reasoning chains, then reinforcement learning refines the model's tool-use skills through a hierarchical reward mechanism [8][10]
- The hierarchical reward mechanism evaluates not only the correctness of the final answer but also the accuracy of the process, including tool selection and parameter precision [8]

Group 4: Data Utilization
- The TIGeR-300K dataset, consisting of 300,000 samples, was created to train the model on geometric problems, ensuring both accuracy and diversity in the tasks covered [10][13]
- Dataset construction combined template-based generation with large-model rewriting to enhance generalization and flexibility, ensuring the model can handle complex real-world instructions [13]

Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial understanding benchmarks, scoring 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14]
- Its performance across spatial reasoning tasks demonstrates that it can execute operations requiring precise three-dimensional positioning, where other models struggle [16]
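The "precise localization" capability described above reduces to classic pinhole back-projection: given a pixel, its metric depth, and the camera intrinsics, a tool can compute a camera-frame 3D point and then apply a metric offset such as "10 centimeters above". A minimal sketch, assuming an undistorted pinhole model; the function name and the camera parameters are illustrative, not TIGeR's actual tool interface:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera-frame 3D coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical example: ground "10 centimeters above the cup" from the cup's
# pixel location and depth. In the usual camera convention +y points down,
# so "above" is a negative-y offset.
cup = backproject(u=320, v=240, depth=0.50, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
target = cup + np.array([0.0, -0.10, 0.0])  # cup is [0, 0, 0.5]; target is [0, -0.1, 0.5]
```

A full tool chain would additionally transform such camera-frame points into a shared world frame via the camera extrinsics, which is what multi-view unified reasoning relies on.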
For newcomers to embodied AI, the trial-and-error cost really is high...
具身智能之心· 2025-10-11 16:02
Core Insights
- The article emphasizes the importance of creating a comprehensive community for embodied intelligence and robotics, catering to both beginners and advanced learners in the field [1][3][14]

Group 1: Community Development
- The community aims to provide a platform for knowledge sharing, job referrals, and academic guidance, addressing the high trial-and-error costs faced by newcomers [1][6]
- It has established a closed-loop system across various domains, including industry, academia, and job opportunities, facilitating real-time problem-solving and research updates [3][20]
- Industry experts have been invited to engage in discussions and answer questions, enhancing the learning experience [4][12]

Group 2: Educational Resources
- A compilation of over 30 technical routes has been created to assist users in finding benchmarks, reviews, and learning pathways, significantly reducing search time [4][10]
- The community offers a variety of learning paths for beginners, including technical stacks and routes tailored for newcomers [8][15]
- For those already engaged in research, valuable industry frameworks and project proposals are provided to further their knowledge [10][20]

Group 3: Job Opportunities
- The community has established a job referral mechanism with multiple leading companies in the embodied intelligence sector, ensuring timely resume submissions to desired employers [6][7]
- Job postings and recruitment opportunities from top companies in the field are shared regularly, helping members navigate their career paths [7][20]

Group 4: Research and Development
- The community has compiled a list of over 40 open-source projects and nearly 60 datasets related to embodied intelligence, aiding researchers in their work [14][31][37]
- Various research directions and notable laboratories in the field have been summarized for members considering further academic pursuits [19][22]
- The community provides insights into the latest advancements in embodied intelligence, including industrial applications and research reports [24][20]
Professor Ji Xiaoqiang's lab at CUHK-Shenzhen is recruiting fully funded PhD students and postdocs
具身智能之心· 2025-10-11 16:02
Core Viewpoint
- The article emphasizes the opportunities in the field of embodied intelligence, highlighting the need for skilled researchers and the benefits of joining a collaborative academic environment focused on artificial intelligence and robotics

Research Content
- The research focuses on interdisciplinary areas such as artificial intelligence control theory, embodied intelligence control, and reinforcement learning control [11]
- Candidates are expected to have a deep understanding of and interest in the core research directions, with the ability to conduct theoretical innovation and experimental validation independently [2]

Candidate Requirements
- Postdoctoral researchers: must hold a PhD in a relevant field from a prestigious institution, with a strong publication record in top-tier journals or conferences [2]
- PhD candidates: should possess a master's degree or an outstanding bachelor's degree in a related discipline [3]
- Master's candidates: expected to have a bachelor's degree in a relevant field from a recognized university [5]
- Candidates should demonstrate a solid foundation in mathematics and programming, with a keen interest in control theory, AI, and robotics [4]

Skills and Experience
- Familiarity with deep learning and AI models such as CLIP, BLIP, and LLaVA is essential [6]
- Experience with classic models like VAE, Transformer, and BERT, along with strong algorithm design and programming skills, particularly in high-performance languages like C++ or Rust, is preferred [7][8]
- Practical experience in training, tuning, and deploying deep learning models is highly valued [12]

Mentor Introduction
- Professor Ji Xiaoqiang, who holds a PhD from Columbia University, leads the AI Control and Decision Laboratory at The Chinese University of Hong Kong, Shenzhen [13]
- His research focuses on intelligent control systems, and he has published over 50 papers in top international journals and conferences [13]

Benefits and Compensation
- Postdoctoral researchers: eligible for an annual pre-tax living allowance of 210,000 CNY, with additional subsidies and potential for significant research funding [14]
- PhD candidates: full or half scholarships available, with top candidates eligible for a principal's scholarship of 180,000 CNY per year [15]
- Master's candidates: opportunities for transitioning to the PhD program, with additional living stipends for outstanding candidates [16]

Application Materials
- Applicants must submit a complete CV in both Chinese and English, along with any published papers and evidence of research capability [19]
A data revolution for embodied intelligence! 它石智航 releases the WIYH dataset, half a year ahead of Tesla's Optimus
具身智能之心· 2025-10-11 10:00
Core Insights
- The article highlights the launch of the world's first large-scale real-world embodied VLTA (Vision-Language-Tactile-Action) multimodal dataset, World In Your Hands (WIYH), by the company Itstone Intelligent, marking a significant advancement in the embodied intelligence industry [1][6]
- The WIYH dataset aims to address the challenges of data quality and availability in training large models, which have traditionally relied on inconsistent internet data and limited simulation data [1][3]

Summary by Sections

Dataset Features
- The WIYH dataset is characterized by four main features:
  1. Realism: data is collected from actual embodied tasks, aligning with real-world applications [3]
  2. Richness: it spans multiple industries and operational skills, enhancing the model's transfer and generalization capabilities [3]
  3. Comprehensiveness: it includes multimodal data covering vision, language, touch, and action, facilitating pre-training alignment [3]
  4. Volume: the dataset's scale is comparable to that of large language models, ensuring the future potential of embodied intelligence [3][4]

Unique Advantages
- The WIYH dataset offers three unique advantages:
  1. Modal integrity: it synchronously captures visual, tactile, and action data using proprietary collection equipment, ensuring precise temporal and spatial alignment [4]
  2. Data annotation: high-precision annotations are completed using the company's cloud-based foundation model, covering ground-truth labels at multiple granularities for comprehensive supervision signals [4]
  3. Collection environment: data is gathered in real-life operational settings, significantly enhancing authenticity, diversity, and generalization while reducing collection costs by an order of magnitude [4]

Future Implications
- The establishment of the WIYH dataset signifies the creation of a human-centric embodied data paradigm, enabling the pre-training of embodied AI models for real-world applications [6]
- The dataset is expected to facilitate the transition from single-task applications to models with general manipulation capabilities, laying a solid foundation for the integration of embodied robots into various industries [6]
- The company plans to make the WIYH dataset publicly available by December 2025, inviting research institutions and partners to collaborate in building a thriving ecosystem for embodied intelligence [6]
Live tonight! The first survey of self-evolving agents: the road toward artificial superintelligence
具身智能之心· 2025-10-11 04:00
Core Insights
- The article discusses the emerging paradigm of self-evolving agents in the field of artificial intelligence, emphasizing the shift from static models to dynamic agents capable of real-time learning and adaptation [1][6]
- Despite growing interest from academia and industry, there is a lack of systematic organization and top-level design in the field, with most research treating evolution as a subset of the overall agent framework [1][6]
- The article identifies three fundamental questions that remain unanswered in the field: What parts of the agent should evolve? When does evolution occur? How is evolution implemented? [1][6]

Summary by Sections

Self-evolution in Agents
- The article outlines the areas where self-evolution occurs within agents, highlighting the need for clarity in understanding these components [5][6]

Timing of Self-evolution
- It addresses the timing of when self-evolution takes place, which is crucial for the development of effective intelligent agents [5][6]

Implementation of Self-evolution
- The article discusses how self-evolution can be realized, focusing on the methodologies and frameworks that can facilitate this process [5][6]

Event Announcement
- An upcoming live session featuring Gao Huanang, a PhD student at Tsinghua University, will delve deeper into the topic of self-evolving agents [2][6]
Being-VL's visual BPE route: truly unifying "seeing" and "speaking"
具身智能之心· 2025-10-11 00:02
Core Insights
- The article discusses the limitations of traditional multimodal models, particularly how CLIP-style encoders prematurely align visual representations to the text space, which can produce hallucinations when details are queried without strong language cues [1][5]
- A new method called Being-VL is proposed, which applies BPE (Byte Pair Encoding) to visual tokens to improve the alignment and modeling of visual and textual data [1][2]

Group 1: Being-VL Methodology
- Being-VL consists of three main steps: quantizing images into discrete VQ tokens using VQ-GAN, training a visual BPE that scores both co-occurrence frequency and spatial consistency, and finally unifying visual and text tokens into a single sequence for modeling [2][5]
- The Priority-Guided Encoding approach is introduced, which combines frequency and spatial consistency to create a visual token set that is more meaningful both semantically and structurally [7][8]

Group 2: Training Strategy
- The training process is divided into three stages: initial alignment of visual token embeddings, selective fine-tuning of the LLM, and full fine-tuning on complex reasoning and instruction data [9][15]
- A curriculum learning strategy is employed to gradually transition from basic tasks to more complex ones, enhancing the model's ability to understand cross-modal interactions [9][12]

Group 3: Experimental Results
- Experiments indicate that discretizing images and then applying visual BPE improves reliability on detail-sensitive tasks and reduces hallucinations compared to traditional methods [12][16]
- The introduction of visual BPE significantly enhances the model's performance and robustness, demonstrating that folding stable visual patterns into tokens enables better reasoning [12][19]

Group 4: Tokenization and Efficiency
- The study highlights the impact of BPE vocabulary size on training efficiency, suggesting that a balanced size can optimize both expressiveness and training efficiency [19][20]
- Larger vocabularies may lead to sparse distributions and diminishing returns on computational resources, indicating a need for careful scaling in future applications [19][20]
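The merge step behind visual BPE is the same greedy procedure used for text tokenizers: repeatedly fuse the most frequent adjacent token pair into a new vocabulary entry. A minimal frequency-only sketch over a 1-D sequence of VQ codebook indices (the token ids are made up, and Being-VL's Priority-Guided Encoding additionally weights 2-D spatial consistency, which this toy version omits):

```python
from collections import Counter

def most_frequent_pair(seq):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(seq, seq[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(seq, pair, new_token):
    """Replace every occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Toy sequence of VQ codebook indices; (7, 3) is the most frequent pair.
seq = [7, 3, 7, 3, 5, 7, 3]
pair = most_frequent_pair(seq)    # (7, 3)
seq = merge_pair(seq, pair, 256)  # 256: a new (hypothetical) merged token id
# seq is now [256, 256, 5, 256]
```

Iterating this merge grows the vocabulary with composite tokens that capture recurring visual patterns, which is the property the experiments above credit for reduced hallucination.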
Dedicated eyes for "embodied intelligence": SLAMTEC launches Aurora S, a fully integrated AI spatial perception system!
具身智能之心· 2025-10-11 00:02
Today, SLAMTEC officially released its next-generation fully integrated AI spatial perception system, Aurora S. Unlike a conventional camera, Aurora S is a "spatial intelligence perception system" that bundles AI algorithms with the compute to run them, aiming to give embodied robots powerful, out-of-the-box spatial perception and to sharply lower the barrier to integration and development.

SLAMTEC's new AI spatial perception system, Aurora S

1. From "sensor" to "perception system"
Aurora S's biggest innovation is its high degree of integration. It ships with SLAMTEC's in-house deep-learning AI-VSLAM algorithm, providing full-3D mapping and localization, plus end-to-end neural stereo depth estimation and semantic recognition, with all the required compute hardware packed into a compact 238-gram body.

What does this mean for developers?
- Much lower barrier: no extra compute to provision and no complex vision algorithms to build from scratch.
- Faster time to market: out-of-the-box high-precision 3D perception, mapping, and semantic understanding let developers focus on innovating in the robot's application layer.
- Simpler system design: the all-in-one design greatly simplifies the robot's mechanical design and power management.

Aurora S delivers more than a spec bump; it is a shift in development paradigm: making complex spatial perception as simple as using an ordinary camera.
2. Why Aurora S is " ...
Embodied robots have opened up many new application scenarios for reinforcement learning!
具身智能之心· 2025-10-11 00:02
Core Insights
- The article discusses the importance of reinforcement learning (RL) in the development of embodied intelligent robots, highlighting its application in various complex tasks such as stair climbing, running, and dancing [3][9]
- It emphasizes the challenges faced by newcomers in the field of reinforcement learning, particularly in producing academic papers, and introduces a specialized tutoring program to address these challenges [6][10]

Group 1: Reinforcement Learning Applications
- Reinforcement learning is crucial for gait control in humanoid and quadruped robots, enabling them to perform tasks in challenging environments [3][9]
- The VLA+RL approach for robotic arms is gaining popularity in academia, enhancing the efficiency and smoothness of robot operations [4][9]

Group 2: Educational Program
- The program is designed for graduate students and others needing guidance on academic papers, featuring small class sizes and weekly live sessions [8][10]
- The course aims to help participants confirm research ideas, implement projects, and produce initial drafts for submission to venues such as RAL, ICRA, IROS, and CoRL [8][10]

Group 3: Course Structure and Content
- The course spans 14 weeks of intensive online tutoring followed by 8 weeks of maintenance support, focusing on various aspects of reinforcement learning and its applications [10][19]
- Weekly milestones and quantifiable indicators are set to ensure participants complete a draft paper by the end of the course [18][19]

Group 4: Learning Outcomes
- Participants will gain a comprehensive understanding of reinforcement learning algorithms, simulation environments, and the entire process from research idea to paper submission [23][24]
- The program includes practical training on robot tasks and writing guidance, ensuring that even those without mature ideas can develop a publishable paper [17][24]