RoboMemory: Designed Specifically for Lifelong Learning in Physical Embodied Systems
具身智能之心· 2025-09-04 01:04
Core Viewpoint
- The article discusses RoboMemory, a brain-inspired multi-memory framework designed for lifelong learning in physical embodied systems, addressing key challenges in dynamic real-world environments [2][4].

Group 1: Framework Overview
- RoboMemory is designed to tackle four core challenges: continuous learning capability, multi-module memory latency, task relevance capture, and avoidance of deadlock in closed-loop planning [2].
- The framework integrates four core modules: an information preprocessing system (thalamus-like function), a lifelong embodied memory system (hippocampus-like function), a closed-loop planning module (prefrontal cortex-like function), and low-level executors (cerebellum-like function) [2].

Group 2: Memory System Features
- The lifelong embodied memory system features parallel updating and retrieval mechanisms across four sub-modules: spatial memory, temporal memory, episodic memory, and semantic memory, effectively resolving reasoning-speed bottlenecks in complex memory architectures [2] (see the retrieval sketch after this summary).
- The system employs dynamic knowledge graphs and a consistent architecture design, significantly enhancing memory coherence and scalability [2].

Group 3: Application and Impact
- The article emphasizes the importance of memory systems for embodied agents in real-world environments, highlighting the need for continuous learning capabilities [4][6].
- It also discusses the pain points embodied agents face in real environments and how a robust memory system can address these challenges [6].
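The summary describes parallel updating and retrieval across four memory sub-modules as the cure for reasoning-speed bottlenecks. Below is a minimal sketch of that idea, assuming a toy keyword-overlap relevance score and thread-based concurrency; all class and method names (MemoryStore, retrieve, parallel_retrieve) are hypothetical and not taken from the RoboMemory paper.

```python
# Minimal sketch of parallel retrieval across four memory sub-modules.
# All names and the toy relevance score are illustrative assumptions only.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """One sub-module (spatial, temporal, episodic, or semantic)."""
    name: str
    entries: list = field(default_factory=list)

    def update(self, entry: str) -> None:
        self.entries.append(entry)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Toy relevance score: number of words shared between query and entry.
        scored = sorted(
            self.entries,
            key=lambda e: len(set(query.split()) & set(e.split())),
            reverse=True,
        )
        return [(self.name, e) for e in scored[:k]]


stores = [MemoryStore(n) for n in ("spatial", "temporal", "episodic", "semantic")]
stores[0].update("mug is on the kitchen table")
stores[2].update("yesterday the gripper failed to grasp the mug")


def parallel_retrieve(query: str) -> list:
    # Query all sub-modules concurrently so total latency is bounded by the
    # slowest store rather than the sum of all four lookups.
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        results = pool.map(lambda s: s.retrieve(query), stores)
    return [hit for hits in results for hit in hits]


print(parallel_retrieve("grasp the mug on the table"))
```

The point of the design, as summarized, is that retrieval cost stays roughly constant as more memory types are added, since each sub-module is queried independently.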
Galaxea Team Release: A Large-Scale, High-Quality Open-World Dataset and the G0 Dual-System VLA Model
具身智能之心· 2025-09-04 01:04
Core Insights
- The article presents the Galaxea Open-World Dataset, a large-scale and diverse collection of robot behaviors recorded in real human living and working environments, addressing the scarcity of high-quality open-world robot data and insufficient model generalization [3][5][6].

Dataset Overview
- The dataset consists of 500 hours of data and 100,000 demonstration trajectories, covering 150 task categories, 1,600 object types, and 58 operational skills, with detailed sub-task instructions annotated at 2 Hz [8][12].
- Data was collected with the Galaxea R1 Lite mobile dual-arm robot, which has 23 degrees of freedom and is equipped with RGB cameras for global scene perception and fine manipulation sensing [5][6].

Data Diversity and Coverage
- The dataset includes data from 11 physical sites across 50 unique scenarios, covering residential, retail, dining, and office environments, thus avoiding the limitations of existing datasets confined to controlled laboratory settings [6][12].
- The task distribution balances basic actions and specialized skills, with residential scenes making up 50.8% and office scenes 33.2% of the dataset [11][12].

G0 Dual-System Framework
- The G0 framework couples a "slow thinking" visual-language model (G0-VLM) with a "fast execution" visual-language-action model (G0-VLA), employing a three-stage training strategy to achieve complex task planning and precise execution [5][19] (a sketch of this slow/fast control loop follows this summary).
- The training phases include cross-entity pre-training, single-entity pre-training, and task-specific fine-tuning, which significantly enhance the model's performance [21][30].

Model Performance Evaluation
- The G0-VLA model demonstrated superior performance on benchmark tasks such as desktop organization and microwave operation, with G0-Full achieving the highest average task-progress scores [39][47].
- The study found that single-entity pre-training is essential for effective model adaptation, as cross-entity pre-training can lead to negative transfer due to significant differences between the training and target robot embodiments [39][46].

Key Findings
- The G0-VLM model outperformed mainstream visual-language models in instruction accuracy, achieving 83.3% on desktop organization and 78.2% on bed-making tasks, highlighting the importance of domain-specific fine-tuning [42][47].
- The dataset's design and the dual-system framework effectively address the challenges of real-world robot task execution, providing a robust foundation for future advances in embodied intelligence [17][19].
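The dual-system idea pairs an occasionally invoked planner with a high-rate executor. A minimal sketch of that asymmetric loop follows; the function names, re-planning period, and the "sub-task completes every 4 ticks" rule are illustrative assumptions, not Galaxea's actual G0 interface.

```python
# Minimal sketch of coupling a slow planner with a fast executor.
# Names, rates, and completion rule are assumptions for illustration only.

def slow_vlm_plan(instruction: str) -> list:
    # Stand-in for G0-VLM: decompose an instruction into sub-task strings.
    return [f"{instruction}: step {i}" for i in range(1, 4)]


def fast_vla_step(sub_task: str, t: int) -> str:
    # Stand-in for G0-VLA: emit one low-level action per control tick.
    return f"t={t:02d} -> action for '{sub_task}'"


def run(instruction: str, plan_every: int = 10, ticks: int = 20) -> None:
    plan = []
    for t in range(ticks):
        if t % plan_every == 0 or not plan:   # slow loop: re-plan occasionally
            plan = slow_vlm_plan(instruction)
        print(fast_vla_step(plan[0], t))      # fast loop: act on every tick
        if (t + 1) % 4 == 0:                  # pretend a sub-task completes every 4 ticks
            plan.pop(0)


run("tidy the desktop")
```

The design intent, per the summary, is that planning rationality (slow, deliberate) and execution timeliness (fast, reactive) are handled by different models rather than one monolithic policy.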
1-on-1 Paper Mentoring in the VLA Direction Is Here, with Guidance Until Your Paper Is Accepted~
具身智能之心· 2025-09-03 10:00
Group 1
- The article highlights the availability of 5 slots for guidance on embodied-intelligence-related papers, targeting conferences such as CVPR, ICCV, ECCV, ICLR, CoRL, ICML, ICRA, and RSS [1].
- The mentoring is provided by active researchers in the field of embodied intelligence, each with over 10 top-conference papers and specific ideas [1].

Group 2
- Interested individuals can add a designated WeChat account for inquiries or scan a QR code for consultation regarding the embodied paper guidance [2].
Galaxea Team Release: A Large-Scale, High-Quality Open-World Robot Dataset and the G0 Dual-System VLA Model
具身智能之心· 2025-09-03 03:23
Core Insights
- The article presents the Galaxea Open-World Dataset, a large-scale and diverse collection of robot behaviors recorded in real human living and working environments, addressing the scarcity of high-quality open-world robot data and insufficient model generalization capabilities [2][5][6].

Dataset Overview
- The Galaxea Open-World Dataset is the first large-scale robot behavior dataset collected in real-life scenarios, addressing the limitations of existing datasets that are confined to controlled environments and inconsistent robot embodiments [5][17].
- Data collection was conducted with the Galaxea R1 Lite mobile dual-arm robot, which features 23 degrees of freedom and is equipped with RGB cameras for global scene perception and fine manipulation sensing [8][6].
- The dataset includes 500 hours of data and 100,000 demonstration trajectories, covering 150 task categories, 1,600 object types, and 58 operational skills, with detailed sub-task instructions annotated at 2 Hz [8][12].

Model Framework
- The G0 dual-system framework couples a "slow thinking" visual-language model (G0-VLM) with a "fast execution" visual-language-action model (G0-VLA), using a three-stage training strategy to achieve complex task planning and precise execution [5][19].
- The training phases include cross-entity pre-training, single-entity pre-training, and task-specific fine-tuning, designed to balance general knowledge with adaptation to the specific robot [21][27] (a staged-training sketch follows this summary).

Performance Evaluation
- The G0-VLA model demonstrated superior performance on benchmark tasks such as desktop organization, microwave operation, bed making, and block building, with G0-VLM achieving an instruction accuracy of 78.2% on bed making and 83.3% on desktop organization [42][47].
- The study found that single-entity pre-training is essential for effective model performance, as cross-entity pre-training can lead to negative transfer due to significant differences between the training and target robot embodiments [39][46].

Key Findings
- The dataset's design emphasizes real-world adaptability and model-training friendliness, ensuring that the collected data reflects the complexities of human environments [6][17].
- The G0 architecture is inspired by Kahneman's dual-system theory, in which System 2 (slow thinking) is responsible for planning and System 1 (fast execution) handles real-time reactions, balancing planning rationality with execution timeliness [19][21].
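The three training phases form a warm-started curriculum in which each stage continues from the previous stage's weights. The sketch below shows one plausible wiring of such a schedule in PyTorch; the dummy model, synthetic data, and learning rates are placeholders, not the actual G0 training recipe.

```python
# Minimal sketch of a three-stage schedule (cross-entity pre-training,
# single-entity pre-training, task-specific fine-tuning) where each stage
# warm-starts from the previous one with a smaller learning rate.
# Model, data, and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(n: int) -> DataLoader:
    # Dummy (observation, action) pairs standing in for robot trajectories.
    xs, ys = torch.randn(n, 16), torch.randn(n, 4)
    return DataLoader(TensorDataset(xs, ys), batch_size=8, shuffle=True)


def train_stage(policy: nn.Module, loader: DataLoader, lr: float, epochs: int = 1) -> nn.Module:
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act in loader:
            loss = nn.functional.mse_loss(policy(obs), act)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
stages = [
    ("cross-entity pre-training",  make_loader(256), 1e-3),
    ("single-entity pre-training", make_loader(128), 3e-4),
    ("task-specific fine-tuning",  make_loader(64),  1e-4),
]
for name, loader, lr in stages:
    print("running", name)
    policy = train_stage(policy, loader, lr)   # weights carry over between stages
```

The summary's negative-transfer finding suggests the cross-entity stage should not be assumed to help by default when the target robot differs strongly from the pre-training embodiments.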
MemoryVLA: Giving Robots a Hippocampus to Support Long-Horizon Robotic Manipulation Tasks
具身智能之心· 2025-09-03 00:03
Core Viewpoint
- The article discusses the development of MemoryVLA, a cognitive-memory-action framework inspired by human memory systems, aimed at improving robotic manipulation tasks that require long-term temporal dependencies [3][7].

Group 1: Current Issues in VLA Models
- Existing Vision-Language-Action (VLA) models rely primarily on the current observation, leading to poor performance on long-horizon, temporally dependent tasks [2][7].
- Cognitive science indicates that humans rely on a memory system involving neural activity and the hippocampus to manage tasks effectively over time [7].

Group 2: MemoryVLA Framework
- MemoryVLA is designed to give robots a memory system, drawing inspiration from human cognitive mechanisms [3][7].
- The framework includes a pre-trained Vision-Language Model (VLM) that encodes observations into perceptual and cognitive tokens, which are stored in a Perceptual-Cognitive Memory Bank [3].
- Working memory retrieves relevant entries from the memory bank, merges them with the current tokens, and adaptively updates the memory [3] (see the retrieval-and-merge sketch after this summary).

Group 3: Importance of Memory in Robotics
- The article emphasizes the necessity of memory for robotic tasks, explaining that it enhances decision-making and action sequencing in complex environments [3][7].
- A memory-conditioned diffusion action expert uses the tokens to generate temporally aware action sequences [3].
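The retrieve-merge-update cycle described above can be pictured with a small vector memory bank. The sketch assumes cosine-similarity retrieval and a simple weighted average as the merge rule; the dimensions, similarity measure, and merge rule are assumptions for illustration and not the paper's actual mechanism.

```python
# Minimal sketch of a perceptual-cognitive memory bank with retrieval and merge.
# Dimensions, similarity measure, and merge rule are illustrative assumptions.
import numpy as np


class MemoryBank:
    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))      # one row per stored step
        self.values = np.empty((0, dim))

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def retrieve(self, query: np.ndarray, k: int = 4) -> np.ndarray:
        if len(self.keys) == 0:
            return np.zeros_like(query)
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        top = np.argsort(sims)[-k:]
        return self.values[top].mean(axis=0)   # simple average of the top-k entries


def working_memory_step(bank: MemoryBank, current_token: np.ndarray, alpha: float = 0.5):
    retrieved = bank.retrieve(current_token)
    fused = alpha * current_token + (1 - alpha) * retrieved  # merge past with present
    bank.write(current_token, fused)                          # adaptively update the bank
    return fused


bank = MemoryBank(dim=8)
for _ in range(5):
    fused = working_memory_step(bank, np.random.randn(8))
print("fused token shape:", fused.shape)
```

In the paper's framing, the fused tokens would then condition the diffusion action expert rather than being printed, as done here for demonstration.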
Now Hiring | Langyi Robotics Launches Its 2026 Global Campus Recruitment!
具身智能之心· 2025-09-03 00:03
Core Viewpoint
- The article highlights the advances and market position of Langyi Robotics, a company specializing in embodied-intelligence and spatial-intelligence solutions, emphasizing its innovative navigation technology and significant market share in the humanoid-robot sector [2][3].

Company Overview
- Langyi Robotics focuses on developing next-generation embodied-intelligence and spatial-intelligence solutions, aiming to push the boundaries of robot perception and navigation technology [2].
- The company has launched the world's first embodied perception-navigation module, enabling humanoid robots to achieve fully autonomous movement, obstacle avoidance, advanced spatial reasoning, and generalized environmental interaction [2].

Market Position
- The company holds a market share of 80% and has served numerous leading humanoid-robot manufacturers [3].
- Research and development accounts for 85% of the company's operations, indicating a strong focus on innovation and technology advancement [3].

Team and Expertise
- Core team members come from universities such as Huazhong University of Science and Technology, Zhejiang University, and the University of Electronic Science and Technology, with over ten years of experience in core algorithms for spatial intelligence [4].
- The company has secured tens of millions in investment from several leading institutions, including Inno Angel Fund, Jiada Capital, and Qiji Chuangtan [4].

Recruitment and Opportunities
- Langyi Robotics is actively recruiting for full-time and internship positions, targeting 2026 graduates and current students in relevant fields [9].
- The company offers competitive compensation packages, including fixed salaries, performance bonuses, and equity incentives for core talent [5].
- Opportunities for professional growth include a mentorship system with industry experts and participation in core business projects [5].
Just Enrolled, and My Advisor Wants Me to Research Embodied Intelligence from Scratch...
具身智能之心· 2025-09-03 00:03
Core Viewpoint
- The article discusses advances in embodied-intelligence algorithms and their market potential, emphasizing the need for further research and development in this field [1][2].

Group 1: Technological Advancements
- Embodied algorithms have improved global perception capabilities, transitioning from traditional pipeline solutions to end-to-end models [1].
- Navigation tasks have shifted from mapping-and-localization pipelines to target navigation using map-free solutions built on large models [2].

Group 2: Market Potential and Challenges
- The market size and capacity for embodied intelligence are larger than those of other fields, but many unresolved problems remain and will require collective effort [2].
- Because the field has developed over a short period, newcomers still lack systematic approaches and clear learning pathways [2].

Group 3: Educational Initiatives
- The company has developed several courses in embodied intelligence to address the lack of structure and guidance for learners [2].
- A community has been established to facilitate learning and collaboration among people interested in embodied intelligence [2].

Group 4: Course and Community Highlights
- The learning path is systematized to help users avoid common pitfalls and get started quickly [6].
- The program includes a variety of practical robotics projects, combining simulation with real-machine practice [6].
- Live interaction with industry experts and researchers is available, along with permanent access to recorded sessions and shared source code [7].
Did Scaling Laws Originate in 1993? OpenAI's President: The Foundations of Deep Learning Have Been Revealed
具身智能之心· 2025-09-03 00:03
Core Viewpoint
- The article discusses the historical development and significance of the Scaling Law in artificial intelligence, emphasizing its foundational role in relating model performance to computational resources [2][34][43].

Group 1: Historical Context
- The Scaling Law's origins are debated, with claims that it was first proposed by OpenAI in 2020 or discovered by Baidu in 2017 [2].
- Recent discussions attribute the earliest exploration of the Scaling Law to Bell Labs, dating back to 1993 [3][5].
- The Bell Labs paper demonstrated the relationship between model size, dataset size, and classifier performance, highlighting how long these findings have been known [5][9].

Group 2: Key Findings of the Research
- The NeurIPS paper from Bell Labs outlines a method for efficiently predicting a classifier's suitability, which is crucial for allocating resources in AI model training [12].
- The authors established that as training data increases, a model's error rate follows a predictable logarithmic pattern, reinforcing the Scaling Law's validity [12][16] (a learning-curve fitting sketch follows this summary).
- The research indicates that after training on 12,000 patterns, the newer networks significantly outperform the older ones, showcasing the benefits of scaling [16].

Group 3: Contributions of Authors
- The paper features five notable authors, including Corinna Cortes and Vladimir Vapnik, both of whom have made significant contributions to machine learning and statistical theory [18][19][27].
- Corinna Cortes has over 100,000 citations and is recognized for her work on support vector machines and the MNIST dataset [21][22].
- Vladimir Vapnik, with over 335,000 citations, is known for his foundational work in statistical learning theory [27].

Group 4: Broader Implications
- The article suggests that the Scaling Law was not a sudden insight but a cumulative result of interdisciplinary research spanning decades, from psychology to neural networks [34][43].
- The evolution of the Scaling Law reflects a broader scientific journey, with contributions from many fields and researchers, ultimately leading to its current understanding in deep learning [43].
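Learning-curve laws of this kind are commonly written as a power law in the dataset size, err(N) ≈ a · N^(−b), which is a straight line on log-log axes. The sketch below fits that form to synthetic points; the functional form and all numbers are assumptions for illustration, not data from the 1993 Bell Labs paper.

```python
# Minimal sketch: fitting a power-law learning curve err(N) ≈ a * N**(-b),
# which appears as a straight line on log-log axes. The data points below
# are synthetic, not taken from the 1993 Bell Labs paper.
import numpy as np

train_sizes = np.array([1_000, 3_000, 10_000, 30_000, 100_000])
error_rates = np.array([0.20, 0.13, 0.08, 0.055, 0.035])   # made-up test errors

# Linear fit in log-log space: log(err) = log(a) - b * log(N)
slope, intercept = np.polyfit(np.log(train_sizes), np.log(error_rates), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted curve: err(N) ~ {a:.2f} * N^(-{b:.2f})")

# Extrapolate the curve to predict the error after further scaling the dataset.
print("predicted error at N = 1e6:", a * 1_000_000 ** (-b))
```

This is exactly the kind of extrapolation the summary credits the Bell Labs authors with: predicting whether a classifier is worth training at scale from its behavior on smaller training sets.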
XDog: A Low-Cost Embodied Research Platform, Quadruped Robot Dog + Single Arm (with VLA/Reinforcement Learning/Simulation/sim2real Tutorials)
具身智能之心· 2025-09-02 02:00
Core Viewpoint
- Xdog is a low-cost, multifunctional quadruped robot dog plus robotic-arm development platform designed for embodied-intelligence developers, accompanied by a comprehensive curriculum for research and learning in robotics [1][2].

Hardware Overview
- Xdog integrates advanced functionality such as voice control, sim2real, real2sim, target recognition and tracking, autonomous robotic-arm grasping, and reinforcement-learning gait control, covering most of the technology stack for embodied lower-limb control [2][5].
- The robot dog measures 25 cm x 20 cm x 30 cm, weighs 7.0 kg, and reaches a maximum speed of 7.2 km/h with a maximum rotation speed of 450 degrees per second [3][11].
- The main control chip is the Allwinner H616, featuring a quad-core 1.6 GHz CPU, 4 GB of RAM, and 32 GB of storage [4][5].
- The robotic arm can reach a maximum height of 0.85 m and has a grasping range of 0.4 m around its base [7].

Software and Functionality
- The system supports several control methods, including voice control over TCP, keyboard control, visual control, and reinforcement learning for autonomous movement [15][17] (see the command sketch after this summary).
- Development is based on ROS1 with Python as the primary programming language, and a GPU at least as capable as an RTX 2080 Ti is recommended for inference [16][24].
- The platform includes a comprehensive curriculum covering topics from basic ROS knowledge to advanced reinforcement-learning principles and practical applications [22][23].

Team and Support
- The project is led by a team of experienced instructors responsible for project advancement, technical support, and course development [22].
- After-sales service is provided for one year after delivery, and video and source-code access is granted immediately upon receipt of the hardware [26].

Delivery and Consultation
- The delivery cycle is completed within three weeks of payment [25].
- For further inquiries, potential customers are encouraged to consult the assistant via WeChat [27].
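Since the platform is ROS1-based with Python as the primary language, a velocity-command publisher is the usual entry point for keyboard or scripted control. The sketch below is a generic ROS1 example, not XDog's documented interface: the /cmd_vel topic name, node name, and rate are conventional placeholders that would need to be replaced with the platform's actual topics.

```python
#!/usr/bin/env python
# Minimal ROS1 sketch: publishing velocity commands from Python. The topic
# name (/cmd_vel), node name, and rate are conventional placeholders; the
# actual XDog control topics are not specified in this summary.
import rospy
from geometry_msgs.msg import Twist


def walk_forward(speed_mps: float = 0.3, duration_s: float = 2.0) -> None:
    rospy.init_node("xdog_demo_controller")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(20)                      # 20 Hz control loop
    cmd = Twist()
    cmd.linear.x = speed_mps                   # forward velocity in m/s
    end = rospy.Time.now() + rospy.Duration(duration_s)
    while not rospy.is_shutdown() and rospy.Time.now() < end:
        pub.publish(cmd)
        rate.sleep()
    pub.publish(Twist())                       # stop: publish zero velocities


if __name__ == "__main__":
    walk_forward()
```

Voice control over TCP, as advertised, would sit one layer above this: a small server receiving text commands and translating them into publishes like the one above.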
Still Grinding Away at End-to-End Models? Embodied-R1 Takes a Different Path: SOTA Performance with "Pointing" + Reinforcement Learning!
具身智能之心· 2025-09-02 00:03
Core Insights
- The article discusses the development of Embodied-R1, a new model designed to bridge the "seeing-to-doing gap" in robotics, a long-standing challenge in the field [2][32].
- The model introduces a novel intermediate representation called "pointing," which translates complex operational instructions into visual points, enhancing the robot's ability to understand and execute tasks [3][10].

Group 1: Challenges in Robotics
- The "seeing-to-doing gap" is primarily caused by data scarcity and morphological heterogeneity, which hinder effective knowledge transfer in robotics [2].
- Existing visual-language-action (VLA) models struggle in new environments, often losing zero-shot operational capability [2][10].

Group 2: Embodied-R1 Model Overview
- Embodied-R1 is a 3-billion-parameter model that uses "pointing" as an intuitive intermediate representation, defining four key capabilities: REG (representational understanding), RRG (spatial region pointing), OFG (functional part pointing), and VTG (visual trajectory generation) [10][12].
- The model has demonstrated superior performance on 11 spatial-reasoning and pointing tasks, achieving a 56.2% success rate in the SIMPLEREnv simulation and an impressive 87.5% across eight real-world tasks without fine-tuning [10][27].

Group 3: Training Methodology
- The model follows a two-phase training curriculum, focusing first on spatial reasoning and then on embodied pointing, using a large dataset of 200,000 samples [15][16].
- Reinforcement fine-tuning (RFT) is introduced to address the "multi-solution dilemma" in pointing tasks, allowing the model to develop a generalized understanding rather than memorizing specific answers [17][19] (a reward-function sketch follows this summary).

Group 4: Performance Metrics
- Embodied-R1 outperforms other models on various benchmarks, achieving state-of-the-art (SOTA) results on the REG, RRG, OFG, and VTG tasks [29][30].
- The model's trajectory-generation quality is the best among all compared models, which is crucial for reliable robot execution [29].

Group 5: Robustness and Adaptability
- The model exhibits strong robustness to visual disturbances, maintaining performance under challenging conditions such as poor lighting and background changes [31].
- This adaptability is attributed to the "pointing" representation, which enhances the robustness of the learned policy [31].

Group 6: Conclusion
- The introduction of Embodied-R1 marks a significant step toward closing the long-standing "seeing-to-doing gap" in robotics, providing a promising path toward more capable and generalizable embodied AI systems [32].
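The "multi-solution dilemma" arises because many different pixels can be equally valid answers to a pointing query, so supervised regression toward one annotated point over-penalizes correct behavior. A region-based reward sidesteps this during reinforcement fine-tuning. The sketch below assumes a simple binary in-mask reward; this rule and all names are illustrative assumptions, not the exact reward design used by Embodied-R1.

```python
# Minimal sketch of a region-based pointing reward for reinforcement fine-tuning.
# The binary in-mask rule and all names are illustrative assumptions only.
import numpy as np


def pointing_reward(pred_xy: tuple, target_mask: np.ndarray) -> float:
    """Return 1.0 if the predicted pixel falls inside the target region, else 0.0.

    Any point inside the region earns full reward, so the model is not pushed
    toward a single memorized answer when many points are equally valid.
    """
    x, y = int(round(pred_xy[0])), int(round(pred_xy[1]))
    h, w = target_mask.shape
    if 0 <= y < h and 0 <= x < w:
        return float(target_mask[y, x] > 0)
    return 0.0


# Toy example: a 100x100 image whose valid region is a 20x20 square.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 1
print(pointing_reward((50.3, 47.9), mask))   # inside the region  -> 1.0
print(pointing_reward((10.0, 10.0), mask))   # outside the region -> 0.0
```

Under such a reward, the policy is free to choose any point the region accepts, which is one plausible reading of how RFT encourages generalized pointing rather than answer memorization.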