具身智能之心
Almost 3,000 Members: This Embodied-AI Community Has Some Real Substance~
具身智能之心· 2025-11-30 03:03
OpenArm is a dual-arm task framework, and several companies have started producing compatible hardware. It lacks mobility, but it handles tasks like folding clothes and pick-and-place well. For data collection, the VR version is the more comfortable option. XLerobot has some mobility, though not much; it suits entry-level research and individual developers and can be adapted to some mobile-manipulation tasks.

We have recently been consolidating the key modules of embodied-AI research for everyone: industry content, robot form factors, algorithms, and deployment schemes, all compiled inside our community.

So far we have mapped out the companies working on embodied "brains" and robot hardware (it turns out the hardware side is getting crowded too......), along with the more active embodied-AI labs, to help members with career and graduate-school decisions. Beyond that, there are many industry research reports to help judge the field's development and cycles.

On hardware, we recommend several research-friendly products: the SO-100 series, the OpenArm series, and the XLerobot series. The SO-100 and its upgraded versions can run some VA and VLA algorithms, and the common functions are already achievable. Other development platforms cost more and require real financial investment; see the models from 方舟无限, 星海图, and 宇树 (Unitree).

On algorithms, we have gathered material on VLA (training, training-free methods, VLA+RL, VLA + world models, VLA lightweighting, deployment, etc.), VLN (temporal language, object navigation, point navigation, etc.), and motion control (reinforcement learning, ...
E0: A New Discrete-Diffusion Framework That Substantially Improves VLA Generalization and Manipulation Precision
具身智能之心· 2025-11-29 02:07
Group 1
- The article discusses the need for robots to possess three core capabilities for operation in open environments: complex visual scene perception, natural language instruction understanding, and precise action generation [1][3]
- Existing methods face significant bottlenecks, including insufficient generalization ability, coarse action control, and contradictions between modeling paradigms [3][4]
- The proposed framework introduces a continuous-action discretization strategy, enhancing the stability of robot inference and allowing fine-grained control [6][8]

Group 2
- The architecture uses the open-source PaliGemma VLM as a backbone, adding a 300-million-parameter action expert network that optimizes action generation through a diffusion model [6][10]
- Training involves multi-modal observation encoding, action discretization, and Gaussian noise addition to ensure temporal consistency [8][9]
- Inference initializes a noise action sequence, performs multi-step denoising, and applies deterministic de-discretization to produce executable action chunks [10][11]

Group 3
- The model achieves state-of-the-art (SOTA) performance across three benchmarks (LIBERO, VLABench, ManiSkill), with an average success rate exceeding the baseline by 10.7% [21]
- On LIBERO, the model achieved an average success rate of 96%, demonstrating superior grasping and instruction-following capabilities [21]
- The model also excels at high-precision tasks, reaching an average success rate of 55.2% on ManiSkill and significantly outperforming baseline models [24][28]

Group 4
- The article identifies limitations such as insufficient semantic alignment for specific tasks, difficulty with complex coordination tasks, and inadequate modeling of mechanical interactions [32][35]
- Future directions include strengthening cross-modal alignment for semantics-rich tasks, designing adaptive task-sampling strategies, and integrating physical-model priors to improve control precision [35]
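The training/inference recipe summarized in Group 2 (discretize continuous actions into bins, perturb with Gaussian noise, then iteratively denoise and deterministically de-discretize) can be sketched end to end. Everything below is illustrative: the bin count, step count, and the toy "denoiser" are stand-ins for the learned components, not E0's actual implementation.

```python
import random

random.seed(0)
NUM_BINS = 256  # assumed action discretization resolution

def discretize(a, lo=-1.0, hi=1.0):
    """Map a continuous action in [lo, hi] to an integer bin index."""
    a = max(lo, min(hi, a))
    return round((a - lo) / (hi - lo) * (NUM_BINS - 1))

def de_discretize(b, lo=-1.0, hi=1.0):
    """Deterministically map a bin index back to a continuous action."""
    return lo + b / (NUM_BINS - 1) * (hi - lo)

def denoise_step(noisy, target, alpha=0.5):
    """Toy stand-in for the learned denoiser: move toward the target bins."""
    return [n + alpha * (t - n) for n, t in zip(noisy, target)]

def infer_action_chunk(target_bins, steps=8):
    """Start from Gaussian noise, denoise for several steps,
    then de-discretize into an executable action chunk."""
    seq = [random.gauss(NUM_BINS / 2, NUM_BINS / 4) for _ in target_bins]
    for _ in range(steps):
        seq = denoise_step(seq, target_bins)
    return [de_discretize(round(x)) for x in seq]

chunk = infer_action_chunk([discretize(a) for a in (0.2, -0.5, 0.9)])
```

With `alpha=0.5` and 8 steps, the initial noise decays by a factor of 2^-8, so the recovered actions land within one bin of the targets; in the real framework the denoiser is the action expert network conditioned on observations.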
RoboTidy Is About to Be Open-Sourced: Helping Robots Handle Household Scenes with Ease
具身智能之心· 2025-11-29 02:07
Core Insights
- The article discusses advancements in embodied AI, particularly the introduction of RoboTidy, which uses 3D Gaussian Splatting (3DGS) to create realistic interactive 3D environments for training robots [4][8][20]

Group 1: Importance of 3DGS
- Embodied-AI research has been hindered by the "simulation paradox": traditional 3D modeling methods produce low-fidelity environments that fail to capture real-world complexity [7]
- RoboTidy's breakthrough is its use of 3DGS, which renders photorealistic scenes at high speed (over 100 FPS), enhancing the training environment for robots [9][11]
- The research team scanned 500 real household scenes, exposing robots to realistic lighting and textures and significantly improving the robustness of visual encoders [11][12]

Group 2: Redefining Organization Tasks
- Tidying a room is a complex long-horizon planning challenge for robots, requiring semantic understanding and common-sense reasoning [14]
- RoboTidy provides a large dataset of over 8,000 expert demonstration trajectories, capturing the implicit logic of how humans organize spaces [14][15]
- The framework pairs a "Semantic Planner" with a "Low-level Policy", allowing robots to learn organization tasks in a human-like manner [15]

Group 3: Sim-to-Real Validation
- The collaboration with Yuanli Infinite focuses on bridging the sim-to-real gap, a significant industry challenge [17]
- Experiments show that models trained in RoboTidy's high-fidelity environment improve real-world task success rates by 29.4% compared with traditional methods [17][18]
- This demonstrates that high-quality simulation data can be translated into real-world productivity [18]

Group 4: Standardization and Open Source
- Before RoboTidy, household-organization tasks lacked standardized evaluation metrics, making results hard to compare across research labs [20]
- RoboTidy establishes a standardized evaluation system and leaderboard, inviting developers worldwide to contribute to the evolution of household service robots [20][22]
- The initiative aims to provide a more realistic and rigorous starting point for advancing embodied AI [22][27]
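The two-level design attributed to RoboTidy (a "Semantic Planner" deciding where each object belongs, a "Low-level Policy" executing primitive skills) can be sketched as a simple hierarchy. The placement rules, object names, and skill primitives below are invented for illustration and are not RoboTidy's actual interfaces.

```python
def semantic_planner(scene):
    """Toy planner: decide a target container for each recognized object."""
    rules = {"mug": "kitchen_shelf", "book": "bookshelf", "sock": "laundry_bin"}
    return [(obj, rules[obj]) for obj in scene if obj in rules]

def low_level_policy(obj, target):
    """Toy policy: emit a primitive skill sequence for one pick-and-place subtask."""
    return [f"navigate_to({obj})", f"grasp({obj})",
            f"navigate_to({target})", f"place({obj}, {target})"]

def tidy(scene):
    """Run the planner, then expand each subtask through the low-level policy."""
    plan = semantic_planner(scene)
    return [step for obj, tgt in plan for step in low_level_policy(obj, tgt)]

steps = tidy(["mug", "sock", "plant"])
```

Note that the unknown object ("plant") is simply skipped by this toy planner; a learned planner would instead reason about it from semantics.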
The First Job-Oriented, Industrial-Grade Hands-On VLA Tutorial! Real Robots + Deployment of Various VLA Algorithms + Quantization + World Models
具身智能之心· 2025-11-29 02:07
Core Viewpoint
- The article discusses challenges and advances in the VLA (Vision-Language-Action) field, emphasizing the importance of real-robot data collection and the complexities of model training and deployment

Group 1: Data Collection
- Real-robot data collection is crucial for VLA models; effective approaches include teleoperation, VR, and full-body motion capture [2][8]
- The article stresses the need for high-quality data and the significance of the real2sim2real process in ensuring effective data collection [8]

Group 2: Model Training
- Training VLA models typically requires simulation debugging before real-robot deployment, especially when real-robot data is insufficient [10]
- Many beginners struggle with model training, particularly with advanced models like π0 and π0.5, which require specific techniques and experience to achieve good results [6][10]

Group 3: Model Deployment
- After training, VLA models often need a "slimming" step: their large parameter counts pose challenges for deployment on edge devices [12]
- Techniques such as quantization and distillation are essential for shrinking parameter size while maintaining performance [12]

Group 4: Educational Initiatives
- The article introduces a practical course covering hardware, data collection, algorithms, and deployment [14][16]
- The course targets a wide audience, including students and professionals transitioning into the VLA field, and includes hands-on experience with hardware [27][30]
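The "slimming" step mentioned under model deployment usually starts with weight quantization. Below is a minimal sketch of symmetric per-tensor int8 quantization in pure Python with toy weights; real deployments would use a framework's quantization toolkit rather than hand-rolled code.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w_q = round(w / scale)."""
    scale = max(abs(w) for w in weights) / 127 or 1e-8  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.31, -1.27, 0.05, 0.88]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Each weight now needs one byte instead of four, and the reconstruction error is bounded by half a quantization step (`s / 2`); distillation would go further by retraining a smaller student model.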
VLA+RL: Embodied AI's "Key Breakthrough", and How to Deploy It Better
具身智能之心· 2025-11-29 02:07
Core Viewpoint
- The article discusses challenges and advances in deploying VLA (Vision-Language-Action) algorithms combined with reinforcement learning (RL) in robotics, focusing on improving whole-body motion control and real-world application [3][4][5]

Group 1: VLA Architecture and Challenges
- The article highlights pain points in VLA architectures and models, indicating that significant challenges remain before effective implementation [4][8]

Group 2: Whole-Body Motion Control
- It explores how whole-body motion control solutions for robots may evolve, emphasizing the advances needed to improve performance [4][8]

Group 3: Real-World Deployment of VLA and RL
- The discussion covers strategies for better real-world deployment of VLA combined with RL, including how to choose suitable compute boards and the importance of lightweight solutions [4][8]
Those Who Insist Embodied Intelligence Is a Bubble.......
具身智能之心· 2025-11-28 04:00
Core Viewpoint
- The article discusses the rapid growth of the embodied-intelligence industry, arguing that development speed and value should not be treated as opposing forces despite ongoing talk of a market bubble [2][4]

Industry Growth
- The embodied-intelligence industry, represented chiefly by humanoid robots, is growing at a rate exceeding 50%, with projections of a 100-billion-yuan market by 2030 [2][4]
- The primary market remains active, with significant financing; despite some valuation bubbles, this capital helps drive industry development and encourages participation across sectors such as manufacturing, healthcare, and entertainment [6]

Business Models and Development
- The sector needs to focus on differences in business models, avoiding redundant effort and price wars that do not contribute to effective growth [4]
- The industry is still in its early stages; technology routes, business models, and application scenarios have not yet matured, leaving substantial room for development [4]
OpenReview Drama! Turns Out My Low Review Score Came from a Friend
具身智能之心· 2025-11-28 01:10
Core Viewpoint
- The article discusses a significant bug in the OpenReview platform that exposed reviewer identities and scores for academic papers, prompting varied reactions in the AI community [2]

Group 1: OpenReview Bug
- The bug is system-level: by altering a specific part of a URL, users could view reviewer information for papers at various AI conferences [4]
- The incident has stirred the academic community, exposing personal biases and competitive behavior behind some papers' low scores [2]

Group 2: Reviewer Suspicion Analysis
- A total of 196 suspicious reviewers were identified, with 120 frequent collaboration relationships noted [5]
- Several reviewers received the maximum suspicion score of 5, indicating extremely inconsistent ratings and excessive rating ranges, with average ratings varying significantly [5]
- One reviewer had an average rating of 8.00 with a standard deviation of 0.00, suggesting lenient scoring and high confidence, raising concerns about review-process integrity [5]
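The suspicion analysis described above (flagging reviewers whose score distributions are implausibly uniform, like a mean of 8.00 with zero standard deviation, or wildly erratic) reduces to simple statistics. The thresholds and reviewer data below are made up for illustration.

```python
from statistics import stdev

def flag_reviewer(scores, uniform_std=0.5, erratic_std=2.5):
    """Flag a reviewer whose rating spread is suspiciously low or high."""
    if len(scores) < 2:
        return "insufficient data"
    s = stdev(scores)
    if s <= uniform_std:
        return "suspiciously uniform"   # e.g. mean 8.00, std 0.00
    if s >= erratic_std:
        return "suspiciously erratic"   # excessive rating range
    return "ok"

reviews = {
    "R1": [8, 8, 8, 8],    # lenient, zero variance
    "R2": [1, 9, 2, 10],   # extreme rating range
    "R3": [5, 6, 7, 5],    # normal spread
}
labels = {r: flag_reviewer(s) for r, s in reviews.items()}
```

A real audit would combine such per-reviewer statistics with the collaboration-graph signals the article mentions, rather than rely on spread alone.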
InternData-A1 Open-Sourced: Pure Synthetic Data Matches Top-Tier Real Data, Performance on Par with the Official π0 Model
具身智能之心· 2025-11-28 00:04
Core Insights
- The article introduces InternData-A1, a synthetic dataset that overcomes the limitations of traditional robot training data by providing high-fidelity, large-scale, low-cost data for Vision-Language-Action (VLA) models [1][2][21]

Group 1: Why the Robot Pre-training Data Paradigm Needs Rebuilding
- Current VLA training faces a dilemma: real data is high-fidelity but costly and limited in scale, while synthetic data often lacks diversity and physical realism [2]
- InternData-A1 addresses this by combining high-quality synthetic data with a modular generation pipeline, ensuring scalability, diversity, and cost-effectiveness [2][21]

Group 2: Core Features of InternData-A1
- InternData-A1 spans a comprehensive robot-interaction data system: 4 robot types, 70 tasks, and 227 scenes, totaling 630,000 trajectories and 7,433 hours of interaction data [4][6]
- The dataset achieves high-fidelity simulation through optimized physics engines and visual rendering, minimizing the gap between simulated and real-world performance [6][21]
- A modular generation pipeline enables low-cost, efficient data production, automating asset configuration, skill combination, domain randomization, and trajectory generation [8][9]

Group 3: Performance Comparison and Validation
- Models pre-trained on InternData-A1 deliver top-tier performance in both simulated and real-world tasks, matching or exceeding models trained on real datasets [10][14]
- In simulated tasks, success rates reached 60% in Easy mode and 26.5% in Hard mode, outperforming traditional models [11][12]
- The dataset shows that 1,600 synthetic data points can match the performance of 200 real data points, significantly reducing data-collection costs [20][21]

Group 4: Future Directions
- Future work will expand task and robot-type coverage, including high-precision dexterous tasks and more robot form factors [19][20]
- The results underscore synthetic data's potential to replace real data in VLA pre-training, provided synthetic datasets offer scalability, diversity, and fidelity [21][22]
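The four automated stages listed for the generation pipeline (asset configuration, skill combination, domain randomization, trajectory generation) suggest a composable chain. Every stage below is a hypothetical stub sketching that structure, not InternData-A1's actual tooling.

```python
import random

def configure_assets(spec):
    """Stage 1: populate the scene with fixed and task-specific assets."""
    spec["assets"] = ["table", "gripper"] + spec.get("objects", [])
    return spec

def combine_skills(spec):
    """Stage 2: compose a skill sequence for the task."""
    spec["skills"] = ["reach", "grasp", "move", "release"]
    return spec

def randomize_domain(spec, rng):
    """Stage 3: randomize physical/visual parameters for diversity."""
    spec["lighting"] = rng.uniform(0.5, 1.5)
    spec["friction"] = rng.uniform(0.2, 0.9)
    return spec

def generate_trajectory(spec, rng):
    """Stage 4: emit one (skill, duration) step per skill in the plan."""
    return [(skill, round(rng.uniform(0.5, 2.0), 2)) for skill in spec["skills"]]

def pipeline(task, seed=0):
    rng = random.Random(seed)
    spec = randomize_domain(combine_skills(configure_assets({"objects": [task]})), rng)
    return spec, generate_trajectory(spec, rng)

spec, traj = pipeline("mug")
```

Because each stage only reads and extends a shared spec dict, stages can be swapped or re-seeded independently, which is the property that makes such pipelines cheap to scale.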
After Reading 40 VLA+RL Papers......
具身智能之心· 2025-11-28 00:04
Core Insights
- The article discusses the research shift toward incorporating reinforcement learning (RL) into vision-language-action (VLA) models, moving beyond supervised fine-tuning (SFT) to improve performance and adaptability [1][2]

Group 1: RL Methodologies
- RL methodologies are categorized as online RL, offline RL, iterative RL, and inference-time improvement, but the author stresses that whether a method works matters more than how it is classified [1]
- Real-world applicability is crucial, with safety and efficiency the key concerns during data collection and model deployment [2]

Group 2: Task Performance and Challenges
- Current RL implementations show promising single-task results; for example, Pi-star-0.6 requires around 1,000 trajectories for complex tasks such as folding clothes [3]
- A significant open challenge is enabling RL to handle multiple tasks effectively, so that tasks reinforce rather than degrade one another [3]

Group 3: Reward Functions and Research Directions
- Whether reward or value functions must be learned remains debated; reduced variance in optimization is a key benefit, though the need may diminish as pre-trained VLA models improve [4][5]
- Identified research directions focus on sparse rewards, the scale of policy networks, and the multi-task capability of RL [5]

Group 4: Literature and Keywords
- A list of relevant literature and keywords is provided for further exploration, indicating a rich field of study at the intersection of RL and VLA [6]
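The claim that a learned value function mainly buys variance reduction can be shown numerically: subtracting a baseline near the expected return from a REINFORCE-style weight leaves the gradient estimator's expectation unchanged (the score term has zero mean) but shrinks its variance. Toy numbers only, unrelated to any paper cited above.

```python
import random
from statistics import pvariance

random.seed(0)
# Simulated (score, return) pairs; the score samples have zero mean,
# so multiplying by any constant-shifted return keeps the estimator unbiased.
samples = [(random.choice([-1.0, 1.0]), random.gauss(10.0, 1.0))
           for _ in range(10000)]

def grad_estimates(baseline):
    """Per-sample REINFORCE-style weights: score * (return - baseline)."""
    return [s * (r - baseline) for s, r in samples]

var_no_baseline = pvariance(grad_estimates(0.0))
var_with_baseline = pvariance(grad_estimates(10.0))  # baseline ~ E[return]
```

With returns near 10 and unit noise, the unbaselined estimator's variance is roughly E[R²] ≈ 101, while subtracting the mean return drops it to roughly Var(R) ≈ 1, which is exactly the reduction a learned value function is meant to provide.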
NeurIPS 2025 Awards Announced: Qwen Wins Best Paper
具身智能之心· 2025-11-28 00:04
Core Insights
- The NeurIPS 2025 conference awarded four Best Paper awards and three Best Paper Runner-up awards, highlighting significant advances across AI research areas [1][2][4]

Group 1: Best Papers
- Paper 1: "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)" introduces Infinity-Chat, a dataset of 26,000 diverse user queries, to address the homogeneity of language-model outputs [6][8][10]
- Paper 2: "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" reveals the impact of gated attention mechanisms on model performance, enhancing training stability and robustness [12][18]
- Paper 3: "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities" demonstrates that scaling network depth to 1,024 layers significantly improves performance on self-supervised reinforcement-learning tasks [19][20]
- Paper 4: "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training" explores the training dynamics of diffusion models, identifying mechanisms that prevent memorization and enhance generalization [21][23]

Group 2: Awards and Recognition
- The Test of Time Award went to "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", recognized for its foundational impact on computer vision since its publication in 2015 [38][42]
- The Sejnowski-Hinton Prize was awarded to researchers for "Random synaptic feedback weights support error backpropagation for deep learning", contributing to the understanding of biologically plausible learning rules [45][49]
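The gating idea summarized for Paper 2 (a sigmoid gate applied elementwise to an attention head's output, adding non-linearity and letting heads switch off rather than dump weight into an attention sink) can be sketched in a few lines of pure Python. Shapes and values are toy, and the gate logit is a fixed number standing in for a learned projection.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention(query, keys, values, gate_logit):
    """Scaled dot-product attention over toy 1-D vectors, followed by
    an elementwise sigmoid gate on the head output."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, ks)) / math.sqrt(d) for ks in keys]
    weights = softmax(scores)
    head = [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
    g = sigmoid(gate_logit)  # learned in the paper; fixed here
    return [g * h for h in head]

# A strongly negative gate logit lets the head output go to ~0 entirely,
# instead of being forced to attend somewhere.
out = gated_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                      [[2.0, 2.0], [0.0, 0.0]], gate_logit=-20.0)
```

The same call with a large positive `gate_logit` passes the attention output through almost unchanged, which is the non-linearity/sparsity trade-off the paper studies.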