具身智能之心
Deploying π0 and π0.5 from scratch!
具身智能之心· 2025-11-22 16:03
Core Viewpoint
- The article highlights the launch of the Imeta-Y1, a lightweight and cost-effective robotic arm designed for beginners and researchers in the field of embodied intelligence, emphasizing its open-source tools and user-friendly features [3][4][6].

Product Features
- Imeta-Y1 is specifically designed for newcomers and researchers, providing a high-performance robotic arm at an affordable price [3].
- It offers a complete open-source toolchain and code examples, enabling a seamless workflow from data collection to model deployment [4][18].
- The arm supports dual-language interfaces (Python/C++) and is compatible with ROS1/ROS2, allowing users to get started quickly regardless of their programming background [4][19].
- It features high-precision motion control, low power consumption, and an open hardware architecture, enabling smooth migration from simulation to real-world applications [6][7].

Technical Specifications
- The robotic arm weighs 4.2 kg, carries a rated load of 3 kg, and has 6 degrees of freedom, with a working radius of 612.5 mm and a repeatability of ±0.1 mm [9][20].
- It operates at a supply voltage of 24 V and communicates via CAN, with a compact design suitable for embedded AI and robot-learning platforms [9][20].
- The arm's joint motion ranges and maximum speeds are specified, ensuring it meets various application needs [22].

Development and Support
- The company provides a comprehensive open-source SDK, including drivers, API interfaces, example code, and documentation, supporting rapid application development (a hypothetical usage sketch follows this entry) [31][30].
- Users can leverage multi-modal data fusion capabilities, compatible with mainstream frameworks such as TensorFlow and PyTorch, to implement intelligent algorithms [37][18].
- The company ensures quick after-sales support, with a 24-hour response time for customer inquiries [20][49].

Testing and Reliability
- Rigorous hardware testing processes validate the arm's accuracy, durability, load performance, and stability across various application scenarios [40][44].
- The product is backed by a six-month warranty against non-human damage, with post-warranty support available at market rates [50].
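The entry above mentions an open-source SDK with Python/C++ interfaces and CAN communication, but its API is not reproduced in this digest. The following is a minimal, hypothetical Python sketch of what driving a 6-DoF arm like this might look like; every name (ImetaY1Arm, JointCommand, connect, move_joints) is an illustrative assumption, not the vendor's actual interface.

```python
# Hypothetical usage sketch only: the real Imeta-Y1 SDK names and call
# signatures are not shown in this digest, so every identifier below is
# an illustrative assumption, not the vendor's actual API.
from dataclasses import dataclass
from typing import List


@dataclass
class JointCommand:
    positions_rad: List[float]   # six target joint angles, one per DoF
    max_velocity: float = 0.5    # rad/s, conservative default


class ImetaY1Arm:
    """Stand-in for an arm driver that would talk to the hardware over CAN."""

    def __init__(self, can_channel: str = "can0") -> None:
        self.can_channel = can_channel
        self.connected = False

    def connect(self) -> None:
        # A real driver would open the CAN bus and enable the six joints here.
        self.connected = True

    def move_joints(self, cmd: JointCommand) -> None:
        if not self.connected:
            raise RuntimeError("call connect() before sending commands")
        if len(cmd.positions_rad) != 6:
            raise ValueError("the Imeta-Y1 summary lists 6 degrees of freedom")
        # A real driver would encode and send per-joint CAN frames here.
        print(f"sending joint targets {cmd.positions_rad} on {self.can_channel}")


if __name__ == "__main__":
    arm = ImetaY1Arm()
    arm.connect()
    arm.move_joints(JointCommand(positions_rad=[0.0, -0.4, 0.8, 0.0, 0.6, 0.0]))
```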
AlohaMini for mobile manipulation is here! $600 in cost, fully open source
具身智能之心· 2025-11-22 03:07
The $600 Open-Source Home Robot

Meet AlohaMini, the new dual-arm mobile robot designed to make real-world manipulation and embodied AI research accessible. This bot is fully 3D-printable and aimed at home builders and research labs.

Key Features:
- Cost: The Bill of Materials (BOM) totals around $600 (USD) for self-printed parts, making it highly accessible.
- Hardware: Dual-arm mobile base with a motorized vertical lift (0-60 cm travel).
Results of the two academies' academician elections announced: Zhou Zhihua and Liu Yunhao elected CAS academicians
具身智能之心· 2025-11-21 16:03
Editor: 机器之心

On the morning of November 21, the Chinese Academy of Sciences (CAS) and the Chinese Academy of Engineering (CAE) announced the results of their 2025 academician elections, electing 73 new CAS academicians and 71 new CAE academicians.

Academician of the two academies is the highest honorary title in China's science, technology, and engineering fields, and this round of elections further optimizes the structure of the academician corps. The newly elected CAS academicians have an average age of 57.2, with the youngest 44 and the oldest 66; 67.1% are 60 or younger, and 5 women scientists were elected.

After this election, China has a total of 908 CAS academicians and 1,002 CAE academicians.

Notably, scholars related to the field of artificial intelligence were among those elected.

Chinese Academy of Sciences academicians

In 2025, the Chinese Academy of Sciences elected 73 academicians and 27 foreign academicians. Among those elected are several familiar top researchers in computer science and artificial intelligence, reflecting China's continued breakthroughs in and emphasis on frontier technology.

Liu Yunhao: professor at Tsinghua University, research ...
VLA-Pruner: Temporal-aware visual token pruning for efficient VLA inference
具身智能之心· 2025-11-21 16:03
Group 1
- The core challenge of VLA models lies in integrating visual scene perception, natural language understanding, and action execution, which incurs significant computational overhead because visual tokens vastly outnumber text tokens [2][4].
- Existing visual-token pruning methods are flawed: they focus primarily on semantic relevance and neglect the distinct needs of high-level semantic understanding versus low-level action execution, leading to performance drops at high pruning rates [3][4].
- A key observation is that the temporal continuity of robot operations allows the visual tokens needed for the current action to be estimated from historical attention trends, providing a way around the limitations of existing methods [5].

Group 2
- VLA-Pruner is designed to retain both semantic-understanding and action-execution tokens under a given computational budget, achieving efficient inference without performance loss through a dual-level criterion and selection strategy [6][10].
- The dual-level importance criteria combine semantic relevance based on prefill attention scores with action-level importance estimated through temporal smoothing, ensuring a comprehensive approach to token selection [7][9].
- The method employs a "merge-filter" mechanism to maximize relevance and minimize redundancy, ensuring that all tokens critical for both semantic understanding and action execution are preserved (a minimal sketch of this selection logic follows the list) [10][11].

Group 3
- Experimental results show that at a 50% pruning rate, VLA-Pruner not only maintains performance but improves success rates, with OpenVLA showing an average increase of 2.45% [16].
- VLA-Pruner is robust across scenarios, achieving a 96.8% success rate in the SIMPLER environment at a 75% pruning rate and significantly outperforming baseline methods [19][20].
- Efficiency gains are notable: at a 50% pruning rate, FLOPs drop to roughly 60% of the original model's and inference runs up to 1.8 times faster [26][27].

Group 4
- The core contributions include a dual-level pruning criterion that addresses the inherent flaws of existing methods and a plug-and-play pruning framework that improves inference efficiency without altering the model architecture [31].
- Limitations include potentially inaccurate action-attention estimates in dynamic scenes with rapid viewpoint shifts or target changes, suggesting areas for future optimization [31].
- Future directions involve adaptive prediction modules and the integration of additional techniques such as quantization and layer pruning to further improve deployment efficiency [31].
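The paper's code is not reproduced in this digest, so the following is a minimal Python sketch, under stated assumptions, of the dual-level selection idea summarized above: semantic scores from prefill attention, action scores smoothed over time with an exponential moving average, then a merge-filter step that keeps a budgeted set of tokens. Function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np


def select_visual_tokens(semantic_attn, action_attn_history, keep_ratio=0.5, ema_alpha=0.6):
    """Hypothetical sketch of a dual-level token-selection step.

    semantic_attn: (N,) attention mass each visual token received during prefill.
    action_attn_history: list of (N,) attention maps from recent action-decoding steps.
    Returns indices of the visual tokens to keep under the given budget.
    """
    num_tokens = semantic_attn.shape[0]
    budget = max(1, int(keep_ratio * num_tokens))

    # Action-level importance: exponential moving average over recent steps,
    # exploiting the temporal continuity of robot actions.
    action_score = np.zeros(num_tokens)
    for attn in action_attn_history:
        action_score = ema_alpha * attn + (1.0 - ema_alpha) * action_score

    # "Merge": take the top tokens under each criterion separately.
    half = budget // 2
    top_semantic = set(np.argsort(-semantic_attn)[:half])
    top_action = set(np.argsort(-action_score)[:half])
    merged = top_semantic | top_action

    # "Filter": rank the merged set by a combined score and pad or trim so that
    # exactly `budget` tokens survive.
    combined = 0.5 * semantic_attn / (semantic_attn.sum() + 1e-8) \
             + 0.5 * action_score / (action_score.sum() + 1e-8)
    ranked = sorted(merged, key=lambda i: -combined[i])[:budget]
    while len(ranked) < budget:
        for i in np.argsort(-combined):
            if i not in ranked:
                ranked.append(int(i))
            if len(ranked) == budget:
                break
    return np.array(sorted(ranked))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sem = rng.random(16)
    hist = [rng.random(16) for _ in range(3)]
    print(select_visual_tokens(sem, hist, keep_ratio=0.5))
```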
Every embodied-AI company is reinventing the wheel; how do we solve the data-silo problem?
具身智能之心· 2025-11-21 16:03
This Thursday, BAAI (智源) held an Open Day. Due to scheduling I could not attend, but the on-site guests and topics looked quite interesting, so I am sharing them here.

Domestic embodied-AI companies such as 星海图, 银河通用, 原力灵机, 智元, 自变量, 加速进化, 北京人形...... the CEOs or co-founders of many leading players almost all showed up.

One very important topic at the meeting: breaking down data silos and jointly building an ecosystem. That is, each company's data can be contributed to the platform; as a non-profit third-party organization that does not engage in commercial activity, BAAI is easier to "trust", and data for multiple embodiments such as mobile manipulation and robotic arms is gradually being open-sourced......

During the event, BAAI also announced the open-sourcing of million-scale, high-quality real-robot embodied data (cleaned, annotated, and aligned), and released the full-workflow development platform RoboXstudio and the data software framework CoRobot.

This means the entire pipeline, from data collection, annotation management, and training to evaluation and deployment, is connected, so startups no longer have to pay high prices for teams to each build their own platform.

The most important point is "unified evaluation": no more claiming your own robot is the best; how about evaluating together?

Overall, for embodiment companies, whoever contributes more data to the open platform will get better optimization results, and evaluation under a unified standard better separates strong from weak. This is a great boost for the whole industry, moving from everyone fighting alone to being "organized and disciplined".

The above is from a share inside our embodied community, the 具身智能之心 Knowledge Planet ...
The first humanoid robot company was dragged down by mass production......
具身智能之心· 2025-11-21 09:59
Core Viewpoint
- K-Scale Labs, a once-promising robotics company, has officially announced its dissolution due to cash-flow issues and strategic missteps, particularly in the high-end market segment [2][3].

Group 1: Company Overview
- K-Scale Labs was initially valued at $50 million during its seed round and aimed to compete with established robotics firms [2].
- The company faced a critical failure in scaling production, having produced only 10 prototypes of its high-end K-Bot, each costing $100,000 [2].

Group 2: Strategic Missteps
- The company initially targeted the low-end market with its Z-Bot priced under $1,000 but failed to continue its development, missing out on a product that could have generated immediate cash flow [2].
- K-Scale Labs then shifted its focus entirely to the high-end market with the K-Bot, which resulted in unsustainable production costs and ultimately led to its downfall [2].

Group 3: Financial Challenges
- The company experienced a cash-flow crisis that made it impossible to secure further financing, leading to its decision to refund customers and cease operations [2][3].
- Despite receiving orders for 100 units worth over $200,000, the high production costs forced K-Scale Labs to issue full refunds to customers [3].
Breakout performance: an industrial task completed in 43 seconds! Bucket-carrying and sorting stunned the whole venue.
具身智能之心· 2025-11-21 04:01
Core Viewpoint
- The article highlights the success of Lingyu Intelligent Technology Co., Ltd. at the 2025 Second Zhongguancun Embodied Intelligent Robot Application Competition, where the company's TeleAvatar robot won first place in all seven subcategories it entered, showcasing its advanced remote-operation capabilities and practical applications in various scenarios [2][4][23].

Group 1: Competition Overview
- The competition, themed "Embodied Intelligence, Application Future," attracted 157 top teams globally and offered a total prize pool of 2 million yuan to encourage technological breakthroughs and practical applications [2][4].
- The event was organized by the Zhongguancun Science City Management Committee, with support from various governmental and research institutions, aiming to bridge the gap between technological innovation and real-world application [4].

Group 2: Performance Highlights
- Lingyu Intelligent's TeleAvatar robot completed tasks significantly faster than the average times set for the competition, finishing material handling in 43 seconds and parts assembly in 1 minute 22 seconds [6][10].
- In the household-service category, desktop cleaning and clothes drying were completed in 45 seconds and 55 seconds respectively, showcasing the robot's adaptability and efficiency in practical scenarios [7][10].

Group 3: Technical Foundation
- The company was founded by a top motion-control team from Tsinghua University, leveraging the expertise of its core members to establish a strong technical foundation [17].
- The TeleAvatar robot is equipped with a self-developed TeleDroid control platform, featuring a seven-axis robotic arm and a dual-camera vision system that enable low-latency transmission and high-precision action replication [17].

Group 4: Future Directions
- Lingyu Intelligent aims to continue optimizing the TeleAvatar robot's core performance based on real-world application needs, contributing to industrial upgrades and improvements in daily life [23].
- The company plans to use the competition as a platform to raise industry visibility and foster collaboration with peers, ultimately driving the adoption of embodied intelligent technologies across sectors [17][23].
Thoughts on GEN-0 and subsequent VLA development
具身智能之心· 2025-11-21 00:04
Core Insights
- The release of GEN-0 marks a significant advancement in embodied intelligence, particularly in manipulation tasks, which have historically been held back by data scarcity and the difficulty of generalization [1][2].
- GEN-0 was pre-trained on a massive dataset of 270,000 hours, equivalent to approximately 31 years, and continues to collect data at a rate of 10,000 hours per week, surpassing previous models such as the Pi series in pre-training effectiveness [2][3].
- Despite these advances, GEN-0 has not achieved a "GPT moment" or true zero-shot capability, indicating ongoing challenges in the field [2][3].

Data Collection and Utilization
- GEN-0's data collection strategy emphasizes data diversity and quality over sheer quantity, as evidenced by the scaling laws observed in the model's performance (an illustrative scaling-law sketch follows this list) [10][13].
- The emergence of UMI (Universal Manipulation Interface) has posed challenges to traditional simulation-based pipelines, highlighting the need for real-world data collection to achieve high success rates in manipulation tasks [5][7].
- Real-world data collection approaches a success rate of nearly 100%, while simulation still faces significant challenges, particularly in generating long-horizon data [8][9].

Model Training and Performance
- GEN-0's results suggest that larger models are necessary to exploit vast amounts of data effectively, as smaller models struggle to generalize under data-overload conditions [11][12].
- Pre-training in GEN-0 focuses on learning to explore the action space rather than on generalization itself, indicating a shift in how models are trained to handle diverse tasks [12].
- The insights gained from GEN-0's pre-training emphasize the need for a deeper understanding of data quality and diversity, which can significantly affect model performance [10][13].

Future Directions
- The findings from GEN-0 challenge existing paradigms in the field, suggesting that new engineering efforts and problem-solving approaches are required to advance embodied intelligence [15].
- The industry is expected to shift toward larger model infrastructure and a focus on co-training methodologies to enhance model capabilities [11][14].
- The ongoing development of data-collection environments and pre-training methodologies will likely shape the future landscape of embodied intelligence research [15][16].
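The summary refers several times to scaling laws linking data volume and model performance, but the functional form used by the GEN-0 report is not given here. The sketch below only illustrates how such a power-law relation is commonly fit; the (hours, loss) numbers are made up and are not GEN-0 results.

```python
import numpy as np

# Illustrative only: hypothetical (data hours, validation loss) pairs, NOT
# GEN-0's actual measurements. Scaling-law analyses typically fit a power law
# loss(D) ~ a * D**(-b); here we fit the simple log-log linear form.
hours = np.array([1_000, 5_000, 20_000, 80_000, 270_000], dtype=float)
loss = np.array([1.90, 1.55, 1.30, 1.12, 0.98])

# Fit log(loss) = log(a) - b * log(D) by ordinary least squares.
slope, intercept = np.polyfit(np.log(hours), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: loss ~ {a:.2f} * hours^(-{b:.3f})")

# Extrapolate to twice the current data volume (again, purely illustrative).
print(f"predicted loss at 540k hours: {a * 540_000 ** (-b):.3f}")
```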
Segmenting everything is not enough; everything should be 3D-reconstructed too. SAM 3D is here
具身智能之心· 2025-11-21 00:04
Core Viewpoint
- Meta has launched significant updates with the introduction of SAM 3D and SAM 3, enhancing 3D understanding of images and providing advanced capabilities for object detection, segmentation, and tracking in images and videos [2][6][40].

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series, featuring two models, SAM 3D Objects and SAM 3D Body, both demonstrating state-of-the-art performance in converting 2D images into detailed 3D reconstructions [2][4].
- SAM 3D Objects allows users to generate 3D models from a single image, overcoming limitations of traditional 3D modeling that often relies on isolated or synthetic data [11][15].
- Meta annotated nearly 1 million real-world images, generating approximately 3.14 million 3D meshes, using a scalable data engine to improve the quality and quantity of 3D data [20][26].

Group 2: SAM 3D Body
- SAM 3D Body focuses on accurate 3D human pose and shape reconstruction from single images, maintaining high quality even in complex scenarios with occlusions and unusual poses [28][30].
- The model is interactive, allowing users to guide and control predictions, which improves accuracy and usability [29].
- A high-quality training dataset of around 8 million images was created to improve the model's performance across various 3D benchmarks [33].

Group 3: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation, enabling the model to detect and segment specific concepts based on text or example-image prompts, significantly improving its concept-recognition performance (a hypothetical usage sketch follows this list) [40][42].
- The architecture builds on previous advances, using components such as the Meta Perception Encoder and DETR for enhanced image recognition and object detection [42][44].
- SAM 3 achieves a twofold increase in cgF1 scores for concept recognition and maintains near real-time performance for images with over 100 detection targets, completing inference in approximately 30 milliseconds on H200 GPUs [44].
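Meta's actual SAM 3 API is not shown in this digest, so the sketch below only illustrates what "promptable concept segmentation" means operationally: the caller supplies an image plus a text concept (or an exemplar crop) and gets back one mask per detected instance. Every class and method name is a hypothetical stand-in, not Meta's real interface.

```python
# Hypothetical interface sketch only; these names are illustrative assumptions
# and do not reflect Meta's actual SAM 3 API.
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class ConceptPrompt:
    text: Optional[str] = None             # e.g. "red mug"
    exemplar: Optional[np.ndarray] = None  # optional example-image crop


@dataclass
class InstanceMask:
    mask: np.ndarray   # HxW boolean mask for one detected instance
    score: float       # detection confidence


class PromptableConceptSegmenter:
    """Stand-in for a promptable concept-segmentation model."""

    def segment(self, image: np.ndarray, prompt: ConceptPrompt) -> List[InstanceMask]:
        # A real model would detect every instance matching the concept and
        # return one mask per instance; this dummy returns an empty mask so
        # the control flow is runnable end to end.
        assert prompt.text is not None or prompt.exemplar is not None
        h, w = image.shape[:2]
        return [InstanceMask(mask=np.zeros((h, w), dtype=bool), score=0.0)]


if __name__ == "__main__":
    img = np.zeros((480, 640, 3), dtype=np.uint8)
    segmenter = PromptableConceptSegmenter()
    masks = segmenter.segment(img, ConceptPrompt(text="red mug"))
    print(f"{len(masks)} instance(s); mask shape {masks[0].mask.shape}")
```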
Students working on VLA + RL, look this way~
具身智能之心· 2025-11-21 00:04
Group 1
- The article announces the recruitment of instructors for courses and projects related to VLA (vision-language-action) models and RL (reinforcement learning) within the community [1].
- The community seeks candidates whose research focuses on VLA and RL, preferably holding a PhD or currently enrolled in a doctoral program, with publication experience at top academic conferences [2].
- For industry candidates, practical project experience and hands-on debugging experience with real robots are desired [2].

Group 2
- The company, 具身智能之心 ("Embodied Intelligence Heart"), is the first comprehensive full-stack technical exchange community in China, gathering a large number of people focused on VLA and RL [3].
- It offers compensation above the industry average along with abundant industry resources for the recruited instructors [4].
- For more detailed information, interested individuals are encouraged to add the specified WeChat contact for consultation [5].