具身智能之心
Open Boxes, Fold Towels! Deploy pi0 on Your Robot Arm from Scratch!
具身智能之心· 2025-11-14 04:00
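pi0 deployment is now supported. We recently got the pi0 tasks working end to end, and the code will be officially open-sourced to customers to help accelerate embodied-AI research. Interested readers are welcome to follow along.
A lightweight, cost-effective robot arm built for embodied-AI research
Still struggling to choose hardware for embodied intelligence? Don't worry, Imeta-Y1 is here: a lightweight, cost-effective robot arm designed for beginners and early-stage researchers. Whether you are a student, an educator, or a developer just entering robotics, Imeta-Y1 helps you complete algorithm validation and project development at low cost and high efficiency.
Especially friendly to beginners:
✅ A full-pipeline open-source toolchain with code examples, covering everything from data collection to model deployment;
✅ Python / C++ interfaces, so you can get started quickly in whichever language you prefer;
✅ Compatibility with ROS1 / ROS2, plus a URDF model, for seamless switching between simulation and the real robot;
✅ 24-hour after-sales response, so problems do not stall your learning.
The arm combines high-precision motion control, low-power design, and an open software and hardware architecture. It supports seamless joint debugging from simulation to the real robot and provides a full-pipeline open-source SDK and toolchain, helping users quickly carry out algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing and promoting embedded AI and robot learning platforms.
Expensive robot arms are out of reach, and cheap ones are hard to ...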
Embodied Intelligence's Annual Gathering! Full Agenda Released, with Three Concurrent Workshops on VLA, World Models, and RL
具身智能之心· 2025-11-14 01:02
Group 1
- The 2025 China Embodied Intelligent Robot Conference (EAIRCon 2025) will be held on November 19 in Shenzhen, focusing on the wave of embodied intelligent robots [2][4]
- The conference will feature a main forum, specialized forums, workshops, and an exhibition area, with the theme "Embodied Intelligence Awakens" [2][4]
- Nearly 40 guests will deliver speeches, reports, and dialogues to comprehensively analyze the new wave of robot revolution driven by embodied intelligence [2][5]

Group 2
- The main forum will start with a keynote address by the CEO of Zhiyi Technology, followed by a report from a prominent professor on the challenges and advancements in humanoid robots [6][7]
- Keynote speakers include executives and scientists from various leading companies and research institutions, discussing topics such as human-level intelligence in robots and the transition from open-source to standardization in humanoid robotics [8][9]

Group 3
- The specialized forum will delve into the industrialization opportunities and innovations in humanoid robots, featuring discussions on the implementation paths for embodied intelligent robots [15][16]
- Workshops will cover topics such as robot imitation learning, embodied world models, and VLA (Vision-Language-Action) large models, with participation from young scholars and industry experts [24][25][26]

Group 4
- The conference aims to address the key capabilities required for humanoid robots, including physical interaction and skill evolution, to facilitate their practical applications [20][21]
- The event will also explore the integration of AI and robotics, emphasizing the importance of real-world data and multi-modal information for the development of embodied intelligence [21][22]
HKUST and Collaborators Propose WMPO: A World Model-Based Policy Optimization Framework for VLA
具身智能之心· 2025-11-14 01:02
Core Insights
- The article introduces WMPO (World Model-based Policy Optimization), a framework developed by the Hong Kong University of Science and Technology and the ByteDance Seed team. It uses pixel-level video generation to improve the sample efficiency, task performance, generalization ability, and lifelong learning of VLA (Vision-Language-Action) models [5][25].

Research Background and Pain Points
- Existing solutions struggle to balance scalability and effectiveness: human intervention requires continuous supervision, and adapting simulators to diverse scenarios is costly [4].
- Traditional latent-space world models are misaligned with web-scale pre-trained visual features and fail to fully leverage pre-trained knowledge [4][6].

Core Framework Design
- WMPO generates trajectories in an "imagination" space using a high-fidelity pixel-level world model, replacing real-environment interaction and supporting stronger on-policy reinforcement learning [5][11].
- The iterative process follows "imagined trajectory generation → trajectory sampling and evaluation → policy update" [5].

Key Modules
- **Generative World Model**: Simulates dynamic changes between the robot and the environment, generating visual trajectories aligned with VLA pre-trained features [8].
- **Lightweight Reward Model**: Automatically assesses the success or failure of imagined trajectories, providing sparse reward signals to avoid complex reward shaping [9].
- **On-Policy Policy Optimization (GRPO)**: Adapts Group Relative Policy Optimization for sparse reward scenarios, balancing stability and scalability [10].

Core Innovations
- **Pixel Space Priority**: Directly generates trajectories in pixel space, matching VLA pre-trained visual features and making full use of pre-trained knowledge [11].
- **Trajectory Generation Logic**: Predicts action chunks based on initial frames and language instructions, generating subsequent frames iteratively [12].
- **Dynamic Sampling Strategy**: Generates multiple imagined trajectories from the initial state and filters out groups in which every trajectory succeeds or every trajectory fails, so that the remaining samples carry a training signal [12].

Experimental Validation and Key Results
- In simulation environments, WMPO outperformed baseline methods (GRPO, DPO) across four fine manipulation tasks, achieving an average success rate of 47.1% with a rollout budget of 128, and 57.6% with a budget of 1280, demonstrating superior sample efficiency [13][14].
- In real environments, WMPO achieved a success rate of 70% in a "block insertion" task, significantly higher than baseline strategies [15].

Emergent Behaviors
- WMPO exhibits self-correcting capabilities, autonomously adjusting actions in response to failure states, unlike baseline strategies that continue erroneous actions until timeout [17].

Generalization Ability
- WMPO achieved an average success rate of 29.6% in out-of-distribution scenarios, outperforming all baseline methods, which indicates that it learns general manipulation skills rather than spurious visual cues [19][20].

Lifelong Learning
- WMPO showed stable performance improvement through iterative collection of trajectories, while DPO struggled with instability and required more expert demonstrations [23].

Conclusion and Significance
- WMPO establishes a new paradigm for VLA optimization by integrating world models with on-policy reinforcement learning, addressing the high cost and low sample efficiency of real-environment interaction. It improves performance, generalization, and lifelong learning, paving the way for scalable applications in general robotic manipulation [25].
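The dynamic sampling and group-relative scoring steps in the WMPO summary above can be illustrated with a small sketch. This is not the authors' code: the function names, the 0/1 reward encoding, and the mean/standard-deviation normalization are assumptions based on the commonly described form of GRPO, and the imagined trajectories are represented only by their sparse success rewards.

```python
import numpy as np

def dynamic_sampling_filter(groups):
    """Keep only groups with mixed outcomes.

    Each group holds the 0/1 success rewards of several imagined
    trajectories rolled out from the same initial state (assumed encoding).
    All-success and all-failure groups give zero relative advantage,
    so they are dropped.
    """
    return [g for g in groups if 0.0 < float(np.mean(g)) < 1.0]

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group (assumed GRPO-style scoring)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Three hypothetical groups of four imagined trajectories each.
groups = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 1]]
kept = dynamic_sampling_filter(groups)
advantages = [group_relative_advantages(g) for g in kept]
print(kept)
print(advantages)
```

Only the mixed group survives the filter; within it, successful trajectories receive positive advantages and failed ones negative advantages, which is the signal the policy update would use.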
Leading Embodied-AI Companies Are Now Investing in Other Companies......
具身智能之心· 2025-11-14 01:02
Core Viewpoint
- The article discusses the investment activities of various companies in the embodied intelligence sector, highlighting their strategies to secure key technologies and supply chains for competitive advantage [2][3][4].

Group 1: Company Investments
- Zhiyuan Robotics has been actively preparing for an IPO while investing in over 30 companies, focusing on upstream key technologies, product supply chains, and downstream markets [3].
- Xinghai Map has recently invested in Jianzhixinchuang (Beijing) Robot Technology Co., Ltd., which provides a one-stop service for "data + deployment" [6].
- Zhujidi Power has invested in Shanghai Wujitech, which is responsible for the research and development of high-performance motors and dexterous hands [7].
- Songyan Power has invested in Silicon-based Wisdom (Beijing) Robot Co., Ltd., which focuses on the development of companion and elderly care robots [8].
Fei-Fei Li's 3D World Model Enters Public Beta, and Users Are Already Going Wild
具身智能之心· 2025-11-14 01:02
Core Insights
- The article discusses the launch of Marble, a new 3D world generation model developed by Fei-Fei Li's World Labs, which allows users to easily create personalized 3D worlds without needing a professional team [3][5][15].

Group 1: Model Features
- Marble enables users to generate 3D worlds from simple text prompts, single images, or even short videos, making it accessible to the general public [5][17].
- The model includes built-in AI editing tools that allow users to make both minor and major modifications to their created worlds, such as removing objects or changing visual styles [21][25].
- Users can export their created worlds in two formats: high-fidelity Gaussian splats for rendering in browsers, and triangle meshes for compatibility with various industry-standard tools [29][40].

Group 2: User Experience
- The model has received positive feedback for its ease of use, with users quickly sharing their creations online [8][15].
- Marble supports multi-modal input, allowing for a variety of ways to create and edit 3D environments, which enhances user engagement and creativity [34][35].

Group 3: Future Developments
- The team plans to focus on enhancing interactivity in future iterations of Marble, enabling real-time interactions within the created 3D worlds [36][37].
- The article emphasizes that Marble is a significant step towards a "truly spatially intelligent world model," which will incorporate capabilities for dynamic interaction and evolution over time [40].
Its First Mobile Manipulation Robot! Unitree Officially Releases G1-D
具身智能之心· 2025-11-13 13:04
Core Viewpoint
- Unitree (宇树) has launched its first wheeled humanoid robot, the G1-D, marking a significant step from technology demonstration to practical application in various scenarios [2].

Group 1: Product Features
- The G1-D combines the efficiency of wheeled movement with the flexibility of a humanoid design [2].
- It comes with a complete data collection and training solution, enhancing its usability in real-world applications [2].
- The robot features a high-definition dual-camera system, interchangeable end effectors, and a single-degree-of-freedom gripper [4].
- The robot's height can be adjusted between approximately 1260 mm and 1680 mm, and it can be equipped with a mobile chassis with a maximum speed of 1.5 m/s [4].
Leading Embodied-AI Companies Are Now Investing in Other Companies......
具身智能之心· 2025-11-13 05:46
Core Insights
- The article discusses the growing trend of companies in the embodied intelligence sector investing in various startups to secure core technologies and enhance their competitive edge in the market [2][3].

Investment Activities
- Zhiyuan Robotics has been actively preparing for its IPO while simultaneously investing in over 30 companies across the supply chain, from upstream key technologies to downstream market applications [2].
- Galaxy General has shown interest in a new company, Lanyue Power, which focuses on industrial logistics robotics [4].
- Xinghai Map has recently invested in Jianzhixinchuang (Beijing) Robotics Technology Co., Ltd., which provides a one-stop service for "data + deployment" [5].
- Zhujidi Power has invested in Shanghai Wujizhi Technology, which specializes in the production and research of high-performance motors and dexterous hands [6].
- Songyan Power has invested in Silicon-based Wisdom (Beijing) Robotics Co., Ltd., which is engaged in the development of companion and elderly care robots [7].
Who Leads XPeng Robotics: The Four Key People Behind IRON
具身智能之心· 2025-11-13 02:05
Core Viewpoint
- The article discusses the development and significance of XPeng Motors' humanoid robot "IRON," highlighting the key figures behind it and the company's strategic direction in embodied intelligence.

Group 1: Key Figures in XPeng Robotics
- Mi Liangchuan is identified as the core leader of XPeng Robotics, responsible for overseeing the technical direction and product implementation of the humanoid robot project [6][20].
- Mi's background includes significant experience in autonomous driving and AI; he joined XPeng in 2021 and advanced rapidly to leadership roles [15][18].
- Other notable team members include Chen Jie, an expert in reinforcement learning, and Ge Yixiao, the founding director of the intelligent mimicry department, both of whom bring substantial academic and industry experience to the team [44][51].

Group 2: Development of the IRON Robot
- The design of IRON is inspired by human anatomy, particularly the spine and muscle structure, which contributes to its advanced movement capabilities [10][12].
- The robot's development faced challenges, including a significant internal debate over whether to pursue humanoid robotics, which was ultimately resolved in favor of this direction due to the rise of AI technologies [85][88].
- The team peaked at 300 members and now numbers over 200, indicating a recovery and renewed focus on humanoid robotics after initial setbacks [98].

Group 3: Strategic Direction of XPeng Motors
- XPeng Motors aims to establish humanoid robots as a third growth curve alongside smart cars and flying vehicles, reflecting a strategic pivot towards embodied intelligence [99].
- The company has accumulated significant financial resources, with nearly 50 billion RMB available for research and development, facilitating its ambitious projects in robotics [46].
- The article draws parallels between XPeng Motors and Tesla, suggesting that XPeng is positioning itself in the robotics market much as it did in the automotive sector [101][110].
If Policy Models Could Also Think and Reason Dynamically, Would Robots Perform Better in the Real World?
具身智能之心· 2025-11-13 02:05
Core Insights
- The article introduces EBT-Policy (Energy-Based Transformer Policy), a new policy architecture based on Energy-Based Models (EBM), which improves robot performance in real-world scenarios by enabling dynamic reasoning and an understanding of uncertainty [2][6].

Group 1: EBT-Policy Overview
- EBT-Policy significantly improves training and inference efficiency and shows a unique "zero-shot retry" capability [4].
- The model learns an energy value to assess the compatibility between input variables, optimizing the energy landscape during language modeling tasks [5].
- EBT-Policy outperforms the traditional Diffusion Policy in both simulated and real-world tasks, reducing computational requirements by up to 50 times [6][18].

Group 2: Key Features and Advantages
- The model minimizes energy through multiple forward passes during inference, adjusting computational resources based on problem difficulty [8].
- EBT-Policy's emergent retry behavior allows it to recover from errors by dynamically redirecting itself towards lower energy states [10].
- EBT-Policy requires only 2 steps for inference, while Diffusion Policy typically requires around 100 steps [11].

Group 3: Performance Metrics
- In real-world tasks, EBT-Policy achieved scores of 86, 75, and 92 on "Fold Towel," "Collect Pan," and "Pick And Place," respectively, compared to Diffusion Policy's lower scores [17].
- The convergence speed during training improved by approximately 66%, and the model's inference process is significantly more efficient [18].

Group 4: Future Outlook
- The research team plans to continue optimizing hyperparameters and model scale, expecting further performance gains as more experimental data is collected [22].
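The idea of minimizing energy over several passes at inference time, with the amount of compute depending on problem difficulty, can be sketched with a toy example. Everything here is an assumption made for illustration: the quadratic `energy` function, the analytic gradient, plain gradient descent, and the stopping tolerance are stand-ins, not the EBT-Policy architecture, which the summary describes only at a high level.

```python
import numpy as np

def energy(obs, action):
    """Hypothetical toy energy: low when the action matches a target
    derived from the observation (stand-in for a learned energy model)."""
    target = 2.0 * obs
    return 0.5 * float(np.sum((action - target) ** 2))

def energy_grad(obs, action):
    """Analytic gradient of the toy energy with respect to the action."""
    return action - 2.0 * obs

def infer_action(obs, init_action, step_size=0.5, max_steps=100, tol=1e-3):
    """Refine an action by descending the energy until it is low enough.

    Returns the refined action and the number of refinement steps used,
    so harder starting points consume more compute than easy ones.
    """
    action = np.array(init_action, dtype=float)
    steps = 0
    while steps < max_steps and energy(obs, action) > tol:
        action = action - step_size * energy_grad(obs, action)
        steps += 1
    return action, steps

obs = np.array([1.0, -0.5])
# A poor initial guess needs several refinement steps.
hard_action, hard_steps = infer_action(obs, init_action=[0.0, 0.0])
# A guess that is already at the energy minimum needs none.
easy_action, easy_steps = infer_action(obs, init_action=[2.0, -1.0])
print(hard_steps, easy_steps)
```

The same loop also hints at the retry behavior described above: if a disturbance pushes the current action into a high-energy state, continued descent moves it back towards a low-energy one.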
How Does Traditional Navigation Differ from Vision-Language / Goal-Oriented Navigation?
具身智能之心· 2025-11-13 02:05
Core Insights
- Goal-Oriented Navigation empowers robots to autonomously complete navigation tasks based on goal descriptions, marking a significant shift from traditional vision-language navigation [2]
- The technology has been successfully implemented in various verticals, enhancing service efficiency in the delivery, healthcare, and hospitality sectors [4]
- The evolution of goal-driven navigation can be categorized into three generations, each showing advances in methodology and technology [6][8][10]

Group 1: Technology Overview
- Goal-Oriented Navigation is a key aspect of embodied navigation, relying on language understanding, environmental perception, and path planning [2]
- The transition from explicit instruction-based navigation to autonomous decision-making involves semantic parsing, environmental modeling, and dynamic decision-making [2]
- The technology has been integrated into delivery robots, service robots in healthcare and hospitality, and humanoid robots for various applications [4]

Group 2: Technical Evolution
- The first generation focuses on end-to-end methods using reinforcement and imitation learning, achieving breakthroughs in Point Navigation and image navigation tasks [6]
- The second generation employs modular methods that explicitly construct semantic maps, improving performance in zero-shot object navigation tasks [8]
- The third generation integrates large language models (LLMs) and vision-language models (VLMs) to improve exploration strategies and open-vocabulary target matching [10]

Group 3: Challenges and Learning Opportunities
- The complexity of embodied navigation requires knowledge across multiple domains, making it challenging for newcomers to enter the field [11]
- A new course has been developed to address these challenges, providing a structured learning path and practical applications [11][12]
- The course aims to build a comprehensive understanding of goal-oriented navigation, covering theoretical foundations and practical implementations [12][13]