具身智能之心
Committed to software-hardware integration of the "embodied brain" and a humanoid body: this embodied-AI company raises another 1 billion yuan
具身智能之心· 2025-11-20 10:52
Core Viewpoint
- The article highlights the successful A+ round financing of Star Motion Era, amounting to nearly 1 billion yuan, led by Geely Capital and supported by several strategic investors, which will advance the development and application of its embodied AI model, ERA-42 [1][14].

Group 1: Financing and Business Growth
- Star Motion Era completed an A+ round financing of nearly 1 billion yuan, with Geely Capital leading the investment [1].
- The company's total order value exceeds 500 million yuan, with the largest single order, in logistics, nearing 50 million yuan [2][3].
- The business strategy focuses on domestic applications of embodied intelligence solutions while expanding into international markets, resulting in a diversified business landscape [2][3].

Group 2: Technological Advancements
- The ERA-42 model has achieved precise control of full-sized humanoid robots and dexterous hands, with applications in logistics and commercial services [1][7].
- Star Motion Era has developed the world's first integrated world-model VLA, improving model intelligence through a positive feedback loop of "model - embodiment - scene data" [4][6].
- The company plans to release a new algorithm framework, VPP, which will let robots learn about the physical world from vast amounts of internet video data [6].

Group 3: Product Development and Applications
- The company has established three major product lines covering a range of scenarios, with over 95% of hardware developed in-house [8][12].
- Star Motion Era's humanoid robot, L7, has reached significant performance milestones, including winning a high-jump championship and setting a long-jump world record [13].
- The service robot Q5 is deployed in enterprises and at events for tasks such as guiding, delivery, and customer service [13].
New from CUHK: deploying VLA models without fine-tuning
具身智能之心· 2025-11-20 04:02
Core Insights
- The article introduces VLA-Pilot, a plug-and-play inference-time strategy that improves the deployment of pre-trained VLA models in real-world robotic tasks without requiring additional fine-tuning or data collection [4][6][35].
- VLA-Pilot significantly improves the success rate of pre-trained VLA policies across diverse tasks and robot embodiments, demonstrating robust zero-shot generalization [4][6].

Current Issues
- Pre-trained VLA policies often degrade when deployed on downstream tasks; fine-tuning mitigates this but is costly and impractical in real-world scenarios [4][5].
- Existing methods for steering pre-trained policies at inference time have limitations, including the need for additional training and reliance on fixed candidate action sets that may not align with task contexts [5][6].

Innovations and Methodology
- VLA-Pilot uses a multi-modal large language model (MLLM) as an open-world verifier to improve generalization, and employs an evolutionary diffusion process for action optimization to improve task alignment [6][10].
- An embodied policy steering chain-of-thought (EPS-CoT) module infers steering reward targets from task context without task-specific training [11][12].
- An iterative steering optimization mechanism provides closed-loop corrections, improving the precision and contextual relevance of the steering process [20][21].

Experimental Analysis
- VLA-Pilot was evaluated on a dual-arm system, outperforming six baseline methods on both in-distribution and out-of-distribution tasks [23][24].
- The experiments covered six downstream tasks, using manipulation success rate (MSR) and steering objective alignment (SOA) as metrics [26][27].
- VLA-Pilot outperformed all baseline methods on in-distribution tasks and exhibited robust generalization on out-of-distribution tasks [28][31].

Comparative Performance
- On in-distribution tasks, VLA-Pilot achieved an overall MSR of 0.62 and SOA of 0.73, outperforming all baselines [30].
- On out-of-distribution tasks, VLA-Pilot reached a success rate of 0.50, indicating strong adaptability to unseen scenarios [32].

Conclusion
- VLA-Pilot maximizes the utility of existing VLA models at inference time, providing a scalable, data-efficient solution for robotic manipulation [35].
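The inference-time steering loop described above (sample candidate action trajectories from the frozen policy, score them with a verifier, refine the best candidates) can be illustrated in a few lines. This is a minimal sketch, not the paper's implementation: `mllm_reward` is a stub that scores trajectories by endpoint distance to a goal, standing in for the MLLM verifier, and every name, shape, and hyperparameter below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def mllm_reward(trajs, goal):
    # Stand-in for the MLLM verifier: score each trajectory by the
    # negative distance of its endpoint to a goal position (illustrative).
    return -np.linalg.norm(trajs[:, -1, :] - goal, axis=1)

def steer(policy_sample, goal, pop=16, iters=3, sigma=0.05):
    # 1) Draw an initial population of trajectories from the frozen policy.
    trajs = np.stack([policy_sample() for _ in range(pop)])  # (pop, T, dof)
    for _ in range(iters):
        # 2) Score candidates with the verifier and keep the top quarter.
        scores = mllm_reward(trajs, goal)
        elite = trajs[np.argsort(scores)[-pop // 4:]]
        # 3) Mutate the elites back up to the full population size.
        trajs = np.repeat(elite, pop // elite.shape[0], axis=0)
        trajs = trajs + sigma * rng.standard_normal(trajs.shape)
    # 4) Return the best trajectory found.
    return trajs[int(np.argmax(mllm_reward(trajs, goal)))]

# Toy usage: a "policy" that samples noisy straight-line reaches toward a goal.
goal = np.array([0.4, 0.1, 0.3])
def policy():
    return np.linspace(0.0, 1.0, 8)[:, None] * (goal + 0.2 * rng.standard_normal(3))

best = steer(policy, goal)
```

The key property, matching the method's premise, is that the policy itself is never updated; only its samples are selected and perturbed at inference time.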
The grind from complete novice to embodied-AI algorithm engineer
具身智能之心· 2025-11-20 04:02
Core Insights
- The article surveys the evolution and research directions of vision-language-action (VLA) models, vision-language navigation (VLN), and reinforcement learning in robotics, highlighting the importance of these technologies for robot capability and performance [1][2][5][9].

VLA Direction
- VLA systems consist of visual perception processing, language-instruction understanding, and action policy networks, and fall into three paradigms: explicit end-to-end VLA, implicit end-to-end VLA, and hierarchical end-to-end VLA [1][2].
- Explicit end-to-end VLA compresses visual and language information into a joint representation that is mapped directly to the action space, leveraging various architectures and models to achieve good performance [1].
- Implicit end-to-end VLA focuses on interpretability by predicting future states with video diffusion models, improving the potential for scaling VLA models [2].
- Hierarchical end-to-end VLA aims to exploit the strengths of large models to improve generalization while keeping downstream execution efficient [2].

VLN Direction
- VLN systems are composed of vision-language encoders, environment-history representations, and action policies, requiring effective compression of information from visual and language inputs [5][6].
- The choice of encoder, and whether to project visual and language representations into a shared space, are critical design questions; current trends favor models pre-trained on large datasets and the use of large language models (LLMs) for instruction decomposition [6].
- VLN robots face a sequential decision-making task, accumulating historical information to inform future actions, with implicit methods representing past information as latent variables [6].
- Object navigation within VLN emphasizes identifying target objects from category information alone, reducing the need for detailed instructions and strengthening exploration capabilities [7].

Reinforcement Learning & Legged Robots
- Reinforcement learning is crucial for legged robots, covering kinematics, dynamics, multi-modal sensor fusion, and advanced algorithms for task adaptation [9][10].
- Key areas include gait planning, balance control for bipedal robots, and the application of deep reinforcement learning and imitation learning for multi-task training [10].
- Techniques such as domain randomization and safety mechanisms are essential for successful real-world deployment of robotic systems [10].

Diffusion Policy
- The introduction of diffusion models into robotics has led to significant advances, with Diffusion Policy achieving an average performance improvement of 46.9% across simulation environments [21][22].
- The Robotic Diffusion Transformer (RDT), with 1.2 billion parameters, shows strong zero-shot generalization and the ability to learn new skills from minimal examples [22].
- Diffusion policies are expanding beyond robotic manipulation to areas such as autonomous navigation and dexterous grasping, improving task success rates through real-time environmental adaptation [22][23].
- Recent developments include advances in 3D applications and the integration of safety and online reinforcement learning, opening new research avenues [23].
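To make the diffusion-policy mechanism concrete, here is a DDPM-style reverse-sampling loop over an action chunk: start from Gaussian noise and denoise step by step, conditioned on observations. It is a sketch of the sampling mechanics only; the learned noise-prediction network is replaced by an analytic stub (`eps_model`), and all shapes and the noise schedule are assumptions rather than any published model's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear noise schedule over T diffusion steps (DDPM-style).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(a_t, t, obs):
    # Stand-in for the learned, observation-conditioned noise predictor:
    # it treats obs["target"] as the clean action chunk and returns the
    # implied noise (illustrative, not a trained network).
    return (a_t - obs["target"]) / np.sqrt(1.0 - alpha_bars[t])

def sample_action_chunk(obs, horizon=8, dof=7):
    # Reverse diffusion: start from pure noise, denoise step by step.
    a = rng.standard_normal((horizon, dof))
    for t in reversed(range(T)):
        eps = eps_model(a, t, obs)
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            a = a + np.sqrt(betas[t]) * rng.standard_normal(a.shape)
    return a

# With a zero target, the sampled chunk collapses to (near) zero actions.
chunk = sample_action_chunk({"target": np.zeros((8, 7))})
```

Predicting a whole action chunk per sampling pass, rather than one action at a time, is what lets such policies produce smooth, multimodal trajectories.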
The world's first "spatial intelligence" engine! What Apple couldn't pull off, a post-95 PhD has achieved
具身智能之心· 2025-11-20 00:03
Editor: Xinzhiyuan (新智元)

[Introduction] Since 2024, from Apple's Vision Pro pushing "spatial computing" to its peak, to "AI camera" hardware such as the Peloton Guide and Nex Playground beginning to prove viability in niche markets, the opportunity is clearly emerging.

A consensus is quietly forming across the global tech industry: AI-driven motion-sensing interaction is the next wave.

But a fundamental contradiction has been exposed alongside it: the former is expensive, routinely costing thousands of dollars and shutting out ordinary consumers; the latter, while somewhat cheaper, still requires users to buy a dedicated hardware box, never truly escaping the constraints of being a "peripheral".

The market is calling for a lighter, more affordable solution.

Looking at real home-entertainment scenarios, the user experience is deeply fragmented: either an expensive game console that, once the novelty wears off, ends up gathering dust in a corner, or console content that is heavily homogenized and lacks real-time feedback.

This is a market where consumers crave interaction yet are doubly constrained by costly hardware and thin content.

Against this backdrop, a Chinese company named 飞拓星驰 ("FitX") ...
Solving Tesla's "sparse supervision" problem: using world models to amplify autonomous driving's scaling law
具身智能之心· 2025-11-20 00:03
Core Insights
- The article discusses the challenges VLA models face in autonomous driving, particularly the "supervision deficit" caused by sparse supervisory signals relative to the high-dimensional visual input [3][7][8].
- A new research paper, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", proposes introducing world models to provide dense self-supervised signals that strengthen the model's learning [3][9][16].

Group 1: Supervision Deficit
- VLA models suffer from a "supervision deficit": the input is dense visual information but the supervisory signals are sparse, wasting representational capacity [7][8].
- Under sparse supervision, VLA performance saturates quickly as data grows, diminishing the benefits of the data scaling law [8][22].

Group 2: Solution through World Models
- The proposed solution uses world models to construct dense self-supervised training tasks, such as predicting future images, compelling the model to learn the dynamics of the environment [10][14][15].
- This provides far richer learning signals than sparse action supervision alone, effectively addressing the supervision deficit [15][16].

Group 3: Amplification of the Data Scaling Law
- The core contribution is the finding that world models significantly amplify the data scaling law, yielding better performance as data scales up [17][21].
- Experimental results show DriveVLA-W0 outperforms baseline models, with the performance gap widening as data grows from 700K to 70M frames [21][23].

Group 4: Performance and Efficiency
- DriveVLA-W0 is designed to be practical: a lightweight MoE "action expert" architecture addresses VLA latency, cutting inference latency to 63.1% of the baseline VLA [26][27].
- Integrating world models reduced collision rates by 20.4% at 70M frames, a qualitative improvement beyond merely adding action data [24][29].
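The supervision-density argument can be made concrete with a toy objective: pair the sparse action-prediction loss with a reconstruction loss on predicted future frames, so every pixel supplies gradient signal about scene dynamics. A schematic sketch in NumPy; the shapes and the weighting `lam` are invented for illustration, not the paper's code.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def joint_loss(pred_actions, gt_actions, pred_frames, gt_frames, lam=1.0):
    # Sparse supervision: only a handful of action values per sample.
    l_action = mse(pred_actions, gt_actions)
    # Dense self-supervision: every pixel of the predicted future frames
    # contributes a training signal about environment dynamics.
    l_world = mse(pred_frames, gt_frames)
    return l_action + lam * l_world

rng = np.random.default_rng(2)
acts_gt = rng.standard_normal((4, 3))         # e.g. 4 waypoints x (x, y, yaw)
frames_gt = rng.standard_normal((2, 32, 32))  # two future frames
loss = joint_loss(acts_gt + 0.1, acts_gt, 0.9 * frames_gt, frames_gt)
```

The imbalance is visible in the shapes: the action target carries 12 supervised values per sample, while the frame target carries 2,048, which is the sense in which world-model prediction densifies supervision.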
Deploying π0.5 on an embodied robotic arm from scratch!
具身智能之心· 2025-11-20 00:03
Core Viewpoint
- The article discusses the launch of the Imeta-Y1, a lightweight, cost-effective robotic arm designed for beginners and researchers in embodied intelligence, emphasizing its open-source capabilities and user-friendly features [2][3][20].

Group 1: Product Features
- Imeta-Y1 is designed specifically for novices and researchers, providing a low-cost, efficient platform for algorithm validation and project development [3].
- The arm ships with a complete open-source toolchain and code examples, supporting a seamless pipeline from data collection to model deployment [4][20].
- It offers dual-language interfaces (Python and C++) and is compatible with ROS1 and ROS2, allowing users to get started quickly regardless of programming background [4][21].

Group 2: Technical Specifications
- The arm weighs 4.2 kg, carries a rated load of 3 kg, and offers 6 degrees of freedom, a working radius of 612.5 mm, and a repeatability of ±0.1 mm [9][22].
- It runs on a 24 V supply and communicates over CAN, with a compact design suited to embedded AI and robot-learning platform development [6][9].

Group 3: Development and Support
- A comprehensive SDK includes drivers, API interfaces, sample code, and documentation, supporting rapid application development [33].
- Users can leverage the URDF model for real-time interaction between simulators such as Gazebo and the physical device, significantly reducing development risk and debugging cost [25][39].
- The company offers timely after-sales support, committing to responses within 24 hours, with a six-month warranty against non-human damage [51][52].
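To make the "quick start regardless of background" claim concrete, here is a toy joint-space command loop of the kind such an SDK typically exposes. Every class and method name below is invented for illustration; this is not the actual Imeta-Y1 Python API, whose real calls live in the vendor's SDK documentation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JointState:
    positions: List[float]  # six joint angles, radians

class ArmClient:
    """Toy stand-in for an arm SDK client; all names are hypothetical."""
    def __init__(self):
        self.state = JointState(positions=[0.0] * 6)

    def move_joints(self, target, step=0.1):
        # Move each joint toward its target, at most `step` rad per call,
        # mimicking a rate-limited joint command over a bus such as CAN.
        self.state.positions = [
            p + max(-step, min(step, t - p))
            for p, t in zip(self.state.positions, target)
        ]

    def read_state(self):
        return self.state

arm = ArmClient()
target = [0.3, -0.2, 0.1, 0.0, 0.2, -0.1]
for _ in range(5):  # stream commands until the arm settles at the target
    arm.move_joints(target)
```

The same read-state/command-joints loop is what a teleoperation or data-collection script would run at a fixed rate against the real hardware.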
How to build a general-purpose embodied navigation foundation model?
具身智能之心· 2025-11-20 00:03
Core Insights
- The article discusses advances in general navigation models for embodied intelligence, highlighting the transition from task-specific navigation systems to universal models that can handle a variety of tasks and environments [2][5].

Group 1: Navigation Models
- The Uni-NaVid model is a cross-task navigation framework that aims to extend navigation systems beyond single specific tasks [5][6].
- The NavFoM model is a cross-embodiment navigation framework that further broadens the application of navigation algorithms to real-world scenarios, including visual obstacle avoidance and urban micro-mobility [2][5].

Group 2: Applications and Challenges
- Current navigation systems struggle with unstructured, dynamic environments and complex tasks requiring language understanding, which traditional systems cannot adequately address [2][5].
- Navigation foundation models are seen as a pathway to embodied intelligence, broadening navigation algorithms from specialized capabilities to general intelligent mobility [2][5].

Group 3: Event Details
- A live session featuring Zhang Jiazhao, a PhD student at Peking University, will take place on November 20 from 19:30 to 20:30, focusing on general navigation models [5][6].
- The session will cover applications of the navigation models, including TrackVLA++, UrbanVLA, and MM-Nav, showcasing their practical implementations [6].
Simple to adapt, highly efficient! U-Arm: your universal embodied teleoperation arm is here
具身智能之心· 2025-11-19 10:00
Group 1
- The core concept of U-Arm is to address the pain points of traditional teleoperation devices (high cost, low efficiency, compatibility issues) with a high-performance, cost-effective, open-source solution [1][4].
- U-Arm's core advantages span four dimensions: stability, universality, cost-effectiveness, and openness [2][5].

Group 2
- U-Arm is designed specifically for embodied-intelligence research and multi-scenario teleoperation needs, breaking through the limitations of traditional devices [4].
- The device features a dual-axis fixed-joint design for stability, a lightweight yet impact-resistant body made of 4 mm thick resin, and compatibility with 95% of commercial robotic arms [7][8].

Group 3
- U-Arm significantly reduces the up-front investment required for teleoperation, priced at only 1,999 yuan per unit, compared to traditional devices that can cost tens of thousands of dollars [17][18].
- The product includes a complete set of accessories, with no hidden costs for users [8].

Group 4
- U-Arm's modular design allows easy adaptation to various robotic arms, eliminating the need for a separate teleoperation device per arm [10][15].
- Three core configurations cover a wide range of mainstream robotic arms, ensuring compatibility and ease of use [11][12].

Group 5
- U-Arm improves data-collection efficiency by 39% over traditional methods, providing high-quality data for model training [11].
- Its open-source nature allows customization and supports educational practice, making it suitable for research and teaching [8][17].

Group 6
- The assembly process is straightforward, requiring no specialized technical skills, with clear steps for setup and initial operation [25][27].
- U-Arm offers a user-friendly experience, allowing beginners to quickly learn and operate the device with the available examples and resources [28].

Group 7
- U-Arm is covered by a product-quality warranty, allowing returns within two weeks for structural issues [29].
A quick survey: which embodied-AI direction do you most want us to cover?
具身智能之心· 2025-11-19 04:01
Group 1
- The company is preparing a comprehensive research report on the embodied-AI industry, expected to be released in the first quarter of next year [1].
- The report will cover financing, industry trends, policies, algorithms, deployment, and exports across embodied-AI companies [1].
- The company is conducting a survey to learn which topics readers are most interested in, with multiple selections allowed [2].

Group 2
- Key focus areas in the survey include: domestic and international embodied-AI industry conditions; financing and business status of embodied companies; embodied data collection; algorithm optimization and deployment; edge chips for robotics; downstream industry development; talent structure and demand in the embodied industry; and guidance for the listing of embodied companies [4].
The world's first mass-production rope-driven AI robot company closes an A++ round worth hundreds of millions of yuan!
具身智能之心· 2025-11-19 00:34
Core Viewpoint
- Astribot, a leading company in rope-driven AI robotics, has completed multiple rounds of financing to strengthen its R&D capabilities and commercialize its technology [2][4].

Financing and Investment
- Astribot closed a significant A++ round worth hundreds of millions of yuan, led by Guoke Investment and Ant Group, with participation from various financial institutions and industry capital [2].
- Existing shareholders, including Ant Group and Jinqiu Fund, continued to invest, signaling strong confidence in its growth potential [2].

Product and Technology
- Astribot is recognized as the first company worldwide to mass-produce rope-driven AI robots, using a unique design that mimics human tendon movement to achieve high flexibility and precision [4][6].
- The company has built a comprehensive embodied-intelligence platform integrating top-tier robotics, teleoperation, and efficient modeling [6].

Market Deployment and Applications
- Astribot has secured thousands of orders across high-value sectors, including research, cultural tourism, commercial services, and industrial logistics [9].
- In collaboration with Jinma Leisure, the company launched a new generation of cultural-tourism robots, a significant step in commercializing humanoid robots for entertainment [9][10].

Strategic Partnerships and Collaborations
- Partnerships with major industry players such as ByteDance, Tencent, and JD.com are accelerating the deployment of Astribot's rope-driven AI robots across multiple scenarios [10].
- The company is building an open research ecosystem, collaborating with top-tier institutions such as MIT and Tsinghua University to validate and advance its embodied-intelligence technology [10].

Leadership and Vision
- Astribot's CEO emphasizes developing AI and robotics through software-hardware co-design, aiming to bring AI robots into real-world applications [14].
- Investors express strong confidence in Astribot's unique capabilities and the potential of its rope-driven technology to reshape the robotics industry [14].