具身智能之心
The 具身智能之心 sim2real Discussion Group Is Here!
具身智能之心· 2025-06-28 07:58
Group 1
- The article introduces a new discussion group focused on sim2real and sim2real2sim technologies, particularly in the fields of robotic arms, dual arms, quadrupeds, and humanoid robots [1]
- The group aims to facilitate communication and sharing among industry professionals interested in these technologies [1]
- The article emphasizes that the group will not allow any advertising or promotional content, ensuring a focused discussion environment [1]
Tsinghua Post-90s PhD's Kitchen Robot Raises Tens of Millions, Wins Beijing's First Embodied-Intelligence Catering License
具身智能之心· 2025-06-28 07:48
Core Viewpoint
- The article highlights the successful completion of a multi-million-yuan Pre-A financing round for Xiangke Intelligent, which focuses on kitchen service robots, particularly the LAVA robot, which has received significant market recognition and operational success [2][10]

Company Overview
- Xiangke Intelligent was founded by Chen Zhen, a serial entrepreneur with a strong academic background in computer science from prestigious institutions [3][4]
- The company aims to leverage its expertise in robotics and artificial intelligence to automate kitchen operations, particularly in the fast-food sector [12]

Product Development
- The LAVA robot has achieved notable operational milestones, including processing a peak of 1,732 orders in a single day and maintaining continuous operation for 190 days without faults [8]
- The robot can autonomously identify ingredients, determine cooking times, and learn new recipes, showcasing advanced capabilities in automation [8]

Market Strategy
- Xiangke Intelligent plans to scale up production and deployment of the LAVA robot, with existing orders for a thousand units from overseas chain clients [10]
- The company is focusing on the Western fast-food market due to its higher standardization and automation potential compared to more complex cuisines like Chinese food [12]

Investment and Partnerships
- The recent financing round attracted a prestigious group of investors, including Century Changhe Technology Group and NetDragon Tianying Venture Capital, indicating strong industry support [13][14]
- Xiangke Intelligent has established partnerships with academic institutions, such as Tsinghua University's Pearl River Delta Research Institute, to enhance its technological capabilities [15][18]

Entrepreneurial Journey
- Chen Zhen's entrepreneurial journey includes founding Sukan Technology, which was acquired by Joyoung, before establishing Xiangke Intelligent, reflecting a strategic approach to key industry developments [4][18]
- The core team comprises experienced professionals from previous ventures, ensuring a strong foundation in robotics and AI [18]
Data, Algorithms, and Hardware: Beginners Can Hardly Skip Any of Them...
具身智能之心· 2025-06-28 07:48
Hardware: well-funded labs can afford robot platforms in the 200k-300k RMB range, while students on a tighter budget rely on 3D-printing their own robotic arms or buying cost-effective hardware platforms, or even work entirely in simulation, which constrains their research. Our embodied-AI community provides substantial material on these three major modules, covering data-collection solutions, hardware platforms, simulation, and algorithms, and also recommends several cost-effective robotic-arm platforms to support research. The community's goal is to build a gathering place of ten thousand members within three years, and we warmly welcome outstanding students to join us (many researchers at the frontier of embodied AI already have)! We have built complete bridges with multiple embodied-AI companies spanning academia, products, and recruitment, and our education-and-research section has largely closed the loop internally (courses + hardware + Q&A). The community also surfaces many of the latest industry perspectives and technical write-ups. What does today's hardware look like? What are its shortcomings? How can the success rate and yield of data collection be improved? How can sim2real be done more effectively? These are the questions we keep tracking. Getting started in embodied AI requires three elements: data + algorithms + hardware. Honestly, many students understand only the algorithms, and often only vaguely! Data collection in particular takes experience with teleoperation and retargeting pipelines, and many people cannot collect genuinely useful data. Hardware is even further out of reach for many students, so cost-effective platforms and simulation are the natural first step. Data: teleoperated collection depends on the hardware and is costly, but preprocessing ...
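The retargeting step mentioned above maps recorded human motion onto a robot whose joints have different ranges. A minimal toy sketch of joint-angle retargeting, where the joint names and limits are illustrative assumptions rather than any community's actual pipeline:

```python
# Toy joint-angle retargeting: linearly rescale human joint angles into a
# robot's joint ranges. Names and limits are illustrative assumptions only.

HUMAN_LIMITS = {"shoulder": (-3.0, 3.0), "elbow": (0.0, 2.5)}
ROBOT_LIMITS = {"shoulder": (-1.5, 1.5), "elbow": (0.0, 2.0)}

def retarget(human_pose: dict) -> dict:
    """Map each human joint angle into the corresponding robot joint range."""
    robot_pose = {}
    for joint, angle in human_pose.items():
        h_lo, h_hi = HUMAN_LIMITS[joint]
        r_lo, r_hi = ROBOT_LIMITS[joint]
        # Normalize to [0, 1] within the human range, clamp, then rescale.
        t = (angle - h_lo) / (h_hi - h_lo)
        t = min(max(t, 0.0), 1.0)
        robot_pose[joint] = r_lo + t * (r_hi - r_lo)
    return robot_pose

print(retarget({"shoulder": 3.0, "elbow": 1.25}))
# → {'shoulder': 1.5, 'elbow': 1.0}
```

Real pipelines retarget in task space (end-effector poses) with inverse kinematics rather than per-joint scaling; this sketch only shows the range-mismatch problem that makes retargeting necessary.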
Beihang × NUS × SJTU Release RoboCerebra: A New Benchmark for Long-Horizon Robotic Manipulation Reasoning
具身智能之心· 2025-06-28 07:48
Core Insights
- The article discusses the development of RoboCerebra, a new benchmark designed to evaluate long-horizon robotic manipulation tasks, emphasizing the need for collaboration between high-level planning (VLM) and low-level control (VLA) models [6][8][10]

Group 1: Background and Motivation
- Recent advancements in visual-language models (VLM) have enabled robots to execute commands based on natural language, but as tasks become more complex, a dual system involving both a "brain" (VLM) for planning and a "controller" (VLA) for execution is necessary [6][7]
- Existing benchmarks often fail to assess the collaborative capabilities of these systems, leading to the creation of RoboCerebra to evaluate long-term planning and memory management [8]

Group 2: RoboCerebra Contributions
- RoboCerebra includes a large-scale dataset and a systematic benchmark for assessing cognitive challenges related to planning, memory, and reflection in robotic tasks [10]
- The dataset construction process integrates automated generation and manual annotation to ensure high quality and scalability [10]

Group 3: Task Setting
- The benchmark features long task sequences averaging 2,972 steps, with dynamic disturbances introduced to challenge the models' planning and recovery abilities [13]
- A top-down data generation pipeline utilizes GPT to create high-level tasks, which are then broken down into sub-goals and verified for feasibility [13][14]

Group 4: Evaluation Protocol and Metrics
- RoboCerebra employs a four-dimensional evaluation system that includes success rate, plan match accuracy, plan efficiency, and action completion accuracy to assess the collaboration between VLM and VLA [15][21]
- The framework introduces anchor points to synchronize evaluation across different models, ensuring consistency in task execution [21]

Group 5: Experimental Results
- The hierarchical framework demonstrates that the collaboration between VLM and VLA significantly improves task success rates, particularly in memory execution scenarios, with improvements exceeding 70% [27]
- The results indicate that neither the VLA nor the VLM alone can effectively handle long-horizon tasks, highlighting the necessity of their integration [27][28]

Group 6: Model Evaluation
- GPT-4o outperforms other models in planning accuracy, task success rate, and plan efficiency, underscoring the importance of strong language reasoning capabilities in executing long-term tasks [30]
- In memory-related tasks, GPT-4o shows superior exploration and execution decision-making abilities compared to other models, indicating its robustness in understanding scenes and recalling memories [31]
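The four evaluation dimensions described above can be sketched as simple aggregates over per-episode logs. A hedged toy sketch, where the field names and formulas are my assumptions and not RoboCerebra's exact definitions:

```python
# Toy aggregation of four benchmark metrics over per-episode logs.
# Field names and formulas are illustrative assumptions, not
# RoboCerebra's actual definitions.

def summarize(episodes: list) -> dict:
    n = len(episodes)
    # Success rate: fraction of episodes that reach the final goal.
    success_rate = sum(e["success"] for e in episodes) / n
    # Plan match: fraction of predicted sub-goals matching the reference plan.
    plan_match = sum(e["matched_subgoals"] / e["total_subgoals"]
                     for e in episodes) / n
    # Plan efficiency: reference steps over executed steps, capped at 1.
    efficiency = sum(min(1.0, e["ref_steps"] / e["steps"])
                     for e in episodes) / n
    # Action completion: fraction of low-level actions executed correctly.
    completion = sum(e["completed_actions"] / e["total_actions"]
                     for e in episodes) / n
    return {"success_rate": success_rate, "plan_match": plan_match,
            "plan_efficiency": efficiency, "action_completion": completion}

log = [{"success": True, "matched_subgoals": 3, "total_subgoals": 4,
        "ref_steps": 100, "steps": 200,
        "completed_actions": 8, "total_actions": 10}]
print(summarize(log))
```

Reporting several complementary dimensions like this, rather than success rate alone, is what lets a benchmark separate planning failures from execution failures.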
Embodied-AI Autumn Recruitment Is About to Start: Where Can Job Seekers Band Together?
具身智能之心· 2025-06-28 07:48
Core Viewpoint
- The article emphasizes the rapid advancements in AI technologies, particularly in autonomous driving and embodied intelligence, which have significantly influenced the industry and investment landscape [1]

Group 1: AutoRobo Knowledge Community
- AutoRobo Knowledge Community is established as a platform for job seekers in the fields of autonomous driving, embodied intelligence, and robotics, currently hosting nearly 1,000 members from various companies [2]
- The community provides resources such as interview questions, industry reports, salary negotiation tips, and resume optimization services to assist members in their job search [2][3]

Group 2: Recruitment Information
- The community regularly shares job openings in algorithms, development, and product roles, including positions for campus recruitment, social recruitment, and internships [3][4]

Group 3: Interview Preparation
- A compilation of 100 interview questions related to autonomous driving and embodied intelligence is available, covering essential topics for job seekers [6]
- Specific areas of focus include sensor fusion, lane detection algorithms, and multi-modal 3D object detection, among others [7][12]

Group 4: Industry Reports
- The community offers access to various industry reports that provide insights into the current state, development trends, and market opportunities within the autonomous driving and embodied intelligence sectors [13][14]
- Reports include analyses of successful and failed interview experiences, which can serve as valuable learning tools for candidates [15]

Group 5: Salary Negotiation and Professional Development
- The community provides resources on salary negotiation techniques and shares foundational books related to robotics, autonomous driving, and AI to enhance members' professional knowledge [17][18]
How Should You Develop Your First Embodied-AI Paper?
具身智能之心· 2025-06-27 09:41
Core Viewpoint
- The article promotes a comprehensive tutoring service for students facing challenges in research paper writing, particularly in cutting-edge fields such as multimodal large models, embodied intelligence, and robotics [2][3][4]

Group 1: Tutoring Services Offered
- The service includes one-on-one customized guidance in various advanced research areas, including multimodal large models, visual-language navigation, and robot navigation [3][4]
- The tutoring team consists of PhD researchers from prestigious institutions like CMU, Stanford, and MIT, with experience in top-tier conference reviews [4]
- The tutoring process covers the entire research paper lifecycle, from topic selection to experimental design, coding, writing, and submission strategies [4]

Group 2: Target Audience and Benefits
- The service targets students struggling with research topics, data modeling, and feedback from advisors, offering a solution to enhance their academic performance [2][5]
- The first 50 students to consult can receive free matching with a dedicated tutor for in-depth analysis and tailored advice on conference and journal submissions [5]
- The focus is not only on publishing papers but also on the practical application and value of research outcomes in industrial and academic contexts [4]
An Incomplete Roundup of ICCV 2025 (Embodied AI / Autonomous Driving / 3D Vision / LLM / CV, etc.)
具身智能之心· 2025-06-27 09:41
Group 1
- The article discusses the recent announcements from ICCV 2025, highlighting various works that have been accepted for presentation [1]
- It emphasizes the importance of the "Embodied Intelligence" community in sharing insights and developments related to the accepted works [1]
- The article encourages readers to join the community for timely updates on ongoing research and developments in the field [1]

Group 2
- Several works related to embodied intelligence and autonomous driving are summarized, showcasing advancements in areas such as robotic manipulation and navigation [4][6]
- The article lists various projects, including "GaussianProperty" and "DriveArena," which focus on integrating physical properties and generative simulation for autonomous driving [4]
- It also mentions works on 3D reconstruction and visual recognition, indicating a broad range of research topics being explored [6][5]
Top Robotics Conference RSS 2025 Awards Announced!
具身智能之心· 2025-06-27 08:36
Core Insights
- The article discusses the recent awards announced at the Robotics: Science and Systems (RSS) conference, highlighting significant advancements in robotics research and technology [2][3]

Group 1: Award Highlights
- The conference took place from June 21 to 25 in Los Angeles, USA, and recognized multiple outstanding papers with various awards [3]
- The Outstanding Demo Paper Award went to "Demonstrating MuJoCo Playground," which presents an open-source robot learning framework aimed at simplifying simulation environment setup and model training [6][7]
- The Outstanding Systems Paper Award went to "Building Rome with Convex Optimization," which introduces a new formulation for lifting 2D keypoint measurements to 3D and demonstrates improved reconstruction quality and speed [11][13][15]
- The Outstanding Student Paper Award recognized a paper proposing a new multi-agent reinforcement learning algorithm, Def-MARL, which ensures safety in collaborative tasks among robots [17][19][23]
- The Outstanding Paper Award went to "FEAST: A Flexible Mealtime-Assistance System Tackling In-the-Wild Personalization," which addresses the challenges of personalizing robotic feeding assistance in real-world environments [28][30][31]

Group 2: Research Contributions
- The MuJoCo Playground framework supports various robot platforms, enabling zero-shot simulation-to-reality transfer based on state observations or pixel-level inputs [6][7]
- The SBA (scaled bundle adjustment) formulation proposed in the Outstanding Systems Paper enhances the accuracy of 3D reconstructions from 2D measurements [13]
- The Def-MARL algorithm focuses on minimizing global costs while maintaining safety constraints, demonstrating superior performance in simulations and real-world experiments [19][23]
- The FEAST system emphasizes adaptability, transparency, and safety, utilizing modular hardware and diverse interaction methods to cater to individual user needs [30][31]
A Hand-Holding Guide to Embodied Intelligence in Practice: From Zero to Reinforcement Learning and Sim2Real
具身智能之心· 2025-06-27 08:36
Core Viewpoint
- The article discusses the unprecedented turning point in AI development, highlighting the rise of embodied intelligence and its potential to revolutionize various industries, including manufacturing, healthcare, and space exploration [1]

Group 1: Embodied Intelligence
- Embodied intelligence is defined as AI systems that not only possess a "brain" but also have the capability to perceive and interact with the physical world [1]
- Major tech companies like Tesla, Boston Dynamics, OpenAI, and Google are actively investing in this transformative field [1]

Group 2: Technical Challenges
- Achieving true embodied intelligence presents significant technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2]

Group 3: MuJoCo's Role
- MuJoCo (Multi-Joint dynamics with Contact) is identified as a critical technology for embodied intelligence, serving as a high-fidelity training environment for robot learning [4]
- It allows researchers to conduct millions of trials in a virtual environment, significantly speeding up the learning process and reducing costs associated with physical hardware [6]

Group 4: MuJoCo's Advantages
- MuJoCo features advanced contact dynamics algorithms, supports parallel computation, and provides a variety of sensor models, making it a standard tool in both academia and industry [6][7]
- Major tech companies utilize MuJoCo for their robot research, indicating its importance in the field [7]

Group 5: Practical Training
- A comprehensive MuJoCo development course is offered, focusing on practical applications and theoretical foundations, covering topics from physical simulation to deep reinforcement learning [8][9]
- The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a solid grasp of embodied intelligence technologies [10][12]

Group 6: Project Examples
- The course includes projects such as intelligent robotic arm control, vision-guided grasping systems, and multi-robot collaboration, allowing participants to apply their knowledge in real-world scenarios [14][21]

Group 7: Target Audience and Outcomes
- The course is suitable for individuals with programming or algorithm backgrounds looking to enter the field of embodied robotics, as well as graduate and undergraduate students focused on robotics and reinforcement learning [27]
- Upon completion, participants will have a complete skill set in embodied intelligence, including technical, engineering, and innovative capabilities [28]
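A core Sim2Real trick taught in this kind of curriculum is domain randomization: training across many simulated "worlds" with perturbed physics so a policy cannot overfit one exact simulator. A minimal pure-Python sketch of the idea using a toy 1-D point mass instead of MuJoCo itself; all physics constants and ranges are illustrative assumptions:

```python
import random

# Toy domain randomization for sim2real: run episodes under randomly
# perturbed dynamics so learned behavior must tolerate model mismatch.
# The damping model, dt, and friction range are illustrative assumptions.

def rollout(friction: float, push: float = 1.0, steps: int = 50) -> float:
    """Step a 1-D point mass with velocity damping; return final position."""
    pos, vel = 0.0, push
    for _ in range(steps):
        vel *= (1.0 - friction)  # simple per-step damping
        pos += vel * 0.02        # dt = 20 ms
    return pos

def randomized_rollouts(n: int, seed: int = 0) -> list:
    """Sample a fresh friction coefficient per episode around a nominal value."""
    rng = random.Random(seed)
    return [rollout(friction=rng.uniform(0.02, 0.10)) for _ in range(n)]

results = randomized_rollouts(5)
print(results)  # five final positions, one per randomized world
```

In a real MuJoCo pipeline the same pattern applies, except the randomized quantities are model parameters (friction, mass, actuator gains) and the rollout is the full physics step.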
Latest Survey from Tsinghua University! Multi-Sensor Fusion Perception in Embodied AI: Background, Methods, Challenges
具身智能之心· 2025-06-27 08:36
Core Insights
- The article emphasizes the significance of embodied AI and multi-sensor fusion perception (MSFP) as a critical pathway to achieving general artificial intelligence (AGI) through real-time environmental perception and autonomous decision-making [3][4]

Group 1: Importance of Embodied AI and Multi-Sensor Fusion
- Embodied AI represents a form of intelligence that operates through physical entities, enabling autonomous decision-making and action capabilities in dynamic environments, with applications in autonomous driving and robotic swarm intelligence [3]
- Multi-sensor fusion is essential for robust perception and accurate decision-making in embodied AI systems, integrating data from various sensors like cameras, LiDAR, and radar to achieve comprehensive environmental awareness [3][4]

Group 2: Limitations of Current Research
- Existing AI-based MSFP methods have shown success in fields like autonomous driving but face inherent challenges in embodied AI applications, such as the heterogeneity of cross-modal data and temporal asynchrony between different sensors [4][7]
- Current reviews often focus on single tasks or research areas, limiting their applicability to researchers in related fields [7][8]

Group 3: Structure and Contributions of the Research
- The article organizes MSFP research from various technical perspectives, covering different perception tasks, sensor data types, popular datasets, and evaluation standards [8]
- It reviews point-level, voxel-level, region-level, and multi-level fusion methods, focusing on collaborative perception among multiple embodied agents and infrastructure [8][21]

Group 4: Sensor Data and Datasets
- Various sensor types are discussed, including camera data, LiDAR, and radar, each with unique advantages and challenges in environmental perception [10][12]
- The article presents several datasets used in MSFP research, such as KITTI, nuScenes, and Waymo Open, detailing their modalities, scenarios, and the number of frames [12][13][14]

Group 5: Perception Tasks
- Key perception tasks include object detection, semantic segmentation, depth estimation, and occupancy prediction, each contributing to the overall understanding of the environment [16][17]

Group 6: Multi-Modal Fusion Methods
- The article categorizes multi-modal fusion methods into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques to enhance perception robustness [21][22][23][24][28]

Group 7: Multi-Agent Fusion Methods
- Collaborative perception techniques are highlighted as essential for integrating data from multiple agents and infrastructure, addressing challenges like occlusion and sensor failures [35][36]

Group 8: Time Series Fusion
- Time series fusion is identified as a key component of MSFP systems, enhancing perception continuity across time and space through various query-based fusion methods [38][39]

Group 9: Multi-Modal Large Language Model (LLM) Fusion
- The integration of multi-modal data with LLMs is explored, showcasing advancements in tasks like image description and cross-modal retrieval, with new datasets designed to enhance embodied AI capabilities [47][50]
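Point-level fusion, the first category above, typically projects each LiDAR point into the camera image and attaches the image feature found at that pixel. A minimal pinhole-projection sketch; the intrinsics, image contents, and the use of a raw pixel value as the "feature" are illustrative assumptions:

```python
# Toy point-level camera-LiDAR fusion: project 3-D points (assumed already
# transformed into the camera frame) with a pinhole model and attach the
# pixel value at each projection. All constants are illustrative assumptions.

FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0   # pinhole intrinsics
WIDTH, HEIGHT = 640, 480                      # image size

def project(point):
    """Return integer pixel (u, v) for a 3-D point, or None if not visible."""
    x, y, z = point
    if z <= 0.0:                 # behind the camera
        return None
    u = int(FX * x / z + CX)
    v = int(FY * y / z + CY)
    if 0 <= u < WIDTH and 0 <= v < HEIGHT:
        return (u, v)
    return None                  # projects outside the image

def fuse(points, image):
    """Pair each visible 3-D point with the image value at its projection."""
    fused = []
    for p in points:
        uv = project(p)
        if uv is not None:
            u, v = uv
            fused.append((p, image[v][u]))
    return fused

# Flat gray image; one point in front of the camera, one behind it.
image = [[128] * WIDTH for _ in range(HEIGHT)]
print(fuse([(0.0, 0.0, 2.0), (0.0, 0.0, -1.0)], image))
```

Real pipelines additionally apply the LiDAR-to-camera extrinsic transform before projection and sample learned CNN features rather than raw pixels, but the visibility filtering shown here is the same.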