具身智能之心
AI Lab Releases the "Intern-Robotics" (『书生』) Embodied Full-Stack Engine, Pushing Robot Brains into the Era of Mass Production
具身智能之心· 2025-07-28 13:19
Core Viewpoint
- Shanghai AI Laboratory has launched the "Intern-Robotics" embodied full-stack engine, addressing key challenges in the embodied intelligence sector and promoting a shift from fragmented development to full-stack mass production [3][4][9].

Group 1: Technological Innovations
- Intern-Robotics integrates virtual simulation modeling, real-virtual data connectivity, and training-testing integration, creating a comprehensive solution covering the entire embodied-intelligence chain from data collection to application [4][10].
- The engine allows a single model to adapt to more than 10 robot morphologies, significantly improving the efficiency of model training and deployment across different robot types [6][9].
- Data collection costs have been reduced to 0.06% of previous solutions, thanks to the integration of real-machine data and virtually synthesized data [6][10].

Group 2: Addressing Industry Challenges
- The embodied intelligence field faces three main bottlenecks: lack of unified standards, high data costs, and long R&D cycles; Intern-Robotics provides systematic solutions to each [9][10].
- The engine supports six major tasks and more than 20 datasets, enabling efficient training and evaluation and significantly shortening the development cycle [10][11].

Group 3: Collaborative Initiatives
- The "Embodied Intelligence Photosynthesis Plan" has been launched to empower training centers, robotics companies, and developer communities, fostering innovation and technology breakthroughs [5][20].
- The plan has already attracted 15 organizations, including leading robotics companies, to collaborate on development and training with Intern-Robotics [5][20].

Group 4: Engine Components
- Intern-Robotics consists of three core engines: simulation, data, and training-testing, which together cover the full-stack production needs of embodied intelligence [11][14].
- The simulation engine allows scenarios, robots, and evaluation metrics to be swapped easily, significantly lowering the learning curve for developers (a hypothetical config sketch follows this summary) [13][14].
- The data engine combines physical simulation and generative AI to produce high-quality, low-cost data, enhancing the diversity and quality of training datasets [14][15].
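The claim that the simulation engine lets developers swap scenarios, robots, and evaluation metrics without rewriting their pipeline is easiest to picture as a config-driven benchmark. The sketch below is a hypothetical illustration only; the class names, fields, and `run_benchmark` helper are assumptions, not the actual Intern-Robotics API.

```python
# Hypothetical sketch of a config-driven simulation benchmark.
# None of these names come from Intern-Robotics; they only illustrate
# how a full-stack engine could decouple scenario, embodiment, and metric.
from dataclasses import dataclass, field

@dataclass
class BenchmarkConfig:
    scenario: str                                   # e.g. "kitchen_pick_and_place"
    robot: str                                      # e.g. "humanoid_h1", "quadruped_go2"
    metrics: list = field(default_factory=lambda: ["success_rate"])
    episodes: int = 50

def run_benchmark(cfg: BenchmarkConfig) -> dict:
    """Pretend evaluation loop: load scenario, spawn robot, score episodes."""
    results = {m: 0.0 for m in cfg.metrics}
    # ... a real engine would step a simulator here ...
    return results

# Switching embodiment or metric is a one-line change to the config, not the pipeline.
cfg = BenchmarkConfig(scenario="tabletop_sorting", robot="dual_arm_ur5",
                      metrics=["success_rate", "time_to_completion"])
print(run_benchmark(cfg))
```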
Can't Find the Right Company or Position? The 具身智能之心 Job-Seeking Exchange Group Is Here!!!
具身智能之心· 2025-07-28 07:14
Group 1
- The article announces the formal launch of a job-seeking community focused on the embodied intelligence industry, in response to requests from followers [1]
- The community will primarily discuss topics related to the embodied intelligence industry, including companies, product development, job seeking, and career transitions [1]
- Individuals interested in networking with industry peers and staying current on the industry are encouraged to join [1]
The Development Direction of Embodied Intelligence as Seen at This Year's WAIC25!
具身智能之心· 2025-07-28 07:14
Core Insights
- The article surveys the development direction of embodied intelligence showcased at the World Artificial Intelligence Conference (WAIC) 2025, with a particular focus on embodied intelligence and autonomous driving, noting a significant increase in the number of participating companies and a wider variety of product forms [1][8].

Group 1: Embodied Intelligence Developments
- The event featured various mobile-manipulation applications, including service and industrial robots, although recognition remains challenging when humans intervene in the scene [3].
- Companies such as Lingxin and Aoyi Technology showcased their dexterous hands, indicating healthy overall shipments and increasing standardization of tactile and force-control solutions [7].
- Many humanoid robots were demonstrated under remote control, while claimed autonomous navigation and decision-making capabilities still lack stability [8].

Group 2: Industry Trends and Community Engagement
- A transition from demo showcases to a more integrated industrial model was observed, with companies building a full-stack pipeline from data to policy to system deployment and stepping up commercialization [8].
- The article introduces the "Embodied Intelligence Heart Knowledge Planet," a community aimed at facilitating technical exchange among nearly 200 companies and institutions in the field [10][20].
- The community offers resources such as technical roadmaps, open-source projects, and job sharing, catering to both newcomers and experienced researchers in embodied intelligence [15][19][21].

Group 3: Educational and Research Resources
- The community has compiled more than 30 technical roadmaps and various learning and research resources for embodied intelligence, including datasets and simulation platforms [21][22].
- Regular discussions and roundtables are organized to address common questions and share insights on the latest advances in the field [23][24].

Group 4: Job Opportunities and Networking
- The community provides job recommendations and networking opportunities, connecting members with industry leaders and potential employers [24][19].
- Members can freely ask questions about career choices and research directions, fostering a supportive environment for professional growth [77].
We're Preparing to Expand the Embodied Intelligence Team and Recruiting People to Build Something Together.......
具身智能之心· 2025-07-28 07:14
Core Viewpoint
- The rapid development of embodied intelligence is acknowledged, with several leading companies preparing for IPOs, underscoring the importance of collaboration and communication within the industry [1]

Group 1: Collaboration and Industry Development
- The company encourages active communication among industry players to break down technological silos and foster overall industry growth [1]
- A platform is being built to gather talent from across the industry, with the aim of inviting influential figures to help advance the sector [1]

Group 2: Project Collaboration
- The company is establishing project research teams in major cities including Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou, and Wuhan, with opportunities for part-time involvement [3]
- Each city aims to recruit around 10 people with more than 2 years of experience in embodied algorithms and robotics research [4]

Group 3: Education and Consulting Services
- The company invites industry experts to create online courses and consulting services in embodied intelligence [5]
- Areas of interest include large models, multi-modal models, reinforcement learning, and robot motion planning, among others [5][6]

Group 4: Compensation and Recruitment
- The company offers substantial profit sharing and industry-wide resource sharing, with options for both part-time and full-time positions [7]
- Candidates with a PhD or equivalent industry experience are preferred [6]
Tsinghua University's Survey of Multi-Sensor Fusion Perception for Embodied Intelligence
具身智能之心· 2025-07-27 09:37
Group 1
- The core viewpoint of the article emphasizes the significance of multi-sensor fusion perception (MSFP) in embodied AI, highlighting its role in enhancing perception capabilities and decision-making accuracy [5][6][66]
- Embodied AI is defined as a form of intelligence that uses physical entities as carriers to achieve autonomous decision-making and action in dynamic environments, with applications in autonomous driving and robot swarms [6][7]
- Multi-sensor fusion is necessary because different sensors perform differently under different environmental conditions; combining them yields more robust perception and more accurate decisions [7][8]

Group 2
- The article notes the limitations of current research: existing surveys often focus on a single task or field, making it difficult for researchers working on related tasks to benefit [12][13]
- It identifies challenges at the data, model, and application levels, including data heterogeneity, temporal asynchrony, and sensor failures [12][66]
- The article catalogs the main sensor data types, including camera, LiDAR, and mmWave radar data, detailing their characteristics and limitations [11][13]

Group 3
- Multi-modal fusion methods are highlighted as a key research area, aiming to integrate data from different sensors to reduce perception blind spots and achieve comprehensive environmental awareness [19][20]
- The article categorizes fusion methods into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques and applications (a point-level fusion sketch follows this summary) [21][29]
- Multi-agent fusion methods are discussed, emphasizing the advantages of collaborative perception among multiple agents for robustness and accuracy in complex environments [33][36]

Group 4
- Time-series fusion is identified as a critical component of MSFP systems, enhancing perception continuity and spatiotemporal consistency by integrating multi-frame data [49][51]
- The article introduces query-based time-series fusion methods, which have become mainstream with the rise of transformer architectures in computer vision [53][54]
- Multi-modal large language models (MM-LLMs) are explored for their role in processing and integrating data from various sources, although challenges remain in practical application [58][59]

Group 5
- The article concludes by addressing the challenges facing MSFP systems, including data quality, model fusion strategies, and real-world adaptability [76][77]
- Future work is suggested to focus on high-quality datasets, effective fusion strategies, and adaptive algorithms that improve MSFP performance in dynamic environments [77][68]
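Among the fusion granularities listed above, point-level fusion is the easiest to make concrete: project each LiDAR point into the camera image and attach the pixel's features to it (the idea behind methods such as PointPainting). The sketch below is a generic NumPy illustration under assumed intrinsics and extrinsics; it is not code from the survey.

```python
import numpy as np

def paint_lidar_points(points_xyz, image_feats, K, T_cam_from_lidar):
    """Point-level fusion sketch: attach per-pixel image features to LiDAR points.

    points_xyz:       (N, 3) points in the LiDAR frame
    image_feats:      (H, W, C) per-pixel features, e.g. semantic class scores
    K:                (3, 3) camera intrinsic matrix
    T_cam_from_lidar: (4, 4) rigid transform from LiDAR frame to camera frame
    Returns an (M, 3 + C) array of the points that project inside the image,
    each concatenated with the image features at its projected pixel.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])      # homogeneous coordinates
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]           # points in the camera frame
    in_front = cam[:, 2] > 0.1                           # discard points behind the camera
    cam = cam[in_front]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                          # perspective division to pixels
    h, w = image_feats.shape[:2]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return np.hstack([points_xyz[in_front][valid], image_feats[v[valid], u[valid]]])

# Tiny usage example with synthetic data.
pts = np.random.uniform(-5, 5, (1000, 3))
feats = np.random.rand(480, 640, 8)                      # e.g. 8-class semantic scores
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], float)
painted = paint_lidar_points(pts, feats, K, np.eye(4))
print(painted.shape)                                     # (M, 11)
```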
HKUST and Collaborators Propose LOVON: A New Paradigm for Open-World, Full-Domain Target Tracking with Legged Robots!
具身智能之心· 2025-07-27 09:37
Core Viewpoint
- The article introduces the LOVON framework, which integrates large language models, open-vocabulary visual detection, and precise language-motion mapping to enhance the navigation capabilities of legged robots in dynamic, unstructured environments [4][6][23].

Group 1: LOVON Framework Overview
- LOVON addresses long-range, multi-target navigation for legged robots in complex environments, overcoming the limitations of traditional methods that struggle with real-time visual disturbances and target loss [3][6].
- The framework combines the task-planning capabilities of large language models with open-vocabulary visual detection, enabling robots to efficiently navigate and track dynamic targets in open-world scenarios [4][6][10].

Group 2: Key Features of LOVON
- LOVON consists of three core modules that close the loop between language, vision, and motion, enhancing the robot's ability to perform complex tasks [10].
- The framework employs Laplacian-variance filtering to stabilize visual processing, improving the detection frame rate by 25% while the robot is moving (a generic sketch of the technique follows this summary) [12][13].
- Adaptive execution logic allows the robot to respond to unexpected situations, such as target loss or external interference, by switching to search mode or seamlessly executing new commands [14][16].

Group 3: Performance Metrics
- In simulation, LOVON achieved a success rate (SR) of 1.00, significantly outperforming traditional methods such as EVT, which reached an SR of 0.94 [19].
- Training is remarkably efficient, requiring only 1.5 hours compared with 360 hours for the best competing model, TrackVLA, a 240-fold improvement [19][20].

Group 4: Practical Applications
- LOVON's "plug-and-play" design allows easy deployment on mainstream legged-robot platforms, supporting applications in home services, industrial inspection, and field research [21][24].
- The framework demonstrates strong open-world adaptation, multi-target long-range tracking, robustness in dynamic environments, and resistance to interference, making it suitable for diverse real-world scenarios [24].
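Laplacian-variance filtering, mentioned above as LOVON's trick for stabilizing the visual stream, is a standard blur-detection measure: sharp frames have a high variance of the Laplacian, motion-blurred frames a low one. The OpenCV sketch below shows the generic technique with an assumed threshold; it is not LOVON's actual implementation.

```python
import cv2

BLUR_THRESHOLD = 100.0  # assumed value; in practice tuned per camera and robot gait

def is_frame_sharp(frame_bgr, threshold=BLUR_THRESHOLD):
    """Return True if the frame is sharp enough to pass to the object detector.

    Variance of the Laplacian is a cheap focus/blur measure: edges produce
    large second derivatives, so motion-blurred frames have low variance.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()
    return variance >= threshold

# Example: drop blurred frames so the detector only sees stable views.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok and is_frame_sharp(frame):
    pass  # run open-vocabulary detection on this frame
cap.release()
```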
Another Step Toward General Whole-Body Robot Manipulation! A Unified Framework for Learning Real-World Whole-Body Manipulation Tasks
具身智能之心· 2025-07-27 09:37
Core Viewpoint
- The article discusses the development of a general-purpose intelligent robot, emphasizing learning, as humans do, through continuous interaction with the environment and from human behavior, while addressing challenges in hardware design, intuitive data-collection interfaces, and learning algorithms [4][7].

Group 1: Introduction and Challenges
- Creating intelligent robots that can coexist with humans and assist in daily life has been a long-standing vision, and it requires learning from fine-grained interaction with the physical world [7].
- Three fundamental challenges are identified: designing safe and capable robot hardware, building intuitive data-collection interfaces, and creating learning models that can handle the complexity of whole-body control [7][8].

Group 2: Astribot Suite Overview
- The Astribot Suite is introduced as a unified framework for whole-body manipulation, consisting of a high-performance robot platform, an intuitive teleoperation interface, and a learning algorithm for whole-body visuomotor policies [4][28].
- The robot platform, Astribot S1, features dual 7-degree-of-freedom arms, a flexible torso, and a mobile base designed for high mobility and reach in daily tasks [10][12].

Group 3: Hardware Components
- The Astribot S1 is equipped with onboard sensors for robust scene understanding and manipulation, including RGB cameras and LiDAR for spatial perception [12][13].
- The teleoperation system uses a Meta Quest 3S VR headset for intuitive control, allowing operators to perform tasks with high precision and low latency [14][16].

Group 4: Learning Methodology
- The DuoCore-WB algorithm is presented as a simple yet effective method for learning coordinated whole-body actions from demonstration data, designed to be compatible with large-scale pre-training [17][19].
- The algorithm uses a transformer-based model to learn actions in end-effector space, reducing error accumulation and improving robustness to large viewpoint changes (an illustrative policy-head sketch follows this summary) [19][21].

Group 5: Experimental Analysis
- The effectiveness of the Astribot Suite is evaluated on six representative tasks, with the DuoCore-WB algorithm achieving an average success rate of 80% and a peak success rate of 100% [26][27].
- The teleoperation interface is shown to be efficient and intuitive, allowing users to generate smooth and accurate robot actions with a high replay success rate [25][26].

Group 6: Future Directions
- Future plans include enhancing the robot hardware for greater capability and safety, iterating toward more intuitive human-robot interaction, and improving model and system scalability for broader deployment [28].
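The design choice of predicting actions in end-effector space (poses of the hands and torso rather than raw joint angles, with a downstream controller handling kinematics) can be illustrated with a minimal transformer policy head. The PyTorch sketch below is a generic illustration; the layer sizes, action-chunk length, and pose parameterization are assumptions, not DuoCore-WB's actual architecture.

```python
import torch
import torch.nn as nn

class EEPolicyHead(nn.Module):
    """Minimal sketch: map fused observation tokens to a chunk of end-effector actions.

    Each action here is (x, y, z, roll, pitch, yaw, gripper) per arm = 14 dims for two arms.
    Predicting in end-effector space keeps errors from compounding through the kinematic
    chain; an IK / whole-body controller then turns poses into joint commands.
    """
    def __init__(self, obs_dim=512, chunk=16, action_dim=14):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=obs_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.action_queries = nn.Parameter(torch.randn(chunk, obs_dim))  # learned action slots
        self.head = nn.Linear(obs_dim, action_dim)

    def forward(self, obs_tokens):                         # obs_tokens: (B, T, obs_dim)
        b = obs_tokens.shape[0]
        queries = self.action_queries.expand(b, -1, -1)    # (B, chunk, obs_dim)
        x = self.encoder(torch.cat([obs_tokens, queries], dim=1))
        return self.head(x[:, -queries.shape[1]:])         # (B, chunk, action_dim)

policy = EEPolicyHead()
actions = policy(torch.randn(2, 10, 512))                  # -> torch.Size([2, 16, 14])
print(actions.shape)
```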
Big News! Tsinghua and Shengshu Release Vidar, a General-Purpose Robot Large Model Achieving SOTA in Efficiently Generalizing Complex Physical Manipulation
具身智能之心· 2025-07-27 09:37
Core Insights
- A breakthrough in embodied intelligence is marked by the collaboration between Tsinghua University and Shengshu Technology, resulting in the Vidar model, which bridges virtual understanding and real-world physical execution through few-shot generalization [2][4].

Group 1: Vidar Model Overview
- Vidar is described as the world's first multi-view embodied base model to systematically transfer video-understanding capabilities to physical decision-making, significantly reducing the data required for robot generalization [4][8].
- The model can generalize to a new robot body with only 20 minutes of real-robot data, roughly 1/80 of the leading baseline RDT and 1/1200 of π0.5, lowering the data threshold for large-scale generalization [4][8].

Group 2: Data Pyramid and Training Methodology
- Vidar is trained on a three-tier data pyramid consisting of vast general video data, medium-scale embodied video data, and a small amount of robot-specific data, enabling effective training and generalization [8][12].
- A unified observation space stitches multi-view video into a single input format, allowing massive internet data and robot-specific task data to be used together (a stitching sketch follows this summary) [14].

Group 3: Performance Metrics and Results
- After embodied pre-training, the Vidu model showed significant improvements in subject consistency, background consistency, and imaging quality, which underpin its few-shot generalization [13].
- Vidar achieved superior success rates on 16 common robotic tasks, excelling in generalization to unseen tasks and backgrounds and showing strong adherence to task instructions [27][29].

Group 4: Automation and Efficiency
- The Automated Task-Agnostic Random Actions (ATARA) method automates the collection of task-agnostic action data, requiring only 10 hours of automated collection to cover the full action space of a new robot [16].
- The AnyPos model, which uses high-precision prediction techniques, significantly improves action-execution accuracy, achieving a success rate close to 100% in real-world trajectory-replay tests and surpassing baseline performance by 33-44% [18][22].
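The "unified observation space" idea, stitching whatever camera views a data source provides into one fixed input format, can be sketched as a simple tiling operation. The NumPy example below is illustrative only; the grid layout and padding behavior are assumptions, not Vidar's actual scheme.

```python
import numpy as np

def unify_views(views, tile_hw=(224, 224), grid=(1, 3)):
    """Stitch a variable number of camera views into one fixed-size canvas.

    views: list of (H, W, 3) uint8 images (e.g. wrist, head, external cameras).
    Missing views are left as black tiles, so every sample, whether it comes from
    web video (one view) or a multi-camera robot (several views), has the same shape.
    """
    rows, cols = grid
    th, tw = tile_hw
    canvas = np.zeros((rows * th, cols * tw, 3), dtype=np.uint8)
    for i, img in enumerate(views[: rows * cols]):
        r, c = divmod(i, cols)
        # naive nearest-neighbour resize to the tile size
        ys = np.linspace(0, img.shape[0] - 1, th).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, tw).astype(int)
        canvas[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = img[ys][:, xs]
    return canvas

frame = unify_views([np.zeros((480, 640, 3), np.uint8)])   # single-view web clip
print(frame.shape)                                          # (224, 672, 3)
```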
Qunhe Technology (群核科技) Releases a 3D Gaussian Semantic Dataset, Giving Robots a "Spatial Brain"
具身智能之心· 2025-07-26 10:45
Core Viewpoint
- The release of the InteriorGS dataset by Qunhe Technology aims to strengthen spatial perception for robots and AI agents, marking a significant advance in AI training data [2][5].

Group 1: InteriorGS Dataset
- The InteriorGS dataset includes 1,000 3D Gaussian semantic scenes covering more than 80 types of indoor environments, giving AI agents a "spatial brain" that improves environmental understanding and interaction (a sketch of a semantic-Gaussian data layout follows this summary) [2][5].
- The dataset is claimed to be the world's first large-scale 3D dataset suitable for the free movement of intelligent agents [2][5].

Group 2: Technological Advancements
- Qunhe Technology has applied 3D Gaussian technology in fields including cultural-heritage preservation and spatial design, with notable projects such as the digital restoration of a 60-year-old photo studio in Hangzhou [4][6].
- InteriorGS leverages the efficiency and cost advantages of 3D Gaussian scene reconstruction, combined with the company's in-house spatial large model, producing a dataset that balances realism and semantic understanding [5][6].

Group 3: Industry Impact and Collaboration
- Qunhe Technology's SpatialVerse platform has accumulated a large volume of interactive 3D data and a set of physical-simulation tools, aiming to become the "ImageNet" of spatial intelligence, much as ImageNet catalyzed the explosion of computer vision [7].
- The company has formed partnerships with several embodied intelligence firms, including Zhiyuan Robotics and Galaxy General, indicating its growing influence in the industry [7].

Group 4: Future Directions
- The company emphasizes the Sim2Real paradigm as the most efficient training route for embodied intelligence and aims to promote a "real-to-virtual-to-real" loop together with industry partners [8].
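A 3D Gaussian semantic scene pairs each splat (position, scale, orientation, opacity, color) with a semantic label, so an agent can query geometry and meaning together. The sketch below shows one plausible structure-of-arrays layout with assumed field names; the article does not specify InteriorGS's actual file format.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GaussianSemanticScene:
    """Structure-of-arrays layout for N semantic 3D Gaussians (field names assumed)."""
    means: np.ndarray       # (N, 3) splat centers
    scales: np.ndarray      # (N, 3) per-axis extent
    rotations: np.ndarray   # (N, 4) unit quaternions
    opacities: np.ndarray   # (N,)
    colors: np.ndarray      # (N, 3) RGB in [0, 1]
    labels: np.ndarray      # (N,) integer semantic class ids, e.g. 3 = "sofa"

    def points_of_class(self, class_id: int) -> np.ndarray:
        """Return splat centers of one semantic class (e.g. as navigation goals)."""
        return self.means[self.labels == class_id]

# Tiny synthetic example: 1,000 random splats spread over 80 assumed classes.
n = 1000
scene = GaussianSemanticScene(
    means=np.random.rand(n, 3), scales=np.full((n, 3), 0.05),
    rotations=np.tile([1, 0, 0, 0], (n, 1)).astype(float),
    opacities=np.ones(n), colors=np.random.rand(n, 3),
    labels=np.random.randint(0, 80, n),
)
print(scene.points_of_class(3).shape)
```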
The 具身智能之心 Job-Seeking Exchange Group Is Here!!!
具身智能之心· 2025-07-26 10:45
Group 1
- The company has officially launched a job-seeking community focused on the embodied intelligence industry, in response to requests from followers [1]
- The community will primarily discuss topics related to the embodied intelligence industry, including companies, product development, and job opportunities [1]
- Members are encouraged to join to connect with industry peers and stay up to date on industry developments [1]