具身智能之心
Beating NVIDIA with Only 3-5 Samples: China's First Ultra-Few-Shot Embodied Model Debuts
具身智能之心· 2025-10-17 00:04
Core Insights
- The article discusses a breakthrough in embodied intelligence: the domestic startup FiveAges has released FAM-1, the first general-purpose few-shot embodied manipulation model, which bridges the gap between vision-language models and 3D robotic manipulation [2][5][18]

Data Scarcity and Challenges
- Embodied intelligence faces a significant challenge due to the scarcity of data compared to the natural language and vision fields: real-world robotic operation involves complex physical interactions and real-time feedback, making data collection costly and inefficient [3]
- Current vision-language-action (VLA) models rely heavily on large-scale labeled data to compensate for their lack of generalization in practical applications [4]

FAM-1 Model Overview
- FAM-1 uses a novel architecture called BridgeVLA, which enables efficient knowledge transfer and spatial modeling between large vision-language models and 3D robotic control [5][7]
- The model achieves significant breakthroughs in few-shot learning, cross-scene adaptation, and complex task understanding, requiring only 3-5 robot demonstrations per task to reach a 97% success rate, surpassing state-of-the-art (SOTA) models [5][14]

Technical Innovations
- The model consists of two core modules, Knowledge-driven Pretraining (KP) and 3D Few-shot Fine-tuning (FF), which enhance its ability to generalize across tasks and environments [9][12]
- The KP module builds a knowledge base from vast amounts of image and video data to improve the model's understanding of operational contexts, while the FF module aligns the outputs of the VLM and the VLA using 3D heatmaps, significantly reducing the dependence on labeled data [9][12]

Experimental Results
- FAM-1 outperformed SOTA models across international benchmarks, achieving an average success rate of 88.2% on tasks such as "Insert Peg" and "Open Drawer", an improvement of over 30% in average success rate compared to competitors [11]
- In real-world deployments, FAM-1 demonstrated a 97% success rate on basic tasks using only 3-5 samples, showing robustness to various environmental challenges [15]

Future Directions
- FiveAges aims to enhance the generalization, reliability, and adaptability of its foundation models for operational scenarios, promote their application in industrial settings, and develop general-purpose models for navigation tasks [20]
- The company is also exploring self-supervised learning strategies on unlabeled human operation videos, which could further lower the barriers to application in robotics [19]
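FAM-1's internals are not public beyond the summary above. As a loose illustration of the general idea behind 3D-heatmap-based target prediction (a model scores a voxel grid and the action target is read off the resulting distribution), here is a minimal sketch; the function name, the voxel-grid setup, and the toy scores are all hypothetical:

```python
import numpy as np

def heatmap_to_target(scores, origin, voxel_size):
    """Turn a 3D score volume into a predicted target position.

    scores: (D, H, W) array of unnormalized logits over a voxel grid.
    origin: world coordinates of the corner of voxel (0, 0, 0).
    voxel_size: edge length of each voxel in meters.
    """
    # Softmax over the whole volume -> a proper 3D heatmap.
    flat = scores.ravel()
    probs = np.exp(flat - flat.max())
    probs /= probs.sum()
    heatmap = probs.reshape(scores.shape)
    # Take the mode of the heatmap (same voxel as the raw argmax)
    # and convert the voxel index to the voxel-center world position.
    idx = np.unravel_index(heatmap.argmax(), heatmap.shape)
    return np.asarray(origin) + (np.asarray(idx) + 0.5) * voxel_size

# Toy example: a 4x4x4 grid with one strongly scored voxel.
scores = np.zeros((4, 4, 4))
scores[1, 2, 3] = 10.0
target = heatmap_to_target(scores, origin=(0.0, 0.0, 0.0), voxel_size=0.1)
print(target)  # center of voxel (1, 2, 3)
```

A real system would regress the full distribution (or its expectation) rather than a hard argmax, but the sketch shows why a spatial heatmap is a data-efficient output space: it ties the prediction directly to 3D geometry.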
Complete VLA Deployment on a Robotic Arm in 3 Days: Algorithms & Hands-On Projects (Motion Planning, Visual Feedback, Imitation Learning, and More)
具身智能之心· 2025-10-17 00:04
Core Insights
- The concept of "embodied intelligence" has been incorporated into government work reports, leading to a rapid increase in related projects across various sectors, with companies competing for talent [1]
- Demand for skilled professionals in the robotic-arm field is high, with a supply-demand ratio of 1:7, so companies are offering attractive compensation packages, including annual salaries exceeding one million yuan and stock options [1]

Group 1: Challenges in Implementation
- Many researchers and engineers lack practical project experience and face difficulties when deploying algorithms from simulation environments to actual hardware [3]
- Two main reasons for these challenges are insufficient mastery of classic methods for robotic-arm operation and inadequate engineering practice skills, which hinder effective integration of the various methods [3]

Group 2: Training Initiatives
- Deep Blue Academy has partnered with notable figures and companies to launch a hands-on training program focused on robotic-arm operation and grasping techniques [5]
- The program offers practical opportunities with real robotic arms and covers key technologies such as motion planning, visual feedback, and imitation learning [5][6]

Group 3: Course Projects
- Project 1 involves achieving 1:1 precise mapping between RViz models and real machines, integrating RRT* path planning and inverse kinematics algorithms to address robotic-arm control and obstacle avoidance [6]
- Project 2 focuses on combining machine vision with rule-based algorithms and reinforcement learning for precise identification and adaptive grasping of specific target objects [6]
- Project 3 establishes a 1:1 remote-operation data collection platform, utilizing vision-language models for efficient transfer of human operational skills to robotic arms [6]

Group 4: Instructor and Collaborating Entities
- The instructor, Qin Tong, is a prominent figure in the field, with a background in robotics and experience in developing intelligent driving systems [8]
- The collaborating company, Songling Robotics, is recognized as a leading global platform in robotic technology, contributing to the practical training initiatives [10][11]
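Project 1 pairs RRT* planning with inverse kinematics. As a minimal, self-contained taste of the IK half, here is the textbook closed-form solution for a planar 2-link arm — purely illustrative, not the course's actual code, and the function name is made up:

```python
import math

def two_link_ik(x, y, l1, l2):
    """Closed-form inverse kinematics for a planar 2-link arm.

    Returns (shoulder, elbow) joint angles in radians for one of the
    two mirror solutions, or None if (x, y) is unreachable.
    """
    r2 = x * x + y * y
    # Law of cosines gives the elbow angle from the target distance.
    c2 = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None  # target lies outside the reachable annulus
    theta2 = math.acos(c2)
    # Shoulder angle: direction to the target minus the angular
    # offset contributed by link 2.
    theta1 = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

# Sanity check: reaching straight out along x with both links of length 1.
angles = two_link_ik(2.0, 0.0, 1.0, 1.0)
print(angles)  # (0.0, 0.0): both joints at zero, arm fully extended
```

On a real 6-DoF arm the IK is solved numerically (or analytically per manufacturer), and its solutions become the goal configurations that RRT* connects through collision-free joint space.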
See You in Hangzhou! 具身智能之心 Sponsors IROS for the First Time and Presents Awards On Site
具身智能之心· 2025-10-17 00:04
As robotic systems push further into the real world, the stability, robustness, and generalization of perception systems are becoming key factors limiting deployment. Under complex conditions such as dynamic crowds, severe weather, sensor failures, and cross-platform deployment, traditional perception algorithms often suffer sharp performance degradation. To address this, the RoboSense Challenge 2025 was created. The challenge aims to systematically evaluate robots' perception and understanding in real-world scenarios, advance research on the robustness of multimodal perception models, and encourage cross-modal fusion and ... The challenge is co-organized by the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, HKUST (Guangzhou), and the University of Michigan.

Track 1: Driving with Language

| Milestone | Date |
| --- | --- |
| Registration | From June 2025 |
| Competition Server Online | June 15th, 2025 |
| Phase One Deadline | August 15th, 2025 |
| Phase Two Deadline | September 15th, 2025 |
| Award Decision @ IROS 2025 | October 19th, 2025 |
Just Months After Founding, a Star Embodied Intelligence Company Has Abruptly Disbanded...
具身智能之心· 2025-10-16 08:05
Core Viewpoint
- The article discusses the sudden dissolution of OneStar Robotics, a startup in the field of embodied intelligence, which was founded only a few months ago and had recently announced significant funding and high-profile hires [3][4][6]

Company Overview
- OneStar Robotics was established on May 9, 2025, by Li Xingxing, the son of Geely founder Li Shufu, and was positioned as a key player in Geely's robotics strategy [10][11]
- The company aimed to innovate in the "embodied intelligence" sector, focusing on practical applications rather than flashy demonstrations [13][14]

Recent Developments
- In July, OneStar Robotics announced the completion of a multi-hundred-million-yuan "friends and family" funding round, primarily from Geely's ecosystem [16]
- The company secured a notable talent, Ding Yan, from Shanghai AI Lab, who became CTO and co-founder [17]
- In August, a partnership was formed with Fudan University to establish a joint laboratory, and the first product, "Star Wheel 1," was launched [18]
- By September 17, the company had completed another multi-hundred-million-yuan seed funding round, attracting various investors [19]

Sudden Dissolution
- Despite its rapid growth and significant backing, OneStar Robotics was reported to have dissolved its team abruptly, with the reasons still unclear [8][20]
- There are indications that the existing Geely-related platforms and businesses may revert to Geely Auto Group, while the technology team, led by Ding Yan, might pursue independent ventures [9]
Share Your Insights: Inviting More Outstanding Embodied Intelligence Partners to Join Us
具身智能之心· 2025-10-16 07:00
Core Viewpoint
- The article emphasizes the importance of collaboration and innovation in the field of embodied intelligence, aiming to create a platform that adds real value to the industry [2][3]

Group 1: Content Creation and Collaboration
- The company invites community participation in content sharing across platforms such as WeChat, Bilibili, and video channels, including technical talks and roundtable discussions [4]
- There is an emphasis on developing online courses and practical projects to raise the quality of content in the field [4]

Group 2: Main Directions of Focus
- The primary areas of focus include VLA, VLN, reinforcement learning, embodied simulation, Diffusion Policy, multimodal large models, mobile manipulation, end-to-end pipelines, and model deployment [5]

Group 3: Future Engagement
- The company is open to discussing compensation and collaboration methods, encouraging interested parties to reach out for further communication [6]
Breaking: UCLA's Bolei Zhou Joins a Robotics Company
具身智能之心· 2025-10-16 00:03
Core Insights
- Coco Robotics has appointed Bolei Zhou, a UCLA associate professor, as Chief AI Scientist to lead the newly established Physical AI Lab, focusing on autonomous sidewalk delivery solutions [2][3][5]
- The company aims to achieve full automation in last-mile delivery, leveraging the extensive operational data collected over the past five years to enhance its robotic systems [4][5][7]
- The Physical AI Lab is an independent research initiative, separate from Coco Robotics' collaboration with OpenAI, and will focus on improving the company's automation capabilities and operational efficiency [8][9]

Group 1: Company Overview
- Coco Robotics, founded in 2020, specializes in last-mile delivery robotics and initially relied on teleoperators to navigate obstacles [4]
- The company has accumulated millions of miles of data in complex urban environments, which is crucial for training reliable AI systems [7]
- The goal is to reduce overall delivery costs while improving service quality for businesses and consumers [9]

Group 2: Leadership and Research Focus
- Bolei Zhou's expertise in machine perception and intelligent decision-making aligns with Coco Robotics' objectives, particularly in micromobility [7][8]
- Zhou has a strong academic background, having published over 100 papers with significant citations, particularly in explainable AI and scene understanding [12][14]
- The Physical AI Lab will use its research findings to enhance Coco's local models and potentially share insights with the cities it operates in to improve infrastructure [9]

Group 3: Data Utilization and Future Plans
- Coco Robotics plans to use the collected data to improve its own automation and operational efficiency, rather than selling it to competitors [9]
- The success of the Physical AI Lab will be measured by the company's ability to provide high-quality service at lower cost, which could drive significant growth in the ecosystem [9]
Google's Latest: Gemini Robotics 1.5, a Breakthrough in General-Purpose Robotics
具身智能之心· 2025-10-16 00:03
Core Insights
- The article discusses the breakthrough advances in general-purpose robotics presented in Google DeepMind's "Gemini Robotics 1.5" report, highlighting the models' capabilities in perception, reasoning, and action [1][39]

Technical Architecture
- The core architecture of Gemini Robotics 1.5 is a "Coordinator + Action Model" framework, forming a functional closed loop through multimodal data interaction [2]
- The Coordinator (Gemini Robotics-ER 1.5) processes user inputs and environmental feedback, controls the overall task flow, and breaks complex tasks into executable sub-steps [2]
- The Action Model (Gemini Robotics 1.5) translates natural-language sub-instructions into robot action trajectories, supporting direct control of various robot embodiments without additional adaptation [2][4]

Motion Transfer Mechanism
- The Motion Transfer (MT) mechanism addresses the "data silo" problem in traditional robotics by enabling skill generalization across robot embodiments, validated through experimental comparisons [5][7]
- The Gemini Robotics 1.5 model, trained on mixed data from multiple robot types, demonstrated superior skill transfer compared to single-embodiment training approaches [7][8]

Performance Validation
- A "thinking VLA" mechanism introduces a two-step process for task execution, improving performance on multi-step tasks by decomposing complex instructions into manageable sub-steps [8][11]
- Quantitative results show a performance improvement of approximately 21.8% in task-completion scores when the thinking mode is activated [11]
- The model's ability to generalize skills across robot embodiments was evidenced by significant performance gains in scenarios with limited training data [13][28]

Safety Mechanisms
- The ER model incorporates safety mechanisms that assess risk and provide intervention strategies across scenarios, ensuring safe task execution [36][38]
- Performance comparisons indicate that ER 1.5 excels at risk identification and mitigation, with high accuracy in predicting potential hazards [36][38]

Conclusion and Future Directions
- Gemini Robotics 1.5 represents a significant advance toward universal control of multiple robots, reducing deployment costs and enhancing task-execution capability [39]
- The integration of reasoning and action is identified as critical for completing complex tasks, underscoring the importance of ER-VLA collaboration [39]
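The Coordinator + Action Model closed loop described above can be sketched schematically: the coordinator decomposes the task and tracks progress, while the action model executes one sub-instruction at a time. This is a toy illustration of the orchestration pattern only, not Google's implementation; every name and the fixed toy plan are hypothetical:

```python
def run_task(instruction, coordinator, action_model, max_steps=10):
    """Illustrative closed loop: the coordinator proposes the next
    sub-instruction given the task and the execution history; the
    action model executes it and reports an outcome."""
    history = []
    for _ in range(max_steps):
        sub = coordinator(instruction, history)  # next sub-step, or None when done
        if sub is None:
            return history
        outcome = action_model(sub)              # execute and observe
        history.append((sub, outcome))           # feedback closes the loop
    return history

# Toy stand-ins: the "coordinator" walks a fixed plan; the "action model" echoes.
plan = ["locate cup", "grasp cup", "place cup on shelf"]
coordinator = lambda task, hist: plan[len(hist)] if len(hist) < len(plan) else None
action_model = lambda sub: f"done: {sub}"
log = run_task("put the cup on the shelf", coordinator, action_model)
print([s for s, _ in log])  # ['locate cup', 'grasp cup', 'place cup on shelf']
```

The point of the pattern is that the outer model reasons over history and can replan after a failed sub-step, while the inner model only ever sees one short, concrete instruction at a time.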
Large Models: Take a Job or Pursue a PhD?
具身智能之心· 2025-10-16 00:03
Core Insights
- The article discusses how individuals in the large-model field should weigh pursuing a PhD against joining entrepreneurial ventures built around agents [1][2]

Group 1: Importance of Foundation in Large Models
- A solid foundation in large models is crucial, as the field spans many directions, including generative models, multimodal models, fine-tuning, and reinforcement learning [1]
- Many mentors lack sufficient expertise in large models, leading students to misjudge their readiness for related positions [1]

Group 2: Role of a Pioneer in Research
- Whether an individual is suited to the role of a research "pioneer" matters greatly in a field with many unexplored directions [2]
- The ability to explore independently and endure failure is a key trait for those aiming to innovate from scratch [2]

Group 3: Community and Learning Resources
- The "Large Model Heart Tech Knowledge Planet" community offers a comprehensive platform for beginners and advanced learners, featuring videos, articles, learning paths, and Q&A sections [2]
- The community aims to provide a space for technical exchange and collaboration among peers in the large-model domain [4]

Group 4: Learning Pathways
- The community has compiled detailed learning pathways for various aspects of large models, including RAG, AI agents, and multimodal training [4][9]
- Each pathway includes clear technical summaries, making it suitable for systematic learning [4]

Group 5: Benefits of Joining the Community
- Members gain access to the latest academic advances and industrial applications related to large models [7]
- The community facilitates networking with industry leaders and provides job recommendations in the large-model sector [7][68]

Group 6: Future Plans and Engagement
- The community plans to host live sessions with industry experts, with valuable content available for repeated viewing [65]
- It also emphasizes building a professional exchange community with contributions from over 40 experts from renowned institutions and companies [66]
Master Embodied "Brain" and "Cerebellum" Algorithms in 3 Months!
具身智能之心· 2025-10-16 00:03
Core Insights
- The article discusses the evolution and current trends of embodied intelligence, focusing on the development of "brain" and "cerebellum" modules in robots, which are essential for perception, understanding, and action [3][10]

Technical Evolution
- The development of embodied intelligence has progressed through several stages, from grasp-pose detection to behavior cloning, and now to diffusion policies and VLA models [7][10]
- The first stage focused on static object grasping from point clouds or images, but lacked the context modeling needed for complex tasks [7]
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations, but faced challenges in generalization and in multi-target scenarios [7]
- The third stage, emerging in 2023, introduced diffusion-policy methods that improve stability and generalization by modeling action sequences [8]
- The fourth stage, anticipated in 2024, emphasizes integrating VLA models with reinforcement learning and world models, enhancing robots' predictive capabilities and multimodal perception [9][10]

Current Trends and Applications
- Integrating VLA with reinforcement learning improves robots' trial-and-error capability and self-improvement on long-horizon tasks [10]
- Combining VLA with world models allows robots to predict environmental dynamics, improving planning and decision-making [10]
- Adding tactile sensing to VLA expands the boundaries of embodied perception, enabling more precise and safer operation in complex environments [10]

Educational and Community Aspects
- The article highlights the growing demand for engineering and systems capability as the field transitions from theoretical research to practical deployment [14]
- A structured curriculum is proposed covering various aspects of embodied intelligence, including simulation platforms and model training [14][11]
- The community aspect is emphasized, with active discussions and support for learners in the field [15]
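The behavior-cloning stage described above reduces, in its simplest form, to supervised regression from observed states to expert actions. A minimal sketch with a linear policy and synthetic expert data — purely illustrative, not any course material:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expert demonstrations: actions are a linear function of states.
W_true = np.array([[0.5, -1.0], [2.0, 0.3]])
states = rng.normal(size=(256, 2))
actions = states @ W_true.T

# Behavior cloning = minimize mean squared error between the policy's
# actions and the expert's, here by plain gradient descent on W.
W = np.zeros((2, 2))
for _ in range(500):
    pred = states @ W.T
    grad = 2 * (pred - actions).T @ states / len(states)
    W -= 0.1 * grad

print(np.round(W, 3))  # recovers W_true on this noiseless toy problem
```

The stage's limitations follow directly from this setup: the policy only imitates states the expert visited, so small errors compound off-distribution — which is what motivated the later diffusion-policy and VLA stages.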
Embodied AI Enters the Real World! RoboChallenge: From Simulation to Physical Robots, the World's First Large-Scale Multi-Task Real-Robot Benchmark
具身智能之心· 2025-10-15 11:03
Core Insights
- The article discusses the launch of RoboChallenge, a large-scale, multi-task benchmark platform for embodied intelligence initiated by Dexmal and Hugging Face, aimed at addressing the lack of real-robot testing in the field [5][41]

Group 1: Challenges in the Embodied Intelligence Field
- The embodied intelligence sector has advanced rapidly, but the absence of real-robot testing and the limitations of existing evaluation systems have become significant bottlenecks [3][4]
- Current mainstream benchmarks rely primarily on simulation environments, so algorithms that perform well in simulation often fail in real-world applications [4][10]

Group 2: Introduction of RoboChallenge
- RoboChallenge is the first large-scale benchmark platform on which real robots perform tasks in a physical environment, providing a more reliable and comparable evaluation standard for vision-language-action models (VLAs) [5][10]
- The platform aims to overcome challenges in validating performance in real environments, standardizing test conditions, and ensuring accessibility [5][10]

Group 3: Features of RoboChallenge
- RoboChallenge adopts a "remote robot" paradigm, letting users run real machines without owning hardware, which lowers the entry barrier for researchers and developers [15][19]
- The platform supports a wide range of tasks; its initial benchmark set (Table30) comprises 30 diverse tasks designed to evaluate the core capabilities of VLA models [12][26]

Group 4: Evaluation Mechanism
- The evaluation mechanism combines end-to-end task success rates with process scoring, ensuring rigorous and transparent assessment of models [16][20]
- RoboChallenge employs a "visual input matching" method to keep test conditions consistent, reducing variability introduced by human testers [23][25]

Group 5: Open and Collaborative Ecosystem
- RoboChallenge promotes an open ecosystem by providing free access to evaluation services, publicly sharing task demonstration data, and ensuring transparency of results [34][41]
- The platform encourages collaboration among researchers, developers, and industry professionals, fostering innovation in embodied intelligence [38][41]

Group 6: Future Directions
- RoboChallenge plans to expand by introducing more robot types and more challenging tasks, aiming to strengthen the evaluation of embodied intelligence in real-world scenarios [42]
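RoboChallenge's published scoring formula is not reproduced in the summary above. As a hedged sketch of the general idea of combining end-to-end success with process (milestone) scoring — the 0.5 partial-credit weighting and all names here are assumptions, not the benchmark's actual metric:

```python
def episode_score(milestones_hit, milestones_total, success):
    """Illustrative combined metric: full credit for end-to-end
    success, partial credit for milestones reached along the way."""
    progress = milestones_hit / milestones_total
    return 1.0 if success else 0.5 * progress  # weighting is an assumption

def benchmark_score(episodes):
    """Average per-episode scores over a task's evaluation runs."""
    return sum(episode_score(*e) for e in episodes) / len(episodes)

# Three runs: one full success, one partial, one total failure.
runs = [(4, 4, True), (2, 4, False), (0, 4, False)]
print(round(benchmark_score(runs), 3))  # (1.0 + 0.25 + 0.0) / 3
```

A metric of this shape distinguishes a policy that almost completes a task from one that never starts it, which a bare success rate cannot do.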