具身智能之心
Embodied Intelligence Heart Back-to-School Season Offers! This Year's Are a Little Different...
具身智能之心· 2025-09-08 00:03
Building on this foundation, anyone interested has plenty of room to do real work: data, robot hardware, and algorithms all hold great potential. Because embodied intelligence is still a young field, many students struggle without a structured system and learning path. To address this pain point, Embodied Intelligence Heart has developed several courses on the embodied-intelligence field, along with an excellent embodied community for learning.

Back-to-School Season Deals (Sept 1 to Sept 14)
- ¥299 limited-time super discount card: 30% off all embodied-intelligence courses (valid for one year)
- ¥66 off the Embodied Intelligence Heart Knowledge Planet membership
- Embodied-intelligence paper tutoring: up to ¥10,000 in deductions

Course and Community Highlights
It is September again, back-to-school season, and you can already hear keyboards clattering in the lab. I still remember campus nights spent debugging a robot car until dawn; in the blink of an eye, robot algorithms have moved from traditional pipeline approaches to end-to-end. Recently a junior student reached out to Feng Ge: about to start a recommended graduate program, his advisor had told him to explore the embodied-intelligence direction on his own, and he was unsure what to make of the field...

Technically, embodied algorithms have further improved global perception. Take a grasping task: early approaches first completed perception through pose recognition and 3D vision, then moved on to planning and execution, and finally to the grasp itself. The process was cumbersome and generalized poorly. Today's VLA or VA approaches instead let the robot move fluidly through learning. There is no shortage of ...
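The contrast between the early modular grasping pipeline and an end-to-end learned policy can be sketched in code. This is our own illustrative Python, not taken from any specific system; every class, function, and action name below is hypothetical.

```python
"""Sketch: modular grasping pipeline vs. end-to-end VLA-style policy.
All names here are hypothetical illustrations, not a real robot API."""

from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    rgb: list          # camera image (placeholder)
    depth: list        # 3D / depth data (placeholder)
    instruction: str   # natural-language command


def modular_grasp(obs: Observation) -> List[str]:
    """Classic pipeline: perception -> pose estimation -> planning -> grasp.
    Each stage is hand-engineered, so errors compound and generalization suffers."""
    object_pose = "pose_from_3d_vision(obs.depth)"   # stage 1: perception
    trajectory = f"plan_to({object_pose})"           # stage 2: motion planning
    return [trajectory, "close_gripper()"]           # stage 3: execution


def end_to_end_grasp(obs: Observation) -> List[str]:
    """VLA-style policy: one learned model maps (image, instruction) directly
    to low-level actions, skipping the hand-engineered intermediate stages."""
    return [f"policy(obs.rgb, {obs.instruction!r}) -> joint_deltas"]


obs = Observation(rgb=[], depth=[], instruction="pick up the red cup")
print(len(modular_grasp(obs)), len(end_to_end_grasp(obs)))
```

The point of the sketch is structural: the modular route chains several brittle stages, while the learned route collapses them into a single policy call.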
ByteDance's Latest Robix! One All-Round Large Model Handles Robot Reasoning, Task Planning, and Interaction
具身智能之心· 2025-09-08 00:03
Core Viewpoint
- The article discusses the development of Robix, a unified visual-language model by ByteDance, aimed at addressing the limitations of existing hierarchical robotic systems in understanding and executing tasks in dynamic environments [2][3][4].

Group 1: Problem Identification
- Current hierarchical robotic systems face a capability-fragmentation issue, relying heavily on large language models (LLMs) or visual-language models (VLMs) for task decomposition, which neglects human-robot interaction and embodied reasoning capabilities [3][4].
- Modular interaction and planning frameworks exhibit rigidity and lack robustness, making it difficult for robots to adapt to real-time environmental changes [3][4].

Group 2: Proposed Solution
- Robix serves as the high-level cognitive hub in hierarchical robotic systems, integrating 3D spatial understanding and visual localization to enhance task planning and human interaction [2][5].
- The model employs a three-stage training strategy (continuous pre-training, supervised fine-tuning, and reinforcement learning) to systematically improve its capabilities [5][13].

Group 3: Key Contributions
- Robix introduces a unified high-level cognitive model that integrates reasoning, long-term task planning, and natural language interaction within an end-to-end framework [5][6].
- Extensive experimental validation demonstrates Robix's performance advantages over existing commercial baselines, such as GPT-4o and Gemini 2.5 Pro, across various dimensions [5][24].

Group 4: Architecture and Mechanism
- Robix operates at the high-level cognitive layer, handling multimodal reasoning, adaptive task planning, and human-robot interaction, while lower-level controllers execute the generated atomic action commands [7][8].
- The model's outputs include atomic action commands, natural language responses, and structured reasoning trajectories that guide decision-making [11][12].

Group 5: Training Strategy
- Training draws on a comprehensive dataset of 200 billion tokens, focusing on enhancing embodied reasoning, visual localization, and task-centric reasoning [13][14].
- The supervised fine-tuning phase adapts the pre-trained model for high-level cognitive tasks, covering diverse human-robot interaction scenarios and high-quality reasoning trajectories [17][18].

Group 6: Performance Evaluation
- Robix outperforms existing models on basic embodied reasoning, offline task planning, and online real-world scenarios, showing significant accuracy improvements [22][24][27].
- In online evaluations, Robix achieves an average task progress of 92.6%, surpassing Gemini-2.5-Pro while demonstrating lower response latency [29][32].

Group 7: Future Directions
- Future efforts will focus on enhancing robustness in dynamic environments and improving long-term memory capabilities to support complex, extended tasks in real-world settings [36][38].
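The hierarchical split the summary describes, a high-level cognitive layer emitting atomic action commands and language responses while a low-level controller executes them, can be sketched as follows. This is a minimal illustration under our own assumptions; the function names and action vocabulary are hypothetical stand-ins, not Robix's actual interface.

```python
"""Sketch of a two-layer robot stack: a cognitive layer plans atomic actions
and replies in natural language; a low-level controller executes each action.
All names and the toy action vocabulary are our own illustrations."""

from typing import List, Tuple


def high_level_step(observation: str, instruction: str) -> Tuple[str, List[str]]:
    """Stand-in for the cognitive layer: reason over the scene and the user's
    instruction, then emit a language response plus atomic action commands."""
    reply = f"OK, I will {instruction}."
    target = instruction.split()[-1]   # naive target extraction for the sketch
    atomic_actions = [f"locate({target})", "grasp()", "place()"]
    return reply, atomic_actions


def low_level_execute(atomic_actions: List[str]) -> List[str]:
    """Stand-in for the low-level controller turning each atomic command
    into executed motion (represented here as a log entry)."""
    return [f"executed:{a}" for a in atomic_actions]


reply, plan = high_level_step("table with a cup", "clear the cup")
log = low_level_execute(plan)
print(reply)
print(log)
```

The design point mirrors the summary: the high-level layer decides *what* to do next and talks to the human, while the low-level layer only ever sees atomic commands.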
A 1-on-6 Small-Group Research-Paper Tutoring Course for the VLA Direction Is Here!
具身智能之心· 2025-09-07 12:28
Core Insights
- The Vision-Language-Action (VLA) model represents a new paradigm in embodied intelligence, enabling robots to generate executable actions from given language instructions and visual signals, thus enhancing their adaptability to complex environments [1][3].

Group 1: VLA Model Significance
- VLA breaks the limitations of traditional single-task training, allowing robots to make autonomous decisions in diverse scenarios and respond flexibly to unseen environments [3].
- The model has become a research hotspot, driving the development of several cutting-edge projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA, fostering collaboration between academia and industry [3].

Group 2: Industry Development
- The embodied intelligence sector is experiencing robust growth, with teams like Unitree, Zhiyuan, Xinghaitu, Galaxy General, and Zhujidongli transitioning from laboratories to commercialization [5].
- Major tech giants such as Huawei, JD.com, and Tencent are actively investing in the field, alongside international companies like Tesla and Figure AI [5].

Group 3: Educational Initiatives
- A specialized VLA research guidance course has been launched to help students quickly enter or transition within the field, addressing the complexity of the VLA system and its research methodologies [5].
- The course aims to cultivate independent academic research skills, guiding students through the entire process from theoretical foundations to practical experimentation and paper writing [14][16].

Group 4: Course Structure and Outcomes
- The curriculum covers the full spectrum of VLA model theory, simulation environment setup, experimental design, and paper writing [16].
- Students will learn to identify research opportunities, develop their own research ideas, and complete initial experiments, ultimately producing a complete draft of a research paper [15][16].
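To make the VLA input-output contract in Core Insights concrete, here is a minimal sketch of action discretization, the binning trick used by RT-2-style models to let a language model emit robot actions as tokens. The bin count and action range here are illustrative assumptions on our part, not course material.

```python
"""Sketch of VLA action discretization: map each continuous action dimension
into one of a fixed number of token bins, and decode back. The range [-1, 1]
and 256 bins are illustrative assumptions."""


def action_to_tokens(action, low=-1.0, high=1.0, bins=256):
    """Quantize each continuous action dimension into a discrete token index,
    so a language model can 'speak' robot actions as ordinary tokens."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)                        # clip to valid range
        idx = int((a - low) / (high - low) * (bins - 1) + 0.5)  # round to bin
        tokens.append(idx)
    return tokens


def tokens_to_action(tokens, low=-1.0, high=1.0, bins=256):
    """Inverse map: decode token indices back to approximate continuous actions."""
    return [low + t / (bins - 1) * (high - low) for t in tokens]


action = [0.0, -1.0, 1.0, 0.25]
tokens = action_to_tokens(action)
decoded = tokens_to_action(tokens)
print(tokens)
```

The round trip loses at most half a bin width of precision, which is why a few hundred bins per dimension suffice for many manipulation tasks.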
[Internal Referral Available] Galaxy General Robotics 2026 Campus Recruitment Is in Full Swing
具身智能之心· 2025-09-07 01:39
Core Viewpoint
- The article emphasizes the rapid advancement of technology, particularly in embodied AI models, which are expected to lead to the widespread adoption of humanoid robots across various industries, showcasing a promising future for general intelligence [2].

Company Overview
- Galaxy General Robotics is a leading global company in the field of embodied intelligent models, established in May 2023 and focused on providing general-purpose robotic products for users worldwide, with applications in commercial, industrial, and medical sectors [4][5].
- The company has research centers in Beijing, Shenzhen, Suzhou, and Hong Kong, and has established joint laboratories with institutions such as Peking University and Xuanwu Hospital [5].

Product Development Timeline
- May 2023: Company established [6].
- June 2024: Launch of the first self-developed embodied-model robot, Galbot G1 [6].
- January 2025: Release of the world's first end-to-end embodied grasping foundation model, GraspVLA [6].
- May 2025: Announcement of the world's first humanoid-robot smart retail solution, securing orders from 100 stores [6].
- June 2025: Introduction of the product-grade end-to-end embodied navigation model TrackVLA and the first end-to-end embodied model for the retail industry, GroceryVLA [7].
- August 2025: Launch of Galbot G1 Premium, the first robot equipped with NVIDIA's latest Jetson Thor chip, and victory in the sorting-skills competition at the World Humanoid Robot Games [7].

Recruitment and Opportunities
- The company is recruiting graduates from universities at home and abroad for various positions, emphasizing a passion for learning, exploration, and hands-on practice [9].
- Open roles include embodied multimodal models, humanoid reinforcement-learning control, robot planning and control, and embodied software-system development [10].
A Gathering for Embodied-Intelligence and Robotics Enthusiasts! ROSCon China 2025 Is Officially Confirmed
具身智能之心· 2025-09-06 04:00
Core Viewpoint
- ROSCon China 2025 will be held from October 31 to November 1, 2025, at the Shanghai Hongqiao Xinhua Union Sofitel Hotel, promising a valuable, content-rich event for attendees [2].

Group 1: Event Highlights
- The previous edition, ROSCon China 2024, featured industry leaders sharing insights on cutting-edge algorithms and practical case studies, creating an engaging atmosphere for discussion [3].
- Attendees had opportunities to interact with experts and network with peers, fostering collaboration and idea exchange within the robotics community [4].
- The event is designed to showcase the latest advancements in robotics technology, allowing participants to experience innovative products firsthand [12][16].

Group 2: Participation Opportunities
- The conference is currently accepting proposals for guest speeches, workshops, and lightning talks, focusing on ROS1 and ROS2 topics [13][14].
- Interested speakers can submit their topics and personal or team profiles to share their insights on the conference stage [16].
- The event aims to gather top experts globally to discuss the latest research and applications in robotics, helping attendees stay current with industry trends [16].

Group 3: Sponsorship and Ticketing
- The conference is actively recruiting sponsors, offering brands exposure and collaboration opportunities within the robotics sector [17].
- Early-bird tickets are on sale in limited quantities, so prompt registration is encouraged [20].
- Attendees can enjoy discounted hotel rates at the conference venue, enhancing the overall experience [22].
Many Autonomous-Driving and Traditional Robotics Companies Have Started Setting Up Embodied-Intelligence Labs...
具身智能之心· 2025-09-05 16:03
Over afternoon tea with a friend today, we talked about how many companies are starting to build embodied-intelligence teams and business lines. Among them are autonomous-driving companies, automotive OEMs, new-energy-vehicle startups, traditional robotics companies, and traditional robot-arm makers.

It seems we have again reached the moment when whoever does not enter the field risks being left behind by the times. Let's first look at what motivates each type of company. Autonomous-driving companies and OEMs mostly want to solve the intelligent part of factory work: building cars requires large numbers of workers for manufacturing, material handling, and special scenarios. If robots can achieve reasonable intelligence in fixed scenarios, substantial costs can be saved.

Traditional robotics companies, such as robot-vacuum makers, would rather upgrade their existing products with smarter services, for example adding a robotic arm to a vacuum so users can issue commands for tasks beyond cleaning, or using large models for richer interaction. Traditional arm makers likewise want a role in making arms more intelligent, adapting them to more generalized and diverse scenarios.

Of course, some companies may simply be investing in an emerging industry to chase profit. Either way, the trend is pushing many companies to shift resources toward embodied intelligence, whether in data production, algorithms, or hardware. Many companies have unfilled embodied-AI positions, and some inexperienced leads have had to carry the flag! Continuous interviewing still fails to find suitable candidates, because people who truly understand the field are rare.

At root, there is no systematic training pipeline, so this talent pool falls short in both quantity and quality ...
Reading the Technical Development Roadmap of Embodied Intelligence from Nearly 1,000 Papers!
具身智能之心· 2025-09-05 00:45
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, emphasizing the need for a comprehensive understanding of its development, the issues it faces, and its future directions [3][4].

Group 1: Robotic Manipulation
- The survey on robotic manipulation highlights the transition from mechanical programming to embodied intelligence, focusing on the evolution from simple grippers to dexterous multi-fingered hands [5][6].
- Key challenges in dexterous manipulation include data collection methods such as simulation, human demonstration, and teleoperation, as well as skill-learning frameworks like imitation learning and reinforcement learning [5][6].

Group 2: Navigation and Manipulation
- The discussion of robotic navigation emphasizes the role of physics simulators in addressing the high cost and scarcity of real-world training data, with a focus on Sim-to-Real transfer challenges [9][15].
- The evolution of navigation techniques is traced from explicit memory to implicit memory, and the role of various simulators in narrowing the Sim-to-Real gap is analyzed [15][16].

Group 3: Multimodal Large Models
- The survey of embodied multimodal large models (EMLMs) shows their potential to bridge the gaps between perception, cognition, and action, driven by advances in large-model technology [17][19].
- Identified challenges include difficult cross-modal alignment, high computational resource demands, and weak domain generalization [19].

Group 4: Teleoperation and Data Collection
- The survey on teleoperation of humanoid robots discusses integrating human cognition with robotic capabilities, particularly in hazardous environments, while addressing challenges such as high degrees of freedom and communication limitations [29][30].
- Key components of teleoperation systems include human state measurement, motion retargeting, and multimodal feedback mechanisms [30][33].

Group 5: Vision-Language-Action Models
- The analysis of Vision-Language-Action (VLA) models covers their evolution from cross-modal learning architectures to the integration of vision-language models and action planners [33][36].
- Core challenges lie in real-time control, multimodal action representation, and system scalability; proposed future directions include adaptive AI and cross-embodiment generalization [36][41].
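As a concrete instance of the imitation-learning framework mentioned under Group 1, here is a toy behavior-cloning sketch of our own (not drawn from any surveyed paper): a 1-D linear policy fit to expert state-action demonstrations by ordinary least squares.

```python
"""Toy behavior cloning: recover an expert's control rule from demonstrations.
The task, data, and linear policy class are illustrative assumptions."""


def behavior_cloning_1d(states, actions):
    """Fit action ~= w * state + b to demonstrations by least squares."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


# Expert demonstrations for a toy "move toward the target at 0" task:
# the expert always applies action = -0.5 * state.
demo_states = [-2.0, -1.0, 0.5, 1.0, 2.0]
demo_actions = [1.0, 0.5, -0.25, -0.5, -1.0]

w, b = behavior_cloning_1d(demo_states, demo_actions)
print(w, b)   # the cloned policy recovers the expert's rule
```

This captures the essence of imitation learning (fit a policy to expert data) while ignoring the real-world complications the survey discusses, such as covariate shift and high-dimensional action spaces.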
Starting with Recreating Magic Tricks, RoboMirage Opens Up a New World for Robot Simulation
具身智能之心· 2025-09-05 00:45
Author: 机器之心

In the development of embodied intelligence, obtaining massive, high-quality data is a core problem the industry cannot sidestep. If large language models rely on internet-scale corpora, the growth of embodied intelligence likewise requires interaction experience at scale. Collecting that data in the real world is extremely expensive: hardware such as robotic arms is costly to deploy (a single unit runs to tens of thousands of yuan) and hard to scale, and data collection depends on experienced operators and takes a long time. In simulation, by contrast, an agent can run unlimited trial-and-error at lower cost and higher efficiency, rapidly accumulating large-scale interaction experience.

As research deepens, however, the industry is demanding more from its data:
- higher physical fidelity, to keep the data faithful to the real world;
- richer interaction types, covering rigid bodies, soft bodies, fluids, and other complex scenarios;
- stronger scalability and stability, supporting both the micro-dynamics detail of research and the large-scale simulation needs of industrial applications.

Against this backdrop, RoboScience has released RoboMirage, a high-precision general-purpose physics simulation platform for embodied intelligence. RoboMirage has the following core features: ...
Shared by the Midea Team! Finding the Key to Building a General-Purpose Dexterous VLA Model, from Reasoning to Execution, Across Seven Works
具身智能之心· 2025-09-05 00:45
Core Viewpoint
- The article discusses the development and capabilities of Vision-Language-Action (VLA) models, emphasizing their application in general robot control and the integration of multimodal understanding for enhanced performance on complex tasks [6][8].

Group 1: VLA Model Development
- The VLA models are designed to evolve continuously, integrating semantic reasoning from open-world scenarios to execute dexterous actions in real environments [6].
- A unified framework combining perception and action is established, addressing challenges such as deformable objects and fine manipulation tasks to enhance general dexterity [6].

Group 2: Model Capabilities and Enhancements
- The article highlights the expansion of VLA model capabilities, focusing on improving generalization through the incorporation of large-model priors and semantic reasoning [8].
- The introduction of Spec-VLA, a framework designed to accelerate inference in VLA models, is noted as a significant advancement [8].

Group 3: Expert Insights and Additional Content
- The live session features insights from industry experts, including the head of the embodied foundation model team at Midea, discussing the design challenges of dexterous hands and their role in closing the "hand-eye-brain" perception loop [7].
- Additional content is available on the knowledge platform "Embodied Intelligence Heart", providing in-depth analysis and technical details related to VLA models and their applications [10].
The Embodied Intelligence Heart Teleoperation Technology Discussion Group Is Here!
具身智能之心· 2025-09-05 00:45
Group 1
- The article introduces a new community focused on teleoperation technology in embodied intelligence, inviting anyone interested in the field to join the discussion [1].
- To join, add the WeChat assistant as a contact and include membership details in the request [2].