A 10,000-Word Deep Dive into the "Growth History" of Embodied Intelligence: Which Mountains and Seas Has It Crossed, and Where Is It Headed Next?
具身智能之心· 2025-08-08 00:08
Core Viewpoint
- The forum emphasizes the rapid advancement of embodied intelligence and robotics, arguing that turning computational power into physical capability requires a purpose-built computational brain, and highlighting the gap between AI's performance in games like Go and its struggles with simple physical tasks [4].

Group 1: Evolution of Embodied Intelligence
- Over the past decade, embodied intelligence has evolved significantly; robotics is a closed-loop system that integrates perception, action, and the physical world, and must adhere to physical laws [5][6].
- A gap remains between research prototypes and practical applications; Technology Readiness Level (TRL) is a key metric for assessing the maturity of robotic applications, with levels 8 to 9 crucial for industry acceptance [6].

Group 2: Opportunities and Challenges in Robotics
- The forum reviews the historical impact of machine learning on robotics: advances in sensors, algorithms, and deep learning have driven significant progress, but achieving high performance in the physical world remains a challenge [9][13].
- Scalable learning systems are essential; shifting from small-scale learning to large-scale applications is crucial for overcoming challenges in robotics [15].

Group 3: Specialized vs. General Intelligence
- The discussion contrasts Artificial Specialized Intelligence (ASI) with Artificial General Intelligence (AGI): ASI targets high performance on specific tasks, while AGI aims for broader capabilities [23][25].
- Specialized models offer efficiency, robustness, and suitability for real-time applications; general models offer greater flexibility but are more complex and resource-intensive [27][30].
Group 4: Future Directions in Robotics
- The emergence of vision-language-action (VLA) models such as RT-2 is a significant step forward, allowing robots to execute tasks through internet-based API calls and signaling a trend toward more versatile robotic capabilities [39][40].
- The second-generation VLA model PI-Zero advances continuous action generation, enabling robots to perform complex tasks more efficiently [46][48].

Group 5: Data and Performance in Robotics
- Large-scale data collection is necessary for training robotic models; the RTX dataset has been a pivotal resource for developing cross-embodiment models that outperform specialized counterparts [42][43].
- Performance metrics are underscored, with a focus on achieving high reliability and robustness so that robotic systems can be deployed in real-world scenarios [58][65].
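The contrast between RT-2-style discrete action tokens and PI-Zero-style continuous action chunks can be sketched in a few lines. This is a hedged illustration only: the function names, bin count, and chunk dimensions below are invented for the example and are not taken from either model's code (RT-2 decodes per-dimension action bins from tokens; PI-Zero emits whole continuous chunks, in reality via flow matching rather than the trivial stand-in here).

```python
import numpy as np

def detokenize_discrete_action(tokens, n_bins=256, low=-1.0, high=1.0):
    """RT-2-style decoding (illustrative): each action dimension is one
    discrete token in [0, n_bins); map each bin index to the centre of
    its interval in [low, high]."""
    tokens = np.asarray(tokens, dtype=float)
    return low + (tokens + 0.5) * (high - low) / n_bins

def sample_action_chunk(policy_mean, horizon=8, action_dim=7):
    """PI-Zero-style output shape (illustrative): the head returns a whole
    chunk of continuous actions at once; here we just tile a mean vector,
    whereas the real model samples the chunk with a generative head."""
    return np.broadcast_to(np.asarray(policy_mean, dtype=float),
                           (horizon, action_dim))

a = detokenize_discrete_action([0, 128, 255], n_bins=256)
print(a)  # three continuous values inside [-1, 1]
print(sample_action_chunk(np.zeros(7)).shape)  # one chunk: (8, 7)
```

The practical difference the summary points at: the discrete head produces one action per forward pass of an autoregressive decoder, while the chunked head amortizes a single pass over many future timesteps.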
This 2,000-Member Embodied Intelligence Community Has Helped Solve Problems of Every Kind!
具身智能之心· 2025-08-08 00:08
Questions like these have come up many times in our embodied intelligence community: how do I use this hardware? How do I collect data effectively? How do I deploy VLA and related models? Is the capture background too cluttered, or is the data simply too dirty? We responded quickly with concrete answers that members could apply to their projects right away. A community that solves problems exactly when people need help is genuinely valuable.

具身智能之心知识星球 (the first full-stack embodied intelligence community in China) has now closed the loop across industry, academia, job seeking, and Q&A. Whatever problem comes up, a solution gets shared; whichever research direction is at the frontier, a steady stream of ideas follows; and job openings are relayed to members first. Beyond the questions above, we have also compiled answers to many other topics:

- Which platforms exist for robot simulation and data collection?
- How is imitation learning done on humanoid robots? Why is VLA hard to get right?
- How is VLA used in robotic grasping and planning tasks?
- How is VLA+RL done, and why does it work?
- What to do when sim2real results are poor? How does real2sim2real work?
- How is hierarchical decision-making typically built, and what are its pros and cons versus end-to-end?
- Which industry research reports on embodied robotics exist? A roundup of 30.
- Job postings from multiple leading embodied robotics companies.
- How to choose a research direction in embodied intelligence, and which directions yield results fastest?
- ......

Even better: inside the community we have mapped out nearly 30+ technical routes ...
具身智能之心 Is Recruiting an Operations Intern! 1-on-1 Mentoring by a Partner (Only One Opening)
具身智能之心· 2025-08-07 12:00
Hello everyone, we are the 自动驾驶之心 / 具身智能之心 / 大模型之心 Tech team. We are delighted to meet you here. If you also believe that technical content can change the world, you may be exactly who we are looking for!

1. Research background in autonomous driving, large models, or embodied intelligence; bachelor's degree or above, master's preferred;
2. Strong enthusiasm for researching and sharing cutting-edge technical developments and events;
3. Strong execution, efficiency, and communication skills;
4. Solid writing ability, with clear logic and fluent expression;
5. Strong learning and knowledge-organization skills;
6. Bonus points:
- Technical background: can independently interpret academic papers, run and deploy open-source projects, and write code demos;
- Product background: can deeply test and deconstruct AI products and distill their core value;
- Operations background: has run an original tech self-media account.

What do we do? We aim to connect academia and industry through technical content, serving as a bridge between companies and universities and reaching hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the newest, most authoritative technical information. The team focuses on the most cutting-edge AI fields, including autonomous driving, embodied intelligence, and large models, covering academic paper interpretation, analysis of production deployments in industry, large-model evaluation, business news, industry recruiting, and open-source projects, and shares content, engages with followers, and liaises with companies through WeChat official accounts, community groups, video channels, Zhihu, Xiaohongshu, and Bilibili. In both autonomous driving and embodied intelligence, we have already ...
Project and Thesis Mentoring from 具身智能之心 Has Arrived!
具身智能之心· 2025-08-07 12:00
Good news: 具身智能之心 has officially launched its project and thesis mentoring course series! Directions include large models, VLA, VLN, reinforcement learning, Diffusion Policy (DP), sim2real, simulation, and more. If you genuinely need project mentoring, thesis mentoring, or job-search coaching, feel free to contact us. Professional academic resources and front-line engineering and algorithm staff will help you work through all kinds of problems. If interested, add WeChat oooops-life for further consultation.

Do you keep running into odd problems with no one to discuss them with? Have you been stuck on the same blocker for months? Unsure how to write code and debug? Not sure how to write your resume when job hunting, or how to interview? ...
The 具身智能之心 Technical Exchange Group Is Now Open!
具身智能之心· 2025-08-07 02:38
The 具身智能之心 technical exchange group is now open! It focuses on VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, goal navigation, mapping and localization, navigation, and related directions. Interested members can add the assistant's WeChat AIDriver005 to be invited into the group. Note: include institution/school + name + research direction in your request to be admitted quickly! ...
China's First Full-Stack Hands-On Tutorial on Embodied "Brain + Cerebellum" Algorithms
具身智能之心· 2025-08-07 02:38
Core Insights
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1].
- The development of embodied intelligence is marked by an evolution from low-level perception to high-level task understanding and generalization [6][9].

Industry Analysis
- In the past two years, numerous star teams have emerged in embodied intelligence, founding valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli and moving from the laboratory to commercial and industrial applications [3].
- Major domestic companies such as Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build an embodied intelligence ecosystem, while international players like Tesla and investment firms support advances in autonomous driving and warehouse robotics [5].

Technological Evolution
- Stage one focused on grasp pose detection, which struggled with complex tasks due to a lack of context modeling [6].
- Stage two introduced behavior cloning, allowing robots to learn from expert demonstrations but revealing weak generalization and poor performance in multi-target scenarios [6].
- Stage three brought Diffusion Policy methods, improving stability and generalization through sequence modeling [7].
- Stage four, emerging in 2025, explores integrating VLA models with reinforcement learning and tactile sensing to overcome current limitations [8].

Product Development and Market Growth
- These advances have produced a range of products, including humanoid robots, robotic arms, and quadruped robots, serving industries such as manufacturing, home services, and healthcare [9].
- As the industry shifts from research to deployment, demand for engineering and systems capabilities is rising, requiring stronger engineering skills [13].

Educational Initiatives
- A comprehensive curriculum has been developed to help learners master the full spectrum of embodied intelligence algorithms, from basic tasks to advanced models such as VLA and its integrations [9][13].
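Stage two of the evolution above, behavior cloning, reduces to supervised regression from states to expert actions. A minimal sketch, assuming a linear policy and synthetic expert data for illustration (real systems use deep networks over image observations):

```python
import numpy as np

# Behavior cloning in miniature: fit a policy to expert (state, action)
# pairs by least squares. All data here is synthetic and the linear
# policy class is an assumption made purely for the example.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))     # expert observations
W_true = rng.normal(size=(4, 2))       # the expert's (unknown) mapping
actions = states @ W_true              # expert demonstrations

# "Training": least-squares fit of the policy weights.
W_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):
    """Cloned policy: imitates the expert on (and hopefully near) the data."""
    return s @ W_hat

err = np.abs(W_hat - W_true).max()
print(err)  # near zero: the cloned policy matches the expert on this data
```

The weakness the summary mentions follows directly from this setup: the fit is only constrained on states the expert visited, so behavior on out-of-distribution states (e.g. multi-target scenes absent from the demonstrations) is unconstrained.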
Google's "World Simulator" Drops Overnight! Generate a 3D World from a Single Sentence, with Minute-Level Long-Horizon Memory
具身智能之心· 2025-08-07 00:03
Google DeepMind has just released Genie 3, its new-generation general world model. In performance terms, Genie 3 is a major upgrade over its predecessor: 720p visuals, real-time navigation at 24 frames per second, and consistency maintained over multiple minutes.

| | Genie 2 | Genie 3 |
| --- | --- | --- |
| Resolution | 360p | 720p |
| Environments | 3D environments | General |
| Interaction | Limited keyboard/mouse actions | Navigation; promptable world events |
| Horizon | 10-20 seconds | Multiple minutes |
| Latency | Not real time | Real time |

Editor: 量子位 (QbitAI)

A single sentence is enough to generate a real-time interactive 3D world. Former DeepMind scientist and AI 3D-generation founder Tejas Kulkarni was invited to try Genie 3. He used Genie ...
This 2,000-Member Embodied Intelligence Community Has Helped Solve Problems of Every Kind!
具身智能之心· 2025-08-07 00:03
Core Insights
- The article emphasizes the value of a community that can solve problems in the field of embodied intelligence, highlighting the establishment of a comprehensive technical exchange platform for industry and academic discussion [3][17].

Group 1: Community and Resources
- The Embodied Intelligence Knowledge Planet has closed the loop across industry, academia, job seeking, and Q&A, providing timely solutions and job opportunities [3][5].
- The community has compiled over 30 technical routes, significantly reducing search time for benchmarks and learning paths [5].
- Members can access a wealth of resources, including nearly 40 open-source projects and 60 datasets related to embodied intelligence [17].

Group 2: Educational Support
- Structured learning paths are offered for beginners, including technical stacks and routes tailored to newcomers [12].
- Researchers already engaged in the field get valuable industry frameworks and project proposals to strengthen their work [14].
- Roundtable forums and live broadcasts share insights on the latest developments in the embodied intelligence industry [5][18].

Group 3: Job Opportunities
- A job referral mechanism with multiple embodied intelligence companies connects job seekers directly with potential employers [11].
- Members are encouraged to share their resumes for timely placement at target companies [11].

Group 4: Research and Development
- The community summarizes research directions and notable laboratories in embodied intelligence, aiding members' academic pursuits [21][22].
- A collection of industry reports on large models and humanoid robots provides insight into industry trends and applications [24].

Group 5: Technical Insights
- Extensive material covers simulation platforms, data collection methods, and reinforcement learning applications [39][41][45].
- Detailed learning routes cover embodied perception, interaction, and navigation across a wide range of tasks and methodologies [45][49].
XRoboToolkit: A Low-Latency, Scalable, High-Quality Data Collection Framework
具身智能之心· 2025-08-07 00:03
Core Insights
- The article presents XRoboToolkit, a cross-platform framework for robot teleoperation that addresses the growing demand for large-scale, high-quality robot demonstration datasets driven by the rapid advance of vision-language-action (VLA) models [3].

Limitations of Existing Teleoperation Solutions
- Current teleoperation frameworks suffer from limited scalability, complex setup processes, and poor data quality [4][5].

XRoboToolkit's Core Design
- The framework uses a three-layer architecture for cross-platform integration: XR-side components, robot-side components, and a service layer for real-time teleoperation and stereo vision [4][5].

Data Streaming and Transmission
- XRoboToolkit employs an asynchronous, callback-driven architecture for real-time data transmission from XR hardware to the client, supporting a variety of tracking data formats [7][9].

Robot Control Module
- The inverse kinematics (IK) solver is based on quadratic programming (QP) to generate smooth motion, particularly near kinematic singularities, improving stability [8][10].

XR Unity Application and Stereo Vision Feedback
- The framework has been validated across multiple platforms, achieving an average latency of 82 ms (standard deviation 6.32 ms), significantly lower than Open-TeleVision's 121.5 ms [11][13].
- Data quality was verified by collecting 100 data points, achieving a 100% success rate over 30 minutes of continuous operation [11][14].

Application Interface and Features
- The application interface includes five panels for network status, tracking configuration, remote vision, data collection, and system diagnostics, and supports a range of devices [16].
- Stereo vision is optimized for depth perception, with the PICO 4 Ultra leading on visual quality metrics [16].
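The summary notes that the IK solver uses quadratic programming to stay smooth near kinematic singularities. A full QP requires a solver library, so the sketch below illustrates the closely related damped least-squares step instead; its damping term plays the same stabilizing role of bounding joint velocities near a singularity. All names and values are illustrative and not taken from XRoboToolkit.

```python
import numpy as np

def dls_ik_step(jacobian, dx, damping=0.05):
    """One damped least-squares velocity-IK step:
    dq = J^T (J J^T + lambda^2 I)^{-1} dx.
    The lambda^2 I term regularizes the solve, so dq stays bounded
    even when J J^T is (nearly) rank-deficient."""
    J = np.asarray(jacobian, dtype=float)
    reg = (damping ** 2) * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(J @ J.T + reg, np.asarray(dx, dtype=float))

# Near a singularity (rows almost linearly dependent), a plain
# pseudo-inverse would command huge joint velocities; the damped
# step remains small and smooth.
J_singular = np.array([[1.0, 0.0],
                       [1.0, 1e-6]])
dq = dls_ik_step(J_singular, [0.0, 0.01])
print(dq)  # small, finite joint-velocity command
```

A QP formulation generalizes this by adding joint limits and velocity bounds as explicit constraints, which is presumably why the framework adopts it for hardware deployment.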
57% Higher Success Rate, the Latest in VLA+RL! CO-RFT: Efficient Fine-Tuning for VLA Models (Beihang, Tsinghua, et al.)
具身智能之心· 2025-08-07 00:03
Core Insights
- The article introduces Chunked RL, a new reinforcement learning framework designed specifically for fine-tuning vision-language-action (VLA) models, which show great potential in real-world robotic control [4][8].
- The proposed CO-RFT algorithm significantly improves on traditional supervised fine-tuning, achieving a 57% higher success rate and a 22.3% shorter cycle time in real-world environments [4][29].

Introduction
- VLA models integrate perception and language understanding for embodied control, showing promise for developing general-purpose strategies for real-world robotic control [6].
- The challenges of fine-tuning VLA models stem mainly from dependence on the quality and quantity of task-specific data, which limits generalization to out-of-distribution (OOD) scenarios [6][7].

Methodology
- Chunked RL incorporates action chunking into reinforcement learning to improve sample efficiency and stability, making it particularly well suited to VLA models [8][12].
- CO-RFT consists of two phases: imitation learning to initialize the backbone network and policy, followed by offline RL with action chunking to optimize the pre-trained policy [16][18].

Experimental Analysis
- Experiments on a robotic platform with six dexterous manipulation tasks compared CO-RFT against traditional methods [20][23].
- CO-RFT significantly outperforms supervised fine-tuning (SFT), with a 57% higher success rate and a 22.3% lower average cycle time across tasks [29][30].

Position Generalization
- CO-RFT exhibits strong position generalization, achieving a 44.3% success rate at previously unseen locations and outperforming SFT by 38% in OOD scenarios [4][29].

Importance of Data Diversity
- Data diversity is crucial to CO-RFT's performance: models trained on diverse datasets generalize significantly better than those trained on fixed datasets [32][33].
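The action-chunking idea at the heart of Chunked RL can be made concrete with the n-step target a chunked critic would regress toward: rewards inside a chunk of k actions are summed with discounting, and the value bootstrap happens once at the chunk boundary rather than at every step. This is a hedged sketch with invented names, not code from the CO-RFT paper.

```python
import numpy as np

def chunked_td_target(rewards, next_value, gamma=0.99):
    """n-step return over one action chunk of length k:
    G = r_0 + g*r_1 + ... + g^(k-1)*r_(k-1) + g^k * V(s_{t+k}),
    i.e. the critic bootstraps once per chunk instead of once per step."""
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    discounts = gamma ** np.arange(k)
    return float(np.dot(discounts, rewards) + (gamma ** k) * next_value)

# Chunk of 3 rewards, then bootstrap from the critic's value estimate.
g = chunked_td_target([1.0, 0.0, 1.0], next_value=0.5, gamma=0.9)
print(g)  # 1 + 0.9*0 + 0.81*1 + 0.9**3 * 0.5 = 2.1745 (up to float rounding)
```

Bootstrapping per chunk shortens the effective horizon by a factor of k, which is one plausible reason the paper reports better sample efficiency and stability than per-step targets when fine-tuning VLA policies that already emit action chunks.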