具身智能之心
具身智能之心 is recruiting product experts to collaborate!
具身智能之心· 2025-10-26 12:00
Embodied product manager recruitment is here! 具身智能之心 invites product experts in the embodied intelligence field to join our course instructor team and enterprise consulting expert pool, jointly promoting the adoption of embodied intelligence technology and empowering industry innovation and transformation. 「具身智能之心」 is a professional content platform and community focused on embodied intelligence, dedicated to building a comprehensive knowledge system, connecting academic, industrial, and user resources, advancing technological innovation and industrial deployment, and cultivating professional talent in the field.

We are looking for:
1. Embodied product course instructors. You will design and develop courses for embodied product managers, teach online on a part-time basis, and produce high-quality teaching materials (slides, case studies, hands-on guides). We expect at least one year of product design and requirements management experience in embodied AI.
2. Product-direction enterprise consulting experts. You will provide enterprise clients with professional consulting on applying and productizing embodied intelligence, help them formulate strategies and implementation plans, and join consulting projects to solve real problems in technology selection, product design, and team building. We expect rich hands-on experience with embodied intelligence projects and a deep understanding of industry needs and technical trends; consulting or enterprise-service experience is a plus.

Why join: raise your professional influence, collaborate with top experts, and work flexibly (online or offline, on your own schedule). We also offer a competitive compensation scheme and connections to academia ...
World-in-World: Johns Hopkins × Peking University Jointly Propose a Closed-Loop Evaluation Framework for Embodied World Models!
具身智能之心· 2025-10-26 04:02
Core Insights
- The article emphasizes the need to redefine the evaluation of world models in embodied intelligence, focusing on their practical utility rather than just visual quality [2][23]
- The introduction of the "World-in-World" platform aims to test world models in real embodied tasks through a closed-loop interaction system, addressing the gap between visual quality and task effectiveness [3][23]

Evaluation Redefinition
- Current evaluation systems prioritize visual clarity and scene rationality, often rewarding models that produce high-quality visuals without assessing their decision-making capabilities in real tasks [2][23]
- The article highlights the importance of aligning actions and predictions in embodied tasks, where the model must accurately predict scene changes based on the agent's movements [2][3]

World-in-World Platform Design
- The platform creates a closed-loop system where the agent, world model, and environment interact in a cycle of observation, decision-making, execution, and re-observation [3][6]
- A unified action API is established to standardize input across different world models, ensuring consistent interpretation of action intentions [6][12]

Task Evaluation
- Four types of real-world embodied tasks are selected for comprehensive testing, each with defined scenarios, objectives, and scoring criteria [10][14]
- The platform incorporates post-training techniques to fine-tune models using task-specific data, enhancing their adaptability to real-world tasks [12][23]

Experimental Findings
- Experiments with 12 mainstream world models reveal that fine-tuning on task data is more effective than simply using larger pre-trained models, yielding significant improvements in success rates [17][20]
- Models with high visual quality do not necessarily perform better in practical tasks, underscoring the importance of controllability over visual appeal [18][23]

Recommendations for Future Development
- The article suggests focusing on improving controllability, utilizing task data for low-cost enhancements, and addressing the shortcomings in physical modeling for operational tasks [22][23]
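The closed-loop cycle described above (observe → decide → execute → re-observe, with a unified action API normalizing each model's output) can be sketched as follows. This is a minimal illustrative sketch: the class and function names (`ToyEnv`, `unified_action_api`, `closed_loop_eval`) are assumptions for exposition, not the platform's actual API.

```python
# Minimal sketch of a closed-loop evaluation cycle in the spirit of the
# World-in-World setup (all names here are illustrative assumptions).

class ToyEnv:
    """Toy task: succeed after three effective actions."""
    def __init__(self):
        self.progress = 0

    def reset(self):
        self.progress = 0
        return f"obs:{self.progress}"

    def step(self, action):
        if action == "advance":
            self.progress += 1
        done = self.progress >= 3
        return f"obs:{self.progress}", done  # re-observation closes the loop

def unified_action_api(intent):
    """Normalize heterogeneous model outputs into one action vocabulary."""
    return {"move_forward": "advance", "forward": "advance"}.get(intent, "noop")

def closed_loop_eval(env, policy, max_steps=10):
    """Observe -> decide -> execute -> re-observe, until success or timeout."""
    obs = env.reset()
    for _ in range(max_steps):
        intent = policy(obs)  # the world model proposes an action intent
        obs, done = env.step(unified_action_api(intent))
        if done:
            return True
    return False

success = closed_loop_eval(ToyEnv(), policy=lambda obs: "move_forward")
print(success)  # the always-advance policy completes the toy task -> True
```

The point of the closed loop is that the policy is scored on task completion after its predictions feed back into the environment, not on how its outputs look frame by frame.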
A Rundown of All the ICCV Awards: Congratulations to Jun-Yan Zhu's Team on Winning Best Paper
具身智能之心· 2025-10-26 04:02
Core Insights
- The article highlights the significant presence of Chinese authors at ICCV 2025, accounting for 50% of the submissions and showcasing China's growing influence in the field of computer vision [1]

Awards and Recognitions
- The Best Paper Award (Marr Prize) went to "Generating Physically Stable and Buildable Brick Structures from Text," which introduced BRICKGPT, a model that generates stable brick structures from textual prompts [4][24]
- The Best Student Paper Award went to "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models," which presents a method for editing images without the need for inversion [6][38]
- Honorable mentions for Best Paper included "Spatially-Varying Autofocus," which innovatively allows cameras to focus on different depths simultaneously [7][42]
- Honorable mentions for Best Student Paper included "RayZer: A Self-supervised Large View Synthesis Model," which autonomously reconstructs camera parameters and generates new viewpoints from uncalibrated images [9][47]

Notable Research Contributions
- The BRICKGPT model was trained on a dataset of over 47,000 brick structures, demonstrating its ability to generate aesthetically pleasing and stable designs that can be assembled manually or by robotic arms [24][26]
- FlowEdit uses a differential equation to map source and target distributions directly, achieving state-of-the-art results without model-specific dependencies [39][40]
- The "Fast R-CNN" method, awarded the Helmholtz Prize, significantly improved training and testing speeds while enhancing detection accuracy in object recognition tasks [10][54]
- Research on modified activation functions, which introduced a parameterized ReLU, achieved a top-5 test error of 4.94% on the ImageNet dataset, surpassing human-level performance [58][60]

Awarded Teams and Individuals
- The SMPL body model team developed a highly accurate 3D human model based on extensive 3D scan data, with strong compatibility with mainstream rendering pipelines [62][66]
- The VQA team created a visual question answering dataset containing approximately 250,000 images and 7.6 million questions, facilitating deeper understanding and reasoning about image content [68][69]
- Distinguished researchers David Forsyth and Michal Irani received the Outstanding Researcher Award for their contributions to computer vision and machine learning [72][75]
- Rama Chellappa was honored with the Azriel Rosenfeld Lifetime Achievement Award for his extensive work in computer vision and pattern recognition [78]
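The parameterized ReLU mentioned above replaces ReLU's fixed zero slope on negative inputs with a learnable slope per channel; when the slope is 0 it reduces to ReLU, and when it is 1 it reduces to identity. A minimal scalar sketch (the slope value here is illustrative, not the trained parameter from the paper):

```python
# Parametric ReLU sketch: identity for positive inputs, learnable
# slope `a` for negative inputs (a=0 recovers ReLU, a=1 identity).

def prelu(x, a):
    return x if x > 0.0 else a * x

values = [-2.0, -0.5, 0.0, 1.0, 3.0]
print([prelu(v, 0.25) for v in values])  # -> [-0.5, -0.125, 0.0, 1.0, 3.0]
```

The design choice is that a small learned negative slope keeps gradients flowing through units that a plain ReLU would silence, at the cost of one extra parameter per channel.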
From World Models to VLA to Reinforcement Learning: So This Is How Embodied "Brain and Cerebellum" Algorithms Work!
具身智能之心· 2025-10-26 04:02
Core Insights
- The article discusses the evolution and current state of embodied intelligence, focusing on the roles of the "brain" and "cerebellum" in robotics: the brain handles perception and planning, while the cerebellum is responsible for execution [3][10]

Technical Evolution
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection, moving to behavior cloning, and now advancing to diffusion policies and VLA models [7][10]
- The first stage focused on static object grasping with limited decision-making capabilities [7]
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but facing challenges in generalization and error accumulation [8]
- The third stage, marked by the introduction of diffusion policies, improved stability and generalization by modeling action sequences [8]
- The fourth stage, emerging in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance robots' predictive and interactive capabilities [9][10]

Current Trends and Applications
- Integrating VLA with reinforcement learning enhances robots' trial-and-error learning and self-improvement abilities, while combining it with world models enables future prediction and better planning [10]
- The article highlights growing demand for embodied intelligence applications across industrial, home, restaurant, and medical rehabilitation settings, leading to increased job opportunities and research interest in the field [10]

Educational Initiatives
- The article outlines a structured learning program aimed at equipping individuals with comprehensive knowledge of embodied intelligence algorithms, including practical applications and real-world projects [11][14]
- The course targets individuals with a foundational understanding of embodied intelligence and aims to bridge the gap between theoretical knowledge and practical deployment [18][24]
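The behavior-cloning stage described above reduces to supervised regression: fit a policy to expert state-action pairs by minimizing prediction error. A minimal sketch with a toy linear expert (the data and policy form are illustrative, not a real robot setup):

```python
import random

# Behavior-cloning sketch: fit a linear policy a = w*obs + b to expert
# demonstrations by per-sample gradient descent on squared error.
# The "expert" is a toy rule a = 2*obs + 1 (illustrative only).

random.seed(0)
demos = [(obs, 2.0 * obs + 1.0)
         for obs in (random.uniform(-1, 1) for _ in range(50))]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):                      # epochs over the demo set
    for obs, expert_action in demos:
        err = (w * obs + b) - expert_action  # supervised regression error
        w -= lr * err * obs                  # gradient step on MSE w.r.t. w
        b -= lr * err                        # gradient step on MSE w.r.t. b

print(round(w, 2), round(b, 2))  # recovers the expert mapping ~ 2.0, 1.0
```

The limitation the article notes follows directly from this setup: the loss only covers states the expert visited, so small errors at deployment push the robot into unseen states where errors compound.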
Robots from Domestic Embodied-Intelligence Startups Have Left Foreign Viewers Stunned!
具身智能之心· 2025-10-24 16:03
Core Viewpoint
- The article highlights the advancements and competitive pricing of domestic robotics companies, showcasing their ability to produce high-performance robots at lower costs, which has garnered international attention and admiration [1][3][5]

Group 1: Domestic Robotics Innovations
- The Bumi humanoid robot from Songyan Power is priced at 9,998 yuan, making it the world's first high-performance robot under 10,000 yuan and demonstrating that price barriers are falling thanks to advances in supply chains and technology [1]
- The Steel Coin L1 robotic dog from Zhishen Technology won the IROS'25 competition, featuring a peak torque of 48 N·m and superior performance in extreme environments [3]
- Yushu's recently released H2 bionic humanoid robot, standing 180 cm tall and weighing 70 kg, has attracted significant attention for its agile movements [5]

Group 2: Community and Knowledge Sharing
- The article promotes a community platform for embodied intelligence, established to facilitate knowledge sharing across technology routes, live discussions, and job opportunities [7][8]
- The community offers a comprehensive technical roadmap for beginners, helping them navigate the complexities of the field [10]
- For those already engaged in research, valuable industry frameworks and project proposals are provided to enhance their work [12]

Group 3: Resources and Networking
- The community has established a job referral mechanism with multiple robotics companies, allowing members to connect with potential employers [13]
- Members can access a wealth of resources, including open-source projects, datasets, and learning paths tailored to various aspects of embodied intelligence [17][26]
- The platform also features a compilation of notable robotics companies and research labs, facilitating networking and collaboration opportunities [20][21]
Is Your VLA Too Slow?! Speed It Up Even on Limited Compute: This Survey Teaches You to Build a New Efficient-VLA Paradigm
具身智能之心· 2025-10-24 16:03
Core Insights
- The article emphasizes the importance of efficiency in Vision-Language-Action (VLA) models, which are crucial for enabling robots to understand their environment and execute tasks effectively. It identifies efficiency as a key bottleneck hindering the transition of VLA models from research to practical applications [3][4][7]

Background and Value
- The rapid development of embodied intelligence has made VLA models a core framework for robotic task execution. However, current VLA systems face significant computational and storage demands, as well as high inference latency, which are critical obstacles for real-time applications [3][4][7]

Efficiency Bottlenecks
- The review systematically analyzes efficiency issues in VLA models across four dimensions: model architecture, perception features, action generation, and training/inference processes. It highlights that efficiency challenges are systemic and not limited to single-point optimizations [3][4][7]

Classification Framework
- The article categorizes existing efficient-VLA strategies into four complementary dimensions: efficient architecture design, perception feature compression, action generation acceleration, and training/inference optimization. This classification provides a comprehensive view of the design logic and trade-offs of current methods [4][6][7]

Future Trends and Directions
- The review outlines future directions for VLA models, emphasizing the need to balance capability gains against computational cost. Key areas for efficiency optimization include data utilization, perception features, action generation, and learning strategies [4][25][26]

Efficient Perception Features
- Visual input constitutes the largest computational overhead in VLA models; it can be optimized through selective processing of features and temporal feature reuse. These strategies aim to reduce redundant computation while maintaining performance [13][15][16]

Efficient Action Generation
- Action generation strategies focus on minimizing latency while preserving task accuracy. Techniques include outputting low-dimensional continuous action vectors and introducing explicit reasoning to improve interpretability and cross-task generalization [18][21]

Efficient Training and Inference
- Training strategies aim to reduce adaptation costs for new tasks and environments through methods such as parameter-efficient fine-tuning and knowledge distillation. Inference strategies focus on breaking the autoregressive bottleneck to enable parallelization and mixed decoding [22][24]

Future Outlook
- The article suggests that future VLA models should prioritize co-design of models and data, efficient spatiotemporal perception, and robust action encoding. It also calls for a standardized evaluation framework to measure efficiency improvements [25][26][27]
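Selective processing of perception features, mentioned above as one way to cut the visual-input overhead, is often realized as token pruning: keep only the top-k visual tokens by some saliency score before they reach the expensive attention layers. A minimal sketch, where the scores are an illustrative stand-in for a learned saliency measure:

```python
# Visual-token pruning sketch for an efficient VLA front end: keep the
# top-k tokens by saliency score (scores here are illustrative) and drop
# the rest, shrinking the sequence fed to downstream attention.

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Return the tokens with the highest scores, preserving input order."""
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: -scores[i])[:k]
    return [tokens[i] for i in sorted(top)]  # restore original ordering

tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
print(prune_tokens(tokens, scores, 0.5))  # -> ['t1', 't3', 't5']
```

Because self-attention cost grows quadratically with sequence length, halving the token count cuts that cost by roughly four, which is where the latency savings come from.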
Zhejiang University | The EMP Framework Lets Humanoid Robots "Learn Motions Without Falling"!
具身智能之心· 2025-10-24 16:03
The following article is from 具身智能研究室 (author: 小智), a group of AI explorers focused on knowledge sharing about agents and embodied intelligence.

Research Background and Core Innovations
- EMP achieves safe, stable humanoid control through upper-body imitation + lower-body balance + executable-action correction
- Project page: https://anonymous.4open.science/w/EMP-project-page-4D58/

Editor's view: EMP's highlight is not just the control policy itself; it represents a new paradigm for humanoid reinforcement learning. Rather than making RL "learn all the physics the hard way," it inserts an action-feasibility network before RL to judge what should and should not be done. In the future, combined with VLA (VLA-RL, Helix, ControlVLA), we may see robots that ...
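The action-feasibility idea above, screening a policy's proposed action and correcting it into an executable one before it reaches the robot, can be sketched in its simplest form as a projection onto feasible limits. This is a toy illustration of the pattern only; the bound and function names are assumptions, not EMP's actual learned network:

```python
# Sketch of feasibility-based action correction (illustrative, not the
# paper's code): project each proposed joint target into a safe range
# before execution, so RL never sends an infeasible command to the body.

JOINT_LIMIT = 1.0  # illustrative joint-angle bound (radians)

def feasibility_correct(action):
    """Clamp each joint target into [-JOINT_LIMIT, JOINT_LIMIT]; a
    stand-in for a learned action-feasibility network."""
    return [max(-JOINT_LIMIT, min(JOINT_LIMIT, a)) for a in action]

proposed = [0.3, 1.7, -2.2]          # raw policy output
executed = feasibility_correct(proposed)
print(executed)  # -> [0.3, 1.0, -1.0]
```

The pattern matters because the RL policy then only ever experiences the corrected actions, so exploration stays inside the set of motions the hardware can actually perform.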
How Does Reinforcement Learning Empower Humanoids, Quadrupeds, Robotic Arms, and Other Embodiments? How Is Academia Approaching It?
具身智能之心· 2025-10-24 10:00
Core Insights
- Reinforcement learning (RL) remains a significant field, with increasing applications in robotics, including humanoid and quadruped robots, as well as in product optimization across various industries [1][2][3]
- The complexity of RL poses challenges for newcomers, making it difficult to produce publishable research papers without a structured learning system [5][9]
- To address these challenges, a specialized 1v6 mentoring course in RL has been launched, aimed at helping students produce quality research papers [6][9]

Group 1: Importance of Reinforcement Learning
- RL is crucial for tasks such as gait control in embodied intelligent robots, which is essential for achieving general-purpose capabilities [2]
- Companies like Yushu and Zhiyuan use RL to let humanoid robots perform complex actions such as climbing stairs, running, and dancing, enabling applications in rescue and hazardous environments [2][8]

Group 2: Challenges in Learning and Research
- The vast and intricate nature of RL makes it difficult for beginners to enter the field, often leading to frustration and abandoned studies [5][9]
- Producing a research paper requires proficiency in methodology, experimental results, and writing, and any misstep can draw low scores from reviewers [5]

Group 3: Course Offerings and Structure
- The 1v6 mentoring course is designed for graduate students and others needing guidance on research papers, featuring small class sizes and weekly live sessions [7][9]
- The course spans 14 weeks of intensive training followed by 8 weeks of maintenance support, covering various aspects of RL and robotics [9][15]
- Participants receive guidance on paper ideas, project implementation, experiments, writing refinement, and initial draft preparation for venues such as RA-L, ICRA, IROS, and CoRL [7][9][15]

Group 4: Course Content and Deliverables
- The curriculum includes RL fundamentals, simulation environments, sim2real techniques, and writing guidance, with a focus on practical applications for quadrupeds, humanoids, and robotic arms [17][19][20]
- Students will produce a research paper draft by the end of the course, with support for revisions and the submission process [23][28]
Some Students Haven't Even Gotten Started in Embodied AI, While Others Already Have a CCF-A Paper!?
具身智能之心· 2025-10-24 10:00
Group 1
- The article introduces a new paper tutoring service that offers one-on-one customized guidance in advanced research areas such as multimodal models, reinforcement learning, and robotics simulation [1]
- The service covers a wide range of academic levels, from CCF-A to CCF-C and SCI Zone 1 to Zone 4, including support for graduation theses and doctoral applications [1]
- The team consists of experienced PhD mentors and researchers from top universities and leading companies, with experience reviewing for prestigious conferences such as ICML, ICLR, and NeurIPS [1]

Group 2
- The service emphasizes a dual perspective from industry and academia, focusing not only on publishing papers but also on their practical value [2]
- The first ten students who inquire will be matched free of charge with a dedicated mentor for in-depth analysis and tailored publication strategy suggestions [3]
Blockbuster! Starting at 39,900 Yuan! A Highly Dexterous Dual-Arm Robot That Can Play the Violin and Badminton? Officially Unveiled at IROS'25
具身智能之心· 2025-10-24 04:00
Core Viewpoint
- VLAI Robotics has launched a cost-effective, high-load, and highly dexterous humanoid robotic arm, meeting the demands of research institutions and development teams while significantly lowering the entry barrier for scientific applications, with a starting price of 39,900 yuan [1][10]

Group 1: Product Features
- The robotic arm replicates human arm movement with 7 basic degrees of freedom plus 1 for the gripper, totaling 8 DOF per arm and 16 DOF for both arms, allowing it to perform high-precision tasks [4][6]
- It handles a peak load of 6 kg per arm and 12 kg for both arms, making it suitable for applications ranging from research experiments to industrial assistance and educational demonstrations [6][8]
- The arm uses biomimetic kinematic modeling and high-compliance control strategies to closely mimic human movements, enhancing its ability to perform tasks that require natural motion [8][12]

Group 2: Development and Manufacturing
- The development process involved collaboration with the Japanese OpenArm team, which provided open-source design standards and quality control, ensuring that the product meets international standards [2][10]
- The manufacturing team implemented systematic performance optimizations, including harness restructuring and lightweight design, to balance strength, flexibility, and energy efficiency [12][14]
- High-strength materials in critical areas and lightweight engineering plastics in non-load-bearing parts contribute to the arm's stability and responsiveness [12][14]

Group 3: Market Positioning and Accessibility
- The 39,900-yuan price significantly reduces the cost barrier for research-grade performance, making advanced robotics accessible to a wider audience [10][14]
- The product is designed for an open ecosystem, allowing users to add functionality such as remote control and dual-arm collaboration without expensive proprietary software [10][14]
- The company plans to adapt advanced algorithms for physical AI and intelligent-agent training, further broadening application scenarios and enhancing human-robot interaction capabilities [14][16]

Group 4: Future Plans and Engagement
- VLAI Robotics and the OpenArm team will showcase the robotic arm at the IROS 2025 conference in Hangzhou, providing live demonstrations and technical explanations [17]
- The initial batch of 300 units is available for pre-order, with customization options for specific research and production needs [18][19]