具身智能之心
What should a mobile chassis purpose-built for embodied intelligence look like?
具身智能之心· 2025-07-17 09:07
Core Viewpoint
- The global embodied intelligence industry is experiencing explosive growth, driven by the deep integration of language models into robotics and a transition from "perceptual intelligence" to "decision-making intelligence" and finally to "action intelligence" [1]

Group 1: Product Features
- The Hermes chassis, designed for robotic arms, runs on a 48V power supply, allowing multi-arm systems to be quickly assembled on a mobile base for practical applications [1]
- The 48V power platform delivers high power output without additional boost converters, driving dual robotic arms and multi-joint modules simultaneously and avoiding the motion delays caused by insufficient voltage [3]
- The chassis supports a 1C discharge rate with a peak power of 1440W, a 200% performance gain over 24V solutions, ideal for rapid start-stop and high-impact tasks [5]
- A 30Ah battery provides 8-12 hours of stable operation in continuous-work scenarios, significantly improving operational efficiency [6]
- An intelligent power management system optimizes energy consumption and extends battery life to 2000 cycles, reducing long-term cost of ownership [8]

Group 2: Navigation and Adaptability
- The Hermes chassis is equipped with dual radar and multiple depth-vision sensors to handle complex, low-obstacle environments, ensuring stable and reliable positioning and navigation [9]
- It has been deployed at several top-tier embodied intelligence companies, demonstrating adaptability to different robotic arms, sensors, and industry-specific requirements [11]
- The chassis offers open interfaces with an expandable Android system and CAN/RS485 communication for seamless integration with navigation and vision systems, suiting diverse applications such as service robots and industrial AMRs [13]
Group 3: Application Scenarios
- In industrial manufacturing and warehouse logistics, the Hermes chassis supports flexible-production-line collaborative robots, AMRs, and inspection in high-risk environments, meeting high-load transport and flexible production needs [14]
- In smart healthcare, it assists with drug transport and equipment delivery, contributing to the intelligent upgrading of hospitals [14]
- In commercial services and public facilities, it enables smart robots to make cross-floor deliveries with extended standby times, reducing labor costs [14]

Group 4: Market Positioning
- The launch of the 48V Hermes chassis marks a significant advance for the embodied intelligence sector, redefining the standard for intelligent robotic platforms by combining explosive power with endurance [16]
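The power figures quoted for the chassis (48V bus, 30Ah battery, 1C discharge, 1440W peak, 8-12h runtime) are internally consistent, which a few lines of generic battery arithmetic can verify. This is a back-of-envelope sketch using standard formulas, not vendor data:

```python
# Sanity-check of the quoted 48V chassis figures using generic
# battery formulas (not vendor specifications).

def peak_power_w(voltage_v: float, capacity_ah: float, c_rate: float) -> float:
    """Peak power = bus voltage * (capacity * C-rate discharge current)."""
    return voltage_v * capacity_ah * c_rate

def runtime_h(voltage_v: float, capacity_ah: float, avg_draw_w: float) -> float:
    """Runtime = stored energy in Wh / average power draw in W."""
    return voltage_v * capacity_ah / avg_draw_w

peak = peak_power_w(48, 30, 1.0)
print(f"peak power: {peak:.0f} W")                    # 1440 W, matching the quoted peak
# The quoted 8-12 h runtime implies an average draw of 120-180 W:
print(f"runtime at 180 W: {runtime_h(48, 30, 180):.1f} h")  # 8.0 h
print(f"runtime at 120 W: {runtime_h(48, 30, 120):.1f} h")  # 12.0 h
```

So the 1440W peak is exactly the 1C current (30A) times the 48V bus, and the 8-12h range corresponds to a 120-180W average load on the 1440Wh pack.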
So this is this embodied intelligence company's deployment scenario?! Recruiting algorithm researchers at up to 1M RMB
具身智能之心· 2025-07-17 09:07
OneStar Robotics (一星机器人), incubated by Geely Group, positions itself around "intelligently evolving robots driven by real-world data" and targets large-scale industrial scenarios. By continuously accumulating and refining real-scenario data, its robots iterate their intelligence through practice, offering a new approach to industrial production and intelligent upgrading. OneStar works with a world-leading multimodal large-model team and the FastUMI data-collection team, combining Geely's new-energy-vehicle powertrain (battery, motor, and electronic control) and intelligence capabilities to build a composite "model + data + embodiment" competitiveness. Focusing on multimodal diffusion large-model development and high-precision real-robot data collection, and leveraging large industrial scenarios such as complete-vehicle manufacturing, it is accelerating commercialization, moving "intelligently evolving robots driven by high-precision data" from concept to practice.

Compensation
Positions at a glance
Highly competitive pay and rewards:
- Full-time employees: annual salary of 700k-1M RMB for PhDs and 400k-600k RMB for Master's graduates (negotiable for outstanding candidates), plus generous annual performance incentives;
- Technical-team incentive: 10% of project profits is distributed to the technical team, so your ideas earn tangible returns;
- Interns: 300 RMB/day for Master's interns and 400 RMB/day for PhD interns, with free accommodation;
Comprehensive benefits
How to apply
For more job-hunting content, join our AutoRobo Knowledge Planet, a careers community covering robotics, autonomous driving, and embodied intelligence; it is also the first community in China focused mainly on autonomous driving and embodied AI. A big third-anniversary discount is here ...
PhysX: Nanyang Technological University and Shanghai AI Lab introduce the first physics-grounded 3D asset generation framework
具身智能之心· 2025-07-17 09:07
Author: Ziang Cao et al.  Editor: 具身智能之心

The dataset systematically defines three classes of properties (Figure 2, top), covering the full span from object recognition to manipulation. Notably, to avoid redundancy from overly fine-grained annotation, the dataset merges tiny parts whose vertex count and surface area fall below thresholds into adjacent parts.

Research background and motivation
3D asset generation is increasingly used in games, robotics, and embodied simulators, but existing work focuses largely on appearance and geometric structure while ignoring the physical properties inherent to real-world objects. Beyond structure, real objects carry physical and semantic attributes such as absolute scale, material, affordance, kinematic parameters, and functional descriptions, which are essential foundations for physical simulation and robotic manipulation. Existing datasets have clear limitations: PartNet-Mobility contains 2.7K 3D models with motion constraints but lacks physical descriptions such as size and material; the ABO dataset has material metadata but only at the object level, so it cannot support part-level applications. This gap makes it hard for 3D generative models to meet the needs of physical modeling and reasoning ...
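The small-part merging rule described above (parts with sub-threshold vertex counts and areas get absorbed into an adjacent part) can be sketched in a few lines. The thresholds, part names, and adjacency map below are illustrative assumptions, not values from the paper:

```python
# Hypothetical sketch of the tiny-part merging rule: a part whose
# vertex count AND surface area are both below threshold is absorbed
# into an adjacent part that is not itself tiny.

from dataclasses import dataclass

@dataclass
class Part:
    name: str
    num_vertices: int
    area: float  # surface area, arbitrary units

def merge_tiny_parts(parts, adjacency, min_vertices=50, min_area=1e-3):
    """Return a mapping part name -> surviving part name after merging."""
    target = {p.name: p.name for p in parts}
    by_name = {p.name: p for p in parts}
    for p in parts:
        if p.num_vertices < min_vertices and p.area < min_area:
            # absorb into the first adjacent part that is not itself tiny
            for nb in adjacency.get(p.name, []):
                q = by_name[nb]
                if q.num_vertices >= min_vertices or q.area >= min_area:
                    target[p.name] = nb
                    break
    return target

parts = [Part("screw", 12, 1e-4), Part("door", 5000, 0.8)]
adjacency = {"screw": ["door"], "door": ["screw"]}
print(merge_tiny_parts(parts, adjacency))  # {'screw': 'door', 'door': 'door'}
```

The conjunctive test (both vertex count and area must be small) mirrors the dataset's stated criterion and avoids merging thin-but-large parts such as panels.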
This embodied intelligence company's positioning is quite industrial?! Recruiting algorithm researchers at up to 1M RMB
具身智能之心· 2025-07-17 02:58
Compensation
Highly competitive pay and rewards:
- Full-time employees: annual salary of 700k-1M RMB for PhDs and 400k-600k RMB for Master's graduates (negotiable for outstanding candidates), plus generous annual performance incentives;
- Technical-team incentive: 10% of project profits is distributed to the technical team, so your ideas earn tangible returns;
- Interns: 300 RMB/day for Master's interns and 400 RMB/day for PhD interns, with free accommodation;
Comprehensive benefits:
- Social insurance and housing fund paid in full (housing fund contributed at the capped combined rate of 24%);
- Additional housing and meal subsidies;
- Around-the-clock snacks and drinks;
How to apply
For more job-hunting content, join our AutoRobo Knowledge Planet, a careers community covering robotics, autonomous driving, and embodied intelligence; it is also the first community in China focused mainly on autonomous driving and embodied AI. A big third-anniversary discount is here! Come keep growing with us.
AutoRobo Knowledge Planet is a job-hunting community for students in autonomous driving, embodied intelligence, and robotics, now with nearly 1,000 members, including working professionals from companies such as Horizon Robotics, Li Auto, Huawei, Xiaomi Auto, Momenta, and 元戎启行 (DeepRoute.ai), as well as candidates from the 2024 and 2025 autumn recruitment seasons, covering most areas of autonomous driving and embodied intelligence. What's inside the planet? Building on our existing strengths, we offer ...
Sure enough! Autumn recruitment punishes every graduate student who put the cart before the horse!
具身智能之心· 2025-07-17 00:53
Core Viewpoint
- The article emphasizes the importance of proactive engagement in research and academic writing for students, especially those in graduate programs, to enhance their employability and academic credentials.

Group 1: Employment and Academic Pressure
- The article highlights growing anxiety among students about job prospects as the job market evolves, urging them to act rather than wait passively [1]
- It suggests that students track both campus recruitment and social (experienced-hire) recruitment to identify gaps in their skills and knowledge [1]

Group 2: Research Guidance and Support
- The company offers a comprehensive research guidance program aimed at helping students produce high-quality academic papers, particularly in fields like autonomous driving and embodied intelligence [3][12]
- The program reports a 96% acceptance rate for papers submitted by students who received guidance [3]

Group 3: Structured Research Process
- The article outlines a 12-week structured process for completing a research paper, covering topic selection, literature review, experimental design, and submission [5]
- This structured approach is designed to help students overcome challenges such as insufficient supervisor guidance and fragmented knowledge [6]

Group 4: Target Audience and Benefits
- The program targets graduate students who need research papers for graduation, stronger academic profiles, or better job competitiveness in the AI field [11]
- Participants gain not only a published paper but also skills in research methodology and coding, plus networking opportunities with prestigious institutions [15]

Group 5: Personalized Support and Flexibility
- The company provides personalized mentoring, real-time interaction with instructors, and flexible learning options, including recorded sessions and 24-hour support [12][16]
- A matching system ensures that students are paired with mentors who align with their research interests and goals [14]
Small models strike back! Xipeng Qiu's team at Fudan & 创智 builds a "world-aware" embodied agent, with code and data fully open-sourced!
具身智能之心· 2025-07-16 09:12
Core Viewpoint
- The article introduces the World-Aware Planning Narrative Enhancement (WAP) framework, which significantly improves the performance of large vision-language models (LVLMs) on embodied planning tasks by integrating world knowledge into both the data and the reasoning chain [2][17]

Group 1: Introduction
- LVLMs are becoming central to embodied planning, but existing methods often rely on environment-agnostic imitation learning, leading to poor performance in unfamiliar scenarios [2]
- WAP raises the success rate on the EB-ALFRED benchmark from 2% to 62.7%, surpassing models such as GPT-4o and Claude-3.5-Sonnet and highlighting the importance of world perception in high-level planning [2][17]

Group 2: Related Work
- WAP differs from existing approaches by explicitly binding instruction-environment context at the data level and relying solely on visual feedback, without privileged information [4]

Group 3: Technical Method
- The framework injects four-dimensional cognitive narratives (visual, spatial, functional, syntactic) into the data layer, letting the model understand the environment before reasoning deeply [6]
- It employs closed-loop observation (RGB + instructions only) and a three-stage curriculum learning approach to develop environmental understanding and long-horizon reasoning [6][12]

Group 4: Experiments
- On EmbodiedBench (EB-ALFRED), WAP significantly improves success rates across task categories, with Qwen2.5-VL gaining 60.7 percentage points in average success rate [14]
- WAP also markedly improves long-horizon task success, reaching 70% versus previous models [14][16]

Group 5: Conclusion and Future Work
- WAP effectively incorporates world knowledge into the data and reasoning processes, allowing smaller open-source LVLMs to outperform commercial models in purely visual closed-loop settings [17]
- Future work includes extending to dynamic industrial/outdoor scenes and exploring self-supervised narrative evolution for iterative data-model improvement [21]
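The three-stage curriculum mentioned in the method can be sketched as a simple staged data schedule: narrative-enriched examples are ranked by difficulty and released to training in stages. The stage count comes from the article; the difficulty scoring, field names, and splitting rule below are illustrative assumptions, not the WAP implementation:

```python
# Sketch of a three-stage curriculum over narrative-enriched examples.
# Each example carries the four narrative dimensions named in the
# article; the difficulty scores here are toy values.

NARRATIVE_DIMS = ("visual", "spatial", "functional", "syntactic")

def build_curriculum(examples, num_stages=3):
    """Sort examples by difficulty and split them into near-equal stages."""
    ranked = sorted(examples, key=lambda ex: ex["difficulty"])
    stage_size = (len(ranked) + num_stages - 1) // num_stages
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]

examples = [
    {"instruction": "pick up the mug", "difficulty": 1,
     "narratives": {d: "..." for d in NARRATIVE_DIMS}},
    {"instruction": "heat the mug then put it on the shelf", "difficulty": 3,
     "narratives": {d: "..." for d in NARRATIVE_DIMS}},
    {"instruction": "open the fridge", "difficulty": 2,
     "narratives": {d: "..." for d in NARRATIVE_DIMS}},
]
stages = build_curriculum(examples)
print([len(s) for s in stages])  # [1, 1, 1]: easy -> medium -> hard
```

Training then consumes the stages in order, so the model sees short, grounded tasks before long-horizon ones.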
ICCV 2025 full-score paper: one model unifies spatial understanding and active exploration
具身智能之心· 2025-07-16 09:12
Core Insights
- The article discusses the transition of artificial intelligence from the virtual internet to the physical world, emphasizing the challenge of enabling agents to understand three-dimensional space and align natural language with real environments [3][40]
- A new model proposed by a collaborative research team unifies spatial understanding and active exploration, allowing agents to build cognitive maps of their environments through dynamic exploration [3][40]

Group 1: Model Overview
- The proposed model integrates exploration and visual grounding in a closed loop, where understanding and exploration are interdependent and mutually reinforcing [10][14]
- The model consists of two main components, online spatial memory construction and spatial reasoning and decision-making, optimized under a unified training framework [16][22]

Group 2: Exploration and Understanding
- In the exploration phase, the agent accumulates spatial memory through continuous RGB-D perception while actively seeking potential target locations [12][21]
- In the reasoning phase, it reads from the spatial memory to identify candidate areas relevant to the task instruction, using cross-attention mechanisms [22][23]

Group 3: Data Collection and Training
- The authors propose a hybrid data-collection strategy combining real RGB-D scan data with virtual simulation environments to strengthen the model's visual understanding and exploration capabilities [25]
- The resulting dataset contains over 900,000 navigation trajectories and millions of language descriptions, covering task types such as visual guidance and goal localization [25]

Group 4: Experimental Results
- The MTU3D model was evaluated on four key tasks and significantly outperformed existing methods, including a gain of over 20% on the GOAT-Bench benchmark [28][29]
- On the A-EQA task, the model raised GPT-4V's success rate from 41.8% to 44.2%, indicating its potential to enhance multimodal large models [32][33]

Group 5: Conclusion
- MTU3D represents a significant advance in embodied navigation, combining understanding and exploration so that AI can autonomously navigate and complete tasks in real-world environments [40]
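The memory "read" described above (a task instruction attending over spatial-memory entries to score candidate areas) is standard cross-attention. This toy sketch shows the mechanism with scaled dot-product scores over a three-entry memory; the dimensions and vectors are invented for illustration and are not MTU3D's actual representation:

```python
# Illustrative cross-attention read: an instruction embedding (query)
# scores entries of an online spatial memory (keys) via scaled
# dot-product attention. Pure-stdlib toy example.

import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention_scores(query, memory_keys):
    """Return attention weights of the query over each memory entry."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in memory_keys]
    return softmax(scores)

# One instruction embedding attends over three spatial-memory entries.
query = [1.0, 0.0, 1.0]
memory = [[1.0, 0.0, 1.0],   # closely matching entry
          [0.0, 1.0, 0.0],   # unrelated entry
          [0.5, 0.5, 0.5]]   # partially matching entry
weights = cross_attention_scores(query, memory)
print(max(range(3), key=weights.__getitem__))  # 0: the best-matching candidate area
```

The highest-weighted memory entries become the candidate areas the agent either grounds the instruction in or navigates toward next.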
One year on, and what a bumpy road! From scrappy beginnings to a professional embodied intelligence education platform
具身智能之心· 2025-07-16 09:12
Core Insights
- The "Embodied Intelligence Heart" (具身智能之心) platform has made significant progress over the past year, expanding its product development, financing coverage, and technology work within the embodied intelligence sector [1][2]
- The platform has transitioned from a semi-nonprofit learning community to a paid knowledge community, with membership benefits including discounts on self-developed platforms and courses, job referrals, and internal learning sessions [2][19]
- The community has established job-referral channels with multiple embodied intelligence companies, connecting job seekers and employers [8][19]

Product and Technology Development
- The platform has developed several embodied intelligence courses, including VLA, VLN, DP, sim2real, and reinforcement learning, well received by over 1,500 members [1][13]
- A curated list of more than 30 technical routes helps members find benchmarks and learning paths, significantly reducing search time [2][13]
- The community has compiled nearly 40 open-source projects and 60 datasets related to embodied intelligence, valuable resources for beginners and advanced learners alike [13][32]

Community Engagement and Learning
- The platform hosts roundtable forums and live sessions covering topics from fundamentals to algorithms, sharing insights on industry developments and challenges [2][19]
- Members have access to exclusive learning videos and documents, enhancing the educational experience [19]
- The community includes members from renowned universities and leading companies in the field, fostering a rich environment for knowledge exchange [13][18]

Membership Benefits
- Membership offers numerous advantages, including job recommendations, industry insights, and access to exclusive content [19][21]
- The platform provides a structured approach to learning, with detailed summaries of research directions and industry reports available to members [21][24]
- Members can engage in discussions and receive guidance on career choices and research directions, promoting a collaborative learning atmosphere [72]
BeDAViN: a large-scale audio-visual dataset and a multi-sound-source architecture study
具身智能之心· 2025-07-16 09:12
Author: 视觉语言导航

Main contributions
Research background
Importance of embodied navigation: Embodied navigation is a fundamental and critical component of Embodied AI, requiring autonomous agents to solve complex navigation tasks by interacting with previously unseen environments. In recent years, embodied navigation has been widely applied in areas such as household services, warehousing, and logistics.

| Dataset | Total number of audio samples | Total duration |
| --- | --- | --- |
| SAVi-dataset (Chen, Al-Halah, and Grauman 2021) | 1,157 | 144 seconds |
| BeDAViN (Ours) | 2,258 | |

Limitations of existing work:
- Dataset limitations: existing audio-visual navigation datasets contain limited samples, making it hard to simulate diverse multi-sound-source scenarios.
- Framework limitations: most existing navigation frameworks are designed for single-sound-source scenarios, and their performance degrades sharply in multi-source ...
Making VLMs a better fit for robots: small VLMs can also exhibit strong visual planning capabilities
具身智能之心· 2025-07-15 13:49
Core Insights
- The article discusses the potential of large language models (LLMs) in robotic program planning, highlighting their ability to generate coherent action sequences while noting that they often lack the sensory detail needed for physical execution [3][4]
- It introduces SelfReVision, a framework that enhances the performance of small visual language models (VLMs) through self-distillation without external supervision, aiming to improve their planning in real-world scenarios [4][9]

Research Background
- LLMs show promise at generating action sequences but often lack the precision required for robotic tasks, owing to their reliance on human-centric training data [3]
- VLMs can potentially address these limitations, but existing methods either require specialized simulation environments or are costly to train and deploy [3]

Methodology
- SelfReVision is a self-improvement framework that allows small VLMs to enhance their performance through iterative self-critique and revision [4][6]
- The framework operates in three stages (critique, revise, verify), enabling models to generate and refine plans based on self-assessment [4][10]

Experimental Setup
- Two types of experiments evaluated SelfReVision's planning capabilities: image-based program planning and embodied-agent tasks [11]
- Evaluation metrics included coverage, ordering, completeness, overall quality, and a new metric called image groundedness [12]

Key Results
- SelfReVision significantly outperformed baseline models across metrics, achieving average win rates of 68% on the PLACES dataset and 72% on the SIMULATION dataset [13]
- Larger models benefited more, with an average gain of 74% for models of 12 billion parameters or more [13]

Comparison with Other Methods
- SelfReVision showed clear advantages over methods like Best-of-N and PaliGemma, with improvements of 60% in most settings compared to modest gains from Best-of-N [17]
- Against GPT-4o, SelfReVision's plans achieved at least a 25% higher win rate for models of 12 billion parameters or more, indicating its effectiveness at strengthening smaller models [17]

Ablation Studies
- The complete Criticize-Revise-Verify (CRV) process performed strongest, with average win rates of 68.3% on PLACES and 71.9% on SIMULATION [18]
- Variants of the process showed significant performance drops, underscoring the importance of the verification step in filtering out suboptimal revisions [18]

Application in Embodied-Agent Tasks
- In challenging scenarios, SelfReVision improved the Gemma 12B model by 26% and the Gemma 27B model by 17% on block-manipulation tasks [21]
- In hierarchical tasks, SelfReVision plans achieved a 70% trajectory-generation success rate, surpassing the 61% of baseline models [21]
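The critique-revise-verify loop at the core of SelfReVision can be sketched as a simple control flow. The three roles are stubbed out with toy callables here; in the actual framework they would all be prompts to the same small VLM, and the stub logic below is purely illustrative:

```python
# Hedged sketch of a critique-revise-verify (CRV) self-improvement
# loop. A revision is kept only if the verify step prefers it over the
# current plan, mirroring the ablation finding that verification
# filters out suboptimal revisions.

def self_revision(plan, critique_fn, revise_fn, verify_fn, max_iters=3):
    """Iteratively critique and revise a plan under a verify gate."""
    for _ in range(max_iters):
        critique = critique_fn(plan)
        if not critique:                 # no remaining flaws found: stop early
            break
        revised = revise_fn(plan, critique)
        if verify_fn(revised, plan):     # keep the revision only if it wins
            plan = revised
    return plan

# Toy stand-ins: the critique flags plans missing a terminal "done" step.
critique_fn = lambda p: [] if p[-1] == "done" else ["missing terminal step"]
revise_fn = lambda p, c: p + ["done"]
verify_fn = lambda new, old: len(new) >= len(old)

print(self_revision(["grasp cup", "pour water"],
                    critique_fn, revise_fn, verify_fn))
# ['grasp cup', 'pour water', 'done']
```

Because all three roles are served by the same model, the loop needs no external supervision, which is what lets small VLMs self-distill better plans.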