One year in! Our embodied intelligence community is raising its price... (last 2 days)
具身智能之心· 2025-07-18 03:21
Beyond that, we have prepared many roundtable forums and livestreams, covering everything from robot bodies and data to algorithms, to gradually share what is really happening in the embodied AI industry and what problems remain.

具身智能之心 has truly reached its first anniversary. A year ago, with the vision of "taking intelligent agents one step further," we planted the seed of this community. From just a handful of people at the start, it has grown to gather talent from across the embodied AI field, and none of this would have been possible without everyone's joint effort.

The 具身智能之心 Knowledge Planet is where a large body of embodied AI know-how has accumulated, and a place we have kept maintaining. It now brings together many researchers working at the frontier of the field, spanning robot bodies, data, algorithms, and deployment. The planet combines Q&A, curated resources, video livestreams, and technical roadmaps, so whether you are just getting started or already advancing, you will benefit.

After the 20th the price rises to 279 yuan; join now for only 209 yuan — don't wait!

Inside, we have organized nearly 30+ technical roadmaps: whether you are looking for benchmarks, surveys, or beginner learning paths, they will greatly cut your search time. We have also invited dozens of guests from the embodied AI field, all active leaders in industry and academia (who frequently appear at top conferences and in interviews). Feel free to ask questions at any time; they will answer them. For those already doing related research, we also provide many valuable industry analyses and project plans. And a constant stream of ...
Why can it be deployed? How does goal navigation identify a target and navigate to it?
具身智能之心· 2025-07-18 03:21
Core Viewpoint
- Goal-oriented navigation empowers robots to autonomously complete navigation tasks from a goal description alone, marking a significant shift from traditional visual-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, resting on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-oriented navigation requires robots to explore and plan paths in unfamiliar 3D environments using only goal descriptions such as coordinates, images, or natural language [2].
- The technology has been industrialized across verticals including delivery, healthcare, and hospitality, with companies such as Meituan and Aethon deploying autonomous delivery robots [3].

Group 2: Technological Evolution
- The evolution of goal-oriented navigation falls into three generations:
1. First generation: end-to-end methods centered on reinforcement learning and imitation learning, achieving breakthroughs in point navigation and closed-set image navigation tasks [5].
2. Second generation: modular methods that explicitly construct semantic maps and split the task into exploration and goal-localization phases, showing clear advantages in zero-shot object navigation [5].
3. Third generation: integration of large language models (LLMs) and vision-language models (VLMs) to strengthen knowledge reasoning and improve open-vocabulary target-matching accuracy [7].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation demands knowledge from multiple fields, making it hard for newcomers to distill frameworks and track development trends [9].
- A new course has been developed to address these challenges, focusing on quick entry into the field, building a research framework, and combining theory with practice [10][11][12].

Group 4: Course Structure
- The course comprises six chapters covering semantic navigation frameworks, the Habitat simulation ecosystem, end-to-end navigation methodologies, modular navigation architectures, and LLM/VLM-driven navigation systems [16][18][19][21][23].
- A major project involves reproducing the VLFM algorithm and deploying it in real-world scenarios, letting students practice algorithm improvement and hands-on application [25][29].

Group 5: Target Audience and Outcomes
- The course targets professionals in robotics, students researching embodied intelligence, and people transitioning from traditional computer vision or autonomous driving [33].
- Participants will master the goal-oriented navigation stack, including end-to-end reinforcement learning, modular semantic-map construction, and LLM/VLM integration methods [33].
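The second-generation modular pipeline described above — explicit semantic mapping, with the task split into an exploration phase and a goal-localization phase — can be sketched on a toy grid world as follows. This is an illustrative skeleton only: the function names (`navigate_to_object`, `frontier_step`), the greedy frontier policy, and the toy scene are assumptions for the sketch, not taken from any specific system in the article.

```python
# Illustrative sketch of a modular object-goal navigation loop: maintain a
# semantic map, explore toward unseen cells, and stop once the goal class
# has been observed and localized. Toy grid world; all names hypothetical.

def frontier_step(pose, seen, grid_size):
    """Move one cell toward the nearest unseen cell (naive frontier policy)."""
    unseen = [(x, y) for x in range(grid_size) for y in range(grid_size)
              if (x, y) not in seen]
    if not unseen:
        return pose
    tx, ty = min(unseen, key=lambda c: abs(c[0] - pose[0]) + abs(c[1] - pose[1]))
    dx = (tx > pose[0]) - (tx < pose[0])          # step along x first,
    dy = (ty > pose[1]) - (ty < pose[1]) if dx == 0 else 0  # then along y
    return (pose[0] + dx, pose[1] + dy)

def navigate_to_object(goal_label, world, grid_size=5, max_steps=50):
    semantic_map, pose, path = {}, (0, 0), [(0, 0)]
    for _ in range(max_steps):
        # Perception: record what occupies the current cell.
        semantic_map[pose] = world.get(pose, "free")
        # Goal localization: stop once the goal class appears in the map.
        if semantic_map[pose] == goal_label:
            return path
        # Exploration: greedy step toward the nearest frontier.
        pose = frontier_step(pose, semantic_map, grid_size)
        path.append(pose)
    return None

world = {(3, 2): "chair", (1, 4): "sofa"}   # toy scene: labels at grid cells
route = navigate_to_object("chair", world)
print(route is not None and route[-1] == (3, 2))  # True: agent reaches the chair
```

Real systems replace the toy pieces with an open-vocabulary detector for perception, a 2D/3D semantic map, and a geometric planner, but the exploration-then-localization control flow is the same.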
Impressive! One machine handles humanoid locomotion control, reinforcement learning, and VLN/VLA
具身智能之心· 2025-07-18 02:28
Core Viewpoint
- TRON1 is a research platform designed for education and science, featuring a modular design that supports multiple locomotion forms and algorithms to maximize research flexibility [1].

Group 1: Product Features
- TRON1 supports humanoid gait development and reinforcement learning research, with the EDU version allowing external camera integration for navigation and perception tasks [6][4].
- The platform supports development in both C++ and Python, making it accessible to users without C++ experience [6].
- Its sim2real capability shows minimal discrepancies between simulation and hardware, improving validation efficiency and lowering research barriers [9].
- TRON1 can be fitted with robotic arms for a range of mobile manipulation tasks, supporting both single-arm and dual-leg control modes [11].
- It integrates LiDAR and depth cameras for 3D mapping, localization, navigation, and dynamic obstacle avoidance [13].

Group 2: Technical Specifications
- The platform carries an NVIDIA Ampere-architecture GPU with 1024 CUDA cores and 32 Tensor cores, delivering AI compute of 157 TOPS (sparse) and 78 TOPS (dense) [16][19].
- It runs an 8-core Arm Cortex-A78AE CPU at up to 2.0 GHz with 16 GB of LPDDR5 memory [16].
- It supports a maximum payload of roughly 10 kg and reaches speeds up to 5 m/s on its wheeled legs [26].

Group 3: User Support and Development
- The company provides comprehensive user manuals and development guides, ensuring ease of use for new users [30][37].
- The well-documented TRON1 SDK facilitates secondary development, letting users troubleshoot and extend their research setups [34][40].
- The platform includes one year of after-sales service after acceptance, with paid maintenance and parts support available thereafter [40].
On the Protracted War of Embodied Intelligence
具身智能之心· 2025-07-17 14:22
Core Viewpoint
- The article surveys the current state and future potential of the embodied intelligence industry, highlighting the challenges and opportunities of factory automation and the cautious approach companies in this sector are taking [1][4][12].

Group 1: Industry Transformation
- The automotive industry's technological transformation comprises three phases: electrification, intelligence, and factory automation, with the last still in early conceptual exploration [1].
- Factory automation is a desirable goal for large industrial enterprises, since it could significantly reduce labor costs and management complexity [1].

Group 2: Current Challenges
- Embodied intelligence technology remains nascent; many startups struggle to produce even usable demos [2].
- Hardware challenges are significant: dexterous hands can cost over ten thousand yuan yet fail within weeks [6].
- Software and algorithmic issues persist, including difficulty collecting training data and poor generalization across scenarios [9][10].

Group 3: Cautious Investment
- Despite a surge in financing news, many embodied intelligence companies are staying conservative, avoiding large-scale hiring and focusing on cost control [4][12].
- The industry is full of pitfalls, leaving founders cautious and aware that the path to technological breakthroughs is long and uncertain [12][13].

Group 4: Core Competitive Factors
- The ability to secure financing is the most critical competitive factor for embodied intelligence companies, as it funds talent acquisition, data collection, and compute [16][20].
- Lessons from autonomous driving show that algorithmic capability alone is not a durable moat, since competitors can replicate it quickly [17][18].

Group 5: Strategic Outlook
- The article suggests companies adopt a long-term strategy, preparing for a protracted battle against the sector's many challenges [22].
What should a mobile chassis tailor-made for embodied intelligence look like?
具身智能之心· 2025-07-17 09:07
Core Viewpoint
- The global embodied intelligence industry is growing explosively, driven by the deep integration of language models into robotics, moving from "perceptual intelligence" to "decision-making intelligence" and finally to "action intelligence" [1].

Group 1: Product Features
- The Hermes chassis, designed for robotic arms, runs on a 48 V power supply, allowing quick assembly of multi-arm systems on a mobile base for practical applications [1].
- The 48 V platform delivers high power output without additional boost converters, driving dual robotic arms and multiple joint modules simultaneously and avoiding motion delays from insufficient voltage [3].
- The chassis supports a 1C discharge rate for a peak output of 1440 W — a claimed 200% gain over 24 V solutions — well suited to rapid start-stop and high-impact tasks [5].
- A 30 Ah battery provides 8-12 hours of stable operation under continuous workloads, significantly improving operational efficiency [6].
- An intelligent power-management system optimizes energy consumption and extends battery life to 2000 cycles, reducing long-term cost of ownership [8].

Group 2: Navigation and Adaptability
- The chassis pairs dual radar with multiple depth-vision sensors to handle cluttered, low-obstacle environments, ensuring stable and reliable localization and navigation [9].
- It has been deployed at several leading embodied intelligence companies, demonstrating its adaptability to different robotic arms, sensors, and industry-specific requirements [11].
- It offers an open interface and an expandable Android system, supporting CAN/RS485 communication for seamless integration with navigation and vision stacks, making it suitable for applications from service robots to industrial AMRs [13].

Group 3: Application Scenarios
- In industrial manufacturing and warehouse logistics, the chassis supports flexible production-line collaborative robots, AMRs, and inspections in high-risk environments, handling high-load transport and flexible production needs [14].
- In smart healthcare, it assists with drug transport and equipment delivery, contributing to hospital automation [14].
- For commercial services and public facilities, it enables robots to make cross-floor deliveries with extended standby times, cutting labor costs [14].

Group 4: Market Positioning
- The launch of the 48 V Hermes chassis marks a significant step for the embodied intelligence sector, combining burst power and endurance to redefine the standard for intelligent robot platforms [16].
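The headline power figures above are consistent with basic battery arithmetic: at a 1C discharge rate, a 48 V / 30 Ah pack delivers 48 × 30 = 1440 W peak, and its roughly 1.44 kWh of stored energy implies an average draw of about 120-180 W over the quoted 8-12 hour runtime. A quick sanity check (the voltage, capacity, and C-rate come from the article; everything else is plain arithmetic):

```python
# Sanity-check the Hermes chassis power figures quoted in the article.
voltage_v = 48.0     # 48 V power platform
capacity_ah = 30.0   # 30 Ah battery
c_rate = 1.0         # supported 1C discharge rate

peak_power_w = voltage_v * capacity_ah * c_rate  # 1C current = 30 A -> 1440 W
energy_wh = voltage_v * capacity_ah              # stored energy ~ 1.44 kWh
avg_draw_8h = energy_wh / 8                      # implied average draw, 8 h runtime
avg_draw_12h = energy_wh / 12                    # implied average draw, 12 h runtime
peak_24v_w = 24.0 * capacity_ah * c_rate         # same pack capacity at 24 V: 720 W

print(peak_power_w)                        # 1440.0, matching the quoted peak
print(round(avg_draw_8h), round(avg_draw_12h))  # 180 120
```

Note that a 24 V pack of the same capacity at 1C yields 720 W, so the 48 V platform exactly doubles peak output; the article's "200% improvement" figure should be read in that light.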
This embodied AI company's deployment scenario is... this? Recruiting algorithm researchers at up to 1,000,000 RMB
具身智能之心· 2025-07-17 09:07
OneStar, incubated by Geely Group, positions itself around "intelligently evolving robots driven by real data," anchored in large-scale industrial scenarios. By continuously accumulating and refining real-world scene data, it lets robots iterate their intelligence through practice, offering a new approach to industrial production and its intelligent upgrade. OneStar has joined forces with world-leading multimodal large-model teams and the FastUMI data-collection team, combining Geely's new-energy-vehicle powertrain and intelligence capabilities to build integrated "model + data + body" competitiveness. Focusing on multimodal diffusion large-model development and high-precision real-robot data collection, and relying on large industrial scenarios such as vehicle manufacturing, it is accelerating commercialization, moving "intelligently evolving robots driven by high-precision data" from concept to practice.

Compensation
Positions at a glance
Highly competitive pay and rewards:
- Full-time employees: annual salary of 700k-1,000k RMB for PhDs and 400k-600k RMB for Master's graduates (negotiable for outstanding candidates), plus generous annual performance incentives;
- Tech-team incentive: 10% of project profits is allocated to the technical team, turning your ideas into tangible returns;
- Interns: 300 RMB/day for Master's interns and 400 RMB/day for PhD interns, with free housing, so outstanding talent can start worry-free;
Comprehensive benefits:
How to apply
For more job-hunting content, you are welcome to join our AutoRobo Knowledge Planet, a career community covering robotics, autonomous driving, and embodied intelligence — the first community in China focused primarily on autonomous driving and embodied AI.
Big third-anniversary discount is here ...
PhysX: NTU and Shanghai AI Lab pioneer a physics-grounded 3D asset generation framework
具身智能之心· 2025-07-17 09:07
Author: Ziang Cao et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

Research Background and Motivation
3D asset generation is increasingly used in games, robotics, and embodied simulators, but existing work focuses mostly on appearance and geometry, neglecting the physical properties inherent to real-world objects. Beyond structural features, real objects carry physical and semantic attributes such as absolute scale, material, affordance, kinematic parameters, and functional descriptions — attributes that are critical foundations for physical simulation, robotic manipulation, and similar scenarios. Existing datasets have clear limitations: PartNet-Mobility contains 2.7K 3D models with motion constraints but lacks physical descriptions such as size and material; ABO has material metadata but only at the object level, so it cannot support part-level applications. This gap makes it hard for 3D generative models to satisfy physical modeling and reasoning ...

The dataset systematically defines three classes of properties (figure 2, top), covering the full range from object recognition to manipulation. Notably, to avoid the redundancy of overly fine-grained annotation, the dataset merges tiny parts whose vertex count and area fall below thresholds into adjacent parts.
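The tiny-part merge rule mentioned in the article — a part whose vertex count and area both fall below thresholds is absorbed into a neighboring part — can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual code: the part representation, threshold values, and the choice of merging into the largest neighbor are all assumptions.

```python
# Illustrative sketch of the tiny-part merge rule: a part with too few
# vertices AND too little area is absorbed into an adjacent part.
# Data layout, thresholds, and merge-target policy are hypothetical.

def merge_tiny_parts(parts, adjacency, min_vertices=50, min_area=1e-3):
    """parts: {pid: {"vertices": int, "area": float}}
    adjacency: {pid: set of neighboring pids}.
    Returns (surviving parts, {merged pid: target pid})."""
    merged_into = {}
    for pid, p in sorted(parts.items()):
        if p["vertices"] < min_vertices and p["area"] < min_area:
            neighbors = [n for n in adjacency.get(pid, ()) if n not in merged_into]
            if neighbors:
                # Absorb the tiny part into its largest surviving neighbor.
                target = max(neighbors, key=lambda n: parts[n]["area"])
                parts[target]["vertices"] += p["vertices"]
                parts[target]["area"] += p["area"]
                merged_into[pid] = target
    kept = {pid: p for pid, p in parts.items() if pid not in merged_into}
    return kept, merged_into

parts = {
    "handle": {"vertices": 12, "area": 2e-4},   # tiny: below both thresholds
    "door":   {"vertices": 900, "area": 0.40},
    "frame":  {"vertices": 700, "area": 0.55},
}
adjacency = {"handle": {"door"}, "door": {"handle", "frame"}, "frame": {"door"}}
kept, merged = merge_tiny_parts(parts, adjacency)
print(sorted(kept))   # ['door', 'frame'] -- the handle is merged away
print(merged)         # {'handle': 'door'}
```

Requiring both conditions (vertices and area) keeps genuinely small-but-important parts, such as a thin panel with many vertices, from being merged away.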
This embodied AI company's positioning is highly industrial?! Recruiting algorithm researchers at up to 1,000,000 RMB
具身智能之心· 2025-07-17 02:58
Compensation
Highly competitive pay and rewards:
- Full-time employees: annual salary of 700k-1,000k RMB for PhDs and 400k-600k RMB for Master's graduates (negotiable for outstanding candidates), plus generous annual performance incentives;
- Tech-team incentive: 10% of project profits is allocated to the technical team, turning your ideas into tangible returns;
- Interns: 300 RMB/day for Master's interns and 400 RMB/day for PhD interns, with free housing, so outstanding talent can start worry-free;
Comprehensive benefits:
- Full social insurance and housing fund contributions (housing fund paid at the capped combined rate of 24%);
- Additional housing and meal allowances;
- Round-the-clock snacks and drinks;
How to apply
For more job-hunting content, you are welcome to join our AutoRobo Knowledge Planet, a career community covering robotics, autonomous driving, and embodied intelligence — the first community in China focused primarily on autonomous driving and embodied AI.
Big third-anniversary discount is here! Welcome to keep growing with us.
AutoRobo Knowledge Planet
This is a place for people in autonomous driving, embodied intelligence, and robotics to exchange job-hunting experience. It now has nearly 1,000 members, including working professionals from companies such as Horizon Robotics, Li Auto, Huawei, Xiaomi Auto, Momenta, and 元戎启行, as well as candidates from the 2024 and 2025 autumn recruitment seasons, covering most areas of autonomous driving and embodied intelligence.
What's inside the planet? Building on our existing strengths, we ...
Sure enough! Autumn recruitment punishes every graduate student who gets their priorities backwards!
具身智能之心· 2025-07-17 00:53
Core Viewpoint
- The article stresses the importance of proactive engagement in research and academic writing for students, especially those in graduate programs, to strengthen both employability and academic credentials.

Group 1: Employment and Academic Pressure
- Anxiety about job prospects is rising as the market evolves; students are urged to act rather than wait passively [1].
- Students should track both campus and social recruitment to identify gaps in their skills and knowledge [1].

Group 2: Research Guidance and Support
- The company offers a comprehensive research guidance program to help students produce high-quality academic papers, particularly in autonomous driving and embodied intelligence [3][12].
- The program reports a 96% acceptance rate for papers submitted by students who received guidance [3].

Group 3: Structured Research Process
- A 12-week structured process covers topic selection, literature review, experimental design, and submission [5].
- This structure is designed to overcome common obstacles such as lack of supervisor guidance and fragmented knowledge [6].

Group 4: Target Audience and Benefits
- The program targets graduate students who need papers for graduation, want stronger academic profiles, or aim to improve their competitiveness in AI jobs [11].
- Beyond a published paper, participants gain skills in research methodology and coding, plus networking opportunities with prestigious institutions [15].

Group 5: Personalized Support and Flexibility
- The company provides personalized mentoring, real-time interaction with instructors, and flexible learning options, including recorded sessions and 24-hour support [12][16].
- A matching system pairs students with mentors aligned to their research interests and goals [14].
Small models strike back! Xipeng Qiu's team at Fudan & 创智 builds a "world-aware" embodied agent, with code and data fully open-sourced!
具身智能之心· 2025-07-16 09:12
Core Viewpoint
- The article introduces the World-Aware Planning Narrative Enhancement (WAP) framework, which markedly improves the performance of large vision-language models (LVLMs) in embodied planning by integrating world knowledge into both the data and the reasoning chain [2][17].

Group 1: Introduction
- LVLMs are becoming central to embodied planning, but existing methods often rely on environment-agnostic imitation learning, leading to poor performance in unfamiliar scenarios [2].
- On the EB-ALFRED benchmark, WAP lifts the success rate from 2% to 62.7%, surpassing models such as GPT-4o and Claude-3.5-Sonnet and underscoring the importance of world perception in high-level planning [2][17].

Group 2: Related Work
- WAP differs from existing approaches by explicitly binding instruction-environment context at the data level and relying solely on visual feedback, without privileged information [4].

Group 3: Technical Method
- The framework injects four-dimensional cognitive narratives (visual, spatial, functional, syntactic) into the data layer, allowing the model to understand the environment before reasoning deeply [6].
- It employs closed-loop observation (RGB + instructions only) and a three-stage curriculum to build environmental understanding and long-horizon reasoning [6][12].

Group 4: Experiments
- On EmbodiedBench (EB-ALFRED), WAP significantly raises success rates across task categories, with Qwen2.5-VL gaining 60.7 percentage points in average success rate [14].
- WAP also notably improves long-horizon task success, reaching 70% versus previous models [14][16].

Group 5: Conclusion and Future Work
- WAP effectively incorporates world knowledge into the data and reasoning processes, enabling smaller open-source LVLMs to outperform commercial models in purely visual closed-loop settings [17].
- Future work includes extending to dynamic industrial and outdoor scenes and exploring self-supervised narrative evolution for iterative data-model improvement [21].
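As a rough illustration of the data-level idea in the article — attaching four narrative dimensions (visual, spatial, functional, syntactic) to each instruction-environment pair before training — a WAP-style augmented sample might be structured like this. The field names, schema, and example content are hypothetical; the paper's actual data format may differ.

```python
# Hypothetical sketch of a "world-aware" augmented training sample: the raw
# instruction is enriched with four narrative dimensions before being fed to
# the vision-language model. Schema is illustrative, not the paper's.
from dataclasses import dataclass, field

@dataclass
class WorldAwareSample:
    instruction: str                                  # original task instruction
    rgb_frames: list = field(default_factory=list)    # closed-loop RGB only
    narratives: dict = field(default_factory=dict)    # 4 cognitive dimensions

def enrich(instruction, rgb_frames, scene_facts):
    """Attach the four narrative dimensions to one training sample."""
    return WorldAwareSample(
        instruction=instruction,
        rgb_frames=rgb_frames,
        narratives={
            "visual":     scene_facts["appearance"],   # what objects look like
            "spatial":    scene_facts["layout"],       # where they are
            "functional": scene_facts["affordances"],  # what they are for
            "syntactic":  scene_facts["rephrasing"],   # instruction variants
        },
    )

sample = enrich(
    "Put the chilled apple on the table",
    rgb_frames=["frame_000.png"],
    scene_facts={
        "appearance": "a red apple inside the fridge",
        "layout": "fridge left of the table, two steps ahead",
        "affordances": "fridge can be opened; table supports objects",
        "rephrasing": "Take the apple from the fridge and place it on the table",
    },
)
print(sorted(sample.narratives))  # ['functional', 'spatial', 'syntactic', 'visual']
```

The point of binding these narratives to the sample (rather than to the model prompt at test time) is that the planner learns environment-grounded associations from data alone, while inference still sees only RGB frames and the instruction — matching the closed-loop setting the article describes.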