具身智能之心
Recently we have received many questions from students about choosing a direction in embodied intelligence......
具身智能之心· 2025-12-17 00:05
Group 1
- The article discusses the various directions in embodied intelligence, including VLN, VLA, reinforcement learning, and real2sim2real, highlighting newcomers' confusion about which path to choose [1]
- For those with a SLAM background, both VLN and VLA are good entry points, especially with access to robotic arms; low-cost hardware such as the SO-100 can be used for experiments [1]
- Having a good idea is essential, as many new researchers struggle to find innovative topics; the article offers a paper-guidance service to assist them [1][2]

Group 2
- The paper-guidance service is led by a team of experts from top universities and leading companies, covering a range of prestigious conferences and journals [2]
- The service provides end-to-end support, from topic selection to publication strategy, aiming to help researchers produce high-quality results quickly [2][3]
- The article also mentions a promotional offer: the first ten inquiries receive free matching with a dedicated mentor [5]
Half of embodied intelligence now belongs to VLA......
具身智能之心· 2025-12-16 09:25
Core Viewpoint
- The article emphasizes the growing industry demand for VLA (Vision-Language-Action) models, highlighting the challenges of data collection and model training and the need for practical learning resources in this field [1][2][3]

Group 1: VLA Demand and Challenges
- Job postings show significant demand for VLA algorithm engineers, indicating growing interest in the technology [1]
- Many practitioners report difficulty tuning VLA algorithms and dealing with the complexities of data collection [2]
- Effective VLA model training relies on real-robot data, and many companies struggle with the quality of the data they collect [3]

Group 2: VLA Implementation Modules
- Implementing VLA involves several key modules, including data-collection methods based on imitation learning and reinforcement learning [8]
- Training VLA models typically requires simulation debugging, especially when real-robot data is scarce, making simulation frameworks such as MuJoCo and Isaac Gym crucial [9]
- After training, VLA models often need optimization techniques such as quantization and distillation to shrink model size while preserving performance [10]

Group 3: Educational Resources and Courses
- The article introduces a hands-on course designed to help individuals learn VLA effectively, addressing the rapid pace of technology updates and the difficulties learners face [11]
- The curriculum covers robotic-arm hardware, data collection, VLA algorithms, evaluation, simulation, and deployment [16][17]
- Participants gain hands-on experience with real hardware, strengthening their practical skills in the VLA domain [28]
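To make the quantization step above concrete, here is a minimal sketch of post-training weight quantization, one of the compression techniques the summary mentions. All names and the simple affine int8 scheme are illustrative assumptions; real VLA deployments would use a framework's quantization toolkit rather than hand-rolled code.

```python
def quantize_int8(weights):
    """Map a list of float weights to int8 values plus a scale and zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # avoid zero scale for constant weights
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

weights = [-0.8, -0.1, 0.0, 0.35, 0.9]
q, s, z = quantize_int8(weights)
approx = dequantize_int8(q, s, z)
# Each recovered weight is within one quantization step of the original.
assert all(abs(a - w) <= s for a, w in zip(approx, weights))
```

The point of the sketch is the trade-off the course presumably covers: 8-bit storage costs a bounded reconstruction error of at most one quantization step per weight.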
An NBA star becomes an NVIDIA vice president
具身智能之心· 2025-12-16 00:02
Core Insights
- The article discusses NVIDIA's unusual management structure under CEO Jensen Huang, who directly oversees a flat team of 36 executives, down from a peak of 55, emphasizing efficiency and direct communication [4][24]
- Huang holds that "information is power": giving every executive firsthand information accelerates decision-making and innovation [5][13]
- The article highlights the diverse backgrounds of Huang's team, which includes veterans, industry experts, and newcomers, all contributing to NVIDIA's success in fields such as AI, automotive, and cloud computing [29][30]

Group 1: Management Structure
- Huang's direct management of 36 executives is atypical in the tech industry, where leaders such as Mark Zuckerberg and Elon Musk manage smaller teams [8][11]
- The flat structure removes layers of hierarchy, speeding information flow and decision-making [14][15]
- Huang's approach fosters a culture of heavy workload and commitment, preserving an entrepreneurial spirit even as the company grows [19][20]

Group 2: Key Executives
- The article introduces several key executives, including Chris Malachowsky, Dwight Diercks, and Jeff Fisher, who have been instrumental in NVIDIA's growth and innovation [33][43][51]
- Malachowsky, a co-founder, focuses on core technology strategy, while Diercks has been pivotal in software development for NVIDIA's products [37][45]
- Fisher played a crucial role in establishing NVIDIA's brand in the gaming market, leading the GeForce business unit [53][56]

Group 3: Technical Leadership
- Bill Dally, NVIDIA's Chief Scientist, is noted for his contributions to parallel computing and deep-learning acceleration, marking a significant shift in the company's focus [78][84]
- Michael Kagan, the CTO, integrates networking technology with GPU advances, driving innovations such as the Data Processing Unit (DPU) [88][91]
- Ian Buck, a pioneer in GPU computing, oversees the data-center business, securing NVIDIA's leadership in AI and supercomputing [96][100]

Group 4: Business and Operations
- Colette Kress, CFO, has been crucial in balancing R&D investment with profitability, helping NVIDIA achieve significant revenue growth [158][163]
- Jay Puri, responsible for global business development, has expanded NVIDIA's market reach across sectors including gaming and data centers [169][175]
- Debora Shoquist, overseeing operations, restructured supply chains to meet surging GPU demand and ensure timely delivery [182][189]

Group 5: New Business Ventures
- Howard Wright, a recent addition, leads NVIDIA's Inception startup program, leveraging his extensive network to foster innovation in AI [248][259]
- Xinzhou Wu, responsible for the automotive business, brings expertise from previous autonomous-driving roles, aiming to strengthen NVIDIA's presence in that sector [264][270]
- Alexis Bjorlin, managing DGX Cloud services, focuses on delivering AI computing through cloud platforms, marking NVIDIA's shift toward a service-oriented model [278][285]
UniBYD: a unified framework for cross-embodiment robot manipulation learning beyond imitation of human demonstrations
具身智能之心· 2025-12-16 00:02
Research Background and Core Issues
- The mainstream paradigm in embodied intelligence is learning robot manipulation from human demonstrations, but morphological differences between human hands and various robotic hands (e.g., 2-finger, 3-finger, 5-finger) pose a significant barrier to deployment [3]
- The core goal of UniBYD is a learning paradigm that goes beyond mere imitation of human actions, letting robots autonomously discover manipulation strategies matched to their own physical characteristics and thereby generalize efficiently across robotic hand morphologies [3]

Core Innovation: UniBYD Framework Design
- UniBYD is a unified reinforcement-learning framework that enables a smooth transition from imitation to exploration through three core components: a unified morphological representation, a dynamic reinforcement-learning mechanism, and fine-grained imitation guidance [5]

Unified Morphological Representation (UMR)
- UMR resolves modeling differences among robotic hand morphologies by unifying dynamic states and static attributes into a fixed-dimensional representation [7]
- Dynamic-state processing fixes the wrist state at 13 dimensions (position, orientation, velocity) and pads joint states to the maximum degrees of freedom, using trigonometric encoding to avoid angle-wrapping issues [8]

Dynamic PPO: Gradual Learning from Imitation to Exploration
- Traditional imitation learning is limited to replicating human actions, so performance stays far below human level because of physical differences [10]
- Dynamic PPO uses a reward-annealing mechanism and collaborative loss balancing to transition smoothly from imitating humans to autonomous exploration [12]

Reward Mechanisms
- The reward structure combines imitation rewards, which quantify the similarity between the current state and the human demonstration across multiple dimensions, with goal rewards, which are given only on successful task completion [13][14]
- The total reward is a weighted sum of the two, with weights dynamically adjusted according to training progress and success rate [15]

Collaborative Loss Balancing
- To ensure effective exploration and physical feasibility, two losses are added to the PPO objective: entropy regularization to encourage exploration, and a boundary loss to keep actions within physical limits [16][17]

Mixed Markov Shadow Engine: Fine-Grained Guidance in Early Imitation
- The shadow engine addresses early-training instability by combining action mixing with object-assisted control, stabilizing the initial phases of training [20]

Performance Validation
- The UniManip benchmark is the first cross-morphology robotic manipulation benchmark, covering 29 single- and dual-hand tasks adaptable to 2-finger, 3-finger, and 5-finger robotic hands [25]
- The framework achieves high success rates across all hand morphologies, a 67.9% improvement over existing methods, with significant reductions in position and orientation errors [28]

Real-World Transfer: From Simulation to Physical Robots
- The framework was validated on various robotic hands, achieving success rates of 52% for 2-finger, 64% for 3-finger, and 70% for 5-finger hands, demonstrating adaptability to different hardware characteristics [34]

Core Conclusions and Significance
- UniBYD represents a paradigm shift: moving beyond "copying human actions" to a "morphology-adapted strategy" learning paradigm, with dynamic reinforcement learning providing a smooth transition from imitation to exploration [39]
- The unified morphological representation lets the framework adapt directly to diverse robotic hand forms, addressing the core challenge of cross-morphology generalization [39]
- Performance significantly surpasses state-of-the-art methods and transfers successfully to real robots, providing a general solution for diverse robotic manipulation tasks [39]
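The reward-annealing mechanism described above can be sketched in a few lines. This is an illustrative toy, not the paper's code: the specific schedule (weight decaying with both training progress and success rate) and the function names are assumptions chosen to match the summary's description of a weighted sum whose weights shift from imitation toward the goal term.

```python
def imitation_weight(progress, success_rate):
    """Anneal the imitation weight from 1.0 toward 0.0.

    progress:     fraction of training completed, in [0, 1]
    success_rate: recent task success rate, in [0, 1]
    """
    w = (1.0 - progress) * (1.0 - success_rate)
    return max(0.0, min(1.0, w))

def total_reward(r_imitation, r_goal, progress, success_rate):
    """Weighted sum shifting emphasis from imitation to the goal reward."""
    w = imitation_weight(progress, success_rate)
    return w * r_imitation + (1.0 - w) * r_goal

# Early in training the imitation term dominates; late in training, with a
# high success rate, the goal term takes over.
early = total_reward(r_imitation=1.0, r_goal=0.0, progress=0.1, success_rate=0.1)
late = total_reward(r_imitation=1.0, r_goal=0.0, progress=0.9, success_rate=0.8)
assert early > late
```

The design intent is that imitation supplies dense guidance while the policy is weak, and the sparse goal reward takes over once the policy can complete the task on its own.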
Xu Huazhe: seize the time, and patiently await the embodied future......
具身智能之心· 2025-12-16 00:02
Author: Xu Huazhe | Editor: 具身智能之心. This article is published with Dr. Xu Huazhe's authorization; reproduction without permission is prohibited.

Yesterday we saw Xu Huazhe's post on social media about data, mass production, robot bodies, and application scenarios. Along similar lines, at this year's IROS roundtable, Dr. Xu, reasoning from first principles of intelligence, divided the future of embodied intelligence into three modules: desire, prior, and experience.

Desire. When building agents, whether physical or virtual, it always feels as if today's machine learning has no desire of its own to learn. Could we imagine giving robots a desire of their own?

Experience. Experience is a means of closing the final loop with the world. One day at home I watched a repairman fixing our gas stove: standing on a ladder, tightening a fitting, his whole body contorted into an extreme posture, yet he could still perfectly control his center of gravity to keep his balance while performing very fine manipulation with his hands.

★ This way of thinking also runs through his subsequent R&D and academic exploration.

Thinking back a few years, we were still debating when robots would manage to walk over all terrains; then the conversation shifted to "parkour", "dancing", and "basketball". That rate of change told me the problem was essentially solved; I would not be surprised if robots can climb rock walls next year. Yet this breakneck pace also feels strangely out of step, because I have not seen humanoid robots genuinely serving people anywhere ...
A first from the NUS team! What happens when a VLA gains 4D perception?
具身智能之心· 2025-12-15 03:17
Core Insights
- The article discusses the VLA-4D model, which integrates 4D awareness into vision-language-action frameworks for coherent robotic manipulation, addressing spatiotemporal-consistency challenges in robotic tasks [2][3]

Group 1: Model Features
- VLA-4D extends traditional spatial action representation with temporal information, enabling improved spatiotemporal action planning and prediction [2]
- The model consists of two key modules: a 4D-aware visual representation that fuses visual features with temporal data, and a spatiotemporal action representation that aligns multimodal representations with large language models [2]

Group 2: Applications and Challenges
- VLA-4D aims to achieve both spatial smoothness and temporal consistency in robotic manipulation, which is crucial in dynamic environments [2]
- Existing methods struggle to maintain temporal coherence during action execution, underscoring the need for advances such as VLA-4D [2]

Group 3: Related Technologies
- The article also mentions foundational models such as 4D-VGGT for dynamic geometric perception and LLaVA-4D for enhanced dynamic-scene reasoning, which complement the capabilities of VLA-4D [6][7]
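The core idea of extending a spatial action with a temporal component can be illustrated with a small sketch. The field names and the linear interpolation scheme below are assumptions for illustration, not VLA-4D's actual design: the point is only that timestamping waypoints lets a controller produce temporally consistent intermediate targets.

```python
from dataclasses import dataclass

@dataclass
class Action4D:
    x: float  # target position (meters)
    y: float
    z: float
    t: float  # time at which the waypoint should be reached (seconds)

def interpolate(a: Action4D, b: Action4D, now: float) -> tuple:
    """Linearly interpolate position between two timed waypoints,
    clamping outside the [a.t, b.t] interval."""
    if b.t == a.t:
        return (b.x, b.y, b.z)
    u = max(0.0, min(1.0, (now - a.t) / (b.t - a.t)))
    return (a.x + u * (b.x - a.x),
            a.y + u * (b.y - a.y),
            a.z + u * (b.z - a.z))

start = Action4D(0.0, 0.0, 0.0, t=0.0)
goal = Action4D(2.0, 0.0, 1.0, t=2.0)
mid = interpolate(start, goal, now=1.0)
assert mid == (1.0, 0.0, 0.5)  # halfway in time gives halfway in space
```

Without the `t` field, a purely spatial action sequence leaves the timing of each waypoint ambiguous, which is exactly the temporal-coherence gap the summary attributes to existing methods.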
Watch once, then execute! Zero-shot learning from a single video demonstration & cross-modal action-knowledge transfer
具身智能之心· 2025-12-15 01:04
Core Insights
- The article discusses the ViVLA framework, which enables robots to learn new skills from a single video demonstration, addressing the limits of existing Vision-Language-Action (VLA) models in generalizing to tasks outside their training distribution [1][2][25]

Group 1: Challenges in Robot Skill Generalization
- Four core challenges hinder the generalization of robot skills: insufficient fine-grained action recognition, differences in action representation and modality, inherent flaws in autoregressive modeling, and a lack of diverse expert-agent paired data [4][5][7]

Group 2: ViVLA's Technical Framework
- ViVLA employs a three-layer technical design: unified action-space construction, parallel-decoding optimization, and large-scale data generation, enabling efficient learning from a single expert demonstration video [1][8]
- The first layer performs latent-action learning through an Action-Centric Cycle-Consistency (A3C) framework to bridge the gap between expert and agent action spaces [10]
- The second layer improves training efficiency with parallel decoding and spatiotemporal masking strategies, strengthening video understanding and action prediction [11][12]

Group 3: Data Generation and Validation
- ViVLA's data-generation pipeline converts human videos into high-quality paired data, yielding a dataset of over 892,911 expert-agent training samples [13][17]
- The framework's effectiveness is validated through a three-tier performance-verification system, showing significant gains in unseen-task success rates over baseline models [14][16]

Group 4: Performance Metrics
- In the LIBERO benchmark, ViVLA improved unseen-task performance by over 30% relative to baseline models, and reached a 74% success rate on real-world manipulation tasks, significantly outperforming other models [14][16][18]
- The model maintained a success rate above 70% under varying environmental conditions, demonstrating its robustness [20]

Group 5: Future Directions and Limitations
- While ViVLA is a breakthrough in single-sample video imitation learning, open areas include strengthening error-recovery capabilities and expanding data diversity through automated filtering of human videos [25][27]
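The cycle-consistency idea behind the latent-action learning described above can be shown with a deliberately tiny toy: an expert action is encoded into a shared latent space, decoded into the agent's action space, and re-encoded; any drift in the latent is penalized. The 1-D linear encoder/decoder pair here is an illustrative stand-in for learned networks, not ViVLA's A3C implementation.

```python
def encode_expert(a):
    """Expert action -> shared latent (assumed linear map)."""
    return 2.0 * a + 1.0

def decode_agent(z):
    """Shared latent -> agent action."""
    return 0.5 * z

def encode_agent(a):
    """Agent action -> shared latent (exact inverse of decode_agent)."""
    return a / 0.5

def cycle_consistency_loss(expert_action):
    """Squared latent drift after an expert -> latent -> agent -> latent cycle."""
    z = encode_expert(expert_action)
    agent_action = decode_agent(z)
    z2 = encode_agent(agent_action)
    return (z - z2) ** 2

# With consistent encoder/decoder pairs, the cycle loss vanishes.
assert cycle_consistency_loss(0.7) == 0.0
```

In training, this loss would be minimized jointly with the action objectives so that expert and agent trajectories map to the same latent actions despite their different embodiments.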
Toward "aerospace embodied intelligence": a Beihang team proposes a new constellation-planning benchmark | NeurIPS'25
具身智能之心· 2025-12-15 01:04
Editor: 量子位 (QbitAI)

Satellite constellations operating hundreds of kilometers above the Earth quietly support key industries such as remote sensing, communications, navigation, and weather forecasting. But behind every stably operating constellation lies a high-dimensional, dynamic, tightly constrained planning problem: within an observation window of only a few minutes, how do you schedule dozens of satellites into a cooperative observation network, execute hundreds of tasks, and simultaneously respond to sudden demands such as earthquake rescue, maritime search and rescue, and forest fires?

Artificial intelligence is becoming the key to this problem. Professor Liu Si's team at Beihang University proposed AEOS-Bench, the first large-scale benchmark for realistic constellation scheduling, and, innovatively fusing the generalization ability of Transformer models with the engineering demands of spaceflight, trained AEOS-Former, a scheduling model with time constraints built in. Together they set a new technical baseline for future "AI constellation planning". The research has been published at NeurIPS 2025.

We all know that putting a satellite constellation into orbit is hard, but efficiently planning and scheduling an in-orbit constellation to execute tasks is not easy either. As deployed constellations grow larger, manual task planning can no longer keep pace with the satellites' task execution, so ...
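To make the scheduling problem concrete, here is a hedged sketch of its simplest form: assigning observation tasks to satellites subject to each satellite's visibility window. This greedy assigner is a deliberately naive baseline for illustration, not AEOS-Former; all data structures and names are assumptions.

```python
def greedy_schedule(tasks, windows):
    """tasks:   list of (task_id, start, end) observation requests.
    windows: dict satellite_id -> (visible_start, visible_end).

    Returns {task_id: satellite_id} for tasks that fit entirely inside a
    free satellite's window; for simplicity each satellite takes at most
    one task here.
    """
    assignment = {}
    busy = set()
    for task_id, start, end in sorted(tasks, key=lambda t: t[1]):
        for sat, (ws, we) in windows.items():
            if sat not in busy and ws <= start and end <= we:
                assignment[task_id] = sat
                busy.add(sat)
                break
    return assignment

windows = {"sat-A": (0, 10), "sat-B": (5, 20)}
tasks = [("fire-scan", 1, 4), ("flood-scan", 6, 12)]
plan = greedy_schedule(tasks, windows)
assert plan == {"fire-scan": "sat-A", "flood-scan": "sat-B"}
```

Real agile-Earth-observation scheduling adds slewing times, energy and storage constraints, and dynamic task arrivals, which is why greedy heuristics fall short and learned schedulers like the one described here become attractive.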
Embodied intelligence companies that raised over 100 million yuan in Q4.......
具身智能之心· 2025-12-15 01:04
Core Insights
- The article surveys financing for embodied-robotics companies, highlighting investments of over 100 million yuan across funding rounds from angel to Series C [1]

Company Summaries
- **AI² Robotics**: Secured hundreds of millions in funding, focusing on AGI-native general-purpose intelligent robots, with applications in semiconductors, automotive, electronics, biotechnology, and public services [4]
- **Self-Variable Robotics**: Raised 1 billion yuan, specializing in AI and robotics innovation, building general intelligent agents on large robot models [5]
- **Xingyuan Intelligent Robotics**: Received 300 million yuan, developing general embodied-brain technology aimed at a universal brain for physical-world interaction [6]
- **Micro Differential Intelligence**: Funded 100 million yuan, focusing on aerial robotics and intelligent systems for industrial and urban applications [7]
- **Dyna Robotics**: Raised 120 million yuan, dedicated to AI-driven robotics for various tasks, emphasizing cost-effective learning in real production scenarios [8]
- **Motorevo**: Secured 100 million yuan, specializing in robotic joints and power units for various robotic applications [9][18]
- **Lexiang Technology**: Funded 200 million yuan, focusing on general household robotics in the AI era [10]
- **Qianjue Robotics**: Raised 100 million yuan, developing high-dimensional multimodal tactile-perception technology for robotics [11]
- **Leju Robotics**: Secured 1.5 billion yuan, focusing on humanoid-robot commercialization and technology accumulation [12]
- **Lingxin Qiaoshou**: Received hundreds of millions, developing a platform centered on dexterous hands and cloud intelligence [13]
- **Songyan Power**: Funded 300 million yuan, focusing on humanoid-robot development and manufacturing [14]
- **Wubai Intelligent**: Raised 500 million yuan, a state-owned enterprise focusing on bionic intelligence and robotics [15]
- **Shengshi Weisheng**: Secured 100 million yuan, developing intelligent robots for manufacturing automation [16]
- **Zhongke Optoelectronics**: Funded 215 million yuan, focusing on high-end intelligent-robot products for military and manufacturing sectors [17]
- **Deepwood Intelligent**: Raised 200 million yuan, specializing in general embodied intelligent robotics [19]
- **Wujie Power**: Secured 300 million yuan, focusing on building a "universal brain" for robotics [20]
- **Yuanli Lingji**: Funded hundreds of millions, focusing on industrial and logistics automation solutions [21]
- **Accelerated Evolution**: Raised 100 million yuan, developing humanoid robots with advanced motion capabilities [22]
- **Stardust Intelligent**: Secured hundreds of millions, focusing on commercial humanoid robots with strong operational performance [23]
- **Guanglun Intelligent**: Developing robotics solutions using high-quality simulation and physical-AI technology [24]
- **New Era Intelligent**: Raised 100 million yuan, focusing on commercial cleaning robots [25]
- **Star Motion Era**: Secured over 1 billion yuan, focusing on general humanoid-robotics technology [26]
- **Aoyi Technology**: Funded 160 million yuan, specializing in non-invasive brain-machine interfaces and rehabilitation robotics [27]
- **Daimeng Robotics**: Raised 100 million yuan, focusing on multimodal tactile perception and wearable teleoperation systems [28]
- **Luming Robotics**: Secured hundreds of millions, focusing on family-oriented intelligent robotics [29]
- **UniX AI**: Funded 300 million yuan, specializing in AI and humanoid-robotics technology [30]
- **Ling Sheng Technology**: Raised 100 million yuan, focusing on integrated systems for humanoid and embodied intelligent robotics [31]
- **Cloud Deep Technology**: Funded 500 million yuan, specializing in quadruped-robot development [32]
Without solid research ability, don't even think about doing embodied intelligence in industry~
具身智能之心· 2025-12-15 01:04
Core Insights
- The current job market highly values candidates with embodied-intelligence research experience; many students are recruited even before graduation [1]
- Complete research capability means the ability to identify problems, define them, propose solutions, and produce methodological output, which goes beyond merely reading papers [1]

Group 1: Research Challenges
- Many students face challenges such as unfamiliarity with embodied-intelligence topics and the need to conduct research on their own [2]
- The fastest way to improve is to work alongside an experienced researcher, which the company offers through a 1-on-1 research-mentoring service [2]

Group 2: Mentoring Focus Areas
- The mentoring covers a wide range of topics, including large models, vision-language navigation, reinforcement learning, and robotics applications such as motion planning and tactile perception [3]

Group 3: Services Offered
- The company provides paper topic selection and full-process guidance for research papers [6]
- Papers mentored through the service have a high acceptance rate at top conferences and journals such as CVPR, AAAI, and ICLR [8]

Group 4: Common Research Issues
- Common issues students face include a weak grasp of the field's pain points, insufficient hands-on experience, and difficulties with experimental design and paper writing [7]

Group 5: Pricing and Consultation
- Mentoring prices vary with the target publication tier; further details are available from the research assistants [9]
- The service supports various publication levels, including top conferences and journals across different categories [11]