Watch once, then execute! Zero-shot learning from a single video demonstration & cross-modal action knowledge transfer
具身智能之心· 2025-12-15 01:04
Core Insights
- The article discusses the ViVLA framework, which enables robots to learn new skills from single video demonstrations, addressing the limitations of existing Vision-Language-Action (VLA) models in generalizing to tasks outside their training distribution [1][2][25].

Group 1: Challenges in Robot Skill Generalization
- Four core challenges hinder the generalization of robot skills: insufficient fine-grained action recognition, differences in action representation and modalities, inherent flaws in autoregressive modeling, and a lack of diverse expert-agent pairing data [4][5][7].

Group 2: ViVLA's Technical Framework
- ViVLA employs a three-layer technical system: unified action space construction, parallel decoding optimization, and large-scale data generation, facilitating efficient learning from single expert demonstration videos [1][8].
- The first layer focuses on latent action learning through an Action-Centric Cycle-Consistency (A3C) framework to bridge the gap between different expert and agent action spaces [10].
- The second layer enhances model training efficiency with parallel decoding and spatiotemporal masking strategies, improving video understanding and action prediction [11][12] (a minimal decoding sketch follows below).

Group 3: Data Generation and Validation
- ViVLA's data generation pipeline converts human videos into high-quality paired data, resulting in a dataset of over 892,911 expert-agent training samples [13][17].
- The framework's effectiveness is validated through a three-tier performance verification system, demonstrating significant improvements in unseen-task success rates compared to baseline models [14][16].

Group 4: Performance Metrics
- In the LIBERO benchmark test, ViVLA achieved over a 30% performance increase on unseen tasks compared to baseline models, with a 74% success rate in real-world manipulation tasks, significantly outperforming other models [14][16][18].
- The model maintained a success rate of over 70% under varying environmental conditions, showcasing its robustness [20].

Group 5: Future Directions and Limitations
- While ViVLA represents a breakthrough in single-sample video imitation learning, there are areas for optimization, including enhancing error recovery capabilities and expanding data diversity through automated filtering of human videos [25][27].
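The parallel-decoding idea contrasts with autoregressive generation, where each action token must wait for the previous one. Below is a minimal, illustrative sketch of decoding a whole action chunk in one forward pass from fused video-language features. It is not ViVLA's implementation; the module names, dimensions, single decoder layer, and chunk length are assumptions made only for illustration.

```python
# Minimal sketch of parallel action-chunk decoding (illustrative assumptions only).
import torch
import torch.nn as nn

class ParallelActionDecoder(nn.Module):
    def __init__(self, d_model=256, chunk_len=8, action_dim=7):
        super().__init__()
        # One learned query per future action step; all steps are decoded at once.
        self.action_queries = nn.Parameter(torch.randn(chunk_len, d_model))
        self.decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, context):
        # context: (batch, num_tokens, d_model) fused video + language features.
        queries = self.action_queries.unsqueeze(0).expand(context.size(0), -1, -1)
        decoded = self.decoder(queries, context)   # single forward pass, no token-by-token loop
        return self.head(decoded)                  # (batch, chunk_len, action_dim)

if __name__ == "__main__":
    ctx = torch.randn(2, 64, 256)                  # dummy fused features
    actions = ParallelActionDecoder()(ctx)
    print(actions.shape)                           # torch.Size([2, 8, 7])
```

Because the whole chunk is predicted in one pass, inference latency does not grow with chunk length the way it does under autoregressive decoding.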
Toward "aerospace embodied intelligence": Beihang team proposes a new benchmark for constellation planning | NeurIPS'25
具身智能之心· 2025-12-15 01:04
Edited by 量子位 (QbitAI)

Satellite constellations orbiting hundreds of kilometers above the Earth quietly underpin key industries such as remote sensing, communications, navigation, and weather forecasting. Behind every stably operating constellation, however, lies a high-dimensional, dynamic, tightly constrained planning problem: within observation windows of just a few minutes, how do you schedule dozens of satellites into a coordinated observation network, execute hundreds of tasks, and still respond to sudden demands such as earthquake relief, maritime search and rescue, and forest fires?

AI is becoming the key to cracking this problem. Professor Si Liu's team at Beihang University has proposed AEOS-Bench, the first large-scale constellation-scheduling benchmark built from real constellations, and has gone further by fusing the generalization ability of Transformer models with the domain requirements of aerospace engineering to train AEOS-Former, a scheduling model with time constraints embedded in its architecture. Together they set a new technical baseline for future "AI constellation planning". The work has been published at NeurIPS 2025.

Putting a satellite constellation into orbit is hard, as everyone knows, but efficiently planning and scheduling the tasks of an in-orbit constellation is not easy either. As deployed constellations keep growing, manual mission planning can no longer keep pace with the satellites' task-execution capacity, so researchers ...
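To make the constraint structure concrete, here is a toy greedy baseline that assigns observation tasks to satellites under visibility-window constraints. It is only an illustration of the scheduling problem described above, not AEOS-Bench or AEOS-Former; the data layout, the priority-first heuristic, and all numbers are assumptions.

```python
# Toy greedy scheduler for time-window-constrained satellite tasking (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    window: tuple      # (start, end) seconds when the target is visible
    duration: float    # observation time required
    priority: float    # higher = more urgent (e.g. disaster response)

@dataclass
class Satellite:
    name: str
    busy_until: float = 0.0
    plan: list = field(default_factory=list)

def greedy_schedule(tasks, satellites):
    """Assign each task to the first satellite that can fit it inside its visibility window."""
    for task in sorted(tasks, key=lambda t: -t.priority):   # urgent tasks first
        for sat in satellites:
            start = max(task.window[0], sat.busy_until)
            if start + task.duration <= task.window[1]:      # fits inside the window
                sat.plan.append((task.name, start, start + task.duration))
                sat.busy_until = start + task.duration
                break

tasks = [Task("wildfire-A", (0, 300), 60, 10),
         Task("survey-B", (100, 500), 120, 3)]
sats = [Satellite("sat-1"), Satellite("sat-2")]
greedy_schedule(tasks, sats)
for s in sats:
    print(s.name, s.plan)
```

A learned scheduler such as a Transformer replaces the hand-written priority heuristic, but it still has to respect exactly these window and occupancy constraints, which is why embedding them into the model matters.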
Embodied intelligence companies with Q4 funding rounds above 100 million yuan...
具身智能之心· 2025-12-15 01:04
Core Insights
- The article provides an overview of the financing situation for embodied robotics companies, highlighting investments over 100 million yuan across various funding rounds from angel to Series C [1].

Company Summaries
- **AI² Robotics**: Secured hundreds of millions in funding, focusing on AGI-native general intelligent robots, with applications in semiconductor, automotive, electronics, biotechnology, and public services [4].
- **Self-Variable Robotics**: Raised 1 billion yuan, specializing in AI and robotics technology innovation, building general intelligent agents based on large robot models [5].
- **Xingyuan Intelligent Robotics**: Received 300 million yuan, developing a general embodied brain technology aimed at creating a universal brain for physical world interaction [6].
- **Micro Differential Intelligence**: Funded 100 million yuan, focusing on aerial robotics and intelligent systems for industrial and urban applications [7].
- **Dyna Robotics**: Raised 120 million yuan, dedicated to AI-driven robotics for various tasks, emphasizing cost-effective learning in real production scenarios [8].
- **Motorevo**: Secured 100 million yuan, specializing in robotic joints and power units for various robotic applications [9][18].
- **Lexiang Technology**: Funded 200 million yuan, focusing on general household robotics in the AI era [10].
- **Qianjue Robotics**: Raised 100 million yuan, developing high-dimensional multi-modal tactile perception technology for robotics [11].
- **Leju Robotics**: Secured 1.5 billion yuan, focusing on humanoid robot commercialization and technology accumulation [12].
- **Lingxin Qiaoshou**: Received hundreds of millions, developing a platform centered on dexterous hands and cloud intelligence [13].
- **Songyan Power**: Funded 300 million yuan, focusing on humanoid robot development and manufacturing [14].
- **Wubai Intelligent**: Raised 500 million yuan, a state-owned enterprise focusing on bionic intelligence and robotics [15].
- **Shengshi Weisheng**: Secured 100 million yuan, developing intelligent robots for manufacturing automation [16].
- **Zhongke Optoelectronics**: Funded 215 million yuan, focusing on high-end intelligent robot products for military and manufacturing sectors [17].
- **Deepwood Intelligent**: Raised 200 million yuan, specializing in general embodied intelligent robotics [19].
- **Wujie Power**: Secured 300 million yuan, focusing on building a "universal brain" for robotics [20].
- **Yuanli Lingji**: Funded hundreds of millions, focusing on industrial and logistics automation solutions [21].
- **Accelerated Evolution**: Raised 100 million yuan, developing humanoid robots with advanced motion capabilities [22].
- **Stardust Intelligent**: Secured hundreds of millions, focusing on commercial humanoid robots with strong operational performance [23].
- **Guanglun Intelligent**: Developing solutions for robotics using high-quality simulation and physical AI technology [24].
- **New Era Intelligent**: Raised 100 million yuan, focusing on commercial cleaning robots [25].
- **Star Motion Era**: Secured over 1 billion yuan, focusing on general humanoid robotics technology [26].
- **Aoyi Technology**: Funded 160 million yuan, specializing in non-invasive brain-machine interfaces and rehabilitation robotics [27].
- **Daimeng Robotics**: Raised 100 million yuan, focusing on multi-modal tactile perception and wearable remote operation systems [28].
- **Luming Robotics**: Secured hundreds of millions, focusing on family-oriented intelligent robotics [29].
- **UniX AI**: Funded 300 million yuan, specializing in AI and humanoid robotics technology [30].
- **Ling Sheng Technology**: Raised 100 million yuan, focusing on integrated systems for humanoid and embodied intelligent robotics [31].
- **Cloud Deep Technology**: Funded 500 million yuan, specializing in quadruped robot development [32].
Without solid research skills, forget about doing embodied AI in industry
具身智能之心· 2025-12-15 01:04
Recently I chatted with several friends who work in HR services. They said that students with research experience in embodied AI are in hot demand on today's market (industry experience is too much to even ask for). Many students are pre-booked by headhunters and HR before they graduate. The bar is not that high: "complete research ability", meaning being able to finish the corresponding work independently. Without that, recruiters hesitate to recommend candidates to companies.

So what is complete research ability? It means being able to discover a problem, define it, propose a method to solve it, and distill the process into a methodology with your own viewpoint. This is not simply reading papers, a point many students misjudge.

Over the past year we have met many students with research needs. Their main difficulties include: the advisor is not familiar with the embodied direction, so they have to survey the field on their own. The fastest way to improve is to work alongside an experienced researcher. 具身智能之心 has launched a 1v1 research-coaching service, and everyone is welcome to ask about it.

Main coaching directions
Large models, VLA, VLA+RL, vision-language navigation, end-to-end, reinforcement learning, Diffusion Policy, sim2real, embodied interaction, pose estimation, robot decision-making and planning, motion planning, 3DGS, SLAM, tactile perception, biped/quadruped robots, teleoperation, zero-shot learning, and more.

Services provided
Paper topic selection; full-process paper guidance; ...

If you need to publish a paper, or want consultation with your own topic/research direction, feel free to contact us on WeChat: paperguidance
具身智能之心 is recruiting editors, operations, and sales staff
具身智能之心· 2025-12-13 16:02
具身智能之心 is recruiting editors, operations, and sales staff!

具身智能之心 is a leading technical content platform in the embodied AI field, producing a large volume of frontier technology, courses, industry overviews, financing news, product, and policy content for the industry. The platform is growing fast, and due to business needs we are recruiting full-time editors, operations, and sales staff from among all our followers.

Editor role
Responsible for daily content creation and editing on the WeChat public account. We hope you have a solid professional foundation and content-creation experience on platforms such as Zhihu or WeChat public accounts.

Operations role
Responsible for running the WeChat public account, Xiaohongshu, and community groups, and for improving follower stickiness and engagement. We hope you have some operations experience and understand how self-media platforms work.

Sales role
Responsible for selling and promoting the platform's courses, hardware, and other products. We hope you have some sales background and an understanding of embodied AI users' needs and the market.

Contact us
If you are interested in growing with us, add Feng Ge on WeChat: oooops-life ...
After reading nearly 50 VLA+RL papers...
具身智能之心· 2025-12-13 16:02
Core Insights
- The article discusses advancements in Vision-Language-Action (VLA) models and their integration with reinforcement learning (RL) techniques, highlighting various research papers and projects that contribute to this field [2][4][5].

Group 1: Offline RL-VLA
- NORA-1.5 is introduced as a vision-language-action model trained using world-model- and action-based preference rewards, showcasing its potential in offline reinforcement learning [2][4].
- The paper "Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models" emphasizes the importance of balancing signal and variance in offline RL applications [7].
- CO-RFT presents an efficient fine-tuning method for VLA models through chunked offline reinforcement learning, indicating a trend towards optimizing model performance post-training [9] (a minimal offline-update sketch follows after this overview).

Group 2: Online RL-VLA
- The concept of reinforcing action policies by prophesying is explored, suggesting a novel approach to enhance online reinforcement learning for VLA models [22].
- WMPO focuses on world-model-based policy optimization for VLA models, indicating a shift towards utilizing world models for better policy learning [24].
- RobustVLA emphasizes robustness-aware reinforcement post-training, highlighting the need for models to maintain performance under varying conditions [27].

Group 3: Hybrid Approaches
- GR-RL aims to improve dexterity and precision in long-horizon robotic manipulation by combining offline and online reinforcement learning strategies [100].
- The paper "Discover, Learn, and Reinforce" discusses scaling VLA pretraining with diverse RL-generated trajectories, indicating a comprehensive approach to model training [104].
- SRPO introduces self-referential policy optimization for VLA models, showcasing innovative methods to enhance model adaptability and performance [106].
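As a rough intuition for the offline post-training recipes listed above, the sketch below performs one advantage-weighted behavior-cloning update on a toy policy head: transitions with higher relative reward are imitated more strongly. It is a generic illustration, not code from any of the cited papers; the toy network, the batch-mean advantage baseline, and the temperature value are assumptions.

```python
# Minimal sketch of one advantage-weighted offline RL update (illustrative assumptions only).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 7))  # toy action head
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def offline_update(obs, expert_actions, rewards, temperature=1.0):
    # Advantage here is just reward minus the batch mean; real methods use
    # a learned critic or a preference model instead.
    advantages = rewards - rewards.mean()
    weights = torch.exp(advantages / temperature).clamp(max=20.0)  # clamp for stability
    pred = policy(obs)
    # Weighted behavior cloning: imitate more strongly where the return is higher.
    loss = (weights * ((pred - expert_actions) ** 2).mean(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

obs = torch.randn(16, 32)
acts = torch.randn(16, 7)
rews = torch.randn(16)
print(offline_update(obs, acts, rews))
```

The key property shared with the offline methods above is that the policy only ever regresses toward actions already in the dataset, so training stays safe even without environment interaction.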
Recruiting partners for VLA+RL, humanoid motion control, and data collection!
具身智能之心· 2025-12-13 16:02
Recruiting partners for VLA+RL, humanoid motion control, and data collection!

Recently we have received many inquiries about embodied VLA+RL, robot motion control, and data collection. These are genuinely valuable directions in the industry, but they come with a real entry barrier. 具身智能之心 hopes to work with leading experts to develop courses or hands-on projects in these areas and offer more insight to people already working on them. If you are interested, add Feng Ge on WeChat: oooops-life for further discussion.

Scope of cooperation
Course design and slide (PPT) production for embodied VLA+RL, motion control, and data collection.

Compensation
Above-industry-level pay and resource sharing; part-time is possible. If interested, add the person in charge on WeChat for further discussion.

Requirements
You are currently doing research in the embodied field, with at least one CCF-A conference publication or more than one year of industry experience. ...
With the SO-100, completing this many hands-on VLA projects...
具身智能之心· 2025-12-13 01:02
Core Viewpoint
- The article discusses the challenges and complexities faced by beginners in implementing VLA (Vision-Language-Action) models, emphasizing the need for practical experience and effective training methods to achieve successful deployment in real-world applications [2][4].

Group 1: Challenges in VLA Implementation
- Many students report difficulties in achieving effective results with open-source models like GR00T and PI0, despite low training loss in simulation [2][4].
- The transition from simulation to real-world application (sim2real) poses significant challenges, particularly in data collection and model training [6][7].
- Beginners often struggle with the intricacies of VLA models, leading to prolonged periods of trial and error without achieving satisfactory outcomes [4][6].

Group 2: VLA Model Components
- Data collection methods for VLA primarily include imitation learning and reinforcement learning, with a focus on high-quality data acquisition [6].
- Training VLA models typically requires extensive simulation debugging, especially when real-world data is insufficient, utilizing frameworks like Mujoco and Isaac Gym [7].
- Post-training, models often require optimization techniques such as quantization and distillation to reduce parameter size while maintaining performance [9] (see the compression sketch after this list).

Group 3: Educational Initiatives
- The article introduces a practical course aimed at addressing the learning curve associated with VLA technologies, developed in collaboration with industry experts [10][12].
- The course covers a comprehensive range of topics, including hardware, data collection, VLA algorithms, and real-world experiments, designed to enhance practical skills [12][25].
- The course is targeted at individuals seeking to enter or advance in the field of embodied intelligence, with prerequisites including a foundational knowledge of Python and PyTorch [22].
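As a concrete example of the post-training compression step mentioned in Group 2, the snippet below applies PyTorch's dynamic int8 quantization to a toy policy head. The tiny model is a stand-in assumption; real VLA deployments typically quantize the language backbone as well, and distillation is a separate technique not shown here.

```python
# Minimal sketch of post-training dynamic quantization of a toy policy head.
import torch
import torch.nn as nn

policy_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))

# Replace Linear layers with int8-weight versions; activations stay in float.
quantized = torch.quantization.quantize_dynamic(
    policy_head, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(policy_head(x).shape, quantized(x).shape)   # same output shape, smaller weights
```

Dynamic quantization shrinks the stored weights roughly 4x with no retraining, which is why it is usually the first compression step tried before heavier options such as distillation.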
Watch once, then execute! Is zero-shot learning for VLA a false proposition?
具身智能之心· 2025-12-13 01:02
Core Insights
- The article discusses the ViVLA framework, which enables robots to learn new skills from single video demonstrations, addressing the limitations of existing Vision-Language-Action (VLA) models in generalizing to tasks outside their training distribution [1][2][25].

Group 1: Challenges in Robot Skill Generalization
- Four core challenges hinder the generalization of robot skills: insufficient fine-grained action recognition, differences in action representation and modalities, inherent flaws in autoregressive modeling, and a lack of diverse expert-agent pairing data [4][5][7].

Group 2: ViVLA's Technical Framework
- ViVLA employs a three-layer technical system: unified action space construction, parallel decoding optimization, and large-scale data generation to achieve efficient learning from single video demonstrations [8].
- The first layer focuses on latent action learning through an Action-Centric Cycle-Consistency (A3C) framework to bridge the gap between different expert and agent action spaces [10] (a toy cycle-consistency sketch follows below).
- The second layer enhances model training efficiency with parallel decoding and spatiotemporal masking strategies, improving video understanding and reducing inference delays [11][12].

Group 3: Data Generation and Validation
- ViVLA's data generation pipeline converts human videos into high-quality paired data, resulting in a dataset of over 892,911 expert-agent training samples [13][17].
- The framework's effectiveness is validated through a three-tier performance verification system, demonstrating significant improvements in unseen-task success rates compared to baseline models [14][16].

Group 4: Performance Metrics
- In the LIBERO benchmark, ViVLA achieved over a 30% performance increase on unseen tasks compared to baseline models, with a 74% success rate in real-world manipulation tasks, significantly outperforming other models [14][16][18].
- The model maintained a success rate of over 70% under varying environmental conditions, showcasing its robustness [20].

Group 5: Future Directions and Limitations
- While ViVLA represents a breakthrough in single-sample video imitation learning, there are areas for optimization, including enhancing error recovery capabilities and expanding data diversity through automated filtering of human videos [25][27].
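The first-layer idea of aligning expert (human video) and agent (robot) action spaces can be illustrated with a toy cycle-consistency objective over a shared latent action space. This is a conceptual sketch only, not the paper's A3C architecture; the linear encoders and decoder, the feature dimensions, and the equal loss weighting are all assumptions.

```python
# Illustrative sketch of a cycle-consistency objective over a shared latent action space.
import torch
import torch.nn as nn

latent_dim = 16
expert_enc = nn.Linear(128, latent_dim)   # human-video features -> latent action
agent_enc = nn.Linear(7, latent_dim)      # robot action -> latent action
agent_dec = nn.Linear(latent_dim, 7)      # latent action -> robot action

def cycle_loss(video_feat, robot_action):
    # Cycle: robot action -> latent -> reconstructed robot action.
    z_agent = agent_enc(robot_action)
    recon = agent_dec(z_agent)
    recon_loss = ((recon - robot_action) ** 2).mean()
    # Alignment: paired human-video features and robot actions should map to
    # nearby points in the shared latent space.
    z_expert = expert_enc(video_feat)
    align_loss = ((z_expert - z_agent) ** 2).mean()
    return recon_loss + align_loss

print(cycle_loss(torch.randn(4, 128), torch.randn(4, 7)))
```

The point of such an objective is that, once the two encoders agree in latent space, a latent action inferred from a human video can be decoded directly into an executable robot action.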
The global reinforcement learning + VLA paradigm: behind PI*0.6 lies this company's technical groundwork
具身智能之心· 2025-12-13 01:02
This article originally appeared on 具身纪元 (author: 具身纪元). Edited by 机器之心.

In the paper for π0.6, Physical Intelligence's latest result, they describe where the idea behind π0.6's iterative reinforcement learning came from: it includes familiar work by Yuke Zhu, some of their own research (Chelsea Finn, Sergey Levine) that we have been tracking and covering, and also work from domestic embodied-intelligence teams, such as research from Tsinghua University and 星动纪元.

With the release of π*0.6, VLA + online RL has become a widely agreed-upon and very promising research direction in the industry (we previously dug into the π*0.6 paper and found that it goes beyond real-world reinforcement learning; NVIDIA has also started working on real-world self-improvement methods for VLA). The path from SFT to RL that large language models followed is gradually becoming clear in embodied research as well.

1. Why VLA+RL matters

[Figure caption: VLA models rely on fine-tuning]

In embodied intelligence (Embodi ...