具身智能之心
Recruiting Collaborators for VLA+RL, Humanoid Locomotion Control, and Data Collection!
具身智能之心· 2025-12-13 16:02
Course design and slide (PPT) production for embodied VLA+RL, locomotion control, and data collection. If you are currently doing research in the embodied field, we expect at least one CCF-A-level conference publication or more than one year of industry experience. Above-industry pay and resource sharing; part-time is possible. If interested, add the coordinator on WeChat for further discussion. Recently we have received many inquiries about embodied VLA+RL, robot locomotion control, and data collection. These are genuinely valuable directions in the industry, but they come with a real barrier to entry. 具身智能之心 hopes to develop courses and hands-on projects in these directions together with leading experts, to offer more insight to those already working in the field. If interested, add 峰哥 on WeChat (oooops-life) for further inquiry. Cooperation details | Compensation | Requirements ...
With the SO-100, You Can Complete This Many VLA Hands-On Projects...
具身智能之心· 2025-12-13 01:02
Core Viewpoint
- The article discusses the challenges and complexities beginners face when implementing VLA (Vision-Language-Action) models, emphasizing the need for practical experience and effective training methods to achieve successful deployment in real-world applications [2][4]

Group 1: Challenges in VLA Implementation
- Many students report difficulty achieving effective results with open-source models like GR00T and PI0, despite low training loss in simulation [2][4]
- The transition from simulation to the real world (sim2real) poses significant challenges, particularly in data collection and model training [6][7]
- Beginners often struggle with the intricacies of VLA models, leading to prolonged trial and error without satisfactory outcomes [4][6]

Group 2: VLA Model Components
- Data collection methods for VLA primarily include imitation learning and reinforcement learning, with a focus on high-quality data acquisition [6]
- Training VLA models typically requires extensive simulation debugging, especially when real-world data is insufficient, using frameworks like Mujoco and Isaac Gym [7]
- After training, models often require optimization techniques such as quantization and distillation to reduce parameter size while maintaining performance (an illustrative quantization sketch follows this summary) [9]

Group 3: Educational Initiatives
- The article introduces a practical course aimed at flattening the learning curve of VLA technologies, developed in collaboration with industry experts [10][12]
- The course covers a comprehensive range of topics, including hardware, data collection, VLA algorithms, and real-world experiments, designed to build practical skills [12][25]
- The course targets individuals seeking to enter or advance in the field of embodied intelligence, with prerequisites including foundational knowledge of Python and PyTorch [22]
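To make the post-training optimization mentioned above concrete: below is a minimal sketch of post-training dynamic quantization in PyTorch. `PolicyHead` is a toy stand-in invented here for a real VLA action head; the article and course materials do not provide this code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a VLA action head; real models (GR00T, PI0)
# are far larger, but the quantization call is the same.
class PolicyHead(nn.Module):
    def __init__(self, obs_dim=512, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, x):
        return self.net(x)

model = PolicyHead().eval()

# Post-training dynamic quantization: Linear weights are stored in int8
# and dequantized on the fly, shrinking parameter size with no retraining
# (activations stay in float).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

obs = torch.randn(1, 512)
print(quantized(obs).shape)  # torch.Size([1, 7])
```

Dynamic quantization shrinks checkpoint size without any retraining; distillation, by contrast, trains a smaller student model against the large model's outputs, so the two techniques are often combined.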
Watch Once, Then Execute! Is VLA Zero-Shot Learning a False Proposition?
具身智能之心· 2025-12-13 01:02
Core Insights
- The article discusses the ViVLA framework, which enables robots to learn new skills from single video demonstrations, addressing the limitations of existing Vision-Language-Action (VLA) models in generalizing to tasks outside their training distribution [1][2][25]

Group 1: Challenges in Robot Skill Generalization
- Four core challenges hinder the generalization of robot skills: insufficient fine-grained action recognition, differences in action representation and modalities, inherent flaws in autoregressive modeling, and a lack of diverse expert-agent paired data [4][5][7]

Group 2: ViVLA's Technical Framework
- ViVLA employs a three-layer technical system (unified action-space construction, parallel-decoding optimization, and large-scale data generation) to learn efficiently from single video demonstrations [8]
- The first layer focuses on latent action learning through an Action-Centric Cycle-Consistency (A3C) framework to bridge the gap between different expert and agent action spaces (a loose illustrative sketch follows this summary) [10]
- The second layer improves training efficiency with parallel decoding and spatiotemporal masking strategies, improving video understanding and reducing inference latency [11][12]

Group 3: Data Generation and Validation
- ViVLA's data-generation pipeline converts human videos into high-quality paired data, yielding a dataset of 892,911 expert-agent training samples [13][17]
- The framework's effectiveness is validated through a three-tier performance verification system, demonstrating significant improvements in success rates on unseen tasks compared to baseline models [14][16]

Group 4: Performance Metrics
- On the LIBERO benchmark, ViVLA achieved over a 30% performance increase on unseen tasks compared to baseline models, with a 74% success rate in real-world manipulation tasks, significantly outperforming other models [14][16][18]
- The model maintained a success rate above 70% under varying environmental conditions, demonstrating its robustness [20]

Group 5: Future Directions and Limitations
- While ViVLA represents a breakthrough in single-sample video imitation learning, there remain areas for optimization, including strengthening error-recovery capabilities and expanding data diversity through automated filtering of human videos [25][27]
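The paper's exact A3C formulation is not reproduced in this summary. What follows is a loose, hypothetical sketch of a cycle-consistency loss between two action spaces; all modules (`expert_to_latent`, etc.) and dimensions are invented for illustration and are not ViVLA's architecture.

```python
import torch
import torch.nn as nn

# Invented dimensions: a human-video "expert" action space and a robot
# "agent" action space, bridged through a shared latent action space.
EXPERT_DIM, AGENT_DIM, LATENT_DIM = 48, 7, 16

expert_to_latent = nn.Linear(EXPERT_DIM, LATENT_DIM)
latent_to_agent = nn.Linear(LATENT_DIM, AGENT_DIM)
agent_to_latent = nn.Linear(AGENT_DIM, LATENT_DIM)
latent_to_expert = nn.Linear(LATENT_DIM, EXPERT_DIM)

def cycle_consistency_loss(expert_act, agent_act):
    """Round-trip each action space through the shared latent and the
    other space, penalizing reconstruction error so the latent stays
    action-centric rather than modality-specific."""
    z_e = expert_to_latent(expert_act)
    expert_rec = latent_to_expert(agent_to_latent(latent_to_agent(z_e)))
    z_a = agent_to_latent(agent_act)
    agent_rec = latent_to_agent(expert_to_latent(latent_to_expert(z_a)))
    return nn.functional.mse_loss(expert_rec, expert_act) + \
           nn.functional.mse_loss(agent_rec, agent_act)

loss = cycle_consistency_loss(torch.randn(8, EXPERT_DIM),
                              torch.randn(8, AGENT_DIM))
loss.backward()
```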
25% Efficiency Gain: the Arm-Hand Shared Autonomy Framework Resolves the Data-Collection Dilemma of Dexterous Manipulation
具身智能之心· 2025-12-13 01:02
Editor: 机器之心

Achieving human-like dexterous manipulation in general-purpose robots has long been one of the core challenges in robotics. In recent years, Vision-Language-Action (VLA) models have shown significant promise for robot skill learning, but their progress is constrained by a fundamental bottleneck: acquiring high-quality manipulation data.

The latest research paper from the ByteDance Seed team, "End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy" [1], proposes a solution to this key problem.

The study's core contribution is a Shared Autonomy framework that sensibly divides control responsibilities between the human operator and the autonomous AI system: the human teleoperates the robot arm via VR (handling high-level positioning and obstacle avoidance), while DexGrasp-VLA autonomously controls the dexterous hand (handling fine-grained grasping). This eliminates the need to teleoperate arm and dexterous hand simultaneously, greatly reduces operator cognitive load, and effectively addresses the most critical data-collection cost problem in robot deployment (an illustrative control-loop sketch of this division of labor follows). By taking data collection ...
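To picture the division of labor concretely, here is a hypothetical sketch of a shared-autonomy control loop merging the two command sources. `read_vr_arm_pose`, `DexGraspPolicy`, and `FakeRobot` are all invented stand-ins, not ByteDance's interfaces.

```python
import time

def read_vr_arm_pose():
    """Hypothetical VR teleop input: the human supplies a 6-DoF arm
    target (position + orientation), owning positioning and avoidance."""
    return [0.4, 0.0, 0.3, 0.0, 1.57, 0.0]

class DexGraspPolicy:
    """Stand-in for the autonomous hand policy (DexGrasp-VLA in the
    paper): maps an observation to finger-joint targets."""
    def act(self, observation):
        return [0.2] * 12  # 12 hypothetical finger joints

def shared_autonomy_step(robot, policy):
    # Human channel: high-level arm motion from VR teleoperation.
    arm_target = read_vr_arm_pose()
    # Autonomous channel: fine-grained grasping from the VLA policy.
    hand_target = policy.act(robot.get_observation())
    # One robot command per tick, each channel owning its subsystem.
    robot.command(arm=arm_target, hand=hand_target)

class FakeRobot:
    def get_observation(self):
        return {"wrist_rgb": None}
    def command(self, arm, hand):
        print(f"arm->{arm[:3]}... hand->{len(hand)} joints")

robot, policy = FakeRobot(), DexGraspPolicy()
for _ in range(3):           # a short mock control loop at ~10 Hz
    shared_autonomy_step(robot, policy)
    time.sleep(0.1)
```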
The Global RL+VLA Paradigm: This Company's Technical Groundwork Lies Behind π*0.6
具身智能之心· 2025-12-13 01:02
The following article comes from 具身纪元 (author: 具身纪元; "witnessing the embodied wave, writing a new era of intelligence"). Editor: 机器之心

In Physical Intelligence's latest result, the π0.6 paper, they describe where the idea for π0.6's iterative reinforcement learning came from: the cited work includes research from the familiar Yuke Zhu, some of their own earlier work (Chelsea Finn, Sergey Levine) that we have been tracking and covering, and work from domestic embodied-intelligence teams such as Tsinghua University and 星动纪元.

With the release of π*0.6, VLA + online RL has become an industry-consensus, highly promising research direction (see our earlier pieces 深扒了π*0.6的论文,发现它不止于真实世界强化学习 and 英伟达也来做VLA在真实世界自我改进的方法了). The SFT-to-RL trajectory of large language models is likewise becoming clear in embodied research.

I. Why VLA+RL Matters

[Figure caption: VLA models rely on fine-tuning]

In embodied intelligence (Embodi ...
具身智能之心 Paper Tutoring Officially Launches, with China's Most Professional Mentors!
具身智能之心· 2025-12-12 07:59
If you need to publish a paper of any kind, with support for topic and research-direction consultation, contact us on WeChat: paperguidance

Covered directions include: large models, VLA, VLA+RL, vision-language navigation, end-to-end, reinforcement learning, Diffusion Policy, sim2real, embodied interaction, pose estimation, robot decision-making and planning, motion planning, 3DGS, SLAM, tactile perception, bipedal/quadruped robots, teleoperation, zero-shot learning, and more.

Services offered:
- Full-pipeline paper guidance
- Experiment guidance
- PhD-application guidance
- Embodied-intelligence top conferences/journals: CCF-A, CCF-B, CCF-C
- SCI Q1-Q4
- CAS (Chinese Academy of Sciences) Zones 1-4
- EI / Chinese core journals
- Thesis papers, PhD applications, competitions, etc.

Also available: paper topic selection, with a high acceptance rate! Tutored papers have already been accepted by top venues including CVPR, AAAI, ECCV, CoRL, ICLR, IROS, ICRA, and ACL. Tutoring prices vary by paper level. For more tutoring details, contact our research assistant on WeChat: paperguidance ...
Morgan Stanley Predicts 25 Humanoid-Robot Companies Will Dominate the Industry, with No Unitree (宇树) or Zhiyuan (智元)
具身智能之心· 2025-12-12 07:59
Core Insights
- Morgan Stanley predicts that 25 humanoid-robot companies will dominate the industry, 7 of them Chinese [2][3]
- The Chinese companies are Baidu, Alibaba, Horizon Robotics, Joyson Electronics, iFlytek, Desay SV, and Hesai Technology, spanning sectors such as AI, automotive, and electronics manufacturing [3][4]
- The report emphasizes component and module suppliers over traditional humanoid-robot manufacturers, highlighting the critical role of companies providing AI chips, visual sensors, precision actuators, and power-management chips [3][4]

Company and Industry Summary
- The 7 Chinese companies identified are significant players in their respective fields, with focuses on AI, automotive intelligence, speech recognition, and electronics manufacturing [3]
- The absence of companies like Unitree (宇树) and Zhiyuan (智元) raised questions about the report's professionalism, but Morgan Stanley justified the choice by focusing on the foundational components essential to the humanoid-robot industry [4]
- Nearly 150 humanoid-robot startups have emerged in the Chinese market, indicating growing interest and investment in this sector, potential market bubbles notwithstanding [4]
GLaD: Knowledge Distillation Injects 3D Geometric Priors into VLA Models, Task Success Rate Exceeds 94%
具身智能之心· 2025-12-12 01:22
Authors: Minghao Guo et al. Editor: 具身智能之心. This article is shared for academic purposes only; in case of infringement, contact us for removal.

I. Research Background and Core Motivation

Vision-Language-Action (VLA) models are a key technology in embodied intelligence, enabling robots to generate control actions directly from visual observations and natural-language instructions. Most existing VLA models rely on 2D visual encoders such as CLIP and SigLIP. These encoders excel at capturing semantic correspondences between images and text, but cannot encode 3D spatial information such as depth, object pose, and spatial relations.

This deficiency leads to misallocated attention in manipulation tasks. As shown in Figure 1, in the tasks "move the tablecloth from the table corner to the table edge" and "pick up the black bowl between the plate and the ramekin and place it on the plate", conventional VLA models attend to irrelevant regions and fail to precisely localize task-relevant objects, degrading manipulation accuracy.

To address this, the research team proposes the GLaD framework. Its core idea is to inject 3D geometric priors into the VLA model via knowledge distillation, giving the model both semantic understanding and spatial-reasoning ability, without relying on additional depth sensors or 3D annotations. ...
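Injecting geometric priors via distillation commonly reduces to aligning a student 2D encoder's features with those of a frozen, geometry-aware teacher. The sketch below shows that generic recipe under invented shapes and module names; it is not GLaD's actual architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM = 256

# Frozen teacher: stands in for a geometry-aware (3D-pretrained) encoder
# whose features carry depth / pose / spatial-relation cues.
teacher = nn.Sequential(nn.Conv2d(3, FEAT_DIM, 16, stride=16)).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student: stands in for the VLA's 2D encoder (a CLIP/SigLIP-style
# backbone), plus a projection head into the teacher's feature space.
student = nn.Conv2d(3, 192, 16, stride=16)
proj = nn.Conv2d(192, FEAT_DIM, 1)

def distill_loss(images):
    """Cosine-alignment distillation: pull projected student features
    toward the frozen teacher's geometry-aware features."""
    with torch.no_grad():
        t = teacher(images)
    s = proj(student(images))
    return 1.0 - F.cosine_similarity(s, t, dim=1).mean()

images = torch.randn(2, 3, 224, 224)
loss = distill_loss(images)  # added to the usual action-prediction loss
loss.backward()
print(float(loss))
```

In practice such a distillation term is weighted and summed with the action-prediction loss, so the student gains spatial cues at training time while inference still needs only RGB input.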
Rejection ≠ Failure! These High-Impact Papers Were All Once Rejected by Top Conferences
具身智能之心· 2025-12-12 01:22
Core Insights
- Waymo has released a deep-dive blog detailing its AI strategy centered around its foundational model, emphasizing the use of distillation methods to create high-efficiency models for onboard operations [1][2]
- Jeff Dean highlighted the significance of knowledge distillation, comparing it to the creation of the Gemini Flash model, which showcases the importance of distillation in AI model efficiency [1][2]

Historical Context of Rejected Papers
- Many foundational technologies in AI, such as optimizers for large models and computer vision techniques, were initially rejected by top conferences, showcasing a historical pattern of oversight in recognizing groundbreaking innovations [6]
- Notable figures in AI, including Geoffrey Hinton and Yann LeCun, have faced rejection for their pioneering work, which was later recognized as transformative [6]

Case Studies of Rejected Innovations
- LSTM, a milestone for sequence data processing, was rejected by NIPS in 1996 but later became crucial in speech recognition and machine translation, highlighting the delayed recognition of its value [7][10]
- SIFT, a dominant algorithm in computer vision, faced rejection from ICCV and CVPR due to its perceived complexity, yet proved to be vital in real-world image processing [11][13]
- Dropout, a key regularization method for deep neural networks, was initially rejected for its radical approach but later became essential in training deep networks effectively [17][19]
- Word2Vec, despite being rejected at ICLR, became a cornerstone in NLP due to its efficiency and practical application, eventually receiving recognition for its impact [20][24]
- YOLO transformed object detection by prioritizing speed over precision, facing rejection for its perceived shortcomings but later becoming a widely adopted framework in the industry [28][30]

Reflection on Peer Review Limitations
- The peer review system often struggles to recognize disruptive innovations, leading to a systematic cognitive lag in evaluating groundbreaking research [40][41]
- The tendency to equate mathematical complexity with research contribution can hinder the acceptance of simpler yet effective methods [41]
- Historical examples illustrate that the true measure of a research's impact is determined not by initial peer review outcomes but by its long-term relevance and problem-solving capabilities [43][47]
NeurIPS'25! AutoSeg3D: Online 3D Segmentation of Anything, with Just a Single RTX 4090
具身智能之心· 2025-12-12 01:22
Editor: 具身智能之心. This article is shared for academic purposes only; in case of infringement, contact us for removal.

Frontier Note

In the large-model era everyone is competing on scaling, and for embodied and autonomous-driving tasks people seem to want eight GPUs just to start training a model. Taking this opportunity, let me recommend a direction where a single RTX 4090 is enough to publish at a top venue: point-cloud instance segmentation for embodied scenes, the subject of this paper. This is not a recommendation of a low-resource way to churn out papers; I pointed my student toward this direction because I believed it is a technology that can genuinely be deployed, and, not too surprisingly, this paper is already undergoing technology transfer at two companies. For embodied AI, VLA and the various so-called world models are fancy, but many less-fancy directions can both yield papers and land in practice. I hope to see more low-level techniques researched and optimized to support real industrialization. You are also welcome to intern at the 无界-AutoLab joint laboratory (Shanghai) and co-create interesting technical directions together :) -- Dylan 老师

Paper Summary

(1) The authors observe that existing online VFM-assisted methods typically first use VFMs such as SAM to predict 2D ...
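Where the summary above cuts off, such pipelines typically lift the 2D VFM masks onto the 3D point cloud as the next step. Below is a hedged, self-contained sketch of that standard lift via a pinhole projection; the intrinsics and data are invented, and this is not AutoSeg3D's code.

```python
import numpy as np

# Invented pinhole intrinsics for a 640x480 camera.
FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5

def lift_masks_to_points(points_cam, masks):
    """Assign each 3D point (in camera coordinates) the instance id of
    the 2D mask it projects into; -1 means unassigned/behind camera."""
    labels = np.full(len(points_cam), -1, dtype=np.int64)
    z = points_cam[:, 2]
    u = np.round(points_cam[:, 0] * FX / z + CX).astype(int)
    v = np.round(points_cam[:, 1] * FY / z + CY).astype(int)
    ok = (z > 0) & (u >= 0) & (u < 640) & (v >= 0) & (v < 480)
    for inst_id, mask in enumerate(masks):   # masks: (H, W) booleans
        hit = ok & mask[np.clip(v, 0, 479), np.clip(u, 0, 639)]
        labels[hit] = inst_id
    return labels

pts = np.random.uniform([-1, -1, 0.5], [1, 1, 3.0], size=(1000, 3))
mask = np.zeros((480, 640), dtype=bool)
mask[100:300, 200:400] = True                # one fake SAM-style mask
print((lift_masks_to_points(pts, [mask]) == 0).sum(), "points labeled")
```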