具身智能之心
Calling for experts in embodied world models & data collection to join us!
具身智能之心· 2025-11-05 09:00
Course design and production for embodied world models, motion control, and data collection. We are recruiting partners in embodied world models and data collection! We have recently received many inquiries from readers about embodied world models, robot motion control, and data collection. These are genuinely valuable directions for the industry, but they also come with a real barrier to entry. 具身智能之心 hopes to develop courses or hands-on projects in these directions together with domain experts, and to offer more insight to people already working in the field. If you are interested, add 峰哥 on WeChat (oooops-life) for details. Collaboration scope and requirements: for researchers currently working in embodied AI, we expect at least one CCF-A conference publication or more than one year of industry experience. Compensation: above-market pay and shared resources; part-time is possible. If interested, add the contact person on WeChat to discuss further. ...
Tsinghua team proposes AirScape: an action-intent-controllable low-altitude world model, fully open-sourced!
具身智能之心· 2025-11-05 09:00
Author: Baining Zhao et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

A core part of human spatial sense is anticipating how our visual observations will change as we move, which is essential for task and action decisions during spatial movement. Rollout and imagination are therefore among the fundamental problems of embodied intelligence: predicting how embodied observations will change if the agent executes a movement intention. Existing world-model research focuses mainly on humanoid robots and autonomous driving, and mostly operates on a 2D plane with a limited action space. Several key challenges follow from this. To address them, the Tsinghua University team proposes AirScape, a generative world model designed for six-degree-of-freedom (6DoF) aerial embodied agents. A video generation foundation model is supervised fine-tuned on the proposed dataset of 11k video-intention pairs; this stage gives the model a basic understanding of low-altitude action intentions and the ability to generate under them. Given the current low-altitude visual observation and an action intention, AirScape rolls out the future sequence of observations. The project's dataset and code are fully open-sourced. Low-altitude world-model dataset: to support the low-altitude world ...
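The excerpt above describes AirScape's core interface: given the current low-altitude observation and an action intention, roll out a sequence of future observations. Below is a minimal, self-contained sketch of that conditioning interface only; the toy module, tensor shapes, and FiLM-style intent injection are illustrative assumptions, not the released AirScape architecture.

```python
# Toy sketch of "current observation + action intention -> future frames".
# NOT the AirScape code; shapes and the conditioning scheme are assumptions.
import torch
import torch.nn as nn


class ToyAerialWorldModel(nn.Module):
    """Predicts a short sequence of future frames from the current frame
    and an embedded action intention (e.g. "ascend and yaw left")."""

    def __init__(self, frame_ch=3, intent_dim=512, horizon=8):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.Conv2d(frame_ch, 64, kernel_size=3, padding=1)
        self.intent_proj = nn.Linear(intent_dim, 64)
        self.decoder = nn.Conv2d(64, frame_ch * horizon, kernel_size=3, padding=1)

    def forward(self, obs, intent_emb):
        # obs: (B, 3, H, W) current observation; intent_emb: (B, intent_dim)
        h = self.encoder(obs)
        h = h + self.intent_proj(intent_emb)[:, :, None, None]  # inject intent
        frames = self.decoder(h)                                 # (B, 3*T, H, W)
        b, _, height, width = frames.shape
        return frames.view(b, self.horizon, 3, height, width)   # (B, T, 3, H, W)


model = ToyAerialWorldModel()
obs = torch.randn(1, 3, 128, 128)   # current camera frame
intent = torch.randn(1, 512)        # assumed text embedding of the intention
future = model(obs, intent)
print(future.shape)                 # torch.Size([1, 8, 3, 128, 128])
```

Supervised fine-tuning on the 11k video-intention pairs would then amount to minimizing a reconstruction (or, in the actual video-diffusion setting, a denoising) loss between the predicted and ground-truth future clips.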
The robot dog that came out of Suzhou just won a championship at IROS
具身智能之心· 2025-11-05 00:02
This article originally appeared on 甲子苏州, by 刘杨楠. 甲子苏州 is the official account of 甲子光年's Yangtze River Delta headquarters, a joint venture between 甲子光年 and 隆湫资本. Rooted at the frontier of China's tech innovation, it centers on the Suzhou market, radiates across the Yangtze River Delta, and draws on the combined momentum of media, think tank, and fund to energize the region's tech industries and help technologies land and industries upgrade.

Author: 刘杨楠

Build rugged, durable robot dogs that can actually do work. At the IROS 2025 Quadruped Robot Challenge, which wrapped up last week, "钢镚L1" from Suzhou-based 智身科技 took the championship together with the University of Manchester in its first appearance. In past editions, winning teams mostly competed on overseas platforms such as Boston Dynamics; in the 2024 challenge, a robot from China's Unitree (宇树科技) served as a winning team's platform for the first time. 智身科技's "钢镚L1" now sets a new record, and the win has drawn the industry's attention to this young but fast-moving startup. Founded in 2023, 智身科技 has been through one major strategic pivot. In its early days the company benchmarked Tesla's technical approach and poured its effort into heavy-payload humanoid robots, trying to push its robotic arm to 40 ...
This platform now supports pi0 and pi0.5!
具身智能之心· 2025-11-05 00:02
A lightweight, cost-effective robotic arm built for embodied-AI research.

Still struggling to pick hardware for embodied intelligence work? Arms that are too expensive to afford, or too cheap to be usable? Meet Imeta-Y1, a lightweight, cost-effective robotic arm designed for beginners and early-stage researchers. Whether you are a student, an educator, or a developer just entering robotics, Imeta-Y1 helps you complete algorithm validation and project development at low cost and high efficiency.

✅ Compatible with ROS1 / ROS2 and ships with a URDF model, so you can switch seamlessly between simulation and the real arm (see the sketch after this entry);
✅ 24-hour after-sales response, so you never get stuck while learning!

The arm combines high-precision motion control, a low-power design, and an open hardware/software architecture. It supports seamless joint debugging from simulation to the real machine and provides a fully open-source SDK and toolchain for the whole workflow, helping users quickly get to algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it especially suitable for developing and promoting embedded-AI and robot-learning platforms.

| Body weight | 4.2 kg | Rated payload | 3 kg | Degrees of freedom | 6 |
| --- | --- | --- | --- | --- | --- |
| Working radius | 612.5 mm | Repeatability | ±0.1 mm | Base mounting | 90mm×90mm, M5×4 ... |
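Since the entry highlights the bundled URDF model for sim-to-real work, here is a minimal sketch of inspecting such a URDF in PyBullet. The file name `imeta_y1.urdf` is a hypothetical placeholder; use the URDF that ships with the vendor's SDK.

```python
# Minimal sketch: load a 6-DoF arm URDF in PyBullet and list its revolute joints.
# "imeta_y1.urdf" is a hypothetical file name, not a confirmed SDK path.
import pybullet as p

client = p.connect(p.DIRECT)                    # headless; use p.GUI for a viewer
robot = p.loadURDF("imeta_y1.urdf", useFixedBase=True)

for j in range(p.getNumJoints(robot)):
    info = p.getJointInfo(robot, j)
    name, joint_type = info[1].decode(), info[2]
    if joint_type == p.JOINT_REVOLUTE:
        lower, upper = info[8], info[9]         # joint limits read from the URDF
        print(f"joint {j}: {name}, limits [{lower:.2f}, {upper:.2f}] rad")

p.disconnect(client)
```

The same URDF can drive both the simulated and the real arm's kinematic model, which is what makes the advertised sim-to-real switching workflow possible.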
KAIST team: a dual-stream diffusion world model to enhance VLA models
具身智能之心· 2025-11-05 00:02
Group 1
- The core issue addressed in the article is the limitation of Vision-Language-Action models (VLAs) in modeling the impact of actions on the environment, which affects their generalization and robustness [3][4][8]
- The proposed solution is the Dual-Stream Diffusion Framework (DUST), which aims to maintain modality specificity while enabling cross-modal knowledge sharing to resolve the modal conflict in joint predictions [5][10]

Group 2
- DUST is built on the foundation of diffusion-based VLA designs, focusing on semantic feature extraction, action diffusion modeling, and a reasoning process that avoids pixel-level modeling costs [9][12]
- The architecture of DUST includes a multi-modal diffusion Transformer (MMDiT) that separates the processing of action and visual streams while allowing for temporary information exchange through cross-modal attention layers [16][33]

Group 3
- Experimental results demonstrate that DUST outperforms state-of-the-art models in both simulated and real-world scenarios, showing an average success rate improvement of 18% over GR00T-N1.5 and 5% over FLARE in simulated environments with 100 demonstrations [20][25]
- DUST's ability to utilize unannotated video data for pre-training significantly reduces the reliance on costly robot demonstration data, achieving a 13% higher average success rate compared to GR00T-N1.5 in transfer learning tasks [25][26]

Group 4
- The article highlights the importance of asynchronous joint sampling strategies in DUST, which allow flexible balancing between prediction accuracy and inference speed by adjusting the number of denoising steps for different modalities [18][28]
- The necessity of DUST's core components is validated through ablation studies, confirming that the combination of dual-stream architecture and decoupled training is essential for optimal performance [29][30]
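The summary above describes an MMDiT-style block that keeps action and visual streams separate and lets them interact only through cross-modal attention. Below is a minimal sketch of one such dual-stream block; the layer sizes, the omission of timestep/text conditioning, and the exact attention wiring are illustrative assumptions rather than the paper's released implementation.

```python
# Sketch of a dual-stream block in the spirit of the MMDiT design described
# above: each modality keeps its own self-attention and MLP, and the streams
# exchange information only through cross-attention. Sizes are illustrative.
import torch
import torch.nn as nn


class DualStreamBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_act = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_act = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp_act = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.mlp_vis = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, act_tok, vis_tok):
        # Modality-specific self-attention keeps each stream's specificity.
        act_tok = act_tok + self.self_act(act_tok, act_tok, act_tok)[0]
        vis_tok = vis_tok + self.self_vis(vis_tok, vis_tok, vis_tok)[0]
        # Cross-modal attention is the only point of knowledge sharing.
        act_tok = act_tok + self.cross_act(act_tok, vis_tok, vis_tok)[0]
        vis_tok = vis_tok + self.cross_vis(vis_tok, act_tok, act_tok)[0]
        return act_tok + self.mlp_act(act_tok), vis_tok + self.mlp_vis(vis_tok)


block = DualStreamBlock()
actions = torch.randn(2, 16, 256)   # noisy action tokens (one diffusion stream)
visual = torch.randn(2, 64, 256)    # noisy future-frame tokens (the other stream)
actions, visual = block(actions, visual)
```

The asynchronous joint sampling mentioned in Group 4 would then simply run different numbers of denoising steps over the action tokens and the visual tokens.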
University of Pennsylvania! MAESTRO: a VLM-based zero-shot generalist robot framework
具身智能之心· 2025-11-05 00:02
Author: Junyao Shi et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

MAESTRO is a modular robot framework centered on a vision-language model (VLM). By dynamically composing specialized modules for perception, planning, and control, it achieves zero-shot manipulation performance beyond existing vision-language-action (VLA) models without large-scale robot training data, while remaining extensible and debuggable.

Paper: https://arxiv.org/pdf/2511.00917

Core architecture and key design

1. Overall framework
At the core of MAESTRO is a VLM coding agent. Given a language instruction and scene images, it dynamically writes code that composes tool modules into a programmatic policy. The framework runs as a closed loop: during execution it continuously monitors environment feedback and adjusts the code and actions in real time, forming an adaptive perceive-act-learn cycle (a minimal sketch of this loop follows this entry). It leverages the VLM's existing general-purpose capabilities to avoid dependence on robot-specific data; its modular design integrates mature, specialized robotics tools to make up for the VLM's weakness in low-level manipulation; and it breaks through traditional ...
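As referenced above, here is a minimal sketch of the closed perceive-act-learn loop. The VLM call, the tool functions, and the toy environment are hypothetical stand-ins for illustration; they are not MAESTRO's actual modules or API.

```python
# Sketch of a VLM-orchestrated, closed-loop programmatic policy. Everything
# here (vlm_write_policy, ToyEnv, the tool stubs) is a hypothetical placeholder.
from typing import Callable, Dict


def vlm_write_policy(instruction: str, observation: str) -> str:
    """Placeholder for the VLM coding agent: in the real framework this would
    be a VLM emitting Python that composes perception/planning/control tools."""
    return "pose = tools['detect'](obs)\ntools['move_to'](pose)\ntools['grasp']()"


class ToyEnv:
    def __init__(self):
        self.done = False

    def observe(self) -> str:
        return "red cube at (0.3, 0.1)"

    def task_done(self) -> bool:
        return self.done


def run_episode(instruction: str, env: ToyEnv, tools: Dict[str, Callable],
                max_steps: int = 5) -> bool:
    for _ in range(max_steps):
        obs = env.observe()                               # perceive
        code = vlm_write_policy(instruction, obs)         # VLM writes the policy
        try:
            exec(code, {"tools": tools, "obs": obs})      # act: run generated code
        except Exception as err:                          # monitor feedback
            instruction += f"\nprevious attempt failed: {err}"
            continue
        if env.task_done():
            return True
    return False


env = ToyEnv()
tools = {
    "detect": lambda obs: (0.3, 0.1),                     # perception stub
    "move_to": lambda pose: None,                         # control stub
    "grasp": lambda: setattr(env, "done", True),          # control stub
}
print(run_episode("pick up the red cube", env, tools))    # True
```

The error-feedback branch is what makes the loop adaptive: failed executions are fed back to the code-writing step instead of terminating the episode.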
While you are still agonizing over a research direction, other students already have a CCF-A paper...
具身智能之心· 2025-11-04 00:05
Group 1
- The article introduces a new research guidance service focused on embodied intelligence, addressing common challenges faced by newcomers in selecting research topics and methodologies [1][2]
- The guidance covers various advanced topics such as multimodal large models, reinforcement learning, and robot simulation, providing tailored one-on-one support [2][3]
- The service is backed by a team of experienced mentors from prestigious institutions and leading companies, ensuring high-quality assistance throughout the research process [2][3]

Group 2
- The program emphasizes a dual perspective from both industry and academia, aiming not only for publication but also for practical application and value [3]
- An introductory offer is available for the first ten inquiries, allowing students to receive personalized mentorship and tailored advice on suitable conferences and journals [4]
Dexmal (原力灵机) releases a real-time VLA model! pi0 inference above 30Hz on a consumer-grade GPU
具身智能之心· 2025-11-04 00:05
Core Insights
- The article discusses the development of a real-time vision-language-action (VLA) model that achieves a significant reduction in inference time, enabling dynamic tasks such as object grasping to be performed effectively [3][6][23].

Optimization Strategies
- The research outlines a comprehensive optimization pipeline that reduces inference time from over 100ms to 27.3ms for a two-view model, achieved through four main steps: eliminating basic overhead, simplifying the computation graph, optimizing kernel depth, and tuning GEMM parameters [7][18][22].
- The first step removes CPU overhead by utilizing CUDA Graphs, which reduces inference time from 106.5ms to approximately 53.9ms [9][10]; a minimal sketch of this capture-and-replay pattern follows this entry.
- The second step simplifies the computation graph, further reducing inference time to about 45.8ms [12][14].
- The third step focuses on optimizing kernel depth, including techniques such as weight folding and merging operations to enhance performance [15][18].

Performance Validation
- The article employs the roofline model to assess the theoretical lower bound of performance: the actual inference time of 27.3ms is only 30% above the theoretical limit of 20.6ms, suggesting that the optimizations are close to hardware limits [20][22].
- Synchronization overhead is also analyzed, showing significant reductions with the optimized methods compared to naive implementations [21][24].

Real-World Application
- A real-world experiment on grasping a falling pen demonstrates the model's effectiveness, achieving a 100% success rate across trials and showing that the model can meet stringent timing constraints [36][37].
- The framework allows for high-frequency control, running the VLA model at 30Hz and the action expert at 480Hz, showcasing its applicability to dynamic robotic tasks [31][32].

Future Directions
- The article suggests future research directions, including exploring larger model sizes and finer-grained feedback loops to enhance performance and adaptability in real-time applications [37].
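As referenced in the first optimization step, here is a minimal PyTorch sketch of the CUDA Graph capture-and-replay pattern that removes per-call CPU launch overhead. The model below is a small placeholder, not the pi0 policy, and the remaining pipeline stages (graph simplification, kernel fusion, GEMM tuning) are not reproduced here.

```python
# Minimal sketch of CUDA Graph capture/replay to eliminate CPU launch overhead.
# `model` is a placeholder network, not the actual VLA policy.
import torch

assert torch.cuda.is_available()
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda().eval()

static_in = torch.randn(1, 1024, device="cuda")

# Warm up on a side stream so capture does not record one-time lazy init work.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass; replays skip Python and kernel-launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_out = model(static_in)

# At inference time: copy new data into the captured input buffer and replay.
new_obs = torch.randn(1, 1024, device="cuda")
static_in.copy_(new_obs)
graph.replay()
print(static_out[0, :4])   # results are written in place into static_out
```

The key constraint of this pattern is that input and output tensors are fixed buffers: new observations must be copied into `static_in` before each `replay()` rather than passed as fresh tensors.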
Breaking! New arXiv CS rule: nothing gets in without peer review
具身智能之心· 2025-11-04 00:05
A major new arXiv rule! Starting now, "survey/review" and "position" papers in the arXiv CS section will only be accepted after they have gone through peer review. In other words, no peer-review pass, no ride! The news briefly reached the top 3 of the HK trending list. Editor: 新智元.

With LLM-generated papers surging, major conferences and journals are already overwhelmed, let alone arXiv. arXiv CS now receives hundreds of surveys every month, and 90% of them are little more than annotated literature lists with essentially no substantive value. arXiv has therefore decided to tighten its gatekeeping for this class of papers. Going forward, "survey" and "position" papers can be included in arXiv CS only after they have been accepted by a journal or top conference and have completed peer review. At submission time, authors must also provide the peer-reviewed journal citation and DOI metadata. MIT EECS associate professor Phillip Isola thinks this is a step in the wrong direction: peer review can happen through many channels, and arXiv should keep its positioning as the "GitHub of research" rather than turn itself into an academic journal. In the past, this kind of paper was never ...
The 36 people at NVIDIA who report directly to Jensen Huang
具身智能之心· 2025-11-04 00:05
Core Insights
- The article discusses the organizational structure and strategic focus of NVIDIA under CEO Jensen Huang, highlighting the importance of hardware and AI technologies in the company's growth strategy [6][8][10].

Group 1: Organizational Structure
- Jensen Huang has 36 direct reports, which is a significant number for the CEO of a $4 trillion company [74].
- The direct reports are divided into seven functional areas: strategy, hardware, software, AI, public relations, networking, and Huang's executive assistant [3][4].
- Huang's management style emphasizes a flat organizational structure to enhance information flow and decision-making speed [80][81].

Group 2: Focus on Hardware and AI
- Hardware remains the cornerstone of NVIDIA, with one-third of Huang's direct reports focused on hardware-related business [7].
- AI and emerging technologies are becoming the second pillar of Huang's business strategy, with a dedicated team working on these areas [8][10].
- The company is exploring new markets, referred to as "zero billion markets," indicating a focus on untapped opportunities [10].

Group 3: Key Personnel
- Key figures in Huang's team include Jonah Alben, Dwight Diercks, and Bill Dally, who have been with the company for decades and play crucial roles in GPU architecture and software development [21][32][42].
- New addition Wu Xinzhou, responsible for automotive business strategy, brings significant experience from Qualcomm and XPeng Motors, indicating a strategic push into the automotive sector [56][59][71].

Group 4: Financial Performance
- NVIDIA's net profit surged to approximately $29.5 billion in the 2024 fiscal year, a nearly 600% increase year-over-year [98].
- The company's workforce grew from 29,600 to 36,000 employees within a year, marking a 21.62% increase [100].
- Automotive business revenue is projected to nearly double from $281 million to $567 million in the 2024-2025 fiscal year [71].