Latest from Tsinghua University! πRL: A General Approach That Lets Robots "Learn While Doing" via Online Reinforcement Learning
具身智能之心· 2025-11-03 00:03
Core Insights
- The article discusses the breakthrough in adapting reinforcement learning (RL) to flow-based Vision-Language-Action (VLA) models, overcoming the limitations of traditional supervised fine-tuning (SFT) and of existing RL approaches [1][3][30]

Group 1: Challenges in Current VLA Model Training
- Current VLA model training faces a dilemma: SFT relies on large volumes of expert trajectories, which are costly to collect and generalize weakly, while existing RL methods cannot accommodate the core characteristics of flow-based models [3][4]
- The fundamental barrier to RL adaptation for flow-based VLA models is the difficulty of computing action log-likelihoods during the denoising process [4][5]

Group 2: Innovative Solutions Proposed
- A new framework combining the dual algorithms Flow-Noise and Flow-SDE with parallel simulation training has been proposed to address the RL adaptation challenges of flow-based VLA models [1][5]
- The Flow-Noise algorithm introduces a learnable noise network to optimize the denoising process, while Flow-SDE converts deterministic ODE denoising into a stochastic SDE to balance exploration and efficiency (a minimal sketch follows this summary) [7][9]

Group 3: Performance Improvements
- The proposed methods show significant gains on multi-task benchmarks, achieving near-perfect scores and breaking through the SFT bottleneck [15][16]
- On the LIBERO benchmark, the Flow-Noise and Flow-SDE models achieved average scores of 97.6% and 96.1% respectively, significantly outperforming traditional SFT methods [16][18]

Group 4: Large-Scale Adaptation and Training
- The framework supports large-scale multi-task optimization, demonstrated by handling 4,352 task combinations in the ManiSkill benchmark while maintaining its performance advantage [20][22]
- Training with 320 parallel environments significantly reduces data-transmission delays and improves optimization efficiency [17][22]

Group 5: Future Directions
- Future research will focus on optimizing noise-injection strategies, improving out-of-distribution (OOD) generalization, and validating the framework's adaptability in real-world robotic applications [29][30]
- Integrating multi-modal observations, such as tactile and force feedback, is also suggested to improve robustness in complex scenarios [29][30]
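The ODE-to-SDE conversion is the crux of making policy-gradient RL work here: a deterministic denoising step has no tractable action density, but injecting Gaussian noise of known scale turns each step into a Gaussian transition with a closed-form log-likelihood. Below is a minimal sketch of that idea, assuming a PyTorch velocity network; the function names and the fixed `sigma` are illustrative assumptions, not πRL's actual implementation.

```python
# Hedged sketch: why an SDE-style denoising step makes action log-likelihoods
# tractable for policy-gradient RL. Not the paper's code; names are illustrative.
import torch

def sde_denoise_step(velocity_net, a_t, obs, t, dt, sigma):
    """One stochastic denoising step of the action chunk a_t.

    A deterministic flow ODE step, a_{t+dt} = a_t + v(a_t, obs, t) * dt,
    assigns no probability to its output. Adding Gaussian noise of known
    scale sigma turns the step into a Gaussian transition whose log-prob
    is available in closed form.
    """
    drift = velocity_net(a_t, obs, t)        # learned velocity field
    mean = a_t + drift * dt                  # deterministic ODE part
    a_next = mean + sigma * torch.randn_like(a_t)   # stochastic SDE part
    # Per-step Gaussian log-likelihood, usable in a PPO-style ratio.
    log_prob = torch.distributions.Normal(mean, sigma).log_prob(a_next).sum(-1)
    return a_next, log_prob
```

With per-step log-probs available, the whole denoising chain can be treated as a short-horizon MDP and optimized with a standard policy-gradient objective (e.g., PPO-style clipping).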
Recruiting Partners for World Models, Humanoid Locomotion Control, and Data Collection!
具身智能之心· 2025-11-02 04:00
Group 1
- The article emphasizes the importance of embodied world models, robotic control, and data collection as valuable directions in the industry, despite existing barriers to entry [2]
- The company seeks to collaborate with experts in the field to develop courses or practical projects related to these topics, aiming to provide insights for professionals currently working in these areas [2]
- Interested parties are encouraged to contact the company for further consultation regarding course design and presentation materials related to embodied world models, control, and data collection [3]

Group 2
- The company is looking for individuals engaged in embodied research who have either published a paper at a CCF-A conference or possess over one year of industry experience [4]
- The company offers competitive salaries and resource sharing, with opportunities for part-time involvement for interested candidates [6]
- Specific requirements for collaboration are outlined, indicating a focus on expertise and experience in the relevant fields [7]
China's First Domestic GPU Stock Wins IPO Approval, Raising 8 Billion Yuan!
具身智能之心· 2025-11-01 16:03
Core Viewpoint
- The article discusses the rapid progress of Moore Threads, a domestic GPU company, through its IPO process, highlighting its plan to raise 8 billion yuan for research and development, particularly in AI and graphics chips, and noting significant revenue growth alongside a shift in business focus toward high-performance AI computing products [2][4][10]

Group 1: IPO Progress
- Moore Threads' IPO registration application has been approved by the China Securities Regulatory Commission, making it the first domestic GPU company to reach this milestone [2]
- The company submitted its prospectus on June 30 and received approval in just four months, an unusually swift process [3][17]
- The planned 8 billion yuan in fundraising will primarily support R&D projects, including AI training chips and graphics chips [4][5]

Group 2: Financial Performance
- In the first half of the year, Moore Threads reported revenue of 702 million yuan, surpassing its total revenue for all of 2024 [9]
- The net loss for the first half of the year was 271 million yuan, a significant improvement over the prior year, with management projecting potential profitability by 2027 [10]
- The company's revenue structure has shifted dramatically, with AI computing products contributing 94.85% of total revenue in the first half of the year, up from 71.44% in 2022 [12][13]

Group 3: Business Focus and Technology
- Moore Threads has shifted its strategic focus from desktop graphics products to high-performance AI computing, now its main revenue driver [11][12]
- The company operates on a fabless model and has developed its own unified system architecture (MUSA), which integrates multiple computing capabilities into a single chip [21][22]
- The MUSA architecture supports AI acceleration, graphics rendering, and other computational tasks, and the company has launched four generations of GPU chips [24]
Runs Efficiently on Edge Devices! NanoVLA: Keeping a VLA Model's Accuracy and Generalization with a 52x Inference Speedup
具身智能之心· 2025-11-01 16:03
Author: Jiahong Chen et al. Editor: 具身智能之心

In robotic manipulation, the tension between generality and lightweight deployment has long held back real-world adoption: existing Vision-Language-Action (VLA) models can perform complex task reasoning, but their large parameter counts and heavy compute requirements make them hard to deploy on resource-constrained hardware such as mobile robots and embedded systems (e.g., Jetson Orin Nano).

NanoVLA, jointly proposed by the University of British Columbia, the University of Alberta, and Xiaomi's automotive team, breaks this deadlock with a novel architecture built on vision-language decoupled fusion, long-short action chunking, and dynamic routing: it preserves the task accuracy and generalization of general-purpose VLA models while speeding up inference 52x and compressing the parameter count by 98%, achieving for the first time the goal of running general robot policies efficiently on edge devices.

Why rethink how VLA models are deployed at the edge?

Mainstream VLA models are caught in a trade-off where performance and efficiency cannot both be had: to generalize across tasks, they typically rely on billions of param ...
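To make the decoupling concrete, here is a minimal sketch of the caching pattern the description implies: the expensive language encoder runs only when the instruction changes, while the vision encoder and a small action head run every control step. The class and method names are illustrative assumptions, not NanoVLA's real API.

```python
# Hedged sketch of vision-language decoupled fusion with instruction caching.
# All names are assumptions for illustration; NanoVLA's interfaces may differ.
import torch

class DecoupledVLAPolicy(torch.nn.Module):
    def __init__(self, vision_encoder, language_encoder, action_head):
        super().__init__()
        self.vision_encoder = vision_encoder      # runs every step (cheap)
        self.language_encoder = language_encoder  # runs rarely (expensive)
        self.action_head = action_head            # predicts an action chunk
        self._cached_instr = None
        self._cached_lang_emb = None

    def forward(self, image, instruction):
        # Decoupling: only re-encode language when the instruction changes,
        # instead of fusing vision and language jointly at every step.
        if instruction != self._cached_instr:
            self._cached_lang_emb = self.language_encoder(instruction)
            self._cached_instr = instruction
        vis_emb = self.vision_encoder(image)
        fused = torch.cat([vis_emb, self._cached_lang_emb], dim=-1)
        return self.action_head(fused)
```

Because the heavyweight language pass amortizes to near zero over a long rollout, the per-step cost is dominated by the small vision encoder and action head, which is what makes edge-class hardware plausible.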
30 fps on a Single RTX 4090: Fan Haoqiang's Team Makes VLA Run in Real Time
具身智能之心· 2025-11-01 16:03
Editor: 机器之心

Concretely, for a commonly used Pi0-class model (3 billion parameters), inference reaches up to 30 fps on a single consumer-grade RTX 4090. This stands in sharp contrast to the common impression that VLA models need tens or even hundreds of milliseconds per step.

To get there, the researchers analyzed Pi0's model structure in depth and, through a series of optimizations, cut the latency from an initial 100+ ms by several fold (down to 27 ms in the dual-view setting), clearly beating the JAX-based auto-optimization used in openpi.

Building on these results, the researchers also explored what a future "real-time" VLA architecture might look like, designing an algorithmic framework with the potential to reach up to 480 Hz closed-loop control. The optimized code has been open-sourced on GitHub, with the entire implementation packaged as a single file depending only on torch and triton, ready to use "out of the box" in your own projects. This is Dexmal (原力灵机)'s second open-source code release, following its one-stop VLA toolbox Dexbotic.

What pain point does this solve? ...
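Latency claims like 27 ms per step are easy to get wrong because CUDA kernel launches are asynchronous. Below is a minimal, hedged benchmarking sketch (not the Dexmal code) showing the synchronization needed to measure a policy's real per-step latency with torch; `policy` and `example_inputs` are placeholders for your own model and GPU-resident inputs.

```python
# Hedged sketch: correctly timing GPU inference in PyTorch.
import time
import torch

def benchmark_policy(policy, example_inputs, warmup=10, iters=100):
    """Report ms/step and fps; example_inputs must already be on the GPU."""
    policy = policy.eval().cuda()
    with torch.inference_mode():
        for _ in range(warmup):          # warm up kernels, caches, allocator
            policy(*example_inputs)
        torch.cuda.synchronize()         # don't time pending async launches
        start = time.perf_counter()
        for _ in range(iters):
            policy(*example_inputs)
        torch.cuda.synchronize()         # wait for all queued work to finish
        elapsed = time.perf_counter() - start
    ms = 1000 * elapsed / iters
    print(f"{ms:.1f} ms/step ≈ {1000 / ms:.0f} fps")
```

Without the two `synchronize()` calls, the loop would only measure launch overhead and report implausibly low latencies.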
No Wobble While Walking, Wiping a Board, or Carrying a Tray! The SEEC Framework Gives Humanoid Robots "Physical Compensation"
具身智能之心· 2025-11-01 16:03
This article is sourced from 具身智能研究室; author: 小智. Editor: 具身智能实验室.

Core Idea and Innovations

Editor's take: with one round of "model-enhanced residual learning," the SEEC framework keeps a robot rock-steady while walking, wiping a board, or carrying a tray. Rather than rigidly controlling posture, SEEC teaches the upper-body policy to automatically cancel lower-body disturbances: a model-derived acceleration compensation signal is distilled into the RL policy, letting "learning" and "physics" make decisions together, which yields robust upper-body control on the real Booster-T1 with zero additional training.

Project page: https://zhuoheng0910.github.io/seec-humanoid.github.io/

Core Idea of the Paper

The SEEC framework decouples the humanoid controller into two layers: the lower layer is responsible for gait stab ...
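The "learning plus physics" split can be made concrete with a small sketch: a feedforward term computed from the dynamics model cancels the acceleration that the moving base induces at the arm, and the RL policy only outputs a residual on top of it. Everything here (function signature, the pseudo-inverse Jacobian form of the compensation) is a hypothetical illustration of residual learning, not SEEC's actual interfaces or derivation.

```python
# Hedged sketch of model-enhanced residual learning for upper-body control.
import numpy as np

def upper_body_command(rl_policy, obs, base_acc, jac_pinv):
    """Combine a learned residual with model-based disturbance compensation.

    base_acc: estimated linear acceleration of the base, i.e., the
              lower-body disturbance from walking or bracing.
    jac_pinv: pseudo-inverse of the Jacobian mapping upper-body joint
              accelerations to end-effector acceleration in the base frame.
    """
    # Physics term: joint accelerations that cancel the end-effector
    # acceleration induced by the moving base (schematic feedforward).
    qdd_comp = -jac_pinv @ base_acc
    # Learned term: the RL policy only has to model what physics misses.
    residual = np.asarray(rl_policy(obs))
    return residual + qdd_comp
```

The design point is that the policy never has to rediscover rigid-body physics from reward alone; it learns only the residual error around an analytic compensator, which is why no extra real-robot training is needed.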
The Community Is Preparing Interviews on Job Hunting, Pursuing a PhD, Switching Research Directions......
具身智能之心· 2025-11-01 05:40
Core Insights
- The article emphasizes the growing opportunities in the embodied intelligence sector, highlighting an increase in funding and job openings compared to the previous year [1][2]
- The community is preparing interviews with industry leaders to provide insights on job hunting and research advice for newcomers [1][2]

Group 1: Community Engagement
- The community is organizing interviews with experienced professionals to share their career paths and insights into the industry [1]
- There is a focus on creating a closed loop for sharing knowledge across industry, academia, and job opportunities [2][5]
- The community has established a referral mechanism for job placements with various companies in the embodied intelligence sector [11]

Group 2: Educational Resources
- A comprehensive technical roadmap has been developed for beginners, outlining essential skills and knowledge areas [7]
- The community has compiled numerous open-source projects and datasets relevant to embodied intelligence, giving newcomers quick access [12][26]
- Various learning paths have been organized, covering topics such as reinforcement learning, multi-modal models, and robotic navigation [12][40]

Group 3: Industry Insights
- The community hosts roundtable discussions and live streams to address ongoing challenges and developments in the embodied intelligence industry [5]
- A collection of industry reports and research papers has been compiled to keep members informed about the latest advancements and applications [19]
- The community includes members from renowned universities and leading companies in the field, fostering a rich environment for knowledge exchange [11][15]
Recruiting Partners in the VLA + RL Direction!
具身智能之心· 2025-10-31 04:00
Core Insights
- The article discusses the recruitment of a lecturer for an online course focused on VLA (Vision-Language-Action) models and reinforcement learning (RL) [1][2]
- The community aims to deepen understanding and knowledge sharing in embodied intelligence, specifically around VLA and RL [3]

Recruitment Requirements
- Candidates should have a research background in VLA and RL, preferably holding a PhD or being a doctoral student, with publications at top conferences [2]
- Hands-on industry experience, including debugging on real robots, is also desired [2]

Community Overview
- The company, 具身智能之心, describes itself as the first comprehensive technical exchange community in China focusing on VLA and RL [3]
- The community has attracted a significant number of people interested in these research areas [3]

Compensation and Resources
- The company offers compensation above the industry average, along with access to extensive industry resources [4]
History Made Again! Nvidia's Market Cap Tops $5 Trillion Overnight!
具身智能之心· 2025-10-31 00:04
Core Viewpoint
- Nvidia has become the first company in history to surpass a market valuation of $5 trillion, a significant milestone for the tech industry [2][4][11]

Group 1: Market Performance
- On October 29, Nvidia's stock rose 5.44%, hitting an intraday high of $212.19 per share and closing at $207.04 per share, for a market capitalization of $5.03 trillion [3][11]
- Since the beginning of 2025, Nvidia's stock has surged 56%, outpacing other major tech companies [6][40]
- Nvidia's market value now exceeds the combined market capitalizations of major competitors such as AMD, Intel, and Qualcomm, as well as entire sectors within the S&P 500 [6][7]

Group 2: Growth Trajectory
- Nvidia's market value has jumped from $1 trillion to $5 trillion in just two and a half years, a pace unmatched by other tech giants [10][24]
- The company first reached a $1 trillion valuation in May 2023, hit $3 trillion in June 2024, and then $4 trillion in just over a year [23][24]
- By contrast, Microsoft took nearly six years to grow from $1 trillion to $4 trillion, and Apple more than seven [17][20]

Group 3: Strategic Developments
- The recent surge is attributed to announcements at the GTC developer conference, where CEO Jensen Huang unveiled several technological advances and partnerships [26][40]
- Highlights included plans to build new supercomputers with the U.S. Department of Energy and the Blackwell chip series, whose production is expected to ramp significantly [27][28]
- Nvidia's new open system architecture, Nvidia NVQLink, aims to accelerate the development of quantum supercomputers, further positioning the company at the forefront of technological innovation [29][32]

Group 4: Future Outlook
- Nvidia anticipates that cumulative revenue from upcoming products, including the Blackwell and Rubin chip platforms, could reach $500 billion by the end of next year [32][34]
- The company also plans to invest up to $100 billion in AI data centers built in collaboration with OpenAI, signaling a strong commitment to expanding its AI infrastructure [40][41]
- Nvidia's growth is closely tied to surging demand for compute driven by AI advances, with its GPUs underpinning the infrastructure of leading AI companies [40]
OmniDexGrasp Revealed: Foundation Models + Force Feedback, a General Approach That Lets Robots "Understand Instructions and Grasp Dexterously"
具身智能之心· 2025-10-31 00:04
Core Insights
- The article presents the OmniDexGrasp framework, which tackles dexterous grasping by combining foundation models with force-feedback control to achieve grasps that are both generalizable and physically feasible [1][2][21]

Group 1: Challenges in Dexterous Grasping
- Current dexterous grasping solutions face a dilemma: data-driven approaches struggle to generalize from limited datasets, while foundation models often fail to translate abstract knowledge into physical actions [2]
- The core issue is the inability to balance generalization with physical feasibility, leading to failures on new objects and in complex scenarios [2]

Group 2: OmniDexGrasp Framework
- OmniDexGrasp employs a three-stage approach: generating human grasping images, transferring the action to the robot, and force-feedback control, effectively bridging the gap between abstract knowledge and physical execution [4][21]
- The framework retains the generalization of foundation models while ensuring physical feasibility through precise action transformation and control strategies [4]

Group 3: Key Modules of OmniDexGrasp
- Module 1 generates human grasping images to show the robot how an object should be grasped, with varied input designs to accommodate different user needs [6][8]
- Module 2 translates human grasping images into robot actions, aligning human intent with robot embodiment through a three-step transfer strategy [9][12]
- Module 3 applies force-feedback control to keep grasps stable and safe, adapting to each object's physical properties and preventing damage during grasping (see the sketch after this summary) [12][13]

Group 4: Experimental Results
- OmniDexGrasp achieved an average success rate of 87.9% across six core grasping tasks, significantly outperforming traditional methods [15]
- In comparative tests, it showed superior generalization, especially on new objects, with success rates far exceeding those of existing solutions [16][18]

Group 5: Future Directions
- The framework suggests future enhancements through multi-modal observation integration and deeper control-task development, aiming for end-to-end general manipulation capabilities [22]
- OmniDexGrasp's potential to extend beyond grasping to broader manipulation tasks is highlighted, indicating its versatility in robotic applications [20]
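As a rough illustration of the Module 3 control loop described above, the sketch below closes the fingers proportionally to the force error and stops once fingertip forces settle at the target, which is how force feedback prevents both slipping and crushing. The hand API (`read_fingertip_forces`, `step_joint_targets`) and the gain values are hypothetical, not OmniDexGrasp's implementation.

```python
# Hedged sketch of a force-feedback grasp loop; all robot APIs are assumed.
import numpy as np

def force_feedback_grasp(hand, target_force, kp=0.002, max_steps=200, tol=0.1):
    """Close the hand until fingertip forces reach target_force (N)."""
    q = hand.joint_positions()                 # current finger joint angles
    for _ in range(max_steps):
        f = hand.read_fingertip_forces()       # per-finger contact forces
        err = target_force - f                 # positive -> close more
        q = q + kp * err                       # proportional joint adjustment
        hand.step_joint_targets(q)             # send position targets
        if np.all(np.abs(err) < tol):          # forces settled at target
            return True                        # stable grasp achieved
    return False                               # failed to converge
```

A proportional law like this adapts to each object automatically: a soft or fragile object reaches the force target with little finger travel, while a slipping object keeps drawing the fingers inward until contact forces stabilize.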