Diffusion Policy
The Two Most Influential PhDs in Embodied AI Have Started a Company!
自动驾驶之心· 2025-11-18 00:05
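Over the past couple of days I came across Sunday Robotics, and two leading researchers I follow have both joined the startup (ignore the Hugging Face co-founder listed at the bottom for now). Tony Z. Zhao serves as CEO and Cheng Chi as CTO. Neither name is unfamiliar: Tony has not finished his Stanford PhD (he has currently dropped out), and while enrolled he took part in a series of influential works including ALOHA, ALOHA2, and Mobile ALOHA. Cheng Chi is an author of UMI and Diffusion Policy. Editor: 具身智能之心. Tony Z. Zhao, homepage: https://tonyzhaozh.github.io/ Third-year PhD student in computer science at Stanford University (dropped out); proposed ALOHA, ALOHA2, Mobile ALOHA, and related systems. Cheng Chi, homepage: https://cheng-chi.github.io/ PhD from Columbia University & Student of New Faculty (SNF) a ...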
The Two Most Influential PhDs in Embodied AI Have Started a Company!
具身智能之心· 2025-11-17 04:00
Core Insights
- The article highlights the new venture of two influential figures in embodied intelligence, Tony Z. Zhao and Cheng Chi, who have recently co-founded a company named Sunday Robotics [2][4].

Group 1: Key Individuals
- Tony Z. Zhao left his PhD program at Stanford University before finishing; during his time there he contributed to ALOHA, ALOHA2, and Mobile ALOHA [4][5].
- Cheng Chi, who holds a PhD from Columbia University and was a student of Shuran Song at Stanford, is recognized for his work on Universal Manipulation Interface (UMI), a finalist for Best Systems Paper at RSS 2024, and on Diffusion Policy [10].

Group 2: Company Overview
- Sunday Robotics is the company launched by Tony Z. Zhao (CEO) and Cheng Chi (CTO), a notable move from academic research toward commercial embodied intelligence [2].
If Policy Models Could Also Think and Reason Dynamically, Would Robots Perform Better in the Real World?
具身智能之心· 2025-11-13 02:05
Core Insights
- The article introduces EBT-Policy (Energy-Based Transformer Policy), a new policy architecture based on Energy-Based Models (EBMs), which improves robot performance in real-world scenarios by enabling dynamic reasoning and an explicit treatment of uncertainty [2][6].

Group 1: EBT-Policy Overview
- EBT-Policy significantly improves training and inference efficiency and exhibits a distinctive "zero-shot retry" capability [4].
- The model learns an energy value that scores the compatibility between input variables, optimizing the energy landscape during language modeling tasks [5].
- EBT-Policy outperforms the conventional Diffusion Policy in both simulated and real-world tasks while reducing computational requirements by up to 50x [6][18].

Group 2: Key Features and Advantages
- At inference time the model minimizes energy over multiple forward passes, adjusting the amount of computation to the difficulty of the problem [8].
- An emergent retry behavior lets EBT-Policy recover from errors by steering itself back toward lower-energy states [10].
- EBT-Policy needs only 2 inference steps, whereas Diffusion Policy typically needs around 100 [11].

Group 3: Performance Metrics
- In real-world tasks EBT-Policy scored 86 on "Fold Towel," 75 on "Collect Pan," and 92 on "Pick And Place," all higher than Diffusion Policy's scores [17].
- Training converged approximately 66% faster, and inference is significantly more efficient [18].

Group 4: Future Outlook
- The research team plans to keep tuning hyperparameters and model scale, and expects further gains as more experimental data is collected [22].
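The idea of minimizing energy over several passes, with harder problems consuming more passes, can be illustrated with a minimal sketch. This is not EBT-Policy's implementation: `toy_energy` is a hypothetical hand-written quadratic standing in for a learned transformer energy, and `infer_action`, its step size, and its tolerance are illustrative assumptions.

```python
from typing import Callable, Tuple


def toy_energy(observation: float, action: float) -> float:
    """Hypothetical stand-in for a learned energy function.

    Energy is lowest when the action equals 2 * observation.
    """
    return (action - 2.0 * observation) ** 2


def numerical_grad(f: Callable[[float], float], x: float, eps: float = 1e-5) -> float:
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def infer_action(
    energy: Callable[[float, float], float],
    observation: float,
    init_action: float,
    step_size: float = 0.4,
    max_steps: int = 100,
    tol: float = 1e-6,
) -> Tuple[float, int]:
    """Refine an action by gradient descent on the energy.

    Stops as soon as the energy falls below `tol`, so a poor initial guess
    (a "harder" problem) uses more refinement steps than a good one.
    Returns the final action and the number of update steps taken.
    """
    action = init_action
    for step in range(max_steps):
        if energy(observation, action) < tol:
            return action, step
        grad = numerical_grad(lambda a: energy(observation, a), action)
        action -= step_size * grad
    return action, max_steps


# Same observation, two initial guesses: one far from the optimum, one close.
hard_action, hard_steps = infer_action(toy_energy, observation=1.0, init_action=0.0)
easy_action, easy_steps = infer_action(toy_energy, observation=1.0, init_action=1.9)
print(hard_action, hard_steps)
print(easy_action, easy_steps)
```

In this toy setting, a "retry" would correspond to an action that has been knocked away from the optimum and is pulled back by further descent steps; how the actual model realizes that behavior is not specified here.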
NeurIPS 2025 MARS Multi-Agent Embodied Intelligence Challenge Officially Launches!
具身智能之心· 2025-08-18 00:07
Core Insights
- The article discusses challenges and advances in multi-agent embodied intelligence, emphasizing the need for efficient collaboration among robotic systems to tackle complex tasks in real-world environments [3][4].

Group 1: Challenges in Embodied Intelligence
- A single agent is insufficient for complex, dynamic task scenarios, which require high-level collaboration among multiple embodied agents [3].
- The MARS Challenge addresses this by inviting researchers worldwide to explore the high-level planning and low-level control capabilities of multi-agent systems [4].

Group 2: MARS Challenge Overview
- The MARS Challenge has two complementary tracks, one on planning and one on control, that evaluate agent capabilities in complex tasks [4][12].
- Results and awards will be announced at the NeurIPS 2025 SpaVLE Workshop [4].

Group 3: Track 1 - Multi-Agent Embodied Planning
- Track 1 covers high-level task planning and role assignment for heterogeneous robots, using the ManiSkill platform and the RoboCasa dataset [5][6].
- Participants use vision-language models to select suitable robot combinations and produce high-level action sequences from natural language instructions [5][8].

Group 4: Track 2 - Multi-Agent Control Strategy Execution
- Track 2 tests how well multi-agent systems cooperate when executing complex tasks that require real-time interaction with dynamic environments [12].
- Cooperative strategies are developed and evaluated in the RoboFactory simulation environment, with participants designing deployable control models [12][13].

Group 5: Timeline and Participation
- A warm-up round starts on August 18, 2025; the official competition runs from September 1, 2025 to October 31, 2025 [25].
- Participants from robotics, computer vision, natural language processing, and related fields are encouraged to join and showcase their creativity and technology [26].
Beyond VLA: A Roundup of Embodied + VA Work
具身智能之心· 2025-07-14 02:21
Core Insights
- The article focuses on advancements in embodied intelligence and robotic manipulation, highlighting various research projects and methodologies aimed at improving robotic capabilities in real-world applications [2][3][4].

Group 1: 2025 Research Initiatives
- Numerous projects are outlined for 2025, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation," which aim to enhance robotic manipulation through advanced learning techniques [2][3].
- The "BEHAVIOR Robot Suite" is designed to streamline real-world whole-body manipulation for everyday household activities, indicating a focus on practical applications of robotics [2].
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" emphasizes the potential for efficient learning methods in robotic training [2][3].

Group 2: Methodologies and Techniques
- The article discusses various methodologies such as "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning" and "Learning the RoPEs: Better 2D and 3D Position Encodings with STRING," which aim to improve the adaptability and efficiency of robotic systems [2][3][4].
- "RoboGrasp: A Universal Grasping Policy for Robust Robotic Control" highlights the development of a versatile grasping policy that can be applied across different robotic platforms [2][3].
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" showcases advancements in fine motor skills for robots, crucial for complex tasks [4].

Group 3: Future Directions
- The research emphasizes the importance of integrating visual and tactile feedback in robotic systems, as seen in projects like "Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation" [7].
- "Zero-Shot Visual Generalization in Robot Manipulation" indicates a trend towards developing robots that can generalize learned skills to new, unseen scenarios without additional training [7].
- The focus on "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" suggests a shift towards leveraging human demonstrations to enhance robotic learning processes [7].