具身智能之心

Partner Recruitment: Build the Platform and Community Together with 具身智能之心
具身智能之心· 2025-09-15 05:00
Join us in taking on B-side and C-side consulting covering embodied data, robot hardware, algorithms, and deployment, helping industry upgrade and transform and fostering talent development in the field. If you are currently employed at a company, don't worry: we will fully protect your personal privacy.

The second half of the year is already here, and it feels like this year's plans won't all get done; things have diverged from our expectations at the start of the year, simply because there is so much worth doing. The embodied intelligence field has exploded quickly and all at once, and demand across many lines of business is very high, especially for consulting of all kinds and for course and curriculum co-construction with universities.

Running a community depends on everyone's strong support. 具身智能之心 hopes to contribute its own strength amid this surge rather than remain confined to a media role; we are committed to becoming a platform that genuinely brings value to the industry. We sincerely invite influential leaders in the embodied field to collaborate with us across open-source project reproduction, consulting services, course development, curriculum co-construction, hardware development, and more.

Collaboration areas
- Open-source projects: build open-source projects with global influence together with 具身智能之心.
- Consulting services: jointly serve B-side and C-side consulting needs in embodied data, hardware, algorithms, and deployment.
- Course development: build courses with us that benefit more beginners and push the industry forward, covering C-side learners, corporate training, and university curriculum building.
- Hardware development: build hardware with us that is easy to use and ...

Compensation
We offer industry-competitive pay (details available via private message), and you will also gain access to our industry resources.

Contact us
Interested partners are welcome to add WeChat oooops-life for further discussion.
Embodied Intelligence Open-Source Week: Shanghai AI Laboratory Accelerates Robot Training and Application
具身智能之心· 2025-09-15 00:04
Author: Shanghai Artificial Intelligence Laboratory (Shanghai AI Laboratory)

In July, Shanghai AI Laboratory open-sourced Intern-Robotics, the 『书生』 full-stack embodied engine. Through its three engines for simulation, data, and training-and-evaluation, it pushes embodied-brain development from "fragmented development" toward an era of "full-stack mass production". To date, the associated models and datasets have been downloaded more than 140,000 times.

Addressing the practical needs of the embodied intelligence field, Shanghai AI Laboratory has built on Intern-Robotics to release a further series of technical advances spanning navigation, manipulation, and humanoid locomotion models, along with datasets and benchmarks. From September 14 onward, these advances are being open-sourced in a concentrated release, helping the robotics industry crack the core problems from training to real-world deployment. During this period, Shanghai AI Laboratory will join several professional industry organizations to host two themed livestreams on September 17 and 19, helping everyone better understand and apply the technologies.

Navigation model: InternVLA-N1, an end-to-end dual-system navigation model fusing long-horizon spatial reasoning with agile execution. Built on Intern-Robotics, it achieves high-level long-distance goal ...
Now Open for Enrollment! A Hands-On Course on Embodied "Brain" and "Cerebellum" Algorithms
具身智能之心· 2025-09-15 00:04
Core Insights
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on the interaction and adaptation of intelligent agents within physical environments [1][3]
- The development of embodied intelligence technology has evolved through various stages, from low-level perception to high-level task understanding and generalization [6][14]

Industry Analysis
- In the past two years, numerous star teams in the field of embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, transitioning from laboratories to commercial and industrial applications [3]
- Major domestic companies like Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive ecosystem for embodied intelligence, while international players like Tesla and investment firms support advancements in autonomous driving and warehouse robotics [5]

Technological Evolution
- The evolution of embodied intelligence technology has progressed through several phases:
  - The first phase focused on grasp pose detection, which lacked the ability to model task context and action sequences [6]
  - The second phase introduced behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and performance in multi-target scenarios [6]
  - The third phase, emerging in 2023, utilized Diffusion Policy methods to enhance stability and generalization by modeling action trajectories [6][7]
  - The fourth phase, starting in 2025, explores the integration of VLA models with reinforcement learning and tactile sensing to overcome current limitations [9][11][12]

Educational Initiatives
- The demand for engineering and system capabilities in embodied intelligence is increasing as the industry shifts from research to deployment, necessitating higher engineering skills [17]
- A comprehensive curriculum has been developed to cover various aspects of embodied intelligence, including practical applications and advanced topics, aimed at both beginners and advanced learners [14][20]
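The second-phase idea above, behavior cloning, is at heart supervised regression onto expert demonstrations. A minimal sketch of that idea, with a synthetic linear "expert" standing in for real demonstration data (the dimensions, noise level, and the expert itself are illustrative assumptions, not anything from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expert": action = K_true @ state + small noise.
K_true = rng.normal(size=(2, 4))            # 2-DoF action, 4-dim state
states = rng.normal(size=(500, 4))          # demonstrated states
actions = states @ K_true.T + 0.01 * rng.normal(size=(500, 2))

# Behavior cloning = supervised regression onto the expert's actions.
K_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_hat = K_hat.T

# The cloned policy matches the expert on the demonstrated distribution...
print(np.allclose(K_hat, K_true, atol=0.05))
```

Note that the cloned policy is only guaranteed to match the expert on the demonstrated state distribution; off-distribution states compound errors, which is exactly the generalization weakness the second phase is described as having.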
Tsinghua, Shanghai AI Lab, and Other Top Teams Release a Comprehensive Survey of RL for Reasoning Models
具身智能之心· 2025-09-15 00:04
Core Viewpoint
- The article discusses the significant advancements in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), emphasizing its potential to enhance reasoning and logical-thinking capabilities in AI systems through verifiable reward mechanisms and advanced optimization algorithms [4][8][19]

Group 1: Introduction to RL and LRM
- Reinforcement Learning has been a crucial method in AI development since Sutton and Barto systematized it in their 1998 textbook, enabling agents to learn in complex environments through clear reward signals [4]
- The emergence of large models has provided a new platform for RL, initially used to align models with human preferences, and now evolving toward enhancing reasoning capabilities [5][6]

Group 2: Recent Trends and Challenges
- A new trend is emerging in which researchers aim to use RL not just for preference alignment but to genuinely enhance reasoning abilities in models, leading to the development of LRM systems [5][6]
- Significant challenges remain for large-scale application of RL to LRMs, including reward design, algorithmic efficiency, and the need for substantial data and computational resources [6][8]

Group 3: Key Developments and Milestones
- Key milestones in RL for LRMs include OpenAI's o1 and DeepSeek-R1, which demonstrate the effectiveness of RL with verifiable rewards in achieving long-chain reasoning [13][15]
- The performance of models like o1 improves with additional RL training and increased computation during reasoning, indicating a new scaling path beyond pre-training [13][15]

Group 4: Foundational Components and Problems
- The foundational components of RL for LRMs include reward design, policy optimization, and sampling strategies, which are essential for enhancing model capabilities [16]
- The article also discusses foundational and controversial issues, such as the role of RL, the comparison between RL and supervised fine-tuning (SFT), and the types of rewards used [16]

Group 5: Training Resources and Applications
- Training resources for RL include static corpora, dynamic environments, and infrastructure, which need further standardization and development for effective use [16]
- The applications of RL span various tasks, including coding, agentic tasks, multimodal tasks, and robotics, showcasing its versatility [16][18]

Group 6: Future Directions
- Future research directions for RL in LLMs include continual RL, memory-based RL, and model-based RL, aiming to enhance reasoning efficiency and capabilities [18]
- The exploration of new algorithms and mechanisms is crucial for advancing RL's role on the path toward Artificial Superintelligence (ASI) [15][19]
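The "verifiable reward" mechanism the survey centers on can be made concrete with a tiny example: instead of a learned reward model, the reward is a programmatic check of the final answer. The `\boxed{...}` answer format here is an assumption borrowed from common math-RL setups, not something specified by the survey itself:

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 iff the model's final boxed answer matches the reference."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

print(verifiable_reward(r"... so the total is \boxed{42}", "42"))  # 1.0
print(verifiable_reward("I think the answer is 42", "42"))         # 0.0
```

Because the check is exact and cheap, it sidesteps the reward-hacking risks of learned reward models, which is a large part of why o1-style training scales.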
SimpleVLA-RL: Breaking Through the VLA Training Bottleneck with End-to-End Online RL
具身智能之心· 2025-09-15 00:04
Core Insights
- The article discusses the development of the SimpleVLA-RL framework, which enhances the training of Visual-Language-Action (VLA) models in robotics through reinforcement learning (RL) techniques, addressing the limitations of traditional supervised fine-tuning (SFT) methods [2][4][30]

Group 1: Research Background and Challenges
- VLA models are crucial for integrating visual perception, language understanding, and action generation in robotic control, but current training methods face significant challenges, including data scarcity and weak generalization [2][5]
- The breakthrough in large reasoning models suggests that RL can improve the sequential action planning of VLA models, but traditional RL methods are limited by manual reward design and the high cost of environment interaction [2][5]

Group 2: Contributions of SimpleVLA-RL
- SimpleVLA-RL is designed specifically for VLA, incorporating interactive trajectory sampling and multi-environment parallel rendering, which significantly reduces training costs and improves scalability [6][9]
- The framework has achieved state-of-the-art (SOTA) performance across multiple benchmarks, with notable improvements in success rates, such as LIBERO's average success rate increasing from 91.0% to 99.1% [6][12]
- SimpleVLA-RL demonstrates enhanced data efficiency, achieving a LIBERO average success rate of 96.9% with only one demonstration trajectory, surpassing traditional methods [16][17]

Group 3: Generalization and Real-World Application
- The framework shows robust generalization to unseen tasks, with significant performance improvements across scenarios, indicating that it learns transferable skills rather than overfitting to specific data [22][30]
- SimpleVLA-RL has proven effective in sim-to-real transfer, with real-world task success rates improving from 17.5% to 38.5%, validating its deployment capabilities [7][21]

Group 4: Key Discoveries
- The framework surfaces the "Pushcut" phenomenon, in which the RL-trained model autonomously develops strategies more efficient than the human demonstrations, showcasing the potential for novel robotic behaviors [24][30]
- The effectiveness of SimpleVLA-RL depends on initial model capability, with the largest gains observed when starting from a higher baseline success rate [28][29]
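The outcome-level reward idea above can be sketched in miniature: a rollout yields only a binary task-success signal, and the policy is pushed toward higher expected success. For determinism this sketch uses the exact softmax policy gradient; SimpleVLA-RL instead estimates it from sampled trajectories, and the three-armed "strategy" setup and success rates below are purely illustrative assumptions:

```python
import numpy as np

success_prob = np.array([0.2, 0.5, 0.9])  # assumed success rate per strategy
logits = np.zeros(3)                       # softmax policy parameters
lr = 1.0

for _ in range(200):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Exact gradient of expected binary success under a softmax policy:
    #   d E[r] / d logits = p * (success_prob - E[r])
    logits += lr * p * (success_prob - p @ success_prob)

print(int(np.argmax(logits)))  # 2: the highest-success strategy wins
```

The point of the sketch is that no hand-designed dense reward appears anywhere; the only learning signal is whether the episode succeeded, mirroring the framework's use of task outcome as the reward.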
Class Starts Tomorrow! Master Embodied "Brain" and "Cerebellum" Algorithms in 3 Months
具身智能之心· 2025-09-14 08:00
Core Insights
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on the interaction and adaptation of intelligent agents within physical environments [1][3]
- The development of embodied intelligence technology has progressed through various stages, from low-level perception to high-level task understanding and generalization [6][14]

Industry Analysis
- In the past two years, numerous star teams in the field of embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, transitioning from laboratories to commercial and industrial applications [3]
- Major domestic companies like Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive ecosystem for embodied intelligence, while international firms like Tesla and investment institutions support advancements in autonomous driving and warehouse robotics [5]

Technological Evolution
- The evolution of embodied intelligence technology has seen four main stages:
  1. Grasp pose detection, focusing on static object manipulation [6]
  2. Behavior cloning, enabling robots to imitate human tasks but facing challenges in generalization [6][7]
  3. Diffusion Policy methods, enhancing stability and generalization through sequence modeling [6][9]
  4. Vision-Language-Action (VLA) models, integrating multimodal capabilities for improved task execution and generalization [7][9]

Future Directions
- The industry is exploring the integration of VLA models with reinforcement learning, world models, and tactile sensing to overcome current limitations, enhancing robots' capabilities in long-term tasks and complex environments [9][11][12]
- The demand for engineering and system capabilities is increasing as embodied intelligence transitions from research to deployment, necessitating advanced training and simulation on platforms like Mujoco and IsaacGym [17]
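The fourth-stage VLA models typically expose continuous robot actions to the language model as discrete tokens. A hedged sketch of that binning interface in the style popularized by RT-2-like models (the bin count and normalized action range are illustrative assumptions, not any particular model's configuration):

```python
import numpy as np

N_BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> list[int]:
    """Map each action dimension to one of N_BINS discrete token ids."""
    clipped = np.clip(action, LOW, HIGH)
    bins = np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1))
    return bins.astype(int).tolist()

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    """Invert the discretization back to (approximate) continuous values."""
    return np.array(tokens) / (N_BINS - 1) * (HIGH - LOW) + LOW

a = np.array([0.0, -1.0, 0.5])   # e.g. a 3-DoF end-effector delta
toks = action_to_tokens(a)
recovered = tokens_to_action(toks)
print(bool(np.max(np.abs(recovered - a)) < 1e-2))  # True: small quantization error
```

This interface is what lets a pretrained vision-language backbone "speak" actions with its existing token vocabulary, at the cost of a bounded quantization error per dimension.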
The Companies Building Embodied "Brains" in China and Abroad...
具身智能之心· 2025-09-13 04:03
Core Insights
- The article surveys the emerging field of embodied intelligence, highlighting the development of general-purpose robotic "brain" systems and multimodal perception-decision systems, which are gaining significant attention from both capital and industry [2][3]

Domestic Companies
- **Xinghaitu**: Founded in 2023, focuses on developing a general embodied large model using real-world data to create robots with fine manipulation capabilities; it has completed 8 rounds of financing in less than two years. Its representative product, the WALL-A model launched in October 2024, is claimed to be the largest-parameter embodied intelligence model globally, integrating visual, language, and motion-control signals [6]
- **UBTECH**: Established in 2012, a leader in humanoid-robot commercialization with comprehensive in-house R&D. Its Thinker model, set for release in 2025, has achieved top rankings in international benchmark tests, significantly enhancing robots' perception and planning in complex environments [10]
- **Zhiyuan Robotics**: Founded in February 2023, aims to create world-class general embodied intelligent robots. Its Genie Operator-1 model, released in March 2025, integrates multimodal large-model and mixture-of-experts technologies, improving task success rates by 32% over models on the market [12]
- **Galaxy General**: Established in May 2023, focuses on multimodal large models driven by synthetic data. Its VLA model is billed as the first general embodied large model globally, using a "brain + cerebellum" collaborative framework [14]
- **Qianxun Intelligent**: Founded in 2024, a leading AI + robotics company focused on flexible-object manipulation. Its Spirit V1 VLA model is the first to tackle long-horizon manipulation of flexible objects [16]
- **Star Motion Era**: A new tech company incubated by Tsinghua University, focused on general artificial intelligence applications. Its ERA-42 model supports over 100 dynamic tasks through video training [18]
- **Zhujidongli**: Concentrates on embodied intelligent robots, developing core technologies for hardware design, whole-body motion control, and training paradigms [20]

International Companies
- **Figure AI**: Focuses on embodied manipulation algorithms, enhancing data training and algorithm performance through video-generation technology [17]
- **Physical Intelligence**: Founded in January 2023, aims to develop advanced intelligent software for various robots. Its π0 model, released in October 2024, is a universal robot foundation model [22]
- **Google DeepMind**: Merged with Google Brain in 2023, focused on general artificial intelligence research. Its Gemini Robotics model can control robots to perform complex tasks without task-specific training [20]
- **Skild AI**: A leading US robot-"brain" company aiming to create a universal robot operating system that enables intelligent operation across various scenarios [26]
No One in My Group Works on Embodied AI, So My Advisor Sent Me to Scout the Pitfalls First...
具身智能之心· 2025-09-12 16:03
Core Viewpoint
- The article emphasizes the importance of building a solid foundation in hardware and algorithms for embodied intelligence research, particularly for newcomers to the field [1][12]

Group 1: Research Guidance
- Teams without prior experience in embodied intelligence should start with simpler robotic-arm tasks before tackling more complex humanoid robots [1]
- Those with a large-model background should focus on specific downstream tasks like VLA (Vision-Language-Action) and VLN (Vision-Language Navigation) to bridge the gap between theory and practical applications [1]
- It is advised to solidify knowledge of reinforcement learning before attempting to develop humanoid robots, as this area is still underdeveloped in the domestic market [1]

Group 2: Community and Resources
- The "Embodied Intelligence Knowledge Planet" community serves as a comprehensive knowledge-sharing platform, with nearly 2,000 members and a goal of 10,000 within two years [3][12]
- The community provides technical roadmaps, Q&A, and job opportunities, making it valuable for both beginners and advanced researchers [4][13]
- Members can access over 30 technical roadmaps, open-source projects, and datasets related to embodied intelligence [12][26]

Group 3: Technical Insights
- The community addresses practical issues such as data collection, model deployment, and the challenges of sim-to-real transfer in robotics [4][5]
- It offers insights into various models and frameworks, including VLA and reinforcement learning, and discusses their applications in robotic tasks [5][6]
- The community also organizes forums and live discussions to keep members updated on the latest trends and challenges in the field [4][11]
Once We Actually Started on VLA, It Turned Out to Be Really Hard...
具身智能之心· 2025-09-12 12:02
Core Insights
- The Vision-Language-Action (VLA) model represents a new paradigm in embodied intelligence, enabling robots to generate executable actions from language instructions and visual signals, enhancing their adaptability to complex environments [1][3]
- VLA breaks the traditional single-task limitation, allowing robots to make autonomous decisions in diverse scenarios, with applications in manufacturing, logistics, and home services [3]
- The VLA model has become a research hotspot, driving collaboration between academia and industry, with cutting-edge projects like pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA emerging [3][5]

Industry Development
- The embodied intelligence sector is growing robustly, with teams like Unitree, Zhiyuan, Xinghaitu, Galaxy General, and Zhujidongli transitioning from laboratories to commercialization [5]
- Major tech companies such as Huawei, JD.com, and Tencent are actively investing in the field, alongside international firms like Tesla and Figure AI [5]

Educational Initiatives
- A specialized VLA research-guidance course has been launched to help individuals quickly enter or transition into VLA research, addressing the complexity of the related systems and frameworks [5]
- The course centers on the perception-cognition-action loop, providing a comprehensive understanding of VLA's theoretical foundations and practical applications [7][8]

Technical Evolution
- The course analyzes the technical evolution of the VLA paradigm, from early grasp pose detection to recent advances like Diffusion Policy and multimodal foundation models [8]
- It also explores core challenges in embodied intelligence, such as cross-domain generalization and long-horizon planning, while integrating large language models with robotic control systems [9]

Course Structure and Outcomes
- The curriculum emphasizes full-chain training, covering theoretical foundations, simulation-environment setup, experimental design, and paper writing [15]
- Students gain skills in academic research methodology, including literature review, innovation extraction, and identifying valuable research directions [15]
- The course aims to help students develop research ideas, conduct preliminary experiments, and produce high-quality academic papers [15]
A New Paradigm! LLaDA-VLA: The First VLA Model Based on a Large Language Diffusion Model
具身智能之心· 2025-09-12 00:05
Core Viewpoint
- The article discusses advancements in Vision-Language Models (VLMs) and introduces LLaDA-VLA, the first Vision-Language-Action model built on large language diffusion models, which demonstrates superior multi-task performance in robotic action generation [1][5][19]

Group 1: Introduction to LLaDA-VLA
- LLaDA-VLA integrates Masked Diffusion Models (MDMs) into robotic action generation, fine-tuning a pre-trained multimodal large language diffusion model and enabling parallel prediction of action trajectories [5][19]
- The model architecture consists of three core modules: a vision encoder for RGB feature extraction, a language-diffusion backbone for integrating visual and language information, and a projector mapping visual features into the language token space [10][7]

Group 2: Key Technical Innovations
- Two major breakthroughs are highlighted:
  - Localized Special-token Classification (LSC), which reduces cross-domain transfer difficulty by classifying only action-related special tokens, improving training efficiency [8][12]
  - Hierarchical Action-Structured Decoding (HAD), which explicitly models hierarchical dependencies between actions, yielding smoother and more reasonable trajectories [9][13]

Group 3: Performance Evaluation
- LLaDA-VLA outperforms state-of-the-art methods across environments including SimplerEnv, CALVIN, and the real WidowX robot, achieving significant improvements in success rates and task-completion metrics [4][21]
- In specific task evaluations, LLaDA-VLA achieved an average success rate of 58% across multiple tasks, surpassing previous models [15]

Group 4: Experimental Results
- The model demonstrated notable increases in task-completion rates and average task lengths over baseline models, validating the proposed LSC and HAD strategies [18][14]
- In a comparative analysis, LLaDA-VLA achieved a success rate of 95.6% on one task, significantly higher than other models [14][18]

Group 5: Research Significance and Future Directions
- LLaDA-VLA establishes a solid foundation for applying large language diffusion models to robotic operation, paving the way for future research in this domain [19][21]
- Its design strategies not only enhance model performance but also open new avenues of exploration in embodied intelligence [19]
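The parallel trajectory prediction described above can be sketched as iterative unmasking: every action token starts masked, and each decoding step reveals the most confident positions in parallel until none remain. The toy scoring "model" below is a stand-in assumption for illustration, not the LLaDA-VLA backbone:

```python
import numpy as np

MASK = -1
rng = np.random.default_rng(0)

def toy_model(tokens):
    """Stand-in for the diffusion backbone: for every still-masked position,
    return a (prediction, confidence) pair. Predictions here are dummies."""
    preds = {i: i * 10 for i, t in enumerate(tokens) if t == MASK}
    confs = {i: rng.random() for i in preds}
    return preds, confs

def masked_diffusion_decode(length=8, steps=4):
    """Reveal length/steps of the most confident masked positions per step."""
    tokens = [MASK] * length
    per_step = length // steps
    for _ in range(steps):
        preds, confs = toy_model(tokens)
        for i in sorted(confs, key=confs.get, reverse=True)[:per_step]:
            tokens[i] = preds[i]
    return tokens

out = masked_diffusion_decode()
print(MASK not in out)  # True: every position decoded in 4 parallel steps
```

Because several tokens are committed per step, the number of model calls scales with the step count rather than the sequence length, which is the efficiency argument for diffusion-style decoding over token-by-token autoregression.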