具身智能之心
Partner Recruitment: Join 具身智能之心 in Building the Platform and Community
具身智能之心· 2025-09-15 05:00
**Core Insights**
- The article emphasizes the rapid growth of, and demand in, the field of embodied intelligence, highlighting the need for collaboration and contribution to the industry [1]
- The company aims to establish itself as a valuable platform rather than just a media outlet, inviting influential figures in the field to collaborate on various projects [1]

**Collaboration Areas**
- **Open Source Projects**: Building globally impactful open-source projects together with others in the field [2]
- **Consulting Services**: Providing B2B and B2C consulting on embodied data, robot bodies, algorithms, and deployment to support industry upgrades and talent development [3]
- **Course Development**: Developing courses that benefit beginners and advance the industry, covering individual training, corporate training, and academic curriculum development [5]
- **Hardware Development**: Creating affordable, user-friendly research platforms for developers and beginners in the field [6]

**Job Requirements**
- The company is looking for individuals with relevant engineering experience or a PhD or higher, with both full-time and part-time positions available [7]

**Compensation and Resources**
- The company offers competitive industry compensation and access to industry resources for collaborators [8]

**Contact Information**
- Interested individuals are encouraged to reach out via WeChat for further inquiries [9]
Embodied Intelligence Open-Source Week: Shanghai AI Laboratory Accelerates Robot Training and Applications
具身智能之心· 2025-09-15 00:04
**Core Insights**
- The article discusses advancements in embodied intelligence technology led by the Shanghai AI Laboratory, focusing on the open-source release of the Intern-Robotics engine, which aims to move the field from fragmented development to full-stack production [3]

**Group 1: Technological Developments**
- The Shanghai AI Laboratory has launched the Intern-Robotics engine, comprising simulation, data, and training engines; related models and datasets have exceeded 140,000 downloads [3]
- A series of technical advances built on Intern-Robotics covers navigation, manipulation, humanoid robot motion models, and datasets, with open-source releases starting September 14, 2025 [3][4]

**Group 2: Navigation Models**
- The InternVLA N1 navigation model integrates long-range spatial reasoning with agile execution, achieving internationally leading scores on six mainstream benchmarks and demonstrating zero-shot generalization across scenes and embodiments at a continuous reasoning rate of 60 Hz [6]

**Group 3: Manipulation Models**
- The InternVLA M1 manipulation model closes a complete "perception-planning-action" feedback loop, enhancing spatial reasoning and planning through a two-stage training strategy [11]
- The InternVLA A1 model targets multi-robot collaboration in highly dynamic scenes, outperforming existing models in real-world evaluations [12]

**Group 4: Reinforcement Learning**
- The VLAC model improves reinforcement learning efficiency in real-world scenarios by providing continuous, reliable supervision signals that effectively distinguish normal from abnormal behaviors [13]
**Group 5: Datasets and Evaluation**
- The InternScenes dataset contains approximately 40,000 indoor scenes and 1.96 million 3D objects, far surpassing existing datasets and providing a solid foundation for research in embodied and spatial intelligence [15]
- The OmniWorld dataset integrates large-scale synthetic data with multi-source heterogeneous data, advancing world modeling and markedly improving performance on tasks such as reconstruction and rendering [16]

**Group 6: Community and Support**
- The article promotes an embodied intelligence community offering academic support, technical discussion, and collaboration opportunities for developers and researchers in the field [20][21]
Classes Officially Begin! The Embodied "Brain" and "Cerebellum" Algorithms and Hands-On Tutorial Is Here
具身智能之心· 2025-09-15 00:04
**Core Insights**
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1][3]
- Embodied intelligence technology has evolved through several stages, from low-level perception to high-level task understanding and generalization [6][14]

**Industry Analysis**
- Over the past two years, numerous star teams have emerged in the field, founding valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli and moving from the laboratory to commercial and industrial applications [3]
- Major domestic companies such as Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive embodied intelligence ecosystem, while international players such as Tesla and investment firms back advances in autonomous driving and warehouse robotics [5]

**Technological Evolution**
- The first phase focused on grasp pose detection, which could not model task context or action sequences [6]
- The second phase introduced behavior cloning, letting robots learn from expert demonstrations but exposing weaknesses in generalization and in multi-target scenarios [6]
- The third phase, emerging in 2023, used Diffusion Policy methods to improve stability and generalization by modeling action trajectories [6][7]
- The fourth phase, starting in 2025, explores integrating VLA models with reinforcement learning and tactile sensing to overcome current limitations [9][11][12]

**Educational Initiatives**
- As the industry shifts from research to deployment, demand for engineering and system capabilities in embodied intelligence is rising [17]
- A comprehensive curriculum has been developed to cover embodied intelligence from practical applications to advanced topics, aimed at both beginners and advanced learners [14][20]
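To make the second-phase idea concrete: behavior cloning is just supervised regression from observed states to expert actions. A minimal 1-D sketch follows; the expert, the linear policy, and all data are invented purely for illustration.

```python
import random

random.seed(0)

# Invented "expert": in this 1-D toy the expert's action is a fixed
# linear function of the state, action = 2.5 * state.
W_TRUE = 2.5
states = [random.uniform(-1, 1) for _ in range(100)]
actions = [W_TRUE * s for s in states]          # expert demonstrations

# Behavior cloning = supervised regression on (state, action) pairs.
# For a linear policy a = w * s, least squares has the closed form
# w = sum(s*a) / sum(s*s).
w = sum(s * a for s, a in zip(states, actions)) / sum(s * s for s in states)

def cloned_policy(state):
    return w * state

print(abs(cloned_policy(0.3) - W_TRUE * 0.3) < 1e-6)  # imitates the expert

# The cloned policy matches the expert only on the demonstrated
# distribution; behavior far from the demonstrations is exactly the
# generalization weakness noted above.
```

The weakness the article points to shows up when states drift outside the demonstrated distribution, which is what the later Diffusion Policy and RL phases try to address.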
Top Teams from Tsinghua, Shanghai AI Lab, and Others Release a Comprehensive Survey on RL for Reasoning Models
具身智能之心· 2025-09-15 00:04
**Core Viewpoint**
- The article surveys significant advances in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), emphasizing its potential to strengthen reasoning and logical thinking in AI systems through verifiable reward mechanisms and advanced optimization algorithms [4][8][19]

**Group 1: Introduction to RL and LRMs**
- RL has been a crucial method in AI development since Sutton and Barto's 1998 formulation, enabling agents to learn in complex environments from clear reward signals [4]
- Large models gave RL a new platform: it was first used to align models with human preferences and is now evolving toward enhancing reasoning capabilities [5][6]

**Group 2: Recent Trends and Challenges**
- A new trend is emerging in which researchers use RL not just for alignment but to genuinely enhance reasoning abilities, driving the development of LRM systems [5][6]
- Significant challenges remain for large-scale application of RL to LRMs, including reward design, algorithmic efficiency, and the need for substantial data and computational resources [6][8]

**Group 3: Key Developments and Milestones**
- Milestones such as OpenAI's o1 and DeepSeek-R1 demonstrate that RL with verifiable rewards can achieve long-chain reasoning capabilities [13][15]
- Models like o1 improve with additional RL training and more computation at inference time, indicating a scaling path beyond pre-training [13][15]

**Group 4: Foundational Components and Problems**
- The foundational components of RL for LRMs are reward design, policy optimization, and sampling strategies, all essential for enhancing model capabilities [16]
- The survey also treats foundational and controversial issues, such as the role of RL, the comparison between RL and supervised fine-tuning (SFT), and the types of rewards used [16]

**Group 5: Training Resources and Applications**
- Training resources include static corpora, dynamic environments, and infrastructure, which need further standardization and development [16]
- Applications of RL span coding, agentic tasks, multimodal tasks, and robotics, showcasing its versatility [16][18]

**Group 6: Future Directions**
- Future directions include continual RL, memory-based RL, and model-based RL, aiming to enhance reasoning efficiency and capability [18]
- Exploring new algorithms and mechanisms is crucial for advancing RL's role on the path toward Artificial Superintelligence (ASI) [15][19]
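The verifiable-reward idea at the center of the survey can be sketched in miniature: instead of a learned reward model, the reward is a programmatic check of the final answer. Below is a toy REINFORCE loop over a two-candidate policy; the prompt, candidate answers, and learning rate are all invented for illustration (real RLVR pipelines operate on LLM token sequences, not a two-arm bandit).

```python
import math
import random

random.seed(0)

# Toy "problem": the model must answer "4" to the prompt "2+2".
# The verifiable reward is a programmatic check, not a learned model.
answers = ["3", "4"]

def reward(ans: str) -> float:
    return 1.0 if ans == "4" else 0.0

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

logits = [0.0, 0.0]   # policy parameters over the two candidate answers
lr = 0.5

for _ in range(200):
    p = softmax(logits)
    # sample an answer from the current policy
    i = 0 if random.random() < p[0] else 1
    r = reward(answers[i])
    # REINFORCE: grad of log pi(i) w.r.t. logits is one_hot(i) - p
    for j in range(len(logits)):
        logits[j] += lr * r * ((1.0 if j == i else 0.0) - p[j])

print(softmax(logits)[1])  # policy now strongly prefers the verified answer
```

Only rewarded samples move the parameters here, which is why reward design and sampling strategy are listed as foundational components: with sparse or unverifiable rewards, the gradient signal above simply vanishes.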
SimpleVLA-RL: Breaking Through the VLA Model Training Bottleneck with End-to-End Online RL Training
具身智能之心· 2025-09-15 00:04
**Core Insights**
- The article presents the SimpleVLA-RL framework, which improves the training of Vision-Language-Action (VLA) models for robotics with reinforcement learning (RL), addressing the limitations of traditional supervised fine-tuning (SFT) [2][4][30]

**Group 1: Research Background and Challenges**
- VLA models are crucial for integrating visual perception, language understanding, and action generation in robotic control, but current training methods face significant challenges, including data scarcity and weak generalization [2][5]
- Breakthroughs in large reasoning models suggest RL can improve the sequential action planning of VLA models, yet traditional RL methods are limited by manual reward design and the high cost of environment interaction [2][5]

**Group 2: Contributions of SimpleVLA-RL**
- SimpleVLA-RL is designed specifically for VLA, incorporating interactive trajectory sampling and multi-environment parallel rendering, which significantly reduce training costs and improve scalability [6][9]
- The framework achieves state-of-the-art (SOTA) performance across multiple benchmarks, for example raising LIBERO's average success rate from 91.0% to 99.1% [6][12]
- It is highly data-efficient, reaching a 96.9% LIBERO average success rate from a single demonstration trajectory, surpassing traditional methods [16][17]

**Group 3: Generalization and Real-World Application**
- The framework generalizes robustly to unseen tasks, with significant gains across scenarios, indicating that it learns general skills rather than overfitting to specific data [22][30]
- SimpleVLA-RL is effective in sim-to-real transfer, improving real-world task success rates from 17.5% to 38.5% and validating its deployability [7][21]

**Group 4: Key Discoveries**
- RL training produced the "Pushcut" phenomenon, in which the model autonomously develops strategies more efficient than the human demonstrations, showing the potential for novel robotic behaviors [24][30]
- The gains depend on the initial model's capabilities, with the largest improvements observed when starting from a higher baseline success rate [28][29]
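The multi-environment parallel rollout idea can be illustrated with a skeleton: a batch of environments is stepped in lockstep under one policy, and each yields a trajectory for the RL update. Everything below (the environment, the 5-step horizon, the placeholder policy) is an invented stand-in, not the SimpleVLA-RL implementation.

```python
import random

random.seed(0)

class ToyEnv:
    """Invented stand-in for one rendered manipulation environment."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0.0                      # dummy observation
    def step(self, action):
        self.t += 1
        done = self.t >= 5              # fixed 5-step episodes
        reward = 1.0 if done and action > 0 else 0.0  # sparse success signal
        return 0.0, reward, done

def policy(obs):
    return random.choice([-1, 1])       # placeholder stochastic policy

def collect_rollouts(n_envs=8):
    """Step a batch of environments in lockstep and return one
    trajectory (a list of (obs, action, reward) tuples) per environment."""
    envs = [ToyEnv() for _ in range(n_envs)]
    obs = [e.reset() for e in envs]
    trajs = [[] for _ in range(n_envs)]
    active = list(range(n_envs))
    while active:
        still = []
        for i in active:
            a = policy(obs[i])
            o, r, done = envs[i].step(a)
            trajs[i].append((obs[i], a, r))
            obs[i] = o
            if not done:
                still.append(i)
        active = still
    return trajs

trajs = collect_rollouts()
print(len(trajs), len(trajs[0]))  # 8 trajectories of 5 steps each
```

Batching rollouts this way is what amortizes the rendering cost the article highlights: the expensive simulation step is shared across many trajectories per policy update.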
Classes Start Tomorrow! Master the Embodied "Brain" + "Cerebellum" Algorithms in Three Months
具身智能之心· 2025-09-14 08:00
**Core Insights**
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1][3]
- Embodied intelligence technology has progressed through several stages, from low-level perception to high-level task understanding and generalization [6][14]

**Industry Analysis**
- Over the past two years, numerous star teams have emerged in the field, founding valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli and moving from the laboratory to commercial and industrial applications [3]
- Major domestic companies such as Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive embodied intelligence ecosystem, while international firms such as Tesla and investment institutions back advances in autonomous driving and warehouse robotics [5]

**Technological Evolution**
1. Grasp pose detection, focused on static object manipulation [6]
2. Behavior cloning, enabling robots to imitate human tasks but facing challenges in generalization [6][7]
3. Diffusion Policy methods, enhancing stability and generalization through sequence modeling [6][9]
4. Vision-Language-Action (VLA) models, integrating multimodal capabilities for improved task execution and generalization [7][9]

**Future Directions**
- The industry is exploring the integration of VLA models with reinforcement learning, world models, and tactile sensing to overcome current limitations and strengthen robots' performance on long-horizon tasks in complex environments [9][11][12]
- As embodied intelligence moves from research to deployment, demand for engineering and system capabilities is rising, requiring advanced training and simulation on platforms such as MuJoCo and IsaacGym [17]
The Companies at Home and Abroad Building Embodied "Brains"...
具身智能之心· 2025-09-13 04:03
**Core Insights**
- The article surveys the emerging field of embodied intelligence, highlighting general-purpose robotic "brain" systems and multimodal perception-decision systems, which are drawing significant attention from both capital and industry [2][3]

**Domestic Companies**
- **Xinghaitu (星海图)**: Founded in 2023, develops a general embodied large model trained on real-world data to give robots fine manipulation capabilities; it has completed 8 financing rounds in under two years. Its representative WALL-A model, launched in October 2024, is claimed to be the largest-parameter embodied intelligence model in the world, integrating visual, language, and motion-control signals [6]
- **UBTECH**: Established in 2012, a leader in humanoid robot commercialization with comprehensive in-house R&D. Its Thinker model, slated for release in 2025, has topped international benchmarks, significantly enhancing robots' perception and planning in complex environments [10]
- **ZhiYuan Robotics**: Founded in February 2023 to build world-class general embodied intelligent robots. Its Genie Operator-1 model, released in March 2025, combines multimodal large models with mixture-of-experts technology, improving task success rates by 32% over models on the market [12]
- **Galaxy General**: Established in May 2023, focuses on multimodal large models driven by synthetic data. Its VLA model is described as the first general embodied large model in the world, using a "brain + cerebellum" collaborative framework [14]
- **Qianxun Intelligent**: Founded in 2024, a leading AI + robotics company focused on flexible-object manipulation. Its Spirit V1 VLA model is the first to tackle long-horizon manipulation of flexible objects [16]
- **Star Motion Era**: A new tech company incubated by Tsinghua University, focused on general artificial intelligence applications. Its ERA-42 model supports over 100 dynamic tasks through video training [18]
- **Zhujidongli (逐际动力)**: Concentrates on embodied intelligent robots, developing core technologies for hardware design, whole-body motion control, and training paradigms [20]

**International Companies**
- **Figure AI**: Focuses on embodied intelligence manipulation algorithms, enhancing data training and algorithm performance through video-generation technology [17]
- **Physical Intelligence**: Founded in January 2023 to develop advanced intelligence software for a range of robots. Its π0 model, released in October 2024, is a general-purpose robot foundation model [22]
- **Google DeepMind**: Merged with Google Brain in 2023, focuses on general artificial intelligence research. Its Gemini Robotics model can control robots to perform complex tasks without task-specific training [20]
- **Skild AI**: A leading US robot-"brain" company aiming to build a universal robot operating system that enables intelligent operation across scenarios [26]
No One in My Group Works on Embodied AI, So My Advisor Sent Me Ahead to Scout the Pitfalls...
具身智能之心· 2025-09-12 16:03
**Core Viewpoint**
- The article emphasizes building a solid foundation in hardware and algorithms for embodied intelligence research, particularly for newcomers to the field [1][12]

**Group 1: Research Guidance**
- Teams without prior embodied intelligence experience should start with simpler robotic-arm tasks before tackling more complex humanoid robots [1]
- Those with a large-model background should focus on specific downstream tasks such as VLA (Vision-Language-Action) and VLN (Vision-Language Navigation) to bridge theory and practical application [1]
- Researchers are advised to solidify their reinforcement learning knowledge before attempting to develop humanoid robots, as this area is still underdeveloped in the domestic market [1]

**Group 2: Community and Resources**
- The "Embodied Intelligence Knowledge Planet" community serves as a comprehensive knowledge-sharing platform, with nearly 2,000 members and a goal of 10,000 within two years [3][12]
- The community provides technical roadmaps, Q&A, and job opportunities, making it valuable for both beginners and advanced researchers [4][13]
- Members can access over 30 technical roadmaps, open-source projects, and datasets related to embodied intelligence [12][26]

**Group 3: Technical Insights**
- The community addresses practical issues such as data collection, model deployment, and the challenges of sim-to-real transfer in robotics [4][5]
- It offers insights into various models and frameworks, including VLA and reinforcement learning, and discusses their applications in robotic tasks [5][6]
- The community also organizes forums and live discussions to keep members current on trends and challenges in the embodied intelligence field [4][11]
After Getting Ready to Start VLA Research, I Found It Really Is Hard...
具身智能之心· 2025-09-12 12:02
**Core Insights**
- The Vision-Language-Action (VLA) model represents a new paradigm in embodied intelligence, enabling robots to generate executable actions from language instructions and visual signals and improving their adaptability to complex environments [1][3]
- VLA breaks the traditional single-task limitation, allowing robots to make autonomous decisions across diverse scenarios in manufacturing, logistics, and home services [3]
- VLA has become a research hotspot driving collaboration between academia and industry, with cutting-edge projects such as pi0, RT-2, OpenVLA, QUAR-VLA, and HumanVLA emerging [3][5]

**Industry Development**
- The embodied intelligence sector is growing robustly, with teams such as Unitree, Zhiyuan, Xinghaitu, Galaxy General, and Zhujidongli transitioning from laboratories to commercialization [5]
- Major tech companies such as Huawei, JD.com, and Tencent are actively investing in the field, alongside international firms like Tesla and Figure AI [5]

**Educational Initiatives**
- A specialized VLA research guidance course has been launched to help individuals enter or transition into VLA research quickly, given the complexity of the related systems and frameworks [5]
- The course is organized around the perception-cognition-action loop, providing a comprehensive understanding of VLA's theoretical foundations and practical applications [7][8]

**Technical Evolution**
- The course analyzes the technical evolution of the VLA paradigm, from early grasp pose detection to recent advances such as Diffusion Policy and multimodal foundation models [8]
- It also explores core challenges in embodied intelligence, such as cross-domain generalization and long-horizon planning, while integrating large language models with robotic control systems [9]

**Course Structure and Outcomes**
- The curriculum takes a full-chain approach, covering theoretical foundations, simulation environment setup, experimental design, and paper writing [15]
- Students learn academic research methodology, including literature review, extracting innovations, and identifying valuable research directions [15]
- The course aims to help students develop research ideas, conduct preliminary experiments, and produce high-quality academic papers [15]
A Brand-New Paradigm! LLaDA-VLA: The First VLA Model Built on a Large Language Diffusion Model
具身智能之心· 2025-09-12 00:05
**Core Viewpoint**
- The article discusses advances in Vision-Language Models (VLMs) and introduces LLaDA-VLA, the first Vision-Language-Action model developed on large language diffusion models, which demonstrates superior multi-task performance in robotic action generation [1][5][19]

**Group 1: Introduction to LLaDA-VLA**
- LLaDA-VLA integrates Masked Diffusion Models (MDMs) into robotic action generation, fine-tuning a pre-trained multimodal large language diffusion model and enabling parallel prediction of action trajectories [5][19]
- The architecture consists of three core modules: a vision encoder for RGB feature extraction, a language diffusion backbone that fuses visual and language information, and a projector that maps visual features into the language token space [10][7]

**Group 2: Key Technical Innovations**
- Localized Special-token Classification (LSC) reduces cross-domain transfer difficulty by classifying only action-related special tokens, improving training efficiency [8][12]
- Hierarchical Action-Structured Decoding (HAD) explicitly models the hierarchical dependencies between actions, yielding smoother and more reasonable generated trajectories [9][13]

**Group 3: Performance Evaluation**
- LLaDA-VLA outperforms state-of-the-art methods across environments including SimplerEnv, CALVIN, and the real WidowX robot, achieving significant improvements in success rates and task-completion metrics [4][21]
- In task-level evaluations, LLaDA-VLA achieved an average success rate of 58% across multiple tasks, surpassing previous models [15]

**Group 4: Experimental Results**
- The model showed notable increases in task-completion rates and average task lengths over baseline models, validating the effectiveness of the proposed LSC and HAD strategies [18][14]
- In one comparative analysis, LLaDA-VLA achieved a success rate of 95.6% on a specific task, significantly higher than other models [14][18]

**Group 5: Research Significance and Future Directions**
- LLaDA-VLA establishes a solid foundation for applying large language diffusion models to robotic manipulation, paving the way for future research in this domain [19][21]
- Its design strategies not only enhance model performance but also open new avenues of exploration in embodied intelligence [19]
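To make the masked-diffusion decoding idea concrete: unlike left-to-right autoregression, an MDM starts from a fully masked sequence and fills in tokens over a few parallel denoising steps, committing its most confident predictions each round. The sketch below uses an invented "model" that already knows the target sequence and emits random confidences, so only the commit-and-remask mechanics are real; it is not the LLaDA-VLA decoder.

```python
import random

random.seed(0)
MASK = "<M>"
target = ["open", "gripper", "move", "left", "close"]  # invented action tokens

def toy_model(seq):
    """Stand-in for the diffusion backbone: for each masked position,
    return a (token, confidence) guess. Here it 'knows' the target and
    emits a random confidence, so only the ranking matters."""
    return {i: (target[i], random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def mdm_decode(length, steps=3):
    seq = [MASK] * length
    per_step = -(-length // steps)          # ceil: tokens to commit per step
    for _ in range(steps):
        guesses = toy_model(seq)
        if not guesses:
            break
        # commit the highest-confidence predictions, leave the rest masked
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            seq[i] = guesses[i][0]
    return seq

print(mdm_decode(5))  # all 5 tokens filled in 3 parallel denoising steps
```

This parallel fill-in is what the article's "parallel action trajectory prediction" refers to; HAD can be read as constraining which positions are eligible for commitment at each round so that higher-level actions are fixed before the details that depend on them.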