Mimicking the Brain's Functional Specialization! Fast-in-Slow VLA Unifies "Fast Action" and "Slow Reasoning"
具身智能之心· 2025-07-13 09:48
Core Viewpoint
- The article introduces Fast-in-Slow (FiS-VLA), a novel dual-system vision-language-action model that unifies high-frequency response and complex reasoning in robotic control, delivering significant gains in control frequency and task success rates [5][29].

Group 1: Model Overview
- FiS-VLA combines a fast execution module with a pre-trained vision-language model (VLM), achieving a control frequency of up to 117.7 Hz, significantly higher than existing mainstream solutions [5][25].
- The dual-system architecture is inspired by Kahneman's dual-system theory: System 1 handles rapid, intuitive decision-making, while System 2 performs slower, deeper reasoning [9][14].

Group 2: Architecture and Design
- The architecture comprises a visual encoder, a lightweight 3D tokenizer, and a large language model (LLaMA2-7B), with the last few transformer layers repurposed as the execution module [13].
- The two systems consume heterogeneous input modalities: System 2 processes 2D images and language instructions, while System 1 takes real-time sensory inputs, including 2D images and 3D point cloud data [15].

Group 3: Performance and Testing
- In simulation tests, FiS-VLA achieved an average success rate of 69% across various tasks, outperforming models such as CogACT and π0 [18].
- Real-world testing on robotic platforms showed success rates of 68% and 74% on different task suites, demonstrating superior performance in high-precision control scenarios [20].
- The model generalized robustly, with a smaller accuracy decline than baselines when facing unseen objects and varying environmental conditions [23].

Group 4: Training and Optimization
- FiS-VLA employs a dual-system collaborative training strategy, enhancing System 1's action generation through diffusion modeling while retaining System 2's reasoning capabilities [16].
- Ablation studies indicate that System 1 performs best when sharing two transformer layers, and that the optimal operating-frequency ratio between the two systems is 1:4 [25].

Group 5: Future Prospects
- The authors suggest that future enhancements could include dynamic adjustment of the shared structure and of the collaborative frequency strategy, further improving adaptability and robustness in practical applications [29].
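The summary describes a fast System 1 acting on every control tick while a slow System 2 refreshes its reasoning at a quarter of that rate (the reported 1:4 ratio). A minimal sketch of such a dual-frequency loop is shown below; the module names, the latent-plan hand-off, and the string-valued actions are all hypothetical stand-ins for illustration, not the authors' implementation:

```python
# Toy dual-system control loop with a 1:4 slow:fast frequency ratio,
# as described for FiS-VLA. All names here are illustrative inventions.

class SlowSystem2:
    """Slow reasoning path: consumes instruction + observation, emits a latent plan."""
    def plan(self, instruction, observation):
        return {"instruction": instruction, "context": observation}

class FastSystem1:
    """Fast execution path: turns the latest latent plan + fresh sensing into an action."""
    def act(self, latent, observation):
        return f"action(ctx={latent['instruction']}, obs={observation})"

def control_loop(steps, ratio=4):
    """Run System 1 every tick; refresh System 2's latent every `ratio` ticks."""
    s1, s2 = FastSystem1(), SlowSystem2()
    latent, actions, s2_calls = None, [], 0
    for t in range(steps):
        obs = f"obs_{t}"
        if t % ratio == 0:                   # slow path fires at 1/ratio frequency
            latent = s2.plan("pick up the cup", obs)
            s2_calls += 1
        actions.append(s1.act(latent, obs))  # fast path fires on every tick
    return actions, s2_calls

actions, s2_calls = control_loop(steps=8, ratio=4)
print(len(actions), s2_calls)  # 8 fast-path actions, 2 slow-path updates
```

In the real model the slow path would be the VLM's reasoning over images and language and the fast path the shared-layer execution module; the sketch only shows the scheduling relationship between the two.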
The MuJoCo Course Starts Tomorrow! From Zero Basics to Reinforcement Learning to sim2real
具身智能之心· 2025-07-13 09:48
Core Viewpoint
- The article discusses the unprecedented advances in AI, particularly embodied intelligence, which is transforming the relationship between humans and machines. Major tech companies are competing in this revolutionary field, which could significantly impact industries such as manufacturing, healthcare, and space exploration [1][2].

Group 1: Embodied Intelligence
- Embodied intelligence is characterized by machines that can understand language commands, navigate complex environments, and make intelligent decisions in real time [1].
- Leading companies such as Tesla, Boston Dynamics, OpenAI, and Google are actively developing technologies in this area, underscoring the need for AI systems to have both a "brain" and a "body" [1][2].

Group 2: Technical Challenges
- Achieving true embodied intelligence poses significant technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2][4].
- MuJoCo (Multi-Joint dynamics with Contact) is highlighted as a key technology for overcoming these challenges, serving as a high-fidelity training environment for robot learning [4][6].

Group 3: MuJoCo's Role
- MuJoCo is more than a physics simulation engine; it acts as a crucial bridge between the virtual and real worlds, letting robots learn complex motor skills without risking expensive hardware [4][6].
- Its advantages include simulation speeds hundreds of times faster than real time, the ability to run millions of trials in a virtual environment, and successful transfer of learned strategies to the real world through domain randomization [6][8].

Group 4: Research and Development
- Numerous cutting-edge robotics studies and projects are built on MuJoCo, with major firms such as Google, OpenAI, and DeepMind using it in their research [8].
- Mastery of MuJoCo puts researchers and engineers at the forefront of embodied-intelligence technology, positioning them to participate in this technological revolution [8].

Group 5: Practical Training
- A comprehensive MuJoCo development course has been created, covering both theoretical knowledge and practical applications across the embodied-intelligence technology stack [9][11].
- The course runs six weeks, each with specific learning objectives and a practical project, ensuring a solid grasp of the key technical points [15][17].

Group 6: Course Projects
- The course includes six progressively challenging projects, such as building a smart robotic arm, implementing a vision-guided grasping system, and developing a multi-robot collaboration system [19][27].
- Each project reinforces theoretical concepts through hands-on experience, so participants understand both the "how" and the "why" behind the technologies [30][32].

Group 7: Career Development
- Completing the course equips participants with a complete embodied-intelligence technology stack, strengthening their technical, engineering, and innovation capabilities [31][33].
- Potential career paths include robotics algorithm engineer, AI research engineer, and product manager, with salaries ranging from 300,000 to 1,500,000 CNY depending on role and company [34].
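The workflow the course summary describes (build a virtual robot, run it in simulation, transfer to hardware) starts from an MJCF model file, MuJoCo's XML format. Below is a minimal illustrative model, a single hinge pendulum with one motor; the model itself is invented for this sketch and is not course material. The snippet only builds and parses the XML with the standard library; with the `mujoco` Python bindings installed, such a string could be loaded via `mujoco.MjModel.from_xml_string(...)` and stepped with `mujoco.mj_step`:

```python
# A minimal MJCF model of the kind MuJoCo consumes: one hinge pendulum
# with one motor. Purely illustrative; parsed here with the stdlib only.
import xml.etree.ElementTree as ET

MJCF = """
<mujoco model="pendulum">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <body name="pole" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom name="rod" type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1" ctrlrange="-1 1"/>
  </actuator>
</mujoco>
"""

root = ET.fromstring(MJCF)
print(root.tag, root.get("model"))  # mujoco pendulum
```

The small `timestep` is what allows the simulation speeds the article mentions: each step is cheap, so millions of virtual trials can run far faster than real time.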
A Leading Internet Company's Embodied-Intelligence Lab Is Hiring: Algorithm Roles in Multimodal Large Models, Robotic Multimodal Interaction, Reinforcement Learning, and More
具身智能之心· 2025-07-13 05:03
Core Viewpoint
- The company is recruiting for various positions in embodied intelligence, focusing on multimodal large models, robotic multimodal interaction, and reinforcement learning, reflecting a strong emphasis on innovation and application in robotics [1][3][5].

Group 1: Job Descriptions
- **Embodied Multimodal Large Model Researcher**: Develops core algorithms for embodied intelligence, including multimodal perception, reinforcement-learning optimization, and world-model construction [1].
- **Robotic Multimodal Interaction Algorithm Researcher**: Researches multimodal agents, reasoning and planning, and audio-visual dialogue models to innovate and apply robotic interaction technologies [3].
- **Reinforcement Learning Researcher**: Explores multimodal large models and their applications in embodied intelligence, contributing to the development of next-generation intelligent robots [5].

Group 2: Job Requirements
- **Embodied Multimodal Large Model Researcher**: PhD or equivalent experience in a relevant field, with strong familiarity with robotics, reinforcement learning, and multimodal fusion [2].
- **Robotic Multimodal Interaction Algorithm Researcher**: Master's degree or higher, excellent coding skills, and a solid foundation in algorithms and data structures [4].
- **Reinforcement Learning Researcher**: Background in computer science or a related field, with a strong foundation in machine learning and reinforcement learning [6].

Group 3: Additional Qualifications
- Strong hands-on coding ability and awards in competitive programming (e.g., ACM, ICPC) are preferred [9].
- A keen interest in robotics and participation in robotics competitions are considered advantageous [9].
How Does Embodied Goal Navigation Find Its Target and Navigate?
具身智能之心· 2025-07-13 04:13
Core Viewpoint
- The article traces the evolution of robot navigation from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4].

Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally an instruction-following task, involving understanding language commands, perceiving the environment, and planning movement strategies. A VLN robot system consists of a visual-language encoder, an environment-history representation, and action-policy modules [2].
- The key challenge in VLN is effectively compressing information from visual and language inputs; current trends favor large-scale pre-trained vision-language models and LLMs for instruction decomposition and task segmentation [2][3].
- Policy-network learning has shifted from extracting patterns from labeled datasets to distilling effective planning information from LLMs, a significant research focus [3].

Group 2: Goal Navigation
- Goal navigation extends VLN by requiring agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [4].
- Unlike traditional VLN, goal-driven systems must move from "understanding instructions" to "finding paths" by autonomously parsing semantics, modeling the environment, and making dynamic decisions [6].

Group 3: Commercial Applications and Demand
- Goal-driven navigation has been industrialized in several verticals, such as last-mile delivery, where it combines with social navigation algorithms to handle dynamic environments; examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8].
- In healthcare, hospitality, and food service, companies such as 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, improving service efficiency [8].
- The rise of humanoid robots has increased the focus on adapting navigation technology for home services, care, and industrial logistics, creating significant job demand in the navigation sector [9].

Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation draw on multiple domains, including natural language processing, computer vision, reinforcement learning, and graph neural networks, making the learning path challenging for newcomers [10].
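The "parse semantics, model the environment, make dynamic decisions" pipeline described above can be sketched at toy scale: extract a goal category from a templated instruction, then plan over a semantic map toward the nearest cell carrying that label. The grid, labels, and parsing rule below are invented for illustration; real systems use learned perception and far richer planners.

```python
# Toy "parse -> map -> plan" sketch of goal navigation. All data invented.
from collections import deque

def parse_goal(description):
    """Extract the target category from a templated instruction (toy rule)."""
    return description.split()[-1]  # e.g. "go to the chair" -> "chair"

def plan_path(grid, labels, start, goal_category):
    """BFS over free cells to the nearest cell labeled with the goal category."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if labels.get((r, c)) == goal_category:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None  # goal category not reachable

grid = [[0, 0, 0],
        [1, 1, 0],   # 1 = obstacle
        [0, 0, 0]]
labels = {(2, 2): "chair"}  # semantic map: label per cell
path = plan_path(grid, labels, (0, 0), parse_goal("go to the chair"))
print(path)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

The contrast with VLN is visible even here: nothing in the instruction tells the agent which way to turn; the route falls out of its own environment model.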
The 具身智能之心 Multimodal Large Model Discussion Group Is Now Open!
具身智能之心· 2025-07-12 13:59
Scan the QR code to join the WeChat discussion group. Advertising without permission is not allowed in the group; violators will be blacklisted across all platforms. If the group is full, add the assistant on WeChat (CLmovingup) for an invitation, with the note "具身大模型+入群"! If you work on multimodal large models (V+L, V+L+tactile, etc.) and are engaged in fine-tuning, deployment, quantization, or light-weighting of embodied models, you are welcome to join us for discussion! The 具身智能之心 multimodal large model technical discussion group is here; everyone in related directions is welcome to join! ...
2-Day Countdown, the Course Is About to Start! From Zero Basics to Reinforcement Learning to sim2real
具身智能之心· 2025-07-12 13:59
Core Viewpoint
- The article discusses the rapid advances in embodied intelligence, highlighting its potential to revolutionize various industries by enabling robots to understand language, navigate complex environments, and make intelligent decisions [1].

Group 1: Embodied Intelligence Technology
- Embodied intelligence aims to integrate AI systems with physical capabilities, allowing them to perceive and interact with the real world [1].
- Major tech companies such as Tesla, Boston Dynamics, OpenAI, and Google are competing in this transformative field [1].
- Potential applications span manufacturing, healthcare, service industries, and space exploration [1].

Group 2: Technical Challenges
- Achieving true embodied intelligence presents unprecedented technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2].

Group 3: Role of MuJoCo
- MuJoCo (Multi-Joint dynamics with Contact) is identified as a critical technology for embodied intelligence, a high-fidelity simulation engine bridging the virtual and real worlds [3].
- It lets researchers create realistic virtual robots and environments, enabling millions of trials and learning experiences without risking expensive hardware [5].
- Its advantages include high simulation speed, safe testing of extreme scenarios, and effective transfer of learned strategies to real-world applications [5].

Group 4: Research and Industry Adoption
- MuJoCo has become a standard tool in both academia and industry, with major companies such as Google, OpenAI, and DeepMind using it for robot research [7].
- Mastery of MuJoCo positions practitioners at the forefront of embodied-intelligence technology [7].

Group 5: Practical Training and Curriculum
- A comprehensive MuJoCo development course has been created, focusing on practical applications and theoretical foundations within the embodied-intelligence technology stack [9].
- The course uses project-driven learning, covering topics from physical simulation principles through deep reinforcement learning to Sim-to-Real transfer techniques [9][10].
- Six progressive projects deepen understanding and application of the technical material, building a solid foundation for future research and work [14][15].

Group 6: Expected Outcomes
- Upon completion, participants gain a complete embodied-intelligence technology stack, strengthening their technical, engineering, and innovation capabilities [25][26].
- Participants develop skills in building complex robot simulation environments, understanding core reinforcement-learning algorithms, and applying Sim-to-Real transfer techniques [25].
A SOTA Approach from Wuhan University, Beijing Institute of Technology, and Others! DEGround: Enhancing Contextual Understanding in Embodied 3D Environments
具身智能之心· 2025-07-12 13:59
1. Does Your 3D Grounding Model Really Work?

In embodied intelligence systems, the agent relies on first-person 3D perception algorithms to understand its surroundings. As one of the core tasks, Embodied 3D Grounding means localizing a target object in 3D space from an ego-centric RGB-D image sequence and a language description, requiring the model to fuse language with 3D visual information and accurately identify the object the sentence refers to. Current mainstream methods mostly adopt a two-stage strategy: first use a detection model to extract 3D region features, then perform language-guided grounding fine-tuning. This naturally raises a question: how effective is this second-stage grounding fine-tuning, and does it really work?

Surprisingly, empirical results show that even state-of-the-art grounding models fall far short of expectations. In contrast, detection models that receive no language supervision at all and filter predictions only by object category achieve better results on grounding evaluation. Concretely, since the task's language instructions are template-generated, the paper extracts the target object's category label via rule-based parsing, then uses that category to select the corresponding predicted boxes from a detection model, taking them directly as the grounding output. In theory, this approach lacks any language-understanding ...
From Hardware to Data, from VLA to VLN! A Nearly 2,000-Member Embodied-Intelligence Community Where Everyone Supports Each Other
具身智能之心· 2025-07-11 09:47
Core Insights
- The article highlights the growth of the embodied-intelligence community, which aims to reach 2,000 members, showcasing its various projects and initiatives [1][5].

Group 1: Community Development
- The community has tracked significant advances, including projects such as ACT, RDT-1/RDT-2, CogACT, OpenVLA, π0, and π0.5 [1].
- More than 30 technical roadmaps have been organized internally to help members find benchmarks, surveys, and learning paths, significantly reducing search time [1].
- Numerous industry experts have been invited to engage with members through Q&A sessions and discussions of the latest developments in embodied intelligence [1].

Group 2: Job Opportunities and Networking
- The community has established a job-referral mechanism with multiple embodied-intelligence companies, facilitating resume submission to companies members are interested in [2].
- Members can connect with nearly 200 companies and institutions in the embodied-intelligence sector, fostering collaboration and knowledge sharing [5].

Group 3: Educational Resources
- Resources for newcomers include more than 40 open-source projects and nearly 60 datasets related to embodied intelligence [11].
- Learning paths cover reinforcement learning, multimodal large models, robot navigation, and more, catering to both beginners and advanced members [11][12].
- Regular discussions and sharing sessions address common questions in the field, such as robot simulation platforms and imitation learning for humanoid robots [12].

Group 4: Industry Insights
- The community provides a comprehensive overview of domestic and international embodied-intelligence companies across sectors such as education, logistics, and healthcare [17].
- Members have access to industry reports and academic papers, enabling them to stay current on trends and applications in embodied intelligence [19].
- The community also covers the latest advances in robotics, including navigation, planning, and multimodal model integration [41][49].
Tracing the Field's Rise and Fall Through Nearly 30 Embodied-Intelligence Surveys (VLA, VLN, Reinforcement Learning, Diffusion Policy, and More)
具身智能之心· 2025-07-11 00:57
Core Insights
- The article provides a comprehensive overview of surveys and research papers on embodied intelligence, focusing on vision-language-action models, reinforcement learning, and robotics applications [1][2][3][4][5][6][8][9].

Group 1: Vision-Language-Action Models
- A survey of Vision-Language-Action (VLA) models highlights their significance in autonomous driving and human motor learning, discussing progress, challenges, and future trends [2][3][8].
- Work on VLA models emphasizes their applications in embodied AI, covering a variety of datasets and methodologies [5][8][9].

Group 2: Robotics and Reinforcement Learning
- Research on foundation models in robotics addresses applications, challenges, and future directions, reflecting growing interest in integrating AI with robotic systems [3][4].
- Deep reinforcement learning is identified as a key area with real-world successes, suggesting strong potential for enhancing robotic capabilities [3][4].

Group 3: Multimodal and Generative Approaches
- Multimodal fusion and vision-language models are discussed as crucial for improving robot vision and interaction with the environment [6][8].
- Generative AI for robotic manipulation is highlighted as an emerging field, marking a shift toward more sophisticated AI-driven solutions [6][8].

Group 4: Datasets and Community Engagement
- Readers are encouraged to engage with a community focused on embodied intelligence, with access to resources including datasets and collaborative projects [9].
DreamVLA: The World's First "World-Knowledge Prediction" VLA Model, with a Manipulation Success Rate Near 80%
具身智能之心· 2025-07-10 13:16
Core Insights
- The article discusses the potential of Vision-Language-Action (VLA) models to enhance robotic manipulation by integrating image generation with action prediction, and the limitations of existing methods in forming a closed perception-prediction-action loop [3][16].
- DreamVLA is introduced as a model that predicts comprehensive world knowledge to improve robotic performance, focusing on dynamic regions, depth perception, and high-level semantic features [4][5][16].

Research Background and Motivation
- Current VLA models rely on image-based prediction, leading to information redundancy and missing critical world knowledge such as dynamics, spatial structure, and semantics [3].
- DreamVLA aims to construct a more effective perception-prediction-action loop by predicting comprehensive world knowledge, enhancing the robot's interaction with its environment [3].

Model Design Core Ideas
- DreamVLA focuses on three features essential for task execution: dynamic-region prediction, depth perception, and high-level semantics [4][5].
- Dynamic-region prediction uses optical-flow models to identify moving regions in a scene, focusing the model on task-critical areas [4].
- Depth perception is obtained from depth-estimation algorithms providing 3D spatial context, while high-level semantic features are integrated from various visual models to improve understanding of future states [5].

Structural Attention and Action Generation
- A block-structured attention mechanism separates queries into dynamic, depth, and semantic sub-queries, preventing cross-type knowledge leakage and maintaining clean representations [6].
- A diffusion Transformer decoder decouples action representations from the shared latent features, transforming Gaussian noise into action sequences through iterative self-attention and denoising [8].

Experimental Results and Analysis
- In benchmark tests, DreamVLA achieved an average task length of 4.44, outperforming methods such as RoboVLM and Seer [9][10].
- Real-world experiments with the Franka Panda robotic arm showed an average success rate of 76.7%, significantly higher than baseline models [10].

Ablation Study Insights
- Analyzing the contribution of each knowledge type revealed that dynamic-region prediction provides the largest performance gain, with depth and semantic cues offering smaller but still valuable improvements [11].
- Predicting future knowledge outperformed merely reconstructing current information, indicating that prediction provides better guidance for action [12].
- The block-structured attention mechanism raised average task length from 3.75 to 4.44, demonstrating its effectiveness in reducing cross-signal interference [13].

Core Contributions and Limitations
- DreamVLA recasts VLA models as a perception-prediction-action framework, providing comprehensive foresight for planning by predicting dynamic, spatial, and high-level semantic information [16].
- The model is currently limited to parallel-gripper manipulation and relies on RGB data; future work plans to incorporate more diverse data types and enhance generalization and robustness [15][16].
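The block-structured attention described above can be illustrated with a toy mask: each knowledge-query block (dynamic, depth, semantic) may attend to the shared prefix tokens and to itself, but not to the other blocks, which is what prevents cross-type leakage. The block sizes and layout below are invented for illustration, not DreamVLA's actual configuration:

```python
# Toy block-structured attention mask: per-type query blocks see the shared
# prefix and themselves only. Sizes/layout here are illustrative inventions.

def block_mask(prefix_len, block_sizes):
    """Return mask[i][j] = True where token i may attend to token j."""
    total = prefix_len + sum(block_sizes)
    mask = [[False] * total for _ in range(total)]
    # prefix tokens (observation + instruction) attend to each other
    for i in range(prefix_len):
        for j in range(prefix_len):
            mask[i][j] = True
    start = prefix_len
    for size in block_sizes:
        for i in range(start, start + size):
            for j in range(prefix_len):           # every block sees the prefix
                mask[i][j] = True
            for j in range(start, start + size):  # ...and its own block only
                mask[i][j] = True
        start += size
    return mask

# 4 prefix tokens, then dynamic / depth / semantic blocks of 2 tokens each
m = block_mask(4, [2, 2, 2])
print(m[4][5], m[4][6])  # True (own block), False (other block)
```

In a real transformer this boolean mask would be converted to additive -inf biases on the attention logits; the sketch only shows which pairs are permitted.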