We're preparing to expand the embodied intelligence team; you're welcome to join us...
具身智能之心· 2025-08-01 16:02
Core Viewpoint
- The rapid development of embodied intelligence is being recognized, with several leading companies preparing for IPOs, highlighting the importance of collaboration and communication within the industry [1].

Group 1: Collaboration and Industry Development
- The industry is encouraged to engage in active communication to overcome technological isolation, which can hinder overall development [1].
- The company aims to create a platform that gathers talent from across the industry to promote progress [1].

Group 2: Project Collaboration
- The company is establishing project research teams in major cities including Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou, and Wuhan, with opportunities for part-time involvement [3].
- Each city will recruit around 10 individuals with over 2 years of experience in embodied algorithms and robotics research [4].

Group 3: Education and Consulting Services
- The company invites industry experts to develop online courses and consulting services in the field of embodied intelligence [5].
- Specific areas of expertise sought include large models, multi-modal models, reinforcement learning, and robot motion planning, among others [5][6].

Group 4: Compensation and Opportunities
- The company offers significant profit-sharing and resource sharing across the industry, with options for both part-time and full-time positions [7].
Let's talk it over: what's the difference between visual language navigation and goal navigation in embodied AI?
具身智能之心· 2025-08-01 10:30
Core Viewpoint
- The article discusses the evolution of robot navigation technology from traditional mapping and localization to large model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes autonomous exploration and pathfinding based on environmental understanding [1][5].

Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally a task of following instructions, which involves understanding language commands, perceiving the environment, and planning movement strategies. The VLN robot system consists of a visual language encoder, historical environmental representation, and action strategy modules [2][4].
- The learning process for the strategy network has shifted from extracting patterns from labeled datasets to leveraging large language models (LLMs) for effective planning information extraction [4].
- The architecture of VLN robots requires them to accumulate visual observations and execute actions in a loop, making it crucial to determine the current task stage for informed decision-making [4].

Group 2: Goal Navigation
- Goal navigation extends VLN by enabling agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [5][7].
- Unlike traditional VLN, goal-driven navigation systems must transition from understanding commands to independently interpreting the environment and making decisions, integrating computer vision, reinforcement learning, and 3D semantic understanding [7].

Group 3: Commercial Applications and Demand
- Goal-driven navigation technology has been successfully implemented in various verticals, such as terminal delivery, where it combines with social navigation algorithms to handle dynamic environments and human interactions [9].
- Companies like Meituan and Starship Technologies have deployed delivery robots in complex urban settings, while others like Aethon have developed service robots for the medical and hospitality sectors, enhancing service efficiency [9][10].
- The growth of humanoid robots has led to an increased focus on adapting navigation technology for applications in home services, healthcare, and industrial logistics, creating significant job demand in the navigation sector [10].

Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation require knowledge across multiple domains, including natural language processing, computer vision, reinforcement learning, and graph neural networks, making it challenging for newcomers to gain comprehensive expertise [11].
- The fragmented nature of knowledge in these fields can lead to difficulties in learning, often causing individuals to abandon their studies before achieving a solid understanding [11].
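The perceive-decide-act loop that VLN systems run can be sketched in a few lines. This is a toy illustration under assumed names (`VLNAgent`, `encode`, and `act` are all hypothetical, not any real system's API): the agent pairs each observation with its instruction, appends the result to a history (standing in for the historical environmental representation), and chooses an action until it judges the task complete.

```python
# Toy sketch of a VLN perceive-decide-act loop.
# All names and the action strategy are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class VLNAgent:
    instruction: str
    history: list = field(default_factory=list)  # accumulated encoded states

    def encode(self, observation: str) -> tuple:
        # Stand-in for the visual-language encoder: pair the current
        # observation with the instruction and the step index.
        return (self.instruction, observation, len(self.history))

    def act(self, observation: str) -> str:
        state = self.encode(observation)
        self.history.append(state)  # historical environment representation
        # Toy action strategy: stop once the goal is observed,
        # otherwise keep moving forward.
        return "stop" if "goal" in observation else "forward"

agent = VLNAgent("walk down the hall and stop at the goal")
actions = [agent.act(obs) for obs in ["hallway", "hallway", "goal visible"]]
print(actions)  # observations accumulate until "goal" triggers "stop"
```

The point of the sketch is the loop structure itself: because decisions depend on the accumulated history, the agent can (in a real system) infer which stage of the task it is in, which is exactly the difficulty the article highlights.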
Join BAAI! Embodied large-model researcher positions open (experienced hires, new graduates, and interns all welcome)
具身智能之心· 2025-08-01 00:03
Job Responsibilities
1. Research and develop embodied large models (VLA models or hierarchical architectures).
2. Design and optimize model architectures; handle data processing, training, and real-robot deployment.
3. Survey frontier techniques in embodied intelligence, track the latest progress in large models, advance related research, and explore applying the newest techniques to the embodied domain ...

Requirements
1. Master's degree or above in computer science, artificial intelligence, robotics, automation, mathematics, or a related field;
2. Proficiency in Python, a solid deep learning foundation, and familiarity with frameworks such as TensorFlow and PyTorch;
3. Research experience with large models, a deep understanding of mainstream vision and language models, and hands-on experience with pretraining, fine-tuning, and deployment;
4. Robot control experience; experience training and deploying mainstream embodied models is preferred;
5. Strong learning ability, English proficiency, hands-on skills, and good teamwork and communication;
6. Publications at top robotics, NLP, or computer vision conferences (RSS, ICRA, CVPR, CoRL, ICLR, NeurIPS, ACL, etc.) are a plus.

How to Apply
Experienced hires, new graduates, and interns are all welcome; please send your resume to pwwang@baai.ac.cn.
Everyone says RL + VLA is the future? Here's a roundup of related work
具身智能之心· 2025-08-01 00:03
Core Viewpoint
- The integration of Vision-Language-Action (VLA) models with Reinforcement Learning (RL) presents a promising new paradigm that leverages both environmental trial-and-error interactions and pre-collected suboptimal data for enhanced performance [2].

Group 1: Offline RL Training without Environment
- The paper "MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models" discusses scalability in RL applications [3].
- "Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions" focuses on offline RL techniques [3].

Group 2: Online RL Training with Environment
- Online RL training enhances VLA models through trial-and-error interactions in real-time environments, leading to performance improvements [4].
- The paper "ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning" explores this concept [5].
- "GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot" presents a generalist approach in robotic models [5].

Group 3: Simulator-Based Approaches
- Various projects aim to improve VLA models using simulation environments, such as "OctoNav: Towards Generalist Embodied Navigation" [6].
- "TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization" focuses on optimizing VLA models through trajectory-based methods [6].
- "VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning" emphasizes scalable RL for robotic manipulation [6].

Group 4: Real-World Applications
- The deployment phase of RL training is crucial for testing VLA models in real-world scenarios [8].
- "Dynamism v1 (DYNA-1) Model: A Breakthrough in Performance and Production-Ready Embodied AI" highlights advancements in embodied AI [9].
- "ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy" discusses fine-tuning methods for VLA models [9].

Group 5: RL Alignment Training
- "GRAPE: Generalizing Robot Policy via Preference Alignment" addresses the alignment of robot policies with user preferences [11].
- "SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning" focuses on safety in VLA model training [12].
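The online trial-and-error idea running through these works can be illustrated with a deliberately tiny stand-in: a preference-based policy that reinforces actions whose reward exceeds a baseline. Everything here (the action set, the reward, the update rule) is a toy assumption for intuition only, not the method of any paper listed above.

```python
# Toy online RL loop: interact, observe reward, reinforce the policy.
# The environment and update rule are illustrative stand-ins.

import random

random.seed(0)
ACTIONS = ["grasp", "push", "release"]
prefs = {a: 0.0 for a in ACTIONS}  # policy preferences (logit-like scores)

def reward(action: str) -> float:
    # Toy task: "grasp" is the correct manipulation primitive.
    return 1.0 if action == "grasp" else 0.0

def sample_action() -> str:
    # Greedy w.r.t. preferences, random tie-break: a minimal stand-in policy.
    best = max(prefs.values())
    return random.choice([a for a, p in prefs.items() if p == best])

for _ in range(50):                      # online interaction loop
    a = sample_action()
    prefs[a] += 0.1 * (reward(a) - 0.5)  # reinforce above-baseline actions

best_action = max(prefs, key=prefs.get)
print(best_action)  # the policy drifts toward the rewarded action
```

Real VLA + RL systems replace each piece with something heavy (a transformer policy, a learned or scripted reward, PPO-style updates), but the loop (act, score, update, repeat) is the same shape.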
The "small matter" of research papers: by the time it clicks, it's already too late...
具身智能之心· 2025-07-31 06:28
Core Viewpoint
- The article emphasizes the importance of early action in academic research, particularly for master's students, to avoid delays in thesis completion and publication. It highlights common pitfalls that lead to procrastination and the need for a proactive approach to research and writing [1][2].

Group 1: Common Pitfalls
- "Waiting for Guidance" type: students often feel lost without clear direction from their advisors, leading to passive waiting and wasted time [1].
- "Perfectionist" type: the desire to master all knowledge before starting leads to endless delays, as foundational knowledge is never fully complete [1].
- "Procrastination" type: students may avoid the daunting tasks of literature review and writing, distracting themselves with other activities [1].
- "Underestimating Time" type: many students mistakenly believe that the process from idea to publication is quick, not realizing it can take several months to years [2].

Group 2: Action Guidelines
- Establish "paper awareness" early: students should clarify graduation requirements and familiarize themselves with relevant journals and conferences from the first semester [3].
- Seize opportunities: engaging with advisors early, even with vague ideas, is crucial. The summer after the first year is highlighted as a prime time for research initiation [3].

Group 3: Iterative Research Approach
- Complete before perfecting: students are encouraged to start with small goals, such as replicating a classic paper or running a baseline model, rather than aiming for a perfect paper from the outset [4].
- Quick iteration: initial results, even if not ideal, should be organized into a paper for submission to workshops or lower-tier conferences, as feedback from reviews is invaluable for improvement [4].
One device is all your research needs! GeoScan S1: the most cost-effective 3D laser scanner (with 3DGS support)
具身智能之心· 2025-07-31 06:28
Core Viewpoint
- GeoScan S1 is introduced as a high-performance, cost-effective handheld 3D laser scanner, designed for various applications with advanced features such as multi-sensor integration and real-time 3D reconstruction capabilities [1][3][4].

Product Introduction
- GeoScan S1 features a lightweight design, one-click operation, and centimeter-level precision for real-time 3D scene reconstruction. It can cover areas over 50,000 square meters and supports a measurement distance of up to 70 meters with a point cloud generation rate of 100,000 points per second [1][20][21].
- The device runs a handheld Ubuntu system with various sensor devices, allowing flexible integration and expansion for research and development [1].

Team Background
- The product is developed through collaboration between Professor Liu Chun's team from Tongji University and the industrialization team from Northwestern Polytechnical University, backed by years of research and numerous validated projects [3].

Technical Specifications
- The GeoScan S1 supports multi-sensor fusion, including RTK, 3D lidar, dual wide-angle cameras, and a depth camera, achieving high precision and reliability in complex environments [8][12][23].
- It has a relative accuracy of better than 3 cm and an absolute accuracy of better than 5 cm, with a maximum scanning area of 50,000 square meters and a point cloud output of 200,000 points per second [15][20].

Software Features
- The software allows data collection and storage in various formats, including .pcd and .bag files, and supports real-time mapping and color point cloud generation [28][29].
- Users can initiate RTK functionality and 3D Gaussian data collection, with online and offline versions available for enhanced capabilities [29][44].

Application Scenarios
- GeoScan S1 is suitable for various environments, including office buildings, parking lots, industrial parks, tunnels, and forests, enabling precise 3D mapping [33].
- The device supports integration with unmanned platforms such as drones and robots, facilitating automated operations [31].

Pricing Information
- The base version of GeoScan S1 is priced at 19,800, with additional versions available at higher price points for enhanced features [44].
VLA + reinforcement learning will give rise to more powerful systems!
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article discusses the advancements in robotic models, particularly focusing on the development of the RT-2 and RT-X models, which enhance the capabilities of robots in executing tasks through visual language models and diverse datasets [5][10][11].

Group 1: RT-2 and Its Capabilities
- RT-2 is introduced as a foundational robot model that can process visual questions and execute tasks based on language instructions, showcasing the potential of remotely accessible robotic models [5][7].
- The model's ability to convert robot control tasks into question-answer formats allows it to perform various basic language instructions effectively [7][8].

Group 2: RT-X Dataset and Its Impact
- The RT-X dataset, developed by DeepMind, comprises data from 34 research labs and 22 types of robots, providing a diverse training ground for robotic models [10].
- Models trained on the RT-X dataset outperform specialized models by approximately 50% in various tasks, indicating the advantages of cross-embodiment models [11].

Group 3: Evolution of VLA Models
- The first-generation VLA model, RT-2, is noted for its simplicity, while second-generation models utilize continuous action distributions for improved performance in complex tasks [14][15].
- Second-generation VLA models incorporate specialized mechanisms for generating continuous actions, enhancing their control capabilities [17][18].

Group 4: π0 and π0.5 Models
- The π0 model, based on a large language model with 3 billion parameters, is designed to handle various tasks, including folding clothes, demonstrating its adaptability in different environments [18][23].
- The latest π0.5 model is aimed at executing long-term tasks in new environments, integrating high-level reasoning capabilities to manage complex instructions [28][30].

Group 5: Future Directions and Reinforcement Learning
- Future VLA models are expected to integrate reinforcement learning techniques to enhance robustness and performance, moving beyond imitation learning [34][39].
- The combination of VLA and DLA (Deep Learning Architecture) is proposed to create a more effective system, leveraging expert data to improve generalist capabilities [44][46].
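The "robot control as question answering" idea attributed to RT-2 rests on representing actions as discrete tokens a language model can emit. The sketch below shows the underlying mechanism (uniform binning of a continuous action vector into token ids and back); the bin count and action range here are illustrative assumptions, not RT-2's published scheme.

```python
# Sketch of first-generation VLA action tokenization: quantize each
# continuous action dimension into a discrete token, and decode back.
# N_BINS and the normalized range are assumed values for illustration.

N_BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_tokens(action):
    """Quantize each action dimension into one of N_BINS discrete tokens."""
    scale = (N_BINS - 1) / (HIGH - LOW)
    return [round((a - LOW) * scale) for a in action]

def tokens_to_action(tokens):
    """Recover an approximate continuous action from the tokens."""
    scale = (HIGH - LOW) / (N_BINS - 1)
    return [t * scale + LOW for t in tokens]

action = [0.25, -0.5, 1.0]  # e.g. normalized end-effector deltas
tokens = action_to_tokens(action)
decoded = tokens_to_action(tokens)
print(tokens, [round(a, 3) for a in decoded])
```

The round trip loses at most half a bin of precision, which is exactly the limitation the article says second-generation VLA models address by generating continuous action distributions instead.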
The legged robot we bought still doesn't work after ages of tuning...
具身智能之心· 2025-07-31 00:04
Core Viewpoint
- The article emphasizes the significance of legged robots in the field of robotics, highlighting their ability to navigate complex terrains and perform various tasks, making them a focal point for future applications in inspection, security, rescue, and industrial automation [2][4].

Group 1: Importance of Legged Robots
- Legged robots are considered a milestone in robotics due to their capability to handle complex environments and obstacles, moving beyond flat surfaces [2].
- There is a growing demand for talent in the legged robotics sector, with companies willing to invest heavily in skilled individuals [2].
- The article suggests that now is the optimal time to enter the legged robotics field, as it presents numerous opportunities for learning and development [2].

Group 2: Educational Initiatives
- The article introduces a comprehensive course titled "From Quadruped to Biped: Full-Stack Algorithms," aimed at addressing the challenges faced by beginners in the legged robotics domain [2].
- The course covers the full technology stack from quadruped to biped robots, incorporating real-world applications and simulation environments such as Isaac Gym, Gazebo, and MuJoCo [2][4].
- Key topics include the basics of quadruped robots, advanced biped robot techniques, and high-level algorithms for multi-task adaptation [2][4].

Group 3: Technical Aspects
- The curriculum includes kinematics and dynamics, multi-modal sensor fusion, and practical implementations in simulation environments [3][4].
- It also covers deep reinforcement learning and imitation learning techniques, focusing on algorithms like PPO and SAC for gait control [4].
- Safety mechanisms, collision detection, and hardware deployment strategies are integral parts of the training, ensuring a comprehensive understanding of real-world applications [4][7].

Group 4: Target Audience and Prerequisites
- The course is designed for AI robotics practitioners, graduate and undergraduate students, career changers, and enthusiasts interested in cutting-edge technology [16].
- Participants are expected to have foundational knowledge of programming, algorithms, and mathematics, with a GPU recommended for the practical exercises [16][17].
- The training emphasizes hands-on experience, allowing learners to translate theoretical knowledge into practical engineering solutions [16].
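Since the curriculum covers PPO for gait control, a minimal sketch of PPO's clipped surrogate objective, the core of that algorithm, may help orient newcomers. The probabilities and advantage values below are made up for illustration; a real trainer would compute them from policy rollouts.

```python
# Minimal sketch of PPO's clipped surrogate objective for one sample.
# Inputs are illustrative values, not data from any real gait controller.

def ppo_clip_objective(new_prob, old_prob, advantage, eps=0.2):
    """Clipped surrogate: limits how far one update can move the policy."""
    ratio = new_prob / old_prob                      # importance ratio
    clipped = max(min(ratio, 1 + eps), 1 - eps)      # clamp to [1-eps, 1+eps]
    # Pessimistic bound: take the smaller of the raw and clipped terms.
    return min(ratio * advantage, clipped * advantage)

# A large policy shift (ratio 1.5) with positive advantage is clipped to 1.2:
print(ppo_clip_objective(0.6, 0.4, advantage=1.0))   # 1.2
# A small shift stays inside the trust region and passes through unclipped:
print(ppo_clip_objective(0.44, 0.4, advantage=1.0))
```

The clipping is what makes PPO stable enough for locomotion: no single batch of (often noisy) gait rollouts can drag the policy arbitrarily far from its previous behavior.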
After WAIC 2025: the Shanghai Embodied Intelligence Robot Industry Conference is coming!
具身智能之心· 2025-07-31 00:04
Embodied intelligence, once confined to precision manipulation in the laboratory, is now breaking through barrier after barrier of perception and action at an astonishing pace: dexterous robotic arms work autonomously in unstructured environments, agents understand and execute instructions in complex dynamic scenes, and robots truly "understand" the physical world and collaborate naturally with humans... The real-world deployment of embodied intelligence is turning from blueprint into tangible reality. This is not merely a technological leap, but a scene revolution that overturns the paradigm of human-machine interaction and reshapes the future of industry!

Grasp the pulse of change and help draw the blueprint of intelligence! From August 13 to 15, 2025, the much-anticipated 2025 China Embodied Intelligence Robot Industry Conference and Exhibition will open at the Shanghai New International Expo Center!

Come to EAI SHOW 2025:
Gain insight into frontier trends: focus on embodied intelligence and hear top global experts interpret technical breakthroughs and industry directions, opening the path from technology to commercialization!
Witness the power of innovation: get up close to future technology and experience the most groundbreaking embodied intelligence products and solutions;
Connect with the industry ecosystem: a rich lineup of concurrent activities lets you exchange in depth with professionals from industry, academia, research, and investment, and explore new win-win opportunities.

[Focus on Embodied Intelligence 2025] Get up close to future technology and take the industry's pulse in one stop. Enterprises, products, and solutions from the upstream, midstream, and downstream of the industry chain, established names and rising players alike, will showcase more innovative application directions, bringing visitors an immersive ...
Exploring the connection between Bayesian inference and embodied intelligence: toward embodied AI systems for the open physical world
具身智能之心· 2025-07-31 00:04
Core Insights and Background
- The article explores the deep conceptual connection between Bayesian statistics and embodied intelligence, emphasizing that cognitive abilities fundamentally arise from real-time sensor interactions between agents and their environments [3].
- Bayesian statistics provides a principled probabilistic framework for continuously reasoning under uncertainty by representing knowledge as probability distributions and updating beliefs based on new evidence [3].
- Despite this connection, Bayesian principles are not widely applied in current embodied intelligence systems, which are analyzed through the lenses of search and learning, as highlighted by Rich Sutton in "The Bitter Lesson" [3][4].

Search and Learning: Foundations of Modern AI
- Search and learning are identified as universal methods driving significant breakthroughs in AI as computational power increases, with search involving systematic exploration of potential solutions and learning focusing on training models through data [4].
- Sutton's insight indicates that while researcher-designed systems may succeed initially, they often hit performance bottlenecks, whereas systems built on scalable general methods like search and learning continue to improve with increased computational resources [4].

Current Practices in Embodied Intelligence
- Mainstream embodied intelligence methods are based on advancements in AI foundational models, such as pre-trained large language models and vision-language models, which provide rich prior knowledge about the world for embodied agents like robots [5].
- However, these foundational models are insufficient for all requirements of embodied intelligence systems, as the encoded prior knowledge is static and coarse, lacking the precision needed for dynamic environments [6].

Approaches to Addressing Limitations
- Two primary approaches are identified to address the limitations of foundational models: embedding search operations within model training or fine-tuning processes in data-driven learning paradigms, and incorporating explicit search mechanisms for planning, similar to those used in AlphaGo and AlphaZero [7].

Deep Connection Between Bayesian and Embodied Intelligence
- From a philosophical perspective, Bayesianism and embodied intelligence are closely linked, with Bayesianism quantifying subjective beliefs and emphasizing dynamic knowledge updates through evidence [8].
- Both frameworks share a common learning mechanism that views cognition and intelligence as processes dependent on dynamic interactions rather than static data, aligning with the paradigm of emergent intelligence [8].

Gaps Between Bayesian Methods and Current Practices
- There is a significant gap between Bayesian methods and current practices in embodied intelligence, particularly in learning and search, as Bayesian learning methods often rely on structured priors or explicit model assumptions that may hinder scalability [9].
- A comparison highlights fundamental differences in model dependency, frequency of human knowledge injection, learning scalability, and search methods between Bayesian intelligence and Sutton's preferred approaches [9].

Future of Embodied Intelligence Shaped by Bayesian Methods
- Modern embodied intelligence systems, especially those based on deep learning and large pre-trained models, have adopted data-driven, hypothesis-light methods that align with Sutton's preferences [10].
- These systems can be constructed using pre-trained foundational models as building blocks, supplemented with additional modules for memory, atomic skill models, perception, sensor control, and navigation [11].

Strategies for Data Scarcity
- In scenarios of data scarcity, two mitigation strategies are proposed: collecting human demonstration data and resorting to simulations to create digital counterparts of the physical world [12].
- Current large pre-trained models are seen as rough approximations of world models, insufficient for supporting embodied intelligence in rich, dynamic, and three-dimensional physical environments [12].

Goals for Open Physical Environments
- The ultimate goal for embodied intelligence is to operate in open physical environments, where knowledge and skills acquired in closed settings serve as prior knowledge [12].
- In open worlds, embodied agents must continuously adapt their behavior through real-time sensor interactions, necessitating ongoing reasoning under uncertainty [12].

Bayesian Methods for Complex Systems
- Various existing Bayesian methods have been developed for global optimization in complex systems, particularly where traditional gradient-based methods are unsuitable [13].
- Flexibility and generalization in real-world scenarios can be enhanced by relaxing the dependency on structured model assumptions, allowing operations on collections of models rather than committing to a single fixed model [13].
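The belief-updating mechanism the article centers on is Bayes' rule applied repeatedly as evidence arrives. A minimal sketch, assuming a toy two-hypothesis world with made-up sensor likelihoods:

```python
# Discrete Bayesian belief update: knowledge as a probability distribution,
# revised whenever new sensor evidence arrives. Values are toy assumptions.

def bayes_update(prior, likelihood):
    """posterior(h) ∝ prior(h) * likelihood(evidence | h), then normalize."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# An embodied agent unsure whether a door ahead is open or closed:
belief = {"open": 0.5, "closed": 0.5}
# A depth-sensor reading that is much more likely if the door is open:
belief = bayes_update(belief, {"open": 0.9, "closed": 0.2})
print(belief)  # belief mass shifts toward "open"
```

Running the update in a loop over successive sensor readings is exactly the "ongoing reasoning under uncertainty" the article argues open-world embodied agents require; richer systems replace the discrete table with particle filters or learned posterior approximations.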