Vision-Language-Action Model (VLA)
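Lingbao Robotics Team Keeps Breaking New Ground in Embodied Intelligence, Making Robots "Quick-Witted and Nimble-Handed" (Sci-Tech Viewpoint · Frontline Innovation)
Ren Min Ri Bao · 2025-07-27 22:23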
Group 1
- The core message emphasizes the importance of technological innovation in advancing China's modernization and global competitiveness [1]
- The article introduces a series of reports titled "Frontline Innovation," focusing on the experiences and observations of researchers working in scientific innovation [1]
Group 2
- Lingbao Robotics, founded in 2023, develops general-purpose humanoid robots and embodied intelligence products, with a focus on practical applications [3][4]
- The company uses a vision-language-action (VLA) model so that robots learn skills through imitation, significantly improving the efficiency of skill acquisition [4][5]
- Lingbao's robots can perform precise tasks, such as assembling computer components with a precision of 0.3 mm [3][4]
Group 3
- Lingbao Robotics is working on flexible automation solutions for the shoe manufacturing industry, addressing the high cost and low adaptability of traditional production lines [6][7]
- The company has developed a system that lets robots learn to perform tasks in dynamic environments, reducing training time to about one hour [7]
- Lingbao's humanoid robot, CASBOT 01, features a bionic hand capable of executing complex tasks, combining embodied intelligence with precision operation [8]
Group 4
- Domestic development of embodied intelligence is advancing rapidly, with a growing variety of tactile sensors and related technologies being integrated into the industry [9]
- Lingbao Robotics emphasizes collaboration between academia and industry, applying the latest research findings to product development while also contributing to academic research [9]
Learning End-to-End Large Models, but Still Not Clear on the Difference Between VLM and VLA...
自动驾驶之心 (Heart of Autonomous Driving) · 2025-06-19 11:54
Core Insights
- The article emphasizes the growing importance of large vision-language models (VLMs) in intelligent driving, highlighting their potential for practical application and production deployment [2][4].
Group 1: VLM and VLA
- A VLM (Vision-Language Model) focuses on foundational capabilities such as detection, question answering, spatial understanding, and reasoning [4].
- A VLA (Vision-Language-Action) model is more action-oriented; in autonomous driving it targets trajectory prediction, which requires a deep grasp of human-like reasoning and perception [4].
- The article recommends learning VLM first and then expanding to VLA, since a VLM can predict trajectories through a diffusion model, which strengthens action capabilities in uncertain environments [4].
Group 2: Community and Resources
- The article invites readers to join a knowledge-sharing community that offers video courses, hardware, and coding materials related to autonomous driving [4].
- The community aims to build a network of professionals in intelligent driving and embodied intelligence, with a target of 10,000 members within three years [4].
Group 3: Technical Directions
- The article outlines four cutting-edge technical directions in the industry: vision-language models, world models, diffusion models, and end-to-end autonomous driving [5].
- It links to resources and papers covering advances in these areas, indicating a robust framework for ongoing research and development [6][31].
Group 4: Datasets and Applications
- The article lists datasets used to train and evaluate autonomous driving models, covering pedestrian detection, object tracking, and scene understanding [19][20].
- It discusses language-enhanced systems in autonomous driving, showing how natural language processing can improve vehicle navigation and interaction [20][21].
Group 5: Future Trends
- The article highlights the potential for large models to significantly shape the future of autonomous driving, particularly in decision-making and control systems [24][25].
- It suggests that integrating language models with driving systems could lead to more intuitive, human-like vehicle behavior [24][25].
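The VLM/VLA distinction in Group 1 above can be read as a difference in output type: text versus an action such as a trajectory. The sketch below is illustrative only and is not taken from the article. `ToyVLM`, `ToyVLA`, `Waypoint`, the canned answer, and the straight-line planner are all hypothetical stand-ins with no real model behind them.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence

# Every name in this block is hypothetical and exists only for illustration.
Image = Sequence[Sequence[float]]  # stand-in for a camera frame


@dataclass(frozen=True)
class Waypoint:
    x: float  # metres ahead of the ego vehicle
    y: float  # metres to the left of the ego vehicle
    t: float  # seconds from now


class VisionLanguageModel(Protocol):
    """Assumed VLM interface: (image, prompt) -> text."""

    def answer(self, image: Image, prompt: str) -> str: ...


class VisionLanguageActionModel(Protocol):
    """Assumed VLA interface: (image, instruction) -> trajectory."""

    def plan(self, image: Image, instruction: str) -> List[Waypoint]: ...


class ToyVLM:
    """Returns text, i.e. the understanding side (detection, QA, reasoning)."""

    def answer(self, image: Image, prompt: str) -> str:
        # Canned response; a real VLM would condition on the image and prompt.
        return "A pedestrian is crossing ahead."


class ToyVLA:
    """Returns an action, here a short straight-line trajectory."""

    def __init__(self, speed_mps: float = 5.0, horizon_s: float = 2.0,
                 dt: float = 0.5) -> None:
        self.speed_mps = speed_mps
        self.horizon_s = horizon_s
        self.dt = dt

    def plan(self, image: Image, instruction: str) -> List[Waypoint]:
        # Constant-speed rollout; a real VLA would predict this from its inputs.
        steps = int(self.horizon_s / self.dt)
        return [
            Waypoint(x=self.speed_mps * self.dt * k, y=0.0, t=self.dt * k)
            for k in range(1, steps + 1)
        ]


if __name__ == "__main__":
    frame: Image = [[0.0]]
    print(ToyVLM().answer(frame, "What is in front of the car?"))
    for wp in ToyVLA().plan(frame, "Keep going straight."):
        print(wp)
```

Both toy classes take the same inputs; only the return type differs. That is the sense in which a VLA is "more action-oriented", and it hints at why the article suggests starting with VLM before moving on to VLA.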
Embodied Intelligence: A Scientific Expedition That Demands Humility and Patience
Robot猎场备忘录 (Robot Hunting Ground Memo) · 2025-05-20 05:01
Core Viewpoints
- Embodied intelligence is injecting new research vitality into robotics and has the potential to break through existing performance limits [1]
- Its development relies on breakthroughs in specific scientific problems and should not dismiss the contributions of traditional robotics [2]
- General intelligence cannot exist without a focus on specific tasks, since expertise in particular areas drives advances in broader capabilities [3]
Group 1: Interdisciplinary Collaboration
- Embodied intelligence is a cross-disciplinary product that requires collaboration with fields such as materials science, biomechanics, and design aesthetics [2]
- Breakthroughs often occur at the intersection of disciplines, which underlines the importance of diverse scientific contributions [2]
Group 2: Technology Evolution
- Technological evolution should not be viewed as a wholesale replacement of old systems; it is a process of sedimentation in which foundational technologies continue to support new advances [5]
- The current vision-language-action models may soon be replaced by more efficient alternatives, which underscores the need for continuous innovation [5]
Group 3: Realistic Expectations for AGI
- Viewing embodied intelligence as the sole path to artificial general intelligence (AGI) is a dangerous oversimplification; AGI development requires many conditions and interdisciplinary knowledge [6]
- The complexity of embodied systems calls for collaboration across fields rather than reliance on a few "genius" individuals [6]
Group 4: Current State of Embodied Intelligence
- The field is still in its early stages, with significant challenges remaining in hardware and algorithm development [7]
- Current humanoid robots are not yet fully autonomous and often require human intervention, which shows that the technology is still evolving [7]
Group 5: VLA Technology Pathway
- Developing vision-language-action (VLA) models may not be the most efficient approach, since manipulation skills often precede language capabilities in learning [9]
- Many current VLA models are resource-intensive and may be replaced by more efficient solutions in the future [9]
Group 6: Balancing Short-term and Long-term Goals
- Combining learning-based and model-based approaches is seen as more practical in the short term, while pure learning methods may represent the long-term future of robotics [10]
- Successful industrial robotic solutions often rely on model-based methods because of their stability and reliability [10]
Group 7: Humanoid Robots and Practicality
- The design of humanoid robots is driven by emotional projection and environmental adaptability, but specialized non-humanoid forms may be more efficient in many applications [11]
- There is concern about over-investment in humanoid robots at the expense of practical, economically viable solutions [11]
Group 8: Building Technical Barriers
- True competitive advantage in technology comes from extensive practical experience and meticulous attention to detail rather than from innovative algorithms alone [12]
- Long-term technical barriers are built through consistent effort and iterative improvement of engineering practice [12]
Group 9: Vision and Practicality
- Scientific research requires both grand vision and grounded practice, and embodied intelligence combines idealistic aspirations with real-world challenges [13]
- Foundational theories such as control theory remain critical to the safety and functionality of robotic systems [13]