Galaxy General Founder Wang He: With VLA Done Well, We Will Witness the Arrival of the First True Peak of Embodied Intelligence
Mei Ri Jing Ji Xin Wen·2025-06-06 15:28

Core Insights
- The current goal of embodied intelligence is to push it toward industrialization, according to Wang He, founder and CTO of Galaxy General Robotics [1][4][7]
- The GALBOT G1 robot was showcased at the event, accurately retrieving items from densely packed shelves on command [1][3]

Company Overview
- Galaxy General Robotics was founded in May 2023 in Haidian District, Beijing, focusing on humanoid robot hardware and embodied-intelligence large models [3]
- The company has raised over 1.2 billion yuan in financing over the past year from strategic and industry investors including Meituan and IDG Capital [3]

Product Development
- On June 1, 2025, Galaxy General released TrackVLA, its self-developed end-to-end navigation large model, which relies on purely visual environmental perception and is driven by language commands [3]
- A robot dog running the model can navigate complex environments such as supermarkets and help carry heavy items [3]

Industry Trends
- Embodied intelligence has drawn significant public attention, highlighted by events such as the world's first humanoid robot half-marathon and a recent robot combat competition [4]
- The industry's key challenge is practical deployment: how to put humanoid robots to work effectively in real-world scenarios [4][7]

Future Plans
- Galaxy General's robots already operate in seven unmanned pharmacies in Beijing, with plans to open 100 more in major cities by the end of the year [8]
- The upcoming "World Humanoid Robot Sports Conference" is scheduled for August 15-17, 2025, at the National Stadium and the National Speed Skating Oval [8]

Technological Insights
- Wang He emphasized that the VLA (Vision-Language-Action) model is the starting point for embodied intelligence, mapping visual observations directly to action outputs without intermediate steps [9]
- Current VLA applications focus on mobility, grasping, and placing tasks, which rely primarily on vision and can be augmented with tactile and force sensors [9]
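
The article characterizes VLA as a single model that maps a camera image plus a language command straight to robot actions, with no hand-crafted perception-planning-control pipeline in between. The sketch below illustrates that interface in PyTorch; the `TinyVLAPolicy` class, its layer sizes, and the 8-dimensional action output are illustrative assumptions for exposition, not Galaxy General's actual architecture.

```python
# Minimal, hypothetical sketch of an end-to-end VLA-style policy:
# image + language command in, low-level action out, with no
# hand-written intermediate stages. Illustration only.
import torch
import torch.nn as nn

class TinyVLAPolicy(nn.Module):
    def __init__(self, vocab_size=10000, action_dim=8):
        super().__init__()
        # Vision encoder: a small CNN standing in for a pretrained backbone.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Language encoder: token embeddings mean-pooled into one vector.
        self.embed = nn.Embedding(vocab_size, 64)
        # Action head: fused vision+language features -> continuous action
        # (e.g., end-effector deltas plus a gripper command).
        self.head = nn.Sequential(
            nn.Linear(64 + 64, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, command_tokens):
        v = self.vision(image)                       # (B, 64) visual features
        l = self.embed(command_tokens).mean(dim=1)   # (B, 64) command features
        return self.head(torch.cat([v, l], dim=-1))  # (B, action_dim) action

# Usage: one camera frame plus a tokenized command yields one action directly.
policy = TinyVLAPolicy()
image = torch.rand(1, 3, 224, 224)         # RGB observation
tokens = torch.randint(0, 10000, (1, 6))   # tokenized command, e.g. "pick the box from the shelf"
action = policy(image, tokens)
print(action.shape)                        # torch.Size([1, 8])
```

In this hedged reading, tasks such as grasping and placing stay vision-driven, and additional modalities (tactile or force signals) would enter as extra encoder branches fused before the action head.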