Alibaba Enters Embodied Intelligence Firsthand: Qwen Assembles an Internal Team, Led by Tongyi Qianwen's Technical Head
量子位 (QbitAI) · 2025-10-09 07:03
Core Viewpoint
- Alibaba has established a new team focused on embodied intelligence, marking a significant step in its exploration of physical AI systems and following in the footsteps of companies like OpenAI and Google [2][3][5]

Group 1: Team Formation and Leadership
- The Qwen team under Alibaba has formed a dedicated embodied-intelligence squad within the core department responsible for developing the Qwen series of large models [6][7]
- Justin Lin, the technical lead of Qwen, personally set up the team, indicating a hands-on approach to its development [10][11]
- Lin has a strong background in AI foundation models, having moved from natural language processing to large-scale pre-training and now leading the Qwen project [12][18][19]

Group 2: Strategic Direction and Investment
- Alibaba has pursued embodied intelligence strategically since 2024, investing in several companies in the field, including Zhi Ji Power and Xingdong Era [21][22]
- In September 2023, Alibaba Cloud led a $140 million financing round for a robotics company, its first direct investment in the embodied-intelligence sector [23]
- The company aims to integrate large AI models with robotics and automation, as highlighted in the "Physical AI" initiative announced at the 2025 Cloud Summit [24][25]

Group 3: Technological Evolution and Future Outlook
- The establishment of the embodied-intelligence team signals Alibaba's shift toward applying AI in real-world scenarios, moving beyond purely virtual applications [27][30]
- Growth in model scale has strengthened AI's abstract reasoning and task decomposition, enabling a transition from software simulation to real-world deployment [28][29]
- Alibaba's CEO has projected that global AI investment will exceed $4 trillion over the next five years, underscoring the company's commitment to advancing AI toward embodied intelligence and robotic applications [30][31]
FindingDory: A Benchmark for Evaluating Embodied Agent Memory
具身智能之心 · 2025-06-22 10:56
Group 1
- The core obstacle in embodied intelligence is the lack of long-term memory, which limits agents' ability to process multimodal observations spanning time and space [3]
- Current vision-language models (VLMs) excel at planning and control tasks but struggle to integrate historical experience in embodied environments [3][5]
- Existing video-QA benchmarks fail to adequately assess tasks that require fine-grained reasoning, such as object manipulation and navigation [5]

Group 2
- The proposed benchmark includes a task architecture that supports dynamic environment interaction and validation of memory-based reasoning [4][6]
- A total of 60 task categories cover spatiotemporal and semantic memory challenges, including spatial relations, temporal reasoning, attribute memory, and multi-target recall [7]
- Key technical innovations include programmatic scaling of task complexity via increased interaction counts and a strict separation between the experience-collection and interaction phases [9][6]

Group 3
- Experiments across the 60 tasks reveal three major bottlenecks in VLM memory: failures in long-sequence reasoning, weak spatial representation, and collapse in multi-target processing [13][14][16]
- The performance of off-the-shelf VLMs declines as the number of input frames increases, indicating ineffective use of long contexts [20]
- Supervised fine-tuned models improve by exploiting longer histories, suggesting a direction for VLM refinement [25]

Group 4
- The benchmark is the first photorealistic embodied-memory evaluation framework, covering complex household environments and supporting scalable assessment [26]
- Future directions include memory-compression techniques, end-to-end joint training to bridge the split between high-level reasoning and low-level execution, and long-horizon video understanding [26]
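The frame-scaling finding above (accuracy dropping as more history is fed to the model) can be illustrated with a minimal evaluation harness. Everything here is a hypothetical sketch, not the FindingDory code: `Episode`, the `model` callable, and the uniform subsampling scheme are all assumptions standing in for the real benchmark's episodes and VLM interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Episode:
    frames: List[str]   # hypothetical stand-in for frames collected in the experience phase
    question: str       # memory query posed after collection ends
    answer: str         # ground-truth target (e.g. an object or location id)

def subsample(frames: List[str], k: int) -> List[str]:
    """Keep at most k frames, sampled uniformly across the full history."""
    if k >= len(frames):
        return frames
    step = len(frames) / k
    return [frames[int(i * step)] for i in range(k)]

def evaluate(model: Callable[[List[str], str], str],
             episodes: List[Episode],
             frame_budgets: List[int]) -> Dict[int, float]:
    """Accuracy of a (hypothetical) VLM at each frame budget."""
    results: Dict[int, float] = {}
    for k in frame_budgets:
        correct = sum(
            model(subsample(ep.frames, k), ep.question) == ep.answer
            for ep in episodes
        )
        results[k] = correct / len(episodes)
    return results
```

Sweeping `frame_budgets` over increasing values would produce the accuracy-vs-context curve described in Group 3; a model that used long contexts effectively would be monotonically non-decreasing on such a curve, whereas the summary reports the opposite for off-the-shelf VLMs.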