Workflow
分层架构
icon
Search documents
腾讯张正友:具身智能必须回答的三个「真问题」
机器之心· 2025-08-10 04:31
Core Viewpoint - Tencent has launched the Tairos platform for embodied intelligence, aiming to provide a modular support system for the development and application of large models, development tools, and data services [2][3]. Group 1: Platform Development - The Tairos platform is a culmination of over seven years of research by Tencent's Robotics X Lab, which has developed various robotic prototypes to explore full-stack robotic technologies [2][3]. - The establishment of the Tairos platform reflects Tencent's response to current industry challenges and its strategic positioning for future ecosystems [2][3]. Group 2: Architectural Choices - The debate between end-to-end and layered architectures in embodied intelligence is ongoing, with a preference for layered architecture due to its efficiency and practicality [4][5]. - Layered architecture allows for the integration of human prior knowledge into model structures, enhancing training efficiency and reducing data dependency [6][7]. Group 3: Knowledge Feedback Mechanism - The SLAP³ architecture proposed by Tencent includes multi-modal perception models, planning models, and action models, with dynamic collaboration and information flow between layers based on task complexity [7][11]. - A memory bank captures unique interaction data from the action model, which can be used to update the perception and planning models, creating a feedback loop for continuous learning [11][12]. Group 4: Evolution of Models - The architecture is designed for continuous iteration, allowing for the adjustment of prior knowledge as new insights are gained, similar to the evolution of the Transformer architecture [12][15]. - The goal is to transition towards a more efficient and native multi-modal intelligence form, despite current limitations in data availability and model exploration [15][16]. Group 5: Innovation and Commercialization - The influx of talent and capital into the embodied intelligence field is beneficial, but there is a need for balance between short-term commercial gains and long-term technological goals [23][24]. - Companies must maintain a clear vision of their ultimate objectives and have the courage to forgo immediate commercial opportunities to focus on foundational scientific challenges [25].
技术圈热议的π0/π0.5/A0,终于说清楚是什么了!功能/场景/方法论全解析~
自动驾驶之心· 2025-06-22 01:35
Core Insights - The article discusses the π0, π0.5, and A0 models, focusing on their architectures, advantages, and functionalities in robotic control and task execution [3][12][21]. π0 Model Structure - The π0 model is based on a pre-trained Vision-Language Model (VLM) and Flow Matching technology, integrating seven types of robots and over 68 tasks with more than 10,000 hours of data [3]. - It utilizes a VLM backbone, an Action Expert, and Cross-Embodiment Training to handle different robot action spaces [3]. π0 Advantages and Functions - The model can execute tasks directly from language prompts without additional fine-tuning, achieving a 20%-30% higher accuracy in task execution compared to baseline models [4][6]. - It supports complex task decomposition and high-frequency precise operations, generating continuous actions at a control frequency of up to 50Hz [4][6]. π0.5 Model Structure - The π0.5 model employs a two-stage training framework and a hierarchical architecture to learn from diverse data sources and generalize to new environments [7][9]. - It integrates a Vision-Language-Action (VLA) model that encodes multi-modal inputs into a unified sequence for decision-making [9]. π0.5 Advantages and Functions - The π0.5 model shows a 25%-40% higher success rate in tasks compared to π0, with a training speed improvement of three times due to mixed discrete-continuous action training [12][13]. - It effectively handles long-duration tasks and demonstrates zero-shot semantic understanding, allowing it to recognize and act on previously unseen objects [13][16]. A0 Model Structure - The A0 model features a layered architecture that focuses on Affordance understanding and action execution, utilizing a diffusion model for predicting contact points and trajectories [21][25]. - It integrates multi-source data to create a unified Affordance representation, enhancing its ability to perform complex tasks [26]. A0 Advantages and Functions - The A0 model exhibits cross-platform generalization capabilities, allowing deployment across various robotic platforms with high efficiency in spatial reasoning [26][27]. - It achieves an average success rate of 62.5% in tasks, with specific tasks like drawer opening reaching a 75% success rate [27].