Layered Architecture

机器之心· 2025-09-29 02:52

Core Viewpoint - Embodied intelligence is drawing unprecedented attention, yet key issues remain unresolved, including data scarcity and competing technical approaches [1][2][3]

Group 1: Data and Technical Approaches
- The industry is split into two camps: the "real machine" camp, which relies on data collected from physical robots, and the "synthetic" camp, which holds that synthetic data is viable for model training [5][12]
- Galaxy General, representing the synthetic camp, argues that generalization in embodied intelligence models requires trillions of data points, a volume that real-world collection alone cannot sustain [8][9]
- The "real machine" camp disputes the idea that real-world data is prohibitively expensive, arguing that with sufficient investment, data collection can be scaled effectively [12][14]

Group 2: Model Architecture
- Discussion of model architecture reveals a divide between end-to-end and layered approaches: some experts advocate a single unified model, while others support a hierarchical structure [15][19]
- The layered architecture is seen as more consistent with biological evolution, while the end-to-end approach is criticized for potential error amplification [19][20]
- The debate extends to VLA (Vision-Language-Action) models versus world models, with some experts arguing that VLA is currently more promising because of its data efficiency [21][22]

Group 3: Industry Trends and Infrastructure
- A scaling law is beginning to emerge in embodied intelligence, suggesting that scaling up models and data could be effective [24]
- Deployment of embodied intelligence technologies is accelerating, with various companies sharing their experience in human-robot interaction and industrial applications [24][29]
- Cloud service providers, particularly Alibaba Cloud, are highlighted as crucial to meeting the infrastructure needs of embodied intelligence companies, especially as they move to mass production [29][31]

Group 4: Alibaba Cloud's Role
- Alibaba Cloud has been preparing for the exponential growth in data and compute demand associated with embodied intelligence, and has built capabilities for large-scale data processing and model training [33][35]
- The company offers a comprehensive suite of cloud-based solutions that support both real and synthetic data production, improving efficiency and reducing costs [35][36]
- Alibaba Cloud's position as a model provider, together with its engineering capabilities, is seen as a significant advantage in the rapidly evolving embodied intelligence landscape [37][41]

Tencent's Zhang Zhengyou: Three "Real Questions" That Embodied Intelligence Must Answer

机器之心· 2025-08-10 04:31

Core Viewpoint - Tencent has launched the Tairos platform for embodied intelligence, which aims to provide modular support for development and applications in the form of large models, development tools, and data services [2][3].

Group 1: Platform Development
- The Tairos platform is the culmination of more than seven years of research by Tencent's Robotics X Lab, which has built a range of robotic prototypes to explore full-stack robotic technologies [2][3].
- The platform reflects Tencent's response to current industry challenges and its strategic positioning for future ecosystems [2][3].

Group 2: Architectural Choices
- The debate between end-to-end and layered architectures in embodied intelligence is ongoing, with a preference for the layered architecture because of its efficiency and practicality [4][5].
- A layered architecture allows human prior knowledge to be built into the model structure, which improves training efficiency and reduces data dependency [6][7].

Group 3: Knowledge Feedback Mechanism
- The SLAP³ architecture proposed by Tencent comprises multi-modal perception models, planning models, and action models, with collaboration and information flow between the layers adjusted dynamically according to task complexity [7][11].
- A memory bank captures the unique interaction data produced by the action model; this data can be used to update the perception and planning models, creating a feedback loop for continuous learning [11][12].

Group 4: Evolution of Models
- The architecture is designed for continuous iteration, so that prior knowledge can be adjusted as new insights are gained, much as the Transformer architecture has evolved [12][15].
- The goal is to move toward a more efficient, natively multi-modal form of intelligence, despite current limits on data availability and model exploration [15][16].

Group 5: Innovation and Commercialization
- The influx of talent and capital into embodied intelligence is beneficial, but short-term commercial gains need to be balanced against long-term technological goals [23][24].
- Companies must keep a clear view of their ultimate objectives and have the courage to forgo immediate commercial opportunities in order to focus on foundational scientific challenges [25].
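
The layered flow described under "Knowledge Feedback Mechanism" can be illustrated with a small Python sketch. This is not Tencent's implementation: every name here (MemoryBank, perceive, plan, act, run_task, the complex_task flag) is hypothetical, the "models" are toy rules, and skipping the planner for simple tasks is only one assumed reading of "information flow based on task complexity". It shows the shape of the idea: perception feeds planning, planning feeds action, and the action layer logs outcomes to a memory bank that the upper layers could later learn from.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Hypothetical store for interaction records produced by the action layer."""
    records: list = field(default_factory=list)

    def add(self, step, success):
        self.records.append({"step": step, "success": success})

    def failures(self):
        # Failed steps are the data an upper layer might be updated on.
        return [r["step"] for r in self.records if not r["success"]]


def perceive(raw_objects):
    """Perception layer (toy): turn raw detections into a scene description."""
    return {"objects": set(raw_objects)}


def plan(scene, goal):
    """Planning layer (toy): break a goal into steps using the perceived scene."""
    if goal in scene["objects"]:
        return [f"reach {goal}", f"grasp {goal}"]
    return [f"search for {goal}", f"reach {goal}", f"grasp {goal}"]


def act(step, scene, memory):
    """Action layer (toy): execute one step and log the outcome to memory."""
    target = step.split()[-1]
    # Toy rule: searching always succeeds; other steps need a visible target.
    success = step.startswith("search") or target in scene["objects"]
    memory.add(step, success)
    return success


def run_task(raw_objects, goal, memory, complex_task=True):
    scene = perceive(raw_objects)
    # Assumed routing: simple tasks bypass the planner entirely.
    steps = plan(scene, goal) if complex_task else [f"grasp {goal}"]
    return [act(step, scene, memory) for step in steps]


memory = MemoryBank()
results = run_task(["cup", "table"], "cup", memory)
missing = run_task(["table"], "bottle", memory)
print(results, missing, memory.failures())
```

In this sketch the feedback loop stops at collecting failures; how the perception and planning models would actually be updated from the memory bank is not specified in the summary and is left out.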

π0/π0.5/A0, the Talk of the Tech Community, Finally Explained: A Full Breakdown of Functions, Scenarios, and Methodology

自动驾驶之心· 2025-06-22 01:35

Core Insights - The article discusses the π0, π0.5, and A0 models, focusing on their architectures, advantages, and functions in robotic control and task execution [3][12][21].

π0 Model Structure
- The π0 model is built on a pre-trained Vision-Language Model (VLM) and Flow Matching, drawing on more than 10,000 hours of data spanning seven types of robots and over 68 tasks [3].
- It combines a VLM backbone, an Action Expert, and Cross-Embodiment Training to handle the action spaces of different robots [3].

π0 Advantages and Functions
- The model can execute tasks directly from language prompts without additional fine-tuning, achieving 20%-30% higher task-execution accuracy than baseline models [4][6].
- It supports complex task decomposition and high-frequency precise operations, generating continuous actions at a control frequency of up to 50Hz [4][6].

π0.5 Model Structure
- The π0.5 model uses a two-stage training framework and a hierarchical architecture to learn from diverse data sources and generalize to new environments [7][9].
- It integrates a Vision-Language-Action (VLA) model that encodes multi-modal inputs into a unified sequence for decision-making [9].

π0.5 Advantages and Functions
- The π0.5 model shows a 25%-40% higher task success rate than π0, with a threefold training speedup attributed to mixed discrete-continuous action training [12][13].
- It handles long-duration tasks effectively and demonstrates zero-shot semantic understanding, which lets it recognize and act on previously unseen objects [13][16].

A0 Model Structure
- The A0 model has a layered architecture centered on Affordance understanding and action execution, using a diffusion model to predict contact points and trajectories [21][25].
- It integrates multi-source data into a unified Affordance representation, improving its ability to perform complex tasks [26].

A0 Advantages and Functions
- The A0 model generalizes across platforms, so it can be deployed on a variety of robotic platforms with high efficiency in spatial reasoning [26][27].
- It achieves an average task success rate of 62.5%, with specific tasks such as drawer opening reaching 75% [27].
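
As a rough illustration of how Flow Matching, mentioned in the π0 summary above, can produce a chunk of continuous actions, here is a minimal NumPy sketch. It is not the π0 implementation. The learned, observation-conditioned velocity network is replaced by a toy closed-form field (toy_velocity) that pulls noise toward one fixed target chunk along a straight line, and the chunk shape (50 timesteps by 7 joint commands), step count, and function names are all assumptions. The part it does show is the sampling loop: start from Gaussian noise and integrate the velocity field from t = 0 to t = 1 with Euler steps.

```python
import numpy as np


def toy_velocity(actions, t, target):
    """Stand-in for a learned velocity network v(a_t, t | observation).

    For a straight-line path from noise to a single target chunk,
    the exact velocity field is (target - a_t) / (1 - t).
    """
    return (target - actions) / (1.0 - t)


def sample_action_chunk(target, num_steps=10, seed=0):
    """Integrate da/dt = v(a, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal(target.shape)  # a_0 ~ N(0, I)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        actions = actions + dt * toy_velocity(actions, t, target)
    return actions


# Hypothetical chunk: 50 timesteps x 7 joint commands.
target_chunk = np.zeros((50, 7))
target_chunk[:, 0] = np.linspace(0.0, 1.0, 50)

chunk = sample_action_chunk(target_chunk)
print(chunk.shape, float(np.abs(chunk - target_chunk).max()))
```

With a real model the velocity would come from a network conditioned on images, language, and robot state, and the result would be a sample from a distribution of plausible action chunks rather than one fixed target.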