Unmasking the "Supernode": Without Unified Memory Addressing, It Is Still Just Stacked Servers

Core Insights
- The AI industry is shifting from simply stacking server hardware to system-level competition centered on the underlying computing architecture, as trillion-parameter multimodal models become the norm [1]

Group 1: The Rise of Supernodes
- "Supernodes" have emerged as a new trend in the computing industry, with more than ten domestic companies launching their own versions, although many are merely repackaged traditional server stacks [2]
- The term "supernode" is often misused: many products lack the critical technical capability of "unified memory addressing," which raises doubts about whether they are genuine supernodes [2]

Group 2: Communication Barriers
- Supernodes are needed because of the "communication wall": in large-model training, communication becomes more frequent and its latency increasingly limits computational efficiency [3]
- Three main barriers are identified (the communication wall, the power and cooling wall, and the complexity wall), all of which limit the performance of traditional cluster architectures on large models [3]

Group 3: Technical Challenges
- Traditional cluster architectures follow a "storage-compute separation" principle, which introduces significant delays whenever data moves between GPUs and makes them inefficient for large-model training [6][10]
- Each data transfer passes through multiple steps, each adding latency, so this path is unsuited to the high-frequency synchronization that large-model training requires [10]

Group 4: Unified Memory Addressing
- Unified memory addressing is crucial for breaking the communication wall: it provides a global virtual address space in which all memory resources are directly accessible, without the overhead of traditional data-transfer methods [12]
- Achieving unified memory addressing requires advances in communication protocols and cache coherence, which many so-called "supernodes" currently lack [13][19]

Group 5: Value of Supernodes
- In model training, unified memory addressing has delivered significant practical benefits, enabling better memory management and higher utilization of compute resources [20][23]
- In model inference, it enables global pooling of key-value (KV) caches, significantly improving throughput [26]
- In recommendation systems, it reduces communication latency and improves efficiency by allowing direct memory access across nodes [30]

Group 6: Conclusion
- Competition in AI infrastructure has evolved from simple hardware stacking to architectural design, with unified memory addressing a key capability for next-generation computing paradigms [31]
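To make the multi-step transfer argument in Groups 2 and 3 concrete, here is a minimal cost model that adds up latency stage by stage. The five-stage path (GPU memory staged through host memory and NICs) is an assumed, simplified conventional path that the source does not spell out, and every stage is given one unit of cost purely so that the comparison reduces to hop count; real per-stage latencies differ and are not modeled. `Stage`, `path_latency` and `sync_cost` are hypothetical names, and `sync_cost` assumes each synchronization pays the full path serially with no overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency: float  # cost of this hop in arbitrary units (placeholder, not a measurement)


def path_latency(stages: list[Stage]) -> float:
    """One-way latency of a transfer that must pass through every stage in order."""
    return sum(stage.latency for stage in stages)


def sync_cost(stages: list[Stage], syncs_per_step: int) -> float:
    """Communication cost per training step, assuming each sync pays the full path serially (no overlap)."""
    return syncs_per_step * path_latency(stages)


# Hypothetical, simplified conventional path: data is staged through host memory and NICs.
STAGED_PATH = [
    Stage("local GPU memory -> local host memory", 1.0),
    Stage("local host memory -> local NIC", 1.0),
    Stage("network fabric", 1.0),
    Stage("remote NIC -> remote host memory", 1.0),
    Stage("remote host memory -> remote GPU memory", 1.0),
]

# Hypothetical path under unified memory addressing: a single load/store to a global address.
DIRECT_PATH = [Stage("load/store over the supernode interconnect", 1.0)]

if __name__ == "__main__":
    for label, path in (("staged", STAGED_PATH), ("direct", DIRECT_PATH)):
        print(label, path_latency(path), sync_cost(path, syncs_per_step=100))
```

Because `sync_cost` multiplies the path latency by the number of synchronizations, every extra stage on the path is paid again on each sync, so the penalty of a longer path grows with synchronization frequency.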
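The "global virtual address space" described in Group 4 can be illustrated with a toy software model: several nodes each contribute local memory, and a page table maps every global page to a (node, local page) pair, so a plain load or store on a global address reaches whichever node owns it. This is a sketch under stated assumptions, not how any particular supernode works: `GlobalAddressSpace`, `PAGE_SIZE` and the page-table layout are hypothetical, real systems perform this translation in hardware, and the cache coherence that Group 4 says is required is deliberately left out.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # hypothetical page size in bytes


class GlobalAddressSpace:
    """Software model of one virtual address space spanning the memory of several nodes.

    A page table maps each global page number to (node_id, local_page). Real hardware would
    perform this translation and keep caches coherent; both are out of scope here.
    """

    def __init__(self, pages_per_node: dict[int, int]) -> None:
        # Each node contributes a flat byte array as its local memory.
        self.memory = {node: bytearray(n * PAGE_SIZE) for node, n in pages_per_node.items()}
        self.page_table: dict[int, tuple[int, int]] = {}
        global_page = 0
        for node in sorted(pages_per_node):
            for local_page in range(pages_per_node[node]):
                self.page_table[global_page] = (node, local_page)
                global_page += 1
        self.size = global_page * PAGE_SIZE

    def translate(self, addr: int) -> tuple[int, int]:
        """Map a global address to (node_id, byte offset in that node's local memory)."""
        if not 0 <= addr < self.size:
            raise IndexError(f"global address {addr:#x} out of range")
        node, local_page = self.page_table[addr // PAGE_SIZE]
        return node, local_page * PAGE_SIZE + addr % PAGE_SIZE

    def store(self, addr: int, data: bytes) -> None:
        # Byte-by-byte so that a write may cross a page (and therefore node) boundary.
        for i, byte in enumerate(data):
            node, offset = self.translate(addr + i)
            self.memory[node][offset] = byte

    def load(self, addr: int, length: int) -> bytes:
        out = bytearray()
        for i in range(length):
            node, offset = self.translate(addr + i)
            out.append(self.memory[node][offset])
        return bytes(out)
```

For example, with `GlobalAddressSpace({0: 2, 1: 2})`, a 5-byte store at global address 8190 writes two bytes into node 0 and three into node 1, and `load(8190, 5)` reads them back without any explicit send or receive; the caller only ever sees one address space.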
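Group 5's claim that globally pooling KV caches improves inference throughput can be illustrated by one plausible mechanism: when every node's KV-cache blocks form a single pool, a busy node can borrow free blocks from an idle one instead of turning requests away. The sketch below compares a per-node allocator with a pooled one on a small hypothetical workload. The class names, the "local blocks first, then spill over" policy and the workload are assumptions for illustration; the model counts capacity only and assumes remote blocks are usable at acceptable cost, which is what unified memory addressing is meant to provide.

```python
from __future__ import annotations


class IsolatedKVAllocator:
    """Baseline: all of a request's KV-cache blocks must come from the node serving it."""

    def __init__(self, blocks_per_node: list[int]) -> None:
        self.free = list(blocks_per_node)

    def admit(self, node: int, blocks: int) -> list[tuple[int, int]] | None:
        if self.free[node] < blocks:
            return None  # rejected even if other nodes still have free blocks
        self.free[node] -= blocks
        return [(node, blocks)]


class PooledKVAllocator:
    """Pooled: all nodes' KV-cache blocks form one pool reachable from any node."""

    def __init__(self, blocks_per_node: list[int]) -> None:
        self.free = list(blocks_per_node)

    def admit(self, node: int, blocks: int) -> list[tuple[int, int]] | None:
        if sum(self.free) < blocks:
            return None
        placement = []
        # Use the serving node's local blocks first, then spill over to the other nodes.
        order = [node] + [n for n in range(len(self.free)) if n != node]
        remaining = blocks
        for n in order:
            take = min(self.free[n], remaining)
            if take > 0:
                self.free[n] -= take
                placement.append((n, take))
                remaining -= take
            if remaining == 0:
                break
        return placement


def count_admitted(allocator, requests: list[tuple[int, int]]) -> int:
    """Number of (node, blocks) requests the allocator can hold at the same time."""
    return sum(allocator.admit(node, blocks) is not None for node, blocks in requests)


REQUESTS = [(0, 6), (0, 6), (1, 2)]  # hypothetical workload: node 0 is busier than node 1
```

With two nodes of 8 blocks each, the isolated allocator rejects the second request to node 0 even though node 1 still has 8 free blocks, so it admits 2 of the 3 requests; the pooled allocator places that request across both nodes and admits all 3.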