量子位
Sogou Input Method is still getting updates??
量子位· 2026-01-28 00:02
Core Viewpoint
- Sogou Input Method has undergone a significant AI-driven upgrade, enhancing its functionality and user experience and demonstrating its ability to keep evolving despite being a long-established product [19][30][41].

Group 1: Upgrade Features
- The latest version of Sogou Input Method adds AI voice recognition and text translation, achieving a 40% improvement in recognition fluency and a 98% accuracy rate, even for soft-spoken input [11][19].
- A new "light voice recognition" feature lets users dictate softly while the system still accurately captures the intended message [12][19].
- The input method can refine user input by removing filler words, improving clarity in communication [13][15].

Group 2: AI Integration
- The core logic of the update is a comprehensive reconstruction of voice, translation, and typing around AI models, moving away from traditional rule-based systems [20][19].
- Sogou Input Method has integrated Tencent's self-developed Hunyuan model, enabling a more intelligent and personalized user experience [36][39].

Group 3: User Base and Market Position
- Sogou Input Method has a unique user base built over 20 years, and its understanding of user habits and preferences is crucial for retaining existing users in the AI era [22][24].
- The product differentiates itself from competitors such as WeChat Input Method by offering a more comprehensive AI assistant experience through its unique IP, "Wangzai" [21][25][27].

Group 4: Historical Context and Evolution
- Since its launch in 2006, Sogou Input Method has consistently leveraged technological shifts: it was the first internet input method in China and integrated search engine technology early on [33][36].
- Its evolution is intertwined with AI development in China, from introducing features like smart replies and corrections to now transitioning to a full AI assistant model [34][36].
Scrape code from screen recordings, edit web pages from screenshots! Kimi K2.5 has mastered "vision × code"
量子位· 2026-01-28 00:02
Core Viewpoint
- The article discusses the launch of Moonshot AI's new model Kimi K2.5, highlighting its advanced integration of visual and coding capabilities, which significantly enhances user experience and productivity across a range of tasks [10][12][81].

Group 1: Kimi K2.5 Features
- Kimi K2.5 integrates visual and text functionality, letting users generate web pages with advanced animations and make visual edits through simple commands [17][18].
- The model achieved state-of-the-art (SOTA) results on several high-difficulty benchmarks, outperforming even some top proprietary models [19].
- Kimi K2.5 offers four operating modes: Quick, Thinking, Agent, and Agent Swarm, catering to different user needs and task complexities [21][23].

Group 2: Visual and Coding Capabilities
- The model can generate code from images and modify existing code through visual inputs, making it accessible and efficient for non-experts [30][34].
- Kimi K2.5 can autonomously produce aesthetically pleasing designs and layouts from minimal user input, a marked improvement in design quality over previous AI outputs [56][58].

Group 3: Agent Swarm Technology
- The Agent Swarm feature lets multiple independent agents collaborate on complex tasks, significantly improving efficiency and reducing completion time [64][76].
- This enables Kimi K2.5 to finish in minutes tasks that would traditionally take weeks, showcasing its potential to transform productivity across industries [78][79].

Group 4: Market Implications
- These advances position Kimi K2.5 as a competitive tool in the AI landscape, particularly in productivity software, where it has been recognized by major companies such as Microsoft [82].
- The article suggests Kimi K2.5 empowers users by simplifying complex tasks, letting them focus on decision-making rather than execution [84][85].
Jieyue Xingchen is no longer keeping a low profile: a massive funding round, Yin Qi joins, and a "1+3" core leadership team comes into view
量子位· 2026-01-27 08:32
Core Insights
- The article discusses recent major developments at Jieyue Xingchen, including a record-breaking financing round and the appointment of a new chairman, Yin Qi, who brings extensive experience in AI and industry integration [2][3].

Group 1: Financing and Leadership Changes
- Jieyue Xingchen completed a B+ round of financing exceeding 5 billion RMB, a record for a single financing in the large-model sector over the past 12 months [2].
- Yin Qi has officially joined the core decision-making team as chairman, marking a strategic shift for the company [3].

Group 2: Team Composition and Strategy
- The core team follows a "1+3" structure: Yin Qi as chairman, CEO Jiang Daxin, Chief Scientist Zhang Xiangyu, and CTO Zhu Yibo, each bringing distinct expertise [13][8].
- This structure maps onto the four capabilities essential for deploying large models: strategy, algorithms, systems, and engineering [15].

Group 3: Individual Contributions
- Yin Qi, a co-founder of Megvii Technology, has notable experience moving AI from research into practical applications [6][22].
- CEO Jiang Daxin is recognized for his contributions to natural language processing and has extensive experience with large-scale online systems, suiting him to lead the application of large models [28][30].
- Chief Scientist Zhang Xiangyu is a co-author of ResNet, a pivotal deep-learning architecture, and focuses on multimodal models, a distinctive feature of Jieyue's approach [35][43].
- CTO Zhu Yibo has a strong AI-infrastructure background, having built major AI systems at ByteDance and Google Cloud, which positions Jieyue uniquely in the competitive landscape [51][56].

Group 4: Market Position and Future Outlook
- The AI+ terminal model is seen as a blue-ocean opportunity: projections indicate that by 2026 AI terminal shipments in China will exceed 300 million units, with the penetration rate expected to surpass 93% by 2027 [85].
- The company aims for major milestones by 2026, including 1 million vehicles equipped with its intelligent driving system and the development of top-tier foundational models [90].
DeepSeek open-sources a brand-new OCR model: CLIP dropped for a lightweight Qwen model, with performance rivaling Gemini-3 Pro
量子位· 2026-01-27 08:32
Core Insights
- DeepSeek has released a new OCR model, DeepSeek-OCR 2, focused on accurately converting PDF documents to Markdown [1].
- The model's key breakthrough is dynamically rearranging visual tokens based on image semantics, moving away from traditional raster-scan ordering [2][3].
- DeepSeek-OCR 2 achieves performance comparable to Gemini-3 Pro while remaining a lightweight model [4].

Model Architecture
- DeepSeek-OCR 2 retains its predecessor's classic architecture: an encoder and decoder working in tandem [10].
- The encoder, now called DeepEncoder V2, replaces the previous CLIP component with a lightweight language model (Qwen2-0.5B), introducing causal reasoning capabilities [2][13].
- This upgrade allows intelligent rearrangement of visual tokens before they enter the main decoder, simulating human reading order [3][15].

Performance Metrics
- On the OmniDocBench v1.5 benchmark, DeepSeek-OCR 2 scored 91.09%, a 3.73% improvement over the baseline [5][35].
- Document-parsing edit distance improved from 0.085 to 0.057, demonstrating the effectiveness of the visual information rearrangement [36].
- At a similar token budget (1120), DeepSeek-OCR 2 outperformed Gemini-3 Pro on document-parsing edit distance [37].

Training and Evaluation
- Training follows a three-stage pipeline focused on semantic rearrangement and autoregressive inference [31].
- The model was evaluated on a dataset of 1355 pages across varied document types, ensuring a comprehensive assessment of its capabilities [33][34].
- The design keeps the input token count stable between 256 and 1120, aligning with the visual budget of Gemini-1.5 Pro [27].

Conclusion
- DeepSeek-OCR 2 marks a significant advance in OCR, validating a language-model architecture as a visual encoder and paving the way for unified omni-modal encoders [39].
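The raster-scan versus reading-order distinction can be made concrete with a toy two-column page. This is an illustrative sketch only: the real DeepEncoder V2 predicts the ordering with its lightweight language model, whereas here the column/row layout is supplied by hand.

```python
# Toy page: 2 columns x 3 rows of patch tokens. A raster (row-major) scan
# interleaves the columns, but a human reads the left column top to bottom
# before moving to the right column.
tokens = ["L1", "R1", "L2", "R2", "L3", "R3"]  # raster order
cols = [0, 1, 0, 1, 0, 1]                      # column of each token
rows = [0, 0, 1, 1, 2, 2]                      # row of each token

# Stand-in for the semantic rearrangement step: sort tokens by predicted
# reading order (column first, then row) instead of raster order.
order = sorted(range(len(tokens)), key=lambda i: (cols[i], rows[i]))
reading_order = [tokens[i] for i in order]
print(reading_order)  # → ['L1', 'L2', 'L3', 'R1', 'R2', 'R3']
```

The decoder then consumes tokens in this rearranged sequence, which is why downstream Markdown conversion improves on multi-column layouts.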
Robots can't see clearly, and Ant has cured it
量子位· 2026-01-27 06:57
Jin Lei, reporting from Hangzhou. QbitAI | WeChat official account QbitAI

The world has long suffered from robots that cannot see transparent and reflective objects. After all, even small animals, and occasionally people, comically walk straight into clean glass doors…

Worse, ask a robot to pick up a transparent glass cup or a reflective stainless-steel object and it will frequently go "suddenly blind".

The root of the problem is the robot's eyes: the depth camera.

Whether based on structured light or binocular stereo vision, depth cameras rely on stable reflection of light from object surfaces. Transparent materials let light pass straight through, while highly reflective materials scatter it in every direction, so the sensor receives no valid return signal and produces large numbers of missing or erroneous depth values.

A side-by-side comparison of a scene as humans see it and as the robot sees it makes this obvious.

It is no exaggeration to say that this kind of "open-eyed blindness" has long been the Big Big Big Problem blocking robots from safely entering homes, shopping malls, hospitals, and similar settings.

But now, with a newly proposed technique, the robots' eye trouble has finally been cured: 蚂蚁灵波 (RobbyAnt), Ant Group's embodied-intelligence company, has open-sourced LingBot-Depth, which it bills as the world's clearest-seeing depth vision model.

For the same two scenes above, let's look directly at the results with LingBot-Depth…
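The failure mode described above shows up concretely as holes in the depth map. A minimal sketch with toy numbers (the median fill at the end is a deliberately crude stand-in to show what a completion model must estimate, not LingBot-Depth's method):

```python
import numpy as np

# Toy 4x6 depth map in metres. Structured-light and stereo cameras report
# no depth (here 0.0) where light passes through glass or is scattered by
# polished metal, leaving holes in the map.
depth = np.array([
    [1.2, 1.2, 1.1, 1.1, 1.0, 1.0],
    [1.2, 0.0, 0.0, 1.1, 1.0, 1.0],  # glass cup: light passes through
    [1.2, 0.0, 0.0, 1.1, 1.0, 1.0],
    [1.2, 1.2, 0.0, 0.0, 1.0, 1.0],  # steel bowl: specular reflection
])
invalid = depth == 0.0
print(f"missing depth: {invalid.mean():.0%} of pixels")  # → 25%

# Crude repair: fill holes with the median of the valid pixels, just to
# show the estimation problem a learned depth model has to solve properly.
filled = np.where(invalid, np.median(depth[~invalid]), depth)
assert (filled > 0).all()
```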
Altman admits OpenAI's roadmap went astray, and says "writing code will no longer be important"
量子位· 2026-01-27 05:37
Core Viewpoint
- AI is redefining work, technology, and education, and will increase rather than decrease the demand for software engineers [4][6][7].

Group 1: AI and Software Engineering
- AI will let engineers capture more of the work's value, cutting time spent on coding and debugging and letting them focus on making systems work effectively [4][6].
- The number of software engineering jobs is expected to increase significantly, with a larger share of global GDP created through AI-driven methods [6][7].
- Custom software tailored to individuals or small groups will become common, enhancing personal productivity [5][6].

Group 2: AI Model Development
- OpenAI acknowledges past mistakes in developing the GPT-5 series, which focused too heavily on specific capabilities at the expense of others [18][19].
- The future direction is a return to a more balanced, general-purpose model that excels across dimensions including communication and expression [21][22].
- There is confidence that future models can integrate multiple strong capabilities into a single framework [23][28].

Group 3: Economic Implications of AI
- AI is expected to have a deflationary effect, empowering individuals to accomplish tasks previously reserved for large organizations and potentially reducing long-standing economic disparities [34][36].
- There is a cautionary note that AI could instead concentrate power and wealth in the hands of a few, depending on how it is deployed and regulated [37][38].

Group 4: AI in Education
- AI's role in early-childhood education is questioned, with the view that technology should not be introduced at such a formative stage [14][16].
- The long-term impact of technology on youth development remains unclear, requiring careful consideration before integrating AI into educational settings [15][16].

Group 5: AI and the Attention Economy
- Even as AI makes software development easier, capturing human attention and forging meaningful connections with products remains a significant challenge [43][45].
- Because human attention is scarce while software capability is abundant, creating exceptional value is still essential for entrepreneurial success [46].
The 3D version of Nano Banana is here! AI model editing becomes reality as 3D generation enters the editable era
量子位· 2026-01-27 03:53
Core Viewpoint
- The article highlights 3D generation as a critical area of AI, with significant advances led by the Chinese team Hyper3D, particularly through their product Rodin Gen-2 Edit, which integrates 3D generation and editing capabilities [1][3][27].

Group 1: 3D Generation and Editing Technology
- Hyper3D has launched Rodin Gen-2 Edit, the first commercial product to combine "3D generation" and "3D editing" into a complete workflow, marking 3D generation's entry into the editable era [3][11].
- The editing functionality lets users select specific areas of a model and issue text commands for modifications, such as changing a robot's arms to cannons, a user-friendly approach to 3D model editing [4][5][20].
- The platform supports importing any existing model, including third-party AI-generated models, for editing, positioning Hyper3D's editing capability as foundational infrastructure rather than a standalone feature [9][11].

Group 2: Technological Advancements and User Experience
- Hyper3D Rodin lets users modify, add, or remove model components through natural language without disturbing the overall structure, reshaping 3D modeling workflows [13][21].
- The shift from "generation" to "editing" fills a crucial gap in the AI workflow, enabling iterative design rather than the repeated random generation common in the past [14][19].
- These capabilities are enhanced by 3D ControlNet, which gives precise control over geometric structure during generation, and BANG, which recursively disassembles complex models for localized editing [17][25].

Group 3: Market Position and Future Directions
- The market has recognized Hyper3D's advances: the team closed two funding rounds from top-tier VCs and strategic industry players in 2025, indicating strong investor confidence in their technology [27].
- The company aims to go beyond single-object editing toward complete 3D scenes with objects, relationships, and physical constraints, laying groundwork for future "world models" and embodied-intelligence infrastructure [26].
- Rodin Gen-2 Edit is a significant step toward making 3D generation not just feasible but practically usable, providing a valuable reference point for the industry [27].
Stanford and NVIDIA introduce test-time reinforcement learning: a fine-tuned open-source model beats top closed-source models for just a few hundred dollars
量子位· 2026-01-27 02:33
Core Insights
- The article introduces Test-Time Training to Discover (TTT-Discover), an approach that applies reinforcement learning during the testing phase of model evaluation to attack open scientific problems [1][2].

Group 1: Methodology
- TTT-Discover builds on the open-source model gpt-oss-120b and achieves state-of-the-art (SOTA) performance across multiple domains, outperforming human experts and closed-source models [3].
- Unlike traditional methods that rely on "test-time scaling" via prompt scheduling, TTT-Discover updates model weights during testing to learn from the specific problem [4][5].
- This "test-time training" lets the model accumulate real-time experience from failed attempts, producing a directed evolution of its capabilities [6].

Group 2: Learning Objectives
- TTT-Discover employs an Entropic Objective, which maximizes the reward of the best attempts rather than the average reward across all attempts, aiming for a single optimal solution instead of many mediocre ones [9][10][11].
- It adds a PUCT-inspired reuse mechanism, keeping historical attempts in a buffer so the most promising states are prioritized while exploration is preserved [12].

Group 3: Implementation and Results
- The model builds a "private dataset" by continuously generating actions and receiving feedback, sidestepping the out-of-distribution (OOD) problem by creating data specific to the problem at hand [13][14].
- This contrasts with traditional test-time search, which never updates model weights and thus never enhances the model itself [15][16].
- The algorithm cycles through selecting promising solutions, generating new attempts, and evaluating results, updating the model's weights after each iteration to improve performance [17][18][27].
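The article does not give the exact formula for the Entropic Objective, but a standard "entropic" objective of this kind is the entropic risk (1/β)·log E[exp(β·R)], which interpolates between the mean reward (β→0) and the best reward (β→∞). A hedged sketch under that assumption:

```python
import numpy as np

def entropic_objective(rewards, beta=5.0):
    """Entropic risk: (1/beta) * log(mean(exp(beta * r))).
    Large beta weights the best attempts far more than the average,
    matching the stated goal of one optimal solution over many
    mediocre ones. (Assumed form, not the paper's exact formula.)"""
    r = np.asarray(rewards, dtype=float)
    m = r.max()  # shift by the max for numerical stability
    return m + np.log(np.exp(beta * (r - m)).mean()) / beta

rewards = [0.1, 0.2, 0.9]  # three attempts; one is much better
# The objective sits between the mean (0.4) and the best reward (0.9),
# pulled strongly toward the best attempt.
assert np.mean(rewards) < entropic_objective(rewards) < max(rewards)
```

Optimizing this quantity rather than the mean pushes gradient updates toward whatever made the single best attempt good, which is the behavior the buffer-and-reuse mechanism then exploits.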
Group 4: Performance Metrics
- In experiments, TTT-Discover ran roughly 2x faster than the best human implementations on kernel engineering tasks [27].
- The testing cost per problem is estimated at a few hundred dollars, underscoring the approach's efficiency [27].

Group 5: Future Directions
- TTT-Discover currently applies mainly to continuous-reward settings; extending it to sparse, binary, and unverifiable reward problems is left to future work [29].
Quantum Bit (QbitAI) is hiring editors and writers
量子位· 2026-01-27 02:33
Core Viewpoint
- The article highlights the ongoing AI boom and invites candidates to join "Quantum Bit" (QbitAI), which tracks AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions for both experienced professionals and fresh graduates [2][4].
- Positions are full-time and based in Beijing, with roles open at various levels [2][4].

Group 2: Job Responsibilities
- AI Industry: innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6].
- AI Finance: tracking venture capital and financial reports in the AI sector and monitoring capital movements within the industry [6].
- AI Product: AI applications and hardware, including software applications and product evaluations [6].

Group 3: Benefits and Growth Opportunities
- Employees engage with the latest AI technologies, boost their efficiency with new AI tools, and build personal influence by creating original content [6].
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses [6].

Group 4: Company Reach and Impact
- As of 2025, Quantum Bit has over 2.4 million WeChat subscribers and more than 7 million users across platforms, with daily readership exceeding 2 million [12].
- Third-party data platforms rank it as the top new-media outlet in AI and frontier technology [12].
The attention mechanism in multimodal large models hides a "trap" that a single formula can fix | Shanghai University × Nankai University
量子位· 2026-01-27 02:33
Core Insights
- The article examines the reliability of attention mechanisms in Vision-Language Models (VLMs), arguing that attention may not be a trustworthy indicator of semantic importance because of structural biases [2][12].

Group 1: Attention Mechanism Issues
- Attention is skewed by structural biases such as position bias, which favors later tokens in a sequence and can mislead visual token pruning [3][5].
- The authors identify a "padding attention sink," where padding regions receive disproportionately high attention and mislead pruning strategies [5][6].

Group 2: Proposed Solutions
- The Shanghai University team proposes a debiasing approach that corrects attention biases without introducing new pruning methods or additional training [6][12].
- By modeling the overall trend of the attention bias, they strip out irrelevant positional factors, making attention more semantically faithful [6][12].

Group 3: Experimental Results
- The debiasing strategy plugs into various mainstream attention-based visual token pruning methods as a plug-and-play module, yielding consistent gains across tasks [7][10].
- Pruning models with the debiasing correction showed stable performance improvements, particularly under aggressive token compression [10][12].

Group 4: Conclusion
- Attention is not inherently equivalent to semantic importance; ignoring its structural biases can mislead pruning strategies and hurt overall model performance [12].
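The debiasing idea, modeling the positional trend of attention and subtracting it before pruning, can be sketched with toy numbers. The linear-trend fit below is an assumption for illustration; the paper's actual correction formula may differ.

```python
import numpy as np

n = 8
pos = np.arange(n)
# True semantic relevance of 8 visual tokens (the relevant ones come early)
semantic = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.1, 0.2])
attn = semantic + 0.2 * pos  # raw attention: later tokens unfairly boosted

# Model the positional bias as a linear trend and remove it
coeffs = np.polyfit(pos, attn, deg=1)
debiased = attn - np.polyval(coeffs, pos)

def top3(a):
    """Indices of the 3 highest-scoring tokens (the pruning 'keep' set)."""
    return set(np.argsort(-a)[:3])

# Raw attention keeps mostly late tokens; debiased attention recovers
# the genuinely relevant ones.
assert top3(semantic) == {0, 2, 4}
assert top3(debiased) == top3(semantic)
assert top3(attn) != top3(semantic)
```

Because the position bias was injected additively here, a linear fit removes it exactly; a real debiasing module would estimate the trend from the model's own attention statistics.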