Hybrid Architecture

Nothing Beats Speed: Shanghai AI Lab's 82-Page Survey Walks You Through the Appeal of Efficient LLM Architectures
机器之心· 2025-08-25 09:10
Author: Sun Weigao, Shanghai Artificial Intelligence Laboratory. In recent years, large language models (LLMs) have demonstrated powerful language understanding and generation capabilities, driving breakthroughs in text generation, code generation, question answering, translation, and other tasks. Representative models such as GPT, Claude, Gemini, DeepSeek, and Qwen have profoundly changed how humans interact with machines. Nor are LLMs limited to language and simple Q&A: with the rise of multimodal models (VLMs) and reasoning models (LRMs), LLMs keep expanding into multimodal understanding, generation, and complex reasoning scenarios.

Behind the continued gains in model performance, however, lies rapid scaling of model size, data volume, and RL reasoning length, and with it a sharp rise in compute and memory consumption. The persistently high cost of training and inference has become a practical bottleneck constraining the broad deployment of large models.

This article analyzes the efficiency secrets of large models from the perspective of LLM architecture. At the heart of it all is the Transformer. Although the Transformer's self-attention mechanism delivered a breakthrough in long-range modeling, its O(N²) complexity makes long-sequence tasks expensive. In emerging scenarios such as RAG, agents, long-chain reasoning, and multimodality, the demand for long sequences is ever more prominent, further amplifying the tension between efficiency and performance. Meanwhile, the FFN part of the Transformer uses dense MLP layers, which likewise face model-scale ...
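As a rough illustration of the O(N²) bottleneck described above, the sketch below contrasts standard softmax attention with a kernelized linear-attention variant. It is a simplified toy (NumPy, random inputs, an assumed ReLU feature map), not code from the survey.

```python
# Toy sketch (not from the survey): softmax attention builds an (N, N) score
# matrix, so time and memory grow quadratically in sequence length N, while a
# kernelized "linear attention" reorders the computation into (d, d) summaries.
import numpy as np

def softmax_attention(Q, K, V):
    # Scores form an (N, N) matrix -> O(N^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])               # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # (N, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Associativity lets us accumulate (d, d) summaries instead of the (N, N)
    # attention map, so the cost scales linearly with N.
    Qp, Kp = phi(Q), phi(K)                                # (N, d)
    kv = Kp.T @ V                                          # (d, d)
    z = Kp.sum(axis=0)                                     # (d,)
    return (Qp @ kv) / (Qp @ z)[:, None]                   # (N, d)

if __name__ == "__main__":
    N, d = 1024, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

At N = 1024 the quadratic version already materializes a roughly one-million-entry score matrix, while the linear variant only keeps d×d summaries, which is why linear and sparse attention are central to the long-sequence scenarios the survey targets.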
Expert Interview Roundup: Are Pony.ai and WeRide Executives at Each Other's Throats?
阿尔法工场研究院· 2025-06-24 10:14
Group 1: Solid-State Battery Developments
- BYD, Guoxuan High-Tech, and FAW Group have successfully launched 60Ah automotive-grade battery cells with an energy density of 350-400 Wh/kg, a 1C charging rate, and a cycle life of 1,000 cycles, roughly six months ahead of schedule [1]
- Between the second half of 2025 and the first half of 2026, solid-state batteries are expected to reach a critical pilot-testing milestone, with equipment debugging and optimization nearing completion and technology maturity improving significantly [1]
- With leading companies and the supply chain advancing, breakthroughs in equipment and materials are progressing smoothly: sulfide electrolyte production has surpassed kilometer-scale rolls, and required pressure has been reduced to 1-2 MPa [1]
- By 2026, the price of sulfide electrolytes is projected to fall to 2.5 million per ton, with long-term potential to drop to several hundred thousand per ton, bringing solid-state battery costs closer to those of liquid batteries [1]
- This lays the foundation for large-scale application of solid-state batteries in low-altitude aircraft, power systems, and robotics, with the market expected to exceed 100 GWh by 2030 [1]

Group 2: Orders and Market Recovery
- In November 2024, CATL and Leading Intelligent signed an agreement to further expand their cooperation, particularly in core battery-cell equipment investment, with CATL committing to direct 50% of new investment to Leading Intelligent on a priority basis [2]
- From 2022 to 2024, despite high expectations, actual related transactions declined, but orders are expected to rebound in Q1 2025 to near 2022-2023 levels, indicating gradually improving order conditions [2]
- According to company forecasts, orders in 2025 are expected to grow 20%-30% to 24-26 billion, pointing to a recovery in Leading Intelligent's order book [2]

Group 3: VMware Pricing Controversy
- Following Broadcom's $69 billion acquisition of VMware, VMware implemented sweeping reforms, most notably bundling its products into the VMware Cloud Foundation (VCF) subscription suite and eliminating the previous perpetual licensing model [3]
- Many users report that the reform sharply increased VMware licensing costs, with some seeing prices 8 to 15 times higher than when they purchased individual products such as vSphere or vSAN [3]
- Broadcom responded that this is not merely a price increase but a way to help users unlock greater value, arguing that many customers overlook the comprehensive management, security, and automation features VCF provides [3]
- According to Broadcom's report, 53% of global enterprises rank deploying private clouds as a priority IT task in the coming years, while 69% are evaluating the feasibility of migrating some workloads back to on-premises environments [3]
- IDC's survey indicates that most enterprises maintain a hybrid architecture, with about 60% preferring on-premises IT systems for core workloads and less than 2% opting for full public cloud adoption [3]

Group 4: Technology and Market Competition in Robotaxi
- Pony.ai's CTO recently stated that, apart from Waymo, Pony.ai, and Baidu, other companies lag two and a half years behind in scaling and automation, while WeRide's CFO publicly pushed back, emphasizing WeRide's progress in real-world deployment [4]
- According to Grand View Research, the global Robotaxi market is projected to grow from $1.95 billion in 2024 to $43.76 billion by 2030, while Tianfeng Securities predicts it could reach 834.9 billion by 2030 [4]
- Pony.ai's technology emphasizes redundancy and safety, using multi-sensor fusion of LiDAR, cameras, and millimeter-wave radar, and continuously refining its algorithms through a "shadow mode" [4]
- Pony.ai's fleet covers core areas of major cities, with plans to expand to 1,000 vehicles by the end of 2025; passenger fare revenue has grown 800% year-on-year [4]
- WeRide listed on NASDAQ on October 25, 2024, earning the title of "first Robotaxi stock," with a first-day closing market value of $4.491 billion [4]
- This capital race reflects the strategic stakes behind the technology and market rivalry: the company that gains an early advantage in the Robotaxi market will secure a favorable position in future battles for market share [4]
- From 2022 to 2024, Pony.ai's cumulative R&D investment reached $517 million (approximately 3.717 billion RMB), while WeRide's R&D expenses totaled 2.908 billion RMB over the same period [4]
- Despite Pony.ai's slightly higher R&D investment, WeRide leads significantly in patent accumulation, having filed 921 patents versus Pony.ai's 93 [4]
- From 2022 to 2024, Pony.ai's main revenue figures were $68.39 million, $71.90 million, and $75.03 million, while WeRide's revenue over the same period was 528 million RMB, 402 million RMB, and 250 million RMB, a significant decline for WeRide [4]
- Both companies remain financially sound, but WeRide faces declining operating cash flow, while Pony.ai has seen a significant decline in investing cash flow [4]
Large Model Special Topic: Research Report on Large Model Architecture Innovation
Sou Hu Cai Jing· 2025-06-06 11:38
Core Insights
- The report focuses on innovations in large model architectures, particularly addressing the limitations of the Transformer architecture and exploring industry pathways for improvement [1][2][7]
- As model sizes increase, the quadratic computational complexity of Transformer attention (O(n²)) causes significant power consumption and efficiency bottlenecks on long sequences, driving demand for innovative solutions [1][2][15]
- The industry is currently exploring two main paths to architectural breakthroughs: improvements to the Transformer architecture and exploration of non-Transformer architectures [1][2][7]

Transformer Architecture Improvements
- Improvements to the Transformer architecture focus on optimizing the Attention mechanism, the Feed-Forward Network (FFN) layers, and the normalization layers [1][2][18]
- Techniques such as sparse attention and dynamic attention are being developed to improve computational efficiency, while Mixture of Experts (MoE) aims to improve sparse-connection efficiency in FFN layers (see the sketch after this summary) [1][2][18]
- LongRoPE and related techniques enhance positional encoding to better model long sequences [1][2][18]

Non-Transformer Architecture Exploration
- Non-Transformer architectures include new types of RNNs (e.g., RWKV, Mamba) and CNNs (e.g., Hyena Hierarchy), as well as other innovative architectures such as RetNet and LFM [1][2][7]
- RWKV optimizes state evolution through a generalized Delta Rule, while Mamba leverages state space models to improve training efficiency [1][2][7]
- RetNet combines state space and multi-head attention to achieve parallel computation [1][2][7]

Industry Trends and Future Directions
- The industry is trending toward hybrid architectures that combine linear Transformers with non-Transformer architectures, balancing performance and efficiency [2][7]
- The current phase is characterized by a peak in the traditional Transformer paradigm and an impending wave of architectural innovation, with significant focus on new RNN/CNN theoretical breakthroughs and practical engineering optimizations [2][7]
- Companies such as ByteDance and Alibaba are accelerating investment in hybrid architectures, pushing large models toward higher efficiency and lower energy consumption [2][7]
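To make the MoE idea mentioned above concrete, here is a minimal top-k routing sketch. The 2-layer ReLU experts, top-2 softmax gating, and random weights are illustrative assumptions, not details from the report; the point is only that each token activates a few experts instead of one dense FFN.

```python
# Toy Mixture-of-Experts routing sketch (assumed: top-2 gating, ReLU MLP experts).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 64, 256, 8, 2

# One small MLP per expert; weights are random placeholders.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating weights

def moe_layer(x):
    """x: (n_tokens, d_model) -> (n_tokens, d_model), each token routed to top_k experts."""
    logits = x @ router                                    # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]       # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate = probs[t, top_idx[t]]
        gate = gate / gate.sum()                           # renormalize over selected experts
        for g, e in zip(gate, top_idx[t]):
            w1, w2 = experts[e]
            out[t] += g * (np.maximum(x[t] @ w1, 0) @ w2)  # ReLU MLP expert
    return out

tokens = rng.standard_normal((16, d_model))
print(moe_layer(tokens).shape)  # (16, 64)
```

Because only top_k of n_experts MLPs run per token, parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the sparse-connection efficiency the report attributes to MoE layers.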
Z Research | How Far Are We from the DeepSeek Moment for Agents? (AI Agent Series, Part 2)
Z Potentials· 2025-06-06 02:44
Core Insights
- The article discusses the evolution and differentiation of AI Agents, emphasizing the need to distinguish genuinely innovative companies from those merely capitalizing on the AI Agent concept [1][19][22]

Group 1: AI Agent Framework
- The operation of AI Agents is broken down into three layers: perception, decision-making, and execution, highlighting the importance of each layer to the overall functionality of AI Agents [10][14][15]
- The "white horse is not a horse" concept is introduced to analyze the diversity of AI Agents in the market, categorizing them into pure, neutral, and free forms based on their operational characteristics [17][18]

Group 2: Technological Evolution
- The article identifies the internalization of Agentic capabilities as a necessary evolution for LLMs, with OpenAI's o4-mini and Anthropic's Claude 4 showcasing different design philosophies [30][38]
- Engineering integration contributes increasingly to model capability, with tools such as Prompt Engineering revealing significant potential for Agent products [2][31]

Group 3: Multi-Agent Systems
- The limitations of Single-Agent systems are discussed, including memory constraints and the complexity of tool interactions, leading to the conclusion that Multi-Agent systems are becoming essential for overcoming these challenges [79][80]
- Multi-Agent architectures offer advantages in handling complexity, robustness, and scalability, allowing parallel exploration of solutions and better adaptability to human collaboration [82][83]

Group 4: Future Directions
- The article suggests that the future of AI Agents will involve a competition between "experience universality" and "deep reliability," with hybrid architectures likely becoming a common choice [40][41]
- The emergence of protocols such as MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol) is highlighted as a significant development in facilitating communication and tool integration among AI Agents [61][70]
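As a hypothetical illustration of the perception/decision/execution layering described above, a minimal agent loop might look like the sketch below. The interfaces, names, and the rule-based "policy" are invented for this example and do not come from the article or any specific product.

```python
# Toy agent loop: perceive -> decide -> execute, with tools as the execution layer
# and a pluggable decision function standing in for an LLM policy.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class AgentLoop:
    tools: Dict[str, Callable[[str], str]]              # execution layer
    decide: Callable[[List[str]], Tuple[str, str]]      # decision layer (e.g. an LLM call)
    memory: List[str] = field(default_factory=list)     # running context

    def perceive(self, observation: str) -> None:
        self.memory.append(f"observation: {observation}")

    def step(self, observation: str) -> str:
        self.perceive(observation)
        tool_name, tool_input = self.decide(self.memory)   # choose an action
        result = self.tools[tool_name](tool_input)          # act on the environment
        self.memory.append(f"action: {tool_name}({tool_input}) -> {result}")
        return result

# Toy usage: a rule-based policy stands in for the model.
def toy_policy(memory: List[str]) -> Tuple[str, str]:
    return ("echo", memory[-1])

agent = AgentLoop(tools={"echo": lambda s: s.upper()}, decide=toy_policy)
print(agent.step("user asked for a status report"))
```

A Multi-Agent system, in this framing, is several such loops whose decision layers exchange messages (e.g. over an A2A-style protocol) instead of one loop holding all memory and tools.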