Second Half of the Large-Model Elimination Race: What Is Jieyue Xingchen's Trump Card?
Huxiu APP · 2026-01-26 10:26
Core Viewpoint
- The article highlights the emergence of new players in the large model sector, focusing on Jieyue Xingchen, which has recently appointed a new chairman and secured over 5 billion yuan in Series B+ funding, a record for a single financing round in the past 12 months [2][3]

Group 1: Company Developments
- Jieyue Xingchen's recent funding is seen as a significant bet on the next phase of large model commercialization, especially as few companies have managed to secure substantial financing in the past year [2]
- The appointment of Inqi as chairman is expected to sharpen the company's strategic direction, given his extensive experience and connections within the industry [11][12]

Group 2: Business Model and Strategy
- Jieyue Xingchen distinguishes itself with an "AI + terminal" closed-loop model, integrating large model capabilities directly into hardware products such as cars and smartphones rather than merely selling models or APIs [5][9]
- The company has established partnerships with major automotive brands, embedding its technology into smart vehicle systems; the Galaxy M9 model sold nearly 40,000 units within three months of launch [5][7]
- In the smartphone sector, Jieyue Xingchen has collaborated with 60% of leading domestic brands, indicating a strong market presence [7]

Group 3: Competitive Landscape
- The article notes clear differentiation among companies in the large model space, with Jieyue Xingchen's path more focused than that of rivals concentrating on enterprise privatization or consumer applications [3][9]
- Competition is shifting from raw model capability to the ability to deploy models in real-world applications, a trend Jieyue Xingchen is positioned to capitalize on [19]

Group 4: Talent and Leadership
- The leadership team, including CEO Jiang Daxin and key figures such as Zhang Xiangyu and Zhu Yibo, brings deep experience in AI and systems engineering, which is crucial to the AI + terminal strategy [13][14][16]
- The combination of technical expertise and practical engineering experience is seen as a significant competitive advantage, enabling the company to translate AI capabilities into marketable products [16]
Liang Wenfeng's DeepSeek Publishes a New Paper! Taking the Baton from Kaiming He and ByteDance, It Reinforces AI's "Foundation" Once More
Xin Lang Cai Jing (Sina Finance) · 2026-01-02 05:27
Core Insights
- DeepSeek has introduced a new architecture called mHC (Manifold-Constrained Hyper-Connections), which significantly improves the residual connection component of the Transformer architecture, a foundational element that has seen little change since its inception in 2015 [1][3]

Group 1: Historical Context
- The evolution of neural network architectures began with ResNet, introduced by Kaiming He in 2015, which addressed the vanishing gradient problem and enabled the training of very deep networks [3]
- The Transformer model, released in 2017, adopted residual connections as a standard component, forming the basis of many leading models today [3]

Group 2: Technical Comparisons
- Hyper-Connections, proposed by ByteDance in 2024, expanded the single residual stream into multiple parallel streams, enhancing model performance but introducing stability issues during training [5][10]
- mHC resolves the stability problems of Hyper-Connections by constraining the connection weight matrix to a specific mathematical space, ensuring that signals are not amplified across layers [10][12]

Group 3: Mathematical Innovation
- The core innovation of mHC is requiring the connection weights to form a doubly stochastic matrix, which guarantees that the output never exceeds the maximum input value, preserving a form of energy conservation [10][12]
- The implementation uses the Sinkhorn-Knopp algorithm to enforce the doubly stochastic property efficiently, allowing end-to-end training without introducing new hyperparameters [11][12]

Group 4: Engineering Excellence
- DeepSeek's implementation of mHC demonstrates significant engineering capability, including custom CUDA kernels and operator fusion techniques to minimize the added computational overhead [16]
- The ability to integrate innovative mathematical solutions into practical, large-scale training environments highlights DeepSeek's competitive advantage in the AI research landscape [16]
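The instability attributed to unconstrained multi-stream residual connections can be illustrated with a toy NumPy sketch. This is a minimal, assumed illustration of the general idea, not ByteDance's or DeepSeek's actual code; the function names, shapes, and the `tanh` stand-in sublayer are all hypothetical. The point it shows: when the learned mixing matrix has row sums greater than 1, the residual signal grows at every layer.

```python
import numpy as np

def sublayer(x):
    # Stand-in for an attention/MLP sublayer (illustrative only)
    return np.tanh(x)

def residual_step(x):
    # Classic ResNet/Transformer residual: output = input + F(input)
    return x + sublayer(x)

def hyper_step(streams, mix):
    # streams: (n, d) — n parallel copies of the residual stream
    # mix: (n, n) — learned weights that remix the streams at each layer
    streams = mix @ streams
    return streams + sublayer(streams)

# If mix's rows sum to more than 1, the signal scale compounds layer by layer:
streams = np.ones((2, 4))
mix = np.full((2, 2), 1.0)  # each row sums to 2
for _ in range(5):
    streams = hyper_step(streams, mix)
print(np.abs(streams).max())  # grows well beyond the initial scale of 1
```

With a single residual stream (`residual_step`), the identity path passes the input through unscaled, which is exactly the property the multi-stream mixing matrix can break.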
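The doubly stochastic constraint can be sketched with Sinkhorn-Knopp iteration, which alternately normalizes rows and columns of a positive matrix. The code below is a minimal NumPy illustration of the algorithm's idea, under assumed names and an assumed exp-parameterization; it is not the paper's fused-kernel implementation. Because every row of the resulting nonnegative matrix sums to 1, each output stream is a convex combination of the input streams, so mixing cannot amplify the maximum signal magnitude.

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=50):
    """Map an unconstrained matrix to (approximately) doubly stochastic form."""
    m = np.exp(logits)  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

rng = np.random.default_rng(0)
mix = sinkhorn_knopp(rng.normal(size=(4, 4)))

# Rows and columns both sum (approximately) to 1 ...
print(mix.sum(axis=1), mix.sum(axis=0))

# ... so mixing streams with this matrix cannot amplify them:
streams = rng.normal(size=(4, 8))
assert np.abs(mix @ streams).max() <= np.abs(streams).max() + 1e-6
```

The iteration is differentiable (just normalizations), which is consistent with the article's claim that the constraint can be trained end to end without new hyperparameters.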