CogVideoX

A Magic Brush for 3D: Sketch-Driven Free Editing of 3D Scene Videos
机器之心· 2025-08-19 02:43
Core Viewpoint
- The article presents Sketch3DVE, a novel method for 3D scene video editing that lets users manipulate videos with simple sketches, bringing greater creativity and personalization to video content creation [3][22].

Part 1: Background
- Recent video generation models have markedly improved text-to-video and image-to-video generation, with growing attention to precise control of camera trajectories given its broad application prospects [6].
- Existing methods fall into two categories: one feeds camera parameters directly into the model as inputs; the other builds an explicit 3D representation from a single image and renders novel-view images from it [8][9].
- Even so, editing real videos with significant camera motion remains challenging, because the edit must preserve the original motion patterns and local details while synthesizing new content [8][9].

Part 2: Algorithm Principles
- The user selects the first frame of a 3D scene video, marks the editing region with a mask, and draws a sketch specifying the geometry of the new object [12].
- The system applies the MagicQuill image editing algorithm to the first frame to produce the edited result, and runs the DUSt3R algorithm to reconstruct 3D geometry from the entire input video [13].
- A 3D mask propagation algorithm accurately transfers the first-frame mask to subsequent frames, keeping the edit region consistent across viewpoints [14].
- A final video generation model fuses the edited image, the multi-view renderings, and the original input video to produce a scene-edited video with precise 3D consistency [14]; a minimal pipeline sketch follows after this entry.

Part 3: Effect Demonstration
- The method produces high-quality 3D scene video edits, supporting operations such as adding, removing, and replacing objects while maintaining good 3D consistency [16].
- Because it is trained on real video datasets, the approach handles complex cases involving shadows and reflections and still produces plausible editing results [17].
- Users can also edit the first frame with image completion methods, demonstrating the system's versatility in generating realistic 3D scene video edits [19].
- Sketch3DVE offers an effective alternative to traditional model insertion, enabling personalized 3D object generation and high-fidelity scene video editing without requiring specialized expertise [22].
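To make the pipeline concrete, here is a minimal Python sketch of the four-stage flow summarized above. Every function in it (magicquill_edit, dust3r_reconstruct, propagate_mask_3d, video_diffusion_edit) is a hypothetical stub standing in for the component the article names; the real Sketch3DVE interfaces are not described in this summary, so treat this as an illustration of the data flow, not the actual implementation.

```python
# Hypothetical sketch of the Sketch3DVE editing flow; stubs only.
import numpy as np

def magicquill_edit(frame, mask, sketch):
    """Stand-in for MagicQuill: edit the masked region of the first
    frame according to the user's sketch."""
    edited = frame.copy()
    edited[mask] = sketch[mask]          # stand-in for generative editing
    return edited

def dust3r_reconstruct(frames):
    """Stand-in for DUSt3R: recover per-pixel point maps and camera
    poses from the input video."""
    n = len(frames)
    point_maps = np.zeros((n, *frames[0].shape[:2], 3))  # xyz per pixel
    poses = np.tile(np.eye(4), (n, 1, 1))                # camera-to-world
    return point_maps, poses

def propagate_mask_3d(mask0, point_maps, poses):
    """Lift the first-frame mask into 3D and reproject it into every
    frame so the edit region stays consistent across viewpoints.
    (A real system reprojects masked 3D points through each camera;
    here we simply broadcast the mask.)"""
    return [mask0.copy() for _ in point_maps]

def video_diffusion_edit(frames, masks, edited_first_frame):
    """Stand-in for the video generation model that fuses the edited
    first frame, the propagated masks, and the original video."""
    out = [f.copy() for f in frames]
    for f, m in zip(out, masks):
        f[m] = edited_first_frame[m]     # stand-in for 3D-consistent synthesis
    return out

def sketch3dve_pipeline(frames, mask0, sketch):
    edited0 = magicquill_edit(frames[0], mask0, sketch)
    point_maps, poses = dust3r_reconstruct(frames)
    masks = propagate_mask_3d(mask0, point_maps, poses)
    return video_diffusion_edit(frames, masks, edited0)

if __name__ == "__main__":
    video = [np.zeros((64, 64, 3), np.uint8) for _ in range(8)]
    mask = np.zeros((64, 64), bool)
    mask[16:48, 16:48] = True
    sketch = np.full((64, 64, 3), 255, np.uint8)
    result = sketch3dve_pipeline(video, mask, sketch)
    print(len(result), result[0].shape)   # 8 (64, 64, 3)
```

The point the sketch highlights is that it is the mask, not the edited pixels, that gets propagated in 3D: the per-frame masks come from reprojecting the first-frame edit region through the reconstructed cameras, which is what keeps the edit consistent as the viewpoint moves.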
AI-Generated Videos Keep Violating the Laws of Physics? PhyT2V, New Work from a University of Pittsburgh Team, Boosts Physical Realism 2.3x Without Retraining the Model!
机器之心· 2025-05-19 04:03
Core Viewpoint
- The article traces the advance of text-to-video (T2V) generation, emphasizing the shift from pursuing visual quality alone to ensuring physical consistency and realism, via the PhyT2V framework, which enhances existing T2V models without retraining or extensive external data [2][3][26].

Summary by Sections

Introduction to PhyT2V
- PhyT2V, developed by a research team at the University of Pittsburgh, improves the physical consistency of T2V generation by using large language models (LLMs) for iterative self-refinement [2][3][8].

Current State of T2V Technology
- Recent T2V models such as Sora, Pika, and CogVideoX have made significant progress in generating complex, realistic scenes, but they still struggle to obey real-world physical rules and common sense [5][7].

Limitations of Existing Methods
- Current methods for enhancing T2V models rely on data-driven training or fixed physical categories, which limits their generalizability, especially in out-of-distribution scenarios [10][12][18].

PhyT2V Methodology
- PhyT2V runs a three-step iterative loop (sketched in code after this entry):
  1. Identify the physical rules and main objects implied by the user prompt [12].
  2. Detect semantic mismatches between the generated video and the prompt using a video captioning model [13].
  3. Generate a corrected prompt from the identified physical rules and mismatches [14][18].

Advantages of PhyT2V
- It requires no changes to model structure and no additional training data, making it easy to adopt [18].
- It closes a feedback loop, correcting prompts based on the actually generated results rather than the prompt alone [18].
- It transfers well across domains, particularly across varied physical scenarios [18].

Experimental Results
- Tested on multiple T2V models, the framework delivers significant gains in physical consistency (PC) and semantic adherence (SA) scores, with CogVideoX-5B improving by up to 2.2x in PC and 2.3x in SA [23][26].

Conclusion
- PhyT2V is a novel, data-independent approach that makes generated videos comply with real-world physical principles without additional retraining, a significant step toward more realistic T2V models [26].
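The three-step loop is simple enough to outline in code. Below is a minimal sketch under the assumption that the T2V model, the video captioner, and the LLM are all available as plain callables; the function name phyt2v_refine, the prompt wording, and the fixed round count are illustrative assumptions, not the paper's actual prompts or interfaces.

```python
# Hypothetical sketch of PhyT2V's iterative prompt-refinement loop.

def phyt2v_refine(prompt, t2v_model, captioner, llm, rounds=3):
    """Iteratively refine a T2V prompt for physical consistency."""
    for _ in range(rounds):
        # Step 1: extract the main objects and the physical rules the
        # scene should obey, using the LLM as a reasoner.
        rules = llm(f"List the main objects and the physical rules that "
                    f"should govern this scene: {prompt}")
        # Generate a candidate video and caption what it actually shows.
        video = t2v_model(prompt)
        caption = captioner(video)
        # Step 2: detect semantic mismatches between caption and prompt.
        mismatch = llm(f"Prompt: {prompt}\nCaption: {caption}\n"
                       f"Describe any mismatch between them.")
        # Step 3: rewrite the prompt using the rules and the mismatches.
        prompt = llm(f"Rewrite the prompt so the video obeys these rules "
                     f"({rules}) and fixes these issues ({mismatch}):\n"
                     f"{prompt}")
    return prompt, t2v_model(prompt)

if __name__ == "__main__":
    # Trivial stand-ins so the sketch executes end to end.
    echo_llm = lambda q: q[-60:]
    dummy_t2v = lambda p: f"<video for: {p[:40]}>"
    dummy_cap = lambda v: "a ball floats upward"
    final_prompt, video = phyt2v_refine("a ball dropped from a table",
                                        dummy_t2v, dummy_cap, echo_llm,
                                        rounds=1)
    print(final_prompt[:80], video)
```

Note the feedback direction: the correction is driven by a caption of the video the model actually produced, not by the prompt alone, which is what distinguishes this loop from one-shot prompt rewriting.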
ICML 2025 | Lossless 2x Speedup for Video Generation Models; the Secret Is Exploiting the Spatio-Temporal Sparsity of Attention
机器之心· 2025-05-07 07:37
Core Viewpoint
- The article covers the rapid advance of AI video generation technology and introduces Sparse VideoGen, which significantly accelerates video generation without compromising quality [1][4][23].

Group 1: Performance Bottlenecks in Video Generation
- State-of-the-art video generation models such as Wan 2.1 and HunyuanVideo face severe performance bottlenecks: generating a 5-second 720p video takes over 30 minutes on a single H100 GPU, with the 3D Full Attention module consuming over 80% of inference time [1][6][23].
- The computational cost of attention in Video Diffusion Transformers (DiTs) grows quadratically with resolution and frame count, limiting real-world deployment [6][23].

Group 2: Introduction of Sparse VideoGen
- Sparse VideoGen is a novel acceleration method that requires no retraining of existing models; it exploits spatial and temporal sparsity in attention to halve inference time while preserving high pixel fidelity (PSNR = 29) [4][23].
- It has been integrated with several state-of-the-art open-source models and supports both text-to-video (T2V) and image-to-video (I2V) tasks [4][23].

Group 3: Key Design Features of Sparse VideoGen
- Sparse VideoGen identifies two distinct sparsity patterns in attention maps: spatial sparsity, where heads attend to tokens within the same and adjacent frames, and temporal sparsity, where heads attend to the same spatial location across frames [10][11][12].
- A dynamic, adaptive sparse strategy based on online profiling selects the best combination of spatial and temporal heads per denoising step and prompt [16][17]; a sketch of both ideas follows after this entry.

Group 4: Operator-Level Optimization
- A hardware-friendly layout transformation reorders tokens so that those accessed by temporal heads are stored contiguously in memory, improving memory access patterns [20][21].
- Further optimizations to Query-Key Normalization (QK-Norm) and Rotary Position Embedding (RoPE) deliver average speedups of 7.4x and 14.5x, respectively [21].

Group 5: Experimental Results
- Sparse VideoGen cuts HunyuanVideo inference from roughly 30 minutes to under 15 minutes and Wan 2.1 from 30 minutes to 20 minutes, while keeping PSNR above 29 dB [23].
- The results suggest that understanding the internal structure of video generation models may yield more sustainable performance breakthroughs than simply scaling model size [24].
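The two sparsity patterns and the online-profiling idea can be illustrated in a few lines of NumPy. In this sketch, the mask construction, the query-sampling scheme, and the mean-absolute-error metric are all illustrative assumptions; the real Sparse VideoGen implementation runs as fused GPU attention kernels, not dense masked matmuls.

```python
# Hypothetical sketch of spatial/temporal attention sparsity and
# online-profiling head classification.
import numpy as np

def spatial_mask(frames, tokens_per_frame):
    """Allow attention only within the same and adjacent frames."""
    n = frames * tokens_per_frame
    fid = np.arange(n) // tokens_per_frame        # frame index per token
    return np.abs(fid[:, None] - fid[None, :]) <= 1

def temporal_mask(frames, tokens_per_frame):
    """Allow attention only to the same spatial location across frames."""
    n = frames * tokens_per_frame
    pid = np.arange(n) % tokens_per_frame         # spatial index per token
    return pid[:, None] == pid[None, :]

def attention(q, k, v, mask=None):
    """Plain (optionally masked) softmax attention for one head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def classify_head(q, k, v, masks, n_samples=32, seed=0):
    """Online profiling: on a few sampled queries, compare each sparse
    pattern's output against full attention and keep the closest one."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(q), size=min(n_samples, len(q)), replace=False)
    ref = attention(q[idx], k, v)
    errs = {name: np.abs(attention(q[idx], k, v, m[idx]) - ref).mean()
            for name, m in masks.items()}
    return min(errs, key=errs.get)

if __name__ == "__main__":
    F, P, D = 8, 16, 32                   # frames, tokens/frame, head dim
    rng = np.random.default_rng(1)
    q, k, v = (rng.standard_normal((F * P, D)) for _ in range(3))
    masks = {"spatial": spatial_mask(F, P), "temporal": temporal_mask(F, P)}
    print("head classified as:", classify_head(q, k, v, masks))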
Zhipu and Shengshu Technology Form a Strategic Partnership: Advancing the Technical Innovation and Industrial Deployment of Domestic Large Models
IPO早知道· 2025-04-27 12:38
Two star AI companies with Tsinghua roots.

This article is an original by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao.

According to IPO早知道, the two Tsinghua-rooted star AI companies recently reached a strategic partnership: Zhipu (Z.ai) and Shengshu Technology (shengshu.com) announced that, drawing on their respective technical strengths in large language models and multimodal generation models, they will join forces across joint R&D, product integration, solution bundling, and industry collaboration to jointly advance the technical innovation and industrial deployment of domestic large models.

Under the strategic agreement, on the product side, Zhipu's MaaS platform will integrate Shengshu Technology's Vidu API ...

On joint R&D: Zhipu independently develops the GLM model family, with leading technology in both language and multimodal models; its open-source video generation model CogVideoX has earned more than 10,000 stars on GitHub. Shengshu focuses on independently developed general-purpose multimodal models, offering leading video generation and multimodal generation products.

The strategic alliance of Zhipu and Shengshu, two leading Tsinghua-rooted Chinese AI companies, combines their accumulated strengths in the multimodal field. It can not only further raise the overall capability and leadership of domestic large models, but also foster healthy innovation synergy and prosperity across the domestic large-model ecosystem.
Zhipu Officially Launches Its A-Share IPO: B2B and B2C Businesses Advancing Together, and Another Top-Performing Model Open-Sourced Today
IPO早知道· 2025-04-15 01:18
The first "large-model startup" to formally begin the IPO process.

This article is an original by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao.

According to IPO早知道, Beijing Zhipu Huazhang Technology Co., Ltd. ("Zhipu") signed an advisory agreement with CICC on March 31, 2025, formally starting its A-share IPO process. This makes Zhipu the first "large-model startup" to formally begin a listing process.

Founded in 2019, Zhipu is dedicated to building a new generation of cognitive large models. As early as the end of 2020, Zhipu developed the GLM pretraining architecture; in 2021 it finished training the 10-billion-parameter GLM-10B and, in the same year, used an MoE architecture to successfully train a converged trillion-parameter sparse model; in 2022 it developed and open-sourced GLM-130B, a Chinese-English bilingual pretrained model at the hundred-billion-parameter scale. In 2023, Zhipu released the hundred-billion-parameter-base dialogue model ChatGLM and upgraded it twice; the open-source ChatGLM-6B made local fine-tuning and deployment practical for large-model developers.

In January 2024, Zhipu released the new-generation foundation model GLM-4, with overall performance greatly improved over the previous generation; in June it open-sourced GLM-4-9B and the vision model GLM-4V-9B, whose multimodal capability rivals GPT-4V; in July it released the video generation model CogVideoX, with inference speed ...
Exclusive | Tsinghua Heavyweight Just Raised RMB 3 Billion
投资界· 2024-12-17 00:39
Author | 刘博 (Liu Bo)  Report | 投资界PEdaily

This may be the last mega-financing of 2024.

投资界 (PEdaily) has exclusively learned that Zhipu AI recently completed a new round of financing of RMB 3 billion. According to several people familiar with the matter, the new investors include multiple strategic and state-owned institutions, while existing shareholders such as Legend Capital (君联资本) continued to follow on.

Over the past year, domestic AI financings have come one after another. Founded in 2019, Zhipu AI is backed by a group of Tsinghua heavyweights: CEO Zhang Peng (张鹏) earned his bachelor's, master's, and doctoral degrees at Tsinghua, and chairman Liu Debing (刘德兵) and president Wang Shaolan (王绍兰) are fellow Tsinghua alumni. In just five years, Zhipu AI has become one of the flagship domestic AI companies, assembling a long roster of investors behind it.

Although China's primary market has been unusually quiet this year, AI financing has remained vigorous, with rounds of hundreds of millions of yuan commonplace, producing a cohort of AI super-unicorns: Moonshot AI (月之暗面), Baichuan Intelligence (百川智能), MiniMax, 01.AI (零一万物)... This is without doubt one of the most vivid snapshots of China's AI era.

Domestic AI is booming.

Tsinghua alumni join forces to build a Chinese version of OpenAI

This is a unicorn that emerged from a Tsinghua laboratory.

Rewind to 2006, when the Knowledge Engineering Group (KEG Lab) of Tsinghua's Department of Computer Science released the AMiner platform, which applies AI methods to mine the objective laws governing progress in the natural sciences and technology. Zhang Peng, after completing his undergraduate degree at Tsinghua in 2002, had joined the KEG Lab as a master's student ...