Computing Power Costs
12 Billion Yuan Pours In in a Single Day, Up 40% to Open the Year! Are AI-Sector Active Equity Funds Raking It In?
Sohu Caijing · 2026-01-14 11:05
Core Viewpoint
- The inflow of over 12 billion yuan into the Debon Stable Growth Fund on January 12 has drawn market attention, pointing to a potential rapid expansion of the fund's management scale from 724 million yuan as of September last year [1][2]

Fund Management Response
- Debon Fund stated that the exact scale data for the day would be confirmed after end-of-day settlement, and announced subscription limits for A and C class shares starting January 14, set at 100,000 yuan and 10,000 yuan respectively [1][2]

Performance of AI Sector
- The Debon Stable Growth Fund's net value has risen 29.48% since the beginning of 2025, with its top ten holdings concentrated in the AI application sector [2][4]
- Other funds heavily invested in AI applications, such as Shenwan Lixin Ledao and Xibu Lide Technology Innovation, have reported net value gains of nearly 40% and over 30% respectively [2][3]

Fund Limitations and Market Trends
- In response to rapid inflows, several fund companies have imposed subscription limits on high-performing AI-themed products, including Yongying Fund's limit of 1 million yuan per account for the Yongying Information Industry Select Fund [4]
- The trend of limiting subscriptions extends across funds focused on information transmission, software, and IT services [4]

Future Investment Opportunities
- Investment managers are focusing on the evolution of large models and the technologies that improve model capability and efficiency, particularly around AI applications and data advantages [5]
DeepSeek Open-Sources Engram: How Does It Keep Inference Loss to Just 3%?
Tai Mei Ti APP (TMTPost) · 2026-01-13 08:44
Core Insights
- DeepSeek has launched a new module called Engram, which adds conditional memory to large language models, aiming to improve efficiency and reduce computational costs [1][4]
- The company emphasizes innovation in architecture and methodology to break through the constraints of computational cost, with Engram restructuring memory storage at the architectural level [4][6]

Group 1: Engram Module
- Engram is designed as a differentiable, trainable component that separates the memory load from the main computation, allowing efficient retrieval of frequently occurring knowledge [4][6]
- The module uses deterministic retrieval based on N-grams and hash mapping to look up vectors in a large static embedding table, which is far faster than running complex neural computations [4][6]

Group 2: Memory Functionality
- Engram incorporates a lightweight gating mechanism that decides how appropriate the retrieved memory is for the current context, improving both memory retention and output coherence [6]
- The architecture divides the model's capabilities into three independent yet collaborative dimensions: model depth for logical reasoning, computational sparsity represented by MoE, and storage sparsity introduced by Engram [6][7]

Group 3: Performance and Future Developments
- Testing indicates that even with a memory bank of up to 100 billion parameters, the inference throughput loss stays below 3% [7]
- DeepSeek plans to release its latest V4 model around the Chinese New Year, which is expected to significantly improve performance on complex tasks and coding, potentially surpassing competitors such as Anthropic [7]
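The retrieval path described above — hash the trailing N-gram, index a static table, then gate the result — can be sketched in a few lines. This is a toy illustration only, not DeepSeek's implementation: the table size, embedding width, hash choice, and scalar gate are all invented for the example.

```python
import hashlib
import math
import random

random.seed(0)

EMBED_DIM = 8      # toy vector width
TABLE_SIZE = 1024  # toy stand-in for a memory bank that could scale to ~100B parameters

# Static embedding table: hash bucket -> stored memory vector
memory_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                for _ in range(TABLE_SIZE)]

def ngram_bucket(tokens, n=2):
    """Deterministically map the trailing n-gram to a table slot via hashing."""
    key = "\x01".join(tokens[-n:]).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SIZE

def gate(score):
    """Lightweight sigmoid gate: how much of the retrieved memory to blend in."""
    return 1.0 / (1.0 + math.exp(-score))

def retrieve(tokens, score):
    """Look up the n-gram's memory vector and scale it by the gate."""
    vec = memory_table[ngram_bucket(tokens)]
    g = gate(score)
    return [g * v for v in vec]
```

The point of the design is visible even in the toy: the lookup is a hash plus an array index, with no matrix multiply on the retrieval path, which is why a very large memory bank can cost so little inference throughput.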
"Godmother of AI" Unveils Her Latest World Model
Cailian Press · 2025-10-17 12:28
Group 1
- The article discusses the launch of RTFM (Real-Time Frame Model), a new real-time interactive 3D world model from World Labs, founded by AI expert Fei-Fei Li. The model is built around three principles: efficiency, scalability, and durability, allowing it to run on a single H100 GPU while rendering persistent, consistent 3D worlds [2]
- World Labs emphasizes that as world model technology advances, demand for computing power will rise sharply, surpassing the current requirements of large language models (LLMs). Achieving 4K, 60 FPS interactive video streaming would require traditional video architectures to generate over 100,000 tokens per second, which is economically unfeasible on current computing infrastructure [2]
- The article highlights a strategic partnership between OpenAI and Broadcom to deploy 10 gigawatts of AI accelerators, expected to give OpenAI a diversified computing power system, reducing reliance on a single supplier and driving down computing costs through competition [3]

Group 2
- The article notes the "Jevons Paradox": AI model advances that improve computing efficiency can raise total consumption of computing resources. For instance, the DeepSeek R1 model, released earlier this year, delivers strong AI performance but is expected to increase demand for computing resources [4]
- World Labs previously released the Marble model, which generates 3D worlds from a single image or text prompt, with improved geometric structure and more diverse styles than its predecessor. Fei-Fei Li has said the significance of world models lies in their ability to understand and reason about both textual information and the physical world's operating laws [4]
- Companies across the AI and device sectors are investing in world models: xAI has hired experts from NVIDIA, and competitors such as Meta and Google are also active in the area. In China, robotics firms such as Yushu (Unitree) and Zhiyuan have open-sourced their world models [4]

Group 3
- Dongwu Securities notes that as computing power becomes cheaper and more accessible, developers will set more complex models and systems as new benchmarks, increasing parameters, context, and parallelism. While model architecture iteration may reduce the computing power needed for a single inference or training run, video-generating models such as Genie3 may require a substantial increase in computing power to meet demand [5]
- The higher ceiling for AI computing power and an improved competitive landscape are expected to support a higher valuation framework for AI computing than 4G/5G enjoyed, along with a stronger Beta [5]
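The 100,000-tokens-per-second figure quoted above implies a per-frame token budget that quick arithmetic makes concrete. The per-frame split below is a back-of-envelope illustration, not a World Labs number:

```python
def tokens_per_frame(tokens_per_second: float, fps: float) -> float:
    """Token budget each frame must fit within at a given generation rate."""
    return tokens_per_second / fps

# World Labs' stated requirement: >100,000 tokens/s for 4K interactive video at 60 FPS,
# i.e. roughly 1,667 tokens generated per frame, every 16.7 ms
budget = tokens_per_frame(100_000, 60)
```

Sustaining that rate continuously is what makes the workload so much heavier than chat-style LLM serving, where generation is bursty and a few dozen tokens per second per user suffices.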
26-Day Countdown: OpenAI to Shut Down the GPT-4.5 Preview API
36Kr · 2025-06-18 07:34
Core Insights
- OpenAI announced the removal of the GPT-4.5 Preview API effective July 14, which will affect developers who have integrated it into their products [2][3]
- The removal had been planned since the release of GPT-4.1 in April; GPT-4.5 was always positioned as an experimental product [5]
- OpenAI is focusing on promoting more scalable and cost-effective models, as evidenced by the recent 80% price cut on the o3 API [8]

Pricing and Cost Considerations
- GPT-4.5 API pricing was steep at $75 per million input tokens and $150 per million output tokens, making it commercially unviable [6]
- The roughly $25,000 cost of NVIDIA H100 GPUs and their high power consumption further undermine the financial feasibility of keeping such models in service [6]

Strategic Implications
- GPT-4.5's rapid exit highlights model iteration speed and external computing costs as critical factors in OpenAI's business model [11]
- OpenAI's strategy appears to be consolidating resources behind models offering better scalability and cost control, while discontinuing less successful or ambiguously positioned products [8]
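At the listed rates, per-call cost adds up quickly. A minimal sketch of the arithmetic — the token counts in the example are hypothetical, only the two per-million prices come from the article:

```python
def gpt45_request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one GPT-4.5 API call at $75/M input and $150/M output tokens."""
    return input_tokens / 1e6 * 75.0 + output_tokens / 1e6 * 150.0

# A hypothetical call with 2,000 input and 1,000 output tokens:
cost = gpt45_request_cost_usd(2_000, 1_000)  # $0.15 + $0.15 = $0.30
```

At 30 cents for a single modest exchange, a product serving thousands of requests per day faces a bill few applications can absorb, which is the commercial-unviability point the article makes.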
A Conversation with Red Hat Global VP Cao Hengkang: As AI Costs Fall, Chip Volumes Are Bound to Rise
Mei Ri Jing Ji Xin Wen (National Business Daily) · 2025-06-14 09:02
Core Viewpoint
- The industry consensus is that the cost of computing power will eventually fall, but no unified path has been chosen among data centers, integrated machines, and inference servers [1]

Group 1: AI Inference Year
- 2025 is considered the year of AI inference, marking the official launch of AI applications that will generate business revenue and enable internal cost control for enterprises [1]
- Red Hat has adopted the vLLM framework, a high-performance large language model inference framework that has become a de facto standard in the open-source community [1]

Group 2: Contribution and Market Potential
- Contributors from China account for 35% of contributions to the vLLM community, indicating strong potential for inference technology to deliver enterprise value in China [1]
- The company identifies two technical challenges in inference: achieving high-performance inference with minimal hardware and cost, and distributing inference workloads across multiple servers [1]

Group 3: Future of Computing Power Costs
- Red Hat plans to launch inference servers in 2025, emphasizing that their main advantage is reducing computing power costs for enterprises [2]
- The company does not produce hardware but focuses on software solutions, aiming to lower the barriers to AI adoption for businesses [2]
- As computing costs fall, demand for GPU cards is expected to rise sharply, potentially growing the number of enterprises using AI from 1,000 to 100,000 or even 1 million [2]
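The second technical challenge named above — distributing inference workloads across multiple servers — can be illustrated with the simplest possible dispatch policy. This is a toy sketch with invented server names, not Red Hat's or vLLM's scheduler; production systems weigh live load, KV-cache locality, and request length rather than dealing requests out blindly:

```python
import itertools

def round_robin_dispatch(requests, servers):
    """Assign each inference request to the next server in turn."""
    assignment = {s: [] for s in servers}
    cycle = itertools.cycle(servers)
    for req in requests:
        assignment[next(cycle)].append(req)
    return assignment

servers = ["infer-0", "infer-1", "infer-2"]   # hypothetical inference servers
requests = [f"req-{i}" for i in range(7)]
plan = round_robin_dispatch(requests, servers)
# infer-0 serves 3 requests; infer-1 and infer-2 serve 2 each
```

Even this naive policy shows why distribution matters for cost: adding a server divides the per-machine load, so cheaper commodity GPUs can jointly serve traffic that would otherwise demand a single oversized box.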