Moonshot AI's Kimi takes in more in nearly 20 days than in all of last year, becoming China's fastest company to reach decacorn status
Xin Lang Cai Jing· 2026-02-23 22:26
IT Home reported on February 23 that, according to Zhitong Finance, after cumulative fundraising of more than US$1.2 billion (IT Home note: roughly RMB 8.293 billion at current exchange rates), large-model unicorn Moonshot AI (Kimi) has secured the largest financing in the large-model industry in nearly a year, and set the record among domestic companies for the fastest rise from founding to decacorn status (valuation above US$10 billion).

Per IT Home's earlier coverage, on January 27 this year Moonshot AI announced and open-sourced its latest model, Kimi K2.5, and the K2.5 version of the Kimi assistant went live at the same time; the K2 model in the official chat interface was automatically switched to K2.5. Moonshot AI described K2.5 as its most intelligent model to date, achieving open-source state-of-the-art results on agent, coding, image, video, and a range of general intelligence tasks. It is also Kimi's most versatile model so far, built on a native multimodal architecture that supports visual and text inputs, thinking and non-thinking modes, and both dialogue and agent tasks.

On February 17, just over a month after completing its previous US$500 million round (roughly RMB 3.456 billion at current rates), Moonshot AI's new round of more than US$700 million (roughly RMB 4.838 billion) was about to close, led jointly by existing shareholders including Alibaba, 5Y Capital, and Jiuan ...
Moonshot AI's Kimi releases and open-sources the K2.5 model
Ren Min Wang· 2026-02-02 01:21
Core Insights
- Kimi has launched its next-generation open-source model, Kimi K2.5, which achieved the best performance among open-source models on global evaluations such as HLE, BrowseComp, and DeepSearchQA, making it Kimi's most intelligent model to date [1]

Group 1: Model Features
- Kimi K2.5 is designed on a native multimodal architecture that supports both visual and text inputs, integrating capabilities such as visual understanding, reasoning, programming, and agent functionalities into a single model [1]
- Kimi founder Yang Zhilin stated that the company has restructured its reinforcement-learning infrastructure and optimized the training algorithms to ensure maximum efficiency and performance [1]

Group 2: New Functionalities
- The development team has introduced an "Agent Cluster" feature in K2.5, allowing the model to autonomously create "avatars" that form teams with different roles and work in parallel, significantly improving the efficiency of complex task processing in large-scale search scenarios compared with single-agent execution [1]
- Kimi K2.5 also launches a new programming product, Kimi Code, which runs directly in terminals and integrates with mainstream editors such as VSCode, Cursor, and Zed. The product leverages K2.5's multimodal advantages, enabling developers to supply images and videos as input for programming assistance, thereby simplifying the programming process and lowering technical barriers [1]
Moonshot AI's Kimi releases new model and updates its paid subscription model
Xin Jing Bao· 2026-01-27 11:37
Core Insights
- Kimi has released and open-sourced the Kimi K2.5 model, which it describes as its most intelligent and versatile model to date [1]

Group 1: Model Features
- Kimi K2.5 features a breakthrough in multimodal capabilities, supporting both visual and text inputs, thinking and non-thinking modes, and dialogue and agent tasks [1]
- The model improves the code quality of open-source models, particularly in front-end development, allowing users to generate complete front-end interfaces from simple natural-language dialogue [1]
- Kimi K2.5 can automatically deconstruct the interaction logic of an uploaded screen recording and reproduce it in code, lowering programming barriers [1]
- The model has evolved from a single agent to an agent cluster, capable of dispatching up to 100 avatars to handle 1,500 steps simultaneously, with a main agent overseeing the final results [1]

Group 2: Usage Modes and Commercialization
- Kimi K2.5 introduces four distinct modes: K2.5 Quick for rapid responses, K2.5 Thinking for multi-round search and complex question answering, K2.5 Agent for interpreting various document types, and K2.5 Agent Cluster for mass searches, long-form writing, and batch processing [2]
- The update also revises Kimi's membership rights, clarifying its commercialization model: free users receive limited access to deep research and other services, while paid members enjoy different service levels according to their subscription tier [2]
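The agent-cluster pattern described above (a main agent dispatching many parallel "avatars" and consolidating their output) can be sketched in a few lines. This is a generic illustration under stated assumptions, not Kimi's actual implementation: the `Task`, `run_avatar`, and `main_agent` names are hypothetical, and a real avatar would call a model API rather than return a placeholder string.

```python
# Hypothetical sketch of an "agent cluster" dispatch loop: sub-tasks fan out
# to parallel avatars, and a main agent merges the results at the end.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Task:
    role: str      # e.g. "searcher", "writer"
    query: str

def run_avatar(task: Task) -> str:
    # Stand-in for one avatar executing its sub-task (search, draft, etc.).
    return f"[{task.role}] result for: {task.query}"

def main_agent(tasks: list[Task], max_avatars: int = 100) -> str:
    # Dispatch sub-tasks in parallel, capped at the avatar limit.
    with ThreadPoolExecutor(max_workers=min(max_avatars, len(tasks))) as pool:
        results = list(pool.map(run_avatar, tasks))
    # Main agent's final consolidation step over all avatar outputs.
    return "\n".join(results)

tasks = [Task("searcher", "Kimi K2.5 benchmarks"), Task("writer", "summary draft")]
print(main_agent(tasks))
```

The key design point is that avatars are independent, so throughput scales with the number of workers, while the main agent remains the single point that reviews and assembles the final answer.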
The image tokenizer revolts! Huawei's Selftok: an autoregressive core seamlessly unifies diffusion models, enabling autonomous reasoning over pixels
Ji Qi Zhi Xin· 2025-05-17 06:00
Core Viewpoint
- The article discusses a breakthrough by Huawei's Pangu multimodal generation team in transforming visual data into discrete tokens, aiming to replicate the success of large language models (LLMs) in the visual domain through a novel approach called Selftok [1][5]

Group 1: Selftok Breakthrough
- Selftok integrates autoregressive (AR) principles into visual tokenization, allowing pixel streams to be converted into discrete sequences that adhere to causal relationships [1][3]
- The initial Selftok paper has been recognized as a Best Paper Candidate at CVPR 2025, highlighting its significance in the field [3]

Group 2: Industry Consensus and Challenges
- The current industry consensus is that LLMs face a language-data bottleneck, while non-language data such as images and videos holds significant development potential [5]
- A unified multimodal architecture is seen as key to unlocking stronger emergent capabilities in AI, with the main challenge being the conversion of continuous visual signals into discrete tokens [5]

Group 3: Advantages of Discrete Visual Tokens
- The article argues for abandoning spatial priors in favor of discrete visual tokens, which can maintain high accuracy while avoiding the pitfalls of continuous representations [6]
- Continuous representations are criticized for poor prediction stability, added complexity in reinforcement learning, and limited decoupling capabilities [6]

Group 4: Selftok Architecture
- Selftok's architecture consists of an encoder, quantizer, and decoder, using a dual-stream structure to enhance computational efficiency while maintaining reconstruction quality [18][20]
- The quantizer employs a unique mechanism to address traditional training imbalances, achieving a unified treatment of diffusion processes and autoregressive modeling [20]
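The encoder → quantizer → decoder pipeline described in Group 4 can be illustrated with a minimal toy example. This is not Selftok's actual design (which unifies an AR prior with diffusion and uses a dual-stream structure); it only shows the generic mechanism by which continuous features become discrete token indices via nearest-codebook lookup, as in VQ-style tokenizers. All names and dimensions here are illustrative.

```python
# Toy encoder -> quantizer -> decoder: continuous patch features are mapped
# to discrete codebook indices (the "visual tokens") and back.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 discrete tokens, 4-dim embeddings

def encode(pixels: np.ndarray) -> np.ndarray:
    # Stand-in encoder: treat each row as one patch's 4-dim feature vector.
    return pixels.reshape(-1, 4)

def quantize(features: np.ndarray) -> np.ndarray:
    # Map each feature to the index of its nearest codebook entry.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)       # the discrete token sequence

def decode(tokens: np.ndarray) -> np.ndarray:
    # Stand-in decoder: look the embeddings back up from the codebook.
    return codebook[tokens]

image = rng.normal(size=(8, 4))       # 8 "patches" of a toy image
tokens = quantize(encode(image))
recon = decode(tokens)
print(tokens.shape, recon.shape)      # (8,) (8, 4)
```

What distinguishes Selftok from this plain VQ sketch, per the article, is that the token sequence is constructed to respect causal (autoregressive) order rather than spatial layout, so downstream AR models can treat visual tokens like language tokens.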
Group 5: Training and Optimization
- Selftok's pre-training phase aligns multimodal data inputs to transition from an LLM to a visual-language model (VLM) [24]
- The model is then optimized with reinforcement-learning (RL) algorithms, using two types of reward functions designed to evaluate the generated images' attributes and spatial relationships [25][27]

Group 6: Experimental Results
- Selftok achieves state-of-the-art (SoTA) results on various reconstruction metrics on ImageNet, demonstrating superior detail preservation compared with other tokenizers [28]
- In benchmark evaluations, Selftok-Zero significantly outperformed models such as GPT-4o, achieving a score of 92 on the GenEval benchmark [29][30]
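The two reward types mentioned in Group 5 (one scoring generated-image attributes, one scoring spatial relationships) can be sketched as a weighted training signal. The scoring functions below are placeholders of my own, not the paper's actual rewards; in practice each would be computed by a checker model over the generated image.

```python
# Hedged sketch: combining an attribute reward and a spatial-relation reward
# into a single scalar used as the RL training signal.
def attribute_reward(pred: dict, target: dict) -> float:
    # Fraction of requested attributes (color, count, ...) that match.
    keys = target.keys()
    return sum(pred.get(k) == target[k] for k in keys) / len(keys)

def spatial_reward(pred_layout: dict, target_layout: dict) -> float:
    # 1.0 if the predicted relation (e.g. "cat left-of dog") matches, else 0.0.
    return float(pred_layout == target_layout)

def total_reward(pred, target, pred_layout, target_layout, w: float = 0.5) -> float:
    # Weighted sum of the two reward types.
    return w * attribute_reward(pred, target) + (1 - w) * spatial_reward(pred_layout, target_layout)

r = total_reward({"color": "red", "count": 2}, {"color": "red", "count": 3},
                 {"rel": "left-of"}, {"rel": "left-of"})
print(r)  # 0.75
```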