KTransformers
Live Stream Tomorrow Evening | Fine-Tuning a Trillion-Parameter Model with 2 GPUs + 2 CPUs: Hands-On with the Open-Source Project KTransformers
量子位· 2025-11-10 12:02
Core Insights
- The article discusses the KTransformers project, which allows for low-cost fine-tuning of large models using local resources, specifically highlighting the use of 2 GPUs and 2 CPUs to fine-tune the DeepSeek 671B and Kimi K2 1TB models [1][4].

Group 1: KTransformers Overview
- KTransformers is an open-source project that has garnered significant attention for its ability to run large models locally, appealing to users interested in personalized AI applications [2][4].
- The project aims to provide a cost-effective, high-performance solution for fine-tuning large models, which is crucial for the practical deployment of AI technologies [4].

Group 2: Key Contributors
- Professor Zhang Mingxing of Tsinghua University is a key advisor to the KTransformers project, with a strong background in computer systems and numerous publications at top-tier conferences [6].
- Li Peilin, a core participant in the KTransformers project, is currently studying at Northwestern Polytechnical University and will pursue a PhD at Tsinghua University; he contributes to the development of the project's fine-tuning technologies [9].

Group 3: Upcoming Events
- A live session is scheduled for the following evening at 19:00, inviting participants to engage in practical discussions about using KTransformers and its applications [5][10].
Two RTX 4090s Can Fine-Tune the Trillion-Parameter Kimi K2 Locally! Turing Technology (趋境), Tsinghua, and Beihang Smash the Compute Barrier
量子位· 2025-11-05 07:56
Core Insights
- The article discusses the significant reduction in the cost and complexity of fine-tuning large language models, enabling the use of consumer-grade GPUs for models like DeepSeek 671B and Kimi K2 1TB [1][5][12].

Group 1: Cost Reduction and Technological Advancements
- Fine-tuning large models previously required massive GPU resources, with models like Kimi K2 needing up to 2000 GB of VRAM, whereas now only 2-4 consumer-grade GPUs (e.g., RTX 4090) are sufficient [3][4].
- The key to this cost reduction comes from two domestic projects, KTransformers and LLaMA-Factory, which have made significant advancements in model training and fine-tuning [5][6][7].
- KTransformers allows for fine-tuning large models with significantly lower VRAM requirements, needing only around 90 GB for Kimi K2 and 70 GB for DeepSeek 671B [7][12].

Group 2: Performance and Efficiency
- KTransformers has been shown to outperform other frameworks in throughput and memory usage for fine-tuning tasks, making it a viable option for personal workstations [12][13].
- The integration of KTransformers with LLaMA-Factory simplifies the fine-tuning process, allowing users to manage data processing and training without extensive coding knowledge [9][30].

Group 3: Practical Applications and Customization
- The article highlights the potential for personalized AI models, enabling users to fine-tune models for specific styles or industry needs, thus democratizing access to advanced AI technologies [24][26].
- Companies can leverage KTransformers to create specialized AI models tailored to their business needs, enhancing efficiency and return on investment [27][28].

Group 4: Technical Innovations
- KTransformers employs innovative techniques such as offloading memory-intensive tasks to CPUs and integrating LoRA for efficient fine-tuning, significantly reducing the memory footprint of large models [36].
- The collaboration between KTransformers and LLaMA-Factory represents a strong synergy that enhances both performance and usability in the fine-tuning landscape [32][33].
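The Group 4 point pairs CPU offload with LoRA to shrink the GPU-side footprint. A back-of-envelope Python sketch of why that combination helps at trillion-parameter scale; every ratio and byte count here is an illustrative assumption of ours, not the project's published accounting:

```python
# Hedged sketch: GPU memory for full fine-tuning vs. LoRA + CPU offload.
# All ratios/byte counts below are illustrative assumptions.

def full_finetune_gpu_bytes(n_params, bytes_per_param=2,
                            optimizer_bytes_per_param=12):
    """Weights (bf16) plus gradients and Adam states all resident on GPU."""
    return n_params * (bytes_per_param + optimizer_bytes_per_param)

def lora_offload_gpu_bytes(n_params, active_ratio, rank_ratio,
                           bytes_per_param=2, optimizer_bytes_per_param=12):
    """Only the currently active experts' weights sit on GPU (the rest are
    offloaded to CPU RAM), and trainable state exists only for the small
    LoRA adapters (rank_ratio = adapter params / base params)."""
    active_weights = n_params * active_ratio * bytes_per_param
    adapter_state = n_params * rank_ratio * (bytes_per_param +
                                             optimizer_bytes_per_param)
    return active_weights + adapter_state

ONE_T = 1_000_000_000_000  # roughly Kimi K2 scale
full_gb = full_finetune_gpu_bytes(ONE_T) / 1e9
lora_gb = lora_offload_gpu_bytes(ONE_T, active_ratio=0.03,
                                 rank_ratio=0.001) / 1e9
print(f"full fine-tune: ~{full_gb:,.0f} GB; LoRA + offload: ~{lora_gb:,.0f} GB")
```

With these assumed ratios the GPU-side requirement drops from terabytes to tens of gigabytes, the same order as the ~70-90 GB figures the article reports.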
KTransformers Accepted at a Top Computer Systems Conference and Partnering with Mainstream Frameworks: Turing Technology & Tsinghua Make "Heterogeneous" the New Inference Paradigm
量子位· 2025-10-22 09:12
Core Insights
- KTransformers, an open-source project developed by Turing Technology and Tsinghua University's KVCache.AI team, focuses on system innovation during the inference phase of large models, enabling efficient operation on diverse hardware architectures with lower computational power [2][4].

Group 1: KTransformers Overview
- KTransformers is a high-performance heterogeneous inference framework that optimally utilizes various computing resources such as GPUs, CPUs, and memory [2].
- The project paper was recognized at the prestigious SOSP 2025 conference, highlighting its significance in the field of computer systems [2][4].

Group 2: Technical Innovations
- The framework introduces an "Expert Deferral" mechanism, allowing for efficient scheduling of experts in Mixture of Experts (MoE) models, which reduces computational load without sacrificing model performance [7][13].
- KTransformers achieves nearly 4x speedup on a single Intel Xeon processor compared to traditional PyTorch implementations, significantly enhancing CPU performance in expert calculations [12].
- The system allows for dynamic overlapping of CPU and GPU loads, resulting in a model throughput increase of approximately 1.45x with minimal impact on model accuracy [15][16].

Group 3: Collaboration and Ecosystem
- KTransformers has partnered with SGLang, a mainstream inference framework, to integrate full-GPU inference with heterogeneous inference, enhancing the overall architecture for large-model deployment [5][19].
- This collaboration enables developers to access both full-GPU and heterogeneous inference capabilities seamlessly, which is particularly beneficial in scenarios with limited GPU resources [21].

Group 4: Market Position and Future Directions
- KTransformers has gained significant traction in the developer community, with over 15.2K stars on GitHub, indicating its widespread adoption as a foundational framework for large-model inference [24].
- The project aims to democratize AI capabilities, making them accessible beyond elite computational paths, and is actively collaborating with various domestic CPU and GPU platforms to promote cost-effective solutions [28][29].
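The "Expert Deferral" mechanism described above can be pictured with a toy router: when a MoE layer routes a token to its top-k experts, the lowest-scoring of those k can be deferred a step so slower (e.g., CPU-side) expert work overlaps with GPU work. The function name and the defer-the-lowest policy below are our illustrative assumptions, not the paper's exact algorithm:

```python
# Toy sketch of expert deferral in MoE routing (illustrative, not the
# published KTransformers algorithm).

def route_with_deferral(gate_scores, k=2, defer=1):
    """Pick the top-k experts for one token, then mark the `defer`
    lowest-scoring of those k as deferrable: their computation can be
    shifted a step later so CPU and GPU expert work overlap."""
    # Rank all experts by gate score, highest first.
    order = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])
    topk = order[:k]
    # The strongest (k - defer) experts run immediately; the rest defer.
    immediate, deferred = topk[:k - defer], topk[k - defer:]
    return immediate, deferred

# One token routed over 4 experts with softmax-like gate scores:
scores = [0.05, 0.55, 0.10, 0.30]
now, later = route_with_deferral(scores, k=2, defer=1)
print(now, later)  # expert 1 runs immediately; expert 3 is deferred
```

The point of the real mechanism is that deferring low-weight experts barely changes the layer's output while smoothing the CPU/GPU load balance, which is how the article's ~1.45x throughput gain comes with minimal accuracy impact.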
2025 New Generation Computing Industry Conference Held, Focusing on Computing Power Standards and Technological Innovation
China News Network · 2025-09-17 08:59
Core Insights
- The 2025 New Generation Computing Industry Conference was held in Beijing, focusing on the standardization of computing power and paths for technological innovation [1][3].
- Key discussions covered the entire process of AI large-model data acquisition, preprocessing, training, fine-tuning, and inference, emphasizing the use of open-source foundational models for application value [3].

Group 1: Standardization and Innovation
- The conference highlighted the need for high-level planning, collaboration, and quality application in the construction of new-generation computing standards [3].
- The establishment of working groups for GPU, DPU, computing product components, liquid-cooling ecosystems, and heterogeneous computing was announced, along with the initiation of two national standards for server power supplies [4].

Group 2: Technical Challenges and Solutions
- The DPU was identified as a core chip for computing power, capable of handling data processing and network-forwarding tasks to enhance CPU and GPU efficiency, but the lack of unified technical standards hinders large-scale application [3].
- Two core technologies were introduced to address memory challenges in inference: Mooncake, which reduces memory consumption through shared public storage, and KTransformers, which enables CPU-GPU memory collaboration [3].
Promoting Open Collaboration and Cross-Disciplinary Integration: The 2025 CCF China Open Source Conference Opens in Shanghai
China News Network · 2025-08-02 13:15
Core Insights
- The 2025 CCF China Open Source Conference opened in Shanghai, focusing on key directions such as open-source large models and embodied intelligence [1][3].
- Experts from academia and industry shared forward-looking views on critical technology areas including large models, open-source hardware, and intelligent operating systems [3].

Group 1: Key Developments
- The conference featured the introduction of the efficient inference systems Mooncake and KTransformers, developed by a team led by Zheng Weimin, showcasing their core role in supporting workloads in the intelligent era [3].
- Academician E Weinan emphasized the paradigm shift in AI from a "model-centric" to a "data-centric" approach, highlighting the need for high-quality data infrastructure to lower the barriers to AI adoption [3].

Group 2: Community and Ecosystem Initiatives
- The CCF Ubiquitous Operating System Open Community was established with participation from top universities and research institutions, focusing on technology research, project incubation, standards development, application promotion, and talent cultivation [4].
- A series of strategic initiatives was launched, including the establishment of the CCF-Mulan Innovation Open Source Incubator and the Omni-Infer Cloud Co-Creation Plan [3][4].

Group 3: Educational and Collaborative Efforts
- Shanghai Jiao Tong University aims to integrate open-source concepts into its curriculum, fostering talent for next-generation operating systems [5].
- The collaboration model between Shanghai Jiao Tong University and Huawei emphasizes shared goals and resources to support breakthroughs in core technologies [5].