LMCache
NVIDIA and DeepSeek Both Follow Suit: Ignored 18 Months Ago, It Now Dominates AI Inference
36Kr · 2025-11-10 04:11
In 2024, Xin Jin and Xuanzhe Liu's team at Peking University, UC San Diego's "Hao AI Lab", and collaborators proposed DistServe and its idea of disaggregated inference. In little more than a year, it went from a lab concept to an industry standard, adopted by mainstream LLM inference frameworks from NVIDIA, vLLM, and others, signaling AI's move toward a new era of "modular intelligence".

If Moore's Law holds that compute doubles every 18 months, the cost of LLM inference is now falling far faster than Moore's Law predicted compute would improve. That is not only the work of faster chips; more of it comes from the evolution of the inference systems themselves. And what accelerated that evolution is the "disaggregated inference" idea first proposed and put into practice in the DistServe system.

Released by Peking University, UCSD, and other institutions in March 2024, the system put forward a simple but bold proposal: split LLM inference into a "prefill" phase and a "decode" phase, and let each scale and be scheduled on its own independent pool of compute.

Today this disaggregated architecture has been adopted by mainstream LLM inference frameworks including NVIDIA, llm-d, vLLM, and MoonCake, and it is beginning to show its strength in large-scale, real-world inference workloads.

"Hao AI Lab" is led by Hao Zhang, an assistant professor at UC San Diego and a recipient of the 2025 Google Machine Learning and Systems Junior Faculty Award. "Hao AI ...
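The mechanics behind that split are easy to sketch. Below is a minimal, hypothetical Python illustration — not DistServe's actual API; the `Request` dataclass, the `model.prefill` and `model.decode_step` calls, and the queue handoff are all assumed for exposition — of why the two phases want separate pools: prefill is one compute-heavy pass over the whole prompt, while decode is a long run of memory-bandwidth-bound single-token steps, so each pool can be sized and scheduled against its own latency target.

```python
# Conceptual sketch of prefill/decode disaggregation (hypothetical API).
# Prefill workers process the full prompt once and emit a KV cache;
# decode workers consume that cache and generate tokens one at a time.

from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Request:
    prompt_tokens: list[int]
    kv_cache: object = None                    # filled in by a prefill worker
    output_tokens: list[int] = field(default_factory=list)

prefill_queue: Queue = Queue()   # served by the prefill GPU pool
decode_queue: Queue = Queue()    # served by the decode GPU pool

def prefill_worker(model):
    """Compute-bound: one forward pass over the whole prompt."""
    while True:
        req = prefill_queue.get()
        req.kv_cache = model.prefill(req.prompt_tokens)  # hypothetical call
        decode_queue.put(req)    # hand the KV cache off across pools

def decode_worker(model, max_new_tokens=256):
    """Memory-bandwidth-bound: one token per step, reusing the KV cache."""
    while True:
        req = decode_queue.get()
        for _ in range(max_new_tokens):
            tok = model.decode_step(req.kv_cache)        # hypothetical call
            req.output_tokens.append(tok)
            if tok == model.eos_token_id:
                break
```

Because the two loops stress different hardware resources, co-locating them forces one phase to wait on the other; separating the pools lets operators add prefill capacity for long prompts and decode capacity for long generations independently.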
Exclusive | A Conversation with Tensormesh's Three Co-Founders: From Academia to the Front Lines of LLM Inference
Z Potentials · 2025-10-24 08:18
Core Insights
- Tensormesh, a company focused on cache-accelerated inference optimization for enterprises, has officially launched and secured $4.5 million in seed funding led by Laude Ventures [2]
- The founding team of Junchen Jiang, Yihua Cheng, and Kuntai Du aims to bridge the gap between AI inference engines and storage services, turning their academic work into a commercially viable product [3][4]

Company Overview
- Tensormesh is the first commercial platform to productize large-scale AI inference caching; it grew out of the open-source project LMCache and pairs that technology with enterprise-level usability, security, and manageability [2][4]
- The product lets enterprises deploy large-model services easily, cutting operational costs to roughly one-tenth of public API usage while delivering up to ten times the performance of mainstream solutions [4][29]

Funding and Growth
- The funding process was unconventional, relying on personal connections rather than business plans or roadshows, and the investment agreement closed quickly [5][48]
- The seed round will primarily fund product refinement and team expansion, with a strong open-source engine serving as the entry point to commercial value [5][40]

Market Position and Challenges
- The inference industry is young, and inference costs now exceed training costs as usage grows, which makes efficient serving essential [30][32]
- Tensormesh targets three main obstacles to deploying large models: privacy concerns, complex cluster management, and high operational costs [26][28]

Product Features and Innovations
- The product offers one-click deployment of in-house large-model services, preserving data privacy while lowering costs and improving performance; the underlying cache-reuse idea is sketched after this summary [29][30]
- Tensormesh aims to fill a market gap with an integrated solution spanning inference engines, storage, scheduling, and routing, which the industry currently lacks [38]

Future Aspirations
- The company wants to become the default choice for large-model inference, much as Databricks is for big data [44][45]
- Its long-term vision is to evolve alongside AI, staying relevant as the industry shifts from reliance on single models to more complex systems [51][52]
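The cost and latency figures above come from reusing KV caches across requests that share a prompt prefix (the same system prompt, document, or chat history). Here is a minimal sketch of that idea, assuming a hypothetical `KVCacheStore` and hypothetical `model.prefill` / `prefill_continue` / `decode` calls — illustrative only, not Tensormesh's or LMCache's actual API:

```python
# Conceptual sketch of cross-request KV-cache reuse (hypothetical API).
# Requests sharing a prompt prefix skip recomputing that prefix.

import hashlib

class KVCacheStore:
    """Maps a hash of a token prefix to its precomputed KV cache."""
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens: list[int]) -> str:
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def lookup(self, tokens: list[int]):
        return self._store.get(self._key(tokens))

    def insert(self, tokens: list[int], kv_cache) -> None:
        self._store[self._key(tokens)] = kv_cache

def serve(model, store: KVCacheStore, prefix: list[int], suffix: list[int]):
    kv = store.lookup(prefix)
    if kv is None:
        kv = model.prefill(prefix)           # hypothetical: full prefix pass
        store.insert(prefix, kv)
    # Only the new suffix tokens still need prefill work, so time-to-first-token
    # shrinks roughly with the fraction of the prompt that hits the cache.
    kv = model.prefill_continue(kv, suffix)  # hypothetical call
    return model.decode(kv)                  # hypothetical call
```

The commercial layer's job is then the unglamorous part: where those caches live (GPU, CPU, disk, remote storage), how they are evicted, and how requests are routed to replicas that already hold the right prefix.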
X @Avi Chawla
Avi Chawla · 2025-07-09 06:30
LLM Serving Engine
- LMCache is an open-source LLM serving engine designed to reduce time-to-first-token and increase throughput, especially in long-context scenarios [1]
- LMCache boosts vLLM with 7x faster access to 100x more KV caches; a wiring sketch follows this summary [1]

Open Source
- LMCache is 100% open-source [1]
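For context, LMCache plugs into vLLM through vLLM's KV-connector mechanism. The sketch below follows the pattern in LMCache's published vLLM integration examples at the time of writing; treat the exact connector name and config fields as assumptions and check the current documentation (LMCache itself is configured separately, e.g. via its own config file or environment variables):

```python
# Minimal sketch: serving with vLLM + LMCache as the KV-cache backend.
# Verify field names against current LMCache/vLLM docs before relying on them.

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any vLLM-supported model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # LMCache's vLLM connector
        kv_role="kv_both",                  # both store and load KV caches
    ),
)

# Requests that share a long prefix (system prompt, document, chat history)
# hit the cache on repeat, skipping most prefill work and cutting
# time-to-first-token.
outputs = llm.generate(
    ["Summarize the following report: ..."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```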