Quant Market Watch Series No. 11: Tokens Too Expensive? Let the "Lobster" Use a Local Large Model
Huachuang Securities · 2026-03-29 14:48
Financial Engineering | Securities Research Report | [Commentary Report]

Abstract: LM Studio is a cross-platform desktop application for running large language models (LLMs) locally, built on llama.cpp. It lets you run open-source models such as Llama, DeepSeek, Qwen, and Mistral offline on your own computer, with no cloud API and full data privacy. The connection between OpenClaw and LM Studio is, at its core, a local model call made over an OpenAI-compatible API protocol: LM Studio acts as the local inference server, exposing a standard HTTP interface, while OpenClaw, the AI agent framework, is configured to use that interface as a model provider. The greatest value of this combination is truly private AI deployment: all conversation data, reasoning traces, and model weights stay entirely on the local device, with no cloud API calls, which both safeguards data sovereignty and eliminates ongoing API costs. Architecturally, LM Studio plays the role of the "model engine", loading local models in GGUF/MLX format and running inference, while OpenClaw serves as the "agent brain", responsible for task planning, tool ...
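The OpenAI-compatible call path described here can be sketched in a few lines. LM Studio's local server listens on http://localhost:1234/v1 by default; the model name below is a placeholder assumption, not a fixed identifier.

```python
import json

# Default address of LM Studio's local OpenAI-compatible server.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body an OpenAI-compatible client (such as an agent
    framework configured with this base URL) would POST to
    <BASE_URL>/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

# "qwen2.5-7b-instruct" is an illustrative local model name (assumption).
body = build_chat_request("qwen2.5-7b-instruct", "Summarize this report.")
print(json.dumps(body, ensure_ascii=False))
```

Because the protocol matches the cloud API, an agent framework only needs the base URL and a model name to treat the local server as a model provider.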
X @Avi Chawla
Avi Chawla · 2026-03-21 20:29
RT Avi Chawla (@_avichawla)
16 best GitHub repos to build AI engineering projects! (star + bookmark them):
The open-source AI ecosystem has 4.3M+ repos now.
New repos blow up every month, and the tools developers build with today look nothing like what we had a year ago.
I put together a visual covering the 16 repos that make up the modern AI developer toolkit right now.
The goal was to cover key layers of the stack:
1) OpenClaw ↳ Personal AI agent that runs on your devices and connects to 50+ messaging platforms
2) ...
InferenceX v2: NVIDIA Blackwell vs AMD vs Hopper (Formerly InferenceMAX)
2026-02-24 14:19
Summary of InferenceX v2: NVIDIA Blackwell vs AMD vs Hopper

Industry and Company Involved
- The document discusses the competitive landscape of AI inference performance, focusing on NVIDIA's Blackwell architecture and AMD's offerings, particularly in the context of inference benchmarks and optimizations.

Core Points and Arguments
- **InferenceX v2 Overview**: InferenceX v2 builds on InferenceMAXv1, establishing a new standard for AI inference performance and economics through continuous testing across numerous GPUs and frameworks [3][4][7]
- **Benchmarking Capabilities**: InferenceX v2 is the first suite to benchmark NVIDIA's Blackwell Ultra GB300 NVL72 and B300, as well as AMD's MI355X, across the entire Pareto frontier curve [9][10]
- **Performance Comparison**:
  - AMD's MI355X shows competitive performance per total cost of ownership (TCO) against NVIDIA's B200 in FP8 precision using disaggregated and wide expert parallelism [21][23]
  - However, NVIDIA's solutions, particularly the B200 and B300, maintain a significant performance lead over AMD's offerings in many scenarios [28][34]
- **Energy Efficiency**: NVIDIA GPUs demonstrate superior energy efficiency, consuming significantly fewer picoJoules per token across all workloads compared to AMD [28]
- **Composability Issues**: AMD's inference optimizations struggle with composability, where individual optimizations perform well in isolation but fail to deliver competitive results when combined [29][30][31]
- **Future Focus for AMD**: AMD is advised to enhance the composability of its inference optimizations and is reportedly planning to focus on software composability of FP4 and distributed inferencing after the Chinese New Year [31][33][70]

Additional Important Content
- **Performance Improvements**: AMD has made notable improvements in SGLang DeepSeek R1 FP4 configurations, nearly doubling throughput in under two months [66][67]
- **NVIDIA's Consistency**: NVIDIA's performance results have been more stable, with minor improvements noted for the B200 SGLang over a similar timeframe [73]
- **Market Dynamics**: The document highlights the competitive dynamics between NVIDIA and AMD, emphasizing the need for AMD to increase contributions to open-source projects and improve its software stack to remain competitive [70][42]
- **Technical Concepts**: The document explains key technical concepts such as disaggregated prefill, tensor parallelism, and the trade-offs between interactivity and throughput in LLM inference [49][57][61]

This summary encapsulates the critical insights and data points from the InferenceX v2 report, providing a comprehensive overview of the competitive landscape in AI inference technology.
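The "Pareto frontier curve" the benchmark sweeps can be made concrete with a toy filter: each configuration is a (throughput per GPU, interactivity) point, and only configurations not dominated in both dimensions survive. The sample numbers below are made up for illustration.

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point, where q dominates p
    if q is >= p in both dimensions (and, being distinct, strictly better in
    at least one)."""
    frontier = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# (tokens/s per GPU, tokens/s per user) -- illustrative numbers only.
configs = [(1200, 10), (900, 25), (700, 20), (1500, 5)]
print(pareto_frontier(configs))
```

Here (700, 20) is dropped because (900, 25) beats it on both axes; the surviving points trace the throughput-vs-interactivity trade-off curve each GPU is judged on.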
SemiAnalysis Founder Podcast Notes: Rumors About NVIDIA, Huawei, and AIDC
傅里叶的猫 · 2026-02-22 13:41
I recently listened to a podcast episode by SemiAnalysis founder Dylan Patel that was packed with information. From NVIDIA's acquisition of Groq, to the frenzied involution in China's semiconductor industry, to the rumors that "AI is draining the water supply", this article organizes his main points.

NVIDIA's anxiety: from "one chip fits all" to diversification

Not long ago, NVIDIA was still saying "a single GPU can handle every AI task", and yet it has now turned around and acquired Groq. Behind this lies Jensen Huang's deeper anxiety.

Dylan makes a key point: AI workloads have grown large enough to sustain dedicated chips. A chip like Groq's is weak at general-purpose work: it cannot train, and it is not economical for serving the largest models. But it has one standout skill: blazing-fast inference. This is the classic scenario of "specialized silicon beating general-purpose silicon".

Future AI models may no longer think in a single thread, but instead open 100 parallel streams of thought at once. Some Pro models from Google and OpenAI already work this way: rather than a single reasoning chain, the model runs several in parallel and then picks the best answer. In that world, what you need is not "extreme speed" but "sufficiently wide parallel processing capacity".

So NVIDIA's current strategy is clear: protect its general-purpose GPU base while diversifying through moves such as acquiring Groq and developing the CPX chip ...
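The "many parallel thought streams, pick the best answer" pattern described above is, in miniature, best-of-N sampling. A toy sketch, where `generate` and `score` are stand-ins (assumptions) for a model's sampler and an answer verifier:

```python
def best_of_n(generate, score, n=4):
    """Draw n independent candidate answers and return the highest-scoring one.

    The n generate() calls are independent, so on wide hardware they can run
    as n parallel streams rather than one long serial chain -- which is why
    this workload rewards breadth of parallelism over raw single-stream speed.
    """
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

For example, with four candidate answers scored 3, 1, 4, 2, the stream scoring 4 wins; latency is set by the slowest single stream, not by n times one stream.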
Clawdbot Adapted to Domestic Chips! A Tsinghua Special Scholarship Winner Steps In, with One-Click Deployment via an Open-Source Framework
量子位 · 2026-02-03 04:52
Core Viewpoint
- Clawdbot, now known as OpenClaw, has gained significant popularity, reaching 120,000 stars on GitHub within a week, with its Mac mini accessories sold out and rapid integration by major companies like Alibaba and Tencent [1][4].

Group 1: Clawdbot Features and Functionality
- Clawdbot transforms AI from a standard chatbot into a 24/7 AI employee, capable of performing tasks while users are occupied or asleep [5].
- It can respond to messages on mobile devices and proactively notify users upon task completion [6].
- Users have reported high costs associated with using Clawdbot, as it can quickly consume hundreds of dollars in token fees for minimal output [10].

Group 2: Introduction of Xuanwu CLI
- Xuanwu CLI is a new open-source framework that allows users to run Clawdbot locally without needing to purchase a Mac mini or incur API costs, making it more accessible [13][14].
- It simplifies the local deployment of models, providing an "app store-like" experience for users to select and use models without complex configurations [18].
- The command system of Xuanwu CLI is highly compatible with Ollama, allowing for easy transition for users familiar with that platform [20].

Group 3: Technical Advantages of Xuanwu CLI
- Xuanwu CLI supports local AI engines, enabling integration with Clawdbot for continuous operation and interaction [25].
- It is designed to be user-friendly, requiring minimal setup and allowing for quick service startup, often within one minute [29].
- The framework is compatible with OpenAI API standards, facilitating easy integration with existing applications and reducing the cost of switching from cloud to local models [30].

Group 4: Adaptation to Domestic Chips
- Xuanwu CLI is uniquely adapted to domestic chips, providing a cost-effective solution for running models locally, unlike other solutions that primarily rely on NVIDIA hardware [34].
- It addresses common issues faced with domestic chips, such as configuration complexity and performance variability, by encapsulating hardware differences and providing a unified resource pool [39].
- The architecture of Xuanwu CLI allows for intelligent scheduling and optimal resource allocation, ensuring stability and performance across different hardware setups [46].

Group 5: Company Background
- Qingmiao Intelligent, founded in 2022, focuses on chip adaptation and the optimization of models, frameworks, and operators [48].
- The company has received significant investment and aims to create a comprehensive optimization system from hardware to intelligent agents [51].
- Qingmiao has successfully developed various domestic integrated machine solutions, achieving high performance and adaptability across multiple chip platforms [52].
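The OpenAI-API compatibility mentioned above is what makes the cloud-to-local switch cheap: the same client code can target either backend by changing only the base URL. The endpoint values below are illustrative assumptions, not documented Xuanwu CLI defaults.

```python
from dataclasses import dataclass

@dataclass
class ChatEndpoint:
    """A client-side view of any OpenAI-compatible backend: the only thing
    that distinguishes cloud from local is the base URL (and whether a real
    API key is required)."""
    base_url: str
    api_key: str = "not-needed-locally"

    @property
    def chat_completions(self) -> str:
        return f"{self.base_url}/chat/completions"

# Cloud backend vs a hypothetical local server on port 8000 (assumption).
cloud = ChatEndpoint("https://api.openai.com/v1", api_key="sk-...")
local = ChatEndpoint("http://localhost:8000/v1")
print(local.chat_completions)
```

Because request and response schemas match, an application migrates by swapping `cloud` for `local`; no request-building or parsing code changes.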
LLM-in-Sandbox: Give a Large Model a Computer and Unlock General Agent Capabilities
机器之心 · 2026-01-30 04:25
Core Idea
- The article presents the concept of LLM-in-Sandbox, which allows large language models (LLMs) to explore tasks in a virtual computer environment, significantly enhancing their performance in various non-code domains without additional training [5][40].

Group 1: Technical Advancements
- The evolution of large models is being unlocked through different paradigms, including In-Context Learning, Chain-of-Thought, and the recent intelligent agent framework that enables multi-turn interactions and tool usage [2][3].
- LLM-in-Sandbox is proposed as a new paradigm that combines LLMs with a virtual computer, allowing them to autonomously explore and complete tasks, leading to improved performance in fields such as mathematics, physics, chemistry, and long-text understanding [3][7].

Group 2: Design and Implementation
- LLM-in-Sandbox features a lightweight, general-purpose design that contrasts with existing software engineering agents that require task-specific environments, thus enhancing generalization and scalability [10][11].
- The environment is based on a Docker Ubuntu setup with minimal pre-installed tools, allowing models to autonomously acquire domain-specific tools as needed [12][13].

Group 3: Experimental Results
- Experiments across six non-code domains showed significant performance improvements for LLMs in the LLM-in-Sandbox mode, with enhancements observed in mathematics (+6.6% to +24.2%), physics (+1.0% to +11.1%), and other areas without additional training [20][21].
- The model's ability to autonomously utilize the sandbox environment was demonstrated through case studies, showcasing its capacity for external resource access, file management, and computational execution [21][22][23].

Group 4: Reinforcement Learning Integration
- LLM-in-Sandbox RL is introduced to enhance the generalization capabilities of weaker models by training them in the sandbox environment using context-based tasks, which require active exploration [26][29].
- The approach has shown consistent performance improvements across various models, indicating its broad applicability and effectiveness [31].

Group 5: Efficiency and Performance
- LLM-in-Sandbox demonstrates cross-domain generalization, achieving consistent performance improvements in multiple downstream tasks, including software engineering [31].
- The deployment of LLM-in-Sandbox can significantly reduce token consumption in long-text scenarios, with reductions of up to 8 times, while maintaining competitive throughput speeds [32][34].

Group 6: Future Prospects
- LLM-in-Sandbox transcends traditional text generation capabilities, enabling cross-modal abilities and direct file generation, which could evolve into a universal digital creation system [35][38].
- The article concludes that LLM-in-Sandbox should become the default deployment paradigm for large models, as it offers substantial performance enhancements with minimal deployment costs [40].
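The core environment loop described above (model emits a shell command, the sandbox executes it and feeds the output back) can be sketched minimally. The paper's environment is an isolated Docker Ubuntu container; running commands on the host as below is a stand-in for illustration only, and the timeout value is an arbitrary choice.

```python
import subprocess

def run_in_sandbox(command: str, timeout: float = 10.0) -> str:
    """Execute one model-issued shell command and return its combined
    stdout/stderr, which would be appended to the model's context as the
    observation for the next turn. A real deployment would run this inside
    an isolated container, never directly on the host."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

# One turn of the loop: the "model" asks to inspect its environment.
print(run_in_sandbox("echo hello from the sandbox"))
```

The agent loop then alternates model turns (produce a command or a final answer) with environment turns (this function), which is what lets the model install tools, manage files, and run computations autonomously.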
The vLLM Team Starts a Company, with a ¥1.05 Billion Seed Round! Tsinghua Special Scholarship Winner Kaichao You Joins
量子位 · 2026-01-23 05:03
Core Insights
- The core viewpoint of the article is the establishment of a new company, Inferact, by the core team behind the open-source inference framework vLLM, which has successfully raised $150 million in seed funding, achieving a valuation of $800 million [1][2][7].

Funding and Market Trends
- The $150 million seed round marks a new high in AI infrastructure funding and is one of the largest seed rounds in history [2].
- Investors highlight a shift in focus from training to inference as AI applications mature, with a growing need for low-cost, reliable operation of existing models [4][9].

Company Mission and Strategy
- Inferact aims to address the "inference bottleneck" by building the next-generation commercial engine to tackle large-scale deployment challenges [5].
- The company plans to maintain a dual approach, supporting vLLM as an independent open-source project while developing commercial products to enhance hardware efficiency for AI model deployment [12][14].

Technology and Market Validation
- vLLM has already been deployed in real-world industrial environments, including Amazon's core shopping application, validating its stability under high concurrency [10][11].
- The demand for low-cost, reliable operation of existing models has surpassed expectations for new model development [9].

Founding Team and Expertise
- Simon Mo, the CEO, has a background in machine learning systems design and was an early engineer at Anyscale, bringing experience in transforming research into industrial-grade products [26][27].
- Co-founder Woosuk Kwon, a PhD from UC Berkeley, contributed significant innovations to vLLM, including the PagedAttention algorithm [30][31].
- The team also includes Kaichao You, a Tsinghua University award winner, and experienced advisors from academia and industry, enhancing the company's technical and strategic capabilities [33][36].
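PagedAttention, mentioned above, borrows virtual-memory paging for the KV cache: the cache is carved into fixed-size blocks handed out on demand, so a sequence's cache need not be contiguous and finished sequences return their blocks to a shared pool. A toy sketch of the bookkeeping only (block and pool sizes are arbitrary illustrative choices, and no actual attention math is shown):

```python
class PagedKVCache:
    """Toy block allocator in the spirit of vLLM's PagedAttention."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_table = {}  # seq_id -> list of physical block ids
        self.seq_len = {}      # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> None:
        """Reserve KV-cache space for one new token of a sequence, grabbing
        a fresh block only when the current one is full."""
        n = self.seq_len.get(seq_id, 0)
        if n % self.block_size == 0:  # last block full, or sequence is new
            self.block_table.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_len[seq_id] = n + 1

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared free pool."""
        self.free_blocks.extend(self.block_table.pop(seq_id, []))
        self.seq_len.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(5):  # 5 tokens with block_size 4 -> 2 blocks
    cache.append_token(seq_id=0)
print(len(cache.block_table[0]))
```

Allocating per block instead of reserving a contiguous maximum-length region is what cuts KV-cache fragmentation and lets many concurrent sequences share the same memory pool.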
Flash | a16z Followed All the Way: vLLM Creator Founds AI Inference Startup Inferact, Raising from a Top-Tier Investor Lineup at an $800 Million Valuation
Sou Hu Cai Jing · 2026-01-23 04:46
Core Insights
- Inferact, an AI startup founded by the creators of the open-source software vLLM, has completed a $150 million seed funding round, achieving a valuation of $800 million [2]
- The funding round was led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital, Altitude Capital, Redpoint Ventures, and ZhenFund [2]
- Inferact focuses on the inference stage of AI, which involves running existing models efficiently and reliably, rather than building new models [2][4]

Company Overview
- Inferact was founded in November 2025 and is led by CEO Simon Mo, one of the original maintainers of the vLLM project [3]
- The company aims to support vLLM as an independent open-source project while also developing commercial products to help enterprises run AI models more efficiently on various hardware [4]
- The vLLM project, initiated by the University of California, Berkeley, has attracted contributions from thousands of developers in the AI industry [2][3]

Market Context
- The interest from investors reflects a broader shift in the AI industry, where developers can now utilize existing powerful models without waiting for significant upgrades [3]
- The inference stage is becoming a bottleneck, increasing costs and putting pressure on systems, which may worsen in the coming years [4]
- The significant seed funding indicates the scale of market opportunities, with even minor efficiency improvements having a substantial impact on costs [4]

Application Example
- An example of vLLM's widespread application is Amazon, which relies on the software for both its cloud services and shopping applications to run internal AI systems [5]
Flash | a16z Followed All the Way: vLLM Creator Founds AI Inference Startup Inferact, Raising from a Top-Tier Investor Lineup at an $800 Million Valuation
Z Potentials · 2026-01-23 04:13
Core Insights
- Inferact, an AI startup founded by the creators of the open-source software vLLM, has raised $150 million in seed funding, achieving a valuation of $800 million [2]
- The company focuses on the inference stage of AI, where trained models begin to answer questions and solve tasks, predicting that the biggest challenge in the AI industry will shift from building new models to operating existing models efficiently and reliably [2][4]

Funding and Investment
- The seed round was led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Sequoia Capital, Altitude Capital, Redpoint Ventures, and ZhenFund [2]
- Andreessen Horowitz's involvement dates back to the early stages of the vLLM project, which became the first recipient of their "AI Open Source Grant Program" in 2023 [3]

Technology and Development
- Inferact's core technology is built around vLLM, an open-source project launched in 2023 to help enterprises efficiently deploy AI models on data center hardware [2][4]
- The company aims to support vLLM as an independent open-source project while also developing commercial products to help businesses run AI models more efficiently on various hardware [4]

Market Trends
- The AI industry is experiencing a shift where developers can utilize existing powerful models without waiting for significant upgrades, contrasting with the past when new model releases took years [3]
- The inference stage is becoming a bottleneck, increasing costs and putting pressure on systems, which may worsen in the coming years [4]

Business Strategy
- Inferact's significant seed funding reflects the scale of market opportunities, indicating that even small efficiency improvements can have a substantial impact on costs [4]
- The company does not aim to replace or limit open-source projects but seeks to build a business that supports and expands the vLLM project [4]