GPU efficiency
KV Cache and EXAScaler, Enabling AI Without New Systems
DDN· 2026-01-29 20:01
Your EXAScaler is AI-ready. Join us to learn how to unlock it to improve performance and GPU efficiency. In this live session, Joel Kaufman, Senior Product Manager at DDN, explains how KV Cache works with EXAScaler to address those challenges. The session covers the architecture, the impact, and when KV Cache delivers the most value, all without requiring new infrastructure. It is designed to help teams understand why KV Cache matters and where it fits as AI workloads grow. What You’ll Learn • How KV Cache ...
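The KV Cache the session refers to is the transformer inference cache of attention keys and values. A minimal sketch (an assumption on my part: simplified single-head attention in NumPy, not DDN's actual implementation) of why caching the history avoids recomputation during autoregressive decoding:

```python
# Hypothetical illustration of a KV cache: during decoding, each new token's
# key/value is appended once and reused, instead of recomputing K and V for
# the entire sequence at every step.
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query vector q over
    cached keys K (t, d) and values V (t, d)."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V

class KVCache:
    def __init__(self):
        self.K, self.V = [], []

    def decode_step(self, q, k, v):
        # Append this token's key/value, then attend over the full history.
        self.K.append(k)
        self.V.append(v)
        return attention(q, np.array(self.K), np.array(self.V))
```

Without the cache, every decode step would recompute keys and values for all previous tokens; offloading a large cache to fast shared storage is the kind of role a system like EXAScaler can play.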
X @s4mmy
s4mmy· 2025-10-13 17:56
AI Compute Demand & GPU Market
- AI compute demand is growing at twice the rate of efficiency gains [1]
- Unless GPU efficiency improves, GPUs will become the most sought-after commodity in AI [1]
Crypto Protocols Benefiting from GPU Demand
- Livepeer is a decentralized video streaming network that uses GPU resources for efficient real-time video transcoding and processing, with revenue reasonably correlated to price [1]
- USDai_Official is a GPU-backed lending protocol [2]
- Gaib_ai tokenizes enterprise GPU yield into tradable assets [2]
- AethirCloud provides an enterprise-grade DePIN GPU cloud [2]
- Render Network provides GPU compute for 3D rendering and visual effects [2]
- io.net offers compute 70% cheaper than AWS [2]
- 0G_labs is a modular L1 with GPU clusters for verifiable compute [2]
- SpheronFDN generates demand through a permissionless compute marketplace [2]
- Akash Network is an open-source marketplace for cloud computing resources [2]
- Theta Network's EdgeCloud uses user GPUs for decentralized video rendering and AI inference [2]
- Golem Project is a peer-to-peer compute marketplace that rents out idle GPUs worldwide for rendering/AI tasks [2]
TAO + Subnets
- SN 64: Chutes_ai [2]
- SN 51: Lium_io [2]
- SN 27: Neural_internet [2]
- SN 12: ComputeHorde [2]
X @s4mmy
s4mmy· 2025-10-13 14:11
AI Compute Demand & GPU Market
- AI compute demand is growing at twice the rate of efficiency growth, potentially leading to GPU shortages [1]
- To meet current AI compute demand, an estimated $500 billion must be invested in data centers annually until 2030 [2]
Crypto Protocols Benefiting from GPU Demand
- Livepeer, a decentralized video streaming network, benefits from GPU demand through video transcoding and processing [1]
- Several crypto protocols are positioned to capitalize on GPU demand, including GPU-backed lending protocols like USDai_Official, enterprise GPU yield tokenization platforms like gaib_ai, DePIN GPU clouds like AethirCloud, and GPU compute providers for 3D rendering like Rendernetwork [2]
- Other protocols include ionet (compute provider), 0G_labs (modular L1 with GPU clusters), SpheronFDN (compute marketplace), akashnet_ (cloud computing marketplace), and Theta_Network (decentralized video rendering and AI inference) [2]
TAO + Subnets
- TAO subnets like chutes_ai (SN 64), lium_io (SN 51), neural_internet (SN 27), and ComputeHorde (SN 12) are relevant in the context of GPU compute [2]
Continuous Profiling for GPUs — Matthias Loibl, Polar Signals
AI Engineer· 2025-07-22 19:46
GPU Profiling & Performance Optimization
- The industry emphasizes improving performance and saving costs by optimizing software, potentially reducing server usage by 10% [4]
- Sampled profiling balances data volume against continuous monitoring; sampling 100 times per second adds less than 1% CPU overhead and about 4 MB of memory overhead [5]
- Profiling in production is important for observing real-world application performance with low overhead [8]
- The company's solution leverages Linux eBPF, enabling profiling without application instrumentation [9]
Technology & Metrics
- The GPU profiling solution uses NVIDIA NVML to extract metrics, including overall node utilization (blue line), per-process utilization (orange line), memory utilization, and clock speed [11][12]
- Key metrics include power utilization (with the power limit shown as a dashed line), temperature (important to avoid throttling at 80 degrees Celsius), and PCIe throughput (negative for receiving, positive for sending, e.g. 10 MB/s) [13][14]
- The solution correlates GPU metrics with CPU profiles collected via eBPF to analyze CPU activity during periods of less-than-full GPU utilization [14]
GPU Time Profiling
- The company introduces GPU time profiling to measure time spent in individual CUDA functions, determining kernel start and end times via the Linux kernel [18]
- The solution displays CPU stacks whose leaf nodes are functions spending time on the GPU, with colors indicating different binaries (e.g. blue for Python) [19][20]
Deployment & Integration
- The solution can be deployed as a binary on Linux, via Docker, or as a DaemonSet on Kubernetes, requiring a manifest YAML and a token [21]
- Turbopuffer is interested in integrating the GPU profiling to improve the performance of their vector engine [22]
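The low-overhead approach described above rests on sampled profiling: instead of instrumenting every function call, an agent periodically snapshots call stacks and counts them. A minimal sketch (hypothetical, pure Python rather than the eBPF-based agent the talk describes) of the idea:

```python
# Hypothetical sampling profiler: a background thread snapshots the main
# thread's call stack at a fixed rate (~100 Hz). Hot functions appear in
# many samples; cold ones barely register. This is the principle behind
# low-overhead continuous profiling, not Polar Signals' implementation.
import collections
import sys
import threading
import time

class SamplingProfiler:
    def __init__(self, hz=100):
        self.interval = 1.0 / hz
        self.samples = collections.Counter()  # stack tuple -> sample count
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        main_id = threading.main_thread().ident
        while not self._stop.is_set():
            frame = sys._current_frames().get(main_id)
            if frame is not None:
                # Walk the frame chain; store function names root-first.
                stack = []
                while frame:
                    stack.append(frame.f_code.co_name)
                    frame = frame.f_back
                self.samples[tuple(reversed(stack))] += 1
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

The stack counts collected this way are exactly the input a flame graph is built from; the eBPF approach achieves the same sampling in the kernel, so no per-language agent or code change is needed.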