ICE says it's not stopping random people. These parents in D.C. disagree.
NBC News· 2025-09-04 16:21
We have several reports of parents sending messages to each other like, "I'm afraid to take my kid to school. I don't know if I can leave my house." We're outside Bancroft Elementary as parents are walking their kids in to start the school day. This week has felt different because there's been ICE activity in this neighborhood. Mount Pleasant is historically a Salvadoran neighborhood in D.C., and parents here say they've been worried in the morning. How does it feel bringing your kids to school here ...
Continuous Profiling for GPUs — Matthias Loibl, Polar Signals
AI Engineer· 2025-07-22 19:46
GPU Profiling & Performance Optimization
- The industry emphasizes improving performance and saving costs by optimizing software, potentially reducing server usage by 10% [4]
- Sampled profiling is used to balance data volume against continuous monitoring; for example, sampling 100 times per second results in less than 1% CPU overhead and 4 MB of memory overhead [5]
- Profiling in production is important for observing real-world application performance with low overhead [8]
- The company's solution leverages Linux eBPF, enabling profiling without application instrumentation [9]

Technology & Metrics
- The company's GPU profiling solution uses NVIDIA NVML to extract metrics, including overall node utilization (blue line), individual process utilization (orange line), memory utilization, and clock speed [11][12]
- Key metrics include power utilization (with the power limit shown as a dashed line), temperature (important to avoid throttling at 80 degrees Celsius), and PCIe throughput (negative for receiving, positive for sending, e.g. 10 MB/s) [13][14]
- The solution correlates GPU metrics with CPU profiles collected via eBPF to analyze CPU activity during periods of less-than-full GPU utilization [14]

GPU Time Profiling
- The company introduces GPU time profiling to measure time spent in individual CUDA functions, determining kernel start and end times via the Linux kernel [18]
- The solution displays CPU stacks whose leaf nodes represent functions taking time on the GPU, with colors indicating different binaries (e.g. blue for Python) [19][20]

Deployment & Integration
- The solution can be deployed as a binary on Linux, via Docker, or as a DaemonSet on Kubernetes, requiring a manifest YAML and a token [21]
- Turbopuffer is interested in integrating the company's GPU profiling to improve the performance of their vector engine [22]
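The low-overhead sampling approach described above can be illustrated with a minimal sketch. This is not Polar Signals' implementation (which samples via eBPF in the kernel, with no in-process code); it is a hypothetical in-process Python analogue using a POSIX profiling timer at the same 100 Hz rate, to show why sampling stays cheap: the handler runs only 100 times per CPU-second, regardless of how hot the workload is.

```python
import signal
import time
from collections import Counter

samples = Counter()

def on_sample(signum, frame):
    # Record the currently executing function's name. A real profiler
    # would walk the full stack and aggregate into a flame graph.
    samples[frame.f_code.co_name] += 1

# ITIMER_PROF counts CPU time consumed by this process, so a 10 ms
# interval yields ~100 samples per CPU-second -- the <1% overhead
# figure cited in the talk comes from this kind of budget.
signal.signal(signal.SIGPROF, on_sample)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)

def busy_work(deadline):
    x = 0
    while time.process_time() < deadline:
        x += 1
    return x

busy_work(time.process_time() + 0.3)  # burn ~0.3 s of CPU time

signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling
print(dict(samples))  # most samples should land in busy_work
```

Because the timer is driven by CPU time rather than wall time, an idle process generates no samples at all, which is what makes continuous, always-on profiling practical in production.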
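The correlation idea in "Technology & Metrics" — lining up NVML GPU-utilization samples with CPU profile samples to see what the CPU was doing while the GPU was underutilized — can be sketched as follows. The timestamps, utilization values, and function names below are synthetic illustrations, not real NVML or eBPF output.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical data: (timestamp, gpu_util_percent) polled via NVML,
# and (timestamp, leaf_function) from a sampling CPU profiler.
gpu_samples = [(0.0, 98), (0.1, 95), (0.2, 40), (0.3, 35), (0.4, 97)]
cpu_samples = [(0.05, "cuda_launch"), (0.21, "preprocess_batch"),
               (0.25, "preprocess_batch"), (0.33, "load_data"),
               (0.41, "cuda_launch")]

def cpu_hotspots_while_gpu_idle(gpu, cpu, threshold=90):
    """Count CPU samples falling inside GPU-underutilized intervals."""
    times = [t for t, _ in cpu]
    funcs = [f for _, f in cpu]
    hot = Counter()
    for i, (start, util) in enumerate(gpu):
        if util >= threshold:
            continue  # GPU busy enough in this interval
        end = gpu[i + 1][0] if i + 1 < len(gpu) else float("inf")
        # CPU samples whose timestamp falls in [start, end]
        for j in range(bisect_left(times, start), bisect_right(times, end)):
            hot[funcs[j]] += 1
    return hot

print(cpu_hotspots_while_gpu_idle(gpu_samples, cpu_samples))
# → Counter({'preprocess_batch': 2, 'load_data': 1})
```

In this toy trace, the GPU dips below the threshold between t=0.2 and t=0.4, and the CPU profile shows that time going to data preprocessing and loading — exactly the kind of signal the talk describes using to explain gaps in GPU utilization.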