The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked
AI Engineer · 2026-05-05 17:00
Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the real profiling mistakes, infrastructure gaps, and production constraints that led to building an embedding inference engine designed for dynamic model loading, hot-swapping, and memory-aware eviction instead of brittle one-model-per-container deployments. If you're working on small-model inference, embeddings, or GPU infrastructure, this ...