Elastic Introduces Native Inference Service in Elastic Cloud

Core Insights

- Elastic has launched the Elastic Inference Service (EIS), a GPU-accelerated inference-as-a-service built for Elasticsearch semantic search, vector search, and generative AI workflows [1][2].

Group 1: Service Features

- EIS provides an API-based inference service running on NVIDIA GPUs, integrated with Elasticsearch's vector database for low-latency, high-throughput inference [3].
- The first text-embedding model available on EIS is the Elastic Learned Sparse EncodeR (ELSER); support for additional multilingual embedding and reranking models is planned [3][5].
- EIS streamlines the developer experience by eliminating model downloads, manual configuration, and resource provisioning, integrating directly with the semantic_text field type and the Inference API (a minimal setup sketch appears after Group 3).

Group 2: Performance and Scalability

- The service improves end-to-end semantic search and works with both sparse and dense vectors, as well as semantic reranking (a query sketch appears after Group 3) [7].
- GPU-accelerated inference delivers consistent latency and up to 10x higher ingestion throughput compared with CPU-based alternatives [7].
- EIS is available on Serverless and Elastic Cloud Hosted deployments, accessible across all cloud service providers and regions [5].

Group 3: Pricing and Support

- EIS uses consumption-based pricing, billed per model per million tokens, making it straightforward to get started (a cost sketch appears below) [7].
- Elastic provides intellectual property indemnity for all models offered on EIS [7].
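
To make the Group 1 workflow concrete, here is a minimal sketch of wiring an index to a hosted embedding model through the Inference API and the semantic_text field type. The cluster URL, API key, inference ID, and the service/model identifiers are illustrative assumptions rather than confirmed EIS values; check the Elastic documentation for the exact service name and model IDs.

```python
# Minimal sketch: register an inference endpoint backed by a hosted
# ELSER-style model, then map an index field to it via semantic_text.
# ES_URL, the API key, the inference ID, and the service/model identifiers
# below are hypothetical placeholders, not confirmed EIS values.
import requests

ES_URL = "https://my-deployment.es.example.com"  # hypothetical cluster URL
HEADERS = {
    "Authorization": "ApiKey <YOUR_API_KEY>",    # hypothetical credential
    "Content-Type": "application/json",
}

# 1) Register an inference endpoint. With a hosted service there is no
#    model download or ML-node provisioning step on the cluster side.
resp = requests.put(
    f"{ES_URL}/_inference/sparse_embedding/my-elser-endpoint",
    headers=HEADERS,
    json={
        "service": "elastic",                       # assumed EIS service name
        "service_settings": {"model_id": ".elser-v2"},  # assumed model ID
    },
)
resp.raise_for_status()

# 2) Map a semantic_text field to that endpoint. Documents indexed into
#    "body" are then chunked and embedded automatically at ingest time.
resp = requests.put(
    f"{ES_URL}/articles",
    headers=HEADERS,
    json={
        "mappings": {
            "properties": {
                "body": {
                    "type": "semantic_text",
                    "inference_id": "my-elser-endpoint",
                }
            }
        }
    },
)
resp.raise_for_status()
print("index ready:", resp.json())
```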
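
Group 2 describes end-to-end semantic search over sparse or dense vectors plus semantic reranking. The sketch below shows the query side under the same hypothetical names as the setup example: a `semantic` query against the mapped field, followed by an optional reranking pass using a `text_similarity_reranker` retriever; the reranking endpoint ID is an assumption.

```python
# Query sketch, reusing the hypothetical ES_URL / HEADERS /
# "my-elser-endpoint" names from the setup example above.
import requests

ES_URL = "https://my-deployment.es.example.com"  # hypothetical
HEADERS = {
    "Authorization": "ApiKey <YOUR_API_KEY>",    # hypothetical
    "Content-Type": "application/json",
}

QUESTION = "how do I tune vector search latency?"

# Semantic search: the query text is embedded with the same inference
# endpoint that produced the stored vectors (sparse or dense).
search = requests.post(
    f"{ES_URL}/articles/_search",
    headers=HEADERS,
    json={"query": {"semantic": {"field": "body", "query": QUESTION}}},
)
search.raise_for_status()

# Optional second pass: rerank the top candidates with a text-similarity
# model. "my-rerank-endpoint" is a hypothetical reranking endpoint.
reranked = requests.post(
    f"{ES_URL}/articles/_search",
    headers=HEADERS,
    json={
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {
                            "semantic": {"field": "body", "query": QUESTION}
                        }
                    }
                },
                "field": "body",
                "inference_id": "my-rerank-endpoint",
                "inference_text": QUESTION,
                "rank_window_size": 50,
            }
        }
    },
)
reranked.raise_for_status()
print(reranked.json()["hits"]["hits"][:3])
```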
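
Group 3 describes billing per model per million tokens. As a back-of-the-envelope aid, the helper below estimates spend from token volume; the rate used is a made-up placeholder, not a published EIS price.

```python
# Hypothetical cost estimator for per-million-token pricing.
# The example rate is a placeholder, not a published EIS price.

def estimate_cost(total_tokens: int, rate_per_million: float) -> float:
    """Cost of processing `total_tokens` at `rate_per_million` ($/1M tokens)."""
    return total_tokens / 1_000_000 * rate_per_million

# Example: 250M tokens ingested at a hypothetical $0.10 per million tokens.
print(f"${estimate_cost(250_000_000, 0.10):.2f}")  # -> $25.00
```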