Apache Spark on Infinia Demo
DDN · 2025-11-11 18:56

AI Workflow & Data Preparation
- Infinia plays a crucial role in AI workflows, particularly in data preparation stages, by handling diverse data ingestion, providing low-latency KV store access at scale, and integrating with various AI platforms [2]
- The AI pipeline involves data collection, pre-processing, tagging, and indexing as key data preparation steps [1]
- DDN's Infinia, combined with Spark integrations, facilitates a smooth and scalable workflow using familiar tools for AI developers [6][7]

Data Management & Security
- Infinia addresses the challenge of providing secure data buckets for multiple developers through multi-tenancy controls, enabling dynamic addition or removal of secure tenants and subtenants [6]
- DDN has developed Spark integrations to efficiently move data into developer tenant buckets [6]
- Infinia's multi-tenancy can create secure locations for hosting data used in each inference pipeline [9]

Mortgage Default Modeling Demo
- The demonstration uses 10 years of quarterly mortgage finance data to model delinquency rates and probabilities on mortgage defaults [4]
- Apache Spark is used to prepare the data and pipe it into a model training process that could be run on top of Infinia [3]
- The workflow includes extracting recent data subsets, copying them into new Infinia buckets using Spark, and transforming the data into parquet files for model training [4][8]
- The model training utilizes the XGBoost machine learning library to create a predictive model for mortgage defaults [9]
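
The extract, copy, and convert-to-parquet steps of the demo workflow might look roughly like the sketch below. It is illustrative only and is not the code shown in the demo. Several things are assumptions that do not come from the source: that the raw data is CSV with a header row, that it carries a `quarter` column with labels such as `2024Q2`, and that the buckets are reachable from Spark through plain URIs passed in by the caller. The function names `recent_quarters` and `copy_recent_subset` are hypothetical.

```python
from typing import List


def recent_quarters(latest_year: int, latest_quarter: int, count: int) -> List[str]:
    """Return `count` quarter labels ending at the given quarter, oldest first."""
    labels = []
    year, quarter = latest_year, latest_quarter
    for _ in range(count):
        labels.append(f"{year}Q{quarter}")
        quarter -= 1
        if quarter == 0:
            # Step back from Q1 to Q4 of the previous year.
            year, quarter = year - 1, 4
    return labels[::-1]


def copy_recent_subset(spark, source_uri: str, target_uri: str, quarters: List[str]):
    """Copy the rows for the selected quarters into a new bucket as parquet.

    `spark` is an existing SparkSession. The CSV input format and the
    `quarter` column name are assumptions made for this sketch.
    """
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import functions as F

    df = spark.read.option("header", True).csv(source_uri)
    subset = df.filter(F.col("quarter").isin(quarters))
    # One parquet directory per quarter keeps later reads selective.
    subset.write.mode("overwrite").partitionBy("quarter").parquet(target_uri)
    return subset
```

A call would pass the most recent quarters from `recent_quarters(...)` together with the source bucket URI and a developer tenant bucket URI. The resulting parquet files could then be loaded by the model training step that uses XGBoost.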