Model Deployment
Take your trained model from a Jupyter notebook to a production API that serves real predictions.
The ML Pipeline
Notebook → Save Model → Build API → Containerize → Deploy → Monitor
1. Train model in notebook
2. Save/serialize the model (pickle, joblib, ONNX)
3. Wrap it in a web API (Flask / FastAPI)
4. Containerize with Docker (optional but recommended)
5. Deploy to cloud (AWS, GCP, Heroku, etc.)
6. Monitor predictions and retrain when needed
Step 1: Save Your Model
```python
import joblib
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# --- Option A: joblib (recommended for sklearn) ---
joblib.dump(model, 'model.joblib')
loaded_model = joblib.load('model.joblib')

# --- Option B: pickle ---
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# --- Option C: ONNX (cross-platform, often faster inference) ---
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

initial_type = [('input', FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)
with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
```
Step 2: Build API with FastAPI
```python
# app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="Iris Classifier API")
model = joblib.load("model.joblib")

class PredictRequest(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

class PredictResponse(BaseModel):
    prediction: str
    confidence: float

CLASSES = ["setosa", "versicolor", "virginica"]

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest):
    features = np.array([[req.sepal_length, req.sepal_width,
                          req.petal_length, req.petal_width]])
    prediction = model.predict(features)[0]
    proba = model.predict_proba(features)[0]
    return PredictResponse(
        prediction=CLASSES[prediction],
        confidence=round(float(proba.max()), 4),
    )

@app.get("/health")
def health():
    return {"status": "healthy"}

# Run: uvicorn app:app --host 0.0.0.0 --port 8000
# Test: curl -X POST http://localhost:8000/predict \
#   -H "Content-Type: application/json" \
#   -d '{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2}'
```
Step 2 (Alternative): Flask API
```python
# app_flask.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load("model.joblib")
CLASSES = ["setosa", "versicolor", "virginica"]

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    features = np.array([[data["sepal_length"], data["sepal_width"],
                          data["petal_length"], data["petal_width"]]])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": CLASSES[prediction]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

# Run: python app_flask.py
```
Step 3: Dockerize
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.joblib .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

```text
# requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
joblib==1.3.2
scikit-learn==1.3.2
numpy==1.26.2
```

```shell
# Build and run
docker build -t iris-api .
docker run -p 8000:8000 iris-api
```
Step 4: Deploy Options
| Platform | Complexity | Cost | Best For |
|---|---|---|---|
| Heroku | Low | Paid (free tier retired) | Quick demos, prototypes |
| Railway / Render | Low | Free tier | Simple deployments |
| AWS Lambda | Medium | Pay per request | Serverless, low traffic |
| AWS EC2 / GCP | High | Hourly | Full control, high traffic |
| Streamlit Cloud | Very Low | Free | Interactive ML demos with UI |
Step 5: Monitoring
- Data drift — Is incoming data different from training data?
- Model drift — Is accuracy degrading over time?
- Latency — How fast are predictions? (Target: <100ms)
- Error rate — How many requests fail?
- Retraining trigger — When to retrain with new data
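A minimal sketch of the data-drift idea above: compare each incoming feature's mean against the training distribution, measured in training standard deviations. The function name and the one-standard-deviation threshold here are illustrative; production systems typically use statistical tests (e.g. Kolmogorov-Smirnov) or dedicated monitoring tools.

```python
# Sketch of a simple data-drift check (illustrative, not a statistical test):
# how far has the live feature mean moved, in units of the training std?
import numpy as np

def drift_score(train_col: np.ndarray, live_col: np.ndarray) -> float:
    """Absolute difference of means, scaled by the training std."""
    std = train_col.std() or 1.0  # guard against a zero-variance feature
    return abs(live_col.mean() - train_col.mean()) / std

# Synthetic demo: one window matches training, one has shifted.
train = np.random.default_rng(0).normal(5.0, 1.0, 1000)
live_ok = np.random.default_rng(1).normal(5.0, 1.0, 200)
live_drifted = np.random.default_rng(2).normal(7.0, 1.0, 200)

print(drift_score(train, live_ok))       # small: no drift
print(drift_score(train, live_drifted))  # large (about 2 stds): alert/retrain
```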
Never deploy a model without input validation. Always check for missing values, out-of-range features, and wrong data types before prediction.
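With Pydantic, much of that validation can live in the request model itself: adding `Field` constraints makes FastAPI reject out-of-range inputs with a 422 before they reach the model. The bounds below are illustrative, not botanical facts:

```python
# Sketch: range checks via Pydantic Field constraints, so malformed or
# out-of-range inputs never reach model.predict(). Bounds are illustrative.
from pydantic import BaseModel, Field, ValidationError

class PredictRequest(BaseModel):
    sepal_length: float = Field(gt=0, lt=10)
    sepal_width: float = Field(gt=0, lt=10)
    petal_length: float = Field(gt=0, lt=10)
    petal_width: float = Field(gt=0, lt=10)

try:
    PredictRequest(sepal_length=-1, sepal_width=3.5,
                   petal_length=1.4, petal_width=0.2)
except ValidationError as e:
    print("rejected:", e.errors()[0]["loc"])  # the offending field
```

Missing fields and wrong types (e.g. a string where a float is expected) are rejected the same way, since they also fail model validation.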