Model Deployment
Take your trained model from a Jupyter notebook to a production API that serves real predictions.
The ML Pipeline
Notebook → Save Model → Build API → Containerize → Deploy → Monitor
1. Train model in notebook
2. Save/serialize the model (pickle, joblib, ONNX)
3. Wrap it in a web API (Flask / FastAPI)
4. Containerize with Docker (optional but recommended)
5. Deploy to cloud (AWS, GCP, Heroku, etc.)
6. Monitor predictions and retrain when needed
Step 1: Save Your Model
```python
import joblib
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# --- Option A: joblib (recommended for sklearn) ---
joblib.dump(model, 'model.joblib')
loaded_model = joblib.load('model.joblib')

# --- Option B: pickle ---
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# --- Option C: ONNX (cross-platform, often faster inference) ---
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

initial_type = [('input', FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)
with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
```
Step 2: Build API with FastAPI
```python
# app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="Iris Classifier API")
model = joblib.load("model.joblib")

class PredictRequest(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

class PredictResponse(BaseModel):
    prediction: str
    confidence: float

CLASSES = ["setosa", "versicolor", "virginica"]

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest):
    features = np.array([[req.sepal_length, req.sepal_width,
                          req.petal_length, req.petal_width]])
    prediction = model.predict(features)[0]
    proba = model.predict_proba(features)[0]
    return PredictResponse(
        prediction=CLASSES[prediction],
        confidence=round(float(proba.max()), 4),
    )

@app.get("/health")
def health():
    return {"status": "healthy"}

# Run: uvicorn app:app --host 0.0.0.0 --port 8000
# Test: curl -X POST http://localhost:8000/predict \
#   -H "Content-Type: application/json" \
#   -d '{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2}'
```
Step 2 (Alternative): Flask API
```python
# app_flask.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load("model.joblib")
CLASSES = ["setosa", "versicolor", "virginica"]

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    features = np.array([[data["sepal_length"], data["sepal_width"],
                          data["petal_length"], data["petal_width"]]])
    prediction = model.predict(features)[0]
    return jsonify({"prediction": CLASSES[prediction]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

# Run: python app_flask.py
```
Step 3: Dockerize
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.joblib .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

```text
# requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
joblib==1.3.2
scikit-learn==1.3.2
numpy==1.26.2
```

```shell
# Build and run
docker build -t iris-api .
docker run -p 8000:8000 iris-api
```
Step 4: Deploy Options
| Platform | Complexity | Cost | Best For |
|---|---|---|---|
| Heroku | Low | Paid (free tier retired) | Quick demos, prototypes |
| Railway / Render | Low | Free tier | Simple deployments |
| AWS Lambda | Medium | Pay per request | Serverless, low traffic |
| AWS EC2 / GCP | High | Hourly | Full control, high traffic |
| Streamlit Cloud | Very Low | Free | Interactive ML demos with UI |
Step 5: Monitoring
- Data drift — Is incoming data different from training data?
- Model drift — Is accuracy degrading over time?
- Latency — How fast are predictions? (Target: <100ms)
- Error rate — How many requests fail?
- Retraining trigger — When to retrain with new data
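A minimal sketch of the data-drift idea above: compare each incoming feature's mean against the training distribution, measured in training standard deviations. The function name and the one-standard-deviation threshold here are illustrative; production systems typically use statistical tests (e.g. Kolmogorov-Smirnov) or dedicated monitoring tools.

```python
# Sketch of a simple data-drift check (illustrative, not a statistical test):
# how far has the live feature mean moved, in units of the training std?
import numpy as np

def drift_score(train_col: np.ndarray, live_col: np.ndarray) -> float:
    """Absolute difference of means, scaled by the training std."""
    std = train_col.std() or 1.0  # guard against a zero-variance feature
    return abs(live_col.mean() - train_col.mean()) / std

# Synthetic demo: one window matches training, one has shifted.
train = np.random.default_rng(0).normal(5.0, 1.0, 1000)
live_ok = np.random.default_rng(1).normal(5.0, 1.0, 200)
live_drifted = np.random.default_rng(2).normal(7.0, 1.0, 200)

print(drift_score(train, live_ok))       # small: no drift
print(drift_score(train, live_drifted))  # large (about 2 stds): alert/retrain
```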
Never deploy a model without input validation. Always check for missing values, out-of-range features, and wrong data types before prediction.
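With Pydantic, much of that validation can live in the request model itself: adding `Field` constraints makes FastAPI reject out-of-range inputs with a 422 before they reach the model. The bounds below are illustrative, not botanical facts:

```python
# Sketch: range checks via Pydantic Field constraints, so malformed or
# out-of-range inputs never reach model.predict(). Bounds are illustrative.
from pydantic import BaseModel, Field, ValidationError

class PredictRequest(BaseModel):
    sepal_length: float = Field(gt=0, lt=10)
    sepal_width: float = Field(gt=0, lt=10)
    petal_length: float = Field(gt=0, lt=10)
    petal_width: float = Field(gt=0, lt=10)

try:
    PredictRequest(sepal_length=-1, sepal_width=3.5,
                   petal_length=1.4, petal_width=0.2)
except ValidationError as e:
    print("rejected:", e.errors()[0]["loc"])  # the offending field
```

Missing fields and wrong types (e.g. a string where a float is expected) are rejected the same way, since they also fail model validation.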