Detect AI Bias in 5 Minutes: A Docker Tutorial for ML Engineers
Step-by-step tutorial for detecting bias in machine learning models using Docker. No cloud upload required. Generate fairness reports and certificates locally.
7 min read
You have a trained model. You suspect it might be biased. You want to test it without uploading your data anywhere.
This tutorial walks you through detecting and measuring AI bias using Paragon Fairness — a Docker-based tool that runs entirely on your machine.
Prerequisites
- Docker installed (get Docker)
- A CSV or Parquet file with your data
- Your data must include a target column and at least one protected attribute column
Step 1: Pull the Image
docker pull ghcr.io/paragon-dao/paragon-fairness:latest
The image is about 1.6 GB and includes everything — Python runtime, model training code, certificate generation, and the GLE (General Learning Encoder).
Step 2: Prepare Your Data
Your CSV should look something like:
age,income,education,gender,race,hired
34,65000,bachelors,female,white,1
28,45000,masters,male,black,0
41,82000,phd,female,asian,1
...
The key columns:
- Target (hired): what your model predicts
- Protected attributes (gender, race): groups you want to test for bias
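Before mounting the file into the container, it can be worth a quick local sanity check. A minimal sketch using pandas; the column names mirror the example above, and the inline CSV is a stand-in for your real hiring_data.csv:

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("hiring_data.csv") -- same columns as the example above
csv_text = """age,income,education,gender,race,hired
34,65000,bachelors,female,white,1
28,45000,masters,male,black,0
41,82000,phd,female,asian,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# The tool needs a target column plus at least one protected attribute
required = {"hired", "gender", "race"}
missing = required - set(df.columns)
assert not missing, f"missing columns: {missing}"
assert df["hired"].isin([0, 1]).all(), "target should be binary"
print("data looks ready")
```

If the assertions pass, the file has everything the train command below expects.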
Step 3: Run the Analysis
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
train --data /data/hiring_data.csv \
--target hired \
--protected gender race
The tool will:
- Train a GLE encoder on your data
- Measure fairness across all protected groups
- Sweep the bottleneck dimension (d) across multiple values
- Generate a Fairness Certificate
Step 4: Read the Results
The output directory contains:
output/
├── certificate.json # Machine-readable metrics
├── certificate.html # Visual report with QR code
├── badge_fair.svg # Fairness badge for docs
├── models/ # Trained fair model weights
└── metrics/ # Per-group breakdowns
Open certificate.html in your browser. You will see:
- Overall fairness score based on bottleneck dimension
- Per-group metrics: accuracy, false positive rate, false negative rate for each demographic group
- Demographic parity gap: Difference in positive prediction rates
- Equalized odds gap: Difference in true positive and false positive rates
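To build intuition for the two gap metrics, here is an illustrative computation on a toy prediction table. Column names follow the tutorial's example; the certificate's exact formulas may differ from this sketch:

```python
import pandas as pd

# Toy predictions: ground truth (hired) vs. model output (pred) per group
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "hired":  [1, 0, 1, 1, 0, 0],   # ground-truth outcome
    "pred":   [1, 0, 0, 1, 1, 0],   # model prediction
})

# Demographic parity gap: spread in positive-prediction rates across groups
rates = df.groupby("gender")["pred"].mean()
dp_gap = rates.max() - rates.min()

# Equalized odds gap: worst spread in TPR or FPR across groups
tprs = df[df["hired"] == 1].groupby("gender")["pred"].mean()
fprs = df[df["hired"] == 0].groupby("gender")["pred"].mean()
eo_gap = max(tprs.max() - tprs.min(), fprs.max() - fprs.min())

print(f"demographic parity gap: {dp_gap:.2f}")  # 0.33
print(f"equalized odds gap: {eo_gap:.2f}")      # 1.00
```

A gap of 0 means the groups are treated identically on that metric; the closer to 1, the larger the disparity the certificate will flag.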
Step 5: Adjust the Fairness-Accuracy Tradeoff
Fairness is not binary. The bottleneck dimension d is the number of dimensions in the model’s internal representation, and it controls the tradeoff: a smaller d forces a more compressed representation, leaving less room to encode information correlated with protected attributes.
# Maximum fairness (d=8, smallest representation)
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
train --data /data/hiring_data.csv \
--target hired \
--protected gender race \
--d 8
# Balanced (d=32)
docker run ... --d 32
# Maximum accuracy, less fairness constraint (d=128)
docker run ... --d 128
The free tier trains at d=32 and d=64. Pro tier sweeps all five values (8, 16, 32, 64, 128) so you can choose the right tradeoff. Try the interactive demo to visualize the tradeoff before committing.
For the full explanation of why d controls fairness, read the genomic fairness deep-dive.
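Once you have certificates from runs at different d values, a few lines of Python can line them up side by side. The keys and numbers below are hypothetical placeholders, not the tool's documented schema; inspect your own certificate.json for the real field names:

```python
import json

# Hypothetical certificate excerpts -- "d", "fairness_score", and "accuracy"
# are placeholder keys for illustration only
certs = [
    json.loads('{"d": 64, "fairness_score": 0.85, "accuracy": 0.88}'),
    json.loads('{"d": 32, "fairness_score": 0.91, "accuracy": 0.84}'),
]
for cert in sorted(certs, key=lambda c: c["d"]):
    print(f"d={cert['d']}: fairness={cert['fairness_score']}, "
          f"accuracy={cert['accuracy']}")
```

Sorting by d makes the tradeoff visible at a glance: as d grows, accuracy tends to rise while the fairness score falls.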
Step 6: Publish Your Certificate
Once satisfied, submit your certificate for public verification:
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
verify --cert /data/output/certificate.json
This registers the certificate at paragondao.org where anyone can verify it by scanning the QR code or entering the SHA-256 hash at paragonbiosignals.com/verify.
What About My Data?
Your data never leaves your machine. The Docker container:
- Runs entirely offline (after the initial pull)
- Reads data from the mounted volume
- Writes results to the mounted volume
- Makes no network requests during training
The only network call is the optional verify command that submits the certificate (metrics only, not your data) for public verification.
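Since verification works from the certificate's SHA-256 hash, you can compute the same fingerprint locally to confirm the file has not changed since you generated it. A sketch, assuming the hash is taken over the raw JSON bytes; the stand-in byte string should be replaced by the real file:

```python
import hashlib

# Stand-in bytes; in practice read the real file, e.g.
#   data = pathlib.Path("output/certificate.json").read_bytes()
data = b'{"example": "certificate"}'
fingerprint = hashlib.sha256(data).hexdigest()
print(fingerprint)  # 64 hex characters to compare against the verify page
```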
Common Scenarios
“My model is already trained”
Use prep mode to analyze existing model outputs. This is designed for teams with models already in production who need a fairness audit without retraining:
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
prep --data /data/predictions.csv \
--target actual_outcome \
--predicted model_prediction \
--protected gender race
prep generates a Fairness Certificate from your existing predictions — no retraining required. Useful for regulatory audits of deployed models.
“I need to test multiple models”
Run the tool multiple times with different data or configurations. Each run generates a separate certificate.
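A small driver script can generate the docker command for each dataset. This sketch only prints the commands (a dry run); the dataset names are placeholders, and you would swap the print for subprocess.run(cmd, check=True) to actually execute them:

```python
import os
import shlex

# Placeholder dataset names -- one certificate is produced per run
datasets = ["hiring_data.csv", "promotion_data.csv"]
for name in datasets:
    cmd = [
        "docker", "run", "--rm", "-v", f"{os.getcwd()}:/data",
        "ghcr.io/paragon-dao/paragon-fairness:latest",
        "train", "--data", f"/data/{name}",
        "--target", "hired", "--protected", "gender", "race",
    ]
    print(shlex.join(cmd))  # dry run; use subprocess.run(cmd, check=True) to execute
```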
“I need a demo for stakeholders”
docker run --rm -p 8080:8080 \
ghcr.io/paragon-dao/paragon-fairness:latest \
demo
This starts a local web server with the interactive d-slider visualization — the same tool available at paragonbiosignals.com/app/demo, running entirely on your laptop. Share your screen or point stakeholders to localhost:8080.
Next Steps
- What is a Fairness Certificate? — understand what you just generated
- Why bottleneck dimension controls fairness — the science behind d
- EU AI Act compliance guide — regulatory context and timeline
- Full documentation — advanced configuration, Parquet input, API reference
- Upgrade to Pro — unlimited samples, full d-sweep, signed certificates