Detect AI Bias in 5 Minutes: A Docker Tutorial for ML Engineers
Step-by-step tutorial for detecting bias in machine learning models using Docker. No cloud upload required. Generate fairness reports and certificates locally.
7 min read
You have a trained model. You suspect it might be biased. You want to test it without uploading your data anywhere.
This tutorial walks you through detecting and measuring AI bias using Paragon Fairness — a Docker-based tool that runs entirely on your machine.
Prerequisites
- Docker installed (get Docker)
- A CSV or Parquet file with your data
- Your data must include a target column and at least one protected attribute column
Step 1: Pull the Image
docker pull ghcr.io/paragon-dao/paragon-fairness:latest
The image is about 1.6 GB and includes everything — Python runtime, model training code, certificate generation, and the GLE (General Learning Encoder).
Step 2: Prepare Your Data
Your CSV should look something like:
age,income,education,gender,race,hired
34,65000,bachelors,female,white,1
28,45000,masters,male,black,0
41,82000,phd,female,asian,1
...
The key columns:
- Target (hired): what your model predicts
- Protected attributes (gender, race): groups you want to test for bias
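Before mounting the file into the container, it can be worth a quick local sanity check. A minimal sketch using pandas; the column names mirror the example above, and the inline CSV is a stand-in for your real hiring_data.csv:

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("hiring_data.csv") -- same columns as the example above
csv_text = """age,income,education,gender,race,hired
34,65000,bachelors,female,white,1
28,45000,masters,male,black,0
41,82000,phd,female,asian,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# The tool needs a target column plus at least one protected attribute
required = {"hired", "gender", "race"}
missing = required - set(df.columns)
assert not missing, f"missing columns: {missing}"
assert df["hired"].isin([0, 1]).all(), "target should be binary"
print("data looks ready")
```

If the assertions pass, the file has everything the train command below expects.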
Step 3: Run the Analysis
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
train --data /data/hiring_data.csv \
--target hired \
--protected gender race
The tool will:
- Train a GLE encoder on your data
- Measure fairness across all protected groups
- Sweep the bottleneck dimension (d) across multiple values
- Generate a Fairness Certificate
Step 4: Read the Results
The output directory contains:
output/
├── certificate.json # Machine-readable metrics
├── certificate.html # Visual report with QR code
├── badge_fair.svg # Fairness badge for docs
├── models/ # Trained fair model weights
└── metrics/ # Per-group breakdowns
Open certificate.html in your browser. You will see:
- Overall fairness score based on bottleneck dimension
- Per-group metrics: accuracy, false positive rate, false negative rate for each demographic group
- Demographic parity gap: Difference in positive prediction rates
- Equalized odds gap: Difference in true positive and false positive rates
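To build intuition for the two gap metrics, here is an illustrative computation on a toy prediction table. Column names follow the tutorial's example; the certificate's exact formulas may differ from this sketch:

```python
import pandas as pd

# Toy predictions: ground truth (hired) vs. model output (pred) per group
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "hired":  [1, 0, 1, 1, 0, 0],   # ground-truth outcome
    "pred":   [1, 0, 0, 1, 1, 0],   # model prediction
})

# Demographic parity gap: spread in positive-prediction rates across groups
rates = df.groupby("gender")["pred"].mean()
dp_gap = rates.max() - rates.min()

# Equalized odds gap: worst spread in TPR or FPR across groups
tprs = df[df["hired"] == 1].groupby("gender")["pred"].mean()
fprs = df[df["hired"] == 0].groupby("gender")["pred"].mean()
eo_gap = max(tprs.max() - tprs.min(), fprs.max() - fprs.min())

print(f"demographic parity gap: {dp_gap:.2f}")  # 0.33
print(f"equalized odds gap: {eo_gap:.2f}")      # 1.00
```

A gap of 0 means the groups are treated identically on that metric; the closer to 1, the larger the disparity the certificate will flag.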
Step 5: Adjust the Fairness-Accuracy Tradeoff
Fairness is not binary. The bottleneck dimension d is the number of dimensions in the model’s internal representation, and it controls the tradeoff: a smaller d forces a more compressed representation, leaving less room to encode information correlated with protected attributes.
# Maximum fairness (d=8, smallest representation)
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
train --data /data/hiring_data.csv \
--target hired \
--protected gender race \
--d 8
# Balanced (d=32)
docker run ... --d 32
# Maximum accuracy, less fairness constraint (d=128)
docker run ... --d 128
The free tier trains at d=32 and d=64. Pro tier sweeps all five values (8, 16, 32, 64, 128) so you can choose the right tradeoff. Try the interactive demo to visualize the tradeoff before committing.
For the full explanation of why d controls fairness, read the genomic fairness deep-dive.
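Once you have certificates from runs at different d values, a few lines of Python can line them up side by side. The keys and numbers below are hypothetical placeholders, not the tool's documented schema; inspect your own certificate.json for the real field names:

```python
import json

# Hypothetical certificate excerpts -- "d", "fairness_score", and "accuracy"
# are placeholder keys for illustration only
certs = [
    json.loads('{"d": 64, "fairness_score": 0.85, "accuracy": 0.88}'),
    json.loads('{"d": 32, "fairness_score": 0.91, "accuracy": 0.84}'),
]
for cert in sorted(certs, key=lambda c: c["d"]):
    print(f"d={cert['d']}: fairness={cert['fairness_score']}, "
          f"accuracy={cert['accuracy']}")
```

Sorting by d makes the tradeoff visible at a glance: as d grows, accuracy tends to rise while the fairness score falls.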
Step 6: Publish Your Certificate
Once satisfied, submit your certificate for public verification:
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
verify --cert /data/output/certificate.json
This registers the certificate at paragondao.org where anyone can verify it by scanning the QR code or entering the SHA-256 hash at paragonbiosignals.com/verify.
What About My Data?
Your data never leaves your machine. The Docker container:
- Runs entirely offline (after the initial pull)
- Reads data from the mounted volume
- Writes results to the mounted volume
- Makes no network requests during training
The only network call is the optional verify command that submits the certificate (metrics only, not your data) for public verification.
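Since verification works from the certificate's SHA-256 hash, you can compute the same fingerprint locally to confirm the file has not changed since you generated it. A sketch, assuming the hash is taken over the raw JSON bytes; the stand-in byte string should be replaced by the real file:

```python
import hashlib

# Stand-in bytes; in practice read the real file, e.g.
#   data = pathlib.Path("output/certificate.json").read_bytes()
data = b'{"example": "certificate"}'
fingerprint = hashlib.sha256(data).hexdigest()
print(fingerprint)  # 64 hex characters to compare against the verify page
```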
Common Scenarios
“My model is already trained”
Use prep mode to analyze existing model outputs. This is designed for teams with models already in production who need a fairness audit without retraining:
docker run --rm -v $(pwd):/data \
ghcr.io/paragon-dao/paragon-fairness:latest \
prep --data /data/predictions.csv \
--target actual_outcome \
--predicted model_prediction \
--protected gender race
prep generates a Fairness Certificate from your existing predictions — no retraining required. Useful for regulatory audits of deployed models.
“I need to test multiple models”
Run the tool multiple times with different data or configurations. Each run generates a separate certificate.
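A small driver script can generate the docker command for each dataset. This sketch only prints the commands (a dry run); the dataset names are placeholders, and you would swap the print for subprocess.run(cmd, check=True) to actually execute them:

```python
import os
import shlex

# Placeholder dataset names -- one certificate is produced per run
datasets = ["hiring_data.csv", "promotion_data.csv"]
for name in datasets:
    cmd = [
        "docker", "run", "--rm", "-v", f"{os.getcwd()}:/data",
        "ghcr.io/paragon-dao/paragon-fairness:latest",
        "train", "--data", f"/data/{name}",
        "--target", "hired", "--protected", "gender", "race",
    ]
    print(shlex.join(cmd))  # dry run; use subprocess.run(cmd, check=True) to execute
```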
“I need a demo for stakeholders”
docker run --rm -p 8080:8080 \
ghcr.io/paragon-dao/paragon-fairness:latest \
demo
This starts a local web server with the interactive d-slider visualization — the same tool available at paragonbiosignals.com/app/demo, running entirely on your laptop. Share your screen or point stakeholders to localhost:8080.
Next Steps
- What is a Fairness Certificate? — understand what you just generated
- Why bottleneck dimension controls fairness — the science behind d
- EU AI Act compliance guide — regulatory context and timeline
- Full documentation — advanced configuration, Parquet input, API reference
- Upgrade to Pro — unlimited samples, full d-sweep, signed certificates