Your Model Is Already in Production. Here's How to Audit It for Fairness Before August.
Most teams don't need to retrain — they need to audit what's already deployed. Paragon's prep mode generates a Fairness Certificate from your existing model's predictions. No retraining. No data upload.
5 min read
The EU AI Act enforcement deadline is August 2, 2026. If you have AI models already deployed in healthcare, lending, insurance, or hiring, you have a problem that is different from the one most fairness tools solve.
Most fairness tools assume you are starting from scratch. They want to retrain your model with their architecture. That is not practical when your model is already in production, already FDA-cleared, or already integrated into 47 downstream systems.
You do not need to retrain. You need to audit.
The prep Mode
Paragon Fairness has a prep command designed for exactly this situation: you have a deployed model, you have its predictions, and you need a Fairness Certificate showing how it performs across demographic groups.
```shell
docker pull ghcr.io/paragon-dao/paragon-fairness:latest

docker run --rm -v $(pwd):/data \
  ghcr.io/paragon-dao/paragon-fairness:latest \
  prep --data /data/predictions.csv \
       --target actual_outcome \
       --predicted model_prediction \
       --protected ancestry gender age_group
```
Your CSV needs three things:
- Ground truth column (`actual_outcome`): what actually happened
- Prediction column (`model_prediction`): what your model predicted
- Protected attribute columns (`ancestry`, `gender`, etc.): the groups you need to audit across
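To make the expected shape concrete, here is a minimal sketch that writes a `predictions.csv` with those three kinds of columns. The column names follow the `prep` flags above; the row values are made up for illustration.

```python
import csv

# Illustrative rows only -- values are invented for the example.
rows = [
    {"actual_outcome": 1, "model_prediction": 1,
     "ancestry": "european", "gender": "F", "age_group": "40-60"},
    {"actual_outcome": 0, "model_prediction": 1,
     "ancestry": "african", "gender": "M", "age_group": "18-40"},
    {"actual_outcome": 1, "model_prediction": 0,
     "ancestry": "african", "gender": "F", "age_group": "60+"},
]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

One row per prediction your model actually made; the more representative the sample, the more meaningful the certificate.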
The output is a full Fairness Certificate — per-group performance metrics, demographic parity gaps, equalized odds gaps, and an integrity hash for verification.
What prep Reports
The certificate breaks down your model’s performance for every combination of protected attributes:
- Accuracy by group: Does the model perform equally well for European-ancestry and African-ancestry patients? For male and female applicants?
- False positive rate by group: Is the model more likely to incorrectly flag one group over another?
- False negative rate by group: Is the model more likely to miss positive cases in one group?
- Demographic parity gap: The difference in positive prediction rates across groups
- Equalized odds gap: The difference in error rates across groups
If any of these gaps exceed your compliance threshold, you know exactly which groups are affected and by how much.
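To make the gap definitions concrete, here is a small hand-rolled sketch of per-group rates and the two gaps for a toy two-group dataset. This illustrates the standard definitions only; it is not Paragon's implementation, and the data is invented.

```python
from collections import defaultdict

def group_rates(records):
    """Per-group positive-prediction rate, FPR, and FNR."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, yhat, group in records:
        key = "tp" if y and yhat else "fp" if yhat else "fn" if y else "tn"
        counts[group][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            "positive_rate": (c["tp"] + c["fp"]) / n,
            "fpr": c["fp"] / max(c["fp"] + c["tn"], 1),
            "fnr": c["fn"] / max(c["fn"] + c["tp"], 1),
        }
    return rates

# (actual_outcome, model_prediction, group) triples -- invented data
records = [
    (1, 1, "A"), (0, 0, "A"), (1, 1, "A"), (0, 1, "A"),
    (1, 0, "B"), (0, 0, "B"), (1, 1, "B"), (0, 0, "B"),
]
r = group_rates(records)

# Demographic parity gap: difference in positive prediction rates
dp_gap = abs(r["A"]["positive_rate"] - r["B"]["positive_rate"])

# Equalized odds gap: the larger of the FPR and FNR differences
eo_gap = max(abs(r["A"]["fpr"] - r["B"]["fpr"]),
             abs(r["A"]["fnr"] - r["B"]["fnr"]))
```

For this toy data both gaps come out at 0.5, which would fail almost any compliance threshold; the certificate reports the same quantities for every group combination in your real data.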
Who This Is For
prep is designed for a specific situation: models already in production that need compliance documentation.
This includes:
- FDA-cleared medical AI that must report subgroup performance under emerging guidance
- Credit scoring models under CFPB disparate impact scrutiny
- Clinical AI already deployed in hospital systems across diverse patient populations
- PRS models shipping risk scores that degrade for non-European ancestries
- Hiring AI subject to NYC Local Law 144 annual bias audits
If your model is already working and your problem is documentation, not retraining, prep is the right tool.
The Compliance Workflow
Here is the workflow for a team that needs EU AI Act compliance documentation by August:
Step 1: Export predictions. Pull a representative sample from your production system — actual outcomes alongside model predictions, with demographic columns included. This should reflect your real deployment population.
Step 2: Run prep. One Docker command. Your data stays on your machine. The tool generates a certificate in under 60 seconds for most datasets.
Step 3: Review the certificate. Open certificate.html. If any per-group gap exceeds your threshold, you know exactly what to fix. If gaps are within tolerance, you have your documentation.
Step 4: Publish for verification. Submit the certificate to paragondao.org so regulators, auditors, or customers can independently verify it. Only the certificate metadata is published — never your data.
Step 5: Schedule re-certification. Models drift. Populations change. Schedule quarterly prep runs on fresh production data. The Regulated tier includes re-certification SLAs.
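A scheduled re-certification job can simply wrap the same Docker invocation shown earlier. A minimal sketch, assuming the image tag and flags from the command above and a hypothetical data directory layout:

```python
import subprocess
from pathlib import Path

def prep_command(data_dir: Path, csv_name: str) -> list:
    """Build the prep command from the article for a given export."""
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data",
        "ghcr.io/paragon-dao/paragon-fairness:latest",
        "prep",
        "--data", f"/data/{csv_name}",
        "--target", "actual_outcome",
        "--predicted", "model_prediction",
        "--protected", "ancestry", "gender", "age_group",
    ]

cmd = prep_command(Path("/srv/exports/2026-q1"), "predictions.csv")
# On a machine with Docker installed, execute the run:
# subprocess.run(cmd, check=True)
```

Drop a call like this into whatever scheduler you already use (cron, Airflow, a CI pipeline) pointed at each quarter's fresh export.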
prep vs. train
| | prep | train |
|---|---|---|
| Input | Existing model predictions | Raw data |
| Output | Fairness audit certificate | Fair model + certificate |
| Retraining | None | Full GLE training pipeline |
| Time | Under 60 seconds | 5-15 minutes |
| Use case | Compliance audit of deployed models | Building new fair models from scratch |
| Who | Compliance officers, regulators, auditors | ML engineers, data scientists |
Both commands produce the same Fairness Certificate format: machine-verifiable, SHA-256 signed, with a QR code for verification at paragonbiosignals.com/verify.
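Checking a certificate's integrity hash locally is straightforward. A sketch, assuming the published hash is the plain SHA-256 digest of the certificate file (the exact signing scheme is not specified in this article):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the hash shown on the verify page:
# assert sha256_of("certificate.html") == published_hash
```

Anyone you hand the certificate to can run the same check without access to your data.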
The August Question
Every organization deploying AI in EU high-risk categories needs to answer: can you document your model’s fairness across demographic groups?
If you are building new models, use train. If you have models already deployed, use prep. Either way, you need a certificate before August.
The free tier covers up to 1,000 samples. Read the docs or try the interactive demo. For regulated industries needing audit-ready evidence dossiers and re-certification SLAs, see the Regulated tier.