
Your Model Is Already in Production. Here's How to Audit It for Fairness Before August.

Most teams don't need to retrain — they need to audit what's already deployed. Paragon's prep mode generates a Fairness Certificate from your existing model's predictions. No retraining. No data upload.

Tags: compliance, AI audit, existing models, EU AI Act, fairness certificate

5 min read

The EU AI Act enforcement deadline is August 2, 2026. If you have AI models already deployed in healthcare, lending, insurance, or hiring, you have a problem that is different from the one most fairness tools solve.

Most fairness tools assume you are starting from scratch. They want to retrain your model with their architecture. That is not practical when your model is already in production, already FDA-cleared, or already integrated into 47 downstream systems.

You do not need to retrain. You need to audit.

The prep Mode

Paragon Fairness has a prep command designed for exactly this situation: you have a deployed model, you have its predictions, and you need a Fairness Certificate showing how it performs across demographic groups.

docker pull ghcr.io/paragon-dao/paragon-fairness:latest

docker run --rm -v "$(pwd)":/data \
  ghcr.io/paragon-dao/paragon-fairness:latest \
  prep --data /data/predictions.csv \
       --target actual_outcome \
       --predicted model_prediction \
       --protected ancestry gender age_group

Your CSV needs three things:

  • Ground truth column (actual_outcome): What actually happened
  • Prediction column (model_prediction): What your model predicted
  • Protected attribute columns (ancestry, gender, etc.): Groups you need to audit across
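A minimal predictions.csv with this layout can be generated as follows. The column names come from the prep command above; the rows themselves are invented purely for illustration:

```python
import csv

# Illustrative rows only. The column names (actual_outcome,
# model_prediction, ancestry, gender, age_group) match the prep
# command above; the values are made up for this example.
rows = [
    {"actual_outcome": 1, "model_prediction": 1,
     "ancestry": "european", "gender": "female", "age_group": "40-60"},
    {"actual_outcome": 0, "model_prediction": 1,
     "ancestry": "african", "gender": "male", "age_group": "20-40"},
    {"actual_outcome": 1, "model_prediction": 0,
     "ancestry": "east_asian", "gender": "female", "age_group": "60+"},
]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

Binary labels are shown here; your outcome and prediction columns should use whatever encoding your production system logs, as long as both columns use the same one.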

The output is a full Fairness Certificate — per-group performance metrics, demographic parity gaps, equalized odds gaps, and an integrity hash for verification.

What prep Reports

The certificate breaks down your model’s performance for every combination of protected attributes:

  • Accuracy by group: Does the model perform equally well for European-ancestry and African-ancestry patients? For male and female applicants?
  • False positive rate by group: Is the model more likely to incorrectly flag one group over another?
  • False negative rate by group: Is the model more likely to miss positive cases in one group?
  • Demographic parity gap: The difference in positive prediction rates across groups
  • Equalized odds gap: The difference in error rates across groups

If any of these gaps exceed your compliance threshold, you know exactly which groups are affected and by how much.
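To make the definitions concrete, here is a rough sketch of how these gaps fall out of the same three columns. This is not Paragon's implementation, just the standard textbook formulation of the metrics named above:

```python
from collections import defaultdict

def group_rates(rows, group_key):
    """Per-group positive-prediction rate, FPR, and FNR from
    (actual_outcome, model_prediction) pairs."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in rows:
        c = counts[r[group_key]]
        y, yhat = r["actual_outcome"], r["model_prediction"]
        if y == 1 and yhat == 1:   c["tp"] += 1
        elif y == 0 and yhat == 1: c["fp"] += 1
        elif y == 0 and yhat == 0: c["tn"] += 1
        else:                      c["fn"] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        rates[g] = {
            "positive_rate": (c["tp"] + c["fp"]) / n,
            "fpr": c["fp"] / (c["fp"] + c["tn"]) if c["fp"] + c["tn"] else 0.0,
            "fnr": c["fn"] / (c["fn"] + c["tp"]) if c["fn"] + c["tp"] else 0.0,
        }
    return rates

def gaps(rates):
    """Demographic parity gap = spread in positive-prediction rates;
    equalized odds gap = worst spread across FPR and FNR."""
    pos = [r["positive_rate"] for r in rates.values()]
    fpr = [r["fpr"] for r in rates.values()]
    fnr = [r["fnr"] for r in rates.values()]
    return {
        "demographic_parity_gap": max(pos) - min(pos),
        "equalized_odds_gap": max(max(fpr) - min(fpr), max(fnr) - min(fnr)),
    }
```

A model can have a zero demographic parity gap and still fail equalized odds: two groups can receive positive predictions at the same rate while the errors land on different people in each group.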

Who This Is For

prep is designed for a specific situation: models already in production that need compliance documentation.

This includes:

  • FDA-cleared medical AI that must report subgroup performance under emerging guidance
  • Credit scoring models under CFPB disparate impact scrutiny
  • Clinical AI already deployed in hospital systems across diverse patient populations
  • Polygenic risk score (PRS) models shipping risk scores that degrade for non-European ancestries
  • Hiring AI subject to NYC Local Law 144 annual bias audits

If your model is already working and your problem is documentation, not retraining, prep is the right tool.

The Compliance Workflow

Here is the workflow for a team that needs EU AI Act compliance documentation by August:

Step 1: Export predictions. Pull a representative sample from your production system — actual outcomes alongside model predictions, with demographic columns included. This should reflect your real deployment population.
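The export step can be sketched with pandas. The DataFrame below is a stand-in for your production dump (in practice you would read from your warehouse or logs), and the column names follow the prep example above:

```python
import pandas as pd

# Stand-in for a dump of production predictions joined with observed
# outcomes and demographic columns; in practice this comes from your
# warehouse or prediction logs.
df = pd.DataFrame({
    "actual_outcome":   [1, 0, 1, 0, 1, 0, 1, 0],
    "model_prediction": [1, 0, 0, 1, 1, 0, 1, 1],
    "ancestry": ["european"] * 4 + ["african"] * 4,
    "gender":   ["female", "male"] * 4,
})

# Sample proportionally within each protected group so the exported
# file mirrors the real deployment population (tune frac to your
# data volume).
sample = (
    df.groupby(["ancestry", "gender"], group_keys=False)
      .apply(lambda g: g.sample(frac=0.5, random_state=42))
)

sample.to_csv("predictions.csv", index=False)
```

Sampling within each group rather than across the whole table keeps small subgroups from vanishing from the audit sample.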

Step 2: Run prep. One Docker command. Your data stays on your machine. The tool generates a certificate in under 60 seconds for most datasets.

Step 3: Review the certificate. Open certificate.html. If any per-group gap exceeds your threshold, you know exactly what to fix. If gaps are within tolerance, you have your documentation.

Step 4: Publish for verification. Submit the certificate to paragondao.org so regulators, auditors, or customers can independently verify it. Only the certificate metadata is published — never your data.

Step 5: Schedule re-certification. Models drift. Populations change. Schedule quarterly prep runs on fresh production data. The Regulated tier includes re-certification SLAs.

prep vs. train

             prep                                       train
Input        Existing model predictions                 Raw data
Output       Fairness audit certificate                 Fair model + certificate
Retraining   None                                       Full GLE training pipeline
Time         Under 60 seconds                           5–15 minutes
Use case     Compliance audit of deployed models        Building new fair models from scratch
Who          Compliance officers, regulators, auditors  ML engineers, data scientists

Both commands produce the same Fairness Certificate format — machine-verifiable, SHA-256 signed, with a QR code for verification at paragonbiosignals.com/verify.
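Recomputing a SHA-256 integrity hash over a certificate file is a few lines of standard library code. This is a sketch; exactly which bytes Paragon hashes and signs is an assumption here, so treat the official verify page as authoritative:

```python
import hashlib

def file_sha256(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large
    certificates don't need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the hash printed on the certificate
# or encoded in its QR code, e.g.:
# print(file_sha256("certificate.html"))
```

Any single-byte change to the file produces a completely different digest, which is what makes the hash useful as a tamper check.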

The August Question

Every organization deploying AI in EU high-risk categories needs to answer: can you document your model’s fairness across demographic groups?

If you are building new models, use train. If you have models already deployed, use prep. Either way, you need a certificate before August.

The free tier covers up to 1,000 samples. Read the docs or try the interactive demo. For regulated industries needing audit-ready evidence dossiers and re-certification SLAs, see the Regulated tier.