
How to Run an Enterprise AI Audit: Assess What You've Already Built

Most enterprises have deployed AI systems without adequate documentation, independent validation, or ongoing monitoring. An AI audit is not about finding problems to punish. It is about understanding what you have built well enough to govern it, improve it, and defend it to regulators. Here is the methodology that works.

The trigger for most enterprise AI audits is external: a regulator asks questions, an incident occurs, or an acquisition target's AI systems need due diligence. Organizations that wait for these triggers are already in a reactive position that is more expensive and more damaging than proactive audit programs.

The better frame is this: an AI audit is a structured inventory of what you have deployed, how well it is governed, and where the gaps are relative to your regulatory obligations and risk tolerance. Done well, it produces a prioritized remediation plan that is far more actionable than any generic governance framework can provide.

68% of enterprises lack a complete inventory of AI systems in production. The systems exist. The documentation does not.

This guide covers the five-phase AI audit methodology, the scoring approach for assessing governance maturity of individual systems, the finding severity framework for prioritizing remediation, and the common findings that appear in almost every enterprise AI audit.

Before You Start: Build the AI System Inventory

You cannot audit what you cannot find. The first task in any enterprise AI audit is building an inventory of AI systems in production or active development. This is harder than it sounds. AI components are embedded in purchased software, built into business intelligence platforms, developed by external vendors and resellers, and created informally by business units without IT involvement.

An AI system for audit purposes is any system that uses machine learning, statistical modeling, or large language models to make or substantially influence a business decision. This includes: credit scoring and fraud detection models, clinical decision support tools, HR screening and workforce analytics systems, customer-facing chatbots and recommendation engines, risk classification and pricing models, and any GenAI deployment in production workflows.

Build the inventory through four channels: IT system records and cloud platform inventories, vendor and partner disclosures (many purchased platforms contain AI you may not be aware of), business unit interviews targeting operational decisions that use model outputs, and data infrastructure reviews that reveal active model serving endpoints.

Most organizations discover they have 40 to 80 percent more AI systems in production than their initial estimate. The systems are real and consequential. The inventory just was not maintained.

The Five-Phase AI Audit Methodology

Each phase builds on the previous. You can conduct phases 1 through 3 with an internal team for most systems. Phases 4 and 5 for High-Risk systems typically require independent reviewers with no connection to the original development team.

Phase 01
Documentation Review
1 to 2 weeks per system
Assess the completeness and accuracy of existing documentation against minimum standards for the system's risk tier. This phase commonly reveals the most critical findings in enterprise AI audits.
Does a Model Development Plan or equivalent document exist? Is it current?
Is intended use and scope of applicability explicitly defined and bounded?
Are training data sources, preprocessing steps, and feature engineering documented?
Are performance thresholds defined, with rationale for the chosen thresholds?
Is there a documented validation methodology and results from initial validation?
Does a system owner exist and are they aware of their accountability?
Phase 02
Data Governance Assessment
1 week per system
Evaluate data quality, lineage, consent basis, and ongoing data management practices. Data governance findings in AI audits frequently reveal GDPR, CCPA, or sector-specific regulatory exposure.
Is the legal basis for using training data documented? For personal data: is the consent or legitimate interest basis recorded?
Are data quality standards defined for training and production inference data?
Is data lineage traceable from source to model features?
Are there controls to detect training-to-production data distribution shifts (data drift)?
Is there a defined process for removing data from training sets when subject rights requests are received?
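One lightweight way to implement the drift control in the checklist above is a population stability index (PSI) comparing training-time and production feature distributions. A common rule of thumb, not a regulatory standard, treats PSI above 0.2 as a significant shift worth an alert. A sketch, assuming both distributions have already been binned into matching buckets:

```python
import math

def psi(train_fracs: list[float], prod_fracs: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned distribution fractions.
    Each list holds the fraction of observations per bin and sums to ~1.0."""
    total = 0.0
    for t, p in zip(train_fracs, prod_fracs):
        t, p = max(t, eps), max(p, eps)   # guard against log(0) on empty bins
        total += (p - t) * math.log(p / t)
    return total

# Illustrative bin fractions for one model feature (not real data).
train = [0.25, 0.25, 0.25, 0.25]
prod  = [0.10, 0.20, 0.30, 0.40]

score = psi(train, prod)
print(f"PSI = {score:.3f}")  # above 0.2 triggers a drift alert under the rule of thumb
```

The point of the control is not the specific statistic; it is that a defined threshold exists, is monitored per feature, and routes to someone accountable when breached.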
Phase 03
Performance and Monitoring Assessment
1 to 2 weeks per system
Evaluate whether the system is performing as intended in production, and whether there are adequate monitoring controls to detect degradation, drift, or failure. This phase reveals production risk that documentation review cannot surface.
Are production performance metrics being actively monitored? Are alert thresholds defined?
Has model performance degraded materially since deployment? Is there a record of degradation incidents?
Is there a champion/challenger or A/B testing framework in place for performance comparison?
Is the system's behavior under edge cases and out-of-distribution inputs understood?
Are there defined performance thresholds that would trigger model redevelopment or decommission?
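The last two checklist items can be wired together: the thresholds that define "degraded" and "redevelop" should live in configuration, not in an analyst's head. A hedged sketch with illustrative threshold values (real values belong in the system's Model Development Plan):

```python
# Illustrative thresholds; real values come from the Model Development Plan.
THRESHOLDS = {
    "auc":        {"warn": 0.78, "redevelop": 0.72},   # alert when value falls below
    "latency_ms": {"warn": 250,  "redevelop": 500},    # alert when value rises above
}

def evaluate_metric(name: str, value: float) -> str:
    """Map a production metric reading to an alert level."""
    t = THRESHOLDS[name]
    lower_is_bad = name == "auc"   # AUC degrades downward; latency degrades upward
    breached = (lambda limit: value < limit) if lower_is_bad else (lambda limit: value > limit)
    if breached(t["redevelop"]):
        return "redevelop"   # threshold that triggers redevelopment or decommission
    if breached(t["warn"]):
        return "warn"
    return "ok"

print(evaluate_metric("auc", 0.75))        # warn
print(evaluate_metric("latency_ms", 600))  # redevelop
```

Encoding the redevelopment trigger alongside the warning threshold is what separates monitoring from mere metric collection: the system knows in advance what reading forces a decision.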
Phase 04
Fairness and Ethics Assessment
2 to 3 weeks per High-Risk system
For systems that make or influence decisions affecting individuals, assess whether the system treats protected class groups equitably and whether adverse action explanations are available and adequate. Required for regulatory compliance in credit, employment, insurance, and healthcare AI.
Has demographic parity analysis been conducted across all applicable protected characteristics?
Is the disparate impact ratio documented and above the 80% threshold for each protected group?
Are individual adverse action explanations available, actionable, and in plain language?
Is ongoing fairness monitoring in place with automated alerts for disparate impact threshold breaches?
Have proxy variables for protected characteristics been identified and assessed?
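The 80% threshold in the checklist is the four-fifths rule: each group's favorable-outcome rate is compared to the most-favored group's rate, and a ratio below 0.80 signals potential disparate impact. The computation itself is simple enough to run continuously. A sketch with illustrative data:

```python
def disparate_impact_ratio(selection_rates: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's favorable-outcome rate to the highest group's rate.
    Under the four-fifths rule, ratios below 0.80 indicate potential disparate impact."""
    reference = max(selection_rates.values())
    return {group: rate / reference for group, rate in selection_rates.items()}

# Illustrative approval rates by group for a credit model (not real data).
rates = {"group_a": 0.60, "group_b": 0.45, "group_c": 0.58}

ratios = disparate_impact_ratio(rates)
flags = [g for g, r in ratios.items() if r < 0.80]
print(flags)  # ['group_b'] -> 0.45 / 0.60 = 0.75, below the 80% threshold
```

The hard part of the assessment is not this arithmetic; it is obtaining reliable group labels and identifying the proxy variables the last checklist item asks about.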
Phase 05
Regulatory Compliance Assessment
1 to 2 weeks per system
Map the system against applicable regulatory requirements. For EU-based or EU-serving organizations: EU AI Act. For financial services: SR 11-7, DORA, and sector supervisory guidance. For healthcare: HIPAA and FDA SaMD guidance. For HR AI in the EU: GDPR Article 22 automated decision-making requirements.
Has the system been classified under the EU AI Act risk framework? Is the classification documented?
For High-Risk EU AI Act systems: is there a conformity assessment process in place or planned?
For financial services models: is the Model Development Plan SR 11-7 compliant?
For healthcare AI: is FDA SaMD applicability assessed? If applicable, is regulatory strategy documented?
For automated HR decisions: is the GDPR Article 22 basis documented and subject rights process operational?

Scoring and Prioritizing What You Find

The output of an AI audit should be a scored assessment for each system and a prioritized remediation plan. Score each system on four dimensions: Documentation Completeness, Ongoing Governance, Fairness and Ethics, and Regulatory Compliance. Each dimension is scored from 1 to 4, where 1 is inadequate and 4 is leading practice.

Score 1 — Inadequate
Documentation: No MDP or equivalent; intended use undocumented.
Ongoing Governance: No monitoring; no system owner; no incident process.
Fairness and Ethics: No fairness testing conducted; no adverse action explanations.
Regulatory Compliance: No classification under applicable frameworks; potential violations.

Score 2 — Partial
Documentation: Some documentation exists but incomplete or outdated.
Ongoing Governance: Monitoring exists but no alerts; system owner passive.
Fairness and Ethics: Fairness tested at deployment only; no ongoing monitoring.
Regulatory Compliance: Classification done; documentation incomplete; gaps identified.

Score 3 — Adequate
Documentation: Complete MDP; current; reviewed annually.
Ongoing Governance: Active monitoring with alerts; owner engaged; annual review.
Fairness and Ethics: Ongoing fairness monitoring; adverse action explanations operational.
Regulatory Compliance: Full compliance with applicable requirements; documentation complete.

Score 4 — Leading
Documentation: Complete, current MDP with change history; independently validated.
Ongoing Governance: Real-time monitoring; champion/challenger active; board-level reporting.
Fairness and Ethics: Continuous fairness monitoring; SHAP explanations; third-party audit.
Regulatory Compliance: Proactive regulatory engagement; conformity assessment complete; audit-ready.
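When the four dimension scores roll up into a per-system summary, reporting only an average can hide a serious weakness. A sketch of one way to summarize; the "report the floor alongside the mean" rule is an assumption of this example, not part of the rubric:

```python
DIMENSIONS = ("documentation", "governance", "fairness", "compliance")

def maturity_summary(scores: dict[str, int]) -> dict[str, float]:
    """Summarize a system's four 1-4 dimension scores.
    Reporting the minimum alongside the mean keeps a strong documentation score
    from masking an inadequate fairness score (an assumption, not part of the rubric)."""
    values = [scores[d] for d in DIMENSIONS]
    assert all(1 <= v <= 4 for v in values), "each dimension is scored 1 to 4"
    return {"mean": sum(values) / len(values), "floor": min(values)}

summary = maturity_summary(
    {"documentation": 3, "governance": 2, "fairness": 1, "compliance": 3}
)
print(summary)  # {'mean': 2.25, 'floor': 1} -> the floor of 1 exposes an inadequate dimension
```

A system averaging 2.25 looks "partial"; the floor of 1 shows it has an inadequate dimension that needs immediate attention.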

Classifying Audit Findings by Severity

Not all findings are equal. Classify each finding by severity before building the remediation plan. This prevents organizations from spending 80% of their remediation effort on low-severity documentation gaps while critical monitoring and fairness issues remain unaddressed.

Critical Finding
Immediate Regulatory or Legal Exposure
Potential GDPR violation, discriminatory model in production, SR 11-7 material deficiency, EU AI Act non-compliance for deployed High-Risk system. Requires remediation within 30 to 60 days or system suspension.
Major Finding
Material Governance Gap
No ongoing performance monitoring, no system owner, incomplete MDP for High-Risk system, no fairness testing on individual-affecting model. Requires remediation within 90 to 180 days.
Minor Finding
Documentation and Process Gap
Documentation incomplete for low-risk system, monitoring present but alert thresholds not formalized, annual review process defined but not yet executed. Remediation within 180 days to 12 months.

Audit Prioritization Rule: All Critical findings must have a remediation owner and deadline before the audit report is finalized. Do not proceed to Major finding remediation planning until Critical findings have owners. The urgency difference between Critical and Major is real.
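The prioritization rule above can be enforced mechanically rather than by convention: sort the remediation plan by severity and gate report finalization on Critical ownership. A sketch with hypothetical finding IDs:

```python
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}

findings = [
    # (finding id, severity, remediation owner or None) - illustrative data
    ("F-01", "critical", "cro-office"),
    ("F-02", "major", None),
    ("F-03", "critical", None),
    ("F-04", "minor", None),
]

def report_ready(findings) -> bool:
    """The audit report is finalized only when every Critical finding has an owner."""
    return all(owner is not None
               for _, sev, owner in findings if sev == "critical")

# Remediation plan ordered Critical -> Major -> Minor (Python's sort is stable,
# so findings of equal severity keep their original order).
plan = sorted(findings, key=lambda f: SEVERITY_ORDER[f[1]])
print([f[0] for f in plan])   # ['F-01', 'F-03', 'F-02', 'F-04']
print(report_ready(findings)) # False - F-03 still needs an owner
```

A check like this in the audit tooling makes the rule self-enforcing: the report cannot ship while any Critical finding lacks an accountable owner.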

Get the Enterprise AI Governance Handbook

56 pages covering risk classification, model lifecycle governance, EU AI Act compliance roadmap, and board reporting. Covers everything an AI audit remediation plan needs to address.

Download Free Handbook

Common Findings in Enterprise AI Audits

Across AI governance audits at 200+ enterprises, the same findings appear with high frequency regardless of industry or organization size. The most common Critical finding: AI systems in production with no system owner. Someone built the model, it was deployed, and the person who built it has since changed roles or left the organization. The model runs without anyone who understands it, monitors it, or is accountable for its outcomes.

The most common Major finding: monitoring systems that record performance metrics but have no defined alert thresholds and no process for acting on degradation. The data exists. No one reviews it. Models drift and degrade while the monitoring dashboard accumulates data that is never acted upon.

The most common documentation finding: Model Development Plans that were written for initial deployment approval but have never been updated. The model has been retrained, the feature set has changed, the performance thresholds have been adjusted, but the MDP reflects the original version. The documentation and the production system have diverged.

For organizations ready to start their AI audit program, the Enterprise AI Governance Handbook provides the full governance framework, and the free AI readiness assessment gives you a scored baseline across six dimensions. Organizations with urgent audit timelines or regulatory pressure should engage our AI Governance advisory team directly: we have conducted more than 60 enterprise AI audits and can compress a 6-month internal effort into 6 to 8 weeks with a fixed-scope engagement.

Related Advisory Service

AI Governance Advisory

Build the oversight structures that let AI deploy at pace without creating legal or reputational exposure.

Explore AI Governance →