A
Terms beginning with A
Accuracy [ML Metrics]
The proportion of correct predictions made by a model across all predictions. Accuracy is the simplest classification metric but can be misleading when classes are imbalanced — a model that always predicts the majority class can achieve high accuracy while being useless in practice.
Enterprise context: Never use accuracy as the sole metric for credit default, fraud detection, medical diagnosis, or any domain with significant class imbalance. Use precision, recall, AUC-ROC, or F1 instead.
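The imbalance trap is easy to demonstrate with toy numbers (hypothetical illustration): a "model" that always predicts the majority class on a 95/5 split scores 95% accuracy while catching zero fraud cases.

```python
# Hypothetical illustration: on a 95/5 imbalanced fraud dataset, always
# predicting the majority class yields high accuracy but zero recall.
labels = [1] * 5 + [0] * 95        # 5% positives (fraud)
predictions = [0] * 100            # always predict "not fraud"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(accuracy)  # 0.95
print(recall)    # 0.0
```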
Adverse Action Notice [Governance]
A legally required notification to an individual when an automated or AI-assisted decision negatively affects them — for example, a credit denial or insurance rate increase. The notice must include the specific reasons for the decision in plain language.
Enterprise context: Required under ECOA (credit), FCRA, and equivalent regulations in other jurisdictions. AI-generated adverse action notices must use model-specific reason codes, not generic statements. SHAP values are commonly used to generate these codes.
Agentic AI [Generative AI]
AI systems that can autonomously plan, execute multi-step tasks, use tools, and adapt their approach based on intermediate results — without step-by-step human direction. Agentic AI systems typically combine a large language model with tool access (web search, code execution, API calls, file read/write) and a planning or orchestration framework.
Enterprise context: Agentic AI introduces qualitatively different governance requirements compared to single-turn LLM applications — specifically around tool authorization (what the agent can do), human-in-the-loop design (when the agent must seek approval), and audit logging (traceable records of all agent actions).
AI Act (EU AI Act) [Governance]
The European Union's comprehensive regulation on artificial intelligence, enacted in 2024. It classifies AI systems into risk tiers — unacceptable risk (prohibited), high risk (extensive obligations), limited risk (transparency requirements), and minimal risk (no additional obligations). High-risk applications include credit scoring, hiring, medical devices, critical infrastructure, and biometric identification.
Enterprise context: The EU AI Act applies to any organization deploying AI that affects EU residents, regardless of where the organization is headquartered. High-risk system operators must maintain technical documentation, conduct conformity assessments, register systems in the EU database, and establish post-market monitoring.
AI Center of Excellence (AI CoE) [Strategy]
An organizational unit responsible for coordinating AI strategy, maintaining standards, building shared capabilities, and enabling business unit AI programs. AI CoEs typically operate in one of three models: hub (fully centralized), hub-and-spoke (centralized platform with embedded business unit resources), or platform (decentralized with shared platform and standards).
Enterprise context: The most common AI CoE failure mode is becoming an "ivory tower" — building capabilities without business unit engagement. The most effective CoEs prioritize time-to-value for business units over internal platform perfection.
AI Governance [Governance]
The policies, processes, standards, and controls that ensure AI systems are developed and operated responsibly, safely, and in compliance with applicable regulations and organizational values. AI governance covers model risk management, bias and fairness, transparency and explainability, security, privacy, and regulatory compliance.
Enterprise context: AI governance is distinct from data governance and IT governance, though it intersects with both. The most common gap is a governance framework that exists as a written policy but is not operationalized into the model development and deployment workflow.
AI Readiness [Strategy]
An organization's preparedness to develop, deploy, and sustain AI systems at scale. AI readiness assessments typically evaluate six dimensions: data maturity, infrastructure, talent and skills, governance and risk management, use case viability, and organizational culture.
Enterprise context: 73% of AI implementation failures trace to readiness gaps identified in retrospect but not assessed upfront. A structured readiness assessment before program launch reduces average implementation delay from 8.4 months to under 3 months.
AI Risk Classification [Governance]
The process of assigning a risk tier to an AI system based on its potential for harm, the sensitivity of decisions it influences, regulatory requirements, and the reversibility of its outputs. Common classification frameworks use four tiers: critical/high risk, elevated risk, standard risk, and minimal risk.
Enterprise context: Risk classification determines the level of model documentation, testing, validation, and governance oversight required. Systems classified as high-risk under internal frameworks should align with EU AI Act high-risk classification criteria to avoid regulatory surprises.
Algorithm Audit [Governance]
A structured review of an AI system's design, training data, performance metrics, and deployment context to assess potential harms, biases, or regulatory compliance gaps. Algorithm audits may be conducted internally or by third-party auditors and are increasingly required by regulation in financial services and employment contexts.
Anomaly Detection [Machine Learning]
Machine learning techniques that identify data points or patterns that deviate significantly from expected behavior. Applications include fraud detection, network intrusion detection, equipment fault detection, and quality control.
Enterprise context: Anomaly detection models are highly sensitive to the definition of "normal" at training time. Models trained on pre-COVID transaction patterns, for example, may generate excessive false positives during unusual economic periods. Regular recalibration of baseline distributions is essential.
API (Application Programming Interface) [Architecture]
In the AI context, an interface through which applications access AI model capabilities — typically LLM inference, embedding generation, or prediction endpoints — over HTTP. API-based AI access allows organizations to consume AI capabilities without managing model infrastructure.
Enterprise context: API-based access to LLMs (OpenAI, Anthropic, Google) introduces data privacy, latency, cost, and vendor dependency considerations. Most enterprise use cases require data processing agreements confirming that API inputs are not used for model training.
Area Under the Curve (AUC-ROC) [ML Metrics]
A classification metric measuring a model's ability to distinguish between classes across all possible decision thresholds. AUC of 1.0 is perfect discrimination; AUC of 0.5 is random chance. The ROC curve plots true positive rate against false positive rate at each threshold.
Enterprise context: AUC is threshold-independent, making it the preferred metric for comparing models when the optimal operating threshold depends on business context (e.g., the relative cost of false positives vs. false negatives in fraud detection or credit underwriting).
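The threshold-independence described above can be made concrete with the Mann-Whitney formulation of AUC, sketched here in plain Python (a toy illustration, not a production metric library):

```python
def auc_roc(scores, labels):
    """AUC as the probability that a randomly chosen positive outscores a
    randomly chosen negative (Mann-Whitney formulation); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_roc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect separation)
print(auc_roc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5 (no discrimination)
```

Because no decision threshold appears anywhere in the computation, the score compares ranking quality only, which is exactly why it suits model comparison before an operating threshold is chosen.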
Attention Mechanism [Machine Learning]
A neural network component that allows a model to dynamically weight the relevance of different parts of its input when generating an output. The attention mechanism is the core architectural innovation underlying transformer models, enabling them to capture long-range dependencies in text.
AutoML [Machine Learning]
Automated machine learning tools that automate parts of the model development process — feature selection, algorithm selection, hyperparameter optimization, and sometimes model deployment. AutoML lowers the technical barrier to model development but does not eliminate the need for domain expertise in problem framing and result interpretation.
Enterprise context: AutoML is effective for well-defined tabular prediction problems with clean data. It is less appropriate for complex feature engineering requirements, specialized model architectures, or use cases requiring deep customization. AutoML output must still pass model risk governance review.
Average Handle Time (AHT) Reduction [Business Metric]
A common AI ROI metric in contact center deployments, measuring the reduction in time agents spend handling each customer interaction after AI assistance is deployed. AHT reductions of 15-30% are typical in well-implemented contact center AI deployments.
B
Terms beginning with B
Baseline Model [Machine Learning]
A simple reference model — often a rule-based system, logistic regression, or simple average — used as a performance benchmark before introducing more complex ML approaches. If a complex model cannot materially outperform the baseline, the added complexity is not justified.
Enterprise context: Establishing a baseline is one of the most commonly skipped steps in enterprise AI development. Without it, teams cannot determine whether their model is actually adding value or simply codifying existing rules in a more opaque form.
Batch Inference [Architecture]
Running a model on a large dataset at a scheduled time, producing predictions that are stored and consumed later, rather than in real time. Batch inference is more cost-efficient than real-time inference for use cases where predictions do not need to be available in seconds.
Enterprise context: Most enterprise AI use cases are batch, not real-time — demand forecasting, churn prediction, credit scoring in underwriting queues, and document classification are common examples. Real-time inference adds significant infrastructure cost and complexity that batch does not require.
Bias (Model Bias) [Governance]
Systematic errors in model outputs that disadvantage certain groups or produce skewed results. Sources include biased training data (historical data reflecting past discrimination), proxy variables that correlate with protected attributes, and optimization objectives that are misaligned with equitable outcomes.
Enterprise context: Model bias is both a regulatory risk (disparate impact requirements in credit and employment) and a business risk (customer churn, reputational damage, regulatory action). Bias assessment must be conducted before production deployment and monitored continuously.
BM25 [Retrieval]
A classic information retrieval algorithm that scores documents based on the frequency and inverse document frequency of query terms. BM25 is a sparse retrieval method that excels at exact keyword matching, outperforming dense vector search for queries containing specific codes, names, or technical terms.
Enterprise context: In enterprise RAG deployments, BM25 combined with dense vector search in a hybrid retrieval architecture handles 20-30% of queries more accurately than dense-only approaches. Critical for regulatory citation, product code, and proper noun queries.
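The term-frequency saturation and length normalization at the heart of BM25 fit in a few lines. A minimal sketch (k1=1.5 and b=0.75 are common defaults; the IDF form is one common positivity-preserving variant; the corpus is a toy illustration):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Minimal BM25: sum over query terms of IDF times saturated, length-
    normalized term frequency. Documents are token lists; sketch only."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(term in d for d in corpus)               # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # positivity-preserving IDF
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["regulation", "ec", "2024"], ["general", "guidance", "note"]]
print(bm25_score(["2024"], docs[0], docs))  # positive: exact token match
print(bm25_score(["2024"], docs[1], docs))  # 0.0: term absent
```

The exact-match behavior visible here (a term either contributes or contributes nothing) is why BM25 complements dense retrieval for codes and proper nouns.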
Bootstrapping (Statistical) [Machine Learning]
A resampling technique used to estimate the uncertainty of a model's performance metrics by repeatedly sampling from the available data with replacement. Bootstrapping provides confidence intervals for metrics when a held-out test set is too small for reliable estimation.
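A minimal percentile-bootstrap sketch for an accuracy metric (the sample data and the 1,000-resample count are illustrative):

```python
import random

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def bootstrap_ci(preds, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample prediction/label pairs with replacement
    and take empirical quantiles of the recomputed metric."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(accuracy([preds[i] for i in idx], [labels[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

preds = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
labels = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
ci_lo, ci_hi = bootstrap_ci(preds, labels)
print(ci_lo, ci_hi)  # interval around the 0.8 point estimate
```

Note how wide the interval is on ten samples: the technique quantifies exactly the small-test-set uncertainty the definition describes.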
Business Case (AI) [Strategy]
A structured analysis justifying AI investment by quantifying expected benefits, total costs, risk adjustments, and return on investment. A credible enterprise AI business case covers five value categories (cost reduction, revenue growth, risk reduction, operational efficiency, customer experience) and twelve cost categories, including those typically underestimated by 40-60% in initial analyses.
Enterprise context: The most common reason AI business cases fail CFO review is an incomplete cost model — typically missing data infrastructure, integration, governance, change management, and ongoing monitoring costs. See our AI ROI Guide for a complete TCO framework.
C
Terms beginning with C
Champion/Challenger Testing [MLOps]
A production deployment strategy where a new model (challenger) receives a portion of live traffic alongside the existing model (champion), allowing direct comparison of business outcomes under real conditions before full cutover. Common traffic splits are 90/10 or 80/20 champion/challenger.
Enterprise context: Required by SR 11-7 for material model updates in financial services. Champion/challenger testing reveals production performance differences that validation testing on historical data cannot — particularly for distribution shift and edge case behavior.
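One common way to implement a sticky 90/10 split is deterministic hashing of a stable entity ID, so each customer always sees the same model. A sketch (function and ID names are illustrative assumptions):

```python
import hashlib

def assign_model(customer_id: str, challenger_share: float = 0.10) -> str:
    """Deterministic traffic split: hash a stable customer ID into one of
    10,000 buckets; the lowest buckets route to the challenger."""
    bucket = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % 10_000
    return "challenger" if bucket < challenger_share * 10_000 else "champion"

assignments = [assign_model(f"cust-{i}") for i in range(10_000)]
share = assignments.count("challenger") / len(assignments)
print(round(share, 3))  # close to the configured 0.10 share
```

Hash-based assignment avoids storing per-customer state while keeping assignments stable across sessions, which is what makes before/after outcome comparison valid.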
Chunking (RAG) [Generative AI]
The process of dividing documents into segments for indexing in a vector database for RAG systems. Chunking strategy — fixed size, sentence boundary, semantic, or hierarchical — significantly affects retrieval quality. Poor chunking that splits semantic units at arbitrary boundaries is a leading cause of RAG hallucination.
Enterprise context: Hierarchical chunking with structure-aware document parsing is recommended for contracts, financial reports, and regulatory documents. Naive fixed-size chunking is inappropriate for these document types, as it destroys clause context and cross-reference relationships.
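The difference between naive fixed-size and sentence-boundary chunking shows up even on a toy contract snippet (sizes and text below are illustrative):

```python
def fixed_size_chunks(text, size=30):
    """Naive fixed-size chunking: splits mid-sentence at arbitrary offsets."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text, max_chars=80):
    """Sentence-boundary chunking: packs whole sentences up to a size budget."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        sentence = sentence.rstrip(".") + "."
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

text = ("Clause 4.2 limits liability to fees paid. "
        "The cap excludes gross negligence. Notice must be written.")
print(fixed_size_chunks(text)[0])  # cuts mid-clause
print(sentence_chunks(text))       # each chunk is whole sentences
```

The first chunk of the fixed-size split severs the liability clause mid-thought; retrieved alone, it would mislead a RAG system, which is the hallucination mechanism described above.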
Classification [Machine Learning]
A supervised learning task where the model assigns inputs to discrete categories. Binary classification (two classes) and multi-class classification (more than two classes) are the two main variants. Examples include credit default prediction (default/no default), document categorization, and fraud detection.
Confusion Matrix [ML Metrics]
A table showing the four possible outcomes of a binary classifier: true positives, true negatives, false positives, and false negatives. The confusion matrix is the basis for precision, recall, specificity, and F1 score calculations and is essential for understanding the error distribution of a classification model.
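The four cells and the metrics derived from them can be computed directly; a small sketch with hypothetical predictions:

```python
def confusion_matrix(preds, labels):
    """Counts of the four binary-classification outcomes."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return tp, tn, fp, fn

preds  = [1, 1, 1, 0, 0, 0, 1, 0]
labels = [1, 1, 0, 0, 1, 0, 1, 1]
tp, tn, fp, fn = confusion_matrix(preds, labels)
precision = tp / (tp + fp)                        # 3 / 4
recall = tp / (tp + fn)                           # 3 / 5
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn, precision, recall)
```

Seeing the cells separately is the point: two models with identical accuracy can have very different fp/fn trade-offs, and only the matrix reveals which errors dominate.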
Context Window [Generative AI]
The maximum amount of text (measured in tokens) that an LLM can process in a single inference call — including the system prompt, conversation history, retrieved context, and the model's own output. Context window sizes range from 8,000 tokens (older models) to 2 million tokens (Gemini 1.5 Pro).
Enterprise context: A larger context window does not mean performance is equal across the full window. Most LLMs exhibit degraded accuracy for information located in the middle of very long contexts ("lost in the middle" phenomenon). Retrieval-augmented approaches remain valuable even with large context windows for precision and cost reasons.
Continuous Integration / Continuous Deployment for ML (CI/CD-ML) [MLOps]
The adaptation of software CI/CD practices to machine learning, automating model training, validation, testing, and deployment pipelines. ML-specific additions include data validation, model performance testing, and automated model registry management.
Enterprise context: Organizations running more than five models in production require automated CI/CD-ML pipelines to manage the complexity of simultaneous retraining, validation, and deployment cycles. Manual processes do not scale beyond this threshold.
Cross-Encoder Reranking [Retrieval]
A second-stage retrieval step in RAG systems where a cross-encoder model jointly processes the query and each candidate document to score relevance more precisely than the first-stage vector similarity search. Cross-encoder reranking improves retrieval precision by 15-25% at the cost of added latency (200-400ms).
CSAT (Customer Satisfaction Score) [Business Metric]
A survey-based metric measuring customer satisfaction with a product, service, or interaction. AI deployments in customer service, personalization, and self-service channels are frequently evaluated against CSAT improvement alongside efficiency metrics.
D
Terms beginning with D
Data Drift [Data / MLOps]
A change in the statistical distribution of model input features over time compared to the training data distribution. Data drift can cause model performance to degrade gradually without any errors in the model code or infrastructure. It is measured using Population Stability Index (PSI) and Kullback-Leibler divergence.
Enterprise context: Data drift typically precedes accuracy degradation by 4-8 weeks. Monitoring input feature distributions and alerting on PSI above 0.2 provides an early warning that enables proactive retraining rather than reactive incident response after business metrics have already been affected.
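A minimal PSI sketch under the common convention of ten bins taken from the baseline distribution (binning conventions vary across implementations; the data is illustrative):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training) sample and a
    production sample of one feature. Bin edges come from the baseline range."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        return [max(c / len(values), 1e-6) for c in counts]  # avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]        # uniform on [0, 1)
shifted  = [0.5 + i / 200 for i in range(100)]  # mass shifted upward
print(psi(baseline, baseline))  # 0.0: identical distributions
print(psi(baseline, shifted) > 0.2)  # True: would trigger the alert threshold
```

The 0.2 threshold mentioned above corresponds to the common rule of thumb that PSI below 0.1 is stable, 0.1 to 0.2 warrants watching, and above 0.2 indicates significant shift.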
Data Governance for AI [Data Governance]
Policies and practices governing how data is collected, stored, accessed, labeled, and used for AI model training and inference. AI-specific data governance extends traditional data governance with lineage requirements for training datasets, consent management for personal data used in model training, and data quality standards specific to ML use.
Data Lake [Data Architecture]
A centralized repository storing large volumes of raw data in native format — structured, semi-structured, and unstructured — at low cost. Data lakes are often used as the foundation for AI feature engineering and model training pipelines.
Enterprise context: A data lake alone does not make an organization AI-ready. The most common data lake failure for AI use cases is absent data quality standards — the lake contains the data, but it cannot be relied upon without significant cleaning and validation before model training.
Deep Learning [Machine Learning]
A subset of machine learning using multi-layer neural networks with many layers (hence "deep") to learn representations from raw data. Deep learning underpins modern computer vision, natural language processing, speech recognition, and the large language models that power generative AI.
Disparate Impact [Governance]
A legal doctrine holding that policies or practices that appear neutral can still be discriminatory if they have a disproportionate adverse effect on a protected group. The four-fifths rule (80% rule) is a common threshold: if the selection rate for any protected group is less than 80% of the rate for the most-selected group, disparate impact may exist.
Enterprise context: Disparate impact in AI models used for credit underwriting, employment screening, insurance pricing, or housing decisions is subject to legal challenge under ECOA, Title VII, Fair Housing Act, and equivalent regulations. Regular disparate impact analysis is required, not optional.
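The four-fifths rule lends itself to a direct check; the selection rates below are hypothetical:

```python
def four_fifths_check(selection_rates):
    """Four-fifths (80%) rule: for each group, compute its selection rate as
    a ratio of the highest group's rate and flag ratios below 0.8."""
    top = max(selection_rates.values())
    return {g: (rate / top, rate / top < 0.8)
            for g, rate in selection_rates.items()}

rates = {"group_a": 0.50, "group_b": 0.45, "group_c": 0.30}
result = four_fifths_check(rates)
print(result)  # group_c ratio 0.6 -> flagged for potential disparate impact
```

A flagged ratio is a screening signal, not a legal conclusion: it indicates that deeper statistical and business-necessity analysis is required.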
E
Terms beginning with E
Embedding [Generative AI]
A dense numerical vector representation of text, images, or other data that encodes semantic meaning. Semantically similar items are located close together in embedding space. Embeddings are the foundation of RAG systems — documents and queries are both embedded, and retrieval is based on vector similarity.
Enterprise context: The quality of your embedding model has a larger impact on RAG system accuracy than most teams expect. General-purpose embeddings underperform domain-specific embeddings for specialized vocabulary (legal, medical, financial). Domain-fine-tuned embeddings show 12-18% recall improvement in these domains.
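Retrieval over embeddings usually means cosine similarity; a minimal sketch with toy 3-dimensional vectors (real embedding models emit hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product of two vectors divided by the product
    of their magnitudes. Close to 1 means semantically similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query     = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]   # similar direction to the query
doc_far   = [0.0, 0.1, 0.9]   # mostly orthogonal

print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))  # True
```

Vector databases rank documents by exactly this score (or the equivalent dot product on normalized vectors), which is why embedding quality bounds retrieval quality.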
Ensemble Methods [Machine Learning]
Techniques that combine predictions from multiple models to achieve better overall performance than any single model. Common ensemble methods include bagging (Random Forest), boosting (XGBoost, LightGBM, CatBoost), and stacking. Ensembles are particularly effective in tabular data prediction tasks.
Explainability (AI Explainability) [Governance]
The ability to describe why an AI model produced a specific output in terms that humans can understand. Global explainability characterizes overall model behavior; local explainability explains individual predictions. Explainability is both a governance requirement (EU AI Act, SR 11-7) and a practical tool for model debugging and stakeholder trust.
Enterprise context: SHAP (SHapley Additive exPlanations) is the most widely used local explainability method in enterprise contexts because it provides consistent feature attribution values that satisfy adverse action notice requirements and audit documentation needs.
Experiment Tracking [MLOps]
The practice of systematically recording model training experiments — hyperparameters, training data versions, metrics, artifacts, and configuration — to enable reproducibility and systematic comparison of model variants. Common tools include MLflow, Weights & Biases, and Comet.
Enterprise context: Experiment tracking is not optional for regulated industries. SR 11-7 requires that model development decisions be documented and reproducible. Untracked experiments mean that a model's development cannot be recreated for model risk review or regulatory examination.
F
Terms beginning with F
F1 Score [ML Metrics]
The harmonic mean of precision and recall, ranging from 0 to 1. F1 is a balanced metric for classification tasks where both false positives and false negatives are costly. F1 is preferred over accuracy when classes are imbalanced.
Faithfulness (RAGAS) [Generative AI]
A RAG evaluation metric measuring the proportion of claims in a generated response that can be directly attributed to the retrieved context. High faithfulness indicates low hallucination risk. Target faithfulness above 0.90 for regulated industry RAG deployments.
Enterprise context: Faithfulness is the primary hallucination metric for enterprise RAG systems. It should be measured both on a labeled evaluation set during development and continuously on production samples. A faithfulness drop of more than 0.05 on a 7-day rolling average should trigger investigation.
Feature Engineering [Data]
The process of transforming raw data into features that improve model performance. Feature engineering combines domain knowledge (knowing what signals matter for the prediction task) with statistical techniques (aggregations, transformations, interaction terms, time-based features). Feature quality typically has more impact on model performance than algorithm selection.
Feature Store [MLOps]
A centralized repository for storing, versioning, and serving engineered features for model training and inference. Feature stores ensure consistency between features used in training and features served at inference time, and enable feature reuse across models and teams.
Enterprise context: Feature stores add significant infrastructure cost and operational complexity. They are justified when: multiple teams are building models on similar data, real-time feature serving is required for latency-sensitive use cases, or the organization has more than 8-10 models in production consuming similar feature sets.
Fine-Tuning [Generative AI]
Updating a pre-trained model's weights on domain-specific data to improve performance on a specific task or adapt its behavior. Full fine-tuning updates all model weights; parameter-efficient fine-tuning methods (LoRA, QLoRA) update only a small fraction of weights, making fine-tuning of large language models more computationally feasible.
Enterprise context: Fine-tuning is appropriate when RAG cannot achieve the required quality (typically when the task requires internalized domain knowledge rather than retrieved facts), when inference cost at scale makes RAG economically unviable, or when the organization cannot send data to external APIs for privacy reasons. Fine-tuning requires high-quality domain training data, which is often the binding constraint.
Foundation Model [Generative AI]
A large-scale pre-trained model — trained on vast amounts of data at significant computational cost — that can be adapted to a wide range of downstream tasks. Foundation models include large language models (GPT-4, Claude, Gemini), vision models (CLIP, DALL-E), and multi-modal models. They are the base layer for most enterprise generative AI deployments.
G
Terms beginning with G
Gradient Boosting [Machine Learning]
An ensemble learning technique that builds models sequentially, where each new model corrects the errors of the previous ensemble. Implementations include XGBoost, LightGBM, and CatBoost. Gradient boosting consistently achieves top performance on tabular data prediction tasks — fraud detection, credit scoring, churn prediction, demand forecasting — and is the most widely deployed algorithm in enterprise production AI.
Guardrails (LLM) [Generative AI]
Input and output validation layers applied to LLM interactions to prevent harmful, off-topic, or policy-violating responses. Guardrails may be implemented as prompt instructions (soft guardrails), separate classification models (hard guardrails), or both. Enterprise guardrails typically cover topic restrictions, PII detection, hallucination filtering, and output format validation.
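A hard guardrail can be as simple as a pattern-based output screen. The sketch below blocks responses containing likely PII; the patterns are illustrative assumptions, not a complete PII taxonomy, and production systems typically add classifier-based checks:

```python
import re

# Illustrative PII patterns only — a real deployment needs a far broader set.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def guardrail_check(response: str):
    """Return ('block', matched_types) if the response contains likely PII,
    else ('allow', [])."""
    hits = [name for name, pat in PII_PATTERNS.items() if pat.search(response)]
    return ("block", hits) if hits else ("allow", [])

print(guardrail_check("Contact jane.doe@example.com for details"))
print(guardrail_check("Our Q3 revenue grew 4%"))
```

Layering matters: soft guardrails (prompt instructions) shape behavior, while hard screens like this one enforce policy even when the model ignores its instructions.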
GPU (Graphics Processing Unit) [Infrastructure]
Specialized processors originally designed for graphics rendering, now essential for training and inference of deep learning models due to their ability to perform massively parallel mathematical operations. NVIDIA H100 and A100 GPUs are the dominant infrastructure for enterprise AI workloads. GPU availability and cost are the primary infrastructure constraints for enterprise AI programs.
H
Terms beginning with H
Hallucination [Generative AI]
A response generated by an LLM that is factually incorrect, fabricated, or not grounded in the provided context, but presented with apparent confidence. Hallucinations are an inherent property of current LLM architectures and cannot be completely eliminated — only mitigated through retrieval augmentation, output verification, and appropriate deployment design.
Enterprise context: The risk profile of hallucinations varies significantly by use case. A hallucination in an internal knowledge base assistant is an inconvenience; a hallucination in a clinical decision support tool or a legal document review system is a significant patient safety or professional liability risk. Use case risk classification should drive hallucination mitigation investment.
Hyperparameter [Machine Learning]
A configuration parameter of a machine learning model or training process that is set before training begins (not learned from data). Examples include learning rate, number of layers in a neural network, maximum tree depth in gradient boosting, and the number of clusters in k-means. Hyperparameter tuning — finding the combination that maximizes model performance — is a standard step in model development.
Human-in-the-Loop (HITL) [Governance]
A system design that incorporates human review, decision-making, or oversight at specific points in an AI-assisted workflow. HITL design ranges from full human review of all AI outputs (common in early deployment) to exception-only human review (escalations triggered by model uncertainty or edge case detection).
Enterprise context: The EU AI Act requires human oversight for high-risk AI systems. The practical design question is not whether to include humans but when and how — specifically, what the uncertainty threshold is for automatic escalation, what information humans need to review AI recommendations effectively, and what the workflow is for disagreement between human and AI judgments.
L
Terms beginning with L
Large Language Model (LLM) [Generative AI]
A neural network model trained on massive text corpora that can generate, summarize, translate, classify, and reason about text. Enterprise-relevant LLMs include GPT-4o (OpenAI), Claude 3.5/3.7 (Anthropic), Gemini 1.5/2.0 (Google), and open-source models (Llama 3, Mistral). LLMs are the core component of generative AI enterprise applications.
Enterprise context: LLM selection for enterprise deployment should be based on task-specific performance benchmarks conducted on your actual data — not on public leaderboard scores, which measure general capabilities and may not predict performance on your specific domain vocabulary and task requirements.
Latency [Architecture]
The time delay between an AI system receiving an input and returning an output. For real-time inference applications, latency is measured at percentile levels — p50 (median), p95 (95th percentile), and p99 (99th percentile). p99 latency, the time within which 99% of requests are handled, is the most important metric for user experience in interactive systems.
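Percentile latency can be computed with a simple nearest-rank rule; the sample values below are illustrative and show how a single outlier dominates the tail percentiles:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value with at least p%
    of samples at or below it."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Illustrative request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 14, 15, 15, 16, 18, 20, 25, 40, 180]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

On this sample the median looks healthy while p95 and p99 are dominated by the outlier, which is why averages alone are misleading for interactive systems.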
LoRA (Low-Rank Adaptation) [Generative AI]
A parameter-efficient fine-tuning method that adds small, trainable rank-decomposition matrices to specific layers of a pre-trained LLM while freezing the original weights. LoRA reduces fine-tuning compute and memory requirements by 10-100x compared to full fine-tuning while achieving comparable performance on domain adaptation tasks.
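The 10-100x savings follows from simple parameter arithmetic: a full d_in x d_out weight update versus two rank-r matrices. The layer size and rank below are typical but illustrative:

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for one LoRA adapter pair (A: d_in x r,
    B: r x d_out) versus full fine-tuning of the same weight matrix."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora, full / lora

full, lora, ratio = lora_params(4096, 4096, 8)  # a 4096x4096 layer, rank 8
print(full, lora, ratio)  # 16777216 65536 256.0 -> 256x fewer trainable weights
```

Because the frozen base weights are shared, many task-specific LoRA adapters can be stored and swapped at a tiny fraction of the cost of maintaining full fine-tuned copies.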
LSTM (Long Short-Term Memory) [Machine Learning]
A type of recurrent neural network architecture designed to learn long-range temporal dependencies in sequential data. LSTMs were the dominant architecture for time-series prediction and NLP before the transformer architecture. They remain effective and widely deployed for IoT sensor data, financial time-series, and clinical event sequence modeling.
M
Terms beginning with M
Model Card [Governance]
A structured documentation artifact describing an AI model's intended use, performance metrics, evaluation methodology, training data, known limitations, and ethical considerations. Model cards are a best practice in responsible AI development and are increasingly required by enterprise AI governance frameworks and regulations.
Model Drift [MLOps]
Degradation in model prediction quality over time due to changes in the real world that were not reflected in the training data. Model drift is the combination of data drift (input distribution changes) and concept drift (the relationship between inputs and outputs changes). Regular production monitoring is required to detect drift before it materially affects business outcomes.
Model Risk Management (MRM) [Governance]
The practice of identifying, measuring, and mitigating risks arising from the use of quantitative models — including AI and machine learning models — in decision-making. MRM encompasses model development standards, independent validation, ongoing monitoring, and model lifecycle governance. SR 11-7 is the primary US regulatory framework for MRM in financial services.
Enterprise context: SR 11-7 applies to all "models" as broadly defined — including ML and AI systems used in credit decisions, risk measurement, capital planning, and stress testing. Financial services firms deploying AI must build model development processes that satisfy SR 11-7 validation requirements from the start, not as a retrofit.
Multi-Agent System [Generative AI]
An AI architecture where multiple specialized agents collaborate to complete complex tasks — each with a specific role, tool access, and domain of responsibility. One agent may coordinate others (orchestrator), while specialized agents handle research, code execution, verification, or communication tasks. Multi-agent systems enable parallelism and specialization that single agents cannot achieve.
Enterprise context: Multi-agent systems multiply the governance complexity of single-agent systems. Each agent-to-agent interaction is a potential failure point, and the combined system can exhibit emergent behaviors not present in individual agents. Comprehensive audit logging of inter-agent communications is essential for production deployments.
MLOps [MLOps]
A set of practices combining machine learning development and operations, focused on deploying, monitoring, and maintaining ML models in production reliably and efficiently. MLOps covers the full model lifecycle: experiment tracking, model registry, pipeline automation, deployment, monitoring, and retraining workflows.
Enterprise context: Organizations with five or more models in production require a formal MLOps platform. Teams without MLOps infrastructure spend an estimated 40-60% of ML engineer time on manual operational tasks that should be automated, significantly reducing capacity for model development.
P
Terms beginning with P
11 terms
Precision ML Metrics
The proportion of positive predictions that are correct: True Positives / (True Positives + False Positives). High precision means the model rarely raises a false alarm. Precision is prioritized when false positives are costly — for example, in fraud alerting, where every false alarm triggers an expensive manual investigation.
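The formula translates directly to code; the fraud-alerting numbers below are illustrative:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): of all positive predictions, how many were right."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# A fraud model raises 50 alerts, 40 of which are actual fraud:
print(precision(tp=40, fp=10))  # 0.8
```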
Prompt Engineering Generative AI
The practice of crafting input instructions (prompts) to elicit specific behaviors or outputs from large language models. Effective prompt engineering techniques include few-shot examples, chain-of-thought reasoning instructions, persona assignment, output format specification, and constraint statements. Prompt engineering does not modify model weights — it shapes model behavior through instruction.
Enterprise context: Prompt engineering should be version-controlled and treated as software — not ad-hoc text. Prompt regression testing (validating that prompt changes do not degrade performance on a standard evaluation set) is necessary for production LLM applications.
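A minimal sketch of a version-controlled prompt template combining the techniques listed above (persona, a few-shot example, and an output format constraint); the task, labels, and version tag are hypothetical:

```python
# Hypothetical triage prompt, tracked in version control rather than edited ad hoc.
PROMPT_VERSION = "classify-ticket-v3"

TEMPLATE = """You are a support-ticket triage assistant.
Classify the ticket as exactly one label: billing, technical, account.

Example ticket: "I was charged twice this month."
Example label: billing

Ticket: {ticket}
Label:"""

def build_prompt(ticket: str) -> str:
    """Fill the template; regression tests run this against a fixed eval set."""
    return TEMPLATE.format(ticket=ticket)

print(build_prompt("My password reset link never arrives."))
```

Because the template is a plain versioned artifact, any change to it can be diffed, reviewed, and re-run against the standard evaluation set before release.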
Prompt Injection Security
An attack where malicious instructions embedded in user input or retrieved context override the intended behavior of an LLM-based system. In a direct prompt injection, an attacker provides instructions directly. In an indirect prompt injection (increasingly important for agentic systems), malicious instructions are embedded in documents or web content that the AI retrieves and processes.
Enterprise context: Prompt injection is the most significant security concern for enterprise LLM deployments, especially agentic systems with tool access. Mitigation requires input sanitization, output filtering, privileged instruction separation, and limiting the scope of tools available to AI agents.
Population Stability Index (PSI) MLOps
A metric measuring the shift in the distribution of a variable between two time periods — typically the training period and a monitoring period. PSI below 0.1 indicates negligible shift; 0.1-0.2 indicates moderate shift requiring attention; above 0.2 indicates significant shift requiring investigation and potential model retraining.
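The index can be computed from binned proportions of the two periods; the bin values below are illustrative:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over matching bins of two distributions.
    Inputs are bin proportions, each summing to 1."""
    eps = 1e-6  # guard against empty bins
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

train = [0.25, 0.25, 0.25, 0.25]   # training-period bin proportions
now   = [0.10, 0.20, 0.30, 0.40]   # monitoring-period bin proportions
print(round(psi(train, now), 3))   # ~0.228: above the 0.2 "significant shift" threshold
```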
R
Terms beginning with R
10 terms
RAG (Retrieval-Augmented Generation) Generative AI
An architecture that enhances LLM responses by first retrieving relevant information from a knowledge base, then providing that information as context to the LLM for response generation. RAG enables LLMs to answer questions about proprietary documents, reduces hallucination by grounding responses in source material, and enables source attribution for audit purposes.
Enterprise context: RAG is the default architecture for most enterprise GenAI use cases involving proprietary documents — contract review, policy Q&A, knowledge base assistants, regulatory guidance lookup. Naive single-stage RAG implementations consistently underperform due to poor chunking and lack of retrieval optimization.
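The retrieve-then-generate flow can be sketched end to end; the toy corpus, the word-overlap scorer (a stand-in for embedding similarity), and the prompt wording are all illustrative:

```python
# Minimal retrieve-then-generate flow. Real systems use embeddings and a
# vector index instead of word overlap; the documents here are invented.
CORPUS = {
    "policy-12": "Expense reports must be filed within 30 days of travel.",
    "policy-47": "Remote work requires manager approval in writing.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scored = sorted(CORPUS.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Ground the LLM in the retrieved text and cite the source id for audit."""
    doc_id, text = retrieve(query)[0]
    return f"Answer using only this source [{doc_id}]: {text}\n\nQuestion: {query}"

print(build_rag_prompt("When must expense reports be filed?"))
```

Carrying the source id through to the prompt is what enables the source attribution the definition mentions.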
Recall ML Metrics
The proportion of actual positives that are correctly identified: True Positives / (True Positives + False Negatives). High recall means the model rarely misses a true positive. Recall is prioritized when false negatives are costly — for example, in cancer screening where missing a case is worse than a false alarm.
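As with precision, the formula is a one-liner; the screening numbers are illustrative:

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): of all actual positives, how many were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# A screening model that catches 90 of 100 true cases:
print(recall(tp=90, fn=10))  # 0.9
```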
Responsible AI Governance
A framework for developing and deploying AI in ways that are ethical, fair, transparent, accountable, and aligned with human values and legal requirements. Responsible AI principles typically cover fairness and non-discrimination, transparency and explainability, human oversight, privacy, security, reliability, and environmental sustainability.
Enterprise context: Most large organizations have published responsible AI principles. The gap is in operationalization — translating principles into specific controls, checklists, review gates, and monitoring requirements that are embedded into standard development and deployment workflows.
ROC Curve (Receiver Operating Characteristic) ML Metrics
A graphical plot of true positive rate against false positive rate at all classification thresholds. The area under the ROC curve (AUC-ROC) is a threshold-independent summary of model discrimination ability. A model with AUC of 0.90 correctly ranks a random positive example above a random negative example 90% of the time.
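The ranking interpretation in the last sentence can be computed directly for small score lists; the scores below are illustrative:

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC via its rank interpretation: the fraction of (positive, negative)
    pairs where the positive example scores higher (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs ranked correctly, ~0.833
```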
S
Terms beginning with S
12 terms
Shadow Mode Deployment MLOps
A deployment strategy where a new model runs in parallel with the existing production system, receiving the same inputs and producing outputs that are logged but not acted upon. Shadow mode allows comparison of a new model against the current system on real production data without risk to live operations.
Enterprise context: Shadow mode deployment should run for a minimum of two weeks before production cutover to catch input distribution differences, integration edge cases, and performance anomalies that only appear on real data. Teams that skip shadow mode under schedule pressure consistently encounter production surprises in week one of go-live.
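The core pattern is a serving wrapper in which the shadow model can never affect the live response; this is a minimal sketch assuming both models are plain callables:

```python
import json
import logging

logger = logging.getLogger("shadow")

def score(request: dict, prod_model, shadow_model) -> float:
    """Serve the production model's answer; run the candidate in shadow.
    The shadow output is logged for offline comparison, never acted upon."""
    prod_out = prod_model(request)
    try:
        shadow_out = shadow_model(request)  # must not affect the live path
        logger.info(json.dumps({"request_id": request.get("id"),
                                "prod": prod_out, "shadow": shadow_out}))
    except Exception:
        logger.exception("shadow model failed")  # failures stay invisible to callers
    return prod_out
```

Comparing the logged pairs offline is what surfaces the input distribution differences and edge cases the definition describes.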
SHAP (SHapley Additive exPlanations) Governance
A model-agnostic explainability framework based on game theory that assigns a contribution value to each feature for each individual prediction. SHAP values sum to the model output, are consistent across models, and provide both local (individual prediction) and global (model-level) explanations.
Enterprise context: SHAP is the industry standard for adverse action reason codes in credit models, satisfying ECOA requirements for specific, accurate reason statements. SHAP is also required documentation under many model risk management frameworks for the validation of high-impact models.
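The additivity property (values sum to the model output) can be shown exactly for a tiny two-feature model, where the Shapley value averages each feature's marginal contribution over both orderings; the model and inputs are invented for illustration:

```python
# Toy illustration of SHAP's additivity property for a two-feature model.
def f(x1, x2):
    return 3 * x1 + 2 * x2 + x1 * x2  # any model; interaction term included

base = (0, 0)    # reference (baseline) input
x = (1.0, 2.0)   # instance to explain

# Average each feature's marginal contribution over the two orderings:
phi1 = 0.5 * ((f(x[0], base[1]) - f(*base)) + (f(*x) - f(base[0], x[1])))
phi2 = 0.5 * ((f(base[0], x[1]) - f(*base)) + (f(*x) - f(x[0], base[1])))

print(phi1, phi2)                               # 4.0 5.0
print(phi1 + phi2 == f(*x) - f(*base))          # True: attributions sum to the output gap
```

For real models, libraries compute (or approximate) these values over all feature orderings; the sum property is what makes per-feature reason codes add up to the actual score.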
SR 11-7 Governance
Guidance on Model Risk Management issued by the US Federal Reserve and OCC in 2011, establishing the standard for model development, independent validation, and ongoing monitoring at financial institutions. SR 11-7 defines a "model" broadly enough to include AI and ML systems and specifies requirements for documentation, validation, and governance.
Enterprise context: SR 11-7 compliance is not optional for US bank holding companies and national banks. Financial services AI programs must design model development processes (training documentation, experiment tracking, validation evidence) to satisfy SR 11-7 requirements from inception, not as a retrofit after development is complete.
Supervised Learning Machine Learning
A machine learning paradigm where models are trained on labeled examples — inputs paired with the correct outputs — to learn a mapping function that can be applied to new inputs. Classification and regression are both supervised learning tasks. The quality and quantity of labeled training data is the primary determinant of model performance.
System Prompt Generative AI
Instructions provided to an LLM at the beginning of a context window that define its role, constraints, persona, and behavioral guidelines for the entire conversation. System prompts are invisible to end users but govern model behavior. In enterprise deployments, system prompts should be version-controlled and reviewed as part of governance processes.
T
Terms beginning with T
9 terms
Target Leakage Machine Learning
A model training error where features derived from information that would not be available at prediction time are included in the training data, artificially inflating validation performance. Target leakage produces models that appear highly accurate in validation but fail dramatically in production because the "leaked" features are not available when making real predictions.
Enterprise context: Target leakage is the most commonly missed data quality issue in enterprise AI implementations, appearing in 44% of the programs we audit. The impact is severe: validation metrics overstate production performance by 20-40%, leading to model risk approval on inflated results. A structured leakage audit before training prevents this entirely.
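One check from such an audit can be sketched as a simple availability-time rule: reject any feature whose source field is only populated after the prediction is made. The field names and availability tags below are hypothetical:

```python
# Hypothetical availability metadata for a credit model's features.
FEATURE_AVAILABLE_AT = {
    "application_income": "application",    # known at scoring time: safe
    "months_since_default": "post_outcome", # only exists after the outcome: leaked
}

def leaked_features(features: list[str]) -> list[str]:
    """Flag features recorded after the prediction timestamp."""
    return [f for f in features
            if FEATURE_AVAILABLE_AT.get(f) == "post_outcome"]

print(leaked_features(["application_income", "months_since_default"]))
# ['months_since_default']
```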
Token Generative AI
The basic unit of text that LLMs process — roughly equivalent to three-quarters of a word in English. LLMs encode text as sequences of tokens for processing and generate text token by token. Context window sizes, API pricing, and latency are all measured in tokens.
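The three-quarters-of-a-word rule gives a quick back-of-the-envelope estimate; actual counts always come from the model's own tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token rule of thumb
    for English text. Only a model's own tokenizer gives exact counts."""
    return round(len(text.split()) / 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 9 words -> 12
```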
Transfer Learning Machine Learning
A technique where a model pre-trained on a large dataset is adapted to a new, related task using a smaller domain-specific dataset. Transfer learning is the foundation of both fine-tuning LLMs and domain-specific image models — it enables high-quality models without the vast data and compute required to train from scratch.
Transformer Machine Learning
A neural network architecture introduced in the 2017 paper "Attention Is All You Need" that uses self-attention mechanisms to process sequences in parallel. The transformer architecture underpins all major large language models (GPT, Claude, Gemini, Llama) and has also been applied to vision, code, protein structure prediction, and other domains.
V
Terms beginning with V
6 terms
Vector Database Generative AI
A specialized database optimized for storing, indexing, and querying high-dimensional vectors (embeddings). Vector databases use approximate nearest neighbor (ANN) algorithms (HNSW, IVF) to find the most similar vectors to a query efficiently. Common enterprise vector databases include Pinecone, Weaviate, Qdrant, Milvus, and pgvector.
Enterprise context: Vector database selection should be based on corpus size, deployment model (managed cloud vs. on-premises), filtering requirements, and performance benchmarks on your specific data. Recall quality at production scale differs significantly across vendors — our RAG Architecture Guide contains benchmarks at 10M, 100M, and 1B vector scales.
Vector Search Retrieval
A retrieval method that finds the most semantically similar documents to a query by comparing embedding vectors using distance metrics (cosine similarity, L2 distance). Vector search enables semantic retrieval — finding documents that are conceptually related to the query even when they share no exact keywords.
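The core operation is similarity ranking over embedding vectors; this is an exact brute-force sketch (vector databases approximate it with ANN indexes such as HNSW), with tiny invented 2-dimensional vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def search(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(search([1.0, 0.05], docs))  # the two vectors nearest the query
```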
Z
Terms beginning with Z
2 terms
Zero-Shot Learning Generative AI
The ability of a model to perform a task it has not been explicitly trained on, based solely on a task description. LLMs demonstrate zero-shot capability across many tasks — classification, summarization, translation, and reasoning — without requiring task-specific fine-tuning or examples.
Zero-Trust Architecture (AI) Security
A security model applied to AI systems that assumes no component — including AI agents, tools, or data sources — should be trusted by default. Zero-trust AI architecture applies least-privilege access controls to all model actions, requires continuous verification, and logs all access and tool use for audit purposes.
Enterprise context: Zero-trust principles are particularly important for agentic AI systems that have access to tools (email, databases, file systems, APIs). Without zero-trust controls, a compromised or misbehaving agent can cause outsized harm before detection.
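The least-privilege and audit-logging controls can be sketched as a gate in front of every agent tool call; the tool names and policy table are hypothetical:

```python
# Hypothetical least-privilege gate for agent tool calls: every call is
# logged, and nothing outside the explicit allowlist executes.
ALLOWED = {"search_docs", "read_file"}  # this agent may NOT send email or write files

AUDIT_LOG: list[dict] = []

def invoke_tool(tool: str, args: dict) -> str:
    AUDIT_LOG.append({"tool": tool, "args": args})  # log before deciding
    if tool not in ALLOWED:
        raise PermissionError(f"tool '{tool}' not authorized for this agent")
    return f"executed {tool}"  # dispatch to the real tool in practice

print(invoke_tool("search_docs", {"q": "policy"}))
```

Logging the attempt before the authorization decision means even denied calls leave an audit trail, which is what makes misbehaving agents detectable.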
Want to go deeper on any of these topics?
Our research library has 20 white papers covering AI strategy, GenAI architecture, governance, vendor selection, and more — all free to download.