Test descriptions

Published January 28, 2026

Tests that are available as part of the ValidMind Library, grouped by type of validation or monitoring test.

Try the test sandbox (beta)

Explore our interactive sandbox to see what tests are available in the ValidMind Library.

ACFandPACFPlot

Analyzes time series data using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to reveal trends and correlations.

ADF

Assesses the stationarity of a time series dataset using the Augmented Dickey-Fuller (ADF) test.

AutoAR

Automatically identifies the optimal Autoregressive (AR) order for a time series using BIC and AIC criteria.

AutoMA

Automatically selects the optimal Moving Average (MA) order for each variable in a time series dataset based on minimal BIC and AIC values.

AutoStationarity

Automates the Augmented Dickey-Fuller test to assess stationarity across multiple time series in a DataFrame.

BivariateScatterPlots

Generates bivariate scatterplots to visually inspect relationships between pairs of numerical predictor variables in machine learning classification tasks.

BoxPierce

Detects autocorrelation in time-series data through the Box-Pierce test to validate model performance.

ChiSquaredFeaturesTable

Assesses the statistical association between categorical features and a target variable using the Chi-Squared test.

ClassImbalance

Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model.
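
As an illustration of what quantifying class imbalance can look like, here is a minimal sketch in plain Python. It is not the ValidMind implementation; the function names and the 10% `min_share` cut-off are assumptions made for this example.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the dataset as a fraction of all rows."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


def underrepresented(labels, min_share=0.10):
    """List classes whose share falls below min_share (hypothetical cut-off)."""
    shares = class_shares(labels)
    return sorted(cls for cls, share in shares.items() if share < min_share)


labels = [0] * 95 + [1] * 5          # 95 negatives, 5 positives
shares = class_shares(labels)
flagged = underrepresented(labels)   # classes below the assumed 10% cut-off
print(shares, flagged)
```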

DatasetDescription

Provides comprehensive analysis and statistical summaries of each column in a machine learning model's dataset.

DatasetSplit

Evaluates and visualizes the distribution proportions among training, testing, and validation datasets of an ML model.

DescriptiveStatistics

Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's dataset.

DickeyFullerGLS

Assesses stationarity in time series data using the Dickey-Fuller GLS test to determine the order of integration.

Duplicates

Tests the dataset for duplicate entries, supporting model reliability through data quality verification.
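
A duplicate check of this kind can be sketched in a few lines of plain Python. This is an illustrative example only, assuming rows are compared for exact equality across all fields; `duplicate_rows` is a hypothetical helper, not a library function.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: count} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    ("alice", 34, "NY"),
    ("bob", 41, "CA"),
    ("alice", 34, "NY"),   # exact repeat of the first row
    ("carol", 29, "TX"),
]
duplicates = duplicate_rows(rows)
n_extra = sum(n - 1 for n in duplicates.values())  # surplus copies beyond the first
print(duplicates, n_extra)
```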

EngleGrangerCoint

Assesses the degree of co-movement between pairs of time series data using the Engle-Granger cointegration test.

FeatureTargetCorrelationPlot

Visualizes the correlation between input features and the model's target output in a color-coded horizontal bar plot.

HighCardinality

Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting.

HighPearsonCorrelation

Identifies highly correlated feature pairs in a dataset, which suggest feature redundancy or multicollinearity.
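
To make the idea concrete, the sketch below computes the Pearson coefficient for every pair of columns and keeps the pairs whose absolute correlation exceeds a cut-off. The helper names, the sample data, and the 0.8 threshold are assumptions for illustration and are not taken from the library.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def high_correlation_pairs(columns, max_threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| above max_threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > max_threshold:
            flagged.append((name_a, name_b, r))
    return flagged


columns = {
    "income": [10, 20, 30, 40, 50],
    "spend": [12, 22, 29, 41, 52],   # moves almost in lockstep with income
    "age": [30, 25, 40, 22, 35],     # only weakly related to the others
}
pairs = high_correlation_pairs(columns)
print(pairs)
```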

IQROutliersBarPlot

Visualizes outlier distribution across percentiles in numerical data using the Interquartile Range (IQR) method.

IQROutliersTable

Determines and summarizes outliers in numerical features using the Interquartile Range method.
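
The Interquartile Range method is commonly stated as: flag any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below applies that rule with the standard library. The 1.5 multiplier and the "inclusive" quartile method are assumptions for this example; the library's defaults may differ.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]


values = [10, 12, 11, 13, 12, 11, 14, 95]
outliers = iqr_outliers(values)   # 95 sits far above the upper fence
print(outliers)
```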

IsolationForestOutliers

Detects outliers in a dataset using the Isolation Forest algorithm and visualizes results through scatter plots.

JarqueBera

Assesses normality of dataset features in an ML model using the Jarque-Bera test.

KPSS

Assesses the stationarity of time-series data in a machine learning model using the KPSS unit root test.

LJungBox

Assesses autocorrelations in dataset features by performing a Ljung-Box test on each feature.

LaggedCorrelationHeatmap

Assesses and visualizes the correlation between the target variable and lagged independent variables in a time-series dataset.

MissingValues

Evaluates dataset quality by ensuring that the missing value ratio across all features does not exceed a set threshold.
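
A threshold check on missing value ratios might look like the following sketch. It assumes missing entries are represented as `None` and uses a hypothetical `max_ratio` of 0.5; neither detail comes from the library itself.

```python
def missing_ratios(columns):
    """Map each column name to the fraction of its values that are None."""
    return {
        name: sum(v is None for v in values) / len(values)
        for name, values in columns.items()
    }


def columns_over_threshold(columns, max_ratio=0.5):
    """Names of columns whose missing ratio exceeds max_ratio (assumed cut-off)."""
    ratios = missing_ratios(columns)
    return sorted(name for name, ratio in ratios.items() if ratio > max_ratio)


columns = {
    "age": [34, None, 29, 41],
    "income": [None, None, 52000, None],
}
ratios = missing_ratios(columns)
failing = columns_over_threshold(columns)
print(ratios, failing)
```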

MissingValuesBarPlot

Assesses the percentage and distribution of missing values in the dataset via a bar plot, with emphasis on identifying high-risk columns based on a user-defined threshold.

MutualInformation

Calculates mutual information scores between features and target variable to evaluate feature relevance.

PearsonCorrelationMatrix

Evaluates linear dependency between numerical variables in a dataset via a Pearson Correlation coefficient heat map.

PhillipsPerronArch

Assesses the stationarity of time series data in each feature of the ML model using the Phillips-Perron test.

ProtectedClassesCombination

Visualizes combinations of protected classes and their corresponding error metric differences.

ProtectedClassesDescription

Visualizes the distribution of protected classes in the dataset relative to the target variable and provides descriptive statistics.

ProtectedClassesDisparity

Investigates disparities in model performance across different protected class segments.

ProtectedClassesThresholdOptimizer

Obtains a classifier by applying group-specific thresholds to the provided estimator.

RollingStatsPlot

Evaluates the stationarity of time series data by plotting its rolling mean and standard deviation over a specified window.
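
The quantities behind such a plot are simple to compute. The sketch below, an assumption-laden illustration rather than the library's code, returns the rolling mean and sample standard deviation for each full window; the `rolling_stats` name and the window size are hypothetical.

```python
from statistics import mean, stdev


def rolling_stats(series, window):
    """Return (means, stds) computed over each full window of the series."""
    means, stds = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        means.append(mean(chunk))
        stds.append(stdev(chunk))
    return means, stds


trend = [1, 2, 3, 4, 5, 6]
means, stds = rolling_stats(trend, window=3)
print(means, stds)
```

In this toy series the rolling mean climbs steadily while the rolling standard deviation stays constant, which is the pattern of a series that is non-stationary in its mean.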

RunsTest

Executes the Runs Test on an ML model to detect non-random patterns in the output data sequence.
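
For readers unfamiliar with the Runs Test, the sketch below shows the classic Wald-Wolfowitz formulation: split the sequence at its median, count runs of consecutive values on the same side, and compare that count with its expectation under randomness. Dropping values equal to the median is an assumption made here, and `runs_test` is a hypothetical helper rather than a library function.

```python
from math import sqrt
from statistics import median


def runs_test(values):
    """Return (number_of_runs, z_statistic) for a Wald-Wolfowitz runs test."""
    cutoff = median(values)
    # True for values above the median, False for below; ties are dropped.
    signs = [v > cutoff for v in values if v != cutoff]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n_pos * n_neg / n + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n)) / (n ** 2 * (n - 1))
    return runs, (runs - expected) / sqrt(variance)


alternating = [1, 2] * 5            # strictly alternating, so far too many runs
runs, z = runs_test(alternating)
print(runs, round(z, 3))
```

A large absolute z value, as in this alternating example, indicates that the ordering is unlikely to be random.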

ScatterPlot

Assesses visual relationships, patterns, and outliers among features in a dataset through scatter plot matrices.

ScoreBandDefaultRates

Analyzes default rates and population distribution across credit score bands.

SeasonalDecompose

Assesses patterns and seasonality in a time series dataset by decomposing its features into foundational components.

ShapiroWilk

Evaluates feature-wise normality of training data using the Shapiro-Wilk test.

Skewness

Evaluates the skewness of numerical data in a dataset to check against a defined threshold, aiming to ensure data quality and optimize model performance.
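
As a worked illustration, skewness can be computed as the third central moment divided by the second central moment raised to the power 1.5, then compared against a limit. This sketch uses that population formula and a hypothetical `max_threshold` of 1; the library may use a different estimator or default.

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def passes_skewness_check(values, max_threshold=1.0):
    """True when |skewness| stays within the assumed threshold."""
    return abs(skewness(values)) <= max_threshold


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value drags the tail right
print(skewness(symmetric), skewness(right_skewed))
```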

SpreadPlot

Assesses potential correlations between pairs of time series variables through visualization to enhance understanding of their relationships.

TabularCategoricalBarPlots

Generates and visualizes bar plots for each category in categorical features to evaluate the dataset's composition.

TabularDateTimeHistograms

Generates histograms to provide graphical insight into the distribution of time intervals in a model's datetime data.

TabularDescriptionTables

Summarizes key descriptive statistics for numerical, categorical, and datetime variables in a dataset.

TabularNumericalHistograms

Generates histograms for each numerical feature in a dataset to provide visual insights into data distribution and detect potential issues.

TargetRateBarPlots

Generates bar plots visualizing the default rates of categorical features for a classification machine learning model.

TimeSeriesDescription

Generates a detailed analysis for the provided time series dataset, summarizing key statistics to identify trends, patterns, and data quality issues.

TimeSeriesDescriptiveStatistics

Evaluates the descriptive statistics of a time series dataset to identify trends, patterns, and data quality issues.

TimeSeriesFrequency

Evaluates consistency of time series data frequency and generates a frequency plot.

TimeSeriesHistogram

Visualizes the distribution of time-series data using histograms and Kernel Density Estimation (KDE) lines.

TimeSeriesLinePlot

Generates line plots of time-series data to reveal trends, patterns, and anomalies over time.

TimeSeriesMissingValues

Validates time-series data quality by confirming the count of missing values is below a certain threshold.

TimeSeriesOutliers

Identifies and visualizes outliers in time-series data using the z-score method.
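
The z-score method standardizes each observation as (value - mean) / standard deviation and flags those whose absolute score exceeds a cut-off. The sketch below assumes the sample standard deviation and a cut-off of 2.5, both chosen for this short example rather than taken from the library.

```python
from statistics import mean, stdev


def zscore_outliers(series, threshold=2.5):
    """Return (index, value) pairs whose |z-score| exceeds threshold."""
    mu = mean(series)
    sigma = stdev(series)
    return [
        (i, v) for i, v in enumerate(series)
        if abs((v - mu) / sigma) > threshold
    ]


series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = zscore_outliers(series)   # only the final spike is flagged
print(outliers)
```

With very short series the largest attainable z-score is bounded, so a cut-off of 3 can never trigger on ten points; that is why a lower value is used here.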

TooManyZeroValues

Identifies numerical columns in a dataset that contain an excessive number of zero values, defined by a threshold percentage.
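
A zero-share check of this kind can be sketched as follows. The `max_percent_threshold` name and its value of 50 are assumptions for illustration, not the library's documented parameter or default.

```python
def zero_percentages(columns):
    """Map each column name to the percentage of its values equal to zero."""
    return {
        name: 100 * sum(v == 0 for v in values) / len(values)
        for name, values in columns.items()
    }


def too_many_zeros(columns, max_percent_threshold=50):
    """Names of columns whose zero percentage exceeds the assumed threshold."""
    percentages = zero_percentages(columns)
    return sorted(n for n, pct in percentages.items() if pct > max_percent_threshold)


columns = {
    "balance": [0, 0, 0, 5, 0],
    "age": [30, 0, 41, 29, 35],
}
percentages = zero_percentages(columns)
flagged = too_many_zeros(columns)
print(percentages, flagged)
```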

UniqueRows

Verifies the diversity of the dataset by ensuring that the count of unique rows exceeds a prescribed threshold.

WOEBinPlots

Generates visualizations of Weight of Evidence (WoE) and Information Value (IV) to help assess the predictive power of categorical variables in a dataset.

WOEBinTable

Assesses the Weight of Evidence (WoE) and Information Value (IV) of each feature to evaluate its predictive power in a binary classification model.
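
For reference, one common convention defines WoE for a bin as ln(share of goods in the bin / share of bads in the bin), and IV as the sum over bins of (share of goods - share of bads) * WoE. The sketch below follows that convention. The sign convention, the absence of smoothing for empty bins, and the `woe_and_iv` helper are all assumptions; the library may differ on each.

```python
from math import log


def woe_and_iv(bins):
    """bins maps a bin label to (good_count, bad_count); returns (woe, iv)."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe, iv = {}, 0.0
    for label, (good, bad) in bins.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        # No smoothing is applied, so every bin needs at least one good and one bad.
        woe[label] = log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[label]
    return woe, iv


bins = {"A": (80, 20), "B": (20, 80)}
woe, iv = woe_and_iv(bins)
print(woe, iv)
```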

ZivotAndrewsArch

Evaluates the order of integration and stationarity of time series data using the Zivot-Andrews unit root test.

BertScore

Assesses the quality of machine-generated text using BERTScore metrics and visualizes results through histograms and bar charts, alongside compiling a comprehensive table of descriptive statistics.

BleuScore

Evaluates the quality of machine-generated text using BLEU metrics and visualizes the results through histograms and bar charts, alongside compiling a comprehensive table of descriptive statistics for BLEU scores.

ClusterSizeDistribution

Assesses the performance of clustering models by comparing the distribution of cluster sizes in model predictions with the actual data.

ContextualRecall

Evaluates a Natural Language Generation model's ability to generate contextually relevant and factually correct text, visualizing the results through histograms and bar charts, alongside compiling a comprehensive table of descriptive statistics for…

FeaturesAUC

Evaluates the discriminatory power of each individual feature within a binary classification model by calculating the Area Under the Curve (AUC) for each feature separately.
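
One way to read a per-feature AUC is as the probability that a randomly chosen positive example has a higher feature value than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison, which is fine for small samples; `feature_auc` is a hypothetical helper and not the library's implementation.

```python
def feature_auc(feature, labels):
    """AUC of a single feature used as the score for binary labels (0/1)."""
    positives = [x for x, y in zip(feature, labels) if y == 1]
    negatives = [x for x, y in zip(feature, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties earn half credit.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
perfect = feature_auc([1, 2, 3, 4], labels)   # separates the classes cleanly
noisy = feature_auc([1, 3, 2, 4], labels)     # one positive ranks below a negative
print(perfect, noisy)
```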

MeteorScore

Assesses the quality of machine-generated translations by comparing them to human-produced references using the METEOR score, which evaluates precision, recall, and word order.

ModelMetadata

Compares metadata of different models and generates a summary table with the results.

ModelPredictionResiduals

Assesses normality and behavior of residuals in regression models through visualization and statistical tests.

RegardScore

Assesses the sentiment and potential biases in text generated by NLP models by computing and visualizing regard scores.

RegressionResidualsPlot

Evaluates regression model performance using residual distribution and actual vs. predicted plots.

RougeScore

Assesses the quality of machine-generated text using ROUGE metrics and visualizes the results to provide comprehensive performance insights.

TimeSeriesPredictionWithCI

Assesses predictive accuracy and uncertainty in time series models, highlighting breaches beyond confidence intervals.

TimeSeriesPredictionsPlot

Plots actual vs. predicted values for time series data and generates a visual comparison for the model.

TimeSeriesR2SquareBySegments

Evaluates the R-Squared values of regression models over specified time segments in time series data to assess segment-wise model performance.

TokenDisparity

Evaluates the token disparity between reference and generated texts, visualizing the results through histograms and bar charts, alongside compiling a comprehensive table of descriptive statistics for token counts.

ToxicityScore

Assesses the toxicity levels of texts generated by NLP models to identify and mitigate harmful or offensive content.

Bias

Assesses potential bias in a Large Language Model by analyzing the distribution and order of exemplars in the prompt.

Clarity

Evaluates and scores the clarity of prompts in a Large Language Model based on specified guidelines.

Conciseness

Analyzes and grades the conciseness of prompts provided to a Large Language Model.

Delimitation

Evaluates the proper use of delimiters in prompts provided to Large Language Models.

NegativeInstruction

Evaluates and grades the use of affirmative, proactive language over negative instructions in LLM prompts.

Robustness

Assesses the robustness of prompts provided to a Large Language Model under varying conditions and contexts. This test specifically measures the model's ability to generate correct classifications with the given prompt even when the inputs are edge…

Specificity

Evaluates and scores the specificity of prompts provided to a Large Language Model (LLM), based on clarity, detail, and relevance.

CalibrationCurveDrift

Evaluates changes in probability calibration between reference and monitoring datasets.

ClassDiscriminationDrift

Compares classification discrimination metrics between reference and monitoring datasets.

ClassImbalanceDrift

Evaluates drift in class distribution between reference and monitoring datasets.

ClassificationAccuracyDrift

Compares classification accuracy metrics between reference and monitoring datasets.

ConfusionMatrixDrift

Compares confusion matrix metrics between reference and monitoring datasets.

CumulativePredictionProbabilitiesDrift

Compares cumulative prediction probability distributions between reference and monitoring datasets.

FeatureDrift

Evaluates changes in feature distribution over time to identify potential model drift.

PredictionAcrossEachFeature

Assesses differences in model predictions across individual features between reference and monitoring datasets through visual analysis.

PredictionCorrelation

Assesses correlation changes between model predictions from reference and monitoring datasets to detect potential target drift.

PredictionProbabilitiesHistogramDrift

Compares prediction probability distributions between reference and monitoring datasets.

PredictionQuantilesAcrossFeatures

Assesses differences in model prediction distributions across individual features between reference and monitoring datasets through quantile analysis.

ROCCurveDrift

Compares ROC curves between reference and monitoring datasets.

ScoreBandsDrift

Analyzes drift in population distribution and default rates across score bands.

ScorecardHistogramDrift

Compares score distributions between reference and monitoring datasets for each class.

TargetPredictionDistributionPlot

Assesses differences in prediction distributions between a reference dataset and a monitoring dataset to identify potential data drift.