• About
  • Get Started
  • Guides
  • ValidMind Library
    • ValidMind Library
    • Supported Models
    • QuickStart Notebook

    • TESTING
    • Run Tests & Test Suites
    • Test Descriptions
    • Test Sandbox (BETA)

    • CODE SAMPLES
    • All Code Samples · LLM · NLP · Time Series · Etc.
    • Download Code Samples · notebooks.zip
    • Try it on JupyterHub

    • REFERENCE
    • ValidMind Library Python API
  • Support
  • Training
  • Releases
  • Documentation
    • About ​ValidMind
    • Get Started
    • Guides
    • Support
    • Releases

    • Python Library
    • ValidMind Library

    • ValidMind Academy
    • Training Courses
  • Log In
    • Public Internet
    • ValidMind Platform · US1
    • ValidMind Platform · CA1

    • Private Link
    • Virtual Private ValidMind (VPV)

    • Which login should I use?
  1. Run tests & test suites

EU AI Act Compliance — Read our original regulation brief on how the EU AI Act aims to balance innovation with safety and accountability, setting standards for responsible AI use

  • ValidMind Library
  • Supported models

  • QuickStart
  • Quickstart for model documentation
  • Install and initialize ValidMind Library
  • Store model credentials in .env files

  • Model Development
  • 1 — Set up ValidMind Library
  • 2 — Start model development process
  • 3 — Integrate custom tests
  • 4 — Finalize testing & documentation

  • Model Validation
  • 1 — Set up ValidMind Library for validation
  • 2 — Start model validation process
  • 3 — Developing a challenger model
  • 4 — Finalize validation & reporting

  • Model Testing
  • Run tests & test suites
    • Add context to LLM-generated test descriptions
    • Configure dataset features
    • Document multiple results for the same test
    • Explore test suites
    • Explore tests
    • Dataset Column Filters when Running Tests
    • Load dataset predictions
    • Log metrics over time
    • Run individual documentation sections
    • Run documentation tests with custom configurations
    • Run tests with multiple datasets
    • Intro to Unit Metrics
    • Understand and utilize RawData in ValidMind tests
    • Introduction to ValidMind Dataset and Model Objects
    • Run Tests
      • Run dataset based tests
      • Run comparison tests
  • Test descriptions
    • Data Validation
      • ACFandPACFPlot
      • ADF
      • AutoAR
      • AutoMA
      • AutoStationarity
      • BivariateScatterPlots
      • BoxPierce
      • ChiSquaredFeaturesTable
      • ClassImbalance
      • DatasetDescription
      • DatasetSplit
      • DescriptiveStatistics
      • DickeyFullerGLS
      • Duplicates
      • EngleGrangerCoint
      • FeatureTargetCorrelationPlot
      • HighCardinality
      • HighPearsonCorrelation
      • IQROutliersBarPlot
      • IQROutliersTable
      • IsolationForestOutliers
      • JarqueBera
      • KPSS
      • LaggedCorrelationHeatmap
      • LJungBox
      • MissingValues
      • MissingValuesBarPlot
      • MutualInformation
      • PearsonCorrelationMatrix
      • PhillipsPerronArch
      • ProtectedClassesCombination
      • ProtectedClassesDescription
      • ProtectedClassesDisparity
      • ProtectedClassesThresholdOptimizer
      • RollingStatsPlot
      • RunsTest
      • ScatterPlot
      • ScoreBandDefaultRates
      • SeasonalDecompose
      • ShapiroWilk
      • Skewness
      • SpreadPlot
      • TabularCategoricalBarPlots
      • TabularDateTimeHistograms
      • TabularDescriptionTables
      • TabularNumericalHistograms
      • TargetRateBarPlots
      • TimeSeriesDescription
      • TimeSeriesDescriptiveStatistics
      • TimeSeriesFrequency
      • TimeSeriesHistogram
      • TimeSeriesLinePlot
      • TimeSeriesMissingValues
      • TimeSeriesOutliers
      • TooManyZeroValues
      • UniqueRows
      • WOEBinPlots
      • WOEBinTable
      • ZivotAndrewsArch
      • Nlp
        • CommonWords
        • Hashtags
        • LanguageDetection
        • Mentions
        • PolarityAndSubjectivity
        • Punctuations
        • Sentiment
        • StopWords
        • TextDescription
        • Toxicity
    • Model Validation
      • BertScore
      • BleuScore
      • ClusterSizeDistribution
      • ContextualRecall
      • FeaturesAUC
      • MeteorScore
      • ModelMetadata
      • ModelPredictionResiduals
      • RegardScore
      • RegressionResidualsPlot
      • RougeScore
      • TimeSeriesPredictionsPlot
      • TimeSeriesPredictionWithCI
      • TimeSeriesR2SquareBySegments
      • TokenDisparity
      • ToxicityScore
      • Embeddings
        • ClusterDistribution
        • CosineSimilarityComparison
        • CosineSimilarityDistribution
        • CosineSimilarityHeatmap
        • DescriptiveAnalytics
        • EmbeddingsVisualization2D
        • EuclideanDistanceComparison
        • EuclideanDistanceHeatmap
        • PCAComponentsPairwisePlots
        • StabilityAnalysisKeyword
        • StabilityAnalysisRandomNoise
        • StabilityAnalysisSynonyms
        • StabilityAnalysisTranslation
        • TSNEComponentsPairwisePlots
      • Ragas
        • AnswerCorrectness
        • AspectCritic
        • ContextEntityRecall
        • ContextPrecision
        • ContextPrecisionWithoutReference
        • ContextRecall
        • Faithfulness
        • NoiseSensitivity
        • ResponseRelevancy
        • SemanticSimilarity
      • Sklearn
        • AdjustedMutualInformation
        • AdjustedRandIndex
        • CalibrationCurve
        • ClassifierPerformance
        • ClassifierThresholdOptimization
        • ClusterCosineSimilarity
        • ClusterPerformanceMetrics
        • CompletenessScore
        • ConfusionMatrix
        • FeatureImportance
        • FowlkesMallowsScore
        • HomogeneityScore
        • HyperParametersTuning
        • KMeansClustersOptimization
        • MinimumAccuracy
        • MinimumF1Score
        • MinimumROCAUCScore
        • ModelParameters
        • ModelsPerformanceComparison
        • OverfitDiagnosis
        • PermutationFeatureImportance
        • PopulationStabilityIndex
        • PrecisionRecallCurve
        • RegressionErrors
        • RegressionErrorsComparison
        • RegressionPerformance
        • RegressionR2Square
        • RegressionR2SquareComparison
        • RobustnessDiagnosis
        • ROCCurve
        • ScoreProbabilityAlignment
        • SHAPGlobalImportance
        • SilhouettePlot
        • TrainingTestDegradation
        • VMeasure
        • WeakspotsDiagnosis
      • Statsmodels
        • AutoARIMA
        • CumulativePredictionProbabilities
        • DurbinWatsonTest
        • GINITable
        • KolmogorovSmirnov
        • Lilliefors
        • PredictionProbabilitiesHistogram
        • RegressionCoeffs
        • RegressionFeatureSignificance
        • RegressionModelForecastPlot
        • RegressionModelForecastPlotLevels
        • RegressionModelSensitivityPlot
        • RegressionModelSummary
        • RegressionPermutationFeatureImportance
        • ScorecardHistogram
    • Ongoing Monitoring
      • CalibrationCurveDrift
      • ClassDiscriminationDrift
      • ClassificationAccuracyDrift
      • ClassImbalanceDrift
      • ConfusionMatrixDrift
      • CumulativePredictionProbabilitiesDrift
      • FeatureDrift
      • PredictionAcrossEachFeature
      • PredictionCorrelation
      • PredictionProbabilitiesHistogramDrift
      • PredictionQuantilesAcrossFeatures
      • ROCCurveDrift
      • ScoreBandsDrift
      • ScorecardHistogramDrift
      • TargetPredictionDistributionPlot
    • Prompt Validation
      • Bias
      • Clarity
      • Conciseness
      • Delimitation
      • NegativeInstruction
      • Robustness
      • Specificity
  • Test sandbox beta

  • Notebooks
  • Code samples
    • Capital Markets
      • Quickstart for knockout option pricing model documentation
      • Quickstart for Heston option pricing model using QuantLib
    • Credit Risk
      • Document an application scorecard model
      • Document an application scorecard model
      • Document an application scorecard model
      • Document a credit risk model
      • Document an application scorecard model
    • Custom Tests
      • Implement custom tests
      • Integrate external test providers
    • Model Validation
      • Validate an application scorecard model
    • Nlp and Llm
      • Sentiment analysis of financial data using a large language model (LLM)
      • Summarization of financial data using a large language model (LLM)
      • Sentiment analysis of financial data using Hugging Face NLP models
      • Summarization of financial data using Hugging Face NLP models
      • Automate news summarization using LLMs
      • Prompt validation for large language models (LLMs)
      • RAG Model Benchmarking Demo
      • RAG Model Documentation Demo
    • Ongoing Monitoring
      • Ongoing Monitoring for Application Scorecard
      • Quickstart for ongoing monitoring of models with ValidMind
    • Regression
      • Document a California Housing Price Prediction regression model
    • Time Series
      • Document a time series forecasting model
      • Document a time series forecasting model

  • Reference
  • ValidMind Library Python API

On this page

  • Getting started
  • Explore tests and test suites
  • Intermediate
  • Advanced
  • When do I use tests and test suites?
  • Can I use my own tests?
  • ValidMind Library Python API reference
  • Edit this page
  • Report an issue

Run tests and test suites

Published

May 12, 2025

​ValidMind provides many built-in tests and test suites, which help you produce documentation during stages of the model development lifecycle where you need to validate that your work satisfies MRM (model risk management) requirements.

Key ​ValidMind concepts
model documentation
A structured and detailed record pertaining to a model, encompassing key components such as its underlying assumptions, methodologies, data sources, inputs, performance metrics, evaluations, limitations, and intended uses.

Within the realm of model risk management, this documentation serves to ensure transparency, adherence to regulatory requirements, and a clear understanding of potential risks associated with the model's application.

validation report
A formal document produced after a model validation process, outlining the findings, assessments, and recommendations related to a specific model's performance, appropriateness, and limitations. Provides a comprehensive review of the model's conceptual framework, data sources and integrity, calibration methods, and performance outcomes.

Within model risk management, the validation report is crucial for ensuring transparency, demonstrating regulatory compliance, and offering actionable insights for model refinement or adjustments.

template, documentation template
Functions as a test suite and lays out the structure of model documentation, segmented into various sections and sub-sections. Documentation templates define the structure of your model documentation, specifying the tests that should be run, and how the results should be displayed.

​ValidMind templates come with pre-defined sections, similar to test placeholders, including boilerplates and spaces designated for documentation and test results. When rendered, produces a document that model developers can use for model validation.

test
A function contained in the library, designed to run a specific quantitative test on the dataset or model. Test results are sent to the ValidMind Platform to generate the model documentation according to the template that is associated with the documentation.

Tests are the building blocks of ​ValidMind, used to evaluate and document models and datasets, and can be run individually or as part of a suite defined by your model documentation template.

metrics, custom metrics
Metrics are a subset of tests that do not have thresholds. Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered via the ValidMind Library to be used with the ValidMind Platform.

In the context of ​ValidMind's Jupyter Notebooks, metrics and tests can be thought of as interchangeable concepts.

inputs
Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:
  • model: A single model that has been initialized in ​ValidMind with vm.init_model(). See the Model Documentation or the for more information.
  • dataset: Single dataset that has been initialized in ​ValidMind with vm.init_dataset(). See the Dataset Documentation for more information.
  • models: A list of ​ValidMind models - usually this is used when you want to compare multiple models in your custom tests.
  • datasets: A list of ​ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom tests. (Learn more: [Run tests with multiple datasets(/notebooks/how_to/run_tests_that_require_multiple_datasets.ipynb)])
parameters
Additional arguments that can be passed when running a ​ValidMind test, used to pass additional information to a test, customize its behavior, or provide additional context.
outputs
Custom tests can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. Plots may be matplotlib or plotly figures.
test suite
A collection of tests which are run together to generate model documentation end-to-end for specific use cases.

For example, the classifier_full_suite test suite runs tests from the tabular_dataset and classifier test suites to fully document the data and model sections for binary classification model use cases.

Getting started

Start by running a pre-made test, then modify it, and finally create your own test:

Run dataset based tests
Use the ValidMind Library's run_test function to run built-in or custom tests that take any dataset or model as input. These tests generate outputs in the form of text, tables, and images that get populated in model documentation.
Implement custom tests
Custom tests extend the functionality of ValidMind, allowing you to document any model or use case with added flexibility.
No matching items

Explore tests and test suites

Next, find available tests and test suites using the library or the interactive test sandbox:

Explore tests
View and learn more about the tests available in the ValidMind Library, including code examples and usage of key functions.
Explore test suites
Explore the the test suites and tests that are available in the ValidMind Library, and retrieve detailed information for each.
Test sandbox beta
Explore our interactive sandbox to see what tests are available in the ValidMind Library and how you can use them in your own code.
No matching items

Intermediate

Building on previous sections, add your own test provider, set up datasets, run tests on individual sections in your model documentation, and more:

Integrate external test providers
Register a custom test provider with the ValidMind Library to run your own tests.
Configure dataset features
When initializing a ValidMind dataset object, you can pass in a list of features to use instead of utilizing all dataset columns when running tests.
Run individual documentation sections
For targeted testing, you can run tests on individual sections or specific groups of sections in your model documentation.
Run documentation tests with custom configurations
When running documentation tests, you can configure inputs and parameters for individual tests by passing a config as a parameter.
Run tests with multiple datasets
To support running tests that require more than one dataset, ValidMind provides a mechanim that allows you to pass multiple datasets as inputs.
Intro to Unit Metrics
To turn complex evidence into actionable insights, you can run a unit metric as a single-value measure to quantify and monitor risks throughout a model's lifecycle.
No matching items

Advanced

Need more? Try some of the advanced features provided by the library:

Document multiple results for the same test
Documentation templates facilitate the presentation of multiple unique test results for a single test.
Load dataset predictions
To enable tests to make use of predictions, you can load predictions in ValidMind dataset objects in multiple different ways.
Understand and utilize RawData in ValidMind tests
Test functions in ValidMind can return a special object called RawData, which holds intermediate or unprocessed data produced somewhere in the test logic but not returned as part of the test's visible output, such as in tables or figures.
No matching items

When do I use tests and test suites?

While you have the flexibility to decide when to use which ​ValidMind tests, we have identified a few typical scenarios with their own characteristics and needs:

Dataset testing

To document and validate your dataset:

  • For generic tabular datasets: use the tabular_dataset test suite.
  • For time-series datasets: use the time_series_dataset test suite.

Model testing

To document and validate your model:

  • For binary classification models: use the classifier test suite.
  • For time series models: use the timeseries test suite.

End-to-end testing

To document a binary classification model and the relevant dataset end-to-end:

Use the classifier_full_suite test suite.

Can I use my own tests?

Absolutely! ​ValidMind supports custom tests that you develop yourself or that are provided by third-party test libraries, also referred to as test providers. We provide instructions with code examples that you can adapt:

Implement custom tests
Custom tests extend the functionality of ValidMind, allowing you to document any model or use case with added flexibility.
Integrate external test providers
Register a custom test provider with the ValidMind Library to run your own tests.
Add context to LLM-generated test descriptions

Learn how to add custom use case and test-specific context to your LLM-generated test descriptions.

No matching items

ValidMind Library Python API reference

ValidMind Library API Reference

Add context to LLM-generated test descriptions

© Copyright 2025 ValidMind Inc. All Rights Reserved.

  • Edit this page
  • Report an issue
Cookie Preferences
  • validmind.com

  • Privacy Policy

  • Terms of Use