ValidMind for model development 4 — Finalize testing and documentation

Learn how to use ValidMind for your end-to-end model documentation process with our introductory notebook series. In this last notebook, finalize the testing and documentation of your model and have a fully documented sample model ready for review.

We'll first use run_documentation_tests(), previously covered in 2 — Start the model development process, to ensure that the custom test results generated in 3 — Integrate custom tests are included in your documentation. Then, we'll view and update the configuration for the entire model documentation template to suit your needs.

Prerequisites

In order to finalize the testing and documentation for your sample model, you'll need to first have:

Need help with the above steps?

Refer to the first three notebooks in this series:

Setting up

This section should be very familiar to you by now, as we performed the same actions in the previous two notebooks in this series.

Initialize the ValidMind Library

As usual, let's first connect the ValidMind Library to the model we previously registered in the ValidMind Platform:

  1. In a browser, log in to ValidMind.

  2. In the left sidebar, navigate to Inventory and select the model you registered for this "ValidMind for model development" series of notebooks.

  3. Go to Getting Started and click Copy snippet to clipboard.

Next, load your model identifier credentials from an .env file or replace the placeholder with your own code snippet:

# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
Note: you may need to restart the kernel to use updated packages.
2025-05-12 22:05:02,170 - INFO(validmind.api_client): 🎉 Connected to ValidMind!
📊 Model: [ValidMind Academy] Model development (ID: cmalgf3qi02ce199qm3rdkl46)
📁 Document Type: model_documentation

Import sample dataset

Next, we'll import the same public Bank Customer Churn Prediction dataset from Kaggle that we used in the previous notebooks so that we have something to work with:

from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
Loaded demo dataset with: 

    • Target column: 'Exited' 
    • Class labels: {'0': 'Did not exit', '1': 'Exited'}

We'll apply a simple rebalancing technique to the dataset before continuing:

import pandas as pd

raw_copy_df = raw_df.sample(frac=1)  # Create a shuffled copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)
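
If you'd like to sanity-check the rebalancing before moving on, a quick value count on the target column should show a roughly even class split. This is an optional check using standard pandas, not something the rest of the notebook depends on:

# Optional: confirm the rebalanced dataset has an even class split
print(balanced_raw_df["Exited"].value_counts())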

Remove highly correlated features

Let's also quickly remove highly correlated features from the dataset using the output from a ValidMind test.

As you learned previously, before we can run tests you'll need to initialize a ValidMind dataset object:

# Register the balanced data; 'balanced_raw_dataset' is now the dataset object of interest
vm_balanced_raw_dataset = vm.init_dataset(
    dataset=balanced_raw_df,
    input_id="balanced_raw_dataset",
    target_column="Exited",
)

With our balanced dataset initialized, we can then run our test and utilize the output to help us identify the features we want to remove:

# Run HighPearsonCorrelation test with our balanced dataset as input and return a result object
corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)
# From result object, extract table from `corr_result.tables`
features_df = corr_result.tables[0].data
features_df
Columns Coefficient Pass/Fail
0 (Age, Exited) 0.3431 Fail
1 (Balance, NumOfProducts) -0.1775 Pass
2 (IsActiveMember, Exited) -0.1739 Pass
3 (Balance, Exited) 0.1371 Pass
4 (Tenure, IsActiveMember) -0.0629 Pass
5 (NumOfProducts, IsActiveMember) 0.0529 Pass
6 (NumOfProducts, Exited) -0.0518 Pass
7 (Age, NumOfProducts) -0.0461 Pass
8 (Tenure, HasCrCard) 0.0451 Pass
9 (Tenure, EstimatedSalary) 0.0391 Pass
# Extract list of features that failed the test
high_correlation_features = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
high_correlation_features
['(Age, Exited)']
# Extract feature names from the list of strings
high_correlation_features = [feature.split(",")[0].strip("()") for feature in high_correlation_features]
high_correlation_features
['Age']

We can then re-initialize the dataset with a different input_id and the highly correlated features removed and re-run the test for confirmation:

# Remove the highly correlated features from the dataset
balanced_raw_no_age_df = balanced_raw_df.drop(columns=high_correlation_features)

# Re-initialize the dataset object
vm_raw_dataset_preprocessed = vm.init_dataset(
    dataset=balanced_raw_no_age_df,
    input_id="raw_dataset_preprocessed",
    target_column="Exited",
)
# Re-run the test with the reduced feature set
corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_preprocessed},
)
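
If you'd like to confirm programmatically that the reduced feature set clears the threshold, you can inspect the result table the same way we did above. This is a minimal optional check and assumes the Pass/Fail column is populated as before:

# Optional: verify that no feature pairs fail the correlation threshold anymore
reduced_features_df = corr_result.tables[0].data
print(reduced_features_df[reduced_features_df["Pass/Fail"] == "Fail"])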

Train the model

We'll then train a simple logistic regression model on our prepared dataset using scikit-learn:

# First encode the categorical features in our dataset with the highly correlated features removed
balanced_raw_no_age_df = pd.get_dummies(
    balanced_raw_no_age_df, columns=["Geography", "Gender"], drop_first=True
)
balanced_raw_no_age_df.head()
CreditScore Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited Geography_Germany Geography_Spain Gender_Male
7017 850 1 101278.25 2 1 1 26265.18 0 True False True
1329 554 8 0.00 1 0 1 106981.03 0 False True False
2460 539 3 0.00 2 1 1 198161.07 0 False False False
5257 747 3 0.00 1 0 0 57817.84 1 False False False
2617 528 0 127631.62 1 0 1 22197.80 1 True False True
# Split the processed dataset into train and test
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(balanced_raw_no_age_df, test_size=0.20)

X_train = train_df.drop("Exited", axis=1)
y_train = train_df["Exited"]
X_test = test_df.drop("Exited", axis=1)
y_test = test_df["Exited"]
from sklearn.linear_model import LogisticRegression

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
from sklearn.model_selection import GridSearchCV

grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(X_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
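
Before wrapping the model, it can be useful to take a quick look at what the grid search actually selected. The attributes used below are standard scikit-learn GridSearchCV attributes:

# Optional: inspect the hyperparameters and cross-validation score of the selected model
print("Best parameters:", grid_log_reg.best_params_)
print("Best cross-validation score:", grid_log_reg.best_score_)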

Initialize the ValidMind objects

Let's initialize the ValidMind Dataset and Model objects in preparation for assigning model predictions to each dataset:

# Initialize the datasets into their own dataset objects
vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=train_df,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=test_df,
    target_column="Exited",
)

# Initialize a model object
vm_model = vm.init_model(log_reg, input_id="log_reg_model_v1")

Assign predictions

Once the model is registered, we'll assign predictions to the training and test datasets:

vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)
2025-05-12 22:05:35,044 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2025-05-12 22:05:35,047 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2025-05-12 22:05:35,048 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2025-05-12 22:05:35,051 - INFO(validmind.vm_models.dataset.utils): Done running predict()
2025-05-12 22:05:35,053 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2025-05-12 22:05:35,055 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2025-05-12 22:05:35,056 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2025-05-12 22:05:35,058 - INFO(validmind.vm_models.dataset.utils): Done running predict()
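
As a quick sanity check, you can retrieve the predictions that were just assigned directly from the dataset object, using the same y_pred() accessor that our custom test relies on later in this notebook. A minimal optional check:

# Optional: confirm predictions were assigned by retrieving them from the test dataset
print(vm_test_ds.y_pred(model=vm_model)[:10])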

Add custom tests

We'll also add the same custom tests we implemented in the previous notebook so that this session has access to the same custom inline test and local test provider.

Implement custom inline test

Let's set up a custom inline test that calculates the confusion matrix for a binary classification model:

# First create a confusion matrix plot
import matplotlib.pyplot as plt
from sklearn import metrics

# Get the predicted classes
y_pred = log_reg.predict(vm_test_ds.x)

confusion_matrix = metrics.confusion_matrix(y_test, y_pred)

cm_display = metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix, display_labels=[False, True]
)
cm_display.plot()

# Create the reusable ConfusionMatrix inline test with an optional normalize parameter
@vm.test("my_custom_tests.ConfusionMatrix")
def confusion_matrix(dataset, model, normalize=False):
    """The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.

    The confusion matrix is a 2x2 table that contains 4 values:

    - True Positive (TP): the number of correct positive predictions
    - True Negative (TN): the number of correct negative predictions
    - False Positive (FP): the number of incorrect positive predictions
    - False Negative (FN): the number of incorrect negative predictions

    The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
    """
    y_true = dataset.y
    y_pred = dataset.y_pred(model=model)

    if normalize:
        confusion_matrix = metrics.confusion_matrix(y_true, y_pred, normalize="all")
    else:
        confusion_matrix = metrics.confusion_matrix(y_true, y_pred)

    cm_display = metrics.ConfusionMatrixDisplay(
        confusion_matrix=confusion_matrix, display_labels=[False, True]
    )
    cm_display.plot()

    plt.close()  # close the plot to avoid displaying it

    return cm_display.figure_  # return the figure object itself
# Test dataset with normalize=True
result = vm.tests.run_test(
    "my_custom_tests.ConfusionMatrix:test_dataset_normalized",
    inputs={"model": vm_model, "dataset": vm_test_ds},
    params={"normalize": True},
)

Add a local test provider

Finally, let's save our custom inline test to our local test provider:

# Create custom tests folder
tests_folder = "my_tests"

import os

# create tests folder
os.makedirs(tests_folder, exist_ok=True)

# remove existing tests
for f in os.listdir(tests_folder):
    # remove files and pycache
    if f.endswith(".py") or f == "__pycache__":
        os.system(f"rm -rf {tests_folder}/{f}")
# Save custom inline test to custom tests folder
confusion_matrix.save(
    tests_folder,
    imports=["import matplotlib.pyplot as plt", "from sklearn import metrics"],
)
2025-05-12 22:05:58,733 - INFO(validmind.tests.decorator): Saved to /home/runner/work/documentation/documentation/site/notebooks/EXECUTED/model_development/my_tests/ConfusionMatrix.py! Be sure to add any necessary imports to the top of the file.
2025-05-12 22:05:58,734 - INFO(validmind.tests.decorator): This metric can be run with the ID: <test_provider_namespace>.ConfusionMatrix
# Register local test provider
from validmind.tests import LocalTestProvider

# initialize the test provider with the tests folder we created earlier
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="my_test_provider",
    test_provider=my_test_provider,
)
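
With the provider registered, the saved test can now be run by its namespaced ID, the same ID we'll reference in the documentation config later on. A minimal sketch, reusing the run_test() pattern from above:

# Optional: run the saved test through the local test provider to confirm registration worked
result = vm.tests.run_test(
    "my_test_provider.ConfusionMatrix",
    inputs={"model": vm_model, "dataset": vm_test_ds},
    params={"normalize": True},
)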

Reconnect to ValidMind

After you insert test-driven blocks into your model documentation, changes should persist and become available every time you call vm.preview_template().

However, if you added test-driven blocks while the connection was already established, you'll need to reload the connection to the ValidMind Platform using reload():

vm.reload()

Now, when you run preview_template() again, the three test-driven blocks you added to your documentation in the last two notebooks should show up in the template in sections 2.3 Correlations and Interactions and 3.2 Model Evaluation:

vm.preview_template()

Include custom test results

Since your custom test IDs are now part of your documentation template, you can run tests for an entire section and all additional custom tests should load without any issues.

Let's run all tests in the Model Evaluation section of the documentation. Note that we have been running the sample custom confusion matrix with normalize=True to demonstrate the ability to provide custom parameters.

In the Run the model evaluation tests section of 2 — Start the model development process, you learned how to assign inputs to individual tests with run_documentation_tests(). Assigning parameters is similar: you only need to assign a params dictionary to a given test ID, my_test_provider.ConfusionMatrix in this case.

test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {
            "dataset": vm_train_ds,
            "model": vm_model,
        },
    },
    "my_test_provider.ConfusionMatrix": {
        "params": {"normalize": True},
        "inputs": {"dataset": vm_test_ds, "model": vm_model},
    },
}
results = vm.run_documentation_tests(
    section=["model_evaluation"],
    inputs={
        "dataset": vm_test_ds,  # Any test that requires a single dataset will use vm_test_ds
        "model": vm_model,
        "datasets": (
            vm_train_ds,
            vm_test_ds,
        ),  # Any test that requires multiple datasets will use vm_train_ds and vm_test_ds
    },
    config=test_config,
)
2025-05-12 22:05:59,670 - WARNING(validmind.vm_models.test_suite.runner): Config key 'my_test_provider.ConfusionMatrix' does not match a test_id in the template.
    Ensure you registered a content block with the correct content_id in the template
    The configuration for this test will be ignored.
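
If you see a warning like the one above, the named config key doesn't correspond to a test-driven block in your documentation template. One way to check which test IDs the template actually contains is to list the keys of its default configuration, which we cover in the next section. A minimal sketch:

# Optional: list the test IDs present in the documentation template to diagnose the warning
template_test_ids = vm.get_test_suite().get_default_config().keys()
print("my_test_provider.ConfusionMatrix" in template_test_ids)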

Documentation template configuration

Let's call the utility function vm.get_test_suite().get_default_config(), which returns the default configuration for the entire documentation template as a dictionary:

  • This configuration will contain all the test IDs and their default parameters.
  • You can then modify this configuration as needed and pass it to run_documentation_tests() to run all tests in the documentation template.
  • You still have the option to continue running tests for one section at a time; get_default_config() simply provides a useful reference for providing default parameters to every test.
import json

model_test_suite = vm.get_test_suite()
config = model_test_suite.get_default_config()
print("Suite Config: \n", json.dumps(config, indent=2))
Suite Config: 
 {
  "validmind.data_validation.DatasetDescription": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {}
  },
  "validmind.data_validation.ClassImbalance": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "min_percent_threshold": 10
    }
  },
  "validmind.data_validation.Duplicates": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "min_threshold": 1
    }
  },
  "validmind.data_validation.HighCardinality": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "num_threshold": 100,
      "percent_threshold": 0.1,
      "threshold_type": "percent"
    }
  },
  "validmind.data_validation.MissingValues": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "min_threshold": 1
    }
  },
  "validmind.data_validation.Skewness": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "max_threshold": 1
    }
  },
  "validmind.data_validation.UniqueRows": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "min_percent_threshold": 1
    }
  },
  "validmind.data_validation.TooManyZeroValues": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "max_percent_threshold": 0.03
    }
  },
  "validmind.data_validation.IQROutliersTable": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "threshold": 1.5
    }
  },
  "validmind.data_validation.IQROutliersBarPlot": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "threshold": 1.5,
      "fig_width": 800
    }
  },
  "validmind.data_validation.DescriptiveStatistics": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {}
  },
  "validmind.data_validation.PearsonCorrelationMatrix": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {}
  },
  "validmind.data_validation.HighPearsonCorrelation": {
    "inputs": {
      "dataset": "dataset"
    },
    "params": {
      "max_threshold": 0.3,
      "top_n_correlations": 10,
      "feature_columns": null
    }
  },
  "validmind.model_validation.ModelMetadata": {
    "inputs": {
      "model": "model"
    },
    "params": {}
  },
  "validmind.data_validation.DatasetSplit": {
    "inputs": {
      "datasets": "datasets"
    },
    "params": {}
  },
  "validmind.model_validation.sklearn.PopulationStabilityIndex": {
    "inputs": {
      "datasets": "datasets",
      "model": "model"
    },
    "params": {
      "num_bins": 10,
      "mode": "fixed"
    }
  },
  "validmind.model_validation.sklearn.ConfusionMatrix": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "threshold": 0.5
    }
  },
  "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "average": "macro"
    }
  },
  "validmind.model_validation.sklearn.ClassifierPerformance:out_of_sample": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "average": "macro"
    }
  },
  "validmind.model_validation.sklearn.PrecisionRecallCurve": {
    "inputs": {
      "model": "model",
      "dataset": "dataset"
    },
    "params": {}
  },
  "validmind.model_validation.sklearn.ROCCurve": {
    "inputs": {
      "model": "model",
      "dataset": "dataset"
    },
    "params": {}
  },
  "validmind.model_validation.sklearn.TrainingTestDegradation": {
    "inputs": {
      "datasets": "datasets",
      "model": "model"
    },
    "params": {
      "max_threshold": 0.1
    }
  },
  "validmind.model_validation.sklearn.MinimumAccuracy": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "min_threshold": 0.7
    }
  },
  "validmind.model_validation.sklearn.MinimumF1Score": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "min_threshold": 0.5
    }
  },
  "validmind.model_validation.sklearn.MinimumROCAUCScore": {
    "inputs": {
      "dataset": "dataset",
      "model": "model"
    },
    "params": {
      "min_threshold": 0.5
    }
  },
  "validmind.model_validation.sklearn.PermutationFeatureImportance": {
    "inputs": {
      "model": "model",
      "dataset": "dataset"
    },
    "params": {
      "fontsize": null,
      "figure_height": null
    }
  },
  "validmind.model_validation.sklearn.SHAPGlobalImportance": {
    "inputs": {
      "model": "model",
      "dataset": "dataset"
    },
    "params": {
      "kernel_explainer_samples": 10,
      "tree_or_linear_explainer_samples": 200,
      "class_of_interest": null
    }
  },
  "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
    "inputs": {
      "datasets": "datasets",
      "model": "model"
    },
    "params": {
      "features_columns": null,
      "metrics": null,
      "thresholds": null
    }
  },
  "validmind.model_validation.sklearn.OverfitDiagnosis": {
    "inputs": {
      "model": "model",
      "datasets": "datasets"
    },
    "params": {
      "metric": null,
      "cut_off_threshold": 0.04
    }
  },
  "validmind.model_validation.sklearn.RobustnessDiagnosis": {
    "inputs": {
      "datasets": "datasets",
      "model": "model"
    },
    "params": {
      "metric": null,
      "scaling_factor_std_dev_list": [
        0.1,
        0.2,
        0.3,
        0.4,
        0.5
      ],
      "performance_decay_threshold": 0.05
    }
  }
}

Update the config

The default config does not assign any inputs to the tests, but you can assign inputs to individual tests as needed, depending on the datasets and models you want to pass to them.

For this particular documentation template (binary classification), the ValidMind Library provides a sample configuration that can be used to populate the entire model documentation using the following inputs as placeholders:

  • A raw_dataset raw dataset
  • A train_dataset training dataset
  • A test_dataset test dataset
  • A trained model instance

As part of updating the config, you will need to ensure that the correct input_ids are used in the final config passed to run_documentation_tests().

from validmind.datasets.classification import customer_churn
from validmind.utils import preview_test_config

test_config = customer_churn.get_demo_test_config()
preview_test_config(test_config)

Using this sample configuration, let's finish populating model documentation by running all tests for the Model Development section of the documentation.

Recall that the training and test datasets in our exercise have the following input_id values:

  • train_dataset_final for the training dataset
  • test_dataset_final for the test dataset
config = {
    "validmind.model_validation.ModelMetadata": {
        "inputs": {"model": "log_reg_model_v1"},
    },
    "validmind.data_validation.DatasetSplit": {
        "inputs": {"datasets": ["train_dataset_final", "test_dataset_final"]},
    },
    "validmind.model_validation.sklearn.PopulationStabilityIndex": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {"num_bins": 10, "mode": "fixed"},
    },
    "validmind.model_validation.sklearn.ConfusionMatrix": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "my_test_provider.ConfusionMatrix": {
        "inputs": {"dataset": "test_dataset_final", "model": "log_reg_model_v1"},
    },
    "my_custom_tests.ConfusionMatrix:test_dataset_normalized": {
        "inputs": {"dataset": "test_dataset_final", "model": "log_reg_model_v1"},
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "train_dataset_final"}
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:out_of_sample": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"}
    },
    "validmind.model_validation.sklearn.PrecisionRecallCurve": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.ROCCurve": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.TrainingTestDegradation": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "metrics": ["accuracy", "precision", "recall", "f1"],
            "max_threshold": 0.1,
        },
    },
    "validmind.model_validation.sklearn.MinimumAccuracy": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.7},
    },
    "validmind.model_validation.sklearn.MinimumF1Score": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.5},
    },
    "validmind.model_validation.sklearn.MinimumROCAUCScore": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.5},
    },
    "validmind.model_validation.sklearn.PermutationFeatureImportance": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.SHAPGlobalImportance": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"kernel_explainer_samples": 10},
    },
    "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "thresholds": {"accuracy": 0.75, "precision": 0.5, "recall": 0.5, "f1": 0.7}
        },
    },
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {"cut_off_percentage": 4},
    },
    "validmind.model_validation.sklearn.RobustnessDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "scaling_factor_std_dev_list": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
            "accuracy_decay_threshold": 4,
        },
    },
}


full_suite = vm.run_documentation_tests(
    section="model_development",
    config=config,
)
2025-05-12 22:06:34,724 - WARNING(validmind.vm_models.test_suite.runner): Config key 'my_test_provider.ConfusionMatrix' does not match a test_id in the template.
    Ensure you registered a content block with the correct content_id in the template
    The configuration for this test will be ignored.
2025-05-12 22:06:34,725 - WARNING(validmind.vm_models.test_suite.runner): Config key 'my_custom_tests.ConfusionMatrix:test_dataset_normalized' does not match a test_id in the template.
    Ensure you registered a content block with the correct content_id in the template
    The configuration for this test will be ignored.

In summary

In this final notebook, you learned how to:

  • Run run_documentation_tests() to ensure that your custom test results are included in your documentation
  • View and update the configuration for the entire model documentation template

With our ValidMind for model development series of notebooks, you learned how to document a model end-to-end with the ValidMind Library by running through some common scenarios in a typical model development setting:

  • Running out-of-the-box tests
  • Documenting your model by adding evidence to model documentation
  • Extending the capabilities of the ValidMind Library by implementing custom tests
  • Ensuring that the documentation is complete by running all tests in the documentation template

Next steps

Work with your model documentation

Now that you've logged all your test results and generated a draft for your model documentation, head to the ValidMind Platform to make qualitative edits, view guidelines, collaborate with validators, and submit your model documentation for approval when it's ready. Learn more: Working with model documentation

Learn more

Now that you're familiar with the basics, you can explore the following notebooks to get a deeper understanding of how the ValidMind Library allows you to generate model documentation for any use case:

Use cases

More how-to guides and code samples

Discover more learning resources

All notebook samples can be found in the following directories of the ValidMind Library GitHub repository: