ValidMind for model development 2 — Start the model development process

Learn how to use ValidMind for your end-to-end model documentation process with our series of four introductory notebooks. In this second notebook, you'll run tests and investigate results, then add the results or evidence to your documentation.

You'll become familiar with the individual tests available in ValidMind, as well as how to run them and change parameters as necessary. Using ValidMind's repository of individual tests as building blocks helps you ensure that a model is being built appropriately.

For a full list of out-of-the-box tests, refer to our Test descriptions or try the interactive Test sandbox.

Learn by doing

Our course tailor-made for developers new to ValidMind combines this series of notebooks with more a more in-depth introduction to the ValidMind Platform — Developer Fundamentals

Prerequisites

In order to log test results or evidence to your model documentation with this notebook, you'll need to first have:

Need help with the above steps?

Refer to the first notebook in this series: 1 — Set up the ValidMind Library

Setting up

Initialize the ValidMind Library

First, let's connect up the ValidMind Library to our model we previously registered in the ValidMind Platform:

  1. In a browser, log in to ValidMind.

  2. In the left sidebar, navigate to Inventory and select the model you registered for this "ValidMind for model development" series of notebooks.

  3. Go to Getting Started and click Copy snippet to clipboard.

Next, load your model identifier credentials from an .env file or replace the placeholder with your own code snippet:

# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
Note: you may need to restart the kernel to use updated packages.
2025-12-31 01:01:03,425 - INFO(validmind.api_client): 🎉 Connected to ValidMind!
📊 Model: [ValidMind Academy] Model development (ID: cmalgf3qi02ce199qm3rdkl46)
📁 Document Type: model_documentation

Import sample dataset

Then, let's import the public Bank Customer Churn Prediction dataset from Kaggle.

In our below example, note that:

  • The target column, Exited has a value of 1 when a customer has churned and 0 otherwise.
  • The ValidMind Library provides a wrapper to automatically load the dataset as a Pandas DataFrame object. A Pandas Dataframe is a two-dimensional tabular data structure that makes use of rows and columns.
from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
raw_df.head()
Loaded demo dataset with: 

    • Target column: 'Exited' 
    • Class labels: {'0': 'Did not exit', '1': 'Exited'}
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0

Identify qualitative tests

Next, let's say we want to do some data quality assessments by running a few individual tests.

Use the vm.tests.list_tests() function introduced by the first notebook in this series in combination with vm.tests.list_tags() and vm.tests.list_tasks() to find which prebuilt tests are relevant for data quality assessment:

  • tasks represent the kind of modeling task associated with a test. Here we'll focus on classification tasks.
  • tags are free-form descriptions providing more details about the test, for example, what category the test falls into. Here we'll focus on the data_quality tag.
# Get the list of available task types
sorted(vm.tests.list_tasks())
['classification',
 'clustering',
 'data_validation',
 'feature_extraction',
 'monitoring',
 'nlp',
 'regression',
 'residual_analysis',
 'text_classification',
 'text_generation',
 'text_qa',
 'text_summarization',
 'time_series_forecasting',
 'visualization']
# Get the list of available tags
sorted(vm.tests.list_tags())
['AUC',
 'analysis',
 'anomaly_detection',
 'bias_and_fairness',
 'binary_classification',
 'calibration',
 'categorical_data',
 'classification',
 'classification_metrics',
 'clustering',
 'correlation',
 'credit_risk',
 'data_analysis',
 'data_distribution',
 'data_quality',
 'data_validation',
 'descriptive_statistics',
 'dimensionality_reduction',
 'distribution',
 'embeddings',
 'feature_importance',
 'feature_selection',
 'few_shot',
 'forecasting',
 'frequency_analysis',
 'kmeans',
 'linear_regression',
 'llm',
 'logistic_regression',
 'metadata',
 'model_comparison',
 'model_diagnosis',
 'model_explainability',
 'model_interpretation',
 'model_performance',
 'model_predictions',
 'model_selection',
 'model_training',
 'model_validation',
 'multiclass_classification',
 'nlp',
 'normality',
 'numerical_data',
 'outliers',
 'qualitative',
 'rag_performance',
 'ragas',
 'regression',
 'retrieval_performance',
 'scorecard',
 'seasonality',
 'senstivity_analysis',
 'sklearn',
 'stationarity',
 'statistical_test',
 'statistics',
 'statsmodels',
 'tabular_data',
 'text_data',
 'threshold_optimization',
 'time_series_data',
 'unit_root_test',
 'visualization',
 'zero_shot']

You can pass tags and tasks as parameters to the vm.tests.list_tests() function to filter the tests based on the tags and task types.

For example, to find tests related to tabular data quality for classification models, you can call list_tests() like this:

vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])
ID Name Description Has Figure Has Table Required Inputs Params Tags Tasks
validmind.data_validation.ClassImbalance Class Imbalance Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model.... True True ['dataset'] {'min_percent_threshold': {'type': 'int', 'default': 10}} ['tabular_data', 'binary_classification', 'multiclass_classification', 'data_quality'] ['classification']
validmind.data_validation.DescriptiveStatistics Descriptive Statistics Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's... False True ['dataset'] {} ['tabular_data', 'time_series_data', 'data_quality'] ['classification', 'regression']
validmind.data_validation.Duplicates Duplicates Tests dataset for duplicate entries, ensuring model reliability via data quality verification.... False True ['dataset'] {'min_threshold': {'type': '_empty', 'default': 1}} ['tabular_data', 'data_quality', 'text_data'] ['classification', 'regression']
validmind.data_validation.HighCardinality High Cardinality Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting.... False True ['dataset'] {'num_threshold': {'type': 'int', 'default': 100}, 'percent_threshold': {'type': 'float', 'default': 0.1}, 'threshold_type': {'type': 'str', 'default': 'percent'}} ['tabular_data', 'data_quality', 'categorical_data'] ['classification', 'regression']
validmind.data_validation.HighPearsonCorrelation High Pearson Correlation Identifies highly correlated feature pairs in a dataset suggesting feature redundancy or multicollinearity.... False True ['dataset'] {'max_threshold': {'type': 'float', 'default': 0.3}, 'top_n_correlations': {'type': 'int', 'default': 10}, 'feature_columns': {'type': 'list', 'default': None}} ['tabular_data', 'data_quality', 'correlation'] ['classification', 'regression']
validmind.data_validation.MissingValues Missing Values Evaluates dataset quality by ensuring missing value ratio across all features does not exceed a set threshold.... False True ['dataset'] {'min_threshold': {'type': 'int', 'default': 1}} ['tabular_data', 'data_quality'] ['classification', 'regression']
validmind.data_validation.MissingValuesBarPlot Missing Values Bar Plot Assesses the percentage and distribution of missing values in the dataset via a bar plot, with emphasis on... True False ['dataset'] {'threshold': {'type': 'int', 'default': 80}, 'fig_height': {'type': 'int', 'default': 600}} ['tabular_data', 'data_quality', 'visualization'] ['classification', 'regression']
validmind.data_validation.Skewness Skewness Evaluates the skewness of numerical data in a dataset to check against a defined threshold, aiming to ensure data... False True ['dataset'] {'max_threshold': {'type': '_empty', 'default': 1}} ['data_quality', 'tabular_data'] ['classification', 'regression']
validmind.plots.BoxPlot Box Plot Generates customizable box plots for numerical features in a dataset with optional grouping using Plotly.... True False ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'group_by': {'type': 'Optional', 'default': None}, 'width': {'type': 'int', 'default': 1800}, 'height': {'type': 'int', 'default': 1200}, 'colors': {'type': 'Optional', 'default': None}, 'show_outliers': {'type': 'bool', 'default': True}, 'title_prefix': {'type': 'str', 'default': 'Box Plot of'}} ['tabular_data', 'visualization', 'data_quality'] ['classification', 'regression', 'clustering']
validmind.plots.HistogramPlot Histogram Plot Generates customizable histogram plots for numerical features in a dataset using Plotly.... True False ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'bins': {'type': 'Union', 'default': 30}, 'color': {'type': 'str', 'default': 'steelblue'}, 'opacity': {'type': 'float', 'default': 0.7}, 'show_kde': {'type': 'bool', 'default': True}, 'normalize': {'type': 'bool', 'default': False}, 'log_scale': {'type': 'bool', 'default': False}, 'title_prefix': {'type': 'str', 'default': 'Histogram of'}, 'width': {'type': 'int', 'default': 1200}, 'height': {'type': 'int', 'default': 800}, 'n_cols': {'type': 'int', 'default': 2}, 'vertical_spacing': {'type': 'float', 'default': 0.15}, 'horizontal_spacing': {'type': 'float', 'default': 0.1}} ['tabular_data', 'visualization', 'data_quality'] ['classification', 'regression', 'clustering']
validmind.stats.DescriptiveStats Descriptive Stats Provides comprehensive descriptive statistics for numerical features in a dataset.... False True ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'include_advanced': {'type': 'bool', 'default': True}, 'confidence_level': {'type': 'float', 'default': 0.95}} ['tabular_data', 'statistics', 'data_quality'] ['classification', 'regression', 'clustering']
Want to learn more about navigating ValidMind tests?

Refer to our notebook outlining the utilities available for viewing and understanding available ValidMind tests: Explore tests

Initialize the ValidMind datasets

With the individual tests we want to run identified, the next step is to connect your data with a ValidMind Dataset object. This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind, but you only need to do it once per dataset.

Initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module. For this example, we'll pass in the following arguments:

  • dataset — The raw dataset that you want to provide as input to tests.
  • input_id — A unique identifier that allows tracking what inputs are used when running each individual test.
  • target_column — A required argument if tests require access to true values. This is the name of the target column in the dataset.
# vm_raw_dataset is now a VMDataset object that you can pass to any ValidMind test
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column="Exited",
)

Running tests

Now that we know how to initialize a ValidMind dataset object, we're ready to run some tests!

You run individual tests by calling the run_test function provided by the validmind.tests module. For the examples below, we'll pass in the following arguments:

  • test_id — The ID of the test to run, as seen in the ID column when you run list_tests.
  • params — A dictionary of parameters for the test. These will override any default_params set in the test definition.

Run tabular data tests

The inputs expected by a test can also be found in the test definition — let's take validmind.data_validation.DescriptiveStatistics as an example.

Note that the output of the describe_test() function below shows that this test expects a dataset as input:

vm.tests.describe_test("validmind.data_validation.DescriptiveStatistics")
Test: Descriptive Statistics ('validmind.data_validation.DescriptiveStatistics')

Now, let's run a few tests to assess the quality of the dataset:

result = vm.tests.run_test(
    test_id="validmind.data_validation.DescriptiveStatistics",
    inputs={"dataset": vm_raw_dataset},
)

Descriptive Statistics

Descriptive Statistics is designed to provide a comprehensive summary of both numerical and categorical data within a dataset. The primary purpose of this test is to visualize the overall distribution of the variables, which aids in understanding the model's behavior and predicting its performance. By summarizing key statistics such as count, mean, standard deviation, and frequency, the test offers insights into the data's characteristics and potential anomalies.

The test operates by utilizing two in-built functions of pandas dataframes: describe() for numerical fields and value_counts() for categorical fields. The describe() function extracts summary statistics for numerical data, including count, mean, standard deviation, minimum, and maximum values, as well as percentiles. These metrics provide a snapshot of the data's central tendency, dispersion, and range. For categorical data, value_counts() calculates the count of each category, the number of unique values, the most common value, and its frequency. This helps identify the distribution and dominance of categories within the dataset. The results are formatted into two tables, one for numerical and another for categorical variables, offering a clear summary of the dataset's main characteristics.

The primary advantages of this test include its ability to provide a detailed overview of the dataset, highlighting the distribution and characteristics of the variables. It is versatile and robust, applicable to both numerical and categorical data, and helps identify crucial anomalies such as outliers, extreme skewness, or lack of diversity. These insights are vital for understanding model behavior during testing and validation. By offering a high-level summary, the test aids in identifying potential data quality issues and informs further analysis.

It should be noted that while this test offers a high-level overview of the data, it may not detect subtle correlations or complex patterns. It does not provide insights into the relationships between variables and should be used in conjunction with other statistical tests for a comprehensive understanding of the model's data. Additionally, skewed data or significant outliers can represent high risk, as indicated by a significant difference between the mean and median for numerical data. For categorical data, a lack of diversity or overdominance of a single category can also indicate high risk.

This test shows the results in two tables: one for numerical variables and another for categorical variables. The numerical variables table includes columns for count, mean, standard deviation, minimum, and maximum values, as well as percentiles (25%, 50%, 75%, 90%, and 95%). These metrics provide a detailed view of the data's distribution, central tendency, and variability. The categorical variables table includes columns for count, number of unique values, the most common value, its frequency, and the proportion of the most frequent value relative to the total. This helps identify the distribution and dominance of categories within the dataset. Notable observations include the range and scale of values, with specific attention to any significant outliers or skewness in the data.

The test results reveal the following key insights:

  • Credit Score Distribution: The credit score data shows a mean of 650.16 with a standard deviation of 96.85, indicating a moderate spread around the mean. The scores range from 350 to 850, with the median at 652, suggesting a relatively normal distribution.
  • Age Variability: The age data has a mean of 38.95 and a standard deviation of 10.46, with ages ranging from 18 to 92. The median age is 37, indicating a slightly younger population skew.
  • Balance Characteristics: The balance data shows a mean of 76,434.10 with a high standard deviation of 62,612.25, indicating significant variability. The median balance is 97,264, with a notable number of zero balances, suggesting a skewed distribution.
  • Geography Dominance: In the categorical data, France is the most common geography, representing 50.12% of the dataset, indicating a potential bias towards this region.
  • Gender Distribution: The gender data shows a slight male dominance, with males representing 54.95% of the dataset, suggesting a relatively balanced gender distribution.

Based on these results, the dataset exhibits a diverse range of numerical values with some skewness and variability, particularly in balance and credit score data. The categorical data shows a dominance of certain categories, such as France in geography, which could influence model behavior. These insights suggest that while the dataset provides a broad representation of variables, attention should be given to the skewness and dominance observed, as they may impact the model's performance and generalizability. Understanding these characteristics is crucial for interpreting the model's predictions and ensuring robust performance across different scenarios.

Tables

Numerical Variables

Name Count Mean Std Min 25% 50% 75% 90% 95% Max
CreditScore 8000.0 650.1596 96.8462 350.0 583.0 652.0 717.0 778.0 813.0 850.0
Age 8000.0 38.9489 10.4590 18.0 32.0 37.0 44.0 53.0 60.0 92.0
Tenure 8000.0 5.0339 2.8853 0.0 3.0 5.0 8.0 9.0 9.0 10.0
Balance 8000.0 76434.0965 62612.2513 0.0 0.0 97264.0 128045.0 149545.0 162488.0 250898.0
NumOfProducts 8000.0 1.5325 0.5805 1.0 1.0 1.0 2.0 2.0 2.0 4.0
HasCrCard 8000.0 0.7026 0.4571 0.0 0.0 1.0 1.0 1.0 1.0 1.0
IsActiveMember 8000.0 0.5199 0.4996 0.0 0.0 1.0 1.0 1.0 1.0 1.0
EstimatedSalary 8000.0 99790.1880 57520.5089 12.0 50857.0 99505.0 149216.0 179486.0 189997.0 199992.0

Categorical Variables

Name Count Number of Unique Values Top Value Top Value Frequency Top Value Frequency %
Geography 8000.0 3.0 France 4010.0 50.12
Gender 8000.0 2.0 Male 4396.0 54.95
result2 = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_raw_dataset},
    params={"min_percent_threshold": 30},
)

❌ Class Imbalance

Class Imbalance is designed to evaluate the distribution of target classes in a dataset used by a machine learning model. Its primary purpose is to ensure that the classes are not overly skewed, which could lead to bias in the model's predictions. A balanced training dataset is crucial to avoid creating a model that performs well for the majority class but poorly for the minority class.

The test operates by calculating the frequency of each class in the target column of the dataset, expressed as a percentage. It checks whether each class appears in at least a set minimum percentage of the total records, with the default threshold set at 10%. This involves counting the occurrences of each class and dividing by the total number of records to obtain the percentage. The test then compares these percentages against the threshold to determine if any class is under-represented. A class is marked as high risk if its percentage falls below the threshold, indicating potential imbalance. The typical range for these percentages is from 0% to 100%, with values below the threshold considered poor.

The primary advantages of this test include its ability to quickly identify under-represented classes that could affect model efficiency. The straightforward calculation makes it swift and easy to implement. It is highly informative, not only spotting imbalance but also quantifying the degree of imbalance. The adjustable threshold allows for flexibility, adapting to different use-cases or domain-specific needs. Additionally, the test provides a visual plot showing class proportions, enhancing interpretability and comprehension of the data.

It should be noted that the test might struggle with datasets containing a high number of classes, where imbalance could be inherent. Sensitivity to the threshold value might lead to incorrect imbalance detection if set too high. The test does not account for varying costs or impacts of misclassifying different classes, which can vary by application. While it identifies imbalances, it does not offer methods to address them. The test is applicable only for classification tasks and unsuitable for regression or clustering.

This test shows the results in both tabular and graphical formats. The table presents the percentage of rows for each class and whether they pass or fail the threshold test. The plot visually represents the class distribution, with the y-axis showing the percentage and the x-axis representing the classes. The table indicates that class '0' comprises 79.80% of the dataset and passes the threshold, while class '1' comprises 20.20% and fails. The plot corroborates this, showing a significant disparity between the two classes, with class '0' dominating the dataset. The threshold for passing is set at 30%, highlighting the imbalance as class '1' falls below this mark.

The test results reveal the following key insights:

  • Dominance of Class '0': Class '0' constitutes 79.80% of the dataset, significantly exceeding the 30% threshold, indicating a strong presence.
  • Under-representation of Class '1': Class '1' makes up only 20.20% of the dataset, failing the threshold and highlighting a potential imbalance issue.

Based on these results, the dataset exhibits a notable class imbalance, with class '0' being predominant and class '1' under-represented. This imbalance could lead to biased model predictions, favoring the majority class. The visual and tabular data both emphasize the need for addressing this imbalance to ensure fair and accurate model performance across all classes. The insights suggest that without intervention, the model may struggle to accurately predict the minority class, impacting its overall effectiveness.

Parameters:

{
  "min_percent_threshold": 30
}
            

Tables

Exited Class Imbalance

Exited Percentage of Rows (%) Pass/Fail
0 79.80% Pass
1 20.20% Fail

Figures

ValidMind Figure validmind.data_validation.ClassImbalance:25db

The output above shows that the class imbalance test did not pass according to the value we set for min_percent_threshold.

To address this issue, we'll re-run the test on some processed data. In this case let's apply a very simple rebalancing technique to the dataset:

import pandas as pd

raw_copy_df = raw_df.sample(frac=1)  # Create a copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)

With this new balanced dataset, you can re-run the individual test to see if it now passes the class imbalance test requirement.

As this is technically a different dataset, remember to first initialize a new ValidMind Dataset object to pass in as input as required by run_test():

# Register new data and now 'balanced_raw_dataset' is the new dataset object of interest
vm_balanced_raw_dataset = vm.init_dataset(
    dataset=balanced_raw_df,
    input_id="balanced_raw_dataset",
    target_column="Exited",
)
# Pass the initialized `balanced_raw_dataset` as input into the test run
result = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_balanced_raw_dataset},
    params={"min_percent_threshold": 30},
)

✅ Class Imbalance

Class Imbalance is designed to evaluate the distribution of target classes in a dataset used by a machine learning model. Its primary purpose is to ensure that the classes are not overly skewed, which could lead to bias in the model's predictions. A balanced training dataset is crucial to avoid creating a model that is biased with high accuracy for the majority class and low accuracy for the minority class.

The test operates by calculating the frequency of each class in the target column of the dataset, expressed as a percentage. It checks whether each class appears in at least a set minimum percentage of the total records, with the default threshold set to 10%. This involves counting the occurrences of each class and dividing by the total number of records to obtain the percentage. The test then compares these percentages against the threshold to determine if any class is under-represented. A class is marked as high risk if it falls below the threshold, indicating potential imbalance. The typical range for these percentages is from 0% to 100%, with values below the threshold considered poor.

The primary advantages of this test include its ability to spot under-represented classes that could affect the efficiency of a machine learning model. The calculation is straightforward and swift, making it easy to implement. The test is highly informative as it not only spots imbalance but also quantifies the degree of imbalance. The adjustable threshold allows flexibility and adaptation to different use-cases or domain-specific needs. Additionally, the test provides a visually insightful plot showing the classes and their corresponding proportions, enhancing interpretability and comprehension of the data.

It should be noted that the test might struggle to perform well or provide vital insights for datasets with a high number of classes, where imbalance could be inevitable due to inherent class distribution. Sensitivity to the threshold value might result in faulty detection of imbalance if the threshold is set excessively high. Regardless of the percentage threshold, it doesn't account for varying costs or impacts of misclassifying different classes, which might fluctuate based on specific applications or domains. While it can identify imbalances in class distribution, it doesn't provide direct methods to address or correct these imbalances. The test is only applicable for classification operations and unsuitable for regression or clustering tasks.

This test shows a table and a plot representing the class distribution in the dataset. The table titled "Exited Class Imbalance" displays two classes, 0 and 1, each with a percentage of rows at 50.00%, and both classes pass the test based on the set threshold of 30%. The plot visually confirms this balance, with two bars of equal height representing the classes, each reaching the 0.5 mark on the y-axis, indicating 50% representation. The x-axis represents the class labels, while the y-axis shows the percentage. The equal distribution of the bars suggests a balanced dataset with no class falling below the threshold, indicating no immediate risk of class imbalance.

The test results reveal the following key insights:

  • Balanced Class Distribution: Both classes, 0 and 1, have an equal representation of 50% in the dataset, indicating a perfectly balanced class distribution.
  • Threshold Compliance: Each class exceeds the minimum percentage threshold of 30%, passing the test and suggesting no risk of class imbalance.

Based on these results, the dataset demonstrates a balanced class distribution, with both classes equally represented at 50%. This balance suggests that the model is unlikely to be biased towards any particular class, as each class meets and exceeds the minimum threshold requirement. The equal distribution ensures that the model can potentially perform well across different classes without favoring one over the other. This balance is crucial for maintaining the model's predictive accuracy and fairness, particularly in applications where class representation is critical.

Parameters:

{
  "min_percent_threshold": 30
}
            

Tables

Exited Class Imbalance

Exited Percentage of Rows (%) Pass/Fail
0 50.00% Pass
1 50.00% Pass

Figures

ValidMind Figure validmind.data_validation.ClassImbalance:aad8

Utilize test output

You can utilize the output from a ValidMind test for further use, for example, if you want to remove highly correlated features. Removing highly correlated features helps make the model simpler, more stable, and easier to understand.

Below we demonstrate how to retrieve the list of features with the highest correlation coefficients and use them to reduce the final list of features for modeling.

First, we'll run validmind.data_validation.HighPearsonCorrelation with the balanced_raw_dataset we initialized previously as input as is for comparison with later runs:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)

❌ High Pearson Correlation

High Pearson Correlation is designed to identify highly correlated feature pairs in a dataset, suggesting feature redundancy or multicollinearity. The primary purpose of this test is to measure the linear relationship between features, which can indicate potential issues such as feature redundancy or multicollinearity that may affect the performance and interpretability of machine learning models.

The test operates by calculating pairwise Pearson correlations for all features in the dataset. It measures the strength and direction of the linear relationship between two variables, with the correlation coefficient ranging from -1 to 1. A value close to 1 indicates a strong positive linear relationship, while a value close to -1 indicates a strong negative linear relationship. A value around 0 suggests no linear relationship. The test sorts these correlations, removes duplicates and self-correlations, and evaluates them against a pre-set threshold, which is 0.3 by default. If the absolute value of a correlation exceeds this threshold, it is flagged as a potential issue. The test also returns the top n strongest correlations, providing a clear view of the most significant relationships in the dataset.

The primary advantages of this test include its ability to quickly and effectively identify linear relationships between feature pairs, which is crucial for understanding potential multicollinearity issues. This transparency allows developers and risk management teams to address these issues early in the model development process. The test's output is straightforward, displaying pairs of correlated variables along with their Pearson correlation coefficients and a Pass or Fail status. This makes it particularly useful for scenarios where model interpretability and performance are critical, as it helps ensure that the model is not unduly influenced by redundant features.

It should be noted that the test is limited to identifying linear relationships and does not account for nonlinear dependencies, which may also be present in the data. Additionally, the Pearson correlation coefficient is sensitive to outliers, which can skew the results and lead to misleading conclusions. The test only identifies redundancy within feature pairs, potentially missing more complex relationships involving three or more variables. High correlation coefficients exceeding the threshold indicate a risk of multicollinearity, which can lead to model overfitting and reduced interpretability.

This test shows the results in a tabular format, where each row represents a pair of features along with their correlation coefficient and a Pass or Fail status. The table includes columns for the feature pair, the calculated Pearson correlation coefficient, and whether the correlation passes the threshold test. The coefficients range from -0.1972 to 0.3489, with the threshold set at 0.3. The table highlights the feature pair (Age, Exited) with a coefficient of 0.3489, which fails the test, indicating a potential issue with multicollinearity. Other feature pairs, such as (IsActiveMember, Exited) and (Balance, NumOfProducts), have lower coefficients and pass the test, suggesting weaker linear relationships.

The test results reveal the following key insights:

  • Age and Exited Correlation: The feature pair (Age, Exited) has a correlation coefficient of 0.3489, which exceeds the threshold and fails the test, indicating a strong linear relationship that may suggest multicollinearity.
  • IsActiveMember and Exited Correlation: The feature pair (IsActiveMember, Exited) has a correlation coefficient of -0.1972, which passes the test, indicating a moderate negative linear relationship that does not pose a significant risk of multicollinearity.
  • Balance and NumOfProducts Correlation: The feature pair (Balance, NumOfProducts) has a correlation coefficient of -0.1758, passing the test and suggesting a weak negative linear relationship.
  • Overall Correlation Patterns: Most feature pairs exhibit weak correlations, with coefficients close to zero, indicating limited linear relationships and reduced risk of multicollinearity.

Based on these results, the test highlights a significant linear relationship between Age and Exited, which may require further investigation to address potential multicollinearity issues. The other feature pairs show weaker correlations, suggesting that they are less likely to contribute to redundancy or multicollinearity in the model. These insights provide a clearer understanding of the dataset's structure, allowing for more informed decisions regarding feature selection and model development. The results emphasize the importance of addressing the identified high correlation to ensure the model's robustness and interpretability.

Parameters:

{
  "max_threshold": 0.3
}
            

Tables

Columns Coefficient Pass/Fail
(Age, Exited) 0.3489 Fail
(IsActiveMember, Exited) -0.1972 Pass
(Balance, NumOfProducts) -0.1758 Pass
(Balance, Exited) 0.1637 Pass
(Age, Balance) 0.0552 Pass
(NumOfProducts, Exited) -0.0513 Pass
(HasCrCard, IsActiveMember) -0.0445 Pass
(Age, NumOfProducts) -0.0405 Pass
(Tenure, EstimatedSalary) 0.0386 Pass
(NumOfProducts, IsActiveMember) 0.0381 Pass

The output above shows that the test did not pass according to the value we set for max_threshold.

corr_result is an object of type TestResult. We can inspect the result object to see what the test has produced:

print(type(corr_result))
print("Result ID: ", corr_result.result_id)
print("Params: ", corr_result.params)
print("Passed: ", corr_result.passed)
print("Tables: ", corr_result.tables)
<class 'validmind.vm_models.result.result.TestResult'>
Result ID:  validmind.data_validation.HighPearsonCorrelation
Params:  {'max_threshold': 0.3}
Passed:  False
Tables:  [ResultTable]

Let's remove the highly correlated features and create a new VM dataset object.

We'll begin by checking out the table in the result and extracting a list of features that failed the test:

# Extract table from `corr_result.tables`
features_df = corr_result.tables[0].data
features_df
Columns Coefficient Pass/Fail
0 (Age, Exited) 0.3489 Fail
1 (IsActiveMember, Exited) -0.1972 Pass
2 (Balance, NumOfProducts) -0.1758 Pass
3 (Balance, Exited) 0.1637 Pass
4 (Age, Balance) 0.0552 Pass
5 (NumOfProducts, Exited) -0.0513 Pass
6 (HasCrCard, IsActiveMember) -0.0445 Pass
7 (Age, NumOfProducts) -0.0405 Pass
8 (Tenure, EstimatedSalary) 0.0386 Pass
9 (NumOfProducts, IsActiveMember) 0.0381 Pass
# Extract list of features that failed the test
high_correlation_features = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
high_correlation_features
['(Age, Exited)']

Next, extract the feature names from the list of strings (example: (Age, Exited) > Age):

high_correlation_features = [feature.split(",")[0].strip("()") for feature in high_correlation_features]
high_correlation_features
['Age']

Now, it's time to re-initialize the dataset with the highly correlated features removed.

Note the use of a different input_id. This allows tracking the inputs used when running each individual test.

# Remove the highly correlated features from the dataset
balanced_raw_no_age_df = balanced_raw_df.drop(columns=high_correlation_features)

# Re-initialize the dataset object
vm_raw_dataset_preprocessed = vm.init_dataset(
    dataset=balanced_raw_no_age_df,
    input_id="raw_dataset_preprocessed",
    target_column="Exited",
)

Re-running the test with the reduced feature set should pass the test:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_preprocessed},
)

✅ High Pearson Correlation

High Pearson Correlation is designed to identify highly correlated feature pairs in a dataset, suggesting feature redundancy or multicollinearity. The primary purpose of this test is to measure the linear relationship between features, which can indicate potential issues such as feature redundancy or multicollinearity that may affect the performance and interpretability of machine learning models.

The test operates by calculating pairwise Pearson correlations for all features in the dataset. It measures the strength and direction of the linear relationship between two variables, with the correlation coefficient ranging from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The test sorts these correlations, removing duplicates and self-correlations, and evaluates them against a pre-set threshold (defaulted at 0.3). If the absolute value of a correlation exceeds this threshold, it suggests a significant linear relationship. The test then returns the top n strongest correlations, providing a Pass or Fail status based on the threshold.

The primary advantages of this test include its ability to quickly and simply identify relationships between feature pairs, which is crucial for early detection of multicollinearity issues that could disrupt model training. The test's transparent output displays pairs of correlated variables along with their Pearson correlation coefficients and Pass or Fail status, making it easy to interpret. This feature is particularly useful in scenarios where model interpretability is critical, as it helps in understanding which features may be redundant and could potentially be removed or adjusted to improve model performance.

It should be noted that the test is limited to identifying linear relationships and does not account for nonlinear dependencies, which could also affect model performance. Additionally, the Pearson correlation coefficient is sensitive to outliers, meaning that a few extreme values can significantly impact the results. The test is also limited to identifying redundancy within feature pairs and may not detect more complex relationships involving three or more variables. High correlation coefficients exceeding the threshold indicate a risk of multicollinearity, which can lead to model overfitting and reduced interpretability.

This test shows the results in a tabular format, listing feature pairs, their correlation coefficients, and a Pass or Fail status. Each row represents a pair of features, with the "Columns" field indicating the feature pair, the "Coefficient" field showing the Pearson correlation coefficient, and the "Pass/Fail" field indicating whether the correlation exceeds the threshold. The coefficients range from -0.1972 to 0.0386, all of which are below the threshold of 0.3, resulting in a Pass status for all pairs. Notable observations include the highest correlation of -0.1972 between "IsActiveMember" and "Exited" and the lowest of -0.0328 between "Tenure" and "IsActiveMember," indicating weak linear relationships across the dataset.

The test results reveal the following key insights:

  • Weak Correlations Across Features: All feature pairs exhibit weak correlations, with coefficients ranging from -0.1972 to 0.0386, indicating no significant linear relationships.
  • No Multicollinearity Detected: Since all correlation coefficients are below the threshold of 0.3, there is no evidence of multicollinearity among the features.
  • Balanced Feature Relationships: The distribution of correlation coefficients suggests balanced relationships among features, with no pair showing a dominant linear relationship.

Based on these results, the dataset does not exhibit significant linear relationships between feature pairs, suggesting that multicollinearity is not a concern. The weak correlations indicate that the features are relatively independent, which is beneficial for model interpretability and performance. This analysis provides confidence that the features can be used in modeling without the risk of redundancy or overfitting due to multicollinearity. The results support the robustness of the dataset for further analysis and model development.

Parameters:

{
  "max_threshold": 0.3
}
            

Tables

Columns Coefficient Pass/Fail
(IsActiveMember, Exited) -0.1972 Pass
(Balance, NumOfProducts) -0.1758 Pass
(Balance, Exited) 0.1637 Pass
(NumOfProducts, Exited) -0.0513 Pass
(HasCrCard, IsActiveMember) -0.0445 Pass
(Tenure, EstimatedSalary) 0.0386 Pass
(NumOfProducts, IsActiveMember) 0.0381 Pass
(Tenure, HasCrCard) 0.0367 Pass
(CreditScore, EstimatedSalary) -0.0332 Pass
(Tenure, IsActiveMember) -0.0328 Pass

You can also plot the correlation matrix to visualize the new correlation between features:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.PearsonCorrelationMatrix",
    inputs={"dataset": vm_raw_dataset_preprocessed},
)

Pearson Correlation Matrix

Pearson Correlation Matrix is designed to evaluate the extent of linear dependency between numerical variables in a dataset. The primary purpose is to identify potential redundancy by revealing high correlations, which can help in reducing the dimensionality of the dataset without significantly impacting the model's performance.

The test operates by generating a correlation matrix for all numerical variables using the Pearson correlation formula. This formula measures the linear relationship between two variables, producing a coefficient that ranges from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The test visualizes this matrix as a heat map, where the color intensity represents the magnitude and direction of the correlation. High correlations, typically above 0.7 in absolute terms, are highlighted to indicate potential redundancy.

The primary advantages of this test include its ability to detect and quantify the linearity of relationships between variables, which aids in identifying redundant variables. This can simplify models and potentially improve performance by reducing complexity. The heatmap visualization provides an intuitive overview of correlations, making it accessible even to users who may not be comfortable with numerical matrices. This visual representation helps in quickly identifying areas of concern or interest within the dataset.

It should be noted that this test is limited to detecting linear relationships, potentially missing non-linear dependencies that could also be important for dimensionality reduction. It measures only the degree of linear relationship, not the strength of one variable's effect on another. The chosen threshold of 0.7 for high correlation is somewhat arbitrary and might exclude valid dependencies with lower coefficients. Additionally, a large number of highly correlated variables can indicate redundancy, posing a risk of overfitting if not addressed.

This test shows a heat map representing the Pearson correlation coefficients between pairs of numerical variables in the dataset. The axes of the heat map list the variables, and each cell's color indicates the correlation coefficient between the corresponding pair of variables. The scale ranges from -1 to 1, with darker colors representing stronger correlations. Notably, the heat map highlights correlations above 0.7 in white, indicating significant linear relationships. In this particular dataset, most correlations are low, with values close to zero, suggesting minimal linear dependency between the variables. However, some variables, such as "CreditScore" and "Tenure," show perfect correlation with themselves, as expected.

The test results reveal the following key insights:

  • Low Overall Correlation: Most variable pairs exhibit low correlation coefficients, indicating minimal linear dependency.
  • Notable Correlation: The "Balance" and "Exited" variables show a moderate positive correlation of 0.16, suggesting a potential relationship worth further exploration.
  • Self-Correlation: Variables like "CreditScore" and "Tenure" show perfect self-correlation, as expected, confirming the accuracy of the matrix.

Based on these results, the dataset exhibits generally low linear correlations between variables, suggesting minimal redundancy. The moderate correlation between "Balance" and "Exited" may warrant further investigation to understand its implications for the model. Overall, the low correlation levels indicate that the dataset's dimensionality is unlikely to be significantly reduced through linear dependency analysis alone. This suggests that other methods, such as exploring non-linear relationships, may be necessary to achieve further dimensionality reduction or model simplification.

Figures

ValidMind Figure validmind.data_validation.PearsonCorrelationMatrix:8dec

Documenting test results

Now that we've done some analysis on two different datasets, we can use ValidMind to easily document why certain things were done to our raw data with testing to support it.

Every test result returned by the run_test() function has a .log() method that can be used to send the test results to the ValidMind Platform:

  • When using run_documentation_tests(), documentation sections will be automatically populated with the results of all tests registered in the documentation template.
  • When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.

To demonstrate how to add test results to your model documentation, we'll populate the entire Data Preparation section of the documentation using the clean vm_raw_dataset_preprocessed dataset as input, and then document an additional individual result for the highly correlated dataset vm_balanced_raw_dataset.

Run and log multiple tests

run_documentation_tests() allows you to run multiple tests at once and automatically log the results to your documentation. Below, we'll run the tests using the previously initialized vm_raw_dataset_preprocessed as input — this will populate the entire Data Preparation section for every test that is part of the documentation template.

For this example, we'll pass in the following arguments:

  • inputs: Any inputs to be passed to the tests.
  • config: A dictionary <test_id>:<test_config> that allows configuring each test individually. Each test config requires the following:
    • params: Individual test parameters.
    • inputs: Individual test inputs. This overrides any inputs passed from the run_documentation_tests() function.

When including explicit configuration for individual tests, you'll need to specify the inputs even if they mirror what is included in your global configuration.

# Individual test config with inputs specified
test_config = {
    "validmind.data_validation.ClassImbalance": {
        "params": {"min_percent_threshold": 30},
        "inputs": {"dataset": vm_raw_dataset_preprocessed},
    },
    "validmind.data_validation.HighPearsonCorrelation": {
        "params": {"max_threshold": 0.3},
        "inputs": {"dataset": vm_raw_dataset_preprocessed},
    },
}

# Global test config
tests_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_raw_dataset_preprocessed,
    },
    config=test_config,
    section=["data_preparation"],
)
Test suite complete!
26/26 (100.0%)

Test Suite Results: Binary Classification V2


Check out the updated documentation on ValidMind.

Template for binary classification models.

Data Preparation

Run and log an individual test

Next, we'll use the previously initialized vm_balanced_raw_dataset (that still has a highly correlated Age column) as input to run an individual test, then log the result to the ValidMind Platform.

When running individual tests, you can use a custom result_id to tag the individual result with a unique identifier:

  • This result_id can be appended to test_id with a : separator.
  • The balanced_raw_dataset result identifier will correspond to the balanced_raw_dataset input, the dataset that still has the Age column.
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)
result.log()

❌ High Pearson Correlation Balanced Raw Dataset

High Pearson Correlation: Balanced Raw Dataset is designed to identify highly correlated feature pairs in a dataset, which may suggest feature redundancy or multicollinearity. The primary purpose of this test is to detect linear relationships between features that could impact the performance and interpretability of machine learning models. By identifying these correlations, developers and risk management teams can address potential issues that may arise from multicollinearity, such as model overfitting and reduced interpretability.

The test operates by calculating pairwise Pearson correlation coefficients for all features in the dataset. The Pearson correlation coefficient is a statistical measure that quantifies the linear relationship between two variables, ranging from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The test sorts these coefficients, removes duplicate and self-correlations, and evaluates each pair against a pre-set threshold (0.3 in this case). If the absolute value of the coefficient exceeds this threshold, the pair is flagged as a potential issue. The test outputs the top n strongest correlations, providing a Pass or Fail status based on the threshold.

The primary advantages of this test include its ability to quickly and effectively identify linear relationships between feature pairs, which is crucial for early detection of multicollinearity issues. This transparency allows for straightforward interpretation of the results, as the test outputs a clear list of correlated variable pairs along with their correlation coefficients and Pass/Fail status. This makes it particularly useful in scenarios where model interpretability is critical, as it helps ensure that each feature's predictive power is authentic and not overshadowed by redundant variables.

It should be noted that the test is limited to detecting linear relationships and may not capture nonlinear dependencies between features. Additionally, the Pearson correlation coefficient is sensitive to outliers, which can skew results and lead to misleading interpretations. The test also focuses on pairwise correlations, potentially missing more complex multicollinearity involving three or more variables. High correlation coefficients exceeding the threshold indicate a risk of multicollinearity, which can lead to model overfitting and reduced interpretability.

This test shows the results in a tabular format, listing feature pairs, their Pearson correlation coefficients, and a Pass or Fail status based on the threshold of 0.3. The table provides a clear view of which feature pairs exhibit significant linear relationships. The columns represent the feature pairs, the calculated correlation coefficient, and the Pass/Fail status. The coefficients range from -0.1972 to 0.3489, with the highest correlation observed between "Age" and "Exited" at 0.3489, which fails the test due to exceeding the threshold. The table allows for easy identification of potential multicollinearity issues, with the Pass/Fail status indicating whether each pair's correlation is within acceptable limits.

The test results reveal the following key insights:

  • High Correlation Between Age and Exited: The feature pair "Age" and "Exited" shows a correlation coefficient of 0.3489, which exceeds the threshold and is marked as Fail, indicating a strong linear relationship that could affect model performance.
  • Low Correlation Among Other Features: Other feature pairs, such as "IsActiveMember" and "Exited" or "Balance" and "NumOfProducts," have correlation coefficients well below the threshold, indicating weak linear relationships and passing the test.
  • Overall Low Multicollinearity Risk: Most feature pairs exhibit low correlation coefficients, suggesting minimal risk of multicollinearity affecting the model's interpretability and performance.

Based on these results, the dataset shows a generally low risk of multicollinearity, with most feature pairs exhibiting weak linear relationships. The exception is the "Age" and "Exited" pair, which demonstrates a significant correlation that may warrant further investigation to ensure it does not adversely impact the model. The overall low correlation among other features suggests that the dataset is well-suited for modeling, with minimal redundancy and a clear distinction between the predictive power of individual variables. This analysis provides a comprehensive understanding of the dataset's feature relationships, guiding further model development and refinement.

Parameters:

{
  "max_threshold": 0.3
}
            

Tables

Columns Coefficient Pass/Fail
(Age, Exited) 0.3489 Fail
(IsActiveMember, Exited) -0.1972 Pass
(Balance, NumOfProducts) -0.1758 Pass
(Balance, Exited) 0.1637 Pass
(Age, Balance) 0.0552 Pass
(NumOfProducts, Exited) -0.0513 Pass
(HasCrCard, IsActiveMember) -0.0445 Pass
(Age, NumOfProducts) -0.0405 Pass
(Tenure, EstimatedSalary) 0.0386 Pass
(NumOfProducts, IsActiveMember) 0.0381 Pass
2025-12-31 01:03:26,426 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset does not exist in model's document
Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for this particular test ID.

That's expected, as when we run individual tests the results logged need to be manually added to your documentation within the ValidMind Platform.

Add individual test results to model documentation

With the test results logged, let's head to the model we connected to at the beginning of this notebook and insert our test results into the documentation (Need more help?):

  1. From the Inventory in the ValidMind Platform, go to the model you connected to earlier.

  2. In the left sidebar that appears for your model, click Documentation under Documents.

  3. Locate the Data Preparation section and click on 2.3. Correlations and Interactions to expand that section.

  4. Hover under the Pearson Correlation Matrix content block until a horizontal dashed line with a + button appears, indicating that you can insert a new block.

    Screenshot showing insert block button in model documentation

  5. Click + and then select Test-Driven Block under FROM LIBRARY:

    • Click on VM Library under TEST-DRIVEN in the left sidebar.
    • In the search bar, type in HighPearsonCorrelation.
    • Select HighPearsonCorrelation:balanced_raw_dataset as the test.

    A preview of the test gets shown:

    Screenshot showing the HighPearsonCorrelation test selected

  6. Finally, click Insert 1 Test Result to Document to add the test result to the documentation.

    Confirm that the individual results for the high correlation test has been correctly inserted into section 2.3. Correlations and Interactions of the documentation.

  7. Finalize the documentation by editing the test result's description block to explain the changes you made to the raw data and the reasons behind them as shown in the screenshot below:

    Screenshot showing the inserted High Pearson Correlation block

Model testing

So far, we've focused on the data assessment and pre-processing that usually occurs prior to any models being built. Now, let's instead assume we have already built a model and we want to incorporate some model results into our documentation.

Train simple logistic regression model

Using ValidMind tests, we'll train a simple logistic regression model on our dataset and evaluate its performance by using the LogisticRegression class from the sklearn.linear_model.

To start, let's grab the first few rows from the balanced_raw_no_age_df dataset with the highly correlated features removed we initialized earlier:

balanced_raw_no_age_df.head()
CreditScore Geography Gender Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
4721 850 Spain Male 7 115378.94 1 0 1 162087.82 0
6987 574 Spain Male 5 120355.00 1 1 0 137793.35 0
2183 620 France Male 8 0.00 2 1 1 145937.99 0
395 531 France Female 6 0.00 1 0 0 194998.34 1
7245 702 Germany Female 5 138597.54 2 1 1 41536.59 1

Before training the model, we need to encode the categorical features in the dataset:

  • Use the OneHotEncoder class from the sklearn.preprocessing module to encode the categorical features.
  • The categorical features in the dataset are Geography and Gender.
balanced_raw_no_age_df = pd.get_dummies(
    balanced_raw_no_age_df, columns=["Geography", "Gender"], drop_first=True
)
balanced_raw_no_age_df.head()
CreditScore Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited Geography_Germany Geography_Spain Gender_Male
4721 850 7 115378.94 1 0 1 162087.82 0 False True True
6987 574 5 120355.00 1 1 0 137793.35 0 False True True
2183 620 8 0.00 2 1 1 145937.99 0 False False True
395 531 6 0.00 1 0 0 194998.34 1 False False False
7245 702 5 138597.54 2 1 1 41536.59 1 True False False

We'll split our preprocessed dataset into training and testing, to help assess how well the model generalizes to unseen data:

  • We start by dividing our balanced_raw_no_age_df dataset into training and test subsets using train_test_split, with 80% of the data allocated to training (train_df) and 20% to testing (test_df).
  • From each subset, we separate the features (all columns except "Exited") into X_train and X_test, and the target column ("Exited") into y_train and y_test.
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(balanced_raw_no_age_df, test_size=0.20)

X_train = train_df.drop("Exited", axis=1)
y_train = train_df["Exited"]
X_test = test_df.drop("Exited", axis=1)
y_test = test_df["Exited"]

Then using GridSearchCV, we'll find the best-performing hyperparameters or settings and save them:

from sklearn.linear_model import LogisticRegression

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
from sklearn.model_selection import GridSearchCV

grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(X_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.

Initialize model evaluation objects

The last step for evaluating the model's performance is to initialize the ValidMind Dataset and Model objects in preparation for assigning model predictions to each dataset.

# Initialize the datasets into their own dataset objects
vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=train_df,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=test_df,
    target_column="Exited",
)

You'll also need to initialize a ValidMind model object (vm_model) that can be passed to other functions for analysis and tests on the data for each of our three models.

You simply initialize this model object with vm.init_model():

# Register the model
vm_model = vm.init_model(log_reg, input_id="log_reg_model_v1")

Assign predictions

Once the model has been registered you can assign model predictions to the training and testing datasets.

  • The assign_predictions() method from the Dataset object can link existing predictions to any number of models.
  • This method links the model's class prediction values and probabilities to our vm_train_ds and vm_test_ds datasets.

If no prediction values are passed, the method will compute predictions automatically:

vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)
2025-12-31 01:03:27,435 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2025-12-31 01:03:27,437 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2025-12-31 01:03:27,438 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2025-12-31 01:03:27,440 - INFO(validmind.vm_models.dataset.utils): Done running predict()
2025-12-31 01:03:27,443 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2025-12-31 01:03:27,443 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2025-12-31 01:03:27,445 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2025-12-31 01:03:27,446 - INFO(validmind.vm_models.dataset.utils): Done running predict()

Run the model evaluation tests

In this next example, we'll focus on running the tests within the Model Development section of the model documentation. Only tests associated with this section will be executed, and the corresponding results will be updated in the model documentation.

test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {
            "dataset": vm_train_ds,
            "model": vm_model,
        },
    }
}
results = vm.run_documentation_tests(
    section=["model_development"],
    inputs={
        "dataset": vm_test_ds,  # Any test that requires a single dataset will use vm_test_ds
        "model": vm_model,
        "datasets": (
            vm_train_ds,
            vm_test_ds,
        ),  # Any test that requires multiple datasets will use vm_train_ds and vm_test_ds
    },
    config=test_config,
)
Test suite complete!
34/34 (100.0%)

Test Suite Results: Binary Classification V2


Check out the updated documentation on ValidMind.

Template for binary classification models.

Model Development

In summary

In this second notebook, you learned how to:

Next steps

Integrate custom tests

Now that you're familiar with the basics of using the ValidMind Library to run and log tests to provide evidence for your model documentation, let's learn how to incorporate your own custom tests into ValidMind: 3 — Integrate custom tests