Add context to LLM-generated test descriptions

When you run ValidMind tests, test descriptions are automatically generated by an LLM using the test results, the test name, and the static test definition provided in the test's docstring. While this metadata offers a valuable high-level overview of each test, the insights produced by the LLM-based descriptions may not always align with your specific use case or incorporate your organization's policy requirements.

In this notebook, you'll learn how to add context to the generated descriptions by providing additional information about the test or the use case. Including custom use case context is useful when you want to highlight the intended use and technique of the model, or the institutional policies and standards specific to your use case.

Install the ValidMind Library

To install the library:

%pip install -q validmind

Initialize the ValidMind Library

ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet

  1. In a browser, log in to ValidMind.

  2. In the left sidebar, navigate to Model Inventory and click + Register Model.

  3. Enter the model details and click Continue. (Need more help?)

    For example, to register a model for use with this notebook, select:

    • Documentation template: Binary classification
    • Use case: Marketing/Sales - Attrition/Churn Management

    You can fill in other options according to your preference.

  4. Go to Getting Started and click Copy snippet to clipboard.

Next, load your model identifier credentials from an .env file or replace the placeholder with your own code snippet:

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
  # api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  # api_key = "...",
  # api_secret = "...",
  # model = "..."
)
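If you use the .env approach, you can optionally confirm that the credentials were picked up before initializing. This is a minimal sketch; the VM_API_* variable names are an assumption based on the usual dotenv setup, so verify them against your copied code snippet:

import os

# Optional sanity check: confirm the expected credentials are present in the
# environment before calling vm.init(). The variable names below are an
# assumption; verify them against your copied code snippet.
for var in ("VM_API_HOST", "VM_API_KEY", "VM_API_SECRET", "VM_API_MODEL"):
    print(f"{var} set: {var in os.environ}")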

Initialize the Python environment

After you've connected to your registered model in the ValidMind Platform, let's import the necessary libraries and set up your Python environment for data analysis:

import xgboost as xgb
import os

%matplotlib inline

Load the sample dataset

First, we'll import a sample ValidMind dataset and load it into a pandas DataFrame, a two-dimensional tabular data structure that makes use of rows and columns:

# Import the sample dataset from the library

from validmind.datasets.classification import customer_churn

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{customer_churn.target_column}' \n\t• Class labels: {customer_churn.class_labels}"
)

raw_df = customer_churn.load_data()
raw_df.head()

Preprocess the raw dataset

Then, we'll perform a number of operations to get the data ready for the subsequent steps:

  • Preprocess the data: Splits the DataFrame (raw_df) into multiple datasets (train_df, validation_df, and test_df) using customer_churn.preprocess to simplify preprocessing.
  • Separate features and targets: Drops the target column to create feature sets (x_train, x_val) and target sets (y_train, y_val).
  • Initialize XGBoost classifier: Creates an XGBClassifier object with early stopping rounds set to 10.
  • Set evaluation metrics: Specifies metrics for model evaluation as error, logloss, and auc.
  • Fit the model: Trains the model on x_train and y_train using the validation set (x_val, y_val). Verbose output is disabled.
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
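Optionally, you can sanity-check the fitted model on the validation split before handing it to ValidMind. This quick check isn't part of the workflow above and assumes scikit-learn is available in your environment:

from sklearn.metrics import accuracy_score, roc_auc_score

# Quick, optional sanity check of the fitted classifier on the validation split
val_preds = model.predict(x_val)
val_probs = model.predict_proba(x_val)[:, 1]

print(f"Validation accuracy: {accuracy_score(y_val, val_preds):.3f}")
print(f"Validation AUC: {roc_auc_score(y_val, val_probs):.3f}")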

Initialize the ValidMind objects

Initialize the datasets

Before you can run tests, you'll need to initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module.

We'll include the following arguments:

  • dataset — the raw dataset that you want to provide as input to tests
  • input_id - a unique identifier that allows tracking what inputs are used when running each individual test
  • target_column — a required argument if tests require access to true values. This is the name of the target column in the dataset
  • class_labels — an optional value to map predicted classes to class labels

With all datasets ready, you can now initialize the raw, training, and test datasets (raw_df, train_df and test_df) created earlier into their own dataset objects using vm.init_dataset():

vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, input_id="test_dataset", target_column=customer_churn.target_column
)

Initialize a model object

Additionally, you'll need to initialize a ValidMind model object (vm_model) that can be passed to other functions for analysis and tests on the data.

Simply initialize this model object with vm.init_model():

vm_model = vm.init_model(
    model,
    input_id="model",
)

Assign predictions to the datasets

We can now use the assign_predictions() method from the Dataset object to link existing predictions to any model.

If no prediction values are passed, the method will compute predictions automatically:

vm_train_ds.assign_predictions(
    model=vm_model,
)

vm_test_ds.assign_predictions(
    model=vm_model,
)
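If you've already computed predictions outside of ValidMind, you can link them explicitly instead of letting the method compute them. A minimal sketch, assuming the prediction_values argument accepts an array of precomputed predictions; use it in place of the automatic call above:

# Alternative (sketch): link precomputed predictions instead of recomputing them.
# The `prediction_values` argument name is an assumption; check the library reference.
x_test = test_df.drop(customer_churn.target_column, axis=1)

vm_test_ds.assign_predictions(
    model=vm_model,
    prediction_values=model.predict(x_test),
)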

Set custom context for test descriptions

Review default LLM-generated descriptions

By default, custom context for LLM-generated descriptions is disabled, meaning that the output will not include any additional context.

Let's generate an initial test description for the DatasetDescription test for comparison with later iterations:

vm.tests.run_test(
    "validmind.data_validation.DatasetDescription",
    inputs={
        "dataset": vm_raw_dataset,
    },
)

Enable use case context

To enable custom use case context, set the VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED environment variable to 1.

This is a global setting that will affect all tests for your linked model for the duration of your ValidMind Library session:

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "1"

Enabling use case context allows you to pass additional information, such as details about your model, relevant regulatory requirements, or model validation targets, to the LLM-generated descriptions via use_case_context:

use_case_context = """

This is a customer churn prediction model for a banking loan application system using XGBoost classifier. 

Key Model Information:
- Use Case: Predict customer churn risk during loan application process
- Model Type: Binary classification using XGBoost
- Critical Decision Point: Used in loan approval workflow

Regulatory Requirements:
- Subject to model risk management review and validation
- Results require validation review for regulatory compliance
- Model decisions directly impact loan approval process
- Does this result raise any regulatory concerns?

Validation Focus:
- Explain strengths and weaknesses of the test and the context of whether the result is acceptable.
- What does the result indicate about model reliability?
- Is the result within acceptable thresholds for loan decisioning?
- What are the implications for customer impact?

""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = use_case_context

With the use case context set, generate an updated test description for the DatasetDescription test for comparison with the default output:

vm.tests.run_test(
    "validmind.data_validation.DatasetDescription",
    inputs={
        "dataset": vm_raw_dataset,
    },
).log()

Disable use case context

To disable custom use case context, set the VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED environment variable to 0.

This is a global setting that will affect all tests for your linked model for the duration of your ValidMind Library session:

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "0"

With the use case context disabled again, generate another test description for the DatasetDescription test for comparison with the previous custom output:

vm.tests.run_test(
    "validmind.data_validation.DatasetDescription",
    inputs={
        "dataset": vm_raw_dataset,
    },
).log()

Add test-specific context

In addition to the model-level use_case_context, you can add test-specific context to your LLM-generated descriptions, providing validation criteria specific to the test being run.

We'll re-enable use case context by setting the VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED environment variable to 1, then join the test-specific context to the use case context via the VALIDMIND_LLM_DESCRIPTIONS_CONTEXT environment variable:

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "1"
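Because the sections below repeat the same pattern of joining the use case context with a test-specific block and exporting it, you may find a small helper convenient. This is purely a notebook-level convenience, not part of the ValidMind API; the explicit pattern shown in each section works just as well:

def set_llm_context(use_case: str, test_specific: str = "") -> None:
    """Join use case and test-specific context and expose it to ValidMind via the environment."""
    os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = f"{use_case}\n\n{test_specific}".strip()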

Dataset Description

Rather than relying on generic dataset result descriptions in isolation, we'll use the context to specify precise thresholds for missing values, appropriate data types for banking variables (like CreditScore and Balance), and valid value ranges based on particular business rules:

test_context = """

Acceptance Criteria:
- Missing Values: All critical features must have less than 5% missing values (including CreditScore, Balance, Age)
- Data Types: All columns must have appropriate data types (numeric for CreditScore/Balance/Age, categorical for Geography/Gender)
- Cardinality: Categorical variables must have fewer than 50 unique values, while continuous variables should show appropriate distinct value counts (e.g., high for EstimatedSalary, exactly 2 for Boolean fields)
- Value Ranges: Numeric fields must fall within business-valid ranges (CreditScore: 300-850, Age: ≥18, Balance: ≥0)
""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate an updated test description for the DatasetDescription test again:

vm.tests.run_test(
    "validmind.data_validation.DatasetDescription",
    inputs={
        "dataset": vm_raw_dataset,
    },
)

Class Imbalance

The following test-specific context example adds value to the LLM-generated description by providing defined risk levels to assess class representation:

  • By categorizing classes into Low, Medium, and High Risk, the LLM can generate more nuanced and actionable insights, ensuring that the analysis aligns with business requirements for balanced datasets.
  • This approach not only highlights potential issues but also guides necessary documentation and mitigation strategies for high-risk classes.
test_context = """

Acceptance Criteria:

• Risk Levels for Class Representation:
  - Low Risk: Each class represents 20% or more of the total dataset
  - Medium Risk: Each class represents between 10% and 19.9% of the total dataset
  - High Risk: Any class represents less than 10% of the total dataset

• Overall Requirement:
  - All classes must achieve at least Medium Risk status to pass
""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the ClassImbalance test for review:

vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "min_percent_threshold": 10,
    },
)

High Cardinality

In the case below, the context specifies risk-based criteria for the number of distinct values in categorical features.

This helps the LLM generate more nuanced and actionable insights, ensuring the descriptions are more relevant to your organization's policies.

test_context = """

Acceptance Criteria:

• Risk Levels for Distinct Values in Categorical Features:
  - Low Risk: Each categorical column has fewer than 50 distinct values or less than 5% unique values relative to the total dataset size
  - Medium Risk: Each categorical column has between 50 and 100 distinct values or between 5% and 10% unique values
  - High Risk: Any categorical column has more than 100 distinct values or more than 10% unique values

• Overall Requirement:
  - All categorical columns must achieve at least Medium Risk status to pass
""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the HighCardinality test for review:

vm.tests.run_test(
    "validmind.data_validation.HighCardinality",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "num_threshold": 100,
        "percent_threshold": 0.1,
        "threshold_type": "percent",
    },
)

Missing Values

Here, we use the test-specific context to establish differentiated risk thresholds across features.

Rather than applying uniform criteria, the context allows for specific requirements for critical financial features (CreditScore, Balance, Age).

test_context = """
Test-Specific Context for Missing Values Analysis:

Acceptance Criteria:

• Risk Levels for Missing Values:
  - Low Risk: Less than 1% missing values in any column
  - Medium Risk: Between 1% and 5% missing values
  - High Risk: More than 5% missing values

• Feature-Specific Requirements:
  - Critical Features (CreditScore, Balance, Age):
    * Must maintain Low Risk status
    * No missing values allowed
  
  - Secondary Features (Tenure, NumOfProducts, EstimatedSalary):
    * Must achieve at least Medium Risk status
    * Up to 3% missing values acceptable

  - Categorical Features (Geography, Gender):
    * Must achieve at least Medium Risk status
    * Up to 5% missing values acceptable
""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the MissingValues test for review:

vm.tests.run_test(
    "validmind.data_validation.MissingValues",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "min_threshold": 1,
    },
)

Unique Rows

This example context establishes variable-specific thresholds based on business expectations.

Rather than applying uniform criteria, it recognizes that high variability is expected in features like EstimatedSalary (>90%) and Balance (>50%), while enforcing strict limits on categorical features like Geography (<5 values), ensuring meaningful validation aligned with banking data characteristics.

test_context = """

Acceptance Criteria:

• High-Variability Expected Features:
  - EstimatedSalary: Must have >90% unique values
  - Balance: Must have >50% unique values
  - CreditScore: Must have between 5-10% unique values

• Medium-Variability Features:
  - Age: Should have between 0.5-2% unique values
  - Tenure: Should have between 0.1-0.5% unique values

• Low-Variability Features:
  - Binary Features (HasCrCard, IsActiveMember, Gender, Exited): Must have exactly 2 unique values
  - Geography: Must have fewer than 5 unique values
  - NumOfProducts: Must have fewer than 10 unique values

• Overall Requirements:
  - Features must fall within their specified ranges to pass
""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the UniqueRows test for review:

vm.tests.run_test(
    "validmind.data_validation.UniqueRows",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "min_percent_threshold": 1,
    },
)

Too Many Zero Values

Here, test-specific context is used to provide meaning and expectations for different variables:

  • For instance, zero values in Balance and Tenure indicate risk, whereas zeros in binary variables like HasCrCard or IsActiveMember are expected.
  • This tailored context ensures that the analysis accurately reflects the business significance of zero values across different features.
test_context = """

Acceptance Criteria:
- Numerical Features Only: Test evaluates only continuous numeric columns (Balance, Tenure), 
  excluding binary columns (HasCrCard, IsActiveMember)

- Risk Level Thresholds for Balance and Tenure:
  - High Risk: More than 5% zero values
  - Medium Risk: Between 3% and 5% zero values
  - Low Risk: Less than 3% zero values

- Individual Column Requirements:
  - Balance: Must be Low Risk (banking context requires accurate balance tracking)
  - Tenure: Must be Low or Medium Risk (some zero values acceptable for new customers)

- Overall Test Result: Test must achieve "Pass" status (Low Risk) for Balance, and at least Medium Risk for Tenure

""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the TooManyZeroValues test for review:

vm.tests.run_test(
    "validmind.data_validation.TooManyZeroValues",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "max_percent_threshold": 0.03,
    },
)

IQR Outliers Table

In this case, we use test-specific context to incorporate risk levels tailored to key variables like CreditScore, Age, and NumOfProducts. Without this context, the test would evaluate all variables uniformly, applying no business criteria to the outlier analysis.

test_context = """

Acceptance Criteria:
- Risk Levels for Outliers:
    - Low Risk: 0-50 outliers
    - Medium Risk: 51-300 outliers
    - High Risk: More than 300 outliers
- Feature-Specific Requirements:
    - CreditScore, Age, NumOfProducts: Must maintain Low Risk status to ensure data quality and model reliability

""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the IQROutliersTable test for review:

vm.tests.run_test(
    "validmind.data_validation.IQROutliersTable",
    inputs={
        "dataset": vm_raw_dataset,
    },
    params={
        "threshold": 1.5,
    },
)

Descriptive Statistics

Test-specific context is used in this case to provide risk-based thresholds aligned with the bank's policy.

For instance, CreditScore ranges of 550-850 are considered low risk based on standard credit assessment practices, while Balance thresholds reflect typical retail banking ranges.

test_context = """

Acceptance Criteria:

• CreditScore:
  - Low Risk: 550-850
  - Medium Risk: 450-549
  - High Risk: <450 or missing
  - Justification: Banking standards require reliable credit assessment

• Age:
  - Low Risk: 18-75
  - Medium Risk: 76-85
  - High Risk: >85 or <18
  - Justification: Core banking demographic with age-appropriate products

• Balance:
  - Low Risk: 0-200,000
  - Medium Risk: 200,001-250,000
  - High Risk: >250,000
  - Justification: Typical retail banking balance ranges

• Tenure:
  - Low Risk: 1-10 years
  - Medium Risk: <1 year
  - High Risk: 0 or >10 years
  - Justification: Expected customer relationship duration

• EstimatedSalary:
  - Low Risk: 25,000-150,000
  - Medium Risk: 150,001-200,000
  - High Risk: <25,000 or >200,000
  - Justification: Typical income ranges for retail banking customers

""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the DescriptiveStatistics test for review:

vm.tests.run_test(
    "validmind.data_validation.DescriptiveStatistics",
    inputs={
        "dataset": vm_raw_dataset,
    },
)

Pearson Correlation Matrix

For this test, the context provides meaningful correlation ranges between specific variable pairs based on business criteria.

For example, while a general correlation analysis might flag any correlation above 0.7 as concerning, the test-specific context specifies that Balance and NumOfProducts should maintain a negative correlation between -0.4 and 0, reflecting expected banking relationships.

test_context = """

Acceptance Criteria:

• Target Variable Correlations (Exited):
  - Must show correlation coefficients between ±0.1 and ±0.3 with Age, CreditScore, and Balance
  - Should not exceed ±0.2 correlation with other features
  - Justification: Ensures predictive power while avoiding target leakage

• Feature Correlations:
  - Balance & NumOfProducts: Must maintain correlation between -0.4 and 0
  - Age & Tenure: Should show positive correlation between 0.1 and 0.3
  - CreditScore & Balance: Should maintain correlation between 0.1 and 0.3

• Binary Feature Correlations:
  - HasCrCard & IsActiveMember: Must not exceed ±0.15 correlation
  - Binary features should not show strong correlations (>±0.2) with continuous features

• Overall Requirement:
  - No feature pair should exceed ±0.7 correlation to avoid multicollinearity

""".strip()

context = f"""
{use_case_context}

{test_context}
""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

With the test-specific context set, generate a test description for the PearsonCorrelationMatrix test for review:

vm.tests.run_test(
    "validmind.data_validation.PearsonCorrelationMatrix",
    inputs={
        "dataset": vm_raw_dataset,
    },
)