Understand and utilize RawData in ValidMind tests

Test functions in ValidMind can return a special object called RawData, which holds intermediate or unprocessed data that is produced within the test logic but not included in the test's visible output, such as tables or figures.

  • The RawData feature allows you to customize the output of tests, making it a powerful tool for creating custom tests and post-processing functions.
  • RawData is useful when running post-processing functions with tests to recompute tabular outputs, redraw figures, or even create new outputs entirely.

In this notebook, you'll learn how to access, inspect, and utilize RawData from ValidMind tests.
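
To preview the pattern the rest of this notebook relies on, here is a minimal sketch of a post-processing function. It uses only names that appear later in this notebook (run_test, post_process_fn, and TestResult); my_post_processing_fn is a hypothetical placeholder, and the runnable, end-to-end examples follow in the sections below:

from validmind.tests import run_test
from validmind.vm_models.result import TestResult


def my_post_processing_fn(result: TestResult) -> TestResult:
    # Intermediate values computed by the test are available on result.raw_data;
    # call result.raw_data.inspect() to see which attributes a given test provides.
    # ... recompute a table, redraw a figure, or add a new output here ...
    return result


# Pass the function to run_test() to modify a result before it is returned, e.g.:
# run_test("validmind.model_validation.sklearn.ROCCurve", inputs=..., post_process_fn=my_post_processing_fn)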

Setup

Before we can run our examples, we need to set up the ValidMind Library for running tests. Since the focus of this notebook is the RawData object, this section only summarizes the steps rather than going into detail.

To learn more about running tests with ValidMind: Run tests and test suites

Installation and initialization

First, let's make sure that the ValidMind Library is installed and ready to go, and that our Python environment is set up for data analysis:

# Install the ValidMind Library
%pip install -q validmind

# Initialize the ValidMind Library
import validmind as vm

# Import the `xgboost` library with an alias
import xgboost as xgb

Load the sample dataset

Then, we'll import a sample ValidMind dataset and preprocess it:

# Import the `customer_churn` sample dataset
from validmind.datasets.classification import customer_churn
raw_df = customer_churn.load_data()

# Preprocess the raw dataset
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

# Separate features and targets
x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

# Create an `XGBClassifier` object
model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)

# Train the model on the training set, using the validation set for early stopping
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)

Initialize the ValidMind objects

Before you can run tests, you'll need to initialize a ValidMind dataset object, as well as a ValidMind model object that can be passed to other functions for analysis and tests on the data:

# Initialize the dataset object
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
    __log=False,
)

# Initialize the datasets into their own dataset objects
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=customer_churn.target_column,
    __log=False,
)

# Initialize a model object
vm_model = vm.init_model(
    model,
    input_id="model",
    __log=False,
)

# Assign predictions to the datasets
vm_train_ds.assign_predictions(
    model=vm_model,
)

vm_test_ds.assign_predictions(
    model=vm_model,
)

RawData usage examples

Once you're set up to run tests, you can then try out the following examples:

  • Using RawData from the ROC Curve Test
  • Pearson Correlation Matrix
  • Precision-Recall Curve
  • Using RawData in custom tests
  • Using RawData in comparison tests

Using RawData from the ROC Curve Test

In this introductory example, we run the ROC Curve test, inspect its RawData output, and then create a custom ROC curve using the raw data values.

First, let's run the default ROC Curve test for comparison with later iterations:

from validmind.tests import run_test

# Run the ROC Curve test normally
result_roc = run_test(
    "validmind.model_validation.sklearn.ROCCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
)

Now let's assume we want to create a custom version of the above figure. First, let's inspect the raw data that this test produces so we can see what we have to work with.

RawData objects have an inspect() method that pretty-prints the object's attributes so you can quickly see the data and its types:

# Inspect the RawData output from the ROC test
print("RawData from ROC Curve Test:")
result_roc.raw_data.inspect()

As we can see, the ROC Curve test returns a RawData object with the following attributes:

  • fpr: A list of false positive rates
  • tpr: A list of true positive rates
  • auc: The area under the curve

This is enough to create our own custom ROC curve in a post-processing function, without writing a whole new test from scratch or recomputing any of the data:

import matplotlib.pyplot as plt

from validmind.vm_models.result import TestResult


def custom_roc_curve(result: TestResult):
    # Extract raw data from the test result
    fpr = result.raw_data.fpr
    tpr = result.raw_data.tpr
    auc = result.raw_data.auc

    # Create a custom ROC curve plot
    fig = plt.figure()
    plt.plot(fpr, tpr, label=f"Custom ROC (AUC = {auc:.2f})", color="blue")
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random Guess")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("Custom ROC Curve from RawData")
    plt.legend()

    # close the plot to avoid it automatically being shown in the notebook
    plt.close()

    # remove existing figure
    result.remove_figure(0)

    # add new figure
    result.add_figure(fig)

    return result

# test it on the existing result
modified_result = custom_roc_curve(result_roc)

# show the modified result
modified_result.show()

Now that we have created a post-processing function and verified that it works on our existing test result, we can use it directly in run_test() from now on:

result = run_test(
    "validmind.model_validation.sklearn.ROCCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    post_process_fn=custom_roc_curve,
    generate_description=False,
)

Pearson Correlation Matrix

In this next example, try commenting out the post_process_fn argument in the following cell and see what happens between different runs:

import plotly.graph_objects as go


def custom_heatmap(result: TestResult):
    corr_matrix = result.raw_data.correlation_matrix

    heatmap = go.Heatmap(
        z=corr_matrix.values,
        x=list(corr_matrix.columns),
        y=list(corr_matrix.index),
        colorscale="Viridis",
    )
    fig = go.Figure(data=[heatmap])
    fig.update_layout(title="Custom Heatmap from RawData")

    result.remove_figure(0)
    result.add_figure(fig)

    return result


result_corr = run_test(
    "validmind.data_validation.PearsonCorrelationMatrix",
    inputs={"dataset": vm_test_ds},
    generate_description=False,
    # COMMENT OUT `post_process_fn`
    post_process_fn=custom_heatmap,
)

Precision-Recall Curve

Then, let's try the same thing with the Precision-Recall Curve test:

def custom_pr_curve(result: TestResult):
    precision = result.raw_data.precision
    recall = result.raw_data.recall

    fig = plt.figure()
    plt.plot(recall, precision, label="Precision-Recall Curve")
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("Custom Precision-Recall Curve from RawData")
    plt.legend()

    plt.close()
    result.remove_figure(0)
    result.add_figure(fig)

    return result

result_pr = run_test(
    "validmind.model_validation.sklearn.PrecisionRecallCurve",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
    # COMMENT OUT `post_process_fn`
    post_process_fn=custom_pr_curve,
)

Using RawData in custom tests

These examples demonstrate some very simple ways to use the RawData feature of ValidMind tests. The majority of ValidMind-developed tests return some form of raw data that can be used to customize the output of the test, but you can also create your own tests that return RawData objects and use them in the same way.

Let's take a look at how this can be done in custom tests. To start, define and run your custom test:

import pandas as pd

from validmind import test, RawData
from validmind.vm_models import VMDataset, VMModel


@test("custom.MyCustomTest")
def MyCustomTest(dataset: VMDataset, model: VMModel) -> tuple[go.Figure, RawData]:
    """Custom test that produces a figure and a RawData object"""
    # pretend we are using the dataset and model to compute some data
    # ...

    # create some fake data that will be used to generate a figure
    data = pd.DataFrame({"x": [10, 20, 30, 40, 50], "y": [10, 20, 30, 40, 50]})

    # create the figure (scatter plot)
    fig = go.Figure(data=go.Scatter(x=data["x"], y=data["y"]))

    # now let's create a RawData object that holds the "computed" data
    raw_data = RawData(scatter_data_df=data)

    # finally, return both the figure and the raw data
    return fig, raw_data


my_result = run_test(
    "custom.MyCustomTest",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    generate_description=False,
)

We can see that the test result shows the figure. But since we returned a RawData object, we can also inspect the contents and see how we could use it to customize or regenerate the figure in the post-processing function:

my_result.raw_data.inspect()

We can see that we get a nicely formatted preview of the DataFrame we stored in the raw data object. Let's go ahead and use it to re-plot our data:

def custom_plot(result: TestResult):
    data = result.raw_data.scatter_data_df

    # use something other than a scatter plot
    fig = go.Figure(data=go.Bar(x=data["x"], y=data["y"]))
    fig.update_layout(title="Custom Bar Chart from RawData")
    fig.update_xaxes(title="X Axis")
    fig.update_yaxes(title="Y Axis")

    result.remove_figure(0)
    result.add_figure(fig)

    return result

result = run_test(
    "custom.MyCustomTest",
    inputs={"dataset": vm_test_ds, "model": vm_model},
    post_process_fn=custom_plot,
    generate_description=False,
)

Using RawData in comparison tests

When running comparison tests, the RawData object contains the raw data for each individual test result as well as the comparison results between them. To support this, it also records the dataset and model input_ids for each dataset-model pair in the test, so that a post-processing function can use them to customize the output. The example below uses this to add tables to the test result: one showing the confusion matrix for each individual result and another showing the drift between them.

When designing post-processing functions that need to handle both individual and comparison test results, check the structure of the raw data to determine which case you're dealing with. In the example below, we check whether confusion_matrix is a list (a comparison test with multiple matrices) or a single matrix (an individual test). For comparison tests, the function creates two tables: one showing the confusion matrices for each test case and another showing the percentage drift between them. For individual tests, it creates a single table with the confusion matrix values. This pattern of checking the raw data structure can be applied to other tests to create post-processing functions that work in both scenarios.

def cm_table(result: TestResult):
    # For individual results
    if not isinstance(result.raw_data.confusion_matrix, list):
        # Extract values from single confusion matrix
        cm = result.raw_data.confusion_matrix
        tn, fp = cm[0, 0], cm[0, 1]
        fn, tp = cm[1, 0], cm[1, 1]
        
        # Create DataFrame for individual matrix
        cm_df = pd.DataFrame({
            'TN': [tn],
            'FP': [fp],
            'FN': [fn],
            'TP': [tp]
        })
        
        # Add individual table
        result.add_table(cm_df, title="Confusion Matrix")
        
    # For comparison results
    else:
        cms = result.raw_data.confusion_matrix
        cm1, cm2 = cms[0], cms[1]
        
        # Create individual results table
        rows = []
        for i, cm in enumerate(cms):
            rows.append({
                'dataset': result.raw_data.dataset[i],
                'model': result.raw_data.model[i],
                'TN': cm[0, 0],
                'FP': cm[0, 1],
                'FN': cm[1, 0],
                'TP': cm[1, 1]
            })
        individual_df = pd.DataFrame(rows)
        
        # Calculate percentage differences
        diff_df = pd.DataFrame({
            'TN_drift (%)': [(cm2[0, 0] - cm1[0, 0]) / cm1[0, 0] * 100],
            'FP_drift (%)': [(cm2[0, 1] - cm1[0, 1]) / cm1[0, 1] * 100],
            'FN_drift (%)': [(cm2[1, 0] - cm1[1, 0]) / cm1[1, 0] * 100],
            'TP_drift (%)': [(cm2[1, 1] - cm1[1, 1]) / cm1[1, 1] * 100]
        }).round(2)
        
        # Add both tables
        result.add_table(individual_df, title="Individual Confusion Matrices")
        result.add_table(diff_df, title="Confusion Matrix Drift")
        
    return result

Let's first run the confusion matrix test on a single dataset-model pair to see how our post-processing function handles individual results:

from validmind.tests import run_test

result_cm = run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix",
    inputs={
        "dataset": vm_test_ds,
        "model": vm_model,
    },
    post_process_fn=cm_table,
    generate_description=False,
)

Now let's run a comparison test between test and train datasets to see how the function handles multiple results:

result_cm = run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix",
    input_grid={
        "dataset": [vm_test_ds, vm_train_ds],
        "model": [vm_model]
    },
    post_process_fn=cm_table,
    generate_description=False,
)

Let's inspect the raw data to see how comparison tests structure their data. Notice how the RawData object contains not just the confusion matrices for both datasets, but also tracks which dataset and model each result came from:

result_cm.raw_data.inspect()