Unified version 25.01
January 31, 2025
This release includes our new unified versioning scheme for our software, support for thresholds in unit metrics and custom context for test descriptions within the ValidMind Library, and many more enhancements.
25.01
Our documentation now follows the new unified versioning scheme for our software, starting with this 25.01 release. Included in this release are:

- v2.7.7
- v1.29.10
We manage multiple repositories, each with its own version tags. The new versioning scheme replaces the ValidMind Library version in the documentation to clarify that each release includes code from multiple repositories rather than a single source.
This change simplifies tracking changes for each ValidMind release and streamlines version management for you. Release frequency and the upgrade process remain unchanged.
When logging metrics using `log_metric()`, you can now include a `thresholds` dictionary. For example, use `thresholds={"target": 0.8, "minimum": 0.6}` to define multiple reference levels.
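For illustration, a minimal sketch of logging a metric with thresholds, assuming the library has already been initialized with your model's credentials; apart from the `thresholds` dictionary described above, the parameter names are assumptions:

```python
import validmind as vm

# Assumes vm.init(...) has already been called with your API credentials and model

vm.log_metric(
    key="AUC",                                   # metric name (assumed parameter name)
    value=0.72,                                  # latest metric value
    thresholds={"target": 0.8, "minimum": 0.6},  # reference levels for the chart
)
```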
These thresholds automatically appear as horizontal reference lines when you add a Metric Over Time block to the documentation.
The visualization uses a distinct color palette to differentiate between thresholds. It displays only the most recent threshold configuration and includes threshold information in both the chart legend and data table.
This enhancement provides immediate visual context for metric values, helping you track metric performance against multiple defined thresholds over time.
You can now include contextual information to enhance LLM-based generation of test result descriptions and interpretations. The additional context is specified through environment variables and incorporated into the generated descriptions.
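For example, a rough sketch of supplying context before running tests; the environment variable names below are hypothetical, so check the new notebook for the exact ones the library reads:

```python
import os

# Hypothetical variable names for illustration -- the library may use different ones
os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "1"
os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = (
    "This model is an application scorecard used for retail credit decisions. "
    "Flag any AUC below 0.7 as a concern in the interpretation."
)
```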
A new notebook demonstrates adding context to LLM-based descriptions with examples of:
We’ve introduced enhancements to the ValidMind Library that focus on documenting credit risk scorecard models:
New notebooks: Learn how to document application scorecard models using the library. These notebooks provide a step-by-step guide for loading a demo dataset, preprocessing data, training models, and documenting the model.
You can choose from three different approaches: running individual tests, running a full test suite, or using a single function to document a model.
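As a sketch of the individual-test approach, assuming a pandas DataFrame `df` from the demo dataset and placeholder identifiers (the test ID path is an assumption):

```python
import validmind as vm

# Register the dataset with the library (target column name is a placeholder)
vm_dataset = vm.init_dataset(dataset=df, target_column="default")

# Run a single test and log the result to your model documentation
result = vm.tests.run_test(
    "validmind.data_validation.MutualInformation",
    inputs={"dataset": vm_dataset},
)
result.log()

# Alternatively, document the model in one call using the template's full test plan
vm.run_documentation_tests()
```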
New tests:

- `MutualInformation`: Evaluates feature relevance by calculating mutual information scores between features and the target variable.
- `ScoreBandDefaultRates`: Analyzes default rates and population distribution across credit score bands.
- `CalibrationCurve`: Assesses calibration by comparing predicted probabilities against observed frequencies.
- `ClassifierThresholdOptimization`: Visualizes threshold optimization methods for binary classification models.
- `ModelParameters`: Extracts and displays model parameters for transparency and reproducibility.
- `ScoreProbabilityAlignment`: Evaluates alignment between credit scores and predicted probabilities.

Modifications have also been made to existing tests to improve functionality and accuracy. The `TooManyZeroValues` test now includes a row count and applies a percentage threshold for zero values.
The `split` function in `lending_club.py` has been enhanced to support an optional validation set, allowing for more flexible dataset splitting. A new utility function, `get_demo_test_config`, has been added to generate a default test configuration for demo purposes.
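A rough sketch of the enhanced dataset utilities; the loading and preprocessing calls are assumptions based on the demo dataset module, and the keyword that enables the optional validation split (`validation_size` here) is an assumption as well:

```python
from validmind.datasets.credit_risk import lending_club

# Load and preprocess the demo dataset (function names assumed from the demo module)
df = lending_club.load_data()
preprocessed_df = lending_club.preprocess(df)

# Split into train/validation/test sets -- the validation-related keyword is assumed
train_df, validation_df, test_df = lending_club.split(preprocessed_df, validation_size=0.2)

# Generate a default test configuration for demo purposes
test_config = lending_club.get_demo_test_config()
```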
Several enhancements to the ValidMind Library focus on ongoing monitoring capabilities:
New notebook: Learn how to use ongoing monitoring with credit risk datasets in this step-by-step guide for the ValidMind Library.
Custom tests: Define and run your own tests using the library:

- `ScoreBandDiscriminationMetrics.py`: Evaluates discrimination metrics across different score bands.

New tests:

- `CalibrationCurveDrift`: Evaluates changes in probability calibration.
- `ClassDiscriminationDrift`: Compares classification discrimination metrics.
- `ClassImbalanceDrift`: Evaluates drift in class distribution.
- `ClassificationAccuracyDrift`: Compares classification accuracy metrics.
- `ConfusionMatrixDrift`: Compares confusion matrix metrics.
- `CumulativePredictionProbabilitiesDrift`: Compares cumulative prediction probability distributions.
- `FeatureDrift`: Evaluates changes in feature distribution.
- `PredictionAcrossEachFeature`: Assesses prediction distributions across features.
- `PredictionCorrelation`: Assesses correlation changes between predictions and features.
- `PredictionProbabilitiesHistogramDrift`: Compares prediction probability distributions.
- `PredictionQuantilesAcrossFeatures`: Assesses prediction distributions across features using quantiles.
- `ROCCurveDrift`: Compares ROC curves.
- `ScoreBandsDrift`: Analyzes drift in score bands.
- `ScorecardHistogramDrift`: Compares score distributions.
- `TargetPredictionDistributionPlot`: Assesses differences in prediction distributions.

We also improved dataset loading, preprocessing, and feature engineering functions with verbosity control for cleaner output.
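For illustration, a rough sketch of running one of the new drift tests against a reference and a monitoring dataset; the test ID path and input names are assumptions based on the library’s usual conventions:

```python
import validmind as vm

# vm_reference_dataset, vm_monitoring_dataset, and vm_model are placeholders for
# datasets and a model already initialized with vm.init_dataset() / vm.init_model()
result = vm.tests.run_test(
    "validmind.ongoing_monitoring.FeatureDrift",
    inputs={
        "datasets": [vm_reference_dataset, vm_monitoring_dataset],
        "model": vm_model,
    },
)
result.log()
```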
Want to create your own code samples using ValidMind? We’ve made it easier for contributors to submit custom code samples.
Our end-to-end notebook template generation notebook will generate a new file with all the bits and pieces of a standard ValidMind notebook to get you started.
The same functionality is also accessible from our Makefile:
The template generation notebook draws from a number of mini-templates, should you need to revise them or grab the information from them manually:

- `about-validmind.ipynb`: Conceptual overview of ValidMind & prerequisites.
- `install-initialize-validmind.ipynb`: ValidMind Library installation & initialization instructions.
- `next-steps.ipynb`: Directions to review the generated documentation within the ValidMind Platform & additional learning resources.
- `upgrade-validmind.ipynb`: Instructions for comparing & upgrading versions of the ValidMind Library.

We’ve streamlined dashboard configuration with dedicated view and edit modes. Click **Edit Mode** to make changes, then click **Done Editing** to save and return to view mode:
To prevent any confusion when multiple people are working on the same dashboard, we’ve added some helpful safeguards:
Risk assessment generation has been enhanced to allow you to provide an optional prompt before starting text generation. This feature lets you guide the output, ensuring that the generated text aligns more closely with your specific requirements.
The `TestResult` class now exposes pre-populated test descriptions through the `doc` property, separating them from dynamically generated GenAI descriptions:

- `result.doc`: Contains the original docstring of the test.
- `result.description`: Contains the dynamically generated description.

This enhancement makes it easier to distinguish between ValidMind’s standard test documentation and the dynamic, context-aware descriptions generated for your specific test results.
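For example, after running any test you can compare the two properties; the test ID and dataset input here are placeholders:

```python
import validmind as vm

result = vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_dataset},  # vm_dataset is an already-initialized VM dataset
)

print(result.doc)          # standard docstring shipped with the test
print(result.description)  # dynamically generated, context-aware description
```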
You can browse the full catalog of official test descriptions in our test documentation:
We added raw data storage across all ValidMind Library tests. Every test now returns a `RawData` object, allowing post-processing functions to recreate any test output. This feature enhances flexibility and customizability.
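A rough sketch of how this might be used from a post-processing function; the `post_process_fn` hook and the `raw_data` attribute shown here are assumptions, so check the library documentation for the exact interface:

```python
import validmind as vm

def reshape_output(result):
    # The RawData object captured by the test (attribute name assumed)
    raw = result.raw_data
    # ...rebuild tables or figures from the raw inputs here...
    return result

result = vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_dataset},   # placeholder initialized dataset
    post_process_fn=reshape_output,   # hook name assumed
)
```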
print_env function

We’ve added a new diagnostic `print_env()` utility function that displays comprehensive information about your running environment. This function is particularly useful when diagnosing issues or sharing your setup with others: it outputs key details such as the Python version, installed package versions, and relevant environment variables.
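A minimal usage sketch, assuming the function is exposed at the top level of the package:

```python
import validmind as vm

# Print Python version, installed package versions, and relevant environment variables
vm.print_env()
```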
Workflows are now easier to read when zoomed out, helped by a larger modal window and simplified nodes:
Zooming in reveals more details:
Hovering over a node highlights all **in** and **out** connections, making relationships clearer:
We replaced the plugin used for editing mathematical equations and formulas. The new plugin provides an improved interface for adding and editing LaTeX expressions in your documentation.
The new editor also includes a real-time preview and common mathematical symbols for easier equation creation.
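For illustration, an expression of the kind you might enter in the editor (the formula itself is only an example, not part of the release):

```latex
P(\text{default} \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```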
To access the latest version of the ValidMind Platform, hard refresh your browser tab:

- Windows: `Ctrl` + `Shift` + `R` OR `Ctrl` + `F5`
- macOS: `⌘ Cmd` + `Shift` + `R` OR hold down `⌘ Cmd` and click the **Reload** button

To upgrade the ValidMind Library:
In your Jupyter Notebook:
Then within a code cell or your terminal, run:
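A minimal sketch of the upgrade command, assuming the library is published on PyPI as `validmind` (the exact commands from the original instructions are not reproduced here):

```python
# Run in a notebook code cell; in a terminal, drop the leading "%"
%pip install --upgrade validmind
```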
You may need to restart your kernel after running the upgrade for the changes to be applied.