Interested in our LLM support? Large language model support and more will be available in our closed beta. Read the announcement.
September 27, 2023
In this release, we’ve added support for large language models (LLMs) to enhance the capabilities of the ValidMind Library in preparation for the closed beta, along with a number of new demo notebooks that you can try out.
Other enhancements improve the developer experience and our documentation site.
We added initial support for large language models (LLMs) in ValidMind via the new `FoundationModel` class. You can now create an instance of a `FoundationModel`, specify a `predict_fn` and a `prompt`, and pass it into any test suite. The `predict_fn` must be defined by you and implements the logic for calling the foundation LLM, usually via an API.
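A minimal sketch of how these pieces fit together is shown below. The import path, the prompt format, and the keyword used to pass the model into the test run are assumptions and may differ from the actual API:

```python
import validmind as vm
from validmind.models import FoundationModel  # import path is an assumption

# User-defined function that calls the foundation LLM (for example, a hosted
# chat completion endpoint) and returns the generated text
def predict_fn(prompt: str) -> str:
    # Call your LLM provider's API here and return the completion text
    ...

# Placeholder prompt template for a sentiment analysis task
prompt = "Classify the sentiment of the following sentence: {input}"

# Wrap the LLM so it can be passed into any test suite
llm_model = FoundationModel(predict_fn=predict_fn, prompt=prompt)

# For example, run the tests defined by your documentation template
# (the keyword argument name is illustrative)
vm.run_documentation_tests(model=llm_model)
```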
To demonstrate the capabilities of LLM support, this release also includes new demo notebooks:
Prompt validation demo notebook for LLMs
As a proof of concept, we added initial native prompt validation tests to the library, including a notebook and a simple template for trying out these metrics on a sentiment analysis LLM model we built.
Text summarization model demo notebook for LLMs
We added a new notebook in the library that includes the financial news dataset, initializes a Hugging Face summarization model using the `init_model` interface, implements relevant metrics for testing, and demonstrates how to run a text summarization metrics test suite for an LLM instructed to act as a financial news summarizer.
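In rough outline, the notebook follows a pattern like the one below; the model checkpoint and the exact `init_model` arguments are illustrative rather than the notebook's actual code:

```python
import validmind as vm
from transformers import pipeline

# Initialize a Hugging Face summarization model (checkpoint is illustrative)
summarizer = pipeline("summarization", model="t5-small")

# Register the model with ValidMind via the init_model interface
vm_model = vm.init_model(summarizer)
```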
ValidMind can now validate pre-trained models from the Hugging Face Hub, including any language model compatible with the Hugging Face transformers API.

To illustrate this new feature, we have included a financial news sentiment analysis demo that runs documentation tests for a Hugging Face text classification model using the `financial_phrasebank` dataset.
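A sketch of what the demo does is shown below; the dataset configuration, model checkpoint, and argument names are illustrative stand-ins for the notebook's actual code:

```python
import validmind as vm
from datasets import load_dataset
from transformers import pipeline

# Load the financial_phrasebank dataset from the Hugging Face Hub
phrasebank = load_dataset("financial_phrasebank", "sentences_allagree")

# Any text classification model compatible with the transformers API works;
# this financial sentiment checkpoint is illustrative
classifier = pipeline("text-classification", model="ProsusAI/finbert")

# Register the model and dataset with ValidMind, then run documentation tests
vm_model = vm.init_model(classifier)
vm_dataset = vm.init_dataset(
    dataset=phrasebank["train"].to_pandas(),
    target_column="label",
)
vm.run_documentation_tests(model=vm_model, dataset=vm_dataset)
```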
We added a new `run_test()` helper function that streamlines running tests for you. This function allows you to execute any individual test independent of a test suite or a documentation template. A one-line command can execute a test, making it easier to run tests with various parameters and options.
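For example, a single call like the following runs one test on its own (the test ID and the way inputs are passed are illustrative):

```python
from validmind.tests import run_test

# vm_dataset is a dataset previously registered with vm.init_dataset()
run_test("validmind.data_validation.ClassImbalance", dataset=vm_dataset)
```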
We also updated the QuickStart notebook to provide a consistent experience. This notebook:

- Runs `vm.preview_template()` after initializing ValidMind
- Runs `vm.run_documentation_tests()` instead of running a test suite that is not connected to the template
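In outline, the updated notebook flow looks something like this; the credentials are placeholders and the `init()` arguments may differ:

```python
import validmind as vm

# Connect to your model documentation project (placeholder credentials)
vm.init(api_key="...", api_secret="...", project="...")

# Preview the documentation template right after initializing ValidMind
vm.preview_template()

# Run the tests defined by the template, rather than an unrelated test suite
vm.run_documentation_tests()
```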
To use the new `run_test` function:

- Discover existing tests by calling `list_tests()` or `describe_test()`
- View the tests associated with a documentation template by running `preview_template()`
- Using the test ID, run a given test and pass in additional configuration parameters and inputs
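A minimal sketch of that flow is shown below; the test IDs, parameter names, and the way inputs are passed are illustrative rather than exact:

```python
import validmind as vm
from validmind.tests import describe_test, list_tests, run_test

# Browse the available tests and read the documentation for one of them
list_tests()
describe_test("validmind.model_validation.sklearn.ClassifierPerformance")

# See which tests the current documentation template will run
vm.preview_template()

# Run a single test by ID with extra parameters and inputs
# (vm_dataset is a dataset previously registered with vm.init_dataset())
run_test(
    "validmind.data_validation.ClassImbalance",
    params={"min_percent_threshold": 10},
    dataset=vm_dataset,
)
```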
We made a number of changes to tests to improve the developer experience:

- A `fail_fast` argument can now be passed to `run_test_plan()`, `run_test_suite()`, and `run_documentation_tests()` to fail and raise an exception on the first error encountered. This is useful for debugging; see the sketch after the `list_tests` arguments below.
- The `ClassifierPerformance` test now determines whether you are testing a binary or a multi-class model. When testing a multi-class model, we now report additional per-class, macro, and weighted average tests: `accuracy`, `F1`, `precision`, `recall`, and `roc_auc` score.
- We added a `metadata` property to every ValidMind test class. The `metadata` property includes a `task_types` field and a `tags` field, which both serve to categorize tests based on what data and model types they work with, what category of test they fall into, and more.
- We added a new search feature to the `validmind.tests.list_tests` function to allow for better test discoverability.
The `list_tests` function in the `tests` module now supports the following arguments:
- `filter`: If set, will match tests by ID, task_types, or tags using a combination of substring and fuzzy string matching. Defaults to `None`.
- `task`: If set, will further narrow matching tests (assuming `filter` has been passed) by exact matching the `task` to the test's `task_type` metadata. Defaults to `None`.
- `tags`: If a list is passed, will again narrow the matched tests by exact matching on tags. Defaults to `None`.
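The sketch below shows the new search arguments, along with the `fail_fast` flag mentioned above; the filter value, task, and tags are illustrative:

```python
import validmind as vm
from validmind.tests import list_tests

# Match tests whose ID, task types, or tags contain "classification"
# (substring plus fuzzy matching)
list_tests(filter="classification")

# Narrow further by exact task type and exact tags (values are illustrative)
list_tests(filter="classification", task="classification", tags=["binary_classification"])

# Stop on the first error while running documentation tests (useful for debugging)
vm.run_documentation_tests(fail_fast=True)
```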
We enhanced the architecture and content of our external docs site to make the user journey more efficient for model developers and model validators who are new to our products.
We made a number of incremental improvements to our user guide.
To access the latest version of the ValidMind Platform, hard refresh your browser tab:

- Windows: `Ctrl` + `Shift` + `R` or `Ctrl` + `F5`
- macOS: `⌘ Cmd` + `Shift` + `R`, or hold down `⌘ Cmd` and click the Reload button

To upgrade the ValidMind Library:
In your Jupyter Notebook, run the upgrade command within a code cell, or run the equivalent command from your terminal.
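Assuming the library is installed as the `validmind` package from PyPI, the upgrade command looks like this:

```python
# In a notebook code cell (drop the leading % when running from a terminal):
%pip install --upgrade validmind
```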
You may need to restart your kernel after upgrading the package for the changes to be applied.