The Use Case

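Error handling is something I have been trying to build into my own workflow. But error handling in data science looks quite different from error handling in software engineering: a pipeline can run without raising a single exception and still produce bad results because the data or the model has quietly changed. Evidently makes this kind of checking easy!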

I recently came across this open-source package, and I think it is fantastic for ML projects. Evidently is a tool that generates metrics and reports for your data, model, and predictions. The key components are metrics, tests, and reports. Each component can be customized to fit your needs.

Core Concepts

(from documentation)

A Metric is a core component of Evidently. You can combine multiple Metrics in a Report. Reports are best for visual analysis and debugging of your models and data.

A Test is a metric with a condition. Each test returns a pass or fail result. You can combine multiple Tests in a Test Suite. Test Suites are best for automated model checks as part of an ML pipeline.

A Report is a combination of different Metrics that evaluate data or ML model quality.

You can list multiple Tests and execute them together in a Test Suite. Test Suites are best for automation. Use them when you can set up expectations upfront (or derive them from the reference dataset).
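To make the "a Test is a metric with a condition" idea concrete, here is a toy sketch in plain Python. It is not Evidently's API: `share_of_missing`, `ToyTest`, and `run_suite` are hypothetical names I made up for illustration, and the rows are invented sample data. It only mirrors the relationship between a metric, a test, and a suite described above.

```python
# Toy illustration only: these names are hypothetical and are NOT Evidently's API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, Optional[float]]]


def share_of_missing(rows: Rows, column: str) -> float:
    """Metric: fraction of rows where `column` is None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


@dataclass
class ToyTest:
    """Test: a metric plus a condition that turns its value into pass/fail."""
    name: str
    metric: Callable[[Rows], float]
    condition: Callable[[float], bool]

    def run(self, rows: Rows) -> Dict[str, object]:
        value = self.metric(rows)
        return {"name": self.name, "value": value, "passed": self.condition(value)}


def run_suite(tests: List[ToyTest], rows: Rows) -> List[Dict[str, object]]:
    """Suite: run every test against the same data and collect the results."""
    return [test.run(rows) for test in tests]


# Invented sample rows, for illustration only.
current_rows: Rows = [
    {"age": 31.0, "income": 52000.0},
    {"age": None, "income": 48000.0},
    {"age": 45.0, "income": None},
    {"age": 28.0, "income": 61000.0},
]

suite = [
    ToyTest("age_missing_share", lambda r: share_of_missing(r, "age"), lambda v: v <= 0.3),
    ToyTest("income_missing_share", lambda r: share_of_missing(r, "income"), lambda v: v <= 0.1),
]

results = run_suite(suite, current_rows)
for result in results:
    print(result)
```

The metric on its own just returns a number you would look at in a report. Wrapping it in a condition is what makes it usable for automated checks in a pipeline, which is the split Evidently draws between Reports and Test Suites.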

Reports and Test Suites can be exported as JSON, HTML, or Python dictionaries, or rendered visually within a notebook.

Evidently Example 1

Evidently Example 2

Example Python dictionary output

Evidently Example 3

Getting Started

# Predefined sets of metrics (presets)
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, DataQualityPreset

from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable

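# Customizable reports, display in notebook
# `reference` and `current` are pandas DataFrames with the same columns
data_drift_dataset_report = Report(metrics=[
    DatasetDriftMetric(),
    DataDriftTable(),
])
data_drift_dataset_report.run(reference_data=reference, current_data=current)
data_drift_dataset_report  # renders the report when it is the last expression in a notebook cell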

# Example output
data_drift_dataset_report.as_dict()

# Customize dictionary output: keep only selected fields for the "target" column
data_drift_dataset_report.as_dict(include={
    "DataDriftTable": {
        "drift_by_columns": {
            "target": {"column_name", "column_type", "drift_score"},
        },
    },
})

# HTML
data_drift_dataset_report.save_html("file.html")
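Once you have the dictionary output, you can act on it in code, which is where this ties back to error handling. Below is a minimal sketch that pulls out the columns flagged as drifted. The dictionary here is hand-written and its values are invented. Its shape (a "metrics" list whose entries have "metric" and "result" keys, and a "drift_detected" field per column) is my assumption about what `as_dict()` returns, so check it against the output of your installed version. `drifted_columns` is a hypothetical helper, not part of Evidently.

```python
from typing import Any, Dict, List

# Hand-written sample that mimics the ASSUMED shape of `report.as_dict()`.
# The values are invented for illustration and are not real Evidently output.
sample_report: Dict[str, Any] = {
    "metrics": [
        {"metric": "DatasetDriftMetric", "result": {"dataset_drift": False}},
        {
            "metric": "DataDriftTable",
            "result": {
                "drift_by_columns": {
                    "target": {"column_name": "target", "drift_score": 0.01, "drift_detected": True},
                    "age": {"column_name": "age", "drift_score": 0.42, "drift_detected": False},
                }
            },
        },
    ]
}


def drifted_columns(report_dict: Dict[str, Any], metric_name: str = "DataDriftTable") -> List[str]:
    """Return the names of columns flagged as drifted (hypothetical helper)."""
    drifted: List[str] = []
    for entry in report_dict.get("metrics", []):
        if entry.get("metric") != metric_name:
            continue
        columns = entry.get("result", {}).get("drift_by_columns", {})
        for name, info in columns.items():
            if info.get("drift_detected"):
                drifted.append(name)
    return sorted(drifted)


flagged = drifted_columns(sample_report)
if flagged:
    print(f"Drift detected in: {', '.join(flagged)}")
```

In a real pipeline you would pass in `data_drift_dataset_report.as_dict()` instead of the sample, and you might raise an exception or send an alert rather than print.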

Docs: EvidentlyAI

GitHub: Evidently GitHub

Evidently: The Ultimate Tool for Streamlining Your Machine Learning Workflow ⚡

Intro to the Evidently open-source framework
