The Use Case
Error handling is something I have been trying to incorporate more into my own workflow, but error handling in data science looks quite different from error handling in software engineering: instead of catching exceptions, you need to catch problems in your data, models, and predictions. Evidently makes this easy.
I recently came across an open-source package that I think is fantastic for use in ML projects. Evidently is a tool that generates metrics and reports for your data, model, and predictions. Its key components are Metrics, Tests, and Reports, and each one can be customized to fit your needs.
Core Concepts
(from the Evidently documentation)
A Metric is a core component of Evidently. You can combine multiple Metrics in a Report.
A Report is a combination of different Metrics that evaluate data or ML model quality. Reports are best for visual analysis and debugging of your models and data.
A Test is a metric with a condition. Each test returns a pass or fail result.
You can list multiple Tests and execute them together in a Test Suite. Test Suites are best for automated model checks as part of an ML pipeline; use them when you can set up expectations upfront (or derive them from the reference dataset).
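To make the Test Suite idea concrete, here is a minimal sketch that combines a few individual tests. It assumes reference and current are pandas DataFrames with the same columns; the specific test classes are illustrative picks from evidently.tests, and the full list is in the documentation.

# Combine individual tests into a Test Suite
from evidently.test_suite import TestSuite
from evidently.tests import TestNumberOfMissingValues, TestShareOfMissingValues, TestNumberOfRows

data_quality_suite = TestSuite(tests=[
    TestNumberOfMissingValues(),
    TestShareOfMissingValues(),
    TestNumberOfRows(),
])

# Same run() interface as a Report: reference is the baseline, current is the new batch
data_quality_suite.run(reference_data=reference, current_data=current)

# Each test returns pass/fail, which makes the suite easy to wire into pipeline automation
data_quality_suite.as_dict()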
All of these objects can be exported as HTML files, JSON, or Python dictionaries, or rendered as interactive visuals directly within a notebook.
(Figure: example Python dictionary output)
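As a quick reference, these are the usual export calls on a Report (or Test Suite) that has already been run; report here is a placeholder name, and the same calls are walked through step by step in the next section.

report.show()                    # interactive render inside a notebook
report_dict = report.as_dict()   # Python dictionary
report_json = report.json()      # JSON string
report.save_html("report.html")  # standalone HTML file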
Getting Started
# Predefined sets of metrics
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, DataQualityPreset
from evidently.report import Report
from evidently.metrics import DatasetDriftMetric, DataDriftTable
# Customizable reports, display in notebook
data_drift_dataset_report = Report(metrics=[
    DatasetDriftMetric(),
    DataDriftTable(),
])
# reference and current are pandas DataFrames with the same schema;
# the reference dataset is the baseline that the current data is compared against
data_drift_dataset_report.run(reference_data=reference, current_data=current)
data_drift_dataset_report  # renders the interactive report inside a notebook
# Export the results as a Python dictionary
data_drift_dataset_report.as_dict()
# Customize dictionary output
data_drift_dataset_report.as_dict(include={
    "DataDriftTable": {
        "drift_by_columns": {
            "target": {
                "column_name", "column_type", "drift_score"
            }}}})
# Save as a standalone HTML file
data_drift_dataset_report.save_html("file.html")
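The snippets above assume that reference and current already exist. As a rough sketch of one way to build them, the example below splits a toy dataset in half and feeds it into a preset-based report; the dataset choice and the 50/50 split are arbitrary and only for illustration.

from sklearn import datasets

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Toy data: California housing as a single DataFrame (illustrative choice)
df = datasets.fetch_california_housing(as_frame=True).frame

# Treat the first half as the baseline and the second half as the "new" data
reference = df.iloc[: len(df) // 2]
current = df.iloc[len(df) // 2 :]

# A preset bundles many related metrics into a single report
drift_report = Report(metrics=[DataDriftPreset()])
drift_report.run(reference_data=reference, current_data=current)
drift_report.save_html("data_drift_report.html")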
Docs: EvidentlyAI
GitHub: Evidently GitHub