Model Monitoring

Keith VanderLinden
Calvin University

ML System Failures

ML systems fail if:

  • The software system fails to operate as expected.
  • The ML system fails to perform as expected.

The first failure mode is shared by all software systems; the second is unique to ML systems.

Data Distribution Shifts

We distinguish these distributions:

  • Source (the distribution the model was trained on)
  • Target (the distribution the model encounters in production)

Rarely are these distributions:

  • Identical
  • Stationary

Detecting and addressing distribution shifts are crucial for maintaining ML system performance.
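
One way to detect a shift is to compare a feature's source sample against a recent target sample with a statistical two-sample test. Below is a minimal sketch using SciPy's Kolmogorov-Smirnov test; the generated data and the significance level are illustrative assumptions.

    # Sketch: flag a possible shift in one numeric feature with a
    # two-sample Kolmogorov-Smirnov test.
    import numpy as np
    from scipy.stats import ks_2samp

    def shift_detected(source_sample, target_sample, alpha=0.05):
        """Return True if the samples are unlikely to share a distribution."""
        statistic, p_value = ks_2samp(source_sample, target_sample)
        return p_value < alpha

    rng = np.random.default_rng(0)
    source = rng.normal(loc=0.0, scale=1.0, size=1000)   # training-time data
    target = rng.normal(loc=0.5, scale=1.0, size=1000)   # production data; mean has drifted
    print(shift_detected(source, target))                # True for this example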

Monitoring ML Systems

If our systems are observable, we can monitor key metrics on:

  • Raw inputs
  • Features
  • Predictions
  • Accuracy

Metrics later in this list, particularly predictions and accuracy-related metrics, are easier to monitor and to interpret.
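
As a minimal sketch of monitoring one of these artifacts, the snippet below tracks the mean of recent predictions against a baseline; the window size and drift tolerance are illustrative assumptions.

    # Sketch: compare the mean of recent predictions to a baseline mean.
    from collections import deque
    import numpy as np

    class PredictionMonitor:
        def __init__(self, baseline_mean, window_size=1000, tolerance=0.05):
            self.baseline_mean = baseline_mean
            self.window = deque(maxlen=window_size)   # most recent predictions
            self.tolerance = tolerance                # illustrative absolute threshold

        def record(self, prediction):
            self.window.append(prediction)

        def drifted(self):
            if len(self.window) < self.window.maxlen:
                return False  # wait for a full window before comparing
            return abs(float(np.mean(self.window)) - self.baseline_mean) > self.tolerance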

Monitoring Tools

There are three main types of monitoring tools:

  • Logs
  • Dashboards
  • Alerts

DMLS focuses on monitoring from the user’s perspective.
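
As a rough sketch of how logs and alerts relate, the snippet below logs each prediction event with Python's standard logging module and raises a warning-level alert when a simple metric crosses a threshold; the metric and threshold are illustrative assumptions, and a dashboard would typically be built on top of the same logged events.

    # Sketch: log prediction events plus a simple alert rule.
    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("model_monitor")

    LOW_CONFIDENCE_ALERT_FRACTION = 0.30   # illustrative alert threshold

    def log_prediction(request_id, prediction, confidence):
        logger.info("request=%s prediction=%s confidence=%.3f",
                    request_id, prediction, confidence)

    def check_alert(recent_confidences):
        low = sum(c < 0.5 for c in recent_confidences) / len(recent_confidences)
        if low > LOW_CONFIDENCE_ALERT_FRACTION:
            logger.warning("ALERT: %.0f%% of recent predictions are low-confidence",
                           100 * low)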

Continual Learning

Continual Learning establishes an infrastructure for retraining models in production. We distinguish:

  • Stateful retraining (continue training the existing model on only the new data)
  • Stateless retraining (train a new model from scratch each time)

It can help address data distribution shifts and the cold start problem, but is challenged by data collection and model management.
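
The difference between the two retraining styles can be sketched with scikit-learn's SGDClassifier, which supports incremental updates through partial_fit; the data arguments here are placeholders.

    # Sketch: stateless vs. stateful retraining with an incremental learner.
    from sklearn.linear_model import SGDClassifier

    def stateless_retrain(X_all, y_all):
        """Train a fresh model from scratch on all available data."""
        model = SGDClassifier()
        model.fit(X_all, y_all)
        return model

    def stateful_retrain(model, X_new, y_new, classes):
        """Continue training the existing model on only the newly collected data."""
        if not hasattr(model, "classes_"):
            # the class labels must be supplied on the first incremental call
            model.partial_fit(X_new, y_new, classes=classes)
        else:
            model.partial_fit(X_new, y_new)
        return model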

Test in Production

Test in Production is the practice of proactively evaluating candidate models on live traffic rather than relying solely on offline evaluation. We distinguish between:

  • Shadow deployment
  • A/B testing
  • Canary testing

It’s commonly used in large ML systems.
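
As a rough sketch of how shadow deployment and canary testing might be wired into a prediction service (the routing fraction and the model objects are illustrative assumptions):

    # Sketch: route live requests to a candidate model in shadow or canary mode.
    import logging
    import random

    logger = logging.getLogger("deployment")
    CANARY_FRACTION = 0.05   # illustrative: 5% of traffic goes to the candidate

    def predict(request, current_model, candidate_model, mode="shadow"):
        if mode == "shadow":
            # Shadow deployment: the candidate scores every request, but its
            # prediction is only logged for offline comparison, never served.
            logger.info("shadow prediction: %s", candidate_model.predict(request))
            return current_model.predict(request)
        if mode == "canary":
            # Canary testing: a small fraction of requests are served by the candidate.
            if random.random() < CANARY_FRACTION:
                return candidate_model.predict(request)
        return current_model.predict(request)

A/B testing follows the same routing idea as the canary branch, but assigns users to the two models deliberately so that their metrics can be compared statistically.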