Model Monitoring
Keith VanderLinden
Calvin University
ML System Failures
ML systems fail if:
- The software system fails to operate as expected.
- The ML model fails to perform as expected.
The first failure mode is shared by all software systems; the second is unique to ML systems.
Data Distribution Shifts
We distinguish these distributions:
- The source distribution, on which the model is trained
- The target distribution, on which the model runs inference in production
Rarely are these distributions the same, and rarely do they stay the same over time.
Detecting and addressing distribution shifts are crucial for maintaining ML system performance.
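As a sketch of how such a shift might be detected, the snippet below compares one feature's training and serving distributions with a two-sample Kolmogorov-Smirnov test; the feature ("age"), sample sizes, and significance threshold are illustrative assumptions, not part of these notes.

```python
# Sketch: flag a feature whose serving distribution has drifted from
# its training distribution (here, a hypothetical "age" feature).
import numpy as np
from scipy.stats import ks_2samp

def detect_shift(train_col, serve_col, alpha=0.01):
    """Return True if the two samples differ significantly."""
    result = ks_2samp(train_col, serve_col)
    return result.pvalue < alpha

rng = np.random.default_rng(0)
train_ages = rng.normal(35, 10, size=5_000)   # values seen during training
serve_ages = rng.normal(42, 10, size=1_000)   # recent production values
if detect_shift(train_ages, serve_ages):
    print("possible covariate shift in 'age'")
```

Running one such test per feature is a common starting point; in practice the test and threshold would be tuned to the traffic volume and feature types.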
Monitoring ML Systems
If our systems are observable, we can monitor key metrics on:
- Raw inputs
- Features
- Predictions
- Accuracy
Metrics later in this list are easier to monitor and to interpret.
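A minimal sketch of batch monitoring across these four artifact types follows; the specific statistics (payload size, per-column feature means, positive-prediction rate, accuracy) and the logger name are assumptions chosen for illustration.

```python
import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model_monitor")

def monitor_batch(raw_sizes, features, predictions, labels=None):
    # Raw inputs: e.g., request payload sizes.
    log.info("raw inputs: mean_size=%.1f", float(np.mean(raw_sizes)))
    # Features: per-column means, to spot drifting feature values.
    log.info("features: column_means=%s", np.round(features.mean(axis=0), 3))
    # Predictions: fraction of positive predictions in this batch.
    log.info("predictions: positive_rate=%.3f", float(np.mean(predictions)))
    # Accuracy: only computable once ground-truth labels arrive.
    if labels is not None:
        log.info("accuracy=%.3f", float(np.mean(predictions == labels)))

# Hypothetical usage on one batch of requests.
rng = np.random.default_rng(0)
monitor_batch(raw_sizes=rng.integers(200, 800, size=100),
              features=rng.normal(size=(100, 4)),
              predictions=rng.integers(0, 2, size=100),
              labels=rng.integers(0, 2, size=100))
```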
Continual Learning
Continual Learning establishes an infrastructure for retraining models in production. We distinguish:
- Stateful retraining, which updates the existing model incrementally on new data
- Stateless retraining, which retrains the model from scratch on all data
Continual learning can help address data distribution shifts and the cold-start problem, but it is complicated by the demands of data collection and model management.
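The sketch below contrasts the two retraining modes using scikit-learn's SGDClassifier on placeholder data; it illustrates the distinction rather than prescribing an implementation.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(1_000, 5)), rng.integers(0, 2, size=1_000)
X_new, y_new = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)

# Stateless retraining: fit a fresh model from scratch on all available data.
stateless = SGDClassifier(random_state=0)
stateless.fit(np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]))

# Stateful retraining: keep the existing model and update it on new data only.
stateful = SGDClassifier(random_state=0)
stateful.partial_fit(X_old, y_old, classes=np.array([0, 1]))  # original model
stateful.partial_fit(X_new, y_new)                            # incremental update
```

Stateful retraining is typically cheaper because only the new data is touched, while stateless retraining avoids accumulating stale model state.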
Test in Production
Test in Production is a practice of proactively testing models in production. We distinguish between:
- Shadow deployment, which runs the candidate model in parallel but never serves its predictions
- A/B testing, which serves the candidate model to a portion of traffic and compares outcomes
- Canary testing, which rolls the candidate model out gradually to a small subset of users
It’s commonly used in large ML systems.
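The sketch below illustrates shadow deployment and canary testing; the champion/challenger models, routing fraction, and logging are placeholder assumptions.

```python
import logging
import random

import numpy as np
from sklearn.dummy import DummyClassifier

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("test_in_production")

def serve_shadow(features, champion, challenger):
    """Shadow deployment: the champion's prediction is served; the
    challenger's prediction is only logged for later comparison."""
    served = champion.predict([features])[0]
    shadowed = challenger.predict([features])[0]
    log.info("champion=%s challenger=%s agree=%s",
             served, shadowed, served == shadowed)
    return served

def serve_canary(features, champion, challenger, fraction=0.05):
    """Canary testing: route a small fraction of live traffic to the
    challenger and the rest to the champion."""
    model = challenger if random.random() < fraction else champion
    return model.predict([features])[0]

# Hypothetical usage with two stand-in models.
X, y = np.random.default_rng(0).normal(size=(50, 3)), [0, 1] * 25
old_model = DummyClassifier(strategy="most_frequent").fit(X, y)
new_model = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
print(serve_shadow(X[0], old_model, new_model))
print(serve_canary(X[0], old_model, new_model))
```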