Modeling is an iterative process in which we experiment with different models and model configurations. Best practices:
Simple is better than complex.
People are biased.
Time changes things.
There are no perfect solutions.
All models are wrong.
Experiment Tracking and Versioning
Success in modeling often depends on experimentation with:
Datasets and Preprocessing
Model Architectures & Hyperparameters
All leading to different:
Evaluation results
Memory and compute requirements
We must be able to compare these experiments (tracking) and to reproduce them (versioning).
Distributed Training
In production, the simplifying assumptions we make in academic work are generally not true.
Datasets can’t be fit into main memory.
Models can’t be trained in a single machine.
“Big data” is when your workflow breaks. — R. Pruim, MDSR2e
AutoML
Much of the success of machine learning has been due to the deployment of relatively simple approaches trained on relatively voluminous datasets and powerful compute engines.
AutoML applies this approach to:
Hyper-parameter tuning
Architecture search
Phases of ML Adoption
Huyen presents these phrases of adoption for ML.
Before ML
Simple models
Optimizing
Complex models
I found these to be a helpful tonic against ML mania.