[ML Design Patterns] Model Training Patterns

 

Design Pattern 1: Useful Overfitting

Overfitting is useful when:

  1. The entire input space can be tabulated (you have all possible examples) and there is no noise (labels are accurate for all instances)
  2. Knowledge distillation from a larger ML model into a smaller one (see the sketch below)
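
A common reading of point 2: the student network trains against the teacher's soft predictions, which are learnable and effectively noise-free, so fitting them very closely is acceptable. Below is a minimal sketch of a distillation loss in PyTorch; the temperature and alpha values are illustrative defaults, not prescribed ones.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the teacher's soft targets with the hard labels."""
    # Temperature softens both distributions so the student can learn
    # from the teacher's relative class probabilities.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(soft_preds, soft_targets,
                         reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```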
Points:
  1. The best-fitting model is a "large" model that has been properly "regularized"
  2. A sufficiently complex model should be able to overfit a small batch of data, provided everything is set up correctly; if it cannot, something is wrong with the setup (see the sketch below)
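
Point 2 doubles as a standard sanity check: train repeatedly on one fixed small batch and confirm the loss can be driven close to zero. A minimal sketch in PyTorch; the model, data, and hyperparameters are illustrative placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
x = torch.randn(32, 20)           # one small batch, reused every step
y = torch.randint(0, 3, (32,))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# If the setup is correct, the loss should approach zero on this batch;
# if it plateaus, suspect the loss wiring, the labels, or the optimizer setup.
print(f"final loss on the memorized batch: {loss.item():.4f}")
```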


Design Pattern 2: Checkpoints

Checkpointing: saving the full model state (the entire internal state) so that training can be resumed from that point. Examples of the model's state:
  1. Dropout state
  2. Learning rate, if training uses a scheduler
  3. History of previous inputs, in the case of RNNs

  • Exporting a model saves only the information needed to build the prediction function (e.g., weights and biases for a linear model), whereas checkpointing saves the entire state
  • TensorFlow and Keras automatically resume training from a checkpoint if checkpoints are found in the training path
  • PyTorch does not resume automatically; checkpoints must be saved and restored explicitly, as sketched below:
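A minimal sketch of manual checkpointing in PyTorch; the model, optimizer, and scheduler here are illustrative placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# Save: bundle everything needed to resume training, not just the weights.
torch.save({
    "epoch": 5,                                 # where training stopped
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),  # momentum buffers, etc.
    "scheduler_state": scheduler.state_dict(),  # position in the LR schedule
}, "checkpoint.pt")

# Resume: rebuild the objects first, then restore their state.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
scheduler.load_state_dict(checkpoint["scheduler_state"])
start_epoch = checkpoint["epoch"] + 1
```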

Uses of checkpointing:
  1. Resilience: Robustness against machine failure
  2. Generalization: early stopping (keep the checkpoint with the best validation performance; see the sketch below)
  3. Tuneability: fine-tuning from a particular point (a particular checkpoint)
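
Uses 2 and 3 map naturally onto Keras callbacks: ModelCheckpoint keeps the best checkpoint seen so far, and EarlyStopping halts training once validation loss stops improving. A minimal sketch, assuming a recent TensorFlow; the data and model are toy placeholders.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

callbacks = [
    # Generalization: keep only the best checkpoint seen so far.
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_loss", save_best_only=True),
    # Early stopping: halt once validation loss stops improving.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True),
]
model.fit(x, y, validation_split=0.2, epochs=100, callbacks=callbacks)
```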

