Design Pattern 1: Useful Overfitting
Overfitting is useful when:
- Entire input space can be tabulated (you have all the possible examples) and there is no noise (Labels are accurate for all instances)
- Knowledge Distillation from Larger ML model into smaller ML model
- The best fitting model is a "large" model that has been properly "regularized"
- A complex enough model should be able to overfit on a small enough batch of data, provided everything is setup correctly. If not, something wrong with the setup.
Design Pattern 2: Checkpoints
Checkpointing: Saving full model state (entire internal state) so that model training can be resumed from a point. Example of model's state:
- Dropout
- Learning rate, if the model uses scheduler
- History of previous inputs, in case of RNNs
- Exporting model only exports information necessary to create prediction function (Eg: weights and biases for linear model) whereas checkpointing saves entire state
- Tensorflow and Keras automatically resume training from checkpoint if checkpoints are found in the training path.
- This is not available in Pytorch. This is how its done in pytorch:
Uses of checkpointing:
- Resilience: Robustness against machine failure
- Generalization: Early Stoppping
- Tuneability: Fine tuning from a particular point (particular checkpoint)
Comments
Post a Comment