SageMaker Checkpointing

Purpose

Save the state of an ML model during training.

Usage

- Save a snapshot of the model.
- Restart a training job from this snapshot.
- Analyze the intermediate state of the model during training.
- Use checkpoints with managed spot instances to save cost: an interrupted job can resume from the last checkpoint instead of starting over.

How does it work?

Training code runs in training containers on EC2 instances. Uses […]
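
The save-and-restart usage above can be sketched in plain Python. This is a minimal illustration, not SageMaker's own mechanism: the helper names (`latest_checkpoint`, `save_checkpoint`, `train`), the `epoch-N.json` file naming, and the toy "weight" state are all hypothetical. In a real training job the checkpoint directory would typically be the container's local checkpoint path (by default `/opt/ml/checkpoints`, as far as I know); here it is just a directory passed in by the caller.

```python
import json
import os
import tempfile


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) of the newest checkpoint, or (None, None) if none exists."""
    if not os.path.isdir(checkpoint_dir):
        return None, None
    epochs = []
    for name in os.listdir(checkpoint_dir):
        # Hypothetical naming scheme: epoch-<N>.json
        if name.startswith("epoch-") and name.endswith(".json"):
            epochs.append(int(name[len("epoch-"):-len(".json")]))
    if not epochs:
        return None, None
    last = max(epochs)
    return last, os.path.join(checkpoint_dir, f"epoch-{last}.json")


def save_checkpoint(checkpoint_dir, epoch, state):
    """Write the state to a temp file, then rename it, so a partial write is never read."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    final_path = os.path.join(checkpoint_dir, f"epoch-{epoch}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, final_path)


def train(checkpoint_dir, total_epochs, stop_after=None):
    """Toy training loop that resumes from the newest checkpoint if one exists.

    stop_after simulates an interruption (for example, a reclaimed spot
    instance) by returning right after that epoch's checkpoint is saved.
    """
    last_epoch, path = latest_checkpoint(checkpoint_dir)
    if path is None:
        state = {"weight": 0.0}  # fresh start
        start_epoch = 0
    else:
        with open(path) as f:
            state = json.load(f)  # restore the saved snapshot
        start_epoch = last_epoch + 1

    for epoch in range(start_epoch, total_epochs):
        state["weight"] += 1.0  # stand-in for one epoch of real training
        save_checkpoint(checkpoint_dir, epoch, state)
        if stop_after is not None and epoch == stop_after:
            return state
    return state
```

Calling `train(d, 5, stop_after=1)` and then `train(d, 5)` on the same directory `d` (for example one created with `tempfile.mkdtemp()`) runs epochs 0 and 1 in the first call and only epochs 2 to 4 in the second, which is the behavior that makes interrupted jobs cheap to resume.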