Progressive Image Resizing

(Figure: schematic of progressive image resizing over the course of training.)

Applicable Settings: Vision, Increased GPU Throughput, Reduced GPU Memory Usage, Method, Curriculum, Speedup

TL;DR

Progressive Resizing works by initially shrinking the training images and gradually growing them back to their full size by the end of training. It reduces costs during the early phase of training, when the network may learn coarse-grained features that do not require the details lost by reducing image resolution.

Attribution

Inspired by the progressive resizing technique proposed by fast.ai.

Applicable Settings

Progressive Resizing is intended for use on computer vision tasks where the network architecture can accommodate inputs of different sizes.

Hyperparameters

  • initial_scale - The initial scaling coefficient used to determine the height and width of images at the beginning of training. For example, the default value of 0.5 shrinks a 224x224 image to 112x112.

  • finetune_fraction - The fraction of training steps that should be devoted to training on full-sized images. The default value of 0.2 means that there is an initial phase lasting 80% of max_epochs, during which the image scale increases linearly from initial_scale to 1.0, followed by a fine-tuning phase lasting 20% of max_epochs at a scale of 1.0 (a sketch of this schedule appears after this list).

  • mode - The method by which images should be resized. Currently, the two implemented methods are "crop", where the image is randomly cropped to the desired size, and "resize", where the image is downsampled to the desired size using bilinear interpolation.

  • resize_targets - Whether the targets should be downsampled in the same fashion. This is appropriate for some tasks, such as segmentation, where elements of the output correspond to elements of the input image.
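To make the interaction between these two hyperparameters concrete, here is a minimal sketch of the resulting scale schedule. The scale_at helper is hypothetical (not part of the library); it assumes the linear ramp described above:

```python
def scale_at(progress: float, initial_scale: float = 0.5, finetune_fraction: float = 0.2) -> float:
    """Hypothetical helper: image scale at a given fraction of training (0.0 to 1.0)."""
    ramp_end = 1.0 - finetune_fraction  # point at which full size is reached
    if progress >= ramp_end:
        return 1.0  # fine-tuning phase: full-sized images
    # linear ramp from initial_scale up to 1.0
    return initial_scale + (1.0 - initial_scale) * (progress / ramp_end)

# With the defaults, a 224x224 image starts at 112x112 and reaches
# full size at 80% of training:
for progress in (0.0, 0.4, 0.8, 1.0):
    print(progress, round(224 * scale_at(progress)))  # 112, 168, 224, 224
```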

Example Effects

When using Progressive Resizing, the early steps of training run faster than the later steps (which run at the original speed), since smaller images reduce the amount of computation the network must perform. Ideally, generalization performance is not impacted much by Progressive Resizing, but this depends on the specific dataset, network architecture, task, and hyperparameters. In our experience with ResNets on ImageNet, Progressive Resizing improves training speed (as measured by wall-clock time) with negligible effects on classification accuracy.
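As a rough intuition for the expected savings, here is a back-of-envelope estimate under the assumption that compute scales with the number of pixels, i.e. with the square of the scale factor; it is not a measurement:

```python
# Back-of-envelope compute estimate for the default schedule. Assumes cost is
# proportional to scale**2 (pixel count); real speedups also depend on
# dataloading, memory bandwidth, and kernel efficiency.
initial_scale, finetune_fraction = 0.5, 0.2
s0, ramp = initial_scale, 1.0 - finetune_fraction

# integral of (s0 + (1 - s0) * t)**2 over t in [0, 1] is (1 + s0 + s0**2) / 3
ramp_cost = ramp * (1 + s0 + s0**2) / 3
total_cost = ramp_cost + finetune_fraction * 1.0

print(f"{total_cost:.2f}x the compute of full-size training")  # ~0.67x
```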

Implementation Details

Our implementation of Progressive Resizing gives two options for resizing the images:

mode = "crop" does a random crop of the input image to a smaller size. This mode is appropriate for datasets where scale is important. For example, we get better results using crops for ResNet-56 on CIFAR-10, where the objects are similar sizes to one another and the images are already low resolution.

mode = "resize" does downsampling with a bilinear interpolation of the image to a smaller size. This mode is appropriate for datasets where scale is variable, all the content of the image is needed each time it is seen, or the images are relatively higher resolution. For example, we get better results using resizing for ResNet-50 on ImageNet.

Suggested Hyperparameters

initial_scale = 0.5 is a reasonable starting point. This starts training on images where each side length has been reduced by 50%.

finetune_fraction = 0.2 is a reasonable starting point. This reserves the final 20% of training for fine-tuning on full-sized images.
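Putting these together, here is a sketch of how one might enable Progressive Resizing with the suggested hyperparameters via the Composer Trainer; the model and dataloader variables are placeholders for your own objects:

```python
from composer import Trainer
from composer.algorithms import ProgressiveResizing

progressive_resizing = ProgressiveResizing(
    mode="resize",          # "crop" may work better for low-resolution datasets
    initial_scale=0.5,      # start at half the side length, e.g. 224 -> 112
    finetune_fraction=0.2,  # reserve the final 20% of training for full-sized images
    resize_targets=False,   # set True for dense tasks such as segmentation
)

trainer = Trainer(
    model=model,                        # placeholder: your ComposerModel
    train_dataloader=train_dataloader,  # placeholder: your training dataloader
    max_duration="90ep",
    algorithms=[progressive_resizing],
)
trainer.fit()
```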

Considerations

Progressive Resizing requires that the network architecture be capable of handling different sized images. Additionally, since the early epochs of training require significantly less GPU compute than the later epochs, CPU/dataloading may become a bottleneck in the early epochs even if this isn’t true in the late epochs.

Additionally, while we have not investigated this, Progressive Resizing may also change how sensitive the network is to different sizes of objects, or how biased the network is in favor of shape or texture.

Composability

Progressive Resizing will interact with other methods that change the size of the inputs, such as Selective Backprop with downsampling and ColOut.

Detailed Results

Using the recommendations above, we ran a baseline ResNet-50 model on CIFAR-10 and ImageNet with and without Progressive Resizing. CIFAR-10 runs were done on a single NVIDIA 3080 GPU for 200 epochs. ImageNet runs were done on 8x NVIDIA 3080 GPUs for 90 epochs. Shown below are the validation set accuracies and time-to-train for each of these runs.

(Table: validation set accuracy and time-to-train for each run.)

Code

class composer.algorithms.progressive_resizing.ProgressiveResizing(mode: str = 'resize', initial_scale: float = 0.5, finetune_fraction: float = 0.2, resize_targets: bool = False)

Apply fast.ai's progressive resizing data augmentation to speed up training.

Progressive resizing initially reduces input resolution to speed up early training. Throughout training, the input scale is gradually increased, yielding larger inputs up to the original input size. A final fine-tuning period is then run on the full-sized inputs.

Parameters
  • mode – Type of scaling to perform. Value must be one of 'crop' or 'resize'. 'crop' performs a random crop, whereas 'resize' performs a bilinear interpolation.

  • initial_scale – Initial scale factor used to shrink the inputs. Must be a value in between 0 and 1.

  • finetune_fraction – Fraction of training to reserve for finetuning on the full-sized inputs. Must be a value in between 0 and 1.

  • resize_targets – If True, resize targets also.

apply(event: composer.core.event.Event, state: composer.core.state.State, logger: Optional[composer.core.logging.logger.Logger] = None) -> None

Applies ProgressiveResizing on input images.

Parameters
  • event (Event) – the current event

  • state (State) – the current trainer state

  • logger (Logger) – the training logger

match(event: composer.core.event.Event, state: composer.core.state.State) -> bool

Runs on Event.AFTER_DATALOADER.

Parameters
  • event (Event) – The current event.

  • state (State) – The current state.

Returns

bool – True if this algorithm should run now.

composer.algorithms.progressive_resizing.resize_inputs(X: torch.Tensor, y: torch.Tensor, scale_factor: float, mode: str = 'resize', resize_targets: bool = False)

Resize inputs and optionally outputs by cropping or interpolating.

Parameters
  • X – input tensor of shape (N, C, H, W). Resizing will be done along dimensions H and W using the constant factor scale_factor.

  • y – output tensor of shape (N, C, H, W) that will also be resized if resize_targets is True.

  • scale_factor – scaling coefficient for the height and width of the input/output tensor. 1.0 keeps the original size.

  • mode – type of scaling to perform. Value must be one of 'crop' or 'resize'. 'crop' performs a random crop, whereas 'resize' performs a bilinear interpolation.

  • resize_targets – whether to resize the targets, y, as well

Returns
  • X_sized – resized input tensor of shape (N, C, H * scale_factor, W * scale_factor).

  • y_sized – if resize_targets is True, the resized output tensor of shape (N, C, H * scale_factor, W * scale_factor). Otherwise, the original y.
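A quick usage sketch of the functional form (the tensor shapes are illustrative; dense targets are used here so that resize_targets applies):

```python
import torch
from composer.algorithms.progressive_resizing import resize_inputs

X = torch.randn(8, 3, 224, 224)  # batch of input images (N, C, H, W)
y = torch.randn(8, 1, 224, 224)  # dense targets, e.g. segmentation masks

# halve the spatial resolution of both inputs and targets
X_small, y_small = resize_inputs(X, y, scale_factor=0.5, mode="resize", resize_targets=True)
print(X_small.shape, y_small.shape)  # (8, 3, 112, 112), (8, 1, 112, 112)
```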