CIFAR ResNet

Category of Task: Vision

Kind of Task: Image Classification

Overview

The ResNet model family is a set of convolutional neural networks that can be used as the basis for a variety of vision tasks. CIFAR ResNet models are a subset of this family designed specifically for the CIFAR-10 and CIFAR-100 datasets.

Attribution

Paper: Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun (2015). Note that this paper set the standard for ResNet-style architectures on both CIFAR-10/100 and ImageNet.

Architecture

Residual Networks are feedforward convolutional networks with “residual” connections between non-consecutive layers.
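
The basic building block can be sketched in PyTorch as follows. This is a minimal illustrative sketch rather than any particular repository's implementation; the `BasicBlock` name is an assumption, and the 1×1 projection shortcut corresponds to the paper's option B (the paper's CIFAR experiments used parameter-free identity shortcuts with zero-padding).

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut that adds the input back in."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Assumed variant: a 1x1 projection when the shape changes (option B);
        # the paper's CIFAR results used identity shortcuts with zero-padding.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # the residual connection
```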

The model architecture is defined by the original paper:

  • The network inputs are of dimension 32×32×3.

  • The first layer is a 3×3 convolution.

  • The subsequent layers are a stack of 6n layers with 3×3 convolutions on feature maps of sizes {32, 16, 8}, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} for the respective feature map sizes. Subsampling is performed by convolutions with a stride of 2.

  • The network ends with global average pooling, a linear layer whose output dimension equals the number of classes, and a softmax function.

There are a total of 6n+2 stacked weighted layers. Each family member is specified by its number of layers; for example, n=9 corresponds to ResNet56.
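
Assuming the `BasicBlock` above, the full 6n+2-layer model can be sketched as follows; `CIFARResNet` and `_make_stage` are illustrative names, not from any released implementation.

```python
class CIFARResNet(nn.Module):
    """Sketch of the 6n+2-layer CIFAR ResNet: a 3x3 stem, three stages of n
    residual blocks each (2n conv layers per stage), and a linear classifier."""
    def __init__(self, n, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1, bias=False)  # first 3x3 conv
        self.bn1 = nn.BatchNorm2d(16)
        # Three stages on 32x32, 16x16, and 8x8 feature maps with 16/32/64
        # filters; a stride-2 convolution performs the subsampling between stages.
        self.stage1 = self._make_stage(16, 16, n, stride=1)
        self.stage2 = self._make_stage(16, 32, n, stride=2)
        self.stage3 = self._make_stage(32, 64, n, stride=2)
        self.fc = nn.Linear(64, num_classes)

    @staticmethod
    def _make_stage(in_channels, out_channels, n, stride):
        blocks = [BasicBlock(in_channels, out_channels, stride)]
        blocks += [BasicBlock(out_channels, out_channels) for _ in range(n - 1)]
        return nn.Sequential(*blocks)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.stage3(self.stage2(self.stage1(out)))
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)  # global average pooling
        return self.fc(out)  # softmax is folded into the loss during training

resnet56 = CIFARResNet(n=9)  # 6*9 + 2 = 56 weighted layers
```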

The biggest differences between CIFAR ResNet models and ImageNet ResNet models are:

  • CIFAR ResNet models use fewer filters for each convolution.

  • The ImageNet ResNets contain four stages, while the CIFAR ResNets contain three. In addition, CIFAR ResNets distribute blocks uniformly across their stages, while ImageNet ResNets use a hand-chosen number of blocks per stage (see the sketch after this list).
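
To make the stage layouts concrete, here is an illustrative comparison of per-stage block counts (the ResNet-50 figures are from the same paper):

```python
# CIFAR ResNet56: three stages with the same number of blocks each (n = 9).
cifar_resnet56_stages = [9, 9, 9]
# ImageNet ResNet-50: four stages with a hand-chosen block count per stage.
imagenet_resnet50_stages = [3, 4, 6, 3]
```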

Family members

| Model Family Members | Parameter Count | Our Accuracy | Training Time on 1x 3080 |
| -------------------- | --------------- | ------------ | ------------------------ |
| ResNet20             | 0.27M           | TBA          | TBA                      |
| ResNet32             | 0.46M           | TBA          | TBA                      |
| ResNet44             | 0.66M           | TBA          | TBA                      |
| ResNet56             | 0.85M           | 93.1%        | 35 min                   |
| ResNet110            | 1.7M            | TBA          | TBA                      |
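
As a sanity check, the parameter counts in the table can be approximately reproduced from the sketch above (the exact figure depends on the shortcut variant):

```python
def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

for n, name in [(3, "ResNet20"), (5, "ResNet32"), (7, "ResNet44"),
                (9, "ResNet56"), (18, "ResNet110")]:
    print(f"{name}: {count_parameters(CIFARResNet(n)) / 1e6:.2f}M")
```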

Default Training Hyperparameters

  • Optimizer: SGD

    • Learning rate: 1.2

    • Momentum: 0.9

    • Weight decay: 1e-4

  • Batch size: 1024

  • LR Schedulers (see the sketch after this list)

    • Linear warmup for 5 epochs

    • Multistep decay by 0.1 at epochs 80 and 120

  • Number of epochs: 160
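
These hyperparameters could be assembled in PyTorch roughly as follows. This is a sketch under assumptions: the warmup `start_factor` is arbitrary, and because `SequentialLR` restarts the second scheduler at the switch, its milestones are offset by the 5 warmup epochs so the decays land on absolute epochs 80 and 120.

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import LinearLR, MultiStepLR, SequentialLR

model = CIFARResNet(n=9)  # ResNet56 from the sketch above
optimizer = SGD(model.parameters(), lr=1.2, momentum=0.9, weight_decay=1e-4)

warmup = LinearLR(optimizer, start_factor=0.01, total_iters=5)   # 5-epoch warmup
decay = MultiStepLR(optimizer, milestones=[75, 115], gamma=0.1)  # epochs 80, 120
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[5])

for epoch in range(160):
    # ... one epoch of training with batch size 1024 ...
    scheduler.step()  # schedulers are stepped once per epoch
```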