CIFAR ResNet

Category of Task: Vision

Kind of Task: Image Classification

Overview

The ResNet model family is a set of convolutional neural networks that can be used as the basis for a variety of vision tasks. CIFAR ResNet models are a subset of this family designed specifically for the CIFAR-10 and CIFAR-100 datasets.

Attribution

Paper: Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun (2015). Note that this paper set the standard for ResNet-style architectures on both CIFAR-10/100 and ImageNet.

Architecture

Residual Networks are feedforward convolutional networks with “residual” connections between non-consecutive layers.
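
The basic building block can be sketched in PyTorch as follows. This is a minimal illustrative sketch rather than any particular repository's implementation; the `BasicBlock` name is an assumption, and the 1×1 projection shortcut corresponds to the paper's option B (the paper's CIFAR experiments used parameter-free identity shortcuts with zero-padding).

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut that adds the input back in."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Assumed variant: a 1x1 projection when the shape changes (option B);
        # the paper's CIFAR results used identity shortcuts with zero-padding.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # the residual connection
```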

The model architecture is defined by the original paper:

  • The network inputs are of dimension 32×32×3.

  • The first layer is a 3×3 convolution.

  • The subsequent layers are a stack of 6n layers with 3×3 convolutions on feature maps of sizes {32, 16, 8}, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} for the respective feature map sizes. Subsampling is performed by convolutions with a stride of 2.

  • The network ends with global average pooling, a linear layer whose output dimension equals the number of classes, and a softmax function.

There are a total of 6n+2 stacked weighted layers. Each family member is specified by its number of layers; for example, n=9 corresponds to ResNet56.
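
Assuming the `BasicBlock` above, the full 6n+2-layer model can be sketched as follows; `CIFARResNet` and `_make_stage` are illustrative names, not from any released implementation.

```python
class CIFARResNet(nn.Module):
    """Sketch of the 6n+2-layer CIFAR ResNet: a 3x3 stem, three stages of n
    residual blocks each (2n conv layers per stage), and a linear classifier."""
    def __init__(self, n, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1, bias=False)  # first 3x3 conv
        self.bn1 = nn.BatchNorm2d(16)
        # Three stages on 32x32, 16x16, and 8x8 feature maps with 16/32/64
        # filters; a stride-2 convolution performs the subsampling between stages.
        self.stage1 = self._make_stage(16, 16, n, stride=1)
        self.stage2 = self._make_stage(16, 32, n, stride=2)
        self.stage3 = self._make_stage(32, 64, n, stride=2)
        self.fc = nn.Linear(64, num_classes)

    @staticmethod
    def _make_stage(in_channels, out_channels, n, stride):
        blocks = [BasicBlock(in_channels, out_channels, stride)]
        blocks += [BasicBlock(out_channels, out_channels) for _ in range(n - 1)]
        return nn.Sequential(*blocks)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.stage3(self.stage2(self.stage1(out)))
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)  # global average pooling
        return self.fc(out)  # softmax is folded into the loss during training

resnet56 = CIFARResNet(n=9)  # 6*9 + 2 = 56 weighted layers
```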

The biggest differences between CIFAR ResNet models and ImageNet ResNet models are:

  • CIFAR ResNet models use fewer filters for each convolution.

  • The ImageNet ResNets contain four stages, while the CIFAR ResNets contain three. In addition, CIFAR ResNets distribute blocks uniformly across their stages, while ImageNet ResNets use a hand-chosen number of blocks per stage (see the sketch after this list).
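
To make the stage layouts concrete, here is an illustrative comparison of per-stage block counts (the ResNet-50 figures are from the same paper):

```python
# CIFAR ResNet56: three stages with the same number of blocks each (n = 9).
cifar_resnet56_stages = [9, 9, 9]
# ImageNet ResNet-50: four stages with a hand-chosen block count per stage.
imagenet_resnet50_stages = [3, 4, 6, 3]
```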

Family members

| Model Family Members | Parameter Count | Our Accuracy | Training Time on 1x 3080 |
| -------------------- | --------------- | ------------ | ------------------------ |
| ResNet20             | 0.27M           | TBA          | TBA                      |
| ResNet32             | 0.46M           | TBA          | TBA                      |
| ResNet44             | 0.66M           | TBA          | TBA                      |
| ResNet56             | 0.85M           | 93.1%        | 35 min                   |
| ResNet110            | 1.7M            | TBA          | TBA                      |
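
As a sanity check, the parameter counts in the table can be approximately reproduced from the sketch above (the exact figure depends on the shortcut variant):

```python
def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

for n, name in [(3, "ResNet20"), (5, "ResNet32"), (7, "ResNet44"),
                (9, "ResNet56"), (18, "ResNet110")]:
    print(f"{name}: {count_parameters(CIFARResNet(n)) / 1e6:.2f}M")
```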

Default Training Hyperparameters

  • Optimizer: SGD

    • Learning rate: 1.2

    • Momentum: 0.9

    • Weight decay: 1e-4

  • Batch size: 1024

  • LR Schedulers (see the sketch after this list)

    • Linear warmup for 5 epochs

    • Multistep decay by 0.1 at epochs 80 and 120

  • Number of epochs: 160
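
These hyperparameters could be assembled in PyTorch roughly as follows. This is a sketch under assumptions: the warmup `start_factor` is arbitrary, and because `SequentialLR` restarts the second scheduler at the switch, its milestones are offset by the 5 warmup epochs so the decays land on absolute epochs 80 and 120.

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import LinearLR, MultiStepLR, SequentialLR

model = CIFARResNet(n=9)  # ResNet56 from the sketch above
optimizer = SGD(model.parameters(), lr=1.2, momentum=0.9, weight_decay=1e-4)

warmup = LinearLR(optimizer, start_factor=0.01, total_iters=5)   # 5-epoch warmup
decay = MultiStepLR(optimizer, milestones=[75, 115], gamma=0.1)  # epochs 80, 120
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[5])

for epoch in range(160):
    # ... one epoch of training with batch size 1024 ...
    scheduler.step()  # schedulers are stepped once per epoch
```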