CIFAR ResNet
Category of Task: Vision
Kind of Task: Image Classification
Overview
The ResNet model family is a set of convolutional neural networks that can be used as the basis for a variety of vision tasks. CIFAR ResNet models are a subset of this family designed specifically for the CIFAR-10 and CIFAR-100 datasets.
Attribution
Paper: Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun (2015). Note that this paper set the standard for ResNet-style architectures on both CIFAR-10/100 and ImageNet.
Architecture
Residual networks are feedforward convolutional networks with “residual” shortcut connections between non-consecutive layers.
The model architecture is defined by the original paper:
- The network inputs are of dimension 32×32×3.
- The first layer is a 3×3 convolution with 16 filters.
- The subsequent layers are a stack of 6n layers with 3×3 convolutions on feature maps of sizes {32, 16, 8}, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} for the respective feature map sizes. Subsampling is performed by convolutions with a stride of 2.
- The network ends with global average pooling, a linear layer whose output dimension equals the number of classes, and a softmax.
- There are a total of 6n+2 stacked weighted layers. Each family member is specified by n; for example, n=9 corresponds to ResNet56.
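The 6n+2 relationship between n and the network depth can be sketched as a small helper (a minimal sketch; the function name is illustrative):

```python
def cifar_resnet_layout(n):
    """Return (depth, stages) for a CIFAR ResNet parameterized by n.

    Layers: one initial 3x3 conv, then 2n 3x3 convs per stage for the
    three feature-map sizes (32, 16, 8), then the final linear layer.
    stages maps feature-map size -> (number of conv layers, filters).
    """
    stages = {32: (2 * n, 16), 16: (2 * n, 32), 8: (2 * n, 64)}
    depth = 1 + 3 * (2 * n) + 1  # 6n + 2 weighted layers in total
    return depth, stages

# n = 9 gives the 56-layer member of the family
depth, stages = cifar_resnet_layout(9)
# depth == 56; stages[32] == (18, 16)
```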
The biggest differences between CIFAR ResNet models and ImageNet ResNet models are:
- CIFAR ResNet models use fewer filters for each convolution.
- The ImageNet ResNets contain four stages, while the CIFAR ResNets contain three. In addition, CIFAR ResNets distribute blocks uniformly across the stages, while ImageNet ResNets use a specific number of blocks for each stage.
Family members
| Model Family Members | Parameter Count | Our Accuracy | Training Time on 1x3080 |
|---|---|---|---|
| ResNet20 | 0.27M | TBA | TBA |
| ResNet32 | 0.46M | TBA | TBA |
| ResNet44 | 0.66M | TBA | TBA |
| ResNet56 | 0.85M | 93.1% | 35 min |
| ResNet110 | 1.7M | TBA | TBA |
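The parameter counts above can be reproduced with a short calculation, assuming bias-free 3×3 convolutions, batch norm (scale and shift per channel) after each convolution, and parameter-free identity shortcuts as in the original paper's CIFAR experiments (a sketch; the function name is illustrative):

```python
def cifar_resnet_params(n, num_classes=10):
    """Approximate parameter count for a CIFAR ResNet of depth 6n + 2.

    Assumes bias-free 3x3 convs, batch norm (2 params per channel) after
    each conv, parameter-free shortcuts, and a final linear layer.
    """
    params = 3 * 3 * 3 * 16          # initial 3x3 conv, 3 -> 16 channels
    bn_channels = 16                 # channels normalized so far
    in_ch = 16
    for out_ch in (16, 32, 64):      # three stages of 2n convs each
        for _ in range(2 * n):
            params += 3 * 3 * in_ch * out_ch
            bn_channels += out_ch
            in_ch = out_ch
    params += 2 * bn_channels        # batch-norm scale and shift
    params += 64 * num_classes + num_classes  # final linear layer
    return params

# ResNet56 (n = 9): roughly 0.85M parameters
print(cifar_resnet_params(9))  # -> 853018
```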
Default Training Hyperparameters
- Optimizer: SGD
- Learning rate: 1.2
- Momentum: 0.9
- Weight decay: 1e-4
- Batch size: 1024
- LR schedulers:
  - Linear warmup for 5 epochs
  - Multistep decay by 0.1 at epochs 80 and 120
- Number of epochs: 160
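The resulting learning-rate curve can be sketched as a pure function of the epoch (a minimal sketch; the exact warmup shape — here, linear from near zero to the peak — is an assumption, as is the function name):

```python
def learning_rate(epoch, base_lr=1.2, warmup_epochs=5,
                  milestones=(80, 120), gamma=0.1):
    """LR at the start of `epoch` under the default schedule:
    linear warmup for 5 epochs, then multistep decay by 0.1
    at epochs 80 and 120.
    """
    if epoch < warmup_epochs:
        # linear warmup toward the peak learning rate (assumed shape)
        return base_lr * (epoch + 1) / warmup_epochs
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays

# Peak LR of 1.2 after warmup; 0.12 from epoch 80; 0.012 from epoch 120
```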