🏙️ ResNet#

[How to Use] · [Architecture] · [Family Members] · [Default Training Hyperparameters] · [Attribution] · [API Reference]

Vision / Image Classification

The ResNet model family is a set of convolutional neural networks that can be used as a basis for a variety of vision tasks. Our implementation is a simple wrapper on top of the torchvision ResNet implementation.

How to Use#

from composer.models import composer_resnet

model = composer_resnet(
    model_name="resnet50",
    num_classes=1000,
    weights=None
)

Architecture#

The basic architecture defined in the original papers is as follows:

- The first layer is a 7x7 convolution with stride 2 and 64 filters.
- Subsequent layers are grouped into 4 stages with {64, 128, 256, 512} input channels. The number of residual blocks in each stage depends on the family member. At the end of every stage, the resolution is halved using a convolution with stride 2.
- The final section consists of global average pooling followed by a linear + softmax layer that outputs one value for each of the specified number of classes.
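
To make the repeated halving of the resolution concrete, the sketch below traces the spatial size of a 224x224 input through the stride-2 operations. It is an illustration under assumptions, not part of Composer: the helper names are hypothetical, and the padding values, the 3x3 max pool after the first convolution, and the placement of the stride-2 convolutions in stages 2 to 4 follow the torchvision layout, which is more detail than the description above gives.

```python
from typing import List, Tuple


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding). Assumed from the torchvision layout,
# not stated in the text above.
DOWNSAMPLING_OPS = [
    ("first 7x7 conv", 7, 2, 3),
    ("3x3 max pool", 3, 2, 1),
    ("stage 2", 3, 2, 1),
    ("stage 3", 3, 2, 1),
    ("stage 4", 3, 2, 1),
]


def resolution_trace(input_size: int = 224) -> List[Tuple[str, int]]:
    """Return the feature-map size after each stride-2 operation."""
    trace = []
    size = input_size
    for name, kernel, stride, padding in DOWNSAMPLING_OPS:
        size = conv_output_size(size, kernel, stride, padding)
        trace.append((name, size))
    return trace


for name, size in resolution_trace(224):
    print(f"{name}: {size}x{size}")
```

Five halvings reduce the input by a factor of 32 overall, so global average pooling operates on a 7x7 feature map for 224x224 inputs.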

The below table from He et al. details some of the building blocks for ResNets of different sizes.

Family Members#

ResNet family members are identified by their number of layers. Parameter count, accuracy, and training time are provided below.

Model Family Members	Parameter Count	Our Accuracy	Training Time on 8xA100s
ResNet-18	11.7M	TBA	TBA
ResNet-34	21.8M	TBA	TBA
ResNet-50	25.6M	76.5%	3.83 hrs
ResNet-101	44.5M	78.1%	5.50 hrs
ResNet-152	60.2M	TBA	TBA
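
The layer count in each name can be recovered from the number of residual blocks per stage. The sketch below uses the per-stage block counts reported by He et al. and assumes the usual counting convention: two convolutions per basic block, three per bottleneck block, plus the first convolution and the final linear layer, with shortcut projections not counted. The names `RESNET_CONFIGS` and `count_layers` are hypothetical and are not part of Composer or torchvision.

```python
# (block type, residual blocks in each of the 4 stages), per He et al.
RESNET_CONFIGS = {
    "resnet18": ("basic", (2, 2, 2, 2)),
    "resnet34": ("basic", (3, 4, 6, 3)),
    "resnet50": ("bottleneck", (3, 4, 6, 3)),
    "resnet101": ("bottleneck", (3, 4, 23, 3)),
    "resnet152": ("bottleneck", (3, 8, 36, 3)),
}

# Convolutions on the main path of each block type.
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}


def count_layers(model_name: str) -> int:
    """Count weighted layers: block convolutions + first conv + final linear."""
    block_type, blocks_per_stage = RESNET_CONFIGS[model_name]
    return CONVS_PER_BLOCK[block_type] * sum(blocks_per_stage) + 2


for name in RESNET_CONFIGS:
    print(name, count_layers(name))
```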

❗ Note: Please see the CIFAR ResNet model card for the differences between CIFAR and ImageNet ResNets.

Default Training Hyperparameters#

Optimizer: Decoupled SGDW
- Learning rate: 2.048
- Momentum: 0.875
- Weight decay: 5.0e-4
LR schedulers:
- Cosine decay with warmup for 8 epochs
Batch size: 2048
Number of epochs: 90
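
As a rough illustration of the schedule listed above, the sketch below computes the learning rate at a given (possibly fractional) epoch. It assumes linear warmup from 0 to the peak learning rate over the first 8 epochs, followed by cosine decay to 0 at epoch 90. The warmup shape and the final value of 0 are assumptions, and `cosine_with_warmup_lr` is a hypothetical helper rather than a Composer function.

```python
import math


def cosine_with_warmup_lr(
    epoch: float,
    base_lr: float = 2.048,
    warmup_epochs: float = 8.0,
    max_epochs: float = 90.0,
) -> float:
    """Learning rate at `epoch` for linear warmup followed by cosine decay."""
    if epoch < warmup_epochs:
        # Assumed: linear ramp from 0 up to base_lr.
        return base_lr * epoch / warmup_epochs
    # Fraction of the decay phase completed, clamped to 1.0 past max_epochs.
    progress = min((epoch - warmup_epochs) / (max_epochs - warmup_epochs), 1.0)
    # Assumed: cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 4, 8, 49, 90):
    print(epoch, round(cosine_with_warmup_lr(epoch), 4))
```

Under these assumptions the learning rate peaks at 2.048 at the end of warmup and falls to half that value midway through the decay phase, at epoch 49.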

Attribution#

Paper: Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Code and hyperparameters: DeepLearningExamples GitHub repository by NVIDIA

API Reference#

composer.models.resnet.model.composer_resnet(model_name, num_classes=1000, weights=None, pretrained=False, groups=1, width_per_group=64, initializers=None, loss_name='soft_cross_entropy')

Helper function to create a ComposerClassifier with a torchvision ResNet model.

From Deep Residual Learning for Image Recognition (He et al., 2015).

Parameters

model_name (str) – Name of the ResNet model instance. One of ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"].
num_classes (int, optional) – The number of classes. Needed for classification tasks. Default: 1000.
weights (str, optional) – Name of the pretrained weights to load, such as IMAGENET1K_V2. Default: None.
pretrained (bool, optional) – If True, use ImageNet pretrained weights. Default: False. This parameter is deprecated and will soon be removed in favor of weights.
groups (int, optional) – Number of filter groups for the 3x3 convolution layer in bottleneck blocks. Default: 1.
width_per_group (int, optional) – Initial width for each convolution group. Width doubles after each stage. Default: 64.
initializers (List[Initializer], optional) – Initializers for the model. None for no initialization. Default: None.
loss_name (str, optional) – Loss function to use, e.g. 'soft_cross_entropy' or 'binary_cross_entropy_with_logits'. The loss function must be defined in the loss module (composer.loss). Default: 'soft_cross_entropy'.
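
To illustrate how groups and width_per_group interact, the sketch below computes the channel width of the 3x3 convolution in a bottleneck block at each stage. It assumes the width formula used by torchvision's Bottleneck block; `bottleneck_width` and `STAGE_PLANES` are hypothetical names used only for this illustration.

```python
# Base channel counts of the 4 stages.
STAGE_PLANES = (64, 128, 256, 512)


def bottleneck_width(planes: int, groups: int = 1, width_per_group: int = 64) -> int:
    """Channels of the grouped 3x3 convolution in a bottleneck block.

    Assumed to mirror torchvision: int(planes * (base_width / 64)) * groups.
    """
    return int(planes * (width_per_group / 64.0)) * groups


# Defaults (groups=1, width_per_group=64) give the standard ResNet widths.
default_widths = [bottleneck_width(p) for p in STAGE_PLANES]

# A ResNeXt-style setting: 32 groups, each starting at 4 channels.
grouped_widths = [bottleneck_width(p, groups=32, width_per_group=4) for p in STAGE_PLANES]

print(default_widths)
print(grouped_widths)
```

With the defaults, the width equals the stage's base channel count. In both cases the width doubles from one stage to the next, as the width_per_group description states.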

Returns

ComposerModel – instance of ComposerClassifier with a torchvision ResNet model.

Example:

from composer.models import composer_resnet

model = composer_resnet(model_name='resnet18')  # creates a torchvision resnet18 for image classification