UNet

Category of Task: Vision

Kind of Task: Segmentation

Link to Code: https://github.com/mosaicml/mosaicml/tree/main/composer/models/unet

Overview

UNet is an architecture used for image segmentation. The example here trains it on medical brain tumor data.

Attribution

The UNet model was introduced in “U-Net: Convolutional Networks for Biomedical Image Segmentation” by Olaf Ronneberger, Philipp Fischer, and Thomas Brox (https://arxiv.org/abs/1505.04597).

We use the version from the NVIDIA Deep Learning Examples (DLE) repository: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Segmentation/nnUNet.

Architecture

The figure below shows a 3D version of the UNet architecture. As described in the DLE examples, U-Net is composed of a contracting path and an expanding path, and it builds a bottleneck at its center through a combination of convolution, instance norm, and leaky ReLU operations. After this bottleneck, the image is reconstructed through a combination of convolutions and upsampling. Skip connections are added to help the backward flow of gradients and thereby improve training.
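
To make the description above concrete, here is a minimal PyTorch sketch of a two-level 3D U-Net. It is not the DLE or Composer implementation. The class name `TinyUNet3D`, the channel widths, the 4 input channels, the 3 output classes, the leaky ReLU slope, and the use of a transposed convolution for upsampling are all assumptions chosen for illustration.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """Convolution -> instance norm -> leaky ReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.InstanceNorm3d(out_ch, affine=True),
        nn.LeakyReLU(negative_slope=0.01),  # slope is an assumed value
    )


class TinyUNet3D(nn.Module):
    """Hypothetical two-level 3D U-Net with a single skip connection."""

    def __init__(self, in_ch: int = 4, n_classes: int = 3, width: int = 8):
        super().__init__()
        self.enc = conv_block(in_ch, width)                 # full resolution
        self.down = conv_block(width, 2 * width, stride=2)  # contracting path
        self.bottleneck = conv_block(2 * width, 2 * width)
        # Expanding path: upsample back to the encoder resolution.
        self.up = nn.ConvTranspose3d(2 * width, width, kernel_size=2, stride=2)
        self.dec = conv_block(2 * width, width)             # input is upsampled + skip
        self.head = nn.Conv3d(width, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc(x)
        x = self.bottleneck(self.down(skip))
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # skip connection via channel concat
        return self.head(self.dec(x))
```

Passing a tensor of shape `(1, 4, 16, 16, 16)` through `TinyUNet3D()` returns per-voxel logits with the same spatial size and one channel per class. The spatial dimensions must be divisible by 2 so the upsampled tensor lines up with the skip tensor.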

Implementation Details

There are three main differences between our implementation and the original NVIDIA DLE implementation.

The first two concern the data pipeline: we remove the NVIDIA DALI pipeline and replace all transforms with torch implementations. We omit the Zoom transform and use a kernel size of 3 for the Gaussian Blur transform.
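
As an illustration of what a torch-only Gaussian blur with a kernel size of 3 can look like, the sketch below blurs a `(C, H, W)` image with a depthwise convolution. It is not the transform used in the repository: the function names, the fixed `sigma`, the reflect padding, and the 2D input layout are assumptions made to keep the example short.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel_1d(kernel_size: int = 3, sigma: float = 1.0) -> torch.Tensor:
    """Normalized 1D Gaussian weights centered on the middle tap."""
    offsets = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    weights = torch.exp(-offsets ** 2 / (2 * sigma ** 2))
    return weights / weights.sum()


def gaussian_blur_2d(image: torch.Tensor, kernel_size: int = 3, sigma: float = 1.0) -> torch.Tensor:
    """Blur a (C, H, W) image, applying the same kernel to each channel."""
    channels = image.shape[0]
    k1d = gaussian_kernel_1d(kernel_size, sigma).to(image.dtype)
    k2d = torch.outer(k1d, k1d)  # 2D kernel as an outer product
    weight = k2d.expand(channels, 1, kernel_size, kernel_size).contiguous()
    pad = kernel_size // 2
    # Reflect padding keeps the output the same size as the input.
    padded = F.pad(image.unsqueeze(0), (pad, pad, pad, pad), mode="reflect")
    # groups=channels makes the convolution depthwise (no channel mixing).
    return F.conv2d(padded, weight, groups=channels).squeeze(0)
```

Because the kernel weights sum to 1, a constant image is left unchanged, which is a quick sanity check for any blur implementation.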

Third, while the NVIDIA DLE examples report training accuracy as an average over 5 folds, we use only 1 fold in the interest of faster iteration time, so all of our results are reported using fold 0 and 200 epochs.
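
The difference between averaging over 5 folds and training on fold 0 only can be sketched in plain Python. The split below is a simple deterministic one chosen for illustration, and `kfold_indices` and `select_fold` are hypothetical helpers. The actual split used by either implementation may differ (for example, it may shuffle the samples first).

```python
def kfold_indices(n_samples: int, n_folds: int = 5):
    """Yield (fold, train_indices, val_indices) for a deterministic split."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        val = indices[fold::n_folds]  # every n_folds-th sample, offset by fold
        val_set = set(val)
        train = [i for i in indices if i not in val_set]
        yield fold, train, val


def select_fold(n_samples: int, fold: int = 0, n_folds: int = 5):
    """Return the train/validation indices for a single fold."""
    for current, train, val in kfold_indices(n_samples, n_folds):
        if current == fold:
            return train, val
    raise ValueError(f"fold must be in [0, {n_folds}), got {fold}")
```

Training on one fold costs roughly a fifth of a full 5-fold run, at the price of a noisier estimate, since the reported number depends on a single validation split.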

Exploring Tradeoffs Between Quality and Training Speed/Cost

As noted above, we report results for only 1 fold and a fixed budget of 200 training epochs, whereas DLE uses early stopping.
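
To show how a fixed epoch budget differs from early stopping, the sketch below counts how many epochs would run under each policy for a given sequence of validation scores. The patience-based stopping rule and the `epochs_run` helper are assumptions for illustration, not the criterion DLE uses.

```python
def epochs_run(val_scores, max_epochs=200, patience=None):
    """Return how many epochs would run given per-epoch validation scores.

    patience=None models a fixed budget. An integer models early stopping
    that halts after `patience` consecutive epochs without a new best score.
    """
    best = float("-inf")
    stale = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
        if patience is not None and stale >= patience:
            return epoch
    return min(len(val_scores), max_epochs)
```

With a fixed budget, every run costs the same number of epochs, which makes speed and cost comparisons between runs straightforward. With early stopping, the cost depends on how the validation score evolves.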