Tip

This tutorial is available as a Jupyter notebook.

ƒ() Functional API

In this tutorial, we’ll see an example of using Composer’s algorithms in a standalone fashion, with no changes to the surrounding code and no need for the Composer Trainer. We’ll train a simple model on CIFAR-10, similar to the one in the PyTorch classifier tutorial. Because we’re using a toy model trained for only a few epochs, we won’t see the speed or accuracy gains we might expect on a more realistic problem; this notebook is meant to illustrate how to use various algorithms, not to benchmark them. For more realistic results, see the MosaicML Explorer.

Install Composer

If you don’t already have Composer installed, install it with pip:

[ ]:
%pip install mosaicml

Define the Model, Dataloader, and Training Loop

First, we need to define our original model, dataloader, and training loop. Let’s start with the dataloader:

[ ]:
import torch
import torch.utils.data
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

datadir = './data'
batch_size = 1024

transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ]
)

trainset = torchvision.datasets.CIFAR10(root=datadir, train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=datadir, train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

We compose two transforms: ToTensor, which converts each image to a float tensor with values in [0, 1], and Normalize, which shifts and scales each of the three color channels to the range [-1, 1] using a mean and standard deviation of 0.5. We apply these transformations to both the train and test sets. Now, let’s define our model. We’re going to use a toy convolutional neural network so that the training epochs finish quickly.
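Before defining the model, it is worth making the normalization step concrete. The sketch below reproduces the per-channel arithmetic of Normalize directly in PyTorch; `normalize_per_channel` is a hypothetical helper written for illustration, not part of torchvision. With a mean and standard deviation of 0.5, each value x becomes (x - 0.5) / 0.5 = 2x - 1.

```python
import torch

def normalize_per_channel(img: torch.Tensor, mean, std) -> torch.Tensor:
    """Hypothetical helper: computes (x - mean[c]) / std[c] for each channel c of a (C, H, W) tensor."""
    # Reshape the statistics to (C, 1, 1) so they broadcast over height and width
    mean_t = torch.tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std_t = torch.tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img - mean_t) / std_t

# A fake 3x2x2 "image" whose values span ToTensor's output range [0, 1]
img = torch.tensor([0.0, 0.25, 0.5, 1.0]).view(1, 2, 2).repeat(3, 1, 1)
out = normalize_per_channel(img, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(out[0])  # channel 0 becomes [[-1.0, -0.5], [0.0, 1.0]]
```

Centering inputs around zero like this is a common convention that tends to make optimization better behaved; the exact statistics are a choice of the tutorial, not something the model requires.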

[ ]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=(3, 3), stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=(3, 3))
        self.norm = nn.BatchNorm2d(32)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(32, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.conv2(x)
        x = F.relu(self.norm(x))
        x = torch.flatten(self.pool(x), 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
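It can help to trace how a 32x32 CIFAR-10 image flows through `Net`. The sketch below uses the output-size formula from the PyTorch `nn.Conv2d` documentation, floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1; `conv_out_size` and `trace_net` are hypothetical helpers, not PyTorch functions. conv1 (stride 2) takes 32x32 down to 15x15 and conv2 takes that to 13x13, after which `AdaptiveAvgPool2d(1)` collapses the remaining spatial size to 1x1, so `fc1` always receives 32 features.

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output height or width of nn.Conv2d along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def trace_net(size: int) -> list:
    """Spatial size of the input and after conv1 and conv2 in the tutorial's Net."""
    after_conv1 = conv_out_size(size, kernel=3, stride=2)  # conv1: 3 -> 16 channels
    after_conv2 = conv_out_size(after_conv1, kernel=3)     # conv2: 16 -> 32 channels
    return [size, after_conv1, after_conv2]

print(trace_net(32))  # [32, 15, 13]
print(trace_net(27))  # [27, 13, 11]
```

The second call shows that smaller inputs also pass through the convolutions without error; only the final pooling layer needs to absorb the size difference.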

Finally, let’s write a simple training loop that prints the accuracy on the test set at the end of each epoch. We’ll just run a few epochs for brevity.

[ ]:
from tqdm.notebook import tqdm
import composer.functional as cf

num_epochs = 5

def train_and_eval(model, train_loader, test_loader):
    torch.manual_seed(42)  # seed the RNG so every run sees the same data order
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())
    for epoch in range(num_epochs):
        print(f"---- Beginning epoch {epoch} ----")
        model.train()
        progress_bar = tqdm(train_loader)
        for X, y in progress_bar:
            X = X.to(device)
            y = y.to(device)
            # forward pass and loss
            y_hat = model(X)
            loss = F.cross_entropy(y_hat, y)
            progress_bar.set_postfix_str(f"train loss: {loss.item():.4f}")
            # backward pass and parameter update
            loss.backward()
            opt.step()
            opt.zero_grad()
        # evaluate on the test set; gradients aren't needed, so don't track them
        model.eval()
        num_right = 0
        eval_size = 0
        with torch.no_grad():
            for X, y in test_loader:
                X = X.to(device)
                y = y.to(device)
                y_hat = model(X)
                num_right += (y_hat.argmax(dim=1) == y).sum().item()
                eval_size += len(y)
        acc_percent = 100 * num_right / eval_size
        print(f"Epoch {epoch} validation accuracy: {acc_percent:.2f}%")
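The evaluation at the end of each epoch reduces to comparing the highest-scoring class with the label for every example. Here is that computation pulled out into a standalone function so it can be checked on a hand-built batch; `accuracy_percent` is a hypothetical helper for illustration, not something Composer provides.

```python
import torch

def accuracy_percent(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Percentage of rows whose highest-scoring class equals the target label."""
    preds = logits.argmax(dim=1)  # predicted class index for each example
    num_right = (preds == targets).sum().item()
    return 100 * num_right / len(targets)

# Four examples, three classes: the first three predictions are right, the last is wrong
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.0, 0.1, 3.0],
                       [1.0, 0.2, 0.1]])
targets = torch.tensor([0, 1, 2, 2])
print(accuracy_percent(logits, targets))  # 75.0
```

The training loop accumulates raw counts across batches instead of averaging per-batch percentages. That keeps the result exact even though the batches differ in size: the 10,000 CIFAR-10 test images split into batches of 1,024 leave a final batch of 784.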

Great. Now, let’s instantiate this baseline model and see how it fares on our dataset.

[ ]:
model = Net()
train_and_eval(model, trainloader, testloader)

Now that we have a baseline, let’s add algorithms to improve our data pipeline and model. We’ll start with a data augmentation, ColOut, available as cf.colout_batch, which drops a random subset of rows and columns from each training image.
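To build intuition for what ColOut does, here is a simplified re-implementation for a single (C, H, W) image tensor. It is a sketch for illustration, not Composer’s source: `colout_single` is a hypothetical name, and the drop fractions of 0.15 are an assumption chosen for the example. Random rows and columns are removed while the surviving ones keep their original order, so the image shrinks but is not otherwise distorted.

```python
import torch

def colout_single(img: torch.Tensor, p_row: float = 0.15, p_col: float = 0.15) -> torch.Tensor:
    """Hypothetical sketch: drop a random fraction of rows and columns from a (C, H, W) image."""
    _, height, width = img.shape
    n_rows = int((1 - p_row) * height)  # number of rows to keep
    n_cols = int((1 - p_col) * width)   # number of columns to keep
    # Pick which rows/columns survive, then sort them so spatial order is preserved
    keep_rows = torch.randperm(height)[:n_rows].sort().values
    keep_cols = torch.randperm(width)[:n_cols].sort().values
    return img[:, keep_rows][:, :, keep_cols]

img = torch.arange(3 * 32 * 32, dtype=torch.float32).view(3, 32, 32)
out = colout_single(img)
print(out.shape)  # torch.Size([3, 27, 27])
```

In this sketch the number of kept rows and columns depends only on the image size, so every 32x32 training image comes out 27x27 and the DataLoader’s default collation can still stack them into batches.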

[ ]:
# create dataloaders for the train and test sets
shared_transforms = [
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
]

# cf.colout_batch runs after ToTensor and Normalize, so it receives each
# training image as a (C, H, W) tensor; the test set is left unaugmented
train_transforms = shared_transforms + [cf.colout_batch]

test_transform = transforms.Compose(shared_transforms)
train_transform = transforms.Compose(train_transforms)

trainset = torchvision.datasets.CIFAR10(root=datadir, train=True,
                                        download=True, transform=train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=datadir, train=False,
                                       download=True, transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
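With ColOut in the pipeline, training images are now smaller than the 32x32 test images. The model copes with this because its convolutional stack ends in `nn.AdaptiveAvgPool2d(1)`, which averages each channel down to a single value whatever the spatial size. A quick check, using random tensors shaped like conv2’s output for 32x32 and 27x27 inputs:

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(1)

# Stand-ins for conv2 feature maps: 13x13 from a 32x32 image, 11x11 from a 27x27 image
full = torch.randn(4, 32, 13, 13)
shrunk = torch.randn(4, 32, 11, 11)

# Both collapse to (N, 32, 1, 1), so flattening yields the 32 features fc1 expects
full_feats = torch.flatten(pool(full), 1)
shrunk_feats = torch.flatten(pool(shrunk), 1)
print(full_feats.shape, shrunk_feats.shape)  # torch.Size([4, 32]) torch.Size([4, 32])

# With output size 1, adaptive average pooling is just the mean over height and width
print(torch.allclose(full_feats, full.mean(dim=(2, 3)), atol=1e-6))  # True
```

A model ending in a fixed-size flatten followed by a linear layer would instead fail on the smaller training images, so this architectural detail is what makes ColOut a drop-in change here.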

Let’s see how our model does with just these changes.

[ ]:
model = Net()
# we use only one data augmentation: our small model trains so quickly
# that a heavier augmentation pipeline would leave the dataloader as the bottleneck
train_and_eval(model, trainloader, testloader)

As we might expect, adding data augmentation doesn’t help when we aren’t training long enough to start overfitting, which is the main problem augmentation addresses.

Next, let’s try an algorithm that modifies the model itself. To keep things simple, we’ll just add a Squeeze-and-Excitation module after the larger of the two Conv2d layers in our model.

[ ]:
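# start from a freshly initialized model, so this run is comparable to the
# baseline instead of continuing to train the model from the previous cell
model = Net()
# squeeze-excite can add a lot of overhead for small
# conv2d operations, so only add it after convs with a
# minimum number of channels
cf.apply_squeeze_excite(model, latent_channels=64, min_channels=16)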
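For reference, a Squeeze-and-Excitation block squeezes each channel to one number with global average pooling, passes that vector through a small two-layer MLP ending in a sigmoid, and rescales every channel by the resulting gate. The sketch below follows that standard design with a hypothetical `SqueezeExcite` class; Composer’s own module may differ in details such as biases or how `latent_channels` is interpreted, so treat it as illustration only. Here `latent_channels` is simply the hidden width, so with 32 channels the MLP maps 32 -> 64 -> 32.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Hypothetical SE block: scales each channel of an (N, C, H, W) tensor by a learned gate in (0, 1)."""

    def __init__(self, num_channels: int, latent_channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.mlp = nn.Sequential(            # excitation: C -> latent -> C
            nn.Linear(num_channels, latent_channels),
            nn.ReLU(),
            nn.Linear(latent_channels, num_channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        gates = self.mlp(self.pool(x).view(n, c))  # one gate per channel, per example
        return x * gates.view(n, c, 1, 1)          # rescale each channel's feature map

se = SqueezeExcite(num_channels=32, latent_channels=64)
x = torch.randn(2, 32, 13, 13)  # shaped like conv2's output for a 32x32 image
y = se(x)
print(y.shape)  # torch.Size([2, 32, 13, 13])
```

Because the gates depend on pooled statistics of the whole feature map, the block lets the network reweight channels based on global context at the cost of only a few thousand extra parameters in this configuration.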

Now let’s see how our model does with the above algorithm applied.

[ ]:
train_and_eval(model, trainloader, testloader)

Adding squeeze-excite (on top of ColOut, since we’re still using the augmented trainloader) gives us another few percentage points of accuracy, with little decrease in the number of iterations per second. Great!

Of course, this is a toy model and dataset, but it serves to illustrate how to use Composer’s algorithms inside your own training loops, with minimal changes to your code. If you hit any problems or have questions, feel free to open an issue or reach out to us on Slack.