Tensorboard Monitoring
Installing Tensorboard
Before getting started, make sure Tensorboard is installed by running:
pip install 'mosaicml[tensorboard]'
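As a quick sanity check after installing, the following sketch reports whether the packages can be imported in the current environment. The helper name is_installed is hypothetical and used only for illustration; it relies solely on the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report on the two packages this guide depends on.
    for name in ("tensorboard", "composer"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, re-run the pip command above in the same environment that you use for training.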
Logging to Tensorboard Pythonically
To log your run's results to Tensorboard, first create a TensorboardLogger object, like so:
from composer.loggers import TensorboardLogger
tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
log_dir is where the Tensorboard logs are saved locally (on the system on which you run Composer).
If you are viewing or accessing your logs locally, choose this path wisely and remember it! Also make sure to use the same value for log_dir in any future runs, so that all of your runs can be visualized together.
If you will be using S3 to save your logs, the exact path you choose is less important, because your logs will automatically be saved to a directory called tensorboard_logs inside your bucket.
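One way to keep log_dir identical across local runs is to resolve it in a single place. The sketch below is only an illustration: resolve_log_dir and the TB_LOG_DIR environment variable are hypothetical names that are not part of Composer or Tensorboard.

```python
import os
from pathlib import Path


def resolve_log_dir(default: str = "./my_tensorboard_logs") -> str:
    """Return an absolute log directory, creating it if it does not exist.

    TB_LOG_DIR is a hypothetical environment variable used only in this
    sketch to override the default location.
    """
    raw = os.environ.get("TB_LOG_DIR", default)
    log_dir = Path(raw).expanduser().resolve()
    log_dir.mkdir(parents=True, exist_ok=True)
    return str(log_dir)


if __name__ == "__main__":
    print(resolve_log_dir())
```

The returned path could then be passed as TensorboardLogger(log_dir=resolve_log_dir()), so every run launched from any working directory writes to the same place.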
Once we have our TensorboardLogger, we just need to add it to our Trainer and then we'll be good to go. Below is an example of training MNIST with Tensorboard logging:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from composer import Trainer
from composer.models import mnist_model
from composer.loggers import TensorboardLogger
transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
val_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_dataloader = DataLoader(train_set, batch_size=128)
eval_dataloader = DataLoader(val_set, batch_size=64)
# Create your Tensorboard Logger here.
tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
trainer = Trainer(
    model=mnist_model(num_classes=10),
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    max_duration='5ep',
    # Add your Tensorboard Logger to the trainer here.
    loggers=[tb_logger],
    eval_interval='1ep',
)
trainer.fit()
Now run this code. If all goes well, your loss and metric results will be written to Tensorboard log files in "./my_tensorboard_logs". See Viewing your Results Locally below for instructions on opening them in the Tensorboard viewer.
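To confirm that a run actually produced logs, one option is to look for event files in the log directory. The sketch below assumes the logger writes standard Tensorboard event files, whose names contain "tfevents"; find_event_files is a hypothetical helper and not part of Composer.

```python
from pathlib import Path
from typing import List


def find_event_files(log_dir: str) -> List[str]:
    """Return sorted paths of Tensorboard event files found under `log_dir`."""
    root = Path(log_dir)
    if not root.is_dir():
        # Nothing has been written yet (or the path is wrong).
        return []
    # Event files carry "tfevents" in their file names; search recursively
    # because each run may write into its own subdirectory.
    return sorted(str(p) for p in root.rglob("*tfevents*") if p.is_file())


if __name__ == "__main__":
    files = find_event_files("./my_tensorboard_logs")
    print(f"Found {len(files)} event file(s)")
    for path in files:
        print(path)
```

An empty result usually means that training has not logged anything yet or that log_dir points somewhere other than expected.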
Logging to Tensorboard using YAML configs
To log to Tensorboard using YAML config files, add these lines to your YAML file:
loggers:
  tensorboard:
    log_dir: my_tensorboard_logs
Putting it all together, the YAML config version of the MNIST code above should look like:
train_dataset:
  mnist:
    is_train: true
    datadir: ./datasets/mnist
    download: true
val_dataset:
  mnist:
    is_train: false
    datadir: ./datasets/mnist
    download: true
model:
  mnist_classifier:
    num_classes: 10
max_duration: 5ep
eval_interval: 1ep
train_batch_size: 128
eval_batch_size: 64
loggers:
  tensorboard:
    log_dir: my_tensorboard_logs
Then save that YAML file, for example as mnist.yaml.
To run the training using this config file, just run:
python examples/run_composer_trainer.py -f mnist.yaml
If all goes well, your loss and metric results will be written to Tensorboard log files in "./my_tensorboard_logs". See Viewing your Results Locally below for instructions on opening them in the Tensorboard viewer.
Saving your Logs to S3
You can also save your Tensorboard logs to cloud storage such as Amazon S3. This is especially useful in cases where your run environment does not have persistent storage.
To save your logs to S3, you need to use a RemoteUploaderDownloader.
If you are using YAML configs, you can enable S3 logging by adding the following lines to your YAML file under loggers:
loggers:
  object_store:
    object_store_hparams:
      s3:
        bucket: my-s3-bucket-name
Remember to replace my-s3-bucket-name with the name of your bucket.
Putting this all together, your YAML file will look like:
train_dataset:
  mnist:
    is_train: true
    datadir: ./datasets/mnist
    download: true
val_dataset:
  mnist:
    is_train: false
    datadir: ./datasets/mnist
    download: true
model:
  mnist_classifier:
    num_classes: 10
max_duration: 5ep
eval_interval: 1ep
train_batch_size: 128
eval_batch_size: 64
loggers:
  tensorboard:
    log_dir: my_tensorboard_logs
  object_store:
    object_store_hparams:
      s3:
        bucket: my-s3-bucket-name
Once again you can run this job like so:
python examples/run_composer_trainer.py -f mnist.yaml
If all goes well, your loss and metric results will be written to Tensorboard log files in s3://my-s3-bucket-name/tensorboard_logs.
The Tensorboard logger always names the directory inside your bucket "tensorboard_logs".
See Viewing your Results from S3 below for instructions on opening them in the Tensorboard viewer.
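Because the directory name inside the bucket is fixed, the location of the uploaded logs can be derived from the bucket name alone. The following sketch illustrates that rule; s3_tensorboard_uri is a hypothetical helper, not a Composer function.

```python
def s3_tensorboard_uri(bucket: str) -> str:
    """Build the S3 URI of the log directory for the given bucket.

    Accepts either a bare bucket name or a value that already starts
    with "s3://".
    """
    name = bucket.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    name = name.strip("/")
    if not name:
        raise ValueError("bucket name must not be empty")
    # The directory inside the bucket is always called "tensorboard_logs".
    return f"s3://{name}/tensorboard_logs"


if __name__ == "__main__":
    print(s3_tensorboard_uri("my-s3-bucket-name"))
```

The resulting URI is the value to pass to the Tensorboard viewer when reading logs back from S3.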
Viewing your Results Locally
If you saved your Tensorboard log files locally, you can view them by starting a Tensorboard process and pointing it to the log directory you specified. To do this, run the following at the command line:
tensorboard --logdir='./my_tensorboard_logs'
This will start a Tensorboard process, which will write a message to stdout that looks something like:
TensorBoard 2.9.1 at http://localhost:6006/ (Press CTRL+C to quit)
Open the URL in your browser to access the Tensorboard viewer.
Enjoy viewing your metrics!
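If you prefer to start the viewer from a script, the command line above can be assembled in Python. This is a minimal sketch: tensorboard_command is a hypothetical helper, and the default port of 6006 is taken from the example message above.

```python
import shlex
from typing import List


def tensorboard_command(logdir: str, port: int = 6006) -> List[str]:
    """Build the argument list for the `tensorboard` command-line tool."""
    return ["tensorboard", f"--logdir={logdir}", f"--port={port}"]


if __name__ == "__main__":
    cmd = tensorboard_command("./my_tensorboard_logs")
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually start the viewer, pass the list to subprocess.run(cmd).
```

Choosing a different port is useful when 6006 is already taken by another Tensorboard process.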
Viewing your Results from S3
To view logs that are saved to S3, first install the AWS CLI. Once it is installed, run the following command:
tensorboard --logdir=s3://my-s3-bucket-name/tensorboard_logs
Now, as detailed above in Viewing your Results Locally, you can open the URL that is written to stdout to view your results!
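If the viewer cannot read from the bucket, missing AWS credentials are a common cause. The sketch below is a heuristic only: it assumes credentials are supplied through the standard environment variables or the shared credentials file and ignores other sources such as instance roles. aws_credentials_present is a hypothetical helper name.

```python
import os
from pathlib import Path


def aws_credentials_present() -> bool:
    """Heuristically check two common sources of AWS credentials."""
    # Source 1: the standard access-key environment variables.
    has_env = bool(os.environ.get("AWS_ACCESS_KEY_ID")) and bool(
        os.environ.get("AWS_SECRET_ACCESS_KEY")
    )
    # Source 2: the shared credentials file, whose location can be
    # overridden with AWS_SHARED_CREDENTIALS_FILE.
    default_file = Path.home() / ".aws" / "credentials"
    creds_file = Path(
        os.environ.get("AWS_SHARED_CREDENTIALS_FILE", str(default_file))
    ).expanduser()
    return has_env or creds_file.is_file()


if __name__ == "__main__":
    if aws_credentials_present():
        print("AWS credentials found")
    else:
        print("No AWS credentials found in the environment or credentials file")
```

A negative result does not prove that access will fail, since credentials can come from other sources, but it is a reasonable first thing to check.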