Today’s blog post will demonstrate some basic neural network code that uses the fast.ai library to train a model to categorize images into one of n categories and get predictions on single images. You can take this code and change it to train the model on any image categories you would like. I won’t go into too much detail on what is going on behind the scenes and will focus on showing you how to get a working model.
The code is based on lesson 1 (and partially lesson 3) of the fast.ai course, which I strongly recommend taking a look at if you are interested in machine learning. To run the code, you will need the fast.ai library, which is built on top of the PyTorch deep learning framework. Additionally, to run the code in any reasonable amount of time, you will likely want a GPU. Take a look at fast.ai lesson 1 to learn how to set up a GPU machine in the cloud using Paperspace.
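Here is a sketch of the training code, assuming the fast.ai v0.7 API; the data path, image size, learning rate, and epoch count are placeholder choices, not fixed requirements:

```python
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *

PATH = "data/myimages/"  # placeholder: directory containing train/ and valid/
sz = 224                 # images are resized to sz x sz
arch = resnet34          # pre-trained ResNet-34 architecture

# Package up the images (with side-on data augmentation) for the model
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
# Build a learner from the pre-trained model, then train the added layers
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)  # learning rate 0.01, 3 epochs
```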
The code above constructs a neural network using the ResNet-34 model, which is pre-trained on the ImageNet dataset. Our code simply adds a few layers to the end of the pre-trained model and trains them.
The “ImageClassifierData.from_paths” call essentially packages up the images so that they can be passed through the model and used to train it. “ConvLearner.pretrained” creates the learner object capable of actually doing the training, and “learn.fit” performs the training with a given learning rate and number of epochs (cycles through all of the input data). With the fast.ai library, these three lines of code are all that is required to build and train a convolutional neural network. Fast.ai makes the process of training a neural network incredibly simple by taking care of preparing the pre-trained model, adding additional layers, and setting sensible defaults for any hyperparameters you do not specify. At the same time, however, the source code of the fast.ai library is relatively concise and readable, so as you use fast.ai I encourage you to take a look at how it leverages PyTorch under the hood.
To train this model on your own images, change the PATH variable to point at the directory where your images reside. Inside the PATH directory there should be a “train” directory and a “valid” directory, and inside each of these there should be a directory for each category of image, containing the images for that category. Place roughly 80% of the images for each category in the train directory and the remaining 20% in the valid directory.
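As a concrete sketch, the expected layout could be created with the standard library; the “data/myimages” root and the “cats” and “dogs” category names here are hypothetical placeholders:

```python
from pathlib import Path

PATH = Path("data/myimages")  # hypothetical root; corresponds to the PATH variable
for split in ("train", "valid"):
    for category in ("cats", "dogs"):  # one sub-directory per image category
        (PATH / split / category).mkdir(parents=True, exist_ok=True)
```

With this layout in place, fast.ai can infer the category names directly from the directory names.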
To get a prediction for a single image:
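The following is a sketch assuming the fast.ai v0.7 API, with “arch”, “sz”, “data”, and “learn” referring to the architecture, image size, data object, and trained learner described above; the image path is a hypothetical placeholder:

```python
# Get the transforms applied to the training and validation sets
trn_tfms, val_tfms = tfms_from_model(arch, sz)
# Open a single image and apply the validation transforms to it
img = val_tfms(open_image(f"{PATH}valid/cats/cat.1.jpg"))  # placeholder path
learn.precompute = False  # predict from the raw image, not precomputed activations
# img is a rank 3 tensor; img[None] adds a batch dimension, making it rank 4
log_preds = learn.predict_array(img[None])
# Look up the predicted category name from the data object's classes
print(data.classes[np.argmax(log_preds)])
```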
In the code above we get the transformations that are applied to the validation image set, apply them to our image, and get our model’s prediction. We then look up the predicted category name in our data object’s “classes” attribute. Note that because the model expects a batch of images as input (a rank 4 tensor in our case, i.e. a tensor with 4 dimensions), we must add a batch dimension to our 3-dimensional image with img[None].
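This “[None]” indexing is plain NumPy (and PyTorch) behavior, and is easy to verify on its own:

```python
import numpy as np

img = np.zeros((3, 224, 224))  # a 3-dimensional image: channels, height, width
batch = img[None]              # indexing with None prepends a batch dimension
print(batch.shape)             # (1, 3, 224, 224)
```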
The code in this post will work best on images that are similar to ImageNet images, which are primarily photos of objects, animals, and other easily photographable things. If our images were something quite different, like X-rays or satellite images, the pre-trained ResNet-34 model would likely benefit from some extra training that we have not done here, and we might want to use different data augmentation transforms (the “transforms_side_on” variable from our ImageClassifierData instantiation).

You may have noticed that during training, several rows of numbers get printed to the console:
```
[0. 0.04368 0.02732 0.99121]
[1. 0.03997 0.02237 0.99023]
[2. 0.04147 0.02266 0.99072]
```
Each row corresponds to one training epoch: the first number is the epoch index, the next two are the training and validation loss, and the fourth is the accuracy of our model, based on how well it categorizes the images in our validation set. The hope is that this number is as close as possible to 1.0 by your final training epoch. It may vary greatly depending on your number of images, number of categories, and quality of images, but hopefully it lands somewhere between 0.8 and 1.0.
There are several techniques we could employ to increase the accuracy of this model, but I will leave them for a future post.