FastAI Image Segmentation

Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels). The goal of image segmentation is to simplify and/or change the representation of an image into something more meaningful and easier to analyze.

The FastAI library allows us to build image segmentation models in only a few lines of code by providing classes and methods both for loading the data and for creating a model to perform the segmentation (a U-NET).

If you are unfamiliar with the FastAI library, I highly recommend checking out the Practical Deep Learning for Coders course, which not only teaches you about the library but also about the technologies and practices used to make it great.

In this article, we will go over where to get image segmentation data, how to create our own data, what a U-NET is, and how we can use one for image segmentation.

U-NET

A U-NET is a convolutional neural network that was initially developed for biomedical image segmentation but has since proven its value for image segmentation tasks across many domains.

Figure 2: U-NET Architecture

The U-NET architecture consists of two paths: the contraction path (the encoder) and the expansion path (the decoder).

The encoder extracts features which contain information about what is in an image using convolutional and pooling layers.

During encoding, the size of the feature map is reduced. The decoder is then used to recover the feature map size for the segmentation image, for which it uses up-convolution (transposed convolution) layers.
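To make the up-convolution step concrete, here is a small sketch (the channel counts and spatial sizes are made up for illustration) showing how a transposed convolution with stride 2 doubles the spatial size of a feature map:

```python
import torch
import torch.nn as nn

# A hypothetical 64-channel feature map at 32x32 resolution,
# as it might come out of an encoder stage.
x = torch.randn(1, 64, 32, 32)

# An up-convolution with kernel_size=2 and stride=2 doubles the
# spatial size while (here) halving the channel count.
up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
y = up(x)
print(y.shape)  # torch.Size([1, 32, 64, 64])
```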

Because downsampling in the encoder discards fine spatial detail that the decoder cannot fully recover on its own, the U-NET has skip connections. That means that the outputs of the encoding layers are passed directly to the corresponding decoding layers, so that all the important pieces of information can be preserved.

That’s only a high-level overview of what a U-NET is. For more information check out the original paper.
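As a rough illustration of these ideas (this is a minimal sketch, not the architecture from the paper or the one FastAI builds), here is a U-NET-style module with one encoder stage, one decoder stage, and a single skip connection implemented via channel-wise concatenation:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A minimal U-NET-style sketch: one contraction stage,
    one expansion stage, and a skip connection."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                        # contraction: halve size
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # expansion: double size
        # After concatenating the skip connection we have 16 + 16 channels.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)            # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)             # features at full resolution
        m = self.mid(self.pool(e))  # features at half resolution
        u = self.up(m)              # back to full resolution
        u = torch.cat([u, e], dim=1)  # skip connection preserves detail
        return self.head(self.dec(u))

out = TinyUNet(n_classes=3)(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

The output has one score per class for every pixel, which is exactly the shape a segmentation loss expects.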

Getting our data

For this tutorial, we will use the CamVid data-set which is a really high-quality road-segmentation data-set provided by the University of Cambridge.

Another nice thing about the data-set is that we don’t need to download it manually because it is included in the FastAI library and so we can simply download it using the untar_data method.

from fastai.vision import *

path = untar_data(URLs.CAMVID)
print(path.ls()) # prints subdirectories

Next, we will take a quick look at our data by displaying a random image and its segmentation, using helpers provided by FastAI that allow us to both obtain all the image paths and open an image.

path_lbl = path/'labels'
path_img = path/'images'

fnames = get_image_files(path_img)
lbl_names = get_image_files(path_lbl)

# open and show image
img_f = fnames[10]
img = open_image(img_f)
img.show(figsize=(5, 5))

Figure 3: Example image

Now we need to create a function that maps from the path of an image to the path of its segmentation.

get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'

print(get_y_fn(img_f))

This outputs the path to the segmentation of the chosen image and we can now use it to open a segmentation image.

mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5, 5), alpha=1)

Figure 4: Segmentation Example

Now that we know what our data looks like, we can create our data-set using the SegmentationItemList class provided by FastAI.

codes = np.loadtxt(path/'codes.txt', dtype=str) # class names

src_size = np.array(mask.shape[1:])
size = src_size//2
bs = 2

src = (SegmentationItemList.from_folder(path_img)
       # Load in x data from folder
       .split_by_fname_file('../valid.txt')
       # Split data into training and validation set
       .label_from_func(get_y_fn, classes=codes)
       # Label data using the get_y_fn function
)

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        # Apply standard augmentations to images and masks (tfm_y=True)
        .databunch(bs=bs)
        # Create a databunch
        .normalize(imagenet_stats)
        # Normalize with ImageNet statistics (for the pretrained encoder)
)

We can show a few examples using the show_batch method which is available for all sorts of databunches in FastAI.

data.show_batch(rows=3, figsize=(12, 9))

Figure 5: Examples with segmentation overlay

Creating our own data

In order to create your own segmentation data, you first need to take or download some pictures of the objects you want to segment. Then you need to create the segmentation masks using some kind of labeling software.

For regular object detection, you would annotate the objects in an image with bounding boxes, but for segmentation, you need to give every pixel in an image a color specific to its class.
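Under the hood, such a label image usually boils down to one integer class index per pixel. Here is a tiny made-up example (the class labels are hypothetical) showing what a segmentation mask looks like as an array:

```python
import numpy as np

# A hypothetical 4x4 segmentation mask: each pixel holds an integer
# class index rather than an RGB colour
# (0 = background, 1 = road, 2 = car in this made-up labelling).
mask = np.array([
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
])

# The set of class indices present in the mask.
print(np.unique(mask))  # [0 1 2]
```

A labeling tool lets you paint such regions with a brush; the saved label image then maps each painted colour back to a class index like these.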

Figure 6: Segmentation example (from Pixel Annotation Tool)

Thankfully there are free tools out there that can help you label your segmentation data. One of those tools is called Pixel Annotation Tool and it provides you with the ability to color the pixels using different brush sizes.

Figure 7: Pixel Annotation Tool

Creating and training our model

Now that we have our data and know what a U-NET is, we can use the FastAI library to create and train our segmentation model. But before we create our model, we will define a function that measures its accuracy. Accuracy on the CamVid data-set should be measured without the void class, so we will exclude the void class from our accuracy function.

name2id = {v:k for k,v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()

To create a U-NET in FastAI, the unet_learner function can be used. We will not only pass it our data but also specify an encoder network (ResNet34 in our case), our accuracy function, and a weight decay of 1e-2.

learn = unet_learner(data, models.resnet34, metrics=acc_camvid, wd=1e-2)

With our model ready to go we can now search for a fitting learning rate and then start training our model. This process is the same for all FastAI models and if you aren’t familiar with it yet I would highly recommend that you check out my first FastAI article.

learn.lr_find() # find learning rate
learn.recorder.plot() # plot learning rate graph

Figure 8: Learning rate

lr = 3e-3 # pick a lr
learn.fit_one_cycle(10, slice(lr), pct_start=0.9) # train model

Figure 9: Training results

By default, only the decoder is unfrozen, which means that our pretrained encoder hasn't received any training yet. So we will now save the model, show some results, and then train the whole model.

learn.save('camvid-stage-1') # save model
learn.show_results(rows=3, figsize=(8, 9)) # show results

Figure 10: Results after the first training run

learn.unfreeze() # unfreeze all layers

# find and plot lr again
learn.lr_find()
learn.recorder.plot()

# train model 
learn.fit_one_cycle(12, slice(lr/400, lr/4), pct_start=0.8)

Figure 11: Training results

As you can see, we reached an accuracy of 92% and almost perfect segmentation results on a seemingly hard problem, which is amazing.

Conclusion

Image segmentation is the process of taking a digital image and segmenting it into multiple segments of pixels with the goal of getting a more meaningful and simplified image.

FastAI makes it easy for us to perform image segmentation by giving us the ability to load in our segmentation data and to use a U-NET model for segmenting the images.

If you liked this article consider subscribing to my Youtube Channel and following me on social media.

The code covered in this article is available as a Github Repository.

If you have any questions, recommendations or critiques, I can be reached via Twitter or the comment section.