Train a Mask R-CNN model with the Tensorflow Object Detection API

by Gilbert Tanner on May 04, 2020

In this article, you'll learn how to train a Mask R-CNN model with the Tensorflow Object Detection API. If you haven't installed the Tensorflow Object Detection API yet, I recommend checking out my article 'Installing the Tensorflow Object Detection API'.

Gathering data

Now that the Tensorflow Object Detection API is ready to go, we need to gather the images needed for training.

To train a robust model, we need lots of pictures that vary as much as possible from each other: different lighting conditions, different backgrounds, and lots of random objects in them.

You can either take the pictures yourself, or you can download pictures from the internet. For my microcontroller detector, I have four different objects I want to detect (Arduino Nano, ESP8266, Raspberry Pi 3, Heltec ESP32 LoRa).

I took about 25 pictures of each individual microcontroller and 25 pictures containing multiple microcontrollers using my smartphone.

After taking the pictures, make sure to transform them to a resolution suitable for training (I used 800x600).

You can use the resize_images script to resize the images to the desired resolution.

python resize_images.py -d images/ -s 800 600
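
If you don't have the script at hand, here is a minimal sketch of what it might look like, assuming a Pillow-based implementation with the same -d/-s interface (a hypothetical reimplementation, not the original script):

# resize_images.py -- hypothetical sketch, requires Pillow
import argparse
import os
from PIL import Image

parser = argparse.ArgumentParser(description="Resize all images in a directory")
parser.add_argument("-d", "--directory", required=True, help="directory containing the images")
parser.add_argument("-s", "--size", type=int, nargs=2, required=True, help="target width and height")
args = parser.parse_args()

for name in os.listdir(args.directory):
    if name.lower().endswith((".jpg", ".jpeg", ".png")):
        path = os.path.join(args.directory, name)
        # Resize in place to the target resolution
        Image.open(path).resize(tuple(args.size), Image.LANCZOS).save(path)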

After you have all the images, move about 80% to the object_detection/images/train directory and the other 20% to the object_detection/images/test directory. Make sure that the images in both directories have a good variety of classes.
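
If you'd rather not move the files by hand, a small script can do the split. The sketch below assumes all images start out directly inside object_detection/images/ and that the train/ and test/ subfolders already exist:

# Hypothetical helper for the 80/20 train/test split
import os
import random
import shutil

random.seed(42)  # make the split reproducible
images = [f for f in os.listdir("images") if f.lower().endswith((".jpg", ".jpeg", ".png"))]
random.shuffle(images)
split = int(0.8 * len(images))
for name in images[:split]:
    shutil.move(os.path.join("images", name), os.path.join("images", "train", name))
for name in images[split:]:
    shutil.move(os.path.join("images", name), os.path.join("images", "test", name))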

Labeling data

After you have gathered enough images, it's time to label them, so your model knows what to learn. In order to label the data, you will need to use some kind of labeling software.

For object detection, we used LabelImg, an excellent image annotation tool supporting both PascalVOC and Yolo formats. For image segmentation/instance segmentation, there are multiple great annotation tools available, including the VGG Image Annotator, labelme, and PixelAnnotationTool. I chose labelme because it is simple to both install and use.

Figure 2: Labelme

Labelme can be installed using pip:

pip install labelme

After you have installed Labelme, you can start it by typing labelme in the command line. Now you can click on "Open Dir", select the folder containing the images, and start labeling your images.

Figure 3: Labeling images
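
For every image you label, Labelme writes a JSON file next to it containing one polygon per annotated object. A trimmed example (the values here are placeholders) looks roughly like this:

{
  "shapes": [
    {
      "label": "Arduino",
      "points": [[200.0, 150.0], [260.0, 152.0], [258.0, 298.0], [198.0, 296.0]],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "IMG_0001.jpg",
  "imageHeight": 600,
  "imageWidth": 800
}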

Generating training data

With the images labeled, we need to create TFRecords that can serve as input data for training the model. Before we create the TFRecord files, we'll convert the labelme labels into COCO format. This can be done with the labelme2coco.py script.

python labelme2coco.py train --output train.json
python labelme2coco.py test --output test.json

Now that the data is in COCO format we can create the TFRecord files. For this we'll make use of the create_coco_tf_record.py file from my Github repository, which is a slightly modified version of the original create_coco_tf_record.py file.

python create_coco_tf_record.py --logtostderr --train_image_dir=images/train --test_image_dir=images/test --train_annotations_file=images/train.json --test_annotations_file=images/test.json --output_dir=./

After executing this command, you should have a train.record and test.record file inside your object_detection folder.
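
To sanity-check the generated files, you can count the records they contain. The snippet below uses the TF 1.x record iterator, matching the TF1 version of the Object Detection API used in this tutorial:

# Count the examples inside each TFRecord file (TF 1.x API)
import tensorflow as tf

for name in ["train.record", "test.record"]:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(name))
    print(name, "contains", count, "examples")

The counts should add up to the total number of labeled images, split roughly 80/20.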

Getting ready for training

The last thing we need to do before training is to create a label map and a training configuration file.

Creating a label map

The label map maps an id to a name. We will put it in a folder called training, which is located in the object_detection directory. The labelmap for my detector can be seen below.

item {
    id: 1
    name: 'Arduino'
}
item {
    id: 2
    name: 'ESP8266'
}
item {
    id: 3
    name: 'Heltec'
}
item {
    id: 4
    name: 'Raspberry'
}

The ids in the label map need to line up with the category ids inside the train.json and test.json files. Note that label map ids start at 1 (0 is reserved for the background class), while the COCO category ids generated by labelme2coco start at 0, as shown below.

"categories": [
    {
        "supercategory": "Arduino",
        "id": 0,
        "name": "Arduino"
    },
    {
        "supercategory": "ESP8266",
        "id": 1,
        "name": "ESP8266"
    },
    {
        "supercategory": "Heltec",
        "id": 2,
        "name": "Heltec"
    },
    {
        "supercategory": "Raspberry",
        "id": 3,
        "name": "Raspberry"
    }
],
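
If you want to double-check that the two files line up, a few lines of Python will print each COCO category id next to the label map id it should correspond to (assuming the offset-by-one convention described above; adjust the path to wherever your train.json lives):

# Print each COCO category id and the matching label map id
import json

with open("train.json") as f:
    categories = json.load(f)["categories"]
for cat in categories:
    print("COCO id", cat["id"], "-> label map id", cat["id"] + 1, "(", cat["name"], ")")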

Creating the training configuration

Lastly, we need to create a training configuration file. The Tensorflow Object Detection API provides four pre-trained Mask R-CNN model options:

From the Tensorflow Model Zoo:

Model name                                | Speed (ms) | COCO mAP | Outputs
mask_rcnn_inception_resnet_v2_atrous_coco | 771        | 36       | Masks
mask_rcnn_inception_v2_coco               | 79         | 25       | Masks
mask_rcnn_resnet101_atrous_coco           | 470        | 33       | Masks
mask_rcnn_resnet50_atrous_coco            | 343        | 29       | Masks

For this tutorial, I chose the mask_rcnn_inception_v2_coco model because it's a lot faster than the other options. You can find the mask_rcnn_inception_v2_coco.config file inside the samples/configs folder. Copy the config file to the training directory, then open it with a text editor and make the following changes:

Line 10: change the number of classes to the number of objects you want to detect (4 in my case)

Line 126: change fine_tune_checkpoint to the path of the model.ckpt file:

fine_tune_checkpoint: "<path>/models/research/object_detection/training/mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt"

Line 142: change input_path to the path of the train.record file:

input_path: "<path>/models/research/object_detection/train.record"

Line 158: change input_path to the path of the test.record file:

input_path: "<path>/models/research/object_detection/test.record"

Line 144 and 160: change label_map_path to the path of the label map:

label_map_path: "<path>/models/research/object_detection/training/labelmap.pbtxt"

Line 150: change num_examples to the number of images in your test folder.
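
After these edits, the relevant parts of the config should look roughly like the abbreviated sketch below (Mask R-CNN reuses the faster_rcnn proto in the TF1 API; the "..." parts stand for the many options left unchanged):

model {
  faster_rcnn {
    num_classes: 4
    ...
  }
}
train_config {
  fine_tune_checkpoint: "<path>/models/research/object_detection/training/mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  ...
}
eval_config {
  num_examples: <number of images in images/test>
  ...
}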

Training the model

To train the model execute the following command in the command line:

python model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=training/mask_rcnn_inception_v2_coco.config

Every few minutes, the current loss gets logged to Tensorboard. Open Tensorboard by opening a second command line, navigating to the object_detection folder and typing:

tensorboard --logdir=training

This will open a webpage at localhost:6006.

Figure 4: Monitoring loss using Tensorboard

You should train the model until it reaches a satisfying loss. The training process can then be terminated by pressing Ctrl+C.

Training in Google Colab

If your computer doesn't have a good enough GPU to train the model locally, you can train it on Google Colab. For this, I recommend creating a folder that contains the data as well as all the config files and putting it on Google Drive. That way, you can load all the custom files into Google Colab.

You can find an example inside the Tensorflow_Object_Detection_API_Instance_Segmentation_in_Google_Colab.ipynb notebook.
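
Inside the notebook, the Drive folder can be made available with Colab's built-in mount helper, after which your files are accessible under the mount point:

# Mount Google Drive in Colab to access the uploaded folder
from google.colab import drive
drive.mount('/content/gdrive')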

Exporting the inference graph

Now that we have a trained model, we need to generate an inference graph, which can be used to run the model. To do so, we first need to find out the highest saved step number: navigate to the training directory and look for the model.ckpt file with the biggest index.
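
Instead of scanning the directory by eye, a few lines of Python can pick out the highest step for you (a hypothetical convenience, equivalent to looking manually):

# Find the highest checkpoint step saved in the training directory
import glob
import re

steps = [int(re.search(r"model\.ckpt-(\d+)", path).group(1))
         for path in glob.glob("training/model.ckpt-*.index")]
print("Highest saved step:", max(steps))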

Then we can create the inference graph by typing the following command in the command line.

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/mask_rcnn_inception_v2_coco.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph

Here, XXXX stands for the highest checkpoint number you found in the training directory.

Testing the model

To test the model, you can use the object_detection_tutorial.ipynb notebook. You only need to change the paths to the model and the label map.

From:

# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

To:

MODEL_NAME = 'inference_graph'
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_LABELS = 'training/labelmap.pbtxt'
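
With these paths set, the notebook loads the exported frozen graph using the PATH_TO_FROZEN_GRAPH variable defined above. For reference, the loading code in the notebook looks roughly like this (TF1-style graph loading):

# Load the exported frozen graph into a tf.Graph (TF 1.x)
import tensorflow as tf

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')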

Result

Figure 6: Mask R-CNN Prediction on video

That’s all from this article. If you have any questions or just want to chat with me, feel free to leave a comment below or contact me on social media. If you want to get continuous updates about my blog, make sure to join my newsletter.