Train a Mask R-CNN model with the Tensorflow Object Detection API
Create a custom Mask R-CNN model with the Tensorflow Object Detection API.
This article will teach you how to train a Mask R-CNN model with the Tensorflow Object Detection API and Tensorflow 2. If you want to use Tensorflow 1 instead, check out the tf1 branch of my Github repository.
Installation
You can install the TensorFlow Object Detection API either with pip, the Python package installer, or with Docker, an open-source platform for deploying and managing containerized applications. For running the Tensorflow Object Detection API locally, Docker is recommended. If you aren't familiar with Docker, though, it might be easier to install it using pip.
First, clone the master branch of the Tensorflow Models repository by running git clone https://github.com/tensorflow/models.git.
Docker Installation
Python Package Installation
Note: Using *.proto to designate all files does not work with protobuf version 3.5 and higher. If you are using version 3.5 or higher, you have to go through each file individually. To make this easier, I created a Python script that loops through a directory and converts all proto files one at a time.
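The script itself isn't reproduced here, but the idea can be sketched in a few lines of Python. This sketch assumes it is run from the models/research directory, that protoc is on your PATH, and that the proto files live in object_detection/protos; the function names are hypothetical.

```python
"""Sketch: compile .proto files one at a time instead of relying on *.proto.

Assumptions (not from the original article): protoc is on PATH and the
working directory is models/research.
"""
from __future__ import annotations

import subprocess
from pathlib import Path


def protoc_commands(proto_dir: str = "object_detection/protos") -> list[list[str]]:
    """Build one protoc command per .proto file found in proto_dir."""
    commands = []
    for proto in sorted(Path(proto_dir).glob("*.proto")):
        # Each file is passed explicitly, so no wildcard expansion is needed.
        commands.append(["protoc", proto.as_posix(), "--python_out=."])
    return commands


def compile_protos(proto_dir: str = "object_detection/protos") -> None:
    """Run protoc for every file, stopping at the first failure."""
    for cmd in protoc_commands(proto_dir):
        subprocess.run(cmd, check=True)
```

Calling compile_protos() from models/research runs the same command as the wildcard version, just once per file.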
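To test the installation, run python object_detection/builders/model_builder_tf2_test.py from inside the models/research directory: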
If everything was installed correctly, you should see something like:
Gathering data
Now that the Tensorflow Object Detection API is ready to go, we need to gather the necessary images for training.
To train a robust model, we need lots of pictures that should vary as much as possible from each other. That means that they should have different lighting conditions, different backgrounds, and lots of random objects in them.
You can either take the pictures yourself or download pictures from the internet. For my microcontroller detector, I have four different objects I want to detect (Arduino Nano, ESP8266, Raspberry Pi 3, Heltec ESP32 LoRa).
I took about 25 pictures of each microcontroller and 25 pictures containing multiple microcontrollers using my smartphone.
After taking the pictures, resize them to a resolution suitable for training (I used 800x600).
You can use the resize images script to resize the images to the desired resolution.
After you have all the images, move about 80% to the object_detection/images/train directory and the other 20% to the object_detection/images/test directory. Ensure that the images in both directories have a good variety of classes.
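If you'd rather not move the files by hand, the 80/20 split can be scripted. This is a minimal sketch: the source folder, the accepted extensions, and the fixed random seed are assumptions, and the function name is hypothetical. A random split does not guarantee that every class ends up in both folders, so it is still worth checking the result.

```python
"""Sketch: move about 80% of the images to images/train and 20% to images/test.

Assumptions (not from the original article): all images start in one folder,
use .jpg/.jpeg/.png extensions, and a fixed seed makes the split reproducible.
"""
from __future__ import annotations

import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_images(source: str, images_dir: str, train_ratio: float = 0.8,
                 seed: int = 42) -> tuple[int, int]:
    """Shuffle the images in `source` and move them into train/ and test/."""
    files = sorted(p for p in Path(source).iterdir()
                   if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    random.Random(seed).shuffle(files)
    n_train = round(len(files) * train_ratio)

    train_dir = Path(images_dir) / "train"
    test_dir = Path(images_dir) / "test"
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    for i, path in enumerate(files):
        target = train_dir if i < n_train else test_dir
        shutil.move(str(path), str(target / path.name))
    # Returns (number of training images, number of test images).
    return n_train, len(files) - n_train
```

For example, split_images("raw_images", "object_detection/images") would move the pictures from a hypothetical raw_images folder into the two directories used in this article.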
Labeling data
After gathering enough images, it's time to label them so your model knows what to learn. To label the data, you will need to use labeling software.
For object detection, we used LabelImg, an excellent image annotation tool supporting both the Pascal VOC and YOLO formats. For image segmentation and instance segmentation, there are multiple great annotation tools available, including the VGG Image Annotator, labelme, and PixelAnnotationTool. I chose labelme because it is simple to both install and use.
Labelme can be installed using pip by running pip install labelme.
After installing labelme, you can start it by typing labelme in the command line. Now you can click on "Open Dir", select the folder containing the images, and start labeling them.
Generating Training data
With the images labeled, we need to create TFRecords that can serve as input data for training the model. Before we create the TFRecord files, we'll convert the labelme labels into COCO format. This can be done with the labelme2coco.py script.
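To show what this conversion does, here is a simplified sketch of a labelme-to-COCO converter. It is not the labelme2coco.py script mentioned above: it only handles polygon shapes, assigns category ids starting at 1 in the order of the class list you pass in, and its function names are hypothetical. It relies on the labelme JSON keys imagePath, imageHeight, imageWidth, and shapes (each shape having a label and a list of points).

```python
"""Sketch: convert labelme JSON files into a minimal COCO dictionary.

Simplified illustration (not labelme2coco.py): only polygon shapes are
converted, and category ids are assigned from `class_names`, starting at 1.
"""
from __future__ import annotations

import json
from pathlib import Path


def polygon_area(xs: list[float], ys: list[float]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(s) / 2.0


def labelme_to_coco(labelme_files: list[str], class_names: list[str]) -> dict:
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []

    for image_id, json_path in enumerate(labelme_files, start=1):
        data = json.loads(Path(json_path).read_text())
        images.append({"id": image_id, "file_name": data["imagePath"],
                       "height": data["imageHeight"], "width": data["imageWidth"]})
        for shape in data["shapes"]:
            # Older labelme files may store shape_type as null for polygons.
            if (shape.get("shape_type") or "polygon") != "polygon":
                continue  # rectangles, circles, etc. are skipped in this sketch
            xs = [float(x) for x, _ in shape["points"]]
            ys = [float(y) for _, y in shape["points"]]
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[shape["label"]],
                # COCO polygons are flat lists: [x1, y1, x2, y2, ...]
                "segmentation": [[v for pt in zip(xs, ys) for v in pt]],
                "area": polygon_area(xs, ys),
                # COCO boxes are [x_min, y_min, width, height]
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}
```

The returned dictionary can be written with json.dump to a file such as train.json; the category ids it contains are the ones the label map has to match later.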
Now that the data is in COCO format, we can create the TFRecord files. For this, we'll use the create_coco_tf_record.py file from my Github repository, which is a slightly modified version of the original create_coco_tf_record.py file.
After executing this command, you should have train.record and test.record files inside your object_detection folder.
Getting ready for training
The last thing we need to do before training is to create a label map and a training configuration file.
Creating a label map
The label map maps an id to a name. We will put it in a folder called training, which is located in the object_detection directory. The labelmap for my detector can be seen below.
The id number of each item should match the ids inside the train.json and test.json files.
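One way to keep the ids in sync is to generate the label map from the categories stored in the COCO file. This sketch assumes the COCO file is train.json and the label map is written to training/labelmap.pbtxt; both paths and the helper names are hypothetical. It rejects id 0, because the Object Detection API reserves 0 for the background class.

```python
"""Sketch: write a label map from the categories of a COCO json file.

Paths are assumptions; adjust them to your folder layout.
"""
from __future__ import annotations

import json
from pathlib import Path


def label_map_text(categories: list[dict]) -> str:
    """Render COCO categories as label map items: item { id: N name: '...' }."""
    items = []
    for cat in sorted(categories, key=lambda c: c["id"]):
        if cat["id"] < 1:
            raise ValueError("label map ids must start at 1 (0 is reserved for background)")
        items.append(f"item {{\n  id: {cat['id']}\n  name: '{cat['name']}'\n}}\n")
    return "\n".join(items)


def write_label_map(coco_json: str = "train.json",
                    out_path: str = "training/labelmap.pbtxt") -> None:
    """Read the categories from `coco_json` and write them to `out_path`."""
    categories = json.loads(Path(coco_json).read_text())["categories"]
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text(label_map_text(categories))
```

Generating the file from train.json means a renamed or reordered class only has to be changed in one place.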
Creating the training configuration
Lastly, we need to create a training configuration file. At the moment, only one Mask R-CNN model is supported with Tensorflow 2.
Model name | Speed (ms) | COCO mAP (boxes/masks) | Outputs |
---|---|---|---|
Mask R-CNN Inception ResNet V2 1024x1024 | 301 | 39.0/34.6 | Boxes/Masks |
The base config for the model can be found inside the configs/tf2 folder.
Copy the config file to the training directory. Then open it in a text editor and make the following changes:
- Line 12: change num_classes to the number of objects you want to detect (4 in my case).
- Line 125: change fine_tune_checkpoint to the path of the pretrained model checkpoint (for checkpoints downloaded from the TF2 Detection Model Zoo, this is the checkpoint/ckpt-0 prefix).
- Line 126: change fine_tune_checkpoint_type to detection.
- Line 136: change input_path to the path of the train.record file.
- Line 156: change input_path to the path of the test.record file.
- Lines 134 and 152: change label_map_path to the path of the label map.
- Lines 107 and 147: change batch_size to a number appropriate for your hardware, like 4, 8, or 16.
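The same edits can also be applied programmatically, which is handy if you retrain with different settings. This is a sketch based on regular expressions, not an official tool: it assumes the config contains one train_input_reader block followed by one eval_input_reader block, with each value on its own line as in the stock configs/tf2 files. The function names are hypothetical.

```python
"""Sketch: apply the config changes listed above with regular expressions.

Assumptions: one train_input_reader block precedes one eval_input_reader
block, and each `key: value` pair sits on its own line.
"""
from __future__ import annotations

import re


def _set(text: str, key: str, value: str) -> str:
    """Replace every `key: <old value>` line with `key: value`."""
    return re.sub(rf"(\b{key}:\s*).*", lambda m: m.group(1) + value, text)


def patch_config(text: str, num_classes: int, checkpoint: str, train_record: str,
                 test_record: str, label_map: str, batch_size: int) -> str:
    text = _set(text, "num_classes", str(num_classes))
    # The trailing colon in the pattern keeps this from touching fine_tune_checkpoint_type.
    text = _set(text, "fine_tune_checkpoint", f'"{checkpoint}"')
    text = _set(text, "fine_tune_checkpoint_type", '"detection"')
    text = _set(text, "batch_size", str(batch_size))
    text = _set(text, "label_map_path", f'"{label_map}"')
    # input_path appears twice: first in train_input_reader, then in eval_input_reader.
    train_part, sep, eval_part = text.partition("eval_input_reader")
    train_part = _set(train_part, "input_path", f'"{train_record}"')
    eval_part = _set(eval_part, "input_path", f'"{test_record}"')
    return train_part + sep + eval_part
```

To use it, read the copied config with Path(...).read_text(), pass the text through patch_config with your own paths, and write the result back to the same file.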
Training the model
To train the model, run the following command in the command line:
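For reference, here is a sketch that assembles the training command in Python. The flags --pipeline_config_path, --model_dir, and --alsologtostderr are accepted by the TF2 training script model_main_tf2.py; the config filename and the training directory are assumptions based on the previous steps, and the helper name is hypothetical.

```python
"""Sketch: build the command line for model_main_tf2.py.

Run it from the object_detection folder; the paths are assumptions.
"""
from __future__ import annotations

import sys


def training_command(config_path: str, model_dir: str = "training") -> list[str]:
    return [
        sys.executable, "model_main_tf2.py",
        f"--pipeline_config_path={config_path}",
        f"--model_dir={model_dir}",  # checkpoints and summaries are written under this directory
        "--alsologtostderr",         # also print log messages to the console
    ]
```

Passing training_command("training/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.config") to subprocess.run(..., check=True) starts training with the config copied earlier.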
If everything was set up correctly, the training should begin shortly, and you should see something like the following:
Every few minutes, the current loss gets logged to TensorBoard. Open TensorBoard by opening a second command line, navigating to the object_detection folder, and typing:
This starts TensorBoard, which you can then view in your browser at localhost:6006.
The training script saves checkpoints periodically (model_main_tf2.py writes one every 1000 steps by default, which can be changed with the --checkpoint_every_n flag). Train the model until it reaches a satisfactory loss, then terminate the training process by pressing Ctrl+C.
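Before exporting, it can help to check which checkpoint is the newest. Inside TensorFlow you would usually call tf.train.latest_checkpoint(model_dir); the standard-library sketch below approximates this by picking the highest-numbered checkpoint, assuming TF2-style names such as training/ckpt-7.index. The function name is hypothetical.

```python
"""Sketch: find the newest TF2-style checkpoint (ckpt-N) in a directory."""
from __future__ import annotations

import re
from pathlib import Path


def latest_checkpoint(model_dir: str = "training") -> str | None:
    """Return the prefix of the highest-numbered ckpt-N, or None if none exist."""
    best_step, best_prefix = -1, None
    for index_file in Path(model_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Strip ".index" to get the checkpoint prefix, e.g. training/ckpt-7.
            best_prefix = str(index_file.with_suffix(""))
    return best_prefix
```

Comparing the step numbers as integers matters here: sorting the file names as strings would put ckpt-9 after ckpt-10.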
Exporting the inference graph
Now that we have a trained model, we need to export an inference graph (in Tensorflow 2, a SavedModel) that can be used to run the model.