Create a microcontroller detector using Detectron2

Object detection is a common computer vision problem that deals with identifying and locating objects of certain classes in an image. It is used in many applications today, including video surveillance, pedestrian detection, and face detection.

In my last article, I showed you how to use Detectron2, Facebook's new computer vision framework, for both object detection and instance segmentation. I also showed you the basic steps of building your own object detector using it.

In this article, I want to go a step further by giving you a concrete example of how to build a custom model. Just like in my TensorFlow Object Detection Tutorial, we will work on my Microcontroller Detection data-set, which can be downloaded for free on Kaggle.

All the code covered in this article can be found on my GitHub.

Gathering and Labeling data

Creating your own object detection data-set can be painful, especially if you aren't familiar with the steps it takes to gather and label data correctly and efficiently. That's why I want to quickly walk you through the necessary steps.

Gathering data

Gathering image data is simple. You can either take pictures yourself using some kind of camera or you can download images from the internet.

To build a robust model, you need pictures with different backgrounds, varying lighting conditions, and random objects in the background.
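The rest of this article assumes the images are split into separate train and test folders. If you gathered your own images, a reproducible random split could be sketched like this; the 20% test fraction, the seed, and the function name `split_files` are arbitrary assumptions for illustration.

```python
import random


def split_files(filenames, test_fraction=0.2, seed=42):
    """Reproducibly split file names into (train, test) lists (hypothetical helper)."""
    files = sorted(filenames)            # sort first so the split doesn't depend on input order
    random.Random(seed).shuffle(files)   # local RNG: the same seed always gives the same split
    n_test = round(len(files) * test_fraction)
    return files[n_test:], files[:n_test]


train_files, test_files = split_files([f"IMG_{i:04d}.jpg" for i in range(10)])
print(len(train_files), len(test_files))  # 8 2
```

When moving the files into their folders, the matching label files would need to move together with their images.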

Labeling data

After you have gathered enough images, it is time to label them so your model knows what to learn. To label the data, you will need some kind of labeling software.

Which software you use will depend on what kind of labels you want. If you want to create an object detection data-set, I can highly recommend LabelImg, an excellent image annotation tool supporting both PascalVOC and YOLO format. You can download a prebuilt version for pretty much all popular operating systems here.

Figure 2: LabelImg
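In PascalVOC format, LabelImg writes one XML file per image. As a rough illustration of what a conversion script such as xml_to_csv.py has to do, here is a minimal standard-library sketch that turns one annotation into CSV-style rows. The function name `parse_voc_annotation` and the sample annotation are hypothetical, and the column order (filename, width, height, class, xmin, ymin, xmax, ymax) is an assumption about the CSV layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the PascalVOC layout that LabelImg writes.
SAMPLE_XML = """<annotation>
  <filename>IMG_0001.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>Arduino_Nano</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) row per labeled box."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # parse as float first in case a tool writes decimal coordinates
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append((filename, width, height, obj.findtext("name"), *coords))
    return rows


print(parse_voc_annotation(SAMPLE_XML))
# [('IMG_0001.jpg', 800, 600, 'Arduino_Nano', 10, 20, 110, 220)]
```

Collecting these rows for all XML files and writing them out with the csv module would produce a file comparable to the CSV used later in this article.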

If you want to create an instance segmentation data-set, I can recommend labelme, a polygonal annotation tool very similar to LabelImg.

Figure 3: Labelme

Registering the data-set

If you want to use a custom dataset with one of Detectron2's built-in data loaders, you will need to register your dataset so Detectron2 knows how to obtain it.

Here you have two options. You can either write a function that returns the needed information, or you can transform your dataset to COCO format, which can then be registered directly with the register_coco_instances function.
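For the second option, most of the work is reshaping the annotations. Below is a minimal sketch, assuming CSV-style rows of (filename, width, height, class, xmin, ymin, xmax, ymax); the function name `rows_to_coco` is hypothetical. Note that COCO stores boxes as [x, y, width, height] rather than corner coordinates, and that category ids here start at 1 following the usual COCO convention.

```python
CLASSES = ['Raspberry_Pi_3', 'Arduino_Nano', 'ESP8266', 'Heltec_ESP32_Lora']


def rows_to_coco(rows, classes=CLASSES):
    """Build a COCO-style detection dict (images, annotations, categories) from CSV-style rows."""
    images, annotations = [], []
    image_ids = {}  # filename -> COCO image id
    for filename, width, height, cls, xmin, ymin, xmax, ymax in rows:
        if filename not in image_ids:
            image_ids[filename] = len(image_ids) + 1
            images.append({"id": image_ids[filename], "file_name": filename,
                           "width": width, "height": height})
        w, h = xmax - xmin, ymax - ymin
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_ids[filename],
            "category_id": classes.index(cls) + 1,  # 1-based ids, as is common in COCO files
            "bbox": [xmin, ymin, w, h],             # COCO boxes: [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(classes)]
    return {"images": images, "annotations": annotations, "categories": categories}
```

The resulting dict could be saved with json.dump and then registered with something like register_coco_instances("microcontroller_coco/train", {}, "train_coco.json", "Microcontroller Detection/train/"), where the dataset name and JSON file name are placeholders.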

In my case, I have PascalVOC labels, and even though you can transform them into COCO format, I decided against it and used the first option.

In the following code snippet, you can see how I loaded the data from the CSV file, which I created with an xml_to_csv.py script based on the xml_to_csv.py script from Dat Tran’s raccoon detector.

Figure 4: CSV file
import os
import cv2
import pandas as pd
from detectron2.structures import BoxMode

# write a function that loads the dataset into detectron2's standard format
def get_microcontroller_dicts(csv_file, img_dir):
    df = pd.read_csv(csv_file)
    # turn the bare file names from the CSV into full image paths
    df['filename'] = df['filename'].map(lambda x: os.path.join(img_dir, x))

    classes = ['Raspberry_Pi_3', 'Arduino_Nano', 'ESP8266', 'Heltec_ESP32_Lora']

    # map each class name to an integer id (its index in `classes`)
    df['class_int'] = df['class'].map(lambda x: classes.index(x))

    dataset_dicts = []
    for filename in df['filename'].unique().tolist():
        record = {}

        # read the image once to get its size
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["height"] = height
        record["width"] = width

        objs = []
        # one annotation per labeled box in this image
        for _, row in df[df['filename'] == filename].iterrows():
            obj = {
                'bbox': [row['xmin'], row['ymin'], row['xmax'], row['ymax']],
                'bbox_mode': BoxMode.XYXY_ABS,
                'category_id': int(row['class_int']),
                "iscrowd": 0
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

To test that the function works correctly, you can create the dataset dicts and use Detectron2's Visualizer to visualize a few random samples.

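import random
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

classes = ['Raspberry_Pi_3', 'Arduino_Nano', 'ESP8266', 'Heltec_ESP32_Lora']
# the Visualizer reads the class names from the metadata's thing_classes
microcontroller_metadata = MetadataCatalog.get('microcontroller/train').set(thing_classes=classes)

dataset_dicts = get_microcontroller_dicts('Microcontroller Detection/train_labels.csv', 'Microcontroller Detection/train/')
for d in random.sample(dataset_dicts, 10):
    img = cv2.imread(d["file_name"])
    # OpenCV loads BGR, the Visualizer expects RGB
    visualizer = Visualizer(img[:, :, ::-1], metadata=microcontroller_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    # in Google Colab, use cv2_imshow from google.colab.patches instead
    cv2.imshow('sample', vis.get_image()[:, :, ::-1])
    cv2.waitKey(0)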
Figure 5: Labeled data examples
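Besides the visual check, a quick programmatic pass over the dataset dicts can catch boxes that are empty, inverted, or outside the image, as well as out-of-range class ids. This is a minimal sketch, assuming boxes in the XYXY_ABS layout produced by get_microcontroller_dicts; `find_bad_annotations` is a hypothetical helper, not part of Detectron2.

```python
def find_bad_annotations(dataset_dicts, num_classes):
    """Return (file_name, problem) pairs for suspicious annotations."""
    problems = []
    for record in dataset_dicts:
        width, height = record["width"], record["height"]
        for obj in record["annotations"]:
            xmin, ymin, xmax, ymax = obj["bbox"]  # XYXY_ABS: absolute corner coordinates
            if xmin >= xmax or ymin >= ymax:
                problems.append((record["file_name"], "empty or inverted box"))
            elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
                problems.append((record["file_name"], "box outside the image"))
            if not 0 <= obj["category_id"] < num_classes:
                problems.append((record["file_name"], "category_id out of range"))
    return problems


# Tiny hand-made example (hypothetical file names and boxes).
example = [
    {"file_name": "a.jpg", "width": 100, "height": 80,
     "annotations": [{"bbox": [10, 10, 50, 40], "category_id": 0},
                     {"bbox": [60, 10, 120, 40], "category_id": 2}]},
    {"file_name": "b.jpg", "width": 100, "height": 80,
     "annotations": [{"bbox": [30, 30, 30, 50], "category_id": 5}]},
]
print(find_bad_annotations(example, num_classes=4))
# [('a.jpg', 'box outside the image'), ('b.jpg', 'empty or inverted box'), ('b.jpg', 'category_id out of range')]
```

On the real data, calling it with the output of get_microcontroller_dicts and num_classes=4 should ideally return an empty list.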

If the above code shows your data correctly, you are ready to register the training and testing sets using the following code:

from detectron2.data import DatasetCatalog, MetadataCatalog

classes = ['Raspberry_Pi_3', 'Arduino_Nano', 'ESP8266', 'Heltec_ESP32_Lora']

for d in ["train", "test"]:
  DatasetCatalog.register('microcontroller/' + d, lambda d=d: get_microcontroller_dicts('Microcontroller Detection/' + d + '_labels.csv', 'Microcontroller Detection/' + d+'/'))
  MetadataCatalog.get('microcontroller/' + d).set(thing_classes=classes)
microcontroller_metadata = MetadataCatalog.get('microcontroller/train')
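The `lambda d=d:` in the registration loop is deliberate. DatasetCatalog.register only stores the function and calls it later, when the dataset is requested, and a plain `lambda:` would look up `d` at that point, after the loop has finished. This stand-alone sketch (no Detectron2 needed; the two dictionaries are hypothetical stand-ins for the catalog) shows the difference:

```python
# Stand-ins for the catalog: name -> function that builds the dataset lazily.
late_bound = {}
early_bound = {}

for d in ["train", "test"]:
    # `d` is looked up when the lambda is called, i.e. after the loop ends
    late_bound[d] = lambda: "Microcontroller Detection/" + d + "_labels.csv"
    # the default argument captures the current value of `d` right away
    early_bound[d] = lambda d=d: "Microcontroller Detection/" + d + "_labels.csv"

print(late_bound["train"]())   # Microcontroller Detection/test_labels.csv
print(early_bound["train"]())  # Microcontroller Detection/train_labels.csv
```

Without the default argument, both registered names would end up loading the test split.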

Training the model

Now that the data is registered, it's really simple to fine-tune a pre-trained model to work for your data-set.

You only need to get a model config and model weights from the detectron2 model zoo and then create a DefaultTrainer.

import os
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
# the config file is read from a local clone of the detectron2 repository
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ('microcontroller/train',)
cfg.DATASETS.TEST = ()   # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_101_FPN_3x/137851257/model_final_f6e8b1.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2   # images per training iteration
cfg.SOLVER.MAX_ITER = 1000     # training iterations (not epochs)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4   # one per entry in `classes`

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
Figure 6: Model training
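MAX_ITER counts iterations rather than epochs, and each iteration consumes IMS_PER_BATCH images. A small back-of-the-envelope helper makes the relationship explicit; `approx_epochs` is a hypothetical name, and the training-set size of 200 images in the example is a made-up number, so plug in len(dataset_dicts) for your own data.

```python
def approx_epochs(max_iter, ims_per_batch, num_train_images):
    """Approximate passes over the training set: images seen / images available."""
    return max_iter * ims_per_batch / num_train_images


# Settings from the config above with a hypothetical 200 training images:
print(approx_epochs(max_iter=1000, ims_per_batch=2, num_train_images=200))  # 10.0
```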

Use model for inference

After training, the DefaultTrainer automatically saves the final weights to model_final.pth inside cfg.OUTPUT_DIR (./output by default). This file can then be used to load the model and make predictions.

For inference, the DefaultPredictor class is used instead of the DefaultTrainer.

from detectron2.engine import DefaultPredictor

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # weights saved by the trainer
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # set the testing threshold for this model
cfg.DATASETS.TEST = ('microcontroller/test', )
predictor = DefaultPredictor(cfg)

dataset_dicts = get_microcontroller_dicts('Microcontroller Detection/test_labels.csv', 'Microcontroller Detection/test/')
for d in random.sample(dataset_dicts, 5):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # by default DefaultPredictor expects a BGR image, as returned by cv2.imread
    v = Visualizer(im[:, :, ::-1], metadata=microcontroller_metadata, scale=0.8)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    # in Google Colab, use cv2_imshow from google.colab.patches instead
    cv2.imshow('prediction', v.get_image()[:, :, ::-1])
    cv2.waitKey(0)
Figure 7: Predictions
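If you need the predictions as plain data rather than an image, the Instances object returned by the predictor exposes pred_boxes, scores and pred_classes; after `.to("cpu")`, calls such as `instances.pred_boxes.tensor.tolist()`, `instances.scores.tolist()` and `instances.pred_classes.tolist()` give Python lists. The sketch below uses a hypothetical `describe_detections` helper and made-up values to turn such lists into readable records, with class names as in `microcontroller_metadata.thing_classes`. Its `min_score` is an optional extra filter on top of SCORE_THRESH_TEST.

```python
def describe_detections(boxes, scores, class_ids, class_names, min_score=0.5):
    """Combine parallel prediction lists into dicts, highest score first."""
    detections = []
    for box, score, class_id in zip(boxes, scores, class_ids):
        if score >= min_score:
            detections.append({
                "class": class_names[class_id],
                "score": round(score, 3),
                "box": [round(v, 1) for v in box],  # XYXY in pixels
            })
    return sorted(detections, key=lambda det: det["score"], reverse=True)


classes = ['Raspberry_Pi_3', 'Arduino_Nano', 'ESP8266', 'Heltec_ESP32_Lora']
# Made-up values standing in for one image's predictions.
boxes = [[12.3, 40.0, 210.7, 180.2], [300.0, 50.5, 380.4, 90.0]]
scores = [0.62, 0.97]
class_ids = [0, 2]
for det in describe_detections(boxes, scores, class_ids, classes):
    print(det)
```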

Conclusion

Detectron2 is Facebook's new vision library that allows us to easily use and create object detection, instance segmentation, keypoint detection, and panoptic segmentation models. It has a simple, modular design that makes it easy to adapt a script to another data-set.

Overall, I really enjoy working with it because it makes it easy to train models no matter what data format you have. Furthermore, it's also easy to switch from one task to another.

That's all for this article. If you have any questions or just want to chat with me, feel free to contact me on social media or through my contact form. If you want to get continuous updates about my blog, make sure to follow me on Twitter and join my newsletter.