YOLO Object Detection in PyTorch

This article, the last of a four-part series on object detection with YOLO, shows you how to run and train YOLO in PyTorch using the Ultralytics YOLOv3 implementation.

Installation

YOLOv3 can be installed by cloning the ultralytics/yolov3 repository and installing the dependencies listed in its requirements.txt file:

git clone https://github.com/ultralytics/yolov3
cd yolov3
pip install -U -r requirements.txt

Detection Using A Pre-Trained Model

You can run object detection with a pre-trained model using the detect.py script. To see a list of all the arguments you can pass to detect.py, specify the --help flag:

usage: detect.py [-h] [--weights WEIGHTS [WEIGHTS ...]] [--source SOURCE]
                 [--img-size IMG_SIZE] [--conf-thres CONF_THRES]
                 [--iou-thres IOU_THRES] [--device DEVICE] [--view-img]
                 [--save-txt] [--save-conf] [--classes CLASSES [CLASSES ...]]
                 [--agnostic-nms] [--augment] [--update] [--project PROJECT]
                 [--name NAME] [--exist-ok]

optional arguments:
  -h, --help            show this help message and exit
  --weights WEIGHTS [WEIGHTS ...]
                        model.pt path(s)
  --source SOURCE       source
  --img-size IMG_SIZE   inference size (pixels)
  --conf-thres CONF_THRES
                        object confidence threshold
  --iou-thres IOU_THRES
                        IOU threshold for NMS
  --device DEVICE       cuda device, i.e. 0 or 0,1,2,3 or cpu
  --view-img            display results
  --save-txt            save results to *.txt
  --save-conf           save confidences in --save-txt labels
  --classes CLASSES [CLASSES ...]
                        filter by class: --class 0, or --class 0 2 3
  --agnostic-nms        class-agnostic NMS
  --augment             augmented inference
  --update              update all models
  --project PROJECT     save results to project/name
  --name NAME           save results to project/name
  --exist-ok            existing project/name ok, do not increment

The source can be an image, a video, a directory of images, a webcam, or a video stream:

  • Image: --source file.jpg
  • Video: --source file.mp4
  • Directory: --source dir/
  • Webcam: --source 0
  • RTSP stream: --source rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa
  • HTTP stream: --source http://wmccpinetop.axiscam.net/mjpg/video.mjpg

If you don't specify a source, detect.py uses the data/images folder. The results are automatically saved inside the runs/detect folder.

python detect.py --weights yolov3.pt --img 640 --conf 0.25 --source data/images/
Figure 1: Detection Example
python3 detect.py --weights yolov3.pt --source TownCentreXVID.avi
Figure 2: Pedestrian Detection
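
Recent versions of the repository also ship a hubconf.py, so you can run a pre-trained model directly from Python through PyTorch Hub instead of calling detect.py. Here's a minimal sketch, assuming that Hub interface (a results object with print() and save()) is available in your checkout; the image path is just an example:

import torch

# Load YOLOv3 with pretrained COCO weights via PyTorch Hub
# (downloads the repository and yolov3.pt on first use)
model = torch.hub.load('ultralytics/yolov3', 'yolov3', pretrained=True)

# Run inference on a single image (file path, URL, PIL image, or numpy array)
results = model('data/images/bus.jpg')

results.print()  # print a summary of the detections
results.save()   # save annotated images to runs/detect/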

Train on custom data

1. Create annotations

After collecting your images, you'll have to annotate them. For YOLO, each image needs a corresponding .txt file containing one line per ground-truth object in the image:

<object-class> <x> <y> <width> <height>

Here <x> and <y> are the coordinates of the box center, and all four values are normalized by the image width and height so they lie between 0 and 1.
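
For example (the numbers here are made up purely for illustration), a box with pixel corners (100, 200) and (300, 400) in a 640x480 image is written like this:

# Hypothetical example: convert an absolute pixel box to a YOLO label line
img_w, img_h = 640, 480                          # image size in pixels
x_min, y_min, x_max, y_max = 100, 200, 300, 400  # box corners in pixels

x_center = (x_min + x_max) / 2 / img_w  # 0.3125
y_center = (y_min + y_max) / 2 / img_h  # 0.625
width = (x_max - x_min) / img_w         # 0.3125
height = (y_max - y_min) / img_h        # 0.416667

# class 0, written as one line of the label file
print(f'0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}')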

The .txt file must have the same name as the image. All images should be placed in a folder called images, and all labels in a folder called labels.

You can create such labels with an annotation tool like labelImg, which supports both PASCAL VOC and YOLO formats (make sure YOLO is selected).

Figure 3: LabelImg

If you have a dataset with PASCAL VOC labels, you can convert it using the convert_voc_to_yolo.py script. Before executing the script, you'll have to change its classes list to fit your dataset. After that, you can run it:

python convert_voc_to_yolo.py
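
If you're curious what such a conversion involves, its core looks roughly like the sketch below. This is a simplified illustration rather than the script itself; the annotations/ and labels/ paths are placeholders you'd adapt to your dataset:

import glob
import xml.etree.ElementTree as ET

classes = ['Arduino_Nano', 'Heltec_ESP32_Lora', 'ESP8266', 'Raspberry_Pi_3']

def voc_to_yolo(xml_path, txt_path):
    # Convert one PASCAL VOC annotation file to a YOLO label file
    root = ET.parse(xml_path).getroot()
    img_w = int(root.find('size/width').text)
    img_h = int(root.find('size/height').text)
    with open(txt_path, 'w') as f:
        for obj in root.iter('object'):
            name = obj.find('name').text
            if name not in classes:
                continue
            box = obj.find('bndbox')
            x_min, y_min = float(box.find('xmin').text), float(box.find('ymin').text)
            x_max, y_max = float(box.find('xmax').text), float(box.find('ymax').text)
            # same normalized-center math as in the worked example above
            x_c, y_c = (x_min + x_max) / 2 / img_w, (y_min + y_max) / 2 / img_h
            w, h = (x_max - x_min) / img_w, (y_max - y_min) / img_h
            f.write(f'{classes.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}\n')

for xml_file in glob.glob('annotations/*.xml'):  # placeholder input folder
    voc_to_yolo(xml_file, xml_file.replace('annotations', 'labels').replace('.xml', '.txt'))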

2. Create a dataset.yaml file

The dataset.yaml file defines 1) an optional download command/URL for auto-downloading, 2) the path to a directory of training images (or the path to a *.txt file listing the training images), 3) the same for the validation images, 4) the number of classes, and 5) the list of class names:

microcontroller-detection.yml:

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: microcontroller-detection/train.txt
val: microcontroller-detection/val.txt

# number of classes
nc: 4

# class names
names: ['Arduino_Nano', 'Heltec_ESP32_Lora', 'ESP8266', 'Raspberry_Pi_3']
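
The train.txt and val.txt files referenced above are plain text files listing one image path per line. If you don't have them yet, a simple random split can be generated with a few lines of Python (the folder layout and 80/20 ratio here are assumptions, adjust them to your dataset):

import glob
import random

random.seed(42)  # make the split reproducible
images = glob.glob('microcontroller-detection/images/*.jpg')
random.shuffle(images)

split = int(0.8 * len(images))  # 80% train, 20% val
with open('microcontroller-detection/train.txt', 'w') as f:
    f.write('\n'.join(images[:split]))
with open('microcontroller-detection/val.txt', 'w') as f:
    f.write('\n'.join(images[split:]))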

3. Start Training

To train the model, pass your .yml file to the train.py script. You can also pass additional arguments such as the image size, batch size, and number of epochs. If you want to start from a pretrained model (recommended), you also need to specify the --weights argument; pretrained weights are automatically downloaded from the latest YOLOv3 release. If you want to train from scratch (starting with random weights), use --weights '' --cfg yolov3.yaml.

python train.py --img 640 --batch 16 --epochs 300 --data microcontroller-detection.yml --weights yolov3.pt
Figure 4: Training output

By default, all results are logged to runs/train, with a new experiment directory created for each training run: runs/train/exp, runs/train/exp2, runs/train/exp3, and so on.

Figure 5: Test Batch 0 Predictions

Training losses and performance metrics are also logged to TensorBoard and a custom results.txt logfile.
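
To watch them during training, point TensorBoard at the log directory:

tensorboard --logdir runs/train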

Figure 6: Performance metrics

4. Make predictions with the trained model

After training has finished, the best and latest model weights (best.pt and last.pt) are saved inside runs/train/exp/weights. They can then be used to make predictions on custom images using the detect.py script:

python3 detect.py --weights runs/train/exp/weights/best.pt --img 640 --conf 0.25 --source <path to image>
Figure 7: Prediction example
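
If you'd rather use the trained weights directly from Python, recent versions of the repository also expose a custom entry point through PyTorch Hub; assuming yours does, a sketch like this should work:

import torch

# Load custom-trained weights via PyTorch Hub
# (assumes the repository's hubconf.py provides a 'custom' entry point)
model = torch.hub.load('ultralytics/yolov3', 'custom', path='runs/train/exp/weights/best.pt')

results = model('path/to/image.jpg')  # placeholder image path
results.print()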