Google Coral Edge TPUs out of beta - Overview of all the changes

In March, Google unveiled Google Coral, their platform for local AI. At launch, Google Coral had two products: the Coral USB Accelerator and the Coral Dev Board.

Figure 1: Initial Coral products (Source)

With a form factor of 65x30 mm and a price of $75, the USB Accelerator is designed to let you run large deep learning models on edge devices like the Raspberry Pi.

The Dev Board comes in at $150 and allows us to perform machine learning and deep learning experiments on a stand-alone small form factor device.

When I first took a look at the Coral USB Accelerator in May, I was impressed by the performance you could get out of it. At the time, however, it was quite hard to use your own models with it: model creation came with several restrictions, and you had to use quantization-aware training, which wasn't easy to implement with high-level APIs like Keras.

Furthermore, to make use of the computing power of the Edge TPU, you needed to use the Edge TPU API, which meant that your code couldn't run at all when no Edge TPU was connected.

Over the last few months, Google has released lots of updates that make Google Coral more accessible and easier to use. In this article, I want to show you the most important ones.

Offline compiler

The first update came only a few days after I released my first article and included support for an offline compiler. This compiler allows you to make your models Edge TPU compatible from the command line instead of having to upload them to an online compiler.

The compiler can be installed on Linux systems (Debian 6.0 or higher) with the following commands:

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

sudo apt-get update

sudo apt-get install edgetpu-compiler
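
Once installed, making a quantized TensorFlow Lite model Edge TPU compatible is a single command. The file name below is just a placeholder for your own model; the compiler writes a new file with an _edgetpu suffix next to the input file and prints a log telling you which operations could be mapped to the Edge TPU:

edgetpu_compiler model_quant.tflite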

Post-Training Quantization Support

In the first couple of months, you needed to use quantization-aware training, which could be quite hard to implement when working with Keras. That's why I was excited to hear that full integer post-training quantization support was added to the Edge TPU compiler.

This change, in combination with the release of TensorFlow 2.0, cut the lines of code I needed to create an Edge TPU compatible CNN in half.

TensorFlow 1.x with quantization-aware training:

# From: https://colab.research.google.com/gist/ohtaman/c1cf119c463fd94b0da50feea320ba1e/edgetpu-with-keras.ipynb#scrollTo=jJuqJna_vgna

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

import numpy as np
import matplotlib.pyplot as plt
import os

# load dataset
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
img_rows, img_cols = 28, 28
train_labels = keras.utils.to_categorical(train_labels, num_classes=10)
test_labels = keras.utils.to_categorical(test_labels, num_classes=10)

train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)

# scale pixel values to [0, 1] to match the quantization parameters used below
train_images = train_images / 255.0
test_images = test_images / 255.0


# create model
def build_keras_model():
    model = Sequential()  # create sequential model

    # create some conv, maxpool, dropout blocks
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))

    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(10, activation='softmax'))

    return model

# train
train_graph = tf.Graph()
train_sess = tf.Session(graph=train_graph)

keras.backend.set_session(train_sess)
with train_graph.as_default():
    train_model = build_keras_model()
    
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=100)
    train_sess.run(tf.global_variables_initializer())    

    train_model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    train_model.fit(train_images, train_labels, epochs=5)
    
    # save graph and checkpoints
    saver = tf.train.Saver()
    saver.save(train_sess, 'checkpoints')

with train_graph.as_default():
    print('sample result of original model')
    print(train_model.predict(test_images[:1]))

# Freeze model and save it
# eval
eval_graph = tf.Graph()
eval_sess = tf.Session(graph=eval_graph)

keras.backend.set_session(eval_sess)

with eval_graph.as_default():
    keras.backend.set_learning_phase(0)
    eval_model = build_keras_model()
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    eval_graph_def = eval_graph.as_graph_def()
    saver = tf.train.Saver()
    saver.restore(eval_sess, 'checkpoints')

    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        eval_sess,
        eval_graph_def,
        [eval_model.output.op.name]
    )

    with open('frozen_model.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())

# Generate tflite file
# Note: --input_arrays/--output_arrays must match the tensor names of your
# model (inspect the frozen graph, e.g. with Netron, if they differ)
os.system('tflite_convert \
    --output_file=model.tflite \
    --graph_def_file=frozen_model.pb \
    --inference_type=QUANTIZED_UINT8 \
    --input_arrays=conv2d_input \
    --output_arrays=dense_1/Softmax \
    --mean_values=0 \
    --std_dev_values=255')

# Check generated tflite file
# load TFLite file
interpreter = tf.lite.Interpreter(model_path='model.tflite')
# Allocate memory
interpreter.allocate_tensors()

# get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print(input_details)
print(output_details)

def quantize(detail, data):
    # detail['quantization'] holds (scale, zero_point) of the tensor
    shape = detail['shape']
    dtype = detail['dtype']
    scale, zero_point = detail['quantization']

    return (data/scale + zero_point).astype(dtype).reshape(shape)


def dequantize(detail, data):
    # map the integer values back to real numbers
    scale, zero_point = detail['quantization']

    return (data - zero_point)*scale

quantized_input = quantize(input_details[0], test_images[:1])
interpreter.set_tensor(input_details[0]['index'], quantized_input)

interpreter.invoke()

# The results are stored on 'index' of output_details
quantized_output = interpreter.get_tensor(output_details[0]['index'])

print('sample result of quantized model')
print(dequantize(output_details[0], quantized_output))

# Compile the tflite file using EdgeTPU Compiler
os.system("edgetpu_compiler \'model.tflite\'")

TensorFlow 2.0 with full integer post-training quantization:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
from tensorflow import lite
import os

# load dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)

# Use subset to reduce training time (for testing only)
x_train = x_train[:100]
y_train = y_train[:100]

# create model
def build_keras_model():
    model = Sequential()  # create sequential model

    # create some conv, maxpool, dropout blocks
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(32, 32, 3)))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))

    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(10, activation='softmax'))

    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    return model


model = build_keras_model()
model.fit(x_train, y_train, epochs=1)

train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.cast(x_train, tf.float32))).batch(1)


def representative_data_gen():
    for input_value in train_ds.take(100):
        yield [input_value]


print(model.layers[0].input_shape)
converter = lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Restrict the converter to integer-only ops so the Edge TPU compiler can map
# the whole graph (without this, some ops may be left in float)
converter.target_spec.supported_ops = [lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()


with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_model)

# Compile the tflite file using EdgeTPU Compiler
os.system("edgetpu_compiler \'model_quant.tflite\'")

For more information about how to build your own model for use on the Edge TPU, be sure to check out "TensorFlow models on the Edge TPU".

Updated Edge TPU Python library

With version 2.11.1 of the Edge TPU Python API, the Coral team added a new on-device backpropagation API, the SoftmaxRegression API, and completely rebuilt their old transfer learning API, the ImprintingEngine API.

SoftmaxRegression API

The SoftmaxRegression API can be used to perform transfer learning on the last layer of an image classification model, allowing us to quickly tune a model for a specific dataset.

For more information check out "Retrain a classification model on-device with backpropagation".
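
To give you a feel for how this looks in code, here is a minimal sketch based on my reading of the edgetpu 2.11.1 docs. The feature dimension, number of classes, dataset variable, and file paths are all placeholders, so double-check the exact signatures against the official API reference:

# Rough sketch - names from edgetpu.learn.backprop, arguments are placeholders
from edgetpu.learn.backprop.softmax_regression import SoftmaxRegression

# feature_dim must match the embedding size of the headless base model
model = SoftmaxRegression(feature_dim=1024, num_classes=5)

# dataset: training/validation embeddings and labels prepared beforehand
model.train_with_sgd(dataset, num_iter=500, learning_rate=0.01)

# Append the retrained head to the embedding extractor and save it as .tflite
model.save_as_tflite_model('embedding_extractor_edgetpu.tflite',
                           'retrained_model_edgetpu.tflite')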

Updated ImprintingEngine API

The ImprintingEngine API serves a different purpose than the SoftmaxRegression API: rather than retraining the complete last layer, weight imprinting allows you to add and remove individual classes in the output layer.

For more information check out the "Low-Shot Learning with Imprinted Weights" paper as well as "Retrain a classification model on-device with weight imprinting".
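
Again as a rough, hedged sketch (class and method names from the edgetpu.learn.imprinting module as I remember them; the model path, image data, and class id are placeholders, so verify everything against the official docs):

# Rough sketch - verify names and arguments against the edgetpu API reference
from edgetpu.learn.imprinting.engine import ImprintingEngine

# Load an imprinting-compatible model; keep_classes=True keeps the classes
# the model already knows so new ones can be added on top
engine = ImprintingEngine('mobilenet_v1_l2norm_quant_edgetpu.tflite',
                          keep_classes=True)

# Train one new class from a handful of example images, then save the model
engine.train(images_of_new_class, class_id=10)
engine.save_model('retrained_imprinting_edgetpu.tflite')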

New TensorFlow Lite delegate for Edge TPU

Until July, accelerating your model with the Edge TPU required writing code against the Edge TPU API. Now you can use the Edge TPU with TensorFlow Lite through the TensorFlow Lite Delegate API.

The TensorFlow Lite Delegate API gives TensorFlow Lite a way to delegate part of the graph execution to another executor, in this case the Edge TPU.

This update allows us to write code that runs whether or not an Edge TPU is connected.
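
As a small sketch of what that looks like (this assumes the Edge TPU runtime and the tflite_runtime package from the next two sections are installed, and that model_edgetpu.tflite / model.tflite are the compiled and uncompiled versions of your model):

import tflite_runtime.interpreter as tflite

try:
    # Load the Edge TPU delegate together with the Edge TPU compiled model
    delegate = tflite.load_delegate('libedgetpu.so.1')
    interpreter = tflite.Interpreter(model_path='model_edgetpu.tflite',
                                     experimental_delegates=[delegate])
except ValueError:
    # No Edge TPU available - fall back to the CPU-only model
    interpreter = tflite.Interpreter(model_path='model.tflite')

interpreter.allocate_tensors()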

Install Edge TPU runtime

To use TensorFlow Lite with the Edge TPU, we need to install the new Edge TPU runtime.

Add the Debian package repository to your system:

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-get update

Install the Edge TPU runtime:

sudo apt-get install libedgetpu1-std

Optionally, you can increase performance further by using the runtime version that operates at the maximum clock frequency (2x the default), but be aware that this makes the accelerator run noticeably hotter.

sudo apt-get install libedgetpu1-max

Install the TensorFlow Lite library

There are multiple ways to use TensorFlow Lite. You can either install the full TensorFlow package or install the tflite_runtime package, a minimal package that only contains what is needed to run a model.
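
In practice the difference mostly comes down to the import; the interpreter behaves the same in both cases (small sketch, assuming TensorFlow 2.x for the full package and a generic model.tflite file):

# Option 1: full TensorFlow package
from tensorflow import lite
interpreter = lite.Interpreter(model_path='model.tflite')

# Option 2: minimal tflite_runtime package
from tflite_runtime.interpreter import Interpreter
interpreter = Interpreter(model_path='model.tflite')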

To install the tflite_runtime package, navigate to the TensorFlow Lite Python quickstart page and download the right version for your system.

Figure 2: TFLite Runtime Versions

After downloading, you can install the runtime using pip:

pip3 install tflite_runtime-XXX.whl
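
To quickly check that the installation worked, you can try importing the interpreter (no model needed for this):

python3 -c "from tflite_runtime.interpreter import Interpreter; print('tflite_runtime is installed')"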

Run a model using the TensorFlow Lite API

Google Coral has created lots of examples showing how to use the TensorFlow Lite API, including their tflite repository, which shows you how to perform classification and object detection using TFLite.
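
To give you an idea of what such a script boils down to, here is a minimal classification sketch; the model path is a placeholder and the dummy input only demonstrates the shape and dtype handling, in a real application you would feed a preprocessed image instead:

import numpy as np
import tflite_runtime.interpreter as tflite

# Load the Edge TPU compiled model with the Edge TPU delegate
interpreter = tflite.Interpreter(
    model_path='model_edgetpu.tflite',
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build an input of the expected shape and dtype (uint8 for a fully integer
# quantized model) and run inference
dummy_input = np.zeros(input_details[0]['shape'],
                       dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()

# Read the raw scores and pick the most likely class
scores = interpreter.get_tensor(output_details[0]['index'])[0]
print('predicted class:', int(np.argmax(scores)))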

From prototype to production

On October 22nd Google announced that Coral is moving out of beta. With this announcement, they introduced new products specifically designed for production.

Figure 3: New Edge Devices

These include a stand-alone version of the System-on-Module (SoM), which can also be found on top of the Dev Board, as well as accelerator boards for Mini PCIe, M.2 A+E key, and M.2 B+M key.

At the moment, the products are only available for sale at Mouser. For more information, check out Google's official blog post.

Conclusion

With Coral, Google has built a really interesting platform for local AI. Their products are fairly priced, can be shipped to most countries, and are easy to use. I really like the updates Google has made over the last few months, especially the addition of the TensorFlow Lite Delegate API and the support for full integer post-training quantization, which make a big difference when working with Coral hardware.

I look forward to further additions to the Coral family, and I will try to provide more resources on how to use the Coral tools in the near future.

With that said, that's all for this article. If you have any questions or just want to chat with me, feel free to contact me on social media or through my contact form. If you want continuous updates about my blog, make sure to follow me on Twitter and join my newsletter.