With a form factor of 65x30mm and a price of $75, the USB Accelerator aims to let you run big deep learning models on edge devices like the Raspberry Pi.
The Dev Board comes in at $150 and allows us to perform machine learning and deep learning experiments on a stand-alone small form factor device.
When I first took a look at the Coral USB Accelerator in May, I was impressed by the performance you could get out of it. At the time, however, it was quite hard to use your own models with it because of restrictions on how the model could be created, combined with the fact that you needed to use quantization aware training, which wasn't easy to implement with high-level APIs like Keras.
Furthermore, to make use of the computing power of the Edge TPU you needed to use the Edge TPU API, which meant that your code couldn't run at all when no Edge TPU was connected.
Over the last few months, Google has shipped lots of updates that make Google Coral more accessible and easier to use. In this article, I want to show you the most important ones.
The first update came only a few days after I released my first article and added support for an offline compiler. This compiler lets you make your models Edge TPU compatible from the command line instead of uploading them to an online compiler.
The compiler can be installed on Linux systems (Debian 6.0 or higher) with the following commands:

```bash
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt-get update
sudo apt-get install edgetpu-compiler
```
Post-Training Quantization Support
In the first couple of months, you needed to use quantization aware training, which could be quite hard to implement when working with Keras. This is why I was hyped when I heard that full integer post-training quantization support had been added to the Edge TPU compiler.
This change, in combination with the release of TensorFlow 2.0, cut the lines of code I needed to create an Edge TPU compatible CNN in half.
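To get an intuition for what full integer quantization actually does, here is a minimal pure-Python sketch of the asymmetric affine scheme TensorFlow Lite uses, where a real value is represented as `scale * (q - zero_point)`. The helper names are my own and this is a simplified illustration, not the converter's actual implementation:

```python
# Sketch of asymmetric 8-bit affine quantization (real = scale * (q - zero_point)),
# the scheme TensorFlow Lite uses for full integer quantization.
# Simplified illustration only, not the actual converter code.

def choose_params(values, qmin=0, qmax=255):
    """Derive scale and zero point from the observed value range."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=0, qmax=255):
    return [min(qmax, max(qmin, round(v / scale + zero_point))) for v in values]

def dequantize(q, scale, zero_point):
    return [scale * (v - zero_point) for v in q]

# the representative dataset plays the role of "values" here: it tells the
# converter which ranges the activations actually cover
activations = [-0.5, 0.0, 0.25, 1.5]
scale, zp = choose_params(activations)
restored = dequantize(quantize(activations, scale, zp), scale, zp)
# round-trip error is bounded by half a quantization step
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(activations, restored))
```

This is also why the converter needs a representative dataset: without observed activation ranges it cannot pick sensible scales and zero points.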
TensorFlow 1.x with quantization aware training:
```python
# From: https://colab.research.google.com/gist/ohtaman/c1cf119c463fd94b0da50feea320ba1e/edgetpu-with-keras.ipynb#scrollTo=jJuqJna_vgna
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
import os

# load dataset
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
img_rows, img_cols = 28, 28
train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, 1)
test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, 1)

# create model
def build_keras_model():
    model = Sequential()
    # conv, max-pooling and dropout blocks
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(10, activation='softmax'))
    return model

# train with a quantization aware training graph
train_graph = tf.Graph()
train_sess = tf.Session(graph=train_graph)
keras.backend.set_session(train_sess)
with train_graph.as_default():
    train_model = build_keras_model()
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=100)
    train_sess.run(tf.global_variables_initializer())
    train_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    train_model.fit(train_images, train_labels, epochs=5)
    # save graph and checkpoints
    saver = tf.train.Saver()
    saver.save(train_sess, 'checkpoints')
    print('sample result of original model')
    print(train_model.predict(test_images[:1]))

# freeze the eval graph and save it
eval_graph = tf.Graph()
eval_sess = tf.Session(graph=eval_graph)
keras.backend.set_session(eval_sess)
with eval_graph.as_default():
    keras.backend.set_learning_phase(0)
    eval_model = build_keras_model()
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    eval_graph_def = eval_graph.as_graph_def()
    saver = tf.train.Saver()
    saver.restore(eval_sess, 'checkpoints')
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        eval_sess, eval_graph_def, [eval_model.output.op.name]
    )
    with open('frozen_model.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())

# generate the tflite file
os.system('tflite_convert \
  --output_file=model.tflite \
  --graph_def_file=frozen_model.pb \
  --inference_type=QUANTIZED_UINT8 \
  --input_arrays=conv2d_input \
  --output_arrays=dense_1/Softmax \
  --mean_values=0 \
  --std_dev_values=255')

# check the generated tflite file
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print(input_details)
print(output_details)

def quantize(detail, data):
    shape = detail['shape']
    dtype = detail['dtype']
    a, b = detail['quantization']
    return (data / a + b).astype(dtype).reshape(shape)

def dequantize(detail, data):
    a, b = detail['quantization']
    return (data - b) * a

quantized_input = quantize(input_details, test_images[:1])
interpreter.set_tensor(input_details['index'], quantized_input)
interpreter.invoke()
quantized_output = interpreter.get_tensor(output_details['index'])
print('sample result of quantized model')
print(dequantize(output_details, quantized_output))

# compile the tflite file using the Edge TPU compiler
os.system("edgetpu_compiler 'model.tflite'")
```
TensorFlow 2.0 with full integer post-training quantization:
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
from tensorflow import lite
import os

# load dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)

# use a subset to reduce training time (for testing only)
x_train = x_train[:100]
y_train = y_train[:100]

# create model
def build_keras_model():
    model = Sequential()
    # conv, max-pooling and dropout blocks
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(32, 32, 3)))
    model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))
    model.add(Dropout(rate=0.25))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(rate=0.5))
    model.add(Dense(10, activation='softmax'))
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

model = build_keras_model()
model.fit(x_train, y_train, epochs=1)

# representative dataset for full integer quantization
train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.cast(x_train, tf.float32))).batch(1)

def representative_data_gen():
    for input_value in train_ds.take(100):
        yield [input_value]

converter = lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_model)

# compile the tflite file using the Edge TPU compiler
os.system("edgetpu_compiler 'model_quant.tflite'")
```
For more information about how to build your own model for the Edge TPU, be sure to check out "TensorFlow models on the Edge TPU".
Updated Edge TPU Python library
With version 2.11.1 of the Edge TPU Python API, the Coral team added a new on-device backpropagation API, the SoftmaxRegression API, and completely rebuilt their old transfer learning API, the ImprintingEngine API.
The SoftmaxRegression API can be used to perform transfer learning on the last layer of an image classification model, allowing us to quickly tune a model for a specific dataset.
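To make the idea of last-layer transfer learning concrete, here is a pure-Python sketch: embeddings from a frozen base model are fed into a small softmax classifier, and only that classifier's weights are updated by gradient descent. This is a conceptual illustration with made-up helper names, not the actual SoftmaxRegression API:

```python
import math

# Conceptual sketch of last-layer transfer learning: a softmax classifier on
# top of frozen embeddings is the only part that gets trained.
# Illustration only, not the Edge TPU SoftmaxRegression API.

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_last_layer(embeddings, labels, num_classes, lr=0.5, epochs=200):
    dim = len(embeddings[0])
    w = [[0.0] * dim for _ in range(num_classes)]  # one weight row per class
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            p = softmax([sum(wi * xi for wi, xi in zip(w[c], x)) + b[c]
                         for c in range(num_classes)])
            for c in range(num_classes):
                grad = p[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                b[c] -= lr * grad
                w[c] = [wi - lr * grad * xi for wi, xi in zip(w[c], x)]
    return w, b

def predict(w, b, x):
    scores = [sum(wi * xi for wi, xi in zip(row, x)) + bias
              for row, bias in zip(w, b)]
    return scores.index(max(scores))

# toy "embeddings" from a frozen base model, two classes
emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
lab = [0, 0, 1, 1]
w, b = train_last_layer(emb, lab, num_classes=2)
assert [predict(w, b, x) for x in emb] == lab
```

Because only a tiny fully connected layer is trained, this kind of fine-tuning is cheap enough to run on-device.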
For more information check out "Retrain a classification model on-device with backpropagation".
Updated ImprintingEngine API
The ImprintingEngine API serves a different purpose than the SoftmaxRegression API. Rather than retraining the complete last layer, weight imprinting allows you to add and remove individual classes in the output layer.
For more information check out the "Low-Shot Learning with Imprinted Weights" paper as well as "Retrain a classification model on-device with weight imprinting".
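The core trick from the imprinted-weights paper can be sketched in a few lines: the normalized mean embedding of a new class's examples becomes that class's weight vector, so adding a class is a single assignment rather than a retraining run. A simplified pure-Python illustration (my own helper names, not the ImprintingEngine API):

```python
import math

# Sketch of weight imprinting: a new class is added by setting its output-layer
# weight row to the normalized mean embedding of its examples.
# Conceptual illustration only, not the Edge TPU ImprintingEngine API.

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def imprint_class(class_weights, examples):
    """Add one output-layer row from a handful of example embeddings."""
    dim = len(examples[0])
    mean = [sum(e[i] for e in examples) / len(examples) for i in range(dim)]
    class_weights.append(normalize(mean))

def classify(class_weights, embedding):
    """Score by cosine similarity against each imprinted class row."""
    x = normalize(embedding)
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in class_weights]
    return scores.index(max(scores))

weights = []
imprint_class(weights, [[0.9, 0.1], [1.0, 0.2]])   # class 0 embeddings
imprint_class(weights, [[0.1, 1.0], [0.2, 0.9]])   # class 1 embeddings
assert classify(weights, [0.95, 0.15]) == 0
assert classify(weights, [0.1, 0.9]) == 1
# removing a class is just deleting its row
del weights[0]
```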
New TensorFlow Lite delegate for Edge TPU
The TensorFlow Lite Delegate API offers TensorFlow Lite a way to delegate part of the graph execution to another executor – in this case, the Edge TPU.
This update allows us to write code that runs whether or not an Edge TPU is connected.
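The fallback idea behind delegation can be shown with a tiny pure-Python sketch: operations the accelerator supports are handed to it, everything else (including the case where no accelerator is present) runs on the CPU. In practice you would attach the real Edge TPU delegate via `load_delegate` instead; the classes and ops below are invented for illustration:

```python
# Conceptual sketch of the delegate pattern: supported operations in a model
# graph are handed to an accelerator backend, the rest fall back to the CPU.
# Pure-Python illustration only, not the TensorFlow Lite Delegate API.

CPU_OPS = {
    'add_one': lambda x: x + 1,
    'double': lambda x: x * 2,
    'square': lambda x: x * x,
}

class FakeAccelerator:
    """Stand-in for an Edge TPU that only supports some ops."""
    supported = {'add_one', 'double'}
    def run(self, op, x):
        return CPU_OPS[op](x)  # pretend this happens on the accelerator

def run_graph(ops, x, accelerator=None):
    for op in ops:
        if accelerator is not None and op in accelerator.supported:
            x = accelerator.run(op, x)   # delegated to the accelerator
        else:
            x = CPU_OPS[op](x)           # CPU fallback
    return x

graph = ['add_one', 'double', 'square']
# same result whether or not an accelerator is present
assert run_graph(graph, 3) == run_graph(graph, 3, FakeAccelerator()) == 64
```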
Install Edge TPU runtime
To use TensorFlow Lite with the Edge TPU, we need to install the new Edge TPU runtime.
Add Debian package repository to your system:
```bash
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
```
Install the Edge TPU runtime:
sudo apt-get install libedgetpu1-std
Optionally, you can increase performance further by installing the runtime version that operates at the maximum clock frequency (2x the default):
sudo apt-get install libedgetpu1-max
Install the TensorFlow Lite library
There are multiple ways to use TensorFlow Lite: you can install the full TensorFlow package, or the tflite_runtime package, a minimal package containing only what is needed to run a model.
To install the tflite_runtime package, navigate to the TensorFlow Lite Python quickstart page and download the right version for your system.
After downloading, you can install the runtime with pip:
pip3 install tflite_runtime-XXX.whl
Run a model using the TensorFlow Lite API
Google Coral has created lots of examples showing how to use the TensorFlow Lite API, including their tflite repository, which demonstrates classification and object detection with TFLite.
From prototype to production
On October 22nd Google announced that Coral is moving out of beta. With this announcement, they introduced new products specifically designed for production.
With Coral, Google has built a really interesting platform for local AI. Their products are fairly priced, can be shipped to most countries, and are easy to use. I really like the updates Google has made over the last months, especially the addition of the TensorFlow Lite Delegate API and support for full integer post-training quantization, which make a big difference when using Coral hardware.
I look forward to further additions to the Coral family and I will try to provide further resources about how to use Coral tools in the near future.
With that said, that's all for this article. If you have any questions or just want to chat with me, feel free to contact me on social media or through my contact form. If you want continuous updates about my blog, make sure to follow me on Twitter and join my newsletter.