Introduction to TPUs

July 17, 2020

A tensor is the primary data structure in TensorFlow, a machine learning framework made by Google. Tensors are N-dimensional data structures, more commonly known as scalars, vectors, and matrices. Google also developed the Tensor Processing Unit (TPU), an AI accelerator Application-Specific Integrated Circuit (ASIC). In the past few years, Google started relying more and more on computationally expensive deep learning models, so it began building TPUs internally in 2015 and started making them available in 2018. The TPU was designed to be used with TensorFlow: hardware built specifically to optimize the performance of machine learning tasks aided by Artificial Neural Networks (ANNs). Below we can see the block diagram of the TPU from the official documentation.

In essence, TPUs accelerate work done with ANNs through the task-specific units described below. These help with the computation of matrix operations, the transfer of data, and even the calculation of specific activation functions; logic arrays this specialized are not common in traditional CPUs or GPUs. To get the most out of this hardware, a specialized instruction set is also included, with dedicated instructions that expedite many of the most common tasks when working with ANNs.

The TPU includes the following computational resources (task-specific units):

  • Matrix Multiplier Unit (MXU): 65,536 8-bit multiply-and-add units for matrix operations.
  • Unified Buffer (UB): 24 MB of SRAM that works as registers.
  • Activation Unit (AU): hardwired activation functions.
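To make the MXU's role concrete, here is a small numpy sketch of what one 8-bit multiply-and-add lane computes: operands are 8-bit integers, and products are accumulated into a wider register so they never overflow. This is only an illustration of the arithmetic, not Google's hardware design.

```python
import numpy as np

# Sketch of one 8-bit multiply-and-add lane: int8 operands, wide accumulator.
# Illustrates the arithmetic an MXU cell performs; not the actual TPU design.
def mac_dot(inputs, weights):
    acc = np.int32(0)  # accumulate in 32 bits so int8 products cannot overflow
    for x, w in zip(inputs.astype(np.int32), weights.astype(np.int32)):
        acc += x * w
    return int(acc)

x = np.array([127, -128, 50], dtype=np.int8)
w = np.array([2, 3, -4], dtype=np.int8)
print(mac_dot(x, w))  # 254 - 384 - 200 = -330
```

The 65,536 units in the MXU perform this same multiply-and-add step in parallel, which is what makes large matrix products so fast.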

TPU operations (instructions):

  • Read_Host_Memory: read data from memory
  • Read_Weights: read weights from memory
  • MatrixMultiply/Convolve: multiply or convolve the data and weights, accumulating the results
  • Activate: apply activation functions
  • Write_Host_Memory: write the result to memory

As previously stated, this instruction set focuses on optimizing the performance of the mathematical operations that dominate ANN inference: for example, quickly executing a matrix multiplication between input data and weights, and then applying an activation function to the results.
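That flow can be mimicked in a few lines of numpy: read inputs and weights, multiply and accumulate, apply an activation, and write the result back. The comments below simply mirror the instruction names from the table; this is an illustration, not the TPU's actual programming interface.

```python
import numpy as np

# Mirror of the TPU instruction flow in plain numpy (illustrative only).
def tpu_like_inference(data, weights):
    # Read_Host_Memory / Read_Weights: operands arrive as arrays.
    # MatrixMultiply: multiply the data by the weights and accumulate.
    acc = data @ weights
    # Activate: apply an activation function (ReLU here).
    out = np.maximum(acc, 0)
    # Write_Host_Memory: hand the result back to the caller.
    return out

x = np.array([[1.0, -2.0]])      # one input row
w = np.array([[3.0], [4.0]])     # weights for a single output unit
print(tpu_like_inference(x, w))  # [[0.]] since 1*3 + (-2)*4 = -5, ReLU -> 0
```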

Advantages of a TPU: 

Simply put, the advantages are:

  1. Accelerates the performance of linear algebra computation
  2. Minimizes the time-to-accuracy
  3. Helps models converge in hours, rather than days

When to use a TPU:

You could use a TPU if you have:

  • Models dominated by matrix computations
  • Models with no custom TensorFlow operations inside the main training loop
  • Models that train for weeks or months
  • Large models with very large effective batch sizes

According to the documentation, TPUs are mainly designed to perform fast, bulky matrix multiplications. Therefore, Cloud TPUs are likely to be outperformed by other platforms on workloads that are not dominated by matrix multiplication. This is to be expected of such specialized hardware, and it is a main reason why traditional computing hardware has not yet been completely replaced by TPUs.

Currently, Google’s TPUs are available in two main forms: cloud and edge computing. Cloud TPUs are available as a service on the Google Cloud Platform infrastructure. In 2019, the Edge TPU, a smaller version of the chip, was also made available. The Edge TPU is a purpose-built ASIC designed to run machine learning models for edge computing; it is much smaller and consumes far less power than the Cloud TPUs. Google also created a product line called Coral. The first Coral products launched were the Dev Board, the Camera, and the USB Accelerator, all of which can be purchased from the respective landing pages for Google Cloud TPU services and Google Coral products.

Using Edge TPU

At Wovenware, we have had the opportunity to work hands-on with the Coral products built with Edge TPUs. I specifically worked with the USB Accelerator, designed to be a plug-and-go Edge TPU processor. The Coral USB Accelerator measures 2.6″ x 1.2″ x 0.31″ (65 mm x 30 mm x 8 mm), making it an impressively compact and portable device; it is remarkable that something that fits in your pocket contains an Edge TPU.

Initially, the Coral USB Accelerator worked only with Debian-based Linux distributions, such as Raspbian and Ubuntu. Very recently, Google Coral announced that it is also available for Mac and Windows. Since the USB Accelerator can now be used with various operating systems, it makes an ideal prototyping tool for data scientists. I work with Ubuntu daily, so for convenience I followed the installation process for Linux, which requires Python 3.5 or above. Installing the needed software was straightforward: I just had to install the Edge TPU Runtime, which communicates with the Edge TPU, and the TensorFlow Lite library.
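For reference, the Linux setup looked roughly like the following. These are setup commands as described in the Coral documentation at the time; the repository URL, package names, and pip package may have changed since, so check the current official instructions before running them.

```shell
# Add Google's Coral package repository and its signing key.
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
  | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update

# Install the Edge TPU Runtime (the -std variant runs at the default clock speed).
sudo apt-get install libedgetpu1-std

# Install the TensorFlow Lite interpreter for Python (3.5+ required).
pip3 install tflite-runtime
```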

From there, the documentation offers various demos for Image Classification, Object Detection, and Transfer Learning that make it fairly easy to get started. I experimented with the Image Classification demos, but for the sake of brevity, we will focus only on Transfer Learning. Specifically, I will talk about how I re-trained a classification model using a procedure known as weight imprinting.

Weight imprinting is a technique for re-training classification models with a small set of sample data, based on the method described in the paper Low-Shot Learning with Imprinted Weights. It updates the weights of only the last layer of the model, but in a way that can retain existing classes while adding new ones. The Edge TPU Runtime includes an Imprinting Engine API, which performs transfer learning through weight imprinting on the Edge TPU. I chose this strategy because the dataset I had available at the time was small, approximately 300 to 500 images per class, and one of the benefits of this method is that very few samples are required. A limitation is that specific model architecture requirements must be met, but fortunately a compatible model (MobileNet V1) is provided.

Using the Imprinting Engine API was very straightforward. I first needed to create an instance of the Imprinting Engine by specifying a compatible pre-trained TensorFlow Lite model, and then decide whether to keep the existing classes in the model or drop them in favor of the new classes to be added. After that, I needed to use a method called TrainAll to pass the images to the model as a list. I had some issues at first with obtaining the specially formatted list of images, but after fixing them, I was able to create a list containing each image as a 1-D array and train. Below, we can see example code expressing the process that was just explained.
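The sketch below shows the shape of that process. The flattening helper is runnable; the engine calls are shown as comments because they require the edgetpu package and a Coral device, and the model path, labels, and exact method names are assumptions based on the API as it existed at the time.

```python
import numpy as np

def to_training_sample(image):
    # MobileNet V1 for imprinting expects 224x224 RGB input; TrainAll takes
    # each image flattened into a 1-D uint8 array. This formatting step is
    # the part I initially got wrong.
    arr = np.asarray(image, dtype=np.uint8)
    assert arr.shape == (224, 224, 3), "resize images to 224x224 RGB first"
    return arr.flatten()

# Hypothetical usage of the Imprinting Engine (requires the edgetpu package
# and a Coral device, so it is left as comments):
#
#   from edgetpu.learn.imprinting.engine import ImprintingEngine
#   # keep_classes=True retains the model's existing classes; False drops them.
#   engine = ImprintingEngine("mobilenet_v1_imprinting.tflite", keep_classes=False)
#   engine.TrainAll({"cat": [to_training_sample(img) for img in cat_images],
#                    "dog": [to_training_sample(img) for img in dog_images]})
#   engine.SaveModel("retrained_model.tflite")
```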

The training process was indeed almost real-time, and I was surprised by how fast it was. Once the training is done, you can proceed to do inference. Incredibly, inference was done in milliseconds; below we can see an example output:
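The engine returns a score per class, which gets ranked into a top-k list like the one shown. Here is a small self-contained sketch of that ranking; the labels and scores are made up for illustration, and the commented engine calls are version-dependent assumptions.

```python
import numpy as np

def top_k(labels, scores, k=3):
    # Rank the class scores and return the k best (label, score) pairs,
    # mirroring the classification output printed by the Edge TPU demos.
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]

# Made-up scores purely for illustration:
labels = ["cat", "dog", "bird"]
scores = np.array([0.10, 0.85, 0.05])
print(top_k(labels, scores, k=2))  # [('dog', 0.85), ('cat', 0.1)]

# Hypothetical inference on a Coral device (requires hardware; left as comments):
#   from edgetpu.classification.engine import ClassificationEngine
#   engine = ClassificationEngine("retrained_model.tflite")
#   results = engine.ClassifyWithImage(image, top_k=3)
```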

TPUs are helping us increase the speed of machine learning-powered services. They enable both faster training and faster inference, and AI experiments built with TensorFlow can be deployed quickly. I found the Edge TPU very intuitive to use, and it helped significantly that Google provides unambiguous documentation. The USB Accelerator is especially convenient because it is flexible and allows data scientists to prototype their experiments quickly and efficiently. I believe TPUs have truly advanced the field and will pave the way for the future of Artificial Intelligence.

 
