Latest developments and curiosities in the world of technology
Quantifying the performance of the TPU, our first machine learning chip
We've been using compute-intensive machine learning in our products for the past 15 years. We use it so much that we even designed an entirely new class of custom machine learning accelerator, the Tensor Processing Unit. Just how fast is the TPU, actually?
googleblog.com

A bit of a summary after skimming the paper. Academic papers can be kind of a slog.
Most of you are likely familiar with the basics of how neural networks operate. The network is programmed with a set of weights, which it uses to make a decision based on a provided input. For instance, you could feed the network an image and ask it whether it's a cat picture. There is a "learning" phase, during which you feed it a bunch of inputs for which the answer is already known. The network generates a decision based on its current weights, checks against the correct answer, and then updates its weights accordingly. In this way, you "train" the neural network to make better decisions.
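To make the training loop concrete, here's a toy sketch of that feedback cycle for a single linear neuron (the function names and the squared-error loss are my own illustrative choices, not anything from the paper):

```python
# Toy training loop for one linear neuron: predict, compare against the
# known answer, and nudge the weights against the gradient of the error.

def predict(weights, inputs):
    """Forward pass: weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def train_step(weights, inputs, target, lr=0.01):
    """One update: measure the error, then adjust each weight."""
    error = predict(weights, inputs) - target
    # Gradient of squared error (pred - target)^2 w.r.t. weight w is 2 * error * x.
    return [w - lr * 2 * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(200):
    weights = train_step(weights, [1.0, 2.0], 5.0)

print(round(predict(weights, [1.0, 2.0]), 2))  # prints 5.0 (the known answer)
```

Real networks have many layers and nonlinearities, but the loop is the same shape: forward pass, compare, update.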
There is then an "inference" phase, in which you take the trained neural network with its finalized weights and put it into production. It now takes in real inputs for which there is no known answer, and the weights no longer change at this point. This is the phase the TPU accelerates. The accelerator takes advantage of a few key aspects of the inference process, such as the fact that inference can tolerate low-precision (8-bit) arithmetic.
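Since the weights are frozen at inference time, they can be quantized to 8-bit integers once up front and the forward pass can run in cheap integer arithmetic. A toy sketch of that idea (the simple symmetric scale factor here is an illustrative assumption, not the TPU's actual quantization scheme):

```python
# Toy 8-bit inference: quantize the frozen float weights once, then do the
# multiply-accumulate entirely in small integers and rescale at the end.

def quantize(values, scale=127.0):
    """Map floats in roughly [-1, 1] to signed 8-bit integers."""
    return [max(-128, min(127, round(v * scale))) for v in values]

def infer(q_weights, q_inputs, scale=127.0):
    """Integer multiply-accumulate, then rescale back to a float."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))
    return acc / (scale * scale)

weights = [0.5, -0.25]        # frozen after training
q_w = quantize(weights)
q_x = quantize([0.8, 0.4])
print(infer(q_w, q_x))        # close to the float result 0.5*0.8 - 0.25*0.4 = 0.3
```

The small precision loss is usually acceptable for inference, and 8-bit multipliers are far smaller and cheaper than floating-point units, which is what lets the TPU pack so many of them onto one die.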
Another key finding about their workload is that inference is often latency-sensitive rather than throughput-sensitive. Developers care about how long it takes to get a response for a single input, not just the raw number of inputs processed per unit time. This kind of makes sense for a web application.
The accelerator design is essentially a systolic array performing matrix multiplication. The array is a grid of 8-bit multiply-accumulators (MACs). Data flows in from the left and weights flow in from the top. The chart on page 3 has more detail.
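The idea is that each cell holds a weight, multiplies the activation passing through it, and adds the product to a partial sum flowing toward the edge of the array, so a full matrix product emerges with no intermediate memory traffic. Here's a toy cycle-by-cycle simulation of a weight-stationary array (my own simplified sketch, not the TPU's exact dataflow; I've picked a vector-matrix product for brevity):

```python
def systolic_matvec(W, x):
    """Toy cycle-by-cycle simulation of a weight-stationary systolic array
    computing y[j] = sum_i x[i] * W[i][j]. Cell (i, j) holds weight W[i][j];
    activations stream in from the left (row i delayed by i cycles) and
    partial sums flow downward, draining out of the bottom row."""
    n = len(W)
    act = [[0] * n for _ in range(n)]    # activation register in each cell
    psum = [[0] * n for _ in range(n)]   # partial-sum register in each cell
    out = [0] * n
    for cycle in range(2 * n):
        new_act = [[0] * n for _ in range(n)]
        new_psum = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Activation arrives from the left neighbour; at the left
                # edge, row i's input is fed in on cycle i (the skew keeps
                # the diagonal wavefronts of data aligned).
                if j == 0:
                    a = x[i] if cycle == i else 0
                else:
                    a = act[i][j - 1]
                # Partial sum arrives from the cell above (0 at the top edge).
                p = psum[i - 1][j] if i > 0 else 0
                new_act[i][j] = a
                new_psum[i][j] = p + W[i][j] * a   # the 8-bit MAC's job
        act, psum = new_act, new_psum
        # The bottom row drains into the outputs.
        for j in range(n):
            out[j] += psum[n - 1][j]
    return out

print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))  # [23, 34]
```

Note that after the pipeline fills, every cell does useful work every cycle, which is why the design gets such high arithmetic density per watt.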
As you might expect, they get much better performance/watt than even a top-of-the-line Nvidia GPU or Intel CPU.
Great digest, very much appreciated!
Slowly approaching the showdown between organic and inorganic intelligences. Thanks for the writeup @zhemao!