Bringing AI to the edge

Artificial intelligence (AI) has kindled the imagination of computer scientists for decades. While the ambition and enthusiasm around AI have tended to clash with the complexity of the task, today’s computational power has risen exponentially, and the ambition of general AI has been curbed sufficiently to match that power.

Computers now learn to recognise patterns in huge, seemingly random datasets. This is machine learning (ML), and it is no mean feat.

Supervised or unsupervised?

Supervised ML is the technique that has reaped the most successes so far. An ML system is presented with a large dataset and a task.

In the learning phase, the system will be confronted with input, make predictions and then get feedback through the labels that were pre-attached to the input – correct or not. If its prediction is false, the ML system will tune its parameters (weights) and make a new prediction. This is done repeatedly, until the parameters are fine-tuned to make more accurate predictions. After the learning phase, the system is ready to mine huge data streams on the lookout for meaningful patterns, a process referred to as inference.
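
The learning loop described above can be sketched with a single artificial neuron (a perceptron). This is a minimal illustration, not the method of any particular system: the toy dataset (a logical AND), the learning rate, the number of passes and the function names `predict` and `train` are all assumptions made for the example.

```python
# Minimal supervised-learning sketch: a perceptron trained on labelled data.
# Dataset, learning rate and epoch count are illustrative assumptions.

def predict(weights, bias, x):
    """Inference: weighted sum of the inputs followed by a threshold."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, labels, lr=1.0, epochs=20):
    """Learning phase: predict, compare with the label, tune the weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            error = label - predict(weights, bias, x)  # feedback from the label
            if error != 0:  # wrong prediction: adjust the parameters
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Toy labelled dataset: the logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)

# Inference phase: the learned parameters are now fixed and only applied.
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

Once `train` has returned, `predict` needs only the stored weights and a handful of multiplications and additions. That is the part that could run on a sensor, while the training could stay in the cloud.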

What makes these systems successful is that they require relatively little human effort and pre-processing to fine-tune. With feedback from pre-labelled input, they learn the parameters needed, forming a filter that had to be hand-coded previously. But ML systems are very power hungry, especially in the learning phase. Even in the inference phase, where the data pass only once through the system, potentially millions of weights have to be taken into account and billions of calculations made. That puts even inference out of reach of where it could be most useful: the fingers and toes of the IoT where the data are sensed and gathered.

To make inference possible at the edge, imec has been developing hardware solutions that drastically cut the energy usage of inference, down to a level that fits into autonomous, wireless sensors.

Unsupervised ML requires no human intervention or training. It’s the holy grail of ML, and it would allow applications that get customised for specific uses to make decisions on the spot, and not with pre-learned parameters. The techniques and algorithms used are loosely inspired by how the human brain learns and functions. But even more than with supervised ML, energy consumption is an issue. Not so much in the cloud, but certainly where customisation is most useful – at the edge, on the sensors.

Specialised hardware

As an added challenge, with unsupervised ML, learning and inference cannot be separated.

Customisation is learning, and it has to be included on the sensors. So even more than with supervised ML, this will call for specialised hardware. An interesting use case that imec researchers are working on is wearable health technology, where each sensor customises itself to the person who is wearing it.

The dominant hardware platforms involved in supervised machine learning rely on top-of-the-line GPUs, consuming up to 200W. Some systems use FPGAs, which are on average a tad more power efficient, but which also have a correspondingly lower performance. Top of the line in the performance/energy trade-off are a number of ASICs, processors specifically built for deep learning. But even these will still use between 50 and 100W.

No wonder that both machine learning and inference are now done centrally, in the cloud. It’s simply not feasible to run a 100W dissipating chip in a mobile phone, let alone in IoT sensors.

However, the IoT sensors are where most of the future data will be captured. In most cases, technical or energy constraints make it impossible to stream all that data to the cloud where the AI resides. In addition, there are also use cases where patterns should be recognised instantaneously, such as with radars that need to detect people or vehicles. There, the time delay of a round-trip to the cloud is simply prohibitive.

So there is a great need to bring machine learning to the edge of the IoT. For supervised learning, that doesn’t have to include the learning phase; the parameters can still be learned in the cloud. But surely inference, the smart pattern recognition, should be brought to the sensors.

But what are the energy budgets available at those nodes? Applications in vehicles, e.g., can deploy chips that use a maximum of between 10 and 30W. But if you go to the mobile space, you’d have to do inference with 1W. And in IoT, the available budget may even be below 10mW, even going towards 1mW.
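
To get a feel for the size of that gap, the figures quoted in this article can be put side by side. The sketch below takes the lower end of the ASIC range (50W) as a starting point and computes how much less power each class of device can afford. The variable and function names are made up for the example, and the comparison is back-of-the-envelope arithmetic only.

```python
# Power figures quoted in the article, expressed in milliwatts.
ASIC_BEST_CASE_MW = 50_000  # lower end of the 50-100W range for deep-learning ASICs

budgets_mw = {
    "vehicle (upper bound)": 30_000,  # 10-30W
    "mobile": 1_000,                  # 1W
    "IoT sensor": 10,                 # below 10mW
    "IoT sensor (target)": 1,         # going towards 1mW
}

def required_reduction(source_mw, budget_mw):
    """Factor by which power consumption must drop to fit the budget."""
    return source_mw / budget_mw

reductions = {
    name: required_reduction(ASIC_BEST_CASE_MW, budget)
    for name, budget in budgets_mw.items()
}

for name, factor in reductions.items():
    print(f"{name}: about {factor:,.1f}x less power than a 50W ASIC")
```

For the IoT sensor budgets the factor runs into the thousands or tens of thousands.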

Many AI inference systems fetch, over and over again in successive layers, data and weights from memory. Each layer performs multiplications and additions – convolutions – and stores the output. Because moving all these values tends to cost more energy than the arithmetic itself, the most important priority in designing any low-energy AI chip is minimising both the amount of data that needs to be moved, and the distance it’s moved.

AI systems tend to work with 32-bit floating-point arithmetic. Minimising the amount of data could be done by lowering that precision, e.g. to 8-bit arithmetic. It has been shown that for inference, this can be done with hardly any loss of accuracy. So many 8-bit implementations have been made, but they don’t yet bring inference into the energy range of edge computing. A more extreme measure is to bring the precision down to 1-bit, resulting in a so-called binary CNN (convolutional neural network). Unsurprisingly, there is an added accuracy loss compared to an 8-bit implementation, but it remains useful for many practical applications.
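
What lowering the precision means in practice can be illustrated with a handful of weights. The sketch below uses one simple scheme among many (a single shared scale factor for the 8-bit case, and the bare sign of each weight for the 1-bit case). The weight and input values are invented, and the helper names `quantise_int8`, `binarise` and `dot` are assumptions for the example, not part of any library.

```python
# Sketch of weight quantisation. Values and the symmetric scaling scheme
# are illustrative assumptions, not taken from a real network.

def quantise_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def binarise(weights):
    """Keep only the sign of each weight (1-bit precision)."""
    return [1 if w >= 0 else -1 for w in weights]

def dot(a, b):
    """Multiply-and-add, the basic operation inside a convolution."""
    return sum(x * y for x, y in zip(a, b))

weights = [0.42, -0.17, 0.08, -0.91, 0.33]
inputs = [1.0, 0.5, -2.0, 0.25, 1.5]

q_weights, scale = quantise_int8(weights)

full = dot(weights, inputs)              # full-precision reference
int8 = dot(q_weights, inputs) * scale    # 8-bit weights, rescaled afterwards
binary = dot(binarise(weights), inputs)  # sign-only weights

print(f"full precision: {full:.4f}")
print(f"8-bit weights:  {int8:.4f}")
print(f"1-bit weights:  {binary:.4f}")
```

The 8-bit result stays close to the full-precision one because each weight is off by at most half a quantisation step. The 1-bit version throws away all magnitude information, so its output is on a different scale. Binary networks are typically trained with that restriction in mind rather than converted afterwards.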

A second measure is creating an architecture that lowers the energy needed to fetch and store millions of weights and input values over and over again. One solution is to store the learned weights in memory and keep them there, doing inference using a form of analogue in-memory computation. At the heart of such an AI processor are thus memory arrays that permanently store the values of the learned weights using analogue non-volatile devices, e.g. resistive RAM technology.

Each such array represents one layer of the neural network. And in the array, the learned weights are encoded in the individual device conductances.

So how are we then to multiply and add these weights with the input values? By setting the input values as the word line voltages of the ReRAM arrays. Each cell’s current will then be the multiplication of the weight and the input value (Ohm’s law). And the current on each bit line will be the summation of the currents of the cells connected to that line (Kirchhoff's current law). That way, we can effectively implement convolutions without having to fetch and move the weights over and over again.
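
Numerically, such an array computes a matrix-vector product. The sketch below is an idealised software model of it, assuming perfect devices with no noise and no wire resistance. The conductance and voltage values are invented, and `crossbar_output` is a hypothetical name chosen for the example.

```python
# Idealised model of one analogue in-memory layer.
# Assumptions: perfect devices, no noise, no wire resistance.

def crossbar_output(conductances, voltages):
    """Return the summed current (in amperes) on each output line.

    conductances[i][j] is the conductance (in siemens) of the cell that
    connects input line i to output line j, i.e. one stored weight.
    voltages[i] is the voltage (in volts) applied to input line i.
    """
    n_outputs = len(conductances[0])
    currents = [0.0] * n_outputs
    for i, v in enumerate(voltages):
        for j in range(n_outputs):
            cell_current = conductances[i][j] * v  # Ohm's law: I = G * V
            currents[j] += cell_current            # currents add up on the shared line
    return currents

# Three inputs, two outputs: weights stored as conductances.
conductances = [
    [1e-6, 2e-6],
    [3e-6, 4e-6],
    [5e-6, 6e-6],
]
voltages = [0.1, 0.2, 0.3]

currents = crossbar_output(conductances, voltages)
print(currents)
```

In software the two nested loops cost one multiplication and one addition per weight. In the analogue array the same result appears as a set of currents in a single step, and the weights never leave the memory.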

There are challenges to this approach. The variability of the memory chip, for example, will limit the precision with which the weights can be coded. This will especially be an issue with 8-bit precision, but not so much for binary solutions, where STT-MRAM is well suited. There is also the added complexity of integrating an analogue memory in a digital system, requiring e.g. digital-to-analogue conversions. But the bandwidth gained by not having to move data around far outweighs this added complexity.
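
Why variability hurts 8-bit weights more than binary ones can be made plausible with a toy experiment. The model below is an assumption made purely for illustration. Weight levels are spaced evenly over a normalised conductance range from 0 to 1, and every device deviates from its target by a random amount of at most 5% of that range. Real devices will behave differently, and `read_back_errors` is a hypothetical helper.

```python
import random

def read_back_errors(n_levels, spread, trials=1000, seed=0):
    """Count how often a stored level is decoded wrongly after device variation.

    Levels are spaced evenly over a normalised conductance range [0, 1].
    spread is the maximum deviation of a device from its target value
    (an assumed, uniformly distributed error).
    """
    rng = random.Random(seed)
    step = 1.0 / (n_levels - 1)  # distance between neighbouring levels
    errors = 0
    for _ in range(trials):
        level = rng.randrange(n_levels)
        stored = level * step + rng.uniform(-spread, spread)
        # Decode by picking the nearest valid level.
        decoded = min(max(round(stored / step), 0), n_levels - 1)
        errors += decoded != level
    return errors

spread = 0.05  # assumed variability: 5% of the full range

errors_8bit = read_back_errors(256, spread)   # 256 levels, tightly packed
errors_binary = read_back_errors(2, spread)   # 2 levels, far apart

print(f"8-bit:  {errors_8bit} of 1000 weights read back wrongly")
print(f"binary: {errors_binary} of 1000 weights read back wrongly")
```

With 256 levels, neighbouring values sit so close together that the assumed variation regularly pushes a device onto the wrong level. With only two levels, the same variation never crosses the midpoint between them.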

A pipeline of AI solutions

Scientists at imec are working on a pipeline of solutions that will be demoed in the coming months and years.

Hardware with non-volatile analogue memories will allow implementing neural network convolutions that use minimal energy, down to the order of milliwatts. They will bring inference to the edge of the IoT, be it with binary precision or, later, perhaps with a higher precision. This will allow doing smart pattern matching, mining wisdom from huge amounts of sensed data, making the IoT a lot smarter.

The next frontier is hardware for unsupervised machine learning, hardware that allows for sensors without learned parameters, sensors that can adapt on the fly to individual people and situations. Think, for example, of wearable health sensors that really and intimately know their wearers.

These will make the IoT smarter, but also allow for a more individual experience.

Author details: Diederik Verkest is responsible for imec’s INSITE and Machine Learning programs