Bring AI to the device level with TinyML optimisations


Artificial intelligence (AI) provides device and appliance makers with a host of possibilities.


Manufacturers and service providers are turning to AI to improve device performance and increase user satisfaction.

The nature of the technology makes it an ideal candidate for tasks that are too difficult to address with 'classic' or 'traditional' algorithms. We have seen this transform the user experience through voice-controlled appliances, but the applications go much further.

Using AI, appliances can monitor their own behaviour. Washing machines can listen to their motors and use built-in sensors to determine when it makes the most sense to run drum-cleaning programs. Other applications of AI can improve security and reliability. Smart sensors in industrial locations can detect anomalous behaviour and signal to the rest of the network that there may be problems with hardware or improper usage.

The first wave of AI-enhanced embedded systems relied on processing in the cloud. This made it possible to employ larger models that would be too power-hungry or slow to run on the microcontrollers found in most embedded-systems hardware. However, cloud-based AI presents several challenges for appliance manufacturers and raises concerns among end users, for a variety of reasons.

Energy trade-offs

Although offloading AI processing to the cloud relieves the embedded target of the burden of running the model computations, it shifts energy consumption to the communications subsystem. In wireless communication, transmitting data consumes far more energy than receiving it over the same link, and the device may need to send large amounts of audio or image data to the cloud over a wireless network. As the responses will often be short, simple answers, this is a poor match for the power constraints of a typical battery-powered device.
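
As a rough, hypothetical illustration of that imbalance, the sketch below compares the energy needed to stream one second of raw audio against running a small keyword-spotting model locally. Every constant in it is an assumed, order-of-magnitude placeholder, not a measurement of any particular radio or microcontroller.

```python
# Hypothetical back-of-envelope comparison: all constants below are assumed
# placeholders, not measured values for any specific radio or MCU.
AUDIO_BYTES_PER_S = 16_000 * 2      # one second of 16 ksample/s, 16-bit audio
TX_ENERGY_PER_BYTE_J = 2e-6         # assumed radio cost to transmit one byte
MCU_ACTIVE_POWER_W = 0.010          # assumed MCU power while running inference
INFERENCE_TIME_S = 0.02             # assumed time to classify one second of audio

stream_energy = AUDIO_BYTES_PER_S * TX_ENERGY_PER_BYTE_J   # ~64 mJ
local_energy = MCU_ACTIVE_POWER_W * INFERENCE_TIME_S       # ~0.2 mJ

print(f"streaming to cloud: {stream_energy * 1e3:.1f} mJ per second of audio")
print(f"local inference:    {local_energy * 1e3:.2f} mJ per second of audio")
```

The exact numbers will vary widely by radio and workload; the point is only that transmit energy scales with the volume of raw data sent, while a short on-device classification does not.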

Latency is another practical concern. The round-trip delay for each request to the cloud often exceeds 100ms, and is longer still if the relevant data centre is located on a different continent. For user-interface AI, this delay may prove unacceptable.

For example, the model may be intended to handle short commands, where the user expects an immediate reaction. Latency falls if processing is offloaded to edge servers in the same region, but guaranteeing access to that processing is harder with a large user base.

Figure 1: Latency is a major consideration for determining whether to offload AI to remote computers and will depend heavily on distance to the nearest available service. (Source: Silicon Labs)

Cloud-based models also lead to privacy concerns. Consumers dislike the idea that devices may record entire conversations and send them to the cloud for transcription simply to pick out command phrases.

Similarly, industrial users do not want detailed sensor data to be sent and stored in the cloud in order to have a model perform predictive-maintenance assessments. That sensor data will often contain important information on their processes that would prove useful to competitors if they could obtain copies.

For appliance manufacturers, transferring AI responsibilities to the device also helps with long-term service delivery. A big problem with using the cloud for AI is that the cost of shared compute quickly mounts up, and the variable nature of those costs makes it hard to price products if a rental or service model is not feasible for the target market.

In some cases, delivering connectivity that is reliable enough for constant use is simply impractical. With smart tags on livestock, the animals may be within range of a wireless router, but the network bandwidth may be insufficient to sustain a reliable connection. In these situations, it makes more sense to perform processing locally and use the wireless links to upload results and metadata.

Optimise AI for embedded

Although OEMs have in the past turned to cloud-based processing for AI models because of their computational overhead, there are many ways to optimise the technology for execution on microcontroller targets. The key is to understand how best to take advantage of machine learning and AI for each specific application. One example of an application that, with care, can run standalone on an embedded target is keyword, or wakeword, detection and the recognition of simple voice commands.

In this application, the time and energy needed to recognise that the user has uttered a keyword intended to activate or control the device must be as low as possible. This points to the use of on-device AI, which avoids the need to send continuous audio to the cloud and allows the rest of the system to be woken to stream only when necessary.

There are immediate opportunities to save power in this always-on approach. One is to activate the word-recognition model only when a signal arrives that conforms to typical speech patterns. There is no need to keep the subsystem running if the only incoming signal is background noise; this is known as voice activity detection. Filtering can help detect and isolate relevant speech frequencies from broadband noise.
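
A minimal sketch of that kind of energy-based gate is shown below; the frame length and threshold are illustrative values, not figures from a Silicon Labs design.

```python
import numpy as np

def voice_activity(frame, threshold=1e-3):
    """Crude energy-based voice-activity gate for one audio frame.

    frame: 1-D array of PCM samples scaled to [-1, 1].
    Returns True if the frame is energetic enough to be worth
    waking the keyword-spotting model for.
    """
    energy = np.mean(frame.astype(np.float32) ** 2)
    return energy > threshold

# Example: a quiet frame (background noise) versus a louder, speech-like one
rng = np.random.default_rng(0)
noise = 0.005 * rng.standard_normal(512)
speech_like = 0.2 * np.sin(2 * np.pi * 200 * np.arange(512) / 16_000)
print(voice_activity(noise), voice_activity(speech_like))   # False True
```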

In many AI applications, it makes sense to pre-process the digitised speech. For example, models can find it easier to detect word forms when trained on joint time-frequency representations, or spectrograms, of the audio rather than on filtered time-domain samples. Spectrograms can also use far less memory than sample streams: a block of 16-bit samples can easily be reduced to 8-bit components in spectral form. This need not reduce the accuracy of the trained model, but it cuts both training and inferencing overhead, with the additional benefit of reducing the storage needed for the audio data during analysis.

Figure 2: Converting speech samples to a spectral representation can reduce the overall memory footprint of the application and make it easier to process speech signals.
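
The sketch below illustrates the general idea under simple assumptions (a Hann window, 512-sample frames, log-magnitude scaling). It is not the front-end used in the work described here, but it shows how one second of 16-bit audio can shrink to a byte-per-cell spectrogram.

```python
import numpy as np

def spectrogram_u8(samples, frame_len=512, hop=256):
    """Convert PCM samples into an 8-bit log-magnitude spectrogram.

    A minimal sketch: frame the signal, window it, take the FFT magnitude,
    compress with a log, then quantise each bin to an unsigned 8-bit value.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    log_mag = np.log1p(mag)
    # Scale to 0..255 so each time-frequency cell fits in one byte
    scaled = 255 * log_mag / max(log_mag.max(), 1e-9)
    return scaled.astype(np.uint8)

# One second of 16 kHz audio: 32,000 bytes of 16-bit samples become a
# 61 x 257 byte spectrogram (~15 KB), roughly halving the memory footprint.
pcm = (np.random.default_rng(1).standard_normal(16_000) * 3_000).astype(np.int16)
spec = spectrogram_u8(pcm.astype(np.float32))
print(spec.shape, spec.dtype)
```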

Further preprocessing can simplify the work of the AI model used to classify utterances into recognised commands and other speech. A key issue in this type of application is how best to provide utterances to the classifier. If a preprocessing algorithm divides the speech input into equal-sized segments, the utterances may not be full words or syllables, and important information may wind up split across two or more segments. Further preprocessing that tries to divide the stream into full words or syllables will improve accuracy and reduce errors. It can also lead to a smaller, more efficient model.

Figure 3: An example of a speech-processing pipeline using end-to-end AI models.

There are often intriguing trade-offs between using conventional signal processing in the preprocessing stage and using a second AI model. One signal-processing strategy is to look at changes in audio energy and split the speech signal into segments at points where the energy is at a minimum. This type of technique can isolate individual words or syllables.
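
A sketch of that strategy under illustrative assumptions (256-sample frames, a cut threshold at 20% of the mean frame energy) might look like this; the real trade-off lies in how such hand-tuned parameters compare with a second, learned model doing the same job.

```python
import numpy as np

def segment_at_energy_minima(samples, frame_len=256, min_gap_frames=8):
    """Split an utterance at frames where short-term energy is lowest.

    A sketch of energy-based segmentation: compute per-frame energy, then cut
    the stream at local minima that fall below a fraction of the mean energy,
    keeping cuts a minimum number of frames apart.
    """
    n = len(samples) // frame_len
    energy = np.array([
        np.mean(samples[i * frame_len:(i + 1) * frame_len].astype(np.float32) ** 2)
        for i in range(n)
    ])
    threshold = 0.2 * energy.mean()
    cuts, last = [], -min_gap_frames
    for i in range(1, n - 1):
        is_local_min = energy[i] <= energy[i - 1] and energy[i] <= energy[i + 1]
        if is_local_min and energy[i] < threshold and i - last >= min_gap_frames:
            cuts.append(i * frame_len)
            last = i
    # Return (start, end) sample indices of the resulting segments
    bounds = [0] + cuts + [len(samples)]
    return list(zip(bounds[:-1], bounds[1:]))
```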

Experimentation is key

In practice, it is possible not just to replace the preprocessing DSP with a one-dimensional convolutional neural network (CNN) but to improve overall processing efficiency as well. Careful design of this 1D model makes it possible to run the CNN at an effective rate of 16ksample/s, which is sufficient for speech processing. The model developed in one Silicon Labs project used just 28,000 parameters, providing a highly memory-efficient approach.
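
A compact Keras sketch of such a raw-audio 1D CNN front-end is shown below. It is not the model from the project described above; the layer shapes, the 12-keyword output and the one-second, 16 kHz input are placeholders chosen only to show that the parameter count can be kept in the same ballpark.

```python
import tensorflow as tf

def build_frontend(input_samples=16_000):
    """Illustrative small 1-D CNN that consumes raw audio samples directly."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_samples, 1)),
        # A large stride in the first layer decimates the raw stream cheaply
        tf.keras.layers.Conv1D(16, kernel_size=32, strides=8, activation="relu"),
        tf.keras.layers.Conv1D(32, kernel_size=8, strides=4, activation="relu"),
        tf.keras.layers.Conv1D(48, kernel_size=4, strides=2, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(12, activation="softmax"),  # e.g. 12 keywords
    ])

model = build_frontend()
model.summary()   # total parameter count is on the order of 10,000
```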

Optimisation of the classifier model further reduces the number of computations needed for inferencing. The team found that using depthwise-separable convolution, which factorises each full convolution into a per-channel (depthwise) convolution followed by a pointwise convolution, made it possible to reduce the overall parameter count. That, in turn, reduced the total number of computations by a factor of 10. Further optimisation comes from pipelining, which avoids the need to recalculate the entire convolutional neural network as new audio samples stream into the network.
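
The parameter saving from depthwise-separable convolution is easy to demonstrate. The sketch below compares a standard 1D convolution with its separable counterpart for an arbitrary layer shape (100 time steps, 64 channels, kernel of 9); the shapes are illustrative, not those of the model above.

```python
import tensorflow as tf

# Compare parameter counts for a standard 1-D convolution and its
# depthwise-separable equivalent (illustrative layer shape).
inputs = tf.keras.Input(shape=(100, 64))          # 100 time steps, 64 channels

standard = tf.keras.layers.Conv1D(64, kernel_size=9)(inputs)
separable = tf.keras.layers.SeparableConv1D(64, kernel_size=9)(inputs)

std_params = tf.keras.Model(inputs, standard).count_params()
sep_params = tf.keras.Model(inputs, separable).count_params()
print(std_params, sep_params, round(std_params / sep_params, 1))
```

For this layer the separable form needs roughly an eighth of the parameters; the exact factor depends on channel count and kernel size.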

Pipelining also reduces the amount of working memory required: individual layers within the classifier model can work sequentially on partial inputs as they arrive, rather than waiting for the preceding layer to process, and store, an entire frame.
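
The sketch below shows the streaming idea for a single 1D convolution layer, written in plain NumPy as an illustration rather than Silicon Labs' implementation: each layer keeps only its previous kernel_size - 1 input steps as state, so outputs can be produced as samples arrive instead of once per frame.

```python
import numpy as np

class StreamingConv1D:
    """Minimal sketch of a pipelined (streaming) 1-D convolution layer.

    Rather than buffering a whole frame, the layer keeps only the last
    kernel_size - 1 input steps as state and emits outputs as soon as
    enough new steps have arrived.
    """
    def __init__(self, weights, bias):
        self.w = weights                            # (kernel_size, in_ch, out_ch)
        self.b = bias                               # (out_ch,)
        self.state = np.zeros((weights.shape[0] - 1, weights.shape[1]))

    def push(self, new_steps):                      # new_steps: (n, in_ch)
        buf = np.concatenate([self.state, new_steps])
        k = self.w.shape[0]
        outs = [np.tensordot(buf[i:i + k], self.w, axes=([0, 1], [0, 1])) + self.b
                for i in range(buf.shape[0] - k + 1)]
        self.state = buf[-(k - 1):]                 # carry only k-1 steps forward
        return np.array(outs)

# Feeding 4 new steps per call yields 4 outputs per call, with working memory
# limited to the (k-1)-step state rather than a whole buffered frame.
layer = StreamingConv1D(np.random.randn(5, 8, 16), np.zeros(16))
print(layer.push(np.random.randn(4, 8)).shape)      # (4, 16)
```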

Similar attention to detail can unleash the power of on-device AI in a wide range of applications, some of them providing novel ways to use existing sensors. In another project, Silicon Labs is investigating the use of on-device AI to perform more advanced forms of body and motion detection in rooms using scattering information from wireless transceivers, such as those integrated into the EFR32BG24 wireless SoC. The work has looked at how classical machine-learning models, such as random forests, might deliver accurate results with lower computational and memory overhead than convolutional neural networks.
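
As a loose illustration of why a random forest is attractive here, the sketch below trains a small forest on synthetic stand-in features; the feature names and data are invented for the example, not drawn from Silicon Labs' work. The appeal is that a shallow forest of a few dozen trees has a small, fixed memory footprint and inference that reduces to simple threshold comparisons.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data: three hypothetical features derived from
# wireless channel measurements over a short window.
rng = np.random.default_rng(42)
n = 1_000
features = np.column_stack([
    rng.normal(0.0, 1.0, n),    # e.g. RSSI variance over the window
    rng.normal(0.0, 1.0, n),    # e.g. mean channel amplitude
    rng.normal(0.0, 1.0, n),    # e.g. amplitude spread across subcarriers
])
occupied = (features[:, 0] + 0.5 * features[:, 2] > 0).astype(int)  # synthetic label

# A small, shallow forest keeps both model size and inference cost modest,
# which is the attraction for a microcontroller-class target.
clf = RandomForestClassifier(n_estimators=20, max_depth=6, random_state=0)
clf.fit(features[:800], occupied[:800])
print("held-out accuracy:", clf.score(features[800:], occupied[800:]))
```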

Developers are not on their own in this new frontier of embedded-systems development. Silicon Labs is a leading member of the TinyML Foundation, a community of specialists in efficient, on-device AI. The foundation works to drive greater adoption and usage of TinyML applications on embedded devices and actively shares knowledge to educate users.

The TinyML Foundation’s work, along with that of Silicon Labs, helps users explore new use-cases and challenges that embedded machine-learning techniques can solve. This kind of engagement is key to driving the adoption of on-device AI at a time when manufacturers are keen to switch away from reliance on the cloud. Silicon Labs is there to help.

Author details: Tamas Daranyi, Product Manager, Silicon Labs and Javier Elenes, Distinguished Engineer, Silicon Labs