Low power chip improves voice controlled electronics

A low power chip specialised for automatic speech recognition has been built by MIT researchers. Whereas a cell phone running speech-recognition software might require 1W of power, the new chip is said to require between 0.2mW and 10mW, depending on the number of words it has to recognise.

The researchers claim this translates to power savings of 90 to 99%, which could make voice control practical for simple electronic devices, including those that have to harvest energy from their environment or go months between battery charges.

“The miniaturisation of wearable applications and intelligent devices will require a different interface than touch or keyboard,” says Professor Anantha Chandrakasan. “It will be critical to embed the speech functionality locally to save system energy consumption compared to performing this operation in the cloud.”

The chip includes a ‘voice activity detection’ circuit that monitors ambient noise to determine whether it might be speech. If the answer is yes, the chip fires up the larger, more complex speech-recognition circuit.
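The gating idea can be sketched in a few lines. This is an illustrative model, not the MIT circuit: `voice_activity` stands in for the detection circuit (here a simple energy threshold, an assumption), and `recognise` stands in for the larger speech-recognition stage.

```python
# Illustrative two-stage pipeline: a cheap always-on detector gates a
# costly recogniser. The threshold value is an assumption for illustration.

def voice_activity(frame, threshold=0.01):
    """Cheap always-on check: mean signal energy against a fixed threshold."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold

def process(frames, recognise):
    """Run the expensive recogniser only on frames flagged as speech."""
    results = []
    for frame in frames:
        if voice_activity(frame):             # low-power front end
            results.append(recognise(frame))  # high-power back end
    return results
```

The key property is that `recognise` never runs on frames the detector rejects, so silence costs only the detector's power.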

The researchers experimented with three different voice-activity-detection circuits, with different degrees of complexity. Even though the most complex of the three circuits consumed almost three times as much power as the simplest, it generated fewer false positives, which led to greater power savings.
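Why a costlier detector can still save energy falls out of simple duty-cycle arithmetic. The figures below are assumptions chosen only to illustrate the shape of the tradeoff, not the measured numbers from the paper.

```python
# Duty-cycle energy model: average power is the always-on detector power
# plus the recogniser power weighted by how often it is woken up.
# All mW figures and wake fractions below are illustrative assumptions.

def average_power(p_vad, p_rec, wake_fraction):
    return p_vad + wake_fraction * p_rec

P_REC = 7.0  # assumed power of the speech-recognition circuit when active (mW)

# Simple detector: cheap to run, but many false positives wake the recogniser.
simple = average_power(0.01, P_REC, wake_fraction=0.20)

# Complex detector: ~3x the detector power, far fewer false wake-ups.
complex_ = average_power(0.03, P_REC, wake_fraction=0.05)
```

With these assumed numbers the complex detector wins overall, because the recogniser dominates the power budget whenever it runs.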

According to the researchers, a voice-recognition network is too big to fit in a chip's on-board memory, which is a problem because fetching data from off-chip memory consumes far more energy than on-chip access. So the team's design concentrates on minimising the amount of data that the chip has to retrieve from off-chip memory.

The first step in minimising the memory bandwidth is to compress the data associated with each node. The data are decompressed only after they're brought on chip.
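One common way to compress network weights is scalar quantisation: store each weight as a small index into a codebook and expand it back once on chip. This is a hedged sketch of that general technique; the codebook and its size are assumptions, not the chip's actual scheme.

```python
# Illustrative weight compression: each float weight is replaced by the
# index of its nearest codebook entry (fewer bits to move off-chip),
# then looked back up after it arrives on chip.

def compress(weights, codebook):
    """Map each weight to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
            for w in weights]

def decompress(indices, codebook):
    """On-chip expansion: look each index back up in the codebook."""
    return [codebook[i] for i in indices]

codebook = [-0.5, -0.1, 0.0, 0.1, 0.5]  # 5 entries -> ~3 bits per weight
weights = [0.48, -0.09, 0.02]
idx = compress(weights, codebook)
restored = decompress(idx, codebook)
```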

The incoming audio signal is split up into 10ms increments, each of which is evaluated separately. The chip brings in a single node of the neural network at a time, but it passes the data from 32 consecutive 10ms increments through it.

If a node has a dozen outputs, then the 32 passes result in 384 output values, which the chip stores locally. Each of those is coupled with 11 other values when fed to the next layer of nodes, and so on. So the chip ends up requiring a sizeable on-board memory circuit for its intermediate computations. But it fetches only one compressed node from off-chip memory at a time, keeping its power requirements low.
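The fetch pattern above can be sketched as follows. The names and toy node are assumptions standing in for the chip's real machinery; what the sketch shows is the structure: one off-chip fetch per node, 32 on-chip passes through it, and 32 × 12 = 384 buffered values for a node with a dozen outputs.

```python
# Illustrative sketch of the batched fetch pattern: bring one node on
# chip at a time and run 32 consecutive 10ms frames through it before
# fetching the next node, buffering the outputs locally.

BATCH = 32  # 10ms increments evaluated per fetched node

def run_layer(node_ids, fetch_node, frames):
    """Return, for each node, one output vector per frame in the batch."""
    buffered = []
    for nid in node_ids:
        node = fetch_node(nid)                      # single off-chip fetch
        buffered.append([node(f) for f in frames])  # 32 on-chip passes
    return buffered

def dozen_outputs(frame):
    """Toy node with 12 outputs: 32 frames -> 32 * 12 = 384 values."""
    return [sum(frame)] * 12

frames = [[0.1] * 4 for _ in range(BATCH)]
out = run_layer([0], lambda nid: dozen_outputs, frames)
total_values = sum(len(vec) for node_out in out for vec in node_out)  # 384
```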