February 10, 2022 (updated 29 June 2022)

Putting AI in IoT chips? It’s a question of memory

New research from MIT promises to massively expand the number of IoT applications while enhancing user privacy.

By Greg Noone

The internet of things is beginning to take shape. From our smart fridges and thermostats to our virtual assistants and the tiny, glinting cameras keeping watch over our doorstep, the fabric of our homes and vehicles is being interwoven with AI-powered sensors. Unfortunately, though, their reliability is contingent on the strength of one thread: the connection between the sensor and the cloud.

After all, such IoT products lack the on-device memory to accomplish much on their own. Often little more than a sensor and a microcontroller unit (MCU) equipped with a smidgen of memory, these devices typically outsource most of their processing to cloud facilities. As a result, data has to be transmitted between IoT devices and dedicated server racks, draining power and performance while pooling customer information in costly, distant data centres vulnerable to hacking, outages and other minor disasters.

TinyML: AI in miniature

Researchers like Song Han, meanwhile, have taken a different approach. Together with a dedicated team at his lab at the Massachusetts Institute of Technology (MIT), Han has devoted his career to boosting the efficiency of MCUs with the goal of severing the connection between IoT sensors and their cloud motherships altogether. By placing deep learning algorithms in the devices themselves, he explains, “we can preserve privacy, reduce cost, reduce latency, and make [the device] more reliable for households.”


MCUNetV2 allows a low-memory device to run object recognition algorithms. (Photo courtesy of Song Han/MIT)

So far, this field of miniature AI, known as tinyML, has yet to take off. “The key difficulty is memory constraint,” says Han. “A GPU easily has 32 GB of memory, and a mobile phone has 4 GB. But a tiny microcontroller has only 256 to 512 kilobytes of readable and writable memory. This is four orders of magnitude smaller.”
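Han's figures are easy to check with a little arithmetic. The sketch below uses the numbers he quotes (the exact capacities of any given device will of course vary) to show that the gap between a phone and a tiny MCU is roughly four orders of magnitude:

```python
import math

# Rough comparison of the memory scales Han quotes (illustrative figures).
gpu = 32 * 1024**3      # 32 GB on a datacentre GPU
phone = 4 * 1024**3     # 4 GB on a mobile phone
mcu = 512 * 1024        # 512 KB, the upper end for a tiny microcontroller

ratio = phone / mcu
print(f"phone / MCU ratio: {ratio:.0f}x "
      f"(~{math.log10(ratio):.1f} orders of magnitude)")
```

Even at the MCU's generous 512 KB upper bound, the phone holds 8,192 times as much; against the GPU the gap stretches to five orders of magnitude.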

That makes it all the more difficult for highly complex neural networks to perform to their full potential on IoT devices. Han theorised, however, that a new model compression technique might increase their efficiency on MCUs. First, though, he had to understand how each layer of the neural network was using the device’s finite memory – in this case, a camera designed to detect the presence of a person before it started recording. “We found the distribution was highly imbalanced,” says Han, with most of the memory being “consumed by the first third of the layers.”
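The imbalance is easy to see if you tally activation memory layer by layer. The sketch below uses a hypothetical layer schedule for a small person-detection network (the shapes and the one-byte-per-activation assumption are illustrative, not the MIT team's measurements): early layers operate on large, high-resolution feature maps, so their activations dominate peak memory.

```python
# Hypothetical sketch: activation memory per layer of a small detection CNN.
# Layer spec: (output height, output width, channels). One byte per int8
# activation is assumed. Early layers have large spatial maps, so they
# dominate -- the imbalance Han's team measured.
layers = [
    (112, 112, 16),   # early layers: big spatial maps, few channels
    (56, 56, 24),
    (28, 28, 40),     # middle layers
    (14, 14, 80),
    (7, 7, 160),      # late layers: tiny maps, many channels
]

mem_kb = [h * w * c / 1024 for h, w, c in layers]
total = sum(mem_kb)
for (h, w, c), kb in zip(layers, mem_kb):
    print(f"{h:>3}x{w:<3}x{c:<4} {kb:7.1f} KB  ({kb / total:5.1%} of total)")
```

In this toy schedule the first layer alone accounts for over half of all activation memory, which is why the early layers are the ones worth optimising.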

These were the layers of the neural network tasked with interpreting the image, which Han compares to stuffing a pizza into a small container. To boost efficiency, Han and his colleagues applied a ‘patch-based inference method’ to these layers, in which the neural network divides the image into quarter segments that can be analysed one at a time. These segments had to overlap one another slightly, allowing the algorithm to better understand the image but resulting in redundant computation. To reduce this side effect, Han and his colleagues proposed an additional optimisation inside the neural network, known as ‘receptive field redistribution’, to keep the overlap to a minimum.
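The core idea can be sketched in a few lines. The toy below is not the team's actual MCUNetV2 code: `early_stage` is a stand-in (a mean filter plus downsampling) for the memory-hungry first layers, and the overlap width is arbitrary. What it shows is the payoff: by running the early stage on one overlapping quarter-patch at a time, peak working memory is set by a single patch rather than the full image.

```python
import numpy as np

def early_stage(patch):
    """Stand-in for the first (memory-hungry) layers: a 3x3 mean
    filter followed by 2x2 downsampling."""
    h, w = patch.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = patch[i:i + 3, j:j + 3].mean()
    return out[::2, ::2]

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
overlap = 2  # halo so windows at patch borders can see their neighbours

# Quarter the image into four overlapping patches and process them in turn.
halves = [(0, 32 + overlap), (32 - overlap, 64)]
peak = 0
outputs = {}
for r0, r1 in halves:
    for c0, c1 in halves:
        patch = image[r0:r1, c0:c1]
        peak = max(peak, patch.size)      # per-patch working set
        outputs[(r0, c0)] = early_stage(patch)

print(f"full-image working set: {image.size} values")
print(f"patch-based peak:       {peak} values")
```

The overlapping halo is the redundant computation the article mentions; receptive field redistribution is the team's technique for shrinking how wide that halo needs to be.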

Naming the resulting solution MCUNetV2, the team found that it outperformed comparable model compression and neural architecture search techniques when it came to successfully identifying a person on a video feed. “Google’s mobile networking tool achieved 88.5% accuracy, but it required a RAM of 360KB,” says Han. “Last year, our MCUNetV2 further reduced the memory to 32KB, while still maintaining 90% accuracy,” allowing it to be deployed on lower-end MCUs costing as little as $1.60.


MCUNetV2 also outperforms similar tinyML solutions at object recognition tasks, such as “finding out if a person is wearing a mask or not,” as well as face detection. Additionally, Han sees potential in applying similar solutions to speech recognition tasks. One of Han’s previous methods, MCUNet, achieved notable success in keyword spotting. “We can reduce the latency and make it three to four times faster” using that technique, he says.

Such innovations, the researcher adds, will eventually bring the benefits of edge computing to millions more users and lead to a much wider range of applications for IoT systems. It’s with this goal in mind that Han helped launch OmniML, a start-up aimed at commercialising applications such as MCUNetV2. The firm is already conducting an advanced beta test of the method with a smart home camera company on more than 100,000 of its devices.

It’s also set to make the IoT revolution greener. “Since we greatly reduce the amount of computation in the neural networks by compressing the model,” says Han, they’re “much more efficient than the cloud model.” Overall, that means fewer server racks waiting for a signal from your door camera or thermostat – and less energy expended trying to keep them cool.
