Producing pictures of cats in jackets, chip giant Qualcomm has successfully ported and run the Stable Diffusion foundation model on a mobile phone for the first time. The company says this is a major breakthrough in Edge AI that will enable companies to reduce cloud fees and increase security by bringing generative AI to edge computing.
The model runs completely on the device, which significantly reduces runtime latency and power consumption, according to the Qualcomm AI research team. The neural network was trained on a vast quantity of data, allowing users to generate photorealistic images from a word or line of text – but its size also makes it power-hungry to run.
The company had to optimise every layer of the Stable Diffusion model as well as the full stack around it, from the application and algorithms to the software and hardware, to get it working on a Snapdragon 8 Gen 2 mobile platform. They did this through re-training and post-training quantisation that significantly reduced the power, memory and energy requirements.
They used the FP32 version of Stable Diffusion from Hugging Face – that is, the 32-bit single-precision floating-point model, a format that is widely used in AI and in scientific calculations that do not require very high precision – then converted it to the smaller, more manageable INT8 format, which uses 8-bit integers instead of floating-point numbers.
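To give a rough sense of what converting from FP32 to INT8 involves, here is a minimal sketch of symmetric per-tensor quantisation in plain Python. It is an illustration only and not Qualcomm's method: the function names, the sample weights and the choice of one shared scale per tensor are all assumptions made for this example.

```python
# Illustrative sketch only: symmetric per-tensor INT8 quantisation.
# This is NOT Qualcomm's pipeline; all names and values here are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Assumption: one scale for the whole tensor, chosen from its largest magnitude.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]

# Made-up sample weights standing in for one small FP32 tensor.
weights = [0.82, -1.27, 0.003, 0.5, -0.049]

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer step bounds the error at half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage per weight: 4 bytes for FP32 versus 1 byte for INT8.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print("quantised:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
print("bytes (FP32 vs INT8):", fp32_bytes, "vs", int8_bytes)
```

Each weight shrinks from four bytes to one, at the cost of a small rounding error of at most half a quantisation step. Production toolchains typically go further, for example by calibrating scales on sample data or fine-tuning the model so that it tolerates the lower precision.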
They had to run the re-training techniques across every component model used to make Stable Diffusion work, including the text encoder and UNet. The result was also made possible in part by optimisations to the Qualcomm AI Engine and by the co-design and integration of hardware and software on the Hexagon Processor. Snapdragon 8 Gen 2 also comes with micro tile inferencing, which enables large models to run efficiently and suggests we could see more AI models running on edge devices in future.
Reducing latency and cloud fees
“The result of this full-stack optimisation is running Stable Diffusion on a smartphone in under 15 seconds for 20 inference steps to generate a 512×512 pixel image — this is the fastest inference on a smartphone and comparable to cloud latency. User text input is completely unconstrained,” Qualcomm engineers wrote.
This, says Qualcomm, is the start of the “edge AI era”, with large AI cloud models gravitating towards edge devices, where they can run faster and more securely. “Although the Stable Diffusion model seems quite large, it encodes a huge amount of knowledge about speech and visuals for generating practically any imaginable picture,” the engineers wrote.
Its potential goes beyond making pretty pictures, as developers could now integrate this technology into image editing, inpainting and style transfer applications that run completely on the device, even without an internet connection.
Qualcomm says it will now focus on scaling edge AI, including further optimisation of Stable Diffusion to run efficiently on phones and other platforms such as laptops, XR headsets and any other device with a Snapdragon processor.
This, the company says, will allow end users to reduce cloud computing costs by running processes at the edge and ensure privacy as the input and output never leave the device. “The new AI stack optimisation also means that the time-to-market for the next foundation model that we want to run on the edge will also decrease. This is how we scale across devices and foundation models to make edge AI truly ubiquitous.”