Wallaroo.AI and Ampere Computing to bring cost-effective ML inferencing to the Cloud


Wallaroo.AI, a specialist in scaling production machine learning (ML) from the cloud to the edge, has announced a strategic collaboration with Ampere Computing.

The agreement will see the two companies create optimised hardware/software solutions that provide reduced energy consumption, greater efficiency, and lower cost per inference for cloud artificial intelligence (AI).

Ampere processors are designed to be more energy efficient than traditional AI accelerators. By pairing an optimised low-code/no-code ML software solution with customised hardware, it is now possible to put AI into production in the cloud both cost-effectively (even on a cost-per-inference basis) and with lower energy use.

“This Wallaroo.AI/Ampere solution allows enterprises to deploy easily, improve performance, increase energy efficiency, and balance their ML workloads across available compute resources much more effectively,” said Vid Jain, chief executive officer of Wallaroo.AI. “All of which is critical to meeting the huge demand for AI computing resources today, while also addressing the sustainability impact of the explosion in AI.”

“Through this collaboration, we are combining Cloud Native hardware and optimised software to make ML production within the cloud much easier and more energy-efficient,” added Jeff Wittich, Chief Product Officer at Ampere. “That means more enterprises will be able to turn AI initiatives into business value more quickly.”

One of the key advantages of the collaboration is the integration of Ampere's built-in AI acceleration technology with Wallaroo.AI's highly efficient Inference Server, part of the Wallaroo Enterprise Edition platform for production ML.

Benchmarks have shown as much as a 6x improvement over containerised x86 solutions on certain models, such as the open-source ResNet-50 model. Tests were run using an optimised version of the Wallaroo Enterprise Edition on Arm64 Dpsv5-series Azure virtual machines powered by Ampere Altra 64-bit processors; however, the optimised solution will also be available on other cloud platforms.

With a potential $15.7 trillion (USD) contribution to the global economy by 2030 (PwC), demand for AI has never been higher. However, the graphics processing units (GPUs) used to train AI models are often not a cost-effective solution for inference. For many enterprises, a better alternative is to run software like the Wallaroo.AI inference server, which can cost-efficiently run workloads with similar performance on currently available, advanced CPUs.

The MIT Technology Review has reported that training a single AI model can consume more energy than 100 US homes use in a year, which means that the facility costs of running GPUs can severely impact cloud providers as well as the power grid.

Many clients of cloud providers also have environmental, social, and governance (ESG) or sustainability initiatives that could be impacted by large-scale adoption of AI on GPUs.

Running optimised inference solutions on CPUs such as Ampere’s Altra family of processors, however, brings greater efficiency to inference workloads, meeting the need for AI/ML performance while at the same time supporting company ESG goals for greater sustainability.