NeuReality reports strong performance results for the NR1-S AI inference appliance


NeuReality has announced what it describes as ‘remarkable performance results’ from its commercially available NR1-S AI Inference appliance.


The appliance is said to significantly cut costs and energy use in AI data centres, offering a solution to growing concerns over AI's high expense and energy consumption.

With the explosive growth of generative AI, and more organisations raising alarms over its unsustainable power consumption and exorbitant costs, NeuReality's announcement comes at a critical time. According to the company, the NR1-S provides a responsible and affordable option for businesses and organisations struggling to adopt AI.

The NR1-S does not compete with GPUs or other AI accelerators but rather boosts their output and complements them.

NeuReality’s published results compare the NR1-S inference appliance paired with Qualcomm Cloud AI 100 Ultra and Pro accelerators against traditional CPU-centric inference servers with Nvidia H100 or L40S GPUs.

Across common AI applications, the NR1-S demonstrated dramatic cost savings and energy-efficiency gains compared with the CPU-centric systems currently relied upon by large-scale cloud service providers (hyperscalers) and server OEMs.

According to a recent technical blog, NeuReality’s real-world performance findings show the following improvements:

  • Massive Cost Savings: When paired with AI 100 Ultra, NR1-S achieves up to 90% cost savings across various AI data types, such as image, audio and text. These are the key building blocks for generative AI, including large language models, mixture of experts (MoE), retrieval-augmented generation (RAG) and multimodality.
  • Critical Energy Efficiency: Besides saving on the capital expenditure (CAPEX) of AI use cases, the NR1-S shows up to 15 times better energy efficiency compared to traditional CPU-centric systems, further reducing operational expenditure (OPEX).
  • Optimal AI Accelerator Use: Unlike traditional CPU-centric systems, NR1-S ensures 100% utilisation of the integrated AI accelerators without performance drop-offs or delays observed in today’s CPU-reliant systems.

The performance data included key metrics like AI queries per dollar, queries per watt, and total cost of 1 million queries (both CAPEX and OPEX).
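To make the cited metrics concrete, here is a minimal sketch of how such figures are typically derived. The helper names and all numbers are illustrative assumptions, not values from NeuReality's blog:

```python
# Illustrative definitions of the efficiency metrics named above.
# Function names and inputs are hypothetical, not NeuReality's.

def queries_per_dollar(queries: float, total_cost_usd: float) -> float:
    """Throughput normalised by total spend (CAPEX + OPEX)."""
    return queries / total_cost_usd

def queries_per_watt(queries_per_sec: float, power_watts: float) -> float:
    """Throughput normalised by power draw."""
    return queries_per_sec / power_watts

def cost_of_one_million_queries(total_cost_usd: float, queries: float) -> float:
    """Blended (CAPEX + OPEX) cost of serving 1M queries."""
    return total_cost_usd / queries * 1_000_000

# Placeholder example: 2M queries served for $1,000 all-in.
print(queries_per_dollar(2_000_000, 1_000))          # → 2000.0
print(cost_of_one_million_queries(1_000, 2_000_000)) # → 500.0
```

Comparing two systems on these normalised metrics, rather than on raw throughput, is what lets the blog express its results as cost and energy multiples.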

One of the automatic speech recognition (ASR) tests shows the NR1-S cutting the cost of processing 1 million seconds of audio, making voice bots and other audio-based NLP applications more affordable and capable of handling more intelligence per query.

The tests also measured energy consumption, with ASR showing seven seconds of audio processing per watt with NR1-S, compared to 0.7 seconds in traditional CPU-centric systems. This translates to a 10-fold increase in performance for the energy used.
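The 10-fold figure follows directly from the two measurements quoted above:

```python
# Ratio of the two ASR energy-efficiency figures cited in the article.
nr1s_sec_per_watt = 7.0   # seconds of audio processed per watt, NR1-S
cpu_sec_per_watt = 0.7    # same metric, traditional CPU-centric system

improvement = nr1s_sec_per_watt / cpu_sec_per_watt
print(f"{improvement:.0f}x better energy efficiency")  # → 10x
```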

The NR1-S also delivered consistent performance per accelerator regardless of how many AI accelerators were installed, allowing customers to efficiently scale their AI infrastructure up or down with zero performance loss.

“As the industry keeps racing forward with a narrow focus on raw performance for the biggest AI models, energy consumption and costs keep skyrocketing,” said NeuReality co-founder and CEO Moshe Tanach. “The NR1-S technology allows our customers to scale AI applications affordably and sustainably. NeuReality was built from inception to solve the cost and energy problem in AI inferencing, and our new data clearly show we have developed a viable solution. It’s an exciting step forward for the AI industry.”