aiWare4+ automotive NPU brings enhanced programmability, flexibility and scalability


aiMotive, a supplier of automated driving technologies, has announced the latest release of its aiWare automotive NPU hardware IP.

aiWare4+ builds on the original aiWare4 IP, which is already used in production automotive SoCs. It refines the hardware architecture and significantly upgrades the software SDK.

As a result, it executes a far broader range of workloads more efficiently, including transformer networks and other emerging AI network topologies. aiWare4+ also supports FP8 as well as INT8 computation, and includes dedicated hardware support for sparsity.

Its “data-first” scalable hardware architecture combines concepts such as near-memory execution, massively parallel on-chip I/O, hierarchical hardware tiling and wavefront processing to deliver the highest possible PPA (power, performance and area).

Upgraded capabilities for aiWare4+ include:

Upgraded Programmability: significant enhancements to the aiWare hardware architecture and to the SDK tool portfolio give users full access to every part of aiWare’s internal execution pipeline. This comes without compromising the high-level, AI-centric approach that makes tools such as the highly interactive aiWare Studio popular with both research and production engineers.

Full FP8 Support: aiWare4+ adds full FP8 support for workload execution, in addition to INT8 quantization.

Broader Network Support: SDK upgrades let users achieve higher performance not only for CNNs but also for emerging network types such as transformer networks, occupancy networks and LSTMs. aiWare4+ users also benefit from hardware enhancements that deliver significant performance and efficiency gains for workloads such as transformer networks.

Enhanced Sparsity Support: aiWare4+ hardware upgrades mean that any weight sparsity reduces NPU power consumption on a per-clock basis, optimising power consumption across the widest possible range of workloads.

Improved Scalability: aiWare4+ is designed to scale from 10 TOPS up to 1000+ TOPS, using a multi-core architecture to increase throughput while retaining high efficiency (subject to external memory bandwidth constraints). aiWare4+ also adds interleaved multi-tasking, which optimises performance and efficiency when running multiple workloads.

“When we delivered aiWare4, we knew our highly customised hardware architecture enabled us to deliver superior efficiency and PPA compared to any other automotive inference NPU on the market,” said Mustafa Ali, product director, aiWare for aiMotive. “However, while acknowledging our CNN efficiency leadership, some of our customers were concerned about aiWare’s programmability compared to more conventional architectures such as DSP- or GPU-based NPUs. These latest aiWare4+ and aiWare SDK upgrades ensure that our customers can programme aiWare for a broad range of AI workloads, achieving future-proof flexibility comparable to some of the best-known SoCs and DSP-based NPUs, without sacrificing our industry-leading NPU efficiency”.

aiMotive will ship aiWare4+ RTL to lead customers starting in Q2 2023. The SDK is available now with early support for most of the new features, and a production-quality release is due in 2023.