Delivering on the promise of Gen AI


New Electronics looks back at NVIDIA GTC and some of the key announcements that will help to deliver on the promises made for Generative AI.

Credit: Nvidia

If generative AI is to deliver on its promises, it will need the technology to support it. At last month’s NVIDIA GTC, founder and CEO Jensen Huang introduced a host of new technologies and, in his presentation to over 12,000 attendees, outlined the major advances that increased computing power is set to deliver.

“Accelerated computing has reached the tipping point - general purpose computing has run out of steam,” said Huang. “We need another way of doing computing, so that we can continue to scale so that we can continue to drive down the cost of computing. Accelerated computing is a dramatic speedup over general-purpose computing, in every single industry.”

Huang used his keynote to introduce the company’s new Blackwell platform, which makes it possible for organisations to build and run real-time generative AI on trillion-parameter large language models. According to Huang, the platform can do so at up to 25x lower cost and energy consumption than its predecessor.

“For three decades we’ve pursued accelerated computing, with the goal of enabling transformative breakthroughs like deep learning and AI,” said Huang. “Generative AI is the defining technology of our time. Blackwell is the engine to power this new industrial revolution. Working with the most dynamic companies in the world, we will realise the promise of AI for every industry.”

The key technologies at the heart of the Blackwell platform include a powerful new chip that packs 208 billion transistors. These GPUs are manufactured using a custom-built TSMC 4NP process, with two reticle-limit GPU dies connected by a 10 TB/second chip-to-chip link into a single, unified GPU.

A second-generation Transformer Engine uses new micro-tensor scaling support and NVIDIA’s advanced dynamic range management algorithms, integrated into the NVIDIA TensorRT-LLM and NeMo Megatron frameworks. As a result, Blackwell can support double the compute and model sizes with new 4-bit floating point AI inference capabilities.
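To give a feel for why fine-grained scaling matters at 4-bit precision, the sketch below quantizes a list of values block by block, with each block sharing one scale factor. It is an illustration only, not NVIDIA's implementation: the E2M1-style value grid, the function names and the block sizes are all assumptions made for this example.

```python
# Illustrative sketch only: block-scaled 4-bit quantization.
# Assumption: an E2M1-style grid (1 sign, 2 exponent, 1 mantissa bit),
# whose representable magnitudes are listed below.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_MAGNITUDES[-1]


def quantize_block(block):
    """Return (scale, codes): one shared scale plus a 4-bit value per element."""
    peak = max(abs(v) for v in block)
    # Map the largest magnitude in the block onto the top of the FP4 grid.
    scale = peak / FP4_MAX if peak > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes


def quantize(values, block_size=32):
    """Quantize values in independent blocks (block_size is a hypothetical choice)."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    values = [0.1, 0.2, 100.0, 50.0]
    # One scale for everything: the small values are flushed to zero.
    print(dequantize(quantize(values, block_size=4)))
    # One scale per pair: the small values survive alongside the outliers.
    print(dequantize(quantize(values, block_size=2)))
```

With a single scale, the outlier of 100.0 dominates and the two small values round to zero. With a scale per block, each group uses the full 4-bit range, which is the general idea behind scaling at a finer granularity than the whole tensor.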

The third element is fifth-generation NVLink, which can accelerate performance for multitrillion-parameter and mixture-of-experts AI models. The latest iteration delivers 1.8TB/s of bidirectional throughput per GPU, ensuring high-speed communication among up to 576 GPUs for the most complex LLMs.

A RAS Engine provides reliability, availability and serviceability, while advanced confidential computing capabilities protect AI models and customer data with support for new native interface encryption protocols, which is seen as critical for privacy-sensitive industries like healthcare and financial services.

Finally, a Decompression Engine that supports the latest formats helps to accelerate database queries, delivering the highest performance in data analytics and data science.

Microservices and the Omniverse

During his presentation Huang also unveiled NVIDIA’s inference microservices, a new way of packaging and delivering software that connects developers with hundreds of millions of GPUs and enables them to deploy custom AI of all kinds. He also announced Omniverse Cloud APIs, which will deliver advanced simulation capabilities.

NVIDIA Omniverse Cloud will be available as APIs, helping designers to create industrial digital twin applications and workflows across the entire ecosystem of software makers.

The five new Omniverse Cloud application programming interfaces, unveiled at GTC, will enable developers to integrate core Omniverse technologies directly into existing design and automation software applications for digital twins, or into their simulation workflows for testing and validating autonomous machines like robots or self-driving vehicles.

According to NVIDIA, some of the largest industrial software makers are already embracing Omniverse Cloud APIs in their software portfolios. Among those companies are Ansys, Cadence, Dassault Systèmes for its 3DEXCITE brand, Hexagon, Microsoft, Rockwell Automation, Siemens and Trimble.

“Everything manufactured will have digital twins,” said Huang. “Omniverse is the operating system for building and operating physically realistic digital twins and the Omniverse and generative AI are the foundational technologies to digitalise the $50 trillion heavy industries market.”

The five new Omniverse Cloud APIs, which can be used individually or collectively, are: USD Render, which generates fully ray-traced NVIDIA RTX renders of OpenUSD data; USD Write, which lets users modify and interact with OpenUSD data; USD Query, which enables scene queries and interactive scenarios; USD Notify, which tracks USD changes and provides updates; and Omniverse Channel, which connects users, tools and worlds to enable collaboration across scenes.

NVIDIA’s CUDA platform provides the base for its cloud-native microservices which include NVIDIA NIM microservices for optimised inference on more than two dozen popular AI models. In addition, NVIDIA accelerated software development kits, libraries and tools can be accessed as NVIDIA CUDA-X microservices for retrieval-augmented generation (RAG), guardrails, data processing, and HPC among others.

NVIDIA also separately announced over two dozen healthcare NIM and CUDA-X microservices.

According to Huang, this curated selection of microservices will add a new layer to NVIDIA’s full-stack computing platform and will connect the AI ecosystem of model developers, platform providers and enterprises with a standardised path to run custom AI models optimised for NVIDIA’s CUDA installed base of hundreds of millions of GPUs across clouds, data centres, workstations and PCs.

Above: Project GR00T is a foundation model for humanoid robots. Credit: Nvidia

Even bigger

GTC also saw NVIDIA unveil its next-generation AI supercomputer, the NVIDIA DGX SuperPOD, which is powered by NVIDIA GB200 Grace Blackwell Superchips.

Capable of processing trillion-parameter models with constant uptime for superscale generative AI training and inference workloads, it features a new, highly efficient, liquid-cooled rack-scale architecture. It provides 11.5 exaflops of AI supercomputing at FP4 precision and 240 terabytes of fast memory, scaling to more with additional racks.
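For a rough sense of scale, the snippet below estimates how much memory the weights of a trillion-parameter model would occupy at different precisions. It is back-of-the-envelope arithmetic rather than a sizing guide: the helper name is hypothetical, and the estimate ignores activations, caches, optimiser state and any per-block scale factors.

```python
def weight_memory_tb(num_params, bits_per_param):
    """Memory needed for the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bits_per_param / 8 / 1e12


if __name__ == "__main__":
    trillion = 1e12
    for name, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
        print(f"{name}: {weight_memory_tb(trillion, bits):.1f} TB")
    # FP16: 2.0 TB
    # FP8: 1.0 TB
    # FP4: 0.5 TB
```

On this estimate, the weights of a trillion-parameter model at FP4 take about half a terabyte, a small fraction of the 240 terabytes of fast memory quoted for the system. Training and serving need considerably more than the weights alone.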

“In the future, data centres are going to be thought of … as AI factories,” Huang said. “Their goal in life is to generate revenues, in this case, intelligence.”

Alongside new developments for the automotive sector, 6G research, and semiconductor design and manufacturing, NVIDIA announced Project GR00T, a general-purpose foundation model for humanoid robots, designed to further its work driving breakthroughs in robotics and embodied AI.

The company unveiled a new computer, Jetson Thor, for humanoid robots based on the NVIDIA Thor system-on-a-chip (SoC), as well as significant upgrades to the NVIDIA Isaac robotics platform, including generative AI foundation models and tools for simulation and AI workflow infrastructure.

“Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” said Huang. “The enabling technologies are coming together for leading roboticists around the world to take giant leaps towards artificial general robotics.”

GR00T, which stands for Generalist Robot 00 Technology, will enable robots to better understand natural language and emulate movements by observing human actions, quickly learning coordination, dexterity and other skills in order to navigate, adapt and interact with the real world.

NVIDIA is already building a comprehensive AI platform for specialist humanoid robots and is working with a number of companies around the world to deliver more intelligent robots.