Tachyum demonstrates Machine Check and Recovery on Prodigy FPGA


Tachyum has announced that it has added Machine Check and Recovery (MCR) capabilities, integrated with the Linux Error Detection and Correction (EDAC) subsystem, to its Prodigy Universal Processor.

The capability was successfully demonstrated on the Prodigy FPGA emulation system.

MCR combined with the Linux EDAC driver is essential for data centre applications: together, the two provide the information needed to predict and mitigate failures in the field.

Because it can detect and transparently correct errors caused by external events in the CPU’s internal memory blocks and in attached DDR memory modules, Prodigy can run prolonged workflows without interruption, maintaining and improving the uptime of systems deployed at scale.

When errors in Static Random-Access Memory (SRAM) are too severe to correct, however, error detection allows the affected computations to be abandoned rather than return incorrect results.

Error injection is an essential part of testing. Prodigy contains an error-injection module that can inject both correctable and uncorrectable errors into the relevant CPU blocks, either as a limited number of errors or as a continuous stream at programmable intervals, to ensure that the Prodigy architecture meets and exceeds data centre requirements.

Prodigy supports Double Error Correction, Triple Error Detection (DECTED), a key feature for improving uptime that is complemented by EDAC to enable preventative maintenance.
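For context on how EDAC supports preventative maintenance: on a Linux system, the EDAC subsystem exposes per-memory-controller error counters in sysfs, such as `ce_count` (corrected errors) and `ue_count` (uncorrected errors) under `/sys/devices/system/edac/mc/mcN/`. The Python sketch below reads those counters and flags controllers that show any uncorrected errors or more corrected errors than a threshold. The threshold value and the function names are hypothetical, and nothing here describes how Tachyum's driver reports errors on Prodigy; it is only a generic illustration of consuming EDAC data.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location for memory-controller directories (mc0, mc1, ...).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")

# Hypothetical alerting threshold for corrected errors; real policies vary by site.
CE_ALERT_THRESHOLD = 100


def read_counter(path: Path) -> int:
    """Read an integer EDAC counter file, treating a missing file as 0."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return 0


def edac_error_counts(root: Path = EDAC_MC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} for each mcN directory under root."""
    counts: dict[str, tuple[int, int]] = {}
    if not root.is_dir():
        # No EDAC driver loaded (or not Linux): nothing to report.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        counts[mc.name] = (read_counter(mc / "ce_count"), read_counter(mc / "ue_count"))
    return counts


def controllers_needing_attention(
    counts: dict[str, tuple[int, int]], ce_threshold: int = CE_ALERT_THRESHOLD
) -> list[str]:
    """Flag controllers with any uncorrected errors or too many corrected ones."""
    return [name for name, (ce, ue) in counts.items() if ue > 0 or ce > ce_threshold]


if __name__ == "__main__":
    counts = edac_error_counts()
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
    print("needs attention:", controllers_needing_attention(counts))
```

A rising corrected-error count on one controller, even with no uncorrected errors yet, is the kind of early signal that lets operators schedule a module replacement before a failure occurs.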

“Today’s demanding data centre applications require a level of reliability and availability previously unseen in order to complete complex functions while mitigating errors,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “Organisations choosing to deploy Prodigy-enabled data centres will be able to ‘fortify’ their systems by fully integrating and testing the MCR system with the Linux EDAC driver as part of our FPGA emulator, which will ensure optimal performance when the processor is commercially available in the near future.”

Because Prodigy is a Universal Processor that Tachyum says offers exceptional performance for all workloads, Prodigy-powered data centre servers will be able to switch seamlessly and dynamically between computational domains (such as AI/ML, HPC, and cloud) on a single homogeneous architecture.

According to Tachyum, by eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilisation, Prodigy will reduce CAPEX and OPEX significantly.

Prodigy integrates 192 high-performance, custom-designed 64-bit compute cores, and Tachyum claims it can deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and up to 6x for AI applications.