Genome data analysis now available in a matter of hours


Imec has unveiled elPrep5, the newest version of its software platform for DNA analysis. According to imec, elPrep5 produces identical results while running eight to 16 times faster than the Genome Analysis Toolkit (GATK), the widely accepted reference standard.

The platform covers the full analysis pipeline, from data preparation to variant calling, on comparable hardware infrastructure, opening new opportunities and efficiency gains for hospitals and medical practitioners.

“This is the breakthrough we have been anticipating for years. Finally, we can run the entire DNA analysis pipeline with a single software platform solution, and faster than ever,” said imec researcher Dr. Charlotte Herzeel.

“Because variant calling is the most complex step, gathering results up to 16 times faster than the previous method has resulted in a four- to nine-fold reduction in time, all while retaining GATK4-identical results. For the medical sector, this allows massive efficiency gains because the time between sampling and diagnosis dramatically decreases and doctors can run analyses overnight. Moreover, since many hospitals run their analyses via rented cloud solutions, the reduced throughput times can immediately result in a cost reduction per analysis.”

Sequencing a DNA sample produces hundreds of gigabytes of data representing the genetic information of the original sample, which is cut into a multitude of smaller fragments during the sequencing process. These fragments have to be reconstructed into a representation of the original DNA sample. An analysis is then performed to detect genetic variants, for example by comparing the sample to a known reference genome. elPrep5 is specifically designed to optimise this variant calling analysis.
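To make the variant-detection step more concrete, here is a deliberately naive sketch in Go that compares fragments, assumed to be already aligned, against a reference sequence. The `AlignedRead` type, the `callVariants` function and its majority-vote rule are hypothetical illustrations, not elPrep5's method; real variant callers use statistical models of sequencing errors and are far more sophisticated.

```go
package main

import "fmt"

// AlignedRead is a hypothetical, simplified read: a fragment of sequence
// plus the 0-based reference position where it was aligned.
type AlignedRead struct {
	Pos int
	Seq string
}

// Variant records a position where the sample's majority base differs
// from the reference base.
type Variant struct {
	Pos      int
	Ref, Alt byte
	Depth    int
}

// callVariants builds a pileup (per-position base counts) from the aligned
// reads and reports positions whose most frequent base differs from the
// reference, provided at least minDepth reads cover that position.
func callVariants(ref string, reads []AlignedRead, minDepth int) []Variant {
	pileup := make([]map[byte]int, len(ref))
	for i := range pileup {
		pileup[i] = make(map[byte]int)
	}
	for _, r := range reads {
		for j := 0; j < len(r.Seq); j++ {
			if p := r.Pos + j; p >= 0 && p < len(ref) {
				pileup[p][r.Seq[j]]++
			}
		}
	}
	var variants []Variant
	for p, counts := range pileup {
		depth, bestCount := 0, 0
		var best byte
		for base, c := range counts {
			depth += c
			// Ties are broken by the smaller base so the result is deterministic
			// despite Go's randomised map iteration order.
			if c > bestCount || (c == bestCount && base < best) {
				best, bestCount = base, c
			}
		}
		if depth >= minDepth && best != ref[p] {
			variants = append(variants, Variant{Pos: p, Ref: ref[p], Alt: best, Depth: depth})
		}
	}
	return variants
}

func main() {
	ref := "ACGTACGTAC"
	reads := []AlignedRead{
		{Pos: 0, Seq: "ACGTAC"},
		{Pos: 2, Seq: "GTTCGT"},
		{Pos: 3, Seq: "TTCGTA"},
		{Pos: 4, Seq: "TCGTAC"},
	}
	for _, v := range callVariants(ref, reads, 3) {
		fmt.Printf("pos %d: %c -> %c (depth %d)\n", v.Pos, v.Ref, v.Alt, v.Depth)
	}
}
```

Run as is, the example reports a single A-to-T difference at position 4, covered by four reads.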

Performing this analysis is computationally heavy: despite substantial cost reductions for DNA analysis over the past decade, runtimes for a whole genome can still reach two to three days. imec’s elPrep5 can perform a whole-genome analysis within a few hours without compromising the quality of the output.

Extensive validation shows that elPrep5's output is identical to that of its industry counterparts GATK, SAMtools and Picard.

By taking advantage of its parallel execution framework, elPrep5 performs the complete analysis in a single pass through the data. This architecture avoids the costly repeated reading and writing of data fragments in and out of memory. elPrep5 is written in Go, an open-source programming language developed by Google, and can run on the standard servers that most hospitals have locally or in the cloud. elPrep5 extends the functionality and performance of elPrep4 by adding variant calling as the final step, so that it covers the whole DNA analysis pipeline, and by realising additional efficiency gains along the way.

elPrep5 targets users in the pharmaceutical industry, scientific research, medical laboratories, sequencing service providers, sequencing vendors and hospitals. The speedups it brings enable these users to move from research runs into clinical practice and to scale their operations further.

Several industrial partners have already expressed interest in integrating elPrep5 into their daily operations.