Genomics England’s 8th Birthday: Augusto on being pioneers in bioinformatics
I feel privileged to have spent the last seven years at Genomics England. Looking back, there has been an immense amount of progress, both by us at Genomics England and by the international genomics community. But progress always feels slow to come; we often joke that at times it felt like we were flying a plane while building its wings. Yet it flies.
When the 100,000 Genomes Project started in 2013, only a few hundred genomes and a few thousand exomes had been sequenced in the UK. Exomes were becoming ever more popular across the world, and sequencing genomes was only just about possible in the days before Illumina introduced its HiSeq X sequencing platform.
By the end of 2018 we had finished sequencing over 100,000 genomes. These genomes represent data from over 36,000 families with rare diseases and more than 17,000 cancer patients. By 2020, together with our colleagues in the NHS, we had processed, analysed, and interpreted the great majority of these data to provide genomic insights to participants.
Much development needed to happen in the bioinformatics space to get to where we are today. Put simply, bioinformatics is the application of software tools and computational methods to analyse and interpret biological data. Pipelines to take sequence data and identify genetic variation already existed when we started on the 100,000 Genomes Project, and very good ones. However, they took several days and thousands of compute hours to process a whole human genome on large high-performance computing clusters. Today, we can process a genome in just over 2 hours using Illumina’s Dragen toolchain. The Dragen team have diligently listened to all our feature requests and data findings (and of course those of the international community) and are taking alignment and variant calling to a whole new level.
Beyond pipelines, much of the infrastructure to analyse genomes was built from scratch. That included PanelApp, a project that started as a way of handling gene lists and grew into an international community crowd-sourcing gene-disease associations. PanelApp now receives millions of API requests and thousands of visitors every month.
Perhaps less well known outside the Genomics England/NHS community is our Clinical Variant Ark. Today, as genetics laboratories across the UK interpret the genomes processed and analysed by Genomics England, the insights from that interpretation are automatically stored and made available to all labs in real time. A scientist in the Newcastle genetics lab can see that there are patients with similar genetic defects in Exeter (or anywhere else in the country), which helps them interpret their own patients' results. All of this happens without anyone needing to deposit data manually.
We have also developed a database, OpenCGA, that currently stores the variant genotypes and associated information for over 100,000 genomes and more than 1.2 billion variants. That is on the order of 10^14 data points (roughly 100,000 genomes × 1.2 billion variants). These achievements would not have been possible without our partners in the commercial and academic sectors. We have also been lucky enough to stand on the shoulders of giants such as the DDD study and the Decipher database.
Our work is not only about providing clinical diagnoses, however. It is also about making these data available for research, and here too progress has been steep. In 2015 we brought together 5,000 genomes, a small amount of clinical data and a few tools to explore those data, all in an on-premises environment. Since then, we have matured the environment and built new tools in response to user needs. We now have hundreds of active users.
Since 2020, we have also had a next-generation research environment on AWS. Driven by the need to quickly disseminate the data from our COVID-19 host sequencing study, we stood up a brand new cloud-native research environment using CloudOS, a platform provided by Lifebit to enable genomic research in the cloud. As bioinformatics practitioners, we are in the midst of a paradigm shift in scientific computing, moving from high-performance computing systems to becoming cloud-native citizens.
Being pioneers is exciting! There is much to discover and to prototype. Yet having been one of the early entrants in this field has left us with a huge amount of legacy software and infrastructure that today feels Precambrian. The Genomics England of today is exciting in a different way. We get to revisit the challenges of the last 8 years with renewed enthusiasm and, hopefully, more wisdom. We can harness the wealth of technical, scientific and healthcare advances, grounded in solid product and engineering practices, and continue to enable the clinical, research and patient communities that we serve with passion.
But in the end, this is about the people that we work with. I want to take the opportunity to thank all of our participants and their families, as well as our NHS colleagues for your patience and support over the last eight years. My team and I feel incredibly grateful to have been able to help and serve you. We are delighted to continue working with you through the NHS Genomic Medicine Service and proud that whole genome sequencing as standard of care in the NHS is a reality.
Augusto Rendon is Chief Bioinformatician at Genomics England.