Over 100,000 whole genome sequences now available for approved researchers
Data Release 7 has now gone live in Genomic England’s Research Environment. While every data release is significant in its own right, v.7 is symbolic. It means we have now passed the milestone of 100,000 whole genomes available to researchers. Of course it’s not just about genomes. The growing wealth of linked clinical and secondary health data associated with the genomic data in each release is what makes the Genomics England dataset one of the most exciting tools in the world for discovery and translational research.
Speaking on the importance of this release, Genomics England Chair Jon Symonds CBE, said:
Improvement of patient outcomes through precision medicine technologies is a long-term goal but in fact, researchers accessing the Genomic England Research Environment have already created impact for project participants. To date, there have been 94 researcher-identified potential diagnoses in addition to over 90,000 results returned to the NHS by the team at Genomics England.
Commenting on the research opportunity, interim Chief Executive Professor Sir Mark Caulfield, commented:
To tell the story behind the numbers, we put a few questions to Dr James Holman, Research Environment Project Lead.
What is the significance of this data release and why?
It took a tremendous effort, across the NHS and Genomics England, to recruit thousands of participants and sequence over 100,000 genomes. Alongside that effort, Genomics England has been working hard to make those genomes available to researchers in our Research Environment. This release is a major milestone as it sees us exceed 100,000 genomes. It certainly hasn’t been a trivial exercise, and illustrates the commitment of Genomics England to accelerating scientific discovery. We owe particular thanks to the participants who have taken part in this endeavour.
It shouldn’t be forgotten that, although this one is symbolically significant, every data release has been important. Each release increases the richness of the linked clinical and secondary health data available to researchers, making our dataset a more powerful tool for discovery and translation. Similarly, following data releases will be just as important as we continue to enrich the Research Environment.
That being said, reaching 100,000 genomes in this data release is a significant achievement for the Genomics England Research Environment.
Can you provide a run through of what goes into a data releases and why they take time to prepare?
Each data release is a huge team effort across Genomics England. Together the teams collate, query, and generate data from numerous information systems across Genomics England and integrate them into a unified data release. These data are curated and made available in the various tools available within the Research Environment. That includes existing tools such as our Data Repository, and soon to be released applications such as the Data Discovery Portal and Genomics Analytics Platform.
Before the release goes live, we undertake an extensive review to confirm the integrity of the data and update the data dictionary. We then prepare a data release note to make users of the Genomics England Research Environment aware of what has been added or changed, making sure that we explain any new data features.
What’s coming up for the research environment for the rest of this year and what is your approach to development?
I am in the fortunate position of leading a team that is responsible for delivering innovative solutions to maximise the research potential of the data that’s available. In the coming months we will be delivering a range of new applications that will help researchers discover, understand, and use the data available within the Research Environment. We plan to deliver applications as advanced prototypes and then iteratively develop them based on user feedback. For each type of application released we are using surveys, pre- and post-release, to ensure the tools have the desired impact. This feedback will ensure that they meet the user needs and allow us to deliver value incrementally and continuously based on those needs.
Tools that we are currently testing and preparing for roll-out include a Data Discovery Portal, Terminology Service, OpenTargets, and a genomics analytics platform. We are also undertaking an evaluation of the common data model developed through the Observational Medical Outcomes Partnership (OMOP) Project to determine how suitable the model is for genomic analysis and what extensions may be required.