The number of users with access to the Genomics England Research Environment has more than doubled since our last update in March 2018 – now over 1,300 researchers are working with data from the 100,000 Genomes Project.
The Genomics England Clinical Interpretation Partnership (GeCIP) is an international consortium of researchers, clinicians and trainees, established to improve understanding and practice of clinical genomics, and uncover new medical insights for patients. Over 2,700 people have come together into 42 research groups, known as ‘domains’. These GeCIP domains are either disease-focused or cross-cutting.
The first members of GeCIP were given access to our Research Environment in June 2017 to work on the de-identified data from the 100,000 Genomes Project and test the environment. Since then, the number of users has grown and this week we’re excited to announce that over 1,300 GeCIP researchers now have access to the Research Environment. These researchers are part of 25 GeCIP domains, mainly covering cancer and rare disease:
Quantitative methods, machine learning and functional genomics
Endocrine and metabolism
Hearing and sight
Inherited cancer predisposition
Cancer of unknown primary
Non-malignant haematological and haemostasis disorders
Renal cell carcinoma
Upper gastrointestinal cancer
GeCIP members currently have access to 44,067 genomes and clinical data for over 60,000 participants. This is already the largest number of whole genomes with associated clinical data anywhere in the world and, excitingly, it will continue to grow with each data release as more patients are recruited to the project and more genomes are sequenced.
We have also linked our data to external datasets, such as Hospital Episode Statistics, Patient Reported Outcome Measures and the Mental Health Services Data Set, to create a rich resource with longitudinal life-course follow-up.
All of these data are housed in the Genomics England data centre and accessed using the Research Environment, a virtual desktop environment with all of the software and tools required to analyse the data. We hope that by providing all of these data to GeCIP members we’ll be able to gain a better understanding of disease, leading to improved diagnosis and treatment for patients in the future.
‘Essentially I got involved in the [100,000 Genomes] Project as I am interested in trying to make sense of the vast heterogeneity we see in cancer genomes. There have been several projects like the Project (such as The Cancer Genome Atlas), but where this project is truly unique is the all-encompassing whole genome sequencing (WGS) of samples and the inclusion of clinical annotation, which is commonly missing or unavailable in other projects.
‘The reason WGS is so exciting is that there is a relatively unexplored world in the non-coding regions of the genome. We may very well find new types of colorectal cancer driven by mutations in RNA genes, for instance.
‘As for the Research Environment, I think the data are very well organised and accessible. We have been given vast resources in the form of the cluster (grid-computer) and I feel that this was a well-conceived and essential part of the project.’
Dr William Cross
Queen Mary University of London, a member of the Colorectal Cancer GeCIP domain