  • The number of users with access to the Genomics England Research Environment has more than doubled since our last update in March 2018 – now over 1,300 researchers are working with data from the 100,000 Genomes Project.

The Genomics England Clinical Interpretation Partnership (GeCIP) is an international consortium of researchers, clinicians and trainees, established to improve understanding and practice of clinical genomics, and uncover new medical insights for patients. Over 2,700 people have come together into 42 research groups, known as ‘domains’. These GeCIP domains are either disease-focused or cross-cutting.

The first members of GeCIP were given access to our Research Environment in June 2017 to work on the de-identified data from the 100,000 Genomes Project and test the environment. Since then, the number of users has grown and this week we’re excited to announce that over 1,300 GeCIP researchers now have access to the Research Environment. These researchers are part of 25 GeCIP domains, mainly covering cancer and rare disease:

Cancer: Colorectal cancer; Breast cancer; Lung cancer; Ovarian cancer; Prostate cancer; Cancer of unknown primary; Glioma; Haematological malignancy; Melanoma; Pan-cancer; Renal cell carcinoma; Testicular cancer; Upper gastrointestinal cancer

Rare Disease: Neurology; Endocrine and metabolism; Hearing and sight; Inherited cancer predisposition; Renal; Cardiovascular; Immune disorders; Non-malignant haematological and haemostasis disorders; Musculoskeletal; Renal

Cross-cutting: Quantitative methods, machine learning and functional genomics

GeCIP members currently have access to 44,067 genomes and clinical data for over 60,000 participants. This is already the largest number of whole genomes with associated clinical data anywhere in the world, and it will continue to grow with each data release as more patients are recruited to the project and more genomes are sequenced.

We have also linked our data to external datasets, such as Hospital Episode Statistics, Patient Reported Outcome Measures and the Mental Health Services Data Set, to create a rich resource with longitudinal life-course follow-up.

All of these data are housed in the Genomics England data centre and accessed using the Research Environment, a virtual desktop environment with all of the software and tools required to analyse the data. We hope that by providing all of these data to GeCIP members we’ll be able to gain a better understanding of disease, leading to improved diagnosis and treatment for patients in the future.

Dr William Cross (Queen Mary University of London), a member of the Colorectal Cancer GeCIP domain, said:

Essentially I got involved in the [100,000 Genomes] Project as I am interested in trying to make sense of the vast heterogeneity we see in cancer genomes. There have been several projects like the Project (such as The Cancer Genome Atlas), but where this project is truly unique is the all-encompassing whole genome sequencing (WGS) of samples and the inclusion of clinical annotation, which is commonly missing or unavailable in other projects.

The reason WGS is so exciting is that there is a relatively unexplored world in the non-coding regions of the genome. We may very well find new types of colorectal cancer driven by mutations in RNA genes, for instance.

As for the Research Environment, I think the data are very well organised and accessible. We have been given vast resources in the form of the cluster (grid-computer) and I feel that this was a well-conceived and essential part of the project.

