The Genomics England Research Environment
The Genomics England Research Environment provides a secure workspace where approved researchers can carry out research on the 100,000 Genomes Project dataset and gain new insights into disease and patient care. The aim is to enable scientific discovery and accelerate its translation into patient care.
The Research Environment provides a range of open source tools, databases and research platforms that link genomic data to a rich set of clinical, phenotypic and longitudinal data. Researchers can also develop their own software and run it on Genomics England’s high performance compute facilities.
To protect patient data, access to the Research Environment is controlled and secure, and only results of analysis can be exported. Find out more about data access, security and uses.
An independent Access Review Committee examines and must approve all requests for access to data before users are granted access to the Research Environment.
Data within the Research Environment currently contain the following types of de-identified information:
- genome sequence data
- variant call files
- phenotype/clinical data
- outputs from the Genomics England interpretation pipeline, such as tiering results
- Hospital Episode Statistics
- Diagnostic Imaging Dataset
- Patient Reported Outcome Measures
- Mental Health Services Data Set.
How the Research Environment works
The Genomics England Research Environment operates as a remote desktop connection within our datacentre. After logging in, a virtual desktop opens in a window of the researcher’s internet browser. This looks much like a normal Linux desktop. Researchers are provided with a set of relevant tools, the data they have been approved to access and shared storage drives. As there is no internet access within the Research Environment (except to ‘whitelisted sites’), a wiki-based platform and chat facility have been made available to facilitate communication with other researchers.
A more detailed overview of the Research Environment, including the applications within it, is available here. A collection of ‘how-to’ guides is also provided, giving researchers the information they need to work within the Research Environment and to move information and data between its different systems.
To maintain the privacy of those who have consented to take part in the 100,000 Genomes Project, data can only be accessed through the secure Research Environment, and no participant-level genomic or clinical data can be removed from it. Genomics England only allows results of analysis to be taken out and published. To protect participant data, files are moved into and out of the Research Environment via the ‘Airlock’ system, and each transfer is subject to review by a team at Genomics England.
Researchers cannot copy and paste information from inside the Research Environment to outside it. Internet access within the Research Environment is available only for sites authorised by Genomics England (‘whitelisted sites’).
To access the data, researchers and healthcare professionals must first apply to become members of the Genomics England Clinical Interpretation Partnership (GeCIP). Eligibility criteria and details of how to join are available here.
Currently, over 1,300 academic researchers, who are working across 25 of the 42 registered GeCIP domains, have access to the Research Environment. A list of all domains is available here. These researchers have now begun to put their research plans into action.
To ensure the national workforce has the relevant data science expertise, Health Education England (HEE) has commissioned a Master’s in Genomic Medicine programme that is supported by ten Higher Education Institutions (HEIs) across the country. As part of this initiative, HEE students and instructors have been granted limited access to a small subset of 100,000 Genomes Project data within the Research Environment to study bioinformatics. This subset includes:
- de-identified data for five trio families from the rare disease main programme
- de-identified data for five tumour-normal pairs selected from the 1K Cancer Cohort dataset
- select platinum genomes.
Genomics England works with industry through its Discovery Forum, which was created in July 2017.
The Discovery Forum provides a platform for collaboration and engagement between Genomics England, industry partners, academia, the NHS and the wider genomics landscape.
How can researchers gain access to the Research Environment?
You will need to join the Genomics England Clinical Interpretation Partnership (GeCIP) as a member of a GeCIP domain. Details of how to join are available here.
Once a researcher is an approved member of a GeCIP domain, and their institution has signed the Genomics England Participation Agreement and verified them as a bona fide researcher, they will be granted restricted, remote access to de-identified datasets that contain only the information they need for their specific, approved research study.
Will research users be able to take data away?
Research users will not be able to take away individual level data; they will only be able to download the anonymised results of their analysis, provided that these results do not reveal the identity of any of the participants.
How do members of the Discovery Forum work with researchers and clinicians in Genomics England’s Clinical Interpretation Partnership?
Companies that are members of the Forum work within a managed framework with the researchers, clinicians and analysts who have successfully joined Genomics England’s Clinical Interpretation Partnership (GeCIP). Because all findings must be shared, this partnership between industry, the academic research community and clinicians will help to accelerate the development of new diagnostics and treatments for NHS patients.
Can Master’s students also join a GeCIP domain?
Yes, students can join a GeCIP domain by completing the student application form. Students must have a supervisor who is also a member of the GeCIP domain they would like to join.
How often will the data in the Research Environment be updated?
We intend to release data roughly quarterly, although initial releases may be more frequent.
Will you be updating/improving the Research Environment?
We welcome feedback on the Research Environment from users and are continually looking at ways to improve. Please submit any comments or queries you have through the Genomics England Service Desk.
I’m having a problem using the Research Environment, where do I get help?
Please check the Research Environment guidance and the list of live issues on Confluence, our knowledge-sharing wiki. If these do not answer your query, please submit a ticket to the Genomics England Service Desk.