The 100,000 Genomes Project dataset will be made available to GeCIP researchers and trainees free of charge. Access is via a secure analysis environment hosted within the Genomics England datacentre – the Genomics England ‘embassy’. Analytical tools and applications are available within the embassy environment.
The dataset includes de-identified, linked information for each participant:
- Genome sequence data
- Variant call files
- Phenotype/clinical data
- Hospital Episode Statistics (HES) data
How to access the dataset
To view and work with the data, GeCIP members must first ensure that their institution has signed a Participation Agreement.
This contract between Genomics England and your institution sets out the institution's obligations and responsibilities with respect to your participation in the Project.
Your institution may already have signed this agreement if others from your institution are part of the Project. If your institution hasn’t signed, you will be contacted with the necessary information and documentation. Your institution will be asked to confirm your identity, after which your login details will be sent to you by email and mobile phone. View details and documentation for this process. See a list of current research projects with access to the dataset.
Viewing the dataset
Access to the 100,000 Genomes dataset is via a remote desktop hosted by the Genomics England datacentre – an embassy. After you log in, a desktop opens in a window of your internet browser. It looks much like a normal desktop and will be available as either Microsoft Windows or Linux. The embassy is preloaded with tools and applications, plus genomes and associated data.
All data access is through the embassy environment only. No sequencing or clinical data will be made available for download.
To preserve data security, you cannot copy or paste in the embassy, and internet access within it is limited. Files are moved into and out of the embassy via an ‘airlock’ system.
Each GeCIP member will have a finite number of central processing unit (CPU) hours per month. The exact allocation is yet to be defined and may change as the project and datacentre mature. CPU use beyond this allocation will require payment.