How your data is used

Name: Genomics England
Address: Queen Mary University of London, Dawson Hall, Charterhouse Square, London, EC1M 6BQ, UK
Telephone: 0808 2819 535

None of our work would be possible without the consent and support of patients and participants – they are at the heart of everything that we do. By enabling scientists and clinicians in their research, a patient’s data goes from being an isolated dot to a key part of the picture.

Your data in the National Genomic Research Library

The data we collect is stored in the National Genomic Research Library, a platform built by us and NHS England that allows approved researchers to access samples, genomic data, and other associated health data via a secure Research Environment.

Researchers who use the Research Environment can learn more about everyone’s health by looking for patterns in the de-identified data of thousands of participants.

Being able to compare all patient data in one place gives researchers an opportunity to better understand diseases and develop new treatments, and can lead to new diagnoses.

All data in the National Genomic Research Library has been de-identified and cannot be extracted by researchers.

What kind of data is stored in the National Genomic Research Library?

Types of de-identified data that are collected:

Your medical test results
Electronic copies of your health records from the NHS, your GP and other organisations (such as NHS Digital and Public Health England)
Information about any illnesses or stays in hospital – including your primary diagnosis and any historic diagnoses going back as far as medical records allow
Copies of hospital or clinic records, medical notes, social care, and local or national disease registries
Relevant imaging data from your NHS records, such as MRI scans, X-rays or pathology images

A lifetime resource

Your records will continue to be updated throughout your lifetime, for as long as you give us permission to. This lifelong health data may allow development of better treatments or diagnostics for others.

Why do we gather clinical data as well as your genome sequence?

To take a deeper approach to genomic research, we need as much detail as possible about a participant’s medical condition and symptoms.

Even small differences in symptoms between individuals might be crucial in understanding what changes in a genome mean, and helping decide the best treatments.

Only by understanding your detailed clinical data are we able to understand how patterns in your genome affect your diagnosis.

What will the data be used for?

Finding treatments

Helping to find new treatments and possibly cures for a wide range of health conditions.

Improving analysis

Researchers might use the data to try to find new, faster ways to analyse large amounts of data.

Sharing knowledge

Researchers may share their findings with other scientists and doctors through publications or meetings to help research advance as quickly as possible.

Developing drugs and diagnostic tests

New drugs and diagnostic tests may be developed by the NHS, universities and companies across the world.

Suggesting clinical trials or relevant research

Researchers will be able to find opportunities for you, and others like you, to take part in relevant research projects or clinical trials.

Keeping your data safe

De-identifying data in the National Genomic Research Library

Maintaining your data security and protecting your privacy is our top priority

To maintain your privacy, information that could identify you – like name, date of birth, and all other personal details – is removed from your health records and genomic data.

This process of de-identifying data ensures that researchers have no way of identifying individual participants while using the Research Environment.
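As a rough illustration of the idea described above, the sketch below strips identifying fields from a record before it would be made available for research. It is a minimal sketch, not Genomics England's actual process: the field names in `IDENTIFYING_FIELDS`, the `de_identify` function and the use of a separate `participant_id` label are all assumptions made for the example.

```python
from typing import Any

# Hypothetical field names; the real list of identifying details is not given on this page.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address", "nhs_number"}


def de_identify(record: dict[str, Any], participant_id: str) -> dict[str, Any]:
    """Return a copy of the record with identifying fields removed.

    The participant_id label is an assumption for this example: it lets
    records for the same person be grouped without revealing who they are.
    """
    cleaned = {key: value for key, value in record.items()
               if key not in IDENTIFYING_FIELDS}
    cleaned["participant_id"] = participant_id
    return cleaned


# Example record with made-up values.
raw_record = {
    "name": "Jane Example",
    "date_of_birth": "1980-01-01",
    "diagnosis": "example condition",
    "test_result": "example result",
}

research_record = de_identify(raw_record, participant_id="P-0001")
```

In this sketch the original record is left untouched and only the cleaned copy would be passed on, which mirrors the idea that researchers only ever see de-identified data.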

How is data stored and secured?

Genome data is enormous, and we use a cloud service from Amazon Web Services (AWS) – based in the UK – to provide a secure cloud computing and storage infrastructure. All patient data is held in secure facilities based in the UK.

All of the data from the National Genomic Research Library stays within the secure, monitored Research Environment where it can be analysed by researchers.

In this way, Genomics England considers the National Genomic Research Library to be a reading library – not a lending library.

How we protect your data:

Continuously review latest best practice for secure storage
Use industry-standard tools and techniques to prevent unauthorised access
Regularly undertake security tests
Meet all laws and standards for data protection
Closely monitor all activity on the secure Research Environment

Who has access to the data?

Anyone with access to the de-identified data in the National Genomic Research Library must be approved by the Access Review Committee and Genomics England.

Researchers

Approved researchers may work for not-for-profit organisations, such as universities, hospitals or research charities, or for for-profit (commercial) companies, such as pharma, biotech or diagnostic companies, from around the world. They work with de-identified data to better understand diseases and develop diagnostics and treatments.

Clinicians and healthcare staff

Doctors, nurses and other healthcare professionals in the NHS have access to information about the patients they are caring for.

This is to enable clinical staff to see any results or findings from whole genome sequencing for patients in their care.

Service providers

Approved organisations who provide the data itself, IT support, computing infrastructure, data storage, genome analysis, or other technical services.

Each company only has access to the part of the data centre they are working on or supporting, and all individuals have undergone Information Governance training and signed confidentiality agreements.

Approval process for all researchers

Also known as our 'airlock' system

1. Researchers submit an application requesting access to our secure Research Environment.
2. Applications are reviewed by an independent Access Review Committee, overseen by us and NHS England.
3. Researchers have their identities checked and confirmed.
4. The researcher’s organisation is required to sign legal documentation.
5. Only when approved are researchers given secure access to the National Genomic Research Library.
6. All research activity is continually monitored by Genomics England.
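The approval steps above can be thought of as a checklist in which access is only granted once every earlier step is complete. The sketch below is one hypothetical way to model that; the `Step` names, `next_step` and `has_access` are invented for illustration and do not describe Genomics England's real systems.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    """Hypothetical names for the checks that come before access is granted."""
    APPLICATION_SUBMITTED = 1
    COMMITTEE_APPROVED = 2
    IDENTITY_CONFIRMED = 3
    LEGAL_DOCUMENTS_SIGNED = 4


def next_step(completed: set[Step]) -> Optional[Step]:
    """Return the first step not yet completed, or None if all are done."""
    for step in Step:  # Enum members iterate in definition order
        if step not in completed:
            return step
    return None


def has_access(completed: set[Step]) -> bool:
    """Access is granted only when every step has been completed."""
    return next_step(completed) is None


# A researcher who has only submitted an application does not have access yet.
in_progress = {Step.APPLICATION_SUBMITTED}
fully_approved = set(Step)
```

Monitoring of research activity (the final step in the list) happens after access is granted, so it is left out of this checklist.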

The data will never be used for insurance or marketing purposes, nor for speculative searches, such as checks of DNA profiles or other information derived from your sample.

The Department of Health and Social Care has had confirmation from the Home Office and the Association of Chief Police Officers that they will not seek access to Genomics England’s data without presentation of a court order.

For further details of unacceptable uses, please see the list as detailed in the Research Ethics Committee approved Protocol.

FAQs

Data sources

We get data from many places. The main ones are:

NHS Digital

Hospital Episode Statistics (HES): These come from all NHS trusts in England, including acute hospitals, primary care trusts and mental health trusts. This data is collected during a patient’s time at hospital. It includes details of diagnosis, treatment received and other details about the patient. HES information is stored as a large collection of separate records – one for each period of care.
Patient Reported Outcome Measures (PROMs): This data measures health gain, typically in patients undergoing hip or knee replacement, but is also used to measure a patient’s health or health-related quality of life at a single point in time. This dataset lets us see whether there is any relationship between a gene variant and how well patients do after treatment or surgery.
Mental Health and Learning Disability Set (MHLDS): This is used to analyse patient pathways and enable a deep understanding of mental health service users’ interactions with acute secondary care. For the project it is a particularly important data set as many participants with rare conditions have learning disabilities. But it also helps us see whether mental health problems such as depression are an unrecognised symptom of a genetic variant.
Diagnostic Imaging Dataset (DID): A central collection of detailed information held by NHS Digital about diagnostic imaging tests carried out on patients, such as X-rays and MRI scans. This data can provide insights into whether particular gene variants are associated with a particular tumour or condition.
Mortality Data: Cause of death is a crucial piece of information. If a patient dies of a different illness, or in an accident, we need to record this as otherwise it will skew the data. If people die earlier or live longer than expected, this will help us pick out the variants associated with those outcomes. Having mortality data will also reveal if people with one condition die disproportionately from another, such as heart attacks. Approval to link this data with Office for National Statistics (ONS) data (which records deaths) must be granted by both NHS Digital and the ONS.

Public Health England

Participants in the 100,000 Genomes Project cancer programme give their explicit consent to allow their patient records to be linked to the data collected by the National Cancer Registration Service in Public Health England. More information on the data collected by the National Cancer Registration Service and how it is used is available at www.ncras.nhs.uk.

Clinical reports are different for rare disease patients and for cancer patients.

Reports for rare disease patients, and for their family members who are also taking part, will be divided into three tiers of results.

The first tier includes findings that, with current knowledge and in line with NHS standard practice, could be fed back to the participant after being validated and signed off by their NHS Genomic Medicine Centre. These findings are limited to variants in sets of genes that when disrupted are known to cause the clinical condition under consideration. These are called virtual gene panels and are specific to the disease being investigated.

Tier two provides a larger list of variants in that same gene panel that can be helpful in gaining a more complete picture of the genetic basis of the participant’s clinical condition under consideration.

Tier three is more explorative and is not limited to virtual gene panels but looks at the whole genome, all of the participant’s health data and their family history. In some instances, information generated from this tier will highlight findings that were missed when the first two tiers were looked at and could lead to a new or more precise diagnosis for the participant. This section of the report will also include results on additional findings if participants have consented to those being looked for. This tier is where new research advances are likely to be made.

Reports for cancer patients will mainly be in three sections with information gleaned from the whole genome sequencing data:

a high level description of the tumour’s genome and the most obvious ways it differs from the patient’s normal genome
genetic changes in the tumour which are not present in the patient’s normal DNA in genes known to be involved in causing cancer
changes in the normal DNA, in known cancer genes, that can make patients susceptible to the cancer that is being investigated.

As the Project evolves and in consultation with stakeholders, Genomics England will provide additional information that may help in the prognosis or management of the patient’s condition.

Questions about the National Genomic Research Library?

If you would like to make a data request, please see our data requests page. To contact us about the National Genomic Research Library, please get in touch on 0808 2819 535 or by email using the link below (it will open in your email application).

[email protected]