Frequently asked questions about Data Access and Use
Genomics England is collecting genome sequence data, obtained from samples of blood, tissue, and saliva, together with data on the health and wellbeing of the participants providing those samples. Much of the health data will come from the NHS Genomic Medicine Centre providing care for the participant, and will be directly relevant to their rare disease or to their cancer. Some health data will come from NHS hospitals that have provided care in the past; other health data will come from NHS healthcare organisations (such as NHS Digital and NHS England) that will provide care in the future.
By considering the genome data and the health data together, researchers will be able to better understand the relationship between variations in the genome and the health of the individual. In rare diseases, they may be able to better explain the condition, arrive at a new diagnosis or suggest a new approach to treatment. In cancer, they may be able to predict the effect of a particular course of treatment, avoiding drugs that would not work for the individual concerned or selecting or developing drugs that have a better chance of success.
NHS clinicians will have access to data on participants in their care. They will already have most of this information, but the genome data, and the results of analysis, may lead to a new diagnosis or a different course of treatment.
Researchers will be allowed access to de-identified subsets of the data, for approved research purposes, within a secure, monitored data environment. Both academic researchers and researchers from industry will be allowed access. It is important that Genomics England gives industry users access to the data because, if new diagnostic tests and treatments are to come from this Project, they will need to be developed, as they always have been, by the private sector and not within government or the NHS.
Genomics England takes the responsibility of data security seriously and from the outset we have chosen to hold the data in a secure government-owned facility based in the UK.
Genomics England is using industry-standard tools and techniques to prevent unauthorised access. Patient data travels from the NHS Genomic Medicine Centres to the Genomics England data centre over the dedicated NHS N3 network.
Like any organisation, we are subject to attempts to probe our systems from outside and we regularly test to ensure that none succeed. We have no evidence of being targeted specifically.
All research users will have their research proposal approved by an independent Access Review Committee established by Genomics England. Only when approval has been given will research users be given restricted and remote access to only the de-identified datasets they need for their specific research study.
Research users will also have to undergo a thorough identity check and be employed by or contracted to a registered healthcare or research organisation, which will enter into a legally binding contract with Genomics England. Genomics England is currently exploring whether to introduce additional controls and what the most effective way of doing so would be.
All research users must sign an electronic data access agreement and submit this to Genomics England’s independent Access Review Committee along with their research application for approval. Only when their research study has been approved will they be granted remote access to only those de-identified datasets they need for their specific and approved research study.
Clinicians will not have to sign a data access agreement because, by the very nature of their profession, they already have to abide by professional standards and adhere to their Trust’s data governance policies when dealing with a patient’s data. Failure to do so would breach their terms of employment and result in appropriate action being taken.
To ensure there is a strong research element to the Project, Genomics England has established the Genomics England Clinical Interpretation Partnership (GeCIP). Researchers who are approved as members of GeCIP will work in domains that match their expertise, for example cardiology, breast cancer, neurology, paediatrics or analytics. Because so little is yet known about the human genome, bringing together researchers to examine and interpret the data will greatly improve our knowledge of disease and how it could be treated.
NHS clinicians will have access to data on patients in their care free of charge.
Academic researchers participating in Genomics England Clinical Interpretation Partnership will also have access to de-identified data free of charge, subject to their research study being approved.
Genomics England is charging industry to participate in the Discovery Forum because providing storage, security and analytic services for the data is costly, and it is only right that for-profit companies accessing the data for research purposes should contribute to these costs.
Research users will have restricted, remote access to de-identified datasets which contain only the information they need for their specific and approved research study.
Research users will be able to download the anonymised results of their analysis, provided that these results do not reveal the identity of any of the participants. The Information Commissioner’s Office has a code of practice on anonymisation, and this gives an indication of what will, and will not, be allowed.
The data systems that Genomics England has in place will not allow users to ‘cut and paste’ data into their own systems. Clinicians will be able to download the clinical reports on participants in their care.
The data made available to researchers will have direct identifiers removed. Researchers should not be able to work out who this data is about, or even who is participating in the Project, simply by looking at the information in the system.
However, any non-trivial piece of health data, even a de-identified report of an appointment booking, could be re-identified by somebody who already has enough information about the individual in question. This is why Genomics England insists that all access to its data takes place within its secure environment, where it can be monitored.
Direct identifiers, such as name and date of birth, will be held in a separate part of the data infrastructure. This part of the data infrastructure will be more secure, in the sense that it will not be accessible to researchers; nor, indeed, will it be accessible to the majority of Genomics England staff. Given the sensitive nature of the health and genome data, similar expectations of security apply across the whole of the data infrastructure.
In the rare cases where Genomics England staff need access to identifiable data, access will only be granted following explicit approval from Genomics England’s Senior Information Risk Owner (SIRO). Access to identifiable data will be strictly limited and protected. All actions can be audited and monitored.
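The separation of direct identifiers from research data described above can be illustrated with a short sketch. This is not Genomics England’s actual implementation: the field names, the `split_record` function and the use of a random linking reference are all assumptions made purely for illustration.

```python
import secrets

# Hypothetical field names; the real data schema is not described in this FAQ.
DIRECT_IDENTIFIERS = {"name", "date_of_birth"}


def split_record(record):
    """Split one record into an identifier part and a de-identified part.

    Both parts share a random reference, so that authorised staff could
    re-link them. The reference itself carries no personal information.
    """
    reference = secrets.token_hex(16)  # 32 random hexadecimal characters
    identifiers = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
    research = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    return reference, identifiers, research
```

In a design like this, the identifier part would be kept in the restricted area of the infrastructure, and only the de-identified part would ever be placed in the research environment.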
Maintaining data security and protecting the privacy of participants is of the utmost importance to Genomics England.
Any researcher who attempts to re-identify data (and thus “identify” a participant) runs a significant risk of detection. Genomics England has penalties in place for any organisation or individual who breaches or attempts to breach the confidentiality of a participant’s identity. Penalties for misuse include revocation of user access, withdrawal of access for the organisation that employs or contracts the offending user, and reporting of the offending activity to the Information Commissioner, which could result in a fine of up to £500,000 under section 55 of the Data Protection Act.
The sequencing of participants’ whole genomes will take place in England and will be carried out by the sequencing company Illumina.
One of the challenges when this Project was first established was that England did not have sequencing machines capable of supporting it, either in the number required to sequence 100,000 genomes or in delivering an accuracy that researchers and clinicians would be confident in. Genomics England signed a partnership agreement with Illumina in August 2014 which ensures there is now sufficient sequencing capacity. Thanks to the generous support of £27m from the Wellcome Trust, the sequencing machines are located near Cambridge in a new sequencing centre within the Biodata Innovation Centre at the Wellcome Trust’s Sanger Institute.
Clinicians will receive feedback on participants in their care in a downloadable clinical report.
Yes. Genomics England expects that, as new research reveals information of relevance to a participant’s medical condition, a new clinical report capturing this additional information will be issued to their clinician.
Clinical reports are different for rare disease patients and for cancer patients.
For rare disease patients and their family members who are also taking part, their reports will be divided into three tiers of results.
The first tier includes findings that, with current knowledge and in line with NHS standard practice, could be fed back to the participant after being validated and signed off by their NHS Genomic Medicine Centre. These findings are limited to variants in sets of genes that when disrupted are known to cause the clinical condition under consideration. These are called virtual gene panels and are specific to the disease being investigated.
Tier two provides a larger list of variants in that same gene panel that can be helpful in gaining a more complete picture of the genetic basis of the participant’s clinical condition under consideration.
Tier three is more explorative and is not limited to virtual gene panels but looks at the whole genome, all of the participant’s health data and their family history. In some instances, information generated from this tier will highlight findings that were missed when the first two tiers were looked at and could lead to a new or more precise diagnosis for the participant. This section of the report will also include results on additional findings if participants have consented to those being looked for. This tier is where new research advances are likely to be made.
Reports for cancer patients will mainly be in three sections with information gleaned from the whole genome sequencing data:
- a high level description of the tumour’s genome and the most obvious ways it differs from the patient’s normal genome
- genetic changes in the tumour which are not present in the patient’s normal DNA in genes known to be involved in causing cancer
- changes in the normal DNA, in known cancer genes, that can make patients susceptible to the cancer that is being investigated.
As the Project evolves and in consultation with stakeholders, Genomics England will provide additional information that may help in the prognosis or management of the patient’s condition.
The Information Commissioner’s Office (ICO) defines anonymisation as the process of turning data into a form which does not identify individuals and where identification is not likely to take place. The ICO also has a code of practice on anonymisation. There is no standard definition of ‘pseudonymised’, and the two words are often used to mean the same thing. There is one key technical difference: ‘anonymised’ can refer to aggregate data or results, whereas ‘pseudonymised’ can only refer to individual-level data. This does not mean that ‘pseudonymised’ data is easier to identify; that depends upon the data and the context.
For Genomics England, only ‘anonymised’ data can be exported from the secure environment: this may be aggregate, results data, or other data from which so much information has been removed that it is highly unlikely that it could be re-associated with individual participants.
Within the secure data system, the data available to researchers is ‘de-identified’ in the sense that personal identifiers have been removed, together with any other information that is not needed for their specific, approved research purposes. More than this, a fresh, unique reference may be generated for each research purpose, so that researchers are not able to ‘collect’ information about individuals across research studies.
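One way a fresh reference per research purpose could be produced is with a keyed hash, sketched below. This is an illustrative assumption, not a description of the method Genomics England uses: the function name `study_reference`, the secret key and the identifier formats are all hypothetical.

```python
import hashlib
import hmac


def study_reference(secret_key: bytes, participant_id: str, study_id: str) -> str:
    """Return a pseudonymous reference for one participant within one study.

    Assumption for this sketch: a secret key is held only by the data
    custodian. A per-study key is derived first, so the same participant
    receives unrelated references in different studies.
    """
    study_key = hmac.new(secret_key, study_id.encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(study_key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Without the secret key, a researcher holding references from two studies has no practical way to tell which of them belong to the same individual, while the custodian can regenerate any reference on demand.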
Genomics England chose the National Institute for Health Research Biosample Centre to store all blood and tumour samples from the Project. Samples will also be checked here prior to being sequenced to ensure they meet the necessary quality requirements. The National Institute for Health Research Biosample Centre was formally opened in January 2015 and offers large‐scale automated processing, storage and retrieval of different types of biological samples (for example, blood or saliva) at a range of temperatures. The Centre has the capacity to store up to 20 million samples. All samples will be stored and processed within strict research governance and ethical frameworks which include consent and confidentiality.
No data held by Genomics England will be accessible to other government agencies, including HMRC and the Child Support Agency. In the unusual situation that a court orders the release of data, the request will be referred to Genomics England’s Legal Counsel as promptly as possible so that all representations may be made to the court, for example to limit the information released.
Under no circumstances will data held by Genomics England be used for marketing purposes. This would be in strict violation of Genomics England’s Data Access and Acceptable Uses Policy and against its approved research ethics protocol.
The Department of Health has had confirmation from the Home Office and the Association of Chief Police Officers that they will not seek access to Genomics England’s data.
Most of the time, taking part in research projects, including the 100,000 Genomes Project, won’t affect insurance premiums. You don’t normally have to tell insurers that you are taking part in research, or about genetic test results. However, when applying for insurance you do have to disclose any symptoms you experience or any diagnoses, screening, or treatments you receive, if this information is requested on the application form.
See our page on insurance for full details.
Genomics England is wholly owned by the Department of Health and reports to the Secretary of State for Health. The decision on what will happen with the data will therefore be made by the Secretary of State for Health following discussions with a range of interested parties, including patients and the public, to ensure there is a lasting legacy from this landmark Project.
The priorities of the government and Genomics England are to ensure that the identity of all participants is protected, the data storage and access to the dataset is secure, and researchers and clinicians have confidence in the dataset’s accuracy. To ensure public confidence in matters of confidentiality and access, this work is monitored by the Chief Medical Officer for England, who also sits on the board of Genomics England. Genomics England’s dataset is a national asset and it is this government’s clear policy that the dataset, as protected by its owner the Secretary of State for Health, will not be sold or distributed to third parties and that any income generated will be reinvested in health care and health research.
The requirements for the Project are considerable, but computing on this scale is not unprecedented. To overcome this particular challenge, Genomics England is working with colleagues from organisations around the world, many of whom have experience of processing, managing, and storing whole genome sequence data.
The MRC grant of £24m is paying for part of the secure data infrastructure. It is paying for some of the hardware, and some of the software, needed to support academic research.
Genomics England has an independent Access Review Committee to evaluate requests for access to the research component of the secure data infrastructure.