Airlock: How we support approved researchers to access data in a secure and ethical way

By Mitra Sato and Peter O'Donovan

What is Airlock? 

‘Airlock’ is a secure gateway used for moving data into and out of the Genomics England Research Environment. 

This Research Environment, sometimes shortened to ‘RE’, hosts the National Genomic Research Library: a secure database that holds de-identified genomic and health data from participants. 

Using the Research Environment, approved scientists can gain access to this data, allowing them to conduct valuable research in cancer and rare conditions. 

Airlock is essential in this process, ensuring that all transfers of data comply with participant consent, ethical standards and data protection regulations. 

Why is Airlock important? 

Those who have volunteered their data to the National Genomic Research Library are contributing to scientific research. This in turn may lead to advances in the diagnosis, treatment and scientific understanding of rare conditions and cancer.  

The Research Environment stores this data securely; however, it is just as vital that the data stays secure while it is being moved in or out. 

Airlock addresses this in a number of ways. These include: 

1) Reviewing every request from researchers to import or export data 

2) Ensuring that these requests align with participant consent and policy regulations 

3) Preventing the release of any data that could allow participants to be identified 

4) Ensuring that the data released is proportional to the needs of the given research, minimising the risk of participant identification 

Protecting participant data is essential to maintaining trust in genomic research. 

How does Airlock work? 

To take data in or out of the Research Environment, researchers must first submit a request. Requests are managed through software that provides an online form, where the researcher enters the request details, such as the file(s) proposed for transfer and the project details. 

A researcher may request to “export” data out of the RE, in order to be able to publish it to the wider world. They may also request to “import” data into the RE, as reference material and/or to analyse it alongside other data in the National Genomic Research Library. 

Following this, we at Genomics England confirm that the requester is on an approved, active project listed in the Research Registry database, which hosts all Genomics England Research Network project details. We also check that the request matches the scope of the project, and that it meets Research Management policy timelines. Under these timelines, a project must have been registered for at least 1 month before it can be cited in an export request, so that the Research Management team has visibility of all research taking place. 
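For readers who think in code, the project checks described above can be sketched roughly as follows. This is only an illustration, not the software Genomics England uses: the `Project` fields, the function name, the example project code and the choice to treat "1 month" as 30 days are all assumptions made for this example. The scope check is left out because it relies on human judgement.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "1 month" is approximated as 30 days for illustration only.
MIN_REGISTRATION_AGE = timedelta(days=30)


@dataclass
class Project:
    """Hypothetical stand-in for a Research Registry entry."""
    code: str             # Research Registry number (project code)
    registered_on: date   # date the project was registered
    approved: bool
    active: bool


def export_eligibility_problems(project: Project, today: date) -> list[str]:
    """Return the reasons a project cannot be cited for an export (empty if none)."""
    problems = []
    if not project.approved:
        problems.append("project is not approved")
    if not project.active:
        problems.append("project is not active")
    if today - project.registered_on < MIN_REGISTRATION_AGE:
        problems.append("project registered less than 1 month ago")
    return problems


# Usage: a project registered two weeks before the request is flagged.
example = Project(code="EXAMPLE-001", registered_on=date(2025, 1, 1),
                  approved=True, active=True)
print(export_eligibility_problems(example, today=date(2025, 1, 15)))
```

Returning a list of reasons, rather than a single yes or no, mirrors the idea that a researcher should be told what needs to change before resubmitting.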

Next, we review the file, checking for identifiers or high-risk content and ensuring that files are de-identified or aggregated. Should a request pass all of these steps, the data can then be transferred into or out of the environment. Approved imports are made available to download into the given researcher’s home directory within the RE; approved exports are made available to the given researcher to securely download via a website that is accessible outside the RE. 

Complex requests are referred by the Airlock team to the Airlock Committee, and all outcomes are logged for transparency. The Airlock Committee is a panel of Genomics England staff with expertise in areas such as clinical genomics in rare diseases and cancer, bioinformatics, machine learning (ML)/artificial intelligence (AI), ethics and data governance. Over time the decisions of the Airlock Committee are used to build “case law” around Airlock requests, enabling more efficient review of future requests. 

What type of data can and can’t be exported? 

The Airlock team uses a 5-tier system to classify all the data in the Research Environment. This system sorts types of data into tiers of identification risk, ranging from very high-risk data types in tier 1, such as Genomics England Participant IDs or raw sequencing data, down to very low-risk data types in tier 5, such as programming scripts or methods notes.  

Data types in tier 1 will generally be seen as unsafe for export and rejected under most circumstances. Data types in the lower tiers will receive a level of scrutiny proportional to the risk level they pose, before a final decision is made about the suitability of a given request for export.  
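As a rough illustration of how a tiered classification like this could be represented, here is a minimal sketch. It is not the Airlock team's actual tooling: the names `Tier`, `EXAMPLE_TIERS` and `initial_handling` are made up for this example. Only the tier 1 and tier 5 examples named above are included, since the contents of tiers 2 to 4 are not described here.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Identification-risk tiers: 1 is the highest risk, 5 the lowest."""
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3
    TIER_4 = 4
    TIER_5 = 5


# Only the examples named in this article; tiers 2-4 are deliberately left empty.
EXAMPLE_TIERS = {
    "participant_id": Tier.TIER_1,
    "raw_sequencing_data": Tier.TIER_1,
    "programming_script": Tier.TIER_5,
    "methods_notes": Tier.TIER_5,
}


def initial_handling(data_type: str) -> str:
    """Describe the starting point for review; the final decision stays with reviewers."""
    tier = EXAMPLE_TIERS.get(data_type)
    if tier is None:
        return "unclassified: needs manual classification"
    if tier is Tier.TIER_1:
        return "generally unsafe for export"
    return f"review with scrutiny proportional to tier {int(tier)}"


print(initial_handling("participant_id"))
print(initial_handling("methods_notes"))
```

The function only suggests a starting point. As described above, a person still makes the final decision on each request.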

In cases where the Airlock team must reject a request, the researcher will be given guidance on how to rework the data so that the confidentiality of those who volunteered their data is protected. 

The impact of Airlock 

Airlock has enabled thousands of successful research outputs, including 400+ peer-reviewed papers, over 100 abstracts, and multiple MSc and PhD theses. See Figure 1 below for exact numbers for each publication type.  

All data originating from the National Genomic Research Library shown in each of these publications will have been reviewed by the Airlock team to ensure that the data was sufficiently summarised and non-identifiable to be safe for export and publication.  

Airlock has also prevented privacy breaches by rejecting high-risk exports, often working with researchers to modify files for approval. 

Figure 1: Number of publications based on National Genomic Research Library (NGRL) data, by publication type, published between 1st August 2023 and 1st August 2025 

Tips for a successful submission 

If you are an approved researcher with access to the National Genomic Research Library, there are several things you can do to increase the chance of a successful data transfer request.  

Do: 

Explain clearly what the file contains and why it’s needed 

Request only the minimal data required 

Link to your registered research project and provide the Research Registry number (also known as the project code) 

Review the Airlock guidance on RE-Docs 

Avoid: 

Using participant or sample IDs 

Leaving low phenotypic counts (<5) unmasked 

Submitting unlabeled or ambiguous data files 
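To illustrate the point about low counts, here is a minimal sketch of masking counts below 5 in a summary table before requesting export. The function name and the `"<5"` label are assumptions made for this example, and it treats a count of zero as low too. Check the Airlock guidance on RE-Docs for the masking convention that actually applies.

```python
def mask_small_counts(counts: dict[str, int], threshold: int = 5) -> dict[str, str]:
    """Replace any count below the threshold with a '<threshold' label."""
    return {
        label: str(n) if n >= threshold else f"<{threshold}"
        for label, n in counts.items()
    }


# Usage with made-up phenotype counts (not real data).
summary = {"phenotype_a": 12, "phenotype_b": 3}
print(mask_small_counts(summary))
```

Masking single cells may not be enough on its own: if row or column totals are also exported, a masked value can sometimes be worked out by subtraction, so totals may need adjusting as well.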

And finally... 

By working together, we can ensure participant data is shared securely, ethically and in a way that builds trust in genomics research. 

If you want to read more about the research that has been made possible by Airlock, check out our other research blogs.