
Expertise, experience, and endurance: mapping NHS cancer records to a common data model

By Laura Kerr, Abigail Carter, and Helen Carter on

With over 300,000 cases of cancer recorded each year, the National Disease Registration Service (NDRS) is an incredible resource for researchers trying to accelerate the understanding, diagnosis, and treatment of cancer.

A team at Genomics England have been mapping NDRS data to a common data model. On the eve of publishing their work, they describe their journey, their plans to make the mapping widely available, and the difference it will make for cancer researchers…



Every interaction between the NHS and a patient with cancer or suspected cancer generates data. Each year, over 300,000 cases are registered by the National Disease Registration Service (NDRS). The data comes from various NHS sources that each have their own structures and coding systems, and the NDRS pieces this information together to provide a longitudinal record of each patient. This data can then be used to inform direct care, support service planning, improve population health, and contribute to medical research.

Supporting research

The common format chosen for the data is the Common Data Model (CDM) developed by the Observational Medical Outcomes Partnership (OMOP). Although other models exist, the OMOP CDM is widely adopted as a way of increasing interoperability across health data; for example, the NHS Research Secure Data Environment Network recently agreed to adopt it as its common data model. Once data is in the OMOP CDM, researchers can query it with a range of existing analytics tools. This enables research collaborations, large-scale analytics and development, and the use of shared tools and methods.
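To illustrate why a shared data model helps, here is a minimal sketch, assuming two toy "sites" that both store data in a heavily simplified OMOP-style condition_occurrence table. Because the table and column names are the same everywhere, one query runs unchanged against both. The concept IDs (201, 305) are hypothetical placeholders, not real OMOP concepts, and SQLite stands in for whatever database a real site would use.

```python
import sqlite3

# One query written against standard OMOP table/column names.
# It counts distinct patients with a given (hypothetical) diagnosis concept.
COHORT_QUERY = """
    SELECT COUNT(DISTINCT person_id)
    FROM condition_occurrence
    WHERE condition_concept_id = ?
"""


def count_patients(conn: sqlite3.Connection, concept_id: int) -> int:
    """Run the shared cohort query against one site's database."""
    return conn.execute(COHORT_QUERY, (concept_id,)).fetchone()[0]


def make_site(rows):
    """Build a toy in-memory 'site' with a simplified condition_occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "condition_occurrence_id INTEGER, person_id INTEGER, "
        "condition_concept_id INTEGER, condition_start_date TEXT)"
    )
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", rows)
    return conn


# Hypothetical data: concept 201 stands in for a cancer diagnosis.
site_a = make_site([
    (1, 100, 201, "2023-01-15"),
    (2, 100, 201, "2023-03-02"),  # same patient, second record
    (3, 101, 305, "2023-02-10"),
])
site_b = make_site([
    (1, 500, 201, "2022-11-30"),
    (2, 501, 201, "2023-05-20"),
])

for name, site in [("site A", site_a), ("site B", site_b)]:
    print(name, count_patients(site, 201))
```

Running this prints `site A 1` and `site B 2`: the same query text works at both sites, and only the data differs.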

A key benefit for researchers is that records are linked in a consistent way. Starting from a clinical code, a researcher can find all the related records, such as those describing how a diagnosis was made. This "fact relationship" approach significantly expands the scope of research that can take place, and also supports linkage with other datasets that are not part of NDRS. Researchers can also use it to see what an event is linked to, giving them a wider understanding of what might be important to their research.
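The OMOP CDM records links between individual records in its FACT_RELATIONSHIP table, whose columns pair a domain and record ID on each side with a relationship concept. The sketch below is a simplified, hypothetical illustration of following those links from one diagnosis to its related records. The domain and relationship concept IDs are placeholders (real values come from the OMOP standardised vocabularies), the data is invented, and the query only follows links in one direction.

```python
import sqlite3

# Hypothetical placeholder concept IDs; NOT real OMOP vocabulary values.
CONDITION_DOMAIN = 1001
MEASUREMENT_DOMAIN = 1002
PROCEDURE_DOMAIN = 1003
RELATED_TO = 9001  # placeholder relationship concept


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a FACT_RELATIONSHIP-style table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE fact_relationship (
            domain_concept_id_1     INTEGER,
            fact_id_1               INTEGER,
            domain_concept_id_2     INTEGER,
            fact_id_2               INTEGER,
            relationship_concept_id INTEGER
        )
        """
    )
    rows = [
        # Diagnosis 1 links to a staging measurement and a surgical procedure.
        (CONDITION_DOMAIN, 1, MEASUREMENT_DOMAIN, 10, RELATED_TO),
        (CONDITION_DOMAIN, 1, PROCEDURE_DOMAIN, 20, RELATED_TO),
        # Diagnosis 2 links to one measurement.
        (CONDITION_DOMAIN, 2, MEASUREMENT_DOMAIN, 11, RELATED_TO),
    ]
    conn.executemany("INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def related_facts(conn: sqlite3.Connection, domain_concept_id: int, fact_id: int):
    """Return (domain_concept_id, fact_id) pairs linked from the given record."""
    cur = conn.execute(
        """
        SELECT domain_concept_id_2, fact_id_2
        FROM fact_relationship
        WHERE domain_concept_id_1 = ? AND fact_id_1 = ?
        ORDER BY domain_concept_id_2, fact_id_2
        """,
        (domain_concept_id, fact_id),
    )
    return cur.fetchall()


conn = build_demo_db()
print(related_facts(conn, CONDITION_DOMAIN, 1))
```

For diagnosis 1 this prints `[(1002, 10), (1003, 20)]`: one linked measurement and one linked procedure. A real analysis would then join those IDs back to the MEASUREMENT and PROCEDURE_OCCURRENCE tables to read the details.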

The end goal of this is to add value to research: extending the scale on which research can be carried out, speeding up the process of data linkage, and enabling new discoveries to be made.

A meticulous process

NDRS is a curated dataset with clinical oversight of coding and classification to assure quality and standardisation. For the mapping, further input has been provided by Dr Prabhu Arumugam, Director of Clinical Data and Imaging at Genomics England.

The Genomics England team mapped the NDRS fields in each of the roughly 300,000 records registered every year to the OMOP CDM. Mapping to a common data model is a meticulous process: Extract, Transform and Load (ETL) tools can help, but much of the work relies on the team's experience and expertise to ensure that the result is clinically accurate.
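To give a flavour of a single mapping step, here is a minimal sketch of the general OMOP vocabulary pattern. A source code (here the ICD-10 code C50.9) is looked up as a source concept, followed through a "Maps to" relationship to a standard concept, and written into a CONDITION_OCCURRENCE-style row. The concept IDs, the input field names (patient_id, diagnosis_code, diagnosis_date) and the rule of returning None for unmapped or ambiguous codes are all hypothetical assumptions, not the team's actual ETL. Real vocabularies can map one source code to several standard concepts, which is exactly where clinical judgement comes in.

```python
from typing import List, Optional

# Hypothetical, heavily simplified vocabulary tables; concept IDs are placeholders.
CONCEPT = {
    # (vocabulary_id, concept_code) -> concept_id
    ("ICD10", "C50.9"): 101,  # source (non-standard) concept
}
CONCEPT_RELATIONSHIP = [
    # (concept_id_1, relationship_id, concept_id_2)
    (101, "Maps to", 201),  # 201: hypothetical standard concept
]


def map_source_code(vocabulary_id: str, code: str) -> List[int]:
    """Follow 'Maps to' links from a source code to standard concept IDs."""
    source_id = CONCEPT.get((vocabulary_id, code))
    if source_id is None:
        return []  # code not found in the vocabulary extract
    return [c2 for c1, rel, c2 in CONCEPT_RELATIONSHIP
            if c1 == source_id and rel == "Maps to"]


def to_condition_occurrence(record: dict) -> Optional[dict]:
    """Turn a hypothetical registry record into a condition_occurrence row.

    Returns None when the code has no single standard target, so the
    record can be flagged for manual (clinical) review.
    """
    code = record["diagnosis_code"]
    targets = map_source_code("ICD10", code)
    if len(targets) != 1:
        return None
    return {
        "person_id": record["patient_id"],
        "condition_concept_id": targets[0],
        "condition_start_date": record["diagnosis_date"],
        "condition_source_value": code,
        "condition_source_concept_id": CONCEPT[("ICD10", code)],
    }


row = to_condition_occurrence(
    {"patient_id": 7, "diagnosis_code": "C50.9", "diagnosis_date": "2023-01-15"}
)
print(row)
```

Keeping both the standard concept and the original source value in each row is a deliberate choice in the OMOP design: analyses use the standard concept, while the source value preserves what was originally recorded.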

Equipping researchers

The NDRS to OMOP mapping is part of ongoing work led by Genomics England. It forms part of our mission to equip researchers to use data that may help them find causes of disease. This new mapping facilitates wider access to the data for approved researchers by presenting it in an internationally recognised standard format.

Dr Arumugam sees the team's work as both ground-breaking and transformative in the doors it opens for researchers:

"What this mapping has achieved is truly incredible. On the one hand it opens up the depth of the NDRS dataset - for instance patient journeys from diagnosis through to tumour surgery and further treatment - and at the same time it enables researchers to widen the scope of their work by using data from digital pathology and genomics. We have a huge opportunity through the mapping to significantly improve patient care, precision medicine, and the future of healthcare."

For Sarah Stevens, Deputy Director, NDRS, the mapping is an example of NHS England's commitment to making data more accessible to researchers through collaboration, and is an opportunity to accelerate data linkage:

“The high-quality cancer data curated by NDRS is a vital resource and paramount for research and service evaluation leading to quality improvement. The last decade has seen a vast improvement in cancer data in England, including treatment data and completeness of staging data. This mapping will accelerate further data linkage and supports NHS England’s commitment of making the data more accessible for research. It also demonstrates the value of collaboration across organisations to share skills and expertise for the benefit of the wider health and care system.”

Further information

You can find out more about the mapping here: https://gitlab.com/genomicsengland/genomics_england_publications/public-omop-mappings

If you have any questions, please get in touch with the team by email.

To read more about the work happening at Genomics England, check out our other research blogs.

About the authors

Laura Kerr, Data Manager, Genomics England

Abigail Carter, Data Wrangler, Genomics England

Helen Carter, Data Manager, Genomics England
