Researchers have developed a new tool to help improve the accuracy of whole genome sequencing (WGS) analysis for patients with haematological cancers, which affect the blood, bone marrow or lymph nodes.
In cancer, WGS can be used to identify the genetic changes that drive cancer development by comparing DNA from a patient’s tumour with DNA from their healthy or ‘normal’ tissue. To do this, two samples need to be taken: one directly from the tumour and a ‘normal’ blood sample.
The new tool, developed by researchers from Genomics England, the University of Trieste and Great Ormond Street Hospital for Children NHS Foundation Trust, aims to address the challenges of interpreting WGS cancer data when ‘normal’ samples are contaminated.
Bioinformatics pipelines designed to analyse WGS data from patients with cancer can produce less accurate results if a patient’s ‘normal’ sample has been contaminated by tumour cells. Because genetic changes found in both samples are usually treated as inherited rather than tumour-specific, contamination can cause genuine tumour mutations to be missed. This problem is particularly relevant in blood cancers because tumour cells naturally circulate in the bloodstream.
To help solve this challenge, the research team developed a new tool, known as TINC, to estimate the level of tumour contamination in normal samples. TINC builds on an existing machine learning model used to understand tumour evolution. The results are published today in Nature Communications.
The TINC tool generates an easily interpretable score for the percentage of tumour cells in the normal sample, so that if a high level of contamination is detected it can trigger an alternative analytical workflow. This helps ensure clinical scientists using WGS to support the diagnosis and treatment of patients with cancer are provided with accurate data.
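As a rough illustration of the decision step described above, the short Python sketch below routes a sample to a standard or an alternative workflow according to its estimated contamination score. It is not TINC's code or Genomics England's pipeline: the threshold value, the class and function names and the workflow labels are all hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical cut-off, for illustration only; the article does not state one.
CONTAMINATION_THRESHOLD_PERCENT = 1.0


@dataclass
class NormalSampleReport:
    """Hypothetical container for a contamination estimate."""
    sample_id: str
    tumour_in_normal_percent: float  # estimated % of tumour cells in the normal sample


def choose_workflow(report: NormalSampleReport,
                    threshold: float = CONTAMINATION_THRESHOLD_PERCENT) -> str:
    """Return 'alternative' when contamination reaches the threshold, else 'standard'."""
    score = report.tumour_in_normal_percent
    # Reject values that cannot be a percentage (this also rejects NaN).
    if not 0.0 <= score <= 100.0:
        raise ValueError("contamination score must be a percentage between 0 and 100")
    if score >= threshold:
        return "alternative"
    return "standard"
```

In this sketch, a report with an estimated 0.2% contamination would be sent to the standard workflow and one with 12.5% to the alternative workflow. In practice the cut-off would be set and validated by the pipeline's own developers.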
Genomics England supports the NHS to deliver whole genome sequencing for a number of cancer types via the NHS Genomic Medicine Service. The TINC tool has now been implemented into Genomics England’s clinically accredited bioinformatics pipelines to support WGS analysis for patients with haematological cancers.
The researchers validated the TINC tool using participant data from the 100,000 Genomes Project, and also compared it against the standard technologies used for minimal residual disease testing in blood cancers. This type of testing measures the number of cancer cells that remain in a patient’s blood after treatment.
Implementing the TINC algorithm into Genomics England’s pipeline for analysis of WGS data allowed us to improve the accuracy of WGS analysis for patients with haematological cancers. This work shows how using data from large-scale sequencing projects, such as the 100,000 Genomes Project, and basic science we can produce accurate data that can support clinical decisions for patient care.
Dr Alona Sosinsky
Scientific Director for Cancer at Genomics England
The large-scale adoption of innovative sequencing technologies can revolutionise our understanding of a disease like cancer. With this new opportunity, however, analyses can become more challenging and need innovative computational tools. TINC is an example of a successful collaboration between a large sequencing initiative and our machine learning group to develop a better technology to make sense of complex genomic data.
Professor Giulio Caravagna
Head of the Cancer Data Science Laboratory at the University of Trieste and supported by the Italian Association for Cancer Research (AIRC)
Through the Genomic Medicine Service, we are now using whole genome sequencing routinely for diagnostics in the clinic. The implementation of the TINC algorithm provides clinicians with additional confidence in the analysis and interpretation of these results. The partnership with Genomics England has made this and other key developments in genomic data analysis possible – and these developments are ultimately being used every day to help the care of our patients.
Dr Jack Bartram
Consultant Paediatric Haematologist at Great Ormond Street Hospital for Children NHS Foundation Trust
- Research publication: Mitchell, J., Milite, S., Bartram, J. et al. Clinical application of tumour in normal contamination assessment from whole genome sequencing. Nat Commun 15, 323 (2024).