Diverse Data

The initiative aims to reduce health inequalities and improve patient outcomes in genomic medicine for minoritised communities.

Emerging technologies and methods

This programme explores how the latest developments in genomics can help ensure the benefits of personalised healthcare are accessible to all.

  • Some types of genetic variation are hard to identify using existing methods for genome sequencing. New sequencing technologies can help to fill in the gaps and promise to deepen our understanding of the relationship between genetics and disease.
  • As we build a more complete picture of genome variation across diverse human populations, new methods for genome analysis will help ensure that we use this knowledge for the benefit of all.

Long-read sequencing

Over the past decade, the emergence of “short-read” sequencing technologies enabled healthcare to enter the genomics era. Despite this, many regions of the human genome have remained inaccessible to these technologies. Recently, “long-read” sequencing has begun to shed light on these inaccessible regions, including those that are relevant to a range of conditions.

However, with limited data from historically under-represented populations, our understanding of the human genome remains incomplete. By sequencing thousands of genomes using long-read technologies, we aim to uncover hidden genomic diversity to help ensure that the benefits of genomic medicine are available to all.

Mapping the genome in diverse populations

Genetic variants are genomic locations where the DNA sequence differs between individuals. These are often identified by mapping an individual's DNA to a single, linear reference genome. As we build a more complete picture of how genomes vary across diverse populations, alternative references known as pangenomes offer a potentially superior method for mapping genetic variation because they capture a broader range of genomic diversity.
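
As a rough illustration of what identifying variants against a single, linear reference means, the toy sketch below compares one sample sequence with one reference and reports the positions where they differ. It assumes the two sequences are already aligned, have equal length, and differ only by single-base substitutions. The function name and the example sequences are hypothetical and are not taken from any real pipeline.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumption: both sequences are pre-aligned and of equal length, so
    position i in the sample corresponds to position i in the reference.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Hypothetical toy sequences, 0-based positions.
REFERENCE = "ACGTACGTAC"
SAMPLE = "ACGTTCGTAA"

variants = find_substitutions(REFERENCE, SAMPLE)
for position, ref_base, alt_base in variants:
    print(f"position {position}: {ref_base} -> {alt_base}")
```

Real variant calling also has to handle insertions, deletions, larger structural changes and sequencing error, none of which this sketch models. A pangenome replaces the single reference string with a structure that holds many alternative sequences, which is also outside the scope of this example.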

As the size and complexity of genomic datasets increase, so too does the need for sophisticated methods for storing and analysing genomic information. ‘Tree sequences’ are a succinct way to represent the evolutionary relationships between individual DNA sequences, offering potential savings on storage and faster computation. They also enable a range of novel methods for genome analysis that could provide new insights into the relationship between genomes and disease.
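
To give a feel for why a tree sequence can be succinct, here is a minimal sketch in plain Python. It assumes a toy layout in which each parent-child relationship is stored once, together with the genomic interval over which it holds, so that neighbouring trees share rows instead of being stored separately. The `Edge` type, the node numbers and the intervals are all hypothetical; real tree sequence software differs in detail.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    left: int    # start of the genomic interval (inclusive)
    right: int   # end of the genomic interval (exclusive)
    parent: int  # ancestral node
    child: int   # descendant node


# Toy data: samples are nodes 0, 1, 2; nodes 3, 4, 5 are ancestors.
# Node 3 is the parent of samples 0 and 1 across the whole region,
# so those two edges are stored once rather than once per tree.
EDGES = [
    Edge(0, 20, 3, 0),
    Edge(0, 20, 3, 1),
    Edge(0, 10, 4, 3),
    Edge(0, 10, 4, 2),
    Edge(10, 20, 5, 3),
    Edge(10, 20, 5, 2),
]


def tree_at(edges: list[Edge], position: int) -> dict[int, int]:
    """Return a child -> parent map for the local tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def common_ancestor(parent_of: dict[int, int], a: int, b: int) -> Optional[int]:
    """Return the most recent common ancestor of a and b, or None if none exists."""
    ancestors = set()
    node: Optional[int] = a
    while node is not None:
        ancestors.add(node)
        node = parent_of.get(node)
    node = b
    while node is not None and node not in ancestors:
        node = parent_of.get(node)
    return node


left_tree = tree_at(EDGES, 5)
right_tree = tree_at(EDGES, 15)
print(common_ancestor(left_tree, 0, 2), common_ancestor(right_tree, 0, 2))
```

In this toy example, two local trees with four edges each are described by six rows instead of eight. The saving grows when many adjacent trees differ by only a few edges.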
