
An introduction to Diverse Data at Genomics England

By Dr Maxine Mackintosh

Maxine is the Programme Lead for the Diverse Data Initiative here at Genomics England.

The Diverse Data initiative at Genomics England was launched in late 2021, and while we are still in the early days of setting up, recruiting, brainstorming, designing and cracking on, we thought it would be helpful to share a bit of background and how we are thinking about the problem.

Our vision is that all patients, regardless of their background, receive the same quality of genomics-enabled personalised medicine, supported by the latest research on people like them. Sounds good, eh? No small task…

To set the scene (which I suspect you know if you’ve opted to read an article about genomic data diversity), there is a rather glaring problem of diversity within medical sciences and research; it’s heavily influenced by WEIRD countries (and by this, we don’t actually mean weird but rather countries that can be identified as — western, educated, industrialised, rich and democratic). In the context of genomics, this has most notably resulted in a situation whereby globally, genomic datasets are dominated by individuals of European ancestry. And… well... Europeans aren’t the global majority (newsflash).

In a world where research and clinical practice are increasingly driven by data-driven systems, we have to carefully monitor how solutions built on potentially imbalanced and biased datasets can widen the existing gap in equitable healthcare and research. In human genetics, this has resulted in misdiagnoses, poor understanding of conditions and inconsistent delivery of care, as well as mistrust amongst excluded communities about the collection and use of their genetic data. We’re currently in a position where not all clinical insights are equally accurate or applicable to every single person. Boo to this.

For this reason, the Diverse Data programme was set up.

So for starters, what makes human genomes “diverse”?

Human genomes are 99.9% identical; however, it’s the differences in the remaining 0.1% that make each of us unique, whether in terms of our hair colour, our like or dislike of certain foods, or even our sleeping patterns.

Variations in genomes can result from factors such as human migration patterns, our ancestry and choice of mate, and oftentimes environmental factors (which, along with genetics, can contribute to traits such as height and weight).

These variants tend to appear at different frequencies across different populations. Sometimes, they’re rare and impact only specific families. At other times, the variants are common and can be found across populations.

How can studying diverse human genomes improve health outcomes?

The human genome is complicated and the discoveries through research have only scratched the surface of understanding everything there is to know about what makes humans unique at the molecular level.

The overrepresentation of populations from ‘WEIRD’ societies (Western, Educated, Industrialised, Rich, and Democratic) in genomic databases has resulted in misdiagnoses, poor understanding of conditions and inconsistent delivery of care, as well as mistrust amongst excluded communities about the collection and use of their genetic data. We’re currently in a position where not all clinical insights are equally accurate or applicable to every single person.

Prioritising diversity in genetic studies can lead to a new understanding of the genes that underlie diseases, including diseases that may not be prevalent in well-represented populations, and to new therapeutics for underrepresented populations. In addition, diversity in genomic studies can provide a more accurate reading of a person’s risk of developing a specific disease, and allow the design of a clinical management strategy tailored to that individual.

Why has enhancing diversity in genomics research been a difficult task?

Underrepresentation in genomics research can be attributed to many complex barriers such as language, economic circumstances and cultural beliefs, as well as other pertinent social and psychological factors. In addition, historical injustice and lived experiences have led underrepresented communities to build a wall of mistrust against medical research. This mistrust stems from very real historical events and abuses, and is reinforced by the current sociopolitical climate. Inaccessible and insufficient communications have also made it harder for participants to understand the benefits of taking part and how their personal data is used.

How we’re getting going at Genomics England

In order to tackle the whole pipeline of research and care delivery, we’ve identified four areas we need to work on:

1. We need to understand the data gap better (Research, Discovery & Exploration)

The problem of data diversity is complex, genomes are complex and the systemic issues impacting on this problem are complex! We need to improve our understanding of genomic diversity by reviewing, stimulating and conducting research into diversity and its impacts on scientific, clinical and health system outcomes. We also simply want to encourage more people to ask more questions of more diverse genomes.

2. We need to close these data gaps, together (Community)

We obviously can’t do it alone, but we also recognise this is quite a fragmented landscape with varying degrees of understanding and awareness on this topic. We want to work in a “top-down” manner with other relevant research, commercial and charitable organisations to raise the (often) low bar of understanding and action on this topic. We also want to work in a “bottom-up” way by working with patient and data communities to design, develop or implement equity-enhancing strategies — getting everyone to work better together, as we build trustworthiness, in a more coordinated manner.

3. We need to start filling those data gaps (Sequencing & Data)

We want to increase the volume, depth and breadth of genomic data available on individuals from under-represented groups. The dominance of genomes from individuals of European ancestry is absolutely a problem we can throw our hat in the ring to address. We’ll be running our own bespoke sequencing efforts with carefully selected, high-impact cohorts to make sure more, and more diverse, genomes are made available to researchers across the UK and beyond. But we also want to look at additional contextual data (“phenotypes”) that give us more clues about people’s conditions and life circumstances, allowing us to better understand where genetics, rather than social factors, play a role in driving inequalities. Lastly, we think we could all do a bit better in sharing what we have. There are more diverse genomes out there in the world; they’re just not that accessible. So how can we make better use of what we currently have by sharing it, or at least by facilitating better use of it?

4. We need to build better solutions and bridge the data gap

We want to work with clinicians, analysts, researchers, patients and community groups to develop new tools, processes and approaches for changing research, service-delivery practices, recruitment and care. This can range from analytical approaches that better characterise local ancestry to toolkits that help community leaders engage their communities on topics and questions in precision medicine.

So this is the Diverse Data programme in a nutshell. We see a future where genomics research and service provision are diverse-by-design, resulting in equity of access, treatment and potential innovations. We want to make sure that the solutions we are developing are embedded across Genomics England, but also tackle genomic inequities across the broader genomic ecosystem. So, if you have suggestions and are eager to collaborate in finding solutions to this big phat problem, we welcome your ideas!

Thanks for reading. For more on what we're up to, or to get in touch, check out our Diverse Data webpage.


Our Diverse Data team aims to share more during the process to show the journey, and hopefully encourage others to help direct our course. This blog series is part of that ethos of working in the open.
