Modernising quality assurance for clinical grade rare disease pipelines

By Luke Paul Buttigieg

Delivering genomic research at national scale requires more than accurate science. It also requires confidence that every change we make to a pipeline behaves exactly as expected.

At Genomics England, our rare disease pipeline underpins whole genome analysis for the NHS Genomic Medicine Service. Over the past 2 years, we have redesigned how we test and validate this pipeline, achieving faster releases and stronger reliability.

This blog shares how we modernised our approach to quality assurance, reduced release testing time by over 60%, and built a more sustainable way of developing pipelines.

Why change was needed

Historically, validation and verification of the rare disease pipeline relied on running full sized genome samples that represented key clinical scenarios.

While thorough, this approach was slow. Failures often required complete reruns, and testing exercised many connected systems at once rather than the pipeline in isolation.

Earlier stages of testing, such as unit or integration tests, covered relatively little code. As a result, defects often surfaced late, during this most expensive stage of testing.

At the same time, the workflow codebase lacked clear modular boundaries, and individual components frequently handled multiple responsibilities, making targeted testing difficult and slowing developer feedback.

A layered approach to testing

To address these challenges, we introduced a layered testing strategy guided by the Testing Trophy model.

This model encourages a balance of tests, with most coverage coming from fast integration tests, supported by unit tests and a smaller number of end-to-end runs.

Rather than adding more tests, we focused on understanding what our existing tests were actually proving.

By analysing what full-sized samples validated, we translated those guarantees into faster integration tests that check the same behaviours in minutes, rather than hours.

This shift reduced the number of full-sized regression samples we run routinely by over 50%. It also lets our continuous integration pipelines stop defects before they are merged, because the equivalent checks now run on every merge request rather than only during release testing.

Automating validation outcomes

We also automated how we validate test results.

Expected outputs from legacy full-sized runs are now stored in a version-controlled repository, rather than in shared documents. Validation scripts automatically compare expected and actual outputs.

This reduces manual checking, improves consistency, and makes it easier to test more frequently. It also consolidates our sources of truth into one place that works for both automation and human review.
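To illustrate the idea of comparing expected and actual outputs automatically, here is a minimal Python sketch. It is not the production validation code: the JSON file layout, the function names and the variant keys and tier labels used in the usage note are all assumptions made for the example.

```python
import json
from pathlib import Path


def compare_outputs(expected: dict, actual: dict) -> list[str]:
    """Return human-readable differences between two flat result mappings."""
    differences = []
    # Walk the union of keys so that missing and unexpected entries are both reported.
    for key in sorted(expected.keys() | actual.keys()):
        if key not in actual:
            differences.append(f"missing from actual: {key}")
        elif key not in expected:
            differences.append(f"unexpected in actual: {key}")
        elif expected[key] != actual[key]:
            differences.append(
                f"mismatch for {key}: expected {expected[key]!r}, got {actual[key]!r}"
            )
    return differences


def validate_run(expected_path: Path, actual_path: Path) -> list[str]:
    """Load two JSON result files (hypothetical format) and compare them."""
    expected = json.loads(expected_path.read_text())
    actual = json.loads(actual_path.read_text())
    return compare_outputs(expected, actual)
```

For example, calling compare_outputs({"chr1:12345:A>G": "tier1"}, {"chr1:12345:A>G": "tier2"}) returns a single mismatch message, while identical inputs return an empty list. Keeping the expected file under version control means any change to it is itself reviewed like code.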

Other teams can also use this source of truth in their own testing. One example is the downstream Decision Support Systems, which serve the results from the bioinformatics pipeline to end users.

Introducing “Tiny Genomes”

One of the most impactful changes was the creation of “Tiny Genomes”.

These are minimal but representative whole genome datasets designed to exercise key pipeline logic while drastically reducing data volume.

Tiny genomes include only the minimum coverage required at each genomic position for the pipeline to run correctly, particularly the Illumina DRAGEN-powered mapping and variant calling process. This reduces processing time by around 75%, cutting runtimes from over 4 hours to just over 1 hour.
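One way to picture "minimum coverage at each position" is the toy Python sketch below. It is purely illustrative and is not how Tiny Genomes are actually built. It assumes reads are represented as half-open (start, end) intervals on a single region, and the function name thin_reads and the min_depth parameter are hypothetical.

```python
def thin_reads(reads, region_length, min_depth):
    """Keep a read only if it covers a position still below min_depth.

    reads: iterable of (start, end) half-open intervals within the region.
    Returns the kept reads in sorted order.
    """
    depth = [0] * region_length  # current coverage at each position
    kept = []
    for start, end in sorted(reads):
        # A read is useful only if some position it covers is under target.
        if any(depth[pos] < min_depth for pos in range(start, end)):
            kept.append((start, end))
            for pos in range(start, end):
                depth[pos] += 1
    return kept
```

With reads [(0, 4), (0, 4), (0, 4), (2, 6)], a region of length 6 and min_depth of 2, the third copy of (0, 4) is dropped because positions 0 to 3 already have depth 2, while (2, 6) is kept because positions 4 and 5 are still uncovered.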

Despite their size, these datasets still provide strong confidence that pipeline integration is working as expected. Tiny genomes also test that the pipeline is categorising variants as expected, as they are synthetically crafted to include a selection of variants of interest.

Nightly testing for earlier feedback

We now run tiny genome tests nightly on both release and latest builds of the rare disease pipeline.

These runs provide daily feedback on pipeline health and can detect regressions much earlier than traditional regression packs.

They also test integration with complementary application programming interfaces (APIs), helping us catch issues across system boundaries without waiting for full-sized runs.

This means full-sized regression testing is now reserved for more nuanced validation, rather than being the first line of defence.

Targeted testing with Nextflow

To enable even more focused testing, we introduced a Nextflow-based test workflow that allows individual parts of the rare disease pipeline to run in isolation.

This supports activities such as A/B testing between versions, parameter scanning, performance optimisation, and rapid debugging.

It also makes it much easier to test specific logic, such as variant prioritisation, without running the entire workflow.
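The A/B testing idea can be sketched in a few lines of Python. This is a stand-in for illustration only, not the Nextflow test workflow itself: ab_compare, prioritise_v1 and prioritise_v2 are hypothetical names, and the allele frequency thresholds are invented for the example.

```python
def ab_compare(step_a, step_b, inputs):
    """Run two versions of one step on the same inputs.

    Returns a list of (input, result_a, result_b) for inputs where they differ.
    """
    differences = []
    for item in inputs:
        result_a = step_a(item)
        result_b = step_b(item)
        if result_a != result_b:
            differences.append((item, result_a, result_b))
    return differences


# Two hypothetical versions of a prioritisation rule, differing only in threshold.
def prioritise_v1(allele_frequency):
    return "rare" if allele_frequency < 0.01 else "common"


def prioritise_v2(allele_frequency):
    return "rare" if allele_frequency < 0.005 else "common"


if __name__ == "__main__":
    for item, old, new in ab_compare(prioritise_v1, prioritise_v2, [0.001, 0.007, 0.02]):
        print(f"input {item}: v1={old}, v2={new}")
```

Only the input 0.007 is reported, because it falls between the two thresholds. The same pattern extends to parameter scanning by looping over candidate parameter values instead of two fixed versions.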

Refactoring for testability

Alongside testing improvements, we are refactoring the rare disease workflow codebase to make testing easier by design.

Key principles include:

Each module should do one thing well

External services should not be called mid‑execution

Inputs and outputs should be clearly defined

Unnecessary dependencies between modules should be removed

This modular structure makes fine-grained testing possible and reduces the need to run full workflows to validate small changes.
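As a rough illustration of these principles, the Python sketch below shows a module with one job, explicit input and output types, and no calls to external services while it runs. The class and function names are hypothetical, and the gene panel is assumed to have been fetched before the step starts and passed in as plain data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str


@dataclass(frozen=True)
class PanelFilterInput:
    variants: tuple[Variant, ...]
    panel_genes: frozenset[str]  # resolved up front, not looked up mid-execution


@dataclass(frozen=True)
class PanelFilterOutput:
    in_panel: tuple[Variant, ...]
    out_of_panel: tuple[Variant, ...]


def split_by_panel(data: PanelFilterInput) -> PanelFilterOutput:
    """Do one thing: partition variants by gene panel membership."""
    in_panel = tuple(v for v in data.variants if v.gene in data.panel_genes)
    out_of_panel = tuple(v for v in data.variants if v.gene not in data.panel_genes)
    return PanelFilterOutput(in_panel=in_panel, out_of_panel=out_of_panel)
```

Because the function depends only on its arguments, a unit test can build a PanelFilterInput by hand and check the output directly, with no network access and no full workflow run.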

What has this enabled?

Together, these changes have transformed how we validate and maintain the rare disease pipeline.

Testing now happens earlier, faster, and more continuously. Developer feedback loops are shorter, and reliability has improved rather than being traded for speed.

Most importantly, this approach provides a scalable foundation for future pipeline development, supporting the growing needs of the NHS Genomic Medicine Service while maintaining clinical confidence.

And finally...

If you want to learn more about this work, you can download the full detailed poster that was presented at the Festival of Genomics conference 2026 here.

You can also check out our other technical blogs or listen to our podcast.