The 100,000 Genomes Project

The project was established to sequence 100,000 genomes from around 85,000 NHS patients affected by a rare disease or cancer.

The Project would also create a new genomic medicine service for the NHS – transforming the way people are cared for and bringing advanced diagnosis and personalised treatments to all those who need them.

Combining genomic sequence data with medical records has created a ground-breaking research resource. Researchers are currently studying how best to use genomics in healthcare and how best to interpret the data to help patients. The causes, diagnosis and treatment of disease are also being investigated. Revealing which variants cause disease is also helping companies find new targeted medicines. Kick-starting a UK genomics industry was another key aim of the project and the UK now has a vibrant genomics ecosystem.

Recruitment of participants to the 100,000 Genomes Project was completed in 2018, with the 100,000th sequence achieved in December 2018.

To date, actionable findings have been identified for between 1 in 4 and 1 in 5 rare disease patients, and around 50% of cancer cases contain the potential for a therapy or a clinical trial.

Introduction to the 100,000 Genomes Project

History of the 100,000 Genomes Project

You can read about the background and history of the 100,000 Genomes Project below.

April 2003 marked one of the most significant scientific breakthroughs of modern times. After years of painstaking research carried out by thousands of dedicated scientists across the world, the complete genetic code of a human being – their genome – was published.

The Human Genome Project, as this work was known, was the largest international collaboration ever undertaken in biology with British scientists leading the global race to read the 3 billion letters of the human genome, letter by letter. This is a technique called sequencing. The UK has often led the world in scientific breakthroughs and DNA was no exception. Crick and Watson won the Nobel Prize for discovering the double helix structure of DNA. And it was a British double Nobel Prize winning scientist, Fred Sanger, who discovered how to sequence it.

Now there is a real opportunity to turn the very important scientific discoveries about DNA and the way it works into a potentially life-saving reality for NHS patients across the country.

Most of us have heard of genetics, the study of the way particular features or diseases are inherited through genes passed down from one generation to the next. But the more we learn about genes, the more we understand that the old idea of having a single gene for this, or a single gene for that, which determines your fate is not – except in the case of unusual inherited diseases – a good way of describing the complexity of genes. In fact, groups of genes work together and their activity is influenced by a huge variety of environmental and other factors.

Your genome is your body’s instruction manual and you have a copy of it in almost every healthy cell in your body. The study of that genome and all the technologies needed to analyse and interpret it is called genomics.

When the first draft of the whole human genome was announced it was claimed that it would revolutionise medical treatment. It had taken 13 years and over £2 billion to read every letter of the human genetic code. It took such a long time because the DNA sequence of humans is very long – 3 billion letters – and because the sequencing machines available at the time were so slow and laborious. Now a human genome can be sequenced in a few days for less than £1000. It’s the leap in speed and the fall in cost of the technology that have opened up the potential of genomics and brought it within reach of mainstream healthcare.

But haven’t we already got a good understanding of genetics? One of the great surprises from the Human Genome Project was that there were only about 20,000 genes – about the same number as a starfish. The role of the remainder of a human’s genome – in fact a staggering 95 percent of it – was a mystery. Now we know that the remaining DNA is not irrelevant, as was once thought, but that much of it has a critically important role, influencing, regulating and controlling the rest. That’s why it’s necessary to sequence the whole human genome (rather than just looking at the 20,000 genes currently used for diagnosis in medicine) if we are to really understand the role of genes in health and disease.

But people are very different, so studying only a small number of genomes would not be enough to give doctors and scientists a true picture of our genes and their relationship to disease. Another key point is that by itself, a genome can’t tell you very much. To make sense of it, it is essential to know much more about the person who donated it: details such as their symptoms and when these first started, along with physiological measurements, such as heart rate or blood pressure (this sort of information is provided by clinicians and called phenotypic data). Another set of information which may be important in interpreting genomic data comes from their past medical records and would include such things as previous illnesses, medications and birth weight.

And this is where the NHS comes in. The way in which the NHS is able to link a whole lifetime of medical records with a person’s genome data and the fact it can do this on a large scale is unique. The richness of this data can help to understand disease and to tease apart the complex relationship between our genes, what happens to us in our lives and illness.

So what can genomics do? You can use it to predict how well a person will respond to a treatment or find one that will work best for them – so-called personalised medicine. A good example in use already is whether or not a woman’s breast cancer is HER2 positive. If it is, Herceptin will be very effective for her but not for someone who doesn’t have HER2. You can also use genomics to test how well a cancer might respond to radiotherapy. For some that can mean far fewer radiotherapy sessions. Or you can use it to find the 30,000 people who currently use insulin for their Type 1 diabetes but would do better on simple tablets. Genomics can be used to track infectious disease, precisely pinpointing the source and nature of an outbreak by looking at the whole genomes of bugs. The potential of genomics is huge, leading to more precise diagnostics for earlier diagnosis, new medical devices, faster clinical trials, new drugs and treatments and potentially, in time, new cures.

The supersonic age of genomics has begun. And just as the NHS has been at the forefront of scientific breakthroughs before, the NHS is at the forefront again, with its patients benefiting from all that genomics offers, becoming the first mainstream health service in the world to offer genomic medicine as part of routine care for NHS patients.

In late 2012, Prime Minister David Cameron announced the 100,000 Genomes Project.

Genomics England, a company wholly owned and funded by the Department of Health & Social Care, was set up to deliver this flagship project and sequence 100,000 whole genomes from NHS patients, something that at the time no one in the world had even attempted. Its four main aims were: to create an ethical and transparent programme based on consent; to bring benefit to patients and set up a genomic medicine service for the NHS; to enable new scientific discovery and medical insights; and to kick-start the development of a UK genomics industry.

The project focused on patients with a rare disease and their families and patients with cancer. The first samples for sequencing were being taken from patients living in England with discussions taking place with Scotland, Wales and Northern Ireland about potential future involvement.

In the UK, just under 160,000 people died from cancer in 2011, with over 330,000 new cases reported every year. Because cancer is more likely to occur as people age, we expect the number of cancer cases to rise as people live longer. And although rare diseases are individually very uncommon, because there are between 5000 and 8000 of them, a surprisingly large number of people are affected in total – 3 million – or, put another way, one in 17 (or between 6 and 7 percent) of the UK population. Genomics has great potential for both because both rare disease and cancer are strongly linked to changes in the genome. Cancer begins because of changes in genes within what was a normal cell. Although a cancer starts with the same DNA as the patient, it develops mutations or changes which enable the tumour to grow and spread. By taking DNA from the tumour and DNA from the patient’s normal cells and comparing them, the precise changes can be detected. Knowing and understanding them strongly indicates which treatments will be the most effective. Genomics has already started to guide and inform doctors about the best treatment for individual patients. We’ve already mentioned Herceptin for HER2 positive breast cancer but we are only at the beginning. Many more cancer types, including those for which there are hardly any successful current treatments, such as lung cancer, could be helped if only we knew which gene changes were important.

At least 80 percent of rare diseases are genomic with half of new cases found in children. Knowledge of the whole genome sequence may identify the cause of some rare diseases and help point the way to new treatments for these devastating conditions – vital progress given that some rare diseases take two or more years just to identify. As most rare diseases are inherited, the genomes of the affected individual (usually a child) plus two of their closest blood relatives were included to pinpoint the cause of the condition.

In all, it was anticipated that about 75,000 people would be involved. The numbers added up like this: 50,000 genomes from cancer – two per patient, therefore 25,000 patients. 50,000 from rare disease – three per patient (affected person plus two blood relatives) – therefore roughly 17,000 rare disease patients. There was an extraordinary response by patients and their families wanting to take part in this ground-breaking project.
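The arithmetic behind these planning figures can be checked with a short sketch. The variable names are illustrative, and the reading that the 75,000 total is made up of 25,000 cancer patients plus the 50,000 people who each contribute one rare disease genome is an assumption drawn from the figures above, not something the project states in these terms.

```python
# Planning figures quoted in the text above.
CANCER_GENOMES = 50_000
GENOMES_PER_CANCER_PATIENT = 2          # two genomes per cancer patient
RARE_DISEASE_GENOMES = 50_000
GENOMES_PER_RARE_DISEASE_PATIENT = 3    # affected person plus two blood relatives

cancer_patients = CANCER_GENOMES // GENOMES_PER_CANCER_PATIENT
rare_disease_patients = round(RARE_DISEASE_GENOMES / GENOMES_PER_RARE_DISEASE_PATIENT)

# Assumption: each rare disease genome comes from a different person
# (the patient or a relative), so 50,000 genomes means 50,000 people.
people_involved = cancer_patients + RARE_DISEASE_GENOMES

print(cancer_patients)        # 25000
print(rare_disease_patients)  # 16667, i.e. roughly 17,000
print(people_involved)        # 75000
```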

Today, we have sequenced over 100,000 genomes from over 97,000 patients and their family members, totalling over 21 petabytes of data – 1 petabyte of music would take 2,000 years to play on an MP3 player.

Some patients involved in the 100,000 Genomes Project have already benefitted (see First patients diagnosed through the 100,000 Genomes Project), because a better treatment has been identified for them or their condition has been diagnosed for the first time. However, for most, the benefit will be in knowing that they will be helping people like them in the future through research on the genome data they generously allow to be studied. All will know that, because of their involvement, an infrastructure will be developed which, in the future, will enable the NHS to offer genomic services much more widely, to any patient who might benefit.

To make genomics a reality for the NHS it has to be of high quality, fast and affordable with results that are readily understood. How was this achieved?

The sequencing challenge

Genomics England invested in the latest, state-of-the-art sequencing machines to sequence the 100,000 genomes in the project. Because it was the first time sequencing had been attempted at such a scale in the UK, it was assumed that sequencing would be the most difficult part of the project. Whilst it wasn’t without its challenges, thanks to the support of our partner Illumina it proved to be less difficult than we had anticipated.

The data challenge

Data was a major challenge on two fronts. The first step after sequencing is to compare the patient’s genome with a reference genome to find the possibly millions of differences between them, a process called variant calling. The next hurdle – annotation – is to interpret the meaning of those differences and work out which of them are important. Some of the differences will just be natural harmless variations between individuals, but some will be damaging and almost certainly involved in the development of disease. Automating this process – creating the Genomics England pipeline – so that it took weeks rather than years, was very difficult.
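The two steps described above can be illustrated with a deliberately tiny sketch. This is not the Genomics England pipeline: real variant callers work from millions of aligned sequencing reads and handle insertions and deletions, whereas this toy version only compares two equal-length strings. The function names, sequences and the `KNOWN_EFFECTS` table are all hypothetical.

```python
def call_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this toy example only handles equal-length sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical annotation table: what is "known" about particular differences.
KNOWN_EFFECTS = {
    (4, "G", "T"): "damaging",
    (7, "C", "G"): "harmless",
}


def annotate(variants):
    """Attach an interpretation to each variant, defaulting to 'unknown'."""
    return [(variant, KNOWN_EFFECTS.get(variant, "unknown")) for variant in variants]


reference = "ACTGACCA"
sample = "ACTTACGA"

variants = call_variants(reference, sample)   # step 1: variant calling
annotated = annotate(variants)                # step 2: annotation

for variant, effect in annotated:
    print(variant, effect)
```

Even in this toy form, the second step depends entirely on what is already known about each difference, which is why interpretation is so much harder to automate than the comparison itself.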

The second big data problem was that information about a person and details of their illness are needed for interpretation. It’s a bit like being able to measure the amount of haemoglobin in a sample of blood but not being able to say whether it is normal without knowing more about the person who donated it – were they a child or an adult for instance? Getting data from the NHS in such a way that it all followed the same ‘rules’ (so you knew you were comparing apples with apples) was very challenging, but the NHS staff involved worked incredibly hard to make it happen.

Another data issue is its size. The raw data from one genome is about 200GB which would occupy most of the average laptop’s hard drive. Just the annotations would easily fill a DVD by themselves. This mountain of data needs to be sifted, analysed and presented in a way that is helpful to doctors, most of whom will not have specialist knowledge of gene changes.
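As a rough sense of scale, the per-genome figure above can be multiplied up. This is a back-of-the-envelope sketch only: it assumes decimal units (1 petabyte = 1,000,000 gigabytes) and that every genome takes about 200GB, and it ignores annotations, clinical data and duplicate copies.

```python
GB_PER_GENOME = 200        # raw data per genome, from the text above
GENOMES = 100_000          # size of the project
GB_PER_PB = 1_000_000      # assumption: decimal units

total_pb = GB_PER_GENOME * GENOMES / GB_PER_PB
print(total_pb)  # 20.0
```

The result is of the same order as the figure of over 21 petabytes quoted earlier in this article.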

The cancer challenge

At one point, the cancer programme had to be halted because it became clear that the usual methods used in the collection and analysis of cancer tissues, such as fixing them in formalin and then embedding them in paraffin (FFPE), damaged DNA. We had to find other ways to preserve samples, and this is where we decided to use fresh frozen tissue. Again, the NHS was magnificent in responding to this challenge, which required completely reconfiguring how samples were collected.

The security challenge

The genome data is large in size and also precious. It is stored securely and respectfully, with rigorous conditions for access in which the public can have confidence. Access is controlled by a wide range of security measures as well as detailed governance. Participants are involved in deciding which researchers are allowed to access the data.

Each one of these challenges involved science at the cutting edge in a field that continues to move very rapidly, and doing it all at a scale never seen before. Genomics England had to be very flexible, changing its plans frequently to reflect new advances but also being humble enough to learn from things that didn’t go right, especially in the pilot stage. We learned a huge amount from patients and clinicians.

Delivering benefit to patients

The 100,000 Genomes Project delivered clinical benefits to patients, but an additional and critically important spin-off is the value of this huge amount of data to researchers. This includes those wanting to understand more about the genome itself but also those wanting to develop new treatments, diagnostics, devices and medicines. Researchers can be academics as well as those from life science industries. These are not just well-known, big pharmaceutical and biotechnology companies but also a great number of innovative small and medium enterprises (SMEs) working in machine learning, data management and software.

Some people feel companies should not benefit commercially from patients who have donated their genome data without receiving any payment. Others worry that participants’ data might not be secure, that they could be identified if they take part, or that their data could be used by researchers in a way that is not fair.

Commercialisation and who benefits?

Patients donate their samples and information using models of informed consent which have been approved by an independent NHS ethics committee. Download the approved protocol for more details. Patients have explicitly been asked if they are willing for commercial companies to be able to conduct approved research on their data. Those people who have already generously consented to take part understand the challenges about sharing data in their own case but they are keen to see their data used to help progress research into the condition that affects them. If innovative treatments are to be found to extend or save lives then commercial companies will need to invest in the research, development and manufacture of new drugs and diagnostic tests. It has always been the case that this work is carried out in the commercial sector and not by government or within the NHS itself.

Genomics England is developing ways of charging for its data services to ensure that the costs of maintaining the data are shared with companies and that the UK taxpayer will benefit should companies successfully develop drugs, devices, treatments, diagnostic tests or other services through its use. If successful products are developed, it means that patients are benefiting. Bespoke arrangements will be made with each company that uses the data if they are able to develop commercial products because of it.

Ethical issues

The 100,000 Genomes Project put ethics at its heart from the outset. Without doing this, it would not have been possible to develop a service that the NHS could use. High standards of ethical practice continue to underpin the NHS Genomic Medicine Service. Genomics England has its own independent Ethics Advisory Committee which advises the Genomics England board on the ethical aspects of everything Genomics England does. Issues already scrutinised include what information patients should receive about their results as well as policies on consent. A series of engagement and involvement activities with patients, clinicians and other groups about these issues has been undertaken. The outputs of these discussions are available here.

Privacy and confidentiality issues

Any relevant information about a patient will be returned to their doctor. Access to Genomics England’s data services by other medical researchers and companies is conditional on first passing a rigorous ethical review and then having their research proposal approved by Genomics England’s Access Review Committee, using policies developed by our Ethics Advisory Committee. Insurers and marketing companies are not allowed access to the data.

Oversight by the Genomics England Data Advisory Committee will ensure that any researchers wanting access to data will go through rigorous identity checks and their use of the data will be closely supervised. No raw genome data can be taken away. The data will be kept within Genomics England’s data structures and will be constantly under its control. Genomics England commits itself to constant testing and re-testing of its security systems to ensure data safety.

While Genomics England has the data, patient identifiers (such as NHS number or postcode) are removed to reduce the risk that clinical and genomic information could be linked back to a particular individual. Only when data is used for a patient’s own care will identifiable data be made available to the patient’s doctor and medical team. Patients are told that participant anonymity cannot be absolutely guaranteed as, in theory, any non-trivial piece of health records data can be re-identified by someone who already has access to sufficiently detailed information about an individual, for instance, social media posts. In practice, this is still very hard to do and harder still to achieve undetected. Genomics England can’t promise that no researcher would be able to do this but what it can promise is that it will be made so difficult that there would be far easier ways to achieve the same goal. Re-identifying patients is also illegal.
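The idea of removing identifiers for research while keeping them available for a patient’s own care can be sketched as follows. This is an illustration only and not a description of Genomics England’s systems: the field names, the random participant ID and the separate link table are all assumptions, and the example record is made up.

```python
import secrets

# Identifiers named in the text above; the field names are hypothetical.
DIRECT_IDENTIFIERS = {"nhs_number", "postcode"}


def deidentify(record: dict, link_table: dict) -> dict:
    """Return a copy of the record without direct identifiers.

    The removed identifiers are kept in a separate link table, keyed by a
    random participant ID, so that the care team (and only the care team)
    could re-link results to the patient.
    """
    participant_id = secrets.token_hex(8)
    link_table[participant_id] = {
        key: record[key] for key in DIRECT_IDENTIFIERS if key in record
    }
    research_view = {
        key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS
    }
    research_view["participant_id"] = participant_id
    return research_view


link_table = {}  # would be held separately, under much stricter access control
record = {
    "nhs_number": "000 000 0000",   # made-up value
    "postcode": "ZZ99 9ZZ",         # made-up value
    "condition": "rare disease",
    "birth_weight_g": 3400,
}

research_view = deidentify(record, link_table)
print(sorted(research_view))
```

As the text notes, removing direct identifiers reduces the risk of re-identification but cannot eliminate it, because the remaining clinical details may still be distinctive when combined with outside information.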

Genomics England is talking constantly to patients about their concerns to make sure that any issues they may have are addressed. Patients have been involved from the outset and are at the very heart of this project. In particular, the commitment to consent is of paramount importance.

It is not just patients and the NHS that stand to benefit from the 100,000 Genomes Project. There will be numerous knock-on advantages for the country. An example from the past of how a major infrastructure project produced widespread benefit beyond that intended might be the introduction of the railways in the Victorian era. Individuals and families benefited from cheap travel but the infrastructure created by the new railways also triggered an economic boom. Whilst the growth of some companies, say those making railway tracks, was predicted, other economic benefits were not. For instance, there was a boom in holiday travel, resulting in the development of seaside towns, of hotels and even a boom in travel guides.

The 100,000 Genomes Project has some parallels. Whilst primarily for the benefit of people who are sick, there are potentially many economic benefits for the nation. We can be certain of benefits such as new medicines and diagnostic tests but just as with railways, some of the companies that may develop will be unexpected, built on new, as yet undiscovered technologies that will emerge over the next five years.

The 100,000 Genomes Project was not guaranteed to succeed, in the same way that there was no guarantee for the railways. So only the government has been willing to take the risk and make the necessary investment in it. And just as Victorian England with its great engineers was the perfect place for the birth of the railways, the UK, which not only leads the world in life sciences but has the unique benefit of the NHS, is the best place in the world to initiate the practical use of genome sequencing and interpretation for patient benefit. Our vision was one where the UK is the leader in a new industry where genomics is used to help patients get better, more personalised care and treatment. This has happened, with Genomics England having a global reputation.

The NHS has been preparing to use genomics as part of its routine care. It needed more scientists, geneticists and doctors, and these have been trained to interpret the data and understand what it means for a patient’s medical condition. In parallel with Genomics England’s work, a skills and training programme for workers in the NHS was set up by Health Education England.

The 100,000 Genomes Project has used the generosity of patients and the outstanding skills and talent found in the medical and life sciences sectors in the UK to help deliver this project. Genomics England’s legacy is a genomics service that has been adopted by the NHS, high ethical standards and public support for genomics, new medicines, treatments and diagnostics and a country which hosts the world’s leading genomic companies.

Genomics England is wholly owned by the Department of Health & Social Care.

The 100,000 Genomes Project is mainly funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also generously funded research and infrastructure in the programme.