The first attempt to sequence a whole human genome – all 3 billion letters of the genetic code – took 13 years. Super-fast new technology means this can now be done in as little as 24 hours.
In practice, because we batch genomes together for efficiency, it takes us 3 days to sequence a whole genome.
But sequencing is only the beginning.
When we look at your genome, we are looking for a needle – a glitch – in a vast haystack. The first thing we need to do is make the haystack a bit smaller.
Luckily human genomes are 99.8% the same. But that’s still around 4 million potential differences, most of which are healthy variations that make us the individuals we are.
Sequencing your genome produces two files of info. One is the raw data – all six billion letters (you carry two copies of the 3 billion, one from each parent). The other is what’s called a variant call file. That’s the 4 million. This is the ‘small’ haystack we now work with.
The process so far has largely been automatic. (Though at the time we started this project, no-one in the world had sequenced 100,000 whole genomes – and we’ve made something sound easy that even 5 years ago would have been thought impossible. Hats off to Illumina who do this for us in Cambridge.)
Next comes the bit that takes the time.
Bioinformaticians – scientists who are brilliant at organising information and spotting patterns – trawl through the 4 million, looking for the glitches that might possibly account for someone’s symptoms. They work out how each one might affect a person and pull out many hundreds of potential ‘needles’ – changes that might possibly be responsible for a problem. This bit is called ‘annotation’.
But they’ve still got hundreds and hundreds of potential suspect glitches. A small haybale’s worth.
Some of these now get discarded thanks to a filtering system which uses special tools that can access huge databases of knowledge. Out for instance go changes where there’s good evidence that they’re commonly found in the population and don’t cause a problem. Out also go changes that don’t fit the disease in your family.
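For readers who like to see this kind of filtering written down, here is a minimal sketch in Python. It is an illustration only, not the actual Genomics England software: the `Variant` fields, the gene names and the 1% cut-off are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical cut-off: changes carried by more than 1% of people count as common.
COMMON_FREQUENCY = 0.01


@dataclass
class Variant:
    gene: str                    # made-up gene name, for illustration only
    population_frequency: float  # share of people in a reference database with this change
    fits_family_pattern: bool    # does the change track with the condition in the family?


def filter_variants(variants):
    """Keep only rare changes that fit the pattern of disease in the family."""
    return [
        v for v in variants
        if v.population_frequency <= COMMON_FREQUENCY and v.fits_family_pattern
    ]


candidates = [
    Variant("GENE_A", 0.30, True),     # common in the population: discarded
    Variant("GENE_B", 0.0001, False),  # rare, but does not fit the family: discarded
    Variant("GENE_C", 0.0002, True),   # rare and fits the family: kept
]
remaining = filter_variants(candidates)
print([v.gene for v in remaining])
```

In this toy example only the third change survives both checks. The real filters draw on far larger databases and more evidence than a single frequency number.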
This is where knowledge of your symptoms becomes key. The question we ask here is how likely are the changes we’ve identified so far to be the cause of your symptoms? We use what are called ‘panels’. This is where we compare your changes against a set of changes already known to be involved with a particular condition. We also use external experts in ‘clinical interpretation’ to look for anything our panels have missed.
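The panel comparison can be pictured in the same way. The sketch below is a simplified illustration in Python, assuming a panel is just a set of names already linked to a condition; the panel contents, the names and the `split_by_panel` helper are hypothetical and not taken from any real panel.

```python
# Hypothetical panel: names already known to be involved in a made-up condition.
EXAMPLE_PANEL = {"GENE_C", "GENE_D", "GENE_E"}


def split_by_panel(found, panel):
    """Separate changes that match the panel from those left for expert review."""
    on_panel = [name for name in found if name in panel]
    off_panel = [name for name in found if name not in panel]
    return on_panel, off_panel


on_panel, off_panel = split_by_panel(["GENE_C", "GENE_X"], EXAMPLE_PANEL)
print(on_panel)   # matches the panel
print(off_panel)  # not on the panel, so left for the clinical interpretation experts
```

Anything that does not match the panel is not thrown away here. It is kept aside, which mirrors the step where external experts look for what the panels have missed.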
This part of the process consults online gene encyclopaedias, built up by thousands of scientists and doctors, all over the world, over many years. It also uses clever computer programs that recognise patterns of symptoms seen in people where we know the cause of their problems.
But we may draw a blank. Could this be something new?
We send our report back to your specialist at the hospital where you entered the project. Here, NHS scientists and doctors pore over the list of potential glitches we’ve produced. They are usually clinical geneticists, but may also be doctors with an interest in inherited problems in their own specialty, like cardiology or ophthalmology. They use their knowledge of family patterns and the way genes are inherited to compare your symptoms with the suspect glitch list. They mark each glitch as likely or not likely to be the cause of your particular condition.
Where there is a likely result that will be fed back to you, they repeat a test for that finding. They do this because our processes are new and they need to be sure we’ve got our answers right. This quality checking will go on for a while.
The NHS scientists and doctors who review our report may decide that none of the glitches on our list is likely. Or they may decide that some are suspect but, because no-one else has reported them as associated with your type of symptoms, each one goes down as a ‘potential query’.
We don’t tell you about potential queries. That’s because we could have got it completely wrong and we need to be much, much more sure. After all, if we were wrong, you could be given entirely the wrong treatment or advice. And that’s not right.
For a lot of patients, there will be no glitch identified. It doesn’t mean that there isn’t one. We just haven’t found it yet. But we carry on looking.
And remember those queries? They go back into the system’s encyclopaedias and back to Genomics England’s specialist clusters of experts (they’re called GeCIP domains). If there is another report, from anywhere in the world, of the same glitch causing a problem like yours, your ‘potential query’ suddenly looks a lot more likely. And we will get back to you. No matter how long it takes.
Genomics England is building a pipeline in which most of the processes we’ve told you about here will become automatic. No-one, anywhere in the world, has ever done this before and it’s really hard.
We’ve begun to produce results using a semi-automatic process. We’re switching over to fully automatic soon. We will get quicker and quicker. But if you are one of the first who took part, sorry and thank you. Sorry because it is taking so long for you but thank you – for bearing with us and because without your involvement we wouldn’t have been able to start trying to automate this and make it quicker for everyone else.
In the end, we want results to turn round in just over a month. At the moment, it is taking a year or more if you were in the pilots. Sorry again. And thanks.
We’re getting the pilot results back first.
Then we’ll begin returning the main programme results. It won’t be in a strictly first in, first back order. That’s because some are really tricky and take us extra time whereas others – particularly where we have genomes from mum, dad and child – are less hard and we can get them back faster.
We know how important these results are so we’re doing them as speedily as we can, but we are devising completely new systems for doing this that no-one else in the world has ever attempted before. It is really difficult.
And if you ticked the ‘additional findings’ box, these will be returned to you separately at a later date. We can’t tell you at this stage when that will be.