Quantitative Methods, Machine Learning and Functional Genomics GeCIP Domain


Project summary

In the field of genomics, technology is outstripping our capacity to analyse the data. The 100,000 Genomes Project raises data analysis and interpretation questions that have not yet been addressed. These are the main focuses of the Quantitative Methods, Machine Learning and Functional Genomics GeCIP domain.


The domain aims to develop and use statistical methods to improve our understanding of the human genome; to identify genetic changes that cause disease; to better understand the relationship between changes in the genome and human disease; and to train the next generation of genome analysts.


Statistical Genomics and Genetic Epidemiology (SGGE)
Prof. Ele Zeggini
Prof. Martin Tobin
The SGGE subdomain comprises 5 main streams of activity:
1) Methods for relating Genotype to Disease;
2) Methods for Functional Genomics;
3) Implementation;
4) Dissemination;
5) Training.
Statistical Machine Learning (SML)
Prof. Chris Holmes
Prof. Chris Yau
The SML subdomain aims to:
1. facilitate the open exchange and sharing of statistical ideas and methods across GEL and ensure best practice in clinical interpretation analysis.
2. work with disease groups and provide access to world-leading experts for the development of novel statistical and computational data analysis algorithms.
3. provide training and support for junior researchers and developers to leave a legacy of world-leading expertise in statistical machine learning in genomics-based healthcare.
Epigenomics (E)
Prof. Vardhman Rakyan
Prof. Stephan Beck
The epigenomic subdomain will focus, in partnership with GEL, on generation of blood DNA methylomic datasets and on the subsequent computational and bioinformatic analysis of these data to identify epigenetic variation of clinical relevance. The activity will focus on the following questions:
(i) Are there disease specific profiles in blood and can they be used for biomarker discovery?
(ii) Are there genetic-epigenetic interactions that could be clinically informative?
(iii) What is the relationship between expression and epigenetic marks?
(iv) How clinically informative is the analysis of epigenetic variation at retroelements?
(v) What are the commonalities in epigenetic profiles among different related diseases and can this information be used for clinical benefit?
(vi) How do patient derived epigenomic profiles compare with other ‘healthy’, richly phenotyped populations contributed by executive and associated members and can such comparisons uncover antecedents of disease?
Building on the initial analyses, we envisage additional research questions that will take advantage of the power of multi-omic approaches in GEL data, and leverage evidence from parallel functional studies in cells or model organisms (e.g. epigenomic engineering approaches).
Transcriptomics and RNA Splicing (TRS)
Prof. Diana Baralle
Prof. Ian Eperon
The TRS subdomain will provide service and research activities. The service activities involve (i) finding and interpreting transcriptome variants associated with disease and (ii) identifying, where possible, how they affect gene function, whether through cis- or trans-acting effects on gene expression. The subdomain will also optimise the processes for assessing the effects of variants on the transcriptome by developing laboratory testing, bioinformatics or transcriptomics. This will provide a resource for future discovery research aimed at better understanding and prediction of the effects of mutations. The above will be achieved through 3 main areas of activity:
· Clinical: variant interpretation with regard to splicing, working closely with the Validation and Feedback domain and clinical domains to optimise return of findings;
· Translational clinical research: including development of processes, algorithms and tools for improved prediction of phenotypic consequences of intronic and exonic variants that alter RNA processing, with follow-up studies of key 'suspicious' variants of unknown significance (VUS), splicing pharmacogenomics and, importantly, transcriptomics;
· Discovery research: including the identification of system-wide effects, investigations of the mechanisms and predictability of deep intronic mutations, identification of splicing events affected by drugs, and identification of common splicing switches that mediate disease and could be targeted therapeutically.

Other Projects