Research Environment Training Sessions: Finding participants based on genotypes
For many analyses, you may start with one or more genes and want to find all participants with variants in those genes. Alternatively, you may have specific variant loci and want to find all participants who are heterozygous or homozygous for the alternative allele at those loci.
In this training session, we will look at both point-and-click tools for finding variants and command line tools on the high-performance cluster (HPC), including using GEL-provided workflows.
We will look at the LabKey tiering tables, which list all variants considered plausibly pathogenic, and learn how to filter these by gene or locus. We will use the Integrated Variant Analysis tool (IVA) to search for variants by gene or locus, as well as by other parameters such as proband and parental genotypes, consequences and population frequencies. For each of these variants, we can then retrieve the participants who carry it. The training will also cover how to use APIs to fetch the same data programmatically.
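As a rough illustration of filtering tiered variants by gene or locus, the sketch below assumes the tiering table has already been exported to a tab-separated file (or fetched through the LabKey API) and that it has columns named `participant_id`, `gene`, `chromosome` and `position`. These column names, the participant IDs and the gene names are hypothetical placeholders; check the actual table schema in LabKey before adapting it.

```python
import csv
import io

# Hypothetical column names; the real LabKey tiering table schema may differ.
PARTICIPANT_COL = "participant_id"
GENE_COL = "gene"
CHROM_COL = "chromosome"
POS_COL = "position"


def filter_tiered(rows, genes=(), loci=()):
    """Yield rows whose gene is in `genes` or whose (chromosome, position) is in `loci`."""
    genes = set(genes)
    loci = set(loci)
    for row in rows:
        in_gene = row[GENE_COL] in genes
        at_locus = (row[CHROM_COL], int(row[POS_COL])) in loci
        if in_gene or at_locus:
            yield row


def participants_with_hits(rows, genes=(), loci=()):
    """Return the sorted, de-duplicated participant IDs among the matching rows."""
    return sorted({row[PARTICIPANT_COL] for row in filter_tiered(rows, genes, loci)})


# Made-up example data standing in for an exported tiering table.
EXAMPLE_TSV = (
    "participant_id\tgene\tchromosome\tposition\ttier\n"
    "P001\tGENE_A\tchr1\t1000\tTIER1\n"
    "P002\tGENE_B\tchr2\t2000\tTIER2\n"
    "P003\tGENE_A\tchr1\t1500\tTIER3\n"
    "P004\tGENE_C\tchr3\t3000\tTIER1\n"
    "P001\tGENE_C\tchr3\t3000\tTIER2\n"
)

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
    # Participants with a tiered variant in GENE_A or at chr3:3000.
    print(participants_with_hits(rows, genes={"GENE_A"}, loci={("chr3", 3000)}))
```

A row is kept if it matches either the gene list or the locus list, mirroring the two starting points (genes or loci); with the toy data above this prints `['P001', 'P003', 'P004']`.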
We will also use the Small Variant and SV/CNV workflows, which identify all small and structural variants, respectively, in a list of genes and return the platekeys of participants carrying them. To find individuals with variants at particular loci, we will use bcftools with the aggregated VCF files on the HPC.
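A lookup by locus boils down to reading each sample's genotype (GT) field at the positions of interest. On the HPC, bcftools does this directly on the aggregated VCFs (for example by restricting `bcftools view` or `bcftools query` to a region). The pure-Python sketch below only illustrates the underlying logic on a tiny, made-up VCF and is not a substitute for bcftools on the large aggregated files; the sample names stand in for platekeys and are hypothetical.

```python
def genotype_class(gt):
    """Classify a VCF GT value such as '0/1', '1|1' or './.'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"  # fully or partially missing call
    alt = [a for a in alleles if a != "0"]
    if not alt:
        return "hom_ref"
    if len(alt) == len(alleles) and len(set(alt)) == 1:
        return "hom_alt"  # e.g. 1/1
    return "het"  # e.g. 0/1, or 1/2 (two different alternative alleles)


def carriers_at_loci(vcf_lines, loci):
    """Yield (chrom, pos, sample, genotype_class) for alt-allele carriers at the given loci."""
    wanted = set(loci)
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        if (chrom, pos) not in wanted:
            continue
        # This sketch requires a GT key in the FORMAT column.
        gt_index = fields[8].split(":").index("GT")
        for sample, sample_data in zip(samples, fields[9:]):
            cls = genotype_class(sample_data.split(":")[gt_index])
            if cls in ("het", "hom_alt"):
                yield chrom, pos, sample, cls


# Made-up three-sample VCF for illustration only.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_1\tSAMPLE_2\tSAMPLE_3\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:30\t0/0:25\t1/1:40\n"
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:20\t./.:0\t0|1:33\n"
)

if __name__ == "__main__":
    for hit in carriers_at_loci(EXAMPLE_VCF.splitlines(), [("chr1", 100)]):
        print(hit)
```

For the toy input, the query at chr1:100 reports SAMPLE_1 as heterozygous and SAMPLE_3 as homozygous for the alternative allele, while SAMPLE_2 (homozygous reference) is skipped.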
You may only attend this session if you are eligible for data access. This means that you are a GECIP or Discovery Forum member who has met the necessary verification checks and passed our Information Governance training course. If you do not meet these criteria by 17 July 2023, you will be unregistered from this session.
You can find materials from past training sessions and information on upcoming training sessions on the Genomics England Research Environment User Guide.
13.30 Introduction and admin
13.35 LabKey tables of variant genotypes
13.45 Finding genotypes with IVA
14.00 The Small Variant and SV/CNV workflows
14.15 Aggregated variant files
14.30 Using bcftools on the HPC
14.45 Getting help and questions
After this training you will be able to:
- Identify the LabKey tables that contain tiered variant data
- Use the IVA Variant Browser to filter variants
- Differentiate between the Small Variant and SV/CNV workflows and know when to use each
- Understand the contents of the aggregated variant files: AggV2 and SomAgg
- Run pipelines and tools on the GEL HPC
This training is aimed at researchers:
- working with the Genomics England Research Environment
- working with genetic and genomic variation data
- who can work on the command line to run tools and scripts