
Genomics England Research Environment Training Session: Building a cohort based on phenotypes

Past event


Location: Online

Please note that this event has already passed.

Building a cohort is a vital first step in many kinds of genomics studies, such as GWAS, aggregate variant testing and identifying cancer characteristics. The wide range of phenotypic data available in the Genomics England Research Environment (GEL RE), covering both recruited disease and electronic health records, is a valuable resource for building and verifying cohorts.

You may attend this session only if you are eligible for data access. This means that you are a GECIP or Discovery Forum member who has met the necessary verification checks and passed our Information Governance training course. If you do not meet these criteria by 23rd May 2022, you will be unregistered from this session.

This training session will cover two ways to build cohorts in the GEL RE: Participant Explorer for point-and-click creation, and the LabKey API for programmatic construction and verification. For both methods, we will show how to pull out the genomic file locations, or the participant identifiers to use with variant aggregation files.
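As a rough illustration of the last step, the sketch below writes a list of cohort identifiers to a file and builds (without running) a bcftools command that subsets an aggregate VCF to those samples. The identifiers and file names are hypothetical placeholders, and the sketch assumes that the sample names in the VCF match the identifiers you supply, which may not hold for every aggregate file.

```python
from pathlib import Path
import tempfile


def write_sample_list(sample_ids, path):
    """Write one identifier per line, de-duplicated in first-seen order."""
    unique_ids = list(dict.fromkeys(sample_ids))
    Path(path).write_text("\n".join(unique_ids) + "\n")
    return unique_ids


def bcftools_view_command(vcf_path, samples_file, out_path):
    """Build (but do not run) a bcftools command that keeps only the listed samples."""
    return [
        "bcftools", "view",
        "--samples-file", str(samples_file),  # one sample name per line
        "--output-type", "z",                 # write compressed VCF
        "--output", str(out_path),
        str(vcf_path),
    ]


# Hypothetical identifiers and file names, for illustration only.
cohort_ids = ["SAMPLE_A", "SAMPLE_B", "SAMPLE_A"]
samples_file = Path(tempfile.mkdtemp()) / "cohort_samples.txt"
written = write_sample_list(cohort_ids, samples_file)
command = bcftools_view_command("aggregate.vcf.gz", samples_file, "cohort.vcf.gz")
print(" ".join(command))
```

The command is returned as a list so that it could be passed to `subprocess.run` once the paths point at real files.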

During the session, we will discuss the tables in the database which contain phenotypic data, using ICD-10 codes for diagnoses in the primary and secondary tables, plus other parameters such as staging and treatment in cancer, or continuous measurements in rare disease. We will also look at covariates that you may wish to consider, such as age at diagnosis, sex and ethnicity.
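The general idea of selecting participants by ICD-10 code and attaching covariates can be sketched in plain Python. This is a minimal sketch, not the session's material: the rows are synthetic, and the column names (`participant_id`, `icd10`, `diagnosis_date`, `year_of_birth`, `sex`) are hypothetical and do not reflect the actual table schema in the GEL RE.

```python
from datetime import date

# Synthetic rows standing in for a diagnosis table (hypothetical columns).
diagnoses = [
    {"participant_id": "P001", "icd10": "C50.9", "diagnosis_date": date(2015, 6, 1)},
    {"participant_id": "P002", "icd10": "I10", "diagnosis_date": date(2012, 3, 15)},
    {"participant_id": "P003", "icd10": "C50.1", "diagnosis_date": date(2018, 11, 30)},
    {"participant_id": "P001", "icd10": "C50.9", "diagnosis_date": date(2016, 1, 10)},
]

# Synthetic participant-level covariates (hypothetical columns).
participants = {
    "P001": {"year_of_birth": 1960, "sex": "Female"},
    "P002": {"year_of_birth": 1950, "sex": "Male"},
    "P003": {"year_of_birth": 1975, "sex": "Female"},
}


def normalise(code):
    """Strip the dot and upper-case so 'c50.9' and 'C509' compare equal."""
    return code.replace(".", "").upper()


def build_cohort(diagnoses, participants, icd10_prefixes):
    """Select participants with a matching code and attach simple covariates."""
    prefixes = tuple(normalise(p) for p in icd10_prefixes)
    earliest = {}  # participant_id -> earliest matching diagnosis date
    for row in diagnoses:
        if normalise(row["icd10"]).startswith(prefixes):
            pid = row["participant_id"]
            if pid not in earliest or row["diagnosis_date"] < earliest[pid]:
                earliest[pid] = row["diagnosis_date"]
    cohort = []
    for pid in sorted(earliest):
        info = participants[pid]
        cohort.append({
            "participant_id": pid,
            "sex": info["sex"],
            # Approximate age: year of first matching diagnosis minus birth year.
            "age_at_diagnosis": earliest[pid].year - info["year_of_birth"],
        })
    return cohort


cohort = build_cohort(diagnoses, participants, ["C50"])
for row in cohort:
    print(row)
```

Matching on a code prefix such as `C50` picks up all of its subcodes, and taking the earliest matching record per participant avoids counting repeat entries twice.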


14.00 Welcome and introduction
14.05 Parameters and considerations for building a cohort
14.15 Point-and-click cohort building with Participant Explorer
14.25 LabKey tables for cohort building in cancer, rare disease and common disease
14.35 Covariates for cohort building
14.45 Using the LabKey API in Python and R
14.55 Getting genomic filepaths for your cohort
15.05 Using your cohort with aggregate VCFs and bcftools
15.15 Questions

Learning objectives

After this training you will know:

  • Where to find phenotypic and covariate data for building cohorts in the Genomics England Research Environment
  • How to create cohorts using the Participant Explorer point-and-click interface
  • How to use the LabKey API to create and verify cohorts with Python or R
