
Research Environment Training Sessions: Building rare disease cohorts with matching controls

Past event


Location Online

Please note that this event has already passed.

Building a cohort is a vital first step in many kinds of genomics studies, such as GWAS, survival analysis and identifying cancer characteristics. The wide range of phenotypic data available in the Genomics England Research Environment, covering both the disease each participant was recruited for and their electronic health records, is a great resource for cohort building and verification.

This training session will go over some of the ways you can build cohorts in the Genomics England Research Environment: Participant Explorer for point-and-click creation, and the Labkey API for programmatic construction and verification. Using both methods, we will show how to retrieve the genomic file locations, or the participant identifiers to use with variant aggregation files. During the session, we will discuss the database tables that contain phenotypic data, using ICD10 and HPO codes for diagnoses in the primary and secondary tables, plus continuous measurements in rare disease. We will also look at how to build cohorts matched on sex and ethnicity.
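As a rough illustration of what matching on sex and ethnicity can look like, here is a minimal Python sketch. It is not the method taught in the session. It assumes that case and candidate-control records have already been retrieved and held as plain dictionaries, and the field names (`participant_id`, `sex`, `ethnicity`), the `match_controls` helper and the example values are all hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, pool, keys=("sex", "ethnicity"), ratio=2, seed=0):
    """Pick up to `ratio` controls per case with identical values for `keys`.

    `cases` and `pool` are lists of dicts. The field names are hypothetical
    and would need to be mapped to the real table columns.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group candidate controls into strata by their matching-key values.
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for members in strata.values():
        rng.shuffle(members)

    matched = {}
    for case in cases:
        candidates = strata[tuple(case[k] for k in keys)]
        # pop() removes each chosen control, so no control is used twice.
        picked = [candidates.pop() for _ in range(min(ratio, len(candidates)))]
        matched[case["participant_id"]] = [c["participant_id"] for c in picked]
    return matched


# Hypothetical example records.
cases = [
    {"participant_id": "C1", "sex": "Female", "ethnicity": "A"},
    {"participant_id": "C2", "sex": "Male", "ethnicity": "B"},
]
pool = [
    {"participant_id": "P1", "sex": "Female", "ethnicity": "A"},
    {"participant_id": "P2", "sex": "Female", "ethnicity": "A"},
    {"participant_id": "P3", "sex": "Female", "ethnicity": "A"},
    {"participant_id": "P4", "sex": "Male", "ethnicity": "B"},
]

matched = match_controls(cases, pool)
# C2 has only one eligible control in the pool, so it receives one, not two.
print(matched)
```

Exact matching within strata is the simplest option. A case whose stratum runs out of controls ends up with fewer than `ratio` matches, so it is worth checking the number of controls per case before analysis.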

You may attend this session only if you are eligible for data access. This means that you are a GECIP or Discovery Forum member who has met the necessary verification checks and passed our Information Governance training course. If you do not meet these criteria by 19 June 2023, you will be unregistered from this session.

You can find materials from past training sessions and information on upcoming training sessions on the Genomics England Research Environment User Guide.


13.30 Introduction and admin

13.35 Parameters and considerations for building a cohort

13.45 Point-and-click cohort building with Participant Explorer

13.55 Labkey tables for cohort building in rare disease

14.05 Using the Labkey API in Python and R

14.15 Creating a matched cohort

14.25 Getting genomic filepaths for your cohort

14.35 Using your cohort with aggregate VCFs and bcftools

14.45 Getting help and questions

Learning objectives

After this training you will know:

  • Where to find phenotypic and covariate data for building cohorts in the Genomics England Research Environment
  • How to create cohorts using the Participant Explorer point-and-click interface
  • How to use the Labkey API to create and verify cohorts with Python or R

Target audience

This training is aimed at researchers:

  • working with the Genomics England Research Environment
  • working in rare disease genomics
  • who can programme in Python and/or R (a small segment of the training is suitable for non-programmers)
