Cool stuff the Ensembl VEP can do: highlighting likely causal GWAS variants with PostGAP

A GWAS generally identifies the haplotype blocks that contain your variant of interest, rather than the causal variant itself and the gene it affects. To find the genes actually involved, you need to consider all variants in linkage disequilibrium (LD) with your identified associations. The Ensembl Post-GWAS analysis pipeline (PostGAP) can provide automatic fine-mapping of your GWAS variants, incorporating regulatory information and population-wide LD calculations, along with your VEP results.

Variants linked to single-gene disorders can be prioritised by filtering on allele frequency, severe variant consequences and genes with known phenotypes. Common diseases and phenotypes, such as those studied by GWAS, require a much more complex approach. Common diseases are thought to be caused by the accumulation of many common variant alleles that subtly affect gene expression or activity, so you expect your alleles to be common in the population and you don’t always expect to see your variants in genes. On top of this, common variants are generally in LD with other variants, which makes it even harder to pin down which one is actually affecting your phenotype and which ones are just along for the ride. Indeed, the causal variant may not be the one found by your GWAS, but one in LD with a GWAS top hit.
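To make the single-gene filtering strategy concrete, here is a minimal Python sketch. It is only an illustration: the record layout, the variant and gene names, the 1% frequency cut-off and the choice of "severe" consequence terms are all assumptions made for this example, not something VEP or PostGAP prescribes.

```python
# Sequence Ontology consequence terms treated as "severe" for this sketch
# (an illustrative choice, not an official list).
SEVERE_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def prioritise(variants, phenotype_genes, max_af=0.01):
    """Keep rare, severe variants that fall in genes with a known phenotype.

    max_af=0.01 is an assumed cut-off for "rare".
    """
    return [
        v
        for v in variants
        if v["af"] <= max_af
        and v["consequence"] in SEVERE_CONSEQUENCES
        and v["gene"] in phenotype_genes
    ]


# Hypothetical annotated variants (names and values are made up).
variants = [
    {"id": "var1", "af": 0.0002, "consequence": "stop_gained", "gene": "GENE_A"},
    {"id": "var2", "af": 0.2500, "consequence": "stop_gained", "gene": "GENE_A"},
    {"id": "var3", "af": 0.0001, "consequence": "intron_variant", "gene": "GENE_A"},
    {"id": "var4", "af": 0.0003, "consequence": "frameshift_variant", "gene": "GENE_B"},
]

candidates = prioritise(variants, phenotype_genes={"GENE_A"})
candidate_ids = [v["id"] for v in candidates]
```

Only `var1` passes all three filters here. A common, non-coding GWAS variant would be discarded at the first two steps, which is why this strategy does not carry over to common diseases.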

We’ve worked with our EBI colleagues from the NHGRI-EBI GWAS Catalog and Open Targets to develop PostGAP. PostGAP considers all the variants in LD with your variants of interest, based on 1000 Genomes genotypes. These variant clusters are analysed for variant-gene interactions: as well as the usual mapping of variants to GENCODE genes, regulatory activity is analysed to identify genes whose expression is affected by the variants, incorporating data from GTEx, RegulomeDB, FANTOM, DHS and PCHiC. Known phenotypes associated with the variants in previous GWAS are also identified, with statistics on those associations taken from the NHGRI-EBI GWAS Catalog.
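The idea of gathering every variant in LD with a GWAS hit can be sketched in a few lines of Python. This is not PostGAP's own code: the genotype vectors, the variant names and the r² threshold of 0.7 are assumptions made for illustration, and r² is computed here as the squared correlation between genotype dosages (0, 1 or 2 copies of the alternate allele per individual).

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic variant: LD is undefined, treat as no LD
    return cov * cov / (var_x * var_y)


def ld_cluster(lead, genotypes, threshold=0.7):
    """Return the lead variant plus every variant with r^2 >= threshold to it.

    threshold=0.7 is an assumed cut-off for this sketch.
    """
    lead_dosages = genotypes[lead]
    return sorted(
        variant
        for variant, dosages in genotypes.items()
        if r_squared(lead_dosages, dosages) >= threshold
    )


# Hypothetical dosages for eight individuals (values are made up).
genotypes = {
    "rs_lead": [0, 1, 2, 1, 0, 2, 1, 0],
    "rs_proxy": [0, 1, 2, 1, 0, 2, 1, 1],
    "rs_distant": [2, 0, 1, 1, 2, 0, 0, 1],
}

cluster = ld_cluster("rs_lead", genotypes)
```

With these toy values the cluster contains `rs_lead` and `rs_proxy`, while `rs_distant` falls below the threshold. Every variant in such a cluster is a candidate for the causal variant, which is why the downstream variant-gene evidence is needed to rank them.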

Data from numerous sources are incorporated into the PostGAP pipeline

These data are pre-computed for known variants and can be accessed by running the PostGAP plugin along with your VEP script. PostGAP can also be run as a Python script directly on your GWAS results, or used to find known GWAS associations with a particular trait.