We’re fortunate to be part of the EMBL European Bioinformatics Institute (EBI), which puts us alongside stellar bioinformaticians and resources in every discipline. From this, great collaborations can grow. We’ve already worked with our colleagues at Gene Expression Atlas and Reactome to embed widgets in Ensembl for viewing baseline gene expression and biochemical pathways respectively, but our latest collaboration is with the Protein Data Bank in Europe (PDBe) to show genetic variation on protein structures.
The number of genes and transcripts we have in Ensembl can make your VEP results very big. Filtering your results after running the VEP is the best way to make this more manageable, but you can also reduce the results in your run itself, to only get one result per variant or variant/gene combo.
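To make the idea of "one result per variant" concrete, here is a minimal Python sketch of the post-run equivalent: keeping only the most severe consequence per variant, or per variant/gene combo. The row layout (`variant`, `gene`, `transcript`, `consequences`) and the shortened severity list are assumptions made for illustration, not the VEP's own output format. In the run itself, options such as `--pick`, `--per_gene` and `--most_severe` do the reducing for you, and `--pick` weighs more criteria than consequence severity alone.

```python
# Abbreviated, illustrative severity ranking (most severe first).
# Assumption: the real VEP ordering contains many more consequence terms.
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "upstream_gene_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}

# Hypothetical rows, one per variant/transcript pair.
EXAMPLE_ROWS = [
    {"variant": "rs1", "gene": "GENE_A", "transcript": "T1", "consequences": ["missense_variant"]},
    {"variant": "rs1", "gene": "GENE_A", "transcript": "T2", "consequences": ["intron_variant"]},
    {"variant": "rs1", "gene": "GENE_B", "transcript": "T3", "consequences": ["upstream_gene_variant"]},
    {"variant": "rs2", "gene": "GENE_C", "transcript": "T4", "consequences": ["synonymous_variant"]},
    {"variant": "rs2", "gene": "GENE_C", "transcript": "T5", "consequences": ["stop_gained"]},
]


def most_severe_per_variant(rows, per_gene=False):
    """Keep one row per variant (or per variant/gene pair if per_gene=True)."""
    best = {}
    for row in rows:
        key = (row["variant"], row["gene"]) if per_gene else row["variant"]
        # Unknown terms sort after every known term.
        rank = min(RANK.get(c, len(SEVERITY)) for c in row["consequences"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    # Dicts preserve insertion order, so results follow first appearance.
    return [row for _, row in best.values()]


if __name__ == "__main__":
    for row in most_severe_per_variant(EXAMPLE_ROWS):
        print(row["variant"], row["transcript"], row["consequences"])
```

Passing `per_gene=True` keeps one row for each variant/gene pair instead, which is closer in spirit to what `--per_gene` gives you.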
The VEP can work as an offline or a web tool, and it’s also available as a REST service. Perfect for integrating into pipelines or displaying data on the web, the REST API VEP endpoints can take input as HGVS, genomic loci or variant identifiers, and can interpret common forms of non-standard HGVS. They are all available using both the GET and POST methods, supporting queries on single or multiple variants respectively.
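As a rough sketch of the GET versus POST split, the Python below builds (but does not send) one request of each kind for HGVS input against `https://rest.ensembl.org`. The helper names and the example HGVS strings are ours, chosen for illustration, and we assume JSON is the response format you want; check the REST documentation for the full set of endpoints and parameters.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def build_vep_get(species, hgvs):
    """Build a GET request for a single HGVS notation (hypothetical helper)."""
    # Percent-encode characters such as '>' but keep ':' readable.
    path = urllib.parse.quote(hgvs, safe=":")
    url = f"{SERVER}/vep/{species}/hgvs/{path}"
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )


def build_vep_post(species, hgvs_list):
    """Build a POST request for several HGVS notations (hypothetical helper)."""
    url = f"{SERVER}/vep/{species}/hgvs"
    body = json.dumps({"hgvs_notations": list(hgvs_list)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    single = build_vep_get("human", "ENST00000366667:c.803C>T")
    batch = build_vep_post(
        "human", ["ENST00000366667:c.803C>T", "9:g.22125504G>C"]
    )
    print(single.get_method(), single.full_url)
    print(batch.get_method(), batch.full_url, batch.data.decode("utf-8"))
    # To actually send a request (needs network access):
    #   with urllib.request.urlopen(single) as response:
    #       results = json.load(response)
```

Building the request separately from sending it keeps the sketch runnable offline and makes it easy to inspect what would go over the wire.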
A GWAS generally identifies the haplotype blocks that contain your variant of interest, rather than the variant itself and the gene it affects. To find the actual genes involved, you need to consider all variants in LD with your identified associations. The Ensembl Post-GWAS analysis pipeline (PostGAP) can provide automatic fine-mapping of your GWAS variants, incorporating regulatory information and population-wide LD calculations, along with your VEP results.
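If "in LD" feels abstract, the Python sketch below computes the standard r² statistic between two biallelic sites from phased haplotypes. It only illustrates the measure; it is not how PostGAP is implemented, and the input layout (a list of 0/1 allele pairs, one per haplotype) is an assumption made for the example.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, where a and b are 0 (reference)
    or 1 (alternate) alleles carried at site A and site B on the same
    phased haplotype. Returns None if either site is monomorphic.
    """
    n = len(haplotypes)
    if n == 0:
        return None
    p_a = sum(a for a, _ in haplotypes) / n   # alt allele frequency at A
    p_b = sum(b for _, b in haplotypes) / n   # alt allele frequency at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


if __name__ == "__main__":
    # Alleles always travel together: complete LD.
    print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))
    # Every combination equally common: no LD.
    print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Variants whose r² with your lead association is high are the ones worth carrying forward when you look for the gene actually involved.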
Most of the time when we talk about variant annotation, we talk about the effects of variants on genes, but did you know that the VEP can also tell you how variants affect the genomic features that regulate gene expression, such as promoters and enhancers?
If a variant hits a splice site, you want to know if splicing is going to occur as normal, or if you can expect a different protein isoform. We have a few cool tools with the VEP that will help you to assess that for your own variants.
A common use case for the VEP is as a first step towards identifying the causal genetic variant of a rare phenotype from whole genome/exome sequencing. The VEP tells you which genes are hit and what effects your variants have on them, and then you have to begin the long, laborious process of filtering those down. Things you might consider include allele frequency, association with genes known to be involved in rare disease and whether both copies of a gene in a diploid organism are affected. Rather than faffing about doing this manually, you can use the G2P (genotype to phenotype) plugin instead, which was recently published as a preprint.
If you don’t want to analyse your variants on external servers or have more than 1000 or so to annotate, you probably want to use the VEP script. Setting it up might not always be straightforward as there are dependencies you need, but the installation script takes away a lot of the trouble.
Ensembl produce high quality gene annotation for a number of species, but getting it to the high quality we expect takes time. This means there are many species and strains where we don’t have annotation yet. If you’re working with a species without Ensembl annotation (like Trixie the Triceratops here) or even a specific strain that we don’t have, you can still make use of VEP for predicting the effect of variants on genes and transcripts, using your own annotation. All you need is a GFF or GTF of the transcripts, and a FASTA file of the genome.
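As a small sketch of what that looks like in practice, the Python below assembles a VEP command line that points at your own annotation and genome using the `--gff` or `--gtf` and `--fasta` options. The executable path `./vep`, the file names and the helper itself are assumptions for illustration, and the VEP expects the annotation file to be sorted, bgzipped and tabix-indexed before you run it.

```python
import shlex


def build_vep_command(input_vcf, annotation, fasta, output="vep_output.txt"):
    """Assemble a VEP command using custom annotation (hypothetical helper)."""
    # Choose the flag from the file extension; anything else is treated as GFF.
    is_gtf = annotation.endswith((".gtf", ".gtf.gz"))
    annotation_flag = "--gtf" if is_gtf else "--gff"
    return [
        "./vep",                      # assumed location of the VEP script
        "-i", input_vcf,
        annotation_flag, annotation,
        "--fasta", fasta,
        "-o", output,
    ]


if __name__ == "__main__":
    # File names are made up for the example.
    cmd = build_vep_command(
        "triceratops.vcf", "triceratops.gff.gz", "triceratops.fa.gz"
    )
    print(" ".join(shlex.quote(part) for part in cmd))
    # To run it for real, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string means file names with spaces are handled safely if you do hand it to `subprocess`.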