Variants can be represented in many different ways; indeed, Ensembl VEP currently supports input in many formats, including VCF, HGVS and SPDI. However, even within these specifications, variants can be described ambiguously: an insertion or deletion within a repeated region can be placed at several equivalent locations. For example, VCF conventionally describes such a variant at its most 5’ position, while HGVS describes it at its most 3’ position.

Starting in Ensembl 100, VEP optionally normalises variants within repeated regions by shifting them as far as possible in the 3’ direction before consequence calculation. This standardises VEP output for equivalent variant alleles which are described using different conventions. 
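To make the idea of 3’ shifting concrete, here is a minimal Python sketch on a toy sequence. It is an illustration only, not VEP's actual implementation: the function names `shift_3prime` and `apply_variant` are hypothetical, coordinates are 0-based on the forward strand, and alleles are written without the anchor base that VCF uses.

```python
def apply_variant(seq, pos, ref, alt):
    """Return the sequence obtained by replacing ref with alt at 0-based pos."""
    assert seq[pos:pos + len(ref)] == ref, "ref allele does not match the sequence"
    return seq[:pos] + alt + seq[pos + len(ref):]


def shift_3prime(seq, pos, ref, alt):
    """Shift a pure insertion or deletion as far 3' as possible.

    A pure deletion has an empty alt; a pure insertion has an empty ref and
    is inserted immediately before index pos. Anything else is returned
    unchanged.
    """
    if bool(ref) == bool(alt):
        return pos, ref, alt  # substitution, complex indel or empty variant
    allele = ref or alt
    n = len(ref)  # number of reference bases the variant spans (0 for insertions)
    # Moving one base to the right gives an equivalent variant whenever the
    # next reference base equals the first base of the allele; the allele
    # itself is rotated by one base at each step.
    while pos + n < len(seq) and seq[pos + n] == allele[0]:
        allele = allele[1:] + allele[0]
        pos += 1
    return (pos, allele, "") if ref else (pos, "", allele)


# Toy reference containing three copies of a CAG repeat.
reference = "TTCAGCAGCAGGA"

# Deletion of one repeat unit, described at its most 5' position.
deletion = (2, "CAG", "")
shifted_deletion = shift_3prime(reference, *deletion)
print(shifted_deletion)  # (8, 'CAG', '')

# Insertion of one repeat unit, described at its most 5' position.
insertion = (2, "", "CAG")
shifted_insertion = shift_3prime(reference, *insertion)
print(shifted_insertion)  # (11, '', 'CAG')

# Both descriptions of each variant produce the same alternate sequence.
print(apply_variant(reference, *deletion) == apply_variant(reference, *shifted_deletion))
print(apply_variant(reference, *insertion) == apply_variant(reference, *shifted_insertion))
```

The two descriptions of each variant give identical alternate sequences, which is why placing every such variant at the same end of the repeat lets equivalent alleles be compared directly.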


If you are filtering a set of variants to look for those potentially involved in disease, your first stop will probably be databases of phenotype associations, like ClinVar. There is also a lot of valuable information on variant-disease associations in the literature, which may not yet have been extracted into curated databases. It can be hard to compile lists of citations for a large set of variants, but Ensembl VEP is here to help! 


We know that installing the VEP is not always trivial: there are dependencies and modules that you may or may not already have, and your existing setup may require different module versions. The VEP is also designed for Linux systems, and installing it on, for example, Windows can be complex. To get around this, the VEP and all its dependencies are available in a Docker image, so that you can install everything with just a few simple commands.


Plugins can be an excellent way to extend the functionality of the VEP. They can be used to look up information in external databases, or to use the Ensembl API to add to or filter your VEP output. Many plugins have already been written, both by us and by external groups, but with a bit of Perl you can easily write your own.


The interpretation of non-coding variants is more challenging than that of coding variants, as fewer prediction methods and reference data are available. On top of the annotation provided for human and mouse in the Ensembl Regulatory Build, the Ensembl Variant Effect Predictor (VEP) also integrates two other human-specific datasets providing information about how variants can affect gene expression. The two plugins, satMutMPRA and FunMotifs, are available for use with command-line VEP. satMutMPRA provides detailed information on the impact on expression of variants in the regulatory regions of disease-associated genes; FunMotifs provides an alternative set of genome-wide transcription factor binding motifs.


Some missense variants have a significant impact on protein function; others do not. In the absence of comprehensive functional assays for all missense variants, the next best way to assess whether a missense variant is likely to be pathogenic is to use prediction tools. These take into account factors such as the chemical properties of amino acids, functional protein domains and protein conservation to predict how likely it is that a missense variant will affect function. A number of different missense pathogenicity predictors are available for human through Ensembl VEP, and these are optimised for different purposes.
