Plugins can be an excellent way to extend the functionality of the VEP. They can be used to look up information in external databases or use the Ensembl API to add to or filter your VEP output. Many plugins have already been written, both by us and external groups, but with a bit of Perl you can easily write your own.
Category: Ensembl VEP
The interpretation of non-coding variants is more challenging than that of coding variants, as fewer prediction methods and reference data are available. On top of the annotation provided for human and mouse in the Ensembl Regulatory Build, the Ensembl Variant Effect Predictor (VEP) also integrates two other human-specific datasets providing information about how variants can affect gene expression. The plugins, satMutMPRA and FunMotifs, are available for use with command-line VEP. One provides detailed information on the impact on expression of variants in the regulatory regions of disease-associated genes; the other provides an alternative set of genome-wide transcription factor binding motifs.
By default VEP will tell you the consequences for every transcript affected by a variant. You may wish to prioritise your analysis to only the most important or well supported transcripts for each gene, and VEP provides information to help you do that.
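As a rough illustration of that kind of prioritisation, here is a minimal Python sketch that keeps only the rows flagged as canonical. It assumes you have already parsed your VEP output into dictionaries and that you ran VEP with the `--canonical` option, so each row carries a `CANONICAL` field set to `YES` for the canonical transcript. The `keep_canonical` helper, the other field names and the example rows are made up for illustration and are not part of VEP.

```python
def keep_canonical(records):
    """Return only the records whose CANONICAL field is "YES".

    `records` is assumed to be a list of dicts, one per variant/transcript
    row, already parsed from VEP output (the parsing is not shown here).
    """
    return [rec for rec in records if rec.get("CANONICAL") == "YES"]


# Made-up example rows: one variant hitting two transcripts of one gene.
records = [
    {"variant": "var1", "transcript": "tx_A", "CANONICAL": "YES",
     "consequence": "missense_variant"},
    {"variant": "var1", "transcript": "tx_B", "CANONICAL": "",
     "consequence": "intron_variant"},
]

canonical_only = keep_canonical(records)
print([rec["transcript"] for rec in canonical_only])
```

The same pattern would work for any other transcript-quality field you choose to include in your output; only the field name and the accepted value change.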
Some missense variants have a significant impact on protein function, some do not. In the absence of global comprehensive functional assays of missense variants, the next best way to assess if a missense variant is likely to be pathogenic is through prediction tools which take into account factors like the chemical properties of amino acids, functional protein domains and protein conservation to predict how likely it is that a missense variant will impact function. A number of different missense pathogenicity predictors are available for human through Ensembl VEP, and these are optimised for different purposes.
With all the fuss we make about our resources for human genomes, you might think the VEP was just for human; it’s not. We have really useful resources, like SIFT, phenotypes and caches for loads of other species in Ensembl.
HGVS notation is an excellent way to describe variants in proteins, and VEP can interpret variants described this way to see if they are already known or if they affect other genomic features, so long as there is enough information to find a unique genomic location. If there isn’t, the Variant Recoder can help you to find the variant you need.
Interpreting a single variant can be a lot more involved than just finding out its consequence. Sometimes, to understand a variant, you need to know exactly where it falls: which exon, which amino acid, sometimes even which base in the codon. The VEP gives you all of this by default.
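To make the "which base in the codon" idea concrete, here is a small sketch of the arithmetic involved. It assumes a 1-based position within the coding sequence, counted from the first base of the start codon; the `codon_position` function is a hypothetical helper for illustration, not something VEP provides.

```python
def codon_position(cds_position):
    """Map a 1-based CDS position to (codon number, base within codon).

    Both returned values are 1-based: the codon number is also the
    amino acid position, and the base within the codon is 1, 2 or 3.
    """
    if cds_position < 1:
        raise ValueError("CDS positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    base_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, base_in_codon


print(codon_position(7))  # (3, 1): first base of the third codon
print(codon_position(5))  # (2, 2): second base of the second codon
```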
If you’re really delving into the role of a particular genetic variant, you might want to know about that base position in other species. VEP can get you ancestral alleles in human and conservation scores in many species for a variant position, allowing you to assess if a position is evolutionarily important, or if an allele matches our primate ancestors.
If you’re trying to work out which variants are associated with a phenotype or disease, a major thing you might want to know is if someone else has already spotted it. And if not the variant, maybe the gene that it hits. You can get that through the VEP.
The number of genes and transcripts we have in Ensembl can make your VEP results very big. Filtering your results after running the VEP is the best way to make this more manageable, but you can also reduce the results in your run itself, to only get one result per variant or variant/gene combo.
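If you do filter after the run, the idea of "one result per variant/gene combo" can be sketched in a few lines of Python. VEP has its own options for reducing results during the run (such as `--pick` and `--per_gene`), and this sketch does not reproduce their selection logic. It assumes rows already parsed into dictionaries with hypothetical `variant`, `gene` and `consequence` keys, and it uses a deliberately short, illustrative severity list rather than the full Ensembl consequence ranking.

```python
# Illustrative subset only, ordered from most to least severe.
# This is an assumption for the example, not VEP's complete ranking.
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {name: index for index, name in enumerate(SEVERITY)}


def most_severe_per_gene(records):
    """Keep one record per (variant, gene) pair: the most severe one.

    Consequences missing from SEVERITY are treated as least severe.
    Ties keep the record seen first.
    """
    best = {}
    for rec in records:
        key = (rec["variant"], rec["gene"])
        rank = RANK.get(rec["consequence"], len(SEVERITY))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


# Made-up example rows: one variant, two genes, three transcripts.
rows = [
    {"variant": "var1", "gene": "geneA", "consequence": "intron_variant"},
    {"variant": "var1", "gene": "geneA", "consequence": "missense_variant"},
    {"variant": "var1", "gene": "geneB", "consequence": "synonymous_variant"},
]

for rec in most_severe_per_gene(rows):
    print(rec["variant"], rec["gene"], rec["consequence"])
```

Grouping on `rec["variant"]` alone instead of the pair would give one result per variant rather than one per variant/gene combo.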