In our latest release of Ensembl, we launched a brand new web interface for the VEP (Variant Effect Predictor).


As “it says on the tin”, the VEP predicts the effect of variants (i.e. SNPs, indels and CNVs) on genes and regulatory elements. It tells you where your variants are located (e.g. introns, coding exons, transcription factor binding motifs), what effect they may have on protein coding sequences, and whether these effects might be deleterious or benign.

The VEP does this by mapping your variants against genes, transcripts, translations, and regulatory features that we annotate in Ensembl.
The Variant Effect Predictor can also be run against other gene sets: you can predict the effect of your variants on RefSeq genes too!
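The Python sketch below illustrates the first step of "mapping variants against features": finding which annotated features a variant overlaps. It makes several assumptions:

- The `Feature` class, feature IDs and coordinates are hypothetical.
- Coordinates are 1-based and inclusive.
- A linear scan stands in for the indexed lookups a real tool would use.

It is not how the VEP itself is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotated feature (e.g. a transcript) on one chromosome."""
    feature_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    biotype: str


def overlapping_features(chrom: str, start: int, end: int,
                         features: List[Feature]) -> List[Feature]:
    """Return the features whose [start, end] interval overlaps the variant's interval."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and start <= f.end
    ]


features = [
    Feature("TX1", "7", 100, 500, "protein_coding"),
    Feature("TX2", "7", 450, 900, "lncRNA"),
    Feature("REG1", "7", 1000, 1200, "promoter"),
]
hits = overlapping_features("7", 480, 480, features)
print([f.feature_id for f in hits])  # ['TX1', 'TX2']
```

Each overlapping feature would then be examined in turn to decide what the consequence is, for example whether the variant falls in an intron or changes a codon.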

What is new?

The new web interface is more user-friendly and has lots of improvements:

Increased number of variants you can input

You can upload up to one million variants in a compressed format, with a 50 MB file size limit. To upload these larger files, you simply need to log in. If you do not have an Ensembl account, you are missing out, as there are many perks to registering. It’s easy to do: just provide your name and email address. If you’d rather not register, the upload limit drops to 5 MB, i.e. around 100,000 variants.

Display of results

We provide a summary statistics table and pie charts illustrating the different Sequence Ontology (SO) consequence terms and the classes of coding consequences for the variants you input.


The new web interface provides user-friendly pie charts and summary statistics.
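Below is one plausible way to compute this kind of summary from per-variant consequence terms. The sketch assumes each result carries a comma-separated list of SO terms, as in the VEP's Consequence column, and counts every term. The web interface's exact counting rules are not described here, so treat this as an illustration only.

```python
from collections import Counter
from typing import Dict, Iterable


def consequence_counts(consequence_fields: Iterable[str]) -> Counter:
    """Count SO terms, splitting comma-separated multi-term entries."""
    counts: Counter = Counter()
    for field in consequence_fields:
        for term in field.split(","):
            term = term.strip()
            if term:
                counts[term] += 1
    return counts


def as_percentages(counts: Counter) -> Dict[str, float]:
    """Share of all counted terms, as the slices of a pie chart would show them."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: round(100 * n / total, 1) for term, n in counts.most_common()}


fields = ["missense_variant", "intron_variant",
          "missense_variant,splice_region_variant", "intron_variant"]
counts = consequence_counts(fields)
print(as_percentages(counts))
# {'missense_variant': 40.0, 'intron_variant': 40.0, 'splice_region_variant': 20.0}
```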

The results preview table with additional details is shown after the pie charts. You can apply a range of filters to any of the data fields and limit the results you see. The full or filtered results can be downloaded as VCF or tab-delimited text for import into Excel.
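If you download the tab-delimited results, you can also filter them outside the browser. The Python sketch below makes these assumptions about the file:

- It starts with `##` meta-information lines.
- These are followed by a single `#`-prefixed column header line.
- A `Consequence` column holds comma-separated terms.

The toy sample has only a few columns and made-up variant IDs, so check the header of your own download before relying on the column names.

```python
import io
from typing import Dict, Iterable, Iterator, List


def read_vep_tab(handle: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the column names in the '#' header line."""
    header = None
    for raw in handle:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#"):
            header = line[1:].split("\t")  # the column header line
            continue
        if header is None:
            raise ValueError("data row found before the column header line")
        yield dict(zip(header, line.split("\t")))


def with_consequence(rows: Iterable[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Keep rows whose comma-separated Consequence field contains the given SO term."""
    return [row for row in rows if term in row["Consequence"].split(",")]


# Toy sample with a subset of columns and made-up values.
sample = (
    "## Example meta-information line\n"
    "#Uploaded_variation\tLocation\tAllele\tConsequence\n"
    "var1\t7:480\tA\tmissense_variant,splice_region_variant\n"
    "var2\t7:950\tT\tintron_variant\n"
)
rows = list(read_vep_tab(io.StringIO(sample)))
missense = with_consequence(rows, "missense_variant")
print([row["Uploaded_variation"] for row in missense])  # ['var1']
```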


The new ticket tracker

You can run several jobs at the same time and come back to them later using the ticket numbers assigned to them. You can easily edit and re-run previous jobs. Jobs are kept on our Ensembl servers for 30 days; if you register, though, they will be kept for as long as you like.

Population data from the NHLBI Exome Sequencing Project (ESP)


Population frequency data for the 1000 Genomes and ESP projects.

The VEP provides frequency data for known variants from both the 1000 Genomes and NHLBI exome sequencing projects.

You can also use this frequency data to filter your variants: you may wish to exclude known variants with a frequency above 1%, for example.
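A 1% cut-off is simple to express in code. The Python sketch below assumes each variant carries a mapping from population label to allele frequency; the labels and values are made up. It treats a variant as common if it exceeds the threshold in any selected population. That rule is an illustrative choice, not necessarily what the web interface does.

```python
from typing import Iterable, Mapping, Optional


def is_common(pop_freqs: Optional[Mapping[str, float]],
              threshold: float = 0.01,
              populations: Optional[Iterable[str]] = None) -> bool:
    """True if the frequency exceeds `threshold` in any selected population.

    Variants with no frequency data (e.g. novel variants) are never treated as common.
    """
    if not pop_freqs:
        return False
    selected = pop_freqs.keys() if populations is None else populations
    return any(pop_freqs.get(pop, 0.0) > threshold for pop in selected)


# Hypothetical variants with per-population frequencies (labels are illustrative).
variants = {
    "novel_1": None,
    "rare_1": {"EUR": 0.004, "AFR": 0.002},
    "afr_common": {"EUR": 0.005, "AFR": 0.20},
}
kept_all = [vid for vid, freqs in variants.items() if not is_common(freqs)]
kept_eur = [vid for vid, freqs in variants.items()
            if not is_common(freqs, populations=["EUR"])]
print(kept_all)  # ['novel_1', 'rare_1']
print(kept_eur)  # ['novel_1', 'rare_1', 'afr_common']
```

Passing a single population to `populations` mirrors filtering on one population only. This is why `afr_common` survives the EUR-only filter.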

VEP results are linked to BioMart

The results table in the VEP is now directly linked to BioMart, a data export tool.
This allows you to retrieve additional data about known variants or the genes your variants affect.

Just select the attributes you want in BioMart, such as phenotypes, orthologues or Gene Ontology terms, and you are ready to go.

Other ways to access the VEP

If you are comfortable on the command line, you can run the VEP on your own computer with our Perl script. The script can do everything the online version can, plus much, much more! It’s the most powerful way to use the VEP.

Some VEP functionality (e.g. fetching variant consequences) is also available in the beta version of our language-agnostic REST API.
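Because the API is plain HTTP, it can be called from any language. Here is a Python sketch that uses only the standard library. The server and the `vep/:species/id/:id` and `vep/:species/region/:region/:allele` paths are assumptions based on the current public Ensembl REST service. The beta described in this post may have used different endpoints. The live network request is left commented out.

```python
import json
import urllib.parse
import urllib.request

# Assumed server and paths; the beta REST API described above may have differed.
SERVER = "https://rest.ensembl.org"


def vep_id_url(species: str, variant_id: str) -> str:
    """URL for the consequences of a known variant, e.g. a dbSNP rsID."""
    return f"{SERVER}/vep/{urllib.parse.quote(species)}/id/{urllib.parse.quote(variant_id)}"


def vep_region_url(species: str, chrom: str, start: int, end: int,
                   allele: str, strand: int = 1) -> str:
    """URL for the consequences of an allele at a region written as chrom:start-end:strand."""
    region = urllib.parse.quote(f"{chrom}:{start}-{end}:{strand}", safe=":")
    return f"{SERVER}/vep/{urllib.parse.quote(species)}/region/{region}/{urllib.parse.quote(allele)}"


def fetch_json(url: str):
    """GET the URL and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = vep_id_url("human", "rs56116432")
    print(url)
    # Uncomment to query the live service:
    # print(json.dumps(fetch_json(url), indent=2)[:500])
```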

Help on the VEP

Have a look at our video on the new online VEP interface, and at our documentation pages for help with both the web interface and the script version of the VEP.

If you have questions or comments, please get in touch with us.

It has been quite a while since we’ve blogged about the VEP (Variant Effect Predictor), and in that time we’ve added a whole load of new features, particularly to the downloadable script version.

Structural variants

The VEP now supports finding the consequences of structural variants, with input in either VCF or tab-delimited format. In the VEP web interface, you can visualise which transcripts and features your structural variants overlap by clicking through to the Region in Detail view.
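In the tab-delimited route, each line gives whitespace-separated columns: chromosome, start, end, allele, strand and an optional identifier. Small variants use a REF/ALT allele string, while structural variants can use a type keyword such as DUP or DEL. The Python sketch below parses such lines. Treating any allele without a slash as structural is a simplifying assumption, and the example coordinates and IDs are for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputVariant:
    chrom: str
    start: int
    end: int
    allele: str
    strand: str
    name: Optional[str]

    @property
    def is_structural(self) -> bool:
        # Simplification: small variants use REF/ALT strings, SVs a type keyword (DUP, DEL, ...).
        return "/" not in self.allele

    @property
    def length(self) -> int:
        # Span in bases for 1-based inclusive coordinates (0 for insertions where start = end + 1).
        return self.end - self.start + 1


def parse_default_line(line: str) -> InputVariant:
    """Parse one whitespace-separated input line into an InputVariant."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end, allele, strand = fields[:5]
    name = fields[5] if len(fields) > 5 else None
    return InputVariant(chrom, int(start), int(end), allele, strand, name)


lines = [
    "1 881907 881906 -/C + var1",   # small insertion
    "1 160283 471362 DUP + sv1",    # structural variant (duplication)
]
parsed = [parse_default_line(line) for line in lines]
for v in parsed:
    print(v.name, v.is_structural, v.length)
```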

The cache

We’ve really pushed the VEP script’s capabilities when using local “caches” (as opposed to using remote databases). Almost every feature of the VEP is now available when using the cache in offline mode. You can use a local FASTA file to quickly retrieve the sequences required to construct HGVS notations. You can even construct your own cache from a GTF file if your species isn’t supported by Ensembl.
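Building a cache from a GTF file starts with reading the transcript models it describes. The Python sketch below is not the VEP's own cache-building code. It only collapses the exon records of a standard nine-column GTF into one span per transcript, assuming `gene_id` and `transcript_id` attributes are present. The input lines are a toy example.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

Span = Tuple[str, int, int, str, Optional[str]]  # (chrom, start, end, strand, gene_id)


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn 'gene_id "g1"; transcript_id "t1";' into a dict."""
    return dict(ATTR_RE.findall(field))


def transcript_spans(gtf_lines: Iterable[str]) -> Dict[str, Span]:
    """Collapse exon records into one span per transcript_id."""
    spans: Dict[str, Span] = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon records define the transcript extent here
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        tid = attrs["transcript_id"]
        if tid in spans:
            c, s, e, st, g = spans[tid]
            spans[tid] = (c, min(s, start), max(e, end), st, g)
        else:
            spans[tid] = (chrom, start, end, strand, attrs.get("gene_id"))
    return spans


gtf = [
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '1\ttoy\texon\t300\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '1\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
]
print(transcript_spans(gtf))  # {'t1': ('1', 100, 500, '+', 'g1')}
```

A real cache needs much more than this, including exon structure, coding regions and translations. This is only the first step.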

Our cache for human now contains allele frequency data from phase 1 of the 1000 Genomes Project, and you can use these frequencies to filter your input (for example, you might want to filter out variants that are common in the combined European (EUR) population). We also now provide SIFT predictions for eight species: human, mouse, zebrafish, pig, cow, chicken, rat and dog.

Plugins

We’re always trying to add new and useful features to the VEP, but we also recognise that other users have great ideas that they’d like to implement. The VEP script supports plugins: bits of code that add extra functionality to the VEP. Plugins can retrieve data from remote sources, run external tools or filter output; pretty much anything you can think of can be accomplished in a plugin!

It’s easy to get started, and a basic plugin can be just a few lines of code – have a look at some of the examples we’ve created.
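VEP plugins are written in Perl, and the sketch below does not use their real interface. It is a Python analogue of the general idea only. A plugin declares the extra output fields it adds, then computes them for each result, and the main program merges those fields in. The `Plugin`, `FlagMissense` and `apply_plugins` names are hypothetical.

```python
from typing import Dict, List, Protocol


class Plugin(Protocol):
    """Hypothetical plugin interface: declare new fields, then compute them per result."""
    def header_info(self) -> Dict[str, str]: ...
    def run(self, result: Dict[str, str]) -> Dict[str, str]: ...


class FlagMissense:
    """Toy plugin: adds IsMissense=YES when the consequences include missense_variant."""
    def header_info(self) -> Dict[str, str]:
        return {"IsMissense": "YES if the consequence includes missense_variant"}

    def run(self, result: Dict[str, str]) -> Dict[str, str]:
        terms = result.get("Consequence", "").split(",")
        return {"IsMissense": "YES"} if "missense_variant" in terms else {}


def output_headers(plugins: List[Plugin]) -> Dict[str, str]:
    """Collect the field descriptions that would go into the output header."""
    headers: Dict[str, str] = {}
    for plugin in plugins:
        headers.update(plugin.header_info())
    return headers


def apply_plugins(results: List[Dict[str, str]], plugins: List[Plugin]) -> List[Dict[str, str]]:
    """Return copies of the results with every plugin's extra fields merged in."""
    annotated = []
    for result in results:
        extra: Dict[str, str] = {}
        for plugin in plugins:
            extra.update(plugin.run(result))
        annotated.append({**result, **extra})
    return annotated


plugins = [FlagMissense()]
results = [{"Consequence": "missense_variant"}, {"Consequence": "intron_variant"}]
print(output_headers(plugins))
print(apply_plugins(results, plugins))
```

Keeping each plugin to "declare fields, then compute them" means the core loop never needs to know what any individual plugin does.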

I recently added a plugin to retrieve data from dbNSFP, a great resource created by Liu et al. in Houston, TX. For every possible missense substitution in the human genome, they have pre-calculated pathogenicity scores, frequencies, conservation scores and a plethora of other things, and made all of this available as an easily downloadable file. To use it with the VEP, you just download the file and the plugin, run a couple of commands to get the data into the right format, and away you go: the VEP can now provide you with scores from LRT, MutationAssessor, MutationTaster, FATHMM and more for any missense substitution in your input.

Summary and HTML output

We had a number of requests for the VEP to provide summary statistics at the end of each run, and who are we to disappoint our loyal users?!? The VEP now writes a pretty HTML summary. You can also view your output as HTML using the --html flag, which allows you to sort, filter and analyse your output on the fly.

Don’t hesitate to get in touch with us about the VEP: our developer mailing list is the best place for technical questions, and our helpdesk for everything else.