The Variant Effect Predictor (VEP) is one of Ensembl’s most popular tools. In six years it has grown from a simple Perl script of a couple of hundred lines into a multi-limbed beast with thousands of lines of code and well over 100 configurable options.
VEP is now used by many high-profile projects, institutes and companies around the world. To manage this growth effectively and ensure we deliver the most reliable and feature-filled variant annotator available, we’ve had to go back to basics. Over the past six months the VEP codebase has been completely rewritten, and the new version is now available for download. If you use VEP through its web interface or REST API, you should see virtually no difference with the new version, so you can stop reading now!
If you use our command line tool, you can try out the new VEP at https://github.com/Ensembl/ensembl-vep. The full list of changes is in the README on GitHub, but these are the main points:
- Faster: can process an individual genome in around 30 minutes.
- Backward-compatible: all data sources (cache files, databases) and most command line flags from the old code are fully compatible with the new code.
- More reliable: test-driven development means the new code is covered by more than 1,500 unit tests with over 99% statement coverage.
If you are tied to the current codebase, it remains available in the ensembl-tools GitHub repository, though updates and support for it will be phased out over time. Ensembl release 87 will be the last release in which the ensembl-tools version of VEP is the “primary” VEP codebase. The previous code and supporting data will, of course, remain available as part of Ensembl’s archiving strategy.
Some other points of note:
- The documentation at ensembl.org still refers to the old code. Full documentation for the new code will be available from Ensembl release 88 onwards.
- Where possible, please report any issues you find with the new code as a GitHub Issue.
- The code that calculates variant consequence types (e.g. missense_variant, stop_gained) remains part of the ensembl-variation API module and has not been significantly updated; it is shared by both the old and new code. The ensembl-vep codebase handles the following functions:
  - parsing command line flags
  - parsing input
  - reading data from annotation sources (databases, cache files, flat files)
  - interval alignment of input variants with annotation data
  - writing output
  - monitoring statistics
  - providing a data filtering interface
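The "interval alignment" step listed above boils down to finding, for each input variant, the annotation features whose coordinates overlap it. The sketch below illustrates that general idea in Python; it is not how ensembl-vep (which is written in Perl) actually implements it. The `Feature` and `Variant` types and the `overlap_features` function are hypothetical, and coordinates are assumed to be 1-based, inclusive and on a single sequence.

```python
from bisect import bisect_right
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical annotation feature (e.g. a transcript) on one sequence."""
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)
    name: str


class Variant(NamedTuple):
    """Hypothetical input variant spanning start..end on the same sequence."""
    start: int
    end: int
    id: str


def overlap_features(variants, features):
    """Return {variant id: [overlapping feature names, in start order]}.

    Features are sorted by start once. For each variant, binary search finds
    the last feature starting at or before the variant's end; we then walk
    left while a feature could still reach the variant, using the longest
    feature length as a bound so the scan stops early.
    """
    feats = sorted(features)
    starts = [f.start for f in feats]
    max_len = max((f.end - f.start + 1 for f in feats), default=0)

    result = {}
    for v in variants:
        hits = []
        i = bisect_right(starts, v.end) - 1  # features with start <= v.end
        # A feature starting before v.start - max_len cannot reach v.start.
        while i >= 0 and starts[i] >= v.start - max_len:
            if feats[i].end >= v.start:
                hits.append(feats[i].name)
            i -= 1
        result[v.id] = hits[::-1]  # we walked right-to-left; restore order
    return result


features = [
    Feature(100, 500, "TX1"),
    Feature(450, 900, "TX2"),
    Feature(1000, 1200, "TX3"),
]
variants = [
    Variant(120, 120, "rs_a"),
    Variant(480, 481, "rs_b"),
    Variant(950, 950, "rs_c"),
    Variant(1200, 1200, "rs_d"),
]
print(overlap_features(variants, features))
# {'rs_a': ['TX1'], 'rs_b': ['TX1', 'TX2'], 'rs_c': [], 'rs_d': ['TX3']}
```

Sorting the features once and binary-searching per variant avoids comparing every variant against every feature; a real annotator would also need to handle multiple chromosomes, strand, and upstream/downstream flanking regions.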