Over the last few Ensembl VEP releases, we have been expanding the options available when you annotate variants against your own reference data, known as ‘custom annotation’. Here we detail some of those improvements.
The ‘--custom’ option enables you to integrate genomics data in standard file formats (VCF, GTF, GFF3, BED, bigWig) into Ensembl VEP analysis. All the information on using custom annotations with Ensembl VEP is available in our public documentation. Check the Options section for a list of all available options, including those mentioned below.
Named options in custom annotation argument
Since Ensembl VEP 110, each custom annotation source can be configured using a comma-separated list of key-value pairs:
./vep […] --custom file=Filename,short_name=Short_name,format=File_type,type=Annotation_type,fields=VCF_fields
Named options make the custom annotation argument more readable and easier to use, as you no longer need to remember the order in which the values must be given.
For backwards compatibility, Ensembl VEP still supports the positional options from previous versions, as detailed in our public documentation. However, note that the new features presented here can only be enabled using the named options feature.
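If you generate Ensembl VEP commands from a script, the named options can be assembled from a dictionary. The following is a minimal sketch in Python, not part of Ensembl VEP itself: the helper build_custom_arg, the file name and the short name are all hypothetical, and only the key names come from the syntax shown above.

```python
def build_custom_arg(options):
    """Join key-value pairs into the comma-separated value passed to --custom."""
    for key, value in options.items():
        # A comma inside a key or value (or '=' inside a key) would make
        # the comma-separated key=value list ambiguous, so reject it early.
        if "," in str(key) or "=" in str(key) or "," in str(value):
            raise ValueError(f"cannot encode {key!r}={value!r}")
    return ",".join(f"{key}={value}" for key, value in options.items())


custom = build_custom_arg({
    "file": "my_annotations.vcf.gz",  # hypothetical file name
    "short_name": "MyData",           # hypothetical label
    "format": "vcf",
    "type": "exact",
})
print("--custom", custom)
```

Because the pairs are named, the order of the dictionary entries does not matter to the reader of the resulting command.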
You can now configure which reference features are returned by specifying the kind of overlaps you wish to see.
You can select different match types (type):
- overlap (default): report annotations that overlap the variant by even 1 base pair.
- exact: report annotations whose coordinates match exactly those of the variant.
- within (new): report annotations within the variant.
- surrounding (new): report annotations that completely surround the variant.
You can also set distance or percentage-overlap thresholds:
- overlap_cutoff: minimum percentage overlap between annotation and variant.
- reciprocal: mode used to calculate the overlap percentage for overlap_cutoff:
  - 0: percentage of the annotation covered by the variant.
  - 1: percentage of the variant covered by the annotation.
- distance: distance (in base pairs) to the ends of the overlapping feature.
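To make the match types and the two reciprocal modes concrete, here is a rough Python sketch of the interval logic described above. It is an illustration only, not the Ensembl VEP implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, boundary cases (such as an exact match also counting as within) are our assumption, and the distance option is not modelled.

```python
def overlap_length(v_start, v_end, a_start, a_end):
    """Number of bases shared by variant and annotation (0 if disjoint)."""
    return max(0, min(v_end, a_end) - max(v_start, a_start) + 1)


def matches(match_type, v_start, v_end, a_start, a_end):
    """Return True if the annotation matches the variant for this match type."""
    if match_type == "overlap":
        return overlap_length(v_start, v_end, a_start, a_end) >= 1
    if match_type == "exact":
        return (a_start, a_end) == (v_start, v_end)
    if match_type == "within":
        # The annotation lies inside the variant.
        return v_start <= a_start and a_end <= v_end
    if match_type == "surrounding":
        # The annotation completely covers the variant.
        return a_start <= v_start and v_end <= a_end
    raise ValueError(f"unknown match type: {match_type}")


def overlap_percentage(v_start, v_end, a_start, a_end, reciprocal):
    """Percentage overlap; reciprocal selects which feature is the denominator."""
    shared = overlap_length(v_start, v_end, a_start, a_end)
    if reciprocal == 0:
        denominator = a_end - a_start + 1  # length of the annotation
    elif reciprocal == 1:
        denominator = v_end - v_start + 1  # length of the variant
    else:
        raise ValueError("reciprocal must be 0 or 1 in this sketch")
    return 100.0 * shared / denominator


# A 100 bp variant (100-199) against a 200 bp annotation (150-349):
# 50 bases are shared, so the two modes give different percentages.
pct_of_annotation = overlap_percentage(100, 199, 150, 349, reciprocal=0)
pct_of_variant = overlap_percentage(100, 199, 150, 349, reciprocal=1)
```

In this example the same pair of features would pass an overlap_cutoff of 40 under one mode and fail it under the other, which is why the two options are best chosen together.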
Additional match configuration is available when you are using a custom VCF file:
- same_type: only match identical variant classes (for instance, only match deletion variants with deletion annotations).
Summary statistics for custom annotations
With the summary_stats option, Ensembl VEP can now also calculate summary statistics (mean, minimum, maximum, count and sum) for custom annotations, based on the scores stored in each file format. By default, no statistics are calculated.
The scores are obtained differently depending on the custom file format in use:
- For BED, GTF and GFF files, the score is retrieved from the score column (5th column in BED files and 6th column in GTF/GFF).
- For bigWig files, the score is retrieved from the data values.
- For VCF files, the score is retrieved from the quality (QUAL) column.
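As an illustration of what the five statistics mean, the sketch below computes them in Python for the scores of the records matching one variant. The function name and the example scores are hypothetical, and how Ensembl VEP handles records with missing scores is not covered here.

```python
def summarise_scores(scores):
    """Compute mean, minimum, maximum, count and sum for a list of scores."""
    if not scores:
        # No matching records with a score: nothing to summarise.
        return {"count": 0}
    total = sum(scores)
    return {
        "mean": total / len(scores),
        "min": min(scores),
        "max": max(scores),
        "count": len(scores),
        "sum": total,
    }


# Hypothetical scores from three matching records,
# e.g. the 5th column of three overlapping BED lines.
stats = summarise_scores([2.0, 4.0, 9.0])
print(stats)
```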
Number of matching records to display
When annotating large structural variants with custom annotation (especially BED/bigWig files), you may get more matching records than desired, producing large output files that are hard to parse and handle. For this reason, Ensembl VEP now reports at most 50 matching records by default. Any remaining records are represented with an ellipsis (…).
To change this behaviour, please use the new option num_records:
- Select any positive value to display that specific number of records.
- Use all to display every matching record. Note that this may create very large files that are computationally intensive to process.
- Use 0 to display only an ellipsis (…) if there are matching records.
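The three cases above can be sketched in Python as follows. This only mirrors the behaviour described in this post and is not the Ensembl VEP code: the function name is hypothetical, and representing the ellipsis as the three-character string "..." in a list is our assumption about the output.

```python
ELLIPSIS = "..."  # assumed stand-in for the ellipsis shown in the output


def limit_records(records, num_records=50):
    """Keep at most num_records records, marking any omitted ones."""
    if num_records == "all":
        return list(records)
    if not isinstance(num_records, int) or num_records < 0:
        raise ValueError("num_records must be 'all' or a non-negative integer")
    shown = list(records[:num_records])
    if len(records) > num_records:
        # Something was left out, so flag it with a single ellipsis.
        shown.append(ELLIPSIS)
    return shown


records = ["rec1", "rec2", "rec3"]      # hypothetical matching records
print(limit_records(records, 2))        # two records, then the ellipsis
print(limit_records(records, "all"))    # every record, no ellipsis
print(limit_records(records, 0))        # only the ellipsis
```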
If you have comments on the new features or ideas on how to improve them, please contact us on the Ensembl Helpdesk.