Cool stuff the Ensembl VEP can do: annotating structural variants

The Ensembl VEP can annotate not only short variants, such as SNPs and short insertions or deletions, but also some types of structural variants.

Structural variants (SVs) are larger rearrangements of chromosomal structure, including copy number variants (CNVs), inversions and translocations.

Our documentation explains in more detail how we classify SVs.

To annotate your SVs with the VEP, you should provide the data in VCF (Variant Call Format). Specify the SVTYPE as usual: the VEP can annotate CNVs, long insertions, deletions, duplications and inversions. In the INFO field, you need to include an ‘END’ or ‘SVLEN’ tag to show how long the SV is.
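As a rough illustration of what such input looks like, here is a minimal Python sketch that formats SV records with a symbolic ALT allele, an SVTYPE and an END or SVLEN tag. The helper name `sv_vcf_line`, the coordinates and the variant IDs are made up for this example, and using N as a placeholder REF base is an assumption, so check your records against the VCF specification before using them.

```python
def sv_vcf_line(chrom, pos, sv_type, end=None, sv_len=None, var_id="."):
    """Format one VCF data line for a structural variant (hypothetical helper)."""
    info = [f"SVTYPE={sv_type}"]
    if end is not None:
        info.append(f"END={end}")
    if sv_len is not None:
        info.append(f"SVLEN={sv_len}")
    if len(info) == 1:
        # Without END or SVLEN the size of the SV cannot be worked out.
        raise ValueError("provide END or SVLEN so the SV has a length")
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    fields = [chrom, str(pos), var_id, "N", f"<{sv_type}>", ".", ".", ";".join(info)]
    return "\t".join(fields)


header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
deletion = sv_vcf_line("1", 160283, "DEL", end=471362, var_id="sv1")
duplication = sv_vcf_line("1", 1385015, "DUP", sv_len=2548, var_id="sv2")

print(header)
print(deletion)
print(duplication)
```

The first record describes its length with END and the second with SVLEN; either tag is enough to tell the VEP how large the variant is.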

The consequence predictions made by the VEP depend on the SV type you specify. For deletions, the VEP tests if the SV results in a feature, such as a transcript, being ablated (i.e. totally removed) or truncated. For insertions, it tests if a feature is elongated. Finally, for duplications, it tests if a feature is elongated or amplified. The VEP can also report the length and the proportion of the transcript that is overlapped by the SV. To do this, use the --overlaps option with the command line tool.
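To make the idea of overlap length and proportion concrete, here is a simplified Python sketch for a deletion and a single transcript. It is not the VEP's own code: the function names and coordinates are invented for illustration, and it assumes 1-based inclusive coordinates and ignores strand, exons and other feature types.

```python
def transcript_overlap(sv_start, sv_end, tx_start, tx_end):
    """Return (overlap in bp, proportion of the transcript overlapped)."""
    overlap = max(0, min(sv_end, tx_end) - max(sv_start, tx_start) + 1)
    tx_length = tx_end - tx_start + 1
    return overlap, overlap / tx_length


def deletion_effect(sv_start, sv_end, tx_start, tx_end):
    """Rough label for what a deletion does to one transcript."""
    overlap, _ = transcript_overlap(sv_start, sv_end, tx_start, tx_end)
    if overlap == 0:
        return "no overlap"
    if overlap == tx_end - tx_start + 1:
        return "ablated"  # the whole transcript lies inside the deletion
    return "truncated"  # only part of the transcript is removed


# Hypothetical transcript spanning positions 1000 to 1999 (1000 bp).
whole = transcript_overlap(500, 2500, 1000, 1999)
partial = transcript_overlap(1500, 2500, 1000, 1999)
print(whole, deletion_effect(500, 2500, 1000, 1999))
print(partial, deletion_effect(1500, 2500, 1000, 1999))
```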

Identifying similar overlapping variants, which may have useful annotations, is naturally more complex for SVs than it is for short variants. You can use the StructuralVariantOverlap plugin to customise your overlap criteria and annotate your variants with information from other SV sets, such as the population frequency data from the Genome Aggregation Database (gnomAD). The Phenotypes plugin can also be used to report any phenotypes that are associated with the genes your SVs overlap.
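One common way to decide whether two SVs are "similar" is reciprocal overlap: the shared region must cover at least a given fraction of both variants. The Python sketch below illustrates that general idea only. It is not taken from the StructuralVariantOverlap plugin, and the function names and the 0.8 threshold are assumptions chosen for the example.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smallest fraction of either SV covered by the region they share."""
    shared = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    a_length = a_end - a_start + 1
    b_length = b_end - b_start + 1
    return min(shared / a_length, shared / b_length)


def is_similar(query, known, min_fraction=0.8):
    """True if two (start, end) SVs meet the reciprocal overlap threshold."""
    return reciprocal_overlap(*query, *known) >= min_fraction


# Hypothetical query SV compared with two SVs from a reference set.
query = (1000, 2000)
print(is_similar(query, (1100, 2100)))  # mostly shared by both
print(is_similar(query, (1000, 5000)))  # the known SV is much larger
```

Requiring the overlap to be reciprocal stops a small SV from matching every very large SV that happens to contain it.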

It is important to note that the VEP only annotates variants with a size of up to 10 Mb by default. You can change this with the --max_sv_size option of the command line tool, but this will increase your memory requirements. To help counteract this, you can reduce the number of variants the VEP analyses at once, from the standard 5000, using the --buffer_size option. Finally, we are aware that the functionality of the VEP for SVs is still somewhat basic. We will continue to work on improving it in the future!
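As an example of how these options fit together, the Python sketch below assembles a VEP command line that raises the SV size limit while lowering the number of variants held in memory, which the command line tool sets with --buffer_size. The helper `build_vep_command`, the file names and the chosen values are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def build_vep_command(input_vcf, output_file, max_sv_size=None,
                      buffer_size=None, overlaps=False):
    """Assemble (but do not run) a VEP command as a list of arguments."""
    cmd = ["vep", "--input_file", input_vcf, "--output_file", output_file, "--cache"]
    if max_sv_size is not None:
        # Allow SVs larger than the default limit, in base pairs.
        cmd += ["--max_sv_size", str(max_sv_size)]
    if buffer_size is not None:
        # Fewer variants in memory at once to offset the extra memory use.
        cmd += ["--buffer_size", str(buffer_size)]
    if overlaps:
        # Report length and proportion of each transcript overlapped.
        cmd.append("--overlaps")
    return cmd


cmd = build_vep_command("my_svs.vcf", "my_svs_annotated.txt",
                        max_sv_size=50_000_000, buffer_size=500, overlaps=True)
print(shlex.join(cmd))
```

The right values depend on your data and your machine, so treat 50 Mb and a buffer of 500 as placeholders to adjust.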