To support the filtering and interpretation of structural variants (SVs), the Ensembl VEP web interface has been extended in release 115 to annotate them with allele frequencies from gnomAD and clinical significance from ClinVar.
gnomAD Allele Frequencies
This option, called “gnomAD SV allele frequencies”, is available under the “Variant and frequency data” section. Once it is selected, a sub-menu appears where you can choose the minimum percentage of overlap needed to trigger annotation (Figure a). For example, if “80% overlap” is selected, frequencies are attached from any gnomAD SV that covers at least 80% of the input SV. The default is “exact match”.
Note the difference between the “100% overlap” and “exact match” options. With “100% overlap”, the gnomAD variant can be much larger than the input SV: it covers 100% of the input variant and may extend well beyond it. With “exact match”, the two variants must have exactly the same length and location, although not necessarily the same type.
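To make the distinction between the overlap options concrete, here is a small bash sketch. The function names `overlap_pct` and `is_exact_match` are hypothetical, and the sketch assumes 1-based, inclusive coordinates (VCF POS to END) with integer truncation; Ensembl VEP's own overlap calculation may differ in coordinate conventions or rounding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of an input SV [in_start, in_end] that is
# covered by a reference SV [ref_start, ref_end]. Coordinates are assumed to
# be 1-based and inclusive; integer division truncates the result.
overlap_pct() {
  local in_start=$1 in_end=$2 ref_start=$3 ref_end=$4
  # Overlapping region is [max(starts), min(ends)]
  local lo=$(( in_start > ref_start ? in_start : ref_start ))
  local hi=$(( in_end < ref_end ? in_end : ref_end ))
  local in_len=$(( in_end - in_start + 1 ))
  if (( hi < lo )); then
    echo 0   # no overlap at all
  else
    echo $(( (hi - lo + 1) * 100 / in_len ))
  fi
}

# Hypothetical "exact match" test: identical start and end.
# The SV type is deliberately not compared.
is_exact_match() {
  [[ $1 -eq $3 && $2 -eq $4 ]]
}
```

For example, `overlap_pct 1001 2000 1 10000` prints `100` because the much larger reference SV covers the whole input, yet `is_exact_match 1001 2000 1 10000` fails. Under an “80% overlap” setting, a reference SV would qualify when `overlap_pct` prints 80 or more, as with `overlap_pct 1001 2000 1201 3000`.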

(a) A sub-menu for gnomAD SV allele frequencies in the Ensembl VEP job form that gives the option to select the minimum percentage of overlap needed to trigger annotation.
Figure (b) shows an example results page with annotated frequencies for different gnomAD genetic ancestry groups. The first column, “gnomAD SV”, gives the gnomAD identifier as a link to the corresponding variant page in the gnomAD browser.

(b) The Ensembl VEP results page shows the annotated frequencies for different gnomAD genetic ancestry groups and links out to the gnomAD browser.
ClinVar Clinical Significance
This option, named “Clinical Significance (SV)”, is available under the “Phenotype data and citations” section. Like the gnomAD SV allele frequencies option, it has a sub-menu for choosing the minimum percentage of overlap (Figure c).

(c) A sub-menu for Clinical Significance (SV) in the Ensembl VEP job form that gives the option to select the minimum percentage overlap.
The results page adds two columns: “ClinVar SV CLNSIG”, showing the clinical classification, and “ClinVar SV CLNACC”, showing the ClinVar variant record (VCV) identifier. The VCV identifier links to the corresponding variant page on the ClinVar website (Figure d).

(d) The Ensembl VEP results page displays the clinical classification (ClinVar SV CLNSIG) and the ClinVar variant record (VCV) identifier (ClinVar SV CLNACC).
Limitations
By default, Ensembl VEP processes SVs of up to 10 Mbp in size; larger variants cannot be annotated in the web interface.
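If an input VCF contains SVs above the 10 Mbp default limit, one option is to remove them before uploading. The bash sketch below is a hypothetical pre-filter (`filter_sv_size` is not part of Ensembl VEP); it assumes SV length can be approximated from the INFO `END` field as END minus POS, falling back to the absolute value of `SVLEN`, which may not match exactly how Ensembl VEP measures SV size.

```shell
#!/usr/bin/env bash
# Hypothetical pre-filter for the 10 Mbp default limit: print VCF header
# lines, then only records whose SV length is at most 10,000,000 bp.
# Length is approximated as INFO/END - POS when END is present, otherwise as
# |INFO/SVLEN|; records with neither (e.g. simple SNVs) are kept unchanged.
filter_sv_size() {
  awk -F'\t' -v max=10000000 '
    /^#/ { print; next }                     # keep all header lines
    {
      endpos = ""; svlen = ""
      n = split($8, info, ";")               # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^END=/)   endpos = substr(info[i], 5)
        if (info[i] ~ /^SVLEN=/) svlen  = substr(info[i], 7)
      }
      if (endpos != "")     len = endpos - $2   # column 2 is POS
      else if (svlen != "") { len = svlen + 0; if (len < 0) len = -len }
      else                  len = 0
      if (len <= max) print
    }' "$@"
}
```

It reads uncompressed VCF text from files or standard input, for example `filter_sv_size input.vcf > input.filtered.vcf` or `zcat input.vcf.gz | filter_sv_size > input.filtered.vcf`.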
Rendering of the “result preview table” can also be slow, and it may time out if processing the Ensembl VEP result file takes too long (you may see an Ajax error indicating that the page failed to load). Rendering time varies with the number, size and location of the input variants. For perspective, a ~5 Mbp deletion that overlaps ~5,000 features, with each feature overlapping ~30 gnomAD and ClinVar variants on average, can take about a minute to render.
The same functionality is available through the Ensembl VEP command line, which is better suited to larger workloads. For example, to add gnomAD frequencies, download the gnomAD SV VCF from the gnomAD website and pass it to the custom annotation option (--custom):
vep -i test.vcf --offline --cache --custom file="/path/to/gnomad.v4.1.sv.sites.vcf.gz",short_name=gnomad_sv,format=vcf,type=overlap,fields=AF%AF_afr%AF_ami%AF_amr%AF_asj%AF_eas%AF_fin%AF_mid%AF_nfe%AF_rmi%AF_sas
Make sure to also download the tabix index file (.tbi) provided with the VCF and keep it in the same directory as the VCF. Similarly, for ClinVar, download the VCF and its tabix index from the dbVar FTP site (study nstd102) and run:
vep -i test.vcf --offline --cache --custom file="/path/to/nstd102.GRCh38.variant_call.vcf.gz",short_name=clinvar_sv,format=vcf,type=overlap,fields=CLNSIG%CLNACC
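Because custom annotation reads these bgzipped VCFs through their tabix indexes, a missing index is an easy mistake to make. The bash sketch below is a hypothetical pre-flight check (`check_tabix_index` is not part of Ensembl VEP); it assumes the index follows the usual `<file>.vcf.gz.tbi` naming and sits next to the VCF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that each bgzipped VCF passed as an argument
# exists and has its tabix index (<file>.tbi) in the same directory.
# Returns non-zero, and reports every problem on stderr, if anything is missing.
check_tabix_index() {
  local status=0 vcf
  for vcf in "$@"; do
    if [[ ! -f "$vcf" ]]; then
      echo "missing VCF: $vcf" >&2
      status=1
    elif [[ ! -f "${vcf}.tbi" ]]; then
      echo "missing tabix index: ${vcf}.tbi" >&2
      status=1
    fi
  done
  return "$status"
}
```

It can guard a run, for example `check_tabix_index /path/to/nstd102.GRCh38.variant_call.vcf.gz && vep -i test.vcf ...`, so that Ensembl VEP is only started once both files are in place.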
Authored by Syed Nakib Hossain, Jamie Allen, Aleena Mushtaq
