We have received a number of queries about Ensembl VEP support for SARS-CoV-2. Here, we explain how you can use the command-line VEP to analyse your variants against the SARS-CoV-2 gene set from Ensembl.
This uses VEP's custom annotation support (through the -gff option) and the GFF3 file of SARS-CoV-2 gene annotation available on our FTP site. The GFF file needs to be sorted and indexed as described here. You also need a FASTA file of the genome sequence, which is also available from our FTP site. The FASTA file must be compressed with bgzip so that VEP can look up sequence quickly.
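A minimal sketch of that preparation, assuming both files were downloaded gzip-compressed from the FTP site under the names used in the example command below, and that bgzip and tabix from htslib are on your PATH. The function names are hypothetical. The sort order (sequence name, then start, then end) is what tabix needs in order to index the GFF.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for preparing the Ensembl SARS-CoV-2 files.
# Assumes gzip-compressed downloads and htslib's bgzip and tabix on the PATH.

# Print the GFF3 feature lines (header/comment lines starting with '#' removed),
# sorted by sequence name, then start, then end.
sort_gff() {
  gzip -dc "$1" | grep -v '^#' | sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

# Sort the GFF3, recompress it with bgzip under the same .gz name,
# build a tabix index (.tbi) and remove the temporary sorted copy.
prepare_gff() {
  sort_gff "$1" > "$1.sorted" &&
    bgzip -c "$1.sorted" > "$1" &&
    tabix -p gff "$1" &&
    rm "$1.sorted"
}

# Replace a plain gzip FASTA with a bgzip-compressed one of the same name.
prepare_fasta() {
  gzip -d "$1" && bgzip "${1%.gz}"
}

# Usage:
#   prepare_gff Sars_cov_2.ASM985889v3.100.primary_assembly.MN908947.3.gff3.gz
#   prepare_fasta Sars_cov_2.ASM985889v3.dna_sm.toplevel.fa.gz
```

Only lines starting with '#' are dropped, so feature lines are kept even if an attribute happens to contain a '#' character.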
In our example, the sequence name is the accession MN908947.3. If your variant file uses a different name for the sequence (such as NC_045512.2), you will need to provide a synonym file, such as chr_synonyms.txt in the example command below. It contains two tab-separated columns, one pair of names per line; in our example it looks like this:
MN908947.3 NC_045512.2
NC_045512.2 MN908947.3
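To see whether you need a synonym file at all, you can list the sequence names your VCF actually uses. The sketch below assumes an uncompressed VCF; vcf_seq_names is a hypothetical helper name. It then writes the example chr_synonyms.txt with explicit tab characters, listing the pair in both directions as above.

```shell
#!/bin/sh
# List the distinct sequence names (the CHROM column) in an uncompressed VCF,
# skipping header lines. If this prints NC_045512.2 rather than MN908947.3,
# a synonym file is needed.
vcf_seq_names() {
  grep -v '^#' "$1" | cut -f 1 | sort -u
}

# Write the synonym file: two tab-separated columns, one pair per line.
printf 'MN908947.3\tNC_045512.2\nNC_045512.2\tMN908947.3\n' > chr_synonyms.txt

# Usage (assumes your variants are in myvariants.vcf):
#   vcf_seq_names myvariants.vcf
```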
Now you are ready to run the VEP! Here is an example command using a VCF file with your variants as input:
vep -i myvariants.vcf -gff Sars_cov_2.ASM985889v3.100.primary_assembly.MN908947.3.gff3.gz -fasta Sars_cov_2.ASM985889v3.dna_sm.toplevel.fa.gz -synonyms chr_synonyms.txt
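Because the command above has no -o option, VEP writes its results to variant_effect_output.txt. As a quick first look, this sketch counts the consequence terms in that file. It assumes the default tab-delimited output, in which Consequence is the seventh column (check the #Uploaded_variation header line of your own output); count_consequences is a hypothetical helper name.

```shell
#!/bin/sh
# Count consequence terms in VEP's default tab-delimited output.
# Header lines start with '#'; a Consequence value may hold several
# comma-separated terms, so each term is counted separately.
count_consequences() {
  grep -v '^#' "${1:-variant_effect_output.txt}" |
    cut -f 7 |
    tr ',' '\n' |
    sort | uniq -c | sort -rn
}

# Usage:
#   count_consequences                      # reads variant_effect_output.txt
#   count_consequences my_other_output.txt
```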
If you want to compare your variants with other variant sets, or filter them against the list of problematic sites from De Maio et al., you can add the -custom option to the command above to annotate them, and then use the filter_vep tool for further filtering:
-custom reference.vcf.gz,,vcf,exact,0,
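VEP reads custom annotation files through tabix, so the reference VCF must be compressed with bgzip and indexed with tabix. A minimal sketch, assuming the reference set starts as an uncompressed reference.vcf and that bgzip and tabix are on your PATH; prepare_custom_vcf and annotate_with_reference are hypothetical names, and -o only names the output file.

```shell
#!/bin/sh
# Compress a VCF with bgzip (keeping the original) and build a tabix index.
prepare_custom_vcf() {
  bgzip -c "$1" > "$1.gz" && tabix -p vcf "$1.gz"
}

# The SARS-CoV-2 command from above with the reference set added as an
# exact-match custom annotation; $1 is the input VCF, $2 the output file.
annotate_with_reference() {
  vep -i "$1" -o "$2" \
    -gff Sars_cov_2.ASM985889v3.100.primary_assembly.MN908947.3.gff3.gz \
    -fasta Sars_cov_2.ASM985889v3.dna_sm.toplevel.fa.gz \
    -synonyms chr_synonyms.txt \
    -custom reference.vcf.gz,,vcf,exact,0,
}

# Usage:
#   prepare_custom_vcf reference.vcf
#   annotate_with_reference myvariants.vcf myvariants_annotated.txt
```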
Finally, if you work on a different virus or another organism that is not in Ensembl, keep in mind that you can still use the VEP, as described in an earlier blog post.