Update to the Ensembl COVID-19 resource

We are pleased to announce an updated release of the Ensembl SARS-CoV-2 genome browser, including new sequence variants generated from sequence data held in ENA and updated community annotation.

The Ensembl SARS-CoV-2 genome browser was launched in May 2020 to support the global work to develop treatments, diagnostics and vaccines in response to the COVID-19 pandemic. Since then, a huge global effort has produced an ever-increasing amount of data relating to the SARS-CoV-2 pathogen and Ensembl has been working to integrate this data into the SARS-CoV-2 genome browser.

The updated SARS-CoV-2 genome browser includes a brand new set of variation data derived from 4936 publicly available sequences by the ENA team, using a LoFreq-based pipeline. This is a preliminary analysis and has been filtered using basic quality control filters: samples with more than 40 calls were removed, as were variants where the alternate allele was supported by less than 20% of the reads in any sample or which showed strand bias in every sample. A more refined variant set derived from a larger number of samples will be available in the coming weeks. This data will be displayed in the gene tab and as a track in the location tab, alongside the NextStrain variation data that was available in our last release.
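For readers who want to apply comparable filters to their own call sets, the quality control rules described above can be sketched roughly as follows. This is an illustrative sketch only, not the ENA pipeline itself: the `Call` record, its field names and the `filter_variants` function are hypothetical, and the code assumes one particular reading of the rules (a variant is dropped if its alternate allele falls below 20% of reads in any retained sample, or if it is strand-biased in every retained sample where it was called).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Thresholds taken from the text above.
MAX_CALLS_PER_SAMPLE = 40
MIN_ALT_FRACTION = 0.20


@dataclass(frozen=True)
class Call:
    """One variant call in one sample (hypothetical record layout)."""
    sample: str
    variant: str          # any stable identifier, e.g. "23403:A>G"
    alt_fraction: float   # fraction of reads supporting the alternate allele
    strand_biased: bool   # whether the caller flagged strand bias


def filter_variants(calls):
    """Return the sorted identifiers of variants passing the basic QC filters."""
    # Sample-level filter: remove samples with more than 40 calls.
    calls_per_sample = Counter(c.sample for c in calls)
    kept = [c for c in calls
            if calls_per_sample[c.sample] <= MAX_CALLS_PER_SAMPLE]

    # Group the remaining calls by variant.
    by_variant = defaultdict(list)
    for c in kept:
        by_variant[c.variant].append(c)

    passing = []
    for variant, variant_calls in by_variant.items():
        # Assumed reading: low support in ANY sample removes the variant.
        low_support = any(c.alt_fraction < MIN_ALT_FRACTION
                          for c in variant_calls)
        # Strand bias removes the variant only if seen in EVERY sample.
        always_biased = all(c.strand_biased for c in variant_calls)
        if not (low_support or always_biased):
            passing.append(variant)
    return sorted(passing)
```

A list of `Call` records built from per-sample caller output can be passed straight to `filter_variants`. If the 20% rule is instead meant to remove only variants that lack sufficient support in all samples, the `any` in the low-support check would become `all`.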

We have also added the Ensembl VEP and BLAST web-based tools to allow you to analyse SARS-CoV-2 variation data and perform sequence-based searches in the SARS-CoV-2 browser.

The UCSC community annotation has also been updated: it now includes twice as many entries as in our last release and can be viewed as a track in the location tab.

The updated Ensembl SARS-CoV-2 genome browser will also include:

  • new InterProScan runs to predict protein and genomic features, including SARS-CoV-2-oriented annotations from Pfam as well as annotations from other protein annotation resources (including Superfamily, SMART and Gene3D)
  • updated external references
  • updated Gene Ontology (GO) import using dedicated SARS-CoV-2 GPAD files from GOA, with a significant increase in the number of annotated terms since the last release
  • fixes for minor bugs identified in our initial release, including removal of an erroneous 3’ UTR annotation in orf10

If you have any questions, please get in touch with the Ensembl Helpdesk.