Update to the Ensembl COVID-19 resource

The latest release of Ensembl’s COVID-19 site includes updated variation datasets, plus new alignments and gene trees with around 60 other viral genomes.

New variation data and identifiers

We have updated our variation database to include 14,806 variant loci from the first European Variation Archive (EVA) SARS-CoV-2 data release. Variant records from different submissions have been clustered by location and type, and stable RefSNP accessions have been assigned. For each variant, this release includes its location, alleles and RefSNP identifier.
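To illustrate what clustering by location and type means, here is a minimal Python sketch. It is not the EVA pipeline: the record schema, the function names and the `example_rs` accession prefix are all hypothetical, and the toy positions and alleles are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class SubmittedVariant(NamedTuple):
    """One variant record from a single submission (hypothetical schema)."""
    position: int       # 1-based coordinate on the reference genome
    variant_type: str   # e.g. "SNV", "deletion"
    ref: str            # reference allele
    alt: str            # alternate allele
    source: str         # label for the submitting study (made up here)


def cluster_variants(records):
    """Group records that share a position and variant type into one locus."""
    clusters = defaultdict(list)
    for record in records:
        clusters[(record.position, record.variant_type)].append(record)
    return dict(clusters)


def summarise_loci(clusters, prefix="example_rs"):
    """Give each locus a placeholder accession and collect its alleles."""
    loci = []
    for number, key in enumerate(sorted(clusters), start=1):
        position, variant_type = key
        members = clusters[key]
        alt_alleles = sorted({m.alt for m in members})
        loci.append({
            "accession": f"{prefix}{number}",  # placeholder, not a real RefSNP ID
            "position": position,
            "type": variant_type,
            "alleles": [members[0].ref] + alt_alleles,
        })
    return loci


# Toy input: positions and alleles are invented for illustration only.
records = [
    SubmittedVariant(100, "SNV", "C", "T", "study_a"),
    SubmittedVariant(100, "SNV", "C", "T", "study_b"),
    SubmittedVariant(100, "SNV", "C", "G", "study_c"),
    SubmittedVariant(250, "deletion", "ATG", "A", "study_a"),
]

loci = summarise_loci(cluster_variants(records))
for locus in loci:
    print(locus)
```

In this sketch, the three submissions at position 100 collapse into a single locus with two alternate alleles, while the deletion at position 250 forms its own locus.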

We continue to display variant data from ENA, Nextstrain and the COVID-19 Genomics UK Consortium. Where the same variant is present in multiple resources, the RefSNP identifier is used by default, but other names are available for searching and display.

New gene trees and alignments

We used Cactus to align SARS-CoV-2 and 60 publicly available virus genomes from the Orthocoronavirinae subfamily. As a result, 78% of the SARS-CoV-2 genome is aligned with at least one other genome, and 35% is aligned with the complete set of Orthocoronavirinae genomes.
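The two percentages above are coverage fractions over the SARS-CoV-2 genome. The Python sketch below shows how such figures could be computed, assuming we already have, for each reference position, a count of how many other genomes align there. It does not read Cactus output; the function name and the toy counts are illustrative assumptions.

```python
def alignment_coverage(aligned_counts, total_other_genomes):
    """Return (fraction aligned to >= 1 genome, fraction aligned to all genomes).

    aligned_counts[i] is the number of other genomes aligned at reference
    position i (a hypothetical per-position summary of the alignment).
    """
    genome_length = len(aligned_counts)
    if genome_length == 0:
        raise ValueError("aligned_counts must not be empty")
    with_any = sum(1 for count in aligned_counts if count >= 1)
    with_all = sum(1 for count in aligned_counts if count == total_other_genomes)
    return with_any / genome_length, with_all / genome_length


# Toy data: a 10-position "genome" aligned against 3 other genomes.
toy_counts = [0, 1, 3, 3, 2, 0, 3, 1, 0, 3]
any_fraction, all_fraction = alignment_coverage(toy_counts, total_other_genomes=3)
print(f"aligned with at least one genome: {any_fraction:.0%}")
print(f"aligned with every genome: {all_fraction:.0%}")
```

The second fraction can never exceed the first, since any position aligned with every genome is also aligned with at least one.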

We have also applied our gene tree methods to group the protein-coding genes into families and to predict orthologous and paralogous relationships between genes.