Ensembl launches COVID-19 resource

Today, Ensembl has joined the international scientific effort to tackle the COVID-19 pandemic. COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has spread rapidly since emerging in late 2019. Our SARS-CoV-2 genome browser and related resources at covid-19.ensembl.org are intended to support both basic research and ongoing work to develop treatments, diagnostics and vaccines.

This initial release of our new resource includes the following data:

  • Gene annotation of the reference genome ‘Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1’ (MN908947.3), produced with a modified Ensembl genebuild supported by protein evidence
  • Gene annotation from the Shanghai Public Health Clinical Center & School of Public Health, Fudan University, Shanghai, China, via ENA
  • Information on gene functions from Gene Ontology
  • Variation data from Nextstrain with consequence predictions and nucleotide frequencies for individual isolates, as well as grouped by clades and countries
  • Problematic variant sites as defined by De Maio et al.
  • Protein and genomic features from InterProScan, including SARS-CoV-2-specific annotations from Pfam as well as annotations from other protein annotation resources (such as Superfamily, SMART and Gene3D)
  • Links to RefSeq peptides, UniProt and INSDC proteins, and PDBe protein structures
  • Links to the genome sequence in ENA and to NCBI Gene
  • Alignments of Rfam covariance models to the genome
  • UCSC community annotation tracks displaying annotations contributed through a public spreadsheet, to which anyone can contribute freely

We will continue to add new data and expand this resource in future releases. As with all Ensembl data, we place no restrictions on the use of our COVID-19 resources. You can download sequences, images and tables via the browser, and you can add your own data by attaching custom tracks. Whole-genome databases can be downloaded from our FTP server.

In addition, the GENCODE project is updating the annotation of human protein-coding genes linked to COVID-19; please see this blog post for details, including how to access the data in Ensembl.

Our new COVID-19 resource is part of a wider effort at EMBL-EBI to advance SARS-CoV-2 research through open data sharing via the COVID-19 Data Portal.

The Ensembl project was launched in 1999 to annotate the human genome. It has since broadened its scope into a comprehensive genomics resource that now includes model organisms and many other vertebrates, as well as bacteria, protists, invertebrate metazoa, plants and fungi, available across the Ensembl and Ensembl Genomes websites. SARS-CoV-2 is the first virus we have added to our growing collection of more than 45,000 genomes. We hope this resource will be useful, and we welcome your thoughts on it. Please email us to give feedback.