Ensembl Bacteria updates

The genome annotation data available in Ensembl Bacteria was extensively updated in Ensembl 102 / Ensembl Genomes 49.

Prior to release 49, Ensembl Bacteria had 44,048 genomes, many of which were redundant with each other. To help with scalability, we filtered redundant proteomes following UniProt criteria, reducing the total number of bacterial genomes to 31,332 in release 49 (12,716 fewer, a reduction of about 29%).

Find out more about UniProt’s procedure for eliminating redundant genomes (DOI:10.1093/database/baw139). We used the redundancy definitions in UniProt version 2015_04.

Please note that all redundant genomes we held before this release will remain available on our archive sites.

All species.production_name values now have their GCA accession number appended (“_gca_0000XXXX”).
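Scripts that match species by their old production name may need to separate the new suffix from the base name. The following is a minimal sketch, assuming the suffix is always "_gca_" followed by digits at the very end of the name, as the “_gca_0000XXXX” pattern above suggests. The function name and the example species name are hypothetical and used only for illustration.

```python
import re

# Assumption: the appended suffix is "_gca_" followed only by digits,
# and it sits at the very end of the production name.
_GCA_SUFFIX = re.compile(r"_gca_(\d+)$")


def split_production_name(name):
    """Split a production name into (base_name, accession_digits).

    Returns (name, None) when no GCA-style suffix is found.
    """
    match = _GCA_SUFFIX.search(name)
    if match is None:
        return name, None
    return name[: match.start()], match.group(1)


# Hypothetical name, not a real Ensembl Bacteria entry.
base, digits = split_production_name("example_species_str_1_gca_000012345")
print(base, digits)
```

Names without the suffix are returned unchanged, so the same function can be applied to a mix of pre-release-49 and release-49 names.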

Updates to Pan Compara bacterial species

Following the withdrawal of redundant species, 15 species are no longer present in Pan Compara. These species will be replaced in a future release:

  • Streptomyces coelicolor a3 2
  • Shewanella oneidensis mr 1
  • Salmonella enterica subsp enterica serovar typhimurium str lt2
  • Pseudomonas aeruginosa mpao1 p2
  • Prevotella intermedia 17
  • Mycobacterium tuberculosis h37rv
  • Moraxella catarrhalis 7169
  • Methanobacterium formicicum dsm 3637
  • Mannheimia haemolytica serotype a2 str ovine
  • Legionella pneumophila str paris
  • Corynebacterium glutamicum atcc 13032
  • Clostridium botulinum a str hall
  • Citrobacter freundii 4 7 47cfaa
  • Bordetella pertussis tohama i
  • Aggregatibacter actinomycetemcomitans d11s 1

In addition, 6 species were renamed (not counting the appended GCA accession):

  • ‘Vibrio fischeri es114’ is now ‘Aliivibrio fischeri es114’
  • ‘Sulfolobus solfataricus p2’ is now ‘Saccharolobus solfataricus p2’
  • ‘Propionibacterium acnes kpa171202’ is now ‘Cutibacterium acnes kpa171202’
  • Two strains of ‘Chlorobium tepidum tls’ are now ‘Chlorobaculum tepidum tls’
  • ‘Borrelia burgdorferi b31’ is now ‘Borreliella burgdorferi b31’

Updated data

PHI-base annotation (version 4-8_2019-09-16) has been added to this release. 1,163 bacterial genomes have been annotated, with a total of 36,737 genes and 44,106 translations having PHI-base external references. Of those, about 68% are proteins reported by PHI-base and directly matched by Ensembl, and about 32% are extrapolated sequences with a 100% identity match in taxonomic substrains of the originally reported species.

Covariance models from Rfam (version 12.2) have been aligned to bacterial genomes to find and annotate sequences homologous to known non-coding RNAs. The covariance models are taxonomically filtered before alignment, so that structural RNA features that have never been annotated in bacterial species are not inappropriately aligned. The alignment is performed with cmscan, from the Infernal software suite.
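The two steps described above, filtering models by taxonomy and then aligning with cmscan, can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used for the release: the dictionary layout, the family identifiers, the file names and both function names are hypothetical. The only Infernal features relied on are the `cmscan [options] <cmdb> <seqfile>` calling convention and its `--tblout` option.

```python
def filter_models_for_bacteria(model_domains):
    """Return the families that have previously been annotated in bacteria.

    model_domains is a hypothetical mapping from a family identifier to the
    set of taxonomic domains in which that family has been annotated.
    """
    return sorted(
        family for family, domains in model_domains.items()
        if "Bacteria" in domains
    )


def build_cmscan_command(cm_database, genome_fasta, tblout_path):
    """Build (but do not run) a cmscan command line as an argument list.

    cm_database is assumed to contain only the filtered models and to have
    been prepared for cmscan already.
    """
    return ["cmscan", "--tblout", tblout_path, cm_database, genome_fasta]


# Placeholder identifiers, not real Rfam accessions.
families = {
    "FAMILY_A": {"Bacteria", "Archaea"},
    "FAMILY_B": {"Eukaryota"},
    "FAMILY_C": {"Bacteria"},
}
kept = filter_models_for_bacteria(families)
command = build_cmscan_command("bacteria_models.cm", "genome.fa", "hits.tbl")
print(kept)
print(" ".join(command))
```

On a machine with Infernal installed, the returned list could be passed to `subprocess.run(command, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with unusual file names.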

The Rfam alignments are visible as a browser track (named ‘Rfam models’); selecting an alignment in the genome browser displays metadata such as the Rfam description and the secondary structure. We have been particularly interested in the alignments to riboswitch families, both to gain insights into the factors that regulate downstream protein-coding genes and to locate gene annotations that may be missing.