What’s coming in Ensembl release 107 / Ensembl Genomes 54?

Ensembl 107 and Ensembl Genomes 54 are expected in June 2022. Check out what we’re up to, although we can’t guarantee everything listed here will make it into the final release.

Major data updates for Human

Human chromosome 21 region artifactual duplication: The genes within the artifactual region in human chromosome 21 will be updated to a biotype of “Artifact”, with only one transcript per locus. The Gene tab will describe these genes as Artifactual duplication. Where relevant, the pages will give information on the real copy of the gene.

Removal of the coding biotype for 49 polymorphic pseudogenes: Polymorphic pseudogenes in humans are coding genes that are non-coding in the reference genome due to a SNP or indel but are coding in other individuals. There are 49 polymorphic pseudogenes on GRCh38.

At the transcript level, the polymorphic pseudogene biotype will be replaced with protein_coding_LoF (loss of function). At the gene level, it will be changed to protein-coding.
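If you have scripts that filter on biotype, the change could be mirrored with a small lookup like the sketch below. The function name and the exact biotype strings (`polymorphic_pseudogene`, `protein_coding`) are assumptions for illustration; only `protein_coding_LoF` is spelled out above, so please check the values against the released data.

```python
# Hypothetical helper: the biotype strings below are assumptions, not confirmed values.
OLD_BIOTYPE = "polymorphic_pseudogene"

# New biotype depends on whether the record is a gene or a transcript.
NEW_BIOTYPE_BY_LEVEL = {
    "gene": "protein_coding",
    "transcript": "protein_coding_LoF",
}


def updated_biotype(biotype: str, level: str) -> str:
    """Return the biotype expected from release 107 for a gene or transcript record."""
    if level not in NEW_BIOTYPE_BY_LEVEL:
        raise ValueError(f"level must be 'gene' or 'transcript', got {level!r}")
    if biotype == OLD_BIOTYPE:
        return NEW_BIOTYPE_BY_LEVEL[level]
    # All other biotypes are left untouched by this sketch.
    return biotype
```

A filter that previously matched the old biotype at the transcript level would then need to match the value returned by `updated_biotype(OLD_BIOTYPE, "transcript")` instead.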

Simplified regulatory annotation 

Streamlined regulatory annotation track: Currently, overlapping regulatory features are bumped to the next row, regardless of type. From release 107, regulatory features of the same type will be displayed on the same row. For example, promoters will always be on the top row of the regulatory build track, with enhancers and open chromatin regions on the second.
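To illustrate the idea, here is a minimal sketch of type-based row assignment. It is not the browser's drawing code: the function name, the feature tuples and the type labels are hypothetical, and placing any other feature type on its own row further down is an assumption, since only the first two rows are described above.

```python
from collections import defaultdict

# Rows described in the text: promoters on top, enhancers and open chromatin second.
# The type labels themselves are assumed for illustration.
ROW_BY_TYPE = {
    "promoter": 0,
    "enhancer": 1,
    "open_chromatin_region": 1,
}


def assign_rows(features):
    """Group (type, start, end) features into rows by type, ignoring overlaps."""
    rows = defaultdict(list)
    first_free_row = max(ROW_BY_TYPE.values()) + 1
    extra_rows = {}  # assumption: each unlisted type gets its own later row
    for feature in features:
        feature_type = feature[0]
        if feature_type in ROW_BY_TYPE:
            row = ROW_BY_TYPE[feature_type]
        else:
            row = extra_rows.setdefault(feature_type, first_free_row + len(extra_rows))
        rows[row].append(feature)
    # Sort each row by start coordinate for display.
    return {row: sorted(items, key=lambda f: f[1]) for row, items in sorted(rows.items())}
```

Under the current behaviour, two overlapping promoters would land on different rows; in this sketch they always share row 0, whatever else they overlap.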

Reduced regulatory feature types: We will remove the standalone promoter-flanking feature type. These features are not strictly adjacent to promoters (and are identified similarly to enhancers), so to avoid confusion we will fold this category into enhancers.

More focused motif features: We currently determine motif features with a relatively permissive threshold. Therefore, to keep the results more focused, we will only display motif features that overlap a ChIP-seq peak in at least one cell type (i.e., validated motif features). 
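The filtering rule can be sketched as a simple interval-overlap check. This is an illustration under stated assumptions rather than our production pipeline: the function names and data layout are hypothetical, coordinates are treated as closed intervals, and the sketch does not check which transcription factor a peak belongs to.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def validated_motifs(motifs, peaks_by_cell_type):
    """Keep motifs that overlap a ChIP-seq peak in at least one cell type.

    motifs: list of (chromosome, start, end) tuples.
    peaks_by_cell_type: dict mapping a cell type name to a list of
        (chromosome, start, end) peak tuples.
    """
    kept = []
    for motif in motifs:
        chrom, start, end = motif
        supported = any(
            peak_chrom == chrom and overlaps(start, end, peak_start, peak_end)
            for peaks in peaks_by_cell_type.values()
            for peak_chrom, peak_start, peak_end in peaks
        )
        if supported:
            kept.append(motif)
    return kept
```

A single supporting peak in any one cell type is enough for a motif to be kept; motifs with no overlapping peak in any cell type are dropped.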

New Assemblies and/or Annotation

Vertebrates

  • Reannotation of the reference assembly for pig (Sscrofa11.1)
  • Reannotation of chicken assembly (GRCg6a)
  • Annotation of 2 new GRC chicken assemblies (GRCg7w and GRCg7b)
  • Chicken reference will be changed from GRCg6a to GRCg7b
  • Assembly and gene set will be updated for tropical clawed frog (Xenopus tropicalis): from Xenopus tropicalis v9.1 to UCB Xtro 10.0

Metazoa:

New assemblies and annotation linked to Infravec2

  • Anopheles atroparvus (GCA_914969975.1). Please note that this assembly will replace ‘AatrE3’ (GCA_000473505.1) in the future. AatrE3 is due to be removed in the Ensembl Genomes 56 / Ensembl 109 release.
  • Phlebotomus perniciosus 

More new assemblies and annotation

  • Agrilus planipennis 
  • Anneissia japonica
  • Aphidius gifuensis
  • Aplysia californica
  • Athalia rosae
  • Bactrocera dorsalis
  • Bactrocera latifrons
  • Bactrocera tryoni
  • Centruroides sculpturatus
  • Ceratitis capitata
  • Copidosoma floridanum
  • Cotesia glomerata
  • Crassostrea gigas – in collaboration with The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh
  • Crassostrea virginica
  • Dendroctonus ponderosae
  • Dendronephthya gigantea
  • Dermatophagoides pteronyssinus
  • Diabrotica virgifera
  • Diuraphis noxia
  • Gigantopelta aegis
  • Hermetia illucens
  • Hydra vulgaris
  • Leptinotarsa decemlineata
  • Limulus polyphemus
  • Lytechinus variegatus
  • Metaseiulus occidentalis
  • Monomorium pharaonis
  • Onthophagus taurus
  • Ooceraea biroi
  • Orbicella faveolata
  • Orussus abietinus
  • Parasteatoda tepidariorum
  • Penaeus monodon
  • Pocillopora damicornis
  • Pomacea canaliculata
  • Priapulus caudatus
  • Rhagoletis pomonella
  • Rhopalosiphum maidis
  • Saccoglossus kowalevskii
  • Sipha flava
  • Stylophora pistillata
  • Trichogramma pretiosum

Plants:

New species 

  • Echinochloa crusgalli (Barnyard grass)
  • Digitaria exilis (White fonio)
  • Vigna unguiculata (Cowpea)

Other updates and changes

  • New population frequencies from the gnomAD 3.1.2 genomes collection will be available in the Ensembl VEP cache and via all interfaces. Data from NHLBI GO-ESP will no longer be supported as this is part of the gnomAD exome data available in Ensembl VEP.
  • IntAct data will be reported via the Ensembl VEP web and REST interfaces to highlight when a variant lies in an experimentally derived protein interaction site.
  • Information on the function of the gene in which a variant lies will be available via the Ensembl VEP web and REST interfaces.
  • Allele frequency data from EVA will be displayed for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
  • We will change the default structural variant track displayed in the GRCh38 human genome browser from the 1000 Genomes Project data (SV – 1KG 3 – All) to the newer and denser gnomAD data (SV – gnomAD – SV (dbVar study nst166)). The option to view 1000 Genomes Project data will remain.
  • Changes to some default tracks in Location view: We will remove the %GC track in all species. We will add: MANE Select, MANE Plus Clinical, Constrained elements.
  • Readthrough gene attributes will be added to protein-coding genes in human and mouse.
  • We will add labels to the icons for image configuration on the Location view. 
  • We will retire the Ensembl release 88 and 89 archives.
  • There will be new ATAC-seq tracks (peaks and signal) for the previous and new chicken reference genomes (GRCg6a and GRCg7b). These can be selected under Features by Cell/Tissue in the Regulation section of Configure this page.
  • Removing Regulatory Evidence from BioMart: We will be changing the way we store peaks to support future improvements in Ensembl Regulation. As a consequence, the Human and Mouse Regulatory Evidence datasets in the Regulation BioMart will be retired.