Ensembl 107 has been released!

Ensembl 107 and Ensembl Genomes 54 have now been released. Check out our site for exciting new updates to regulatory annotations, updated vertebrate genomes and lots of new metazoan species.

Major data updates for Human

Human chromosome 21 region artifactual duplication: The genes within the artifactual region in human chromosome 21 have been updated to a biotype of “Artifact”, with only one transcript per locus. The Gene tab for these genes describes them as Artifactual duplication. Where relevant, the pages give information on the real copy of the gene.

Removal of the coding biotype for 49 polymorphic pseudogenes: Polymorphic pseudogenes in humans are genes that are non-coding in the reference genome because of a SNP or indel but are coding in other individuals. There are 49 polymorphic pseudogenes on GRCh38.

At the transcript level, the polymorphic pseudogene biotype has been replaced with protein_coding_LoF (loss of function). At the gene level, the polymorphic pseudogene biotype has been replaced with protein-coding.

Simplified regulatory annotation 

Streamlined regulatory annotation track: Previously, overlapping regulatory features were bumped to the next row, regardless of type. In release 107, regulatory features of the same type are now displayed on the same row. 
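
To make the difference between the two layouts concrete, here is a minimal sketch. It is not the browser's drawing code: the function names, the (start, end, type) tuples and the coordinates are all made up for illustration.

```python
def rows_bumped(features):
    """Previous behaviour (sketch): a feature that overlaps the last feature
    in a row is bumped to the next row, regardless of its type."""
    row_ends = []  # end coordinate of the last feature placed in each row
    placed = []
    for start, end, ftype in sorted(features):
        for row, last_end in enumerate(row_ends):
            if start > last_end:  # no overlap with this row, reuse it
                row_ends[row] = end
                break
        else:  # overlaps every existing row, so open a new one
            row = len(row_ends)
            row_ends.append(end)
        placed.append((start, end, ftype, row))
    return placed


def rows_by_type(features):
    """Release 107 behaviour (sketch): features of the same type share a row."""
    row_of_type = {}
    placed = []
    for start, end, ftype in sorted(features):
        row = row_of_type.setdefault(ftype, len(row_of_type))
        placed.append((start, end, ftype, row))
    return placed


# Hypothetical features: two overlapping promoters and one enhancer.
features = [(100, 200, "promoter"), (150, 300, "promoter"), (250, 400, "enhancer")]
print(rows_bumped(features))   # promoters end up on different rows
print(rows_by_type(features))  # both promoters on one row, enhancer on another
```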

Reduced regulatory feature types: We have removed the standalone promoter-flanking feature type. These features are not strictly adjacent to promoters (and are identified similarly to enhancers), so to avoid confusion, this category has been folded into enhancers.
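
If you have scripts that group regulatory features by type, the change amounts to treating the old promoter-flanking category as enhancers. The sketch below uses hypothetical label strings (`promoter_flanking_region`, `enhancer`); check the labels in your own data before reusing it.

```python
from collections import Counter

# Hypothetical labels, chosen for illustration only.
FOLDED_INTO = {"promoter_flanking_region": "enhancer"}


def fold_feature_type(feature_type):
    """Map a retired feature type onto the category it was folded into."""
    return FOLDED_INTO.get(feature_type, feature_type)


def count_by_type(feature_types):
    """Count features per type after folding retired categories."""
    return Counter(fold_feature_type(t) for t in feature_types)


print(count_by_type(["promoter", "promoter_flanking_region", "enhancer"]))
```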

More focused motif features: We previously determined motif features with a relatively permissive threshold. We now display only motif features that overlap a ChIP-seq peak in at least one cell type, in order to keep the results more focused (i.e., validated motif features).
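
The filtering rule can be sketched as a simple interval-overlap test. This is only an illustration of the idea, not the Ensembl Regulation pipeline: the function names, the (chromosome, start, end) tuples and the cell type names are assumptions, coordinates are treated as closed intervals, and the sketch does not check that the peak belongs to the transcription factor matching the motif.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def validated_motifs(motifs, peaks_by_cell_type):
    """Keep motif features that overlap a peak in at least one cell type.

    motifs: list of (chrom, start, end)
    peaks_by_cell_type: dict of cell type -> list of (chrom, start, end)
    """
    kept = []
    for chrom, start, end in motifs:
        if any(
            p_chrom == chrom and overlaps(start, end, p_start, p_end)
            for peaks in peaks_by_cell_type.values()
            for p_chrom, p_start, p_end in peaks
        ):
            kept.append((chrom, start, end))
    return kept


# Hypothetical data: only the first motif sits under a peak.
motifs = [("1", 100, 110), ("1", 500, 510), ("2", 100, 110)]
peaks = {"K562": [("1", 90, 120)], "HepG2": [("2", 300, 400)]}
print(validated_motifs(motifs, peaks))
```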

New Assemblies and/or Annotation

Vertebrates:

  • Reannotation of the reference assembly for pig (Sscrofa11.1)
  • Reannotation of chicken assembly (GRCg6a)
  • Annotation of 2 new GRC chicken assemblies (GRCg7w and GRCg7b)
  • Chicken reference has changed from GRCg6a to GRCg7b
  • Assembly and gene set have been updated for tropical clawed frog (Xenopus tropicalis): from Xenopus tropicalis v9.1 to UCB Xtro 10.0

Metazoa:

New assemblies and annotation linked to Infravec2

Anopheles atroparvus (GCA_914969975.1). Please note that this assembly will replace ‘AatrE3’ (GCA_000473505.1) in the future. AatrE3 is to be removed in the future Ensembl Genomes 56/Ensembl 109 release.

Phlebotomus perniciosus 

More new assemblies and annotation

Agrilus planipennis 

Anneissia japonica

Aphidius gifuensis

Aplysia californica

Athalia rosae

Bactrocera dorsalis

Bactrocera latifrons

Bactrocera tryoni

Centruroides sculpturatus

Ceratitis capitata

Copidosoma floridanum

Cotesia glomerata

Crassostrea gigas – in collaboration with The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh

Crassostrea virginica

Dendroctonus ponderosae

Dendronephthya gigantea

Dermatophagoides pteronyssinus

Diabrotica virgifera

Diuraphis noxia

Gigantopelta aegis

Hermetia illucens

Hydra vulgaris

Leptinotarsa decemlineata

Limulus polyphemus

Lytechinus variegatus

Metaseiulus occidentalis

Monomorium pharaonis

Onthophagus taurus

Ooceraea biroi

Orbicella faveolata

Orussus abietinus

Parasteatoda tepidariorum

Penaeus monodon

Pocillopora damicornis

Pomacea canaliculata

Priapulus caudatus

Rhagoletis pomonella

Rhopalosiphum maidis

Saccoglossus kowalevskii

Sipha flava

Stylophora pistillata

Trichogramma pretiosum

Plants:

New species 

Echinochloa crusgalli (Barnyard grass)

Digitaria exilis (White fonio)

Vigna unguiculata (Cowpea)

Other updates and changes

  • New population frequencies from the gnomAD 3.1.2 genomes collection are available in the Ensembl VEP cache (cache file size increases ~30-50% for GRCh38) and via all interfaces. Data from NHLBI GO-ESP is no longer supported as this is part of the gnomAD exome data available in Ensembl VEP.
  • IntAct data will now be reported via the Ensembl VEP web and REST interfaces to highlight when a variant lies in an experimentally derived protein interaction site. 
  • Gene Ontology terms describing the function of the gene in which a variant lies are now available via the Ensembl VEP web and REST interfaces.
  • A new option is available on the GRCh38 VEP web tool, which incorporates predictions of variant pathogenicity from EVE.
  • Allele frequency data from EVA is now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
  • We have changed the default structural variant track displayed in the GRCh38 human genome browser from the 1000 Genomes Project data (SV – 1KG 3 – All) to the newer and denser gnomAD data (SV – gnomAD – SV (dbVar study nst166)). The option to view 1000 Genomes Project data will remain.
  • Changes to some default tracks in Location view: We have removed the %GC track in all species and added MANE Select, MANE Plus Clinical and Constrained elements.
  • Readthrough gene attributes have been added to protein-coding genes in human and mouse.
  • Labels have been added to the icons for image configuration on the Location view. 
  • Ensembl release 88 and 89 archives have been retired.
  • We have new ATAC-seq tracks (peaks and signal) for the previous and new chicken reference genomes (GRCg6a and GRCg7b). These can be selected under Features by Cell/Tissue in the Regulation section of Configure this page.
  • Regulatory Evidence has been removed from BioMart. We have changed the way we store peaks to support future improvements in Ensembl Regulation. As a consequence, the Human and Mouse Regulatory Evidence datasets in the Regulation BioMart have been retired.
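
For the IntAct and Gene Ontology items in the list above, here is a rough sketch of how a request to the Ensembl VEP REST interface might be put together. The server address and the `vep/:species/hgvs/:hgvs_notation` path follow the public REST service, but the query parameter names `GO` and `IntAct` are our assumption of how these options are switched on, and the helper function and the HGVS string are purely illustrative. Please check the REST documentation before relying on them. The code only builds the URL and does not contact the server.

```python
from urllib.parse import quote, urlencode

REST_SERVER = "https://rest.ensembl.org"


def vep_hgvs_url(hgvs, species="human", go=True, intact=True):
    """Build a VEP REST URL for one HGVS notation (hypothetical helper).

    The GO and IntAct parameter names are assumptions; see the lead-in.
    """
    params = {"content-type": "application/json"}
    if go:
        params["GO"] = 1      # assumed switch for Gene Ontology terms
    if intact:
        params["IntAct"] = 1  # assumed switch for IntAct interaction data
    path = f"/vep/{quote(species)}/hgvs/{quote(hgvs, safe='')}"
    return f"{REST_SERVER}{path}?{urlencode(params)}"


# Illustrative HGVS notation only.
print(vep_hgvs_url("ENST00000366667:c.803C>T"))
```

The resulting URL can be passed to any HTTP client; the response format is set by the content-type parameter.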