What’s coming in Ensembl 96 / Ensembl Genomes 43

We are planning to release Ensembl 96 and Ensembl Genomes 43 in late March or early April 2019.

The Ensembl 96 release includes the first-pass full annotation of the mouse genome, with the GENCODE M21 gene set.

The Ensembl Genomes 43 release will bring changes to our REST API and FTP server that may affect your pipelines. Specifically, we will merge our Ensembl and Ensembl Genomes REST servers into a single server. We will also change the Ensembl Genomes Comparative Genomics FTP file structure to make it consistent with Ensembl.

We have lots of new genomes: 19 birds, five reptiles and 12 mammals, including primates, rodents, American mink, American bison and wild yak.

We also have an exciting first release of Ensembl-RefSeq MANE Select v0.5 transcripts!

New GENCODE gene sets

This release includes the first-pass full annotation of the mouse genome, delivered as an update of the mouse GENCODE gene set to GENCODE M21. The human GENCODE gene set will be updated to GENCODE 30.

Release of Ensembl-RefSeq MANE Select v0.5 Transcripts

The MANE project (Matched Annotation from NCBI and EMBL-EBI) is a collaboration between Ensembl-GENCODE and RefSeq to select a default transcript per human protein-coding locus that is representative of biology, well-supported, expressed and conserved. These transcripts match GRCh38 and are 100% identical between Ensembl-GENCODE and RefSeq for 5’ UTR, CDS, splicing and 3’ UTR. This is a beta release of the MANE Select v0.5 transcripts for 53% of human protein-coding genes. Along with the release, we plan a human RefSeq annotation update, for GRCh38 only, with a set of updated GFF3 files that replace the existing refseq_import data in the other features database.

Changes to Ensembl Genomes REST and FTP

In a move towards combining the databases for Ensembl and Ensembl Genomes, we will make some changes to the REST API and FTP site for Ensembl Genomes:

  • A combined REST server for Ensembl and Ensembl Genomes. For Comparative Genomics REST API endpoints, you will need to specify which division you wish to query. If no compara argument is provided, the vertebrates division database will be used by default.
  • There will be changes to the Ensembl Genomes Comparative Genomics FTP file structure. For instance, /pub/release-xx/plants/maf files will be moved to /pub/release-xx/plants/maf/ensembl-compara/pairwise_alignments. The Comparative Genomics FTP file structures will change in a similar fashion for fungi, protists and metazoa. We will provide symlinks from the previous FTP folders for this release and will remove them in Ensembl 97.
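Pipelines that call Comparative Genomics endpoints on the combined server will need to name the division explicitly. A minimal Python sketch of building such URLs follows. It assumes the combined server is reachable at https://rest.ensembl.org and that the division goes in a compara query parameter whose values match the division names used in this post; compara_url and the endpoint path in the example are hypothetical, so check the REST documentation for the real endpoints and accepted values.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the combined Ensembl / Ensembl Genomes REST server.
REST_SERVER = "https://rest.ensembl.org"

# Division names as used in this post; the server defaults to "vertebrates".
DIVISIONS = {"vertebrates", "plants", "fungi", "protists", "metazoa"}


def compara_url(endpoint: str, division: Optional[str] = None, **params: str) -> str:
    """Build a URL for a Comparative Genomics endpoint (hypothetical helper).

    Passing `division` adds the `compara` query parameter; leaving it out
    relies on the server default (the vertebrates division database).
    """
    query = dict(params)
    if division is not None:
        if division not in DIVISIONS:
            raise ValueError(f"unknown division: {division!r}")
        query["compara"] = division
    url = f"{REST_SERVER}/{endpoint.lstrip('/')}"
    return f"{url}?{urlencode(query)}" if query else url


# Illustrative endpoint path only; not a verified REST route.
print(compara_url("homology/id/SOME_GENE_ID", "plants"))
# https://rest.ensembl.org/homology/id/SOME_GENE_ID?compara=plants
```

Making the division an explicit argument, rather than relying on the default, keeps a non-vertebrate pipeline from silently querying the vertebrates database.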

A detailed blog post, summarising these changes and how they might affect your pipelines, will be published later.
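Scripts that read the pairwise alignment MAF files will need to follow the FTP directory move described above. The Python sketch below rewrites an old division MAF directory into the new location. new_maf_dir is a hypothetical helper, and applying the plants pattern to fungi, protists and metazoa is an assumption, since only the plants case is spelled out here.

```python
EG_DIVISIONS = ("plants", "fungi", "protists", "metazoa")


def new_maf_dir(old_dir: str) -> str:
    """Map /pub/release-NN/<division>/maf to its new Compara location.

    Hypothetical helper. Only the plants example is given in the post;
    using the same pattern for the other divisions is an assumption.
    """
    old_dir = old_dir.rstrip("/")
    parts = old_dir.split("/")
    # Expected shape: ["", "pub", "release-NN", "<division>", "maf"]
    if (
        len(parts) != 5
        or parts[:2] != ["", "pub"]
        or not parts[2].startswith("release-")
        or parts[3] not in EG_DIVISIONS
        or parts[4] != "maf"
    ):
        raise ValueError(f"not an Ensembl Genomes MAF directory: {old_dir!r}")
    return f"{old_dir}/ensembl-compara/pairwise_alignments"


print(new_maf_dir("/pub/release-43/plants/maf"))
# /pub/release-43/plants/maf/ensembl-compara/pairwise_alignments
```

Rejecting paths that do not match the expected shape, instead of rewriting them blindly, makes it easier to spot pipeline paths that need manual review once the symlinks are removed.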

New Genomes

Birds:

  • Coturnix japonica (Japanese quail)
  • Numida meleagris (Helmeted guineafowl)
  • Parus major (Great tit)
  • Manacus vitellinus (Golden-collared manakin)
  • Calidris pygmaea (Spoon-billed sandpiper)
  • Dromaius novaehollandiae (Emu)
  • Lepidothrix coronata (Blue-crowned manakin)
  • Apteryx owenii (Little spotted kiwi)
  • Apteryx rowi (Okarito brown kiwi)
  • Apteryx haastii (Great spotted kiwi)
  • Zonotrichia albicollis (White-throated sparrow)
  • Calidris pugnax (Ruff)
  • Cyanistes caeruleus (Blue tit)
  • Lonchura striata domestica (Bengalese finch)
  • Anser brachyrhynchus (Pink-footed goose)
  • Nothoprocta perdicaria (Chilean tinamou)
  • Junco hyemalis (Dark-eyed junco)
  • Melopsittacus undulatus (Budgerigar)
  • Serinus canaria (Common canary)

Reptiles:

  • Salvator merianae (Argentine black and white tegu)
  • Crocodylus porosus (Australian saltwater crocodile)
  • Pogona vitticeps (Central bearded dragon)
  • Notechis scutatus (Mainland tiger snake)
  • Chelonoidis abingdonii (Abingdon island giant tortoise)

Primates:

  • Theropithecus gelada (Gelada)
  • Piliocolobus tephrosceles (Ugandan red colobus)
  • Prolemur simus (Greater bamboo lemur)

Rodents:

  • Castor canadensis (American beaver)
  • Urocitellus parryii (Arctic ground squirrel)
  • Marmota marmota marmota (Alpine marmot)
  • Meriones unguiculatus (Mongolian gerbil)
  • Spermophilus dauricus (Daurian ground squirrel)
  • Mus spicilegus (Steppe mouse)

Other mammals:

  • Neovison vison (American mink)
  • Bos mutus (Wild yak)
  • Bison bison bison (American bison)

New Assemblies and Annotation

  • Phascolarctos cinereus (Koala, phaCin_unsw_v4.1)
  • Cricetulus griseus (Chinese hamster, CriGri-PICR)
  • Peromyscus maniculatus bairdii (Northern American deer mouse, HU_Pman_2.1)
  • Anas platyrhynchos platyrhynchos (Common mallard, CAU_duck1.0)
  • Actinidia chinensis (Kiwifruit, GCA_003024255.1)
  • Panicum hallii (Hall’s panicgrass, ecotypes HAL2 and FIL2, GCA_003061485.1 and GCA_002211085.2, respectively)

Other Updates and Highlights

  • Additional phenotype annotations will be provided via our VEP web interface and REST service. Our new view showing the location of a variant on any relevant 3D protein structure will also be available via web VEP.
  • Variant Recoder will support SPDI genomic format
  • We will provide GERP and CADD scores on our variant pages
  • Variation tracks will be available for Chlorocebus sabaeus (Vervet)
  • Discontinuation of Drosophila melanogaster (Fruitfly) variation database in Ensembl Metazoa
  • New additions to the Ensembl Metazoa Compara database (the springtails Orchesella cincta and Folsomia candida, and the biting midge Culicoides sonorensis)
  • Polyploid view for Triticum dicoccoides (Emmer Zavitan wheat)
  • New variation data from CerealsDB (Axiom 35K)
  • New interface for configuration of Regulation tracks
  • New probe mapping data for ten species: Anas platyrhynchos platyrhynchos (Common mallard), Cricetulus griseus (Chinese hamster, CriGri-PICR), Cyprinodon variegatus (Sheepshead minnow), Equus caballus (Horse), Fundulus heteroclitus (Mummichog), Ictalurus punctatus (Channel catfish), Piliocolobus tephrosceles (Ugandan red colobus), Prolemur simus (Greater bamboo lemur), Scophthalmus maximus (Turbot), Theropithecus gelada (Gelada)
  • Probe mapping rerun for Homo sapiens (Human), Mus musculus (Mouse), Bos taurus (Cow) and Canis lupus familiaris (Dog)
  • Update of miRNA targets for Homo sapiens (Human) and Mus musculus (Mouse)
  • Updated gene annotation for Oryza sativa (Rice, RAP-DB) and added unplaced genes to recently updated Solanum lycopersicum (Tomato) annotation
  • Added ID mappings to previous annotations for Vigna radiata (Mungbean), Aegilops tauschii (Tausch’s goatgrass), Physcomitrella patens (Spreading earthmoss) and Oryza sativa (Rice)
  • Display name change for Astyanax mexicanus: From Cave Fish (blind cave-dwelling) to Mexican tetra (surface-dwelling)
  • Assembly name change for Turkey (Meleagris gallopavo): From UMD2 to Turkey_2.01
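SPDI writes a variant as four colon-separated fields (Sequence, Position, Deletion, Insertion), with the position counted from 0 between bases. To illustrate what the Variant Recoder SPDI output encodes, the Python sketch below parses an SPDI string and converts it to Ensembl-style 1-based start and end coordinates, where an insertion has start = end + 1. It does not call Variant Recoder; parse_spdi and the example strings are hypothetical, and it assumes the deleted bases are written out as a sequence rather than as a length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence: str   # reference sequence accession
    position: int   # 0-based (interbase) position, as SPDI defines it
    deletion: str   # deleted reference bases; empty for a pure insertion
    insertion: str  # inserted bases; empty for a pure deletion

    @property
    def start(self) -> int:
        """1-based start, Ensembl style."""
        return self.position + 1

    @property
    def end(self) -> int:
        """1-based end; equals start - 1 for a pure insertion, as in Ensembl."""
        return self.position + len(self.deletion)


def parse_spdi(text: str) -> Spdi:
    """Parse 'Sequence:Position:Deletion:Insertion' (hypothetical helper)."""
    fields = text.strip().split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {text!r}")
    sequence, position, deletion, insertion = fields
    if not sequence or not position.isdigit():
        raise ValueError(f"bad sequence or position: {text!r}")
    return Spdi(sequence, int(position), deletion, insertion)


# Illustrative SNV: one base deleted and one inserted at interbase position 999.
snv = parse_spdi("NC_000001.11:999:G:A")
print(snv.start, snv.end)  # 1000 1000
```

Keeping the 0-based SPDI position in the parsed record and deriving 1-based coordinates on demand avoids the off-by-one errors that creep in when the two conventions are mixed.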

Please note that these are intentions and are not guaranteed to make it into the releases.


Comments

  1. Hi,

    A while ago I contacted the helpdesk about an update of the gnomAD exome dataset. I was then told this would be a part of release 96. Is this still planned?

    Thanks
    M

  2. We plan to include the gnomAD v2.1 dataset in Ensembl 96, but we cannot guarantee that it will make it into the release.
