Ensembl 96 and Ensembl Genomes 43 are out!

These releases are huge in many respects, so it was difficult to decide which news to put first! Let’s start with some exciting news from our annotators.

Mouse Genome Annotation Milestone

Ensembl/GENCODE annotators have completed a first-pass ‘walk’ across the entire reference mouse genome. Begun in 2012, the walk involved investigating the sequence, aligned data and computational predictions for each BAC clone in turn. The resulting annotation is the GENCODE M21 gene set.

Having completed the first pass, we are now targeting specific loci, for example to identify unannotated protein-coding genes, lncRNA genes and alternatively spliced transcripts, and to reassess older protein-coding gene annotation in the light of current data.

Release of Ensembl-RefSeq MANE Select v0.5 Transcripts

Our new joint initiative with the NCBI – the Matched Annotation from the NCBI and EMBL-EBI (MANE) project – aims to define a genome-wide transcript set that is matched between RefSeq and Ensembl/GENCODE (MANE transcripts).

We are releasing phase 1 of the project, the MANE Select set, which aims to provide one well-supported transcript for every protein-coding locus in the human genome. This first release, versioned 0.5, contains a MANE Select transcript for 53% of human protein-coding genes.

If you want to learn more about this transcript set, check out our previous blog or watch our recorded webinar.

New human GENCODE Gene Set

We have updated the human gene set to GENCODE 30.

Joint REST Server for Ensembl and Ensembl Genomes, and Changes to the FTP Directory Layout

We are in the process of combining the databases for Ensembl and Ensembl Genomes.

As an important step towards this aim, we have merged the Ensembl and Ensembl Genomes REST servers into a single server (rest.ensembl.org) and retired rest.ensemblgenomes.org. As part of this merge, we have changed the comparative genomics (compara), genomes and info/species endpoints. Don’t worry too much, though: in most cases, simply replacing rest.ensemblgenomes.org with rest.ensembl.org in your REST calls will work as before. If it doesn’t, please check the details in our blog outlining the changes.
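If you have scripts with many hard-coded REST URLs, the host swap can be automated. The sketch below is a hypothetical helper (the name `migrate_rest_url` is ours, not part of any Ensembl tool) that rewrites only the host and leaves the path and query string untouched. It cannot fix calls to the compara, genomes or info/species endpoints whose paths or parameters changed; those still need checking against the blog mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "rest.ensemblgenomes.org"  # retired server
NEW_HOST = "rest.ensembl.org"         # merged server


def migrate_rest_url(url: str) -> str:
    """Point a URL on the retired Ensembl Genomes REST host at rest.ensembl.org.

    Only the host changes; scheme, path, query and fragment are kept as-is.
    URLs for any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    # Keep an explicit port if the original URL had one.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `migrate_rest_url("https://rest.ensemblgenomes.org/info/ping?content-type=application/json")` returns `"https://rest.ensembl.org/info/ping?content-type=application/json"`.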

In a similar move to ensure consistency between Ensembl and Ensembl Genomes, we have changed the Ensembl Genomes FTP directory layout. The changes affect the ‘gvf’, ‘vcf’ and ‘vep’ directories as well as the whole genome alignment files. We have provided the details of all changes in another blog.

New Genomes

Tweet tweet tweet! Have you heard? This spring release brings you lots of bird genomes, including three kiwi species.

But that’s not all: we have many other new vertebrate genomes too. We are particularly pleased to bring you the annotated genome of Lonesome George, the last known Abingdon Island giant tortoise. In his final years, before he sadly died in 2012, he was known as the rarest creature in the world.

Here’s the full list of new genomes in this release:

Birds:

  • Coturnix japonica (Japanese quail)
  • Numida meleagris (Helmeted guineafowl)
  • Parus major (Great tit)
  • Manacus vitellinus (Golden-collared manakin)
  • Calidris pygmaea (Spoon-billed sandpiper)
  • Dromaius novaehollandiae (Emu)
  • Lepidothrix coronata (Blue-crowned manakin)
  • Apteryx owenii (Little spotted kiwi)
  • Apteryx rowi (Okarito brown kiwi)
  • Apteryx haastii (Great spotted kiwi)
  • Zonotrichia albicollis (White-throated sparrow)
  • Calidris pugnax (Ruff)
  • Cyanistes caeruleus (Blue tit)
  • Lonchura striata domestica (Bengalese finch)
  • Anser brachyrhynchus (Pink-footed goose)
  • Nothoprocta perdicaria (Chilean tinamou)
  • Junco hyemalis (Dark-eyed junco)
  • Melopsittacus undulatus (Budgerigar)
  • Serinus canaria (Common canary)

Reptiles:

  • Salvator merianae (Argentine black and white tegu)
  • Crocodylus porosus (Australian saltwater crocodile)
  • Pogona vitticeps (Central bearded dragon)
  • Notechis scutatus (Mainland tiger snake)
  • Chelonoidis abingdonii (Abingdon island giant tortoise)

Primates:

  • Theropithecus gelada (Gelada)
  • Piliocolobus tephrosceles (Ugandan red colobus)
  • Prolemur simus (Greater bamboo lemur)

Rodents:

  • Castor canadensis (American beaver)
  • Urocitellus parryii (Arctic ground squirrel)
  • Marmota marmota marmota (Alpine marmot)
  • Meriones unguiculatus (Mongolian gerbil)
  • Spermophilus dauricus (Daurian ground squirrel)
  • Mus spicilegus (Steppe mouse)

Other mammals:

  • Neovison vison (American mink)
  • Bos mutus (Wild yak)
  • Bison bison bison (American bison)

New Assemblies and Annotation

In addition to the new genomes, we have updated the assembly and annotation of four vertebrate and two plant species:

  • Phascolarctos cinereus (Koala, phaCin_unsw_v4.1)
  • Cricetulus griseus (Chinese hamster, CriGri-PICR)
  • Peromyscus maniculatus bairdii (Northern American deer mouse, HU_Pman_2.1)
  • Anas platyrhynchos platyrhynchos (Common mallard, CAU_duck1.0)
  • Actinidia chinensis (Kiwifruit, GCA_003024255.1)
  • Panicum hallii (Hall’s panicgrass, ecotypes HAL2 and FIL2, GCA_003061485.1 and GCA_002211085.2, respectively)

New Interface for Configuration of Regulation Tracks

The Regulatory Build for both human and mouse has been updated within the past year, in Ensembl 95 and 93 respectively, and we now have data for 123 human and 79 mouse cell types/tissues. With this much data, our previous interface for configuring regulation tracks became difficult to use and, more importantly, would not be suitable for the data we expect in the future.

That’s why we’re introducing a new interface in this release! It allows you to select the cell/tissue and the data you would like to see with a few clicks.

You can access the interface from the Regulation tab: click on the ‘Details by cell type’ icon at the top, then the ‘Configure Cell/Tissue’ button.

You can also access it from the Location or Gene tab: click on the ‘Configure this page’ button, then on ‘Features by Cell/Tissues’ in the pop-up window.

Our short YouTube video shows you how to use our new interface for configuration of regulation tracks.

Updates on Variation Data and Displays

This release brings variation data for Chlorocebus sabaeus (Vervet). We have added 31,779 markers from the 35K Axiom SNP array to Triticum aestivum (Bread wheat). Because this array is widely used by breeders for marker-assisted selection, having its markers on the IWGSC RefSeq v1.0 wheat assembly is important. We have also added a polyploid view for Triticum dicoccoides (Emmer Zavitan wheat). At the same time, we will discontinue the Drosophila melanogaster (Fruitfly) variation data in Ensembl Metazoa.

The Variant Effect Predictor (VEP) now provides additional phenotype annotations, both via the web interface and the REST server. The web tool also shows the location of a variant on relevant 3D protein structures from PDBe for human and mouse, where these models are available. The VEP and the browser now provide gnomAD version 2.1 frequency data, with an improved mapping to GRCh38.
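For REST users, a VEP query can be as simple as a single GET request. The sketch below builds (but does not send) a request to the VEP HGVS endpoint, assuming the `vep/:species/hgvs/:hgvs_notation` path from the Ensembl REST documentation; the helper name `vep_hgvs_request` and the example notation are illustrative only, and the option names for the new phenotype annotations are deliberately not shown, so please check the REST documentation for those.

```python
from urllib.parse import quote
from urllib.request import Request

SERVER = "https://rest.ensembl.org"


def vep_hgvs_request(species: str, hgvs: str) -> Request:
    """Build (but do not send) a GET request to the VEP HGVS endpoint (assumed path)."""
    # HGVS strings contain characters such as '>' that must be percent-encoded
    # in a URL path; ':' is left readable because it is legal in a path segment.
    path = f"/vep/{quote(species)}/hgvs/{quote(hgvs, safe=':')}"
    # Ensembl's REST examples select the JSON response format via this header.
    return Request(SERVER + path, headers={"Content-Type": "application/json"})
```

Passing the result of `vep_hgvs_request("human", "9:g.22125504G>C")` to `urllib.request.urlopen` and reading the body with `json.load` would then give the consequence predictions as JSON.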

Finally, the Variant Recoder will support the SPDI genomic format. In addition, variant pages in the browser display GERP scores for all vertebrate species and CADD scores for human, giving an indication of how tolerant a locus is to change.
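SPDI, a notation from NCBI, writes a variant as four colon-separated fields: Sequence, Position, Deletion and Insertion, where the position is 0-based (interbase). The minimal parser below is a sketch for working with such strings in your own scripts, assuming the deletion and insertion are given as literal sequences (either may be empty); the names `Spdi` and `parse_spdi` are hypothetical and not part of the Variant Recoder.

```python
from typing import NamedTuple


class Spdi(NamedTuple):
    sequence: str   # reference sequence accession, e.g. a RefSeq NC_ ID
    position: int   # 0-based (interbase) position where the change starts
    deletion: str   # deleted reference bases (empty for a pure insertion)
    insertion: str  # inserted bases (empty for a pure deletion)

    @property
    def start_1based(self) -> int:
        """1-based coordinate of the first deleted base (when deletion is non-empty)."""
        return self.position + 1


def parse_spdi(text: str) -> Spdi:
    """Split an SPDI string into its four fields, validating the position."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}: {text!r}")
    seq, pos, deleted, inserted = parts
    if not pos.isdigit():
        raise ValueError(f"position must be a non-negative integer: {pos!r}")
    return Spdi(seq, int(pos), deleted, inserted)
```

For an illustrative (not real) substitution, `parse_spdi("NC_000001.11:12345:A:G")` yields a record whose `start_1based` is 12346, reflecting the shift from SPDI's 0-based positions to the 1-based coordinates shown in the browser.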

Other Updates

  • New additions to the Ensembl Metazoa Compara database (the springtails Orchesella cincta and Folsomia candida, and the biting midge Culicoides sonorensis)
  • New probe mapping data for ten species: Anas platyrhynchos platyrhynchos (Common mallard), Cricetulus griseus (Chinese hamster, CriGri-PICR), Cyprinodon variegatus (Sheepshead minnow), Equus caballus (Horse), Fundulus heteroclitus (Mummichog), Ictalurus punctatus (Channel catfish), Piliocolobus tephrosceles (Ugandan red colobus), Prolemur simus (Greater bamboo lemur), Scophthalmus maximus (Turbot), Theropithecus gelada (Gelada)
  • Probe mapping rerun for Homo sapiens (Human), Drosophila melanogaster (Fruitfly), Mus musculus (Mouse), Bos taurus (Cow) and Canis lupus familiaris (Dog)
  • Updated gene annotation for Oryza sativa (Rice, RAP-DB 2018-11-26) and added unplaced genes to recently updated Solanum lycopersicum (Tomato) annotation
  • Added ID mappings to previous annotations for Vigna radiata (Mungbean), Physcomitrella patens (Spreading earthmoss) and Oryza sativa (Rice)
  • Display name change for Astyanax mexicanus: From Cave Fish (blind cave-dwelling) to Mexican tetra (surface-dwelling)
  • Assembly name change for Turkey (Meleagris gallopavo): From UMD2 to Turkey_2.01

Find out more

If you would like to learn more about Ensembl 96 and Ensembl Genomes 43, watch a guided tour or ask our team questions, please register for the release webinar on Wednesday 17th April 2019 at 16:00 BST.