Ensembl 102 has been released!

We are pleased to announce the release of Ensembl 102 and the corresponding release of Ensembl Genomes 49. These releases feature lots of new and updated data, including human population frequency data from the NCBI Allele Frequency Aggregator, new plant species and a large update of the available bacterial data.


Genome assemblies and annotation for many new species are also being continuously added to the Ensembl Rapid Release genome browser.

Major data updates for human

Update to translate all non-ATG start codons as Methionine

Up until Ensembl 101, Ensembl/GENCODE followed a literal interpretation of the genetic code using the standard vertebrate codon translation table. However, a small number of genes use a non-ATG start codon, where the ribosomal machinery allows the non-ATG codon to be translated as Methionine. For human, there are 50 annotated genes with a non-ATG start codon. From Ensembl 102 onwards, these genes display a Methionine as the first residue of the protein translation. The affected genes are all manually tagged with a ‘non-ATG start’ annotation remark by the HAVANA annotators, and a ‘non-ATG’ attribute is visible for these transcripts in the transcript tab.
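As a rough illustration of this change, the sketch below translates a coding sequence with the standard codon table and, when a transcript is flagged as having a non-ATG start, reports Methionine as the first residue. The function `translate_cds` and its `non_atg_start` flag are hypothetical stand-ins for the ‘non-ATG’ transcript attribute; this is not Ensembl's own code, and it assumes an uppercase A/C/G/T sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with the first base varying slowest, in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str, non_atg_start: bool = False) -> str:
    """Translate a coding sequence codon by codon.

    non_atg_start is a hypothetical flag standing in for the 'non-ATG'
    transcript attribute: when True, the first residue is reported as
    Methionine whatever the start codon is. Assumes only A/C/G/T bases;
    any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    protein = [CODON_TABLE[cds[i:i + 3]] for i in range(0, usable_length, 3)]
    if non_atg_start and protein:
        protein[0] = "M"  # the start codon is read as Methionine
    return "".join(protein)


print(translate_cds("CTGGCTTAA"))                      # LA*
print(translate_cds("CTGGCTTAA", non_atg_start=True))  # MA*
```

With CTG as the start codon, the literal translation begins with Leucine, while the flagged translation begins with Methionine, which mirrors the difference between the pre-102 and 102 displays.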

Addition of population frequency data from NCBI Allele Frequency Aggregator (ALFA)

The NCBI Allele Frequency Aggregator (ALFA) was launched earlier this year to provide summary data for variants from more than 1 million individuals across approved controlled-access studies in dbGaP. The initial release of allele frequency data, covering 100 thousand individuals from 12 populations, includes allele counts and frequencies for 447 million variants. These data are now available in Ensembl through the Population Genetics pages in the variant tab.

The Ensembl ‘Population Genetics’ table of ALFA allele frequencies for rs4988235, a variant in a regulatory region upstream of the lactase gene (LCT), highlighting frequency differences across populations.
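ALFA reports allele counts alongside frequencies. A frequency is an allele's count divided by the total number of alleles observed in that population. A minimal sketch of that relationship follows, using made-up counts for two hypothetical populations; these numbers are illustrative only and are not ALFA data.

```python
from typing import Dict


def allele_frequencies(allele_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-allele counts for one population into frequencies."""
    total = sum(allele_counts.values())  # total number of alleles observed
    if total == 0:
        raise ValueError("no alleles observed for this population")
    return {allele: count / total for allele, count in allele_counts.items()}


# Hypothetical counts for illustration only; these are not real ALFA values.
example_counts = {
    "population_A": {"C": 300, "T": 700},
    "population_B": {"C": 950, "T": 50},
}

for population, counts in example_counts.items():
    freqs = allele_frequencies(counts)
    print(population, {allele: round(f, 3) for allele, f in freqs.items()})
```

Comparing the resulting frequencies between populations is what surfaces the kind of population differences shown in the table.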

New genomes

Genome sequences and annotation have been added for three new plant species and two new metazoan species:

New Assemblies and/or Annotation

Mammals:

An updated genome assembly and annotation of the Tasmanian Devil (Sarcophilus harrisii) has been added in Ensembl 102. The Tasmanian Devil is a carnivorous marsupial that is currently endangered, with a declining population on the island of Tasmania. Understanding genetic diversity within the Tasmanian Devil population is thought to be an important step in conservation efforts. The Tasmanian Devil is also an important model organism for the study of Devil Facial Tumour Disease, a transmissible cancer.

Plants:

Metazoa:

Bacteria:

In Ensembl 102, there is also a batch update of bacterial and archaeal genomes and annotation from ENA. Ensembl Bacteria 102 contains 31,332 genomes. Changes in this release include:

  • 22,088 new genomes
  • 34,804 genomes removed due to redundancy

plus updated annotation of pathogen-host interaction data from PHI-base, alignments to Rfam covariance models available through the ‘Rfam models’ track and updated protein features for all species using InterProScan 77.0. Read more in our separate blog post about the updates to Ensembl Bacteria.
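The genome counts above also imply the size of the previous bacterial release. Assuming that the new genomes and the redundancy removals are the only changes between releases (an assumption; the figures above do not state this), the arithmetic works out as follows:

```python
# Figures quoted for Ensembl Bacteria 102.
total_102 = 31_332          # genomes available in release 102
new_genomes = 22_088        # genomes added in release 102
removed_redundant = 34_804  # genomes removed due to redundancy

# Assumption: additions and redundancy removals are the only changes
# between the previous release and release 102.
carried_over = total_102 - new_genomes                     # genomes retained from before
implied_previous_total = carried_over + removed_redundant  # size of the previous release

print(carried_over)            # 9244
print(implied_previous_total)  # 44048
```

Under that assumption, the net change is a decrease of 12,716 genomes, because redundancy removals outnumber additions.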

Other updates and changes

  • Variation data added for soybean and common bean (Phaseolus vulgaris) from the European Variation Archive.
  • Plant Reactome mappings for plant species from Gramene.
  • Updated repeat element annotation for selected plants using a custom plant repeat library (nrTEplants).
  • Retirement of Ensembl 81 archive site (jul2015.archive.ensembl.org).