Ensembl 93 has been released!

Are you feline excited for our new pawsome release?!

Ensembl 93 has been released, bringing with it two new big cat genomes for tiger and leopard, and an update to the domestic cat assembly. If cats aren’t your thing, we also have a huge new dbSNP import for human and a brand new regulatory build and GENCODE update for mouse.

We also have a new hagfish genome, important changes to our VEP REST endpoints, and many more exciting developments, so read on to find out more!

Human dbSNP update – twice as many short variants!

We’ve imported new variants from dbSNP version 151. This has led to a jump from ~329 million to 655 million short variants! Many of these variants also have allele frequency data from the gnomAD project and TOPMed. You can find this new data in the browser (filter source column to dbSNP151 to see new variants) as well as in the Variant Effect Predictor (VEP), BioMart and across our REST and Perl APIs.
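As a rough illustration of the REST route, the sketch below builds a lookup request for a single variant. It assumes the public server at https://rest.ensembl.org and its variation/:species/:id endpoint with the optional pops flag for population frequencies. The function names are ours, and the rsID in the usage note is only a placeholder.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public Ensembl REST server


def variation_url(variant_id, species="human", pops=True):
    """Build the URL for a variation lookup by identifier (hypothetical helper)."""
    path = "/variation/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(variant_id)
    )
    query = {"content-type": "application/json"}
    if pops:
        query["pops"] = "1"  # ask for population allele frequencies
    # safe="/" keeps "application/json" readable in the query string
    return REST_SERVER + path + "?" + urllib.parse.urlencode(query, safe="/")


def fetch_variant(variant_id, species="human"):
    """Send the request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(variation_url(variant_id, species)) as reply:
        return json.load(reply)
```

Calling fetch_variant("rs699") would send the request and return the decoded record. Nothing is sent until that function is called, so variation_url can be checked offline first.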

New mouse regulatory feature and gene annotations

In a substantial new resource for mouse genomics, we’ve imported over three terabytes of data from ENCODE as evidence to inform our annotation of 419,000 regulatory features on the mouse reference genome (Mus musculus, strain C57BL/6J). These are features that regulate gene expression, including CTCF and transcription factor binding sites, enhancers, promoters and flanking regions, and open chromatin regions.

Not only will we show you where these features are on the genome, but also their cell-specific activity. Previously we were able to show data from eight epigenomes (a.k.a. cell types); from this release onward we are able to provide data from 79 epigenomes! We will be releasing a separate blog post shortly with more details about this new data and how to access it.

Accompanying this new update to regulation data, we have also updated the GENCODE gene set from version M17 to M18. This new version has ~160 new non-coding gene and ~240 new pseudogene annotations, as well as updates to coding genes. We have also updated our homology data for this new gene set.

New species

We have three new species for this release. All have full gene sets, annotated with the Ensembl gene build pipeline.

  • Amur tiger (Panthera tigris altaica)
    • The Amur tiger (also known as the Siberian tiger) was once widespread, but a census in 2015 indicated that only 562 Amur tigers are present in the wild. This genome (PanTig1.0) was sequenced and assembled by the Personal Genomics Institute, South Korea.
  • Leopard (Panthera pardus)
    • Like the Amur tiger, the leopard has suffered a significant contraction of its previous numbers and range, resulting in the IUCN assigning this species ‘vulnerable’ status. This genome (PanPar1.0) was sequenced and assembled by UNIST in South Korea.
  • The inshore hagfish (Eptatretus burgeri)
    • The hagfish may not be as pretty as our other new species, but its unique biology makes this species a critical subject for evolutionary studies. Hagfish are ‘living fossils’, barely changed in 300 million years, and the only animals to have a skull but no vertebral column! This genome (Eburgeri_3.2) was sequenced and assembled by the RIKEN Center for Developmental Biology, Japan.

 

New cat genome assembly

The domestic cat (Felis catus) genome assembly has been updated to Felis_catus_9.0. We’ve performed a full gene build, carried out new whole genome alignments with the new tiger and leopard genome assemblies, and updated the homology data. We are also happy to announce that we now provide SIFT predictions for missense variants in cat. These scores indicate whether an amino acid substitution is likely to affect protein function. You can see this data in the variant table, and in the VEP output.
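To show how these scores might be read in a script, here is a small sketch. It assumes the conventional SIFT cut-off of 0.05 (scores below it are called deleterious) and a VEP-style JSON record whose transcript_consequences entries carry a sift_score field. The helper names, transcript identifiers and scores are invented for illustration.

```python
SIFT_CUTOFF = 0.05  # assumed conventional threshold: below this is "deleterious"


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Turn a SIFT score (0 to 1) into a coarse label."""
    return "deleterious" if score < cutoff else "tolerated"


def sift_calls(vep_record):
    """Yield (transcript_id, score, label) for consequences that have a SIFT score."""
    for consequence in vep_record.get("transcript_consequences", []):
        if "sift_score" in consequence:
            score = consequence["sift_score"]
            yield consequence.get("transcript_id"), score, sift_call(score)


# Invented record: only missense consequences are expected to carry a score.
sample_record = {
    "transcript_consequences": [
        {"transcript_id": "TRANSCRIPT_EXAMPLE_1", "sift_score": 0.01},
        {"transcript_id": "TRANSCRIPT_EXAMPLE_2", "sift_score": 0.4},
        {"transcript_id": "TRANSCRIPT_EXAMPLE_3"},  # e.g. a synonymous change
    ]
}

for transcript_id, score, label in sift_calls(sample_record):
    print(transcript_id, score, label)
```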

New lincRNA annotations for marmoset and zebrafish

We’ve improved our long intergenic non-coding RNA (lincRNA) annotation for marmoset (Callithrix jacchus, ASM275486v1) and zebrafish (Danio rerio, GRCz11). This has increased the number of lincRNA gene annotations in marmoset from six to 739 genes, and from 2,660 to 3,278 genes in zebrafish.

Retirement of the variant image view in human

The significant increase in the number of short variants has led us to discontinue our gene variant image view for human, which was already unloved and overwhelmed, from this release onwards. You can read more about this change here.

New variant allele frequencies for dog (Canis lupus)

Following an update to dog variant data in our last release, in this release we’ve added genotype data and allele frequencies from a huge study (EVA accession: PRJEB24066), carried out by UNIBE, Switzerland, that sequenced the whole genomes of 238 dogs.

Changes to the VEP REST API endpoints

If you’re planning to run Variant Effect Predictor (VEP) jobs through the REST API endpoints with an existing script, you may need to make some changes, as we’ve changed the way that the VEP reports allele frequencies.

We now match the input variant allele (received as input by the VEP REST endpoint) with alleles from co-located variants, and only report the allele frequency of a co-located variant if the alleles match. To support this, we created a frequencies entry in the colocated variants section and removed all *_maf and *_allele entries, which previously reported allele frequencies.
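If you are unsure whether an existing script relies on the removed fields, a check along these lines may help. It is only a sketch: it assumes the decoded JSON exposes co-located variants under a colocated_variants key, it takes the *_maf and *_allele patterns literally, and the helper name and sample records are invented.

```python
LEGACY_SUFFIXES = ("_maf", "_allele")  # patterns taken literally from the text above


def legacy_frequency_keys(vep_record):
    """Return the sorted legacy-style keys found on any co-located variant."""
    found = set()
    for variant in vep_record.get("colocated_variants", []):
        found.update(key for key in variant if key.endswith(LEGACY_SUFFIXES))
    return sorted(found)


# Invented examples of an old-style and a new-style record.
old_style = {
    "colocated_variants": [
        {"id": "rs_example", "afr_maf": 0.1, "afr_allele": "A", "start": 100}
    ]
}
new_style = {
    "colocated_variants": [
        {"id": "rs_example", "frequencies": {"A": {"afr": 0.1}}, "start": 100}
    ]
}

print(legacy_frequency_keys(old_style))  # legacy keys present
print(legacy_frequency_keys(new_style))  # nothing to migrate
```

Because the match is on suffix alone, any unrelated field that happens to end in _maf or _allele would also be flagged, so treat the result as a prompt to inspect the script rather than a definitive answer.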

Here’s a summary of what’s changed:

If available, allele frequencies are reported for the following populations:

  • 1000 Genomes Project: afr, amr, asn, eas, eur, sas
  • ESP: aa, ea
  • gnomAD: gnomad_afr, gnomad_amr, gnomad_asj, gnomad_eas, gnomad_fin, gnomad_nfe, gnomad_oth, gnomad_sas
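A script reading the new layout might look something like the sketch below. It assumes that each co-located variant carries a frequencies entry keyed first by allele and then by population name, under a colocated_variants key. The helper name, identifiers and frequency values are invented for illustration.

```python
def allele_frequencies(vep_record, allele):
    """Map co-located variant id -> {population: frequency} for the given allele."""
    result = {}
    for variant in vep_record.get("colocated_variants", []):
        by_allele = variant.get("frequencies", {})
        if allele in by_allele:  # only alleles matching the input are reported
            result[variant.get("id")] = dict(by_allele[allele])
    return result


# Invented record in the assumed layout.
sample = {
    "colocated_variants": [
        {"id": "rs_example_1", "frequencies": {"T": {"afr": 0.02, "gnomad_nfe": 0.11}}},
        {"id": "rs_example_2", "frequencies": {"G": {"eur": 0.30}}},
        {"id": "rs_example_3"},  # no frequency data available
    ]
}

for variant_id, freqs in allele_frequencies(sample, "T").items():
    for population, frequency in sorted(freqs.items()):
        print(variant_id, population, frequency)
```

Variants without a frequencies entry, or with frequencies only for other alleles, are simply skipped, which mirrors the allele matching described above.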

Find out more

If you would like to find out more about these changes, see live demos of how to find the new data on the site, and put questions to the Ensembl team, please register for the release webinar at 4pm (GMT) on Tuesday the 24th of July. A recording of this webinar is available on our YouTube channel.