Ensembl 79 has been released!

What’s new in e79:

Human update

The human gene set now corresponds to GENCODE 22 while the assembly has been updated to include new assembly patches for GRCh38.p2.

Corrected RYBP gene in GRCh38.p2 assembly
The HG-126 patch (KN538364.1) in GRCh38.p2 corrects a misassembly in this region which affects the RYBP gene.

Comparison of human RefSeq transcripts to Ensembl models

In this release we provide comparisons between the imported human RefSeq transcripts and all overlapping Ensembl models. The comparison is done at the transcript level: all exons are compared by genomic coordinates, and the transcript sequences of the two models are also compared. Additionally, we have also compared the genomic sequences of the RefSeq transcripts to the Ensembl models. Both of these data sets are available via our API.

Gene gain/loss tree view

It is now possible to view the Gene gain/loss tree with our new interactive view (which uses the same engine as the species tree view introduced last release). Click on the toolbar to change the layout of the tree or on a node to get its details and expand, collapse, or focus on it.

Global Alliance REST Endpoints

Our REST server now implements the Global Alliance for Genomics and Health (GA4GH) Genomics API. The aim of the GA4GH API is to allow the interoperable exchange of genomic information across multiple organisations and on multiple platforms – see http://ga4gh.org/#/api for further details. Phase 3 genotype data from the 1000 Genomes project is now available from Ensembl via three new GA4GH endpoints.

Import of NextGen project sheep genotype data

Genotype data for three sheep populations have been imported from the NextGen project, which aims to preserve the biodiversity of farm animals. The imported populations are Iranian Ovis aries and Ovis orientalis, and Moroccan Ovis aries. You can read more about other NextGen data sets on the Ensembl projects page.

Other news

  • Updated human HAVANA annotation in Vega
  • Import of phenotype and disease data from the Rat Genome Database (RGD)
  • RefSeq GFF3 annotations for the majority of Ensembl species
  • Addition of non-coding genes to the vervet-AGM gene set
  • Updated APPRIS flags for human, mouse, rat and zebrafish, with the addition of pig
  • Assembly and gene set update for Drosophila to BDGP6 (FB2014_05)

A complete list of the changes can be found on the Ensembl website.

Find out more at the Ensembl Release Webinar e79 (16.00 GMT, Thursday 26th March). Register here (for free!).

8 thoughts on “Ensembl 79 has been released!”

  1. Hi ..

    You say :
    Additionally, we have also compared the genomic sequences of the RefSeq transcripts to the Ensembl models. Both of these data sets are available via our API.

    But what has been added to the API to analyse this comparison?
    Can’t find any reference to it in the documentation.

    • Hi Duarte,

      They are transcript attributes so they are available for all RefSeq import transcripts under various attribute codes. For the RefSeq mRNA to genomic sequence comparison the codes are:
      rseq_mrna_match
      rseq_mrna_nonmatch

      rseq_5p_mismatch
      rseq_cds_mismatch
      rseq_3p_mismatch
      rseq_nctran_mismatch
      rseq_no_comparison

      Every transcript will have either an rseq_mrna_match or an rseq_mrna_nonmatch attribute. Those with rseq_mrna_nonmatch will also have at least one of the five other attributes listed, which describe the nature of the conflict.

      For the comparison of the RefSeq model to overlapping Ensembl models the codes are:
      rseq_ens_match_wt
      rseq_ens_match_cds
      rseq_ens_no_match

      Every RefSeq import transcript has one of the above.

      If you want to fetch them directly off the RefSeq import transcripts use something like:

      my @transcript_attributes = @{ $transcript->get_all_Attributes() };

      Then loop through the array and check for the codes you’re interested in.
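The checking step can be sketched independently of the API. The following is a minimal, language-neutral illustration in Python, assuming you have already collected the attribute codes of one RefSeq import transcript as plain strings (for example by reading the code of each attribute in the array above). The function name `summarise_refseq_codes` and the returned dictionary keys are hypothetical and not part of the Ensembl API. It also assumes the rules described above hold strictly (exactly one mRNA status code and exactly one Ensembl comparison code per transcript) and raises an error otherwise.

```python
# Attribute codes as listed in the reply above.
MRNA_STATUS = {"rseq_mrna_match", "rseq_mrna_nonmatch"}
MRNA_CONFLICTS = {
    "rseq_5p_mismatch",
    "rseq_cds_mismatch",
    "rseq_3p_mismatch",
    "rseq_nctran_mismatch",
    "rseq_no_comparison",
}
ENS_STATUS = {"rseq_ens_match_wt", "rseq_ens_match_cds", "rseq_ens_no_match"}


def summarise_refseq_codes(codes):
    """Summarise the comparison codes of one RefSeq import transcript.

    `codes` is any iterable of attribute code strings (hypothetical input);
    codes unrelated to the RefSeq comparisons are ignored.
    Raises ValueError if the codes break the rules assumed in the lead-in.
    """
    present = set(codes)
    mrna = present & MRNA_STATUS
    ens = present & ENS_STATUS
    conflicts = sorted(present & MRNA_CONFLICTS)

    if len(mrna) != 1:
        raise ValueError("expected exactly one rseq_mrna_match/nonmatch code")
    if len(ens) != 1:
        raise ValueError("expected exactly one rseq_ens_* code")

    mrna_code = next(iter(mrna))
    if mrna_code == "rseq_mrna_nonmatch" and not conflicts:
        raise ValueError("nonmatch transcript has no conflict code")

    return {
        "mrna_matches_genome": mrna_code == "rseq_mrna_match",
        "mrna_conflicts": conflicts,
        "ensembl_match": next(iter(ens)),
    }


if __name__ == "__main__":
    example = ["rseq_mrna_nonmatch", "rseq_cds_mismatch", "rseq_ens_match_cds"]
    print(summarise_refseq_codes(example))
```

Raising an error on unexpected combinations is a design choice for this sketch; you may prefer to log and skip such transcripts instead.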

      You can retrieve the descriptions for all these codes through the API using something like:

      my $aa = $db->get_AttributeAdaptor();
      my ($attrib_type_id, $code, $name, $description) = $aa->fetch_by_code('rseq_ens_match_wt');

      If you’re looking directly in the database all of these are described in the attrib_type table under attrib_type_ids 450-459. The attributes themselves are in the transcript_attrib table of the otherfeatures db (that is the db with the RefSeq import transcripts).

      Fergal.