We have been developing a pipeline to build gene models using only RNA-seq data. For release 58 we have added a preliminary set of Zebrafish RNA-seq gene models with an intention to integrate this new source of evidence into a full genebuild soon.

Zebrafish transcriptome data from 9 tissues were used to build a set of genes and splice variants. For each locus we chose the variant with the highest read support to display; further details on the process are available here.
To display the genes, go to Region in Detail or Region Overview. Use the “Configure this page” button and select “RNASeq Genes” from the “Genes” menu. The “Supporting DNA Alignments” menu contains supporting exon and intron features from each of the nine tissues. Clicking on these features in Ensembl location pages shows a simple read count for the intron features, and RPKM values for transcripts and exons (reads per kilobase of model per million mapped reads, from Mortazavi et al., Nature Methods 2008).
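As a rough illustration of the RPKM value mentioned above, the calculation can be sketched as follows. The function name and the numbers are hypothetical and are not taken from the Ensembl pipeline; only the definition (reads per kilobase of model per million mapped reads) comes from the text.

```python
def rpkm(read_count: int, model_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of model per million mapped reads."""
    if model_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("model length and total mapped reads must be positive")
    kilobases = model_length_bp / 1_000                # length of the exon/transcript model in kb
    millions_mapped = total_mapped_reads / 1_000_000   # library size in millions of mapped reads
    return read_count / (kilobases * millions_mapped)


# Hypothetical numbers, for illustration only:
# 500 reads on a 2 kb model in a library of 10 million mapped reads.
print(f"{rpkm(500, 2_000, 10_000_000):.1f}")  # prints 25.0
```

Normalising by both model length and library size is what makes values comparable between exons of different lengths and between tissues sequenced to different depths.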

This is a first attempt at visualising tissue specific read depth and alternative splicing, which we hope to develop further in the future.

The Ensembl Genomes Project is pleased to announce release 5 of Ensembl Genomes.

The main highlights of this release are:

  • Software migration to Ensembl 58
  • Total of 6 new bacterial genomes for Escherichia/Shigella and Staphylococcus collections for Ensembl Bacteria; pairwise alignments added for all collections.

  • 2 new genomes, Phaeodactylum tricornutum and Thalassiosira pseudonana, for Ensembl Protists; pairwise alignments added.

  • Pristionchus pacificus genome from WormBase and updates to the Drosophila variation database added to Ensembl Metazoa.

  • Updates to the variation databases for A.thaliana, O.sativa japonica and V.vinifera in Ensembl Plants.

For further details regarding this release please visit:

We are pleased to announce the fourth release of Ensembl Genomes.

Ensembl Genomes is a companion project to Ensembl designed to provide access to genome scale data for non-chordate species of scientific interest.

Features of the new release include the addition of new databases for the bread mould Neurospora crassa to the Ensembl Fungi division, the slime mould Dictyostelium discoideum (built with the assistance of dictyBase) to Ensembl Protists, and the body louse Pediculus humanus (containing data provided by VectorBase) to Ensembl Metazoa.

The release also includes new variation databases for Drosophila melanogaster (using data from the Drosophila Population Genomics Project) and Plasmodium falciparum; the update of existing variation databases for Arabidopsis thaliana and Vitis vinifera in Ensembl Plants; and the addition of three new clades to Ensembl Bacteria.

Ensembl Genomes is available at http://www.ensemblgenomes.org

Updated elephant and gorilla genomes are now available on the Ensembl Pre! site.

They will be released in full with annotated gene sets in Ensembl 57 (due spring 2010). The new gorilla assembly (gorGor2) includes short-read and capillary sequences. The elephant genome (Loxafr3.0) was also updated, and is at 7x coverage. The 57 release will present new genebuilds for both species.

Ensembl release 57 has been rescheduled for mid to late February 2010.

We had originally planned for release 57 to be this week, but our final quality checks identified a significant error in the unreleased data set. Because of this, we feel that our users would be better served by rescheduling the release to ensure that we provide the best possible data resources for the community.

On behalf of everyone in the project, thank you for your continued support of Ensembl and we wish you all the very best for the holiday season and the new year.

We are pleased to announce the third release of Ensembl Genomes, which includes the first release of two new Ensembl-based portals, Ensembl Plants and Ensembl Fungi.

These complete the span of Ensembl Genomes portals across the taxonomic space, complementing the coverage of vertebrate genomes available through Ensembl.

  • Ensembl Plants has been built in collaboration with Gramene and includes the genomes of six monocots and two dicots. Variation databases are available for four of these species.
  • Ensembl Fungi includes a new build of the Saccharomyces cerevisiae genome using the latest data from SGD, including variation data derived from the Saccharomyces Genome Resequencing Project; and Ensembl databases for Schizosaccharomyces pombe (built in collaboration with GeneDB_Spombe) and eight species of Aspergillus (built in collaboration with the Central Aspergillus Database Repository, CADRE).
  • User upload databases are now operational for Ensembl Protists, Fungi, Plants and Metazoa, allowing users to visualise their own data in the Ensembl environment.

Ensembl Genomes release 3 has been built using Ensembl 55 software. We aim to synchronise with Ensembl with our next release (Ensembl Genomes 4/Ensembl 57), and to stay synchronised thereafter.

The Ensembl project is pleased to announce release 56 of Ensembl (http://e56.ensembl.org/). Highlights of this release are:

Reintroduction of our multi-species views. Alignments (image), formerly alignsliceview, shows pairwise or multiple alignments from the Ensembl Compara database, highlighting any gaps in the alignment.

Multi-species view, formerly known as multicontigview, displays pairwise alignments without gaps; multiple pairwise alignments can be configured to create a multiple alignment display. As well as genes, other types of features such as regulatory features can be displayed in this view, making this a very useful display for comparative genomic analysis.

A new tab has been added in release 56 based on a Regulatory Feature object. This will enable better display of some of the data underlying the Ensembl regulatory build. The new pages are accessed from the gene displays by clicking on the ‘Regulation’ link in the left-hand menu and then clicking on a regulatory stable ID in either the image popup menus or the table.

From release 56, users can upload wiggle plot data in WIG and bedGraph formats and view this data on various location-based views. At the moment, only a single style, “wiggle”, is available on Region in Detail, whereas a selection of density plots are available on whole chromosome and karyotype images. In addition, Region in Detail now supports greyscale rendering of BED scores via the useScore parameter in the file, and rendering of features in different colours via the itemRgb parameter and per-feature values.
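To make the upload formats above more concrete, here is a minimal sketch that builds a bedGraph track and a BED track with the useScore and itemRgb options in the track line. The function names, track names, coordinates, and scores are hypothetical; the file layouts follow the general BED and bedGraph conventions, and whether a particular Ensembl release accepts every option shown is an assumption rather than something stated in the announcement.

```python
def bedgraph_track(name, intervals):
    """Return bedGraph text. intervals: (chrom, start, end, value), 0-based half-open."""
    lines = [f'track type=bedGraph name="{name}"']
    for chrom, start, end, value in intervals:
        if not 0 <= start < end:
            raise ValueError(f"bad interval {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"


def bed_track(name, features, use_score=False, item_rgb=False):
    """Return 9-column BED text.

    features: (chrom, start, end, feature_name, score, strand, (r, g, b)).
    thickStart/thickEnd are simply set to the feature start/end.
    """
    header = f'track name="{name}"'
    if use_score:
        header += " useScore=1"      # shade features by score (0-1000)
    if item_rgb:
        header += ' itemRgb="On"'    # colour each feature by its RGB column
    lines = [header]
    for chrom, start, end, feat, score, strand, rgb in features:
        colour = ",".join(str(c) for c in rgb)
        lines.append(
            f"{chrom}\t{start}\t{end}\t{feat}\t{score}\t{strand}\t{start}\t{end}\t{colour}"
        )
    return "\n".join(lines) + "\n"


# Hypothetical data, for illustration only.
print(bedgraph_track("demo_depth", [("1", 100, 200, 0.5), ("1", 200, 300, 1.25)]))
print(bed_track("demo_features", [("1", 100, 200, "featA", 800, "+", (255, 0, 0))],
                use_score=True, item_rgb=True))
```

The resulting text can be saved to a file and supplied through the upload interface.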

New data in this release includes gene sets for two new species (Pig and Marmoset) and a new gene set on the existing Rat Rnor3.4 assembly. Also in this release is an updated human gene set which includes all the Havana manual annotation in the merge with the Ensembl automatic annotation set. This set represents the ENCODE project GENCODE 3b gene set. Also included is a new human variation database based on dbSNP 130 and mapped to assembly GRCh37.

For more information on these and other new features in this release visit:


EMBL-EBI is pleased to announce release 2 of Ensembl Genomes; extending Ensembl further across the taxonomic space!

Highlights for this release include:

  • 9 new genomes of Escherichia species; 4 new Bacillus and Streptococcus genomes; and additional genomes of Mycobacterium and Staphylococcus genera added to Ensembl Bacteria; taking the total bacteria species/strain count to 134.
  • Bacillus subtilis now represented using the re-annotation by Barbe et al. (Microbiology 155 2009, 1758-1775).
  • Comparative genomic analyses (bacteria, protist, metazoan and pan-taxonomic Compara databases) updated.
  • Ensembl 54 software

Keep an eye out for Release 3 in September, which will include plants and fungi…

We are currently working on our next release which is due at the end of June 2009 and will contain the following:


Human GRCh37
We will be releasing a new genebuild for human based on the latest assembly GRCh37 from the Genome Reference Consortium. A preliminary version of this assembly is available now in Ensembl Pre! Due to the new assembly we will have:

  • Updated repeat masking
  • New probeset mappings
  • cDNA update
  • A new ensembl-vega merge delivering a new gene set
Ensembl 55 includes the 2X genome for Tammar Wallaby (Macropus eugenii); this will be a projection build similar to our other 2X species.

C. elegans
We will also include an import of the WormBase release WS200 database for C. elegans.

Anole lizard – A gene patch incorporating the gene set provided by Chris Ponting at Oxford University means that we have a new gene set for the green anole lizard (Anolis carolinensis).

Mouse – The mouse cDNA alignments have been updated.

Zebrafinch – There will be an updated gene set for the 6X zebra finch genome.

Zebrafish – Non-coding RNAs will be added to the Zv8 zebrafish assembly and there will also be some changes to protein coding gene models and new repeats and expression patterns.


Schema Changes

  • Patch to update versions (patch_54_55_a.sql).
  • Add the missing types to go_xref (patch_54_55_b.sql).
  • Add new table dependent_xref, which will hold the dependencies for the xrefs, i.e. if an EMBL entry comes from a UniProt entry this relationship will be in the table (patch_54_55_d.sql).
  • Add new tables for alternative splicing/transcript events (patch_54_55_c.sql).
  • Add new column ‘is_constitutive’ to the exon table (patch_54_55_e.sql)

Xrefs will be run for Human, Macaque, Opossum, Chimp, Chicken, Dog and Mouse (including Fantom Xrefs).

Ontology database schema and tools
The ensembl_go_NN databases are no longer being built. Instead, they are being replaced by the ensembl_ontology_NN database, which can be accessed through the core API.

Assembly mapping
Some of the databases will contain mapping coordinates between current and previous assemblies:

  • human: mapping from current GRCh37 to NCBI36, NCBI35 and NCBI34
  • mouse: mapping from current NCBIM37 to NCBIM36, NCBIM35 and NCBIM34
Other changes
  • API support for alternative transcripts/splicing events will be added
  • API support for constitutive exons will be added
  • Deprecated API modules will be removed
  • All slices will be created using the new_fast method from the SliceAdaptor to improve performance
  • seq_region seq edit support will be added. Seq_edits can already be stored and retrieved, but until now they have not been applied when fetching sequence data. This will change so that “_rna_edit” attributes in the seq_region_attrib table are applied to the returned sequence.
  • MySQL and FASTA dumps will be copied to Amazon Public Datasets project
  • Gene name and xref projections
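The seq_region sequence edit behaviour listed above can be pictured with a small sketch. This is not the Ensembl API (which is written in Perl). The function names are hypothetical, and the “start end replacement” value layout with 1-based inclusive coordinates is an assumption about how an “_rna_edit” attribute value is encoded.

```python
def parse_edit(value):
    """Parse an assumed 'start end replacement' attribute value."""
    parts = value.split()
    start, end = int(parts[0]), int(parts[1])
    replacement = parts[2] if len(parts) > 2 else ""  # no third field means a deletion
    return start, end, replacement


def apply_seq_edits(sequence, edits):
    """Apply non-overlapping edits given as (start, end, replacement).

    Coordinates are 1-based and inclusive; an insertion before position p
    is written as start=p, end=p-1. Edits are applied from the right-most
    start backwards so that earlier coordinates stay valid.
    """
    result = sequence
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if start < 1 or end < start - 1 or end > len(sequence):
            raise ValueError(f"edit {start}-{end} is outside the sequence")
        result = result[:start - 1] + replacement + result[end:]
    return result


# Hypothetical sequence and edits, for illustration only.
edits = [parse_edit("2 3 TT"), parse_edit("6 5 G")]
print(apply_seq_edits("ACGTACGT", edits))  # prints ATTTAGCGT
```

Applying edits from right to left avoids having to shift coordinates after an insertion or deletion changes the sequence length.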

  • New functional genomics mart
  • A new Probe section added to Ensembl mart
  • New ontology mart
  • Constitutive exon information will be re-added to Ensembl mart

  • There will be a new human variation database generated by mapping NCBI36 coordinates to GRCh37 (using dbSNP 129)
  • Illumina array data for SNP/CNV is to be added
  • Transcript variations for Zebrafish and Zebrafinch will be recalculated to include information from the new gene sets
  • Schema change – added a call to get consequence_type
Functional genomics
  • Human Regulatory Build will be updated using the GRCh37 assembly
  • Probe alignment and transcript annotation for all species will migrate from the core databases to the functional genomics databases; this includes Affymetrix, Illumina, Codelink and Phalanx
  • Schema change: an is_current field is to be added to the coord_system table
Comparative genomics

Alignments – The new human assembly means that the following alignments will be regenerated:

  • 9 eutherian mammals EPO multiple alignments
  • 31 eutherian mammals EPO multiple alignments
  • 12 amniota vertebrates Pecan multiple alignments
  • 4 catarrhini primate EPO multiple alignments
  • Pairwise BLASTZ-NET alignments of human against each of the other 9 and 31 eutherian mammals
  • Additional pairwise BLASTZ-NET alignments will be run for human-opossum, human-platypus, human-chicken and human-wallaby
  • Translated BLAT-NET will be regenerated for human against fugu, X.tropicalis, C.intestinalis, C.savignyi, stickleback, medaka, chicken, zebrafish, tetraodon, zebrafinch and anole lizard

Synteny will be recalculated for: rat vs. human, chicken vs. human and human vs. macaque, dog, chimpanzee, platypus, opossum, mouse, orangutan, horse and cow

Homologies and families

  • 50 way GeneTrees and homologies with new/updated genebuilds and assemblies
  • Clustering using hcluster_sg
  • Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins + muscle + kalign + probcons) and new exon-skipping aware “skipper” algorithm.
  • New ‘putative gene split’ and ‘distant paralog’ homology types
  • Pairwise gene-based dN/dS calculations for high coverage species pairs
  • Updated MCL families including all Ensembl transcript isoforms and newest Uniprot Metazoa
  • Multiple sequence alignments with MAFFT
  • Stable IDs for GeneTrees (ENSGT00550NNNNNNNNN) and MCL Families (ENSFM00550NNNNNNNNN).
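The stable ID patterns in the last item can be checked with a short sketch. The function name is hypothetical, and the five-digit block after the prefix (00550 in the patterns above) is treated as an opaque string, since the announcement does not say what it encodes.

```python
import re

# Prefix, a five-digit block, then nine digits, following the patterns
# ENSGT00550NNNNNNNNN and ENSFM00550NNNNNNNNN quoted in the list.
STABLE_ID = re.compile(r"^(ENSGT|ENSFM)(\d{5})(\d{9})$")


def parse_stable_id(stable_id):
    """Split a GeneTree or MCL family stable ID into its parts."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a GeneTree/family stable ID: {stable_id!r}")
    prefix, block, serial = match.groups()
    kind = "GeneTree" if prefix == "ENSGT" else "MCL family"
    return {"kind": kind, "block": block, "serial": int(serial)}


# Hypothetical identifier, for illustration only.
print(parse_stable_id("ENSGT00550000000001"))
```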