Want to be sure you are going to the right version of Ensembl? Whilst our archives can be accessed through the link at the bottom of each page, if you want to cite a particular version or access it directly, you previously needed to know the month and year of release to find the archive site (e.g. may2009.archive.ensembl.org).

Now, for the convenience of our users, we have introduced shortcuts that include the version number instead of the date. For example, typing:

e54.ensembl.org

into your browser will redirect you to the same May 2009 archive.

We have put these redirects in as far back as e30 – if the archive no longer exists, you will be directed to the next most recent one (unless doing so would mean a change of assembly, in which case you are redirected back to the last archive on your chosen assembly, if available).

P.S. Don’t forget the ‘e’ at the beginning – we can’t use plain numbers as they cause problems with DNS servers.
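For the curious, the fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as we read it, not the code that runs on our servers: the table of releases, the assembly names and the set of "live" archives below are all invented for the example, and "next most recent" is taken to mean the next newer release.

```python
# Hypothetical data, invented for this sketch (not the real archive list):
# release number -> assembly name, and the releases whose archive still exists.
ASSEMBLY = {50: "asmA", 51: "asmA", 52: "asmA", 53: "asmB", 54: "asmB"}
LIVE = {50, 51, 54}


def shortcut(release):
    """Build the version-number shortcut host name for a release."""
    return f"e{release}.ensembl.org"


def resolve(release):
    """Return the release whose archive a request for e<release> lands on."""
    if release in LIVE:
        return release
    wanted = ASSEMBLY[release]
    # Prefer the next newer live archive, provided the assembly is unchanged.
    later = [r for r in sorted(LIVE) if r > release]
    if later and ASSEMBLY[later[0]] == wanted:
        return later[0]
    # Otherwise go back to the last live archive on the chosen assembly.
    same = [r for r in sorted(LIVE) if r < release and ASSEMBLY[r] == wanted]
    return same[-1] if same else None
```

With this toy table, a request for release 53 moves forward to 54 (same assembly), whereas a request for release 52 goes back to 51, because moving forward would change the assembly.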


We have added a little trick for orthology-lovers. Starting from the orthologues page, you can choose to switch to the GeneTree. This will highlight the orthologue of interest, as well as the ancestral node that relates both genes.

Another useful feature added in Ensembl 57 is the possibility to display a set of genes (up to 10) using the Multi-Species view. Click on an internal node and select the “Jump to Multi-species view” option. This will show each of these genes in their respective genomic location, with genomic alignments when available.

Ensembl 57 includes the turkey genome, the third bird in Ensembl. We are now providing a 3-way avian multiple alignment (chicken, turkey and zebra finch) together with GERP constraint analysis. The image shows amniote and bird constrained elements on the chicken genome.

We have also added a new set of fish multiple alignments (stickleback, medaka, takifugu, tetraodon and zebrafish). GERP constraint analysis is available on fish genomes as well.


The gene tree images now have little intron “ticks” on them showing where each intron falls relative to the protein sequence. An example is shown above. Each tick is a little black line on each side of the green protein bars, on the right. As intron positions have been remarkably stable on the “chordate” side of the metazoan tree (i.e., the deuterostomes), one should expect that the introns line up – if they do, it is good evidence that the alignment is right.
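To make the idea of a tick concrete, here is a minimal sketch of how an intron position can be expressed in protein coordinates. It is not the code behind the display; the function names are made up for this example, and the comparison assumes the two proteins line up without gaps, whereas the real display works in alignment coordinates.

```python
def intron_ticks(coding_exon_lengths):
    """Map each intron to a position along the protein, in amino acids.

    coding_exon_lengths: length in bases of the coding part of each exon,
    in transcript order. An intron follows every exon except the last.
    """
    ticks = []
    coding_bases = 0
    for length in coding_exon_lengths[:-1]:
        coding_bases += length
        # The position is fractional when the intron splits a codon.
        ticks.append(coding_bases / 3)
    return ticks


def shared_ticks(a, b, tolerance=1.0):
    """Ticks of `a` with a partner in `b` no more than `tolerance` residues away."""
    return [x for x in a if any(abs(x - y) <= tolerance for y in b)]
```

For example, coding exons of 90, 60 and 30 bases give ticks after residues 30 and 50; a second gene with ticks at 30 and 71 shares only the first of these.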

There are some interesting things. Ensembl models small frameshifts to create open reading frames around erroneous data as tiny introns. In this code you cannot distinguish these two classes of introns, but as these errors normally come in patches, a run of intron ticks unique to a genome is probably a set of errors (an example is in Gorilla). I’ve enjoyed browsing around some of my favourite genes to check out that the introns make sense.

There is some more to go here. The fact that the intron ticks disappear on collapsed nodes is a bit frustrating – it would be nice to see “consensus” intron positions (though this is a bit complex to execute underneath).

Did you know that you can use the Ensembl API to predict the consequences of your own SNP positions? This is a really popular question and there is some example code on the website to guide you through this. See an example here. This functionality is available from Ensembl release 56 but we have also recently patched release 54 in case you need to use the NCBI 36 human assembly.
Soon there will be a page on the website where you can upload your data and we will predict SNP consequences for you.
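If you are wondering what "predicting a consequence" boils down to for a coding SNP, the toy Python sketch below shows the core step: translate the reference and the mutated codon and compare. It does not use the Ensembl API, the labels are informal rather than Ensembl's consequence terms, and it deliberately ignores everything the real code has to handle (strand, splice sites, UTRs, multiple transcripts).

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def consequence(codon, offset, alt_base):
    """Classify a single-base change at `offset` (0-2) in a reference codon."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "non-synonymous"
```

So changing GAA to GAG leaves a glutamate in place (synonymous), while changing TGG to TGA turns a tryptophan into a stop.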

When we changed our look and feel almost a year ago, we “left behind” our two main graphical genome-wide comparative genomics displays (our textual comparative genomics displays remained, as did some of the gene centric ones). These were some of the most complex displays, not only in the graphics layout but also in aspects such as configuration – with comparative genomics tracks with up to 30 species, potentially one has the union of all tracks in each species, and doing this consistently required reworking how we thought about the “same” or “different” tracks across species.

It’s taken longer than we thought it would, but finally in release 56 these displays are back and better than ever. With more aggressive caching of data items as they head to the web (and, in addition, if you are on the west coast of the US or the Pacific Rim, check out the US west mirror at uswest.ensembl.org) they go far faster, making them far more useable.

We have two fundamentally different ways of thinking about genomic alignments.

In “Multi Sequence View”, which works fundamentally as a set of pairwise alignments, we maintain the linear sequence of each genome, and then draw regions which are conserved between them. Check out displays like:

Mouse/Human

And make sure you hit “Configure Page” and in the Comparative Genomics section, switch on blastz. I also like to have genes in “Collapsed, labels” (so alternative splicing doesn’t produce excessive displays) and also switch on Regulatory Features.

Now – you get a nice picture of this region in human and mouse. The orthologous gene (PECI) has conserved exons, and the regulatory features at the start of this gene are conserved in human and mouse and in both cases classified as a promoter. All as expected.

But a closer look shows that the transcript going by the catchy name of AC123437.5 in mouse, on the opposite strand, has some of its exons overlapping the human PECI, and human PECI is duplicated into two local genes here. This is perhaps easier to see as one zooms out in this display (notice you can drag-and-select in the upper panels, or use the + and – bars to change in the lower panels).

Zoom Out

In contrast, the alignment (Image) view asks you to choose one species as the co-linear reference, and then the other species are organised specifically by the alignment to that reference. This is ideal in more linear, orthologous regions. I like using the 10-way EPO alignment for visualisation/gene model comparison, although to do things like conservation analysis, you want to use the 31-way mammalian alignment with the low coverage data.

This is a gene that is well conserved across mammals.

Co Linear

We can look at precisely the same alignment from the perspective of Mouse, Rat, Dog, Horse, Human, Pig. In each case, the alignment is unbiased to each species. For example, the Mouse-Rat portion of this multiple alignment still aligns the unique rodent portions.

Here is that same region from the perspective of Cow:

Cow

Notice when you go to human you have a choice of not only 4 different multiple alignments (a 4-way primate alignment, a 10-way mammalian alignment, a 12-way alignment including chicken and a 31-way mammalian alignment), but also 40-odd other individual pairwise alignments.

In each case, you can get the alignment out as text – here’s a 4-way primate alignment:

Text alignment

or the same region in its 31-way glory:

31-way text alignment

Of course, all this information is also available to download or access through our Perl API. A particularly interesting thing in these alignments is the ability to switch on the ancestral sequence as well (go to the configuration panel).

More on the use and power of comparative genomics later I hope, but for the moment, do enjoy these displays being back, and do both browse around and download/script against them.

Ewan

Release 55 has lots of goodies – not least the new, coordinated, GRCh37 assembly (more on that later), but one addition is the Martability of Ensembl Regulatory Features. Regulatory features are on by default on Human and Mouse, and each gene has a specific page for the regulatory features (for example http://www.ensembl.org/Homo_sapiens/Gene/Regulation?g=ENSG00000139618). Regulatory Features are developing fast, and the Martability is bringing out the richer information in the functional genomics database – for example, the classification of features into “promoter”, “gene associated” and “unclassified”. Next release we’re hoping to release a more graphical view for each feature, but the presence of the regulatory features in Mart allows the large scale users – from Perl, Java, R or just plain tab-delimited text – to use them.
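As a flavour of the "plain tab-delimited text" route, here is a small Python sketch that tallies regulatory features by classification from a Mart-style export. The column names and feature identifiers are placeholders invented for this example; check the header of your own export and adjust accordingly.

```python
import csv
import io
from collections import Counter


def count_by_class(tsv_text, column="classification"):
    """Count rows per value of `column` in tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


# Placeholder rows; the identifiers and column names are hypothetical.
example = (
    "feature_id\tclassification\n"
    "feature_1\tpromoter\n"
    "feature_2\tgene associated\n"
    "feature_3\tpromoter\n"
    "feature_4\tunclassified\n"
)

counts = count_by_class(example)
for label, n in counts.most_common():
    print(label, n)
```

To run it on a real download, read the file contents into a string and pass that in place of `example`.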

We’re expecting a lot of development in this area – the addition of Mouse DNaseI sites has allowed us to develop a Mouse build, and of course, the ENCODE project, which is now online in production mode, will provide a far richer, deeper dataset to work against.

So – watch this space.

Starting with release 55 of Ensembl we provide an ensembl_ontology database. It replaces the older ensembl_go database which used to be loaded straight from the public table dumps provided by the Gene Ontology group (and hence wasn’t really an Ensembl database to start with). The associated API is now part of the Ensembl Core API, which should make working with GO terms in Ensembl more straightforward than it was in the past. Available methods include, amongst others, fetching all parent or child terms of a given GO term and fetching all genes, transcripts or translations annotated with a given GO term.

More detailed documentation on both database and API can be found at ensembl/misc-scripts/ontology/README.
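The kinds of query mentioned above (all parents of a term, all children of a term, everything annotated with a term) can be pictured with a toy graph. The Python sketch below is not the Ensembl API and does not touch the ensembl_ontology database; the term names, gene names and function names are all hypothetical and exist only to show the traversal.

```python
# Hypothetical toy ontology: term -> its direct parent terms.
PARENTS = {
    "term:D": ["term:B", "term:C"],
    "term:B": ["term:A"],
    "term:C": ["term:A"],
    "term:A": [],
}
# Hypothetical annotations: gene -> terms it is annotated with.
ANNOTATIONS = {"gene1": ["term:D"], "gene2": ["term:B"], "gene3": ["term:A"]}


def all_parents(term):
    """Every ancestor of `term`, following parent links transitively."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def all_children(term):
    """Every descendant of `term`: the terms that have it as an ancestor."""
    return {t for t in PARENTS if term in all_parents(t)}


def genes_for(term, include_descendants=True):
    """Genes annotated with `term`, optionally also with any of its descendants."""
    wanted = {term}
    if include_descendants:
        wanted |= all_children(term)
    return sorted(g for g, terms in ANNOTATIONS.items() if wanted & set(terms))
```

Including descendants is usually what you want when asking for "all genes with this term", since a gene annotated with a more specific term is implicitly annotated with its parents too.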

Credit for developing the ensembl_ontology database and API goes to Andreas Kahari of the Ensembl Software team.


We have some news for the forthcoming Ensembl release. We have added a few more display options for our gene trees. It will be possible to colour the background of the trees based on the taxonomy, which will make it much easier to locate orthologues or paralogues in a given clade. People who prefer more subtle colouring can choose to colour the branches instead.

It will also be possible to automatically collapse all the genes for a given clade. In the example shown in this figure, glires and diptera are collapsed. Moreover, the new version will also allow you to hide fish genes for instance or even all genes from low-coverage genomes.

All these options are configurable from the configure panel available through the ‘configure page’ link in the left panel.

The Ensembl Functional Genomics (eFG) environment has been expanded to incorporate array mapping functionality. Historically, arrays from different vendors have been processed in similar, but non-identical ways due to differing array designs, with the output being stored in the core database. The ‘arrays’ environment unifies this process within the eFG database to provide a new standardised array mapping procedure for all array formats. This involves a two step process whereby probe sequences are aligned both to genomic and transcript sequences, and then subsequently transcripts are annotated with xrefs (DBEntries) dependent on the quality of the probe alignments around a given transcript locus.

The ‘arrays’ environment provides easily accessible and interactive command line functions to help run and administer the array mapping pipeline. Recent developments include broader array format support and multi-species capability, along with capture of much more detailed mapping information. This data has yet to be seen in the Ensembl browser, but from release 55 we will start redirecting the web displays to use the eFG data, with a view to developing a more detailed ‘Probe’ panel at some point later in the year.

We will endeavour to provide alignments and mappings of all popular arrays; for all others we invite you to try out the eFG ‘arrays’ environment. For more information check out (literally):

ensembl-functgenomics/docs/array_mapping.txt

Or see it online here.

If you have any questions, please mail ensembl-dev@ebi.ac.uk