Starting with release 55 of Ensembl we provide an ensembl_ontology database. It replaces the older ensembl_go database which used to be loaded straight from the public table dumps provided by the Gene Ontology group (and hence wasn’t really an Ensembl database to start with). The associated API is now part of the Ensembl Core API, which should make working with GO terms in Ensembl more straightforward than it was in the past. Available methods include, amongst others, fetching all parent or child terms of a given GO term and fetching all genes, transcripts or translations annotated with a given GO term.
More detailed documentation on both database and API can be found at ensembl/misc-scripts/ontology/README.
Credit for developing the ensembl_ontology database and API goes to Andreas Kahari of the Ensembl Software team.
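To make the idea of "fetching all parent or child terms" concrete, here is a minimal sketch of the underlying graph traversal. It is written in Python for illustration only and is not the Ensembl API, which is Perl. The term accessions and the `PARENTS` table are hypothetical placeholders, not real GO data.

```python
from collections import deque

# Hypothetical toy ontology: child term -> list of its direct parent terms.
# The accessions are placeholders, not real GO identifiers.
PARENTS = {
    "TERM:4": ["TERM:2", "TERM:3"],
    "TERM:2": ["TERM:1"],
    "TERM:3": ["TERM:1"],
    "TERM:1": [],
}


def all_parents(term, parents=PARENTS):
    """Return every ancestor of `term` (breadth-first, no duplicates)."""
    seen = []
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a term can be reached through more than one path
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


def all_children(term, parents=PARENTS):
    """Return every descendant of `term` by inverting the parent map."""
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)
    # Walking "up" the inverted map is the same as walking down the original.
    return all_parents(term, children)


if __name__ == "__main__":
    print(all_parents("TERM:4"))
    print(all_children("TERM:1"))
```

Because a term can have several parents, the ontology is a graph rather than a simple tree, which is why the traversal has to skip terms it has already visited.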
We have some news for the forthcoming Ensembl release: we have added a few more display options for our gene trees. It will be possible to colour the background of the trees based on the taxonomy, which will make it much easier to locate orthologues or paralogues in a given clade. People who prefer more subtle colouring can choose to colour the branches instead.
It will also be possible to automatically collapse all the genes for a given clade. In the example shown in this figure, glires and diptera are collapsed. Moreover, the new version will also allow you to hide fish genes, for instance, or even all genes from low-coverage genomes.
All these options are configurable from the configure panel available through the ‘configure page’ link in the left panel.
The Ensembl Functional Genomics (eFG) environment has been expanded to incorporate array mapping functionality. Historically, arrays from different vendors have been processed in similar, but non-identical ways due to differing array designs, with the output being stored in the core database. The ‘arrays’ environment unifies this process within the eFG database to provide a new standardised array mapping procedure for all array formats. This involves a two-step process: probe sequences are first aligned to both genomic and transcript sequences, and transcripts are then annotated with xrefs (DBEntries) depending on the quality of the probe alignments around a given transcript locus.
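As a rough illustration of the second step, the sketch below assigns probes to a transcript when enough well-aligned probes fall around its locus. It is a simplified Python toy, not the eFG pipeline code: the `ProbeHit` and `Transcript` classes, the thresholds, and the flank size are all hypothetical, and chromosome and strand are ignored.

```python
from dataclasses import dataclass


@dataclass
class ProbeHit:
    probe: str       # probe name
    start: int       # alignment start on the (single, assumed) sequence
    end: int         # alignment end
    mismatches: int  # mismatches in the alignment


@dataclass
class Transcript:
    name: str
    start: int
    end: int


# Hypothetical quality thresholds, chosen only for this example.
MAX_MISMATCHES = 1
MIN_PROBES = 2
FLANK = 100  # bases either side of the transcript that still count as "around the locus"


def annotate(transcripts, hits, max_mismatches=MAX_MISMATCHES,
             min_probes=MIN_PROBES, flank=FLANK):
    """Map each transcript name to the probes that justify an xref, if any."""
    xrefs = {}
    for transcript in transcripts:
        good = {
            hit.probe
            for hit in hits
            if hit.mismatches <= max_mismatches
            and hit.end >= transcript.start - flank
            and hit.start <= transcript.end + flank
        }
        # Only annotate when enough distinct probes support the transcript.
        if len(good) >= min_probes:
            xrefs[transcript.name] = sorted(good)
    return xrefs


if __name__ == "__main__":
    transcripts = [Transcript("T1", 1000, 2000), Transcript("T2", 5000, 6000)]
    hits = [
        ProbeHit("p1", 1100, 1124, 0),
        ProbeHit("p2", 1950, 1974, 1),
        ProbeHit("p3", 5100, 5124, 3),  # too many mismatches
        ProbeHit("p4", 5500, 5524, 0),
    ]
    print(annotate(transcripts, hits))
```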
The ‘arrays’ environment provides easily accessible and interactive command line functions to help run and administer the array mapping pipeline. Recent developments include broader array format support and multi-species capability, along with capture of much more detailed mapping information. This data has yet to be seen in the Ensembl browser, but from release 55 we will start redirecting the web displays to use the eFG data, with a view to developing a more detailed ‘Probe’ panel at some point later in the year.
We will endeavour to provide alignments and mappings for all popular arrays; for all others, we invite you to try out the eFG ‘arrays’ environment. For more information, check out (literally):
Or see it online here.
If you have any questions, please mail email@example.com
Many users ask us how to download data from Ensembl. Usually, the answer is to use BioMart. Comparative genomics data are also available in the standard Mart for your favorite species. For instance, to get all the human-mouse orthologs, one can select the human dataset, filter out all the genes with no mouse orthologs, and choose to output the mouse orthologs for all the remaining genes.
Here is how to get these data in 10 simple steps:
1. Go to: http://www.ensembl.org/biomart/martview
2. Choose “Ensembl 52”
3. Choose “Homo sapiens genes (NCBI36)”
4. Click on “Filters” in the left menu
5. Unfold the “MULTI SPECIES COMPARISONS” box, tick the “Homolog filters” option and choose “Orthologous Mouse Genes” from the drop-down menu.
6. Click on “Attributes” in the left menu
7. Click on “Homologs”
8. Unfold the “MOUSE ORTHOLOGS” box and select the data you want to get (most probably the gene ID and maybe the orthology type as well).
9. Click on the “Results” button (top left)
10. Choose your favorite output
Here is the preview of the results:
Other people may prefer to use our Compara Perl API or get the data directly from the Compara DB. These options are also available.
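For scripted downloads, the selection made in the ten steps above can also be written as a BioMart XML query. The Python sketch below only builds the query text and does not contact any server. The dataset, filter, and attribute names are assumptions based on commonly seen BioMart naming and may differ in the release you are using, so check them against what martview shows before relying on them.

```python
import xml.etree.ElementTree as ET


def build_ortholog_query(dataset, homolog_filter, attributes):
    """Build a BioMart XML query: one dataset, one boolean filter, N attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # excluded="0" keeps only genes that pass the filter (genes with a homolog).
    ET.SubElement(ds, "Filter", {"name": homolog_filter, "excluded": "0"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


if __name__ == "__main__":
    # All names below are assumed, not taken from a specific Ensembl release.
    xml_query = build_ortholog_query(
        dataset="hsapiens_gene_ensembl",
        homolog_filter="with_mmusculus_homolog",
        attributes=[
            "ensembl_gene_id",
            "mmusculus_homolog_ensembl_gene",
            "mmusculus_homolog_orthology_type",
        ],
    )
    print(xml_query)
```

A query like this is typically sent as the `query` parameter to the BioMart service that sits alongside martview. Treat the exact endpoint as something to confirm for your release.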
Earlier this week Paul Flicek and I were in lovely Rotterdam at the kick-off meeting for Gen2Phen, an EU project led by Tony Brookes from Leicester. As with all large European projects, the kick-off meeting is a chance to get to know everyone, have beers (very good ones in Holland) and get a feel for the project.
For me, the exciting thing was getting closer to the locus-specific databases. The project includes Johan den Dunnen (from just down the road in Leiden, Holland) and Andy Devereau (from Manchester), who run locus-specific databases and diagnostic databases respectively. Getting this valuable data coordinated with genome data (and the fiddly bit is about sequence coordinates, at least at first) is going to be a great thing to do.
There’s lots to do in this area. Certainly this is something that affects all the big browsers (UCSC, NCBI, ourselves) and has had a long history of complex systems and sociological tensions in getting things sorted. But my sense in this small room hidden away in the Erasmus medical centre was that we had good people in the room, committed to finding a good solution whilst understanding the complexity of the problem. Next up will be more technical meetings, but it was an excellent start. Don’t expect anything tomorrow, but I think we can expect something at the end of 2008/2009.
And did I mention the beer was good as well?