Starting from this release (65), we provide further details on the internal aspects of the GeneTrees. Our trees are built using several phylogenetic reconstruction methods, each of them resulting in a different tree. These are combined by TreeBeST to produce the final tree (Read more on the pipeline). Each node of the final tree is supported by at least one of the original trees.
The new functionality is not quite like an X-ray plate, but you can click on any node to find out which methods support it. The 5 methods we use for protein-coding gene trees are:
- phyml_aa: maximum likelihood (ML) tree based on the protein alignment with the WAG model
- phyml_nt: ML tree based on the codon alignment with the HKY model
- nj_mm: neighbour-joining (NJ) tree based on the codon alignment using p-distance
- nj_dn: NJ tree based on the codon alignment using dN distance
- nj_ds: NJ tree based on the codon alignment using dS distance
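To make the idea of per-node support more concrete, here is a minimal Python sketch. It is not part of the Ensembl code or API: the `METHODS` table and the `summarise_support` function are hypothetical names, and `nj_ds` is taken as the tag for the dS-distance tree. Given the tags reported for one node, it groups them by method family.

```python
# Tags of the five methods listed above, mapped to (family, description).
# The dictionary itself is illustrative and not an Ensembl data structure.
METHODS = {
    "phyml_aa": ("ML", "protein alignment, WAG model"),
    "phyml_nt": ("ML", "codon alignment, HKY model"),
    "nj_mm": ("NJ", "codon alignment, p-distance"),
    "nj_dn": ("NJ", "codon alignment, dN distance"),
    "nj_ds": ("NJ", "codon alignment, dS distance"),
}


def summarise_support(tags):
    """Group the supporting method tags of one node by family (ML or NJ)."""
    unknown = [t for t in tags if t not in METHODS]
    if unknown:
        raise ValueError(f"unknown method tag(s): {unknown}")
    summary = {"ML": [], "NJ": []}
    for tag in sorted(set(tags)):
        summary[METHODS[tag][0]].append(tag)
    return summary


# Hypothetical node supported by three of the five trees.
example = summarise_support(["nj_dn", "phyml_aa", "nj_ds"])
print(example)
```

A node supported by both ML and NJ trees appears under both keys of the summary, which gives a quick feel for how broadly the methods agree on it.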
We have recently added a new table at the top of the Orthologues view. This table shows the number of species that have a 1:1, 1:many or many:many orthology relationship with the current gene. This table shows that information for the human BRCA2 gene:
The table also contains a ‘Show details’ column. This can be used to restrict the list of orthologues shown on the page to the selected species clade or clades.
Please refer to the Ensembl documentation at http://www.ensembl.org if you want to know more about how we infer orthologues.
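The counts in such a table can be thought of as a simple tally. The Python sketch below assumes the orthologues are available as (species, orthology type) pairs. The function name and the toy data are made up for illustration and do not come from Ensembl.

```python
from collections import defaultdict


def count_species_by_type(orthologues):
    """Count distinct species per orthology type.

    `orthologues` is an iterable of (species, orthology_type) pairs,
    one pair per orthologue of the query gene.
    """
    species_by_type = defaultdict(set)
    for species, orthology_type in orthologues:
        species_by_type[orthology_type].add(species)
    return {t: len(s) for t, s in species_by_type.items()}


# Made-up input for illustration; these are not real Ensembl results.
toy = [
    ("mouse", "1:1"),
    ("chicken", "1:1"),
    ("zebrafish", "1:many"),
    ("zebrafish", "1:many"),
]
counts = count_species_by_type(toy)
print(counts)
```

Species are collected in a set, so a species with two co-orthologues (zebrafish in the toy data) is counted only once under its orthology type.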
People following the declarations of intentions for the next release (these are sent to email@example.com) may have noticed that we are releasing LASTZ pairwise alignments instead of BLASTZ ones. LASTZ was written by Bob Harris at Penn State University as a replacement for BLASTZ, which is now considered obsolete (read the announcement).
This is the first release where we use LASTZ for the new alignments. We will update the previous alignments in the following releases.
Ensembl is already working on the forthcoming release (e60!). The declaration of intentions has been published and includes the new Giant Panda (Ailuropoda melanoleuca) genome.
While I was looking at the alignments we are planning to release, I saw the H.sap-A.mel LASTZ-net alignments. Hey! I thought we banned Apis mellifera from Ensembl a long time ago.
Then I realised that the scientific names of both the honey bee and the giant panda start with the same letters. So, don’t get confused, one of them eats flowers and makes honey, while the other one eats bamboo and makes very nice teddies!
It is now possible to get the GERP constrained elements via the DAS protocol. For instance the DAS command to get all the GERP elements on the BRCA2 gene (Human chr 13: 32889611-32973347) is:
By default, you obtain the constrained elements derived from both our 16-way amniote alignments and the 33-way placental mammal ones (these include all the low-coverage genomes). You can filter the elements you want by using the argument type:
Read more on DAS or on the multiple alignments and constrained elements.
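For readers who script their DAS queries, the following Python sketch shows how a `features` request with an optional `type` filter is typically assembled under the DAS 1.x conventions (`segment=ID:start,stop`, arguments separated by semicolons). The server, the source name and the type value are placeholders chosen for illustration. They are not the actual Ensembl ones, which are not reproduced here.

```python
from urllib.parse import quote


def das_features_url(server, source, chrom, start, end, feature_type=None):
    """Build a DAS 1.x 'features' request URL for one genomic segment.

    `server`, `source` and `feature_type` are supplied by the caller;
    the values used below are placeholders, not real Ensembl names.
    """
    base = f"{server.rstrip('/')}/das/{source}/features"
    query = f"segment={quote(str(chrom))}:{start},{end}"
    if feature_type is not None:
        # DAS separates query arguments with semicolons.
        query += f";type={quote(feature_type)}"
    return f"{base}?{query}"


# BRCA2 coordinates from the text; server and source are hypothetical.
url = das_features_url(
    "http://das.example.org", "HYPOTHETICAL_GERP_SOURCE",
    13, 32889611, 32973347,
)
filtered_url = das_features_url(
    "http://das.example.org", "HYPOTHETICAL_GERP_SOURCE",
    13, 32889611, 32973347, feature_type="HYPOTHETICAL_TYPE",
)
print(url)
print(filtered_url)
```

The sketch only builds the URL. Fetching it and parsing the returned XML is left out, since that depends on the server actually used.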
Comparing and assessing the quality of whole-genome multiple alignments is a difficult task. In the protein world, many mathematical models are available. They are based on synonymous and non-synonymous substitutions and the physicochemical similarities among the amino acids. None of this can be applied to non-coding sequences, which make up 99% of the human genome.
There are two main trends for assessing whole-genome alignments. Authors have either used genomic features like ancestral repeats or have developed phylogenetic models to generate synthetic sequences for which the “real” alignment is known.
Two articles have been published recently, one proposing a new method based on artificial sequences (Kim & Sinha, BMC Bioinformatics 2010, 11:54) and the other one looking at the coverage, agreement and accuracy of the alignments in the ENCODE pilot regions (Chen & Tompa, Nature Biotechnology 2010, doi:10.1038/nbt.1637).
According to both studies, Pecan is the strongest contender, showing the clear advantage of using a consistency-based approach (see Paten et al., Genome Res. 2008, 18:1814-28) to align the sequences.
This week, Albert Vilella and I participated in the Xfam consortium meeting. The meeting focussed on protein, domain and ncRNA classification, and on the new developments of the HMMER package.
Although Ensembl is not part of Xfam, we share many interests. We are getting increasingly interested in the use of HMMER models, especially since the release of HMMER3.0. Also, in the forthcoming release (version 58), Ensembl will provide gene trees for ncRNAs. Most of these ncRNA genes are annotated using Rfam models.
Stay tuned for more!
We have added a little trick for orthology-lovers. Starting from the orthologues page, you can choose to switch to the GeneTree. This will highlight the orthologue of interest, as well as the ancestral node that relates both genes.
Another useful feature added in Ensembl 57 is the possibility to display a set of genes (up to 10) using the Multi-Species view. Click on an internal node and select the “Jump to Multi-species view” option. This will show each of these genes in their respective genomic location, with genomic alignments when available.
Ensembl 57 includes the turkey genome, the third bird in Ensembl. We are now providing 3-way avian multiple alignments (chicken, turkey and zebra finch) together with GERP constraint analysis. The image shows amniote and bird constrained elements on the chicken genome.
We have also added a new set of fish multiple alignments (stickleback, medaka, takifugu, tetraodon and zebrafish). GERP constraint analysis is available on fish genomes as well.
We have some news for the forthcoming Ensembl release. We have added a few more display options for our gene trees. It will be possible to colour the background of the trees based on the taxonomy, which makes it much easier to locate orthologues or paralogues in a given clade. People who prefer more subtle colouring can choose to colour the branches instead.
It will also be possible to automatically collapse all the genes for a given clade. In the example shown in this figure, glires and diptera are collapsed. Moreover, the new version will also allow you to hide fish genes, for instance, or even all genes from low-coverage genomes.
All these options are configurable from the configure panel available through the ‘configure page’ link in the left panel.
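As a rough picture of what collapsing a clade means, here is a small Python sketch on a toy tree stored as nested dictionaries. The data structure, the `collapse` and `count_leaves` functions and the gene names are all invented for this example. They do not reflect how the Ensembl web code represents or draws gene trees.

```python
def count_leaves(node):
    """Return the number of leaf (gene) nodes under `node`."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def collapse(node, clades):
    """Return a copy of `node` in which every subtree whose clade name
    is in `clades` is replaced by a single summary node."""
    children = node.get("children", [])
    if node.get("clade") in clades and children:
        return {"clade": node["clade"], "collapsed_genes": count_leaves(node)}
    copy = {key: value for key, value in node.items() if key != "children"}
    if children:
        copy["children"] = [collapse(child, clades) for child in children]
    return copy


# Toy tree for illustration only; the gene names are made up.
toy_tree = {
    "clade": "Bilateria",
    "children": [
        {
            "clade": "Glires",
            "children": [{"gene": "mouse_gene"}, {"gene": "rat_gene"}],
        },
        {"gene": "human_gene"},
    ],
}
collapsed = collapse(toy_tree, {"Glires"})
print(collapsed)
```

The original tree is left untouched, so the same input can be shown with different sets of collapsed clades.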