We are delighted to announce the latest Ensembl release 64 (e!64).

This release includes assemblies for two new species, lamprey (Petromyzon marinus) and Tasmanian devil (Sarcophilus harrisii), as well as a patch of the human assembly (GRCh37.p5) and an update of the cow assembly (UMD 3.1). We have incorporated the most recent human and mouse manual gene annotations from HAVANA, new regulation data for human and mouse, as well as many other interesting data updates and features. The previous Ensembl release is archived at e63.ensembl.org.

Petromyzon_marinus_7.0 is an assembly of the sea lamprey (Petromyzon marinus), provided by the lamprey consortium and sequenced to a total of 5.0X whole genome coverage. The gene set for lamprey was built using the Ensembl genebuild pipeline. New translated BLAT whole genome pairwise alignments against the zebrafish, stickleback, Ciona intestinalis and human genomes are now available for lamprey. Protein trees now include genes from the lamprey (10,079 genes) and, with the inclusion of the lamprey, 849 more trees have a root older than the last common ancestor of bony vertebrates.

We now have new phenotype views where one can view genes associated with diseases and phenotypes. The new phenotype page can be accessed via the gene tab. Genes and variations associated with a phenotype can also be displayed on a karyotype. The accompanying colour key corresponds to the p-value of the association between the variation and the phenotype.

In order to make turning on data tracks easier, a number of changes have been made to the configuration panel in the region in detail page (accessed via the “Configure this page” button), including a new menu structure with grouping for similar track types. Configuration for regulatory evidence is now accessible via two links in the Regulation section of the menu for the configuration panel – “Open chromatin & TFBS” and “Histones & polymerases”.

The Tasmanian devil (Sarcophilus harrisii) 7.0 assembly, provided by Illumina and the Wellcome Trust Sanger Institute, has been added as a new species to Ensembl for release 64.  RNASeq data was used in the genebuild and can be found in the otherfeatures database. More detailed information on the genebuild can be found here.

Check out our improved FAQs. These have been reorganized into categories.

Confused about browser navigation? Why not try our new e-learning course!

More details on some of these changes will be posted soon, so keep an eye on our blog!

More information is also available on the Ensembl website.

We are glad to announce the launch of our latest installment.

Ensembl Release 63 (e63) includes a new high-coverage assembly for microbat (Myotis lucifugus), the most recent human and zebrafish manual gene annotations from Havana, and a fresh update of mouse variation data, among numerous other additions. The previous Ensembl release is archived at e62.ensembl.org.

Tracks on Region in detail and Region overview pages can now be reordered by dragging them to a new position on the image. The strand of the track can still be identified by a colour and a text message when passing the mouse over the track bar.

 

The popular Variant Effect Predictor (VEP) tool has been updated in e63, with speed improvements and renewed support for variants that fall in regulatory regions.

 

Pie charts have been added to the human variation pages for the 1000 Genomes population allele frequencies.


A new configuration table facilitates the exploration of regulatory data, including the capacity to search for specific markers of interest.  To access this functionality, click on ‘Configure this page’ while on a Location View or a Regulatory Region View and select ‘Regulatory Evidence’.

 

A new genome assembly takes microbat from low to high coverage. A new genebuild has been performed on this assembly using the Ensembl gene annotation pipeline.

 

Users of our Perl API will certainly enjoy the new Doxygen-based API documentation, with an improved user interface, better support for object-oriented programming and a comprehensive search tool. There is also an updated Regulation API tutorial to help users access regulatory data programmatically.

More details on some of these changes will be posted soon, so keep an eye on our blog!

More information is also available on the Ensembl website.

Chimp Pre!

There is a new Pre! site for version 2.1.3 of the chimpanzee genome assembly (known as Pan_troglodytes-2.1.3). This assembly will remain as a Pre! site while we complete the genebuild for the most recent assembly: Pan_troglodytes-2.1.4. These two assemblies are identical except for the Y chromosome, which has been updated for Pan_troglodytes-2.1.4.

The Pre! site contains over 56,000 Genscan predictions and 1,517 gene models based on chimpanzee proteins. In addition, we are displaying Exonerate alignments for chimp cDNAs and ESTs, as well as cDNAs and Ensembl peptides from e!62 human.

The Pre! Ensembl site has been updated to run version 62 of the Ensembl APIs and web code. This means that many of the new web features that have been added to the main Ensembl site over the past few releases are now available on the Pre! Ensembl site, including the Variant Effect Predictor (VEP), favourite tracks and the ability to attach BAM, BigWig and VCF data files.

The full gene build of the previous Pan_troglodytes-2.1 assembly can be found here.

Variation consequence types, such as “intronic” or “non-synonymous”, describe the location of a variation or its effect on a transcript. For the latest version of Ensembl (release 62) we have made some significant changes to the way in which we determine these consequence types, and we’d like to provide an overview of these improvements.

Firstly, we are now able to assign a specific effect to every allele of a variant. For example, rs12795274 has three alleles: the reference allele is T, and there are two alternative alleles, C and A. The A is predicted to cause an amino acid change, while the C is synonymous. We now list the effect of each individual allele on the website, and you can also fetch them separately when using the variation API.
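The idea of a separate effect for each allele can be illustrated with a minimal Python sketch. The codon GAT and the variant position used here are hypothetical, chosen only so that the pattern matches the description above (reference T, a synonymous C and an amino-acid-changing A); they are not the real sequence context of rs12795274, and this is not how the variation API itself is implemented.

```python
# Standard genetic code in compact form: codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def allele_effects(ref_codon, offset, alt_alleles):
    """Classify each alternative allele at one codon position separately.

    Simplification: any change of amino acid (including to or from a stop
    codon) is reported as "non-synonymous".
    """
    ref_aa = CODON_TABLE[ref_codon]
    effects = {}
    for alt in alt_alleles:
        # Substitute the alternative base into the reference codon.
        alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
        alt_aa = CODON_TABLE[alt_codon]
        effects[alt] = "synonymous" if alt_aa == ref_aa else "non-synonymous"
    return effects


# Hypothetical codon GAT (Asp) with the variant at its third base (offset 2).
effects = allele_effects("GAT", 2, ["C", "A"])
print(effects)
```

Each alternative allele gets its own entry in the result, which mirrors the way the effects are now listed per allele rather than once per variant.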

Another improvement we’ve made is that “under the hood” we now use terms defined in the Sequence Ontology (SO) to describe the consequence types. Moving to this set of externally maintained terms should make it easier to compare Ensembl annotations with those from other groups. The SO also groups the various terms we use into a hierarchical tree and, in the future, this will let users query for variants with particular effects in a much smarter way than is possible now. On the website we are still using our old terms by default, but you can see the mapping between the old terms and the SO terms on the variation documentation page and you can use “Configure this page” on several variation views to choose which set of terms you want to see (here’s an example).
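To show why a hierarchy of terms allows smarter queries, here is a small Python sketch. The term names and parent links are a simplified toy hierarchy invented for illustration, not the actual Sequence Ontology or the terms Ensembl uses, and the variant names are hypothetical. The point is only that asking for a parent term can return variants annotated with any of its descendants.

```python
# Toy hierarchy: child term -> parent term (illustrative names, not real SO terms).
PARENT = {
    "coding_change": "transcript_change",
    "intronic_change": "transcript_change",
    "synonymous_change": "coding_change",
    "non_synonymous_change": "coding_change",
}


def is_a(term, ancestor):
    """Return True if term equals ancestor or descends from it in the tree."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # None once we walk past the root
    return False


def query(annotations, ancestor):
    """Return sorted variant names whose consequence term falls under ancestor."""
    return sorted(name for name, term in annotations.items() if is_a(term, ancestor))


# Hypothetical variant annotations.
annotations = {
    "variant_1": "synonymous_change",
    "variant_2": "non_synonymous_change",
    "variant_3": "intronic_change",
}
coding_hits = query(annotations, "coding_change")
print(coding_hits)
```

With a flat list of terms, the same query would need every coding term to be spelled out by hand.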

We also now provide SIFT and PolyPhen predictions for any variant that is predicted to cause an amino acid substitution in human. These are popular tools developed by external groups that try to predict the effect of a non-synonymous mutation on the function of the protein. You can see these predictions on several variation views; a useful example is the protein variation view. You can find more information about these tools and how we run them in Ensembl on the variation documentation page.

All of these improvements are also available for you to use to analyse your own data using the Variant Effect Predictor (VEP). The VEP has new configuration options that allow you to choose which set of terms you want to use for the consequence annotations, and also offers options to fetch SIFT and PolyPhen predictions for any missense mutations in your data. We are able to provide these predictions for novel mutations by computing the predictions from SIFT and PolyPhen for all possible amino acid substitutions in human proteins and storing these in the variation database. We hope that this makes the VEP even more useful for mining your data, and we plan to add support for these sorts of tools in other species in the near future.
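The precomputation strategy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Ensembl pipeline or database schema: the `predictor` callback is a hypothetical stand-in for a tool such as SIFT or PolyPhen, the placeholder score has no biological meaning, and the three-residue protein is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def precompute_predictions(protein, predictor):
    """Score every possible single amino acid substitution in a protein.

    `predictor` is a caller-supplied function (position, ref_aa, alt_aa) -> score.
    It is a hypothetical stand-in, not the real interface of SIFT or PolyPhen.
    """
    table = {}
    for position, ref_aa in enumerate(protein, start=1):
        for alt_aa in AMINO_ACIDS:
            if alt_aa != ref_aa:  # skip the unchanged residue
                table[(position, alt_aa)] = predictor(position, ref_aa, alt_aa)
    return table


def dummy_predictor(position, ref_aa, alt_aa):
    # Placeholder score with no biological meaning.
    return 0.0


# Made-up three-residue protein: 3 positions x 19 alternatives = 57 entries.
table = precompute_predictions("MKT", dummy_predictor)
print(len(table))

# A novel substitution (K to E at position 2) is then a simple lookup.
print(table[(2, "E")])
```

Because every substitution is scored ahead of time, a mutation that has never been seen before can still be annotated with a lookup instead of a fresh run of the prediction tool.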

We have just updated the Ensembl genome browser and underlying databases to version 62.  We would like to share some new features with our user community.

The new Ensembl release hosts a new species: the white-cheeked gibbon (Nomascus leucogenys). A new genebuild has been performed using the Ensembl gene annotation pipeline, incorporating both gibbon and human sequences to determine the gibbon gene set. Compara has also incorporated the gibbon genes and genome into comparative genomics analyses, as is usual for new genebuilds. Gibbon can now be found in gene trees, a pairwise whole genome alignment with human, and a 35-way multi-species alignment. View these alignments in views like this one.

BigWig files are now supported: attach them to Ensembl by URL. Click on ‘Manage your data’ at the left of a location page, and select ‘Attach Remote File’ from the new menu.

Not sure what Ensembl has to offer, or how to use our resources? Now whenever you search for a term, hits to Help and Documentation will come up. These may be to page-specific help, FAQs or the glossary, depending on the term. As always, we hope our users make requests: if you can’t find what you’re looking for, let our helpdesk know.

Finally, SIFT and PolyPhen predictions are available in human variation pages, and in the popular Variant Effect Predictor (for human).  A more detailed post on these variation analyses will be coming soon, so keep your eye on the blog.

More features like the new comparative genomics navigation menu have been released, so explore and let us know what you think.  More news is available on our website.

I’m going to be blogging a bit more about the recent Ensembl 61 release and the Ensembl Genomes 8 release – lots and lots of goodies in both these releases – web site tweaks (some of them totally critical for generating good displays), the new “favourite tracks” feature, and impressive content changes.

I’ll start today on content changes, and in Ensembl Genomes 8 there are some important genome additions. Some come from Paul Kersey’s new collaboration with PhytoPathDB – more on that in a later post – but top of my excitement has been the diversity in metazoa. The Ensembl Metazoa team has added sea urchin, sea anemone, the rather weird primitive animal Trichoplax adhaerens (also called the “carpet” organism) and the blood fluke Schistosoma mansoni. The motivation for bringing these organisms in is to broaden our phylogenetic tree and the comparisons we can provide across all of life. So, for example, for the Drosophila twist gene one can now see the deep tree across metazoa. There is a deep ortholog in Trichoplax which seems to predate the split of some of these Helix Loop Helix proteins, whereas other members of the family have a paralog in Trichoplax, meaning that there seems to be a fundamental split in this developmentally key transcription factor. This is just one of many interesting gene trees that one can look at using this resource…

Happy browsing/data mining!

The Ensembl Genomes Project is pleased to announce release 8 of Ensembl Genomes (http://www.ensemblgenomes.org/).

The main highlights of this release are:

  • Software migration to Ensembl 61
  • New Pan Compara database consisting of a selection of vertebrate genomes from Ensembl 61 and genomes from Ensembl Genomes 8 (incorporating 8 new species), giving a species total of 313.
  • 3 oomycete genomes added to Ensembl Protists, including Phytophthora infestans and Phytophthora ramorum, responsible for potato blight and Sudden Oak Death disease respectively.
  • 5 genomes added to Ensembl Metazoa, including Strongylocentrotus purpuratus (Echinodermata) (sea urchin), Apis mellifera (Arthropoda) (honey bee) and Nematostella vectensis (Cnidaria) (sea anemone).

For further details please visit the individual homepages:
http://bacteria.ensembl.org
http://protists.ensembl.org
http://fungi.ensembl.org
http://plants.ensembl.org
http://metazoa.ensembl.org

Ensembl 61 has gone live. In displays like gene summary and region in detail, favourite tracks can be turned on. To do this, open the configuration panel (click on “Configure this page” in the left-hand menu). Activating a star by clicking on it will place that track in the favourites menu (shown by an arrow in the diagram).

Hover over any track name in these views to view information about the data, change the display, or turn the track off. Turning on tracks must still be done with the configuration panel.

We hope this makes navigation easier. Read about other updates in 61, such as our new species, Turkey, in the news.

The Ensembl Team

We are pleased to announce new documentation describing the gene annotation methodology and results for particular species.

Ensembl gene annotation is a multi-step process which usually takes several months to complete for one species, and is termed the genebuild. In order to provide our users with more information on the data resources used and decisions made during the genebuilding process, we are introducing a new genebuild summary PDF document for each new genebuild, starting from early February 2011 with Ensembl release 61. Each document includes details on not only the alignment programs and data filtering parameters used, but also statistics on the number of protein/cDNA/EST sequences used at different stages of the genebuild. For example, users will be able to find out how many protein sequences were retrieved from public repositories (RefSeq and UniProt) at the beginning of the genebuilding process, how many of these proteins aligned to the genome by various algorithms at different stages of the build, and how many remain in the final gene set as supporting evidence for genes. For human, mouse and zebrafish, the process of merging Ensembl and Havana annotations is also explained.

The genebuild summary will be available for six species: the Anole lizard, Marmoset, Mouse, Panda, Turkey and Zebrafish. More genebuild summaries will become available in the future as genebuilds of existing species are updated or new species are annotated. You can download the document via a link found near the bottom of the “Description” page for each species. Just click on the species of interest on the home page to open its description page.

The Ensembl Genomes Project is pleased to announce release 7 of Ensembl
Genomes (http://www.ensemblgenomes.org/).

The main highlights of this release are:

* Software migration to Ensembl 60

* 66 new genomes added for Ensembl Bacteria and updated functional
genomics databases for Escherichia/Shigella and Staphylococcus

* Puccinia graminis f. sp. tritici genome added to Ensembl Fungi

* Zea mays and Physcomitrella patens genomes added to Ensembl Plants,
new variation datasets for Oryza sativa and updates to the
functional genomics databases for Oryza sativa indica, Oryza sativa
japonica and Arabidopsis thaliana

* Acyrthosiphon pisum genome added to Ensembl Metazoa

Please see the individual homepages for more detailed information:
http://bacteria.ensembl.org
http://protists.ensembl.org
http://fungi.ensembl.org
http://plants.ensembl.org
http://metazoa.ensembl.org