We are happy to announce the launch of our latest Ensembl release (e66).

Major updates for human include:

  • Genome: The human genome assembly was updated to GRCh37.p6, containing 124 assembly patches. The DNA sequence for the primary assembly (chromosomes 1-22, X, Y, unlocalized scaffolds and unplaced scaffolds) remains unchanged.
  • Variation: The short sequence variants are now updated to dbSNP 135.
  • Regulation: A new regulatory consequence table shows regulatory features and motifs at the position of the SNP or other sequence variant.  

All Species:

The Region Report tool allows you to export genomic information like sequence, genes, structural variants, SNPs, conserved regions, and regulatory features.  Try the web interface, or use the Perl script with the core API, and let us know your feedback!

A preserved specimen of Latimeria chalumnae in the Natural History Museum, Vienna, Austria (Source: Wikipedia)

New species:

We have a new species in this release: Coelacanth (Latimeria chalumnae).
Illumina technology was used to produce this high-quality draft from the Broad Institute. The whole genome shotgun data was assembled with Allpaths.

New genome:

The assembly for Ciona intestinalis is updated. This genome is the smallest of any experimentally manipulable chordate. Ensembl presents the sequence data provided by Kyoto University, with additional Ensembl genebuild.

A complete list of the changes in release 66 can be found here.

Enjoy exploring Ensembl!

The Ensembl Genomes Project is pleased to announce release 12 of Ensembl Genomes.

This release contains 5 new genomes, bringing the total genomes supported to 335.   Main highlights are:

* Software migration to Ensembl 65

* Operon data from RegulonDB for E. coli K12 in Ensembl Bacteria

* A new species (Ashbya gossypii) and manually curated annotation from a new resource for yeast!  This is the first release of Ensembl Fungi containing data from PomBase, a new genome-centric resource (developed by Ensembl Genomes, the University of Cambridge, and University College London) for the fission yeast Schizosaccharomyces pombe.

* The addition of Cyanidioschyzon merolae to Ensembl Plants and significant updates to Physcomitrella patens and the soybean Glycine max
* Three new genomes in Ensembl Metazoa: Atta cephalotes, a leaf-cutting ant, Tribolium castaneum, the red flour beetle, and Bombyx mori, a domesticated silkmoth

Note: EG 12 will be available on FTP next week.

Happy holidays!

The Ensembl Genomes Team

We are happy to announce the latest Ensembl release 65 (e!65).

We have a new species: the Atlantic cod (Gadus morhua). The assembly was provided by the Cod genome consortium, and the Ensembl gene set was determined using a combination of annotation approaches. The standard genebuild procedure was combined with whole-genome alignment and projection from stickleback. The final gene set, now displayed on the main Ensembl website, comprises 20,095 protein-coding genes, 518 pseudogenes, and 1,541 non-coding RNA genes. Comparative analyses were run using the new species, incorporating the cod into pairwise alignments.

Version 2.1.4 of the Chimpanzee (Pan troglodytes) genome assembly now replaces 2.1 on the Ensembl web site.

This new gene build was a ‘projection build’: we aligned Human GRCh37 to the new chimp assembly and then projected the Ensembl gene models we had in human onto chimp. This was augmented with traditional gene build pipelines, which aligned chimp proteins from RefSeq and UniProt to provide evidence for the gene models. More details are provided in the Genebuild summary document. The final data set includes 18,746 protein-coding genes.
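To picture what projecting a gene model through an alignment means, here is a toy sketch. It is not the Ensembl genebuild pipeline: the `AlignedBlock` type, the `project_exon` helper and all coordinates are made up for illustration, and it assumes ungapped, same-strand alignment blocks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AlignedBlock:
    """One ungapped alignment block (hypothetical type, not an Ensembl API object)."""
    src_start: int  # 1-based, inclusive start on the source genome (e.g. human)
    tgt_start: int  # 1-based, inclusive start on the target genome (e.g. chimp)
    length: int     # number of aligned bases


def project_exon(exon: Tuple[int, int],
                 blocks: List[AlignedBlock]) -> Optional[Tuple[int, int]]:
    """Shift an exon (start, end) onto the target if one block fully contains it."""
    start, end = exon
    for block in blocks:
        block_end = block.src_start + block.length - 1
        if block.src_start <= start and end <= block_end:
            offset = block.tgt_start - block.src_start
            return (start + offset, end + offset)
    return None  # exon is not covered by a single block, so it cannot be projected


# Made-up alignment blocks and a made-up three-exon gene model.
blocks = [AlignedBlock(100, 1100, 200), AlignedBlock(400, 1350, 100)]
gene_model = [(120, 180), (410, 450), (600, 650)]

projected = [project_exon(exon, blocks) for exon in gene_model]
print(projected)  # [(1120, 1180), (1360, 1400), None]
```

The third exon falls outside every block and comes back as `None`. In a real build, gaps like that are one reason to bring in additional evidence such as the protein alignments mentioned above.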

The Bushbaby (Otolemur garnettii) gene annotation in e!65 is based on the newest high coverage assembly OtoGar3 provided by the Broad Institute.

The Ensembl Variation set for Human has been updated to dbSNP 134 and now includes over 40 million variants. In addition, we’ve updated the somatic variants from COSMIC (release 55), phenotype data and the structural variation data as usual. We have added a protein display to the LRG pages (more about the LRG project).

Also, check out our new landing page; icons (see image, above right) help you to navigate our variation views!

The Ensembl Regulation team has delivered several data updates in release 65, including:

  • A new regulatory build for mouse (8 new data sets).
  • Updated microarray mapping for human, chimp and zebrafish.

In addition to this we have also made some updates to the browser:

The Experiment view aims to provide links to the source data used in the Ensembl regulatory build. Each supporting evidence peak in the regulation displays now has a ‘Source’ label in the on-click pop-up menu. Clicking this links to the Experiment view, which lists the details of the source used to generate the peak.

Enjoy exploring our new data updates!

The Ensembl Genomes Project is pleased to announce release 11 of Ensembl Genomes.

This release contains 7 new genomes.  Main highlights are:

* Software migration to Ensembl 64

* Cross-references to PHI-base, a database of pathogen-host interactions, added to genomes in Ensembl Protists

* Mycosphaerella graminicola and Phaeosphaeria nodorum (major wheat pathogens), along with Tuber melanosporum (symbiotic Perigord Truffle) included in Ensembl Fungi

* Three new genomes in Ensembl Plants: Glycine max (soybean), the spikemoss Selaginella moellendorffii and the green alga Chlamydomonas reinhardtii, along with updates to the Oryza glaberrima (African rice) genome and to the gene models of Vitis vinifera (grape)

* Amphimedon queenslandica (demosponge) included in Ensembl Metazoa, along with updated gene models for Drosophila melanogaster and Drosophila pseudoobscura.

We are delighted to announce the latest Ensembl release 64 (e!64).

This release includes assemblies for two new species, lamprey (Petromyzon marinus) and Tasmanian devil (Sarcophilus harrisii), as well as a patch of the human assembly (GRCh37.p5) and an update of the cow assembly (UMD 3.1). We have incorporated the most recent human and mouse manual gene annotations from HAVANA, new regulation data for human and mouse, as well as many other interesting data updates and features. The previous Ensembl release is archived at e63.ensembl.org.

Petromyzon_marinus_7.0 is an assembly of the sea lamprey (Petromyzon marinus) provided by the lamprey consortium which was sequenced to a total of 5.0X whole genome coverage. The gene set for lamprey was built using the Ensembl genebuild pipeline. New translated BLAT whole genome pairwise alignments against the zebrafish, the stickleback, Ciona intestinalis and the human genome are now available for lamprey. Protein trees now include genes from the lamprey (10,079 genes) and with the inclusion of the lamprey, 849 more trees have a root older than the last common ancestor of bony vertebrates.

We now have new phenotype views where one can view genes associated with diseases and phenotypes. The new phenotype page can be accessed via the gene tab. Associated genes and variations to a phenotype can also be displayed on a karyotype. The associated colour key corresponds to the p-value of the association between the variation and the phenotype.

In order to make turning on data tracks easier, a number of changes have been made to the configuration panel in the region in detail page (accessed via the “Configure this page” button), including a new menu structure with grouping for similar track types. Configuration for regulatory evidence is now accessible via two links in the Regulation section of the menu for the configuration panel – “Open chromatin & TFBS” and “Histones & polymerases”.

The Tasmanian devil (Sarcophilus harrisii) 7.0 assembly, provided by Illumina and the Wellcome Trust Sanger Institute, has been added as a new species to Ensembl for release 64.  RNASeq data was used in the genebuild and can be found in the otherfeatures database. More detailed information on the genebuild can be found here.

Check out our improved FAQs. These have been reorganized into categories.

Confused about browser navigation? Why not try our new e-learning course!

More details on some of these changes will be posted soon, so keep an eye on our blog!

More information is also available on the Ensembl website.

We are glad to announce the launch of our latest installment.

Ensembl Release 63 (e63) includes a new high-coverage assembly for microbat (Myotis lucifugus), the most recent human and zebrafish manual gene annotations from Havana, and a fresh update of mouse variation data, among numerous other additions. The previous Ensembl release is archived at e62.ensembl.org.

Tracks on Region in detail and Region overview pages can now be reordered by dragging them to a new position on the image. The strand of the track can still be identified by a colour and a text message when passing the mouse over the track bar.

The popular Variant Effect Predictor (VEP) tool has been updated in e63, including speed improvements and renewed support for variants that fall in regulatory regions.

Pie charts have been added to the human variation pages for the 1000 Genomes population allele frequencies.


A new configuration table facilitates the exploration of regulatory data, including the capacity to search for specific markers of interest. To access this functionality, click on ‘Configure this page’ while on a Location View or a Regulatory Region View and select ‘Regulatory Evidence’.

The new microbat genome assembly moves this species from low to high coverage. A new genebuild has been performed on this assembly using the Ensembl gene annotation pipeline.

Users of our Perl API will certainly enjoy the new Doxygen-based API documentation, with an improved user interface, better support for object-oriented programming and a comprehensive search tool. There is also an updated Regulation API tutorial to help users access regulatory data programmatically.

More details on some of these changes will be posted soon, so keep an eye on our blog!

More information is also available on the Ensembl website.

There is a new Pre! site for version 2.1.3 of the chimpanzee genome assembly (known as Pan_troglodytes-2.1.3). This assembly will remain as a Pre! site while we complete the genebuild for the most recent assembly: Pan_troglodytes-2.1.4. These two assemblies are identical except for the Y chromosome, which has been updated for Pan_troglodytes-2.1.4.

The Pre! site contains over 56,000 Genscan predictions and 1,517 gene models based on chimpanzee proteins. In addition, we are displaying Exonerate alignments for chimp cDNAs and ESTs as well as cDNAs and Ensembl peptides from e!62 human.

The Pre! Ensembl site has been updated to run version 62 of the Ensembl APIs and web code. This means that many of the new web features that have been added to the main Ensembl site over the past few releases are now available on the Pre! Ensembl site, including the Variant Effect Predictor (VEP), favourite tracks and the ability to attach BAM, BigWig and VCF data files.

The full gene build of the previous Pan_troglodytes-2.1 assembly can be found here.

Variation consequence types, such as “intronic” or “non-synonymous”, describe the location of a variation or its effect on a transcript. For the latest version of Ensembl (release 62) we have made some significant changes to the way in which we determine these consequence types, and we’d like to provide an overview of these improvements.

Firstly, we are now able to assign a specific effect to every allele of a variant. For example, rs12795274 has three alleles: the reference allele is T, and there are two alternative alleles, C and A. The A is predicted to cause an amino acid change, while the C is synonymous. We now list the effect of each individual allele on the website, and you can also fetch them separately when using the variation API.
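As a rough illustration of per-allele effects, the sketch below classifies each alternative allele against a reference codon. It does not use the variation API, and the codon `GAT` is a hypothetical stand-in chosen so that the pattern matches the description above. It is not the real codon context of rs12795274. The `allele_consequences` helper is likewise invented for this example.

```python
# Fragment of the standard genetic code, covering only the codons used below.
CODON_TABLE = {"GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E"}


def allele_consequences(ref_codon, codon_pos, alt_alleles):
    """Classify each alternative allele at one codon position (0-based).

    Hypothetical helper: returns a dict mapping allele -> consequence label.
    """
    ref_aa = CODON_TABLE[ref_codon]
    consequences = {}
    for allele in alt_alleles:
        # Substitute the allele into the reference codon and translate it.
        alt_codon = ref_codon[:codon_pos] + allele + ref_codon[codon_pos + 1:]
        alt_aa = CODON_TABLE[alt_codon]
        consequences[allele] = "synonymous" if alt_aa == ref_aa else "non-synonymous"
    return consequences


# Reference allele T at the third codon position, alternative alleles C and A.
consequences = allele_consequences("GAT", 2, ["C", "A"])
print(consequences)  # {'C': 'synonymous', 'A': 'non-synonymous'}
```

The point is simply that one variant can carry alleles with different effects, so a single label per variant loses information.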

Another improvement we’ve made is that “under the hood” we now use terms defined in the Sequence Ontology (SO) to describe the consequence types. Moving to this set of externally maintained terms should make it easier to compare Ensembl annotations with those from other groups. The SO also groups the various terms we use into a hierarchical tree and, in the future, this will let users query for variants with particular effects in a much smarter way than is possible now. On the website we are still using our old terms by default, but you can see the mapping between the old terms and the SO terms on the variation documentation page, and you can use “Configure this page” on several variation views to choose which set of terms you want to see (here’s an example).
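To give a feel for why a hierarchy of terms allows smarter queries, here is a minimal sketch. The term names and parent links are a simplified, hand-written stand-in and may not match the SO release or the exact terms Ensembl uses. The variant IDs and the `is_a` helper are hypothetical too.

```python
from typing import Optional

# Simplified child -> parent links (an assumption, not a copy of the real ontology).
PARENT = {
    "synonymous_variant": "coding_sequence_variant",
    "missense_variant": "coding_sequence_variant",
    "stop_gained": "coding_sequence_variant",
    "coding_sequence_variant": "transcript_variant",
    "intron_variant": "transcript_variant",
}


def is_a(term: Optional[str], ancestor: str) -> bool:
    """Return True if `term` equals `ancestor` or descends from it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # walk one level up; None once we pass the root
    return False


# Hypothetical variants, each annotated with one consequence term.
variants = {
    "var1": "missense_variant",
    "var2": "intron_variant",
    "var3": "stop_gained",
}

# One query on a parent term picks up every more specific child term.
coding = sorted(v for v, term in variants.items()
                if is_a(term, "coding_sequence_variant"))
print(coding)  # ['var1', 'var3']
```

With a flat list of terms, the same query would need every coding consequence spelled out by hand.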

We also now provide SIFT and PolyPhen predictions for any variant that is predicted to cause an amino acid substitution in human. These are popular tools developed by external groups that try to predict the effect of a non-synonymous mutation on the function of the protein. You can see these predictions on several variation views; a useful example is the protein variation view. You can find more information about these tools and how we run them in Ensembl on the variation documentation page.

All of these improvements are also available for you to use to analyse your own data using the Variant Effect Predictor (VEP). The VEP has new configuration options that allow you to choose which set of terms you want to use for the consequence annotations, and also offers options to fetch SIFT and PolyPhen predictions for any missense mutations in your data. We are able to provide these predictions for novel mutations by computing the predictions from SIFT and PolyPhen for all possible amino acid substitutions in human proteins and storing these in the variation database. We hope that this makes the VEP even more useful for mining your data, and we have plans to add support for these sorts of tools in other species in the near future.
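The precompute-then-look-up idea can be sketched as follows. This is only an illustration of the design, not how the variation database is actually laid out: `placeholder_score` is a dummy stand-in for a real predictor such as SIFT or PolyPhen, and the three-residue "protein" and the table layout are invented.

```python
from typing import Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def placeholder_score(position: int, ref_aa: str, alt_aa: str) -> float:
    """Dummy deterministic value in [0, 1]; a real predictor would go here."""
    return ((position * 31 + ord(ref_aa) * 7 + ord(alt_aa) * 13) % 101) / 100


def precompute(protein: str) -> Dict[Tuple[int, str], float]:
    """Score every possible single amino acid substitution in a protein."""
    table = {}
    for position, ref_aa in enumerate(protein, start=1):
        for alt_aa in AMINO_ACIDS:
            if alt_aa != ref_aa:  # skip the unchanged residue
                table[(position, alt_aa)] = placeholder_score(position, ref_aa, alt_aa)
    return table


# Build the table once for a made-up protein.
table = precompute("MKT")
print(len(table))  # 57, i.e. 3 positions x 19 substitutions

# A novel missense change (position 2, K to R) is now a plain dictionary lookup.
score = table[(2, "R")]
```

Because every substitution is scored ahead of time, a variant nobody has seen before costs no more to annotate than a known one.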

We have just updated the Ensembl genome browser and underlying databases to version 62.  We would like to share some new features with our user community.

The new Ensembl release hosts a new species: the white-cheeked gibbon (Nomascus leucogenys). A new genebuild has been performed using the Ensembl gene annotation pipeline, incorporating both gibbon and human sequences to determine the gibbon gene set. Compara has also incorporated the gibbon genes and genome into comparative genomics analyses, as is usual for new genebuilds. Gibbon can now be found in gene trees, a pairwise whole genome alignment with human, and a 35-way multi-species alignment. View these alignments in views like this one.

BigWig files are now supported by attaching a URL to Ensembl. Click on ‘Manage your data’ at the left of a location page, and select ‘Attach Remote File’ from the new menu.

Not sure what Ensembl has to offer, or how to use our resources? Now whenever you search for a term, hits to Help and Documentation will come up. These may be to page-specific help, FAQs or the glossary, depending on the term. As always, we hope our users make requests: if you can’t find what you’re looking for, let our helpdesk know.

Finally, SIFT and PolyPhen predictions are available in human variation pages, and in the popular Variant Effect Predictor (for human).  A more detailed post on these variation analyses will be coming soon, so keep your eye on the blog.

More features like the new comparative genomics navigation menu have been released, so explore and let us know what you think.  More news is available on our website.

I’m going to be blogging a bit more about the recent Ensembl 61 release and the Ensembl Genomes 8 release. There are lots and lots of goodies in both: web site tweaks (some of them totally critical for generating good displays), the new “favourite tracks” feature, and impressive content changes.

I’ll start today on content changes, and in Ensembl Genomes 8 there are some important genome additions. Some come from Paul Kersey’s new collaboration with PhytoPathDB (more on that in a later post), but top of my excitement has been the diversity in metazoa. The Ensembl Metazoa team has added Sea Urchin, Sea Anemone, the rather weird primitive animal Trichoplax adhaerens (also called the “carpet” organism) and the blood fluke Schistosoma mansoni. The motivation for bringing these organisms in is to broaden our phylogenetic tree and the comparisons we can provide across all of life. So, for example, for the Drosophila Twist gene one can now see the deep tree across metazoa. There is a deep ortholog in Trichoplax which seems to predate the split of some of these Helix Loop Helix proteins, whereas other members of the family have a paralog in Trichoplax, meaning that there seems to be a fundamental split in this developmentally key transcription factor. This is just one of many interesting gene trees that one can look at using this resource…

Happy browsing/data mining!