We are currently working on our next release which is due at the end of June 2009 and will contain the following:

Data

Human GRCh37
We will be releasing a new genebuild for human based on the latest assembly, GRCh37, from the Genome Reference Consortium. A preliminary version of this assembly is already available in Ensembl Pre! As a result of the new assembly, we will have:

  • Updated repeat masking
  • New probeset mappings
  • cDNA update
  • A new ensembl-vega merge delivering a new gene set

Wallaby
Ensembl 55 will include the 2X genome for the tammar wallaby (Macropus eugenii). This will be a projection build, similar to those for our other 2X species.

C. elegans
We will also include an import of the WormBase release WS200 database for C. elegans.

Anole lizard – A gene patch incorporating the gene set provided by Chris Ponting at Oxford University gives us a new gene set for the green anole lizard (Anolis carolinensis).

Mouse – The mouse cDNA alignments will be updated.

Zebrafinch – There will be an updated gene set for the 6X zebra finch genome.

Zebrafish – Non-coding RNAs will be added to the Zv8 zebrafish assembly and there will also be some changes to protein coding gene models and new repeats and expression patterns.

Core

Schema Changes

  • Patch to update versions (patch_54_55_a.sql).
  • Add the missing types to go_xref (patch_54_55_b.sql).
  • Add new table dependent_xref to hold the dependencies between xrefs; for example, if an EMBL entry comes from a UniProt entry, this relationship will be stored in the table (patch_54_55_d.sql).
  • Add new tables for alternative splicing/transcript events (patch_54_55_c.sql).
  • Add new column ‘is_constitutive’ to the exon table (patch_54_55_e.sql).
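To make the new is_constitutive column concrete: an exon is usually called constitutive when it is used by every transcript of its gene. The following is a minimal sketch assuming that definition, with each transcript given as a list of exon IDs; the function names and the gene below are hypothetical, and this is not the code Ensembl uses to populate the column.

```python
from typing import Dict, List, Set


def constitutive_exons(transcripts: Dict[str, List[str]]) -> Set[str]:
    """Return the exon IDs shared by every transcript of one gene.

    Assumes `transcripts` maps a transcript ID to the exon IDs it uses and
    that "constitutive" means "present in all transcripts".
    """
    exon_sets = [set(exons) for exons in transcripts.values()]
    if not exon_sets:
        return set()
    return set.intersection(*exon_sets)


def is_constitutive_flags(transcripts: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every exon of the gene to 1 (constitutive) or 0, like a boolean column."""
    shared = constitutive_exons(transcripts)
    all_exons = {e for exons in transcripts.values() for e in exons}
    return {exon: int(exon in shared) for exon in sorted(all_exons)}


# Hypothetical gene with three transcripts (all IDs are made up).
gene = {
    "T1": ["E1", "E2", "E3"],
    "T2": ["E1", "E3"],
    "T3": ["E1", "E3", "E4"],
}
print(is_constitutive_flags(gene))  # {'E1': 1, 'E2': 0, 'E3': 1, 'E4': 0}
```

A set intersection keeps the rule explicit; an exon skipped by even one transcript (E2 and E4 here) is flagged 0.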

Xrefs
Xrefs will be run for Human, Macaque, Opossum, Chimp, Chicken, Dog and Mouse (including Fantom Xrefs).

Ontology database schema and tools
The ensembl_go_NN databases are no longer being built. They are being replaced by the ensembl_ontology_NN database, which can be accessed using the core API.

Assembly mapping
Some of the databases will contain mapping coordinates between current and previous assemblies:

  • human: mapping from current GRCh37 to NCBI36, NCBI35 and NCBI34
  • mouse: mapping from current NCBIM37 to NCBIM36, NCBIM35 and NCBIM34
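Conceptually, an assembly mapping of this kind can be thought of as a list of aligned blocks, each pairing a stretch of the old assembly with a stretch of the new one. The sketch below illustrates that idea under simplifying assumptions (ungapped, same-strand, non-overlapping blocks on one chromosome); the Block type, the function and the coordinates are all hypothetical, and this is not how the Ensembl API stores or applies its mappings.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """An ungapped, same-strand block aligning old-assembly to new-assembly coordinates."""
    old_start: int  # 1-based, inclusive
    old_end: int    # 1-based, inclusive
    new_start: int


def map_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 1-based position on the old assembly to the new one.

    Returns None when the position falls outside every block, i.e. sequence
    with no counterpart in the new assembly. Assumes `blocks` is sorted by
    old_start and non-overlapping.
    """
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos > block.old_end:
        return None
    return block.new_start + (pos - block.old_start)


# Made-up blocks for one chromosome; not real GRCh37/NCBI36 mapping data.
blocks = [Block(1, 1000, 1), Block(1001, 5000, 1101), Block(6001, 9000, 6201)]
for p in (500, 1500, 5500, 7000):
    print(p, "->", map_position(blocks, p))  # 500, 1600, None, 7200
```

Binary search over block starts keeps each lookup logarithmic in the number of blocks; positions in the gap between 5000 and 6001 map to nothing.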

Other changes
  • API support for alternative transcripts/splicing events will be added
  • API support for constitutive exons will be added
  • Deprecated API modules will be removed
  • All slices will be created using the new_fast method from the SliceAdaptor to improve performance
  • Support for seq_region sequence edits will be added. Seq_edits can already be stored and retrieved, but they are not applied when fetching sequence data. This will change so that “_rna_edit” attributes in the seq_region_attrib table are applied to the returned sequence.
  • MySQL and FASTA dumps will be copied to the Amazon Public Datasets project
  • Gene name and xref projections
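Applying the “_rna_edit” attributes described above amounts to splicing replacement bases into the fetched sequence. A minimal sketch, assuming each attribute value is a space-separated string of the form "start end alt_seq" with 1-based inclusive coordinates and that edits never overlap; the value format is an assumption, and the function names are hypothetical rather than part of the Ensembl API.

```python
from typing import Iterable, List, Tuple

Edit = Tuple[int, int, str]


def parse_edit(value: str) -> Edit:
    """Parse an edit value of the ASSUMED form 'start end alt_seq' (1-based, inclusive).

    A missing alt_seq is treated as a deletion; end == start - 1 acts as an
    insertion before `start`.
    """
    parts = value.split()
    start, end = int(parts[0]), int(parts[1])
    alt_seq = parts[2] if len(parts) > 2 else ""
    return start, end, alt_seq


def apply_edits(seq: str, values: Iterable[str]) -> str:
    """Apply non-overlapping edits to `seq` and return the edited sequence."""
    edits: List[Edit] = [parse_edit(v) for v in values]
    # Work right-to-left so a length change never shifts coordinates still to be applied.
    for start, end, alt_seq in sorted(edits, key=lambda e: e[0], reverse=True):
        seq = seq[: start - 1] + alt_seq + seq[end:]
    return seq


# One substitution plus one length-changing edit on a made-up sequence.
print(apply_edits("ACGTACGTAC", ["3 3 T", "8 9 GGG"]))  # ACTTACGGGGC
```

Sorting by start position in descending order is the design choice that matters: applying the rightmost edit first means the coordinates of the remaining edits still refer to the unedited sequence.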

Mart
  • New functional genomics mart
  • A new Probe section added to Ensembl mart
  • New ontology mart
  • Constitutive exon information will be re-added to Ensembl mart

Variation
  • There will be a new human variation database generated by mapping NCBI36 coordinates to GRCh37 (using dbSNP 129)
  • Illumina array data for SNP/CNV is to be added
  • Transcript variations for Zebrafish and Zebrafinch will be recalculated to include information from the new gene sets
  • Schema change – added a call to get consequence_type

Functional genomics
  • Human Regulatory Build will be updated using the GRCh37 assembly
  • Probe alignment and transcript annotation for all species will migrate from the core databases to the functional genomics databases; this includes Affymetrix, Illumina, Codelink and Phalanx
  • Schema change: an is_current field is to be added to the coord_system table

Comparative genomics

Alignments – The new human assembly means that the following alignments will be regenerated:

  • 9 eutherian mammals EPO multiple alignments
  • 31 eutherian mammals EPO multiple alignments
  • 12 amniota vertebrates Pecan multiple alignments
  • 4 catarrhini primate EPO multiple alignments
  • Pairwise BLASTZ-NET alignments of human against each of the other 9 and 31 eutherian mammals
  • Additional pairwise BLASTZ-NET alignments will be run for human-opossum, human-platypus, human-chicken and human-wallaby
  • Translated BLAT-NET will be regenerated for human against fugu, X.tropicalis, C.intestinalis, C.savignyi, stickleback, medaka, chicken, zebrafish, tetraodon, zebrafinch and anole lizard

Synteny will be recalculated for: rat vs. human, chicken vs. human and human vs. macaque, dog, chimpanzee, platypus, opossum, mouse, orangutan, horse and cow

Homologies and families

  • 50 way GeneTrees and homologies with new/updated genebuilds and assemblies
  • Clustering using hcluster_sg
  • Multiple Sequence Alignments using consistency-based MCoffee meta-aligner (mafftgins + muscle + kalign + probcons) and new exon-skipping aware “skipper” algorithm.
  • New ‘putative gene split’ and ‘distant paralog’ homology types
  • Pairwise gene-based dN/dS calculations for high coverage species pairs
  • Updated MCL families including all Ensembl transcript isoforms and newest UniProt Metazoa
  • Multiple sequence alignments with MAFFT
  • Stable IDs for GeneTrees (ENSGT00550NNNNNNNNN) and MCL Families (ENSFM00550NNNNNNNNN).
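The stable ID templates above have a fixed shape, which makes them easy to validate. A minimal sketch, assuming the layout read off the templates: a prefix (ENSGT or ENSFM), a 5-digit block (00550 here, presumably tied to release 55; that reading is an assumption), then a 9-digit serial. The function and type names are hypothetical, and the example ID is made up.

```python
import re
from typing import NamedTuple, Optional

# ASSUMED layout, read off the ENSGT00550NNNNNNNNN / ENSFM00550NNNNNNNNN templates.
STABLE_ID = re.compile(r"(ENSGT|ENSFM)(\d{5})(\d{9})")


class StableId(NamedTuple):
    kind: str           # "GeneTree" or "Family"
    release_block: str  # e.g. "00550"
    serial: int


def parse_stable_id(stable_id: str) -> Optional[StableId]:
    """Split a GeneTree/Family stable ID into its parts, or return None if it does not fit."""
    m = STABLE_ID.fullmatch(stable_id)
    if m is None:
        return None
    prefix, release_block, serial = m.groups()
    kind = "GeneTree" if prefix == "ENSGT" else "Family"
    return StableId(kind, release_block, int(serial))


print(parse_stable_id("ENSGT00550000000042"))  # made-up GeneTree ID
print(parse_stable_id("ENSG00000139618"))      # a gene ID, not a tree/family ID: None
```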



The new Ensembl release includes a new view for SNPs and other genomic variations. It shows the alignment of the polymorphic position together with 10 base pairs of sequence up- and downstream. The user can choose among all available multiple alignments. Polymorphic positions in the other species are also shown.

This is very useful for looking at ancestral alleles, especially in combination with our EPO alignments, as they include the inferred ancestral sequence. Although dbSNP provides predicted ancestral alleles for human SNPs, these are based on the chimp sequence only. In several cases, the ancestral sequence inferred from the multiple alignment disagrees with the chimp sequence, as in this example. Using multiple alignments gives more reliable ancestral allele calls.
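The view described above boils down to two operations on a multiple alignment: cutting a window of 10 columns either side of the polymorphic column, and comparing the base in the inferred ancestral row with the chimp base. A minimal sketch, assuming the alignment is a dict of equal-length gapped strings; the row names ("ancestral", "chimp") and the toy sequences are hypothetical, not Ensembl's data format.

```python
from typing import Dict, Tuple


def flank_window(alignment: Dict[str, str], column: int, flank: int = 10) -> Dict[str, str]:
    """Cut every aligned row to `flank` columns either side of `column` (0-based)."""
    lo = max(0, column - flank)
    return {name: row[lo : column + flank + 1] for name, row in alignment.items()}


def compare_ancestral(alignment: Dict[str, str], column: int) -> Tuple[str, str, bool]:
    """Return (ancestral base, chimp base, whether they agree) at `column`."""
    anc = alignment["ancestral"][column].upper()
    chimp = alignment["chimp"][column].upper()
    return anc, chimp, anc == chimp


# Toy alignment (made up); the polymorphic column is 12.
alignment = {
    "human":     "ACGTTAGCATGCAGTACGATCGATC",
    "chimp":     "ACGTTAGCATGCGGTACGATCGATC",
    "macaque":   "ACGTTAGCATGCAGTACGATCGATC",
    "ancestral": "ACGTTAGCATGCAGTACGATCGATC",
}
for name, row in flank_window(alignment, 12).items():
    print(f"{name:<10} {row}")
print(compare_ancestral(alignment, 12))  # ('A', 'G', False)
```

In this toy case a chimp-only call would give G, while the ancestral row inferred from all species gives A; that is the kind of disagreement the multiple alignment can reveal.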

The Ensembl project is pleased to announce release 54 of Ensembl. Highlights of this release are:

  • New Zv8 zebrafish assembly;
  • Comparative alignment text displays for variations and regions;
  • Ability to add personal notes to any Gene or Transcript.
For more information visit:

 

Alongside this release we are also releasing a new version of the Pre site. This now includes:

  • The GRCh37 human assembly released in February 2009, with preliminary analyses included;
  • The calJac3 marmoset assembly.

 

Today the long-awaited Ensembl Genomes went live! This is a ‘sister project’ focusing on those species that aren’t part of Ensembl, i.e. non-vertebrates. Please have a look at what the Ensembl Genomes team have to say about it themselves:

“We are delighted to announce the forthcoming release of Ensembl Bacteria, Ensembl Protists and Ensembl Metazoa, the first sites to be launched as part of the EBI’s “Ensembl Genomes” project to extend the use of the Ensembl browser to non-vertebrate genomes.

The following sites are available:

http://bacteria.ensembl.org
http://protists.ensembl.org
http://metazoa.ensembl.org

Additional sites for fungi and plants are in development and will be launched during the summer of this year.

In the Ensembl Genomes project, we are aiming to do two things: firstly to work with particular communities to support the bioinformatic analysis of genome-scale data; and secondly, to provide an integrative portal to data from species of scientific interest from across the taxonomic space. In pursuit of both these aims, we will re-use and extend the proven Ensembl software system, that has been developed by EBI and the Wellcome Trust Sanger Institute in the context of vertebrate genomics.

As with Ensembl, Ensembl Genomes will provide access to DNA and protein sequence, positional and functional annotation of protein-coding and non-protein coding genes, repeat analysis and other features and statistics. An interesting feature made available with the release of Ensembl Genomes is the inclusion of a multi-way comparative genomic analysis performed using a selection of species from bacteria to humans, and the production of gene trees showing the inferred ancestral relationships within deeply conserved protein families. Comparative resources are also provided at a narrower level (for example, DNA and protein-based analyses of individual bacterial clades). In partnership with collaborators, we are working on capturing gene expression, and population-scale variation data, in a number of contexts. More generally, we anticipate the ongoing enrichment of these resources through the integration of increasing quantities of high throughput data now becoming routinely available for all species.

Ensembl Genomes will provide access to data through the usual routes supported for vertebrate data; web-based browser, FTP site, programmatic API, DAS, and BioMart-style data warehouse; as well as text and sequence-based search.

We look forward to working with you as future producers and consumers of data. More information about the project is available at http://www.ensemblgenomes.org. We will be happy to receive any feedback you might wish to offer us at helpdesk@ensemblgenomes.org.”


Ensembl just updated the live site and underlying databases to version 53.

Some new features include ‘Active Tracks’ and a searchable ‘Configure this page’!

Go to any region of the chromosome.

Click ‘Configure this page’ at the left.

‘Active tracks’ allows you to see (and deselect) all tracks that are turned on.

‘Search display’ allows you to search for tracks in the menus. In this example, we searched for UniProt. Tracks from different menus appear.

For more updates, including new species, variations, and Amazon Web Services, see the news.

We are already working on our next release (out late in April 2009) which will come with the following:

Data

Zebrafish
We will be releasing a new genebuild for zebrafish (with updated repeat masking) based on the latest assembly Zv8. Thus, we’ll have a new gene set (with new probeset mappings).

Horse
A gene patch (fixing split genes) based on human/mouse 1:1 orthologues will give us a new gene set.

Human

  • cDNA update
  • New ensembl-vega merge delivering a “new gene set”.

Mouse

  • cDNA update
  • New ensembl-vega comparison, delivering a “new gene set”.

New gene sets (ncRNA genes) for several low coverage genomes:
Sloth (Choloepus hoffmanni), armadillo (Dasypus novemcinctus), kangaroo rat (Dipodomys ordii), elephant (Loxodonta africana), hyrax (Procavia capensis), megabat (Pteropus vampyrus), tarsier (Tarsius syrichta), dolphin (Tursiops truncatus) and alpaca (Vicugna pacos).

Mart

  • New functional genomics mart

Core
Minor schema changes

  • cDNA update
  • Update versions (patch_53_54_a.sql)
  • Increase size of oligo_probe.name (patch_53_54_b.sql)
  • Increase size of external_db.db_name (patch_53_54_c.sql)
  • Move analysis_id from identity_xref to object_xref (patch_53_54_d.sql)
  • Increase size of analysis.logic_name (patch_53_54_e.sql)


Variation and Functional Genomics

  • Schema change to source table to add description column for web display
  • Updated zebrafish database
  • Import Illumina data whenever available
  • Recalculate consequence type for mouse regulatory feature
  • eFG array mapping: Human, Mouse, Rat, Drosophila
  • Affymetrix (UTR/IVT + ST), Illumina (WG)

New mouse DNase data to support the first mouse Regulatory Build

Code Other

  • Amazon EC2 public datasets updated
  • New GO database (ensembl_ontology_54) and API
  • Changing default behaviour of TranscriptAdaptor
  • Translation attribs modified
  • Remove entries with spaces from species.classification
  • Gene name and xref projections


Pairwise alignments

Update the pairwise alignments for zebrafish (Danio rerio):

  • human-zebrafish translated BLAT-NET
  • mouse-zebrafish translated BLAT-NET
  • rat-zebrafish translated BLAT-NET
  • chicken-zebrafish translated BLAT-NET
  • frog-zebrafish translated BLAT-NET
  • tetraodon-zebrafish translated BLAT-NET
  • fugu-zebrafish translated BLAT-NET
  • medaka-zebrafish translated BLAT-NET
  • stickleback-zebrafish translated BLAT-NET
  • Ciona savignyi-zebrafish translated BLAT-NET
  • Ciona intestinalis-zebrafish translated BLAT-NET

Add new alignments for medaka:

  • human-medaka BLASTZ-NET (imported from UCSC)
  • mouse-medaka BLASTZ-NET (imported from UCSC)


The following files will be available for download:

  • EMF dumps for GeneTrees
  • EMF dumps for EPO and PECAN multiple alignments
  • BED files for 31 way GERP constrained elements
  • BED files for 12 way GERP constrained elements

Homologies and families

  • 49-way GeneTrees and Homologies, with new/updated gene sets and assemblies.
  • Multiple Sequence Alignments with the consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons).
  • Pairwise gene-based dN/dS calculations for high coverage species pairs.
  • Updated MCL families including all Ensembl AS isoforms and latest UniProt Metazoa.
  • Multiple Sequence Alignments with MAFFT

We are already working on our next release (out late in February 2009) which will come with the following:

Data

  • New species added to our set: sloth (Choloepus hoffmanni), Anolis lizard (Anolis carolinensis) and zebrafinch (Taeniopygia guttata).
  • Updated marker information for human, cow, dog, horse, chicken, macaque, mouse and medaka.
  • Updated manual annotation for mouse from VEGA.

Comparative Genomics

  • Pairwise alignments with the new species (human/sloth, zebrafinch/chicken, lizard/chicken).
  • New 31-way eutherian mammal alignment using these 2x genomes (based on the 9-way Enredo-Pecan-Ortheus multiple alignments): elephant (Loxodonta africana), armadillo (Dasypus novemcinctus), tenrec (Echinops telfairi), rabbit (Oryctolagus cuniculus), guinea pig (Cavia porcellus), hedgehog (Erinaceus europaeus), shrew (Sorex araneus), microbat (Myotis lucifugus), tree shrew (Tupaia belangeri), squirrel (Spermophilus tridecemlineatus), bushbaby (Otolemur garnettii), pika (Ochotona princeps), mouse lemur (Microcebus murinus), cat (Felis catus), megabat (Pteropus vampyrus), dolphin (Tursiops truncatus), alpaca (Vicugna pacos), kangaroo rat (Dipodomys ordii), hyrax (Procavia capensis), tarsier (Tarsius syrichta), gorilla (Gorilla gorilla) and sloth (Choloepus hoffmanni).
  • The current clustering for our trees will be replaced by hierarchical clustering of sparse graphs (hcluster).

Variation and Functional Genomics

  • An improved array mapping environment integrates genomic and cDNA mappings, supporting multi-species databases.
  • We’ll link to genome-wide association studies from the NHGRI catalogue (Hindorff et al.)
  • Genotype data for mouse (reference strain C57BL/6) will be included.
  • Update of variation for dog, chicken and platypus.

Other

Hot on the heels of release 51 comes release 52 of Ensembl – the first revision of the new webcode… So what’s new?

Data:

Web site:

  • Updated export: restored most of the functionality with the new Export wizard on Genes, Transcripts and Locations, allowing export of FASTA, EMBL, GenBank, GFF, TSV, Vista and PIP files.
  • Image export: restored an improved version of the image export functionality. All “horizontal” generated images have an [Export image] button, allowing the image to be exported in vector formats (PDF, SVG, EPS) and scaled bitmap format (PNG at x0.5, x1, x2, x5 and x10) for publication-quality output.

    The vector formats PDF, SVG and EPS can all be imported into vector image editors to be manipulated as well.

The web team can finally let out a quick sigh of relief now that the long-awaited new web code has emerged kicking and screaming out of the web team office…

The “cosmetic” changes to the site are obvious:

  • the colours,
  • fonts,
  • layout,
  • the unified configuration
  • the reduction in page sizes.

On top of this there have been a large number of underlying technical improvements to the way the pages are put together.

  • Streamlining the JavaScript and CSS so that transfers between the server and your browser are as fast as possible.
  • Using unobtrusive JavaScript throughout the new code, so pages work with or without JavaScript or AJAX; although they are not quite as functional without it, they still work!
  • Making the pages standards-compliant so that they render without issues in most browsers (the exception, of course, being IE, where the “standards” approach fails in many places).
  • Using a fast in-memory cache (a modified version of memcached that supports tags) to reduce the load on our user database and to store and serve temporary images, processed HTML, etc.
  • Splitting code into more modules to reduce the size of the very large modules we had (notably breaking the Component modules into smaller chunks).
  • Configuration meta information contained in core databases making the site easier and more automatic to set up.
  • Optimisation of drawing and configuration code.
  • Transparent use of AJAX in many cases. Use of Perl’s LWP::ParallelUserAgent where the user’s browser doesn’t support AJAX.
  • Further areas where the extensible plugin system is available – defining colours, configuring images.
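The point of adding tags to memcached is bulk invalidation: every cached item that depends on the same thing (a user, a gene, a configuration) can be dropped in one call. The sketch below shows that idea with an in-process dictionary; it is a generic illustration of tag-based invalidation, not the web team's modified memcached, and the class name, keys and tags are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy in-process cache whose entries can be invalidated in bulk by tag."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: Any, tags: Iterable[str] = ()) -> None:
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def delete_by_tag(self, tag: str) -> int:
        """Remove every entry stored under `tag` and return how many were removed.

        Other tags may still list the removed keys; that is harmless here,
        because lookups go through `_data`, not the tag index.
        """
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._data:
                del self._data[key]
                removed += 1
        return removed


# Hypothetical keys and tags: a rendered panel and an image for one gene, plus another gene.
cache = TaggedCache()
cache.set("panel:ENSG_A:html", "<div>...</div>", tags=["gene:ENSG_A", "user:42"])
cache.set("image:ENSG_A:png", b"\x89PNG...", tags=["gene:ENSG_A"])
cache.set("panel:ENSG_B:html", "<div>...</div>", tags=["gene:ENSG_B", "user:42"])
print(cache.delete_by_tag("gene:ENSG_A"))  # 2
```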