The new Ensembl release includes a new view for SNPs and other genomic variations. It shows the alignment of the polymorphic position together with 10 base pairs of sequence up- and downstream. The user can choose among all available multiple alignments. Polymorphic positions in the other species are also shown.

This is very useful for looking at ancestral alleles, especially in combination with our EPO alignments, since they include the inferred ancestral sequence. Although dbSNP provides predicted ancestral alleles for human SNPs, these are based on the chimp sequence only. In several cases the ancestral sequence inferred from the multiple alignment disagrees with the chimp sequence, as in this example. Using multiple alignments gives more reliable ancestral allele calls and more confidence in them.
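To make the idea concrete, here is a minimal Python sketch of the two things the view lets you do: cut a window of 10 alignment columns either side of the polymorphic position, and compare the inferred ancestral base with the chimp base at that position. It assumes the alignment is already loaded as a dict mapping row names to equal-length aligned strings. The row names `ancestral` and `chimp`, the toy sequences and the function names are all hypothetical and are not part of the Ensembl API.

```python
from typing import Dict

FLANK = 10  # alignment columns shown either side of the polymorphic position


def snp_window(alignment: Dict[str, str], column: int, flank: int = FLANK) -> Dict[str, str]:
    """Slice every aligned row around the SNP column (gap columns count too)."""
    start = max(0, column - flank)  # clamp at the start of the alignment
    end = column + flank + 1        # slicing past the end is safe in Python
    return {name: row[start:end] for name, row in alignment.items()}


def ancestral_call(alignment: Dict[str, str], column: int,
                   ancestor: str = "ancestral", outgroup: str = "chimp") -> Dict[str, object]:
    """Compare the inferred ancestral base with the chimp base at one column."""
    anc = alignment[ancestor][column].upper()
    chimp = alignment[outgroup][column].upper()
    return {"ancestral": anc, "chimp": chimp, "agree": anc == chimp}


# Toy 25-column alignment (hypothetical data): human and chimp share a G at
# column 12, while macaque and the inferred ancestor carry an A there.
BASE = "ACGT" * 6 + "A"
aln = {
    "human": BASE[:12] + "G" + BASE[13:],
    "chimp": BASE[:12] + "G" + BASE[13:],
    "macaque": BASE,
    "ancestral": BASE,
}

if __name__ == "__main__":
    for name, row in snp_window(aln, 12).items():
        print(f"{name:>10} {row}")
    print(ancestral_call(aln, 12))
```

In a case like this toy one, a chimp-only call would label G as the ancestral allele, while the ancestor inferred from the multiple alignment says A.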

The Ensembl project is pleased to announce release 54 of Ensembl. Highlights of this release are:

  • New Zv8 zebrafish assembly;
  • Comparative alignment text displays for variations and regions;
  • Ability to add personal notes to any Gene or Transcript.
For more information visit:

 

Alongside this release we are also releasing a new version of the pre site. This now includes:

 

Today the long-awaited Ensembl Genomes went live! This is a ‘sister project’ focusing on those species that aren’t part of Ensembl, i.e. non-vertebrates. Please have a look at what the Ensembl Genomes team have to say about it themselves:

“We are delighted to announce the forthcoming release of Ensembl Bacteria, Ensembl Protists and Ensembl Metazoa, the first sites to be launched as part of the EBI’s “Ensembl Genomes” project to extend the use of the Ensembl browser to non-vertebrate genomes.

The following sites are available:

http://bacteria.ensembl.org
http://protists.ensembl.org
http://metazoa.ensembl.org

Additional sites for fungi and plants are in development and will be launched during the summer of this year.

In the Ensembl Genomes project, we are aiming to do two things: firstly to work with particular communities to support the bioinformatic analysis of genome-scale data; and secondly, to provide an integrative portal to data from species of scientific interest from across the taxonomic space. In pursuit of both these aims, we will re-use and extend the proven Ensembl software system, which has been developed by EBI and the Wellcome Trust Sanger Institute in the context of vertebrate genomics.

As with Ensembl, Ensembl Genomes will provide access to DNA and protein sequence, positional and functional annotation of protein-coding and non-protein coding genes, repeat analysis and other features and statistics. An interesting feature made available with the release of Ensembl Genomes is the inclusion of a multi-way comparative genomic analysis performed using a selection of species from bacteria to humans, and the production of gene trees showing the inferred ancestral relationships within deeply conserved protein families. Comparative resources are also provided at a narrower level (for example, DNA and protein-based analyses of individual bacterial clades). In partnership with collaborators, we are working on capturing gene expression, and population-scale variation data, in a number of contexts. More generally, we anticipate the ongoing enrichment of these resources through the integration of increasing quantities of high throughput data now becoming routinely available for all species.

Ensembl Genomes will provide access to data through the usual routes supported for vertebrate data (web-based browser, FTP site, programmatic API, DAS, and BioMart-style data warehouse), as well as text and sequence-based search.

We look forward to working with you as future producers and consumers of data. More information about the project is available at http://www.ensemblgenomes.org. We will be happy to receive any feedback you might wish to offer us at helpdesk@ensemblgenomes.org.”


Ensembl just updated the live site and underlying databases to version 53.

Some new features include ‘Active Tracks’ and a searchable ‘Configure this page’!

Go to any region of a chromosome.

Click ‘Configure this page’ on the left.

‘Active tracks’ allows you to see (and deselect) all tracks that are turned on.

‘Search display’ allows you to search for tracks in the menus. In this example, we searched for UniProt. Tracks from different menus appear.

For more updates, including new species, variations, and Amazon Web Services, see the news.

We are already working on our next release (out late in April 2009) which will come with the following:

Data

Zebrafish
We will be releasing a new genebuild for zebrafish (with updated repeat masking) based on the latest assembly Zv8. Thus, we’ll have a new gene set (with new probeset mappings).

Horse
A gene patch (fixing split genes) based on human/mouse 1:1 orthologues, which gives us a new gene set.

Human

  • cDNA update
  • New ensembl-vega merge delivering a “new gene set”.

Mouse

  • cDNA update
  • New ensembl-vega comparison, delivering a “new gene set”.

New gene sets (ncRNA genes) for several low coverage genomes:
Sloth (Choloepus hoffmanni), armadillo (Dasypus novemcinctus), kangaroo rat (Dipodomys ordii), elephant (Loxodonta africana), hyrax (Procavia capensis), megabat (Pteropus vampyrus), tarsier (Tarsius syrichta), dolphin (Tursiops truncatus) and alpaca (Vicugna pacos).

Mart

  • New functional genomics mart

Core
Minor schema changes

  • cDNA update
  • Update versions (patch_53_54_a.sql)
  • Increase size of oligo_probe.name (patch_53_54_b.sql)
  • Increase size of external_db.db_name (patch_53_54_c.sql)
  • Move analysis_id from identity_xref to object_xref (patch_53_54_d.sql)
  • Increase size of analysis.logic_name (patch_53_54_e.sql)


Variation and Functional Genomics

  • Schema change to source table to add description column for web display
  • Updated zebrafish database
  • Import Illumina data whenever available
  • Recalculate consequence type for mouse regulatory feature
  • eFG array mapping for human, mouse, rat and Drosophila: Affymetrix (UTR/IVT + ST), Illumina (WG)

New mouse DNase data to support the first Mouse RegulatoryBuild

Code Other

  • Amazon EC2 public datasets updated
  • New GO database (ensembl_ontology_54) and API
  • Changing default behaviour of TranscriptAdaptor
  • Translation attribs modified
  • Remove entries with spaces from species.classification
  • Gene name and xref projections


Pairwise alignments

Update the pairwise alignments for zebrafish (Danio rerio):

  • human-zebrafish translated BLAT-NET
  • mouse-zebrafish translated BLAT-NET
  • rat-zebrafish translated BLAT-NET
  • chicken-zebrafish translated BLAT-NET
  • frog-zebrafish translated BLAT-NET
  • tetraodon-zebrafish translated BLAT-NET
  • fugu-zebrafish translated BLAT-NET
  • medaka-zebrafish translated BLAT-NET
  • stickleback-zebrafish translated BLAT-NET
  • Ciona savignyi-zebrafish translated BLAT-NET
  • Ciona intestinalis-zebrafish translated BLAT-NET

Add new alignments for medaka:

  • human-medaka BLASTZ-NET (imported from UCSC)
  • mouse-medaka BLASTZ-NET (imported from UCSC)


The following files will be available for download:

  • EMF dumps for GeneTrees
  • EMF dumps for EPO and PECAN multiple alignments
  • BED files for 31 way GERP constrained elements
  • BED files for 12 way GERP constrained elements

Homologies and families

  • 49-way GeneTrees and Homologies, with new/updated gene sets and assemblies.
  • Multiple Sequence Alignments with the consistency-based MCoffee meta-aligner (mafftgins+muscle+kalign+probcons).
  • Pairwise gene-based dN/dS calculations for high coverage species pairs.
  • Updated MCL families including all Ensembl AS isoforms and latest UniProt Metazoa.
  • Multiple Sequence Alignments with MAFFT
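For readers less familiar with the dN/dS ratio mentioned above, one textbook way to estimate it for a pair of aligned coding sequences is the Nei-Gojobori counting method with a Jukes-Cantor correction, written out below. This is an illustration only; these notes do not say which estimator the pipeline actually uses. Here N and S are the numbers of nonsynonymous and synonymous sites, and N_d and S_d are the observed nonsynonymous and synonymous differences between the two sequences.

```latex
\begin{aligned}
p_N &= \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S},\\
d_N &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}p_N\right), \qquad
d_S = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}p_S\right),\\
\omega &= \frac{d_N}{d_S}.
\end{aligned}
```

Roughly, ω < 1 points to purifying selection, ω ≈ 1 to neutral evolution and ω > 1 to positive selection on the gene pair.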

We are already working on our next release (out late in February 2009) which will come with the following:

Data

  • New species added to our set: sloth (Choloepus hoffmanni), Anolis lizard (Anolis carolinensis) and zebrafinch (Taeniopygia guttata).
  • Updated marker information for human, cow, dog, horse, chicken, macaque, mouse and Medaka.
  • Updated manual annotation for mouse from VEGA.

Comparative Genomics

  • Pairwise alignments with the new species (human/sloth, zebrafinch/chicken, lizard/chicken).
  • New 31-way eutherian mammal alignment (based on the 9-way Enredo-Pecan-Ortheus multiple alignments) using these 2x genomes: elephant (Loxodonta africana), armadillo (Dasypus novemcinctus), tenrec (Echinops telfairi), rabbit (Oryctolagus cuniculus), guinea pig (Cavia porcellus), hedgehog (Erinaceus europaeus), shrew (Sorex araneus), microbat (Myotis lucifugus), tree shrew (Tupaia belangeri), squirrel (Spermophilus tridecemlineatus), bushbaby (Otolemur garnettii), pika (Ochotona princeps), mouse lemur (Microcebus murinus), cat (Felis catus), megabat (Pteropus vampyrus), dolphin (Tursiops truncatus), alpaca (Vicugna pacos), kangaroo rat (Dipodomys ordii), hyrax (Procavia capensis), tarsier (Tarsius syrichta), gorilla (Gorilla gorilla) and sloth (Choloepus hoffmanni).
  • The current clustering used for our gene trees will be replaced by hierarchical clustering of sparse graphs (hcluster).

Variation and Functional Genomics

  • An improved array mapping environment integrates genomic and cDNA mappings, supporting multi-species databases.
  • We’ll link to genome-wide association data from the NHGRI catalogue (Hindorff et al.).
  • Genotype data for mouse (reference strain C57BL/6) will be included.
  • Update of variation for dog, chicken and platypus.

Other

Hot on the heels of release 51 comes release 52 of Ensembl – the first revision of the new webcode… So what’s new?

Data:

Web site:

  • Updated export: restored most of the functionality with the new Export wizard on Genes, Transcripts and Locations, allowing export of FASTA, EMBL, GenBank, GFF, TSV, Vista and PIP files.
  • Image export: restored an improved version of the image export functionality. All “horizontal” generated images have an [Export image] button, so the image can be exported in vector format (PDF, SVG, EPS) and in scaled bitmap format (PNG x0.5, x1, x2, x5 and x10) for publication-quality images.

    The vector formats PDF, SVG and EPS can all be imported into vector image editors to be manipulated as well.

The web team can finally let out a quick sigh of relief now that the long awaited new web code has finally emerged kicking and screaming out of the web team office…

The “cosmetic” changes to the site are obvious:

  • the colours,
  • fonts,
  • layout,
  • the unified configuration
  • the reduction in page sizes.

On top of this there have been a large number of underlying technical improvements to the way the pages are put together.

  • Streamlining the JavaScript and CSS so that transfers between the server and your browser are as fast as possible.
  • Using unobtrusive JavaScript throughout the new code so pages work with or without JavaScript or AJAX; although they are not quite as functional without it, they still work!
  • Making the pages standards-compliant so they render in most browsers without issues (unless of course that browser is IE, where there are lots of places the “standards” approach fails)
  • Using a fast in-memory cache (a modified version of memcached which allows for the use of tags) to reduce the load on our user database and to store and serve temporary images, processed HTML etc.
  • Segregation of code into more modules to reduce the size of the very large modules we had (notably the breakdown of the Component modules into smaller chunks)
  • Configuration metadata held in the core databases, making the site easier and more automatic to set up.
  • Optimisation of drawing and configuration code.
  • Transparent use of AJAX in many cases. Use of Perl’s LWP::ParallelUserAgent where the user’s browser doesn’t support AJAX.
  • Further areas where the extensible plugin system is available – defining colours, configuring images.
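The point of a cache that supports tags is that the site can drop every cached item linked to, say, one user or one species in a single call, without knowing the individual keys. Below is a minimal in-process Python sketch of that bookkeeping. It assumes nothing about the real memcached protocol or Ensembl's modified version; the class, method and tag names are hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, Optional, Set


class TaggedCache:
    """Hypothetical in-process stand-in for a tag-aware cache (not the memcached API)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self._key_tags: Dict[str, Set[str]] = {}                        # key -> its tags
        self._tag_keys: DefaultDict[str, Set[str]] = defaultdict(set)  # tag -> keys

    def set(self, key: str, value: Any, tags: Iterable[str] = ()) -> None:
        self._drop(key)  # forget tags left over from a previous value under this key
        self._store[key] = value
        self._key_tags[key] = set(tags)
        for tag in self._key_tags[key]:
            self._tag_keys[tag].add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def invalidate_tag(self, tag: str) -> int:
        """Remove every entry carrying `tag`; return how many entries were removed."""
        keys = list(self._tag_keys.get(tag, ()))
        for key in keys:
            self._drop(key)
        return len(keys)

    def _drop(self, key: str) -> None:
        self._store.pop(key, None)
        for tag in self._key_tags.pop(key, set()):
            keys = self._tag_keys.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._tag_keys[tag]  # no entries left under this tag


if __name__ == "__main__":
    cache = TaggedCache()
    cache.set("img:human:1:1-100000", b"png bytes", tags=["species:human", "user:42"])
    cache.set("html:human:gene:summary", "<div></div>", tags=["species:human"])
    print(cache.invalidate_tag("user:42"))  # drops only the user's image
```

A shared deployment would keep this store on cache servers reachable from the whole web farm; the sketch only shows the key/tag bookkeeping that makes tag-based invalidation possible.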

New design
You will already have seen a number of emails about the upcoming Ensembl 51 release; the web team are working hard to tidy up the loose ends of the release! We have most of the major views ready, and are just working on some of the views you may never have found before. As a taster I’m posting a few screen shots from our development site. The first shows the new page layout for graphical display of genomic regions (the old contigview). You will see many of the new design decisions in this screen shot:

  • There are more views per object as we have broken up the large single pages into smaller components;
  • Tabs for the different focus objects – in this case Gene and Location. Transcript and Variation feature are the other tabs available;
  • A tree of all information available about the focus feature on the left hand side;
  • Left/right pagination buttons to allow you to navigate between all the information we have about the focus object.
  • “General” and “local” tools areas

Under the hood!

There have been a large number of changes under the hood of the web-site. Notable changes have been:

  • Use of modified version of memcached to store and retrieve cached images, static and dynamic content, user settings;
  • Re-writing the configuration code to automagically detect the contents of the databases and try to display the content appropriately;
  • Breaking up of the component code into separate modules;
  • Removing the need for a script per view – by using “routeing” style URL parsing to work out what objects are to be rendered and how… e.g. /Gene/Compara_Tree/Text displays the text version of a gene’s homology tree.
  • More and easier to configure renderers for drawing code.
  • A drive for standards compliance in both XHTML and CSS, which should make it easier to support modern web browsers. We will be actively supporting Firefox 3+, Internet Explorer 7+ and Safari 3+ (and other similar browsers), while trying to make sure that the site is still workable in other browsers (at present the site appears to work in Opera 9.25+).
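The “routeing” idea can be sketched in a few lines of Python: split the path into the object type, the view and an optional variant such as a text rendering, so that one handler can dispatch every request instead of needing a script per view. The `Route` fields and the `parse_route` function are hypothetical names for illustration; the real Ensembl code may split URLs differently.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    object_type: str         # e.g. "Gene", "Transcript", "Location"
    action: str              # the view to render, e.g. "Compara_Tree"
    function: Optional[str]  # optional variant of the view, e.g. "Text"


def parse_route(path: str) -> Route:
    """Split '/<Object>/<Action>[/<Function>]' into its parts, ignoring any query string."""
    parts = [p for p in path.split("?", 1)[0].split("/") if p]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected /<Object>/<Action>[/<Function>], got {path!r}")
    function = parts[2] if len(parts) == 3 else None
    return Route(parts[0], parts[1], function)


if __name__ == "__main__":
    print(parse_route("/Gene/Compara_Tree/Text"))
```

A dispatcher can then look up the pair (`object_type`, `action`) in a table of components, which is what removes the need for one script per view.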

New configuration panel

All configuration of the site and individual views has been moved to a common “Configuration dialog” box.

  • The old “yellow menus” are replaced by a more expansive and easier-to-navigate tree of features. This is important now that there are nearly 200 individual tracks in the Human Location view page.
  • There are more choices for displaying some tracks: rather than just turning them on and off, you can decide how you wish them to be displayed.
  • Configuration for other pages is loaded in a similar way.
  • The site has a common site-wide image width setting.
  • The configuration panel is also where you will manage your accounts, upload data, and attach DAS and URL-based data.

Different renderers

For different data types we now support different renderers – not just collapsed and expanded.
For example:

  • For genomic alignments we support ungrouped features (all on one line), normal grouped and bumped features at both full and half height, and now also “stacked” features drawn as 2-pixel-high glyphs.
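The “bumped” rendering means putting overlapping features on separate lines. A common way to do that is greedy first-fit row assignment, sketched below in Python. It is a generic technique, not necessarily the algorithm in Ensembl’s drawing code, and the `bump` function and its `gap` parameter are hypothetical.

```python
from typing import List, Tuple


def bump(features: List[Tuple[int, int]], gap: int = 1) -> List[int]:
    """Greedy first-fit: return a row index for each (start, end) feature.

    Coordinates are inclusive; a feature fits in a row if it starts at least
    `gap` positions after the last feature already placed in that row ends.
    """
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    row_ends: List[int] = []  # end coordinate of the last feature placed in each row
    rows = [0] * len(features)
    for i in order:
        start, end = features[i]
        for r, last_end in enumerate(row_ends):
            if start >= last_end + gap:
                rows[i] = r
                row_ends[r] = end
                break
        else:  # no existing row has room, so open a new one
            rows[i] = len(row_ends)
            row_ends.append(end)
    return rows


if __name__ == "__main__":
    print(bump([(1, 10), (5, 15), (12, 20), (16, 25)]))  # [0, 1, 0, 1]
```

Because features are placed in order of start position, each row's end coordinate only grows, so one number per row is enough to test for overlap.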

We hope when you see the new interface that you will find it more intuitive, more discoverable and faster to use and most importantly more productive for the research work that you are doing.

Steve posted the news that we’re delaying our new release for at least two more weeks. The message is pasted in here:

Hi all

In our Intentions Summary mail for release 51 we stated that the release was scheduled for early/mid September. The 51 release will include significant updates and improvements to the web interface. We are delaying release while we complete development on these. We are working to get the release out as soon as possible, and are now aiming for end September/early October. I apologise for this delay.

Steve

 

Dr Steve Searle
Ensembl Project Leader, Sanger

It is always so frustrating to delay, but of course, far more important to have a working site than something only part working. Welcome to delivering high end services.

We took on a lot of changes in this web refresh. For most users the main thing they will notice is the entirely new web layout. This was driven by our user surveys, in which the main complaint was being buried in too many displays and too much data. We then spent around 4 months working with user groups and trialling different layouts (many thanks to those who participated), which in some cases led to significant changes to our original designs (we now have a hybrid “tab and left-hand-side” approach, voted best by ~60% of people, with the other three options splitting the rest of the vote). We’re very excited about this new layout going live as it just looks cleaner and less cluttered, yet provides more information. The other thing people will notice is that it is just faster. As the saying goes, you can’t be too rich, too thin or have your website go too fast.

Making a website go faster is harder than it might look. It involves all sorts of things: the bandwidth from your machine to us, the speed of the servers, the connectivity of the servers to the databases, the speed of the API, the speed of the database's access to disk, the management of the huge number of simultaneous users we have, then the size of the HTML returned and finally the rendering speed in your browser. All of these contribute to the overall perception of “speed”. Under the hood we’ve been working on all these aspects. Internally, a big change is that our web farm no longer needs a common file system to work off. Previously, when your browser asked for a contigview page, our servers generated HTML with an image, and that image was written to the common disk. The browser parsed the image tag and asked for the image, and (this is the critical bit) that request would in all likelihood be served by a different server in our web farm. That server then had to go to the common file system to pick up the file and send it back. Read/write on this shared filesystem has often been a critical bottleneck. In the new system this has all gone: the images are stored in a memory-based common store, which means, first, that we remove this bottleneck (which will be the first big effect) and, second, that we will be able to cache a lot more. The hope is that many of the identical pictures for the common species will be served entirely from memory in the new system. Another important change has been aggressively slimming down our HTML. Currently all sorts of files, often very small, are requested on every page load just to see if they have changed. We’ve consolidated a lot of these files, compressed them, and optimised them for rendering speed.

A number of further speed improvements are not in this release but are coming at the end of 2008 or early 2009. Our API has a new concept, collections, which better handles zoomed-out views where we know the renderers will not be able to draw every object. Instead, a collection will be provided, which may be rendered as a union, a density or something similar. The other thing on the horizon is setting up a US mirror on the west coast. For the last year we have been extensively monitoring the speed of Ensembl from different sites, and retrieval takes much longer from the north-west coast of the US. We’ve been investigating exactly why this is (and learning a lot more about the backbone of the internet than we knew before), but it seems the simplest way to get good speed on the west coast is just to run a mirror over there. That will probably go live in 2009.
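As an illustration of the density style of rendering a collection, the Python sketch below counts how many features overlap each of a fixed number of equal-width bins across a region, which is cheap to draw when the view is too zoomed out to show every object. The function name, the inclusive 1-based coordinates and the binning scheme are assumptions; the post does not describe how the collections API actually computes its summaries.

```python
from typing import List, Tuple


def density(features: List[Tuple[int, int]], region_start: int,
            region_end: int, n_bins: int) -> List[int]:
    """Count features overlapping each of n_bins equal-width bins (inclusive coords)."""
    if n_bins <= 0 or region_end < region_start:
        raise ValueError("need n_bins > 0 and region_end >= region_start")
    length = region_end - region_start + 1
    counts = [0] * n_bins
    for start, end in features:
        s, e = max(start, region_start), min(end, region_end)  # clip to the region
        if s > e:
            continue  # feature lies entirely outside the region
        first = (s - region_start) * n_bins // length
        last = (e - region_start) * n_bins // length
        for b in range(first, last + 1):
            counts[b] += 1
    return counts


if __name__ == "__main__":
    feats = [(1, 10), (20, 30), (90, 100), (200, 300)]
    print(density(feats, 1, 100, 4))  # [2, 1, 0, 1]
```

The renderer then only has to draw `n_bins` bars, however many features fall in the region.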

Back to the website. It looks so much better, and has much better hardware characteristics (our shared file system is … well … rather 2004 technology and needs pretty constant care at the moment), that I can’t wait until it comes out. But there is absolutely no point in having a site crippled in functionality, even though we’ve got many of the user interface and technical issues right. The sticking point at the moment is the configuration panel. This comes up as a “modal” box on top of the page, offering a lot of options to choose from but not a bewildering set of options on each page. To cope with the 200-odd different tracks to switch on and off, the box has to have tabs and friendly, browsable hierarchies. Getting all this to work in a nice, friendly, slick way… that’s a lot of JavaScript.

And a lot of JavaScript means a lot of browser-compatibility headaches. Even using JS libraries (Prototype and script.aculo.us, I think; James Smith can tell you the details!) there are all sorts of details that might not work quite the same way in IE5 compared to IE6. Or Firefox. Or Safari. And it must at least degrade to something functional without JS. And of course it has to work, and render fast. This modal box is the last complex thing to get sorted.

We’re close. I’ve seen the box come up on James’ screen. I hear Steve has seen it come up, seen tracks change, and seen the display respond to those changes. The API for the configuration system was gutted and is much better. But it’s got to work on all the main browsers. For all our genomes, in particular human and mouse. And this is just tricky, fiddly work.

We’re not quite there yet. We’re really close, and so much is working that it is just excruciating. But we need another couple of weeks. James is being shielded from other jobs by Steve and others; Eugene is torture-testing memcachedb to stress-test the system before it goes live; Xose, Bert and Giulietta are writing help; Beth and Anne are writing the additional pagelets inside the new geneview and transcriptview. And it all looks really good.

So – apologies – we thought we’d be launching in July. We thought we’d be launching in September. We still might just do that, but then again, it might well be October. If it goes any later I will have no hair.

But it does look really good.

It is definitely worth the wait. Like Guinness.

Ewan