As part of the EBI Roadshow training programme, Ensembl teamed up with ArrayExpress to run workshops on these bioinformatics tools for students, postdocs, and professors at ITESM, UNAM, and CIBNOR. The response was very positive. Feedback from 86 participants includes comments such as:

“I am an undergraduate student, and know little about bioinformatics. In the future, I will be able to use EBI as my primary resource.”

“It is a really good opportunity to now get all these tools, to help facilitate understanding and analysis of scientific data”

“An excellent course, and very useful tools!”

Ensembl and ArrayExpress were rated as useful to their work by 99% of participants. These workshops not only make people more aware of individual projects, they also publicise EBI resources in general: 31% of our participants were unaware of EBI resources before the workshop, in contrast to the 95% who said that after the workshop they would most likely use EBI resources. 88% of participants would like more training in these and other resources, specifically mentioning ontologies, proteomics, and genome sequencing as topics they would like to learn more about. This reflects a need for bioinformatics courses in the life sciences in that part of the world, if not worldwide.

Finally, the workshops showed that there is a lot of interest in bioinformatics, as this comment from our host at CIBNOR makes clear:

“CIBNOR is in a growing stage, we have a project for an Innovation and Technology Park and I am trying to convince people about the need for a Bioinformatics Unit. I am sure that things like this course will help us a lot.”

We greatly enjoyed training in Mexico, thanks to all the keen interest, the energy, and the evenings on the sand dunes. We even took the course into the field, discovering a pufferfish and a spine on the beach, in honour of vertebrate genomes!

Ensembl announces the release of a new site for users who still need access to the NCBI36 human assembly. It is a complete copy of the Ensembl 54 release, which was the last Ensembl release to contain NCBI36.

Although access was already possible through the Ensembl archive sites, the new site will provide better performance because it runs on separate hardware. It also provides BLAST/BLAT search support, which the archive sites do not.

The main reason we have provided a dedicated NCBI36 site is to support two large projects (ENCODE and 1000 Genomes), which have some of their data aligned to this assembly. The site will only remain available for as long as there is significant need for it. We will be reviewing usage in Spring 2010 and currently plan to remove the site by Summer 2010. After that, users will still be able to access the NCBI36 assembly via the archive sites; there just won’t be a dedicated site for it anymore.

In addition to Release 2 of Ensembl Genomes earlier this month, EMBL-EBI would also like to announce the arrival of Ensembl Fungi (beta).

This release includes:

  • 2 yeast genomes: Saccharomyces cerevisiae and Schizosaccharomyces pombe.
  • 7 Aspergillus genomes: A. clavatus, A. flavus, A. fumigatus, A. niger, A. oryzae, A. terreus and Neosartorya fischeri.

If you have any comments or feedback please do not hesitate to contact us at

This year we’ve invested in our own mirror, maintained by us, on the west coast of the US. This was mainly because measuring web response times for our users showed a consistent additional 3 to 4 seconds if you were lucky enough to live out on the west coast (worse still if you are in Australia!). Although we did a lot last year to improve the general response time of our web pages (for example, compressing our CSS and JavaScript down to single files for the whole site, so that they are only loaded once and then cached locally), the Ensembl site delivers a lot of dynamic content, and nothing but getting closer to users can help with that.

You can reach the site directly at or, alternatively, use the little “world” icon at the top right of the page, which switches to the Stars and Stripes when you’re on the west coast site. Having the mirror not only helps our users on the west coast but also provides resilience when our main site goes down. As we’re responsible for keeping it in sync with our main site (it’s part of our release process), this mirror will stay current with the main site.

In some sense the mirror should have a low “per user” cost for us: if users go to the mirror, there is less load on the main site, so the mirror is really a way of distributing the “web farm” behind Ensembl geographically. However, there are overheads, from hiring rack space in the US to making our own release cycle more complex, so we will need to assess whether running a US mirror makes sense in the long term. Our instinct is yes, but we need hard data on this.

These things need time to pick up, but we’d already be interested in feedback. For US users: is this site faster for you? We’d particularly like to hear from East coast users, who we think are probably still best off on the main site. Does it change with the time of day? For Pacific Rim users (Japan, Singapore, Korea, Australia): is the west coast site snappier for you? We’ll be putting our own monitoring in place, but user feedback is always good…

Ensembl is pleased to announce the release of its West Coast US mirror. This is a full mirror of the current Ensembl 54 release. We are providing this mirror to improve performance for users in the US, particularly on the West coast. It includes full search, BioMart and BLAST support (BLAST searches are actually run at Sanger, with the results passed back to the mirror).

This mirror is managed directly by the Ensembl web team, and we will aim to update it along with the main site to keep it current. Credit for getting this mirror up goes to James Smith and Eugene Bragin from the web team, with support from the Sanger systems team, particularly Peter Clapham, John Nicholson and Dave Holland.

Future plans: we will soon improve the mirror by allowing users to switch between the main and mirror sites. Currently, we do not recommend logging in on the mirror, because all user data must be retrieved from the main site at the Wellcome Trust Genome Campus. Speed is best if you do not log in; this will be improved in the future.

Following a recent thread on our ensembl-dev mailing list, we would like to point our users to a recent post on the Gramene blog (Gramene is a resource for grass genomes maintained at CSHL). This framework extends Ensembl with a data resource to browse several plant species: maize (Zea mays), rice (Oryza glaberrima and Oryza rufipogon), sorghum (Sorghum bicolor), the model organism Arabidopsis thaliana, grape (Vitis vinifera) and poplar (Populus trichocarpa), with comparative maps for additional species such as wheat (Triticum aestivum), barley (Hordeum vulgare) and oat (Avena sativa).

You can find some sample scripts for loading an Ensembl species database from scratch here.

Thanks to our colleagues at Gramene.

We hope you like the new Ensembl website. We have had quite a lot of feedback about the system, and are digesting it to see how and where we can make the site easier to use.

Missing features

We know that a number of features which were in the web code before the revamp in release 51 are still missing, and we are working on them:


  • AlignSliceView [target e!53]
  • MultiContigView [target e!54]
  • CytoDump [will be released in e!53 as part of the export module]
  • DotterView
  • HistoryView – "ID liftover" [target e!53/4]
  • AssemblyConverter – "location liftover" [target e!53/4]


  • Drawing code tracks, e.g. rat QTLs, protein co-ordinate based DAS tracks [target e!53]
  • User gene annotations [target e!54]

New developments

We have a number of new "web" developments in the pipeline – some of these are listed below:

  • Extended configuration panel – searching for tracks, showing currently active tracks, etc. [target e!53]
  • Extended configuration panel – re-ordering tracks etc [target e!53]
  • Extended configuration panel – further configuration options – colour, depth, more display options, label options [target e!54/5]
  • New BLAST/BLAT interface [target e!55/6]
  • Re-write of the vertical drawing code to allow high quality PDF/PS/SVG karyotype and chromosome images to be produced.
  • Further work on export – finer configuration of what to export, exporting in multi-regions, integration with "user data"

If you have clicked on the GeneTree link in Ensembl (for example, the gene tree for IL2), you may have noticed that we have a new way of displaying large GeneTrees. This time, if you have a large gene family with lots of genes that you want to look at, you won’t need to ask the Miami Dolphins to let you plug your laptop into their huge screen…

This new feature in EnsemblCompara is called collapsible subtrees and allows for more compact, summarized views of interesting gene families like PAX2/PAX5/PAX8:

If you check the legend at the bottom, you will see that “blue triangles” correspond to collapsed subtrees that contain within-species paralogs of your gene. If you want to see all the within-species paralogs expanded, click on the option “View paralogs of current gene”. You can even make that the default in the “Configure this page” options.

Jalview is a great way to view the protein alignments behind the tree. So where is the Jalview link now? Click on any internal node (square) in the tree, then click the Jalview link to visualize the alignment (or subalignment) in the new Jalview applet. You need to have Java installed, otherwise the link won’t show. Two Jalview windows pop up: one shows the protein alignment and the other the underlying TreeBeST tree. You can now use Jalview’s sorting feature to sort your sequences according to the tree with Calculate->Sort->By Tree Order->URL. Having the tree associated with the alignment allows a more phylo-centric visualization of sequence conservation: if you click at a point in the tree, a red vertical line appears that divides the alignment into different groups. If you then choose Colour->Percentage Identity, the shades of blue will be relative to the subgroups in your tree (e.g., fish versus placental mammals). This is also useful for spotting segments of the alignment that don’t look very good, gaps created in a subpart that can now be collapsed in the subalignment (Edit->Remove Empty Columns), or sequences that stand out as long branches in the alignment (View->Overview Window).

For even more tree funkiness, you can use PhyloWidget to visualize our NHX trees. Export the NHX tree (“Configure this page->Output for normal tree->NHX->Save and Close->Gene Tree(text)”) and copy and paste the text representation of the GeneTree into PhyloWidget, which shows duplication/speciation events (red/blue), bootstrap values (greyscale), and taxonomy levels (“View->Rendering->Show clade labels”). Then use the “Zoom in/Zoom out” features, “Tree Edit->collapse” after clicking on an internal node, and especially the “View->Branch lengths [x]” and “View->Layout->Options->Branch Scaling” options.
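If you would rather inspect the exported NHX text programmatically than in PhyloWidget, the node annotations can be pulled out with a few lines of Python. This is a minimal sketch, not part of Ensembl or PhyloWidget: it assumes the standard NHX tags D (Y for a duplication, N for a speciation), B (bootstrap) and S (species), and the function names and the small example tree are hypothetical. It only reads the [&&NHX:...] comment blocks and does not rebuild the tree topology.

```python
import re

# Hypothetical example tree: two human paralogs, one of which has a mouse ortholog.
EXAMPLE_TREE = (
    "((human_A:0.1[&&NHX:S=Homo_sapiens],mouse_A:0.2[&&NHX:S=Mus_musculus])"
    ":0.05[&&NHX:D=N:B=100],human_B:0.3[&&NHX:S=Homo_sapiens])[&&NHX:D=Y:B=80];"
)

# Captures the body of each NHX comment, e.g. "D=N:B=100".
NHX_BLOCK = re.compile(r"\[&&NHX:([^\]]*)\]")


def nhx_annotations(newick):
    """Return one dict of tag -> value per NHX block, in file order."""
    blocks = []
    for body in NHX_BLOCK.findall(newick):
        fields = {}
        for item in body.split(":"):
            key, _, value = item.partition("=")
            fields[key] = value
        blocks.append(fields)
    return blocks


def count_events(newick):
    """Count nodes tagged as duplications (D=Y) and speciations (D=N)."""
    counts = {"duplication": 0, "speciation": 0}
    for fields in nhx_annotations(newick):
        if fields.get("D") == "Y":
            counts["duplication"] += 1
        elif fields.get("D") == "N":
            counts["speciation"] += 1
    return counts


if __name__ == "__main__":
    print(count_events(EXAMPLE_TREE))  # {'duplication': 1, 'speciation': 1}
    print([f["B"] for f in nhx_annotations(EXAMPLE_TREE) if "B" in f])  # ['100', '80']
```

Real Ensembl exports may carry additional tags beyond these three; the parser keeps every tag it finds, so they simply appear as extra keys in each dict.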

We hope these new features will help you in your research. We are currently testing some new ideas for visualizing even more phylogenetic information and for helping you make better judgements about the orthology and paralogy relationships in our EnsemblCompara GeneTrees. Stay tuned for more updates!

We’re happy to announce that Ensembl is one of the launch partners for Amazon’s “Public Data Sets” initiative, so the MySQL data and index files for the current release of Ensembl can be accessed from within Amazon’s Elastic Compute Cloud (EC2) service. From the Amazon website:

AWS Hosted Public Data Sets provide a convenient way to share, access, and use public domain or non-proprietary data within your Amazon EC2 environment. Select public data sets are hosted on AWS for free as an Amazon EBS snapshot. Any Amazon EC2 customer can access this data by creating their own personal Amazon EBS volume from a publicly shared Amazon EBS public data set snapshot. They can then access, modify, and perform computation on these data sets directly using an Amazon EC2 instance and just pay for the compute and storage resources that they use.

Details of how to access the data can be found at .
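The procedure Amazon describes (create your own EBS volume from the public snapshot, then use it from an EC2 instance) can be scripted. The sketch below is an illustration only, assuming a boto3-style EC2 client; the function name, the snapshot and instance IDs and the default device name are hypothetical placeholders, since the actual Ensembl snapshot ID is given in the access details rather than here. The client is passed in as an argument so that the function does not depend on any particular AWS setup.

```python
def volume_from_public_snapshot(ec2, snapshot_id, availability_zone,
                                instance_id, device="/dev/sdf"):
    """Create a private EBS volume from a public snapshot and attach it.

    ec2 is assumed to behave like a boto3 EC2 client (boto3.client("ec2")).
    snapshot_id, instance_id and device are placeholders you must supply.
    """
    # The new volume must live in the same availability zone as the instance.
    response = ec2.create_volume(SnapshotId=snapshot_id,
                                 AvailabilityZone=availability_zone)
    volume_id = response["VolumeId"]

    # Block until the volume has been created and can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it to your running instance; it can then be mounted there.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Usage (hypothetical IDs; requires AWS credentials and the boto3 package):
#   import boto3
#   ec2 = boto3.client("ec2")
#   volume_from_public_snapshot(ec2, "snap-XXXXXXXX", "us-east-1a", "i-XXXXXXXX")
```

Mounting the attached volume and pointing a MySQL server at the Ensembl data and index files is a separate step that this sketch does not cover.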

We have plans to make much more use of AWS in the future, stay tuned!

Because of the changes to the web interface, a number of page URLs have changed. In most cases the web code catches these changes, but due to the nature of the site some kinds of request have changed:

  • Configuring the way a page is rendered;
  • Changing the way tracks are rendered;
  • Adding DAS sources via a web-address and not via the web interface;
  • Attaching UCSC-style external resources.

These are now all handled in a similar, systematic way:

  • To change global page settings, add a parameter config=key=value{,key=val}
    e.g. to turn off the top image on Location > Region in detail: ;config=view_top=off

    e.g. to link directly to the exon/intron markup panel (Transcript > Exons), showing full introns with only 60 bp of flanking sequence AND setting the display to 60 bp wide: ;config=flanking=60,seq_cols=60,fullseq=yes

  • To change the configuration of an individual panel, add a parameter named after the panel (this will be documented shortly on the website). For example, for Location > Region in detail the two panels are contigviewtop and contigviewbottom; for Location > Region overview it is cytoview. The value is again a comma-separated list, where the left-hand side of each “=” is the name of the track and the right-hand side is the name of the “renderer” to use (the latter depends on the type of track). Additionally, the left-hand side can be used to integrate external data. Notes:
    • Tracks are now named systematically, so their names will have changed from the values you may have been used to. Again, we will shortly publish a list of these, but an example is transcript_core_ensembl: the Ensembl genes from the Ensembl database.
    • Renderers depend on the type of track: for transcripts you have the option of “transcript_label”, “transcript_nolabel”, “collapsed_label” and “collapsed_nolabel”; for alignment features (and also URL-attached data at the moment) “normal”, “half_height”, “stack”, “unlimited” and “ungrouped”; for DAS tracks “labels” (show labels if configured by the source) or “nolabels” (hide labels).
    • At the moment two special parameters can be used:
      – which attaches a DAS source to the session and selects the renderer

    For example: ;config=panel_top=off;contigviewbottom=das:,transcript_core_ensembl=collapsed_nolabel

    This turns on a DAS source (in this case the Ensembl transcripts), collapses the standard Ensembl track down to a single line per gene AND turns off the top panel!
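If you generate links to Ensembl from your own scripts, the parameter syntax described above (a config=key=value,... list for global settings, and panel=track=renderer,... lists for individual panels, joined with ";") can be assembled with a small helper. This is a minimal sketch: the function names are hypothetical, it only builds the string that follows the page address, and it does not URL-encode values, so an external source address containing reserved characters would need escaping separately.

```python
def encode_pairs(settings):
    """Join a mapping as key=value,key=value (the comma-separated form)."""
    return ",".join(f"{key}={value}" for key, value in settings.items())


def build_query(page_config=None, panels=None):
    """Build the ';'-separated parameters to append to an Ensembl page URL.

    page_config: global page settings, emitted as config=key=value,...
    panels: mapping of panel name (e.g. "contigviewbottom") to a mapping of
            track name -> renderer name (e.g. "collapsed_nolabel").
    """
    params = []
    if page_config:
        params.append("config=" + encode_pairs(page_config))
    for panel, tracks in (panels or {}).items():
        params.append(panel + "=" + encode_pairs(tracks))
    return ";".join(params)


if __name__ == "__main__":
    # Exon panel example: 60 bp flanks, 60-column display, full introns.
    print(build_query(page_config={"flanking": 60, "seq_cols": 60, "fullseq": "yes"}))
    # Collapse the Ensembl transcript track and turn off the top panel.
    print(build_query(page_config={"panel_top": "off"},
                      panels={"contigviewbottom":
                              {"transcript_core_ensembl": "collapsed_nolabel"}}))
```

Because Python dictionaries keep insertion order, the settings appear in the URL in the order you list them, which keeps generated links easy to compare with the hand-written examples above.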