We hope you like the new Ensembl website – we have had quite a lot of feedback about the system, and are digesting this to see how and where we can make the site easier to use.

Missing features

We know that a number of features which were in the webcode prior to the revamped version 51 are still missing, and we are working on restoring them.

Views:

  • AlignSliceView [target e!53]
  • MultiContigView [target e!54]
  • CytoDump [will be released in e!53 as part of the export module]
  • DotterView
  • HistoryView – "ID liftover" [target e!53/4]
  • AssemblyConverter – "location liftover" [target e!53/4]

Components:

  • Drawing code tracks, e.g. rat QTLs, protein co-ordinate based DAS tracks [target e!53]
  • User gene annotations [target e!54]

New developments

We have a number of new "web" developments in the pipeline – some of these are listed below:

  • Extended configuration panel – searching for tracks, show currently active etc [target e!53]
  • Extended configuration panel – re-ordering tracks etc [target e!53]
  • Extended configuration panel – further configuration options – colour, depth, more display options, label options [target e!54/5]
  • New BLAST/BLAT interface [target e!55/6]
  • Re-write of the vertical drawing code to allow high quality PDF/PS/SVG karyotype and chromosome images to be produced.
  • Further work on export – finer configuration of what to export, exporting in multi-regions, integration with "user data"


We are already working on our next release (out late in February 2009) which will come with the following:

Data

  • New species added to our set: sloth (Choloepus hoffmanni), Anolis lizard (Anolis carolinensis) and zebrafinch (Taeniopygia guttata).
  • Updated marker information for human, cow, dog, horse, chicken, macaque, mouse and Medaka.
  • Updated manual annotation for mouse from VEGA.

Comparative Genomics

  • Pairwise alignments with the new species (human/sloth, zebrafinch/chicken, lizard/chicken).
  • New 31-way eutherian mammal alignment using these 2x genomes (based on the 9-way Enredo-Pecan-Ortheus multiple alignments): elephant (Loxodonta africana), armadillo (Dasypus novemcinctus), tenrec (Echinops telfairi), rabbit (Oryctolagus cuniculus), guinea pig (Cavia porcellus), hedgehog (Erinaceus europaeus), shrew (Sorex araneus), microbat (Myotis lucifugus), tree shrew (Tupaia belangeri), squirrel (Spermophilus tridecemlineatus), bushbaby (Otolemur garnettii), pika (Ochotona princeps), mouse lemur (Microcebus murinus), cat (Felis catus), megabat (Pteropus vampyrus), dolphin (Tursiops truncatus), alpaca (Vicugna pacos), kangaroo rat (Dipodomys ordii), hyrax (Procavia capensis), tarsier (Tarsius syrichta), gorilla (Gorilla gorilla) and sloth (Choloepus hoffmanni).
  • The current clustering will be replaced by hierarchical clustering of sparse graphs (hcluster) for our trees.

Variation and Functional Genomics

  • An improved array mapping environment integrates genomic and cDNA mappings, supporting multi-species databases.
  • We’ll link to genome-wide association data from the NHGRI catalogue (Hindorff et al.).
  • Genotype data for mouse (reference strain C57BL/6) will be included.
  • Update of variation for dog, chicken and platypus.

Other

Happy Holidays, and Happy New Year from Ensembl!

The new year will start with some workshops given by our Outreach team on how to use our new interface (and the data behind the scenes!). We hope you have had time to explore and learn the layout! Remember to send any questions to our helpdesk.

Upcoming workshops in January, 2009:

11 Jan Ensembl Demo at the PAG XVII conference, San Diego, CA, USA
13-14 Ensembl 2-day browser workshop at the Universidad de Chile, Santiago, Chile
15-16 Modules in the EBI Bioinformatics Roadshow, UCLA, USA
19-20 Modules in the EBI Bioinformatics Roadshow, City of Hope, USA
22-23 Modules in the EBI Bioinformatics Roadshow, UCSF, USA
24 Browser course in the Computational Biology Workshop, Sultan Qaboos University, Muscat, Oman
26 Browser course in the 9th BioSapiens European School of Bioinformatics, Brussels, Belgium

That’s all for now!

If you have clicked on the GeneTree link in Ensembl (for example, the gene tree for IL2), you may have noticed that we have a new way of displaying large GeneTrees. This time, if you have a large gene family with lots of genes that you want to look at, you won’t need to ask the Miami Dolphins to let you plug your laptop into their huge screen…


This new feature in EnsemblCompara is called collapsible subtrees and allows for more compact, summarized views of interesting gene families like PAX2/PAX5/PAX8:

http://www.ensembl.org/Homo_sapiens/Gene/Compara_Tree?g=ENSG00000075891

If you check the legend at the bottom, you will see that “blue triangles” correspond to collapsed subtrees that have within-species paralogs of your gene. If you want to see all the within-species paralogs expanded, you can click on the option “View paralogs of current gene”. You can even set that as a default if you want in the “Configure this page” options.

Jalview is a great way to view protein alignments in the tree. And where is my Jalview link now? Click on any internal node (square) in the tree, then click on the Jalview link to visualize the alignment (or subalignment) with the new Jalview applet. You have to have Java installed though, or the link won’t show. Two Jalview windows pop up: one shows the protein alignment and the other the underlying TreeBeST tree. You can now use Jalview’s sorting feature to sort your sequences according to the tree with: Calculate->Sort->By Tree Order->URL. Having the tree associated with the alignment allows for a more phylo-centric visualization of sequence conservation: if you click at a point in the tree, a red vertical line will appear that divides the alignment into different groups. If you choose Colour->Percentage Identity, the shades of blue will be relative to the subgroups in your tree (e.g., fish versus placental mammals). This is also useful to spot segments in the alignment that don’t look that good, or gaps created in a subpart that can now be collapsed in the subalignment (Edit->Remove Empty Columns), or sequences that stand out as long branches in the alignment (View->Overview Window).


For even more tree funkiness, you can use PhyloWidget to visualize our NHX trees. Use our NHX tree (“Configure this page->Output for normal tree->NHX->Save and Close->Gene Tree(text)”) to copy+paste the representation of the GeneTree into PhyloWidget, with duplication/speciation events (red/blue), bootstrap values (greyscale) and taxonomy levels (“View->Rendering->Show clade labels”). Then use the “Zoom in/Zoom out” features or, after clicking on an internal node, “Tree Edit->collapse”, and especially the “View->Branch lengths [x]” and “View->Layout->Options->Branch Scaling” options.
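If you would rather inspect the NHX text programmatically before pasting it anywhere, the annotations can be pulled out with a few lines of Python. This is only a rough sketch: it assumes the tree uses the standard NHX comment syntax ([&&NHX:key=value:...]) with D=Y/D=N marking duplication/speciation nodes and B holding the bootstrap value. The function names and the toy tree below are made up for illustration and are not real Ensembl output.

```python
import re

# Matches one NHX comment such as [&&NHX:D=Y:B=87]
NHX_COMMENT = re.compile(r"\[&&NHX:([^\]]*)\]")


def nhx_annotations(nhx_text):
    """Return one dict of key/value tags per [&&NHX:...] comment, in order."""
    annotations = []
    for match in NHX_COMMENT.finditer(nhx_text):
        tags = {}
        for field in match.group(1).split(":"):
            if "=" in field:
                key, value = field.split("=", 1)
                tags[key] = value
        annotations.append(tags)
    return annotations


def count_events(nhx_text):
    """Count nodes tagged as duplications (D=Y) and speciations (D=N)."""
    counts = {"duplication": 0, "speciation": 0}
    for tags in nhx_annotations(nhx_text):
        if tags.get("D") == "Y":
            counts["duplication"] += 1
        elif tags.get("D") == "N":
            counts["speciation"] += 1
    return counts


# Made-up toy tree for illustration only (not real Ensembl output).
EXAMPLE_TREE = (
    "((A:0.1[&&NHX:S=human],B:0.2[&&NHX:S=mouse]):0.05[&&NHX:D=N:B=100],"
    "C:0.3[&&NHX:S=human])[&&NHX:D=Y:B=87];"
)

if __name__ == "__main__":
    print(count_events(EXAMPLE_TREE))
```

A regular expression is enough here because only the bracketed comments are read; walking the tree topology itself would need a proper Newick/NHX parser.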


We hope these new features will help you in your research. We have some new ideas that we are currently testing to visualize even more phylogenetic information, and to help you make better judgements about the orthology and paralogy relationships in our EnsemblCompara GeneTrees. Stay tuned for more updates!

Hot on the heels of release 51 comes release 52 of Ensembl – the first revision of the new webcode… So what’s new?

Data:

Web site:

  • Updated export: – Restored most of the functionality with the new Export wizard on Genes, Transcripts and Locations – to allow export of FASTA, EMBL, Genbank, GFF, TSV, Vista and PIP files.
  • Image export: – Restored an improved version of the image export functionality – all “Horizontal” generated images have an [Export image] button to allow the image to be exported in vector format (PDF, SVG, EPS) and scaled bitmap format (PNG x0.5, x1, x2, x5 and x10) to allow publication quality images to be exported.

    The vector formats PDF, SVG and EPS can all be imported into vector image editors to be manipulated as well.

We’re happy to announce that Ensembl is one of the launch partners for Amazon’s “Public Data Sets” initiative, so the MySQL data and index files for the current release of Ensembl can be accessed from within Amazon’s Elastic Compute Cloud (EC2) service. From the Amazon website:

AWS Hosted Public Data Sets provide a convenient way to share, access, and use public domain or non-proprietary data within your Amazon EC2 environment. Select public data sets are hosted on AWS for free as an Amazon EBS snapshot. Any Amazon EC2 customer can access this data by creating their own personal Amazon EBS volume from a publicly shared Amazon EBS public data set snapshot. They can then access, modify, and perform computation on these data sets directly using an Amazon EC2 instance and just pay for the compute and storage resources that they use.

Details of how to access the data can be found at http://aws.amazon.com/publicdatasets .

We have plans to make much more use of AWS in the future, stay tuned!

Due to the changes in the web interface there have been a number of changes to the URLs for pages. In most cases the web code catches these changes but there are a number of requests which due to the nature of the site have changed:

  • Configuring the way a page is rendered;
  • Changing the way tracks are rendered;
  • Adding DAS sources via a web-address and not via the web interface;
  • Attaching UCSC-style external resources.

These are now all handled in a similar, systematic way:

  • To change global page settings: add a parameter config=key=value{,key=value}
    e.g. to turn off the top image on Location > Region in detail

    http://www.ensembl.org/Homo_sapiens/Location/View?r=1:1000-2000;config=view_top=off

    e.g. to link directly to the Exon Intron markup panel (Transcript > Exons) and to show full introns and only 60bp flanking sequence AND turn the display to be 60bp wide

    http://www.ensembl.org/Homo_sapiens/Transcript/Exons?t=ENST00000309255;config=flanking=60,seq_cols=60,fullseq=yes

  • To change the configuration for an individual panel, add a parameter referring to the panel (this will be documented shortly on the website), e.g. for Location > Region in detail the two panels are contigviewtop and contigviewbottom, and for Location > Region overview it is cytoview. This is again a comma separated list, where the left hand side of each “=” is the name of the track, and the right hand side is the name of the “renderer” to use; the latter depends on the type of track. Additionally the left hand side can be used to integrate external data.

    Notes:
    • Track names are now systematically named so will have changed from the values you may have been used to using – again we will shortly publish a list of these, but examples are: transcript_core_ensembl – the ensembl genes from the ensembl database.
    • Renderers depend on the type of track, but e.g. for transcripts you have the option of “transcript_label”, “transcript_nolabel”, “collapsed_label” and “collapsed_nolabel”, for alignment features (and also url attached data at the moment) “normal”, “half_height”, “stack”, “unlimited” and “ungrouped”, for DAS tracks “labels” (show labels if configured by the source) or “nolabels” – hide labels.
    • At the moment two special parameters can be used:
      das:http://www.mydas.source/das/my_data=render
      – which attaches a DAS source to the session and selects the renderer
      url:http://www.myweb.server/my_data.format=render

    For example:

    http://www.ensembl.org/Homo_sapiens/Location/View?g=ENSG00000012048;config=panel_top=off;contigviewbottom=das:http://www.ensembl.org/das/Homo_sapiens.NCBI36.transcript=nolabels,transcript_core_ensembl=collapsed_nolabel

    This turns on a DAS source (in this case the Ensembl transcripts), collapses the standard Ensembl track down to a single line per gene, and also turns off the top panel!
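If you generate such links from a script, the rules above can be captured in a small helper. The following Python sketch is only an illustration: the function name and its arguments are made up here, and it simply joins the pieces with ";" and "," exactly as in the examples above, without any URL escaping.

```python
BASE_URL = "http://www.ensembl.org"


def build_ensembl_url(species, page, params, config=None, panels=None):
    """Assemble a link with global settings and per-panel track settings.

    params  -- ordinary page parameters, e.g. {"r": "1:1000-2000"}
    config  -- global page settings, emitted as config=key=value{,key=value}
    panels  -- {panel_name: {track_name: renderer}}, one parameter per panel
    """
    parts = [f"{key}={value}" for key, value in params.items()]
    if config:
        settings = ",".join(f"{key}={value}" for key, value in config.items())
        parts.append(f"config={settings}")
    for panel, tracks in (panels or {}).items():
        track_list = ",".join(
            f"{track}={renderer}" for track, renderer in tracks.items()
        )
        parts.append(f"{panel}={track_list}")
    return f"{BASE_URL}/{species}/{page}?" + ";".join(parts)


if __name__ == "__main__":
    # Rebuilds the last example above from its parts.
    print(
        build_ensembl_url(
            "Homo_sapiens",
            "Location/View",
            {"g": "ENSG00000012048"},
            config={"panel_top": "off"},
            panels={
                "contigviewbottom": {
                    "das:http://www.ensembl.org/das/Homo_sapiens.NCBI36.transcript": "nolabels",
                    "transcript_core_ensembl": "collapsed_nolabel",
                }
            },
        )
    )
```

Dictionaries keep their insertion order in current Python versions, so the tracks appear in the link in the order they were listed.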

The web team can finally let out a quick sigh of relief now that the long awaited new web code has finally emerged kicking and screaming out of the web team office…

It is easy to see the “cosmetic” changes to the site:

  • the colours,
  • fonts,
  • layout,
  • the unified configuration
  • the reduction in page sizes.

On top of this there have been a large number of underlying technical improvements to the way the pages are put together.

  • Streamlining the JavaScript and CSS to make sure that the transfers between the server and your browser are as fast as possible.
  • Using unobtrusive JavaScript throughout the new code so pages work with or without JavaScript or AJAX; although they are not quite as functional, they still work!
  • Making the pages standards compliant to make them render in most browsers without issues (unless of course that browser is IE and there are lots of places where the “standards” approach fails)
  • Using a fast in-memory cache (a modified version of memcached which allows for the use of tags) to reduce the load on our user database and to store and serve temporary images, processed HTML etc.
  • Segregation of code into more modules to reduce the size of the very large modules we had (notably the breakdown of the Component modules into smaller chunks)
  • Configuration meta information contained in core databases making the site easier and more automatic to set up.
  • Optimisation of drawing and configuration code.
  • Transparent use of AJAX in many cases. Use of Perl’s LWP::Parallel::UserAgent where the user’s browser doesn’t support AJAX.
  • Further areas where the extensible plugin system is available – defining colours, configuring images.
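
To give a feel for what a cache "with tags" buys you, here is a toy in-process sketch in Python. It is not the modified memcached mentioned above and says nothing about how that is implemented; the class name, method names and example tags are all hypothetical. The point is only that every entry can carry labels, so that a whole group of entries (say, everything belonging to one session) can be dropped in one call.

```python
class TaggedCache:
    """Toy in-memory cache whose entries can be invalidated by tag."""

    def __init__(self):
        self._values = {}       # key -> cached value
        self._tags_by_key = {}  # key -> set of tags attached to that key
        self._keys_by_tag = {}  # tag -> set of keys carrying that tag

    def set(self, key, value, tags=()):
        """Store a value, replacing any earlier entry and its tags."""
        self._forget(key)
        self._values[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in tags:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key, default=None):
        """Return the cached value, or the default if the key is absent."""
        return self._values.get(key, default)

    def delete_by_tag(self, tag):
        """Drop every entry carrying the tag; return how many were dropped."""
        keys = list(self._keys_by_tag.get(tag, ()))
        for key in keys:
            self._forget(key)
        return len(keys)

    def _forget(self, key):
        """Remove one entry and detach it from all of its tags."""
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_tag[tag]
        self._values.pop(key, None)


if __name__ == "__main__":
    cache = TaggedCache()
    cache.set("image:1", "<png data>", tags=["session:42", "image"])
    cache.set("html:1", "<p>processed</p>", tags=["session:42"])
    cache.set("image:2", "<png data>", tags=["image"])
    print(cache.delete_by_tag("session:42"))
    print(cache.get("image:2"))
```

Keeping a reverse index from tag to keys is what makes the group delete cheap; the cost is a little extra bookkeeping on every write.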

There are still a few more Ensembl training events before the end of the year.

Browser workshops:

UNAM, Mexico City, Mexico (1-2 Dec)
UNAM, Cuernavaca, Mexico (5 Dec) (+ departmental seminar 4 Dec)

Amsterdam, The Netherlands (19 Dec)

Developers workshop:

University of Cambridge, UK (1-3 Dec)

In addition, Ensembl will feature as part of the following courses:

Wellcome Trust Open Door Workshop ‘Working with the Human Genome Sequence’ (1-2 Dec, Hinxton, Cambridge, UK) and Genes en evolución, ecologia e conservación (8-9 Dec, La Paz, Baja California, Mexico)

For details of these workshops, please have a look at the complete list of Ensembl training events.

Do you know a bit of Perl? Ensembl hosts an API (Application Programming Interface) which uses Object-Oriented Perl to extract data from Ensembl databases. This API is public and can be used by anyone to programmatically access the data in the Ensembl database. We understand that not everyone is used to Object-Oriented code, although people may have basic Perl skills and be interested in using our datasets. For that kind of bioinformaticist, I would recommend a recent short read in O’Reilly’s Broadcast:

Beginners Introduction to Object-Oriented Programming with Perl – O’Reilly Broadcast

And for the more advanced readers, the classic reference book in OO-Perl would be Damian Conway’s Object Oriented Perl, which apart from being very informative, has a really cool cover 🙂

We are always trying to lower the barrier to entry for research communities interested in using the Ensembl database in programmatic ways that make use of all the complexity associated with the generation of our data. That’s why our API is public and well-documented. You can learn about our API by attending one of our API workshops for free (e.g.: 1-3 December – Univ. Cambridge, UK). We are currently trying to smooth things out even more, working on ways to make it even easier to download all that’s needed to use the API and have the example scripts running on your computer with the minimum number of steps. Stay tuned for news in this respect soon…