Ensembl has begun to incorporate data from genome-wide association studies. These data are being added in coordination with the European Genotype Archive, a new database resource at the EBI designed to provide a permanent archive for human variation data that is not available for unlimited public release because of ethical or individual privacy restrictions. The European Genotype Archive has recently launched with the raw data from the Wellcome Trust Case Control Consortium (WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661-678). In the future the EGA will provide additional array-based genotype data as well as data from re-sequencing and CNV studies. The EGA will also contain phenotype data.

Ensembl is incorporating summary data from genome-wide association studies represented in the EGA. The data generally represent the p-value of association between each tested SNP (single nucleotide polymorphism) and the given phenotype.
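As a rough illustration of what such summary data look like, the sketch below holds one p-value per SNP for a single phenotype and converts each to a -log10 score, a common way of displaying association signals along a chromosome. The SNP identifiers, p-values and function name are invented for the example; they do not come from the WTCCC data or from any Ensembl or EGA interface.

```python
import math

# Hypothetical summary records: (SNP identifier, p-value) for one phenotype.
# These values are made up purely for illustration.
summary = [
    ("snp_a", 0.5),
    ("snp_b", 1e-3),
    ("snp_c", 1e-8),
]


def association_scores(records):
    """Return (snp, -log10(p)) pairs, strongest association first.

    Records whose p-value is not in the interval (0, 1] are skipped.
    """
    scored = [(snp, -math.log10(p)) for snp, p in records if 0 < p <= 1]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for snp, score in association_scores(summary):
    print(f"{snp}\t{score:.2f}")
```

Small p-values become large scores, so the SNPs most strongly associated with the phenotype stand out as peaks when the scores are drawn as a track.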

The WTCCC summary data are now available on Ensembl as DAS tracks selectable from the “DAS Sources” menu on the CytoView and ContigView pages. The following menu items provide access to data for bipolar disorder (BD), coronary artery disease (CAD), Crohn's disease (CD), hypertension (HT), type 1 diabetes (T1D) and type 2 diabetes (T2D):

WTCCC BD
WTCCC CAD
WTCCC CD
WTCCC HT
WTCCC T1D
WTCCC T2D

In future releases, GWAS data will be integrated into the Ensembl variation databases.

We will be adding additional data to both Ensembl and the European Genotype Archive as the data become available. We hope you find these new data resources useful.

Ensembl is currently migrating to new hardware in conjunction with the development of new webcode for the next release (due in late September). During this period, and due to some technical issues, there might be some downtime of our website. We apologise for any problems this may cause and we are working to minimise the impact on Ensembl.

The Ensembl Team

The Ensembl team has recently run a series of workshops in China:

  • In Shanghai we were at Tongji University for a workshop organised by the Shanghai Center for Bioinformation Technology, and
  • In Beijing, we were hosted by Professor Jingchu Luo from the Center of Bioinformatics at Peking University where we also delivered some lectures in the Applied Bioinformatics Course.
  • Following this experience and due to the success of the tour, we are planning to go back to China. If you are interested in hosting a workshop, or you collaborate with a Chinese group who might be interested in knowing more about Ensembl, please contact us to discuss dates. We are trying to coordinate our next trip with different hosts.

    The Ensembl Team

    Ensembl announces a workshop for developers that will take place at the Wellcome Trust Genome Campus in Hinxton (near Cambridge, UK) next September (14th-16th September), following the Genome Informatics meeting.

    In this workshop we will be exploring Ensembl beyond the website. Participants will be expected to have experience in writing Perl programs and a background in object oriented programming techniques. Being familiar with databases (MySQL) and the Ensembl APIs would be an advantage.

    Several Ensembl developers will present uses of our APIs (Application Programming Interfaces) as well as extensions of the Ensembl system. Note this is not a course about how to use the Ensembl APIs.

    At the end of this course, attendees will:

    • have a good understanding of Ensembl’s annotation pipeline;
    • know how to customise a local installation of the Ensembl website;
    • and have hands-on experience with the annotation pipeline.

    In late 2008, the Ensembl Genomes project at the EBI will leverage the Ensembl system to create consistent genome annotation resources focused on a wide variety of eukaryote as well as prokaryote genomes, and thereby continue the activities of the current EBI Integr8 and Genome Reviews projects.

    Thus, there will be a session where the new divisions of Ensembl will be introduced and previewed; the initial data content and future directions will be discussed.

    There is no registration fee to attend this course, but you may need accommodation (or to extend your stay in Hinxton Hall: info@wtconference.org.uk). If you are planning to attend or would like more information, please send an email to xose@ebi.ac.uk.

    The Ensembl Team

    In a previous post I promised to do some more genome browser screenshot counting. So, that is what I did last week at the XX International Congress of Genetics 2008 in Berlin. I limited myself to the second poster session of the conference, which should have contained 675 posters. To my surprise a large number were missing, though, so I estimate that the number I looked at was somewhere between 400 and 500. Compared to the Barcelona conference the result was poor; I identified only 4 posters with Ensembl screenshots as well as 4 posters with UCSC Genome Browser screenshots and none with NCBI Map Viewer screenshots. So, based on the combined results from two genetics conferences, it seems that the Ensembl and UCSC browsers are about equally popular amongst poster-making geneticists.

    However, I had expected more genome browser screenshots in general. What can be the reason for these low numbers? Is there no need for screenshots at all? Or can people not get what they need for their poster from Ensembl or UCSC? We are curious about your thoughts and views on this and welcome any suggestions for improvements to Ensembl that will make preparing figures for your poster (or publication) more of a breeze!

    The next release (50) will happen in just under a week’s time. This will retain the old (classic) look, with the Ensembl interface you are all used to! The new interface will be released in August as a publicly accessible beta testing site alongside our usual Ensembl, in order to make sure everything is running smoothly. This will give us time to collect your feedback about the new interface before we switch over completely in release 51 (due in September).

    What can you expect in release 50?

    A new gene set for human, where UTRs (UnTranslated Regions) are based on ditags. An improved merge between the new human Ensembl gene set and the latest manually annotated gene set from Havana will be available. Also, new gene sets for tetraodon (genes from the Ensembl pipeline along with other genes from the Genoscope set) and C. elegans (WS190), and a projection of the new human set against pika and cat.

    Cow has a new assembly and gene set! The Ensembl automated pipeline was run on Btau 4.0 for this release.

    New variation sets will be available for orangutan, tetraodon, cow and human.

    We will keep you posted about the new interface, beta testing surveys, and upcoming organisms and annotation in release 51.

    Thanks to all our users.

    I was thinking about the web design process for e50 – our new web interface due out in July (definitely late July). We’re at the stage now where Fiona is going to be asking users their preferences for all the “little things” which make no difference to technical aspects of the web site but make a pretty big difference to the usability. Like, for example, how do we colour our genes? This is a long-standing debate where everyone has an opinion and everyone’s opinion is right – at least for them. (Only 2 colours, and the colours should distinguish manually annotated genes from automatic, says one person. No – use the whole spectrum of colours, and make sure we distinguish non-coding RNA genes from pseudogenes from protein coding genes and indicate which ones have orthologs – to mouse. No – to rat. No – instead of that use GO functional categories to colour genes. Or the number of non-coding SNPs. Or the gene-wide omega value from the dN/dS measurement.)

    Sometimes people look at this debate and say that this is a clear area for user-defined colours. Which is sort of true for 10 seconds, but – not really. Firstly, most users are not going to get around to changing options – partly because they have better things to do (like design experiments and run them!), partly because this sort of configuration is just a bit too geeky and partly because, to be honest, if they are into configuring things we’d like them first off to work out which tracks they would like displayed (more on this below), and colouring genes should be low on their list. Secondly, we want to provide a scheme which feels natural to the largest number of people. Hence the rather long series of options currently being proposed.

    The same argument goes for default tracks. (I can’t imagine not having SNPs on my display! I can’t imagine not having the ESTs switched on!) Everyone has an opinion and everyone is right. Here it is clear we’ve got to make sensible default decisions (which are also heavily, heavily speed optimised – sadly the new Collections framework won’t be ready for SNPs for 50, which is annoying, as really we want SNP density these days in human, but all the other obvious default tracks are pretty well optimised, including some funky scaling stuff to get the continuous basepair comparative genomics measure to come back sensibly when you are zoomed out). But then our main task is to get the user to explore, in the spirit of “wouldn’t it be nice to see xxxx, I wonder if Ensembl has it”, with a configuration system which is very enticing, but not in the way, and importantly for the non-expert user, not completely overwhelming. In our e50 design this means more hierarchy in the options so they can be grouped (itself a bit of a pain to handle – we’ve got a lot of tracks), and a nice “light box” effect over the display which reassures you that (a) the thing that you were looking at won’t disappear and (b) the display will come back quickly. I think we’re on the right path here for the configuration, but we still have to decide on the default tracks (for me the only obvious one is “Genes”).

    Finally we’ve got the mundane business of which words to use for each of our “pagelet” displays. (Our new pagelets are very nice, and in our latest round of testing, >50% of the in-the-lab biologists liked not only the pagelets, but a specific layout of them. Fewer than 10% preferred the current Ensembl display.) So – we need one or two words to describe “A graphical representation of a phylogenetic tree of a gene with duplication nodes marked”. Hmmm. “Gene Tree”? Or “Phylogenetic Tree”? (Phylogenetic is a bit of a long word, and might get in the way of the menu…) What about “a text based alignment of resequenced individuals with the potential to mark up some features of interest”? Is this “resequencing alignment” or “individual alignment” or “individuals”?

    If you’d like to take part in this, email survey@ebi.ac.uk (perhaps cc’d to Xose – xose@ebi.ac.uk) to make sure you are on our list. Ideally we’d like you to be wet-lab biologists. We have a lot of in-house or near-in-house opinions from bioinformaticians, and in any case, bioinformaticians are happier to explore configurations etc. It’s the researcher who will be visiting us – say – once or twice a month who we think is the main user to optimise for (again, more frequent users, we hope, will explore configuration to match things perfectly for them).

    More on other e50 topics soon – speed, the importance of chocolate in bribing web developers and the end game for e50!


    Development for the new Ensembl 50 website is progressing well… some of you may have already seen the test sites when you signed up to be part of our testing team…

    One of the complaints about the current site (hardware failures aside) is the performance of the webpages – we are addressing this in a number of ways in the Ensembl 50 web code.

    • Tuning the Apache web server configuration:
      Compressing all HTML/Javascript/CSS files using mod_deflate;
      Minimizing the number and size of Javascript/CSS files by stripping unnecessary white space and comments from the files and merging them together;
      Setting headers to improve the browser’s caching of content.
    • Aggressively caching content on the server side using a modified version of memcached (Linux users will need a 2.6.x kernel, as it uses epoll).
    • Increasing the use of asynchronous HTTP requests (AJAX) to allow more immediate responses for the page while other content is generated, and to minimize the content that is sent (initially hidden content can be retrieved later).
    • Reducing page size: rather than single pages containing lots of disparate information, having more pages each containing smaller amounts of information. This doesn’t just help with page size, it also increases the discoverability of content on the site that people do not find easily, especially comparative genomics, variational genomics and regulatory information.
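To give a feel for the file-level optimisations listed above (stripping comments and white space, merging files, then compressing the result), here is a minimal sketch in Python. It is not the Ensembl web code: the function names and the two tiny stylesheets are invented for illustration, the white-space stripping is deliberately naive (it would mangle white space inside quoted CSS strings), and `zlib` merely stands in for the DEFLATE compression that mod_deflate applies on the server.

```python
import re
import zlib


def strip_css(text):
    """Remove /* ... */ comments and collapse runs of white space (naive)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"\s+", " ", text).strip()


def bundle(files):
    """Merge several stylesheets into a single minified string."""
    return "\n".join(strip_css(body) for body in files)


# Two small, made-up stylesheets standing in for real site files.
sheets = [
    "/* layout */\nbody {\n    margin: 0;\n    padding: 0;\n}\n",
    "/* tracks */\n.track {\n    color: #333;\n    margin: 0;\n}\n",
]

merged = bundle(sheets)
# zlib produces a DEFLATE stream; a web server would do this per response.
compressed = zlib.compress(merged.encode("utf-8"))

print("original bytes:  ", len("".join(sheets)))
print("merged bytes:    ", len(merged))
print("compressed bytes:", len(compressed))
```

Merging reduces the number of requests a browser has to make, while stripping and compression reduce the bytes sent per request. On files this small the compression step gains little or nothing; the benefit shows up on full-sized scripts and stylesheets.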

    For those who will be implementing local copies of Ensembl 50 code – additionally Ensembl 50 code will:

    • Make configuration easier – the pages will configure most of the tracks directly from the contents of the databases;
    • Make code more pluggable:
      ConfigPacker – the SpeciesDefs database parsing; and
      ImageConfig – replacement for UserConfig;
    • Make caching and AJAX implementation easier.

    There are a number of changes to the code, so if you have written your own components or track drawing code there will be work to be done, but in most cases these modifications are easy to implement (e.g. moving code between modules).

    Finally, here are some additional system recommendations:

    • Perl 5.8.8 or newer;
    • MySQL 5.0 server;
    • 64 bit architecture;
    • large memory machine;
    • you can compile our modified “memcached” code (e.g. for Linux you will need a 2.6.x kernel) to get a significant speed-up.
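A few of these recommendations can be checked quickly on a candidate machine. The sketch below is only an illustration with made-up helper names, and it assumes Python is available on the host. It tests the kernel release string, whether the running interpreter is a 64-bit build, and whether epoll is exposed. It does not check the Perl or MySQL versions, and it is not part of the Ensembl installation procedure.

```python
import platform
import re
import select
import sys


def kernel_at_least(release, minimum=(2, 6)):
    """Compare the leading 'major.minor' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False  # unrecognised release string
    return (int(match.group(1)), int(match.group(2))) >= minimum


def is_64_bit():
    """True when the running Python interpreter is a 64-bit build."""
    return sys.maxsize > 2**32


def has_epoll():
    """True when the select module exposes epoll (Linux only)."""
    return hasattr(select, "epoll")


if __name__ == "__main__":
    # platform.release() is only meaningful for this check on Linux.
    print("kernel 2.6 or newer:", kernel_at_least(platform.release()))
    print("64-bit interpreter: ", is_64_bit())
    print("epoll available:    ", has_epoll())
```

For example, `kernel_at_least("2.6.18-92.el5")` is true while `kernel_at_least("2.4.37")` is false.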

    For the past two days, Ensembl has been slow or has not returned the page (instead offering an ‘Ensembl is down’ yellow screen).

    Be assured we are working on the problem. It is a hardware issue, but should be resolved soon.

    From all of us in the Ensembl team, thanks for your patience!

    As you know, we are working on a new website design for the Ensembl 50 release. We are currently seeking ‘beta testers’ who would be happy to take part in a survey and help us shape the look and feel of the new website.

    If you can spare some time, we would be very grateful if you could send an email to survey@ebi.ac.uk so we can add you to our list of testers.

    We are looking forward to hearing from you.
    The Ensembl Team