The Ensembl team has been involved in several activities in Hyderabad (India) during the last few days, making the most of HUGO’s 13th Human Genome Meeting (HGM2008).

A satellite workshop was organised within the Open Door Workshop framework at the Centre for Cellular and Molecular Biology (CCMB). Over 40 scientists from different countries had the opportunity to learn about a range of resources freely available on the Internet, and they gave us useful feedback.

While staffing the EBI booth at HGM2008, we made several contacts that should, we hope, allow us to organise a series of workshops around India next year. If you are interested in learning more about this, or would like to host one of our workshops, please contact us.

Greetings from India भारत से नमस्ते

As usual October is a busy month for the Ensembl trainers with workshops on 4(!) different continents.

From 1-3 Oct Ensembl will feature in the Wellcome Trust Open Door Workshop “Working with the Human Genome Sequence” in Hyderabad, India, and from 6-8 Oct in the EBI hands-on workshop “A two-day dip into the EBI’s data resources: Understanding your data” in Hinxton, UK.

Upcoming browser workshops:
9-10 Oct: J. Craig Venter Institute, Rockville, MD, US
14 Oct: National Human Genome Research Institute (NHGRI), Bethesda, MD, US
15 Oct: National Human Genome Research Institute (NHGRI), Bethesda, MD, US
16-17 Oct: University of the Free State, Bloemfontein, South Africa
20-21 Oct: University of the Witwatersrand, Johannesburg, South Africa
22 Oct: University of Nottingham, Nottingham, UK
23-24 Oct: University of the Western Cape, Cape Town, South Africa
29-30 Oct: EBI Roadshow, Dublin, Ireland

To find out where we are going after October, have a look at the complete list of upcoming training events.

Considering hosting an Ensembl workshop yourself? Please contact Xose Fernandez.

Steve posted the news that we’re delaying our new release for at least two more weeks. His message is reproduced below:

Hi all

In our Intentions Summary mail for release 51 we stated that the release was scheduled for early/mid September. The 51 release will include significant updates and improvements to the web interface. We are delaying release while we complete development on these. We are working to get the release out as soon as possible, and are now aiming for end September/early October. I apologise for this delay.

Steve

Dr Steve Searle
Ensembl Project Leader, Sanger

It is always frustrating to delay, but of course it is far more important to have a working site than one that only partly works. Welcome to delivering high-end services.

We took on a lot of changes in this web refresh. The main thing most users will notice is the entirely new web layout. This was driven by our user surveys, in which people mainly complained about being buried under too many displays and too much data. We then spent around four months working with user groups and trialling different layouts (many thanks to those who participated), which in some cases led to significant changes to our original designs. We now have a hybrid “tab and left-hand-side” approach, voted best by about 60% of participants, with the other three options splitting the rest of the vote. We’re very excited about this new layout going live, as it looks cleaner and less cluttered and yet provides more information. The other thing people will notice is that it is simply faster. As the saying goes, you can’t be too rich, too thin, or have your websites go too fast.

Making a website go faster is harder than it might look. It involves all sorts of things: the bandwidth between your machine and us, the speed of the servers, the connectivity of the servers to the databases, the speed of the API, the speed from database to disk, the management of the huge number of simultaneous users we have, the size of the HTML returned and, finally, the rendering speed of your browser. All of these contribute to the overall perception of “speed”.

Under the hood we’ve been working on all of these aspects. Internally, a big change is that our web farm no longer needs a common file system to work off. Previously, when your browser asked for a ContigView page, our servers generated HTML containing an image tag and wrote the image itself to the common disk. Your browser then parsed the image tag and requested the image, and (this is the critical bit) that request would in all likelihood be served by a different server in our web farm, which had to go to the common file system to pick up the file and send it back. Reads and writes on this shared file system have often been a critical bottleneck. In the new system all of this has gone: the images are stored in a memory-based common store. This removes the bottleneck (which will be the first big effect) and also lets us cache a lot more; the hope is that many of the identical pictures for the common species will be served entirely from memory. Another important change has been aggressively slimming down our HTML. Currently, each page load pings all sorts of files, often very small ones, just to see whether they have changed. We’ve consolidated a lot of these files, compressed them, and optimised them for rendering speed.
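The shared image store described above can be sketched in a few lines. This is a minimal illustration, assuming a plain Python dict in place of the real memory-based store (the post later mentions memcachedb being stress-tested, but no client library is used here). The classes `SharedImageStore` and `WebServer`, the `/img/<key>.png` URL scheme and the cache key built from species and region alone are all hypothetical, not Ensembl code.

```python
import hashlib


class SharedImageStore:
    """Hypothetical stand-in for a memory-based store shared by every web server.

    A real deployment would use a networked cache; a dict keeps this sketch self-contained.
    """

    def __init__(self):
        self._images = {}

    def get(self, key):
        return self._images.get(key)

    def set(self, key, png_bytes):
        self._images[key] = png_bytes

    def __len__(self):
        return len(self._images)


def render_png(species, region):
    """Placeholder for the expensive drawing step (returns fake PNG bytes)."""
    return f"PNG<{species}:{region}>".encode()


class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server in the farm shares the same store

    def page_html(self, species, region):
        """Build the page; draw the image only if no server has drawn it already."""
        # Assumption: the picture depends only on species and region. A real key
        # would also have to include the user's track configuration.
        key = hashlib.sha1(f"{species}|{region}".encode()).hexdigest()
        if self.store.get(key) is None:
            self.store.set(key, render_png(species, region))
        return f'<img src="/img/{key}.png">', key

    def serve_image(self, key):
        """Any server can answer the follow-up image request: no shared disk needed."""
        return self.store.get(key)


store = SharedImageStore()
server_a, server_b = WebServer("a", store), WebServer("b", store)

html, key = server_a.page_html("Homo_sapiens", "1:1-100000")
png = server_b.serve_image(key)                    # a different server returns the picture
server_b.page_html("Homo_sapiens", "1:1-100000")   # identical request: cache hit, no redraw
print(len(store), png)
```

The point of the design is that the image request no longer cares which server handles it, and because identical requests map to the same key, a popular view is drawn once and then served from memory.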

Several other speed improvements are not in this release but are coming at the end of 2008 or early 2009. Our API has a new concept, collections, which better handles zoomed-out views where we know the renderer will not be able to draw every object. Instead, a collection will be provided, which may be rendered as a union, a density plot or something similar. The other thing on the horizon is setting up a US mirror on the west coast. For the last year we have been extensively monitoring the speed of Ensembl from different sites, and retrieval times are much longer from the north-west coast of the US. We’ve been investigating why this is (and learning a lot more about the backbone of the internet than we knew before), but it seems that the simplest way to get good speed on the west coast is just to run a mirror there. That will probably go live in 2009.
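The post does not describe the collections API itself, so the following is only a generic sketch of the idea: when a zoomed-out region holds more features than can sensibly be drawn, summarise them as per-bin counts (a density) instead. The function names, the `max_drawable` threshold of 1,000 and the 100-bin default are hypothetical choices, not part of the Ensembl API.

```python
def to_density(starts, region_start, region_end, n_bins):
    """Count feature start positions into n_bins equal-width bins across the region."""
    width = (region_end - region_start + 1) / n_bins
    counts = [0] * n_bins
    for start in starts:
        if region_start <= start <= region_end:
            # min() guards against floating-point rounding pushing a start past the last bin
            counts[min(int((start - region_start) / width), n_bins - 1)] += 1
    return counts


def features_for_view(features, region_start, region_end, max_drawable=1000, n_bins=100):
    """Return ("features", features) if every object can be drawn,
    otherwise ("density", counts) summarising them as a collection."""
    if len(features) <= max_drawable:
        return "features", features
    starts = [start for start, _end in features]
    return "density", to_density(starts, region_start, region_end, n_bins)


# 2,000 small (start, end) features across a 20 kb region: too many to draw individually.
dense = [(i * 10 + 1, i * 10 + 5) for i in range(2000)]
kind, counts = features_for_view(dense, 1, 20000)
print(kind, len(counts), sum(counts))

# A handful of features is still drawn one by one.
kind_small, few = features_for_view(dense[:3], 1, 20000)
print(kind_small, few)
```

Binning by start coordinate is a simplification; a renderer could equally count overlaps or merge features into a union, as the paragraph above suggests.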

Back to the website. It looks so much better, and has much better hardware characteristics (our shared file system is … well … rather 2004 technology and needs pretty constant care at the moment), that I can’t wait until it comes out. But there is absolutely no point in launching a site with crippled functionality, even though we’ve got many of the user interface and technical issues right. The sticking point at the moment is the configuration panel. This comes up as a “modal” box on top of the page, offering a lot of options without presenting a bewildering set of them on each page. To cope with the 200-odd different tracks that can be switched on and off, the box has to have tabs and friendly, browsable hierarchies. Getting all of this to work in a nice, friendly, slick way takes a lot of JavaScript.
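The real configuration panel is JavaScript running in the browser, and its internals are not described here. Purely to illustrate the “tabs and browsable hierarchies” idea, here is a small Python sketch of a track tree in which switching a tab or group on or off cascades to every track beneath it, after which single tracks can be refined. The `TrackNode` class, the track names and their grouping are hypothetical, not Ensembl’s configuration API.

```python
from dataclasses import dataclass, field


@dataclass
class TrackNode:
    """A tab, a group of tracks, or (if it has no children) a single track."""
    name: str
    enabled: bool = False
    children: list = field(default_factory=list)

    def set_enabled(self, value):
        """Switching a tab or group on or off cascades to every track beneath it."""
        self.enabled = value
        for child in self.children:
            child.set_enabled(value)

    def find(self, *path):
        """Walk down the hierarchy by name, e.g. find("Variation", "SNPs")."""
        node = self
        for name in path:
            node = next(c for c in node.children if c.name == name)
        return node

    def enabled_tracks(self, prefix=""):
        """List slash-separated paths of all enabled leaf tracks."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            return [here] if self.enabled else []
        return [t for c in self.children for t in c.enabled_tracks(here)]


config = TrackNode("Tracks", children=[
    TrackNode("Genes", children=[TrackNode("Ensembl genes", enabled=True),
                                 TrackNode("Havana genes")]),
    TrackNode("Variation", children=[TrackNode("SNPs"), TrackNode("Indels")]),
])

config.find("Variation").set_enabled(True)             # one click on a whole group
config.find("Variation", "Indels").set_enabled(False)  # then refine a single track
print(config.enabled_tracks())
```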

And a lot of JavaScript means a lot of browser-compatibility headaches. Even using JS libraries (Prototype and script.aculo.us, I think; James Smith can tell you the details!), there are all sorts of details that might not work in quite the same way on IE5 as on IE6. Or Firefox. Or Safari. And the site must still work, at least functionally, without JS. And of course it must work, and render fast. This modal box is the last complex thing to get sorted.

We’re close. I’ve seen the box come up on James’ screen. I hear Steve has seen it come up, seen tracks change, and seen the display follow the track settings. The API for the configuration system has been gutted and is much better. But it’s got to work on all the main browsers, for all our genomes, in particular human and mouse. And this is just tricky, fiddly work.

We’re not quite there yet. We’re really close, and so much is working that it is just excruciating. But we need another couple of weeks. James is being shielded from other jobs by Steve and others; Eugene is torture-testing memcachedb to stress the system before it goes live; Xose, Bert and Giulietta are writing help; Beth and Anne are writing the additional pagelets inside the new GeneView and TranscriptView. And it all looks really good.

So, apologies. We thought we’d be launching in July. We thought we’d be launching in September. We still might just do that, but then again, it might well be October. If it goes any later I will have no hair left.

But it does look really good.

It is definitely worth the wait. Like Guinness.

Ewan

After the summer break we are getting back up to speed with our training events:

14-16 Sep: Ensembl User Meeting, Hinxton, UK

17-19 Sep: Browser workshops and presentations, Erasmus MC Molecular Medicine Postgraduate School, Rotterdam, The Netherlands

22 Sep: Browser workshop, VIB Flanders Interuniversity Institute of Biotechnology, Antwerp, Belgium

We also have a complete list of all upcoming training events for the coming months. Are we not coming to a location close to you? Then why not host an Ensembl workshop yourself? For more details, please contact Xose Fernandez.

Ensembl has begun to incorporate data from genome-wide association studies (GWAS). These data are being added in coordination with the European Genotype Archive (EGA), a new database resource at the EBI designed to provide a permanent archive for human variation data that cannot be released without restriction because of ethical or individual privacy constraints. The EGA recently launched with the raw data from the Wellcome Trust Case Control Consortium (WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661-678). In the future the EGA will provide additional array-based genotype data as well as data from re-sequencing and CNV (copy number variation) studies. It will also contain phenotype data.

Ensembl is incorporating summary data from the genome-wide association studies represented in the EGA. These data generally consist of the p-value for the association of each tested SNP (single nucleotide polymorphism) with the given phenotype.

The WTCCC summary data are now available in Ensembl as DAS tracks, selectable from the “DAS Sources” menu on the CytoView and ContigView pages. The following menu items provide access to data for bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), type 1 diabetes (T1D) and type 2 diabetes (T2D):

WTCCC BD
WTCCC CAD
WTCCC CD
WTCCC HT
WTCCC T1D
WTCCC T2D
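For readers who want to pull such scores into their own scripts, here is a sketch of building a DAS features request and parsing the response. It assumes the DAS 1.5 conventions (a `.../das/<source>/features?segment=<ref>:<start>,<stop>` request returning DASGFF XML with FEATURE, START, END and SCORE elements) and, as an unverified assumption, that each WTCCC track carries the association p-value in SCORE. The server URL, source name, feature IDs and values below are invented for illustration and are not real WTCCC results.

```python
import math
import xml.etree.ElementTree as ET


def features_url(server, source, ref, start, stop):
    """Build a DAS 'features' request for one genomic segment."""
    return f"{server}/das/{source}/features?segment={ref}:{start},{stop}"


def association_scores(dasgff_xml):
    """Return (feature_id, start, p_value, -log10 p) for every FEATURE in the response.

    Assumes SCORE holds the association p-value for the SNP.
    """
    root = ET.fromstring(dasgff_xml)
    rows = []
    for feature in root.iter("FEATURE"):
        p_value = float(feature.findtext("SCORE"))
        start = int(feature.findtext("START"))
        rows.append((feature.get("id"), start, p_value, -math.log10(p_value)))
    return rows


# Invented example response (not real data).
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/wtccc_example/features">
    <SEGMENT id="1" start="1000" stop="5000">
      <FEATURE id="snp_A">
        <TYPE id="association">association</TYPE>
        <START>1200</START><END>1200</END><SCORE>2e-7</SCORE>
      </FEATURE>
      <FEATURE id="snp_B">
        <TYPE id="association">association</TYPE>
        <START>4300</START><END>4300</END><SCORE>0.03</SCORE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

url = features_url("http://das.example.org", "wtccc_example", "1", 1000, 5000)
rows = association_scores(SAMPLE)
for fid, start, p, score in rows:
    print(fid, start, p, round(score, 2))
```

Plotting -log10 p rather than p itself makes the strongest associations the tallest peaks along the chromosome.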

In future releases, GWAS data will be integrated into the Ensembl variation databases.

We will be adding additional data to both Ensembl and the European Genotype Archive as the data become available. We hope you find these new data resources useful.

Ensembl is currently migrating to new hardware in conjunction with the development of new web code for the next release (due in late September). During this period, technical issues may cause some downtime on our website. We apologise for any problems this may cause, and we are working to minimise the impact on Ensembl users.

The Ensembl Team

The Ensembl team has recently run a series of workshops in China:

  • In Shanghai, we were at Tongji University for a workshop organised by the Shanghai Center for Bioinformation Technology.
  • In Beijing, we were hosted by Professor Jingchu Luo from the Center of Bioinformatics at Peking University, where we also delivered some lectures in the Applied Bioinformatics Course.

Following this experience, and given the success of the tour, we are planning to go back to China. If you are interested in hosting a workshop, or you collaborate with a Chinese group that might be interested in learning more about Ensembl, please contact us to discuss dates. We are trying to coordinate our next trip with different hosts.

The Ensembl Team

Ensembl announces a workshop for developers that will take place at the Wellcome Trust Genome Campus in Hinxton (near Cambridge, UK) next September (14th-16th September), following the Genome Informatics meeting.

In this workshop we will be exploring Ensembl beyond the website. Participants will be expected to have experience in writing Perl programs and a background in object-oriented programming techniques. Familiarity with databases (MySQL) and the Ensembl APIs would be an advantage.

Several Ensembl developers will present uses of our APIs (Application Programming Interfaces) as well as extensions of the Ensembl system. Note that this is not a course about how to use the Ensembl APIs.

At the end of this course, attendees will:

• have a good understanding of Ensembl’s annotation pipeline;
• know how to customise a local installation of the Ensembl website; and
• have hands-on experience with the annotation pipeline.

In late 2008, the Ensembl Genomes project at the EBI will leverage the Ensembl system to create consistent genome annotation resources for a wide variety of eukaryotic and prokaryotic genomes, thereby continuing the activities of the current EBI Integr8 and Genome Reviews projects.

The workshop will therefore include a session introducing and previewing the new divisions of Ensembl, where their initial data content and future directions will be discussed.

There is no registration fee for this course, but you may need accommodation (to arrange this, or to extend your stay at Hinxton Hall, contact info@wtconference.org.uk). If you are planning to attend or would like more information, please send an email to xose@ebi.ac.uk.

The Ensembl Team

In a previous post I promised to do some more genome browser screenshot counting. So that is what I did last week at the XX International Congress of Genetics 2008 in Berlin. I limited myself to the second poster session of the conference, which should have contained 675 posters. To my surprise, a large number were missing, so I estimate that I looked at somewhere between 400 and 500. Compared to the Barcelona conference the result was poor: I identified only 4 posters with Ensembl screenshots, 4 posters with UCSC Genome Browser screenshots and none with NCBI Map Viewer screenshots. So, based on the combined results from two genetics conferences, it seems that the Ensembl and UCSC browsers are about equally popular amongst poster-making geneticists.
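For scale, a quick back-of-the-envelope calculation using only the figures above (8 posters with Ensembl or UCSC screenshots, and the rough estimate of 400 to 500 posters actually on display) gives the share of posters showing either browser:

$$
\frac{4 + 4}{500} = 1.6\% \qquad \text{to} \qquad \frac{4 + 4}{400} = 2.0\%
$$

In other words, at most about one poster in fifty carried a screenshot from either browser.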

However, I had expected more genome browser screenshots in general. What could be the reason for these low numbers? Is there no need for screenshots at all? Or can people not get what they need for their posters from Ensembl or UCSC? We are curious about your thoughts and views on this, and we welcome any suggestions for improvements to Ensembl that would make preparing figures for your poster (or publication) more of a breeze!

The next release (50) will happen in just under a week’s time. It will retain the old (classic) look, with the Ensembl interface you are all used to! The new interface will be released in August as a publicly accessible beta testing site alongside our usual Ensembl, to make sure everything is running smoothly. This will give us time to collect your feedback on the new interface before we switch over to it completely in release 51 (due in September).

What can you expect in release 50?

A new gene set for human, in which UTRs (untranslated regions) are based on ditags. An improved merge between the new human Ensembl gene set and the latest manually annotated gene set from Havana will be available. There are also new gene sets for tetraodon (genes from the Ensembl pipeline along with other genes from the Genoscope set) and C. elegans (WS190), and projections of the new human gene set onto pika and cat.

Cow has a new assembly and gene set! The Ensembl automated pipeline was run on Btau 4.0 for this release.

New variation sets will be available for orangutan, tetraodon, cow and human.

We will keep you posted about the new interface, beta testing surveys, and upcoming organisms and annotation in release 51.

Thanks to all our users.