I am writing in my capacity as leader of the Ensembl project at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), based near Cambridge, England. Ensembl is one of the world’s leading sources of genome information and a central aggregation point for genomic data.


Please note that the archive websites for Ensembl releases 71 (April 2013) and 72 (June 2013) will be retired in July when version 85 is released.

This is in accordance with our rolling retirement policy, whereby archives more than three years old are retired unless they include the last instance of the previous assembly from one of our key species (human, mouse and zebrafish).

For more information about how to use archives, please see our previous blog post on the topic; a list of all current archives is available on the main website.

A mysteriously common debilitating genetic disorder. A deadly tropical disease. One of my favourite stories in the history of genetics weaves together these two elements – it’s a good one and it always deserves a re-telling – that of malaria and sickle cell anaemia.

This story captures my attention and inspires me with the power of scientific observation, curiosity and experiment. I’m sure you are all aware of the details of this well-worn tale: it is used as an example in classrooms and lecture theatres every year to explain Mendelian genetics, haploinsufficiency, physiology, disease and protein structure and function to young scientists. To mark the coincidence of DNA Day and Malaria Day, we wanted to revisit this ‘historical’ example of how scientific observation and experimental approaches have led to an understanding of how a disease as debilitating as sickle cell anaemia paradoxically persists in the human population.

Molecular biology and bioinformatics have transformed the face of biological research over the last few decades. The speed at which scientists can sequence and analyse DNA means that global collaborations studying thousands of individuals are beginning to shed light on a range of different diseases.

Sickle-cell anaemia is a disease in which red blood cells form an abnormal crescent (or sickle) shape. It is an inherited disorder, and was the first ever to be attributed to a specific genetic variant (rs334, see it here in Ensembl).

rs334_info

In 1949, Pauling et al., in ‘Sickle Cell Anaemia, a Molecular Disease’, identified a difference in electrophoretic mobility between haemoglobin from healthy individuals and haemoglobin from those with sickle-cell anaemia, and attributed it to a change in the molecular structure of haemoglobin that is responsible for the sickling process [1]. The genetic variant (A; reference: T) that causes cell sickling results in the substitution of a conserved glutamic acid residue at position 7 in the beta chain of haemoglobin with a valine [2].

You can find this information in the Genes and regulation section for this variant. In the table below, which has been filtered to show only missense variants, the ‘Allele (transcript allele)’ column describes the variant allele (A) and the transcript allele (T, as the HBB gene is located on the reverse strand). You can also see the nature and location of the variant on the transcript in the ‘Position’, ‘Amino acid’ and ‘Codons’ columns. The SIFT and PolyPhen algorithms predict the effect of the amino acid change on protein structure and function. Interestingly, only the SIFT algorithm predicts that the T/A variant would have a deleterious effect on haemoglobin structure and function, a reminder that computational predictions are no substitute for experimental evidence.
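The relationship between the variant allele and the transcript allele can be sketched in a few lines of Python. This is only an illustration: the forward-strand alleles (T and A) come from the text above, while the codons GAG and GTG are stated from general knowledge of this variant rather than read from the table, and the function name is our own.

```python
# Base pairing used to move between the two DNA strands.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Only the two codons needed for this example, not a full genetic code table.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GTG": "Val"}


def transcript_allele(forward_allele):
    """Return the allele as read on a reverse-strand transcript."""
    return "".join(COMPLEMENT[base] for base in reversed(forward_allele))


# Forward-strand alleles for rs334, as given in the text.
reference_forward, variant_forward = "T", "A"

reference_on_transcript = transcript_allele(reference_forward)  # "A"
variant_on_transcript = transcript_allele(variant_forward)      # "T"

# Assumption: the variant sits at the middle base of the GAG codon.
reference_codon = "GAG"
variant_codon = reference_codon[0] + variant_on_transcript + reference_codon[2]

print(reference_codon, CODON_TO_AMINO_ACID[reference_codon], "->",
      variant_codon, CODON_TO_AMINO_ACID[variant_codon])
```

Because HBB is transcribed from the reverse strand, the A reported on the forward strand is read as a T in the transcript, which is what turns the glutamic acid codon into a valine codon.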

rs334_consequences

Only those individuals that are homozygous for the variant allele develop sickle cell anaemia, although heterozygous individuals do have the much more manageable sickle cell trait. If untreated, individuals with sickle cell anaemia have a shorter than normal life expectancy, experiencing lethargy and breathlessness throughout their lives, with increased risk of stroke and pulmonary hypertension, as well as increased vulnerability to infection. Individuals with the milder sickle cell trait can experience problems in low-oxygen conditions or as a result of severe physical exercise, but can mostly be expected to live normal lives.

As such, it would be expected that this variant would be rare in human populations. However, observations made in the mid-20th century revealed that this variant is, in fact, surprisingly common in African, African American and Caribbean populations (you can see this in the 1000 Genomes allele frequencies available under Population genetics in Ensembl). Coincidentally, these were people descended from those who came from areas where malaria is prevalent [3]. Why was this happening?

rs334_pop_genetics

Individuals carrying just one copy of the variant allele were known not to develop sickle cell anaemia and to lead largely normal lives. However, it was found that these same individuals were in fact highly protected against malaria. It turned out that, quite bizarrely, having two different alleles at this locus protected against infection by the malaria parasite while causing only entirely manageable sickle manifestations! Therefore, individuals with one copy of each allele have a greater chance of survival in geographical areas where malaria is endemic, preserving both alleles in the population.
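To see why a survival advantage for heterozygotes keeps both alleles in circulation, here is a minimal sketch of the textbook one-locus selection model. The fitness values are made up purely for illustration and are not estimates from any of the studies cited here: individuals with two reference alleles lose a fraction s of fitness to malaria, individuals with two variant alleles lose a fraction t to sickle cell anaemia, and heterozygotes are taken as the baseline.

```python
def next_sickle_frequency(q, s, t):
    """One generation of selection on the variant allele frequency q.

    Relative fitnesses (illustrative assumptions):
      two reference alleles: 1 - s   (susceptible to malaria)
      one of each allele:    1       (protected, manageable trait)
      two variant alleles:   1 - t   (sickle cell anaemia)
    """
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # Variant alleles contributed by heterozygotes and variant homozygotes.
    return (p * q + q * q * (1 - t)) / mean_fitness


def equilibrium_frequency(s, t):
    """Frequency at which both alleles have the same average fitness."""
    return s / (s + t)


def simulate(q0, s, t, generations):
    """Follow the variant allele frequency over a number of generations."""
    q = q0
    for _ in range(generations):
        q = next_sickle_frequency(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical selection coefficients
    print(simulate(0.01, s, t, 1000), equilibrium_frequency(s, t))
```

Under these assumptions the variant allele neither disappears nor takes over: starting from a low frequency it settles at s / (s + t), so the stronger the pressure from malaria relative to the cost of the anaemia, the more common the variant is expected to be.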

Understanding this relationship has led to a deeper understanding of the infective lifecycle of the malaria parasite and novel approaches in combating malaria [4-5], but also an appreciation of the genetic factors leading to sickle-cell anaemia.

This story exemplifies how observation, epidemiology and scientific investigation can uncover the mysteries of a human disease and provide important insights for its treatment. Nowadays, this gold standard of studying single genetic disorders has been multiplied and sped up on an unprecedented scale. There are now numerous projects aimed at sequencing the DNA of many individuals with different diseases and using the power of bioinformatics to analyse how genetic variation might lie at the foundation of previously poorly understood diseases.

[1] Pauling L. et al. Sickle cell anemia a molecular disease Science, 1949 Nov 25;110(2865):543-8

[2] Ingram VM et al. Abnormal human haemoglobins. III. The chemical difference between normal and sickle cell haemoglobins Biochim Biophys Acta 1959 36: 543–548

[3] Allison AC et al. Protection Afforded by Sickle-cell Trait Against Subtertian Malarial Infection 1954 Br Med J 1 (4857): 290–294

[4] Mounkaila A. et al. Sickle Cell Trait Protects Against Plasmodium falciparum Infection American Journal of Epidemiology, 2012 176 175-185

[5]  Gregory LaMonte et al. Translocation of Sickle Cell Erythrocyte MicroRNAs into Plasmodium falciparum Inhibits Parasite Translation and Contributes to Malaria Resistance Cell Host & Microbe, 2012 12 187-199

 

Please note that the archive website for Ensembl release 68 (Jul 2012) will be retired in September when version 82 is released.

This is in accordance with our rolling retirement policy, whereby archives more than three years old are retired unless they include the last instance of the previous assembly from one of our key species (human, mouse and zebrafish).

For more information about how to use archives, please see our previous blog post on the topic; a list of all current archives is available on the main website.

In line with EMBL-EBI policy, from the end of 2015 Ensembl will be removing support for DAS from our browser. This means that we will no longer provide our annotations over DAS and that we will not visualise third party annotation provided to us via DAS. If you have data with genomic coordinates that you wish to present in Ensembl then we recommend that you do this using TrackHubs. For annotation on other coordinate systems, we are currently working on providing support for this and will announce developments in this area over the course of the coming year. If you need more details then please get in touch with us at helpdesk@ensembl.org.

My recent trip to Malawi as part of a Wellcome Trust Open Door Workshop has reminded me how privileged I really am. I’m an Outreach Officer, which means that I have the privilege to travel out to institutes around the world to deliver free Ensembl workshops. Most of the time, these workshops are in Europe or the US, at fancy research institutes and universities, and it’s an awesome privilege to facilitate research at these institutes.

An even greater privilege is to be involved in the Open Door Workshops on Working with the Human Genome Sequence, organised by Wellcome Trust Advanced Courses, which travel to developing countries to teach. They’re called ‘Open Door’ because all the resources we teach in them are free and open on the web, which means anyone, anywhere, with nothing but an internet connection can use them. I teach the Ensembl section of the course, but we also cover other resources from the EBI, Sanger Institute, NCBI and elsewhere.

We hold these courses at Wellcome Trust research centres, such as the Malawi-Liverpool Wellcome Trust that I visited recently; these centres are fantastic investments by the Wellcome Trust in research around the world. Participants travel from all over the continent to attend the course; attendance is free (with selection) and the Wellcome Trust can even fund travel bursaries. It is a great privilege for me to be able to travel to these locations and to teach the participants all about Ensembl.

Group photograph

The group from the Open Door Workshop at the Malawi-Liverpool Wellcome Trust. Featuring instructors me (seated, second from left), Jane Loveland (Sanger Institute; seated, middle), Rob Finn (EBI; back row, far left), Charlie Steward (Sanger Institute; back row, middle) and Matt Clark (TGAC; back row, second from right). Photo by Heidi Hauser (Wellcome Trust Advanced Courses).

I am proud to present Ensembl to these workshop participants. Partly because I think it’s an amazing resource that can really facilitate research. Partly because we give it away for free, and I know this makes a huge difference to researchers whose labs are not well funded. Even in labs with £1 million grants, money is always tight, but many of the people who attend our workshops work in labs that struggle with knackered PCR machines, ghost equipment that they can’t afford to buy the reagents for, and a complete reliance on Open Access publishing because they can’t pay for journal subscriptions, yet they still manage to produce world-class science. If they had to choose between replacing those broken machines and a pay-per-use or subscription-only bioinformatics resource, it would really be a no-brainer. But giving them a free resource means they don’t have to make that choice. Indeed, it gives them the opportunity to carry out research that doesn’t need any expensive equipment or reagents.

The Wellcome Trust is one of the major funders of Ensembl. We are so grateful to them for allowing us to make our data freely available, so that everybody can make use of it. It really is a privilege.

The Ensembl Pre! site has been updated for four species: zebrafish (Danio rerio), rat (Rattus norvegicus), sperm whale (Physeter macrocephalus) and fugu (Takifugu rubripes).

Sperm whale is a new species to Ensembl. Our main site already displays earlier assemblies for fugu, zebrafish and rat.

Zebrafish

The zebrafish assembly, GRCz10 (GCA_000002035.3), was made available by The Genome Reference Consortium in September 2014. Since the previous release, Zv9 in July 2010, the GRC has taken over the task of improving and maintaining the zebrafish assembly. The most notable changes in the chromosome landscape since the previous release can be found on chromosome 4, which has gained about 15 Mb in length. Furthermore, 94 of the 112 previously unplaced contigs are now located on chromosomes. In total, this assembly consists of 26 chromosomes and 3,399 unplaced scaffolds. The full annotation of an older zebrafish assembly, Zv9, can be found on our main website. Click here to go to the zebrafish Pre! site, where you can view alignments of zebrafish UniProt proteins and human Ensembl translations, as well as gene models projected from the previous zebrafish assembly.

Rat

The new rat assembly, Rnor_6.0 (GCA_000001895.4), was produced by The Rat Genome Sequencing and Mapping Consortium and was released in July 2014. This assembly comprises 954 toplevel sequences, 22 of which are chromosomes (chromosome Y is a new addition in this assembly), and 1,395 of which are unplaced scaffolds. The full annotation of an older rat assembly, Rnor_5.0, can be found on our main website. Otherwise, click here to visit the rat Pre! site, where you can view alignments of rat UniProt proteins and human and mouse Ensembl translations, as well as gene models projected from the previous rat assembly.

Sperm Whale

The sperm whale assembly, PhyMac_2.0.2 (GCA_000472045.1), was produced in September 2013 by The Aquatic Genome Models Consortium. The assembly does not contain any assembled chromosomes or linkage groups and is instead made up of 11,711 unplaced scaffolds. The species is an important model for a number of human conditions such as respiratory disease, metal toxicity and cancer. For example, sperm whales exposed to high levels of chromium have no adverse health effects whereas humans do. Studying this species could lead to development of treatments for human chromium-related disorders. Click here to visit the sperm whale Pre! site, where you can view alignments of human and dolphin Ensembl translations.

Fugu

The fugu genome assembly, FUGU5 (GCA_000180615.2), was released in October 2011 by The Fugu Genome Sequencing Consortium. It is composed of 22 autosomal chromosomes, with a total sequence length of 391 Mb. The species was initially proposed as a useful model for annotating and understanding the human genome, as it contains a similar repertoire of genes to human yet is only roughly one-eighth of the size. It is among the smallest vertebrate genomes, and previous assemblies of this species have already shown themselves to be useful reference genomes for identifying genes and other functional elements in other vertebrate species. The full annotation of an older fugu assembly, FUGU 4.0, can be found on our main website. Click here to visit the fugu Pre! site, where you can view alignments of human and dolphin Ensembl translations.

Please note that the archive website for Ensembl release 65 (Dec 2011) will be retired in December when version 78 is released.

This is in accordance with our rolling retirement policy, whereby archives more than three years old are retired unless they include the last instance of the previous assembly from one of our key species (human, mouse and zebrafish).

For more information about how to use archives, please see our previous blog post on the topic; a list of all current archives is available on the main website.

You may have noticed our beta REST server has been retired. We have replaced it with our new service, http://rest.ensembl.org, and have a handy migration guide to help you update existing scripts. Details about the new server can be found in the article published in Bioinformatics. Some of the improvements include:

  • New POST endpoints
  • POST messages allow users to submit a list of inputs as a single request
    This is supported for the archive, lookup and vep endpoints
  • The rate limit has been increased, with up to 15 requests per second allowed
    Combined with POST, we were able to process 1000 variants per second!
  • New /variation endpoint to retrieve variation information linked to a gene or a transcript
  • New /regulatory endpoint to retrieve data from the regulatory build
  • HTTPS support for clients working with a secure environment


This server provides access to the latest data in Ensembl, including the new human build on the GRCh38 assembly. For those wishing to use data from the GRCh37 assembly, a dedicated server is available on http://grch37.rest.ensembl.org
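As a rough illustration of how the POST endpoints and the rate limit fit together, here is a minimal Python sketch that batches variant identifiers for the vep endpoint. The function names and the batch size are our own assumptions for the example (please check the REST documentation for the current limit on identifiers per POST), and the server addresses are written with https on the basis of the HTTPS support listed above.

```python
import json
import time
import urllib.request

SERVER = "https://rest.ensembl.org"                # latest data, including GRCh38
GRCH37_SERVER = "https://grch37.rest.ensembl.org"  # dedicated GRCh37 server
MAX_REQUESTS_PER_SECOND = 15   # rate limit quoted in this post
BATCH_SIZE = 200               # assumption: confirm against the REST documentation


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_vep_request(variant_ids, server=SERVER):
    """Build (but do not send) a POST request carrying a list of variant IDs."""
    body = json.dumps({"ids": list(variant_ids)}).encode("utf-8")
    return urllib.request.Request(
        server + "/vep/human/id",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def fetch_consequences(variant_ids, server=SERVER):
    """Send the batches one at a time, pausing to stay under the rate limit."""
    results = []
    for batch in chunked(list(variant_ids), BATCH_SIZE):
        with urllib.request.urlopen(build_vep_request(batch, server)) as response:
            results.extend(json.load(response))
        time.sleep(1.0 / MAX_REQUESTS_PER_SECOND)
    return results
```

Calling fetch_consequences(["rs334"]) would query the GRCh38 server, while passing server=GRCH37_SERVER sends the same request to the GRCh37 server instead. Keeping the request construction separate from the network call makes it easy to inspect what would be sent before sending anything.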