We are pleased to announce that Ensembl Genomes 36 has now been released, which includes new and updated genome assemblies and gene annotation as well as updated variation data and comparative genomics analyses. Find out more below:

  • Ensembl Bacteria includes 142 more genomes than release 35, together with an update to gene families.
  • Ensembl Fungi has added gene symbols for 1-to-1 orthologues from S. cerevisiae to Botrytis cinerea and includes updated PHI-base 4.3 annotations.
  • Ensembl Metazoa now has automated RNA gene annotation for 37 species (i.e. all species that have not been imported from FlyBase, VectorBase or WormBase) and alignment of Rfam 12.2 covariance models for all species. There are also updated protein features, which now include features from new sources (CDD, MobiDB and SFLD).
  • Ensembl Protists now has new automatic ncRNA alignments across all protist species as well as updated PHI-base 4.3 annotations.
  • Ensembl Plants now includes the new genome assembly for Hordeum vulgare (barley), the biggest diploid genome yet sequenced, which is included in updated comparative peptide analyses for all species. There are also new ncRNA gene annotations and new plant reactome cross references across all plant species. New and updated variation data have also been included in this release for both Oryza sativa and Arabidopsis thaliana. Last, but not least, 80829 variation markers from the iSelect 90k array and 13.8 million Inter-Homoeologous Variants (IHVs) have been added to the wheat assembly, along with chloroplast and mitochondrial components (including gene annotations) imported from ENA.

Please see the release notes for full details of the updates.

Ensembl 90 is scheduled for August 2017 and it’s set to be our biggest release ever in terms of new genome annotation. Here’s what you can look forward to:

New assemblies, gene sets and annotations

  • Annotation of 15 rodent genomes, including three updates to old genomes:
    • Brazilian guinea pig
    • Chinese hamster
    • Damara mole rat
    • Degu
    • Golden hamster
    • Guinea pig (update)
    • Kangaroo rat (update)
    • Lesser Egyptian jerboa
    • Long-tailed chinchilla
    • Naked mole-rat – we have two different assemblies for naked mole-rat so you can keep working with your preferred genome
    • Northern American deer mouse
    • Prairie vole
    • Squirrel (update)
    • Upper Galilee mountains blind mole rat
  • Bringing in annotation of the well-used rodent cell-line, Chinese Hamster Ovary, and two mouse species, Ryukyu mouse and Shrew mouse.
  • Annotation on the latest Pig genome assembly, Sscrofa11.1
  • Updating the Human gene set to GENCODE 27.
  • Updating the Mouse gene set to GENCODE M15.
  • Adding transcript models from RNA-seq to the gene database and pri-miRNAs to the otherfeatures database in Zebrafish.

Other updates and highlights

  • Updating our human variation database with:
    • COSMIC 81 somatic variants
    • HGMD 2016.4
    • dbSNP 150
    • DGVa structural variants
    • TopMed in GRCh37
    • Phenotypes from NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Cosmic Gene Census, DDG2P, MIM Morbid and Orphanet
  • In other species we also have variation updates as follows:
    • DGVa in Cow, Dog and Mouse
    • Phenotype updates from relevant databases in Cat, Chicken, Chimpanzee, Cow, Dog, Horse, Macaque, Mouse, Pig, Rat, Sheep, Turkey and Zebrafish
  • Updating our microarray probe mappings in:
    • C. intestinalis
    • Caenorhabditis elegans
    • Chicken
    • Chimpanzee
    • Cow
    • Dog
    • Fruitfly
    • Human
    • Macaque
    • Mouse
    • Mouse 129S1/SvImJ
    • Mouse A/J
    • Mouse AKR/J
    • Mouse BALB/cJ
    • Mouse C3H/HeJ
    • Mouse C57BL/6NJ
    • Mouse CAST/EiJ
    • Mouse CBA/J
    • Mouse DBA/2J
    • Mouse FVB/NJ
    • Mouse LP/J
    • Mouse NOD/ShiLtJ
    • Mouse NZO/HlLtJ
    • Mouse PWK/PhJ
    • Mouse SPRET/EiJ
    • Mouse WSB/EiJ
    • Pig
    • Platypus
    • Rabbit
    • Rat
    • Saccharomyces cerevisiae
    • Xenopus
    • Zebrafish

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

Ensembl transcripts have two identifiers, the versioned ENST, which is stable through time and can be tracked from release to release, and a separate identifier that incorporates a gene symbol. The latter have changed in e!89; read on for more details.

We’re really excited to be a part of the ESHG conference again, this time in Copenhagen from the 27th-30th May. We can’t wait to see all the great science that’s going to be presented, but here’s a guide to the talks, workshops and posters from Ensembl and some of our close friends:

W18 – Ensembl & GENCODE Workshop

Monday 29th May 3pm-4:30pm Ancona Room 

This joint workshop is organised and will be presented by Ben Moore and Amonida Zadissa from Ensembl and Adam Frankish from HAVANA. It is aimed at attendees familiar with Ensembl, including wet-lab biologists, clinicians and the bioinformatics community.
The workshop will start with a brief introduction to the Ensembl project and genome browser along with a talk about the gene annotation process carried out by the Ensembl and HAVANA teams to produce the GENCODE gene set.

After the introduction to the Ensembl genome browser, there will be hands-on demonstrations to teach you how to:

  • annotate SNPs and CNVs with functional consequences using the Variant Effect Predictor
  • investigate quick alternatives to the browser (BioMart and the REST API)
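
If you'd like a taste of the REST API route before the workshop, here is a minimal Python sketch of looking up a variant with the Variant Effect Predictor. It assumes the public server at rest.ensembl.org and its `vep/:species/id/:id` endpoint. The function names are ours, rs699 is just an arbitrary example identifier, and the `input` and `most_severe_consequence` response fields are what we would expect to see rather than a guaranteed schema, so check the REST documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"  # assumed public Ensembl REST server


def vep_id_url(variant_id, species="human"):
    """Build the URL for the VEP 'id' endpoint (GET vep/:species/id/:id)."""
    return (f"{REST_SERVER}/vep/{quote(species, safe='')}"
            f"/id/{quote(variant_id, safe='')}")


def fetch_vep(variant_id, species="human"):
    """Query VEP for one variant identifier and return the decoded JSON."""
    request = Request(vep_id_url(variant_id, species),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def most_severe(results):
    """Pair each input with its most severe consequence, if reported.

    Assumes each result is a dict with 'input' and
    'most_severe_consequence' keys; missing keys become None.
    """
    return [(r.get("input"), r.get("most_severe_consequence"))
            for r in results]


# Example call (needs network access, so it is left commented out):
# for variant, consequence in most_severe(fetch_vep("rs699")):
#     print(variant, consequence)
```

Uncommenting the last two lines should print one line per variant returned. For larger batches, BioMart or the VEP command-line tool is likely to be a better fit than one request per variant.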

You can find more information about the workshop location and timings on the ESHG programme. If you have any questions or want to talk about Ensembl and GENCODE you can e-mail the Ensembl Helpdesk to arrange a time to meet, tweet us or simply come by and meet us after the workshop.

Poster Sessions

As well as the Ensembl and GENCODE workshop, Amonida will also be presenting an electronic poster (E-P16.08) that describes genome annotation and assembly assessment in Ensembl. Electronic Posters will be on display in the Poster Area and can be accessed during exhibition opening hours from 09:30 on Saturday 27th May to 17:45 on Monday 29th May by all participants.

Our friends Jackie, Joannella and Maria from the GWAS Catalog will be presenting their work at ESHG.

  • Maria will be presenting an electronic poster (E-P16.12) with new ideas for increasing the speed at which GWAS data are incorporated into the GWAS Catalog.
  • Jackie will be presenting a poster about the steps they are taking to ‘Increase the utility of the NHGRI-EBI genome-wide association study (GWAS) Catalog for users’. Jackie’s poster number is P16.36D and you can come and talk to her between 16:45 and 17:45 on Monday 29th May.

Jackie and Maria would love to hear feedback from GWAS Catalog users during the conference, particularly on the new and proposed functionality presented in their posters. They will also be able to answer any questions on the GWAS Catalog. If you would like to talk to them, e-mail the GWAS Catalog team at gwas-info@ebi.ac.uk to arrange a time to meet, or simply come by and meet them at their posters.

And last, but not least, Giselle Kerry will be representing The European Genome-Phenome Archive (EGA) with a poster about their future plans. Giselle’s poster number is P19.44D and you can come and talk to her between 10:15 and 11:15 on Monday 29th May.

Looking forward to seeing you all there!

This is the second instalment of our monthly posts introducing a member of the Ensembl team, and what they do in Ensembl. This time, it’s Will McLaren who works in the Variation team.

What is your job in Ensembl?

I’m the principal developer in the Ensembl Variation team. Our team produces, maintains and supports all of Ensembl’s variation resources. This includes a number of databases as well as the APIs and tools that use them, including the Variant Effect Predictor (VEP).

What do you enjoy about your job?

I love hacking around with code, making new things, taking things apart and fixing them again. Knowing that we’re contributing to advancing science and medicine by doing that is a huge bonus, and the satisfaction I get from that is what I enjoy the most.

I also enjoy interacting with our users, either helping them out over email or face to face at our workshops. Our users really are the inspiration for what we do, so I think it’s really important that we engage with them as much and as productively as we can.

What are you currently working on?

I usually have a number of projects on the go. A lot of my development time is spent working on and supporting VEP, and I’m currently working on improving how we handle RefSeq genes, as well as managing our recent transition to a major VEP update. I’m also spending some time working on a collaborative project with OpenTargets, the aim of which is to help identify links between genomic variation, genes and disease.

What is your typical day?

A typical day for me usually consists of continuing progress on whatever development project or projects I need to prioritise that day, and for me that usually means a terminal, a text editor and a web browser are my best friends. This will be punctuated by various things. I have a daily standup meeting with my team, where we say what we’ve done and what we’re going to do, and discuss any issues that come up from that – it might be that a colleague has become stuck on something that someone else has already figured out, so sharing in this way is great for all of us. We also communicate a lot via instant messaging if we can’t talk in person, and this extends across the whole of Ensembl. There are usually a couple of help requests from users, and occasionally this will also involve finding and squashing a bug in our code (something we try to prioritise where we can). If the bug turns up in code someone else is responsible for, we might discuss it with them before working out how to fix it. There may be a pipeline I need to kick off or check on that generates data as part of Ensembl’s ongoing release cycle. I might also have a conference call or a meeting with collaborators; typically this might be to discuss a manuscript in progress or a shared project.

How did you end up here?

I started my academic life doing Biochemistry and Genetics. I pretty quickly realised I didn’t have the manual skills or the patience for lab work, but I loved the discovery aspect of the science. I’d always had a hobby messing with computers, and was delighted to discover that I could combine my hobby and academic interests in this thing we call bioinformatics.

After a year studying bioinformatics, I got my first job working in informatics for pet health. Not the most glamorous, but it started me down the path of working in variation data, and a change of city led me to working at the Sanger Institute (who share a campus with us here at EMBL-EBI) doing statistical genetics for genome-wide association studies (GWAS).

I’d dabbled with Ensembl before, and when I saw a job advertised in Ensembl in a new-ish team that fit my experience and interests and was looking to expand, I jumped at the chance and haven’t looked back!

What surprised you most about Ensembl when you started working here?

What surprised me most was the diversity in the Ensembl team. We really are an international team with representatives from nearly every continent, which is great on both a personal and professional level. As well as this, we have a surprising diversity in people’s educational and career backgrounds. The combination of these means we have a huge breadth and depth of knowledge across the Ensembl team, which allows us to deliver what I consider a staggering array of data and functionality to our users.

What is the coolest tool or data type in Ensembl that you think everybody should know about?

We have a cool web view called the transcript haplotype view. This shows you whole transcript and protein sequences as they would appear in each individual from the 1000 Genomes project, by considering all of the genomic variation across a gene together. We also have a related tool that you can use on your own data called Haplosaurus, and I think this is going to be a really important step towards seeing the real biological picture in sequenced genomes.

We are pleased to announce that Ensembl Genomes 35 has now been released.

New and updated genomic sequences are available in all EG sub-portals, while updated comparative peptide analyses have been performed for Fungi, Metazoa, Plants, and Protists:

  • Ensembl Bacteria now incorporates 2460 new genomes, as well as revised assemblies and annotation for 188 and 234 genomes, respectively;
  • Ensembl Fungi now incorporates more than 100 new genomes, including the Puccinia striiformis f. sp. tritici PST-130 v1.0 assembly from the Joint Genome Institute, and provides updates to existing genomes and annotation. In particular, a new, manually-annotated genebuild, curated by the community using the WebApollo tool, has been added for Botrytis cinerea B05.10;
  • Ensembl Metazoa adds three new genomes, including that of Hessian fly. In addition, orthologue metrics have been calculated for all metazoan species and have been used to compute a set of “high-confidence” orthologues;
  • Ensembl Plants includes a new genome assembly and genebuild for Sorghum bicolor, and an updated genebuild for maize. New variation data are available for bread wheat, as are new comparative peptide analyses for all species;
  • Ensembl Protists contains 11 new genomes, along with revised genomic assemblies for more than 25 other species. Variation data have been newly included for Phaeodactylum tricornutum, and have been updated for Phytophthora infestans and Plasmodium falciparum; new comparative peptide analyses have also been performed.

Please see the release notes for full details of the updates: http://ensemblgenomes.org/info/release-notes/35

We’re already gearing up for Ensembl 89, scheduled for May 2017. It’s a slimline release this time, with just a handful of highlights:

Updated assemblies, gene sets and annotations

  • Human: updated cDNA alignments
  • Mouse: updated cDNA alignments and update to Ensembl-Havana GENCODE gene set

Other updates and highlights

  • Variation and phenotype database updates, including COSMIC version 80.
  • gnomAD frequencies will be available via the website, VEP and APIs.
  • Mapping of array probes to 15 different mouse strains in Ensembl.

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.