Functional genomics data from DANIO-CODE has been released to the public. This international effort, similar to ENCODE in human and mouse, seeks to characterise the functional elements in the zebrafish (Danio rerio) genome. Announced on Saturday at the International Zebrafish Conference, the DANIO-CODE dataset is available as a track hub that can be viewed in Ensembl.

Continue reading

In this blog post, we catch up with Ensembl’s 2018 Google Summer of Code (GSoC) students and hear about their now-completed projects and their reflections on the experience. You may have already seen our previous blog post, which we published as they were just beginning their projects. Read on to find out how the projects went, what the students learnt and what valuable advice they can pass on to aspiring GSoC students.

Continue reading

A common use case for the VEP is as a first step towards identifying the causal genetic variant of a rare phenotype from whole genome/exome sequencing. The VEP tells you which genes are hit and what effects the variants have on them, but you then have to begin the long, laborious process of filtering those results down. Things you might consider include allele frequency, association with genes known to be involved in rare disease and whether both copies of a gene in a diploid organism are affected. Rather than faffing about doing this manually, you can use the G2P (genotype to phenotype) plugin instead, which was recently published as a preprint.
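To make the manual filtering steps above concrete, here is a minimal Python sketch. It is not the G2P plugin's algorithm, just an illustration of the three criteria mentioned (allele frequency, membership of a known disease gene list, and both copies of a gene being hit). The record layout, gene names, field names and the frequency threshold are all hypothetical; real VEP output carries many more fields.

```python
from collections import defaultdict

# Hypothetical, heavily simplified variant records (not real VEP output).
variants = [
    {"id": "var1", "gene": "GENE_A", "af": 0.0001, "genotype": "hom"},
    {"id": "var2", "gene": "GENE_B", "af": 0.2,    "genotype": "het"},
    {"id": "var3", "gene": "GENE_C", "af": 0.0005, "genotype": "het"},
    {"id": "var4", "gene": "GENE_C", "af": 0.0002, "genotype": "het"},
    {"id": "var5", "gene": "GENE_D", "af": 0.0001, "genotype": "het"},
    {"id": "var6", "gene": "GENE_E", "af": 0.0001, "genotype": "hom"},
]

# Assumed inputs: a panel of known rare-disease genes and an arbitrary
# allele frequency cut-off chosen purely for illustration.
DISEASE_GENES = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
MAX_AF = 0.001


def candidate_genes(variants, disease_genes, max_af):
    """Return {gene: [variant ids]} for genes where both copies may be affected."""
    rare_by_gene = defaultdict(list)
    for v in variants:
        # Keep only rare variants in genes already linked to rare disease.
        if v["gene"] in disease_genes and v["af"] <= max_af:
            rare_by_gene[v["gene"]].append(v)

    candidates = {}
    for gene, hits in rare_by_gene.items():
        has_hom = any(v["genotype"] == "hom" for v in hits)
        n_het = sum(v["genotype"] == "het" for v in hits)
        # One homozygous hit, or two or more heterozygous hits. Without
        # phasing, two heterozygous hits only *suggest* that both copies
        # are affected; they could sit on the same copy.
        if has_hom or n_het >= 2:
            candidates[gene] = [v["id"] for v in hits]
    return candidates


print(candidate_genes(variants, DISEASE_GENES, MAX_AF))
```

Even this toy version needs several decisions (thresholds, gene lists, how to treat unphased heterozygous hits), which is the kind of bookkeeping the plugin is meant to take off your hands.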

Continue reading

Rating variants for their potential deleteriousness is vital for understanding the link between genotypes and phenotypes. There are many different algorithms for predicting how likely it is that a human variant would affect the function of a protein, and in release 94 of Ensembl, we’ll be making more of these available.

Continue reading

We’re excited to be trying a new conference this year: the African Society of Human Genetics (AfSHG) conference in collaboration with H3Africa, in Kigali, Rwanda, 19th-21st September. The conference is a fantastic opportunity for African scientists to showcase their work, build collaborations and learn more about their field of research. For us, it’s great to see what research is going on outside of our usual sphere, as well as to promote our free database and training to researchers who could benefit from it.

Continue reading

It’s probably reasonable to assume that the coding sequence (CDS) of a protein-coding transcript model is the feature that is of primary interest to most people who use Ensembl. However, both the 5’ and 3’ untranslated regions (UTRs) are important biological entities in their own right, and it is vital that we in Ensembl do the best we can to represent them accurately. Annotating UTRs is complicated, so in this article we’re going to focus on exploring the annotation process for 3’ UTRs (Figure 1).

Continue reading

Trixie the Triceratops

Ensembl produce high-quality gene annotation for a number of species, but getting it to the standard we expect takes time. This means there are many species and strains where we don’t have annotation yet. If you’re working with a species without Ensembl annotation (like Trixie the Triceratops here) or even a specific strain that we don’t have, you can still make use of VEP for predicting the effect of variants on genes and transcripts, using your own annotation. All you need is a GFF or GTF file of the transcripts, and a FASTA file of the genome.
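As a rough sketch of what such a run could look like, the Python helper below only assembles a VEP command line from the two files mentioned above; it does not run anything. The file names are hypothetical, the `vep` executable is assumed to be on your PATH, and the helper itself is purely illustrative. As far as we are aware, VEP expects the GFF/GTF file to be sorted, bgzip-compressed and tabix-indexed, so check the VEP documentation for your version before relying on this.

```python
import shlex


def build_vep_command(input_vcf, annotation, fasta, output="vep_output.txt"):
    """Assemble (but do not run) a VEP command that uses custom annotation."""
    name = annotation.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # look at the extension underneath the compression

    # Pick the VEP option that matches the annotation format.
    if name.endswith((".gff", ".gff3")):
        annotation_flag = "--gff"
    elif name.endswith(".gtf"):
        annotation_flag = "--gtf"
    else:
        raise ValueError(f"Expected a GFF or GTF file, got: {annotation}")

    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output,
        annotation_flag, annotation,
        "--fasta", fasta,
    ]


# Hypothetical file names, used for illustration only.
cmd = build_vep_command("trixie.vcf", "triceratops.gff.gz", "triceratops.fa.gz")
print(shlex.join(cmd))
```

Building the command as a list keeps the arguments unambiguous if you later pass them to something like `subprocess.run`.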

Continue reading