Ensembl VEP maps your variants to genes, but what do those genes do?
The Gene Ontology (GO) annotates genes with the molecular function of the gene product, the cellular location in which it functions and the biological process in which it is involved. In addition to phenotype association information, which is only available for a few genes, we now show GO annotations to help guide variant prioritisation by indicating the function a variant may affect.
This blog post is a joint contribution by Joannella Morales, Jane Loveland, Adam Frankish, Fiona Cunningham and Astrid Gall.
We are pleased to introduce the Matched Annotation from the NCBI and EMBL-EBI (MANE) project. This new joint initiative between EMBL-EBI’s Ensembl project and NCBI’s RefSeq project aims to release a genome-wide transcript set that contains one well-supported transcript per protein-coding locus. All transcripts in the MANE set will align perfectly to GRCh38, and each RefSeq (NM) transcript will be 100% identical to the corresponding Ensembl (ENST) transcript across the 5’ UTR, coding sequence and 3’ UTR.
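To make the identity criterion concrete, here is a minimal sketch of how a RefSeq and an Ensembl transcript model might be compared region by region. It is an illustration only, not the MANE pipeline: the `TranscriptModel` class, the helper functions, the placeholder accessions and the toy sequences are all assumptions made for this example, and it compares sequence alone, whereas a real check would also compare the GRCh38 coordinates of every exon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptModel:
    """Hypothetical container for one transcript, split into its three regions."""
    accession: str
    five_prime_utr: str
    cds: str
    three_prime_utr: str


REGIONS = ("five_prime_utr", "cds", "three_prime_utr")


def mismatched_regions(refseq: TranscriptModel, ensembl: TranscriptModel) -> list[str]:
    """Return the names of the regions whose sequences differ (case-insensitive)."""
    return [
        region
        for region in REGIONS
        if getattr(refseq, region).upper() != getattr(ensembl, region).upper()
    ]


def is_full_match(refseq: TranscriptModel, ensembl: TranscriptModel) -> bool:
    """True only when the 5' UTR, CDS and 3' UTR are all identical."""
    return not mismatched_regions(refseq, ensembl)


# Toy sequences and placeholder accessions, not real transcripts.
nm = TranscriptModel("NM_EXAMPLE", "GGCACT", "ATGGCCTAA", "TTGACCAATAAA")
enst = TranscriptModel("ENST_EXAMPLE", "GGCACT", "ATGGCCTAA", "TTGACCAATAAA")
enst_short_utr = TranscriptModel("ENST_EXAMPLE_2", "GGCACT", "ATGGCCTAA", "TTGACC")

print(is_full_match(nm, enst))                 # all three regions agree
print(mismatched_regions(nm, enst_short_utr))  # only the 3' UTR differs
```

Reporting which region differs, rather than a bare yes or no, reflects that two transcripts can share a coding sequence and still disagree in their UTRs, which is the case the 100% identity requirement rules out.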
It’s probably reasonable to assume that the coding sequence (CDS) of a protein-coding transcript model is the feature of primary interest to most people who use Ensembl. However, both the 5’ and 3’ untranslated regions (UTRs) are important biological entities in their own right, and it is vital that we in Ensembl do the best we can to represent them accurately. Annotating UTRs is complicated, though, so in this article we focus on the annotation process for 3’ UTRs (Figure 1).