In its latest release, Ensembl has completely reviewed its reporting of potential Transcription Factor (TF) binding sites. TF proteins are key players in the regulation of gene expression. They bind to specific DNA regions characterised by approximate sequence patterns, known as transcription factor binding motifs (TFBM). These motifs are generally represented as a Position Specific Frequency Matrix, or Binding Matrix. Ensembl scans genomes for occurrences of these motifs, reporting a Motif Feature at each possible location.
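To make the idea of scanning with a frequency matrix concrete, here is a minimal sketch in Python. It is not Ensembl's pipeline: the count matrix, the pseudocount, the uniform background and the score threshold are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 4 bp motif (one dict per motif position).
# These numbers are made up for illustration; they are not an Ensembl Binding Matrix.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 18, "C": 0, "G": 0, "T": 2},
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Slide the matrix along the forward strand and return (start, window, score) hits."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds(COUNTS)
    for start, window, score in scan("GGTATAGG", pwm, threshold=5.0):
        print(start, window, round(score, 2))
```

Each window that scores above the threshold plays the role of a candidate motif occurrence. A real scan would also cover the reverse strand and use a background model and threshold suited to the genome in question.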
This blog post is a joint contribution by Joannella Morales, Jane Loveland, Adam Frankish, Fiona Cunningham and Astrid Gall.
We are pleased to introduce the Matched Annotation from the NCBI and EMBL-EBI (MANE) project. This new joint initiative between EMBL-EBI’s Ensembl project and NCBI’s RefSeq project aims to release a genome-wide transcript set that contains one well-supported transcript per protein-coding locus. All transcripts in the MANE set will perfectly align to GRCh38 and will represent 100% identity (5’UTR, coding sequence, 3’UTR) between the RefSeq (NM) and corresponding Ensembl (ENST) transcript.
We’ve just released Ensembl Genomes 94, which includes genomes for Emmer wheat and over 200 new fungi, updated gene trees and host-pathogen interactions from PHI-base.
The latest version of Ensembl, release 94, is out, and have we got some treats for you! As well as GENCODE updates for human and mouse, we’ve also got loads of new fish. Plus, we have brand new transcription factor binding motifs, additional predictors of variant pathogenicity and updated gene tree pipelines.
A common use case for the VEP is as a first step towards identifying the causal genetic variant of a rare phenotype from whole genome/exome sequencing. The VEP tells you which genes are hit and what effects the variants have on them, and then you have to begin the long, laborious process of filtering those down. Things you might consider include allele frequency, association with genes known to be involved in rare disease and whether both copies of a gene in a diploid organism are affected. Rather than faffing about doing this manually, you can use the G2P (genotype to phenotype) plugin instead, which was recently published as a preprint.
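The kind of manual filtering described above can be sketched in a few lines of Python. This is a simplified illustration, not the G2P plugin's actual logic: the record fields (`gene`, `allele_frequency`, `zygosity`), the gene names, the panel structure and the frequency cut-off are all hypothetical.

```python
from collections import defaultdict


def candidate_genes(variants, panel, max_af=0.001):
    """Return panel genes with enough rare variant hits to meet their requirement.

    `panel` maps a gene name to the number of affected copies required:
    1 for a monoallelic gene, 2 for a biallelic one (hypothetical structure).
    """
    affected_copies = defaultdict(int)
    for variant in variants:
        if variant["gene"] not in panel:
            continue  # gene is not known to be involved in the disease
        if variant["allele_frequency"] > max_af:
            continue  # too common to explain a rare phenotype
        # A homozygous variant affects both copies; a heterozygous one affects one.
        affected_copies[variant["gene"]] += 2 if variant["zygosity"] == "hom" else 1
    return sorted(gene for gene, n in affected_copies.items() if n >= panel[gene])


if __name__ == "__main__":
    panel = {"GENE_A": 2, "GENE_B": 2, "GENE_C": 1}  # hypothetical gene panel
    variants = [
        {"gene": "GENE_A", "allele_frequency": 0.0001, "zygosity": "het"},
        {"gene": "GENE_A", "allele_frequency": 0.0002, "zygosity": "het"},
        {"gene": "GENE_B", "allele_frequency": 0.0001, "zygosity": "het"},
        {"gene": "GENE_C", "allele_frequency": 0.2, "zygosity": "hom"},
        {"gene": "GENE_D", "allele_frequency": 0.0, "zygosity": "het"},
    ]
    print(candidate_genes(variants, panel))
```

Note that counting two heterozygous hits as affecting both copies assumes they sit on different haplotypes, which this sketch does not check.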
Rating variants for their potential deleteriousness is vital for understanding the link between genotypes and phenotypes. There are many different algorithms for predicting how likely it is that a human variant would affect the function of a protein, and in release 94 of Ensembl, we’ll be making more of these available.
We’re excited to be trying a new conference this year: the African Society of Human Genetics (AfSHG) conference in collaboration with H3Africa, in Kigali, Rwanda, 19th-21st September. The conference is a fantastic opportunity for African scientists to showcase their work, build collaborations and learn more about their field of research. For us, it’s great to see what research is going on outside of our usual sphere, as well as to promote our free database and training to researchers who could benefit from them.
If you don’t want to analyse your variants on external servers or have more than 1000 or so to annotate, you probably want to use the VEP script. Setting it up might not always be straightforward as there are dependencies you need, but the installation script takes away a lot of the trouble.
It’s probably reasonable to assume that the coding sequence (CDS) of a protein-coding transcript model is the feature that is of primary interest to most people who use Ensembl. However, both the 5’ and 3’ untranslated regions (UTRs) are important biological entities in their own right, and it is vital that we in Ensembl do the best we can to represent them accurately. The annotation of these UTRs is complicated, though, so we’re going to focus on exploring the annotation process for 3’ UTRs in this article (Figure 1).