Flagging supporting evidence that has been removed from public databases

All gene annotations in Ensembl are supported by biological sequence alignments. The sequences used as supporting evidence are downloaded from public databases, for example UniProt and ENA, at the start of the gene annotation process. Public sequence databases are updated regularly, so sequences are both added to and removed from them. We don’t update our gene sets every release; for some species the gene annotations may not be updated for several releases. As a result, gene annotations in the Ensembl gene sets may be supported by biological sequences that have since been withdrawn from the public databases.

To indicate these changes, we now flag sequences that we used as supporting evidence but which have since been withdrawn from the public databases. The flags are updated every release by checking all protein-coding transcripts and exons for all species against the most current sequence databases. Transcripts based on withdrawn evidence are flagged, and the withdrawn evidence is coloured grey (instead of yellow) on the transcript’s Supporting Evidence page. Transcripts supported by only a grey protein sequence should be considered less well supported. An example is the gorilla transcript ENSGGOT00000034302, which was built using the human protein A6NKB4.2.

In addition to sequences from external public databases, Ensembl translations from well-annotated species may also be used as supporting evidence for annotation in other species, particularly primates and species with fragmented assemblies. Withdrawn Ensembl translations are flagged in the same way as described above.
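The check described above can be pictured as a set comparison between the evidence recorded for each transcript and the accessions still present in the current databases. The following is only a minimal sketch of that idea, not the pipeline Ensembl actually runs: the function name `flag_withdrawn`, the input structures and the placeholder identifiers `TRANSCRIPT_B`, `ACC_1` and `ACC_2` are all hypothetical.

```python
def flag_withdrawn(evidence_by_transcript, current_accessions):
    """Report transcripts whose supporting evidence is no longer current.

    evidence_by_transcript: dict of transcript ID -> list of accessions
    current_accessions: set of accessions still present in the databases
    (both structures are assumptions made for this illustration)
    """
    report = {}
    for transcript_id, accessions in evidence_by_transcript.items():
        used = set(accessions)
        # Evidence that was used but is absent from the current databases.
        withdrawn = sorted(used - current_accessions)
        if withdrawn:
            report[transcript_id] = {
                "withdrawn": withdrawn,
                # True when no current evidence remains for the transcript.
                "only_withdrawn": len(withdrawn) == len(used),
            }
    return report


# Illustrative data: only the first pairing comes from the post.
evidence = {
    "ENSGGOT00000034302": ["A6NKB4.2"],
    "TRANSCRIPT_B": ["ACC_1", "ACC_2"],  # placeholder identifiers
}
current = {"ACC_1"}  # placeholder snapshot of a current database

report = flag_withdrawn(evidence, current)
```

In this sketch a transcript with `only_withdrawn` set to true corresponds to one supported by only grey evidence, which is the case that should be treated as less well supported.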

You can also access these data programmatically using the API by looking for the transcript attribute “NoEvidence”.
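As a rough illustration of what looking up the attribute involves, the sketch below runs a SQL join over three tables modelled on the Ensembl core schema (`transcript`, `transcript_attrib` and `attrib_type`). It is not the Ensembl API itself. It assumes the flag is stored under the attribute code "NoEvidence", reproduces only the columns needed for the lookup, and uses an in-memory SQLite database with placeholder rows so that it runs without a connection to an Ensembl server. The helper names `flagged_transcripts` and `build_demo_db` are hypothetical.

```python
import sqlite3

# Simplified stand-ins for three Ensembl core schema tables.
# Assumption: only the columns needed for this lookup are included.
SCHEMA = """
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE attrib_type (attrib_type_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE transcript_attrib (
    transcript_id INTEGER, attrib_type_id INTEGER, value TEXT
);
"""

QUERY = """
SELECT t.stable_id
FROM transcript t
JOIN transcript_attrib ta ON ta.transcript_id = t.transcript_id
JOIN attrib_type a ON a.attrib_type_id = ta.attrib_type_id
WHERE a.code = ?
ORDER BY t.stable_id
"""


def flagged_transcripts(conn, code="NoEvidence"):
    """Return stable IDs of transcripts carrying the given attribute code."""
    return [row[0] for row in conn.execute(QUERY, (code,))]


def build_demo_db():
    """Create an in-memory database holding placeholder rows only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO transcript VALUES (?, ?)",
        [(1, "ENSGGOT00000034302"), (2, "DEMO_TRANSCRIPT_2")],
    )
    conn.execute("INSERT INTO attrib_type VALUES (1, 'NoEvidence')")
    # The stored value is a placeholder; the real contents are not shown here.
    conn.execute("INSERT INTO transcript_attrib VALUES (1, 1, 'placeholder')")
    return conn


if __name__ == "__main__":
    print(flagged_transcripts(build_demo_db()))
```

Passing the attribute code as a query parameter keeps the lookup reusable for other transcript attributes stored in the same tables.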
