A tweak to Ensembl transcript IDs

Ensembl transcripts have two identifiers: the versioned ENST ID, which is stable over time and can be tracked from release to release, and a separate identifier that incorporates a gene symbol. The latter has changed in e!89; read on for more details.

Each transcript annotated by Ensembl has, in addition to its formal ENST ID, a secondary identifier comprising a gene symbol, like BRCA2, followed by a numerical code. Until now, for five organisms (human, mouse, rat, pig and zebrafish), this numerical code could begin with a 0 or a 2. In e!89, however, you may have noticed that these identifiers have changed so that they all begin with 2.

Ensembl collaborates with the HAVANA group, which manually annotates vertebrate genes from the species listed above. In previous releases of the Ensembl browser, secondary IDs beginning with 0—BRCA2-001, for example—were a shorthand to indicate that HAVANA had either annotated or reviewed the transcript model. Those beginning with a 2 indicated the transcript model had been annotated by the automated Ensembl genebuild methods. Both methods use high-quality primary evidence as the basis for their predictions.
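
If you have older files that still carry the previous secondary IDs, the shorthand described above can be read back out with a few lines of Python. This is only a rough sketch, not an Ensembl tool: the function name `classify_secondary_id` and the labels it returns are made up for illustration, and the pattern assumes the ID is a gene symbol, a hyphen and a three-digit code, which is inferred from the single BRCA2-001 example. The ID BRCA2-201 used below is likewise hypothetical.

```python
import re

# Assumed shape: <gene symbol>-<three-digit code>, inferred from "BRCA2-001".
# The greedy symbol group lets symbols that contain hyphens still parse.
SECONDARY_ID = re.compile(r"^(?P<symbol>.+)-(?P<code>\d{3})$")


def classify_secondary_id(secondary_id):
    """Split a pre-e!89 secondary ID and report what its leading digit implied.

    Returns a (symbol, code, source) tuple. The source labels are
    illustrative names, not official Ensembl terms.
    """
    match = SECONDARY_ID.match(secondary_id)
    if match is None:
        raise ValueError(f"not a recognised secondary ID: {secondary_id!r}")

    symbol = match.group("symbol")
    code = match.group("code")

    if code.startswith("0"):
        # Annotated or reviewed by HAVANA under the old shorthand.
        source = "havana"
    elif code.startswith("2"):
        # Annotated by the automated Ensembl genebuild under the old shorthand.
        source = "ensembl_genebuild"
    else:
        # The post only describes codes starting with 0 or 2.
        source = "unknown"

    return symbol, code, source


if __name__ == "__main__":
    for example in ("BRCA2-001", "BRCA2-201"):
        print(example, classify_secondary_id(example))
```

Bear in mind that this interpretation only holds for IDs from releases before e!89. From e!89 onward every code begins with 2, so the leading digit no longer says anything about how a model was annotated.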

Alongside the discontinuation of updates to the Vega resource, we are phasing out this shorthand with an eye toward developing a new type of secondary identifier and a transcript quality metric. This new metric will evaluate the strength of the biological evidence underlying a model. Expect to see both in a future release; for now, please bear with us as our secondary identifiers change.
