In the current Ensembl release (Ensembl 115), the human GRCh38 Ensembl/GENCODE reference annotation was updated to include approximately 121,000 new protein-coding transcripts. The expanded set is derived from long-read RNA-seq data processed by TAGENE, a manually supervised automated pipeline, and TAGENE is listed as the source of these transcripts in both the browser and the files.
The latest partial release for GRCh38 on the new Ensembl (partial release 2026-01-26) includes this new geneset. Some genes and genomic features in this set have several hundred transcripts; ZBTB20 (ENSG00000181722), for example, has 360.
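To see which genes carry unusually large transcript sets, one option is to count `transcript` records per gene in a GTF annotation file. The following is a minimal sketch, not an official Ensembl tool: the function name `count_transcripts_per_gene` is hypothetical, and the optional `source` filter assumes that the source label (for example `TAGENE`) appears in the second GTF column, which should be checked against the actual file.

```python
import re
from collections import Counter

# Matches the gene_id attribute in the ninth GTF column.
# Note: some GTF flavours append a version suffix (e.g. "ENSG00000181722.18").
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def count_transcripts_per_gene(lines, source=None):
    """Count 'transcript' features per gene_id in an iterable of GTF lines.

    If `source` is given, only records whose second column equals it are
    counted (assumption: the pipeline name is stored in that column).
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GTF record has nine tab-separated columns.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        if source is not None and fields[1] != source:
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts
```

With a real file, the function could be fed an open handle, for instance `gzip.open(path, "rt")` for a compressed GTF (where `path` is a placeholder for a locally downloaded annotation file), and `counts.most_common(10)` would then list the ten genes with the most transcripts.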