Global biodiversity projects, including the Darwin Tree of Life project, are producing a large number of high-quality genome assemblies. At Ensembl, our goal is to generate evidence-based gene annotations for all of these assemblies, using the latest software components to maintain a high standard of genome annotation even when the available data are limited.
The Ensembl annotation process relies heavily on transcriptomic data to produce gene annotations, with homology-based annotations via protein alignments used to fill in gaps where the transcriptomic data are absent or fragmented.
While many species have (or will soon have) transcriptomic data available, there is a large and rapidly growing number of high-quality genome assemblies that do not have suitable transcriptomic data. Moreover, we may never have transcriptomic data to annotate critically endangered or recently extinct species. Homology-based annotation also becomes difficult when species are distant from existing reference proteomes.
We recognise that in these cases the community still wants a draft annotation, even in the absence of suitable evidence. To address this need, we have started running BRAKER2 in its default protein mode, using clade-specific proteins from UniProt and OrthoDB, to generate hint-guided ab initio predictions of protein-coding genes.
BRAKER2 is a popular choice for generating annotations, particularly in the non-vertebrate community, which is the source of many of these new genomes. We hope that by running it in a standardised way we can provide researchers with a draft annotation, even in the absence of transcriptomic data.
As transcriptomic data become available for these species, we will update the annotation using the Ensembl Annotation Pipeline. The BRAKER2 annotation will remain available as a secondary annotation track and via our FTP site.
This new version of Rapid Release contains species annotated via BRAKER2, and adds supplementary BRAKER2 annotation tracks to species already annotated by the Ensembl Annotation Pipeline. BRAKER2 annotations are labelled on both the species landing page and the species list page, with “BRAKER2” shown as the annotation provider and method.
More detailed documentation for the generation of the BRAKER2 annotation track is available on the Ensembl Rapid Release website.