Ensembl 109

Known bugs in Ensembl

Missing data in ontology database

Affects: Live site Expected fix: Ensembl 111
Description: Some data appears to be missing from the ensembl_ontology_109 database.
Workaround: This will be fixed in an upcoming Ensembl release. In the meantime, please use the archive site to get the related phenotype results.

Missing datasets in Vertebrates GeneMart and VariationMart

Affects: Live site Expected fix: Ensembl 110
Description: Due to an unexpected data processing error during BioMart builds, the following datasets are not available in release 109:

  • Sus scrofa transcript variation
  • Canis lupus familiaris variation set
  • Scophthalmus maximus homologs
  • Oncorhynchus mykiss homologs
  • Ornithorhynchus anatinus homologs

Data production systems have been reformulated in order to avoid future issues in BioMart builds for these tables.

Workaround: You may use the archive site to access the same datasets.

Problem loading very large gene trees in Ensembl Fungi

Affects: Live site Expected fix: Ensembl 112
Description: A misconfiguration of the QuickTreeBreak step of the gene-tree pipeline led to the creation of some very large Fungi protein trees, 26 of which have more than 1500 gene members (e.g. gene tree for Aspergillus nidulans gene ANIA_07510 and Multi view of the same tree).

This is particularly noticeable for the 13 largest trees, some of which may fail to load (e.g. https://fungi.ensembl.org/Multi/GeneTree/Image?gt=EFGT01080000065285 ).

The configuration of QuickTreeBreak has been fixed, and the issue is currently expected to be resolved in Ensembl release 112.

Workaround: No workaround

Inaccurate alignment and synteny metadata in Ensembl Fungi 108

Affects: Live site Expected fix: Ensembl 110
Description: Some inaccuracies were identified in the alignment and synteny metadata of Ensembl Fungi in Ensembl release 108, which have had effects on the “Available alignments” page ( https://fungi.ensembl.org/info/genome/compara/compara_analyses.html ).

Due to an incorrect ‘reference_species’ tag in the F. solani versus F. verticillioides LastZ alignment, LastZ alignments are incorrectly listed between each of these two species and “Fusarium graminearum str. PH-1”. Because these alignments with F. graminearum do not exist, none of their “example” or “stats” links function as expected.

In addition, all synteny stats are inaccurate, each listing ‘num_blocks’ as 2, though all of the synteny resources have many hundreds of synteny blocks.

These issues will be fixed in Ensembl 110 by correcting the erroneous metadata.

Workaround: No workaround

Region comparison view not functioning for two syntenies in Ensembl Fungi

Affects: Live site, Ensembl 105, 106, 107, 108 Expected fix: Ensembl 110
Description: For two fungal syntenies (F. oxysporum versus F. verticillioides, and F. oxysporum versus F. solani), the corresponding LastZ alignments have been retired, so their “Region comparison” views are not working. An example of this can be seen by clicking the “Region comparison” link on the following page: https://fungi.ensembl.org/Fusarium_oxysporum/Location/Synteny?db=core&r=8%3A1216748-1218410&otherspecies=Fusarium_solani

This issue will be resolved in Ensembl 110 when the affected syntenies are retired, or regenerated using new LastZ alignments.
Workaround: The Region comparison view of the affected syntenies can be accessed in the December 2020 release of Ensembl Fungi.
Example link: https://nov2020-fungi.ensembl.org/Fusarium_oxysporum/Location/Synteny?db=core&r=8%3A1216748-1218410&otherspecies=Fusarium_solani

Marked data reduction in Pig breeds protein trees

Affects: Live site Expected fix: Ensembl 110
Description: During the reindexing update of Pig breeds protein trees in Ensembl 109, changes in the annotation of the Horse outgroup species brought about the removal of a significant number of Horse protein members, which in turn resulted in an unexpectedly large reduction in the number of gene trees and homologies.

As a result, the number of Pig breeds protein trees has been reduced by more than half, from 10,561 in Ensembl 108 to 4,694 in Ensembl 109.

The number of Pig breeds homologies has been reduced by 24% overall relative to Ensembl 108, with the number of orthologies particularly impacted, having been reduced by 66%.

This will be fixed in Ensembl 110, when the Pig breeds protein trees are regenerated from scratch.

Workaround: As the Pig breeds themselves are unchanged from Ensembl 108 to 109, we recommend using the Pig breeds gene tree and homology data in Ensembl 108.
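
The size of the drop in protein-tree count can be checked from the two figures quoted in the description. This is a minimal sketch; the function name is illustrative only and is not part of any Ensembl tool.

```python
def percent_reduction(before, after):
    """Return the percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


# Pig breeds protein trees: 10,561 in Ensembl 108 versus 4,694 in Ensembl 109.
print(f"{percent_reduction(10561, 4694):.1f}% fewer protein trees")
```

The result is roughly 55.6%, consistent with the statement that the number of trees fell by more than half.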

Gene family lacking CAFE analysis

Affects: Live site Expected fix: Ensembl 110
Description: The gene tree of the olfactory receptors, family 5 (ENSGT01090000260058) won’t have an associated CAFE analysis in Ensembl 109 due to runtime issues that couldn’t be addressed within the time available during production.
Workaround: No workaround

Member stable ID clashes between Arabidopsis halleri and Lingula anatina in Compara REST API

Affects: Live site, Ensembl 100, 101, 102, 103, 104, 105, 106, 107, 108 Expected fix: Ensembl 110
Description: The following Compara REST API endpoints take one or more of a gene, transcript, or translation stable identifier as a parameter:
  • GET cafe/genetree/member/id/:id
  • GET genetree/member/id/:id
  • GET homology/id/:id

When using these endpoints, if two or more different gene/sequence members share the same stable ID, the data returned is for a member arbitrarily selected from the set with the given stable ID. In Ensembl 109 there are 22,243 such clashing gene stable IDs (e.g. ‘g10000’), each representing two genes: one in Plants species Arabidopsis halleri and one in Metazoa species Lingula anatina. Altogether, these represent 69% of the gene members in Arabidopsis halleri and 65% of those in Lingula anatina.

The data retrieved appears to be constant for a given species in each release, so that when querying with one of the affected stable IDs, REST responses return data for the Lingula anatina gene in Ensembl releases 105, 107 and 109, and for Arabidopsis halleri in all other releases tested. Tests confirmed the issue on the REST API for every release from Ensembl 98, though it’s possible this issue arose after the introduction of Arabidopsis halleri in Ensembl 95 (Ensembl Genomes 42).

We are planning to replace these endpoints in Ensembl 110, and the new endpoints are expected to resolve this issue.

Workaround: Specify an appropriate ‘compara’ parameter (i.e. ‘metazoa’ or ‘plants’) for affected REST endpoints, and set a ‘species’ parameter for any REST endpoint supporting it.
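
The workaround can be sketched as a small URL builder that always sets the ‘compara’ parameter. This is an illustration only: the helper name is hypothetical, and the ‘species’ value shown ("lingula_anatina") and its acceptance by a given endpoint are assumptions that should be checked against the REST documentation for that endpoint. The semicolon-separated query string follows the style of the REST URLs used elsewhere on this page.

```python
from urllib.parse import quote

REST_SERVER = "https://rest.ensembl.org"


def compara_member_url(endpoint, stable_id, compara, species=None):
    """Build a Compara REST URL with an explicit 'compara' parameter.

    endpoint  -- e.g. "homology/id" or "genetree/member/id"
    compara   -- e.g. "metazoa" or "plants"
    species   -- optional; only add it for endpoints that support it
    """
    params = ["content-type=application/json", f"compara={quote(compara)}"]
    if species is not None:
        params.append(f"species={quote(species)}")
    return f"{REST_SERVER}/{endpoint}/{quote(stable_id)}?" + ";".join(params)


# Ask for the Plants member that carries the clashing stable ID 'g10000'.
print(compara_member_url("homology/id", "g10000", "plants"))
# Ask for the Metazoa member, also naming the species (assumed parameter value).
print(compara_member_url("genetree/member/id", "g10000", "metazoa",
                         species="lingula_anatina"))
```

Building the URL explicitly makes it harder to forget the ‘compara’ parameter, which is what disambiguates the two members sharing one stable ID.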

Metazoa homologies lack Whole Genome Alignment coverage scores

Affects: Live site Expected fix: Ensembl 110
Description: Due to a pipeline configuration issue, Metazoa homologies lack Whole Genome Alignment (WGA) coverage scores. As a consequence, approximately 600,000 orthologies that would have been classified as high-confidence had the WGA coverage scores been calculated have instead been classified as not high-confidence. This represents an approximately 32% drop in the number of high-confidence orthology annotations relative to the previous release for affected species pairs.

Because WGA coverage is only calculated for pairs of species with a whole-genome alignment configured for use in the protein-trees pipeline, the list of available genomic alignments in Metazoa can be consulted to check which pairs of species may have been affected by this issue.

If there is a LastZ alignment available for a given pair of species, then WGA coverage scores of orthologies between those two species will have been affected.

The set of high-confidence orthologies has been impacted for all pairs of species with a LastZ alignment except the following:

  • Aedes aegypti (LVP_AGWG) and Anopheles gambiae
  • Anopheles gambiae and Culex quinquefasciatus
  • Apis mellifera (DH4) and Nasonia vitripennis (AsymCx)
  • Atta cephalotes and Nasonia vitripennis (AsymCx)
  • Bombus impatiens and Nasonia vitripennis (AsymCx)
  • Bombus terrestris and Nasonia vitripennis (AsymCx)
  • Nasonia vitripennis (AsymCx) and Solenopsis invicta (M01_SB)

This issue does not affect orthologies in any pair of species that does not have a LastZ alignment.

With the pipeline configuration issue addressed, this issue is expected to be fixed for species having LastZ alignments in Ensembl 110.

Workaround: For a more comprehensive set of high-confidence orthology annotations between affected pairs of species, please use the most recent available Ensembl Metazoa archive site.
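
The rule described above (a pair is impacted if it has a LastZ alignment and is not one of the listed exceptions) can be sketched as follows. The function name is hypothetical, and the set of LastZ pairs must be supplied by the user from the list of available Metazoa genomic alignments; "Species A" and "Species B" below are placeholders, not real data.

```python
# Pairs listed above as exceptions: their high-confidence sets are not impacted.
UNAFFECTED_PAIRS = {
    frozenset(pair)
    for pair in [
        ("Aedes aegypti (LVP_AGWG)", "Anopheles gambiae"),
        ("Anopheles gambiae", "Culex quinquefasciatus"),
        ("Apis mellifera (DH4)", "Nasonia vitripennis (AsymCx)"),
        ("Atta cephalotes", "Nasonia vitripennis (AsymCx)"),
        ("Bombus impatiens", "Nasonia vitripennis (AsymCx)"),
        ("Bombus terrestris", "Nasonia vitripennis (AsymCx)"),
        ("Nasonia vitripennis (AsymCx)", "Solenopsis invicta (M01_SB)"),
    ]
}


def high_confidence_set_impacted(species_a, species_b, lastz_pairs):
    """True if the pair has a LastZ alignment and is not a listed exception.

    lastz_pairs is a set of frozensets, one per species pair with a LastZ
    alignment; order within a pair does not matter.
    """
    pair = frozenset((species_a, species_b))
    return pair in lastz_pairs and pair not in UNAFFECTED_PAIRS


# Placeholder input: the exception pairs plus one made-up LastZ pair.
lastz_pairs = set(UNAFFECTED_PAIRS) | {frozenset(("Species A", "Species B"))}
print(high_confidence_set_impacted("Species B", "Species A", lastz_pairs))
print(high_confidence_set_impacted("Anopheles gambiae",
                                   "Culex quinquefasciatus", lastz_pairs))
```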

Mismatching or missing comparative genomics data for Panamanian white-faced capuchin

Affects: Live site, Ensembl 108 Expected fix: Ensembl 110
Description: With its species name being updated from Cebus capucinus to Cebus imitator in Ensembl 108, the Panamanian white-faced capuchin (or Capuchin) was inadvertently included under its old name in comparative genomics processing for Ensembl 108, then omitted entirely from comparative genomics processing in Ensembl 109.

As a result, Ensembl 108 has missing elements of Capuchin comparative genomics data, and Ensembl 109 lacks comparative genomics data for this species.

In Ensembl 108, this issue may manifest in ways such as the following:

  • In the gene-tree view image, Capuchin genes are shown with the correct gene display name, but the species name is shown as “Ancestral sequence”. The ZMenu shows the updated species name “Panamanian white-faced capuchin (Cebus imitator)”, while files downloaded via the “Export” button refer to it by the old species name (i.e. “cebus_capucinus” or its abbreviated form “Ccap”).
  • The orthologue table view of a Capuchin gene shows its orthologues in other species, and the “Download orthologues” button enables download of alignment files including the Capuchin gene itself. However, for genes of other species, Capuchin orthologues are missing from both the orthologue table view and downloadable orthologue alignment files.
  • In gene gain/loss tree views, gene-tree stats and species tree pages, Capuchin may be omitted or have missing elements (e.g. thumbnail image).
  • Genomic alignments of Capuchin are not available via the Ensembl 108 website.

This issue will be fixed in Ensembl 110 with the inclusion of the Panamanian white-faced capuchin in comparative genomics processing under its updated species name, Cebus imitator.

Workaround: The most recent Capuchin comparative genomics data unaffected by issues associated with this species-name update can be accessed from the Ensembl 107 archive site.

Inconsistent stable ID versions of 3 Metazoa species in Pan-taxonomic Compara

Affects: Live site, Ensembl 108 Expected fix: Ensembl 110
Description: During preparation of Ensembl 108, to allow for more consistent access of Metazoa genes and sequences via the URL “www.ensemblgenomes.org/id/”, stable ID versions were effectively removed from those genes/sequences in the core databases of approximately 50 Metazoa species. Corresponding changes were applied to the gene and sequence members of 38 Metazoa Compara genomes, so the versions of these stable IDs are consistent between Metazoa Compara and their corresponding core databases in Ensembl 108 and 109.

However, these stable ID version changes were not applied to 3 of the Metazoa species which are also in Pan-taxonomic Compara: Anopheles gambiae, Aedes aegypti (LVP_AGWG), and Pediculus humanus. As a result, some inconsistencies may be observed between Metazoa and Pan Compara for these species in Ensembl 108 and 109.

For example, when printing the sequence members of a protein-tree to a FASTA file using MemberSet::print_sequences_to_file with id_type set to VERSION, a protein from one of these 3 species may be output with a versioned stable ID when accessing its Pan-taxonomic tree and with an unversioned stable ID when accessing its Metazoa tree.

In addition, attempting to access a Compara REST endpoint with a versioned member stable ID may succeed when accessing Pan-taxonomic Compara (e.g. https://rest.ensembl.org/homology/id/AGAP012196.1?content-type=application/json;compara=pan_homology ) but fail when accessing Metazoa Compara (e.g. https://rest.ensembl.org/homology/id/AGAP012196.1?content-type=application/json;compara=metazoa ).

This inconsistency will be resolved in Ensembl 110 with the removal of the stable ID versions from the 3 affected species in Pan Compara.

Workaround: When dealing with the 3 affected Metazoa species in Pan-taxonomic Compara 108 and 109, use unversioned stable IDs where possible, and in particular avoid setting id_type to VERSION when calling the Compara Perl API method MemberSet::print_sequences_to_file.
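
When a script handles stable IDs from the 3 affected species, one way to follow the workaround is to drop any version suffix before querying. This is a minimal sketch with a hypothetical helper name. It assumes that a trailing ".N" (a dot followed only by digits) is a version; that may not hold for identifiers from other species that contain dots as part of the ID itself, so it should only be applied to the affected species.

```python
import re

# A version suffix is assumed to be a final dot followed only by digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(stable_id):
    """Return the stable ID without a trailing '.N' version, if present."""
    return _VERSION_SUFFIX.sub("", stable_id)


print(strip_version("AGAP012196.1"))  # versioned ID from the example above
print(strip_version("AGAP012196"))    # already unversioned: returned unchanged
```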

Gene synonym name issue

Affects: Live site, Ensembl 106, 107, 108 Expected fix: Ensembl 110
Description: Example (Human 106): AKT2 has the synonym ‘PKBβ’; however, the Greek character is not displayed correctly on the website or via the API.

The problem is caused by the character encoding in the core DB – `latin1` – which cannot represent Greek characters.

To fix it, we have to change the character encoding (and related collation) in the core DBs.

To keep the impact to a minimum, we might consider applying the character encoding/collation changes to the `external_synonym` table only.
In that case, though, character encoding/collation would be inconsistent across the tables in the core schema.

Workaround: Manually patching data in the DBs

β –> beta
α –> alpha
γ –> gamma

Please see https://github.com/Ensembl/staging-patches/pull/567 for Ensembl 109
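
The substitution used in the manual patch can be sketched as below. This is an illustration only, not the code used in the staging patch: the helper names are hypothetical, and Python's "latin-1" codec is used as a stand-in for the database's `latin1` character set.

```python
# Replacements listed in the workaround above.
GREEK_TO_NAME = {"α": "alpha", "β": "beta", "γ": "gamma"}


def fits_latin1(text):
    """True if every character of `text` can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def patch_synonym(synonym):
    """Spell out the Greek letters covered by the workaround."""
    for greek, name in GREEK_TO_NAME.items():
        synonym = synonym.replace(greek, name)
    return synonym


print(fits_latin1("PKBβ"))                 # the Greek letter cannot be encoded
print(patch_synonym("PKBβ"))               # spelled-out form
print(fits_latin1(patch_synonym("PKBβ")))  # the patched synonym is plain ASCII
```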

Missing MT ‘sequence_location’ attrib

Affects: Live site, Ensembl 105, 106, 107, 108 Expected fix: Ensembl 110
Description: Mitochondrial DNA sequences from 12 species were mistakenly processed as nuclear DNA sequences. This affected pairwise and multiple genome alignments, and consequently syntenies for vertebrate and metazoan genomes involved. The species affected are:

  • Anolis carolinensis (Green anole)
  • Anopheles coluzzii (Ngousso)
  • Ficedula albicollis (Collared flycatcher)
  • Macaca fascicularis (Crab-eating macaque)
  • Mastacembelus armatus (Zig-zag eel)
  • Meleagris gallopavo
  • Nannizzia gypsea CBS 118893
  • Papio anubis (Olive baboon)
  • Pongo abelii (Sumatran orangutan)
  • Rattus norvegicus (Rat)
  • Scophthalmus maximus (Turbot)
  • Ustilago bromivora str. UB2112

Workaround: No workaround

Inconsistency in transcripts numbering in GFF3 and GTF exported files

Affects: Live site, Ensembl 102, 103, 104, 105, 106, 107, 108 Expected fix: Ensembl 110
Description: A bug report brought to our attention that, in particular cases, inconsistencies may appear between the GFF3 and GTF files available on our FTP site.

Depending on the data underlying our dumps, the number of transcripts retrieved for the same species may sometimes differ from one file to the other.

The main difference between GTF and GFF3 dumping is that for GTF, we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3, we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice).

https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199
https://github.com/Ensembl/ensembl-io/blob/release/104/modules/Bio/EnsEMBL/Utils/IO/GTFSerializer.pm#L112

This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene.
This currently only happens with genes on patches, where some transcripts can lie entirely outside the patch region because we create a fake chromosome including the patch.
In the future, we are planning to store the patches as standalone scaffolds, and those transcripts will be removed entirely, so they will not be included in either the GTF or the GFF3 dumps.

We plan to fix this from 106 onwards.

Workaround: No workaround, other than using the most up-to-date datasets.
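
To check whether a given pair of files is affected, the transcript IDs in the two dumps can be compared. The sketch below is not part of the Ensembl dump pipelines. It assumes that GTF transcript rows carry a `transcript_id "..."` attribute and that GFF3 transcript rows carry an `ID=transcript:...` attribute; the sample rows and the IDs G1, T1 and T2 are made up for illustration.

```python
import re

GTF_ID = re.compile(r'transcript_id "([^"]+)"')
GFF3_ID = re.compile(r"(?:^|;)ID=transcript:([^;]+)")


def gtf_transcript_ids(lines):
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "transcript":
            continue
        match = GTF_ID.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines):
    """Collect transcript IDs from GFF3 rows with an 'ID=transcript:' attribute."""
    ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        match = GFF3_ID.search(cols[8])
        if match:
            ids.add(match.group(1))
    return ids


# Made-up sample rows: T2 is present in the GTF but absent from the GFF3.
gtf = [
    '1\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n',
    '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tensembl\ttranscript\t1\t90\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
]
gff3 = [
    "##gff-version 3\n",
    "1\tensembl\tgene\t1\t100\t.\t+\t.\tID=gene:G1\n",
    "1\tensembl\tmRNA\t1\t100\t.\t+\t.\tID=transcript:T1;Parent=gene:G1\n",
]
print(sorted(gtf_transcript_ids(gtf) - gff3_transcript_ids(gff3)))
```

For real files, the same functions can be passed open file handles (for example from `gzip.open(path, "rt")`) instead of lists of lines.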

Vertebrates species and gene trees lack divergence times

Affects: Live site, Ensembl 108 Expected fix: Ensembl 110
Description: There was a failure to load TimeTree divergence times in Ensembl 108 and 109, most likely due to a parse failure on timetree.org webpages with recently updated layouts.

As a result, the number of taxon divergence times displayed in species and gene trees is reduced by almost 90% in Ensembl 109, and by 100% in Ensembl 108. Missing divergence times may appear as an empty field, as the string ‘~0 MYA’, or as the default string ‘NO_WORK’.

We plan to fix this from Ensembl 110 onwards.

Workaround: Divergence times are still displayed in the Ensembl 107 archive site: http://jul2022.archive.ensembl.org/index.html
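
When processing exported tree data, the three forms of a missing divergence time described above can be filtered out with a check like the following. This is a minimal sketch with a hypothetical helper name; the label "~90 MYA" is a made-up example of a populated value, not a figure taken from Ensembl.

```python
# Forms in which a missing divergence time may appear, as described above.
MISSING_MARKERS = {"", "~0 MYA", "NO_WORK"}


def divergence_time_missing(label):
    """True if `label` is absent or is one of the known placeholder strings."""
    return label is None or label.strip() in MISSING_MARKERS


for label in ["NO_WORK", "~0 MYA", "", "~90 MYA"]:
    print(repr(label), divergence_time_missing(label))
```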

Human HGVS lookup not available in 109

Affects: Live site Expected fix: Ensembl 110
Description: For human (GRCh38), searching for a variant by HGVS ID is not enabled on the website for release 109.

Workaround: This option will be re-enabled in release 110.

Broken Cactus HAL alignment on the web for Brugia malayi

Affects: Live site, Ensembl 105, 106, 107, 108 Expected fix: Ensembl 110

Description: For Brugia malayi, the Cactus HAL alignment in the “Example region” shows an error.

The issue does not affect the whole Brugia malayi genome. There are regions in Brugia malayi for which the display works. Moreover, pairwise alignment for some species (for example, against C. brenneri) displays alignment.

The underlying bug in the HAL multiple alignment code has been identified and fixed for Ensembl release 110. 

Workaround: This issue will be fixed in Ensembl 110. 

Assembly name does not match its source data for some Metazoan species

Affects: Live site Expected fix: Ensembl 110

Description: We detected a bug in our code that omitted the existing assembly name from some genomes we imported and fell back to using the GCA accession instead.

This has been resolved for Ensembl 110 for all the affected species, and the bug has been fixed in our codebase.

Workaround: This issue will be fixed in Ensembl 110.