Known bugs in Ensembl
Missing data in ontology database |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: There seems to be some data missing in the ensembl_ontology_109 database. | |
Workaround: This will be fixed in the upcoming Ensembl release. Meanwhile, please use the archive site to get the related phenotype results. |
Missing datasets in Vertebrates GeneMart and VariationMart |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: Due to an unexpected data processing error during BioMart builds, the following datasets are not available in release 109: Sus scrofa transcript variation; Canis lupus familiaris variation set; Scophthalmus maximus homologs; Oncorhynchus mykiss homologs; Ornithorhynchus anatinus homologs.
Data production systems have been reworked to avoid future issues in BioMart builds for these tables. |
|
Workaround: You may use the archive site to access the same datasets. |
Problem loading very large gene trees in Ensembl Fungi |
|
Affects: Live site | Expected fix: Ensembl 112 |
Description: A misconfiguration of the QuickTreeBreak step of the gene-tree pipeline led to the creation of some very large Fungi protein trees, 26 of which have more than 1500 gene members (e.g. gene tree for Aspergillus nidulans gene ANIA_07510 and Multi view of the same tree). This is particularly noticeable for the 13 largest trees, some of which may fail to load (e.g. https://fungi.ensembl.org/Multi/GeneTree/Image?gt=EFGT01080000065285 ).
The configuration of QuickTreeBreak has been fixed, and the issue is currently expected to be resolved in Ensembl release 112. |
|
Workaround: No workaround |
Inaccurate alignment and synteny metadata in Ensembl Fungi 108 |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: Some inaccuracies were identified in the alignment and synteny metadata of Ensembl Fungi in Ensembl release 108, which have affected the “Available alignments” page ( https://fungi.ensembl.org/info/genome/compara/compara_analyses.html ). Due to an incorrect ‘reference_species’ tag in the F. solani versus F. verticillioides LastZ alignment, LastZ alignments are incorrectly listed between each of these two species and “Fusarium graminearum str. PH-1”. Because these alignments with F. graminearum do not exist, none of their “example” or “stats” links function as expected.
In addition, all synteny stats are inaccurate, each listing ‘num_blocks’ as 2, though all of the synteny resources have many hundreds of synteny blocks. These issues will be fixed in Ensembl 110 by correcting the erroneous metadata. |
|
Workaround: No workaround |
Region comparison view not functioning for two syntenies in Ensembl Fungi |
|
Affects: Live site, Ensembl 105, 106, 107, 108 | Expected fix: Ensembl 110 |
Description: For two fungal syntenies (F. oxysporum versus F. verticillioides, and F. oxysporum versus F. solani) the corresponding LastZ alignments have been retired, so their “Region comparison” views are not working. An example of this can be seen by clicking the “Region comparison” link on the following page: https://fungi.ensembl.org/Fusarium_oxysporum/Location/Synteny?db=core&r=8%3A1216748-1218410&otherspecies=Fusarium_solani This issue will be resolved in Ensembl 110 when the affected syntenies are retired, or regenerated using new LastZ alignments. | |
Workaround: The Region comparison view of the affected syntenies can be accessed in the December 2020 release of Ensembl Fungi. Example link: https://nov2020-fungi.ensembl.org/Fusarium_oxysporum/Location/Synteny?db=core&r=8%3A1216748-1218410&otherspecies=Fusarium_solani |
Marked data reduction in Pig breeds protein trees |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: During the reindexing update of Pig breeds protein trees in Ensembl 109, changes in the annotation of the Horse outgroup species brought about the removal of a significant number of Horse protein members, which in turn resulted in an unexpectedly large reduction in the number of gene trees and homologies. As a result, the number of Pig breeds protein trees has been reduced by more than half, from 10,561 in Ensembl 108 to 4,694 in Ensembl 109.
The number of Pig breeds homologies has been reduced by 24% overall relative to Ensembl 108, with the number of orthologies particularly impacted, having been reduced by 66%. This will be fixed in Ensembl 110, when the Pig breeds protein trees are regenerated afresh. |
|
Workaround: As the Pig breeds themselves are unchanged from Ensembl 108 to 109, we recommend using the Pig breeds gene tree and homology data in Ensembl 108. |
Gene family lacking CAFE analysis |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: The gene tree of the olfactory receptors, family 5 (ENSGT01090000260058) won’t have an associated CAFE analysis in Ensembl 109 due to runtime issues that couldn’t be addressed within the time available during production. | |
Workaround: No workaround |
Member stable ID clashes between Arabidopsis halleri and Lingula anatina in Compara REST API |
|
Affects: Live site, Ensembl 100, 101, 102, 103, 104, 105, 106, 107, 108 | Expected fix: Ensembl 110 |
Description: The following Compara REST API endpoints take one or more of a gene, transcript, or translation stable identifier as a parameter:
When using these endpoints, if two or more different gene/sequence members share the same stable ID, the data returned is for a member arbitrarily selected from the set with the given stable ID. In Ensembl 109 there are 22,243 such clashing gene stable IDs (e.g. ‘g10000’), each representing two genes: one in Plants species Arabidopsis halleri and one in Metazoa species Lingula anatina. Altogether this represents 69% of gene members in Arabidopsis halleri and 65% in Lingula anatina. The data retrieved appears to be constant for a given species in each release: when querying with one of the affected stable IDs, REST responses return data for the Lingula anatina gene in Ensembl releases 105, 107 and 109, and for Arabidopsis halleri in all other releases tested. Tests confirmed the issue on the REST API for every release from Ensembl 98, though it is possible that this issue arose after the introduction of Arabidopsis halleri in Ensembl 95 (Ensembl Genomes 42). We are planning to replace these endpoints in Ensembl 110, and the new endpoints are expected to resolve this issue. |
|
Workaround: Specify an appropriate ‘compara’ parameter (i.e. ‘metazoa’ or ‘plants’) for affected REST endpoints, and set a ‘species’ parameter for any REST endpoint supporting it. |
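As a minimal sketch of this workaround, the helper below builds a request URL that always names the Compara division explicitly. It follows the homology/id URL shape used in the example links elsewhere on this page; the function name `build_homology_url` and its arguments are hypothetical, and whether a given endpoint accepts further parameters such as ‘species’ should be checked in the REST documentation for that endpoint.

```python
from urllib.parse import quote


def build_homology_url(stable_id, compara, extra_params=None,
                       server="https://rest.ensembl.org"):
    """Build a homology lookup URL with an explicit 'compara' parameter.

    Hypothetical helper: the URL shape is copied from the example links on
    this page, not from an authoritative description of the REST API.
    """
    params = [("content-type", "application/json"), ("compara", compara)]
    # Optional extra parameters (e.g. 'species', where an endpoint supports it).
    params.extend((extra_params or {}).items())
    query = ";".join(f"{key}={quote(str(value), safe='/')}" for key, value in params)
    return f"{server}/homology/id/{quote(stable_id)}?{query}"


# The same clashing stable ID, disambiguated by division.
plants_url = build_homology_url("g10000", "plants")
metazoa_url = build_homology_url("g10000", "metazoa")
print(plants_url)
print(metazoa_url)
```

Keeping the division as a required argument, rather than an optional one, means a caller cannot accidentally fall back to the arbitrary member selection described above.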
Metazoa homologies lack Whole Genome Alignment coverage scores |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: Due to a pipeline configuration issue, Metazoa homologies lack Whole Genome Alignment (WGA) coverage scores. As a consequence, approximately 600,000 orthologies that would have been classified as high-confidence had the WGA coverage scores been calculated have instead been classified as not high-confidence. This represents an approximately 32% drop in the number of high-confidence orthology annotations relative to the previous release for affected species pairs. Because WGA coverage is only calculated for pairs of species with a whole-genome alignment configured for use in the protein-trees pipeline, the list of available genomic alignments in Metazoa can be consulted to check which pairs of species may have been affected by this issue.
If there is a LastZ alignment available for a given pair of species, then WGA coverage scores of orthologies between those two species will have been affected. The set of high-confidence orthologies has been impacted for all pairs of species with a LastZ alignment except the following:
This issue does not affect orthologies in any pair of species that does not have a LastZ alignment. With the pipeline configuration issue addressed, this issue is expected to be fixed for species having LastZ alignments in Ensembl 110. |
|
Workaround: For a more comprehensive set of high-confidence orthology annotations between affected pairs of species, please use the most recent available Ensembl Metazoa archive site. |
Mismatching or missing comparative genomics data for Panamanian white-faced capuchin |
|
Affects: Live site, Ensembl 108 | Expected fix: Ensembl 110 |
Description: With its species name being updated from Cebus capucinus to Cebus imitator in Ensembl 108, the Panamanian white-faced capuchin (or Capuchin) was inadvertently included under its old name in comparative genomics processing for Ensembl 108, then omitted entirely from comparative genomics processing in Ensembl 109. As a result, Ensembl 108 has missing elements of Capuchin comparative genomics data, and Ensembl 109 lacks comparative genomics data for this species.
In Ensembl 108, this issue may manifest in ways such as the following:
This issue will be fixed in Ensembl 110 with the inclusion of the Panamanian white-faced capuchin in comparative genomics processing under its updated species name, Cebus imitator. |
|
Workaround: The most recent Capuchin comparative genomics data unaffected by issues associated with this species-name update can be accessed from the Ensembl 107 archive site. |
Inconsistent stable ID versions of 3 Metazoa species in Pan-taxonomic Compara |
|
Affects: Live site, Ensembl 108 | Expected fix: Ensembl 110 |
Description: During preparation of Ensembl 108, to allow for more consistent access of Metazoa genes and sequences via the URL “www.ensemblgenomes.org/id/”, stable ID versions were effectively removed from those genes/sequences in the core databases of approximately 50 Metazoa species. Corresponding changes were applied to the gene and sequence members of 38 Metazoa Compara genomes, so the versions of these stable IDs are consistent between Metazoa Compara and their corresponding core databases in Ensembl 108 and 109. However, these stable ID version changes were not applied to 3 of the Metazoa species which are also in Pan-taxonomic Compara: Anopheles gambiae, Aedes aegypti (LVP_AGWG), and Pediculus humanus. As a result, some inconsistencies may be observed between Metazoa and Pan Compara for these species in Ensembl 108 and 109.
For example, when printing the sequence members of a protein-tree to a FASTA file using MemberSet::print_sequences_to_file with id_type set to VERSION, a protein from one of these 3 species may be output with a versioned stable ID when accessing its Pan-taxonomic tree and with an unversioned stable ID when accessing its Metazoa tree. In addition, attempting to access a Compara REST endpoint with a versioned member stable ID may succeed when accessing Pan-taxonomic Compara (e.g. https://rest.ensembl.org/homology/id/AGAP012196.1?content-type=application/json;compara=pan_homology ) but fail when accessing Metazoa Compara (e.g. https://rest.ensembl.org/homology/id/AGAP012196.1?content-type=application/json;compara=metazoa ). This inconsistency will be resolved in Ensembl 110 with the removal of the stable ID versions from the 3 affected species in Pan Compara. |
|
Workaround: When dealing with the 3 affected Metazoa species in Pan-taxonomic Compara 108 and 109, use unversioned stable IDs where possible, and in particular avoid setting id_type to VERSION when calling the Compara Perl API method MemberSet::print_sequences_to_file. |
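One way to apply this workaround on the client side is to drop the version suffix before querying. The sketch below assumes that a version is always a trailing dot followed by digits, as in the AGAP012196.1 example above; `strip_version` is a hypothetical helper, and it should only be applied to identifiers from the 3 affected species, because in some other species a trailing ‘.N’ is part of the identifier itself.

```python
import re

# Assumption: a stable ID version is a trailing ".<digits>" suffix.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(stable_id):
    """Return the stable ID without a trailing version suffix, if any."""
    return _VERSION_SUFFIX.sub("", stable_id)


print(strip_version("AGAP012196.1"))
print(strip_version("AGAP012196"))
```

An unversioned identifier passes through unchanged, so the helper can be applied uniformly to a mixed list of IDs.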
Gene synonym name issue |
|
Affects: Live site, Ensembl 106, 107, 108 | Expected fix: Ensembl 110 |
Description: Example (Human 106): AKT2 has the synonym ‘PKBβ’; however, the ‘β’ character is not displayed correctly on the website or through the API.
The problem is caused by the character encoding in the core DB, `latin1`, which cannot represent Greek characters. To fix it, we have to change the character encoding (and related collation) in the core DBs. To keep the impact to a minimum, we might apply the character encoding and collation changes to the `external_synonym` table only. |
|
Workaround: Manually patching data in the DBs
|
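The kind of corruption described here can be reproduced in a few lines. This is an illustration only: it assumes the synonym is stored as UTF-8 bytes and then read back as `latin1`, which is a common way for this class of bug to arise but is not confirmed as the exact mechanism in the core DBs.

```python
synonym = "PKBβ"

# Assumed mechanism: UTF-8 bytes are interpreted as latin1 on the way out.
utf8_bytes = synonym.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # PKBÎ²

# If that assumption holds, the damage is reversible by inverting the steps.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # PKBβ
```

The two-byte UTF-8 encoding of ‘β’ becomes two separate `latin1` characters, which is why a single Greek letter shows up as a pair of unrelated symbols.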
Missing MT ‘sequence_location’ attrib |
|
Affects: Live site, Ensembl 105, 106, 107, 108 | Expected fix: Ensembl 110 |
Description: Mitochondrial DNA sequences from 12 species were mistakenly processed as nuclear DNA sequences. This affected pairwise and multiple genome alignments, and consequently syntenies for vertebrate and metazoan genomes involved. The species affected are:
Anolis carolinensis (Green anole); Anopheles coluzzii (Ngousso); Ficedula albicollis (Collared flycatcher); Macaca fascicularis (Crab-eating macaque); Mastacembelus armatus (Zig-zag eel); Meleagris gallopavo; Nannizzia gypsea CBS 118893; Papio anubis (Olive baboon); Pongo abelii (Sumatran orangutan); Rattus norvegicus (Rat); Scophthalmus maximus (Turbot); Ustilago bromivora str. UB2112. |
|
Workaround: No workaround |
Inconsistency in transcripts numbering in GFF3 and GTF exported files |
|
Affects: Live site, Ensembl 102, 103, 104, 105, 106, 107, 108 | Expected fix: Ensembl 110 |
Description: A bug report brought to our attention that inconsistencies may appear in particular cases between the GFF3 and GTF files available on our FTP site. Sometimes, depending on the data underlying our dumps, the number of transcripts retrieved for the same species may differ from one file to the other.
The main difference between GTF and GFF3 dumping is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice; see https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199 ). This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene. We plan to fix this from 106 onwards. |
|
Workaround: No workaround, other than using the most up-to-date datasets. |
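To check whether a particular pair of files is affected, the transcript identifiers in each can be compared directly. The sketch below assumes the usual Ensembl dump conventions: GTF rows of type ‘transcript’ carrying a `transcript_id "..."` attribute, and GFF3 transcript rows carrying an `ID=transcript:...` attribute. The function names and the sample rows (with made-up IDs G1, T1 and T2) are hypothetical.

```python
import re

GTF_ID = re.compile(r'transcript_id "([^"]+)"')
GFF3_ID = re.compile(r"(?:^|;)ID=transcript:([^;]+)")


def _collect_ids(lines, pattern, feature_type=None):
    """Collect IDs matched by `pattern` in column 9 of tab-separated rows."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature row
        if feature_type is not None and fields[2] != feature_type:
            continue
        match = pattern.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def gtf_transcript_ids(lines):
    return _collect_ids(lines, GTF_ID, feature_type="transcript")


def gff3_transcript_ids(lines):
    # GFF3 transcript rows have varied types (mRNA, lnc_RNA, ...), so the
    # assumed "ID=transcript:" prefix is used instead of the type column.
    return _collect_ids(lines, GFF3_ID)


# Made-up sample rows: the GFF3 dump lacks transcript T2.
gtf_sample = [
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    '1\tensembl\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tensembl\ttranscript\t400\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
gff3_sample = [
    "##gff-version 3",
    "1\tensembl\tgene\t100\t900\t.\t+\t.\tID=gene:G1",
    "1\tensembl\tmRNA\t100\t500\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
]

missing_from_gff3 = gtf_transcript_ids(gtf_sample) - gff3_transcript_ids(gff3_sample)
print(sorted(missing_from_gff3))
```

Both functions accept any iterable of lines, so a compressed dump can be passed as `gzip.open(path, "rt")` without loading the whole file into memory.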
Vertebrates species and gene trees lack divergence times |
|
Affects: Live site, Ensembl 108 | Expected fix: Ensembl 110 |
Description: There was a failure to load TimeTree divergence times in Ensembl 108 and 109, most likely due to a parse failure on timetree.org webpages with recently updated layouts. As a result, the number of taxon divergence times displayed in species and gene trees is reduced by almost 90% in Ensembl 109, and by 100% in Ensembl 108. Missing divergence times may appear as an empty field, as the string ‘~0 MYA’, or as the default string ‘NO_WORK’.
We plan to fix this from Ensembl 110 onwards. |
|
Workaround: Divergence times are still displayed in the Ensembl 107 archive site: http://jul2022.archive.ensembl.org/index.html |
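When processing exported tree data from the affected releases, it may help to treat all three placeholder forms as missing values. A minimal sketch, assuming the divergence time arrives as a string field; the helper name and the sample value ‘~90 MYA’ are hypothetical.

```python
# Placeholder forms listed in the description of this issue.
MISSING_MARKERS = {"", "~0 MYA", "NO_WORK"}


def is_missing_divergence_time(value):
    """Return True if the field holds one of the known placeholder forms."""
    return value is None or value.strip() in MISSING_MARKERS


for field in ["NO_WORK", "~0 MYA", "", "~90 MYA"]:
    print(repr(field), is_missing_divergence_time(field))
```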
Human HGVS lookup not available in 109 |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: For human (GRCh38), searching for a variant by HGVS ID is not enabled on the website for release 109. | |
Workaround: This option will be re-enabled in release 110. |
Broken Cactus HAL alignment on the web for Brugia malayi |
|
Affects: Live site, Ensembl 105, 106, 107, 108 | Expected fix: Ensembl 110 |
Description: For Brugia malayi, the Cactus HAL alignment in the “Example region” shows an error. The issue does not affect the whole Brugia malayi genome: there are regions for which the display works. Moreover, pairwise alignments against some species (for example, C. brenneri) display correctly. The underlying bug in the HAL multiple alignment code has been identified and fixed for Ensembl release 110. |
|
Workaround: This issue will be fixed in Ensembl 110. |
Assembly name does not match its source data for some Metazoan species |
|
Affects: Live site | Expected fix: Ensembl 110 |
Description: We detected a bug in our code that omitted the existing assembly name from some imported genomes and fell back to using the GCA accession instead. This has been resolved in Ensembl 110 for all affected species, and the bug has been fixed in our codebase. |
|
Workaround: This issue will be fixed in Ensembl 110. |