Ensembl 107

Variation BioMart missing some datasets for Chicken and Mouse

Affects: Live site Expected Versions: Ensembl 107
Description: Due to unforeseen circumstances, variant consequence predictions are not included in BioMart in this release for the following species:
* Chicken (Gallus gallus)
* Mouse (Mus musculus)
Workaround: Use the archived version from the previous release.

Malfunctioning search for Hyaloperonospora arabidopsidis

Affects: Live site Expected Versions: Ensembl 107
Description: Users may find that searching for Hyaloperonospora arabidopsidis on our sites returns it as a fungus. However, please note that this species has now been placed back in Ensembl Protists.
Workaround: The link to this species in Ensembl Protists is: [http://protists.ensembl.org/Hyaloperonospora_arabidopsidis]

Broken links in fungal compara

Affects: Live site Expected Versions: Ensembl 107
Description: We are in the process of moving Hyaloperonospora arabidopsidis out of Ensembl Fungi and back into Ensembl Protists. There may be some dead links on the fungal site until this is completely resolved. Thank you for your patience. All will be resolved by release 108.
Workaround: No workaround

Gene synonym name issue

Affects: Live site, Mirrors Expected Versions: Ensembl 106, Ensembl 107, Ensembl 108
Description: Example (human 106): AKT2 has the synonym ‘PKBβ’.
However, the Greek character is not displayed correctly on the website or through the API.

The problem is caused by the character encoding in the core DB – `latin1` – which cannot represent Greek characters.

To keep the impact to a minimum, we might consider changing the character encoding/collation of the `external_synonym` table only.
In this case though, char encoding/collation would be inconsistent across the tables in core schema.
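
As a rough illustration of why a `latin1` column cannot hold this synonym, the sketch below uses plain Python string codecs. It is not Ensembl code: the helper names `fits_latin1` and `misread_as_latin1` are hypothetical, and Python's `latin-1` codec (ISO-8859-1) is only an approximation of MySQL's `latin1`, although neither character set contains β.

```python
# Illustration only: plain Python codecs, not the Ensembl API or schema.

def fits_latin1(text: str) -> bool:
    """Return True if every character in `text` can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def misread_as_latin1(text: str) -> str:
    """Show how UTF-8 bytes look when they are interpreted as latin-1."""
    return text.encode("utf-8").decode("latin-1")


synonym = "PKBβ"
print(fits_latin1("AKT2"))         # plain ASCII gene name
print(fits_latin1(synonym))        # synonym containing a Greek letter
print(misread_as_latin1(synonym))  # typical garbled form of the synonym
```

The second helper shows one common way such text ends up garbled: bytes written as UTF-8 and read back under a single-byte character set.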

Workaround: None

Pig breed list labeled incorrectly

Affects: Live site, Mirrors Expected Versions: Ensembl 107
Description: The list of Pig breeds has been labeled as “Pig – Duroc breeds”, which is inaccurate.
Workaround: This will be corrected in Ensembl release 108.

Incorrect Species name for Cebus capucinus

Affects: Live site, Mirrors, Mobile, Archives Expected Versions: Ensembl 107
Description: NCBI Taxonomy information for Cebus capucinus has been updated to Cebus imitator. The Ensembl representation still exists as Cebus capucinus [https://www.ensembl.org/Cebus_capucinus/Info/Index]
Workaround: This will be updated in Ensembl release 108

Corrupted conservation scores in Compara MySQL dump file

Affects: Live site Expected Versions: Ensembl 106, Ensembl 107
Description: During the Ensembl Compara 106 production process, data corruption occurred in the MySQL BLOB columns used to store the ‘expected_score’ and ‘diff_score’ in the Compara ‘conservation_score’ table. This corruption was identified during the course of Compara MySQL data dumps, and the Compara release database ‘conservation_score’ table was restored from a backup containing the correct data.

However, the fix was not applied to the conservation-score MySQL dump file at the following location:

[https://ftp.ensembl.org/pub/release-106/mysql/ensembl_compara_106/conservation_score.txt.gz]

As a result, while a query against ‘ensembl_compara_106’ on the Ensembl public MySQL server will return correct conservation-score data, the same query applied to a locally installed ‘ensembl_compara_106’ database may encounter an error or return invalid conservation-score data.

Because the issue was fixed in the ‘ensembl_compara_106’ release database, it will not affect Ensembl Compara 107 or later.

Workaround: The Ensembl 105 ‘conservation_score’ MySQL dump file [http://ftp.ensembl.org/pub/release-105/mysql/ensembl_compara_105/conservation_score.txt.gz] can be used as a drop-in replacement for the 106 version for the species sets ‘amniotes’, ‘mammals’, ‘pig_breeds’ and ‘sauropsids’, because there were no changes in the conservation-score data for these species sets between Ensembl 105 and 106. Unfortunately, there is no equivalent workaround for the ‘fish’ species set, because its alignment and associated conservation scores were updated in Ensembl 106. It is safe to have the Ensembl 105 fish conservation scores in a Compara 106 database, as their database IDs don’t overlap with those of the Ensembl 106 fish alignments. However, they will take up space in the ‘conservation_score’ table. If this is a problem, the 105 fish conservation scores can be removed by deleting rows from the ‘conservation_score’ table that have a ‘genomic_align_block_id’ between 19720000000001 and 19729999999999 (inclusive).
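
The row deletion described above could look roughly like the following sketch. It runs against a toy in-memory SQLite table so that it is self-contained; a real local Compara database is MySQL and its ‘conservation_score’ table has more columns than shown here. The function name `delete_105_fish_scores` and the toy rows are hypothetical; only the table name, the ‘genomic_align_block_id’ column and the ID range come from the text above.

```python
import sqlite3

# ID range of the Ensembl 105 fish conservation scores (inclusive), as given above.
FISH_105_MIN = 19720000000001
FISH_105_MAX = 19729999999999


def delete_105_fish_scores(conn: sqlite3.Connection) -> int:
    """Delete rows whose genomic_align_block_id is in the 105 fish range.

    Returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM conservation_score "
        "WHERE genomic_align_block_id BETWEEN ? AND ?",
        (FISH_105_MIN, FISH_105_MAX),
    )
    conn.commit()
    return cur.rowcount


# Toy stand-in for a locally installed database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conservation_score ("
    "genomic_align_block_id INTEGER, diff_score BLOB)"
)
conn.executemany(
    "INSERT INTO conservation_score VALUES (?, ?)",
    [
        (19719999999999, b""),  # just below the range: kept
        (19720000000001, b""),  # lower bound: removed
        (19725000000000, b""),  # inside the range: removed
        (19729999999999, b""),  # upper bound: removed
        (19730000000001, b""),  # just above the range: kept
    ],
)

removed = delete_105_fish_scores(conn)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT genomic_align_block_id FROM conservation_score ORDER BY 1"
    )
]
print(removed, remaining)
```

`BETWEEN` is inclusive at both ends, which matches the range stated in the workaround. Running a `SELECT COUNT(*)` with the same condition before deleting is a cheap sanity check.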

PolyPhen-2 and SIFT scores not available for new human transcripts

Affects: Live site Expected Versions: Ensembl 107
Description: Translations of human transcripts which are new to Ensembl release 107 will not have SIFT or PolyPhen-2 predictions available in Variation or Transcript views in the browser for this release. These data will also be missing from the Ensembl VEP cache. Predictions will be available in our 108 release.
Workaround: A separate database of predictions is available for Ensembl VEP command line use. This can be added to VEP analyses using the PolyPhen_SIFT VEP plugin [https://github.com/Ensembl/VEP_plugins/blob/postreleasefix/107/PolyPhen_SIFT.pm]

Update the db_version of refseq_import

Affects: Live site Expected Versions: Ensembl 107
Description: Users cannot tell which version of the RefSeq annotation is loaded in the otherfeatures database.
Workaround: The database stores the timestamp of the loaded GFF3 file. Users can compare this timestamp with the information on the RefSeq website to determine when the annotation was produced and whether a newer file has been created since.

The lack of documentation on flagging high-confidence orthologies

Affects: Live site Expected Versions: Ensembl 105, Ensembl 106, Ensembl 107
Description: Our orthology predictions are flagged as being high-confidence or not. Public documentation on the classification criteria is available at [https://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html], section “High-confidence orthologies”, but it is incomplete. Users interested in the inference procedure have to contact the Outreach team for more information or delve into [https://github.com/Ensembl/ensembl-compara/] to work it out on their own.
Workaround: The Outreach team answers user requests.

Stable ID mapping missing for new bird references

Affects: Live site Expected Versions: Ensembl 107
Description: The stable IDs for the following species have not been mapped from their previous versions:
* Small Tree Finch (Camarhynchus_parvulus_V1.1)
* New Caledonian crow (bCorMon1.pri)
* Budgerigar (bMelUnd1.mat.Z)
Workaround: The stable ID mapping will be available in Ensembl release 108 and on rapid.ensembl.org from May 2022.

Missing RNASeq data for GRCg7w and GRCg7b

Affects: Live site Expected Versions: Ensembl 107
Description: The RNASeq tracks and alignment files are not available for the new Chicken annotations (GRCg7w and GRCg7b).
Workaround: The tracks and data will be available in Ensembl release 108 and on rapid.ensembl.org from May 2022.

Data dumps for B. tabaci Uganda-1 direct to a 404 error

Affects: Live site, Mirrors Expected Versions: Ensembl 106, Ensembl 107
Description: The Download Fasta/Genes/GFF3 links direct to a 404 error. This is caused by a naming change for this species: the original name was changed from sweetpotug to uganda-1.
Workaround: Navigate directly to the Ensembl Genomes FTP site: [http://ftp.ensemblgenomes.org/pub/metazoa/release-53/]

Alignment of External Feature – unable to retrieve sequence from Uniprot/SwissProt

Affects: Live site, Mirrors Expected Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105, Ensembl 106, Ensembl 107
Description: The Alignment of External Feature page for ENST00000216181.11 MYH9-201 is unable to retrieve the sequence from UniProt/SwissProt. This is the link to the page: [https://dec2021.archive.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=uniprot/swissprot;g=ENSG00000100345;r=22:36281280-36388010;sequence=P35579.240;t=ENST00000216181]

The URL incorrectly contains the UniProt entry and its version (P35579.240), and this leads to the problem.

Workaround: Manually remove the version number from the UniProt entry in the URL. E.g. [https://dec2021.archive.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=uniprot/swissprot;g=ENSG00000100345;r=22:36281280-36388010;sequence=P35579.240;t=ENST00000216181] becomes

[https://dec2021.archive.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=uniprot/swissprot;g=ENSG00000100345;r=22:36281280-36388010;sequence=P35579;t=ENST00000216181]
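
For users who need to fix many such links, the manual edit can be sketched as a small Python helper. This is only an illustration under the assumption that the version always appears as a dot followed by digits directly after the accession in the `sequence` parameter; the function name `strip_uniprot_version` is hypothetical.

```python
import re


def strip_uniprot_version(url: str) -> str:
    """Remove a trailing '.<digits>' version from the 'sequence' parameter.

    Other parameters (including dotted or ranged values elsewhere in the
    URL) are left untouched because the pattern is tied to 'sequence='.
    """
    return re.sub(r"(sequence=[A-Za-z0-9]+)\.\d+", r"\1", url)


broken = (
    "https://dec2021.archive.ensembl.org/Homo_sapiens/Transcript/Similarity/"
    "Align?db=core;extdb=uniprot/swissprot;g=ENSG00000100345;"
    "r=22:36281280-36388010;sequence=P35579.240;t=ENST00000216181"
)
fixed = strip_uniprot_version(broken)
print(fixed)
```

Applying the helper to an already corrected URL leaves it unchanged, so it is safe to run over a mixed list of links.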

Gene family missing CAFE analysis

Affects: Live site Expected Versions: Ensembl 107
Description: The gene tree ENSGT00940000154136 [https://www.ensembl.org/Multi/GeneTree/Image?gt=ENSGT00940000154136], which has almost 900 members, won’t have a CAFE analysis in this release due to runtime issues that couldn’t be addressed in time during production.
Workaround: No workaround

No gene trees and no homologies for a supertree

Affects: Live site Expected Versions: Ensembl 107
Description: Due to the long runtime, we had to cancel processing of a gene family with 54,865 members classified together by PANTHER HMM PTHR24015 (more information on the family: [http://www.pantherdb.org/panther/family.do?clsAccession=PTHR24015]). The genes come from the following 108 species: actinidia chinensis, aegilops tauschii, amborella trichopoda, ananas comosus, arabidopsis halleri, arabidopsis lyrata, arabidopsis thaliana, arabis alpina, asparagus officinalis, beta vulgaris, brachypodium distachyon, brassica napus, brassica oleracea, brassica rapa ro18, caenorhabditis elegans, camelina sativa, cannabis sativa female, capsicum annuum, chara braunii, chenopodium quinoa, chlamydomonas reinhardtii, chondrus crispus, ciona savignyi, citrullus lanatus, citrus clementina, coffea canephora, corchorus capsularis, corylus avellana, corymbia citriodora, cucumis melo, cucumis sativus, cyanidioschyzon merolae, cynara cardunculus, daucus carota, digitaria exilis, dioscorea rotundata, drosophila melanogaster, echinochloa crusgalli, eragrostis curvula, eragrostis tef, eucalyptus grandis, eutrema salsugineum, ficus carica, galdieria sulphuraria, glycine max, gossypium raimondii, helianthus annuus, homo sapiens, hordeum vulgare, ipomoea triloba, juglans regia, kalanchoe fedtschenkoi, lactuca sativa, leersia perrieri, lupinus angustifolius, malus domestica golden, manihot esculenta, marchantia polymorpha, medicago truncatula, musa acuminata, nicotiana attenuata, nymphaea colorata, olea europaea, oryza barthii, oryza brachyantha, oryza glaberrima, oryza glumipatula, oryza indica, oryza longistaminata, oryza meridionalis, oryza nivara, oryza punctata, oryza rufipogon, oryza sativa, ostreococcus lucimarinus, panicum hallii, papaver somniferum, phaseolus vulgaris, physcomitrium patens, pistacia vera, pisum sativum, populus trichocarpa, prunus avium, prunus dulcis, prunus persica, quercus lobata, rosa chinensis, saccharomyces cerevisiae, saccharum spontaneum, secale cereale, selaginella moellendorffii, sesamum indicum, setaria italica, setaria viridis, solanum lycopersicum, solanum tuberosum, sorghum bicolor, theobroma cacao, trifolium pratense, triticum aestivum, triticum dicoccoides, triticum turgidum, triticum urartu, vigna angularis, vigna radiata, vigna unguiculata, vitis vinifera, zea mays

There will be no gene trees for the 54,865 sequences involved nor any homology information.

Workaround: No workaround

Missing C. elegans homologies information from BioMart

Affects: Live site Expected Versions: Ensembl 107
Description: All homologues are missing for C. elegans in the Vertebrates BioMart. As a result, it is no longer possible to filter on this criterion.
Workaround: Use previous archives or the Metazoa BioMart, where the data are present.

Duplicate species in protists and fungi

Affects: Live site Expected Versions: Ensembl 105, Ensembl 106, Ensembl 107
Description: Hyaloperonospora arabidopsidis (version 2) was accidentally loaded into Fungi release 105 and incorporated into fungal compara. While some communities think of this as a fungus (e.g. FungiDB, where we got the species from), in Ensembl it has traditionally been a protist, so we already have an older version (version 1) in protists. To keep with how we have thought about this in Ensembl, it may be sensible to only have this species in protists. Data dumps, compara, search indices etc. have made this really complex. Temporarily, the species has gone off Ensembl Protists. It will be reinstated in release 107. Until then, please use our archive sites for this data. Meanwhile, there is a newer version in Ensembl Fungi, which we will remove in release 108. After that point, this species will go back to existing only in Ensembl Protists.
Workaround: No workaround

Missing alias in meta table for Rat

Affects: Live site, Staging Expected Versions: Ensembl 105, Ensembl 106, Ensembl 107
Description: The Rat DB for 105 is missing aliases for the species name in the ‘meta’ table, compared to release 104. Specifically, the records with meta.meta_key = ‘species.alias’ are missing.

As a result, the Perl API (Registry) can query the DB using the “species.production_name” only, which is “rattus_norvegicus”.
Also, this must be an exact string match.

Although this is not a blocking issue, it was customary to use aliases – e.g. “Rat”, “Rattus norvegicus” – to query the DB.

Workaround: To identify the species, use the string stored in the meta table under meta_key = “species.production_name”. For rat, the string is “rattus_norvegicus” (without quotes).

Broken Cactus HAL alignment on the web for Brugia malayi

Affects: Live site, Archives Expected Versions: Ensembl 107
Description: For Brugia malayi, the Cactus HAL alignment in the “Example region” throws the following error:
AJAX error – Runtime Error in component “EnsEMBL::Web::Component::Location::Compara_AlignSliceBottom [content]”
The issue does not affect the whole Brugia malayi genome. There are regions in Brugia malayi for which the display works.
Workaround: No workaround

Missing ancestral allele information

Affects: Live site Expected Versions: Ensembl 107
Description: For human, variants imported from COSMIC and ClinVar are missing ancestral allele information. This also affects BioMart. Ensembl VEP is unaffected.
Workaround: Older versions of these data are available on the Ensembl 106 archive site.