Ensembl 112

Vertebrates Core-like DB Datachecks failures

Affects: Live Site Versions: Ensembl 112, Ensembl 113
Description: Core-like datachecks failed for the otherfeatures and rnaseq databases. This will be fixed for Release 113.
Workaround: No workaround. Please use the most up to date datasets.

Drosophila rhopaloa has the incorrect GCA accession

Affects: Rapid Release Versions: Ensembl 112
Description: Due to a mistake in our loading system, the Drosophila rhopaloa core db was assigned GCA_000236305.2 in its database name and in its assembly.accession and species.production_name meta keys, when the assembly and annotation actually loaded into the core database was GCF_018152115.1. Note: this also affects Rapid Release, where drosophila_rhopaloa_gca000236305v2rs_compara_110 and drosophila_rhopaloa_gca000236305v2rs_core_110 should be drosophila_rhopaloa_gca018152115v1rs_*.
Workaround: No workaround. Please use the most up to date datasets.
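
A mismatch of this kind can be spotted by comparing a database name against the accession it is supposed to carry. The following is a minimal sketch, not part of the Ensembl tooling. The naming pattern (`gca` + digits + `v` + version) is an assumption inferred only from the two database names quoted above, and the function names are hypothetical.

```python
import re


def accession_fragment(accession):
    """Turn e.g. 'GCF_018152115.1' into 'gca018152115v1'.

    Assumed convention, inferred from the database names in this entry.
    """
    match = re.fullmatch(r"GC[AF]_(\d+)\.(\d+)", accession)
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    digits, version = match.groups()
    return f"gca{digits}v{version}"


def name_matches_accession(db_name, accession):
    """Return True if db_name contains the fragment derived from accession."""
    fragment = accession_fragment(accession)
    # Reject a longer version number, e.g. 'v12' when 'v1' is expected.
    return re.search(re.escape(fragment) + r"(?!\d)", db_name) is not None


db_name = "drosophila_rhopaloa_gca000236305v2rs_core_110"
print(name_matches_accession(db_name, "GCA_000236305.2"))
print(name_matches_accession(db_name, "GCF_018152115.1"))
```

Here the database name agrees with the wrongly assigned GCA accession but not with the GCF accession of the assembly that was actually loaded.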

Drop in paralogues from 110 to 111 for 2 species

Affects: Live Site Versions: Ensembl 111, Ensembl 112
Description: There is a significant drop in the number of paralogues for these 2 species: Cebus imitator and Heterocephalus glaber female. We hope to fix this in Ensembl 113.
Workaround: Both species are in Ensembl 110. We recommend that users interested in ncRNA for these 2 species use the 110 archive.

Missing somatic variation from ClinVar

Affects: Live Site Versions: Ensembl 111, Ensembl 112
Description: Somatic variation from ClinVar is missing from BioMart (https://www.ensembl.org/info/data/biomart).
Workaround: The somatic variation from COSMIC is available in BioMart.

Missing MAFs

Affects: Live Site Versions: Ensembl 111, Ensembl 112, Ensembl 113
Description: During the update to dbSNP156 in Ensembl 111, MAF data was not imported directly into the database. The missing data will be included in Ensembl 113.
Workaround: No workaround. Please use the most up to date datasets.

VEP Missing data for Bacteria

Affects: Live Site Versions: Ensembl 110, Ensembl 111, Ensembl 112
Description: With our fresh load of newly annotated Bacteria species, we were unable to compute the VEP-related dataset for a subset of species:

* actinomyces_naeslundii_gca_002860635

* actinomyces_urogenitalis_gca_002861525

* actinotignum_timonense_gca_002860725

* aerococcus_christensenii_gca_002861505

* bifidobacterium_longum_gca_002861445

* brevibacterium_ravenspurgense_gca_002861415

* corynebacterium_amycolatum_gca_002861405

* corynebacterium_aurimucosum_gca_002861385

* corynebacterium_coyleae_gca_002861345

* corynebacterium_coyleae_gca_002861365

* corynebacterium_riegelii_gca_002861325

* corynebacterium_tuscaniense_gca_002884935

* fusobacterium_nucleatum_gca_002884895

* gardnerella_vaginalis_gca_0028611

* gardnerella_vaginalis_gca_002861145

* gardnerella_vaginalis_gca_002861885

* gardnerella_vaginalis_gca_002861905

* gardnerella_vaginalis_gca_002861925

* gardnerella_vaginalis_gca_002862005

* gardnerella_vaginalis_gca_002884775

* kocuria_rhizophila_gca_002861865

* lactobacillus_crispatus_gca_002861805

* lactobacillus_crispatus_gca_002861815

* limosilactobacillus_pontis_gca_002940945

* micrococcus_luteus_gca_002863375

* micrococcus_luteus_gca_002884675

* moraxella_osloensis_gca_002863315

* neisseria_perflava_gca_002863305

* neisseria_sicca_gca_002863285

* oligella_urethralis_gca_002884655

* prevotella_buccalis_gca_002884635

* rothia_mucilaginosa_gca_002861015

* staphylococcus_pettenkoferi_gca_002884615

* staphylococcus_sp_umb0328_gca_002940975

* streptococcus_macedonicus_gca_002860805

* streptococcus_mitis_gca_002860825

* streptococcus_mitis_gca_002860865

* streptococcus_oralis_subsp_dentisani_gca_002860885

* streptococcus_oralis_subsp_dentisani_gca_002860905

* streptococcus_parasanguinis_gca_002860845

* streptococcus_salivarius_gca_002860765

* streptococcus_salivarius_gca_002860785

* winkia_neuii_gca_002860625

Therefore those annotations won’t be available for VEP computation this release.

Workaround: Because the annotation is new, the previous datasets are not entirely compatible, but they may still be used.

Gene Ontology Annotation drop for Plants

Affects: Live Site Versions: Ensembl 107, Ensembl 108, Ensembl 109, Ensembl 110, Ensembl 111, Ensembl 112
Description: We have observed a big drop in the number of Xrefs imported with the Gene Ontology annotation set for plants.

GO Direct Xref import with a drop of 80% or more for Interpro, Uniprot/SWISSPROT, Uniprot/SPTREMBL:

* arabidopsis_thaliana

* arabis_alpina

* eutrema_salsugineum

* ostreococcus_lucimarinus

* pisum_sativum

* prunus_avium

Project GO Xrefs import with a drop of 66% or more for RHEA – Uniprot – Interpro – GOC:

* actinidia_chinensis

* aegilops_tauschii

* amborella_trichopoda

* ananas_comosus

* arabidopsis_lyrata

* arabis_alpina

* asparagus_officinalis

* brachypodium_distachyon

* brassica_napus

* brassica_oleracea

* capsicum_annuum

* chara_braunii

* chlamydomonas_reinhardtii

* chondrus_crispus

* citrus_clementina

* coffea_canephora

* corchorus_capsularis

* cucumis_melo

* cucumis_sativus

* cynara_cardunculus

* daucus_carota

* eucalyptus_grandis

* eutrema_salsugineum

* glycine_max

* gossypium_raimondii

* helianthus_annuus

* hordeum_vulgare

* hordeum_vulgare_goldenpromise

* hordeum_vulgare_tritex

* juglans_regia

* lupinus_angustifolius

* malus_domestica_golden

* manihot_esculenta

* marchantia_polymorpha

* medicago_truncatula

* nicotiana_attenuata

* nymphaea_colorata

* oryza_indica

* oryza_sativa

* ostreococcus_lucimarinus

* panicum_hallii

* panicum_hallii_fil2

* papaver_somniferum

* phaseolus_vulgaris

* physcomitrium_patens

* pisum_sativum

* populus_trichocarpa

* prunus_avium

* prunus_dulcis

* prunus_persica

* selaginella_moellendorffii

* sesamum_indicum

* setaria_italica

* setaria_viridis

* solanum_tuberosum_rh8903916

* sorghum_bicolor

* theobroma_cacao

* triticum_urartu

* vigna_angularis

* vitis_vinifera

* zea_mays

Workaround: No workaround. Please use the most up to date datasets.

Inaccurate gene member homology stats in non-default gene trees

Affects: Live Site Versions: Ensembl 106, Ensembl 107, Ensembl 108, Ensembl 109, Ensembl 110, Ensembl 111, Ensembl 112
Description: In non-default gene-tree pipelines such as Murinae (in Ensembl Vertebrates) or Protostomes (in Ensembl Metazoa), homology data is removed if it would clash with the corresponding default collection (e.g. Mouse-Rat orthologies are removed from Murinae homology data, while Loa loa paralogies are removed from Protostomes homologies).

The gene-tree pipeline for non-default collections was restructured in Ensembl 106 to ensure that such clashing homology data would be removed correctly. However, in the restructured pipeline, clashing homologies are removed after the calculation of gene-member homology statistics, with the result that those statistics are inaccurate. When homology data for non-default gene trees was checked in Ensembl 110, the reported gene-member homology stats were found to have a Pearson correlation coefficient with the corrected gene-member homology stats in the range ~95-99%.

We plan to correct the calculation of these statistics in Ensembl release 115.

In the meantime, gene-member homology counts for collections other than ‘default’ should be considered as approximate, whether accessed from the ‘gene_member_hom_stats’ table of a Compara database or via Compara Perl API methods such as ‘GeneMember::number_of_orthologues’ or ‘GeneMember::number_of_paralogues’.

Workaround: If precise gene-member homology statistics are needed for non-default collections, these could be calculated from the homology data.
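
As an illustration of this workaround, the sketch below recounts orthologues and paralogues per gene member from homology rows that have already been fetched. It makes several assumptions: each row is a (homology_id, gene_member_id, description) tuple with one row per gene member per homology, and the description values follow the prefix and suffix patterns used in the example data. The function names are hypothetical, and this is not the method used by the Compara pipeline itself.

```python
from collections import Counter, defaultdict


def classify(description):
    """Map a homology description to a stats column (assumed patterns)."""
    if description.startswith("ortholog"):
        return "orthologues"
    if description.endswith("paralog"):
        return "paralogues"
    return None  # other homology types are ignored in this sketch


def count_homologies(rows):
    """Count homologies per gene member.

    rows: iterable of (homology_id, gene_member_id, description),
    one row per gene member taking part in each homology.
    """
    stats = defaultdict(Counter)
    for _homology_id, gene_member_id, description in rows:
        kind = classify(description)
        if kind is not None:
            stats[gene_member_id][kind] += 1
    return stats


# Hypothetical rows, as they might be read from the homology data.
rows = [
    (1, 101, "ortholog_one2one"),
    (1, 202, "ortholog_one2one"),
    (2, 101, "within_species_paralog"),
    (2, 102, "within_species_paralog"),
    (3, 101, "ortholog_one2many"),
    (3, 203, "ortholog_one2many"),
]
stats = count_homologies(rows)
print(stats[101]["orthologues"], stats[101]["paralogues"])
```

Counts obtained this way reflect the homology data as it stands after clashing homologies were removed, which is why they can differ from the stored statistics.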

Missing Interpro Data

Affects: Live Site Versions: Ensembl 106, Ensembl 107, Ensembl 108, Ensembl 109, Ensembl 110, Ensembl 111, Ensembl 112
Description: Several species are missing UniProt data. This is easily observed by the missing TSV file. All protein sequences should have a UniProt ID (with the exception of bacteria) within a month of submission. Newly updated or added organisms are the worst affected, and a majority of Fungi are missing these annotations.
Workaround: No workaround. Please use the most up to date datasets.

Canonical sequence discrepancies in Fungi and Protists Compara

Affects: Live Site Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105, Ensembl 106, Ensembl 107, Ensembl 110, Ensembl 111, Ensembl 112
Description: Due to a software bug, some gene members in Fungi and Protists Compara have been assigned canonical members which did not match the canonical sequence of their corresponding core gene. For a more severely affected subset of these, gene tree and homology views of genes with a discrepant canonical are inaccessible on the Ensembl website.

During work on Fungi Compara 112, the severely affected subset, comprising 6,538 (0.2%) genes in gene trees, was identified and fixed. A comparable subset of affected genes, consisting of 1,323 (0.1%) genes in gene trees, was subsequently identified in Protists. Though it was not feasible to fix the most severely affected genes in Protists Compara 112, these will be fixed for Ensembl 113, along with all canonical discrepancies in both the Fungi and Protists comparative databases.

This issue has been confirmed to have affected Protists comparative data in Ensembl releases 102-106, 110, and 112. Fungi Compara is affected in Ensembl releases 102-107, 110, 111, and (to a lesser extent) in 112.

Workaround: There is currently no workaround.

Subset of genes omitted from protein trees in Metazoa Compara

Affects: Live Site Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105, Ensembl 106, Ensembl 107, Ensembl 108, Ensembl 109, Ensembl 110, Ensembl 111, Ensembl 112
Description: During a cross-check of gene members used in comparative analyses such as protein trees, the gene sets of a number of genomes in Plants, Metazoa and Pan Compara were found to be incomplete in their respective Compara databases relative to the corresponding core databases. As a result, the affected genes had been inadvertently omitted from inference of protein trees and homologies.

This issue was addressed for most affected genomes in Ensembl 112 by reloading their gene sets from their respective core databases.

Gene sets of the following affected Metazoa species will be reloaded in Ensembl 113:

* Atta cephalotes (Leaf-cutter ant)

* Bombus impatiens (Common eastern bumblebee)

* Culex quinquefasciatus (Southern house mosquito, JHB)

* Glossina fuscipes (Tsetse fly, IAEA_lab_2018)

* Musca domestica (House fly, aabys)

* Nasonia vitripennis (Jewel wasp, AsymCx)

* Solenopsis invicta (Red fire ant, M01_SB)

* Stomoxys calcitrans (Stable fly, USDA)

Workaround: There is currently no workaround.

Canonical sequence discrepancy affecting 3299 genes in Plants Compara

Affects: Live Site Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105, Ensembl 106, Ensembl 107, Ensembl 108, Ensembl 109, Ensembl 110, Ensembl 111, Ensembl 112
Description: During routine pre-release checks of the Ensembl site, it was found that different canonical sequences had been used in Plants and Pan Compara for 3,299 of 34,310 genes (9.6%) in Brachypodium distachyon. Further investigation confirmed that the Plants Compara gene members were inconsistent with their corresponding core genes. This discrepancy will be fixed by reloading the affected gene members from their core database in Ensembl 113.
Workaround: There is currently no workaround.