Ensembl 115

Internal stop symbols in protein sequences of three vertebrate genomes
Affects: Ensembl 115
Fix version: Ensembl 116
During preparation of gene member data for Vertebrates comparative processing, protein-coding member sequences were patched so that each would be consistent with its corresponding translation in the core database.

This inadvertently reintroduced some protein sequences containing internal stop symbols (‘*’), which had previously been masked with ‘X’ amino-acid ambiguity codes.

There are 1201 affected protein-coding genes across 3 vertebrate genomes: 602 in Mus pahari, 594 in Mus caroli and 5 in Zebrafish.

The inclusion of these protein sequences with internal stop symbols may have affected aspects of protein-tree analyses such as alignment or phylogenetic placement.

The effects of this are expected to be modest in most cases. However, a minority of genes may be particularly affected. For example, in release 114 Mus caroli gene MGP_CAROLIEiJ_G0019362 was placed in the gene tree among other genes of the same species, while in release 115 this gene is placed among non-mammalian species.

Users are advised to interpret with caution the protein trees and homologies of coding genes whose canonical sequence contains an internal stop symbol.
Workaround: Ensembl Vertebrates release 114 includes comparative analyses in which stop symbols have been masked with ‘X’ amino-acid ambiguity codes. Users may wish to access these via the Ensembl 114 archive site.
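As an illustration of the masking described above, the following minimal Python sketch flags and masks internal stop symbols in a protein sequence. It is not the Ensembl pipeline code: the function names are hypothetical, and it assumes that a single trailing ‘*’ marks a legitimate terminal stop and should be left unchanged.

```python
def has_internal_stop(protein: str) -> bool:
    """Return True if a '*' occurs anywhere before the final position."""
    return "*" in protein[:-1]


def mask_internal_stops(protein: str) -> str:
    """Replace internal '*' symbols with 'X', keeping one trailing '*' if present.

    Assumption: a '*' in the last position is a terminal stop, not an internal one.
    """
    if protein.endswith("*"):
        return protein[:-1].replace("*", "X") + "*"
    return protein.replace("*", "X")
```

For example, mask_internal_stops("MKT*LLV*") gives "MKTXLLV*", and has_internal_stop can be used to pick out sequences whose trees and homologies deserve extra caution.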
Avena sativa cv. OT3098 homoeologues misclassified as paralogues
Affects: Ensembl 109, 110, 111, 112, 113, 114, 115, Ensembl Genomes
Fix version: Ensembl 116
When first made available in Ensembl Plants 109, Avena sativa cultivar OT3098 was loaded into Plants Compara databases without subgenome component information.

As a result, homologues that would otherwise be classified as homoeologues will have been annotated as paralogues.
Workaround: No current workaround
Some core statistics missing for human in Biomart 115
Affects: 115, BioMart
Fix version: Ensembl 116
Some core statistics, such as %GC content, were not built for human in the latest Biomart build.

These will be reprocessed and produced for the Ensembl 116 release.
Workaround: Use a recent Ensembl archive to export statistics such as %GC content for human.
Anopheles melas (Mosquito, CM1001059_A) – sequences not available in Biomart
Affects: 115, Ensembl Genomes, BioMart
Fix version: N/A
Gene info for Anopheles melas (Mosquito, CM1001059_A) was included in the latest Biomart build in error.

Sequences and other data will not be available via Biomart for this species. Sequence info is accessible via the browser and FTP sites.
Workaround: No current workaround
Missing Gene Ontology pages on Gene-based displays
Affects: Ensembl 114, 115, Ensembl Genomes, BioMart
Fix version: Ensembl 116
Gene Ontology (GO) pages on the Gene-based display tabs are unavailable on the live site. The affected pages are:

GO: Cellular component
GO: Biological process
GO: Molecular function

Please make use of the Ensembl archive sites to access these gene-based display pages.

This also affects Ensembl Genomes sites and Biomart. Please use the latest archive where possible.
Workaround: Please use the latest Ensembl archive site to access these pages.
Gene Ontology and Phenotype VEP plugins are non-functional for Plants and Metazoa
Affects: Ensembl 114, 115, Ensembl Genomes, BioMart
Fix version: N/A
Description:
Due to issues affecting the Phenotypes and Gene Ontology VEP plugins, these plugins are not functional for plant and metazoan species.
Workaround: There is currently no workaround. Avoid enabling these plugins to prevent VEP jobs failing.
Missing orthologue projections for Drosophila virilis
Affects: Live Site
Fix version: Ensembl 116
Description:

The Drosophila virilis assembly and annotation were updated in release 115, but orthologue projections are missing for this species.

Workaround: Users can access this information from the latest archive.

Archive of release 59 of EnsemblMetazoa: eg59-metazoa.ensembl.org (May 2024)
Some Vertebrates and Plants LastZ alignments have conflicting reference metadata
Affects: Live Site
Fix version: Ensembl 116
Description:

Due to a configuration consistency issue, some LastZ alignments have the same genome configured as both a ‘reference_genome’ and ‘non_reference_genome’. Eleven alignment datasets are affected: eight in Ensembl Plants and three in Ensembl Vertebrates.


This issue has resulted in these alignments being dumped to the Ensembl FTP with the reference genome switched.


For example, the LastZ alignment represented by ‘oana_mornana1.p.v1.v.mdom_asm229v1.lastz_net.tar.gz’ in release 114 is represented by ‘mdom_asm229v1.v.oana_mornana1.p.v1.lastz_net.tar.gz’ in release 115. Statistics pages of affected pairwise alignments refer to the same genome as both reference and non-reference.
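To map an affected release 115 FTP dump name back to its release 114 counterpart, the two genome components of the file name can be swapped. The sketch below is a hypothetical helper, not part of any Ensembl API. It assumes the naming pattern seen in the example above, ‘<genome>.v.<genome>.lastz_net.tar.gz’, and that ‘.v.’ occurs exactly once in the name.

```python
# Assumed suffix, taken from the example file names in the text above.
SUFFIX = ".lastz_net.tar.gz"


def swap_reference(filename: str) -> str:
    """Swap the two genome components of a LastZ dump file name."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {filename}")
    stem = filename[: -len(SUFFIX)]
    parts = stem.split(".v.")
    if len(parts) != 2:
        # The '.v.' separator is missing or ambiguous; refuse to guess.
        raise ValueError(f"cannot split genome names in: {filename}")
    first, second = parts
    return f"{second}.v.{first}{SUFFIX}"
```

Applied to ‘mdom_asm229v1.v.oana_mornana1.p.v1.lastz_net.tar.gz’, this returns ‘oana_mornana1.p.v1.v.mdom_asm229v1.lastz_net.tar.gz’. Swapping only changes the name; it does not alter which genome the file contents treat as the reference.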


Furthermore, Compara Perl API method ‘MethodLinkSpeciesSet::find_pairwise_reference’ may return incorrect results for affected pairwise alignment MLSSes.


The affected Ensembl Plants datasets are:
Oryza nivara vs Oryza sativa Japonica Group (MLSS ID: 9262)
Oryza barthii vs Oryza sativa Japonica Group (MLSS ID: 9269)
Oryza punctata vs Oryza sativa Japonica Group (MLSS ID: 9275)
Oryza glumipatula vs Oryza sativa Japonica Group (MLSS ID: 9280)
Oryza meridionalis vs Oryza sativa Japonica Group (MLSS ID: 9434)
Oryza longistaminata vs Oryza sativa Japonica Group (MLSS ID: 9442)
Triticum aestivum vs Oryza sativa Japonica Group (MLSS ID: 9631)
Triticum dicoccoides vs Triticum aestivum (MLSS ID: 9811)

The affected Ensembl Vertebrates datasets are:
Lamprey vs Ciona intestinalis (MLSS ID: 798)
Zebrafish vs Japanese medaka (MLSS ID: 1285)
Platypus vs Opossum (MLSS ID: 1819)
Workaround: All affected Plants alignments are available in the Ensembl Plants archive of May 2024.

Affected Vertebrates alignments can be accessed via the Ensembl release 114 archive.
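
Since ‘MethodLinkSpeciesSet::find_pairwise_reference’ may be unreliable for these datasets, scripts can check an MLSS ID against the lists above before relying on reference metadata. A minimal Python sketch follows. The dictionary and function names are hypothetical, the IDs are copied from the lists above, and the check is keyed by division on the assumption that an MLSS ID is only meaningful within its own Compara database.

```python
# MLSS IDs of the affected LastZ alignments, as listed in this entry.
AFFECTED_LASTZ_MLSS_IDS = {
    "plants": {9262, 9269, 9275, 9280, 9434, 9442, 9631, 9811},
    "vertebrates": {798, 1285, 1819},
}


def is_affected_lastz(division: str, mlss_id: int) -> bool:
    """Return True if the MLSS ID is one of the affected LastZ alignments."""
    return mlss_id in AFFECTED_LASTZ_MLSS_IDS.get(division.lower(), set())
```

For instance, is_affected_lastz("vertebrates", 1819) is True for the Platypus vs Opossum alignment, in which case the release 114 archive is the safer source.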

Three contributing ncRNA trees not used in TreeBeST inference

Affects: Live Site
Fix version: Ensembl 116
Description:

Due to a pipeline synchronisation issue, three ncRNA trees were inferred by TreeBeST before all contributing trees were ready, and as a result those contributing trees were not used when inferring the final gene tree.

* The phylogeny inferred by PhyML using a genomic alignment (‘pg_it_phyml’) did not contribute to the inference of the gene tree with stable_id ‘RF00929’.

* The phylogeny inferred by FastTree using a genomic alignment (‘ftga_it_nj’) did not contribute to the inference of the gene tree containing Mouse gene Gm22811 (ENSMUSG00000088093).

* The phylogeny inferred by RAxML using the S16 model (‘ss_it_s16’) did not contribute to the inference of the gene tree containing Abingdon island giant tortoise gene ENSCABG00000006715.

We aim to fix this issue in release 116.

Workaround: For the first two gene trees listed, we recommend accessing their counterparts in the release 114 Ensembl archive.

For Abingdon island giant tortoise gene ENSCABG00000006715, the most recent available alternative gene tree can be accessed from the Ensembl release 96 database or FTP dumps.

Missing GO annotations for 29 plant species

Affects: Live Site
Fix version: Ensembl 116
Description:

Due to a faulty pipeline, GO annotations are missing for the following 29 plant species:

Actinidia chinensis
Aegilops umbellulata
Amborella trichopoda
Ananas comosus
Arabidopsis lyrata
Arabis alpina
Cajanus cajan
Cannabis sativa (female)
Capsicum annuum
Chara braunii
Chenopodium quinoa
Citrus clementina
Coffea canephora
Corchorus capsularis
Cucumis melo
Cucumis sativus
Cynara cardunculus
Daucus carota
Eragrostis curvula
Eucalyptus grandis
Eutrema salsugineum
Glycine soja
Gossypium raimondii
Juglans regia
Lupinus angustifolius
Prunus avium
Prunus dulcis
Setaria viridis
Solanum tuberosum RH89-039-16
Workaround: Users can access the GO annotations if a species is available in the archives. There is no other workaround at the moment.
Archive sites:
Archive of release 59 of EnsemblPlants: eg59-plants.ensembl.org (May 2024)
Archive of release 56 of EnsemblPlants: eg56-plants.ensembl.org (Feb 2023)
Archive of release 52 of EnsemblPlants: eg52-plants.ensembl.org (Dec 2021)
Archive of release 49 of EnsemblPlants: eg49-plants.ensembl.org (Dec 2020)
Archive of release 45 of EnsemblPlants: eg45-plants.ensembl.org (Sep 2019)
Cactus guide tree branch length overestimation
Affects: Ensembl 112, 113, 114, 115, Ensembl Genomes
Fix version: Ensembl 116
We have discovered that the branch lengths of the guide trees used for several Cactus whole-genome alignments were substantially overestimated. This was due to a bug in the component of our BUSCO-based species-tree pipeline which estimates the branch lengths based on fourfold degenerate sites. The bug did not affect the topology of the guide tree, and it has been fixed in the latest version of the BUSCO-based species-tree pipeline by changing the software used to produce back-translated codon alignments.

This issue is unlikely to have affected the quality of the whole-genome alignments; however, the branch lengths of the guide trees stored in the HAL files of the affected alignments should not be used for downstream analyses, such as inference of conservation scores and constrained elements.

The issue should not affect the inference of constrained elements using PhyloP as implemented in the HAL package, as the protocol includes the re-estimation of neutral rates.

The issue likely affected the branch lengths stored in the following HAL files:
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Actinopterygii_123-way_20221206.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Aqua-faang_38-way_20220303.hal 
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Coleoptera_36-way_20230217.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Crustacea_16-way-20230217.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Lepidoptera_218-way_20230215.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Percomorpha_38-way_202203.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Pigs_27-way_20230220.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Rice_27-way_202208.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Rodent_7-way_20221018.hal
https://ftp.ensembl.org/pub/rapid-release/data_files/multi/hal_files/Wheat_37-way_20221206.hal
https://ftp.ensembl.org/pub/misc/compara/multi/hal_files/Aves-59-way_20230814.hal
https://ftp.ensembl.org/pub/misc/compara/multi/hal_files/Drosophila-40-way_20230928.hal
https://ftp.ensembl.org/pub/misc/compara/multi/hal_files/Fowl-10-way_20240131.hal
https://ftp.ensembl.org/pub/misc/compara/multi/hal_files/Mammals-100-way_20230606.hal 
Workaround: The branch lengths of the guide tree must be re-estimated before downstream analyses, for example, by using the halPhyloPTrain.py script from the HAL package.
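As a rough sketch of this workaround, the Python snippet below assembles a halPhyloPTrain.py command line without running it. The positional argument order (HAL file, reference genome, BED file of neutral regions, output model file) is an assumption based on the HAL package's usage text and should be checked against ‘halPhyloPTrain.py --help’ for the installed version. The file names and the reference genome name are placeholders.

```python
import shlex


def build_train_command(hal_path, ref_genome, neutral_bed, out_model):
    """Build (but do not run) a halPhyloPTrain.py invocation.

    Assumed argument order: hal refGenome bedFile outMod.
    """
    return ["halPhyloPTrain.py", hal_path, ref_genome, neutral_bed, out_model]


# Placeholder arguments: substitute a real genome name and BED file.
cmd = build_train_command(
    "Pigs_27-way_20230220.hal",
    "reference_genome_name",
    "neutral_4d_sites.bed",
    "neutral_model.mod",
)
command_line = shlex.join(cmd)  # shell-safe string for logging or review
```

Once the arguments have been verified, the list can be passed to subprocess.run(cmd, check=True). The resulting model file is then expected to carry the re-estimated branch lengths for use in downstream analyses.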
Amino-acid substitution model used with cDNA alignment in 59 Rice cultivar gene trees
Affects: Ensembl 112, 113, 114, 115, Ensembl Genomes
Fix version: Ensembl 116
Due to a software bug, the WAG amino-acid substitution model was used with a cDNA alignment to infer 59 Rice cultivar gene trees in Ensembl Plants release 112.

The 59 affected gene trees are listed below. We urge users to exercise caution when interpreting these gene trees or their associated homologies.
Workaround: There is currently no workaround.