Ensembl 105

Known bugs in Ensembl

Inconsistent transcript counts between exported GFF3 and GTF files

Affects: Live site Versions: Ensembl 102, 103, 104, 105
Description: A bug report showed that, in particular cases, inconsistencies can appear between the GFF3 and GTF files available on our FTP site.

Sometimes, depending on the data underlying our dumps, the number of transcripts retrieved for the same species differs between the two files.

The main difference between GTF and GFF3 dumping is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice).

https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199
https://github.com/Ensembl/ensembl-io/blob/release/104/modules/Bio/EnsEMBL/Utils/IO/GTFSerializer.pm#L112

This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene.
This currently only happens with genes on patches, where some transcripts can lie entirely outside the patch region because we create a fake chromosome that includes the patch.
In the future, we plan to store patches as standalone scaffolds, and those transcripts will be removed entirely, so they will not be included in either the GTF or the GFF3 dumps.

We plan to fix this from 106 onwards.
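
To check whether a given pair of dumps is affected, the transcript identifiers in the two files can be compared. The following is a minimal Python sketch, not part of the Ensembl tooling. It assumes that the GTF file has `transcript` feature lines carrying a `transcript_id "..."` attribute and that the GFF3 file labels transcripts with `ID=transcript:...`; the function names are illustrative only.

```python
import gzip
import re
from typing import Iterable, List, Set

# Assumed attribute conventions (check them against your own files).
GTF_ID = re.compile(r'transcript_id "([^"]+)"')
GFF3_ID = re.compile(r'(?:^|;)ID=transcript:([^;]+)')


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def gtf_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """Collect transcript IDs from GTF lines whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "transcript":
            match = GTF_ID.search(cols[8])
            if match:
                ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """Collect transcript IDs from GFF3 lines with an 'ID=transcript:' attribute."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            match = GFF3_ID.search(cols[8])
            if match:
                ids.add(match.group(1))
    return ids


def missing_from_gff3(gtf_lines: Iterable[str], gff3_lines: Iterable[str]) -> List[str]:
    """Return transcript IDs present in the GTF but absent from the GFF3."""
    return sorted(gtf_transcript_ids(gtf_lines) - gff3_transcript_ids(gff3_lines))
```

With two downloaded files, `missing_from_gff3(open_text(gtf_path), open_text(gff3_path))` lists the transcripts that appear only in the GTF; `gtf_path` and `gff3_path` are placeholders for local file names.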

Workaround: There is no workaround other than using the most up-to-date datasets.

1000 Genomes minor allele frequency incorrect for duplications

Affects: Live site, Staging, GRCh37 Expected Versions: Ensembl 103, 104, 105
Description: Some insertion/deletion variants which can be described as duplications currently have incorrect global allele frequencies from the 1000 Genomes Project reported in the Ensembl variant and transcript views, in BioMart and in Ensembl VEP. Versions 103 – 104 are affected. The continental population frequencies for these variants are correct, and the problem can be identified by comparing the two. Example: for rs199588481, where an ‘A’ is inserted adjacent to an ‘A’, the VCF reference allele ‘A’ is annotated as the minor allele when the alternate allele ‘AA’ should be. This issue will be resolved in Ensembl VEP version 105, which will be released in the autumn. BioMart and the Ensembl browsers will be fixed for version 106.
Workaround: We advise ignoring these global frequencies and filtering using the continental frequencies instead.
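
As an illustration of this workaround, the sketch below filters on the highest continental frequency and never reads the global value. It is a minimal Python example under stated assumptions: each variant is represented as a dictionary keyed by the frequency field names used in VEP output (`AFR_AF`, `AMR_AF`, `EAS_AF`, `EUR_AF`, `SAS_AF`, with `AF` as the global value), and the function names and the 1% threshold are illustrative only.

```python
from typing import Mapping, Optional

# Continental 1000 Genomes frequency fields (assumed record keys).
CONTINENTAL_FIELDS = ("AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF")


def max_continental_af(record: Mapping[str, Optional[float]]) -> Optional[float]:
    """Return the highest continental frequency, or None if none is recorded."""
    values = [record[f] for f in CONTINENTAL_FIELDS if record.get(f) is not None]
    return max(values) if values else None


def is_rare(record: Mapping[str, Optional[float]], threshold: float = 0.01) -> bool:
    """True if no continental population reaches the threshold.

    The global 'AF' field is deliberately ignored. Variants without any
    continental frequency are treated as rare; change this if that does
    not suit your analysis.
    """
    top = max_continental_af(record)
    return top is None or top < threshold
```

A variant with a misreported global `AF` of 0.6 but continental frequencies all below 0.01 is still classified as rare by `is_rare`, because only the continental fields are consulted.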

COSMIC data not present in Ensembl 105 GRCh38 browser views or BioMart

Affects: Live site Expected Versions: Ensembl 105
Description: Somatic mutations from the COSMIC project will not be displayed in browser views, available for querying in BioMart, or available via our APIs for assembly GRCh38 in Ensembl version 105. These data will return in Ensembl version 106.
Workaround: The Ensembl version 104 archive site and REST archive service can be used to view or access COSMIC data.

Corylus avellana left out of Compara

Affects: Live site Expected Versions: Ensembl 105, 106
Description: Because of a typo, Corylus avellana was left out of the Compara gene tree in 105. It will be added back in 106.
Workaround: There is currently no workaround for this.

Missing MT ‘sequence_location’ attribute

Affects: Live site Expected Versions: Ensembl 105, 106
Description: Mitochondrial DNA sequences from 12 species (listed below) were mistakenly processed as nuclear DNA sequences. This affected pairwise and multiple genome alignments and, consequently, syntenies for the vertebrate and metazoan genomes involved.
Anolis carolinensis
Ficedula albicollis
Macaca fascicularis
Mastacembelus armatus
Meleagris gallopavo
Papio anubis
Pongo abelii
Rattus norvegicus
Scophthalmus maximus
Anopheles coluzzii ngousso
Nannizzia gypsea cbs 118893
Ustilago bromivora
Workaround: There is currently no workaround for this.

Missing Rat microarray data

Affects: Live site Expected Versions: Ensembl 105, 106
Description: The Rat assembly was updated in Ensembl Release 105 to mRatBN7.2, but microarray probeset information (oligo probes) was not updated. This means that microarray and oligo probe mapping is missing from Release 105 in the genome browser and BioMart.
Workaround: Microarray and oligo probe information for the previous assembly (Rnor_6.0) can be accessed using the archive sites or the 104 release of BioMart (http://may2021.archive.ensembl.org/index.html).

Issues with the FTP sites for non-vertebrates

Affects: Live site Expected Versions: Ensembl 105
Description: There are truncated files, both unzipped and zipped, on the FTP site. This affects all non-vertebrates. We are working to re-sync the data and expect to resolve this shortly.
Workaround: Data can be accessed from the FTP sites of the previous release.
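
For zipped files that have already been downloaded, truncation can usually be detected by decompressing the whole stream. The following is a minimal Python sketch using only the standard library; the function name is illustrative. It assumes gzip-compressed files and cannot detect truncation of unzipped files, which would need to be compared against a known size or checksum.

```python
import gzip
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip file at 'path' decompresses to its end.

    A truncated file raises EOFError when the stream ends before the
    end-of-stream marker; other corruption surfaces as OSError or zlib.error.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use low.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

A file for which `gzip_is_complete` returns False should be fetched again, or replaced with the copy from the previous release as described in the workaround.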