Ensembl 104

Known bugs in Ensembl

Inconsistent transcript counts between exported GFF3 and GTF files

Affects: Live site Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105
Description: A bug report brought to our attention that, in particular cases, inconsistencies may appear between the GFF3 and GTF files available on our FTP site.

Sometimes, depending on the data underlying our dumps, the number of transcripts retrieved may differ between the two files for the same species.

The main difference between GTF and GFF3 dumping is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice).

https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199
https://github.com/Ensembl/ensembl-io/blob/release/104/modules/Bio/EnsEMBL/Utils/IO/GTFSerializer.pm#L112

This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene.
This currently only happens with genes on patches, where some transcripts can lie entirely outside the patch region because we create a fake chromosome that includes the patch.
In the future, we plan to store patches as standalone scaffolds, and those transcripts will be removed entirely, so they will not appear in either the GTF or the GFF3 dumps.

We plan to fix this from 106 onwards.
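
To see which transcripts differ between the two files for a given species, the IDs can be compared directly. The following is a minimal Python sketch, not part of the Ensembl dumping code. It assumes that GTF transcript rows have the feature type "transcript" and carry a transcript_id attribute, and that GFF3 transcript rows carry an attribute of the form ID=transcript:<stable ID>. The function names are hypothetical.

```python
import re


def gtf_transcript_ids(lines):
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "transcript":
            match = re.search(r'transcript_id "([^"]+)"', cols[8])
            if match:
                ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines):
    """Collect transcript IDs from GFF3 rows (assumes 'ID=transcript:<id>')."""
    prefix = "ID=transcript:"
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            if field.startswith(prefix):
                ids.add(field[len(prefix):])
    return ids


def compare_transcripts(gtf_lines, gff3_lines):
    """Return (IDs only in the GTF, IDs only in the GFF3), each sorted."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    gff3_ids = gff3_transcript_ids(gff3_lines)
    return sorted(gtf_ids - gff3_ids), sorted(gff3_ids - gtf_ids)
```

Both functions accept any iterable of text lines, so the compressed FTP files can be passed in as handles opened with gzip.open(path, "rt"). For the bug described here, the affected transcripts would be expected in the first list (present in the GTF, absent from the GFF3).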

Workaround: There is no workaround, other than using the most up-to-date datasets.

1000 Genomes minor allele frequency incorrect for duplications

Affects: Live site, Staging, GRCh37 Expected Versions: Ensembl 103, 104
Description: Some insertion/deletion variants which can be described as duplications currently have incorrect global allele frequencies from the 1000 Genomes Project reported in the Ensembl variant and transcript views, BioMart and in Ensembl VEP. Versions 103 – 104 are affected. The continental population frequencies for these variants are correct, and the problem can be identified by comparing the two. Example: for rs199588481, where an ‘A’ is inserted adjacent to an ‘A’, the VCF reference allele ‘A’ is annotated as the minor allele, when the alternate allele ‘AA’ should be. This issue will be resolved in Ensembl VEP version 105, which will be released in the autumn. BioMart and the Ensembl browsers will be fixed for version 106.
Workaround: We advise ignoring these global frequencies and filtering using the continental frequencies instead.
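
As an illustration of this workaround, here is a minimal Python sketch that filters on the continental frequencies and flags variants whose global frequency looks inconsistent with them. It is not Ensembl code. It assumes the frequencies for one allele have already been extracted into a dictionary keyed by the 1000 Genomes continental population codes (AFR, AMR, EAS, EUR, SAS), with the global value under a hypothetical "ALL" key. The consistency check is a heuristic: a global frequency pooled from these populations should lie between the lowest and highest continental values, within a tolerance that is itself an assumed allowance for rounding.

```python
CONTINENTAL = ("AFR", "AMR", "EAS", "EUR", "SAS")


def continental_values(freqs):
    """Return the continental frequencies that are present and not None."""
    return [freqs[pop] for pop in CONTINENTAL if freqs.get(pop) is not None]


def max_continental_af(freqs):
    """Highest continental frequency, or None if none are available."""
    values = continental_values(freqs)
    return max(values) if values else None


def passes_rare_filter(freqs, threshold=0.01):
    """True if every available continental frequency is below the threshold."""
    top = max_continental_af(freqs)
    return top is not None and top < threshold


def global_af_is_suspect(freqs, tol=0.001):
    """Heuristic: flag a global value outside the continental min/max range."""
    global_af = freqs.get("ALL")  # "ALL" is a hypothetical key for the global value
    values = continental_values(freqs)
    if global_af is None or not values:
        return False  # nothing to compare
    return global_af < min(values) - tol or global_af > max(values) + tol
```

With this sketch, a variant would be kept by a rare-variant filter only if passes_rare_filter returns True, regardless of the reported global value. The threshold of 0.01 is only an example default.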

Missing gene trees and homologies for capuchin monkey

Affects: Live site Expected Versions: Ensembl 104
Description: Due to an internal issue regarding the taxonomic ranking of cebus_capucinus, this species was accidentally omitted from all gene trees. For this reason, there is no homology data for capuchin in Ensembl 104. As a consequence, gene name assignment was also affected for this species.
Workaround: We recommend using the previous Ensembl release.

Non-current exons in human core database

Affects: Live site Expected Versions: Ensembl 103, Ensembl 104
Description: A large number of exons are erroneously labelled as non-current in the human core database (exon.is_current = 0). This bug may impact Ensembl API users, since several ExonAdaptor methods filter for current exons:
* fetch_all
* fetch_by_stable_id
* fetch_by_stable_id_version
This bug does not seem to affect the website.
Workaround: If possible, use alternative API methods to fetch exons, such as fetch_all_by_Transcript.

Merged RNA-seq data not available for sheep

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The merged RNA-seq BAM files and their associated tracks are not available for either the Rambouillet sheep (Oar_rambouillet_v1.0) or the Texel sheep (Oar_v4.0). This also affects Ensembl Rapid Releases from 8 onwards.
Workaround: There is currently no workaround for this.

Broken/missing links for transcripts with biotypes “tRNA” and “IG” for RefSeq tracks

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: When viewing the RefSeq track, the links to NCBI for transcripts with biotypes “tRNA” and “IG” are broken or incorrect.
Workaround: This will be fixed in an upcoming Ensembl release; in the meantime, the links will be disabled.

Compara ncRNA trees stats not described accurately

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The stats computed in ncRNA trees under the names {{nb_genes_in_tree}} and {{nb_orphaned_genes}} do not actually refer to the final trees but to the earlier stage of unfiltered clusters. In Ensembl 103 we have corrected this problem so that they match their names, but their values will decrease significantly in at least 50% of the species reported.
Workaround: There is currently no workaround for this.

Some protein coding genes turned into non_translating_CDS

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: Peptide FASTA files are considerably shorter for pachysolen_tannophilus_nrrl_y_2460_gca_001661245 (fungus). This is because in release 42 a lot of its protein coding genes were marked as nontranslating_CDS (although the underlying data and annotation have not changed).
Workaround: Please use FTP archives for Ensembl Fungi release 41.

Missing Goldfish JSON file

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The JSON file for the goldfish (Carassius auratus) is currently unavailable on the Ensembl FTP site.
Workaround: JSON file content can be found across other FTP files.