Ensembl 100

Known bugs in Ensembl

Truncated GVF and VCF files for some non-vertebrate species

Affects: Live site
Versions: Ensembl 100, Ensembl 101
Description: The GVF and VCF variation data files on the FTP site were inadvertently truncated for all fungi, metazoa, and protist species. This affects Ensembl Genomes releases 47 and 48.
Workaround: The affected GVF and VCF files have been fixed and are now available in the respective FTP directories.

Missing Goldfish JSON file

Affects: Live site
Versions: Ensembl 100
Description: The JSON file for goldfish (Carassius auratus) is not available on the Ensembl FTP site.
Workaround: The data that would have been in the JSON file is available in a range of other files.

Incomplete mapping in Assembly Converter

Affects: Live site, Mirrors
Versions: Ensembl 98, Ensembl 99, Ensembl 100
Description: The following species have no mappings between their new and old assemblies, so the Assembly Converter tool does not offer conversion between them, even though such mappings are possible.

Fungi: Saccharomyces cerevisiae (EF1 and R64-1-1)

Protists: Thalassiosira pseudonana (ASM14940v1_bd and ASM14940v2)

Workaround: No workaround.

Inconsistencies between core and core-like databases

Affects: Live site, Mirrors
Versions: Ensembl 100
Description: The assembly name and accession in the human rnaseq (GRCh38.p10, GCA_000001405.25) and otherfeatures (GRCh38.p12, GCA_000001405.27) databases do not match those in the core database (GRCh38.p13, GCA_000001405.28). The otherfeatures database nevertheless has the expected GRCh38.p13 assembly and seq_region tables.

The assembly name and accession in the mouse rnaseq database (GRCm38.p5, GCA_000001635.7) do not match those in the core database (GRCm38.p6, GCA_000001635.8). The rnaseq database nevertheless has the expected GRCm38.p6 assembly and seq_region tables.

Workaround: No workaround.

Erroneous transcript (ENSRNASEQT00001237576) in Pig RNAseq data

Affects: Live site, Archives
Versions: Ensembl 99, Ensembl 100
Description: The pig reference rnaseq database contains a transcript with no exons and no translation. This transcript will be removed and a corrected rnaseq database provided.
Workaround: No workaround.

Links to RefSeq genes do not work

Affects: Live site, Mirrors
Versions: Ensembl 99, Ensembl 100
Description: Links to RefSeq genes in region views do not work, because they use an internal identifier rather than the gene ID.
Workaround: Links to transcripts are correct, so these can be used to navigate to the correct page on the NCBI website.

Drosophila melanogaster RNA gene cross-reference links do not work

Affects: Live site, Mirrors
Versions: Ensembl 99, Ensembl 100
Description: Rfam and miRBase cross-reference links for RNA genes do not work, because they use the FlyBase ID instead of the RNA gene's Rfam or miRBase ID.
Workaround: Search for the Rfam or miRBase ID on the respective website.

Gene name cross-reference links do not work

Affects: Live site, Mirrors
Versions: Ensembl 99, Ensembl 100
Description: Cross-reference links to HGNC, MGI, and ZFIN do not link to the correct page, because they use the name rather than the numeric identifier.
Workaround: Search for the gene name on the HGNC, MGI, or ZFIN website.

Semicolons (;) in gene names in dumped GTF files

Affects: Live site
Versions: Ensembl 100
Description: The Arabidopsis thaliana GTF file available for download contains semicolons within some gene_name attribute values. Because the semicolon separates attributes in GTF, many downstream programs (for example htseq-count) do not allow it inside a value.

This does not affect the GFF version of the same file.

Workaround: Escape forbidden GTF characters, such as semicolons, within attributes.
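A minimal Python sketch of this workaround, assuming that attribute values in column 9 are double-quoted, as in standard GTF (for example gene_name "NAC001";). It rewrites only semicolons that fall inside quotes, so the semicolons that separate attributes stay intact. The function names and the default replacement %3B (borrowed from GFF3 percent-encoding) are hypothetical choices for illustration, not part of any Ensembl tool; pick whatever replacement your downstream program accepts.

```python
import re

# A double-quoted attribute value, e.g. "NAC001" in: gene_name "NAC001";
QUOTED_VALUE = re.compile(r'"[^"]*"')


def escape_semicolons(line: str, replacement: str = "%3B") -> str:
    """Replace semicolons inside quoted GTF attribute values.

    Semicolons outside quotes separate attributes and are left untouched.
    """
    if line.startswith("#"):
        return line  # header and comment lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a 9-column GTF record; leave it as it is
    fields[8] = QUOTED_VALUE.sub(
        lambda m: m.group(0).replace(";", replacement), fields[8]
    )
    return "\t".join(fields) + "\n"


def escape_gtf_file(src: str, dst: str, replacement: str = "%3B") -> None:
    """Write a copy of the GTF file at src to dst with escaped attribute values."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(escape_semicolons(line, replacement))
```

Calling escape_gtf_file on the downloaded (uncompressed) GTF produces a copy that htseq-count and similar tools can parse; the original file is left unchanged.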

Gene trees missing human ncRNA genes

Affects: Live site
Versions: Ensembl 99, Ensembl 100
Description: Due to missing Rfam references, a number of human ncRNA genes have not been clustered correctly and are therefore missing from gene trees and homology predictions.
Workaround: No workaround. Use Ensembl 98 if possible.

Drosophilidae core databases imported from FlyBase are missing stop codons from their CDSs

Affects: Live site
Versions: Ensembl 98, Ensembl 99, Ensembl 100, Ensembl 101
Description: The CDSs in Drosophilidae core databases imported from FlyBase are missing their stop codons. Protein sequences and their domain annotations are not affected.
Workaround: No workaround. The core databases for these species will be reimported from FlyBase in a future release.
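To see which CDSs in a downloaded file are affected, a minimal Python sketch can check whether each sequence ends in a stop codon. It assumes a CDS FASTA file (plain or gzipped) in which each record is one CDS, in frame from its first base, and that the standard nuclear genetic code applies (stop codons TAA, TAG, TGA). The function names are hypothetical. Note that it also flags any CDS whose length is not a multiple of three, which may be incomplete for other reasons.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# Stop codons of the standard (nuclear) genetic code. Mitochondrial CDSs
# use a different translation table and would need a different set.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        ident, chunks = None, []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if ident is not None:
                    yield ident, "".join(chunks)
                # Keep only the first word of the header as the identifier.
                ident, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
        if ident is not None:
            yield ident, "".join(chunks)


def ends_with_stop(cds: str) -> bool:
    """True if the CDS is a whole number of codons and its last codon is a stop."""
    seq = cds.upper()
    return len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS


def find_missing_stops(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the identifiers of CDS records that do not end in a stop codon."""
    return [ident for ident, seq in records if not ends_with_stop(seq)]
```

For example, find_missing_stops(read_fasta("cds.fa.gz")) returns the identifiers of the flagged records; the file name here is a placeholder for whichever CDS file you downloaded.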