Ensembl 101

Known bugs in Ensembl

JSON formatted files missing on the FTP site

Affects: Live site Expected Versions: Ensembl 101
Description:

We experienced problems generating the content of the JSON dumps for our FTP site, leading to many files missing from the ftp://ftp.ensembl.org/pub/current_json directory. All Ensembl and Ensembl Genomes divisions have been affected. Please refer to the attached file for the complete list of affected species.

Workaround:

This will be fixed in an upcoming Ensembl release. Meanwhile, please use the JSON files from the previous release, which can be found in our archives: http://apr2020.archive.ensembl.org/index.html

Incorrect display ids/labels captured for UCSC external references in mouse

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102
Description:

Ensembl identifiers (ENS ids) are displayed as UCSC external references for mouse.

Workaround:

Links out to the UCSC website work correctly.

Truncated GVF and VCF files for some non-vertebrate species

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description:

The GVF and VCF files that we make available on the FTP site, which store variation data, were inadvertently truncated for all fungi, metazoa, and protist species. This affects Ensembl Genomes releases 47 and 48.

Workaround:

The affected GVF and VCF files have been fixed and are now available in the respective FTP directories.

Missing RefSeq data in homo_sapiens otherfeatures 101

Affects: Live site Versions: Ensembl 101, Ensembl 102
Description:

A number of RefSeq genes are missing from the homo_sapiens_otherfeatures_101_38 database. An example is in this region: http://www.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000139618;r=17:42922692-43246971

This will also affect VEP queries using the RefSeq transcript set.

Workaround: Command-line VEP users need to use the e!100 cache, and REST API users need to use the e!100 archive, e.g. http://apr2020.rest.ensembl.org/vep/human/id/rs80357183?content-type=application/json&refseq=1
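For REST API users, the archive query above can be built programmatically. The following is a minimal sketch that only constructs the URL and does not send a request. The function name vep_refseq_url and the constant ARCHIVE_REST_HOST are illustrative, not part of any Ensembl library. The host, path and parameters are taken from the example URL above.

```python
from urllib.parse import urlencode

# Host of the e!100 archive REST server, as given in the example URL above.
ARCHIVE_REST_HOST = "http://apr2020.rest.ensembl.org"


def vep_refseq_url(variant_id, species="human"):
    """Build an archive VEP query URL that requests the RefSeq transcript set.

    Hypothetical helper: it mirrors the example URL in this entry and makes
    no network call.
    """
    # safe="/" keeps "application/json" unescaped, matching the example URL.
    query = urlencode({"content-type": "application/json", "refseq": 1}, safe="/")
    return f"{ARCHIVE_REST_HOST}/vep/{species}/id/{variant_id}?{query}"


print(vep_refseq_url("rs80357183"))
```

The resulting URL can be passed to any HTTP client.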

There is no work-around for users of the VEP web interface.

This will be fixed in Ensembl 103.

Missing BioMart datasets for Sorghum bicolor and apple

Affects: Live site Versions: Ensembl 101
Description: Sorghum bicolor and apple (Malus domestica Golden) datasets in BioMart are faulty.
Workaround: They will be fixed in the next release.

Wrong figures in the block-size distribution of EPO alignments

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description: The per-size range statistics about the alignment blocks of EPO alignments (http://e100.ensembl.org/info/genome/compara/mlss.html?mlss=1628) are approximately doubled. This affects both the number of blocks and the total size.
Workaround: The figures will be corrected in Ensembl release 102.

Missing GOC scores for polyploids

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description:

Polyploid species only have Gene Order Conservation (GOC) scores computed for one of their sub-genomes.

Workaround:

The issue will be fixed in Ensembl 102. In the meantime, you can retrieve the complete data for the species available in release 99 from our archives.

No GOC scores for homoeologues

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description:

Gene Order Conservation scores have not been computed for homoeologues.

Workaround:

You can retrieve Gene Order Conservation scores for the homoeologues listed in Ensembl 99 from our archives.

Long-read tracks missing for blue whale and vaquita

Affects: Live site Versions: Ensembl 101
Description:

Long-read transcriptomic data was used to annotate the blue whale and vaquita genomes, but the tracks are not currently available to view.

Workaround:

Tracks will be available in Ensembl 102.

rfam_genes have wrong strand when loaded with ensembl-genomeloader

Affects: Live site Versions: Ensembl 96, Ensembl 97, Ensembl 98, Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102
Description:

The ensembl-genomeloader (GL, https://github.com/Ensembl/ensembl-genomeloader) is used to load non-vertebrate genomes and their annotations from the ENA. It is also used to annotate non-coding genes matching RFAM HMMs, but in some cases the assigned strand is the template strand. This affects some microbial and plant genomes loaded with the GL.

Workaround:

We will remove the rfam_genes from the affected genomes and run the RNA features pipeline instead.

Missing gene trees for vault RNAs

Affects: Live site Versions: Ensembl 101
Description:

Due to an issue with the biotype group in the core databases, vault RNAs were not correctly parsed by our pipelines. As we had to reuse the Ensembl 100 data, the gene IDs from release 100 are used in the current release. Since these vault RNA genes have disappeared, the pipeline removed the associated gene tree(s) and homologies.

Workaround:

The ncRNA trees are being reused from Ensembl 100, so the archive will still hold the correct data.

GRCh38 – COSMIC insertion coordinates off by +1

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description:

The coordinates for insertions imported from the COSMIC source are off by +1.

For GRCh38 in Ensembl 100 and Ensembl 101, 2.63% (256,099 / 9,729,777) of COSMIC variants are affected.

Workaround:

The previous release can be used. GRCh38 release 99 contained 10,067,510 COSMIC variants, of which 251,353 were insertions.

GRCh37 – COSMIC insertion coordinates off by +1

Affects: Live site Versions: Ensembl 100, Ensembl 101, Ensembl 102
Description:

The coordinates for insertions imported from the COSMIC source are off by +1.

For GRCh37 in Ensembl 100 and Ensembl 101, 2.66% (253,428 / 9,511,409) of COSMIC variants are affected.
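The affected percentages quoted for the two assemblies follow directly from the counts given in these entries. As a quick arithmetic check (the names AFFECTED_COUNTS and affected_percent are illustrative only):

```python
# (affected insertions, total COSMIC variants) as quoted in the GRCh38 and
# GRCh37 entries for Ensembl 100 and Ensembl 101.
AFFECTED_COUNTS = {
    "GRCh38": (256_099, 9_729_777),
    "GRCh37": (253_428, 9_511_409),
}


def affected_percent(affected, total):
    """Return the affected share as a percentage rounded to two decimals."""
    return round(100 * affected / total, 2)


for assembly, (affected, total) in AFFECTED_COUNTS.items():
    print(assembly, affected_percent(affected, total))
```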

Workaround:

The previous release can be used. GRCh37 release 99 contained 4,478,854 COSMIC variants.

Gene trees missing ncRNA genes for all 101 species

Affects: Live site Versions: Ensembl 101
Description:

Due to issues with the ncRNA gene trees pipeline, we were unable to make the required updates for this release. We are reusing the data from Ensembl 100 and, therefore, all ncRNA trees are missing the newest species.

Workaround:

None.

Links to INSDC protein cross-references do not work for sugar beet

Affects: Live site, Mirrors, Archives Versions: Ensembl 100, Ensembl 101
Description:

Links to INSDC protein records lead to pages on the ENA site with no records.
For example, the “INSDC protein ID” link for a beet transcript is KMS96696.1, which displays no information.
The “[align]” link next to the accession does not produce results either, for the same reason.

Workaround:

Protein records are available at NCBI, using the pattern “https://www.ncbi.nlm.nih.gov/protein/<ID>”, e.g. https://www.ncbi.nlm.nih.gov/protein/KMS96696.
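The NCBI pattern above can be applied to a list of accessions with a small helper. This is a sketch only: the function name ncbi_protein_url is hypothetical, and dropping the version suffix (".1") is an assumption based on the example link above, which omits it.

```python
NCBI_PROTEIN_PREFIX = "https://www.ncbi.nlm.nih.gov/protein/"


def ncbi_protein_url(insdc_protein_id):
    """Build an NCBI protein link from an INSDC protein ID.

    Assumption: the version suffix is dropped, as in the example
    https://www.ncbi.nlm.nih.gov/protein/KMS96696 given above.
    """
    accession = insdc_protein_id.split(".", 1)[0]
    return NCBI_PROTEIN_PREFIX + accession


print(ncbi_protein_url("KMS96696.1"))
```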

Incorrect links to SGD website for Saccharomyces cerevisiae

Affects: Live site Versions: Ensembl 100, Ensembl 101
Description:

The URL syntax used to access SGD has changed. The current release displays outdated links that do not work.

Workaround:

You can fix the broken link by changing:
http://db.yeastgenome.org/cgi-bin/locus.pl?locus=###ID###
to:
https://www.yeastgenome.org/locus/###ID###
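If you have many such links, the rewrite above can be scripted. The sketch below assumes that every broken link starts with exactly the old prefix shown above. The function name fix_sgd_link is hypothetical and the locus ID in the usage line is only an illustration.

```python
OLD_SGD_PREFIX = "http://db.yeastgenome.org/cgi-bin/locus.pl?locus="
NEW_SGD_PREFIX = "https://www.yeastgenome.org/locus/"


def fix_sgd_link(url):
    """Rewrite an outdated SGD locus link to the current URL form.

    Links that do not start with the old prefix are returned unchanged.
    """
    if url.startswith(OLD_SGD_PREFIX):
        return NEW_SGD_PREFIX + url[len(OLD_SGD_PREFIX):]
    return url


# Illustrative locus ID, not taken from this page.
print(fix_sgd_link("http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YAL001C"))
```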

Incorrect links to KEGG for some plant and microbe species

Affects: Live site, Archives Versions: Ensembl 100, Ensembl 101
Description:

Links to KEGG for some microbe and plant species are malformed and do not work.

For example, in the General Identifiers page of a beet transcript, the KEGG link for 00052+2.4.1.22 goes to https://www.genome.jp/dbget-bin/www_bget?path:00052+2.4.1.22, rather than https://www.genome.jp/kegg-bin/show_pathway?map00052+2.4.1.22.

Workaround:

The workaround is to manually edit the URL to the correct format.
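The same edit can be scripted. This sketch assumes all malformed links share the "www_bget?path:" form shown in the beet example, and the function name fix_kegg_link is hypothetical. It uses plain string replacement rather than URL parsing so that the "+" in the identifier is left untouched.

```python
OLD_KEGG_PREFIX = "https://www.genome.jp/dbget-bin/www_bget?path:"
NEW_KEGG_PREFIX = "https://www.genome.jp/kegg-bin/show_pathway?map"


def fix_kegg_link(url):
    """Rewrite a malformed KEGG pathway link to the working form.

    Links that do not start with the malformed prefix are returned unchanged.
    """
    if url.startswith(OLD_KEGG_PREFIX):
        return NEW_KEGG_PREFIX + url[len(OLD_KEGG_PREFIX):]
    return url


print(fix_kegg_link("https://www.genome.jp/dbget-bin/www_bget?path:00052+2.4.1.22"))
```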

Compara documentation on Ensembl Genomes sites is outdated

Affects: Live site Versions: Ensembl 99, Ensembl 100, Ensembl 101
Description:

The documentation about comparative genomics on Ensembl Genomes sites is outdated. The “Gene Tree (?)” help link on tree pages, such as http://plants.ensembl.org/Arabidopsis_thaliana/Gene/Compara_Tree?g=AT3G52430;r=3:19431095-19434450;t=AT3G52430.1, takes you to http://plants.ensembl.org/Help/View?id=137 and http://plants.ensembl.org/info/genome/compara/homology_method.html, which should be updated to describe the HMM-based classification of sequences. The section on dN/dS should also be removed.

Workaround:

No workaround.

Drosophila melanogaster RNA gene cross-reference links do not work

Affects: Live site, Mirrors Versions: Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102
Description:

Rfam and miRBase cross-reference links do not work, because they use the FlyBase ID instead of the RNA gene.

Workaround:

Search for the Rfam or miRBase ID on the respective websites.

Gene trees missing human ncRNA genes

Affects: Live site Versions: Ensembl 99, Ensembl 100, Ensembl 101
Description:

Due to missing RFAM cross-references, a number of human ncRNA genes have not been clustered correctly and are, therefore, missing from gene trees and homology predictions.

Workaround:

No workaround. Use Ensembl release 98 if possible.

Drosophilidae cores imported from FlyBase are missing the stop codon in their CDSs

Affects: Live site Versions: Ensembl 98, Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103
Description:

Drosophilidae cores imported from FlyBase are missing the stop codon in their CDSs.
Proteins and their domains are not affected.

This issue has been present since 2017-08.
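To check whether a CDS sequence retrieved from an affected core is missing its stop codon, a simple test on the last three bases is enough. This is a minimal sketch assuming the standard genetic code and a plain nucleotide string as input. The function name cds_has_stop_codon is hypothetical and not part of any Ensembl API.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_has_stop_codon(cds):
    """Return True if the CDS ends with a standard stop codon."""
    sequence = cds.upper()
    return len(sequence) >= 3 and sequence[-3:] in STOP_CODONS


print(cds_has_stop_codon("ATGGCCTAA"))  # CDS with a terminal stop codon
print(cds_has_stop_codon("ATGGCC"))     # CDS without a terminal stop codon
```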

Workaround:

No workaround. The species core will be reimported from FlyBase in a future release.