Ensembl 103

Known bugs in Ensembl

Mouse species.common_name needs to be patched

Affects: Live site Expected Versions: Ensembl 103
Description: In mouse 39 the species.common_name is set to ‘house mouse’, a change from ‘mouse’ in earlier releases. The consequences of this are:
1) When searching from the home page, results from the reference mouse are shown with a species of ‘House Mouse’, whereas all strains are shown with a species of ‘Mouse’. This means that clicking on ‘Mouse’ does not show the results from the reference.
2) Whereas ‘Mouse’ is a favourite and therefore elevated in the list of species, ‘House Mouse’ is not. This means that you will need to expand the list to find ‘House Mouse’.
Workaround: To filter search results to show the reference mouse, you will need to scroll down the long list of species to find ‘House Mouse’.

Missing RFAM xrefs in mouse core database

Affects: Live site, Mirrors, Staging Expected Versions: Ensembl 103
Description: The RFAM xrefs are missing from the mouse core database. As a consequence, a variable number of genes and transcripts of the biotypes ‘misc_RNA’, ‘ribozyme’, ‘rRNA’, ‘snoRNA’ and ‘snRNA’, will get a clone-based name instead of the RFAM name. Descriptions may be empty for these genes and transcripts.
Workaround: none

Microarray data not present in Ensembl Metazoa BioMart for Drosophila melanogaster

Affects: Live site Expected Versions: Ensembl 103
Description: No microarray data for Drosophila melanogaster in BioMart on Ensembl Metazoa.
Workaround: Since Drosophila melanogaster data is present on both the Ensembl Metazoa and Ensembl sites, the workaround is to use Ensembl for Drosophila melanogaster in release 103.

Some genes need to be updated for the wheat cultivar Stanley

Affects: Live site Expected Versions: Ensembl 102, Ensembl 103
Description: For the wheat cultivar Stanley, a different and newer genome assembly version has been uploaded to the sequence archives. The gene projections in Ensembl Plants refer to an older version of the Stanley assembly, which has not been uploaded to the archives. This results in an inconsistency between the GFF file and the genome assembly sequence file. However, the only difference between the two Stanley assembly versions is one scaffold that changed orientation (chr2A:1-5191484). The overall gene content therefore remains completely unchanged, and the CDS and protein sequences in the FASTA files remain valid. The only thing to note is that a few genes (those in the region of the flipped scaffold) in the GFF file will have incorrect coordinates relative to the latest Stanley assembly sequence. The corrected GFF (along with all other files) can also be accessed here:
https://wheat.ipk-gatersleben.de
This will be updated in Ensembl 104.
Workaround: The corrected GFF (along with all other files) can also be accessed here:
https://wheat.ipk-gatersleben.de
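
To check which gene records in an existing GFF file are likely to be affected, one option is to flag every gene that overlaps the flipped region chr2A:1-5191484. The following is a minimal sketch, not an official Ensembl tool. It assumes a standard tab-separated GFF3 file and that the sequence is named `chr2A` in the first column; the function names are hypothetical.

```python
# Region of the flipped scaffold, taken from the description above.
FLIPPED_SEQID = "chr2A"  # assumption: the GFF uses this exact sequence name
FLIPPED_START = 1
FLIPPED_END = 5191484


def in_flipped_region(gff_line, seqid=FLIPPED_SEQID):
    """Return True if a GFF3 feature line overlaps the flipped scaffold."""
    if not gff_line.strip() or gff_line.startswith("#"):
        return False  # skip blank lines, comments and directives
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # not a complete nine-column feature line
    start, end = int(fields[3]), int(fields[4])
    return fields[0] == seqid and start <= FLIPPED_END and end >= FLIPPED_START


def affected_gene_lines(gff_lines):
    """Yield gene features whose coordinates may be wrong in the old GFF."""
    for line in gff_lines:
        if in_flipped_region(line) and line.split("\t")[2] == "gene":
            yield line.rstrip("\n")
```

Passing an open file handle to `affected_gene_lines` lists the candidate genes; their correct coordinates should then be taken from the corrected GFF linked above.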

Merged RNA-seq data not available for some sheep

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The RNA-seq merged BAM files and their associated tracks are not available for the Rambouillet sheep (Oar_rambouillet_v1.0) nor for the Texel sheep (Oar_v4.0). This also affects Ensembl Rapid Releases from 8 onwards.
Workaround: none

Missing variant pathogenicity predictions for REVEL, MetaLR and MutationAssessor

Affects: Live site, Mirrors, Staging, Test Expected Versions: Ensembl 102, Ensembl 103
Description: We are missing variant pathogenicity predictions from REVEL, MetaLR and MutationAssessor on:
* Variant page > Genes and regulation view
* Transcript page > Variant table view
This only affects human GRCh38 views. Predictions for CADD, SIFT and PolyPhen-2 are still available.
This problem does not impact Ensembl VEP.
Workaround: The scores can still be retrieved: 

  • in release 102 through VEP, using the web and command line VEP tool
  • using release 101 views

Genomes have been over-masked

Affects: Live site Expected Versions: Ensembl 102, Ensembl 103
Description: For some species, repeat-masked genomes have been masked using RepeatModeler libraries. We are not confident that this does not mask gene families, so we will remove this masking and mask the genomes using Repbase libraries only.
Workaround: For the time being, the masked genomes remain masked with the RepeatModeler libraries.

Broken/missing links for transcripts with biotypes “tRNA” and “IG” for RefSeq tracks

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: When viewing the RefSeq track, the links to NCBI for transcripts with biotypes “tRNA” and “IG” are broken or incorrect.
Workaround: This will be fixed in an upcoming Ensembl release. In the meantime, the links will be disabled.

Compara ncRNA trees stats not described accurately

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The stats computed for ncRNA trees under the names `nb_genes_in_tree` and `nb_orphaned_genes` do not actually refer to the final trees but to the unfiltered clusters from an earlier stage. In Ensembl 103 we have corrected this problem so that the stats match their names, but their values will decrease significantly in at least 50% of the species reported.
Workaround: none

Some protein coding genes mysteriously turned into non_translating_CDS

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103
Description: A user spotted that the peptide FASTA files are considerably shorter for pachysolen_tannophilus_nrrl_y_2460_gca_001661245 (fungus). It turns out that this is because, in release 42, many of its protein-coding genes were marked as nontranslating_CDS, although the underlying data and annotation have not changed. This needs investigation.
Workaround: none

GRCh37 – COSMIC insertion coordinates off by +1

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103
Description: The coordinates for insertions imported from the COSMIC source are off by +1. For GRCh37 e100 and e101, 2.66% (253,428 / 9,511,409) of COSMIC variants are affected.
Workaround: The previous release can be used. GRCh37 release 99 contained 4,478,854 COSMIC variants.

Drosophila melanogaster RNA gene cross-reference links do not work

Affects: Live site, Mirrors, Staging Expected Versions: Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103
Description: Rfam and miRBase cross-reference links do not work, because they use the FlyBase ID instead of the RNA gene.
Workaround: Search for the Rfam or miRBase ID on the respective websites.

Extra character in Drosophila file dumps 

Affects: Live site Expected Versions: Ensembl 103
Description: An extra `_` has been inserted in some GTF file paths during FTP dumps for Drosophila species:
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_virilis/Drosophila_virilis.dvir_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_yakuba/Drosophila_yakuba.dyak_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_ananassae/Drosophila_ananassae.dana_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_mojavensis/Drosophila_mojavensis.dmoj_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_pseudoobscura/Drosophila_pseudoobscura.Dpse_3.0_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_simulans/Drosophila_simulans.ASM75419v3_.50.gtf.gz
(note the extra underscore in the assembly string).
Workaround: Add the extra `_` after the assembly string when requesting these files, as in the paths listed above.
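
For scripted downloads, the affected path can be built by appending the extra underscore to the assembly string. The sketch below follows the pattern of the URLs listed above. The function name is hypothetical, and the pattern is only assumed to hold for the species and release shown in this entry.

```python
def affected_gtf_url(species, assembly, release=50):
    """Build the GTF URL as it was dumped, with the extra '_' after the assembly.

    species  -- lower-case production name, e.g. "drosophila_virilis"
    assembly -- assembly string, e.g. "dvir_caf1"
    """
    base = f"ftp://ftp.ensemblgenomes.org/pub/metazoa/release-{release}/gtf"
    # The file name starts with the species name with its first letter capitalised.
    file_name = f"{species.capitalize()}.{assembly}_.{release}.gtf.gz"
    return f"{base}/{species}/{file_name}"


# Reproduces the first path listed in the description.
print(affected_gtf_url("drosophila_virilis", "dvir_caf1"))
```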