Ensembl 103

Known bugs in Ensembl

Inconsistency in the number of transcripts in exported GFF3 and GTF files

Affects: Live site Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105
Description: A bug report made us aware that, in particular cases, the GFF3 and GTF files available on our FTP site can be inconsistent with each other.

Sometimes, depending on the data underlying our dumps, the number of transcripts retrieved for the same species differs from one file to the other.

The main difference between GTF and GFF3 dumping is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get them from the underlying slice ($transcript_adaptor->fetch_all_by_Slice).

https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199
https://github.com/Ensembl/ensembl-io/blob/release/104/modules/Bio/EnsEMBL/Utils/IO/GTFSerializer.pm#L112

This means that if a transcript extends beyond the boundaries of the slice, it may not be dumped in the GFF3 file even though its gene is.
This currently only happens with genes on patches, where some transcripts can lie entirely outside the patch region because we create a fake chromosome that includes the patch.
In the future, we plan to store patches as standalone scaffolds. Those transcripts will then be removed entirely and will therefore not be included in either the GTF or the GFF3 dumps.
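To check whether a given species is affected, one option is to compare the sets of transcript IDs in the two files. The following is a minimal Python sketch, not an Ensembl tool. It assumes that GTF transcript rows carry a `transcript_id "..."` attribute and that GFF3 transcript rows carry an `ID=transcript:...` attribute, as in the Ensembl FTP dumps; the function names are hypothetical.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Set, Tuple

# Assumed attribute formats (check them against the files you download):
#   GTF : ... transcript_id "ENST00000123456"; ...
#   GFF3: ID=transcript:ENST00000123456;Parent=gene:...
GTF_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
GFF3_TRANSCRIPT_ID = re.compile(r'(?:^|;)ID=transcript:([^;]+)')


def _attribute_columns(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (feature type, attribute column) for each non-comment data row."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            yield fields[2], fields[8]


def gtf_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """Collect transcript IDs from GTF rows whose feature type is 'transcript'."""
    ids = set()
    for feature, attrs in _attribute_columns(lines):
        if feature == "transcript":
            match = GTF_TRANSCRIPT_ID.search(attrs)
            if match:
                ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """Collect IDs of GFF3 rows whose own ID is 'transcript:...' (any feature type)."""
    ids = set()
    for _feature, attrs in _attribute_columns(lines):
        match = GFF3_TRANSCRIPT_ID.search(attrs)
        if match:
            ids.add(match.group(1))
    return ids


def compare_dumps(gtf_lines: Iterable[str],
                  gff3_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (IDs only in the GTF, IDs only in the GFF3), each sorted."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    gff3_ids = gff3_transcript_ids(gff3_lines)
    return sorted(gtf_ids - gff3_ids), sorted(gff3_ids - gtf_ids)


def compare_files(gtf_path: str, gff3_path: str) -> Tuple[List[str], List[str]]:
    """Run the same comparison on gzip-compressed files as downloaded from FTP."""
    with gzip.open(gtf_path, "rt") as gtf, gzip.open(gff3_path, "rt") as gff3:
        return compare_dumps(gtf, gff3)
```

For the patch case described above, the missing transcripts would be expected in the first list (present in the GTF, absent from the GFF3).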

We plan to fix this from Ensembl 106 onwards.

Workaround: None, other than using the most up-to-date datasets.

1000 Genomes minor allele frequency incorrect for duplications

Affects: Live site, Staging, GRCh37 Expected Versions: Ensembl 103, 104
Description: Some insertion/deletion variants which can be described as duplications currently have incorrect global allele frequencies from the 1000 Genomes Project reported in the Ensembl variant and transcript views, in BioMart and in Ensembl VEP. Versions 103 and 104 are affected. The continental population frequencies for these variants are correct, and the problem can be identified by comparing the two. For example, for rs199588481, where an ‘A’ is inserted adjacent to an ‘A’, the VCF reference allele ‘A’ is annotated as the minor allele when the alternate allele ‘AA’ should be. This issue will be resolved in Ensembl VEP version 105, which will be released in the autumn. BioMart and the Ensembl browsers will be fixed for version 106.
Workaround: We advise ignoring these global frequencies and filtering on the continental frequencies instead.
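The comparison between global and continental frequencies can be automated along the following lines. This is a minimal Python sketch under stated assumptions: the input dictionary layout and the function names are hypothetical, and the unweighted mean across the continental groups (AFR, AMR, EAS, EUR, SAS) is only a heuristic stand-in for the true global frequency, because it ignores population sizes.

```python
from statistics import mean
from typing import Dict

# Hypothetical input shape: {population: {allele: frequency}}, for example
# the 1000 Genomes continental groups AFR, AMR, EAS, EUR and SAS.
Frequencies = Dict[str, Dict[str, float]]


def continental_minor_allele(freqs: Frequencies) -> str:
    """Return the allele with the lowest unweighted mean continental frequency.

    Heuristic: population sizes are ignored, so this approximates rather than
    reproduces the global minor allele. Alleles are sorted so ties resolve
    deterministically (alphabetically first wins).
    """
    alleles = sorted({allele for pop in freqs.values() for allele in pop})
    averages = {a: mean(pop.get(a, 0.0) for pop in freqs.values()) for a in alleles}
    return min(averages, key=lambda a: averages[a])


def global_maf_looks_wrong(reported_minor_allele: str, freqs: Frequencies) -> bool:
    """Flag a variant whose reported global minor allele disagrees with the continental data."""
    return reported_minor_allele != continental_minor_allele(freqs)
```

Flagged variants can then be filtered on their continental frequencies, as advised above, rather than on the global value.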

Missing clone information for mouse GRCm39

Affects: Live site, Mirrors Expected Versions: Ensembl 103
Description: The coordinates of clones from the following libraries have not been loaded into the mouse GRCm39 database, so they are not visible in the Location view:
* B6Ng01
* C3H
* CH25
* CH26
* CH28
* CH29
* CH33
* CH34
* CH36
* CT7
* DN
* MHPN
* MHPP
* MM_DBa
* MSMg01
* RP21
* RP22
* RP23
* RP2
* WI
* bMQ
Workaround: At the bottom of the Location view page, click on “View in archive site” and select “Ensembl 102: Nov 2020 (GRCm38.p6)”. You can then click on the cog to configure the page and view the clone information for the region of interest. Note that in most cases the coordinates will have shifted between the two assemblies.

Mouse species.common_name needs to be patched

Affects: Live site Expected Versions: Ensembl 103
Description: In mouse GRCm39, species.common_name is set to ‘house mouse’, a change from ‘mouse’ in earlier releases. The consequences are:
1) When searching from the home page, results from the reference mouse are shown with a species of ‘House Mouse’, whereas all strains are shown with a species of ‘Mouse’. This means that clicking on ‘Mouse’ does not show the results from the reference.
2) Whereas ‘Mouse’ is a favourite and is therefore elevated in the list of species, ‘House Mouse’ is not. This means that you will need to expand the list to find ‘House Mouse’.
Workaround: To filter search results to show the reference mouse, you will need to scroll down the long list of species to find ‘House Mouse’.

Missing RFAM xrefs in mouse core database

Affects: Live site, Mirrors, Staging Expected Versions: Ensembl 103
Description: The RFAM xrefs are missing from the mouse core database. As a consequence, a variable number of genes and transcripts of the biotypes ‘misc_RNA’, ‘ribozyme’, ‘rRNA’, ‘snoRNA’ and ‘snRNA’ will get a clone-based name instead of the RFAM name. Descriptions may be empty for these genes and transcripts.
Workaround: none

Microarray data not present in Ensembl Metazoa BioMart for Drosophila melanogaster

Affects: Live site Expected Versions: Ensembl 103
Description: No microarray data for Drosophila melanogaster in BioMart on Ensembl Metazoa.
Workaround: Since Drosophila melanogaster is available on both the Ensembl Metazoa and the Ensembl sites, the workaround is to use Ensembl rather than Ensembl Metazoa for Drosophila melanogaster in release 103.

Some genes need to be updated for the wheat cultivar Stanley

Affects: Live site Expected Versions: Ensembl 102, Ensembl 103
Description: For the wheat cultivar Stanley, a different, newer genome assembly version has been uploaded to the sequence archives. The gene projections in Ensembl Plants refer to an older version of the Stanley assembly, which has not been uploaded to the archives. This results in an inconsistency between the GFF file and the genome assembly sequence file. However, the only difference between the two Stanley assembly versions is one scaffold that changed orientation (chr2A:1-5191484). The overall gene content therefore remains completely unchanged, and the CDS and protein sequences in the FASTA files remain valid. The only thing to note is that a few genes (those in the region of the flipped scaffold) will have incorrect coordinates in the GFF file relative to the latest Stanley assembly sequence. This will be updated in Ensembl 104.
Workaround: The corrected GFF (along with all other files) can also be accessed here:
https://wheat.ipk-gatersleben.de
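The effect of the orientation change on GFF coordinates can be sketched as follows. This is a hedged Python illustration, not a supported conversion: it assumes that the scaffold at chr2A:1-5191484 was reversed in place (same span, opposite orientation) and that the sequence name in the GFF is `chr2A`; the function name is hypothetical. The corrected GFF from IPK remains the authoritative source.

```python
from typing import Tuple

# From the notice: the scaffold at chr2A:1-5191484 changed orientation.
# Assumption: it was reversed in place, so the region keeps the same span.
FLIPPED_SEQID = "chr2A"
FLIPPED_START = 1
FLIPPED_END = 5_191_484

_STRAND_SWAP = {"+": "-", "-": "+"}


def remap_to_flipped_scaffold(seqid: str, start: int, end: int,
                              strand: str) -> Tuple[int, int, str]:
    """Map 1-based, inclusive GFF coordinates from the old orientation to the new one."""
    if seqid != FLIPPED_SEQID or end < FLIPPED_START or start > FLIPPED_END:
        return start, end, strand  # entirely outside the flipped region: unchanged
    if start < FLIPPED_START or end > FLIPPED_END:
        raise ValueError("feature straddles the flipped region boundary")
    # Reversal maps position p to (FLIPPED_START + FLIPPED_END - p),
    # so the old end becomes the new start and vice versa.
    offset = FLIPPED_START + FLIPPED_END
    return offset - end, offset - start, _STRAND_SWAP.get(strand, strand)
```

Unstranded features (`.`) keep their strand value; only `+` and `-` are swapped.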

Merged RNA-seq data not available for some sheep

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The RNA-seq merged BAM files and their associated tracks are not available for either the Rambouillet sheep (Oar_rambouillet_v1.0) or the Texel sheep (Oar_v4.0). This also affects Ensembl Rapid Releases from release 8 onwards.
Workaround: none

GENCODE Basic track doesn’t get displayed

Affects: Live site Expected Versions: Ensembl 103
Description: It is not possible to display the GENCODE Basic gene annotation track in the genome browser.
Workaround: none

Missing variant pathogenicity predictions for REVEL, MetaLR and MutationAssessor

Affects: Live site, Mirrors, Staging, Test Expected Versions: Ensembl 102, Ensembl 103
Description: We are missing variant pathogenicity predictions from REVEL, MetaLR and MutationAssessor on:
* Variant page > Genes and regulation view
* Transcript page > Variant table view
This only affects human GRCh38 views. Predictions for CADD, SIFT and PolyPhen-2 are still available.
This problem does not impact Ensembl VEP.
Workaround: The scores can still be retrieved:  

  • in release 102 through Ensembl VEP, using either the web or the command-line tool
  • using release 101 views

Genomes have been over-masked

Affects: Live site Expected Versions: Ensembl 102, Ensembl 103
Description: For some species, the repeat-masked genomes have been masked using RepeatModeler libraries. As we are not confident that this does not mask gene families, we will remove this masking and mask the genomes using Repbase libraries only.
Workaround: For the time being, be aware that the masked genomes have been masked using the RepeatModeler libraries.

Regulation Mart missing

Affects: Live site Expected Versions: Ensembl 103
Description: Unfortunately, some datasets have been left out of the Regulation Mart for this release. The missing datasets are:

  • Human Regulatory Evidence
  • Mouse Binding Motifs
  • Mouse Other Regulatory Regions
  • Mouse Regulatory Features
  • Mouse Regulatory Evidence
  • Mouse miRNA Target Regions

We apologise for the inconvenience and are doing our best to restore these datasets for release 104.

Workaround: In the meantime, you can use the archive site to retrieve the data. Unfortunately, for mouse this means using the GRCm38.p6 assembly rather than the latest version, GRCm39.

Non-current exons in human core database

Affects: Live site Expected Versions: Ensembl 103
Description: A large number of exons are erroneously labelled as non-current in the human core database (exon.is_current = 0). This bug may impact Ensembl API users, since several ExonAdaptor methods filter for current exons:
* fetch_all
* fetch_by_stable_id
* fetch_by_stable_id_version
This bug does not seem to affect the website.
Workaround: If possible, use alternative API methods to fetch exons, such as fetch_all_by_Transcript.

GO Term Filters not available in non-vertebrate BioMart

Affects: Live site Expected Versions: Ensembl 103
Description: The GO Term Accession and GO Term Name filters do not appear in the Ensembl Genomes BioMart in any division.
Workaround: Please use the archived release 49, available here:
https://nov2020-plants.ensembl.org/biomart/martview
To navigate to other divisions, please modify the URL by changing the division name.

EPO and EPO extended MSAs not displayed correctly

Affects: Live site Expected Versions: Ensembl 103
Description: In e103, we found that EPO and EPO Extended multiple sequence alignments (MSAs) have been displayed incorrectly for quite some time (judging from a later GitHub track, most likely since e86, or since the Alignment (image) view was made available). After some investigation, we found that the bug lies in Compara’s AlignSliceAdaptor and causes issues such as species being displayed twice or the ancestral sequence containing incorrect information.
Workaround: none

Broken/missing links for transcripts with biotypes “tRNA” and “IG” for RefSeq tracks

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: When viewing the RefSeq track, the links to NCBI for transcripts with biotypes “tRNA” and “IG” are broken or incorrect.
Workaround: This will be fixed in an upcoming Ensembl release; in the meantime, the links will be disabled.

Compara ncRNA tree stats not described accurately

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104
Description: The stats computed for ncRNA trees under the names `nb_genes_in_tree` and `nb_orphaned_genes` do not actually refer to the final trees but to the unfiltered clusters from an earlier stage. In Ensembl 103 we have corrected this so that the values match their names, but they will decrease significantly for at least 50% of the species reported.
Workaround: none

Some protein-coding genes mysteriously turned into nontranslating_CDS

Affects: Live site Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103
Description: A user noticed that the peptide FASTA files for pachysolen_tannophilus_nrrl_y_2460_gca_001661245 (a fungus) are considerably shorter than expected. It turns out that in release 42 many of its protein-coding genes were marked as nontranslating_CDS, although the underlying data and annotation have not changed. This needs further investigation.
Workaround: none

GRCh37 – COSMIC insertion coordinates off by +1

Affects: Live site Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103
Description: The coordinates of insertions imported from the COSMIC source are off by +1. For GRCh37 e100 and e101, 2.66% (253,428 / 9,511,409) of COSMIC variants are affected.
Workaround: Use the previous release: GRCh37 release 99 contains 4,478,854 COSMIC variants.

Drosophila melanogaster RNA gene cross-reference links do not work

Affects: Live site, Mirrors, Staging Expected Versions: Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103
Description: Rfam and miRBase cross-reference links do not work, because they use the FlyBase ID instead of the RNA gene's Rfam or miRBase ID.
Workaround: Search for the Rfam or miRBase ID on the respective websites.

Extra character in Drosophila file dumps 

Affects: Live site Expected Versions: Ensembl 103
Description: An extra `_` has been inserted into some GTF file names during the FTP dumps for Drosophila species:
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_virilis/Drosophila_virilis.dvir_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_yakuba/Drosophila_yakuba.dyak_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_ananassae/Drosophila_ananassae.dana_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_mojavensis/Drosophila_mojavensis.dmoj_caf1_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_pseudoobscura/Drosophila_pseudoobscura.Dpse_3.0_.50.gtf.gz
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_simulans/Drosophila_simulans.ASM75419v3_.50.gtf.gz
(note the extra underscore in the assembly string).
Workaround: Include the extra `_` in the file name when downloading these files.
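When scripting downloads, the affected file names can be generated from the pattern of the URLs listed above. This is a small Python sketch; the function name is hypothetical, and it assumes that unaffected files follow the same pattern without the extra underscore.

```python
# Pattern taken from the affected URLs listed above (Ensembl Metazoa release 50).
BASE = "ftp://ftp.ensemblgenomes.org/pub/metazoa/release-{release}/gtf/{species}/"


def drosophila_gtf_url(species: str, assembly: str, release: int = 50,
                       extra_underscore: bool = True) -> str:
    """Build a GTF URL; species is the lower-case production name, e.g. 'drosophila_virilis'."""
    file_prefix = species.capitalize()  # 'drosophila_virilis' -> 'Drosophila_virilis'
    suffix = "_" if extra_underscore else ""  # the stray underscore after the assembly name
    return BASE.format(release=release, species=species) + (
        f"{file_prefix}.{assembly}{suffix}.{release}.gtf.gz"
    )
```

For example, `drosophila_gtf_url("drosophila_simulans", "ASM75419v3")` produces the last affected URL in the list.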