Ensembl 99 and earlier

RepeatMasker unavailable for BLAST queries between Ensembl 95 and Ensembl 99, inclusive

The option to enable RepeatMasker in BLAST searches to filter query sequences was unavailable between Ensembl 95 and Ensembl 99, inclusive. This meant that even when BLAST queries were submitted with this option selected, the query sequences were not filtered using RepeatMasker.

Stop codon readthroughs displaying as ‘polymorphic pseudogene’ in Ensembl 99

The 13 affected human genes are displayed as polymorphic pseudogenes at the gene level, and their stop codon readthrough transcripts are displayed as polymorphic pseudogenes at the transcript level. This will be fixed in Ensembl 100.

Stop codon readthrough genes:

ACP2, AMD1, AQP4, BRI3BP, LDHB, MAPK10, MDH1, MPZ, OPRK1, OPRL1, SACM1L, VDR, VEGFA

Interproscan failures – Leading to missing protein features integration

Some species failed the protein features processing step run with InterProScan. For these species, some protein features data is missing (around 100 proteins are affected per species):

  • moschus_moschiferus (deer ~100)
  • canis_familiaris (dog ~100)
  • catagonus_wagneri (Chacoan peccary ~200)
  • mus_musculus_pwkphj (Mouse strain ~100)
  • mus_musculus_nodshiltj (Mouse strain ~100)
  • sus_scrofa_bamei (pig breed ~100)
  • sus_scrofa_hampshire (pig breed ~100)
  • sus_scrofa_berkshire (pig breed ~200)

This will be fixed in Ensembl 100.

Drosophilidae genes imported from FlyBase have stop codon missing from CDS

Genes for species of the family Drosophilidae imported from FlyBase are missing the stop codon in their CDS. Proteins and their domains are not affected.

Mismapping of Xenopus tropicalis stable IDs

Stable ID mapping is a complicated process in which we try to make sure that the stable ID of a gene is kept between two assemblies of the same species. Our pipeline failed to assign the correct stable IDs in the new Xenopus tropicalis assembly.

This means that the gene names are also affected and should not be relied upon.

This will be fixed in Ensembl 100.

Assembly Converter does not support maize assembly version B73_RefGen_v4

The Zea mays assembly converter currently does not allow mapping from old assemblies (AGPv2, AGPv3) to the current B73_RefGen_v4. The missing mapping will be added back in version 99.

Oryza sativa gene descriptions in the Plants and Pan Compara database (e98)

The Oryza sativa gene descriptions are missing from the Plants and Pan Compara database and will be added back in version 99.

Saccharomyces cerevisiae stable IDs in the Vertebrates Compara database (e97 and e98)

The Saccharomyces cerevisiae translation stable IDs had their _mRNA suffix removed in e97, but this was not backported to the Vertebrates Compara database. This will be fixed in version 99.

New line characters in MySQL dumps (e97)

During release 97, an error occurred while releasing our MySQL dumps, and some of the SQL dumps were generated with unexpected newline characters. As a side effect, some data may be truncated on our public database servers. The dumps have since been fixed and the data on our FTP site (~/mysql/directories) are now clean. If you downloaded a file before we made the updates, you might encounter errors when loading the data into your own database; in that case, simply download the file again. We have no plan to update the public MySQL server.
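
One way to check whether a previously downloaded dump is affected is to look for rows whose field count differs from the rest of the file. The sketch below assumes a tab-delimited table dump with one row per line; the function name and the sample rows are illustrative only and are not part of any Ensembl tool.

```python
from collections import Counter

def find_suspect_rows(lines):
    """Return 1-based line numbers whose tab-separated field count
    differs from the most common field count in the input."""
    counts = [len(line.rstrip("\n").split("\t")) for line in lines]
    if not counts:
        return []
    expected = Counter(counts).most_common(1)[0][0]
    return [i for i, n in enumerate(counts, start=1) if n != expected]

# Hypothetical three-column dump in which one row was split by a stray newline.
sample = [
    "1\tgeneA\tprotein_coding\n",
    "2\tgen\n",
    "eB\tlincRNA\n",
    "3\tgeneC\tmiRNA\n",
    "4\tgeneD\tsnoRNA\n",
]
print(find_suspect_rows(sample))
```

Any line numbers reported this way are only candidates for inspection; re-downloading the file remains the simplest fix.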

Hybrid cattle missing in the vertebrate gene mart e97

The two new hybrid cattle species are missing from the e97 vertebrate gene mart:

– Hybrid – Bos Taurus (bos_taurus_hybrid_core_97_1)

– Hybrid – Bos Indicus (bos_indicus_hybrid_core_97_1)

Data for these species will be generated in e98.

HGNC gene names in e96

40 genes are missing their up-to-date HGNC gene names in e96. A further 14 genes have gene names but these were assigned from links to NCBI, so they have no link to HGNC. A full list of the genes affected can be found here.

Panicum hallii FIL2 and Panicum hallii HAL2 missing in the Plants gene mart e96

The two new species Panicum hallii FIL2 and Panicum hallii HAL2 are missing from the e96 plants gene mart. Data for these species will be generated in e97.

Empty “High Confidence” column for ncRNAs and mouse-strains orthologues in e96 (Vertebrates)

The “High Confidence” field for orthologues has only been populated for protein-coding genes on the default set of species, i.e. not the mouse-strains and not non-coding genes. This affects the website, the APIs (Perl and REST), the FTP dumps and BioMart. Data will be generated as normal in e97.

Wrong stable identifiers for Plants gene-trees in e96

In e96, most Plants gene-trees have a stable identifier of the form Node_12345 instead of EPlGT0094000XXXXX. This means that gene-trees can’t be accessed using past identifiers. In e97, the usual form of identifiers will be restored. This affects the website, the APIs (Perl and REST), the FTP dumps and BioMart.

Missing MT genome in Cow (Bos taurus) in e95

The cow genome in e95 was missing the mitochondrial (MT) genome. This has now been fixed and is present in e96, but the archive for release 95 does not contain the MT genome for cow.

Missing Protein Coding Annotation for Mouse Lemur Chromosome 1 (e93)

Mouse Lemur (Microcebus murinus – GCA_000165445.3) is missing protein coding annotations from chromosome 1. This issue will be fixed in e94.

Missing GO xrefs for Ciona Intestinalis (e92)

Ciona Intestinalis is missing GO xrefs in release 92. This issue will be fixed in e93.

Missing TMHs for Mouse and Human (e90)

There was an issue with TMHs when we ran InterProScan, so we excluded them. This will be fixed in e91.

Missing Gene/Phenotype page for Mouse homologues (e89)

In release 89, mouse genes were wrongly mapped to their species trees.
Unfortunately, these homologues will not be displayed in e89.
We advise users to use the e88 archive (http://mar2017.archive.ensembl.org/index.html) if they want to see mouse homologues.
This will be fixed in e90.

Incomplete probe to transcript mappings for C. elegans and S. cerevisiae (e89)

In release 89 only a subset of the probe sequences for C. elegans and S. cerevisiae have been mapped to transcripts. This will be fixed in e90.

Missing PubMed link for macaque structural variants (e88)

The link to PubMed is missing for the structural variant study ‘nstd3’ (Macaca mulatta).

Incorrect VISTA enhancers (e87)

The VISTA enhancers for human in GRCh37 and GRCh38 have been imported incorrectly starting in release 87, resulting in a lower number of features and incorrect genomic coordinates for those available in Ensembl. This affects the queries through BioMart, the API and their display on the Ensembl genome browser.
This will be fixed in e90 for GRCh38 and the next GRCh37 update.

UPDATE: All VISTA enhancers (human GRCh37 and GRCh38 and mouse GRCm38) have been successfully re-imported for release 89.

Mouse protein annotation kegg-related xrefs (e87)

There are 1,091 kegg-related xrefs missing from the mouse core database. This will be fixed in e88.

Human merged miRNA genes (e85, e86 and e87)

The miRNA genes whose exons overlap with exons of other genes have been wrongly merged into those genes. These miRNA genes should have been kept as separate genes. There are 304 affected transcripts out of 1,876 transcript models.

CTCF (e86)

The human and mouse Regulatory Builds are missing CTCF elements.

Regulatory Features (e86)

There is no stable id mapping for regulatory features nor underlying structure (motifs) in regulatory features. All “Open Chromatin” and “TF binding site” regulatory features have ‘NA’ activity levels across all cell types.

Gene-order conservation score (e86)

We were unable to compute the gene-order conservation score for about 7% of the orthologues and the score computed for the new assemblies (chicken, macaque, mouse-lemur) may be wrong where the assembly significantly changed.

Gene gain/loss trees (e85 and e86)

Some p-values are mistakenly reported as 0 instead of 1. Data will be fixed in e87, but in the meantime you can consider the p-value to be 1 when the gene-count does not change over a branch.
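
The interim rule above can be applied when post-processing the data yourself. This is a minimal sketch; the function and parameter names are hypothetical and do not correspond to any Ensembl API call.

```python
def effective_p_value(reported_p, parent_gene_count, child_gene_count):
    """Return 1.0 when the gene count is unchanged over the branch
    (the workaround described above); otherwise keep the reported value."""
    if parent_gene_count == child_gene_count:
        return 1.0
    return reported_p

# A branch with no change in gene count whose p-value was reported as 0.
print(effective_p_value(0.0, 5, 5))
```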

Z-menu Human Regulatory Features (e85)

Transcription factors are not displayed in the Z-menu of a regulatory feature. This is also affecting Ensembl Variation, as it was not possible to calculate the overlap between variants and motif features.

CTCF (e85)

The human Regulatory Build is missing CTCF regions/peaks.

Histone modifications (e85)

The following histone modifications are missing:
H2AK5ac
H2AZ
H2BK120ac
H2BK12ac
H2BK15ac
H2BK20ac
H2BK5ac
H3K14ac
H3K18ac
H3K23ac
H3K23me2
H3K4ac
H3K56ac
H3K79me1
H3K79me2
H3K9me1
H4K5ac
H4K8ac
H4K91ac

BLUEPRINT epigenomes (e85)

There are six additional BLUEPRINT epigenomes that were added by mistake despite not having the complete set of histone modifications necessary:
CD38- naïve B cell (CB)
CD38- naive B cell (VB)
CD4+ ab T cell (CB)
CD8+ ab T cell (VB)
EM CD8+ ab T cell (VB)
Naïve B cell (To)

API Tutorial (e85)

Due to significant changes in the database schema our API tutorial is out of date.

Incorrect configuration of VEP on the human GRCh37 site

Apologies, but there was a misconfiguration of our GRCh37 website for about 4 days, between 27/05/16 (1700h GMT) and 31/05/16 (1000h GMT). As a consequence, the assembly version associated with VEP jobs run during this period will be incorrectly recorded as GRCh38. However, the analysis was performed against the GRCh37 assembly, and the coordinates and consequences will be correct. This error will not have affected analyses run on the main GRCh38 Ensembl website, or those run using the command line script. If you have any further queries about analyses you ran during this period, please contact ensembl-helpdesk@ensembl.org.

Wrong variant consequences affecting the Ensembl and variation marts in release 84

The variant consequences for human, mouse, cow, dog and pig were wrong in the release 84 gene and variation marts between Wednesday 9th of March and Friday 8th of April 2016.

Missing Transcription Factor Motifs in release 84

We aim to fix this problem in release 85.

BHLHE40
CTCF
Cfos
Cjun
Cmyc
Egr1
FOXA2
Gabp
IRF4
MEF2A
MEF2C
NFKB
Nrsf
PU1
Pax5
RXRA
Srf

Missing transcript annotations for HTA-2_0 in release 84

We aim to fix this problem in release 85.

Please note that the alignments are still available on our current live site, and the transcript annotations are still available through the archive site:
http://Dec2015.archive.ensembl.org/index.html

Corrupted gene targets for TarBase in human

The gene targets associated with Diana TarBase miRNA features have been corrupted on human chromosomes 7, 14 and 20 in Ensembl 83.

Duplicated CCDS genes in mouse

The mouse otherfeatures database contains CCDS (consensus coding sequence) genes for both Ensembl releases 81 and 82. This means that all those present in 81 have been duplicated. The problem will be fixed in release 83.

GFF Export of UTR regions from the website

We have discovered a long-standing bug in our export code that can cause the coordinates of the start and the end of coding sequences (CDS) to be incorrectly reported. Specifically, where the CDS start (or end) lies in the middle of an exon, the CDS coordinates reported are those of the start (or end) of the exon itself rather than the correct location. The problem is only seen when exporting features from Location View in GFF(3) format using the ‘Export Data’ link. It is present in the live site and all archive versions of the site, but not in the files on our FTP site. We will rectify this by the time Ensembl is updated to release 84.
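
To illustrate the expected behaviour, the sketch below clips each exon to the CDS range, so that a CDS boundary lying inside an exon is reported at its true position rather than at the exon boundary. It assumes 1-based inclusive genomic coordinates; the function name and example coordinates are invented for illustration and this is not the export code itself.

```python
def cds_segments(exons, cds_start, cds_end):
    """Clip each (start, end) exon to the CDS range and return the
    coding segments; exons lying entirely in the UTRs are dropped."""
    segments = []
    for start, end in exons:
        seg_start = max(start, cds_start)
        seg_end = min(end, cds_end)
        if seg_start <= seg_end:
            segments.append((seg_start, seg_end))
    return segments

# Hypothetical transcript: the CDS starts and ends in the middle of exons.
exons = [(100, 200), (300, 400), (500, 600)]
print(cds_segments(exons, cds_start=150, cds_end=550))
```

In this example the first coding segment starts at 150, not at the exon start of 100, which is the distinction the bug described above loses.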

APPRIS attributes missing in e79 Biomart

The APPRIS attributes are not retrievable in the e79 Ensembl Biomart release. This issue will be rectified in the e80 Ensembl release.

Missing transcript annotations in release 79

GRCh38
HuEx-1_0-st-v2
HuGene-1_0-st-v1
HuGene-2_0-st-v1

HTA-2_0

GRCh37
HuEx-1_0-st-v2
HuGene-1_0-st-v1
HuGene-2_0-st-v1

HC-G110
HG-Focus
HG-U133A
HG-U133A_2
HG-U133B
HG-U133_Plus_2
HG-U95A
HG-U95Av2
HG-U95B
HG-U95C
HG-U95D
HG-U95E
HuGeneFL
PrimeView
U133_X3P

This issue will be fixed in release 80.

Please note that the alignments are still available on our current live site, and the transcript annotations are still available through the respective archive sites:
GRCh38 (e78): http://dec2014.archive.ensembl.org
GRCh37 (e75): http://feb2014.archive.ensembl.org

Gene names with additional semi-colon in release 78

Some semi-colons (‘;’) have been left in gene names assigned using UniProt gene names.
The display has been fixed on the website, and the release 78 API will mask the problem, but it remains in BioMart results. The GTF files on our FTP site have been fixed, but there was a period of vulnerability between 03/12/2014 and 14/02/2015. If you downloaded GTF files in this period, please replace them with a fresh download.
This affects 16 species: human, mouse, marmoset, guinea pig, cat, cod, turkey, ferret, microbat, pig, tetraodon, platyfish, orangutan, nile tilapia, gibbon and spotted gar.

Ensembl mart mouse Affy Moex 1 0 st v1 probeset ids in e78

The mouse Affy Moex 1 0 st v1 probeset ids in the Ensembl mart 78 contain an extra semi-colon. This issue will be fixed in release 79.

VEP cache incomplete in chromosome Y in e77

We have corrected a bug in the VEP version 77 cache files for human that were released at the start of October 2014. The October files were missing some transcripts on the Y chromosome and so VEP requests for variants on Y that fell within some genes were erroneously called as ‘intergenic’. As of November 18th 2014 this is fixed for the websites, off-line script and REST API.

For script users, please update your cache files with these new ones from here:
ftp://ftp.ensembl.org/pub/current_variation/VEP/homo_sapiens_vep_77_GRCh37.tar.gz
ftp://ftp.ensembl.org/pub/current_variation/VEP/homo_sapiens_vep_77_GRCh38.tar.gz

EPO alignments in e76/77

The coverage of the EPO alignments on the cat (Felis catus) genome has decreased from 89.58% base pair coverage (in release 75) to 58.20% base pair coverage (in releases 76 and 77). This was caused by the use of an old set of anchor sequences (these sequences are used in the first stage of the generation of the EPO alignments), which were missing cat-specific sequences. This will be rectified in the next EPO alignment build.

LRG genes missing from Ensembl Families, release 76

This is due to a lack of synchronisation between different pipelines. The issue will be addressed in future releases, but the data will be missing in e76.

Mis-assigned HGNC names in human, release 76

Due to a bug in our external references mapping pipeline, 2,373 HGNC symbols have been mis-assigned, corresponding to 2,570 genes.
Another 34,520 HGNC symbols have been correctly assigned to 31,775 genes.
This issue will be fixed in release 77.
If in doubt regarding an assigned HGNC symbol, please check whether other external references, for example EntrezGene or Uniprot, confirm that symbol.
These erroneous entries can be identified in our database as having the info_text ‘Generated via ccds’.
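
If you have already exported external reference records, the marker above can be used to separate suspect symbols from the rest. The sketch below works on plain dictionaries; the key names ("gene", "symbol", "info_text") and the sample records are assumptions made for illustration, not a description of the database schema.

```python
SUSPECT_INFO_TEXT = "Generated via ccds"

def split_by_info_text(records):
    """Split records into (suspect, other) according to whether their
    info_text matches the marker of the mis-assigned entries."""
    suspect = [r for r in records if r.get("info_text") == SUSPECT_INFO_TEXT]
    other = [r for r in records if r.get("info_text") != SUSPECT_INFO_TEXT]
    return suspect, other

# Hypothetical exported records.
records = [
    {"gene": "gene_1", "symbol": "SYMBOL_A", "info_text": "Generated via ccds"},
    {"gene": "gene_2", "symbol": "SYMBOL_B", "info_text": ""},
]
suspect, other = split_by_info_text(records)
print([r["gene"] for r in suspect])
```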

Gene gain/loss trees, release 75

Due to a bug in our gene gain/loss analysis pipeline, the predicted numbers of ancestral genes are all set to 0. We advise using the data of Ensembl 74 if you have to stick to the GRCh37 assembly of the human genome, or switching to a more recent release otherwise.

Incorrect TarBase data, release 75

The coordinates of the TarBase data from mouse are largely incorrect due to a problem with a projection between assemblies.
The TarBase data from Human contains some duplicate entries and the features are not ordered in ascending coordinates. This affects only queries through BioMart or the API fetch_all function.
These issues will be corrected in release 76.

Individual genomes data in the Location Resequencing View, release 74

The differences between the reference sequence and the genome sequences of James Watson and Craig Venter are not available in the Location: Resequencing View in Ensembl release 74. These data have not changed and are available in release 73 in the archive site. They will be re-instated in release 75.

Incorrect sequence for chicken chr Z – Updated

Since the new chicken assembly was released in April 2013 (release 71), there has been a problem with chromosome Z. In particular, we have incorrectly used contig AC186840.3 instead of AC186840.2 for scaffold JH375087.1. This will be fixed as soon as possible, for the next release (e74) in November 2013. Chicken chromosome Z was also incorrect on our Pre! site from January 2012 – April 2013.

The correct chromosome Z is now available on our FTP site: ftp://ftp.ensembl.org/pub/release-73/fasta/gallus_gallus/dna/. Apologies for the inconvenience.

Problems with ENCODE WGBS data

The ENCODE whole-genome bisulfite sequencing data (GEO ref: GSE 40832) have been flagged as erroneous by their producers: the strand column contains errors. New data files are expected to be deposited soon in GEO.

Missing human ncRNA genes

In release: 72

Due to an error in the ncRNAs import process, there are 99 ncRNA genes which are missing from the human gene set:

ENSG00000194647, ENSG00000199438, ENSG00000199537, ENSG00000199789, ENSG00000200000, ENSG00000200280, ENSG00000200285, ENSG00000200654, ENSG00000200837, ENSG00000201061, ENSG00000201103, ENSG00000201686, ENSG00000201784, ENSG00000201976, ENSG00000202181, ENSG00000202294, ENSG00000202323, ENSG00000202641, ENSG00000206696, ENSG00000206753, ENSG00000206830, ENSG00000207040, ENSG00000207315, ENSG00000207427, ENSG00000207447, ENSG00000207498, ENSG00000207718, ENSG00000207787, ENSG00000207793, ENSG00000207809, ENSG00000208007, ENSG00000208011, ENSG00000208342, ENSG00000211521, ENSG00000212300, ENSG00000215944, ENSG00000216036, ENSG00000221311, ENSG00000222234, ENSG00000222417, ENSG00000222687, ENSG00000222944, ENSG00000222946, ENSG00000223010, ENSG00000223279, ENSG00000223292, ENSG00000238353, ENSG00000238439, ENSG00000238461, ENSG00000238505, ENSG00000238547, ENSG00000238636, ENSG00000238682, ENSG00000238779, ENSG00000238944, ENSG00000238994, ENSG00000239062, ENSG00000239071, ENSG00000239088, ENSG00000239187, ENSG00000239337, ENSG00000239421, ENSG00000239688, ENSG00000239800, ENSG00000240379, ENSG00000240620, ENSG00000242926, ENSG00000243133, ENSG00000243835, ENSG00000243922, ENSG00000244684, ENSG00000251736, ENSG00000251820, ENSG00000251845, ENSG00000252360, ENSG00000252384, ENSG00000252527, ENSG00000252564, ENSG00000252665, ENSG00000262405, ENSG00000263454, ENSG00000264132, ENSG00000264152, ENSG00000264394, ENSG00000264460, ENSG00000264581, ENSG00000264854, ENSG00000265276, ENSG00000265701, ENSG00000265805, ENSG00000266031, ENSG00000266067, ENSG00000266351, ENSG00000266506, ENSG00000266623, ENSG00000266661, ENSG00000266742, ENSG00000266752

Stable ID ENSG00000199654 in Ensembl release 72 has been wrongly assigned to the corresponding gene on the patch rather than the parent gene, which is missing.

PolyPhen predictions, release 71

PolyPhen predictions are not available for human variants or proteins which are novel to release 71. No PolyPhen data is available through BioMart for this release (PolyPhen predictions are available through the Ensembl variation 70 mart here: http://jan2013.archive.ensembl.org/biomart/martview/).

Updated information will be available in release 72.

Drosophila funcgen DB release 70

The gene set in the fruitfly core database was updated from FlyBase version 5.39 to 5.46, but the regulation database was not updated correspondingly. Many transcript IDs remained the same (22,659/23,657 = 96%) but some were removed (998/23,657 = 4%), and others were added (4,257).  Consequently, some mappings between probesets and FlyBase transcripts (2,532/51,711 = 5%) refer to transcript IDs that are no longer current, and mappings for new transcript IDs do not exist (10,773/59,952 = 18%).  A small number of mappings between REDfly annotations and FlyBase genes refer to gene IDs that have been replaced (3/339 = 1%). BioTIFFIN regulation features have no explicit links to FlyBase data, so are unaffected.

A regulation database with up-to-date mappings between probes and transcripts will be available in EnsemblGenomes release 17, scheduled for 29 Jan 2013, and in Ensembl release 71.

BioMart release 70: missing mouse strains

In release 70 Ensembl variation mart

Changes made to the variation sample table mean that approximately half of the mouse strains are missing from the variation mart database. You can see the available strains here:

Filter-> GENERAL VARIATION FILTERS-> Limit to variations from strain(s)

As an alternative, please use the Ensembl variation 69 mart here:

http://oct2012.archive.ensembl.org/biomart/martview/

The strains will be added again for release 71.

Rat eQTLs

In Release 70:

The eQTL data is not available for rat for this release. These data were not available for us to download for the new assembly, Rnor_5.0 at the time we prepared our databases.

Ensembl read-through transcripts

In releases: up to and including 72

We have identified 15 Ensembl read-through transcripts which have not been assigned the correct gene due to a bug in our Ensembl-HAVANA merge code.

There are 11 human genes affected whose HGNC names are: CFB, ARHGAP8, AKAP2, CFHR4, C20orf141, FDX1L, APOC2, TNNI3K, C1QTNF5, APITD1 and TMEM189. The Ensembl read-through transcripts within these genes should have been annotated as part of other neighbouring genes.

We are currently working on a fix for this issue.

Regulation: Chr Y blacklist filtering

In Releases: 64-69

Peak calls based on ChIP-Seq and DNase1 data are filtered using a list of blacklist regions curated by the ENCODE project. In releases 64-69, a bug was introduced by the addition of filtering support for the Y pseudo-autosomal regions (PARs) in human. This resulted in all blacklist regions on the human Y chromosome (including the PARs) being omitted from the filtering. The effect of this is twofold: PAR regions appear to have duplicate data at a given location, as data from the corresponding X PAR is projected across; and a small number of low-quality regulatory features (~150-200) and associated supporting evidence have not been filtered out. This will be rectified in release 70.
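
The kind of filtering described above can be sketched as a simple interval-overlap test. This is an illustration only, assuming peaks and blacklist regions are given as (chromosome, start, end) tuples with inclusive coordinates; the function names and example regions are hypothetical and this is not the Regulation pipeline code.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def filter_peaks(peaks, blacklist):
    """Keep only peaks that do not overlap any blacklist region on the
    same chromosome."""
    kept = []
    for chrom, start, end in peaks:
        hit = any(
            chrom == b_chrom and overlaps(start, end, b_start, b_end)
            for b_chrom, b_start, b_end in blacklist
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept

# Hypothetical peaks and one blacklist region on chromosome Y.
peaks = [("Y", 100, 200), ("Y", 500, 600), ("X", 100, 200)]
blacklist = [("Y", 150, 300)]
print(filter_peaks(peaks, blacklist))
```

If the Y entries are dropped from the blacklist, as happened in the affected releases, the first peak is no longer removed.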

BioMart release 69 bugs.

In Release: 69

1) Ferret (Mustela putorius furo) is missing Orthologs, possible Orthologs and Paralogs in the filter and attribute sections.
2) Ferret (Mustela putorius furo) and Platyfish (Xiphophorus maculatus) are missing the following id list limit filters in the gene section: ensembl_gene_id, ensembl_transcript_id, ensembl_protein_id and ensembl_exon_id.
3) The Homologs attributes section is not working when using a second dataset.
These issues will be fixed for release 70.

Ensembl-annotated lincRNA genes

In Release: 68

The mouse gene set for Ensembl release 68 is missing approximately 700 Ensembl-annotated lincRNA genes. These genes will be incorporated in the gene set as part of the standard Ensembl annotation of mouse for e70.

Mis-assignment of Canonical Transcripts in Mouse

In Release: 68

There are [700] mouse genes that have not been assigned the correct canonical transcript due to the CCDS transcripts not being prioritised over other transcripts. Side-effects include a reduced number of orthologs to other species. The issue will be fixed in Ensembl 70.

Missing stable ids for mouse estgene exons

In Release: 68

For the EST alignment based gene models, stable ids are missing on the exon level. This issue will be fixed in Ensembl 68.

Mis-assignment of Canonical Transcripts in Human

In Release: 68

The number of human genes not using the same canonical transcript as was declared in Ensembl 67 has risen by 5%. Side-effects include a reduced number of orthologs receiving annotations (display names and GO terms) from human genes. The issue will be fixed in Ensembl 69.

Ensembl Gene mart UTR start and end coordinate error has been fixed.

In Release: 67

There was a bug in the Exon.pm module that led to the miscalculation of the 5′ and 3′ UTR coordinates for the Ensembl Gene mart in release 67 (9th May). This issue has now been fixed in the API and the Ensembl Gene mart has been patched. The fixed database has been pushed to the live site, the public MySQL database and the FTP site (25th May 2012). The BioMart central portal (www.biomart.org) has been made aware of the issue and will update the version on their portal as soon as possible. If you are using the biomaRt package from BioConductor, please set your host to www.ensembl.org to get the most up-to-date version until the fix has been made live on the BioMart central portal.

BioMart SIFT and PolyPhen scores

In Release: 67

The SIFT and PolyPhen scores are not available in the filters and attributes section in the variation mart for this release.

As a workaround, you can get these scores in the variation attributes section of the Ensembl mart. This issue will be fixed for release 68.

Ensembl-annotated lincRNA genes

In Release: 66

The Human gene set for Ensembl release 66 is missing approximately 300 Ensembl-annotated lincRNA genes. These genes will be incorporated in the gene set as part of the standard Ensembl annotation of human for e67.

BioMart release 65 bug

In Release: 65

There is an issue with the retrieval of UniProt/TrEMBL Accession(s) and UniProt/Swissprot Accession(s) in the filters and attributes for Drosophila melanogaster in the Ensembl Gene mart. As a workaround, one can use the Ensembl Genomes’ metazoa mart (http://metazoa.ensembl.org/biomart/martview/) where a query against UniProt does not result in the same error. This issue will be fixed for release 66.

Regulation GFF dumps

In Releases: 63 & 64

Start and end loci in the RegulatoryFeature and AnnotatedFeature GFF dumps are truncated to the nearest megabase. This was due to the implementation of a Slice Iterator in the dump script, which erroneously used the local coordinates of the 1MB slices used to perform the dumps. These have now been corrected on the FTP site.
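
For readers unfamiliar with the distinction, the sketch below shows how a position local to a slice relates to its position on the whole sequence region. It assumes 1-based inclusive coordinates; the function name and the numbers are illustrative and are not taken from the dump script.

```python
def to_seq_region_position(slice_start, local_position):
    """Convert a 1-based position within a slice into a 1-based position
    on the sequence region, given where the slice starts."""
    return slice_start + local_position - 1

# Hypothetical feature at local positions 150-420 on a slice starting at 3,000,001.
slice_start = 3_000_001
print(to_seq_region_position(slice_start, 150), to_seq_region_position(slice_start, 420))
```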

BioMart “ID list limit” filter issues

In Release: 64

There is an issue with the “ID list limit” filters in the Ensembl Gene mart for RefSeq mRNA, RefSeq mRNA predicted, RefSeq ncRNA and RefSeq ncRNA predicted. If one inputs a RefSeq accession for one of these categories, it throws a “table does not exist” error.

As a workaround, users can still use the “Limit to genes” filter and download a list of all genes that have, for example, RefSeq mRNA external references and then filter for their NM_* accessions of interest. This bug will be rectified for release 65.

The PFAM IDs have also had versions added to the ID during the running of the protein annotation pipeline (e.g. PF07654.8). This makes it difficult to use the “ID list limit” filter in BioMart. This version will be removed for release 65.
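
Until the versions are removed, a versioned ID can be reduced to its unversioned form before comparing it with your own ID list. This is a minimal sketch; the function name is hypothetical.

```python
def strip_pfam_version(pfam_id):
    """Return the ID without a trailing '.<digits>' version, e.g.
    'PF07654.8' becomes 'PF07654'; unversioned IDs are returned unchanged."""
    base, _, version = pfam_id.partition(".")
    return base if version.isdigit() else pfam_id

print(strip_pfam_version("PF07654.8"))
```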

Missing HGNC symbols

In Release: 64

The Xref mapping system has failed to map HGNC symbols for 175 human genes, which had symbols in Ensembl 63, none of which are active CCDS entries. This will also affect genes in species that receive projected human display names.

Missing orthology data in Gorilla

In Release 64.

With the update of the gorilla assembly, several gorilla genes have been misplaced in the gene trees due to a problem while extracting their genomic sequences in our pipeline. This has affected about a thousand genes. As a result, the orthology predictions for these genes are missing or inaccurate. The missing orthology relationships are often found in the set of ‘possible orthologs’. However, we recommend using Ensembl 63 for gorilla orthologs.

Rat Codelink alignments and transcript annotations

Up to and including release 63.

During the array import and mapping process for the rat Codelink array, a fasta file was erroneously truncated. This did not affect the import of the array design (i.e. probes), hence passed our current health checks. However, it did impact on the genomic and transcript alignment steps, resulting in only 30% of the probes being aligned to the genome rather than ~90%. In turn this impacted on the transcript annotation step which assigned xrefs to only ~15% rather than ~50% of the Codelink probes. A new health check will be added to the array mapping pipeline to prevent this in future. Updated MySQL dumps are available here:

ftp://ftp.ensembl.org/pub/misc/codelink_fix_rattus_norvegicus_funcgen_63_34.tar.gz

Protein alignments in GeneTrees

In Release: 62

We have been using an experimental extension of M-Coffee, the exon-disaligner (AKA decaf module), in the last few releases. In short, we inform M-Coffee about the exon boundaries in order to reduce the amount of over-alignments spanning exon boundaries. The exon-disaligner module has been disaligning too much sequence in the alignments in e!62. As a result, dN/dS values and similarity stats are unreliable in the affected alignments.

Read more on the GeneTree pipeline

Human RegulatoryFeature Stable IDs

In Release: 62

The stable ID mapping procedure produced some erroneous results for the Human Regulatory Feature sets. Approximately 150k out of a total of ~445k ‘MultiCell’ Regulatory Features were erroneously assigned new stable IDs rather than being projected from the previous Regulatory Build (v61).

Transcript names for human, mouse and zebrafish

In Release: 62

Transcript names in human, mouse and zebrafish are suffixed with a number starting with either ‘0’ or ‘2’. If the number starts with ‘0’ then it is a merged or manually curated transcript from Havana/Vega. If the number starts with ‘2’ then it is an automatically annotated transcript from Ensembl. For release 62, some of the transcript numbers have been set incorrectly. To know whether a transcript is merged, from Ensembl or from Havana, see the “Prediction Method” line on the Transcript Summary page.
2257 merged or Havana human transcripts have the transcript number starting ‘2’.
1029 Ensembl human transcripts have the transcript number starting ‘0’.
980 merged mouse transcripts have the transcript number starting ‘2’.
19 Ensembl mouse transcripts have the transcript number starting ‘0’.
324 merged or Havana zebrafish transcripts have the transcript number starting ‘2’.
234 Ensembl zebrafish transcripts have the transcript number starting ‘0’.
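
The naming convention described above can be expressed as a small helper. The sketch assumes that the number is the part of the transcript name after the last hyphen (for example a name of the form GENE-001); that name format, the function name and the example names are assumptions for illustration. Because some numbers are set incorrectly in release 62, the “Prediction Method” line remains the reliable source for that release.

```python
def source_from_transcript_name(name):
    """Guess the annotation source from the first digit of the numeric
    suffix: '0' means merged or Havana, '2' means Ensembl."""
    suffix = name.rsplit("-", 1)[-1]
    if not suffix.isdigit():
        return "unknown"
    if suffix.startswith("0"):
        return "merged or Havana"
    if suffix.startswith("2"):
        return "Ensembl"
    return "unknown"

# Hypothetical transcript names.
for name in ("GENE1-001", "GENE1-201"):
    print(name, source_from_transcript_name(name))
```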

Gene set on human haplotypes

In Release: 62

The Human gene set for Ensembl release 62 is missing gene annotation from the Ensembl automatic pipeline on the haplotypes. All but two of these haplotypes still contain gene annotation as imported directly from Havana. We plan to generate Ensembl annotation for all haplotypes for e63.
The following haplotype regions have annotation from Havana only: HSCHR6_MHC_APD, HSCHR6_MHC_COX, HSCHR6_MHC_DBB, HSCHR6_MHC_MANN, HSCHR6_MHC_MCF, HSCHR6_MHC_QBL, HSCHR6_MHC_SSTO.
The following haplotype regions have no gene annotation: HSCHR17_1, HSCHR4_1.

Canonical transcripts and gene trees (Human and mouse)

In release: 61

A subset of genes in human and mouse have their canonical transcript set incorrectly. The canonical transcripts are used by the Comparative Genomics team to generate gene trees and so this bug has also caused some gene trees to be incorrect. For human, 3393 of 53515 genes (6.34%) have an incorrect canonical transcript. For mouse, 2072 of 36817 genes (5.63%) have an incorrect canonical transcript. This will be fixed for e62.

Missing SNP status (Human)

In release: 61

We are missing information about validation status for most human variations due to this data being unavailable from dbSNP at the time of import. The validation status for each rsId (exported from dbSNP on 2011-01-20) is available via FTP as a tab-separated file.

Consequence Types in Mart (all species)

In release: 58

The attribute “Consequence Type (Transcript Variation)” is missing from the Ensembl Mart due to a mart building bug. The correct transcript consequence for a SNP may be found by either using the Variation API or by using the Variation Mart.

Missing UniProtKB/Swiss-Prot secondary accessions in human (Human)

In release: 57

There are no UniProtKB/Swiss-Prot secondary accessions for human, due to a change in the way we obtain UniProt-Ensembl mappings. Previously these were stored as synonyms of the primary accessions and, although they were not visible on the website, they were searchable and available via BioMart. The UniProtKB/Swiss-Prot secondary accessions will be restored for Ensembl release 58. Users who need the secondary accessions are advised to use Ensembl release 56 until Ensembl 58 is released.

Variation flanking sequence (Human, Mouse, Zebrafish, Rat, Cow)

In releases: Up to and including 57

For a small number of variations, the flanking sequences displayed in the variation property tab may contain sequence permutations. This situation arises when the flanking sequence is a composite of a sequence which has been determined by an experimental assay and sequence extracted from e.g. a genomic database. The number of affected variations in Ensembl release 57 for respective species is shown in the table below.

Species      Affected variations
Human        11,237
Cow          363
Rat          11,536
Mouse        101
Zebrafish    6,156

Incorrect consequence type (Human, Mouse, Rat)

In releases: Up to and including 57

146,593 variants in human, 739 in mouse and 26,947 in rat have the wrong consequence type. They fall in regions with no transcript and should have the consequence type “INTERGENIC”. For example, rs7298705 is annotated as non-synonymous but should be intergenic.

SNP flanking sequence (Human)

In release: 56

It has come to our attention that some code operating on the variation database flanking_sequence table failed for 1,421,205 SNPs which originally mapped to the reverse strand. Although the website reports the SNPs as being on the forward strand, the displayed flanking sequences are from the reverse strand. Only the flanking sequence is affected; the genotypes and alleles are correct.
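
As a possible workaround, a displayed flank pair can be converted back to forward-strand orientation by reverse-complementing each flank and swapping the two. The sketch below assumes the displayed flanks are plain reverse-strand sequence read 5' to 3'. The function names are illustrative, and the code does not identify which SNPs are affected.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward_strand(five_prime_flank, three_prime_flank):
    """Convert a reverse-strand flank pair to forward-strand orientation.

    The 3' flank on the reverse strand becomes the 5' flank on the forward
    strand (and vice versa), so the two flanks swap places.
    """
    return (
        reverse_complement(three_prime_flank),
        reverse_complement(five_prime_flank),
    )
```

Alleles and genotypes should be left untouched, since only the flanking sequence is affected.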

Source name misspelling (Human)

In release: 56

Watson’s entry in the source table is misspelled as “ENSENBL:Watson” (it should read “ENSEMBL:Watson”).

rsIDs not merged (Human)

In release: 56

In Ensembl 56, rsIDs were not merged, leaving ~25,000 extra rsIDs in variation/variation_feature that should be in variation_synonym.

Catarrhini primates EPO alignments (Human, Chimp, Macaque, Orangutan, Gorilla)

In releases: 55 – 56

Due to a bug in Ortheus, all the internal gaps in these alignments are shifted by 1 position.

Tetraodon BLAST indices corrupted (Tetraodon)

In releases: 53 – 56

The Tetraodon BLAST index was corrupted: between release 53 and release 56, the Tetraodon genome was only partially indexed. The following chromosomes were absent from the BLAST database: 4, 6, 7, 8, 9, 11, 14, 15, 16, 18, 19, 20, 21, MT.

Eutherian mammals EPO alignments (multiple species)

In releases: 49 – 56

Due to a bug in Ortheus, all the internal gaps in these alignments are shifted by 1 position. This will have also affected the GERP constraint elements we derive from these alignments.

Eutherian mammals alignment (multiple species)

In release: 55

In Ensembl 55, the web interface incorrectly lists “10-way eutherian mammals EPO”, which is not actually present in the database. A 9-way eutherian alignment is available in the database and via API. You can also download the EMF files from our FTP server at ftp://ftp.ensembl.org/pub/release-55/emf/ensembl-compara/epo_9_eutherian/

Mouse Regulatory Features (Mouse)

In releases: 54 – 55

In releases 54 and 55, approximately 3.7% of RegulatoryFeatures were duplicates. These were present on the later parts of chromosomes 1 and 17. The duplicates were removed in release 56.

ENSEMBL:Sanger SNPs (Mouse)

In releases: Up to and including 53

Up to and including release 53, all SNP data with the sample name “ENSEMBL:Sanger” should be assigned to mouse strain “C3HeB/FeJ”, not “C3H/HeJ”.

LD calculations (all species)

In releases: Up to and including 53

Up to and including release 53, an error in the linkage disequilibrium calculation script caused the values of r² and D’ to be incorrect in some cases. The same error also led to about 5% of the set of tag SNPs being miscalculated.
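
For readers who want to check individual values independently, the two statistics can be recomputed from haplotype and allele frequencies using the standard textbook definitions. The sketch below is not the Ensembl script: the function name and its inputs are illustrative, and it assumes two biallelic loci with known haplotype frequency.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach given the allele frequencies; depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max  # reported as an absolute value in [0, 1]
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

With p_ab = 0.5 and both allele frequencies at 0.5, the two loci are in complete disequilibrium and both D’ and r² come out as 1. With p_ab = 0.25 they are independent and both are 0.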

Probeset Annotations (Human, Mouse)

In releases: 43 – 49

Between releases 43 and 49, the probeset transcript annotation method contained a bug that in some instances assigned probes to transcripts on the wrong strand. The effect on the final transcript annotations varied by species, with approximately 10% of annotations affected in human and mouse.

Errors in human sequence near haplotypic regions

In releases: 38 – 46

We found a problem in the human genome sequence in release 46: the region Chr5:70946027-71169807 contains only Ns. This appears to be a mistake in the mapper that was used to position the nearby haplotype (c5_H2, at positions 68965368-70760237). These Ns are not a result of repeat masking. Similar errors may be present in previous releases of this assembly, but the sequence is correct from release 47 onwards.

We therefore recommend downloading NCBI36 sequence data from release 54, the last Ensembl release with this assembly.