Both Ensembl release 97 and Ensembl Genomes release 44 are scheduled for mid- to late June.
Included are a number of new livestock, fish, metazoan, plant, and protist genomes and genebuilds, as well as updates to the mouse and human GENCODE annotation and the human regulatory build.
Read on to see further details of the new data you can look forward to.
GENCODE updates and lncRNA biotype changes
The human and mouse GENCODE gene sets will be updated to versions 31 and M22, respectively.
This release will bring a significant update to long non-coding RNAs (lncRNAs): several thousand new transcripts will be added by TAGENE, a new pipeline created by the GENCODE team. On a related note, there are also significant changes to the biotype categories of lncRNA transcripts in the GENCODE gene sets for human and mouse. There are currently nine biotypes classified under the header of ‘lncRNA’:
Terms 1-8 will be retired from release 97 onwards, and any transcripts that previously had these biotypes will be referred to simply as lncRNA. The exception is retained_intron: transcripts with this biotype will remain unchanged, and the term will continue to be assigned in the future. Terms 1-8 will not disappear entirely, however; they will be stored as ‘legacy’ terms in the download files.
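The biotype change described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the GENCODE or Ensembl implementation. The function name, the `legacy_biotype` field, and the placeholder terms `legacy_term_a` and `legacy_term_b` are all hypothetical, and the real set of retired terms must be supplied by the caller.

```python
from typing import Dict, Optional, Set

MERGED_BIOTYPE = "lncRNA"             # single biotype used from release 97 onwards
UNCHANGED_BIOTYPE = "retained_intron"  # the one term that is kept as-is


def collapse_biotype(biotype: str, legacy_terms: Set[str]) -> Dict[str, Optional[str]]:
    """Map an old transcript biotype to its post-release-97 form.

    `legacy_terms` is the caller-supplied set of retired lncRNA terms.
    The `legacy_biotype` key is a hypothetical field name; the post only
    says that retired terms are kept as 'legacy' terms in download files.
    """
    # retained_intron is never merged, even if passed in by mistake
    if biotype == UNCHANGED_BIOTYPE:
        return {"biotype": biotype, "legacy_biotype": None}
    # Retired terms become lncRNA, with the old term kept alongside
    if biotype in legacy_terms:
        return {"biotype": MERGED_BIOTYPE, "legacy_biotype": biotype}
    # Everything else (e.g. protein_coding) is untouched
    return {"biotype": biotype, "legacy_biotype": None}


# Placeholder terms only; substitute the real retired biotypes.
retired = {"legacy_term_a", "legacy_term_b"}
print(collapse_biotype("legacy_term_a", retired))
print(collapse_biotype("retained_intron", retired))
```

If you filter transcripts by one of the retired biotypes in your own scripts, a lookup like this is one way to keep old queries working against the new files, assuming the legacy term is available to you in whichever download format you use.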
Updates to the human Regulatory Build: New data from Roadmap Epigenomics and TarBase
We are adding data from Roadmap Epigenomics for 13 new cell/tissue types. We have also curated our existing cell/tissue types, condensing some into a single record and treating them as replicates of one cell/tissue type. Including the 13 new cell/tissue types, we now have a total of 118 epigenomes. We are also updating our miRNA target features for human and mouse with this release; these are being imported from TarBase v8.0.
With these new data we will re-run our Regulatory Build pipeline to refine and improve our regulatory feature predictions.
New species and strains
- New pig cross-breed (Sus scrofa USMARC)
- New cattle cross-breed (Bos indicus X Bos taurus, maternal haplotype)
- New cattle cross-breed (Bos indicus X Bos taurus, paternal haplotype)
- Electric eel (Electrophorus electricus)
- Elephant shark (Callorhinchus milii)
- Barramundi perch (Lates calcarifer)
- Huchen (Hucho hucho)
- Bare-nosed wombat (Vombatus ursinus)
- Scrub typhus mite (Leptotrombidium deliense)
- Velvet mite (Dinothrombium tinctorium)
- Lancelet (Branchiostoma lanceolatum)
- Common liverwort (Marchantia polymorpha)
- 48 new genomes from the ENA across the following groups: Alveolata, Amoebozoa, Choanoflagellida, Cryptophyta, Euglenozoa, Fornicata, Heterolobosea, Parabasalia, Rhizaria, Stramenopiles.
Updated assemblies and annotations
- Three nematode species will be updated:
  - Caenorhabditis elegans, gene annotation update
  - Caenorhabditis briggsae, gene annotation update
  - Pristionchus pacificus, genome assembly and gene annotation update
- Cocoa tree (Theobroma cacao), genome assembly update
- The Variant Effect Predictor (VEP) will report if a human GRCh38 transcript is the MANE Select.
- Transposable element genes added for yeast (Saccharomyces cerevisiae).
- Updated metadata for wheat (Triticum aestivum) EMS-induced mutations.
- KASP marker information for the TILLING population will be displayed on variation pages for wheat.
- The Pan-taxonomic Compara set of gene trees has been updated and two new plant species added: Marchantia polymorpha and Brachypodium distachyon. Three species were removed: Synechocystis sp. 6803, Rhizobium leguminosarum bv. viciae 3841 and Chondrus crispus.
Please note that these are intentions and are not guaranteed to make it into the releases.