We are pleased to announce the release of Ensembl 115, and the corresponding release of Ensembl Genomes 62. In this release, around 121,000 new protein-coding transcripts have been added to the GRCh38 human reference gene set. Two new breeds of cattle are now available (UOA_Tuli_1 and UOA_Wagyu_1) and the sheep reference has been updated to ARS-UI_Ramb_v3.0. Seven new plant species have been added: four oats, two garden peas, and one lablab bean. We have also added two new export modes for Newick trees, which help to avoid stable ID clashes.
Regulation
Retirement of the least-used microarray species in 115
We have retired the following funcgen databases in 115:
Plants:
- Aegilops tauschii
- Arabidopsis halleri
- Arabidopsis thaliana
- Brassica napus
- Brassica oleracea
- Brassica rapa
- Glycine max
- Hordeum vulgare
- Nicotiana attenuata
- Oryza barthii
- Oryza glaberrima
- Oryza glumipatula
- Oryza indica
- Oryza longistaminata
- Oryza meridionalis
- Oryza nivara
- Oryza punctata
- Oryza rufipogon
- Phaseolus vulgaris
- Solanum lycopersicum
- Triticum aestivum
- Triticum dicoccoides
- Vigna angularis
- Vigna radiata
- Zea mays
Retirement of the Manhattan plot of eQTLs
The eQTL Catalogue has retired the API which underlies Ensembl’s Manhattan plot of eQTLs within the Gene view (accessed via the Regulation link). We have therefore had to remove this view from Ensembl. The eQTL Catalogue provides multiple options for accessing this data, but if this view in Ensembl was useful to you, please let us know more about your use case.
Human
Around 121,000 new protein-coding transcripts have been added to the GRCh38 human reference gene set based on long-read RNA-seq data using the TAGENE pipeline.
New Assemblies and/or Annotation
Livestock and Companion Animals:
We have added 2 new breeds of cattle:
- UOA_Tuli_1 (GCA_040285425.1)
- UOA_Wagyu_1 (GCA_040286185.1)
Update to sheep reference assembly/annotation:
The sheep reference ARS-UI_Ramb_v2.0 (GCA_016772045.1) has been updated to ARS-UI_Ramb_v3.0 (GCA_016772045.2)
Plants
Additional Rice 3K variation data has been added for Oryza sativa (Rice; GCA_001433935.1). The 3000 Rice Genome Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries.
Triticum aestivum Next Generation (TaNG) variation data has been added for Triticum aestivum (Wheat; GCA_900519105.1). The TaNG array was derived from 204 elite wheat lines and 111 wheat landraces from the Watkins ‘Core Collection’.
New Genomes
New plant species for 115
- Avena atlantica (Oat; GCA_910589765.1)
- Avena eriantha (Oat; GCA_910589775.1)
- Avena insularis (Oat; GCA_910574615.1)
- Avena longiglumis (Oat; GCA_910589755.1)
- Lablab purpureus Highworth (Lablab bean; GCA_030347555.1)
- Pisum sativum JI2822 (Garden pea; GCA_964186695.1)
- Pisum sativum Zhongwan6 (Garden pea; GCA_024323335.2)
New Non-Core plant species data for 115
- Oryza sativa 3k variation data (Rice; GCA_001433935.1)
- TaNG variation data (Wheat; GCA_900519105.1)
Compara
- We have deprecated Compara Perl API methods related to selective pressure statistics (e.g. dN/dS). Deprecated methods have not been scheduled for deletion.
- We have introduced two Newick export modes, which may be helpful when accessing gene trees with clashing stable IDs: “Genome and gene ID”, in which leaf names are composed of the genome name and gene stable ID of a gene, and “Genome and product ID”, where each leaf is the genome name and protein/ncRNA product stable ID.
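To illustrate why the two new export modes help, here is a minimal Python sketch of how prefixing each leaf with its genome name keeps leaves distinct when two genomes share a stable ID. It is not Ensembl's implementation: the `Leaf` class, the mode names, the `|` separator, the flat star-shaped tree and all IDs are hypothetical placeholders chosen for the example, and the separator used in real exports may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    genome: str      # genome name (placeholder values below)
    gene_id: str     # gene stable ID
    product_id: str  # protein/ncRNA product stable ID


SEP = "|"  # hypothetical separator, not taken from the Ensembl export


def leaf_label(leaf: Leaf, mode: str) -> str:
    """Build a leaf name for one of three illustrative export modes."""
    if mode == "gene_id":
        return leaf.gene_id
    if mode == "genome_and_gene_id":
        return f"{leaf.genome}{SEP}{leaf.gene_id}"
    if mode == "genome_and_product_id":
        return f"{leaf.genome}{SEP}{leaf.product_id}"
    raise ValueError(f"unknown mode: {mode}")


def to_newick(leaves: list[Leaf], mode: str) -> str:
    """Serialise the leaves as a flat (star) Newick tree, for illustration only."""
    return "(" + ",".join(leaf_label(leaf, mode) for leaf in leaves) + ");"


# Two genomes that happen to reuse the same stable IDs.
leaves = [
    Leaf("cultivarA", "GENE0001", "PROT0001"),
    Leaf("cultivarB", "GENE0001", "PROT0001"),
]

print(to_newick(leaves, "gene_id"))             # leaf names clash
print(to_newick(leaves, "genome_and_gene_id"))  # leaf names are unique
```

With the plain gene ID mode both leaves receive the same name, which many tree tools reject or silently merge. With either genome-prefixed mode each leaf name is unique.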
Vertebrates:
- With the update to the sheep reference assembly/annotation, we have updated the Pig breeds gene-tree, pig-breed LastZ alignments and mammals EPO
- Murinae EPO has been updated to add Mus musculus molossinus
Plants:
- Three new plant species have been added to the default Protein trees – Lablab purpureus Highworth (Lablab bean; GCA_030347555.1), Pisum sativum JI2822 (Garden pea; GCA_964186695.1) and Pisum sativum Zhongwan6 (Garden pea; GCA_024323335.2)
- Protein trees were computed for the Hordeum vulgare pangenome, including 75 barley cultivars and relatives. The barley, rye and wheat reference genomes are also present in the Wheat cultivar protein trees. Barley cultivar gene trees may be accessed through barley genes, while wheat cultivar gene trees may be accessed via wheat and rye genes.
Metazoa:
- Insects Protein trees were updated
- We have updated the 46 Pangenome Drosophila Cactus and Pangenome Drosophila protein trees with 6 new genomes
Metazoa:
New assembly on existing species (assembly and annotation)
- Amyelois transitella (Moths, GCA_032362555.1)
- Bactrocera dorsalis (Oriental fruit fly, GCA_023373825.1)
- Bicyclus anynana (Squinting bush brown, GCA_947172395.1)
- Branchiostoma lanceolatum (Amphioxus, GCA_035083965.1)
- Caenorhabditis remanei (Nematode, GCA_010183535.1)
- Danaus plexippus (Monarch butterfly, GCA_018135715.1)
- Dendroctonus ponderosae (Mountain pine beetle, GCA_020466585.2)
- Drosophila bipectinata (Pomace flies, GCA_030179905.2)
- Drosophila elegans (Pomace flies, GCA_018152505.1)
- Drosophila kikkawai (Pomace flies, GCA_030179895.2)
- Drosophila suzukii (Pomace flies, GCA_037355615.1)
- Drosophila takahashii (Pomace flies, GCA_030179915.2)
- Drosophila virilis (Pomace flies, GCA_030788295.1)
- Helicoverpa armigera (Cotton bollworm, GCA_030705265.1)
- Hydra vulgaris (Swiftwater hydra, GCA_038396675.1)
- Linepithema humile (Argentine ant, GCA_040581485.1)
- Lytechinus pictus (Painted urchin, GCA_037042905.1)
- Mercenaria mercenaria (Northern quahog, GCA_021730395.1)
- Musca domestica (House fly, GCA_030504385.2)
- Necator americanus (New World hookworm, GCA_031761385.1)
- Nematostella vectensis (Starlet sea anemone, GCA_932526225.1)
- Ostrea edulis (Mud oyster, GCA_947568905.1)
- Sarcoptes scabiei (Itch mite, GCA_020844145.1)
- Stomoxys calcitrans (Stable fly, GCA_963082655.1)
- Tribolium castaneum (Red flour beetle, GCA_031307605.1)
Updated assemblies
- Eufriesea mexicana (Mexican orchid bee, GCA_001483705.1 -> GCA_001483705.2)
- Myopa tessellatipennis (Flies, GCA_943737955.1 -> GCA_943737955.2)
Updated annotations
- Stylophora pistillata (GCF_002571385.2)
Entirely new species (assembly and annotation)
- Amblyomma americanum (Lone Star tick, GCA_030143305.2)
- Bactrocera oleae (Olive fruit fly, GCA_001188975.4)
- Bradysia coprophila (Black fungus gnats, GCA_014529535.1)
- Contarinia nasturtii (Swede midge, GCA_009176525.2)
- Drosophila montana (Pomace flies, GCA_035044405.1)
- Drosophila nasuta (Pomace flies, GCA_023558535.2)
- Drosophila novamexicana (Pomace flies, GCA_003285875.3)
- Drosophila serrata (Pomace flies, GCA_002093755.2)
- Drosophila sulfurigaster albostrigata (Flies, GCA_023558435.2)
- Drosophila tropicalis (Pomace flies, GCA_018151085.1)
- Lucilia sericata (Common green bottle fly, GCA_015586225.1)
- Ornithodoros turicata (Softbacked ticks, GCA_037126465.1)
- Photinus pyralis (Common eastern firefly, GCA_008802855.1)
- Schmidtea mediterranea (Freshwater planarian, GCA_045838255.1)
- Schmidtea mediterranea (Freshwater planarian, GCA_045838265.1)
- Schmidtea nova (Freshwater planarian, GCA_044892505.1)
- Schmidtea polychroa (Freshwater planarian, GCA_044892525.1)
- Steinernema hermaphroditum (Nematode, GCA_030435675.2)
- Tenebrio molitor (Darkling ground beetles, GCA_907166875.3)
- Vespa mandarinia (Asian giant hornet, GCA_014083535.1)
Compara reference updates
- Amyelois transitella – Updated to GCA_032362555.1, replaces GCA_001186105.1
- Bactrocera dorsalis – Updated to GCA_023373825.1, replaces GCA_000789215.2
- Bicyclus anynana – Updated to GCA_947172395.1, replaces GCA_900239965.1
- Dendroctonus ponderosae – Updated to GCA_020466585.2, replaces GCA_000355655.1
- Drosophila bipectinata – Updated to GCA_030179905.2, replaces GCA_000236285.2
- Drosophila elegans – Updated to GCA_018152505.1, replaces GCA_000224195.2
- Drosophila kikkawai – Updated to GCA_030179895.2, replaces GCA_018152535.1
- Drosophila suzukii – Updated to GCA_037355615.1, replaces GCA_013340165.1
- Drosophila takahashii – Updated to GCA_030179915.2, replaces GCA_018152695.1
- Drosophila virilis – Updated to GCA_030788295.1, replaces GCA_003285735.2
- Helicoverpa armigera – Updated to GCA_030705265.1, replaces GCA_023701775.1
- Linepithema humile – Updated to GCA_040581485.1, replaces GCA_000217595.1
- Musca domestica – Updated to GCA_030504385.2, replaces GCA_000371365.1
- Stomoxys calcitrans – Updated to GCA_963082655.1, replaces GCA_001015335.1
- Tribolium castaneum – Updated to GCA_031307605.1, replaces GCA_000002335.3
Variation updated
- Anopheles gambiae (GCA_000005575.1) – Fixes known bug reported in Release 61
The following species cores are outdated and have been dropped from Ensembl release 115 (Ensembl Genomes release 62):
Dropped (these were not included in Compara analysis):
- Galleria mellonella (GCA_003640425.2)
- Hyalomma asiaticum (GCA_013339685.1)
- Ixodes persulcatus (GCA_013358835.1)
Dropped, including from Compara analysis:
- Bombyx mori (GCA_014905235.2)
- Crassostrea gigas (GCA_902806645.1)
- Culex quinquefasciatus (GCA_000209185.1)
- Diabrotica virgifera (GCA_003013835.2)
- Drosophila ananassae (GCA_000005115.1)
- Drosophila erecta (Drosophila erecta)
- Drosophila grimshawi (GCA_000005155.1)
- Drosophila mojavensis (GCA_000005175.1)
- Drosophila persimilis (GCA_000005195.1)
- Drosophila sechellia (GCA_000005215.1)
- Drosophila simulans (GCA_000754195.3)
- Drosophila willistoni (GCA_000005925.1)
- Ixodes scapularis (GCA_000208615.1) – core and variation
- Biomphalaria glabrata (GCA_000457365.1) – core and variation
- Amyelois transitella (GCA_001186105.1) (clade – insects)
- Bactrocera dorsalis (GCA_000789215.2) (clade – insects)
- Bicyclus anynana (GCA_900239965.1) (clade – insects)
- Dendroctonus ponderosae (GCA_000355655.1) (clade – insects)
- Drosophila bipectinata (GCA_000236285.2) (Drosophila pangenome)
- Drosophila elegans (GCA_000224195.2) (Drosophila pangenome)
- Drosophila kikkawai (GCA_018152535.1) (Drosophila pangenome)
- Drosophila suzukii (GCA_013340165.1) (Drosophila pangenome)
- Drosophila takahashii (GCA_018152695.1) (Drosophila pangenome)
- Drosophila virilis (Drosophila pangenome)
- Drosophila virilis (GCA_003285735.2) (Drosophila pangenome)
- Helicoverpa armigera (GCA_023701775.1) (clade – insects)
- Linepithema humile (GCA_000217595.1) (clade – insects)
- Musca domestica (GCA_000371365.1) (clade – insects)
- Myopa tessellatipennis (GCA_943737955.1) (Drosophila pangenome)
- Stomoxys calcitrans (GCA_001015335.1) (clade – insects)
- Tribolium castaneum (GCA_000002335.3) (clade – insects)
The following databases are outdated and have been dropped in Ensembl release 115 (Ensembl Genomes release 62):
Other features databases dropped in Ensembl release 115 (Ensembl Genomes release 62):
- Culex quinquefasciatus other_features (GCA_000209185.1)
Variation databases for the following species have been dropped in Ensembl release 115 (Ensembl Genomes release 62):
- Culex quinquefasciatus variation (GCA_000209185.1)
- Ixodes scapularis variation (GCA_000208615.1)
- Biomphalaria glabrata variation (GCA_000457365.1)
Variation:
Updating ClinVar Import and Ensembl Variant Effect Predictor (VEP) handling
ClinVar has updated the way clinical significance is represented. Now, three types of variant classifications are available. The current clinical significance reported by Ensembl VEP remains the same, with the addition of one new type of data: somatic classifications from ClinVar.
As a consequence, ClinVar has updated its data schema. To accommodate the new data, we have adapted the Ensembl Variation import script.
Specific changes are:
Variation API updates
- We have added a new method to return the new somatic classifications from ClinVar.
Ensembl VEP updates
- We have added a new Ensembl VEP option to return the new ClinVar somatic classifications: --clinvar_somatic_classification
Ensembl VEP pipeline update
- We have updated the VEP dump pipeline to include the new somatic classifications.
Web update
- The variation phenotype page includes a new table to display the new somatic classifications.
GENCODE Promoter Support
Ensembl VEP now supports GENCODE promoters through the --custom option with gff_type=gencode_promoter on the command line, or by selecting the “Report overlap with GENCODE Promoters” option in web Ensembl VEP.
Supporting Structural Variant Allele Frequencies and Clinical Significance
Web Ensembl VEP has two new options to enable reporting of structural variant allele frequency (from gnomAD) and clinical significance (from ClinVar). Both options offer a range of selectable overlap percentages, up to requiring a perfect match.
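As a rough illustration of what an overlap percentage threshold means for structural variants, the Python sketch below computes the overlap between two intervals and applies a minimum threshold. It assumes 1-based inclusive coordinates and a reciprocal overlap (the smaller of the two one-way percentages). This release note does not state how Ensembl VEP defines overlap, so treat both choices, along with the function names and coordinates, as assumptions made for the example.

```python
Interval = tuple[int, int]  # (start, end), 1-based and inclusive (assumed)


def overlap_percent(a: Interval, b: Interval) -> float:
    """Percentage of interval a that is covered by interval b."""
    a_start, a_end = a
    b_start, b_end = b
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (a_end - a_start + 1)


def reciprocal_overlap_percent(a: Interval, b: Interval) -> float:
    """Smaller of the two one-way overlaps (assumed definition)."""
    return min(overlap_percent(a, b), overlap_percent(b, a))


def passes_threshold(query: Interval, reference: Interval, min_percent: float) -> bool:
    """True if the reference variant overlaps the query by at least min_percent."""
    return reciprocal_overlap_percent(query, reference) >= min_percent


query_sv = (1000, 1999)      # hypothetical input structural variant, 1000 bp
reference_sv = (1500, 2499)  # hypothetical reference variant, 1000 bp

print(reciprocal_overlap_percent(query_sv, reference_sv))
print(passes_threshold(query_sv, reference_sv, 50.0))
print(passes_threshold(query_sv, reference_sv, 80.0))
print(passes_threshold(query_sv, query_sv, 100.0))
```

Under these assumptions, a threshold of 100 corresponds to a perfect match: only a reference variant with identical coordinates passes.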
New Ensembl VEP Plugin – available for the command-line interface
MechPredict – This is a plugin for the Ensembl Variant Effect Predictor (VEP) that annotates missense variants with predicted dominant-negative (DN), gain-of-function (GOF), or loss-of-function (LOF) mechanisms derived from a Support Vector Classification (SVC) model (Badonyi et al., 2024).
Automation
New FTP Paths available
New FTP paths are available for data access:
Other updates and changes
- The Ensembl 99 (Jan 2020) and the Ensembl Genomes 45 (Sep 2019) archives are five years old and were retired with the release of Ensembl 115 and Ensembl Genomes 62.
- The Ensembl Virtual Machine is no longer available due to low demand.
