Future Plans

The following updates are planned for upcoming releases of Ensembl.

Please note that we have no fixed timeline for most of these items.

New datasets and scaling-up for large data expansion

  • Expanding the Ensembl Rapid Release website, which allows faster annotation and deployment of new species on a two-week release cycle:
    • Adding REST server
    • Adding homologue data and possibly pairwise alignments
    • Adding Ensembl VEP
  • Expanding the breadth of species with annotation and comparative analyses to cover the Vertebrate Genomes Project (VGP) and Darwin Tree of Life (DToL):
    • Scaling-up the gene tree pipeline
    • Implementing Cactus as a scalable multiple genome aligner
    • Annotating non-vertebrate eukaryotes as part of DToL
  • Expanding the depth of our annotation of farmed animals and other species with high socio-economic value using transcriptomic data, Nanopore, IsoSeq, etc.
  • Defining community-based comparative species sets, e.g. farm animals, crops, VGP
  • Expanding multiple assemblies and comparative analysis within single species: breeds, strains, cultivars, ecotypes and haplotypes
  • Extending the regulatory build to species in the Functional Annotation of Animal Genomes (FAANG) Consortium
  • Capturing gene-gene interactions between microorganisms and host species in microbiomes and disease and presenting them in Ensembl, starting with gene-gene interactions from PHI-base
  • Enabling EVA-driven variation viewing on microbial species, starting with Plasmodium falciparum, Zymoseptoria tritici, Botrytis cinerea, S. cerevisiae and S. pombe
  • Updating crop reference genomes, their cultivars and annotation, focusing on wheat (IWGSCv2.0), rice, maize, sorghum and grapevine
  • Adding a new library of repeated elements in plant genomes

New Ensembl website

  • First public release, the MAP (minimal acceptable product):
    • Automated species processing to enable the first seven species
    • Multiple transcripts on genome browser
    • Displays for summary and functional information of genes and transcripts
    • Search
    • Help documentation
    • Landing pages for species
  • Second public release:
    • BLAST and potentially Ensembl VEP
    • Increasing number of species
    • Extended documentation
    • Comparative data including homologues and alignments
  • Final public release:
    • Custom download
    • Variation and regulation views
    • Support for all species

Ensembl-HAVANA manual gene annotation for human and mouse

  • Increasing depth of manual annotation based on long-read transcriptomic data (PacBio, ONT)
  • Introducing a refined version of Ensembl/GENCODE basic supported by transcriptomic, proteomic, conservation and functional data
    • Supporting filtering of the larger transcript set
  • Refinement of protein-coding gene annotation and coordination with other reference databases to ensure improvements are reflected broadly:
    • RefSeq (CCDS, MANE)
    • UniProt (GIFTS)
    • MGI – for mouse
  • Introducing expert manual annotation of regulatory features, focussed on Enhancer-Promoter connections
  • Carrying out a complete review of all computational-only transcript models in the Ensembl/GENCODE geneset
  • Simplifying biotypes and capturing additional information about genes and transcripts in attributes

Improved infrastructure, applications and tools

  • Improving robustness and consistency of the cross-reference pipeline, which adds associations between external resources and Ensembl identifiers
  • Track Hub Registry re-implementation for improved maintainability
  • Extending functionality of Transcript Archive (Tark)
  • Extending functionality of Genome Integration with FuncTion and Sequence (GIFTS), a common framework for UniProtKB/Ensembl mapping and a public database of securely mapped sets of genes/proteins
  • A new REST API built to support our new Ensembl infrastructure
  • A new GraphQL service to provide an interface to Ensembl data and customisable API interaction
  • Upgrading Ensembl VEP to provide enhanced functionality for variant interpretation: improvements to core functionality and integration of relevant additional key datasets and further algorithms for the prediction of variant deleteriousness
  • FTP data explorer and support for GA4GH’s Data Repository Service (DRS) to access these data