Future Plans

The following updates are planned for upcoming releases of Ensembl.

Please note that we have no fixed timeline for most of these items.

New datasets and scaling-up for large data expansion

  • Expanding the Ensembl Rapid Release website, which allows faster annotation and deployment of new species on a two-week release cycle:
    • Adding REST server
    • Adding homologue data and possibly pairwise alignments
    • Adding Ensembl VEP
  • Expanding the breadth of species with annotation and comparative analyses to include the Vertebrate Genomes Project (VGP) and Darwin Tree of Life (DToL):
    • Scaling-up the gene tree pipeline
    • Implementing Cactus as a scalable multiple genome aligner
    • Annotating non-vertebrate eukaryotes as part of DToL
  • Expanding the depth of our annotation of farmed animals and other species with high socio-economic value using transcriptomic data (Nanopore, IsoSeq, etc.)
  • Defining community-based comparative species sets, e.g. farm animals, crops, VGP
  • Expanding multiple assemblies and comparative analysis within single species: breeds, strains, cultivars, ecotypes and haplotypes
  • Extending the regulatory build to species in the Functional Annotation of Animal Genomes (FAANG) Consortium
  • Capturing gene-gene interactions between microorganisms and host species in microbiomes and disease and presenting them in Ensembl, starting with gene-gene interactions from PHI-base
  • Enabling EVA-driven variation viewing for microbial species, starting with Plasmodium falciparum, Zymoseptoria tritici, Botrytis cinerea, S. cerevisiae and S. pombe
  • Updating metazoan species in collaboration with VEuPathDB (Eukaryotic Pathogen, Host and Vector Genomics Resource) and incorporating reference assemblies and annotation generated by the Corbel, Infravec2 and African Cassava Whitefly projects
  • Updating crop reference genomes, their cultivars and annotation, focusing on wheat (IWGSCv2.0), rice, maize, sorghum and grapevine
  • New library of repeated elements in plant genomes

New Ensembl website

  • First public release MAP (minimal acceptable product):
    • Automated species processing to enable first seven species
    • Multiple transcripts on genome browser
    • Displays for summary and functional information of genes and transcripts
    • Search
    • Help documentation
    • Landing pages for species
  • Second public release:
    • BLAST and potentially Ensembl VEP
    • Increasing number of species
    • Extended documentation
    • Comparative data including homologues and alignments
  • Final public release:
    • Custom download
    • Variation and regulation views
    • Support for all species

Ensembl-HAVANA manual gene annotation for human and mouse

  • Increasing depth of manual annotation based on long-read transcriptomic data (PacBio, ONT)
  • Introducing a refined version of Ensembl/GENCODE basic supported by transcriptomic, proteomic, conservation and functional data
    • Supporting filtering of the larger transcript set
  • Refinement of protein-coding gene annotation and coordination with other reference databases to ensure improvements are reflected broadly:
    • RefSeq (CCDS, MANE)
    • UniProt (GIFTS)
    • MGI – for mouse
  • Introducing expert manual annotation of regulatory features, focussed on enhancer-promoter connections
  • Complete review of all computational-only transcript models in the Ensembl/GENCODE geneset
  • Simplifying biotypes and capturing additional information about genes and transcripts in attributes
  • Lifting mouse annotation to GRCm39 and adding new annotation on novel and updated regions

Improved infrastructure, applications and tools

  • Improving robustness and consistency of the cross references pipeline, which adds associations between external resources and Ensembl identifiers
  • Track Hub Registry re-implementation for improved maintainability
  • Extending functionality of Transcript Archive (Tark)
  • Extending functionality of Genome Integration with FuncTion and Sequence (GIFTS), a common framework for UniProtKB/Ensembl mapping and a public database of securely mapped sets of genes/proteins
  • A new REST API built to support our new Ensembl infrastructure
  • A new GraphQL service to provide an interface to Ensembl data and customisable API interaction
  • Upgrading Ensembl VEP to provide enhanced functionality for variant interpretation: improvements to core functionality, integration of additional key datasets, and further algorithms for predicting variant deleteriousness
  • FTP data explorer and support for GA4GH’s Data Repository Service (DRS) to access these data