The following updates are planned for upcoming releases of Ensembl.
Please note that we have no fixed timeline for most of these items.
New datasets and scaling-up for large data expansion
- Expanding the Ensembl Rapid Release website, allowing faster annotation and deployment of new species on a two-week release cycle:
  - Adding REST server
  - Adding homologue data and possibly pairwise alignments
  - Adding Ensembl VEP
- Expanding the breadth of species with annotation and comparative analyses to Vertebrate Genomes Project (VGP) and Darwin Tree of Life (DToL):
  - Scaling-up the gene tree pipeline
  - Implementing Cactus as a scalable multiple genome aligner
  - Annotating non-vertebrate eukaryotes as part of DToL
- Expanding the depth of our annotation of farmed animals and other species with high socio-economic value using transcriptomic data, Nanopore, IsoSeq, etc.
- Defining community-based comparative species sets, e.g. farm animals, crops, VGP
- Expanding multiple assemblies and comparative analysis within single species: breeds, strains, cultivars, ecotypes and haplotypes
- Extending the regulatory build to species in the Functional Annotation of Animal Genomes (FAANG) Consortium
- Capturing gene-gene interactions between microorganisms and host species in microbiomes and disease and presenting them in Ensembl, starting with gene-gene interactions from PHI-base
- Enabling EVA-driven variation viewing on microbial species, starting with Plasmodium falciparum, Zymoseptoria tritici, Botrytis cinerea, S. cerevisiae and S. pombe
- Updating crop reference genomes, their cultivars and annotation, focusing on wheat (IWGSCv2.0), rice, maize, sorghum and grapevine
- New library of repeated elements in plant genomes
New Ensembl website
- First public release MAP (minimal acceptable product):
  - Automated species processing to enable first seven species
  - Multiple transcripts on genome browser
  - Displays for summary and functional information of genes and transcripts
  - Search
  - Help documentation
  - Landing pages for species
- Second public release:
  - BLAST and potentially Ensembl VEP
  - Increasing number of species
  - Extended documentation
  - Comparative data including homologues and alignments
- Final public release:
  - Custom download
  - Variation and regulation views
  - Support for all species
Ensembl-HAVANA manual gene annotation for human and mouse
- Increasing depth of manual annotation based on long-read transcriptomic data (PacBio, ONT)
- Introducing a refined version of Ensembl/GENCODE basic supported by transcriptomic, proteomic, conservation and functional data
- Supporting filtering of a larger transcript set
- Refinement of protein-coding gene annotation and coordination with other reference databases to ensure improvements are reflected broadly:
  - RefSeq (CCDS, MANE)
  - UniProt (GIFTS)
  - MGI (for mouse)
- Introducing expert manual annotation of regulatory features, focussed on enhancer-promoter connections
- Complete review of all computational-only transcript models in the Ensembl/GENCODE gene set
- Simplifying biotypes and capturing additional information about genes and transcripts in attributes
Improved infrastructure, applications and tools
- Improving robustness and consistency of the cross references pipeline, which adds associations between external resources and Ensembl identifiers
- Track Hub Registry re-implementation for improved maintainability
- Extending functionality of Transcript Archive (Tark)
- Extending functionality of Genome Integration with FuncTion and Sequence (GIFTS), a common framework for UniProtKB/Ensembl mapping and a public database of securely mapped sets of genes/proteins
- A new REST API built to support our new Ensembl infrastructure
- A new GraphQL service to provide an interface to Ensembl data and customisable API interaction
- Upgrading Ensembl VEP to provide enhanced functionality for variant interpretation: improvements to core functionality and integration of relevant additional key datasets and further algorithms for the prediction of variant deleteriousness
- FTP data explorer and support for GA4GH's Data Repository Service (DRS) to access these data
