Future Plans

The following updates are planned for upcoming releases of Ensembl.

Please note that we have no fixed timeline for most of these items.

New human assembly: GRCh38

The Genome Reference Consortium released a new human genome assembly, GRCh38, which we have annotated and will release in the Ensembl browser in August 2014.
What does this mean for me? As GRCh38 is a new primary assembly, the genomic coordinates of many genes will shift in Ensembl release 76. Even though the coordinates of genes shift, this does not necessarily mean that their underlying DNA or transcripts will change.
What if I still want to use the old assembly, GRCh37? For now, GRCh37 is still the default human assembly on our website. From Ensembl release 76 (August 2014) onwards, the default human assembly will switch to GRCh38. If your work is tied to GRCh37, you will still be able to access the data you need via our archive at grch37.ensembl.org.
Linking back to a previous Ensembl release: To link to a specific Ensembl archive site, use the Permanent Link (found at the bottom left of each page), e.g. http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index.
Finding the new coordinates for your region of interest: You may have stored a genomic location on GRCh37 and want to know where the equivalent region is on GRCh38. Simply add the old assembly name to the URL in the address bar, using the a parameter. For example, for the previous assembly change from NCBI36 to GRCh37: www.ensembl.org/Homo_sapiens/Location/View?db=core;r=13:32889611-32973805;a=ncbi36. This redirects you to the new coordinates with a message such as “Your request for 13:32889611-32973805 in ncbi36 has been mapped to the new GRCh37 coordinates 13:33991611-34075805”.
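The URL pattern described above can be assembled programmatically when you have many stored regions to check. The following is a minimal sketch in Python, not an official Ensembl tool: the function name build_location_url and its arguments are hypothetical, the http:// scheme is an assumption (the example URL above is written without one), and the query string simply mirrors the db, r and a parameters seen in that example. The assembly name accepted for other assembly changes is not stated here, so treat the old_assembly value as something to verify against the site.

```python
import re

# Matches regions written as "chromosome:start-end", e.g. "13:32889611-32973805".
REGION_PATTERN = re.compile(r"^([\w.]+):(\d+)-(\d+)$")


def build_location_url(region, old_assembly,
                       species="Homo_sapiens", host="www.ensembl.org"):
    """Build a Location/View URL that names the assembly the region came from.

    Hypothetical helper: it only reproduces the URL shape shown in the text
    (db=core;r=<region>;a=<old assembly>) and does not contact the server.
    """
    match = REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"Region must look like 'chr:start-end', got {region!r}")

    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"Region start {start} is greater than end {end}")

    # Ensembl URLs in the example separate parameters with semicolons.
    return (f"http://{host}/{species}/Location/View"
            f"?db=core;r={region};a={old_assembly}")


if __name__ == "__main__":
    # Reproduces the NCBI36 example from the text.
    print(build_location_url("13:32889611-32973805", "ncbi36"))
```

Opening the resulting URL in a browser is what triggers the redirect and the mapping message; the sketch only builds the link and checks that the region is well formed.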

Gene annotation

  • Genebuilds in progress: Vervet monkey (GCA_000409795.1) and Atlantic salmon (GCA_000233375.3).
  • Pre! sites in progress: Fugu (GCA_000180615.2).
  • Ensembl release 76 (expected August 2014):
    • New assembly and gene set for human – GENCODE 20
    • Updated gene set for mouse – GENCODE M3
    • New species: olive baboon, Amazon molly
  • Regular updates
    • Minor assembly updates for human and mouse: incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged when patches are added.
    • Planned updates to human, mouse, rat and zebrafish gene sets: incorporation of HAVANA manual annotation. For mouse, the gene set is updated every release. For human and zebrafish, the gene sets are updated every second release.
    • CCDS: all CCDS models are included in the human (GENCODE) and mouse gene sets
  • Major assembly updates expected for: zebrafish and several of the low-coverage mammals. For more information about the human and zebrafish assembly updates, please visit the GRC.

Comparative Genomics

  • Widget for tree visualisation
  • Visualisation of the EPO trees
  • Incorporate an HMM-based classification of protein sequences for the Gene Trees pipeline
  • Improved detection of partial / split genes
  • Annotate ohnologs
  • Extend the EPO multiple alignment pipeline to all vertebrates
  • Prediction of ancestral protein sequences
  • Display protein domains on the gene tree alignments

Variation updates

  • Continue to import new variation data from dbSNP and DGVa where available
  • Improve variation annotation using data from the 1000 Genomes Project
  • Continue to import genome wide association study phenotypes for variants from the NHGRI catalog
  • Improve the web interface for the Variant Effect Predictor
  • Import structural variation changes of somatic origin (from COSMIC)
  • Include phenotype data for structural variants
  • Work with Locus-specific databases (LSDBs) to include genotype-phenotype data from these resources

Core API and schema

  • Switchable adaptors to serve data from sources other than MySQL databases
  • Megabase sized feature density tracks
  • New implementation for the get nearest feature method
  • More efficient external reference assignment pipeline
  • New REST server

Regulation

  • Improve the display of Diana TarBase miRNA targets
  • Combine Regulatory Build and Segmentation to give Integrated Regulatory Build track
  • Developing meta-annotation of Multi-cell regulatory elements
  • Nearest gene/feature tool
  • Further development of uniform signal processing
  • Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
  • Integrating RNA-seq data with regulatory element annotation
  • Web display developments:
    • Further refinements of wiggle track config/display including track highlighting
    • MotifFeature view incorporating variation consequences
  • Incorporate DNase1 footprinting and sequence conservation into TFBS identification
  • Incorporating ChIP-seq data from further species for possible
    additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
  • Investigate regulatory feature orthologs and/or comparative views

New web features

  • New interface for BLAST, BLAT, VEP and other tools
  • Motif feature display
  • Mobile-friendly version of website

BioMart

  • Investigate ways to improve scalability and retrievability of the data from the various marts.
  • Continue to incorporate new filters and attributes to the marts as new data is added to the Ensembl schemas.