Future Plans

The following updates are planned for upcoming releases of Ensembl.

Please note that we have no fixed timeline for most of these items.

Gene annotation

  • Genebuilds in progress: Zebrafish (GCA_000002035.3) and Rat (GCA_000001895.4)
  • Upcoming genebuilds: Crab-eating macaque (GCA_000364345.1) and Atlantic salmon (GCA_000233375.3)
  • Ensembl release 79 (expected March 2015):
    • Updated gene set for human – GENCODE 22
  • Ensembl release 80 (expected June 2015):
    • Zebrafish GRCz10 genebuild with manual annotation from HAVANA
    • Rat Rnor_6.0 genebuild with manual annotation from HAVANA

  • Regular updates
    • Minor assembly updates for human and mouse: incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged when patches are added.
    • Planned updates to human, mouse, rat and zebrafish gene sets: incorporation of HAVANA manual annotation. For mouse, the gene set is updated every release. For human and zebrafish, the gene sets are updated every second release.
    • CCDS: all CCDS models are included in the human (GENCODE) and mouse gene sets

Comparative Genomics

  • New widget for tree visualisation
  • Incorporate an HMM-based classification of protein sequences for the Gene Trees and Families pipelines
  • Improved detection of partial / split genes
  • Annotate ohnologs
  • Extend the EPO multiple alignment pipeline to all vertebrates
  • Prediction of ancestral protein sequences

Variation updates

  • Continue to import new variation data from dbSNP and DGVa where available
  • Improve variation annotation using data from the 1000 Genomes Project once the phase 3 data is accessioned
  • Continue to import genome-wide association study phenotypes for variants from the NHGRI catalog, and variants and phenotypes from OMIM, Orphanet and OMIA
  • Import variation changes of somatic origin (from COSMIC)
  • Include phenotype data for structural variants
  • Work with locus-specific databases (LSDBs) to include genotype-phenotype data from these resources

Core API and schema

  • Switchable adaptors to serve data from sources other than MySQL databases
  • Megabase-sized feature density tracks
  • New implementation of the get-nearest-feature method
  • More efficient external reference assignment pipeline
  • New REST server

Regulation

  • Nearest gene/feature tool
  • Many more cell types (Roadmap Epigenomics, Blueprint, HipSci…)
  • Attach regulatory elements to genes via eQTLs, chromatin conformation data, etc.
  • Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
  • Integrating RNA-seq data with regulatory element annotation
  • Web display developments:
    • Further refinements of wiggle track config/display including track highlighting
    • MotifFeature view incorporating variation consequences
  • Incorporate DNase I footprinting and sequence conservation into TFBS identification
  • Incorporate ChIP-seq data from further species for possible additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
  • Investigate regulatory feature orthologs and/or comparative views

New web features

  • Motif feature display
  • Mobile-friendly version of the website (currently in beta testing at m.ensembl.org)
  • Rework of Export / Download functionality to both update the codebase and improve usability
  • Review variation views to cope with even more data

BioMart

  • Investigate ways to improve the scalability of the marts and the retrieval of data from them
  • Continue to add new filters and attributes to the marts as new data is added to the Ensembl schemas