Future Plans

The following updates are planned for upcoming releases of Ensembl.

Please note that we have no fixed timeline for most of these items.

Gene annotation

  • Gene annotation updates expected for Ensembl release 72: Human, mouse.
  • Genebuilds in progress: Sheep (GCA_000298735.1), collared flycatcher (GCA_000247815.1).
  • Upcoming genebuilds: Olive baboon (GCA_000264685.1), spotted gar (GCA_000242695.1) and squirrel monkey (GCA_000235385.1).
  • New species expected: grass carp, vervet monkey, budgerigar, and naked mole rat (Heterocephalus glaber).
  • Minor assembly updates for human and mouse: regular incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged.
  • Planned updates to human, mouse and zebrafish gene sets: regular incorporation of HAVANA manual annotation. For human, the gene set is updated every release. For mouse and zebrafish, the gene sets are updated every second release.
  • Major assembly updates expected for: human, zebrafish and several of the low-coverage mammals. For more information about the human and zebrafish assembly updates, please visit the GRC.

Comparative Genomics

  • Display super-gene tree
  • Display ncRNA alignments on their predicted secondary structure
  • Display protein domains on the gene tree alignments
  • Incorporate an HMM-based classification of protein sequences for the Gene Trees pipeline
  • Annotate ohnologs
  • Extend the EPO multiple alignment pipeline to all vertebrates

Variation updates

  • Continue to import new variation data from dbSNP and DGVa where available
  • Improve variation annotation using data from the 1000 Genomes Project
  • Continue to import genome-wide association study phenotypes for variants from the NHGRI catalog
  • Improve the web interface for the Variant Effect Predictor
  • Import structural variation changes of somatic origin (from COSMIC)
  • Include phenotype data for structural variants
  • Work with locus-specific databases (LSDBs) to include genotype-phenotype data from these resources

Core API and schema

  • Refinements to gene name and description assignment
  • Switchable adaptors to serve data from sources other than MySQL databases
  • Megabase-sized feature density tracks
  • REST services to Ensembl
  • Improved VM support
  • DEB packages for Ensembl dependencies
  • Improved LRG support
  • UCSC genocoding project integration

Regulation

  • Update/replace ncRNA resource i.e. miRanda miRNA Targets
  • Incorporate DNase1 footprinting and sequence conservation into TFBS identification
  • Integrating RNA-seq data with regulatory element annotation
  • Developing meta-annotation of Multi-cell regulatory elements
  • Further development of uniform signal processing
  • Nearest gene/feature tool
  • Web display developments:
    • Further refinements of wiggle track config/display including track highlighting
    • MotifFeature view incorporating variation consequences
  • Incorporating ChIP-seq data from further species for possible
    additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
  • Combine Regulatory Build and Segmentation to give Integrated Regulatory Build track
  • Investigate regulatory feature orthologs and/or comparative views
  • Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions

New web features

  • New BLAST interface
  • Change search engine to Solr for faceted searching and other improved functionality
  • Ability to create a custom page from a selection of existing components

Biomart

  • Investigate ways to improve scalability and retrievability of the data from the various marts.
  • Continue to incorporate new filters and attributes to the marts as new data is added to the Ensembl schemas.
