The following updates are planned for upcoming releases of Ensembl.
Please note that we have no fixed timeline for most of these items.
Gene annotation
- Gene annotation updates expected for Ensembl release 72: Human, mouse.
- Genebuilds in progress: Sheep (GCA_000298735.1), collared flycatcher (GCA_000247815.1).
- Upcoming genebuilds: Olive baboon (GCA_000264685.1), spotted gar (GCA_000242695.1) and squirrel monkey (GCA_000235385.1).
- New species expected: grass carp, vervet monkey, budgerigar, and naked mole rat (Heterocephalus glaber).
- Minor assembly updates for human and mouse: regular incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged.
- Planned updates to human, mouse and zebrafish gene sets: regular incorporation of HAVANA manual annotation. For human, the gene set is updated every release. For mouse and zebrafish, the gene sets are updated every second release.
- Major assembly updates expected for: human, zebrafish and several of the low-coverage mammals. For more information about the human and zebrafish assembly updates, please visit the GRC.
Comparative Genomics
- Display super-gene tree
- Display ncRNA alignments on their predicted secondary structure
- Display protein domains on the gene tree alignments
- Incorporate an HMM-based classification of protein sequences for the Gene Trees pipeline
- Annotate ohnologs
- Extend the EPO multiple alignment pipeline to all vertebrates
Variation updates
- Continue to import new variation data from dbSNP and DGVa where available
- Improve variation annotation using data from the 1000 Genomes Project
- Continue to import genome-wide association study phenotypes for variants from the NHGRI catalog
- Improve the web interface for the Variant Effect Predictor
- Import structural variation changes of somatic origin (from COSMIC)
- Include phenotype data for structural variants
- Work with locus-specific databases (LSDBs) to include genotype-phenotype data from these resources
Core API and schema
- Refinements to gene name and description assignment
- Switchable adaptors to serve data from sources other than MySQL databases
- Megabase-sized feature density tracks
- REST services to Ensembl
- Improved VM support
- DEB packages for Ensembl dependencies
- Improved LRG support
- UCSC genocoding project integration
Regulation
- Update/replace ncRNA resource i.e. miRanda miRNA Targets
- Incorporate DNase1 footprinting and sequence conservation into TFBS identification
- Integrating RNA-seq data with regulatory element annotation
- Developing meta-annotation of Multi-cell regulatory elements
- Further development of uniform signal processing
- Nearest gene/feature tool
- Web display developments:
  - Further refinements of wiggle track config/display including track highlighting
  - MotifFeature view incorporating variation consequences
- Incorporating ChIP-seq data from further species for possible additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
- Combine Regulatory Build and Segmentation to give Integrated Regulatory Build track
- Investigate regulatory feature orthologs and/or comparative views
- Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
New web features
- New BLAST interface
- Change search engine to Solr for faceted searching and other improved functionality
- Ability to create a custom page from a selection of existing components
Biomart
- Investigate ways to improve scalability and retrievability of the data from the various marts.
- Continue to incorporate new filters and attributes to the marts as new data is added to the Ensembl schemas.
