The following updates are planned for upcoming releases of Ensembl.
Please note that we have no fixed timeline for most of these items.
New human assembly: GRCh38
The Genome Reference Consortium (GRC) has released a new human genome assembly, GRCh38, which we have annotated. For large consortia, we recommend waiting for the GENCODE 21 release, which will be available with Ensembl release 77 (October 2014). This will give us more time to refine our data and synchronise it with other public databases.
What does this mean for me? As GRCh38 is a new primary assembly, the genomic coordinates of many genes will shift in Ensembl release 76. A change in a gene's coordinates does not necessarily mean that its underlying DNA sequence or transcripts have changed. If your work is tied to GRCh37, you will still be able to access the data you need via our archive at grch37.ensembl.org. To link to a specific Ensembl archive site, use the Permanent Link (found at the bottom left of each page), e.g. http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index.
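The archive host name in the Permanent Link example encodes the month and year of an Ensembl release. The minimal Python sketch below assumes that every archive host follows that `<mon><year>.archive.ensembl.org` pattern; the helper `archive_url` is hypothetical, and an archive only exists for months in which a release was actually made, so copying the Permanent Link from the page remains the reliable route.

```python
# Hypothetical helper: builds an archive URL following the pattern seen in
# the Permanent Link example (e.g. jun2013.archive.ensembl.org).
# Assumption: an archive actually exists for the requested month/year.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def archive_url(year: int, month: int, path: str = "") -> str:
    """Return an Ensembl archive URL for the given release month and year."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Lower-case three-letter month abbreviation followed by the four-digit year.
    host = f"{MONTHS[month - 1]}{year}.archive.ensembl.org"
    # Strip any leading slash so the path is not doubled.
    return f"http://{host}/{path.lstrip('/')}"


print(archive_url(2013, 6, "Homo_sapiens/Info/Index"))
```

A fixed month table is used instead of `strftime("%b")` because the latter depends on the current locale.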
- Genebuilds in progress: vervet monkey (GCA_000409795.1) and Atlantic salmon (GCA_000233375.3).
- Pre! sites in progress: Fugu (GCA_000180615.2).
- Ensembl release 77 (expected October 2014):
  - Updated gene set for human: GENCODE 21
  - Updated gene set for rat (including manual annotation from HAVANA)
  - New species: vervet monkey
- Regular updates
- Minor assembly updates for human and mouse: incorporation of new alternate sequences provided by the GRC, with basic gene annotation. Primary assembly coordinates remain unchanged when these patches are added.
- Planned updates to human, mouse, rat and zebrafish gene sets: incorporation of HAVANA manual annotation. The mouse gene set is updated every release; the human and zebrafish gene sets are updated every second release.
- CCDS: all CCDS models are included in the human (GENCODE) and mouse gene sets
- Major assembly updates are expected for zebrafish and several of the low-coverage mammals. For more information about the zebrafish assembly update, please visit the GRC website.
- New widget for tree visualisation
- Incorporate an HMM-based classification of protein sequences for the Gene Trees and Families pipelines
- Improved detection of partial / split genes
- Annotate ohnologs
- Extend the EPO multiple alignment pipeline to all vertebrates
- Prediction of ancestral protein sequences
- Continue to import new variation data from dbSNP and DGVa where available
- Improve variation annotation using data from the 1000 Genomes Project once the phase 3 data is accessioned.
- Continue to import genome-wide association study (GWAS) phenotypes for variants from the NHGRI catalog, as well as variants and phenotypes from OMIM, Orphanet and OMIA.
- Import somatic variants (from COSMIC)
- Include phenotype data for structural variants
- Work with locus-specific databases (LSDBs) to include genotype-phenotype data from these resources
Core API and schema
- Switchable adaptors to serve data from sources other than MySQL databases
- Megabase sized feature density tracks
- New implementation of the "get nearest feature" method
- More efficient external reference assignment pipeline
- New REST server
- Nearest gene/feature tool
- Many more cell types (Roadmap Epigenomics, Blueprint, HipSci…)
- Attach regulatory elements to genes via eQTLs, chromatin conformation data, etc.
- Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
- Integrating RNA-seq data with regulatory element annotation
- Web display developments:
  - Further refinements of wiggle track configuration and display, including track highlighting
  - MotifFeature view incorporating variation consequences
- Incorporate DNase1 footprinting and sequence conservation into TFBS identification
- Incorporate ChIP-seq data from further species for possible additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
- Investigate regulatory feature orthologs and/or comparative views
New web features
- Motif feature display
- Mobile-friendly version of website (currently in beta testing at m.ensembl.org)
- Rework of the Export/Download functionality to update the codebase and improve usability
- Review variation views to cope with even more data
- Investigate ways to improve scalability and retrievability of the data from the various marts.
- Continue to add new filters and attributes to the marts as new data is added to the Ensembl schemas.