The following updates are planned for upcoming releases of Ensembl.
Please note that we have no fixed timeline for most of these items.
New human assembly: GRCh38
The Genome Reference Consortium plans to release a new human genome assembly, GRCh38, in late summer 2013.
Timeline: We plan to release a Pre! site for this assembly in early 2014, followed by full annotation in release 76 (July 2014).
What does this mean for me? As GRCh38 is a new primary assembly, the genomic coordinates of many genes will shift in Ensembl release 76. Even though the coordinates of genes shift, this does not necessarily mean that their underlying DNA or transcripts will change.
What if I still want to use the old assembly, GRCh37? For now, GRCh37 is still the default human assembly on our website, and this will remain the case up to and including release 75. From Ensembl release 76 (July 2014) onwards, the default human assembly will switch to GRCh38. If your work is tied to GRCh37, you will still be able to access the data you need via Ensembl: we plan to support GRCh37 via a fully functioning archive of Ensembl 75, hosted at grch37.ensembl.org. This archive website is not available yet.
Linking back to a previous Ensembl release: To link to a specific Ensembl archive site, use the Permanent Link (found at the bottom left of each page), e.g. http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index.
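As an illustration, the archive link above can be assembled from its parts. This is a minimal sketch in Python, assuming that archive hosts follow the month-and-year pattern seen in the example (jun2013.archive.ensembl.org); the function name `archive_url` and the `http://` scheme are assumptions made for this sketch, and an archive does not necessarily exist for every month.

```python
def archive_url(month_abbrev, year, path="Homo_sapiens/Info/Index"):
    """Build a link to an Ensembl archive site (hypothetical helper).

    Assumes the host pattern <mon><year>.archive.ensembl.org, as in the
    example link jun2013.archive.ensembl.org. The Permanent Link on each
    page remains the authoritative source for the correct archive URL.
    """
    host = f"{month_abbrev.lower()}{year}.archive.ensembl.org"
    return f"http://{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Reproduces the example link given in the text.
    print(archive_url("Jun", 2013))
```

When in doubt, copy the Permanent Link from the page itself rather than constructing the address by hand.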
Finding the new coordinates for your region of interest: You may have stored a genomic location on GRCh37 and want to know where the equivalent region is on GRCh38. To find out, add the old assembly name to the URL in the address bar as the "a" parameter. For example, for the previous assembly change (NCBI36 to GRCh37): www.ensembl.org/Homo_sapiens/Location/View?db=core;r=13:32889611-32973805;a=ncbi36. This redirects you to the new coordinates with a message such as “Your request for 13:32889611-32973805 in ncbi36 has been mapped to the new GRCh37 coordinates 13:33991611-34075805”.
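The URL described above can also be generated programmatically when you have many stored regions. The following is a minimal sketch in Python; the function name `build_remap_url` and the `http://` scheme are assumptions for illustration, while the `db`, `r` and `a` parameters and the semicolon separators are taken from the example URL.

```python
def build_remap_url(species, region, old_assembly, host="www.ensembl.org"):
    """Build a Location/View URL that names the old assembly (hypothetical helper).

    species      -- species path segment, e.g. "Homo_sapiens"
    region       -- location on the OLD assembly, "chromosome:start-end"
    old_assembly -- old assembly name as used in the URL, e.g. "ncbi36"
    """
    # Parameters are joined with ';' to match the example URL in the text.
    params = [("db", "core"), ("r", region), ("a", old_assembly)]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"http://{host}/{species}/Location/View?{query}"


if __name__ == "__main__":
    stored_regions = ["13:32889611-32973805"]
    for region in stored_regions:
        print(build_remap_url("Homo_sapiens", region, "ncbi36"))
```

Opening the printed URL in a browser should trigger the redirect and mapping message described above; the sketch only builds the address and does not perform the mapping itself.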
- Genebuilds in progress: Human GRCh38, olive baboon (GCA_000264685.1), vervet monkey (GCA_000409795.1), and Amazon molly (GCA_000485575.1).
- Pre! sites in progress: Macaca fascicularis (GCA_000364345.1) and hedgehog (GCA_000296755.1).
- Ensembl release 76 (expected July 2014):
  - New assembly and gene set for human – GENCODE 20
  - Updated gene set for mouse – GENCODE M3
  - New species: olive baboon, vervet monkey, Amazon molly
- Regular updates:
  - Minor assembly updates for human and mouse: incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged when patches are added.
  - Planned updates to human, mouse, rat and zebrafish gene sets: incorporation of HAVANA manual annotation. For human, the gene set is updated every release. For mouse and zebrafish, the gene sets are updated every second release.
  - CCDS: all CCDS models are included in the human (GENCODE) and mouse gene sets
  - Major assembly updates expected for zebrafish and several of the low-coverage mammals. For more information about the human and zebrafish assembly updates, please visit the GRC.
- Widget for tree visualisation
- Visualisation of the EPO trees
- Incorporate an HMM-based classification of protein sequences for the Gene Trees pipeline
- Improved detection of partial / split genes
- Annotate ohnologs
- Extend the EPO multiple alignment pipeline to all vertebrates
- Prediction of ancestral protein sequences
- Display protein domains on the gene tree alignments
- Continue to import new variation data from dbSNP and DGVa where available
- Improve variation annotation using data from the 1000 Genomes Project
- Continue to import genome wide association study phenotypes for variants from the NHGRI catalog
- Improve the web interface for the Variant Effect Predictor
- Import structural variation changes of somatic origin (from COSMIC)
- Include phenotype data for structural variants
- Work with Locus-specific databases (LSDBs) to include genotype-phenotype data from these resources
Core API and schema
- Switchable adaptors to serve data from sources other than MySQL databases
- Megabase-sized feature density tracks
- New implementation for the get nearest feature method
- More efficient external reference assignment pipeline
- New REST server
- Improve the display of Diana TarBase miRNA targets
- Combine Regulatory Build and Segmentation to give Integrated Regulatory Build track
- Developing meta-annotation of Multi-cell regulatory elements
- Nearest gene/feature tool
- Further development of uniform signal processing
- Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
- Integrating RNA-seq data with regulatory element annotation
- Web display developments:
  - Further refinements of wiggle track config/display including track highlighting
  - MotifFeature view incorporating variation consequences
- Incorporate DNase1 footprinting and sequence conservation into TFBS identification
- Incorporating ChIP-seq data from further species for possible additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
- Investigate regulatory feature orthologs and/or comparative views
New web features
- New interface for BLAST, BLAT, VEP and other tools
- Motif feature display
- Mobile-friendly version of website
- Investigate ways to improve scalability and retrievability of the data from the various marts.
- Continue to incorporate new filters and attributes to the marts as new data is added to the Ensembl schemas.