The following updates are planned for upcoming releases of Ensembl.
Please note that we have no fixed timeline for most of these items.
New human assembly: GRCh38
The Genome Reference Consortium plans to release a new human genome assembly, GRCh38, in late summer 2013.
Timeline: We plan to release a Pre! site for this assembly in early 2014, and full annotation for release 76 (July 2014).
What does this mean for me? As GRCh38 is a new primary assembly, the genomic coordinates of many genes will shift in Ensembl release 76. Even though the coordinates of genes shift, this does not necessarily mean that their underlying DNA or transcripts will change.
What if I still want to use the old assembly, GRCh37? For now, GRCh37 is still the default human assembly on our website, and this will remain the case up to and including release 75. From Ensembl release 76 (July 2014) onwards, the default human assembly will switch to GRCh38. If your work is tied to GRCh37, you will still be able to access the data you need via Ensembl: we plan to support GRCh37 via a fully functioning archive of Ensembl 75, hosted at grch37.ensembl.org. This archive website is not available yet.
Linking back to a previous Ensembl release: To link to a specific Ensembl archive site, remember to use the Permanent Link (found at the bottom left of each page), e.g. http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index
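A minimal sketch of how such an archive link could be assembled in a script. The `archive_url` helper is hypothetical, and the host pattern (three-letter month plus year, followed by `.archive.ensembl.org`) is inferred only from the single example above. The Permanent Link shown on each page remains the authoritative source.

```python
# Hypothetical helper: builds an archive URL following the pattern seen in
# http://jun2013.archive.ensembl.org/Homo_sapiens/Info/Index
# The host pattern is an assumption drawn from that one example.

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def archive_url(month, year, path="Homo_sapiens/Info/Index"):
    """Return an archive URL for the given release month and year."""
    abbrev = month.lower()[:3]
    if abbrev not in MONTHS:
        raise ValueError(f"unrecognised month: {month!r}")
    return f"http://{abbrev}{int(year)}.archive.ensembl.org/{path.lstrip('/')}"


example = archive_url("June", 2013)
print(example)
```

Not every month corresponds to an Ensembl release, so a URL built this way should be checked against the list of available archives before it is stored or shared.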
Finding the new coordinates for your region of interest: You may have stored a genomic location on GRCh37, and want to know where the equivalent region is on GRCh38. Simply add the old assembly name to the URL in the address bar as the "a" parameter. The following example comes from the previous assembly change (NCBI36 to GRCh37): www.ensembl.org/Homo_sapiens/Location/View?db=core;r=13:32889611-32973805;a=ncbi36 redirects you to the new coordinates with the message “Your request for 13:32889611-32973805 in ncbi36 has been mapped to the new GRCh37 coordinates 13:33991611-34075805”.
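The redirect URL above can be built programmatically, and the NCBI36 to GRCh37 example also illustrates how coordinates can shift while the region itself stays the same length. This is only a sketch: the function names are hypothetical, the `http://` scheme is assumed, and the code builds and compares strings without contacting the Ensembl website.

```python
import re

# Hypothetical helpers; the URL layout is copied from the example above.


def assembly_redirect_url(region, old_assembly,
                          species="Homo_sapiens", host="www.ensembl.org"):
    """Build a Location/View URL naming the assembly the region came from."""
    return (f"http://{host}/{species}/Location/View"
            f"?db=core;r={region};a={old_assembly}")


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer positions."""
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


old_region = "13:32889611-32973805"   # NCBI36 coordinates from the example
new_region = "13:33991611-34075805"   # GRCh37 coordinates from the message

url = assembly_redirect_url(old_region, "ncbi36")

_, old_start, old_end = parse_region(old_region)
_, new_start, new_end = parse_region(new_region)

shift = new_start - old_start                               # offset in bp
same_length = (old_end - old_start) == (new_end - new_start)

print(url)
print(f"shift: {shift} bp, length preserved: {same_length}")
```

In this example both ends of the region move by the same 1,102,000 bp, so its length is unchanged. That will not hold for every region, since a new assembly can also insert or remove sequence within a region.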
- Genebuilds in progress: olive baboon (GCA_000264685.1), vervet monkey (GCA_000409795.1).
- Pre! sites in progress: Macaca fascicularis (GCA_000364345.1), Amazon molly (GCA_000485575.1) and hedgehog (GCA_000296755.1).
- Ensembl release 76 (expected July 2014):
  - Human – GRCh38
  - Mouse – GENCODE M3
- Upcoming genebuilds: squirrel monkey (GCA_000235385.1).
- Regular updates
- Minor assembly updates for human and mouse: incorporation of new alternate sequence provided by the GRC, with basic gene annotation. The Primary Assembly coordinates remain unchanged when patches are added.
- Planned updates to human, mouse and zebrafish gene sets: incorporation of HAVANA manual annotation. For human, the gene set is updated every release. For mouse and zebrafish, the gene sets are updated every second release.
- CCDS: all CCDS models are included in the human (GENCODE) and mouse gene sets
- Major assembly updates expected for zebrafish and several of the low-coverage mammals. For more information about the human and zebrafish assembly updates, please visit the GRC.
- Display protein domains on the gene tree alignments
- Incorporate an HMM-based classification of protein sequences for the Gene Trees pipeline
- Annotate ohnologs
- Extend the EPO multiple alignment pipeline to all vertebrates
- Prediction of ancestral protein sequences
- Improved detection of partial / split genes
- Displaying the age of the most recent substitution at a nucleotide position
- Continue to import new variation data from dbSNP and DGVa where available
- Improve variation annotation using data from the 1000 Genomes Project
- Continue to import genome-wide association study phenotypes for variants from the NHGRI catalog
- Improve the web interface for the Variant Effect Predictor
- Import structural variation changes of somatic origin (from COSMIC)
- Include phenotype data for structural variants
- Work with locus-specific databases (LSDBs) to include genotype-phenotype data from these resources
Core API and schema
- Refinements to gene name and description assignment
- Switchable adaptors to serve data from sources other than MySQL databases
- Megabase-sized feature density tracks
- REST services to Ensembl
- Improved VM support
- DEB packages for Ensembl dependencies
- Improved LRG support
- UCSC genocoding project integration
- Update/replace ncRNA resource i.e. miRanda miRNA Targets
- Combine Regulatory Build and Segmentation to give Integrated Regulatory Build track
- Developing meta-annotation of Multi-cell regulatory elements
- Nearest gene/feature tool
- Further development of uniform signal processing
- Development of DNA methylation tracks, i.e. high-level summaries and differentially and variably methylated regions
- Integrating RNA-seq data with regulatory element annotation
- Web display developments:
  - Further refinements of wiggle track config/display including track highlighting
  - MotifFeature view incorporating variation consequences
- Incorporate DNase1 footprinting and sequence conservation into TFBS identification
- Incorporating ChIP-seq data from further species for possible additional regulatory builds, e.g. Schmidt et al. (PMID: 20378774)
- Investigate regulatory feature orthologs and/or comparative views
New web features
- New interface for BLAST, BLAT, VEP and other tools
- Motif feature display
- RNA secondary structure images
- Mobile-friendly version of website
- Investigate ways to improve scalability and retrievability of the data from the various marts.
- Continue to incorporate new filters and attributes to the marts as new data is added to the Ensembl schemas.