The latest version of Ensembl, release 95, is out. This release brings a brand new human regulatory build for GRCh37 and GRCh38, incorporating new data from the ENCODE and Roadmap epigenomics projects, plus an update to the mouse GENCODE gene set. We’ve also got a whole host of new vertebrate species, updated genome assemblies for some important agricultural species and a brand new protein structure viewer.
New Regulatory Build
Ensembl 95 sees the first update to the human regulatory build since 2016. The Ensembl regulatory build uses experimental data to predict features that regulate gene expression, such as promoters, enhancers and transcription factor binding sites. The annotation of these regulatory features is based upon a wide variety of data produced from the Blueprint epigenome project, the Roadmap epigenomics project and the ENCODE project.
Since the last update to the regulatory build in Ensembl, the data available through the ENCODE portal has continued to grow, and the new regulatory build incorporates data for 55 new and 38 updated epigenomes. In total, the new regulatory build integrates 123 epigenomes and annotates 675,965 individual regulatory features covering 21% of the human genome.
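If you want to pull regulatory features programmatically rather than through the browser, one option is the Ensembl REST API's overlap endpoint with `feature=regulatory`. The snippet below is a minimal sketch, not an official client: the function names `regulatory_overlap_url` and `fetch_regulatory_features` and the example region are our own illustrative choices, and the sketch assumes the public REST servers at rest.ensembl.org (GRCh38) and grch37.rest.ensembl.org (GRCh37).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public REST servers for the two human assemblies.
REST_SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def regulatory_overlap_url(region, assembly="GRCh38", species="human"):
    """Build an overlap/region URL that asks for regulatory features only.

    `region` is a string such as "7:140424943-140624564" (chromosome:start-end).
    """
    base = REST_SERVERS[assembly]
    query = urlencode({"feature": "regulatory"})
    return f"{base}/overlap/region/{species}/{region}?{query}"


def fetch_regulatory_features(region, assembly="GRCh38", species="human", timeout=30):
    """Fetch regulatory features overlapping `region` as parsed JSON.

    Requires network access; the structure of the returned records is
    whatever the REST server sends and is not assumed here.
    """
    request = Request(
        regulatory_overlap_url(region, assembly, species),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative region only; any region within the server's size limit works.
    features = fetch_regulatory_features("7:140424943-140624564")
    print(f"{len(features)} regulatory features returned")
```

Passing `assembly="GRCh37"` sends the same query to the GRCh37 server, which is handy for comparing the two builds over the same locus.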
Mouse gene set update
The gene set for mouse has been updated for release 95, bringing us to GENCODE M20.
Update to the human GRCh37 variation database
Protein structure viewer
We’ve also added a new protein structure view. The PDBe Protein Structure viewer uses a LiteMol widget to visualise data from PDBe. Structures are available for more than 3700 Ensembl transcripts. Using the control panel and its options, you can select different PDBe protein structures to view and highlight exons, protein domains from Pfam or Gene3D, and variants coloured by SIFT or PolyPhen-2 scores. You can access the Protein Structure viewer by clicking on the ‘3D Protein Model’ link on the left-hand side of the Transcript tab.
Figure 1: Protein structure viewer showing Pfam domains
If you want to visualise the effect of individual variants on the protein structure, you can access the Protein Structure Variation viewer by clicking on the ‘3D Protein Model’ link on the left-hand side of the Variant tab. For missense variants (A), the individual amino acid residue that is affected is highlighted, allowing you to see its location within the protein structure. For stop-gain variants (B), the part of the protein structure encoded by the sequence following the premature stop codon is coloured red.
- Probe mapping update for the human GRCh38 and GRCh37 assemblies, cow, chicken and fruit fly.
- Revision of gene-tree pipeline for vertebrates to correct paralogy prediction issues identified in Ensembl 94.
- dN/dS analysis will now be calculated for all mammals, reptiles and percomorph fish.
- Constrained elements will be available as bigBed files instead of BED files.
- Track Hub functionality has been extended in two ways. First, we have added support for multiwig tracks, which allow multiple sets of wiggle data to be shown in the same track. Second, data organised as composite tracks are now grouped together in the interface, so multiple sets of data can be configured at the same time.
- Retirement of Dec 2013 archive site. This archive site is now more than five years old and is being retired in accordance with our policy.
Find out more
If you would like to find out more about these changes, see live demos of how to find the new data on the site, and put questions to the Ensembl team, please register for the release webinar at 4pm (GMT) on Wednesday 16th January. A recording of this webinar will be available on our YouTube channel.