We’re fortunate to be part of the EMBL European Bioinformatics Institute (EBI), which puts us alongside stellar bioinformaticians and resources in every discipline. From this, great collaborations can grow. We’ve already worked with our colleagues at Gene Expression Atlas and Reactome to embed widgets in Ensembl for viewing baseline gene expression and biochemical pathways respectively, but our latest collaboration is with the Protein Data Bank in Europe (PDBe) to show genetic variation on protein structures.
Collaborating with PDBe
PDBe is an archive where structural biologists can submit their experimentally derived protein structures. It forms part of the worldwide PDB, working with curators in the USA and Japan to manage, share and display the submitted structures. The protein structures they receive may be wild-type or variant, fragment or complete, monomer or part of a multimer, ligand-bound or not.
To use these proteins alongside Ensembl data, we have to map them to Ensembl protein sequences, which are always a direct translation of the genomic sequence. All structures in PDBe are mapped to their respective UniProt identifiers. Our close collaboration with UniProt (more friends at the EBI) means that we also have excellent mapping between Ensembl and UniProt proteins, so we use that Ensembl-to-UniProt mapping as the bridge to PDBe. Of course, not all Ensembl genes have experimental protein structures available in PDBe, but the number is increasing all the time.
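To illustrate how UniProt can link the two resources, here is a minimal Python sketch that joins an Ensembl-to-UniProt table with a PDBe-to-UniProt table. The function name, input shapes and identifiers are hypothetical, and a real pipeline also has to map residue numbering between the sequences, which this sketch leaves out.

```python
from collections import defaultdict


def map_ensembl_to_pdbe(ensembl_to_uniprot, pdbe_to_uniprot):
    """Return {ensembl_protein_id: sorted list of PDB entry IDs}.

    ensembl_to_uniprot: dict of Ensembl protein ID -> UniProt accession
    pdbe_to_uniprot: iterable of (pdb_id, uniprot_accession) pairs; one PDB
        entry may appear several times, e.g. for a multimer.
    All names here are hypothetical, for illustration only.
    """
    # Index the PDBe side by UniProt accession so each lookup is a dict hit.
    structures_by_uniprot = defaultdict(set)
    for pdb_id, accession in pdbe_to_uniprot:
        structures_by_uniprot[accession].add(pdb_id)

    result = {}
    for ensp, accession in ensembl_to_uniprot.items():
        pdb_ids = structures_by_uniprot.get(accession)
        if pdb_ids:  # skip proteins with no experimental structure yet
            result[ensp] = sorted(pdb_ids)
    return result


# Usage with placeholder identifiers:
ensembl_to_uniprot = {"ENSP_A": "UNIPROT_1", "ENSP_B": "UNIPROT_2"}
pdbe_to_uniprot = [("pdb1", "UNIPROT_1"), ("pdb2", "UNIPROT_1"), ("pdb3", "UNIPROT_3")]
print(map_ensembl_to_pdbe(ensembl_to_uniprot, pdbe_to_uniprot))
```

Here only ENSP_A gains structures; ENSP_B is dropped because no PDBe entry maps to its UniProt accession.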
Seeing structures in Ensembl
We’ve made the protein structures available for a transcript, for a variant and even for a VEP result on our main (GRCh38) site. The structures are displayed in the LiteMol viewer, which is familiar to many people who work with protein structures but may be new to those working in genetics and genomics. Fortunately, PDBe already had a handy tutorial video, which we trimmed down for Ensembl use (we asked first!).
On top of the structure from PDBe, we add variation data, Pfam domains and exons. On the variation and VEP pages, the chosen variant is highlighted from the start, and on all pages you can add missense variants, coloured green or red according to their SIFT and PolyPhen predictions. For protein-truncating variants, such as stop gained or frameshift, the variant and VEP pages show which parts of the structure are lost.
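The colouring and truncation ideas can be sketched in a few lines of Python. This is not the Ensembl implementation: it assumes red marks a damaging-type prediction and green a tolerated or benign one, and that a truncating variant removes every structure residue from its position onwards. The prediction labels are the terms VEP reports for SIFT and PolyPhen; the function names are hypothetical.

```python
# Assumed grouping of VEP's SIFT/PolyPhen terms into the two display colours.
DAMAGING = {
    "deleterious", "deleterious_low_confidence",   # SIFT
    "possibly_damaging", "probably_damaging",      # PolyPhen
}
TOLERATED = {
    "tolerated", "tolerated_low_confidence",       # SIFT
    "benign",                                      # PolyPhen
}


def prediction_colour(prediction):
    """Map one SIFT or PolyPhen prediction to a hypothetical display colour."""
    if prediction in DAMAGING:
        return "red"
    if prediction in TOLERATED:
        return "green"
    return None  # e.g. PolyPhen "unknown": leave uncoloured


def lost_structure_range(variant_pos, struct_start, struct_end):
    """Residues of the structure lost when the protein is truncated.

    All positions are 1-based and inclusive, in Ensembl peptide numbering.
    Returns (first_lost, last_lost), or None if the truncation falls after
    the region the structure covers.
    """
    if variant_pos > struct_end:
        return None
    return (max(variant_pos, struct_start), struct_end)


print(prediction_colour("probably_damaging"))  # red
print(lost_structure_range(120, 50, 300))      # (120, 300)
```

A stop gained before the start of the modelled region would, under this assumption, remove the whole structure, which the `max()` call captures.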
We rank the protein structures based on their coverage of the Ensembl peptide sequence and the PDBe quality scores, and make the top ten available to view. On the transcript and VEP protein structure pages, you can pick from these; on the variation page you can also choose which transcript you want a structure for.
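One way to rank by coverage and quality is sketched below in Python. The post doesn't say how the two are combined, so this sketch assumes coverage is compared first with quality as a tie-breaker, and it treats the PDBe quality score as a single hypothetical number where higher is better. The `Structure` class and its fields are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    pdb_id: str
    # 1-based inclusive ranges of the Ensembl peptide covered by this structure
    mapped_segments: list[tuple[int, int]]
    quality: float  # hypothetical single quality score; higher is better


def coverage(structure, peptide_length):
    """Fraction of the Ensembl peptide covered; overlapping segments count once."""
    covered = set()
    for start, end in structure.mapped_segments:
        covered.update(range(max(start, 1), min(end, peptide_length) + 1))
    return len(covered) / peptide_length


def top_structures(structures, peptide_length, n=10):
    """Sort by coverage, then quality, both descending, and keep the best n."""
    ranked = sorted(
        structures,
        key=lambda s: (coverage(s, peptide_length), s.quality),
        reverse=True,
    )
    return ranked[:n]


candidates = [
    Structure("a", [(1, 50)], 0.9),
    Structure("b", [(1, 100)], 0.5),
    Structure("c", [(1, 100)], 0.8),
]
print([s.pdb_id for s in top_structures(candidates, 100, n=2)])  # ['c', 'b']
```

Sorting on a tuple keeps the rule easy to read; a weighted score would be an equally plausible alternative if coverage and quality should trade off against each other.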
To see structures for your VEP results, you need to tick protein domains in the input form. You can then click on Protein Structure View in the results to open a new page showing the structure with your variant highlighted.