From release 68, we are using Sequence Ontology (SO) terms for the variation consequences, in an effort to standardise terms across the different browsers, making it easier for users to do a cross comparison of variation annotation. The UCSC Genome Browser will use these terms on their SNP details page around mid-August, dbSNP will update their web display in the next few weeks and the ICGC also intend to standardise on SO terms for describing somatic mutation consequences.
At the same time, we have added a couple more specific consequences for SNPs and in-dels (splice donor variant and splice acceptor variant, for example), and consequences for larger structural variants are now available through the Variant Effect Predictor (VEP). The complete list of terms and definitions is in our documentation. As you will see, the SO equivalents for our old terms are fairly straightforward. The most notable difference is that we have replaced “non-synonymous” with the more specific term “missense” for amino acid changes that do not introduce a stop codon, as we already have a specific term for stop gained.
The old Ensembl terms are still available on the website (using “Configure this page”), and if you have text files or VEP output files with our old Ensembl terms, you can easily update these to use the SO terms by running the following script.
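As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not the Ensembl script: the function name `to_so_terms` is made up for this example, and the mapping is a hand-picked subset of one-to-one terms that should be checked against the full list in the documentation. Old terms that are not in the table are left untouched.

```python
import re

# Illustrative subset only (old Ensembl term -> SO term); an assumption to be
# checked against the complete list in the Ensembl documentation.
OLD_TO_SO = {
    "NON_SYNONYMOUS_CODING": "missense_variant",
    "SYNONYMOUS_CODING": "synonymous_variant",
    "STOP_GAINED": "stop_gained",
    "STOP_LOST": "stop_lost",
    "FRAMESHIFT_CODING": "frameshift_variant",
    "INTRONIC": "intron_variant",
    "INTERGENIC": "intergenic_variant",
}

# Match whole terms only, so SYNONYMOUS_CODING is never rewritten inside
# NON_SYNONYMOUS_CODING. Longest alternatives are tried first as well.
_TERMS = sorted(OLD_TO_SO, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(term) for term in _TERMS)
    + r")(?![A-Za-z0-9_])"
)


def to_so_terms(line: str) -> str:
    """Return the line with every known old term replaced by its SO term."""
    return _PATTERN.sub(lambda match: OLD_TO_SO[match.group(1)], line)


if __name__ == "__main__":
    print(to_so_terms("rs1\tNON_SYNONYMOUS_CODING,INTRONIC"))
```

Applied line by line to a tab-delimited file, this only changes the consequence terms it recognises and passes everything else through as-is.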
For release 67 we changed how we store the protein function predictions from SIFT and PolyPhen so that they can also be used for more than just Ensembl transcripts, including RefSeq transcripts. We use these tools to compute the predicted effect of every possible amino acid substitution in the human proteome (over 2 billion predictions!). Now, the complete set of predictions for a particular protein is retrieved using the protein sequence itself as an identifier rather than an Ensembl stable identifier (we actually use the MD5 hash of the sequence). This means that you can retrieve predictions for any protein that has the same amino acid sequence as an Ensembl translation. So if you work with RefSeq transcripts, you can now get SIFT and PolyPhen predictions for any missense variants that fall in the 95% of RefSeq transcripts that match an Ensembl transcript exactly, using both the Variant Effect Predictor (VEP) and the Variation API.
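To make the idea of a sequence-derived identifier concrete, here is a small Python sketch. The function name `sequence_key`, the normalisation step (stripping whitespace and upper-casing) and the toy sequences are all assumptions made for this example; the exact form of the sequence that Ensembl hashes is not described here.

```python
import hashlib


def sequence_key(protein_seq: str) -> str:
    """Return the MD5 hex digest of a protein sequence.

    Whitespace removal and upper-casing are assumptions of this sketch,
    so that the same peptide always yields the same key.
    """
    normalised = "".join(protein_seq.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


# Hypothetical store: sequence key -> precomputed predictions.
predictions = {}

# Toy peptide sequences, not real translations.
ensembl_translation = "MKTAYIAKQR"
refseq_translation = "mktayiakqr\n"  # same peptide from another source

predictions[sequence_key(ensembl_translation)] = {"source": "precomputed"}

# The lookup needs no stable identifier: an identical sequence is enough.
hit = predictions.get(sequence_key(refseq_translation))
print(hit)
```

Keying on the sequence rather than on a stable identifier is what lets any transcript set share one store of predictions, as long as the translated protein is identical.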
Also new in release 67 are predictions from both classifier models supplied with PolyPhen. Previously we provided predictions using a classifier trained on the HumVar dataset, which is intended to distinguish severely deleterious alleles from the background of abundant variation with milder effects. This is still the default, but when using the API you can now also opt to use predictions from the classifier trained on the HumDiv dataset, which is intended to help evaluate rarer alleles potentially involved in complex disease. For more details on how these datasets are composed, please refer to the PolyPhen website.
Variation consequence types, such as “intronic” or “non-synonymous”, describe the variation location or effect of a variation on a transcript. For the latest version of Ensembl (release 62) we have made some significant changes to the way in which we determine these consequence types, and we’d like to provide an overview of these improvements.
Firstly, we are now able to assign a specific effect to every allele of a variant. For example, rs12795274 has three alleles: the reference allele is T, and it also has two alternative alleles, C and A. The A is predicted to cause an amino acid change, while the C is synonymous. We now list the effect of each individual allele on the website, and you can also fetch them separately when using the variation API.
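The per-allele logic can be sketched in a few lines of Python. This is only an illustration of the principle, not how the variation API is implemented: the function names are invented for this example, and the codon `GAT` is a made-up stand-in rather than the actual codon at rs12795274. The output uses SO-style term names.

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def allele_consequence(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify one alternative allele at a 0-based position in a codon."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


def consequences_per_allele(ref_codon: str, position: int, alt_bases):
    """Return one consequence per alternative allele, not one per variant."""
    return {
        base: allele_consequence(ref_codon, position, base)
        for base in alt_bases
    }


if __name__ == "__main__":
    # Hypothetical codon with a T reference base at the third position.
    print(consequences_per_allele("GAT", 2, ["C", "A"]))
```

With this hypothetical codon the C allele leaves the amino acid unchanged while the A allele changes it, which is the same pattern as in the example above.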
Another improvement we’ve made is that “under the hood” we now use terms defined in the Sequence Ontology (SO) to describe the consequence types. Moving to this set of externally maintained terms should make it easier to compare Ensembl annotations with those from other groups. The SO also groups the various terms we use into a hierarchical tree and, in the future, this will let users query for variants with particular effects in a much smarter way than is possible now. On the website we are still using our old terms by default, but you can see the mapping between the old terms and the SO terms on the variation documentation page, and you can use “Configure this page” on several variation views to choose which set of terms you want to see (here’s an example).
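To show why a hierarchy makes querying smarter, here is a toy Python sketch. The parent links below are a deliberately simplified assumption for illustration (the real SO is larger and a term can have more than one parent), and the variant names are invented.

```python
# Simplified, assumed parent links; consult the SO itself for the real ones.
PARENT = {
    "missense_variant": "coding_sequence_variant",
    "synonymous_variant": "coding_sequence_variant",
    "stop_gained": "coding_sequence_variant",
    "coding_sequence_variant": "transcript_variant",
    "intron_variant": "transcript_variant",
}


def is_a(term, ancestor):
    """Return True if term equals ancestor or descends from it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def select(variants, ancestor):
    """Return the names of variants whose consequence falls under ancestor."""
    return [name for name, term in variants if is_a(term, ancestor)]


if __name__ == "__main__":
    # Hypothetical variants, each annotated with a single consequence term.
    variants = [
        ("var1", "missense_variant"),
        ("var2", "intron_variant"),
        ("var3", "stop_gained"),
    ]
    print(select(variants, "coding_sequence_variant"))
```

A single query on a parent term picks up every more specific term beneath it, so there is no need to list each consequence by hand.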
We also now provide SIFT and PolyPhen predictions for any variant that is predicted to cause an amino acid substitution in human. These are popular tools developed by external groups that try to predict the effect of a non-synonymous mutation on the function of the protein. You can see these predictions on several variation views; a useful example is the protein variation view. You can find more information about these tools and how we run them in Ensembl on the variation documentation page.
All of these improvements are also available for you to use to analyse your own data using the Variant Effect Predictor (VEP). The VEP has new configuration options that allow you to choose which set of terms you want to use for the consequence annotations, and also offers options to fetch SIFT and PolyPhen predictions for any missense mutations in your data. We are able to provide these predictions for novel mutations by computing the predictions from SIFT and PolyPhen for all possible amino acid substitutions in human proteins and storing these in the variation database. We hope that this makes the VEP even more useful for mining your data, and we have plans to add support for these sorts of tools in other species in the near future.
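The precomputation idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are invented, `toy_predict` is a placeholder standing in for a real SIFT or PolyPhen run, and the peptide is a toy sequence.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_substitutions(protein_seq):
    """Yield (position, ref, alt) for every possible single substitution.

    Positions are 1-based; each residue has 19 possible alternatives.
    """
    for position, ref in enumerate(protein_seq, start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield position, ref, alt


def build_lookup(protein_seq, predict):
    """Precompute predictions so a novel variant becomes a dictionary lookup."""
    return {
        (position, alt): predict(position, ref, alt)
        for position, ref, alt in all_substitutions(protein_seq)
    }


def toy_predict(position, ref, alt):
    """Placeholder only; a real pipeline would call the prediction tool here."""
    return f"{ref}{position}{alt}"


if __name__ == "__main__":
    lookup = build_lookup("MKT", toy_predict)
    print(len(lookup))       # 3 residues x 19 alternatives
    print(lookup[(2, "R")])  # prediction for K -> R at position 2
```

Because every substitution is stored in advance, a mutation never seen before can still be annotated without rerunning the tools.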
A new section has been added to the variation page in Ensembl 58 that allows you to find other variations in strong linkage disequilibrium with the SNP you are viewing.
Clicking on “Linked variations” from the menu on the left hand side of the variation page takes you to a view like this one for rs1333049. Linkage disequilibrium values are calculated on the fly and presented in one table for each population.
The table shows both r² and D' values, along with the distance between the linked and current variations, any overlapping genes and any phenotypes associated with the linked variations. The table can be sorted by any of these columns by clicking on the column header (see previous post). The view is extensively configurable: clicking on “Configure this page” allows you to select populations to be displayed, change the distance over which linked variations are looked for, and filter the variations returned.
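For readers who want to see what these two measures are, here is a minimal Python sketch using the textbook definitions. It assumes phased haplotypes at two biallelic sites, which is a simplification: it is not a description of how Ensembl calculates the values from genotype data, and the function name `ld_stats` and the input are invented for this example.

```python
def ld_stats(haplotypes):
    """Return (r_squared, d_prime) for two biallelic sites.

    haplotypes is a list of (allele_at_site_1, allele_at_site_2) pairs.
    Returns None when either site is monomorphic, as LD is then undefined.
    """
    n = len(haplotypes)
    first, second = haplotypes[0]  # the alleles whose frequencies we track
    p_a = sum(h[0] == first for h in haplotypes) / n
    p_b = sum(h[1] == second for h in haplotypes) / n
    p_ab = sum(h[0] == first and h[1] == second for h in haplotypes) / n

    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return None

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest value it could take for these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r squared is the squared correlation between the two sites.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r_squared, d_prime


if __name__ == "__main__":
    # Hypothetical haplotypes in which the two sites always travel together.
    print(ld_stats([("A", "G"), ("A", "G"), ("C", "T"), ("C", "T")]))
```

Both values range from 0 to 1, with values near 1 indicating that the alleles at the two sites are almost always inherited together.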
This view is currently only available for Ensembl Human, and is limited to variations with enough associated genotypes to calculate linkage disequilibrium values.
Ensembl is always extending the variation pages to include more information. Did you know that the latest data from SNPedia is now available?
SNPedia is a wiki-style resource for human genetics with public annotation of over 11,000 SNPs, released under a Creative Commons style license. We have integrated it into Ensembl, so you can view these SNP reports along with our other information including variations, genotype and allele frequencies from dbSNP, and SNPs from other sources including UniProt, Affymetrix and Illumina chipsets and phenotype annotations from several genome-wide association studies.
You need to configure the page to view SNPedia. From the variation page, e.g. rs1333049, click on “Configure this page” and then click on “External Data” to select SNPedia to appear in the left hand side menu of all variation pages via DAS. As this information comes directly from SNPedia via DAS it is always up-to-date.
Did you know that you can use the Ensembl API to predict the consequences of your own SNP positions? This is a really popular question and there is some example code on the website to guide you through this. See an example here. This functionality is available from Ensembl release 56, but we have also recently patched release 54 in case you need to use the NCBI 36 human assembly.
Soon there will be a page on the website where you can upload your data and we will predict SNP consequences for you.