Cool stuff the Ensembl VEP can do: getting and using allele frequency data

Allele frequency data is important for variant prioritisation because it helps to identify variants that are less likely to cause a phenotype or disease. With the Ensembl VEP, you can get allele frequencies for known variants that are identical to the variants you analysed, and you can use those frequencies to filter the results of your analysis.

If a variant is observed at a high frequency in healthy populations, such as those sampled by the 1000 Genomes Project, it is unlikely to be causal for a severe phenotype. Allele frequencies can vary significantly between populations of different ancestries, so it is crucial to have ‘baseline’ information from healthy individuals of different populations available and to take it into account in clinical and common disease research settings.

To get allele frequencies for co-located known variants, use the options in the ‘Variants and frequency data’ section of the VEP online tool. Hover over each option to see its definition. You can choose to include allele frequencies from the 1000 Genomes Project (global minor allele frequency and continental allele frequencies), the NHLBI Exome Sequencing Project (ESP) and the Genome Aggregation Database (gnomAD).

To use allele frequencies to pre-filter VEP results, use the ‘Filtering options’ of the VEP online tool. You can either exclude common variants, defined as those with a minor allele frequency (MAF) of > 1% in phase 1 of the 1000 Genomes Project, or apply advanced filters. Advanced filtering lets you either exclude variants from the results or include only variants whose MAF in a specific population is greater than, or less than, a frequency threshold you specify. You can choose from the combined and continental 1000 Genomes populations or the African-American and European-American ESP populations. Pre-filtering removes variants before the results are displayed, so you will not see all possible results. If you want to try different frequency cut-offs, skip pre-filtering and use the filter tool on the results table instead, as this will be a lot quicker.
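The logic of the advanced filter (exclude or include, greater or less than, for one population) can be illustrated with a short Python sketch. This is not VEP code: the function name, the dictionary layout and the example frequencies are all hypothetical, and the treatment of variants without any frequency data (kept when excluding, dropped when including) is an assumption made for this illustration only.

```python
def frequency_filter(variants, population, threshold,
                     comparison="gt", action="exclude"):
    """Return the variants that survive a frequency filter.

    variants   -- list of dicts with a 'frequencies' mapping (hypothetical layout)
    population -- population key to test, e.g. "EUR"
    threshold  -- frequency cut-off as a fraction (0.01 means 1%)
    comparison -- "gt" (frequency greater than threshold) or "lt" (less than)
    action     -- "exclude" matching variants or "include" only matching ones
    """
    if comparison not in ("gt", "lt"):
        raise ValueError("comparison must be 'gt' or 'lt'")
    if action not in ("exclude", "include"):
        raise ValueError("action must be 'exclude' or 'include'")

    kept = []
    for variant in variants:
        freq = variant.get("frequencies", {}).get(population)
        # Assumption: a variant with no frequency for this population
        # never counts as matching the condition.
        if freq is None:
            matches = False
        elif comparison == "gt":
            matches = freq > threshold
        else:
            matches = freq < threshold
        # "exclude" keeps non-matching variants; "include" keeps matching ones.
        if (action == "exclude") != matches:
            kept.append(variant)
    return kept


# Made-up example data, for illustration only.
variants = [
    {"id": "var_a", "frequencies": {"AFR": 0.20, "EUR": 0.002}},
    {"id": "var_b", "frequencies": {"AFR": 0.004, "EUR": 0.03}},
    {"id": "var_c", "frequencies": {}},
]

# Exclude variants with a frequency above 1% in the EUR population.
rare_in_eur = frequency_filter(variants, "EUR", 0.01, "gt", "exclude")
print([v["id"] for v in rare_in_eur])
```

Note how var_a is common in one population but rare in the other, which is why the choice of population matters as much as the threshold.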

Of course, you can also get allele frequencies with the VEP command line tool (--af) and use them for pre-filtering (--filter_common or --check_frequency) or post-filtering (with the filter script). The command line tool allows you to use a wider range of populations than the web or REST interfaces, and it can also report the highest allele frequency observed in any population from 1000 Genomes, ESP or gnomAD (--max_af). Have a look at our documentation for all details.
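As a rough illustration of how the command line options mentioned above might be combined, the following Python sketch assembles the argument lists without running anything. The file names are placeholders, the use of `--cache` assumes a local VEP cache is installed, and the helper functions are hypothetical. The `filter_vep` invocation and its filter expression are a sketch of post-filtering; the flags available depend on your VEP version, so check the documentation before relying on them.

```python
import shlex


def build_vep_command(input_path, output_path,
                      report_af=True, report_max_af=False,
                      filter_common=False):
    """Assemble a VEP command line as a list of arguments (hypothetical helper)."""
    # Assumption: frequency data are read from a locally installed cache.
    cmd = ["vep", "-i", input_path, "-o", output_path, "--cache"]
    if report_af:
        cmd.append("--af")             # report allele frequencies
    if report_max_af:
        cmd.append("--max_af")         # highest frequency in any population
    if filter_common:
        cmd.append("--filter_common")  # pre-filter: drop common variants
    return cmd


def build_filter_command(input_path, output_path, expression):
    """Assemble a post-filtering command for the filter script (hypothetical helper)."""
    return ["filter_vep", "-i", input_path, "-o", output_path,
            "--filter", expression]


# Annotate with frequencies first, then try a cut-off on the finished output.
annotate = build_vep_command("variants.vcf", "annotated.txt", report_max_af=True)
post_filter = build_filter_command("annotated.txt", "rare.txt", "AF < 0.01 or not AF")

print(shlex.join(annotate))
print(shlex.join(post_filter))
```

Annotating once and post-filtering the output means a different cut-off only requires re-running the filter step, not the whole annotation.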