Ensembl VEP calculates the location and likely impact of variant alleles on genes, producing extensive annotations, but there are now a huge number of human transcripts to consider. The new GENCODE Primary transcript set streamlines variant annotation, saving time in both the analysis itself and in filtering and interpreting the results.
Transcript choice is very important in variant interpretation, and both the Ensembl/GENCODE and RefSeq sets are available in Ensembl VEP. MANE Select transcripts, which are the recommended default transcripts for each gene to be used for variant reporting, are also highlighted. However, as these don’t contain all of the observed exons, you need to consider a fuller transcript set in your analysis to capture all potential variant impacts. The new GENCODE Primary set captures all protein-coding exons with evidence of evolutionary constraint and high expression in a minimal set of transcripts, which includes all the MANE and canonical transcripts.
To only annotate your variants against these transcripts:
- in the web interface, select ‘Ensembl/GENCODE primary transcripts’ as the transcript database to use
- if you are running the command line tool, use the --gencode_primary option
- when using the REST API, add gencode_primary=1 to your request
This will reduce analysis time and output size by only predicting molecular consequences for the reduced transcript set. In Ensembl 113, GENCODE Primary includes only protein-coding transcripts, so results are not reported for non-coding transcripts. Ensembl 114 and later versions also include non-coding transcripts, giving a comprehensive annotation set.
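As a rough illustration of the command line and REST options above, the Python sketch below only builds the command and the request URL; it does not run Ensembl VEP or contact the server. The file names, the choice of the HGVS endpoint and the helper function names are assumptions made for this example. The --gencode_primary flag and the gencode_primary=1 parameter are the ones described above.

```python
import shlex
from urllib.parse import quote, urlencode

REST_SERVER = "https://rest.ensembl.org"


def build_vep_command(input_vcf, output_file):
    """Argument list for a cache-based command line run limited to GENCODE Primary.

    Hypothetical helper: the file paths are placeholders supplied by the caller.
    """
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_file,
        "--cache",
        "--gencode_primary",  # only report consequences for GENCODE Primary transcripts
    ]


def build_rest_url(hgvs_notation, species="human"):
    """URL for the VEP HGVS REST endpoint with gencode_primary=1 added.

    Hypothetical helper: the endpoint choice (HGVS rather than region or ID)
    is an assumption made for this sketch.
    """
    path = f"/vep/{species}/hgvs/{quote(hgvs_notation, safe=':')}"
    query = urlencode({"content-type": "application/json", "gencode_primary": 1})
    return f"{REST_SERVER}{path}?{query}"


if __name__ == "__main__":
    # Print the shell command and the request URL for inspection.
    print(shlex.join(build_vep_command("variants.vcf", "variants.vep.txt")))
    print(build_rest_url("ENST00000366667:c.803C>T"))
```

The printed command can be pasted into a shell where Ensembl VEP and its cache are installed, and the URL can be fetched with any HTTP client.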
If you are running the command line tool, you can instead analyse all transcripts and flag those in the minimal set by using the --flag_gencode_primary option. You can then use filter_vep (--filter "GENCODE_PRIMARY = 1") to restrict the results, removing protein-coding transcripts that are not in the minimal set.
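The flag-then-filter workflow could be sketched as two commands run one after the other. The Python snippet below only assembles and prints them; the file names and the helper function are assumptions for illustration, while --flag_gencode_primary and the filter expression are taken from the text above.

```python
import shlex


def build_flag_and_filter_commands(input_vcf, annotated_file, filtered_file):
    """Return the two argument lists: annotate everything, then keep the flagged set.

    Hypothetical helper: all three file paths are placeholders.
    """
    annotate = [
        "vep",
        "--input_file", input_vcf,
        "--output_file", annotated_file,
        "--cache",
        "--flag_gencode_primary",  # annotate all transcripts, marking the primary ones
    ]
    keep_primary = [
        "filter_vep",
        "--input_file", annotated_file,
        "--output_file", filtered_file,
        "--filter", "GENCODE_PRIMARY = 1",  # keep only lines flagged as GENCODE Primary
    ]
    return annotate, keep_primary


if __name__ == "__main__":
    commands = build_flag_and_filter_commands(
        "variants.vcf", "variants.all.txt", "variants.primary.txt"
    )
    for command in commands:
        # shlex.join quotes the filter expression so it survives the shell.
        print(shlex.join(command))
```

Keeping the full annotated file means the complete transcript set remains available if you later need to look beyond the minimal set.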
Note: the Ensembl browser now only displays variant consequences for the GENCODE Primary transcript set, so if you analyse a known variant against one of the other transcript sets in Ensembl VEP, you will see more results. A VCF file containing all the GRCh38 variants shown on the Ensembl browser, annotated against the full transcript set, is available on the Ensembl FTP site: https://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/all_transcripts/
Authored by Jamie Allen and Sarah Hunt, edited by Aleena Mushtaq
