Ensembl 98 (and Ensembl Genomes 45) are due out next month, so it’s time to pig-out on the tasty morsels we have to offer. As with all releases, we cannot guarantee that anything listed here will make it into the final release.
As usual, human will have a GENCODE update, bringing it up to GENCODE 32. An exciting development in GENCODE 32 will be the addition of twelve stop codon readthroughs. These are protein-coding transcripts where we know that the first stop codon in the coding sequence can be translated by the ribosome. This allows translation to continue to the next stop codon in the mRNA, thus generating an extended protein isoform. It is currently unknown exactly how the first stop codon is translated (for example, whether it encodes a specific amino acid or one chosen at random), so we represent this codon as an X in the protein sequence.
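To make the X convention concrete, here is a minimal Python sketch of how a readthrough coding sequence could be translated. It is an illustration only, not the GENCODE or Ensembl implementation: the function name `translate_with_readthrough` and the toy sequence are hypothetical, and the sketch assumes the standard genetic code and an input that starts in frame.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_readthrough(cds: str) -> str:
    """Translate an in-frame sequence, reading through the first stop codon.

    Hypothetical helper: the first stop codon is written as 'X' (its
    translated amino acid is unknown) and translation ends at the second.
    """
    protein = []
    stops_seen = 0
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            stops_seen += 1
            if stops_seen == 2:
                break  # the second stop codon terminates the extended isoform
            aa = "X"  # the first stop codon is read through
        protein.append(aa)
    return "".join(protein)


# Toy example: ATG GCT TGA AAA TAA
print(translate_with_readthrough("ATGGCTTGAAAATAA"))  # prints MAXK
```

In the toy sequence, TGA is the first stop codon and appears as X, and translation carries on until TAA.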
We will also be updating our human variation database to dbSNP152. There are a lot of changes relating to dbSNP’s new SPDI format, which will be detailed in a separate blog post.
We have updated annotation for some of our mammals: dog, cat, horse, rabbit, grey short-tailed opossum, marmoset and rhesus monkey. There’s also a new genome assembly for Xenopus tropicalis.
But hogging the limelight this release is pig, which hams it up with a selection of eleven breeds: Hampshire, Jinhua, Berkshire, Large White, Landrace, Pietrain, Rongchang, Meishan, Tibetan, Wuzhishan and Bamei.
We also have a number of new fish genomes: Channel bull blenny, Indian glassy fish, denticle herring, Siamese fighting fish, blunt-snouted clingfish, Atlantic herring, Reedfish and large yellow croaker. We have also added the Pachon cavefish, which is the cave-dwelling strain of Astyanax mexicanus.
We’ve got new genomes in Ensembl Plants for caffeine addicts (Coffea canephora), foodies (Capsicum annuum and Cynara cardunculus) and grain lovers (Eragrostis tef). For wheat researchers, we now have durum wheat Triticum turgidum (cultivar Svevo, tetraploid AABB).
The rice Oryza sativa japonica annotation, recently imported from RAP-DB, has gained the genes annotated in the organelle genomes.
Wheat also gains gene names and expression data as external references. We will also add PEATmoss references for Physcomitrella patens and KnetMiner references for a number of species. Tomato and cacao will also gain gene descriptions.
We have one new genome in Ensembl Metazoa: the marine worm Hofstenia miamia.
Malaria researchers will see that the Plasmodium falciparum genome, ASM276v2, has been repaired with a corresponding update to the annotation. We also have a new species: Pseudo-nitzschia multistriata, a marine planktonic diatom.
If you work with the offline version of the Ensembl VEP, you’ll need to update htslib to 1.9 and Bio::DB::HTS to v2.11 for e98. VEP will report allele-specific clinical significance assertions by default; to avoid filtering by allele, use the option --clin_sig_allele 0.
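As a rough illustration of where that option goes, the Python sketch below assembles an offline VEP command line. Note that the option is written with two ASCII hyphens. The helper `build_vep_command`, the file names, and the assumption that the `vep` executable is on your PATH with a local cache installed are all hypothetical; adjust them to your own setup.

```python
def build_vep_command(input_vcf, output_file, allele_specific=True):
    """Build an argument list for an offline VEP run (hypothetical helper).

    Assumes a 'vep' executable on the PATH and an installed local cache.
    """
    cmd = [
        "vep",
        "--cache",
        "--offline",
        "--input_file", input_vcf,
        "--output_file", output_file,
    ]
    if not allele_specific:
        # Switch off the allele-specific clinical significance reporting.
        cmd += ["--clin_sig_allele", "0"]
    return cmd


# Example file names are placeholders.
print(" ".join(build_vep_command("input.vcf", "output.txt", allele_specific=False)))
```

The list can be passed to `subprocess.run` once the paths point at real files.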
Post-GWAS Analysis Pipeline Tool
We are pleased to be launching a new beta version of the Ensembl Post-GWAS Analysis Pipeline as an online tool. In the same way that the VEP allows you to upload a VCF file and annotate the variants, the Post-GWAS Analysis Pipeline allows you to upload a tab-delimited file with GWAS summary statistics. The variant p-values and effect sizes are then fine-mapped and colocalised with GTEx eQTL summary statistics, to highlight likely causal gene candidates and the tissue where this effect takes place.