We’re pleased to announce the release of Ensembl 98, and the corresponding Ensembl Genomes release 45. We have a new Post-GWAS Analysis tool and a drove of pig and fish genomes.
Human gene annotation
Release 98 brings us up to GENCODE 32 on our human genome annotation. As always, this brings a number of new genes and changes to existing ones.
Human variation data
Another change in human is to the variation database. Reflecting changes from dbSNP in their release of dbSNP152, you’ll see some changes to insertions and deletions. Specifically, equivalent variants and expansions/retractions of the same tandem repeat have been merged into one variant identifier, while different kinds of variants at the same locus, such as insertions and deletions of other bases, have been split into another variant identifier. Another change is the discontinuation of HapMap population frequencies. We have a detailed blog post explaining these changes.
In this release we have made variants from the Genome Aggregation Database (gnomAD) available as the default track in our Region in Detail view. The ‘1000 genomes – all’ track, which was previously on by default, is not available in this release, and neither are the ‘All LSDB-associated variants’ and ‘Genotyping chip variants’ tracks. Smaller groupings, such as the set of variants the 1000 Genomes Project identified in the African population, are still available. These three sets will not be available as variation set filters or attributes in BioMart either. They will be back again in our next release.
Post-GWAS Analysis Pipeline Tool
For analysis of human GWAS data, we have our new Post-GWAS Analysis Pipeline. This is a new online tool, which allows you to upload summary statistics from GWAS analyses, including variant IDs, p-values and beta values. The Post-GWAS Analysis Pipeline identifies all variants in linkage disequilibrium with the variants in your input, based on 1000 Genomes genotypes. It then identifies all genes affected by those variants, both based on VEP analysis and by comparing to the GTEx eQTLs in different tissues, to score the genes for the probability that they cause the phenotype investigated by the GWAS.
This tool has been launched as a beta version and we’re keen to get your feedback on it.
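To make the first step described above more concrete, here is a minimal sketch of expanding a set of GWAS lead variants to the variants in linkage disequilibrium with them. It is an illustration only and not the pipeline's actual code: the variant IDs, the toy r² table, the function name expand_by_ld and the 0.8 threshold are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Toy pairwise LD values (r^2), keyed by unordered variant pair.
# The IDs and values are hypothetical, not real 1000 Genomes data.
TOY_LD: Dict[FrozenSet[str], float] = {
    frozenset({"rsA", "rsB"}): 0.92,
    frozenset({"rsA", "rsC"}): 0.41,
    frozenset({"rsD", "rsE"}): 0.85,
}


def expand_by_ld(
    lead_variants: Iterable[str],
    ld: Dict[FrozenSet[str], float],
    min_r2: float = 0.8,  # assumed threshold, chosen only for illustration
) -> Set[str]:
    """Return the lead variants plus every variant paired with one of them at r^2 >= min_r2."""
    leads = set(lead_variants)
    expanded = set(leads)
    for pair, r2 in ld.items():
        # Only pairs that include a lead variant count; no transitive expansion.
        if r2 >= min_r2 and pair & leads:
            expanded |= pair
    return expanded


print(sorted(expand_by_ld(["rsA"], TOY_LD)))
```

In this sketch, only direct LD partners of the input variants are collected; the gene-scoring steps that follow in the real tool are not modelled here.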
Our big news (or should I say pig news) is whole genome assemblies and annotation available for 11 pig breeds: Hampshire, Jinhua, Berkshire, Large White, Landrace, Pietrain, Rongchang, Meishan, Tibetan, Wuzhishan and Bamei. These are each available as separate genomes with their own genes annotated, based on the reference pig genome Sscrofa11.1, which also has updated gene annotation. We have a new EPO multiple genome alignment of all the new pigs, plus the reference pig, the USMARC pig genome which came out in release 97, and the related agricultural species sheep, cow and horse. We have also computed a specific set of gene trees for those genomes.
If you think it’s better down where it’s wetter, then you’ll be pleased to hear about our nine new fish genomes. We now have genomes for Channel bull blenny, Indian glassy fish, denticle herring, Siamese fighting fish, blunt-snouted clingfish, Atlantic herring, large yellow croaker and reedfish. In addition to the Mexican tetra genome we already have, we have added the genome of the blind cave-dwelling Pachon strain of the same species. As a result, both of our fish multiple genome alignments have been recalculated, now with 33 fish in the high coverage EPO and 60 fish in the low coverage EPO.
Some mammals have new gene annotation: dog, cat, horse, rabbit, grey short-tailed opossum, marmoset and rhesus monkey. We also have a new genome assembly and gene annotation for the model frog, Xenopus tropicalis.
We continue to improve our resources on crops, and in this release we have added the genomes of coffee (Coffea canephora), bell pepper (Capsicum annuum), artichoke (Cynara cardunculus), tef (Eragrostis tef) and durum wheat (Triticum turgidum cultivar Svevo). Triticum turgidum is of particular interest to wheat researchers because of its evolutionary relationship to bread wheat and its AABB tetraploid genome.
We’ve also added chloroplast and mitochondrial gene annotation to rice (Oryza sativa japonica). Wheat (Triticum aestivum) has been updated with gene names and expression data, and tomato (Solanum lycopersicum) and chocolate (Theobroma cacao) have new gene descriptions.
We have one new genome in Ensembl Metazoa: the marine worm Hofstenia miamia, a model organism for the study of wound regeneration.
Ensembl Protists now contains Pseudo-nitzschia multistriata, a marine planktonic pennate diatom capable of producing domoic acid, a neurotoxin that can contaminate seafood and cause a syndrome called amnesic shellfish poisoning. The life cycle of this species includes a sexual phase and its genetics are controllable. The availability of this genome enables explorations into the molecular processes of diatoms, gene function and toxin production. We have also updated Plasmodium falciparum (ASM276v2), the deadliest species of Plasmodium causing malaria in humans. There has been a fresh import of PHI-base annotation across all 236 protist genomes and new comparative genomics and pan-taxonomic databases.
For improved annotation of clinical significance assertions from ClinVar, Ensembl VEP now reports only the clinical significance associated with your input allele by default. If you’re working with the offline Ensembl VEP, you can still get all clinical assertions, regardless of allele, using --clin_sig_allele 0.
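As a rough sketch of how this might look when scripting an offline run, the helper below assembles a VEP command line and adds --clin_sig_allele 0 on request. The helper name build_vep_command and the file names are hypothetical. The other flags (--cache, --offline, --check_existing) are our assumption about a typical offline setup that reports co-located variant data, so please check them against your own installation.

```python
from typing import List


def build_vep_command(
    input_file: str,
    output_file: str,
    all_clin_sig: bool = False,
) -> List[str]:
    """Assemble an argument list for an offline VEP run (illustrative only)."""
    cmd = [
        "vep",
        "--input_file", input_file,
        "--output_file", output_file,
        "--cache",           # assumed: annotations come from a local cache
        "--offline",         # assumed: no database connection
        "--check_existing",  # assumed: needed to report co-located known variants
    ]
    if all_clin_sig:
        # Report every clinical significance assertion at the locus,
        # not only those matching the input allele.
        cmd += ["--clin_sig_allele", "0"]
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(build_vep_command("variants.vcf", "annotated.txt", all_clin_sig=True)))
```

Building the command as a list rather than a single string means it can be passed straight to subprocess.run without any shell quoting.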
We’re also moving with the times and have updated our htslib and Bio::DB::HTS dependencies. If you’re working with Ensembl VEP offline, you’ll need to update these to 1.9 and v2.11 respectively when you update your Ensembl VEP to e98.
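If you manage several installations, a small check like the one below may help confirm that the versions you have are recent enough. This is a sketch under stated assumptions: it treats 1.9 and 2.11 as minimum versions and compares version components as integers, and it expects you to supply the installed version strings yourself, since how you obtain them depends on your setup. The names parse_version and meets_minimum are our own.

```python
import re
from typing import Tuple

# Versions named in this release; treating them as minimums is an assumption.
MIN_HTSLIB = (1, 9)
MIN_BIO_DB_HTS = (2, 11)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings such as '1.9', 'v2.11' or '1.10.2' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Compare component by component, so that 1.10 counts as newer than 1.9."""
    return parse_version(installed) >= minimum


# Example version strings, supplied by hand for illustration.
print(meets_minimum("1.10", MIN_HTSLIB))
print(meets_minimum("v2.9", MIN_BIO_DB_HTS))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "1.10" would sort before "1.9".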