Ensembl 91 is scheduled for December 2017 and we’re continuing our push to include the genome annotation for lots of new species. This time, we’re adding a whole new set of primate species to Ensembl.

Here’s what you can look forward to:

New assemblies, gene sets and annotations

  • Annotation of 12 new primate genomes, as well as updates to 6 existing genomes:
    • Nancy Ma’s night monkey
    • White-headed capuchin
    • Sooty mangabey
    • Angola colobus
    • Crab-eating macaque
    • Southern pig-tailed macaque
    • Drill
    • Bonobo
    • Coquerel’s sifaka
    • Black snub-nosed monkey
    • Golden snub-nosed monkey
    • Black-capped squirrel monkey
    • Chimpanzee (update)
    • Gibbon (update)
    • Gorilla (update)
    • Mouse lemur (update)
    • Olive baboon (update)
    • Tarsier (update)
  • Annotation on the latest Cat genome assembly, Felis_catus_8.0
  • C. elegans gene set and annotation updated to Wormbase release WS260
  • Fruitfly gene set and annotation updated to Flybase release FB2017_04 (dmel_r6.17)
  • Updated Human cDNA alignments
  • Updated Mouse cDNA alignments
  • Updated microarray probe mappings and comparative genomics analyses for all new and updated species

Other updates and highlights

  • Updating our human variation database with:
    • COSMIC 82 somatic variants
    • HGMD 2017.2
    • DGVa structural variants
    • Phenotypes from NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Cosmic Gene Census, DDG2P, MIM Morbid and Orphanet
  • In other species we also have variation updates as follows:
    • dbSNP 150 in macaque, mouse, zebrafish, sheep, pig, horse, cow and chicken
    • DGVa in cow, dog, mouse, horse, macaque, pig, sheep and zebrafish
    • Phenotype updates from relevant databases in rat, zebrafish and mouse
  • Links to PharmGKB added from human variants
  • New web tool for Linkage Disequilibrium (LD) calculation
  • Updated GRCh37 regulatory features

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

The upcoming Ensembl release (e!91) will include several updates to the regulation API, and with them a farewell to many objects that have given the regulation API its characteristic look and feel over the years.

The changes listed below only affect the way we store high-throughput sequencing experiments and their results. Probe-feature-related objects and regulatory features are not affected. If you use any of the following in your scripts, please keep an eye out for our updated doxygen documentation once Ensembl release 91 is out.

ResultSet and InputSubset

The long-serving ResultSet object and its faithful companion, the InputSubset object, will be removed. Over the last few releases these data types have been extensively modified and their contents moved to more specific API objects, until they only served to store information about the read files (InputSubset) and their respective alignments (ResultSet).

From now on alignments are handled by a new API object, called “Alignment”.

The InputSubset object will be replaced by two new objects:

  1. ReadFile
  2. ReadFileExperimentalConfiguration

A ReadFile represents a FASTQ file generated by a high-throughput sequencing experiment, such as ChIP-seq or DNase-seq.

The experimental configuration that led to the creation of the read file is stored in the ReadFileExperimentalConfiguration object. It links the Experiment object to the ReadFiles generated by it and contains the following information:

  • which biological replicate and which technical replicate a ReadFile is within an Experiment,
  • whether it is paired-end, and
  • whether it is the result of multiple sequencing runs of the same sample.

Using the experimental configuration, the Ensembl Regulation Sequence Alignment (ERSA) pipeline decides how to analyse the various high-throughput sequencing data.
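To make the relationship between these objects concrete, here is a minimal sketch in Python. It is not the Ensembl API (which is Perl) and not how ERSA is implemented: the class layout, the field names (`biological_replicate`, `technical_replicate`, `paired_end`, `multiple_runs`) and the helper `group_by_replicate` are all assumptions made for illustration, based only on the four pieces of information listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReadFile:
    """A FASTQ file from a sequencing experiment (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class ReadFileExperimentalConfiguration:
    """Links one ReadFile to its Experiment (hypothetical field names)."""
    experiment: str            # name of the Experiment the file belongs to
    read_file: ReadFile
    biological_replicate: int
    technical_replicate: int
    paired_end: bool
    multiple_runs: bool        # True if the sample was sequenced in several runs


def group_by_replicate(
    configs: List[ReadFileExperimentalConfiguration],
) -> Dict[Tuple[str, int, int], List[ReadFile]]:
    """Collect read files that share experiment and replicate numbers.

    Files in the same group (paired-end mates, or several runs of one
    sample) are candidates for being aligned together.
    """
    groups: Dict[Tuple[str, int, int], List[ReadFile]] = defaultdict(list)
    for config in configs:
        key = (
            config.experiment,
            config.biological_replicate,
            config.technical_replicate,
        )
        groups[key].append(config.read_file)
    return dict(groups)
```

Grouping by replicate is only one plausible way a pipeline could use this information; the point is that the configuration object, not the read file itself, carries the replicate and run details.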

AnnotatedFeature and FeatureSet

In the current API the AnnotatedFeature object represents enriched regions or peaks from ChIP-seq and DNase-seq experiments.

In the future the AnnotatedFeature API object will become the Peak object.

AnnotatedFeature objects used to be accessed by first fetching an appropriate FeatureSet object and then the AnnotatedFeatures linked to it.

A FeatureSet object that links to a set of AnnotatedFeatures represented a peak calling analysis from a ChIP-seq-like experiment. These are now represented by the new PeakCalling object.

DataSet

The venerable DataSet object will be retired and it will not be replaced.

CoordSystem

The CoordSystem object in regulation, not to be confused with the CoordSystem object used for Ensembl core databases, has been retired after many years of service.

It was mostly known for its Adaptor, which gave scary error messages if the Registry had been misconfigured. It could also make features unexpectedly vanish from the website.

There are no plans to replace its function.

Summary

| Current Object | New Object | Notes |
|---|---|---|
| InputSubset | ReadFile, ReadFileExperimentalConfiguration | |
| ResultSet | Alignment | |
| AnnotatedFeature | Peak | |
| FeatureSet | PeakCalling | |
| DataSet | (none) | Retired. |
| CoordSystem | (none) | Retired. Regulation-specific object. Not to be confused with that used for the Ensembl core databases. |
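If you want to check your own scripts against this summary, the sketch below shows one way to do it in Python. The mapping is copied from the summary above; the function name `find_retired_objects` and the matching rule (whole-word object names, optionally followed by `Adaptor`) are assumptions for illustration, not an official Ensembl tool. Note that a `CoordSystem` hit may be a false positive, since the core CoordSystem object is unaffected.

```python
import re
from typing import Dict, List, Tuple

# Old object name -> replacement object(s), taken from the summary.
# An empty list means the object is retired without a replacement.
REPLACEMENTS: Dict[str, List[str]] = {
    "InputSubset": ["ReadFile", "ReadFileExperimentalConfiguration"],
    "ResultSet": ["Alignment"],
    "AnnotatedFeature": ["Peak"],
    "FeatureSet": ["PeakCalling"],
    "DataSet": [],
    "CoordSystem": [],  # regulation-specific; the core CoordSystem is unaffected
}

# Match a retired name as a whole word, optionally followed by "Adaptor".
_PATTERN = re.compile(r"\b(" + "|".join(REPLACEMENTS) + r")(?:Adaptor)?\b")


def find_retired_objects(source: str) -> List[Tuple[int, str, List[str]]]:
    """Return (line number, old name, replacements) for each mention."""
    hits: List[Tuple[int, str, List[str]]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REPLACEMENTS[name]))
    return hits
```

Running `find_retired_objects(open("my_script.pl").read())` on a script of your own (the file name here is a placeholder) gives a list of lines to revisit once the updated documentation is available.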

We are pleased to announce that Ensembl Genomes 36 has now been released, which includes new and updated genome assemblies and gene annotation as well as updated variation data and comparative genomics analyses. Find out more below:

  • Ensembl Bacteria includes 142 more genomes than release 35, together with an update to gene families.
  • Ensembl Fungi has added gene symbols for 1-to-1 orthologues from S. cerevisiae to Botrytis cinerea and includes updated PHI-base 4.3 annotations.
  • Ensembl Metazoa now has automated RNA gene annotation for 37 species (i.e. all species that have not been imported from FlyBase, VectorBase or WormBase) and alignment of Rfam 12.2 covariance models for all species. There are also updated protein features, which now include features from new sources (CDD, MobiDB and SFLD).
  • Ensembl Protists now has new automatic ncRNA alignments across all protist species as well as updated PHI-base 4.3 annotations.
  • Ensembl Plants now includes the new genome assembly for Hordeum vulgare (barley), the biggest diploid yet sequenced, which is included in updated comparative peptide analyses for all species. There are also new ncRNA gene annotations and new plant reactome cross references across all plant species. New and updated variation data have also been included in this release for both Oryza sativa and Arabidopsis thaliana. Last, but not least, 80829 variation markers from the iSelect 90k array and 13.8 million Inter-Homoeologous Variants (IHVs) have been added to the wheat assembly, along with chloroplast and mitochondrial components (including gene annotations) imported from ENA.

Please see the release notes for full details of the updates.

Ensembl 90 is scheduled for August 2017 and it’s set to be our biggest release ever in terms of new genome annotation. Here’s what you can look forward to:

New assemblies, gene sets and annotations

  • Annotation of 15 rodent genomes, including three updates to old genomes:
    • Brazilian guinea pig
    • Chinese hamster
    • Damara mole rat
    • Degu
    • Golden hamster
    • Guinea pig (update)
    • Kangaroo rat (update)
    • Lesser Egyptian jerboa
    • Long-tailed chinchilla
    • Naked mole-rat – we have two different assemblies for naked mole-rat so you can keep working with your preferred genome
    • Northern American deer mouse
    • Prairie vole
    • Squirrel (update)
    • Upper Galilee mountains blind mole rat
  • Bringing in annotation of the well-used rodent cell-line, Chinese Hamster Ovary, and two mouse species, Ryukyu mouse and Shrew mouse.
  • Annotation on the latest Pig genome assembly, Sscrofa11.1
  • Updating the Human gene set to GENCODE 27.
  • Updating the Mouse gene set to GENCODE M15.
  • Adding transcript models from RNA-seq to the gene database and pri-miRNAs to the otherfeatures database in Zebrafish.

Other updates and highlights

  • Updating our human variation database with:
    • COSMIC 81 somatic variants
    • HGMD 2016.4
    • dbSNP 150
    • DGVa structural variants
    • TopMed in GRCh37
    • Phenotypes from NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Cosmic Gene Census, DDG2P, MIM Morbid and Orphanet
  • In other species we also have variation updates as follows:
    • DGVa in Cow, Dog and Mouse
    • Phenotype updates from relevant databases in Cat, Chicken, Chimpanzee, Cow, Dog, Horse, Macaque, Mouse, Pig, Rat, Sheep, Turkey and Zebrafish
  • Updating our microarray probe mappings in:
    • C.intestinalis
    • Caenorhabditis elegans
    • Chicken
    • Chimpanzee
    • Cow
    • Dog
    • Fruitfly
    • Human
    • Macaque
    • Mouse
    • Mouse 129S1/SvImJ
    • Mouse A/J
    • Mouse AKR/J
    • Mouse BALB/cJ
    • Mouse C3H/HeJ
    • Mouse C57BL/6NJ
    • Mouse CAST/EiJ
    • Mouse CBA/J
    • Mouse DBA/2J
    • Mouse FVB/NJ
    • Mouse LP/J
    • Mouse NOD/ShiLtJ
    • Mouse NZO/HlLtJ
    • Mouse PWK/PhJ
    • Mouse SPRET/EiJ
    • Mouse WSB/EiJ
    • Pig
    • Platypus
    • Rabbit
    • Rat
    • Saccharomyces cerevisiae
    • Xenopus
    • Zebrafish

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

We are pleased to announce that Ensembl Genomes 35 has now been released.

New and updated genomic sequences are available in all EG sub-portals, while updated comparative peptide analyses have been performed for Fungi, Metazoa, Plants, and Protists:

  • Ensembl Bacteria now incorporates 2460 new genomes, as well as revised assemblies and annotation for 188 and 234 genomes, respectively;
  • Ensembl Fungi now incorporates more than 100 new genomes, including the Puccinia striiformis f. sp. tritici PST-130 v1.0 assembly from the Joint Genome Institute, and provides updates to existing genomes and annotation. In particular, a new, manually-annotated genebuild, curated by the community using the WebApollo tool, has been added for Botrytis cinerea B05.10;
  • Ensembl Metazoa adds three new genomes, including that of Hessian fly. In addition, orthologue metrics have been calculated for all metazoan species and have been used to compute a set of “high-confidence” orthologues;
  • Ensembl Plants includes a new genome assembly and genebuild for Sorghum bicolor, and an updated genebuild for maize. New variation data are available for bread wheat, as are new comparative peptide analyses for all species;
  • Ensembl Protists contains 11 new genomes, along with revised genomic assemblies for more than 25 other species. Variation data have been newly included for Phaeodactylum tricornutum, and have been updated for Phytophthora infestans and Plasmodium falciparum; new comparative peptide analyses have also been performed.

Please see the release notes for full details of the updates: http://ensemblgenomes.org/info/release-notes/35