What’s New in e86:

Mouse strain genomes

In Ensembl 86, you can now view the annotated genome assemblies, variation data and comparative analyses of 16 different mouse strains, produced by the Mouse Genomes Project. While the GRCm38 assembly (produced from Mus musculus strain C57BL/6J) remains the reference assembly, variants and comparative analyses for the other strains can be viewed through the Gene tab and the Location tab.

You can find the gene trees and orthologue/paralogue predictions for the mouse strains through the Strains option in the menu in the Gene tab. The mouse strain gene tree depicts the evolutionary history of genes (left) and protein alignment (right) for the individual mouse strains and rat.

You can find the variants between these mouse strains through the Strain table option in the menu in the Location tab. The strain table displays the alleles identified at variant positions across the 16 mouse strains.

Updated assemblies, gene sets and annotations

Ensembl 86 also updates the assemblies and gene sets for several species:

  • Human: updated cDNA alignments and RefSeq import
  • Mouse: updated cDNA alignments and RefSeq import
  • Zebrafish: updated gene set and RefSeq import
  • Chicken: updated to the Galgal_5.0 assembly
  • Mouse lemur: updated to the Mmur_2.0 assembly
  • Macaque: updated to the Mmul_8.0.1 assembly

New lincRNA data

New Mobile Site Views

As of release 86, you can view transcripts on the mobile version of Ensembl. You can also view exon sequence, cDNA sequence and protein sequence by clicking on the left-hand arrow.


The gene sequence is also now available to view on mobile devices. Just go to any gene page, click on the left-hand arrow and then choose sequence.


Other News

  • Variation and phenotype databases updated
  • You can now select ‘Manhattan plot’ as an option when configuring bigWig files

A complete list of the changes can be found on the Ensembl website

Find out more about the new release and ask the team questions, in our free webinar: Tuesday 11th October, 4pm BST. Register here.

Ensembl 86 is scheduled for September 2016. Highlights include:

New mouse strains

  • Annotated genome assemblies, variation data and comparative analyses of 16 different mouse strains will be included in Ensembl 86.

Updated assemblies, gene sets and annotations

  • Human: updated cDNA alignments and RefSeq import
  • Mouse: updated cDNA alignments and RefSeq import
  • Zebrafish: updated gene set and RefSeq import
  • Chicken: updating to the Galgal_5.0 assembly
  • Mouse lemur: updating to the Mmur_2.0 assembly
  • Macaque: updating to the Mmul_8.0.1 assembly

New lincRNA data

New GRCh37 tools converted from 1000 Genomes Project

A number of tools previously developed for use in the 1000 Genomes Project browser have now been converted for use with the GRCh37 assembly in Ensembl:

  • Dataslicer tool: gets a subset of data from a BAM or VCF file.
  • Variation pattern finder tool: identifies variation patterns in a chromosomal region of interest for different individuals.
  • Forge analysis tool: takes a list of variants and analyses their enrichment in functional regions from the ENCODE or Roadmap Epigenome project on a tissue-specific basis.
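
As a rough illustration of the kind of subsetting the Dataslicer performs, the sketch below filters the records of an uncompressed VCF text by chromosome and position range. It is not the tool's implementation: the function name `slice_vcf` and the sample records are made up for this example, and a real tool would normally work on compressed, indexed files.

```python
def slice_vcf(lines, chrom, start, end):
    """Yield header lines plus records on `chrom` with start <= POS <= end (1-based)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep every header line so the output is still a VCF
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 1 is the chromosome name, column 2 the 1-based position.
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


# Invented records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs_a\tA\tG\t.\tPASS\t.",
    "1\t250\trs_b\tC\tT\t.\tPASS\t.",
    "2\t120\trs_c\tG\tA\t.\tPASS\t.",
]
subset = list(slice_vcf(example, "1", 90, 200))
```

Here `subset` holds the two header lines and the single record that falls inside 1:90-200.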

Other updates and highlights

  • Variation and phenotype database updates

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

As part of Ensembl 85, we are excited to introduce expression quantitative trait loci (eQTL) data, through our partnership with the Genotype-Tissue Expression (GTEx) project.

The GTEx project aims to identify the influence of genetics on tissue-specific gene expression, i.e. to map correlations between genotype (SNPs) and gene expression levels (RNA-seq). eQTLs are variants which are found to be significantly correlated with differences in gene expression. Though this type of data is still in its infancy, we hope that in time it will allow us to conclusively determine the link between regulatory features and their gene targets.

Thanks to our use of HDF5 technology, we offer the only rapid look-up service across all GTEx SNP-gene association tests. We have included all of the correlated variants, including those that fall short of the significance threshold. The GTEx V6 dataset represents 7051 tissue samples from 44 tissues of 449 donors, and a total of 6 billion data points.

 

GTEx eQTLs in the Ensembl Browser

To view GTEx eQTL data for any gene, navigate to the Gene tab and select ‘regulation’ in the left panel. The display will show one example track of GTEx data for a single tissue. To add GTEx tracks for other tissue types, configure the page, select ‘other regulatory regions’ and choose the tissues you are interested in.

The SNPs are displayed in a Manhattan plot on these tracks and are coloured according to their consequences on the transcript, as determined by the VEP. Clicking on any of the variants will display correlation statistics and a link to the variant tab. Where the SNPs are clustered, clicking will bring up a list of all nearby variants.

GTEx eQTLs via REST API

We have also provided Ensembl REST API endpoints to access these data. Currently, these methods allow you to quickly find the beta correlations and their p-values filtered by gene, SNP and/or tissue. You can also list all the tissue types that are currently available on our server.
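
A minimal sketch of how such a query might be composed in Python is given here. The server address, the `eqtl/id/<species>/<gene>` path, the `tissue`, `variant_name` and `statistic` parameter names, the tissue label and the example gene ID are all assumptions made for illustration and should be checked against the REST documentation. The helper names `eqtl_gene_url` and `fetch_json` are hypothetical, and no request is sent unless `fetch_json` is called.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"  # assumed server address


def eqtl_gene_url(species, gene_id, tissue=None, variant_name=None, statistic=None):
    """Build the URL for an eQTL lookup by gene (the endpoint path is an assumption)."""
    path = "/eqtl/id/{}/{}".format(quote(species), quote(gene_id))
    # Only include the optional filters that were actually supplied.
    filters = {"tissue": tissue, "variant_name": variant_name, "statistic": statistic}
    params = {key: value for key, value in filters.items() if value is not None}
    query = "?" + urlencode(params) if params else ""
    return REST_SERVER + path + query


def fetch_json(url):
    """Send the request and decode the JSON reply (needs network access)."""
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


# Example gene ID and tissue label are placeholders for illustration.
url = eqtl_gene_url("homo_sapiens", "ENSG00000223972",
                    tissue="Whole_Blood", statistic="p-value")
```

Keeping URL construction separate from the network call makes it easy to inspect the query before sending it.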


What’s next?

Currently we are displaying the variants around a gene and their correlation with its expression level. In our next release (e86), on the Variant view, we will display all the genes whose expression levels are correlated with that variant. We will also display the beta effect sizes on the Manhattan plots.
If you have any feedback or questions relating to eQTLs in Ensembl, please contact the helpdesk.

What’s new?

Ensembl Plants now has an archive site, where we will keep selected previous releases of Ensembl Plants publicly available. The first release available on the archive site is release 31, and includes the previous assemblies for wheat and maize.


New assemblies in Ensembl Plants include:

  • A new assembly of the bread wheat genome (TGACv1). The assembly has a scaffold N50 of 88 Kbp and a total length of 13.4 Gbp in contigs greater than 500 bp. Approximately 99,000 genes (99% of the total) annotated on the previous IWGSC Chromosome Survey Sequence Assembly have been mapped to the new assembly
  • An updated assembly of the Zea mays genome (AGPv4)
  • Genome assemblies for 5 new species, including Beta vulgaris (sugar beet), Brassica napus (rapeseed) and Trifolium pratense (red clover)

 

Ensembl Metazoa: Rfam covariance models have been applied to all metazoan genomes, and are shown in the ‘Rfam models’ track in the genome browser. Click on a model to see the description and the secondary structure.


Ensembl Bacteria now includes the latest versions of 41,610 genomes (41,198 bacteria and 412 archaea) from the INSDC archives. In this release we added 2269 new genomes, 15 genomes with updated assemblies, 212 genomes with updated annotation, 906 genomes where the assigned name has changed, and 243 genomes removed since the last release.

Ensembl Fungi has been updated with 47 newly available genomes and now includes 634 genomes from 388 species. PHI-base references have been added where available, as have non-coding RNA matches to Rfam.

25 new genomes have been added to Ensembl Protists, which now includes 178 genomes from 114 species.

You can find more details in the release notes.

What’s New in e85:

  • 30 new human epigenomes from the Roadmap Epigenomics Project
  • Human and mouse: Updated GENCODE set, including manually annotated HAVANA annotation and all CCDS genes
  • Imported symbol names from the Vertebrate Gene Nomenclature Committee (VGNC) for Chimpanzee
  • Improved highlighting options in the Location View for Userdata and Tracks
  • Wasabi Tree viewer

30 New Epigenomes from the Roadmap Epigenomics Project

Roadmap Epigenomics is producing epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease. In Ensembl 85, we have run our regulatory pipeline on Roadmap Epigenomics data for 30 cell/tissue types.

Additionally, the peak calling component of the Ensembl Regulation Sequencing Analysis pipeline has been improved. All of the existing ENCODE and BLUEPRINT data in Ensembl’s Regulation database have been reprocessed.

Human Gene Set Update and New Assembly Patches

The human gene set now corresponds to GENCODE 25 and the assembly has been updated to include new assembly patches for GRCh38.p7.

VGNC Symbols for Chimpanzee

We now import symbol names from the Vertebrate Gene Nomenclature Committee (VGNC) for Chimpanzee. VGNC is an extension of HGNC for standardising naming across vertebrates lacking a nomenclature committee by transferring gene symbols from human to known orthologues. This replaces our own system for naming chimpanzee genes. This has improved our naming of chimpanzee genes as seen in POMGNT2, which was previously named GTDC2 in Ensembl 84.

Userdata and Track Highlighting

We have updated the highlighting functions in Location View:

  • User data highlighting – When you upload your own data to Ensembl, the newly uploaded track will be automatically highlighted. This highlighting will disappear when you hover your cursor over the track
  • Highlight track on hover – When you place the cursor over any track, the whole width of the track will be highlighted
  • Track menu highlight icon – The track menu now contains an icon that allows you to manually turn highlighting on/off


Wasabi Tree Viewer

Wasabi has replaced Jalview as a way to view gene trees and multiple alignments. Clicking on any node within the gene tree will give you the option to ‘View in Wasabi’. 


Clicking on ‘View in Wasabi’ will open a pop-up window with the tree and alignment.

Other News

  • Updated Ensembl-Havana rat gene set; a merge of complete Ensembl gene models and the latest Havana gene annotation
  • Human and mouse CRISPR sites, predicted by Wellcome Trust Sanger Institute Genome Editing (WGE), have been added
  • Human and mouse databases have been updated to dbSNP147 and dbSNP146 respectively
  • Phenotype data updated for several species, including human, mouse, pig and chicken
  • GTEx eQTL data for 14 human tissues has been added to the Gene Regulation view
  • The Allele Frequency Calculator has been migrated from the 1000 Genomes website to our GRCh37 archive. This tool takes a VCF file and a matching sample panel file, and calculates allele frequencies for one or more 1000 Genomes populations for a defined chromosomal region
  • New tool: File Chameleon customises files from the Ensembl FTP server. Current functions include adding ‘chr’ to your chromosome names for use with UCSC’s genome browser and removing long genes
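
To make the ‘chr’ renaming mentioned for File Chameleon concrete, here is a small Python sketch of that kind of transformation on a tab-delimited annotation record. It is an illustration only, not File Chameleon's code: the function name `add_chr_prefix` and the sample record are invented, and special cases are not handled (for example, UCSC names the mitochondrial sequence chrM, whereas Ensembl uses MT).

```python
def add_chr_prefix(line):
    """Prefix the sequence name in column 1 of a tab-delimited record with 'chr'."""
    if line.startswith("#") or not line.strip():
        return line  # leave comment/header and blank lines untouched
    seqname, sep, rest = line.partition("\t")
    if seqname.startswith("chr"):
        return line  # already prefixed, so do nothing
    return "chr" + seqname + sep + rest


# Invented GTF-style record, for illustration only.
record = '1\tensembl\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972";'
converted = add_chr_prefix(record)
```

Applying the function twice leaves the record unchanged, so it is safe to run on files that are already partly converted.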

A complete list of the changes can be found on the Ensembl website.

Find out more about the new release and ask the team questions, in our free webinar: Wednesday 27th July, 4pm BST. Register here.

Ensembl 85 is scheduled for July 2016. Highlights include:

Updated gene sets and annotations

  • Human: updated to GENCODE release 25, new CCDS import
  • Mouse: updated to GENCODE release M10
  • Rat: updated Ensembl/HAVANA gene set
  • Vega 65 annotation added for human, mouse and rat
  • C. elegans gene set and other annotations updated from WormBase release WS250
  • Zebrafish: new development and tissue-specific RNA-seq tracks
  • Armadillo/Dog/Ferret: new lincRNA models

Variation data imports and updates

  • dbSNP updates for human (v147) and mouse (v146)
  • COSMIC v77 data update
  • New and updated structural variant studies from DGVa for human and dog
  • Updated phenotype data for several species, including human, mouse, rat, zebrafish and cat

Regulation data

  • 23 new human epigenomes from the Roadmap Epigenomics Project
  • ENCODE and BLUEPRINT data reprocessed with improved peak-calling pipeline

New web features

  • Track highlighting for newly displayed tracks and on hover-over
  • Summary information from SNPedia is included on Variation Summary pages
  • Wasabi will replace Jalview for gene trees and multiple sequence alignment visualisation
  • Web code for session records will be migrated to use Rose ORM

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

We have scheduled the next releases of Ensembl (Release 85) and Ensembl Genomes (Release 32) for July 2016. Details of the declared intentions will be announced nearer the time.

Please contact the helpdesk if you have any questions or feedback.

What’s new in Ensembl Genomes 31?

There are legs and tentacles everywhere in this release of Ensembl Metazoa, as ten new species scuttle, swim and slither into our databases. From the Antarctic midge to the California two-spot octopus, the new species illustrate the diversity of metazoa. Our new Metazoan species also include dog and rat parasites (the itch mite and a nematode), as well as species that pose significant problems for agriculture (Australian sheep blowfly) and aquaculture (the salmon louse and a myxosporean). The common bumblebee is an important pollinator, a brachiopod represents a new phylum in Ensembl Metazoa, while the African social velvet spider is a fascinating model of sociality and is the first spider in Ensembl Genomes.

New species: Belgica antarctica, Bombus impatiens, Lingula anatina, Lucilia cuprina, Octopus bimaculoides, Sarcoptes scabiei, Stegodyphus mimosarum, Strongyloides ratti, Lepeophtheirus salmonis, Thelohanellus kitauei.

Not to be outdone, Ensembl Protists is now updated to 158 genomes from 104 species and Ensembl Bacteria has been updated to include the latest versions of 39,584 genomes (39,183 bacteria and 401 archaea) from the INSDC archives.

Other news

Fungi: Updated annotations based on PHI-base 4.0 have been included, and new variation data have been added for Schizosaccharomyces pombe.

Protists: Addition of 4 protist species for pan-taxonomic comparative analysis (Monosiga brevicollis, Thecamonas trahens, Cryptomonas paramecium and Chondrus crispus), meaning that Ensembl Compara now includes protists from all the major Eukaryotic clades.

Plants: There are now 350,000 new rice variations across 3,000 rice accessions from 89 different countries as well as track hubs for more than 900 public RNA-Seq studies, totalling more than 16,000 tracks across 35 different plant species.

Metazoa: Updated gene sets for the leaf cutter ant, red fire ant and the two-spotted spider mite, as well as updated gene sets from VectorBase and WormBase.

Check out all the changes on our Ensembl Genomes website.

Any questions or comments? Email us.

What’s new in e84:

  • Human: Incorporation of BLUEPRINT Epigenome data and methylation data
  • Pairwise Linkage Disequilibrium (LD) calculation on LD variant page
  • Track hub registry interface
  • Transcript haplotype view

Incorporation of BLUEPRINT Epigenome data

BLUEPRINT is a large scale research project aimed at deciphering the epigenome of blood cells. ChIP-seq and DNase hypersensitivity data from the BLUEPRINT project has now been incorporated into Ensembl. All of the cell types analysed in the BLUEPRINT project are listed here. In Ensembl 84, we are including BLUEPRINT data for the following 20 independent cell types, divided based on cell lineage and tissue source:

  • CD14+ CD16- monocyte from Venous Blood
  • CD14+ CD16- monocyte from Cord Blood
  • CD4+ ab T cell from Venous Blood
  • CD8+ ab T cell from Cord Blood
  • CM CD4+ ab T cell from Venous Blood
  • eosinophil from Venous Blood
  • EPC from Venous Blood
  • erythroblast from Cord Blood
  • HUVEC prol from Cord Blood
  • M0 macrophage from Cord Blood
  • M0 macrophage from Venous Blood
  • M1 macrophage from Cord Blood
  • M1 macrophage from Venous Blood
  • M2 macrophage from Cord Blood
  • M2 macrophage from Venous Blood
  • MSC from Venous Blood
  • naive B cell from Venous Blood
  • neutro myelocyte from Bone Marrow
  • neutrophil from Cord Blood
  • neutrophil from Venous Blood

This data can be viewed alongside other tracks in Ensembl by using the ‘Configure this Page’ option and selecting your cells of interest.

Pairwise LD calculation

You are now able to calculate linkage disequilibrium (LD) between any two variants in Ensembl. To calculate the r² and D’ values for LD between two specific variants, open the page of the reference variant, click on ‘Linkage Disequilibrium’ in the menu, and enter the ID of the second variant into the LD calculation text box.
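
For readers unfamiliar with the two statistics, the sketch below computes r² and |D’| from a list of phased haplotypes using the standard textbook definitions: D = p(AB) - p(A)p(B), r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))), and D’ = D / Dmax. It assumes phased data with alleles coded 0/1. The function name `pairwise_ld` and the toy haplotypes are made up for this example, and Ensembl's own calculation may differ in its details.

```python
def pairwise_ld(haplotypes):
    """Return (r2, |D'|) for two biallelic variants.

    `haplotypes` is a list of (allele_1, allele_2) pairs, one pair per
    chromosome, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at variant 1
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at variant 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic in the sample")
    r2 = d * d / denom
    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Toy data: the two variants are always inherited together.
r2, d_prime = pairwise_ld([(1, 1), (1, 1), (0, 0), (0, 0)])
```

With these toy haplotypes both statistics take their maximum value, reflecting complete LD; independent variants would give values near zero.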


Track Hub registry interface

With the arrival of the new Track Hub Registry, we have added a feature that allows you to search for track hubs of interest and attach them directly to Ensembl. Just click on the ‘Add your data/Manage your data’ button on any Ensembl page, and select ‘Track Hub Registry Search’ from the left-hand menu.

The interface will only search for hubs that have assemblies available for the site you are on; to see the full range of species and assemblies, visit the Track Hub Registry site directly.

Transcript haplotype view

The transcript haplotype view is a new data view that allows you to explore observed transcript sequences that result from variants identified in resequencing data from the 1000 Genomes Project. By clicking on the ‘Haplotypes’ link on any transcript page, you can view protein consequences, population frequencies and protein alignments of all the haplotypes for that particular transcript.


Other news

  • Mouse: update to GENCODE M9 annotation
  • Zebrafish: updated gene set, including manually annotated HAVANA annotation
  • Baboon: lincRNA model update
  • Latest sequence variants from dbSNP build 146 for human, cow and dog
  • Import of COSMIC 75 cancer data
  • New and updated studies from DGVa for several species such as human, mouse, zebrafish, macaque, cow and dog
  • Gene trees: new option to prune by target species/taxon in the REST API
  • Ensembl Families now defined by an HMM library, based upon the Panther database
  • Alignments in CRAM format
  • DAS support ended
  • Regulatory segments retired from the Ensembl regulation BioMart, but now available in bigBed format through the FTP site

A complete list of the changes can be found on the Ensembl website.

Find out more about the new release, and ask the team questions, in our free webinar. Wednesday 16th March, 4pm GMT. Register here.