Please note that the archive websites for Ensembl releases 62 (April 2011) and 63 (June 2011) will be retired in August when version 76 is released.

This is in accordance with our rolling retirement policy, whereby archives more than three years old are retired unless they include the last instance of the previous assembly from one of our key species (human, mouse and zebrafish).

For more information about how to use archives, please see our previous blog post on the topic; a list of all current archives is available on the main website.

With the release of Ensembl 76 fast approaching, the variation team would like to provide more information on how we moved our variation data to the new human assembly, GRCh38. There are different methods available for re-annotating variants on a new assembly. The most accurate way would be to re-run the experiments, or the variant calling pipelines that identified the variants in the first place, on the new assembly. However, the material and computational resources required for such an endeavour are very expensive. We have therefore developed computational methods so that, for most of the data, such investments are not necessary.

Because the new assembly retains much of the sequence of the previous assembly, we can use computational methods that derive the new location from information we already hold about a variant, namely its:

  • Location on the old assembly
  • Flanking sequence (DNA sequence from the old assembly surrounding the variant)

Based on this prior knowledge we can either project or remap our variation data.

Projecting variants

The projection algorithm compares two assemblies and computes the new location based on sequence similarity between them. The computation of the new location is successful for ~98% of our variation data. However, when the sequence in the new assembly has changed too much compared to the old assembly, the projection fails, and for those variants we fall back on a remapping strategy, as explained below. The projection functionality is implemented in the Ensembl core API.

Remapping variants

For the remapping approach we generate a sequence read by adding upstream and downstream sequence from the old assembly to a variant. We then map the read to the new assembly using BWA.
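
As a rough illustration of the read-building step, here is a minimal Python sketch. It is not our production code: the function names, the flank length of 100 bases and the use of a plain string for the chromosome sequence are all assumptions made for the example, and the BWA step itself is not shown.

```python
def make_read(chrom_seq, start, end, flank=100):
    """Build a pseudo-read around a variant on the old assembly.

    chrom_seq  -- old-assembly sequence of the chromosome (a plain string)
    start, end -- 1-based inclusive variant coordinates on that sequence
    flank      -- bases to add on each side (100 is an arbitrary choice)

    Returns the read and the 0-based offset of the variant within it.
    """
    left = max(0, start - 1 - flank)          # clip at the chromosome start
    right = min(len(chrom_seq), end + flank)  # clip at the chromosome end
    read = chrom_seq[left:right]
    offset = (start - 1) - left
    return read, offset


def to_fasta(name, read):
    """Format the read as a FASTA record that an aligner can take as input."""
    return f">{name}\n{read}\n"


# Toy sequence and a hypothetical variant name, just to show the shapes.
read, offset = make_read("ACGTACGTAC", start=5, end=5, flank=2)
print(to_fasta("example_variant", read))
```

Keeping the offset of the variant inside the read is what would let you convert the alignment position of the read back into a variant position on the new assembly.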

Workflow

We have ~64M variants and ~69M variation features (VF) in Ensembl release 75, GRCh37. You can think of a variation feature as a combination of a variant and its location on the genome. Most variants have one variation feature. If a variant maps to multiple locations on the genome, the variant has as many variation features as it has locations on the genome.
We can divide variation features into:

  1. VF that map uniquely to the reference genome (chromosomes 1-22, X, Y, MT)
  2. VF that have multiple mappings on the reference genome
  3. VF that are located on an alternative locus
  4. VF that are located on a fix patch region
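
The four groups above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the tuple layout, the label strings and the region names `ALT_EXAMPLE` and `FIX_EXAMPLE` are hypothetical, not names from our database.

```python
from collections import Counter

# Primary reference sequences named in the list above.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def classify(features, alt_loci, fix_patches):
    """Label each variation feature with one of the four groups.

    features    -- list of (variant_name, seq_region_name) pairs
    alt_loci    -- set of region names that are alternative loci
    fix_patches -- set of region names that are fix patches
    """
    # Count how many primary-reference locations each variant has.
    primary_hits = Counter(v for v, region in features if region in PRIMARY)

    labels = []
    for variant, region in features:
        if region in fix_patches:
            labels.append("fix_patch")
        elif region in alt_loci:
            labels.append("alt_locus")
        elif region in PRIMARY and primary_hits[variant] == 1:
            labels.append("unique")
        elif region in PRIMARY:
            labels.append("multiple")
        else:
            labels.append("other")  # anything outside the four groups
    return labels


features = [
    ("rs_a", "1"),
    ("rs_b", "2"),
    ("rs_b", "7"),
    ("rs_c", "ALT_EXAMPLE"),
    ("rs_d", "FIX_EXAMPLE"),
]
labels = classify(features, {"ALT_EXAMPLE"}, {"FIX_EXAMPLE"})
```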

Workflow for re-annotating variation data to the new assembly

We first attempt to project all VF that map uniquely to the genome or are located on alternative loci. In a second step we use our remapping approach for VF that could not be projected. Of the ~62M variants with a unique location on GRCh37, only ~200,000 could not be projected and were remapped to the new assembly instead. Variants with multiple mappings on GRCh37 were remapped to GRCh38 using their flanking sequence information as submitted to dbSNP.
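
The order of operations can be written down as a short Python sketch. The `project` and `remap` callables, the dictionary keys and the category labels are placeholders invented for this example; the real work is done by the Ensembl core API and BWA respectively.

```python
def reannotate(features, project, remap):
    """Apply projection first and fall back to remapping.

    features -- list of dicts with at least "id" and "category" keys
    project  -- callable returning a new location, or None on failure
    remap    -- callable returning a new location, or None on failure
    """
    placed, unplaced = {}, []
    for vf in features:
        category = vf["category"]
        if category == "fix_patch":
            continue  # fix patches are part of the new primary sequence
        location, method = None, None
        if category in ("unique", "alt_locus"):
            location, method = project(vf), "projected"
        if location is None:
            # Covers failed projections and multi-mapping variants.
            location, method = remap(vf), "remapped"
        if location is None:
            unplaced.append(vf["id"])
        else:
            placed[vf["id"]] = (location, method)
    return placed, unplaced
```

Passing the two strategies in as callables keeps the control flow separate from the details of either method, so each can be tested or swapped independently.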

Together, projection and remapping produce the new set of variation feature locations in release 76. We do not need to re-annotate variants located on fix patch regions from GRCh37 because the fix patch regions have been incorporated into the primary sequence of the new assembly.

Alternative Loci

The Genome Reference Consortium increased the number of alternative sequence representations for variant regions (ALT LOCI) in GRCh38. In our workflow diagram we described how we re-annotate variants to ALT LOCI that were present in GRCh37. Additionally, we provide variant annotations to new ALT LOCI by remapping variation features from the primary reference sequence (GRCh38) that overlap an alternative locus. We added ~1.5M extra variation features with this approach. However, this gives only an indication of how known variants map to ALT LOCI. Ideally, variant calling would be done against the full set of primary reference sequences and ALT LOCI. We expect that variants will be called on ALT LOCI in the near future, as variant calling tools add the option of including ALT LOCI information.

Are you ready to move to GRCh38?

Ensembl provides a reliable representation of variation data on the new human assembly, GRCh38. In addition to re-annotating variation data from release 75 to 76, we also updated our data (e.g. from ClinVar, the NHLBI GO Exome Sequencing Project or COSMIC) and projected and remapped the data to GRCh38 where necessary. But there is no need to worry if you are not yet ready to make the move to GRCh38. Starting with Ensembl release 76 we will support and update variation annotation for both GRCh37 and GRCh38. If you have questions or comments, please get in touch with us.

With release 76 looming large in our calendars and the final deadlines for GRCh38 data production out of the way, it’s a good time to look back and take stock of what we’ve been doing in the Ensembl Regulation office. We have been rather quiet the past few months, working feverishly on an ambitious overhaul of our infrastructure. We’ve already given you a sneak peek at the new Ensembl Regulatory Build, so I’d like to take a look at the workhorse underlying all of our data, the ‘Ensembl Regulation Analysis Pipeline’.

The end result is a core resource that centralises epigenomic data from multiple public sources, processes them through a universal pipeline, then summarises them into easily understood annotations. Ensembl Regulation aims to be a single entry point to obtain an overview of all the available regulatory data, from individual datasets to summary annotations, all coming to a browser near you, very soon. Underlying it is a full ‘end to end’ pipeline for producing the input data to the Regulatory Build, from fastq download, to alignment, IDR processing, peak calling and finally motif alignments.

The inputs to the Regulatory Segmentation and Build are experiments (ChIP-seq and DNase-seq) describing the chromatin status (i.e. histone modifications) and transcription factor landscape across various cell lines. These experiments range from large projects (e.g. ENCODE, Roadmap Epigenomics and BLUEPRINT) to individual experiments made accessible via archives such as the ERA/ENA, SRA and GEO.

The main outputs of the pipeline are genome alignments, peak calls and ‘collection’ files which provide coverage statistics across the genome. Managing and processing these data is no simple task, and we expect the number of available epigenomic datasets to increase significantly in the years to come. Also, with the arrival of GRCh38, we needed to reprocess all of the existing data in a short timeframe. We therefore integrated our processes into a shiny new fully automated pipeline using the ensembl-hive framework. Here follows a brief summary of the new features of the regulation analysis pipeline.

The Tracking Database

This now constitutes our main analysis and archive database, tracking the data both within our pipeline and in external repositories. In it, we register the metadata from different projects and data repositories, providing a single point of reference to query the data available in the public domain. This has been crucial in determining which cell lines meet the requirements for a build.

Read Alignment and Peak Calling

We first align reads using BWA, then call peaks using SWEMBL for short regions and CCAT for broader histone modifications. Replicates are processed in parallel to support ENCODE’s Irreproducible Discovery Rate (IDR) methodology.

Pipeline Improvements

Flexibility has been a key aim of the redesign, and the hive infrastructure has helped here by allowing us to define each logical part of the pipeline as a separate configuration which can be ‘topped up’ as required. This means that it’s easy to run just the read alignment stage (which we require as input to the segmentation), or at your pleasure add in the peak calling and collection file writing stages whilst it’s still running.  All the necessary state information is captured in the tracking database, so it’s really easy to pick things up at any point and start running the later stages of the pipeline.

Due to the size of our input data set and the resulting rolling data footprint, we set up a garbage collection of intermediate files and added inline archiving. This has limited our footprint, and enabled us to reprocess the entire human data set in one go.

The combination of the above improvements, the new ensembl-hive implementation and a whole load of other refinements, means much less manual intervention is required, resulting in a large reduction in run times.  For the alignments in particular, what was taking several weeks now takes just ~5 days!

What does the future hold?

We’ve already identified some more optimisations to the structure of the pipeline, so the runtimes are likely to drop even further. This will be crucial to handle the hundreds of cell types currently being examined within Roadmap Epigenomics, Blueprint, ENCODE 3 and other projects. We will also be revising our schemas to better reflect tissue specific data. This is part of a larger push within Ensembl to better describe the dynamics of gene regulation and transcription.

Finally, we are keeping up with lab techniques, and will be extending our pipelines to handle newer types of data, such as chromatin conformation assays or eQTLs. Although we do not process these data ourselves, we have already integrated and remapped the FANTOM5 CAGE-tag annotations onto GRCh38.

P.S. If you want even more info, keep an eye on this page. Once release 76 is out it will be updated with our new Regulatory Build documentation.

We’re now only a couple of weeks away from releasing our full annotation of the new human genome assembly (GRCh38). Before we make it publicly available we’d like to update you on our progress and to share a few key pieces of information.

Changes in the assembly

The GRCh38 assembly is made up of 455 top-level sequences. These sequences include 24 chromosomes, mitochondrial DNA, alternative reference loci and a number of unplaced scaffolds. For the first time ever, centromere sequences have also been included in a human reference assembly. The total contig length for this new assembly is 3.4 Gb, a small increase on the previous assembly, and the total chromosomal length is 3.1 Gb (excluding haplotypes). There are 261 alternate loci, including the LRC/KIR complex on chromosome 19 (35 alternate sequences) and the MHC region on chromosome 6 (7 alternate sequences). We have aligned nearly half a million proteins and over 200,000 cDNAs to the new assembly and have annotated a total of 63,263 models, 22,469 of which are protein-coding.

Blue regions represent assembly gaps
Image credit: Kerstin Howe

For GRCh38, in addition to the usual steps involved in a genebuild, we have also made clone data available. The clone sets were loaded, along with other data, into the core human database. Although these data are not required for genebuilding, the information is extremely useful for some of our users.

What stage is the annotation at?

The Genebuilders have completed the final gene set, which has been merged with manual annotation from HAVANA to create the GENCODE 20 set. The data were then passed on to other teams within Ensembl so that they could carry out the remaining analyses. This entire process of data exchange between the different Ensembl teams is coordinated by the Ensembl Production team, who also conduct a series of quality control steps along the way.

The comparative genomics team (Compara) have now generated orthologues to all other Ensembl species from the new human geneset. They’ve also revised all pairwise and multi-species whole genome and transcript alignments so that users can identify conserved and constrained regions between human and other vertebrate species. Updating with the new human assembly, therefore, means that a large part of the Compara database also needs to be updated.

The Variation team have now collected all variant and phenotype data, linking the information to other data in Ensembl. This is so that useful variation data can be accessed and interpreted by our users. The variant effect predictor (VEP), for example, is an extremely useful tool that determines effects of variants, such as SNPs or indels, on genes, transcripts, proteins, regulatory regions and phenotypes. A user simply has to input the coordinates and sequence changes of the variants of interest.

And finally, the Regulation team have used the new Ensembl regulatory annotation build to locate regions in the human genome that are involved in the regulation of gene expression.

Some sample regulatory features as seen in the Ensembl browser

Now that the last parts of the relevant analyses are being completed, the Ensembl Webteam are currently working on the Ensembl website, ensuring that all the relevant data will be accessible to you in the most user-friendly manner.

The final release is still on target for the end of July, after which the GRCh37 annotation will be available on a separate archive site. Although we have produced the GENCODE 20 gene set for the upcoming Ensembl release (e76), we are still in the process of refining it. We therefore recommend, particularly for large consortia, waiting for the GENCODE 21 release, which will be available with e77. In the meantime, until the e76 release, the human Pre! site is still up and running.

If you have any questions then please don’t hesitate to contact us, either through Twitter or by emailing helpdesk.

As mentioned in another post, due to the presence of patches in both GRCh37 and GRCh38, the assembly mapping has proven challenging.
Related to this, another novelty arises when assigning stable ids to genes.

Every time a gene set is updated for a species, we compare the newest gene set with the previous one.
If we find a perfect match between the two gene sets, the stable id assigned to the older model will be used for the new model.
Even if the model has changed slightly (longer UTR for example), we try to map the old stable id whenever possible, with a version change to indicate that it was not a perfect match.
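
To make the rule concrete, here is a small Python sketch of how ids and versions could be carried over. It is a simplification under stated assumptions: `match_key` stands in for whatever comparison pairs an old model with a new one, exact string equality stands in for a "perfect match", and the `DEMO` prefix and all ids are made up for the example.

```python
def assign_stable_ids(old_models, new_models, next_number, prefix="DEMO"):
    """Carry stable ids from an old gene set over to a new one.

    old_models  -- {match_key: (stable_id, version, sequence)}
    new_models  -- {match_key: sequence}
    next_number -- first unused number for brand-new ids
    prefix      -- id prefix (a made-up one here, not a real Ensembl prefix)
    """
    assigned = {}
    for key, sequence in new_models.items():
        if key in old_models:
            stable_id, version, old_sequence = old_models[key]
            if sequence != old_sequence:
                version += 1  # same gene, but the model has changed
            assigned[key] = (stable_id, version)
        else:
            # No counterpart in the old set: mint a new id at version 1.
            assigned[key] = (f"{prefix}{next_number:011d}", 1)
            next_number += 1
    return assigned


old = {
    "locus_a": ("DEMO00000000001", 3, "ATG"),
    "locus_b": ("DEMO00000000002", 1, "CCC"),
}
new = {"locus_a": "ATG", "locus_b": "CCCA", "locus_c": "GGG"}
assigned = assign_stable_ids(old, new, next_number=3)
```

In this toy run the unchanged model keeps both id and version, the altered model keeps its id with the version bumped, and the model with no counterpart receives a new id.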

To provide a better comparison between the last GRCh37 gene set (e!75) and the new GRCh38 gene set (e!76), we have decided to project the old set onto the new assembly. This allows for overlap comparisons rather than simple sequence alignments. However, this means that around 2% of the genes are lost, as they cannot be mapped onto the new assembly. If these gene models are still present in the new assembly, they are assigned a new stable id.

Putting this in the perspective of patch fixes integrated into the new reference, we also have cases where two genes in GRCh37 (one on the reference, one on the patch) both match the same gene on the new reference in GRCh38.
In that case, we have decided to arbitrarily keep the longest standing stable ID, which is likely to be the one on the reference.
The stable ID which was used on the patch is recorded as retired but a link is provided to its replacement. For example, searching for ENSG00000260384 (SERINC2 gene on HG989_PATCH) will redirect the user to ENSG00000168528 (SERINC2 on the primary assembly).
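
A minimal Python sketch of this tie-break follows. The function name and the `(stable_id, first_release)` layout are assumptions for illustration. The two SERINC2 ids are the ones quoted above, but the release numbers attached to them are invented purely so that the example runs.

```python
def resolve_shared_match(candidates):
    """Choose one stable id when several old genes match one new gene.

    candidates -- list of (stable_id, first_release) pairs
    Returns the kept id and a {retired_id: kept_id} redirect table.
    """
    # The longest-standing id is the one with the earliest first release.
    kept = min(candidates, key=lambda pair: pair[1])[0]
    redirects = {sid: kept for sid, _ in candidates if sid != kept}
    return kept, redirects


# Release numbers 1 and 2 are placeholders, not real release history.
kept, redirects = resolve_shared_match(
    [("ENSG00000260384", 2), ("ENSG00000168528", 1)]
)
```

The redirect table is what lets a search for the retired id lead to its replacement.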

This resulted in the deletion of around 3% of our genes.

In other cases, the difference between the GRCh37 reference (without patch) and the GRCh38 reference (with integrated patch fix from GRCh37) is too large to project annotations from the reference. Only annotations from the patch are then kept, along with the stable ids. For these cases, if there is a known alt_allele to a gene on the GRCh37 reference, it is added as a link to its equivalent on the patch.

Consequently, searching for ENSG00000183678 (CTAG1A gene on the GRCh37 primary assembly) will redirect the user to ENSG00000268651 (CTAG1A gene on HG1497_PATCH in GRCh37, on the primary assembly in GRCh38).

As mentioned in the blog post about the new gene set, a new assembly implies a number of underlying changes in the gene structure.
Despite this, 95% of all the gene stable ids have been assigned to the new gene models.
With this work, we try and ensure that you will still be able to find your favourite gene using the same stable id as in GRCh37.

We are pleased to announce the public release of manual annotation on the new human GRCh38 assembly on the Vega website. This release follows on from the publication of a preliminary gene set on Pre! Ensembl and represents one of the final steps before the release of the full human Gencode 20 gene set in Ensembl release 76.

Vega website.

The Vega website uses Ensembl technology to present the latest manual annotation produced by the Havana group based at the Wellcome Trust Sanger Institute. It is of particular value to researchers who want to see the most up-to-date annotation: every two weeks we run a streamlined, automated production pipeline that identifies new or updated annotation and presents it on Vega. Consequently there are never more than 14 days between annotation being created or updated by Havana and being made available to the public.

Annotation of gene PCDHB9 has been updated within the last two weeks

Human GRCh38 manual annotation gene set.

The actual gene numbers have not changed greatly overall, but there has been a lot of work going on in the background to refine the gene set. The numbers of genes on GRC patches have been reduced from GRCh37 as many of these patches have now been incorporated into the primary genome assembly.

The initial step in the manual annotation of the new assembly was a computational one, projecting the manual annotation from GRCh37 onto GRCh38. As a part of this process we generated a list of the loci that did not project due to genomic changes. Many of them were in the regions of greatest change between assemblies, including regions of chromosomes 1, 9, 17 and X. There were about 800 of these loci, and each of them needed manual intervention. This took a dedicated effort by the Havana group over a period of about three weeks. The changes made fall into a number of categories:

(i) The use of single haplotypes across certain gene clusters, such as the XAGE and GAGE gene families on the X chromosome.

(ii) Filling, moving or even introducing gaps in the assembly to give a much more accurate representation of difficult regions. An example of such re-arrangement is the XAGE1B gene that is now placed on the opposite strand compared to the previous assembly.

(iii) A decrease in the number of polymorphic pseudogenes due to changes made in the assembly to include a haplotype with a coding version of the gene.  Polymorphic pseudogenes are coding in some individuals and disabled in other individuals due to sequence variation.

(iv) A large increase in the number of long non-coding RNAs (lncRNA) because we have been able to take advantage of new RNA-seq and PolyA-seq data rather than because of the new assembly per se.

Further annotation of the new assembly is ongoing, with the focus having changed from fixing projection errors to finalizing the annotation.

Merge with Ensembl geneset (Gencode 20)

The Havana manual annotation has been merged with the annotation arising from the rerun of the Ensembl genebuild pipeline. This improves the gene set, primarily by taking into account new experimental evidence generated since the manual annotation was originally performed. In addition, the comparison between the manually and automatically generated gene sets contributes to the continuous enhancement of both annotation systems. It is the merged gene set that will be released as Gencode 20.

The Amazon molly (Poecilia formosa) is now available on Ensembl Pre! This particular species is especially interesting to scientific research due to its origins, its method of reproduction and the manner in which it interacts with other closely related fish.

Amazon molly

The single-sex interspecies school
Considering the name of the species, you would be forgiven for thinking these fish can be found swimming around the Amazon River. The Amazon molly actually resides in the warm waters of north-eastern Mexico and southern Texas, and derives its name from something far more interesting than its habitat.

One of very few asexual vertebrates, this fish reproduces via a process known as gynogenesis, or sperm-dependent parthenogenesis. Despite being a method of asexual reproduction, gynogenesis does involve the mating of a male with a female. However, the genetic material from the male is not incorporated into the already diploid eggs and the sperm serves only to trigger embryonic development, thereby producing clones of the mother. The entire species is therefore female, and is thus named after the legendary society of female Amazon warriors.

Life finds a way
Due to the absence of male Amazon mollies, the females act as sexual parasites by mating with males from other closely related species. These mates come from species such as P. latipinna, P. mexicana, P. latipunctata and, occasionally, P. sphenops. In fact, it is thought that the Amazon molly originated from a hybridization event between two of these species, the Atlantic molly (P. mexicana) and the Sailfin molly (P. latipinna), approximately 280 KYA. However, all attempts to create P. formosa-like hybrids in the laboratory have, so far, been unsuccessful.

Distribution of molly species in coastal regions of the Gulf of Mexico

As the male fish do not contribute their genes to the next generation, one would expect that natural selection would act against them being ‘fooled’ into mating with the heterospecific Amazon females. Furthermore, experiments indicate that the males are able to tell the difference between females of their own species and the Amazon species. So why do they mate with these Amazon mollies? Unfortunately, the answer is that we simply don’t know. However, findings have suggested that the male individuals may actually benefit from this behaviour as mating with Amazon mollies seems to make them more attractive to females from their own species. The strange relationship between the Amazon mollies and these male mollies may therefore benefit both parties.

Asexual versus sexual
The main advantage of asexual over sexual reproduction, in any species, is an increase in reproductive output. With asexual females, there is no need to produce males that cannot give birth, resulting in twice as many grandchildren as would be produced by sexual reproduction. Asexual reproduction, therefore, should be the preferred method. As Amazon molly offspring are clones of the mother in an environment in which the mother was able to survive, they are also likely to survive and reproduce. This type of reproduction helps colonize new territory very quickly, but a population that reproduces in this manner will likely be unable to adapt to changing environments. Additionally, according to an evolutionary theory known as Muller’s ratchet, deleterious mutations in small asexual populations can accumulate at a fast rate due to a lack of gene recombination, which can eventually result in extinction.
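
The "twice as many grandchildren" figure follows from simple arithmetic, which the short Python sketch below works through. It assumes equal brood sizes for both kinds of female and a 1:1 sex ratio in the sexual species; the brood size of 10 is an arbitrary number chosen for the example.

```python
def grandchildren(brood_size, fraction_female):
    """Expected grandchildren of one female after two generations."""
    daughters = brood_size * fraction_female  # only daughters give birth
    return daughters * brood_size


asexual = grandchildren(10, 1.0)  # every offspring is a daughter
sexual = grandchildren(10, 0.5)   # half of the offspring are sons
print(asexual / sexual)
```

Whatever brood size is chosen, the ratio stays the same, because the sexual female always loses half of her offspring to sons that cannot give birth.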

Why study the Amazon molly?
A popular endeavour in modern evolutionary biology is to explain the evolution and persistence of sexual reproduction, given the higher costs of producing male individuals when compared with asexual reproduction. One effective way to research the relative strengths and weaknesses of the two reproductive methods is to study the dynamics of the coexistence of sexual and asexual organisms. The Amazon molly’s unique situation, both with respect to the way in which it reproduces and its interaction with other molly species, makes it an extremely valuable model. It has already been used in studies focused on determining whether or not sexual selection is necessary for high diversity of the MHC. Findings have suggested that the asexual molly has polymorphic MHC loci despite its clonal reproduction, yet these loci are more polymorphic in the sexual species. The Amazon molly is also used as a model for carcinogenicity studies, and is extremely easy to breed and rear in captivity. Furthermore, the clonality of the fish allows researchers to carry out studies on individuals that are genetically identical.

Browsing the genome
The Amazon molly genome assembly was made publicly available in October 2013. We have carried out a preliminary gene annotation, generated by alignments of Ensembl human, stickleback and zebrafish translations from Ensembl release 75. You can find this information on our Pre! site.

Region of the Amazon molly genome as seen in the Ensembl browser. The gene models shown are derived from human and zebrafish proteins.

We’re extremely excited to be carrying out a complete genebuild, incorporating data such as RNA-seq, which will be available in a future Ensembl release. Keep an eye on our blog to find out when, and if you have any questions feel free to contact us.

 

As you may know, the new GRCh38 assembly for human was released in December 2013. This is a major update for Ensembl and will require months of hard work to provide high quality annotation for our users. Our goal is to provide a full genebuild on the GRCh38 assembly, as well as regulation, comparative and variation features.

As part of the Ensembl core team, I am responsible for generating a reliable mapping between the GRCh37 and GRCh38 assemblies. This mapping will be used by other teams to project existing annotations onto new coordinates. Therefore, it is important to get this right if we don’t want to end up with features in the wrong location!

The basic principle of assembly mapping is relatively simple. Let’s say we are mapping chromosome 1 in GRCh37 to chromosome 1 in GRCh38. For both chromosomes, we get the list of contigs used to construct the chromosome. If the same contigs are used, in the same order, in both chromosomes, these can be mapped directly. For the remaining unmapped regions, where no shared contigs can be found, the sequences are aligned using lastz.
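
The shared-contig step can be sketched in Python as follows. This is an illustration rather than the actual mapping code: it assumes each contig is used whole in both assemblies, the contig names are made up, and Python's standard `difflib.SequenceMatcher` is swapped in as a convenient way to find contigs that occur in the same order in both lists. The lastz alignment of the leftover regions is not shown.

```python
from difflib import SequenceMatcher


def map_shared_contigs(old, new):
    """Pair up contigs that appear, in the same order, in both assemblies.

    old, new -- lists of (contig_name, chrom_start, chrom_end) tuples,
                ordered along the chromosome
    Returns (mapped, unmapped): mapped is a list of
    ((old_start, old_end), (new_start, new_end)) pairs, unmapped lists the
    old contigs that still need a sequence alignment.
    """
    old_names = [contig[0] for contig in old]
    new_names = [contig[0] for contig in new]
    matcher = SequenceMatcher(None, old_names, new_names, autojunk=False)

    mapped, matched_old = [], set()
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            mapped.append((old[i][1:], new[j][1:]))
            matched_old.add(i)

    unmapped = [old[i] for i in range(len(old)) if i not in matched_old]
    return mapped, unmapped


old = [("contig_A", 1, 100), ("contig_B", 101, 250), ("contig_C", 251, 300)]
new = [("contig_A", 1, 100), ("contig_D", 101, 180), ("contig_C", 181, 230)]
mapped, unmapped = map_shared_contigs(old, new)
```

In this toy example `contig_A` and `contig_C` map directly, with `contig_C` shifting coordinates, while `contig_B` is left over for alignment.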

Results of the mapping: how similar are the assemblies?

The results of the mapping meet our expectations. For the 24 chromosomes as well as haplotype regions, we map between 95 and 100% of the non-N sequence. Out of 82 regions, 72 map over 99%. To check the consistency of these mappings, the Ensembl gene set (GENCODE) is copied from GRCh37 to GRCh38. 97% of the transcripts find an identical model in GRCh38, with 98.5% of exons mapped correctly. Only 1.5% of the total transcripts do not have an equivalent model in the new assembly. This is expected, as we know some regions in GRCh37 do not exist in GRCh38.
For example, the gene PPIAL4A, associated with CCDS30835.1, is on a reference region in GRCh37 which is overlapped by patch HG1287. In GRCh38, that region does not exist and PPIAL4A is lost. PPIAL4 is a family of retrogenes, and other PPIAL4 models will still be present in GRCh38.

Two additional regions have proved challenging for our mapping.

Chromosome 17:22904289-37003842:
This region in GRCh37 has become a haplotype in GRCh38 (HSCHR17_1_CTG4). As we do not provide mappings between haplotypes, we have only an approximate alignment between the reference in GRCh37 and the reference in GRCh38.

Chromosome 9:42900000-66450000, flanking the centromeric region:
This region in GRCh37 corresponds to 9:40700000-61600000 in GRCh38 and has undergone some massive changes at the sequence level. Some of the contigs have been split, shortened, extended, or simply removed. This means that gene models located in this region will change considerably from the gene models in GRCh37.

If your favourite gene is not in one of these regions though, there is a good chance you will be able to identify it using the same stable_id as in release 75! You’ll be able to read more about stable id mapping in a future post in this series.

Challenge: Patches

One of the major challenges when mapping GRCh37 to GRCh38 comes with patch regions. Shortly after GRCh37 was first released, a number of sequence differences were noticed. Rather than provide a whole new assembly, the concept of patches was introduced.

For regions where a sequencing error was corrected, a patch fix was added. It contains the corrected reference sequence as well as some padding on both ends, to anchor it on the genome.

In Ensembl, we provide annotation for both the reference and the patch region. Where the modified sequence is relatively short, a number of annotations are identical between the reference and the patch.

For example, CHAMP1 is a merged gene on chromosome 13 but has also been annotated on patch HG531_PATCH.

For regions where an alternative sequence was found, a patch novel is added.

In GRCh38, all patch novels will still exist as haplotypes. For patch fixes, it is another story altogether. Given that these patches fix an error in the reference sequence in GRCh37, they will become the reference in GRCh38, replacing the GRCh37 sequence. This means that we are likely to keep the annotation produced on patches in GRCh37 while losing the GRCh37 reference annotation.

To deal with the special patch cases, we add an additional step in the assembly mapping. For patch fixes in GRCh37, we know their contig composition, as well as where they are mapped against the reference. Presuming the contig composition has not changed, we should be able to locate the same region in the reference in GRCh38. It should then be possible to map any feature in GRCh37, whether on patch or reference, onto GRCh38.

Working as an Outreach Officer for Ensembl means lots of exciting adventures around the world to teach Ensembl to local scientists. Last month I was privileged to be able to travel to South Korea to give an ENCODE workshop with Bob Kuhn from UCSC.

Bob and me, with our Korean hosts

The participants were interested to learn about how Ensembl and UCSC use genome assemblies from projects like the Genome Reference Consortium. It was clear that many people had not realised the extent of the work that goes into producing a genome assembly, by sequencing the genome in contigs then putting them back together (learn more in this video), or how errors in the human genome are dealt with using patches. I was able to explain to them how Ensembl works together with groups like Havana to produce the genes for the trusted GENCODE gene sets for human and mouse, and how they could find out about these genes in Ensembl.

Even more excitingly, I got to preview some new data. The Ensembl regulation team gave me access to their new regulatory build track hub, which you can learn more about in Daniel Zerbino’s blog post. I was able to show off how Ensembl bring together and process the raw ChIP-seq data from ENCODE and other sources to try to identify where regulatory features might be on a genome-wide scale, and the activities of those features. It was exciting to be able to preview something new for the workshop participants, and their feedback suggests that the new regulatory build is going to be a hit.

Bob showed off the UCSC genome browser in the same workshop, and we worked together well. Though we have competing browsers, we’re really on the same team: the team that believes high quality genomic data and tools should be available to all and works hard to provide that. We can learn from each other to provide the best way of giving our users the data and analyses they need.

My next stop is the Open Door Workshop in Uruguay. Don’t forget, you can host a workshop at your institute to learn the basics of using Ensembl, and to find out about the latest Ensembl functionality and what’s coming with the new human assembly.