This month we’re meeting Bronwen Aken who heads up our Vertebrate Annotation team.
What is your job in Ensembl?
I’m the Vertebrate Annotation Team Leader at EMBL-EBI. My team comprises Ensembl Compara and Ensembl Genebuild.
We create comprehensive, up-to-date gene annotation and comparative genomics resources for publicly available reference genomes. This means we identify the location, structure and expression of genes in each genome assembly in Ensembl. We then link the data from these diverse species by computing gene orthologues, multi-species whole genome alignments and conserved genomic regions.
These data are a foundation for the clinical and research communities. They are also a major input to the Ensembl Variant Effect Predictor (VEP), which Will helps to develop.
My job is to work with the Ensembl team to ensure that we produce high quality genome annotation for an increasing number of genome assemblies. We do this by developing and running scalable pipelines that make the best use of public data.
What do you enjoy about your job?
We’re involved in such a wide variety of projects, from detailed curation of specific genes in the human genome (GENCODE, CCDS) to large-scale annotation of diverse species such as chicken, coelacanth, and Tasmanian devil.
As the number of genome assemblies grows, so will the opportunities to better understand ourselves and other life on Earth. It's exciting to know that we can contribute to these studies.
What are you currently working on?
One of the biggest challenges we face now is the growing number of reference genome assemblies. When Ensembl first started, our system was designed to support one assembly per species. Now there are projects such as Genome10K, which aim to sequence thousands of vertebrate species, and for some species more than one assembly is of interest to the community.
Not only do we need to scale our pipelines and infrastructure to meet the growing volume of data, but we also need to deal with its increasing complexity as different breeds and strains are sequenced and as our input data change with new sequencing technologies.
What is your typical day?
I usually have a couple of hours of meetings. The rest of my time is split between catching up with people individually about specific projects, administration, and emails.
How did you end up here?
I am fascinated by evolution and the natural world. From a young age I enjoyed hunting for fossils in South Africa, where I grew up, and for a long time I wanted to become a palaeontologist.
My undergraduate degree was in Molecular and Cell Biology, but I soon realised that wet lab research was not for me. A friend at university studied computer science, and this led me to discover bioinformatics. I first learned about Ensembl on a genomics course.
I boarded a flight from South Africa to the UK in 2005, intending to backpack for a year. It wasn't long before I missed the challenges of science, and I soon found myself happily part of the Ensembl family. Here we integrate and analyse molecular data, which enables research in areas such as gene function and evolution. So I guess I'm still involved in evolutionary studies, but I'm using computers instead of digging for fossils in the dust.
What surprised you most about Ensembl when you started working here?
Ensembl, and the Wellcome Genome Campus in general, provides such a wonderfully international and collaborative environment to work in. I enjoy this immensely.
What is the coolest tool or data type in Ensembl that you think everybody should know about?
The alternate sequences that we have for the human genome and selected other species can be really informative. They allow us to annotate different alleles, and sometimes new genes altogether! I often use our Region Comparison view to compare two alleles.
For example, the primary assembly of human chromosome 1 contains no DNA sequence for the gene PRAMEF22. The Genome Reference Consortium (GRC), which maintains the human genome, has added an alternate sequence that provides the DNA underlying this gene, allowing us to annotate PRAMEF22. You can compare chromosome 1 to the alternate sequence here. The GRC has also released a patch for the ABO (blood group type) gene, which you can see here. Region Comparison is also available for comparisons between species, where it can be useful for tasks such as identifying assembly errors.