We think conferences are great opportunities to use talks and posters to tell people about all the cool stuff we’re developing, provide training with workshops and learn more about what’s going on in our fields of interest. Ensembl team members attend many conferences a year and 2018 is no exception: we’re planning to attend twenty (so far)!

We’re holding an Ensembl Perl API course at the Genome Campus in the UK in April. The course gives you the chance to learn how to access the databases directly, taught by the people who produce the databases and write the APIs themselves. It is aimed at bioinformaticians and wet-lab scientists who are familiar with Object Oriented Perl.

This four-day course costs only £140, which includes daily transport to the campus from Cambridge city centre and refreshments (the fee is to cover only these expenses).

Please visit the course page for more details on the content and how to apply.

Following the success of last year’s course, we’re pleased to announce a second Free Ensembl Webinar Course.

This course allows you to learn about Ensembl for free from the comfort of your own office (or bed, no-one’s judging you), with the ability to interact live with the instructors. Perfect for those who can’t attend or host one of our live courses.

What is it?

The Ensembl online training series comprises a series of live webinars, once a week over seven weeks. Each webinar explores a specific aspect of Ensembl data or tools with a presentation and a demonstration – see the online course for details. You can then practice what you’ve learnt over the following week with online exercises.

Not all of the topics will be useful to you, so you can dip in and out of the webinars. If life gets in the way and you miss one you are keen on, we will post the videos to our YouTube channel (and to YouKu for those of you in China) and embed them in the online course so that you can catch up.

What makes it special is that the course is fully interactive. If you attend the live webinars, you will have an opportunity to ask the instructors questions in real time. Afterwards, while you work on the exercises, you can interact with the instructors and other participants via our dedicated Facebook group. If you prefer not to use Facebook, you can also email us for help. Plus, you’ll be able to re-watch all or part of the videos at your leisure.

When is it?

We start on the 6th April, and will hold seven webinars on Thursday mornings, up until the 18th May. The live webinars will take place at 9 am BST (GMT+1), but if you are unable to attend live, the videos will be posted shortly afterwards. Since last year’s course was held in the afternoons, good for our American friends, we’re hoping that this morning course will be easier to access for anyone in Asia or Oceania.

After the live course finishes, we will leave the full course of recordings and exercises online, so that you can take it independently whenever you choose.

69.7% are very likely to recommend this course, 30.3% are likely to.

Is it any good?

We think so, but don’t take our word for it. Here’s what the attendees from last year had to say:

“Thank you. I really appreciate having access to this course. I’ve learned a lot.”

“Thank you so much for organising this. I really enjoyed!”

“Thank you; the course is very useful. I’m very happy”

How do I sign up?

You can visit the course pages to see what’s going on without signing up. If you want to attend the webinars live, you will need to sign up (or sign up here from China), but there’s no charge for doing so. You may also wish to join the Facebook group.

We think the Ensembl workshops that we offer are a brilliant way to familiarise yourself, and other people in your research institute, with Ensembl data and tools. But don’t take our word for it: over 99% of the people who attended our workshops in the first six months of 2016 would recommend them to a colleague.

Pie chart of who would recommend Ensembl workshops.

68% of participants say they are “Very Likely” to recommend our workshops, while 31% say they are “Likely” to.

So if you (and your colleagues) want to get training on Ensembl, the best option is to join the more than 50 institutes a year that host an Ensembl browser workshop, between them training over 1000 people. Send us an email to find out more.

We appreciate, however, that that’s not an option for everybody. Although we do not charge fees for workshops, we ask our hosts to pick up the tab for all expenses incurred, such as travel, accommodation and subsistence. While we do our best to reduce these costs, for example by grouping overseas workshops into a series, you may still not have the spare budget for this, or you may want to host a workshop on a different timetable than we can support.

In this case, if you are an experienced Ensembl user, you may consider teaching an Ensembl workshop of your own. We’d like to help. We want everybody learning about Ensembl to receive the most extensive and up-to-date training possible, and we believe that the second best way to do this (after having us do it) is with our support.

How can I teach a workshop?

All our workshops are hands-on, usually in a computer teaching room, although we can work with people bringing laptops (provided there is suitable WiFi and somewhere to charge them during the course). We recommend a similar set-up.

Our general style for a workshop is that we split it into modules. The modules we usually offer are:

  • Introduction to Ensembl and the Region view
  • Genes and transcripts in Ensembl
  • Data export with BioMart
  • Genetic variation data in Ensembl, including annotating your own variants with the VEP
  • Comparative genomics: homologues and whole genome alignments
  • The Ensembl Regulatory Build: finding features that regulate genes
  • Advanced access to Ensembl data and viewing custom data in Ensembl

Within each module, there are three elements:

  1. Presentation, where we introduce what the data or tool is and where it comes from.
  2. Demonstration, where we have a hands-on walkthrough of finding that data or using that tool. We give out printed booklets containing screenshots that take participants through these walkthroughs. This provides a suitable place to make extra notes, and gives participants something to take away and use later. Some participants choose to join in with the walkthroughs, while others just make notes while we go through it on the screen.
  3. Exercises, where participants can practice using Ensembl to find information. These exercises build on what we do in the demonstration. During the exercises, we circulate the room, ready to answer any question that might come up. We also provide answer sheets (usually electronic only) that guide the participants on how to get the answers and what they are.

We find that combining these three elements gives the participants all the information they need, and provides a holistic learning experience that appeals to different kinds of learning style. We think this is why we consistently get such excellent feedback from our course participants.

You can see an example of a course that follows this structure, with the full set of modules, each with three elements, in our webinar course that we held in Spring 2016. You’re free to harvest the presentations (embedded as pdfs), demonstrations and exercises from that course for your own teaching, although under our Creative Commons BY licence, you need to credit us with their creation.

While we would usually have the workshop as one intensive day of learning, the flexibility of being in your home institute might mean that you prefer to have a module a day over a number of days, or one a week.

Where do I get the materials from?

As well as the webinar course I already mentioned, we have a page of walkthroughs and exercises. We use these ourselves in workshop creation, copying and pasting them together to make our courses.

We like to tailor our courses to match our participants’ interests, so we try to use exercises and walkthroughs that feature the species they’re working with. This is why our exercise page has many similar exercises and walkthroughs with different species. We recommend finding suitable exercises and demos to match your group’s interests and skills. You can also copy the process and style of an existing walkthrough or exercise for an example in a new species of interest.

Because we only update these exercises and walkthroughs when we use them, they can get out-of-date. The “Updated” column on the tables shows you for which Ensembl or Ensembl Genomes release they were last updated. If the exercise or walkthrough you’re looking at is not from the current release (check the release news section of this blog to see what’s current – note the different release numbers for Ensembl and Ensembl Genomes), then you might want to check the content to see if it needs editing at all.

If you do any updates, or make any new exercises, we’d love to hear about it. Email us your new material and we’ll add it to the page for other people to use (and maybe steal it for ourselves too).

For presentations (in pdf) and to see how a whole workshop might fit together, all of the Outreach team post their course materials online for use during and after the courses. You can find these materials on our training pages.

Remember, if you use any of our materials, do credit us with their creation, as they are distributed under a CC BY licence.

What about website downtime?

Website downtime can be a disaster for a workshop, leaving you floundering with no way to teach. However, downtime is also an occasional necessity when we are running such a huge website and database. If you are planning on running a workshop, we recommend you get in touch to ask if there is any planned downtime. You can usually get around downtime by using one of our mirror sites, but we can advise you on this.

Similarly, if we put out a new release in the days between your preparation and workshop delivery, it can make some of your materials out-of-date. This can be a valuable lesson for your participants in how bioinformatic databases can change, or you can run the workshop from the previous archive site instead.

A release on the day of your workshop means both downtime and changes in the data. If we know you’re having a workshop, we’ll make sure that the archive site from the previous release is up and working before we take the main site down, so that you have something to work with, and you can keep using that even once the new site is up.

Need more help?

The Outreach team are here to support you. Just send us an email if you want practical support on how best to run a workshop, if you have any background questions on our data or tools, or indeed with any other questions or problems you might have with Ensembl.

Do you want to learn more about the Ensembl browser? Are you unable to host or attend an in-person Ensembl workshop? Do you still want to learn in real-time with instructors on hand to help you out?

The new Ensembl online training series might be for you.

What is it?

The Ensembl online training series consists of a series of live webinars, once a week over seven weeks. In each webinar you will learn about a specific aspect of Ensembl data or tools – see the online course for details. You will then have access to exercises so that you can practice what you’ve learnt.

You can dip in and out of webinars, taking only those that interest you. If you miss one, we will post the videos to our YouTube channel and embed them in the online course so that you can catch up.

What makes it special is that the course is fully interactive. If you attend the live webinars, you will have an opportunity to ask the instructors questions in real time. Afterwards, while you work on the exercises, you can interact with the instructors and other participants via our dedicated Facebook group. If you prefer not to use Facebook, you can also email us for help. Plus, you’ll be able to re-watch all or part of the videos at your leisure.

When is it?

We start on the 24th March, and will hold seven webinars on Thursday afternoons, up until the 5th May. The live webinars will take place at 4 pm British time (GMT before 27th March, BST after 27th March), but if you are unable to attend live, the videos will be posted shortly afterwards.

After the live course finishes, we will leave the full course of recordings and exercises online, so that you can take it independently whenever you choose.

How do I sign up?

You can visit the course pages to see what’s going on without signing up. If you want to attend the webinars live, you will need to sign up, but there’s no charge for doing so. You may also wish to join the Facebook group.

Ensembl Variation recently incorporated the latest versions of the dbSNP and 1000 Genomes datasets. While we were able to import all of the variant loci from phase 3 of the 1000 Genomes project, the vast amount of genotype data (2500 individuals x 80 million sites = 200 billion data points!!!) meant we had to create a new solution to deliver this data through our API and website.
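
As a quick sanity check of that figure, shell integer arithmetic reproduces it. This uses the rounded counts quoted above, not exact dataset sizes, and the variable names are only illustrative.

```shell
# Rounded figures quoted in the text, not exact dataset counts.
individuals=2500
sites=80000000   # 80 million variant sites

# Total genotype data points = individuals x sites.
genotypes=$((individuals * sites))
echo "$genotypes"   # prints 200000000000, i.e. 200 billion
```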

To this end we have extended the Ensembl Variation API to read genotype data directly from tabix-indexed VCF files. The API then calculates frequency and linkage disequilibrium (LD) data from these genotypes on-the-fly. You can see this in action on a typical population genetics page.

In order to use this functionality with your local API installation, there are a couple of extra dependencies to install. You may even have them already!

Tabix

The tabix utility is used for rapid random access into compressed position-based text files. It also allows access to data across HTTP and FTP protocols, downloading only a small index file in the process.

To install it, we clone it from GitHub and run a couple of “make” statements. From here on we assume that you typically install things in your $HOME/src/ directory and that you are using bash or a bash-like terminal.

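cd ~/src
# Clone over HTTPS so that no GitHub SSH key is needed
git clone https://github.com/samtools/tabix.git
cd tabix
# Build the tabix binary
make
# Build and install the Perl bindings under $HOME/src/
cd perl
perl Makefile.PL PREFIX=${HOME}/src/
make && make install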

You may need the tabix binary in your path; you can either copy ~/src/tabix/tabix to a directory in your path, or add this to your path:

PATH=${PATH}:${HOME}/src/tabix/
export PATH

If it isn’t already, you should also add the relevant path to your PERL5LIB environment variable; the path in question is shown in the output from the “make && make install” command above.

PERL5LIB=${PERL5LIB}:${HOME}/src/lib/perl/5.14.2/
export PERL5LIB

ensembl-io

The ensembl-io package contains objects and methods for parsing and writing data formats commonly used in bioinformatics. If you installed the API using Git and Ensembl Git tools, chances are you already have the module.

If not, it’s simple to install with git:

cd ~/src
git clone https://github.com/Ensembl/ensembl-io.git
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
export PERL5LIB
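
Before moving on, it can be worth checking that both dependencies are visible from your shell. The snippet below is a minimal sketch, not part of the Ensembl tooling: have_cmd and have_perl_module are hypothetical helper names, and it assumes that the tabix Perl bindings install a module called Tabix and that ensembl-io provides Bio::EnsEMBL::IO::Parser. Adjust the module names if your installation differs.

```shell
# Succeeds if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if Perl can load the named module with the current PERL5LIB.
have_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_cmd tabix; then
  echo "tabix binary: found"
else
  echo "tabix binary: not on PATH"
fi

# Module names are assumptions (see the note above).
for module in Tabix Bio::EnsEMBL::IO::Parser; do
  if have_perl_module "$module"; then
    echo "$module: loadable"
  else
    echo "$module: not loadable, check PERL5LIB"
  fi
done
```

A "not loadable" result for the ensembl-io module may also mean that the core Ensembl API is missing from PERL5LIB, since ensembl-io builds on it.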

Using in the API

That’s it! Now to use this in an API script, there’s a simple flag we have to set on the Variation DBAdaptor object:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');

# Tell API to use VCFs
$variation_adaptor->db->use_vcf(1);

my $variation = $variation_adaptor->fetch_by_name('rs699');
my $alleles = $variation->get_all_Alleles();

foreach my $allele (@{$alleles}) {
  next unless 
    (defined $allele->population) &&
    (defined $allele->frequency);
  my $allele_string = $allele->allele;
  my $frequency = $allele->frequency;
  my $population_name = $allele->population->name;
  printf("Allele %s has frequency %.3g in %s\n", $allele_string, $frequency, $population_name);
}

This script should print out frequency data for a number of populations, including those from 1000 Genomes phase 3:

....
Allele A has frequency 0.121 in 1000GENOMES:phase_3:KHV
Allele G has frequency 0.879 in 1000GENOMES:phase_3:KHV
Allele A has frequency 0.149 in 1000GENOMES:phase_3:JPT
Allele G has frequency 0.851 in 1000GENOMES:phase_3:JPT
Allele A has frequency 0.295 in 1000GENOMES:phase_3:ALL
Allele G has frequency 0.705 in 1000GENOMES:phase_3:ALL

You can use the “->db->use_vcf(1)” stub on any adaptor from the variation adaptor group.

Once set, it will affect fetching objects of the following types:

  • Allele
  • PopulationGenotype
  • IndividualGenotype
  • LDFeatureContainer

Advanced configuration

The value we pass to use_vcf() also affects the behaviour of the API:

  • 0 : fetch data only from database
  • 1 : fetch data from VCFs and database
  • 2 : fetch data only from VCFs

One final thing; the API is pre-configured to use VCFs hosted on the Ensembl FTP site. It is also possible to use VCFs on your local machine or any arbitrary server. The configuration is found in the ensembl-variation folder:

cat ~/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json
{
 "collections": [
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh37",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/grch37/release-79/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   },
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh38",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/release-80/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12","13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   }
 ]
}

Feel free to edit the filename_template entry in this file. Note that there are separate entries for the two currently supported human assemblies, GRCh37 and GRCh38; the API picks the relevant entry according to the port you connect to in your API script (3306 for GRCh38, 3337 for GRCh37), which you can set with the -port argument to load_registry_from_db.

“###CHR###” is a placeholder that allows the API to read from a set of files distributed as one per chromosome. This is not mandatory, and indeed a single genome-wide VCF file could be used. The only requirement is that the chromosomes contained in the VCF or set of VCFs are listed in the “chromosomes” field of the JSON configuration file.
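
To make the placeholder concrete, here is a small sketch of the substitution, done with sed rather than with the API's own code. expand_template is a hypothetical helper, and the shortened file name is illustrative, not a real path on the FTP site.

```shell
# Replace every ###CHR### placeholder in a template with a chromosome name.
# Usage: expand_template TEMPLATE CHROMOSOME
expand_template() {
  printf '%s\n' "$1" | sed "s/###CHR###/$2/g"
}

# Illustrative template; the real one lives in vcf_config.json.
template='ALL.chr###CHR###.genotypes.vcf.gz'

# Print the per-chromosome file name for a few chromosomes.
for chr in 1 2 22; do
  expand_template "$template" "$chr"
done
```

A template without the placeholder passes through unchanged, which corresponds to the single genome-wide VCF case described above.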

Any questions, don’t hesitate to get in touch!