We think conferences are great opportunities to use talks and posters to tell people about all the cool stuff we’re developing, provide training with workshops and learn more about what’s going on in our fields of interest. Ensembl team members attend many conferences a year and 2018 is no exception: we’re planning to attend nineteen (so far)!

We’re holding an Ensembl Perl API course at the Genome Campus in the UK in April. The course gives you the chance to learn how to access the databases directly from the people who produce them and write the APIs. It is aimed at bioinformaticians and wet-lab scientists who are familiar with object-oriented Perl.

This four-day course costs only £140, which includes daily transport to the campus from Cambridge city centre and refreshments (the fee is to cover only these expenses).

Please visit the course page for more details on the content and how to apply.

Following the success of last year’s course, we’re pleased to announce a second Free Ensembl Webinar Course.

This course allows you to learn about Ensembl for free from the comfort of your own office (or bed, no-one’s judging you), with the ability to interact live with the instructors. Perfect for those who can’t attend or host one of our live courses.

What is it?

The Ensembl online training series comprises live webinars, once a week over seven weeks. Each webinar explores a specific aspect of Ensembl data or tools with a presentation and a demonstration – see the online course for details. You can then practise what you’ve learnt over the following week with online exercises.

Not all of the topics will be useful to you, so you can dip in and out of the webinars. If life gets in the way and you miss one you are keen on, we will post the videos to our YouTube channel (and to YouKu for those of you in China) and embed them in the online course so that you can catch up.

What makes it special is that the course is fully interactive. If you attend the live webinars, you will have an opportunity to ask the instructors questions in real time. Afterwards, while you work on the exercises, you can interact with the instructors and other participants via our dedicated Facebook group. If you prefer not to use Facebook, you can also email us for help. Plus, you’ll be able to re-watch all or part of the videos at your leisure.

When is it?

We start on the 6th April, and will hold seven webinars on Thursday mornings, up until the 18th May. The live webinars will take place at 9 am BST (GMT+1), but if you are unable to attend live, the videos will be posted shortly afterwards. Since last year’s course was held in the afternoons, good for our American friends, we’re hoping that this morning course will be easier to access for anyone in Asia or Oceania.

After the live course finishes, we will leave the full course of recordings and exercises online, so that you can take it independently whenever you choose.

69.7% are very likely to recommend this course, 30.3% are likely to.

Is it any good?

We think so, but don’t take our word for it. Here’s what the attendees from last year had to say:

“Thank you. I really appreciate having access to this course. I’ve learned a lot.”

“Thank you so much for organising this. I really enjoyed!”

“Thank you; the course is very useful. I’m very happy”

How do I sign up?

You can visit the course pages to see what’s going on without signing up. If you want to attend the webinars live, you will need to sign up (or sign up here from China), but there’s no charge for doing so. You may also wish to join the Facebook group.

We think the Ensembl workshops that we offer are a brilliant way to familiarise yourself, and other people in your research institute, with Ensembl data and tools. Don’t just take our word for it: over 99% of the people who attended our workshops in the first six months of 2016 would recommend them to a colleague.

Pie chart of who would recommend Ensembl workshops.

68% of participants say they are “Very Likely” to recommend our workshops, while 31% say they are “Likely” to.

So if you (and your colleagues) want to get training on Ensembl, the best option is to join the more than 50 institutes a year that host an Ensembl browser workshop, together training over 1000 people. Send us an email to find out more.

We appreciate, however, that that’s not an option for everybody. Although we do not charge fees for workshops, we ask our hosts to pick up the tab for all expenses incurred, such as travel, accommodation and subsistence. While we do our best to reduce these costs, for example by grouping overseas workshops into a series, you may still not have the spare budget for this or may want to host a workshop on a different timetable than we can support.

In this case, if you are an experienced Ensembl user, you may consider teaching an Ensembl workshop of your own. We’d like to help. We want everybody learning about Ensembl to receive the most extensive and up-to-date training possible, and we believe that the second best way to do this (after having us do it) is with our support.

How can I teach a workshop?

All our workshops are hands-on, usually in a computer teaching room, although we can work with people bringing laptops (provided there is suitable WiFi and somewhere to charge them during the course). We recommend a similar set-up.

Our general style for a workshop is that we split it into modules. The modules we usually offer are:

  • Introduction to Ensembl and the Region view
  • Genes and transcripts in Ensembl
  • Data export with BioMart
  • Genetic variation data in Ensembl, including annotating your own variants with the VEP
  • Comparative genomics: homologues and whole genome alignments
  • The Ensembl Regulatory Build: finding features that regulate genes
  • Advanced access to Ensembl data and viewing custom data in Ensembl

Within each module, there are three elements:

  1. Presentation, where we introduce what the data or tool is and where it comes from.
  2. Demonstration, where we have a hands-on walkthrough of finding that data or using that tool. We give out printed booklets containing screenshots that take participants through these walkthroughs. This provides a suitable place to make extra notes, and gives participants something to take away and use later. Some participants choose to join in with the walkthroughs, while others just make notes while we go through it on the screen.
  3. Exercises, where participants can practise using Ensembl to find information. These exercises build on what we do in the demonstration. During the exercises, we circulate around the room, ready to answer any questions that might come up. We also provide answer sheets (usually electronic only) that guide the participants on how to get the answers and what they are.

We find that combining these three elements gives the participants all the information they need, and provides a holistic learning experience that appeals to different kinds of learning style. We think this is why we consistently get such excellent feedback from our course participants.

You can see an example of a course that follows this structure, with the full set of modules, each with three elements, in our webinar course that we held in Spring 2016. You’re free to harvest the presentations (embedded as pdfs), demonstrations and exercises from that course for your own teaching, although under our Creative Commons BY licence, you need to credit us with their creation.

While we would usually run the workshop as one intensive day of learning, the flexibility of being in your home institute might mean that you prefer to have a module a day over a number of days, or one a week.

Where do I get the materials from?

As well as the webinar course I already mentioned, we have a page of walkthroughs and exercises. We use these ourselves in workshop creation, copying and pasting them together to make our courses.

We like to tailor our courses to match our participants’ interests, so we try to use exercises and walkthroughs that feature the species they’re working with. This is why our exercise page has many similar exercises and walkthroughs with different species. We recommend finding suitable exercises and demos to match your group’s interests and skills. You can also copy the process and style of an existing walkthrough or exercise to create an example in a new species of interest.

Because we only update these exercises and walkthroughs when we use them, they can get out-of-date. The “Updated” column on the tables shows you for which Ensembl or Ensembl Genomes release they were last updated. If the exercise or walkthrough you’re looking at is not from the current release (check the release news section of this blog to see what’s current – note the different release numbers for Ensembl and Ensembl Genomes), then you might want to check the content to see if it needs editing at all.

If you do any updates, or make any new exercises, we’d love to hear about it. Email us your new material and we’ll add it to the page for other people to use (and maybe steal it for ourselves too).

For presentations (in PDF) and to see how a whole workshop might fit together, all of the Outreach team post their course materials online for use during and after the courses. You can see the materials on our training pages.

Remember, if you use any of our materials, do credit us with their creation, as they are distributed under a CC BY licence.

What about website downtime?

Website downtime can be a disaster for a workshop, leaving you floundering with no way to teach. However, downtime is also an occasional necessity when we are running such a huge website and database. If you are planning on running a workshop, we recommend you get in touch to ask if there is any planned downtime. You can usually get around downtime by using one of our mirror sites, but we can advise you on this.

Similarly, if we put out a new release in the days between your preparation and workshop delivery, it can make some of your materials out-of-date. This can be a valuable lesson for your participants in how bioinformatic databases can change, or you can run the workshop from the previous archive site instead.

A release on the day of your workshop means both downtime and changes in the data. If we know you’re having a workshop, we’ll make sure that the archive site from the previous release is up and working before we take the main site down, so that you have something to work with, and you can keep using that even once the new site is up.

Need more help?

The Outreach team are here to support you. Just send us an email if you want practical support on how best to run a workshop, if you have any background questions on our data or tools, or indeed with any other questions or problems you might have with Ensembl.

Do you want to learn more about the Ensembl browser? Are you unable to host or attend an in-person Ensembl workshop? Do you still want to learn in real-time with instructors on hand to help you out?

The new Ensembl online training series might be for you.

What is it?

The Ensembl online training series consists of live webinars, once a week over seven weeks. In each webinar you will learn about a specific aspect of Ensembl data or tools – see the online course for details. You will then have access to exercises so that you can practise what you’ve learnt.

You can dip in and out of webinars, taking only those that interest you. If you miss one, we will post the videos to our YouTube channel and embed them in the online course so that you can catch up.

What makes it special is that the course is fully interactive. If you attend the live webinars, you will have an opportunity to ask the instructors questions in real time. Afterwards, while you work on the exercises, you can interact with the instructors and other participants via our dedicated Facebook group. If you prefer not to use Facebook, you can also email us for help. Plus, you’ll be able to re-watch all or part of the videos at your leisure.

When is it?

We start on the 24th March, and will hold seven webinars on Thursday afternoons, up until the 5th May. The live webinars will take place at 4 pm British time (GMT before 27th March, BST after 27th March), but if you are unable to attend live, the videos will be posted shortly afterwards.

After the live course finishes, we will leave the full course of recordings and exercises online, so that you can take it independently whenever you choose.

How do I sign up?

You can visit the course pages to see what’s going on without signing up. If you want to attend the webinars live, you will need to sign up, but there’s no charge for doing so. You may also wish to join the Facebook group.

Ensembl Variation recently incorporated the latest versions of the dbSNP and 1000 Genomes datasets. While we were able to import all of the variant loci from phase 3 of the 1000 Genomes project, the vast amount of genotype data (2500 individuals × 80 million sites = 200 billion data points!) meant we had to create a new solution to deliver this data through our API and website.

To this end we have extended the Ensembl Variation API to read genotype data directly from tabix-indexed VCF files. The API then calculates frequency and linkage disequilibrium (LD) data from these genotypes on the fly. You can see this in action on a typical population genetics page in the browser.

To use this functionality with your local API installation, there are a couple of extra dependencies to install. You may even have them already!

Tabix

The tabix utility is used for rapid random access into compressed position-based text files. It also allows access to data across HTTP and FTP protocols, downloading only a small index file in the process.

To install it, we clone it from GitHub and run a couple of “make” statements. From here on we assume that you typically install things in your $HOME/src/ directory and that you are using bash or a bash-like terminal.

cd ~/src
git clone git@github.com:samtools/tabix.git
cd tabix
make
cd perl
perl Makefile.PL PREFIX=${HOME}/src/
make && make install

You may need the tabix binary in your path; you can either copy ~/src/tabix/tabix to a directory in your path, or add this to your path:

PATH=${PATH}:${HOME}/src/tabix/
export PATH
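With the binary on your path, a remote region query can be sketched roughly as follows. This is an illustration only: the region helper and the coordinates are placeholders of our own rather than anything shipped with tabix, and the URL is the GRCh38 template from the configuration file shown later in this post, with chromosome 1 filled in.

```shell
#!/usr/bin/env bash
# Build a tabix region string of the form "chromosome:start-end".
# (Hypothetical helper, for illustration only.)
region() {
  printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# GRCh38 1000 Genomes phase 3 file for chromosome 1 on the Ensembl FTP site.
vcf_url='ftp://ftp.ensembl.org/pub/release-80/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz'

# Placeholder coordinates; substitute the region you care about.
query="$(region 1 1000000 1001000)"

# Print the command instead of running it, so the sketch works offline.
# Remove the "echo" to actually fetch the matching records.
echo tabix "$vcf_url" "$query"
```

When run for real, tabix should fetch the small index file and the records overlapping the region rather than the whole VCF.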

If it isn’t there already, you should also add the relevant path to your PERL5LIB environment variable; the path in question is shown in the output from the “make && make install” command above. The Perl version in the example below will differ depending on your system.

PERL5LIB=${PERL5LIB}:${HOME}/src/lib/perl/5.14.2/
export PERL5LIB

ensembl-io

The ensembl-io package contains objects and methods for parsing and writing data formats commonly used in bioinformatics. If you installed the API using Git and Ensembl Git tools, chances are you already have the module.

If not, it’s simple to install with git:

cd ~/src
git clone git@github.com:Ensembl/ensembl-io.git
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
export PERL5LIB
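Before moving on, it can be worth checking that both dependencies are visible from your shell. The sketch below is one way to do that. The two helper functions are our own, and the module names (Tabix for the tabix Perl bindings, Bio::EnsEMBL::IO::Parser for ensembl-io) are assumptions about what each package installs, so adjust them if your copies differ.

```shell
#!/usr/bin/env bash
# Report whether a command is available on PATH.
check_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found on PATH"
  fi
}

# Report whether a Perl module can be loaded with the current PERL5LIB.
check_perl_module() {
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    echo "$1: loadable"
  else
    echo "$1: NOT loadable (check PERL5LIB)"
  fi
}

check_command tabix
check_perl_module Tabix                      # assumed module name
check_perl_module Bio::EnsEMBL::IO::Parser   # assumed module name
```

If any line reports a problem, revisit the PATH and PERL5LIB settings above before trying the API script.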

Using in the API

That’s it! Now to use this in an API script, there’s a simple flag we have to set on the Variation DBAdaptor object:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');

# Tell API to use VCFs
$variation_adaptor->db->use_vcf(1);

my $variation = $variation_adaptor->fetch_by_name('rs699');
my $alleles = $variation->get_all_Alleles();

foreach my $allele (@{$alleles}) {
  next unless 
    (defined $allele->population) &&
    (defined $allele->frequency);
  my $allele_string = $allele->allele;
  my $frequency = $allele->frequency;
  my $population_name = $allele->population->name;
  printf("Allele %s has frequency %.3g in %s\n", $allele_string, $frequency, $population_name);
}

This script should print out frequency data for a number of populations, including those from 1000 Genomes phase 3:

....
Allele A has frequency 0.121 in 1000GENOMES:phase_3:KHV
Allele G has frequency 0.879 in 1000GENOMES:phase_3:KHV
Allele A has frequency 0.149 in 1000GENOMES:phase_3:JPT
Allele G has frequency 0.851 in 1000GENOMES:phase_3:JPT
Allele A has frequency 0.295 in 1000GENOMES:phase_3:ALL
Allele G has frequency 0.705 in 1000GENOMES:phase_3:ALL

You can use the “->db->use_vcf(1)” stub on any adaptor from the variation adaptor group.

Once set, it will affect fetching objects of the following types:

  • Allele
  • PopulationGenotype
  • IndividualGenotype
  • LDFeatureContainer

Advanced configuration

The value we pass to use_vcf() also affects the behaviour of the API:

  • 0 : fetch data only from database
  • 1 : fetch data from VCFs and database
  • 2 : fetch data only from VCFs

One final thing: the API is pre-configured to use VCFs hosted on the Ensembl FTP site. It is also possible to use VCFs on your local machine or any arbitrary server. The configuration is found in the ensembl-variation folder:

cat ~/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json
{
 "collections": [
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh37",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/grch37/release-79/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   },
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh38",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/release-80/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12","13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   }
 ]
}

Feel free to edit the filename_template entry in this file. Note there are separate entries for the two currently supported human assemblies, GRCh37 and GRCh38; the relevant entries will be used depending on which port you connect to in your API script (3306 for GRCh38, 3337 for GRCh37).
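The assembly-to-port mapping described above can be sketched as a small lookup. The function name is hypothetical; only the two port numbers come from this post.

```shell
#!/usr/bin/env bash
# Map a human assembly name to the ensembldb.ensembl.org port that serves it.
# (Hypothetical helper; the port numbers are the ones quoted in the post.)
port_for_assembly() {
  case "$1" in
    GRCh38) echo 3306 ;;
    GRCh37) echo 3337 ;;
    *) echo "unsupported assembly: $1" >&2; return 1 ;;
  esac
}

for assembly in GRCh38 GRCh37; do
  echo "$assembly -> port $(port_for_assembly "$assembly")"
done
```

In an API script, the chosen value would be passed as the -port argument to load_registry_from_db, alongside -host and -user.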

“###CHR###” is a placeholder that allows the API to read from a set of files distributed as one per chromosome. This is not mandatory, and indeed a single genome-wide VCF file could be used. The only requirement is that the chromosomes contained in the VCF or set of VCFs are listed in the “chromosomes” field of the JSON configuration file.
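To make the placeholder concrete, here is a minimal sketch of how a template expands into one filename per chromosome. It only mimics the substitution and is not the API's own code; the function name and the local path are made up for the example.

```shell
#!/usr/bin/env bash
# Replace every ###CHR### in a filename template with a chromosome name.
# (Hypothetical helper; assumes simple chromosome names such as 1, 22 or X.)
expand_template() {
  printf '%s\n' "$1" | sed "s/###CHR###/$2/g"
}

# Hypothetical local template, one VCF per chromosome.
template='/data/vcf/my_study.chr###CHR###.vcf.gz'

for chr in 1 2 X; do
  expand_template "$template" "$chr"
done
```

A template without the placeholder comes back unchanged, which corresponds to the single genome-wide VCF case.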

If you have any questions, don’t hesitate to get in touch!

Are you a rat person, i.e. do you work on rat?
Are you joining the 9th Rat Genomics and Models conference in December?
Could you spare another day after the meeting before heading back home?

If so, this post is for you!

Ensembl is extremely pleased to announce that for the first time ever we will be running a workshop specifically targeted at the rat community! The timing could not be more perfect as we have just released the first set of golden genes in rat, i.e. the merge between the Ensembl automatic and the Havana manual annotation.

The rat genome and golden genes in Ensembl.

The ‘Ensembl workshop: browser and tools for accessing the Rat genome’ will consist of talks by different members of the Ensembl team, live demos and hands-on exercises.

Registration is free on a first come, first served basis by filling out this form.

The only pre-requisites are a general knowledge of molecular biology and genomics, in addition to familiarity with web-based genome browsers.

The detailed programme is as follows:

  • Day I 04/12/14 (14:00-18:00)

Ensembl Project: Introduction
Ensembl Browser: Live demo
Ensembl Tools: BLAST/BLAT, BioMart

  • Day II 05/12/14 (09:30-13:30)

Ensembl Genebuild: Annotating rat genes
Ensembl Variation: Sequence variants in the rat genome
Ensembl Tools: VEP, REST
Workshop wrap up and feedback

Please note that attendees of the 9th Rat Genomics and Models conference will be prioritised for this workshop. If there are still spaces available we will open attendance to a wider audience. The maximum number of participants is 30.

The workshop will take place in the beautiful grounds of the Wellcome Trust Genome Campus in Hinxton.

The Wellcome Trust Genome Campus on a snowy day in winter.

 

If you are working on large sets of genomic data or carrying out detailed and complex bioinformatic analyses, keep on reading.

Do any of the following thoughts ring a bell for you?

  1. I’d love to fetch protein coding genes from my species of interest.
  2. It’d be great to be able to get orthologues of the genes I’m working on.
  3. I want to find out if my sequence variants fall in regulatory regions and I want to know it now!

If so, the Ensembl Perl APIs are the way to go!

We can teach API workshops at your institution

We offer Perl API workshops on a regular basis. Our last off-site course was at the Roslin Institute in Edinburgh. We had a whopping 26 attendees. Four members of our Ensembl team, namely Magali Ruffier, Laurent Gil, Thomas Juettemann, and Stephen Fitzgerald delivered the modules on the Core, Variation, Regulation and Comparative Genomic aspects of the Ensembl database. Have a look at some of the feedback we had:

  • ‘Skills from the workshops have opened up my options for accessing Ensembl data which will allow me to more efficiently cross compare information’
  • ‘I will be retrieving specific data more efficiently now’
  • ‘It is quite easy to retrieve the whole set of exons from the genome with several lines of Perl script’
  • ‘The regulatory features can be easily fetched by chromosomal location and that helps me looking at over-expressed regions in my RNA-Seq experiments’

Thomas Juettemann from the Ensembl Regulation team and his happy crowd!

How can you host an API workshop at your institution? Just get in touch.  We request that travel, accommodation and subsistence costs of the instructor(s) are reimbursed by our hosts.

API workshop in Cambridge, UK

If you are in or around the UK at the end of this year, you may want to sign up for our next API course at the University of Cambridge. It’ll take place on December 2nd-5th and places are still available. For more information and registration please have a look at the course description.

If these dates are no good, don’t despair. We have got a couple of API courses already lined up for 2015. Check our calendar to see where we are going next.

More information on our APIs

The Ensembl project provides a comprehensive set of APIs (Application Programming Interfaces) that allow our users to access genome-wide information efficiently and quickly. Our APIs are of two types: Perl and REST.

Find out more about the Ensembl Perl APIs on our help and documentation page and watch our filmed course. For tips on how to install the API via Git and FTP, have a look at our YouTube video.

The Genome Reference Consortium (GRC) is a collaboration between the EMBL-EBI, NCBI, Wellcome Trust Sanger Institute, and Genome Institute at Washington University. They are responsible for maintaining the human, mouse and zebrafish reference genome assemblies that you can see in Ensembl, including updating to new assemblies such as the new human assembly GRCh38. They have also been developing methods that allow for the representation of different sequence paths for loci where allelic diversity is needed (PLoS Biol. 2011 Jul:9(7):e1001091).

The GRC would like to invite you to a highly technical workshop, which is planned for the morning of Sunday 21st September. The workshop will be chaired by the Wellcome Trust Sanger Institute’s Richard Durbin and Deanna Church from Personalis. Members of the GRC will present and discuss a range of topics including:

  • Alignment/Mapping tools for using the full assembly: distinguishing allelic duplication from paralogous duplication.
  • Representing alignment data in BAM files.
  • Variant calling.
  • Representing variant calls in VCF (or other formats).
  • Reporting results to users in biologically friendly ways.
  • Relationship to parallel interests in the Global Alliance for Genomics and Health (GA4GH) Data Working Group.

The GRC workshop is open to everybody, not just Genome Informatics conference attendees. The workshop is free to attend, but there are limited places so please register if you’d like to come along.

Other events

The 14th Genome Informatics conference will be held at Churchill College, Cambridge, UK, and Ensembl will be there. In addition to the Genome Reference Consortium workshop, we will also be at:

  • The Livestock Genomics meeting (18 – 20 September)
  • Workshop introducing Ensembl’s automatic gene annotation system (19 – 20 September)

Do you want to annotate genes and transcripts of your favourite genome?
Will you be in Cambridge (UK) for the Genome Informatics 2014 meeting?
Have you worked with the Unix command line?

If your answer is yes to any of the above, you may want to attend our ‘Introduction to Ensembl automatic gene annotation’ workshop on 19-20th of September 2014. Registration is free, but participants need to cover their own accommodation, sustenance and transport expenses.

THIS COURSE IS NOW FULL. Registration is closed.

The workshop

Dan and Fergal from the Ensembl Genebuild team will show how to create your own core database for genome annotation, load a genome assembly and run some of the analyses using the Ensembl genebuild system.

Pre-requisites

Unix (or Linux) knowledge is mandatory. Participants are also expected to have some knowledge of relational databases (e.g. MySQL) and object-oriented programming (the Ensembl API uses Perl).

Topics

  • Introduction to the Ensembl genebuild system, including data input types, how to generate protein-coding transcript models, and add UTR to these models
  • Introduction to assembly structure (toplevel, contigs, scaffolds, chromosomes)
  • Core database schema
  • Tracking jobs in the system
  • Runnable and RunnableDB modules

Practical sessions

  • Creating a genebuild database
  • Loading an assembly into the database
  • Running algorithms first on the command line and then using the pipeline
  • Understanding how the pipeline code interacts with the algorithms and the database
  • Understanding the pipeline’s job tracking system
  • Visualisation of results with Apollo

Genome Informatics 2014

Our Ensembl Gene Annotation workshop will precede this year’s Genome Informatics conference taking place in Cambridge (UK) on 21-24th September.

Please click here for more details on Genome Informatics 2014, including deadlines and programme.