Do you want to learn more about the Ensembl browser? Are you unable to host or attend an in-person Ensembl workshop? Do you still want to learn in real-time with instructors on hand to help you out?

The new Ensembl online training series might be for you.

What is it?

The Ensembl online training series consists of live webinars, held once a week over seven weeks. In each webinar you will learn about a specific aspect of Ensembl data or tools – see the online course for details. You will then have access to exercises so that you can practise what you’ve learnt.

You can dip in and out of webinars, taking only those that interest you. If you miss one, we will post the videos to our YouTube channel and embed them in the online course so that you can catch up.

What makes it special is that the course is fully interactive. If you attend the live webinars, you will have an opportunity to ask the instructors questions in real time. Afterwards, while you work on the exercises, you can interact with the instructors and other participants via our dedicated Facebook group. If you prefer not to use Facebook, you can also email us for help. Plus, you’ll be able to re-watch all or part of the videos at your leisure.

When is it?

We start on the 24th March, and will hold seven webinars on Thursday afternoons, up until the 5th May. The live webinars will take place at 4 pm British time (GMT before 27th March, BST after 27th March), but if you are unable to attend live, the videos will be posted shortly afterwards.

After the live course finishes, we will leave the full course of recordings and exercises online, so that you can take it independently whenever you choose.

How do I sign up?

You can visit the course pages to see what’s going on without signing up. If you want to attend the webinars live, you will need to sign up, but there’s no charge for doing so. You may also wish to join the Facebook group.

Ensembl Variation recently incorporated the latest versions of the dbSNP and 1000 Genomes datasets. While we were able to import all of the variant loci from phase 3 of the 1000 Genomes project, the vast amount of genotype data (2500 individuals x 80 million sites = 200 billion data points!) meant we had to create a new solution to deliver this data through our API and website.
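As a quick sanity check on the scale quoted above, the arithmetic can be reproduced in the shell. This is only an illustrative sketch: the function name genotype_count is made up for this example, and the inputs are the rounded figures from the text.

```shell
# Hypothetical helper (not part of Ensembl):
# number of genotype data points = individuals x variant sites
genotype_count() {
  echo $(( $1 * $2 ))
}

# Rounded figures from the text: 2500 individuals, 80 million sites
genotype_count 2500 80000000
```

This prints 200000000000, i.e. the 200 billion data points mentioned above.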

To this end we have extended the Ensembl Variation API to read genotype data directly from tabix-indexed VCF files. The API then calculates frequency and linkage disequilibrium (LD) data from these genotypes on-the-fly. You can see this in action on a typical population genetics page:
[Screenshot: a population genetics page]
To use this functionality with your local API installation, there are a couple of extra dependencies to install. You may even have them already!

Tabix

The tabix utility is used for rapid random access into compressed position-based text files. It also allows access to data across HTTP and FTP protocols, downloading only a small index file in the process.

To install it, we clone it from GitHub and run a couple of “make” statements. From here on we assume that you typically install things in your $HOME/src/ directory and that you are using bash or a bash-like terminal.

cd ~/src
git clone https://github.com/samtools/tabix.git
cd tabix
make
cd perl
perl Makefile.PL PREFIX=${HOME}/src/
make && make install

You may need the tabix binary in your path; you can either copy ~/src/tabix/tabix to a directory in your path, or add this to your path:

PATH=${PATH}:${HOME}/src/tabix/
export PATH

If it isn’t already there, you should also add the relevant path to your PERL5LIB environment variable; the path in question is shown in the output from the “make && make install” command above, and will depend on your Perl version.

PERL5LIB=${PERL5LIB}:${HOME}/src/lib/perl/5.14.2/
export PERL5LIB

ensembl-io

The ensembl-io package contains objects and methods for parsing and writing data formats commonly used in bioinformatics. If you installed the API using Git and Ensembl Git tools, chances are you already have the module.

If not, it’s simple to install with git:

cd ~/src
git clone https://github.com/Ensembl/ensembl-io.git
PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
export PERL5LIB

Using it in the API

That’s it! Now to use this in an API script, there’s a simple flag we have to set on the Variation DBAdaptor object:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous'
);

my $variation_adaptor = $registry->get_adaptor('homo_sapiens', 'variation', 'variation');

# Tell API to use VCFs
$variation_adaptor->db->use_vcf(1);

my $variation = $variation_adaptor->fetch_by_name('rs699');
my $alleles = $variation->get_all_Alleles();

foreach my $allele (@{$alleles}) {
  next unless 
    (defined $allele->population) &&
    (defined $allele->frequency);
  my $allele_string = $allele->allele;
  my $frequency = $allele->frequency;
  my $population_name = $allele->population->name;
  printf("Allele %s has frequency %.3g in %s\n", $allele_string, $frequency, $population_name);
}

This script should print out frequency data for a number of populations, including those from 1000 Genomes phase 3:

....
Allele A has frequency 0.121 in 1000GENOMES:phase_3:KHV
Allele G has frequency 0.879 in 1000GENOMES:phase_3:KHV
Allele A has frequency 0.149 in 1000GENOMES:phase_3:JPT
Allele G has frequency 0.851 in 1000GENOMES:phase_3:JPT
Allele A has frequency 0.295 in 1000GENOMES:phase_3:ALL
Allele G has frequency 0.705 in 1000GENOMES:phase_3:ALL

You can use the “->db->use_vcf(1)” stub on any adaptor from the variation adaptor group.

Once set, it will affect fetching objects of the following types:

  • Allele
  • PopulationGenotype
  • IndividualGenotype
  • LDFeatureContainer

Advanced configuration

The value we pass to use_vcf() also affects the behaviour of the API:

  • 0 : fetch data only from database
  • 1 : fetch data from VCFs and database
  • 2 : fetch data only from VCFs

One final thing; the API is pre-configured to use VCFs hosted on the Ensembl FTP site. It is also possible to use VCFs on your local machine or any arbitrary server. The configuration is found in the ensembl-variation folder:

cat ~/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json
{
 "collections": [
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh37",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/grch37/release-79/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   },
   {
     "id": "1000genomes_phase3",
     "species": "homo_sapiens",
     "assembly": "GRCh38",
     "type": "remote",
     "strict_name_match": 1,
     "filename_template": "ftp://ftp.ensembl.org/pub/release-80/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
     "chromosomes": [
       "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12","13", "14", "15", "16", "17", "18", "19", "20", "21", "22"
     ],
     "individual_prefix": "1000GENOMES:phase_3:"
   }
 ]
}

Feel free to edit the filename_template entry in this file. Note there are separate entries for the two currently supported human assemblies, GRCh37 and GRCh38; the relevant entries will be used depending on which port you connect to in your API script (3306 for GRCh38, 3337 for GRCh37).

“###CHR###” is a placeholder that allows the API to read from a set of files distributed as one per chromosome. This is not mandatory, and indeed a single genome-wide VCF file could be used. The only requirement is that the chromosomes contained in the VCF or set of VCFs are listed in the “chromosomes” field of the JSON configuration file.
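To make the placeholder idea concrete, here is a minimal sketch of the substitution in the shell. It only illustrates the concept and is not the API's own implementation; the function name expand_template and the shortened template path are made up for this example.

```shell
# Hypothetical helper: substitute a chromosome name into a filename template
expand_template() {
  tpl=$1
  name=$2
  printf '%s\n' "$tpl" | sed "s/###CHR###/${name}/g"
}

# Shortened, made-up template; the real entries live in vcf_config.json
template='/data/vcf/ALL.chr###CHR###.genotypes.vcf.gz'
for chr in 1 2 22; do
  expand_template "$template" "$chr"
done
```

A template without the placeholder, such as a single genome-wide file, passes through unchanged. This sketch assumes plain chromosome names like those in the configuration above; names containing characters that are special to sed (for example “/” or “&”) would need escaping.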

Any questions, don’t hesitate to get in touch!

Are you a rat person, i.e. do you work on rat?
Are you joining the 9th Rat Genomics and Models conference in December?
Could you spare another day after the meeting before heading back home?

If so, this post is for you!

Ensembl is extremely pleased to announce that for the first time ever we will be running a workshop specifically targeted at the rat community! The timing could not be more perfect as we have just released the first set of golden genes in rat, i.e. the merge between the Ensembl automatic and the Havana manual annotation.

The rat genome and golden genes in Ensembl.

The ‘Ensembl workshop: browser and tools for accessing the Rat genome’ will consist of talks by different members of the Ensembl team, live demos and hands-on exercises.

Registration is free on a first come, first served basis by filling out this form.

The only pre-requisites are a general knowledge of molecular biology and genomics, in addition to familiarity with web-based genome browsers.

The detailed programme is shown below:

  • Day I 04/12/14 (14:00-18:00)

Ensembl Project: Introduction
Ensembl Browser: Live demo
Ensembl Tools: BLAST/BLAT, BioMart

  • Day II 05/12/14 (09:30-13:30)

Ensembl Genebuild: Annotating rat genes
Ensembl Variation: Sequence variants in the rat genome
Ensembl Tools: VEP, REST
Workshop wrap up and feedback

Please note that attendees of the 9th Rat Genomics and Models conference will be prioritised for this workshop. If there are still spaces available, we will open attendance to a wider audience. The maximum number of participants is 30.

The workshop will take place in the beautiful grounds of Wellcome Trust Genome Campus in Hinxton.

The Wellcome Trust Genome Campus on a snowy day in winter.

 

If you are working on large sets of genomic data or carrying out detailed and complex bioinformatic analyses, keep on reading.

Do any of the following thoughts ring a bell for you?

  1. I’d love to fetch protein coding genes from my species of interest.
  2. It’d be great to be able to get orthologues of the genes I’m working on.
  3. I want to find out if my sequence variants fall in regulatory regions and I want to know it now!

If so, the Ensembl Perl APIs are the way to go!

We can teach API workshops at your institution

We offer Perl API workshops on a regular basis. Our last off-site course was at the Roslin Institute in Edinburgh. We had a whopping 26 attendees. Four members of our Ensembl team, namely Magali Ruffier, Laurent Gil, Thomas Juettemann and Stephen Fitzgerald, delivered the modules on the Core, Variation, Regulation and Comparative Genomics aspects of the Ensembl database. Have a look at some of the feedback we had:

  • ‘Skills from the workshops have opened up my options for accessing Ensembl data which will allow me to more efficiently cross compare information’
  • ‘I will be retrieving specific data more efficiently now’
  • ‘It is quite easy to retrieve the whole set of exons from the genome with several lines of Perl script’
  • ‘The regulatory features can be easily fetched by chromosomal location and that helps me looking at over-expressed regions in my RNA-Seq experiments’

Thomas Juettemann from the Ensembl Regulation team and his happy crowd!

How can you host an API workshop at your institution? Just get in touch. We request that travel, accommodation and subsistence costs of the instructor(s) are reimbursed by our hosts.

API workshop in Cambridge, UK

If you are in or around the UK at the end of this year, you may want to sign up for our next API course at the University of Cambridge. It’ll take place on December 2nd-5th and places are still available. For more information and registration please have a look at the course description.

If these dates are no good, don’t despair. We have got a couple of API courses already lined up for 2015. Check our calendar to see where we are going next.

More information on our APIs

The Ensembl project provides a comprehensive set of APIs (Application Programming Interfaces) that allow our users to access genome-wide information quickly and efficiently. Our APIs are of two types: Perl and REST.

Find out more about the Ensembl Perl APIs on our help and documentation page and watch our filmed course. For tips on how to install the API via Git and FTP, have a look at our YouTube video.

The Genome Reference Consortium (GRC) is a collaboration between the EMBL-EBI, NCBI, Wellcome Trust Sanger Institute, and Genome Institute at Washington University. They are responsible for maintaining the human, mouse and zebrafish reference genome assemblies that you can see in Ensembl, including updating to new assemblies such as the new human assembly GRCh38. They have also been developing methods that allow for the representation of different sequence paths for loci where allelic diversity is needed (PLoS Biol. 2011 Jul:9(7):e1001091).

The GRC would like to invite you to a highly technical workshop, which is planned for the morning of Sunday 21st September. The workshop will be chaired by the Wellcome Trust Sanger Institute’s Richard Durbin and Deanna Church from Personalis. Members of the GRC will present and discuss a range of topics including:

  • Alignment/Mapping tools for using the full assembly: distinguishing allelic duplication from paralogous duplication.
  • Representing alignment data in BAM files.
  • Variant calling.
  • Representing variant calls in VCF (or other formats).
  • Reporting results to users in biologically friendly ways.
  • Relationship to parallel interests in the Global Alliance for Genomics and Health (GA4GH) Data Working Group.

The GRC workshop is open to everybody, not just Genome Informatics conference attendees. The workshop is free to attend, but there are limited places so please register if you’d like to come along.

Other events

The 14th Genome Informatics conference will be held at Churchill College, Cambridge, UK, and Ensembl will be there. In addition to the Genome Reference Consortium workshop, we will also be at:

  • The Livestock Genomics meeting (18 – 20 September)
  • Workshop introducing Ensembl’s automatic gene annotation system (19 – 20 September)

Do you want to annotate genes and transcripts of your favourite genome?
Will you be in Cambridge (UK) for the Genome Informatics 2014 meeting?
Have you worked with the Unix command line?

If your answer is yes to any of the above, you may want to attend our ‘Introduction to Ensembl automatic gene annotation’ workshop on 19-20th of September 2014. Registration is free, but participants need to cover their own accommodation, sustenance and transport expenses.

THIS COURSE IS NOW FULL. Registration is closed.

The workshop

Dan and Fergal from the Ensembl Genebuild team will show how to create your own core database for genome annotation, load a genome assembly and run some of the analyses using the Ensembl genebuild system.

Pre-requisites

Unix (or Linux) knowledge is mandatory. Participants are also expected to have some knowledge of relational databases (e.g. MySQL) and object-oriented programming (the Ensembl API uses Perl).

Topics

  • Introduction to the Ensembl genebuild system, including data input types, how to generate protein-coding transcript models, and add UTR to these models
  • Introduction to assembly structure (toplevel, contigs, scaffolds, chromosomes)
  • Core database schema
  • Tracking jobs in the system
  • Runnable and RunnableDB modules

Practical sessions

  • Creating a genebuild database
  • Loading an assembly into the database
  • Running algorithms first on the command line and then using the pipeline
  • Understanding how the pipeline code interacts with the algorithms and the database
  • Understanding the pipeline’s job tracking system
  • Visualisation of results with Apollo

Genome Informatics 2014

Our Ensembl Gene Annotation workshop will precede this year’s Genome Informatics conference taking place in Cambridge (UK) on 21-24th September.

Please click here for more details on Genome Informatics 2014, including deadlines and programme.

The Cold Spring Harbor Laboratory will be hosting a winter conference on Avian Model Systems in March this year, and the abstract deadline is fast approaching.

Prior to the meeting, the EMBL-EBI and the WTSI will run a two-day workshop on Avian Genomics with a focus on analyses of NGS data, such as RNA-Seq, ChIP-Seq, and on Ensembl Genome Browsing.

In the current version of Ensembl (release 74, December 2013), we provide detailed annotations of genes, transcripts and proteins for five birds, namely chicken, duck, zebra finch, flycatcher and turkey. On our Pre Ensembl, we also display the preliminary analysis of the budgerigar genome.

Our gene annotation in Ensembl is built based on biological evidence that has been experimentally validated, such as mRNA, ESTs and proteins. For two out of the five birds listed above (i.e. chicken and flycatcher), we also used RNA-Seq data for the annotation of their genomes.

During this Ensembl Browser Workshop, we will be navigating the Ensembl browser to cover gene annotation, variation and comparative genomics data, and we will also introduce some of our genomic tools, such as BioMart and the VEP.

The deadline for abstract submission to the Meeting and the Pre-Meeting Genomic Workshop is January 24th.

If you want to attend this workshop, please contact Val Pakaluk.

 

We’re pleased to announce the launch of our new online API course.

Take it for free on EBI’s Train Online platform.

This course provides an introduction to the Ensembl API and how to use it to explore Ensembl gene, sequence, variation, regulation and comparative genomics data. It was filmed over a three-day course at the EMBL-EBI. You can take the course from start to finish, or you can dip in and out of your favourite modules. The course is complete with video lectures and exercises with full solutions, including sample scripts and commentary.

Many thanks to all the instructors who were filmed for this course: Magali Ruffier, Anja Thormann, Nathan Johnson, Matthieu Muffato and Stephen Fitzgerald. The course was compiled by Ensembl’s Emily Pritchard, working with Mark Adams from EBI’s Outreach and Training, who filmed, edited and processed the videos and slides.

If you like the course, you might want to host a real-life course at your institute.

Any questions? Contact Ensembl helpdesk.

Ensembl is holding a workshop titled, ‘Introduction to automatic gene annotation’ aimed at developers. The workshop runs on 29-30th of October 2013 at Cold Spring Harbor Laboratory, New York.

Registration for this workshop is free, but participants will need to cover their own accommodation and meal expenses. Please contact Bert (bert@ebi.ac.uk) for more details or to register.

Two Ensembl developers will present sessions on how to create your own core database, including the loading of a genome assembly into a database and the running of simple analyses using the Ensembl genebuild system.

Participants will be expected to have experience in programming and a background in object-oriented programming. Good familiarity with Perl, a Unix/Linux environment and MySQL is essential to follow the workshop and the programming examples. Knowledge of the Ensembl core API is also essential.

Topics to be presented:

  • Introduction to the Ensembl genebuild system, including data input types, generating protein-coding transcript models, and adding UTR to these models
  • An introduction to assembly structure (toplevel, contigs, scaffolds, chromosomes)
  • Overview of the Ensembl Analysis and Pipeline APIs
  • Obtaining the Ensembl API (cvs checkout)
  • Core database schema
  • Tracking jobs in the system
  • Runnable and RunnableDB modules

Practical sessions:

  • Creating a genebuild database
  • Loading an assembly into the database
  • Running algorithms first on the command line and then using the pipeline
  • Understanding how the pipeline code interacts with the algorithms and the database
  • Understanding the pipeline’s job tracking system
  • Visualisation of results with Apollo

Would you like to join us? Please contact Bert (bert@ebi.ac.uk) for more details or to register.

Related Cold Spring Harbor Conference:
Genome Informatics 2013, 30 October to 2nd November, Cold Spring Harbor, New York. Please click here for full details.

In May the Ensembl team will again provide a 3-day Ensembl Perl API workshop on the Wellcome Trust Genome Campus in Hinxton, United Kingdom. Although this workshop is primarily meant for campus employees, external participants are also more than welcome to attend.

The workshop itself is free of charge. You should note though that our campus is located a bit in the middle of nowhere and that you have to make your own arrangements for accommodation and/or travel. A similar workshop will be given again at the end of the year at the University of Cambridge (27-29 November 2013).

For more information and to register please mail me (bert@ebi.ac.uk).

3-DAY ENSEMBL API WORKSHOP
Time: 22-24 May 2013, 9:30-17:00
Place: Teaching room, EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Instructors: Magali Ruffier, Thomas Juettemann, Anja Thormann, Matthieu Muffato, Javier Herrero
Cost: none

The Ensembl project provides a comprehensive and integrated source of annotation of mainly vertebrate genome sequences. This 3-day workshop is aimed at researchers and developers interested in exploring Ensembl beyond the website. The workshop covers the core, compara, variation and functional genomics (regulation) databases and APIs. For each of these, the database schema and the API design, as well as its most important objects and their methods, will be presented. This will be followed by practical sessions in which the participants can put what they have learned into practice by writing their own Perl scripts.

Important
This workshop is NOT intended to teach you either Perl or basic molecular biological and genetic concepts! To be able to attend you should be able to code in Perl and be familiar with basic molecular biology and genetics. A basic knowledge of Ensembl is advantageous.