BiomaRt or how to access the Ensembl data from R

biomaRt is a Bioconductor package that makes it very easy to access and retrieve Ensembl data from R. The recent Bioconductor 3.1 release includes a new version of biomaRt with many new Ensembl-friendly functions that let you connect to the Ensembl marts and retrieve data in record time.

To celebrate the new Bioconductor release, we’ve just launched a brand new mart documentation page. This new documentation covers the biomaRt package, how to combine species data, and the BioMart RESTful and Perl APIs.

Want to get some Ensembl data from BioMart using biomaRt? Easy, just follow the simple guide below.

How can I install the biomaRt R package?

First, make sure R is installed on your computer. Then run the following commands from your R console to install the Bioconductor biomaRt package:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
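
The biocLite.R script shown above was the standard installer at the time of Bioconductor 3.1. Later Bioconductor releases replaced it with the BiocManager package from CRAN, so on a recent R installation the commands above may no longer work. A minimal sketch of the equivalent installation, assuming a current R and Bioconductor setup:

```r
# Install the BiocManager helper from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install biomaRt from Bioconductor
BiocManager::install("biomaRt")
```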

What are the Ensembl marts?

The listEnsembl function lists the currently available Ensembl marts:

> library(biomaRt)

> listEnsembl()

     biomart               version
1    ensembl               Ensembl Genes 80
2        snp               Ensembl Variation 80
3 regulation               Ensembl Regulation 80
4       vega               Vega 60
5      pride               PRIDE (EBI UK)

Which Ensembl species have Variation data?

The listDatasets function will list all the species available for a given mart.

> library(biomaRt)

> variation = useEnsembl(biomart="snp")

> listDatasets(variation)

[Screenshot: output of listDatasets(variation)]

What data can I get from the Variation mart (filters and attributes)?

The listFilters and listAttributes functions will give you the list of all the filters and attributes available for a given mart.

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> listFilters(variation)

> listAttributes(variation)

[Screenshot: output of listFilters(variation)]

[Screenshot: output of listAttributes(variation)]
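
Both functions return ordinary data frames with name and description columns, and the tables can be long, so it is often easier to search them than to read them. The sketch below uses a hypothetical helper, search_mart_table (not part of biomaRt), and a small made-up table that stands in for real listAttributes output:

```r
# Hypothetical helper (not part of biomaRt): keep the rows of a
# listFilters() or listAttributes() result whose name or description
# matches a keyword, ignoring case.
search_mart_table <- function(tbl, keyword) {
  hit <- grepl(keyword, tbl$name, ignore.case = TRUE) |
    grepl(keyword, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Toy stand-in for a listAttributes() result. The attribute names appear
# in this post, but the descriptions are invented for illustration.
toy_attributes <- data.frame(
  name = c("refsnp_id", "chr_name", "minor_allele_freq"),
  description = c("Variant name (toy)", "Chromosome name (toy)",
                  "Minor allele frequency (toy)"),
  stringsAsFactors = FALSE
)

allele_rows <- search_mart_table(toy_attributes, "allele")
print(allele_rows)
```

With the variation mart from above, the same helper could be applied to the real tables, for example search_mart_table(listAttributes(variation), "allele").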

 

 

 

How can I get data about a variant using an rsID?

The following example retrieves the variation source, chromosome location, minor allele with its frequency and count, consequences, and Ensembl gene and transcript IDs for the variant “rs1333049”.

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> rs1333049 <- getBM(attributes=c('refsnp_id','refsnp_source','chr_name','chrom_start','chrom_end','minor_allele','minor_allele_freq','minor_allele_count','consequence_allele_string','ensembl_gene_stable_id','ensembl_transcript_stable_id'), filters = 'snp_filter', values ="rs1333049", mart = variation)

> rs1333049

[Screenshot: contents of rs1333049]
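
The values argument of getBM can also take a vector of identifiers. As far as we are aware, identifiers that the mart cannot match simply produce no rows rather than an error, so it can be worth checking which requested IDs are absent from the result. The helper name missing_ids and the ID "rs_not_a_real_id" below are made up for illustration:

```r
# Hypothetical helper (not part of biomaRt): report which requested IDs
# have no row in a getBM() result.
missing_ids <- function(requested, result, column = "refsnp_id") {
  setdiff(requested, result[[column]])
}

# Made-up stand-in for a getBM() result; only rs1333049 comes from this post.
requested <- c("rs1333049", "rs_not_a_real_id")
toy_result <- data.frame(refsnp_id = "rs1333049", stringsAsFactors = FALSE)

not_found <- missing_ids(requested, toy_result)
print(not_found)
```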

How can I get data on all genes on a chromosome?

The following example retrieves the Ensembl gene IDs, biotypes, HGNC symbols and coordinates of all genes located on human chromosome Y.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> chrY_genes <- getBM(attributes=c('ensembl_gene_id','gene_biotype','hgnc_symbol','chromosome_name','start_position','end_position'), filters = 'chromosome_name', values ="Y", mart = ensembl)

> chrY_genes 

[Screenshot: contents of chrY_genes]
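
Because getBM returns a plain data frame, base R is enough to summarise the result, for example by counting genes per biotype. The helper count_biotypes and the toy rows below are illustrative assumptions, not real chromosome Y data:

```r
# Hypothetical helper (not part of biomaRt): count genes per biotype in a
# getBM() result that includes the gene_biotype attribute, most common first.
count_biotypes <- function(genes) {
  sort(table(genes$gene_biotype), decreasing = TRUE)
}

# Toy stand-in for a getBM() result; the gene IDs are placeholders.
toy_genes <- data.frame(
  ensembl_gene_id = c("toy_gene_1", "toy_gene_2", "toy_gene_3"),
  gene_biotype = c("protein_coding", "pseudogene", "protein_coding"),
  stringsAsFactors = FALSE
)

counts <- count_biotypes(toy_genes)
print(counts)
```

On the real query result, this would be called as count_biotypes(chrY_genes).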

How can I get protein domain information mapped to an Ensembl Gene ID?

The following example retrieves the Ensembl gene, transcript and protein IDs, along with the InterPro and Pfam protein domain IDs and their locations, for the Ensembl Gene ID “ENSG00000198763”.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> domain_location_ENSG00000198763 <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','interpro','interpro_start','interpro_end','pfam','pfam_start','pfam_end'), filters ='ensembl_gene_id', values ="ENSG00000198763", mart = ensembl) 

> domain_location_ENSG00000198763

[Screenshot: contents of domain_location_ENSG00000198763]

The biomaRt package and its complete documentation can be found on the biomaRt Bioconductor page.

8 thoughts on “BiomaRt or how to access the Ensembl data from R”

  1. Has this update somehow broken the ability to use archived versions of biomart? I’m trying to access version 78 but it appears to be down at the moment (Mon Jun 1 18:51:49 UTC 2015). Thanks

  2. I get the following error when trying this out:

    > ensembl=useMart(host="feb2014.archive.ensembl.org", biomart="ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl") #Get the hg19 version in biomart

    Extra content at the end of the document
    Error: 1: Extra content at the end of the document

    Is there a more updated archive that is to be used or is the archive just temporarily unavailable?

  3. Dear Adam and Stephen,

    I am afraid we experienced a power failure affecting many of our servers yesterday and some of our archive websites are still down at the moment.
    We are looking into this and I will let you both know as soon as the archives are back online.

    Adam, the biomaRt update added the new useEnsembl method, but the useMart method is still available and works the same way as before, e.g.:

    ensembl79 = useMart("ENSEMBL_MART_ENSEMBL", host = "mar2015.archive.ensembl.org", dataset = "hsapiens_gene_ensembl")

    Regards,
    Thomas

  4. Dear Thomas,

    is this downtime also affecting the API? I’m trying to run some Perl code using Ensembl 67 but I get an error:

    Can’t call method “get_all_translateable_Exons” on an undefined value at

    It’s weird because the script is now running on a file with transcript IDs that I previously analyzed but I can’t find differences between the old input file and the new one, except for the transcript IDs

    Thanks

  5. Hello,

    As far as I know, the Ensembl Perl API should still be working for older versions of Ensembl, as it connects to different servers. Could you please email the Ensembl Helpdesk (helpdesk@ensembl.org) so that someone can have a proper look at your error?
    Could you please also include your script and input file in your email?

    Thanks a lot,
    Thomas

  6. Thanks! I saw that the archive has been restored now anyway, so it must be an error in my input file.

    thanks again

  7. Most of the archive websites are now back online. We are still trying to restore the GRCh37 marts.

    Thanks a lot for your patience.
    Regards,
    Thomas

  8. I can see that the new version has made a lot of improvements, but I personally still think it would lead to a much better user experience if the web interface design were simpler across the various functions. As a biochemical supplier, our company frequently uses BiomaRt for related information, and that’s how we feel.