BiomaRt or how to access the Ensembl data from R

BiomaRt is a Bioconductor package that make accessing and retrieving Ensembl data from the R software very easy. The recent Bioconductor 3.1 release includes a new version of BiomaRt packed with many new Ensembl friendly functions allowing you to connect and retrieve data from the Ensembl marts in record time.

To celebrate the new Bioconductor release, we’ve just launched aĀ brand new mart documentation page. This new documentation covers theĀ BioMaRt packageĀ but alsoĀ how to combine species data,Ā BioMart RESTful and Perl API.

You want to get some Ensembl data from BioMart using BiomaRt? Easy, just follow the simpleĀ guide below.

How can I install the BiomaRt, R package?

First make sure you have installed the R software on your computer. Then, run the following commands from your R terminal to install the Bioconductor BiomaRt R package:


What are the Ensembl marts?

The following functionsĀ will give you the list of the current available Ensembl marts

> library(biomaRt)

> listEnsembl()

     biomart               version
1    ensembl               Ensembl Genes 80
2        snp               Ensembl Variation 80
3 regulation               Ensembl Regulation 80
4       vega               Vega 60
5      pride               PRIDE (EBI UK)

Which Ensembl species have Variation data?

The listDatasets function will list all the species available for a given mart.

> library(biomaRt)

> variation = useEnsembl(biomart="snp")

> listDatasets(variation)


What data can I get from the Variation mart (filters and attributes)?

The listFilters and listAttributes functions will give you the list of all the filters and attributes available for a given mart.

> library(biomaRt)
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> listFilters(variation)

> listAttributes(variation)








How can I get data about a variant using an rsID?

In the following example, you will be able to retrieveĀ Variation source, Chromosome locations, Minor allele, Frequency and count, Consequences, Ensembl Gene and Transcript IDs for the Variation name “rs1333049”.

> library(biomaRt)
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> rs1333049 <- getBM(attributes=c('refsnp_id','refsnp_source','chr_name','chrom_start','chrom_end','minor_allele','minor_allele_freq','minor_allele_count','consequence_allele_string','ensembl_gene_stable_id','ensembl_transcript_stable_id'), filters = 'snp_filter', values ="rs1333049", mart = variation)

> rs1333049


How can I getĀ data on all genes on a chromosome?

In the following example, you will be able to retrieve Ensembl Gene IDs, HGNC symbols and biotypes located on the human chromosome Y.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> chrY_genes <- getBM(attributes=c('ensembl_gene_id','gene_biotype','hgnc_symbol','chromosome_name','start_position','end_position'), filters = 'chromosome_name', values ="Y", mart = ensembl)

> chrY_genesĀ 


How can I get protein domains information mapped to an Ensembl Gene ID?

In the following example, you will be able to retrieveĀ Ensembl Gene, Transcript and Protein IDs,Ā Interpro and Pfam proteinĀ domain IDs and locations mapped to the Ensembl Gene ID “ENSG00000198763”.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> domain_location_ENSG00000198763 <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','interpro','interpro_start','interpro_end','pfam','pfam_start','pfam_end'), filters ='ensembl_gene_id', values ="ENSG00000198763", mart = ensembl)Ā 

> domain_location_ENSG00000198763


The Bioconductor BiomaRt R package and complete documentation can be found on the BiomaRt Bioconductor page.