I’d like to introduce you to an exciting new data set added in Ensembl release 62: RNASeq data from Illumina’s Human BodyMap 2.0 project. The data, generated on HiSeq 2000 instruments in 2010, cover 16 human tissue types: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells. Raw reads are available for download here. For each tissue, we have aligned the raw reads to the genome and then linked exons into tissue-specific transcript models using the reads that span an exon-exon boundary.
You can view these data in the Region in Detail view. Click on ‘Configure this page’ and choose ‘RNA-Seq’ on the left of the main panel. Enable any or all of the 32 tracks and then close the configuration panel. Of the 32 tracks, 16 are tissue ‘gene model’ tracks and 16 are ‘intron’ tracks.
The ‘gene model’ track shows you a transcript model. The ‘intron’ track shows you how many raw reads aligned across an exon-exon junction. The higher the intron block, the more highly expressed the transcript isoform is.

In this example, the kidney gene model track shows a transcript (dark blue) with an exon structure that matches the gold-coloured Ensembl transcript AQP6-001. The kidney transcript model includes coding and noncoding exons (in the example above, the empty box is UTR and the filled boxes are coding exons).
Click on the kidney intron track to see that 192 raw reads were split between the first and second exons.
This example is interesting because it shows a gene with high expression in kidney tissue, and almost no expression in any other tissue.
The high read coverage for kidney means that the transcript’s exon-intron structure produced for the gene track has a good chance of being correct. When read coverage is very low, it is not always possible to build a full-length transcript model: look at the colon and brain intron tracks to see that two colon reads and three brain reads have aligned across the transcript’s middle exon-exon junction. Although this read coverage is low, our pipeline has generated a transcript model for brain tissue. However, the pipeline was not able to predict the two splice sites on either side because no raw reads from brain aligned over those splice junctions.
Below is a nice example of a gene that seems to be expressed in all 16 tissues, spermidine synthase (SRM).
Try dump_transcripts.pl as an example script to access the RNAseq-based transcript models. Have fun with these new data!


Where can I download the Human BodyMap data from? The link that you provided above is not working.
Thanks.
Dear Vinay,
I just tried all the links and they work for me. Can you try it again and give us the error if you can’t get the data?
Also the data is stored at the EBI, so if you can access the EBI web page but can’t download the file, you will have to email them because we have no access to this data.
Regards
Hi,
Is it possible to download the relative expression data per tissue per gene?
Thanks!
Dear NoaR,
It is not possible to download the relative expression per tissue per gene.
However, you can use the API to get the number of spanning reads for each intron of each model we created. These are stored in the RNA-Seq database homo_sapiens_rnaseq_65_37, available on the public MySQL server ensembldb.ensembl.org. Fetch the introns with the Bio::EnsEMBL::DBSQL::DnaAlignFeatureAdaptor class and then retrieve the score from each DnaAlignFeature object.
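As a rough sketch of that approach (not the code used by our pipeline), the script below connects to the public server, fetches the intron features in a region and prints the score of each one. The port number and the layout of the output are assumptions made for illustration, and the database named above may no longer be hosted on the public server, so you may need to adjust the connection details.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Format one intron feature as a tab-separated line:
# analysis name, chromosome, start, end, strand, score.
# Any object offering these accessors will do; DnaAlignFeature does.
sub intron_line {
    my ($f) = @_;
    return join "\t",
        $f->analysis->logic_name,
        $f->seq_region_name,
        $f->seq_region_start,
        $f->seq_region_end,
        $f->seq_region_strand,
        $f->score;    # the score holds the number of spanning reads
}

# Connect to the public server and print every intron feature in a region.
sub dump_introns {
    my ($chr, $start, $end) = @_;

    # Loaded at run time so the script still compiles without the API.
    require Bio::EnsEMBL::DBSQL::DBAdaptor;

    my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -host   => 'ensembldb.ensembl.org',
        -user   => 'anonymous',
        -port   => 5306,                          # assumption: adjust if needed
        -dbname => 'homo_sapiens_rnaseq_65_37',
    );

    my $slice = $db->get_SliceAdaptor->fetch_by_region('chromosome', $chr, $start, $end);
    die "No such region: $chr:$start-$end\n" unless $slice;

    my $introns = $db->get_DnaAlignFeatureAdaptor->fetch_all_by_Slice($slice);
    print intron_line($_), "\n" for @$introns;
}

if (@ARGV == 3) { dump_introns(@ARGV) }
else            { warn "usage: $0 <chromosome> <start> <end>\n" }
```

Run it with a chromosome name and a start and end coordinate of your choice. The analysis name in the first column should tell you which analysis produced each intron, which is useful because the database holds features for all 16 tissues.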
Regards
Dear Thibaut,
I would appreciate it if you could add some information about the pipeline used for producing the ‘intron’ and ‘gene model’ tracks.
Did you use only paired-end reads as the input data for the pipeline?
Thanks!
Dear Busa,
You can find more information on the RNASeq pipeline in the Help & Documentation section of Ensembl: http://www.ensembl.org/info/docs/genebuild/rnaseq_annotation.html. We used the method described for zebrafish.
Yes, we used only paired-end reads for the pipeline.
Hope this will help!
Hi Thibaut,
Do you have any resources where we can find out more about Illumina’s body map project?
Thank you!
Hi Melissa,
If you follow the link to the raw sequences, here, you will have more information on the project.
If it’s not enough, the best thing would be to email Gary Schroth who is the contact for the project.
Hope this will help!
Hi Thibaut,
Is it possible to export ‘intron’ tracks from Ensembl?
I couldn’t find this option using ‘export data’ link.
Thanks!
Hi Busa,
I apologize for the late reply.
At the moment you will need to query the RNASeq database directly on our public MySQL server: http://www.ensembl.org/info/data/mysql.html. The introns are stored as DnaAlignFeature objects. You can find a code example here: http://lists.ensembl.org/pipermail/dev/2012-January/002061.html
Regards
Hi,
Is there an FTP site for the BAM files so I can load them into the Ensembl browser? I’ve only found FASTQ files.
Hi Sebastian,
At the moment we have no knowledge of BAM files for the Human Body Map. We are planning to generate BAM files but there is no release date.
Regards
Hi Thibaut,
You mentioned above that, “For each tissue, we have aligned the raw reads to the genome and then linked exons into tissue-specific transcript models using the reads that span an exon-exon boundary.” How can I download the raw data of these exon junctions or boundaries, raw counts, and RPKM for each of these 16 tissues? I read some docs about using the Perl API to access data from the Ensembl core schema. However, I couldn’t find info on which classes or methods I should use to extract the type of data I’m looking for. Any advice is greatly appreciated.
Hi Jerry,
At the moment we are not providing raw data. All the information you can retrieve from the rnaseq database relates to the models we created, such as how many reads span the intron at a specific splice site.
Here is a code example: http://lists.ensembl.org/pipermail/dev/2012-January/002061.html
Hope this helps,
Thibaut
Hi Thibaut,
By raw data, I mean the coordinates of the exon boundaries. Any suggestion on how I can retrieve that from the rnaseq database?
Thanks again and I apologize for the confusion.
Hi Jerry,
The only exon boundaries you will be able to get are the boundaries of our models. We can’t guarantee that they are correct, but they are the ones with the most support.
So to get the coordinates of the exon boundaries you will need to use the module Bio::EnsEMBL::Exon.
Starting from the example in my previous reply, you can get the exons from a Bio::EnsEMBL::Transcript object:
foreach my $exon (@{$transcript->get_all_Exons}) {
    # start and end give the boundaries of each exon in the model
    print "Start: ", $exon->start, " End: ", $exon->end, "\n";
}
I suggest you subscribe to the Ensembl Dev mailing list, where the entire Ensembl community will be able to help you when needed: http://lists.ensembl.org/mailman/listinfo/dev
Thibaut
I ran the script dump_transcripts.pl listed above and got a whole bunch of errors such as those listed below. Could the new Ensembl release 68 have something to do with it, if modules were changed? I re-installed the Ensembl API but it still doesn’t work. Perhaps the module names changed and need to be updated in the script as well? Please advise. Thank you!
Error Output:
-------------------- WARNING ----------------------
MSG: 'Bio::EnsEMBL::DBSQL::GeneAdaptor' cannot be found.
Exception Can't locate Bio/PrimarySeqI.pm in @INC (
BEGIN failed--compilation aborted at /home/jli/src/ensembl/modules/Bio/EnsEMBL/Slice.pm line 62.
Compilation failed in require at /home/jli/src/ensembl/modules/Bio/EnsEMBL/DBSQL/SliceAdaptor.pm line 99.
BEGIN failed--compilation aborted at /home/jli/src/ensembl/modules/Bio/EnsEMBL/DBSQL/SliceAdaptor.pm line 99.
Compilation failed in require at /home/jli/src/ensembl/modules/Bio/EnsEMBL/DBSQL/GeneAdaptor.pm line 67.
BEGIN failed--compilation aborted at /home/jli/src/ensembl/modules/Bio/EnsEMBL/DBSQL/GeneAdaptor.pm line 67.
Compilation failed in require at (eval 8) line 3.
FILE: Bio/EnsEMBL/Registry.pm LINE: 1015
CALLED BY: dump_transcripts.pl LINE: 88
Date (localtime) = Thu Aug 23 16:54:56 2012
Ensembl API version = 68
Hi Jerry,
Your question is a technical one and a response to it could benefit others trying to use the BodyMap data. Please refer to the Ensembl developers mailing list (http://lists.ensembl.org/mailman/listinfo/dev), where your question has now been answered.
Best regards,
Amonida
I’d like to download the FPKM values or the read counts for each of the samples. Is there a place I can obtain these?
Thank you,
Teja
Hi Teja,
We don’t provide FPKM values.
But we provide the bam files containing the raw alignments of the reads on the Ensembl FTP. You can then use samtools or any other SAM/BAM tool to retrieve the data you want.
You can also look at the intron supporting evidence in the RNASeq database (Public databases) via the API. We store the number of intron-spanning reads for each model generated.
Hope this helps
Thibaut
I would like to know whether the BAM files were generated from a single-read experiment or from a paired-end experiment.
Thanks
Cycy
Hi Cycy,
All the reads for the pooled set are 100bp single end.
For each tissue we had 50bp paired end reads and 75bp single reads.
Hope this helps,
Thibaut
Is it possible to put in a sequence and have the sequence mapped to the RNAseq data to determine how often the query sequence has been detected in the database?
Hi Robert,
No, it is not possible. We only store the number of intron-spanning reads in the database, so there is no easy way.
What you can do, although it requires some knowledge of writing scripts, is to map your query sequence to the genome and then query the BAM files to get the number of reads in the region where your query sequence aligned. You will have to filter the reads depending on how many mismatches you want to allow in your count.
The BAM files contain all the reads that we used. You can also create a blast database from the BAM files.
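A minimal sketch of the counting and filtering step, assuming you have already worked out the region where your query sequence aligns and that you extract the reads for that region as SAM text, for example with samtools view on an indexed BAM file. It also assumes the aligner wrote the optional NM tag; NM is the edit distance to the reference, so it counts indels as well as mismatches. The script name, the BAM file name and the default threshold of 2 are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count SAM records whose NM tag (edit distance to the reference)
# is at most $max_mismatches. Header lines, unmapped reads and
# reads without an NM tag are skipped.
sub count_reads {
    my ($max_mismatches, @sam_lines) = @_;
    my $count = 0;
    for my $line (@sam_lines) {
        next if $line =~ /^\@/;           # header line
        chomp(my $record = $line);
        my @fields = split /\t/, $record;
        next if @fields < 11;             # not a complete alignment record
        next if $fields[1] & 4;           # FLAG bit 0x4: read is unmapped

        # Optional tags start at the 12th column.
        my $nm;
        for my $tag (@fields[11 .. $#fields]) {
            if ($tag =~ /^NM:i:(\d+)$/) { $nm = $1; last }
        }
        next unless defined $nm;
        $count++ if $nm <= $max_mismatches;
    }
    return $count;
}

# Read SAM text from standard input when it is piped in.
my $max = @ARGV ? shift @ARGV : 2;        # default threshold is arbitrary
print count_reads($max, <STDIN>), "\n" unless -t STDIN;
```

You could run it as: samtools view kidney.bam 12:1000-2000 | perl count_reads.pl 2, replacing the file name and region with your own. Check how the chromosomes are named in the BAM header first, because the region string has to match.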
Hope this helps
Thibaut
I downloaded the raw data and processed it with FastQC. The first bases of the paired-end samples are abnormal, and the last bases of the single-end samples are abnormal too. I suspect adapter, linker or primer contamination. From E-MTAB-513.sdrf.txt I got the linker information for the single-end samples, which seems to explain the abnormal bases at the end of those reads, but for the paired-end samples I have no idea. Is there any information about the samples’ adapters, linkers or primers?
Hi Yuting,
We did not produce these data. You should contact Gary P Schroth at Illumina, gschroth@illumina.com. He should be able to give you some answers.
Regards
Thibaut