Second update of the Ensembl GRCh37 site

The second update of the GRCh37 archive site has now been released. Some of the data imports and updates for this release include:

  • dbSNP 144 human data, including data from the Exome Aggregation Consortium (ExAC)
  • Public HGMD data (version 2015.2)
  • Phenotype data from NHGRI-EBI GWAS, OMIM, ClinVar, UniProt, Orphanet and Decipher
  • Exome Sequencing Project data (v.0.0.30, Nov. 3, 2014)
  • HumanCoreExome-12 chip
  • Gene annotation dumps in GFF3 format

We have rebuilt the dedicated GRCh37 Ensembl, Regulation and Variation BioMarts to integrate the updated data sets.

You will find a complete list of the changes on the Ensembl GRCh37 website.

4 thoughts on “Second update of the Ensembl GRCh37 site”

  1. I am attempting to use Ensembl to browse a genome built by the Harvard Personal Genome Project.
    The data is in the form called ‘GET-Evidence’, and the URL is
    http://evidence.pgp-hms.org/genomes?display_genome_id=75958d26989d7433f0dae280d4e7a983c53957a4&access_token=68ee26a29f2f0687bd92fc96d4636d42

    Whatever tool on Ensembl I try to apply to this data, I get a message saying ‘unknown format’ or ‘wrong format’.

    I am a novice in this, and I do not know WHAT format my genome sequence from PGP is in. Their website seems to imply that it is in a format called GFF, but choosing that format in the Ensembl browser still yields the ‘wrong format’ error.

    For a month now (since my data became available on the Personal Genome Project site) I have been unable to browse it, either ON the PGP site itself or on this site.

    I will be very grateful for any help on this.

    • Hi Henry,

      I’m sorry to hear you are having difficulties when using Ensembl. We certainly don’t want that to be the case.
      I’ve had a look at the GET-Evidence tool, and I can see that the file formats it accepts for upload are VCF and GFF (http://evidence.pgp-hms.org/guide_upload_and_annotated_file_formats), but I could not find out which format their data is provided in for downstream analysis with other tools, e.g. Ensembl.
      You can attach or upload any external data on the Ensembl Browser website, provided its file format is supported by us (such as VCF, GFF, BAM, etc.): http://www.ensembl.org/info/website/upload/index.html#formats
      As I said, I’m not sure what format their output data is in. May I suggest that you contact them to find out? If it is GFF, it should indeed work on our website.
      Once you have that information, please contact us again by sending a message to helpdesk [at] ensembl.org with some screenshots of your workflow (e.g. which Ensembl pages you are on, which functionalities you are trying to use, etc.), so that we can try to get to the bottom of this.

  2. Denise: Your comments have been helpful, and I have made some progress, but only some.

    First, contacting the Harvard PGP folks to find out what format my genome is in is not easily done, since they have a very small staff and it can take weeks to get a response to a query.

    They are a research project, not a commercial provider of gene sequences, and ‘customer’ service is not their priority since the participants are not actually customers.

    Another problem I WAS having was that the file containing my genome is enormous, and it takes forever to load and try to do anything with.

    I solved THAT problem by splitting my downloaded version (on my desktop) into hundreds of tiny (1KB) files, using a downloadable utility that did the job nicely.

    Now I can just open one of the smaller files and copy and paste the data into the window provided for that purpose on the ‘add data’ page.

    I discovered that choosing VCF in the format drop-down menu lets the data load, but then I get various kinds of error messages.

    Specifically, your machine tells me that it ‘expected’ the ‘attribute’ of the file to start and end with an integer, and that therefore it could not load the file.

    So I dutifully INSERTED an integer at the very beginning and the very end of the pasted-in data.

    Having done that, the software then told me that it could not match any data of mine to any data in the human genome, and suggests that I make sure I’ve chosen the correct species(!).

    Despite the fact that I sometimes WISH I could change to another species, I do not think that wish has been granted.

    This is progress, but I still wonder why it can’t find any matches.

    Are my 1KB files now too SMALL for the search engine to have enough data to compare with the template?

    Once again, any help will be gratefully accepted.

    Hank

    • Hi Henry, glad you found the comments helpful.
      You can upload smaller files to the Ensembl browser, but larger ones such as VCF, BAM and BigWig files need to be attached rather than uploaded. Attachment works via a URL only, so your data needs to be on an FTP or HTTP server, as explained here: http://www.ensembl.org/info/website/tutorials/userdata.html
      Check with your department or the systems group at your University how you can get your file onto an FTP/HTTP server.
      If you’ve got a VCF file, it also needs to be indexed. More details can be found here:
      http://www.ensembl.org/info/website/upload/large.html#vcf-format
      So please make sure those two criteria are fulfilled and try to attach the data (via a URL) to our Browser. If you still encounter problems, please send an excerpt of your file to helpdesk [at] ensembl.org and one of us will try to help.
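For readers hitting the same ‘wrong format’ errors, a file can be checked locally before attaching it. The Python sketch below is a hypothetical helper (the name check_variant_file is ours, not part of Ensembl or the PGP tools) and relies only on the published conventions of the two formats: a VCF file starts with a ##fileformat=VCFv4.x line and a #CHROM column header, and each record has at least eight tab-separated columns with an integer POS; a GFF3 file starts with ##gff-version 3 and has nine tab-separated columns, where start and end (columns 4 and 5) must be integers. That last rule may be what the "start and end with an integer" message above refers to, but that is an assumption. The sketch also shows why cutting a file into 1KB pieces causes trouble: every piece after the first loses the header line that identifies the format, and a cut can fall in the middle of a record.

```python
from typing import Iterable, List, Tuple


def check_variant_file(lines: Iterable[str]) -> Tuple[str, List[str]]:
    """Guess whether text looks like VCF or GFF3 and list obvious problems.

    Hypothetical helper for a local sanity check only: it follows the
    header and column conventions of the VCF and GFF3 specifications and
    does not reproduce Ensembl's own validation.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    if not rows:
        return "empty", ["file has no lines"]

    problems: List[str] = []
    first = rows[0]

    if first.startswith("##fileformat=VCF"):
        saw_column_header = False
        for number, row in enumerate(rows[1:], start=2):
            if row.startswith("##") or not row.strip():
                continue  # meta-information line or blank line
            if row.startswith("#CHROM"):
                saw_column_header = True
                continue
            if not saw_column_header:
                problems.append(f"line {number}: data before the #CHROM header")
            fields = row.split("\t")
            if len(fields) < 8:
                problems.append(f"line {number}: {len(fields)} columns, VCF needs at least 8")
            elif not fields[1].isdigit():
                problems.append(f"line {number}: POS {fields[1]!r} is not an integer")
        if not saw_column_header:
            problems.append("no #CHROM column header line")
        return "VCF", problems

    if first.startswith("##gff-version 3"):
        for number, row in enumerate(rows[1:], start=2):
            if row.startswith("##FASTA"):
                break  # sequence section follows; no more feature lines
            if row.startswith("#") or not row.strip():
                continue  # directive, comment or blank line
            fields = row.split("\t")
            if len(fields) != 9:
                problems.append(f"line {number}: {len(fields)} columns, GFF3 needs 9")
            elif not (fields[3].isdigit() and fields[4].isdigit()):
                problems.append(f"line {number}: start/end are not integers")
        return "GFF3", problems

    # A fragment cut from the middle of a larger file ends up here,
    # because it no longer has the header line that names the format.
    return "unknown", ["line 1 is neither '##fileformat=VCF...' nor '##gff-version 3'"]


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t12345\t.\tA\tG\t.\tPASS\t.\n",
    ]
    print(check_variant_file(vcf))      # ('VCF', [])
    print(check_variant_file(vcf[2:]))  # a headerless fragment is reported as unknown
```

To check a real file, pass it an open file handle, e.g. check_variant_file(open("my_genome.vcf")), where the file name is only a placeholder. Checking the original, unsplit file is the meaningful test; a large VCF that passes should then be indexed and attached via a URL as described in the reply above.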