Run a private Ensembl MySQL in the cloud

Single Species private MySQL

Ensembl has just made it even easier to access genome data in the cloud. We are pleased to announce the availability of pre-configured Ensembl MySQL EC2 cloud images. These pre-built images allow you to run your own private single-species MySQL instances on EC2; the full list is here. There is an image for each of our species: just start one up and connect with the API (or the mysql client) as you would to ensembldb or useastdb, except that this MySQL server is for your exclusive use. You’ll also have full root permissions on the instance, so you can configure, tweak and restart the MySQL server as you wish.

Getting started

Experienced users of Amazon EC2 need only skip to the full list of AMIs on the Ensembl website and, when launching, ensure that the security group allows access to port 5306 on the instance.

If you’re not already familiar with Amazon EC2 and would like to get started with Ensembl in the cloud, the following brief instructions should be enough to get you going. The steps are:

  1. Set up your Amazon EC2 Account
  2. Instantiate the Ensembl Image
  3. Connect to the instance

Signing up for an Amazon EC2 account is simple if you don’t already have one. Just go to http://aws.amazon.com/ec2 and click Sign Up for Amazon EC2, then follow the on-screen instructions.

Launching an instance is just as simple when you use the Amazon EC2 launch wizard. The Amazon AWS documentation describes the process expertly, and can be found here. To launch the Ensembl Amazon Machine Image (AMI), follow the documentation beginning at Step 1 to launch the wizard. We digress at step 2, “choosing an AMI”, where you instead select an Ensembl AMI from the list here. As an example, we select Homo sapiens with id  ami-b0fb5

Ensure that you are in the US East region (which is the default). In the launch wizard, select the Community AMIs tab (the green arrow in the figure below) and paste ami-dccb0fb5 into the box indicated by the red arrow and underlined in red.

After clicking Select, choose the Large (m1.large, 7.5GB) instance type from the dropdown for best performance. You may instead choose the smaller Micro (t1.micro) type, which incurs a lower cost per hour; see http://aws.amazon.com/ec2/#pricing

 

Click Continue and follow steps 3 & 4 of the Amazon docs to create a key pair. Next, create a security group (firewall rule) that allows access to our MySQL instance. The process is described in detail in step 5 of the Amazon docs, except that we require only port 22 for ssh access and port 5306 for access to the MySQL server (don’t forget to click Apply Rule Changes).

Take particular note of the caution in the docs about restricting access to specific IP addresses:

The quick-start security group enables all IP addresses to access your instance over the specified ports (e.g., SSH). This is acceptable for the short exercise in this tutorial, but it’s unsafe for production environments. In production, you’ll authorize only a specific IP address or range of addresses to access your instance
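
For readers who prefer the command line, the same restriction can be sketched with the AWS CLI, a tool this post does not otherwise cover. The group name ensembl-mysql and the address 203.0.113.10/32 are placeholders rather than values from the Ensembl or Amazon docs, and --group-name assumes the group lives in your default VPC. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Assumptions: the AWS CLI is installed and configured; names below are placeholders.
GROUP="${GROUP:-ensembl-mysql}"          # hypothetical security group name
MY_CIDR="${MY_CIDR:-203.0.113.10/32}"    # replace with your own IP address or range

# run CMD...: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_group: create the security group itself.
create_group() {
  run aws ec2 create-security-group \
    --group-name "$GROUP" --description "ensembl-mysql-access"
}

# open_ports: allow ssh (22) and Ensembl MySQL (5306) from MY_CIDR only.
open_ports() {
  for port in 22 5306; do
    run aws ec2 authorize-security-group-ingress \
      --group-name "$GROUP" --protocol tcp --port "$port" --cidr "$MY_CIDR"
  done
}

create_group
open_ports
```

Limiting the rule to a single /32 address means only your own machine can reach the MySQL port, which is the production-style setup the caution describes.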

Finally, launch the instance as described in step 6 and record the public DNS name of your new server.

If all has been successful you will now have a private AWS instance running a MySQL server with a single Ensembl species, which is Homo sapiens in the case of this exercise.

Connecting to the instance

Connect to the instance with the mysql client:

mysql -h <PUBLIC_DNS_NAME> -u anonymous -P 5306

If you cannot connect, check that the instance is running and that the security group settings are correct: both ports 22 and 5306 need to be open.
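
One way to test the two ports from your own machine is sketched below. It is an illustration rather than part of the Ensembl tooling: the function names are made up for this example, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed.

```shell
# check_port HOST PORT
# Succeeds (exit status 0) if a TCP connection opens within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
check_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report_ports HOST: test the two ports the security group should allow.
report_ports() {
  for port in 22 5306; do
    if check_port "$1" "$port"; then
      echo "port $port: reachable"
    else
      echo "port $port: closed or filtered"
    fi
  done
}
```

Call it as report_ports <PUBLIC_DNS_NAME>. A port reported as closed or filtered while the instance is running usually points at the security group rules.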

You may ssh directly to the instance with the username ‘ensembl’ as described here.

ssh -i <YOUR_KEYPAIR> ensembl@<PUBLIC_DNS_NAME>

The Ensembl MySQL AMI comes pre-installed and configured with the current Perl API, so you can run your scripts against Ensembl from within the instance by targeting MySQL on localhost:5306 rather than the public DNS name.
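
As a minimal sketch of working from inside the instance, the helper below runs a query against the local server with the mysql client. It assumes the anonymous user needs no password, as in the connection command earlier. It uses 127.0.0.1 rather than localhost because, on Unix, the mysql client treats localhost as a request for the socket file and would ignore the port. The name ensembl_sql and the DRY_RUN switch are illustrative, not part of the AMI.

```shell
# Connection settings for the MySQL server running on this instance.
ENSEMBL_HOST="${ENSEMBL_HOST:-127.0.0.1}"   # TCP loopback, so the port is honoured
ENSEMBL_PORT="${ENSEMBL_PORT:-5306}"
ENSEMBL_USER="${ENSEMBL_USER:-anonymous}"

# ensembl_sql QUERY: run one SQL statement against the local server.
# With DRY_RUN=1 the command is printed instead of executed.
ensembl_sql() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "mysql -h $ENSEMBL_HOST -P $ENSEMBL_PORT -u $ENSEMBL_USER -e $1"
  else
    mysql -h "$ENSEMBL_HOST" -P "$ENSEMBL_PORT" -u "$ENSEMBL_USER" -e "$1"
  fi
}
```

For example, ensembl_sql 'SHOW DATABASES' lists the databases visible to the anonymous user.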

Amazon bills by the hour, so don’t forget to terminate the instance when you have finished.

Full details of Amazon AWS costs and charges are at http://aws.amazon.com/ec2/#pricing. Be aware that you are billed for both storage and EC2 time; it is your responsibility to monitor your usage.

13 thoughts on “Run a private Ensembl MySQL in the cloud”

  1. This looks extremely useful, and will be taking a look. Do you have plans for a single AMI with all of the databases pre-installed?

    • Hi Will,

      It’s early days with the MySQL AMIs, so every request is worth considering; if it’s something you or others would find useful then I can look into it. So long as you’re happy taking on a Terabyte of EBS to your account 😉

  2. Hi,

    I’d like to install phpmyadmin on these machines but it requires me to enter the password for the MySQL root user, can you please tell me what it is? Thank you very much!

  3. Pingback: Amazon cloud computing | paucorral

  4. Will you also be considering to create a cloud image of your annotation pipeline?

  5. I have created an instance using MySQL AMI (ami-ab4b89c2) for Ensembl64 homo_sapiens

    I have two questions.
    1. When I connect to the database, why does the “mysql” database disappear? How is this database hidden?
    2. All the Ensembl databases are put on a different drive, but I don’t see any links under the /var/lib/mysql folder. What’s the magic? I do see all the Ensembl databases after I connect, using the “show databases” command.

    Thanks.

    • Hi Shanrong,

      Thank you for your interest in our cloud MySQL instances. Regarding your two questions: firstly, the mysql db is present but is not visible to the anonymous user; this is by design. If you wish to reset the mysql root password and create a more permissive user, instructions on how to do this can be found on the MySQL website.

      In answer to your second question, the MySQL MYD and MYI files are on an attached volume mounted on /vols/ensembl_mysql_data; the files in this directory are symbolically linked to /var/mysql/lib.

      I hope this answers your questions.

      Stephen

  6. Hi,

    we’re interested in running these locally. Do you happen to have vmdks for these AMI files? Or is there a way to convert them?
    Alternatively, can anyone tell me how storage use is charged when using the human genome AMI? It uses 600+GB. Is the storage use ‘free’ because you start out from a snapshot?

    Thanks

    Wim

    • Hi Wim,

      Unfortunately there’s no easy way to convert these large images. They are entirely created and put together out on Amazon, with the intention that they are run out on AWS; I can’t imagine what it would be like to pull human and convert a 600GB AMI to vmdk. Your best bet for running a local MySQL ensembldb is to pull the flatfile dumps from our FTP site and build it following the instructions on our website http://www.ensembl.org/info/docs/webcode/mirror/install/ensembl-data.html.

      The point of these images is to allow you to avoid this step if you already have an amazon-centric workflow.

      Regarding storage costs, AWS will bill the user upon whose account the image was launched, so you will pay by the hour as soon as you launch it. And of course, you stop paying as soon as you terminate it – and delete the volumes.

      Stephen

  7. Is there a way of getting hold of a snapshot of previous releases? I’d like to do this with release 70 data if possible? Is there a snapshot containing the entire 61 species from release 70 within a single instance? I’m trying to retrieve intron sizes for all 61 species in release 70 for some analyses I’m running and don’t want to DoS the main EnsEMBL servers.

    • Hi Steve,

      We delete the old releases with the advent of new releases. If you would like to keep a previous release around then copying it to your own account is the way to do this. We do not generate a single AMI with all species.

      w.r.t your intron analysis, perhaps you could spread your analysis across ensembldb and useastdb, with some throttling?

      Regards,

      Stephen

      • Dear Stephen,

        Thanks for the swift reply!

        No problem! I’ll bear in mind keeping a backup of the AMIs! Is it likely you would consider having a copy of the complete species data for a release as an AMI? I guess the MySQL dumps on the FTP make this partially redundant?

        I’ve just pulled in the release-70 core MySQL dumps to run things locally! Will save on runtime overall I think anyway? Over 1.1 million records to retrieve data for :-S

        Cheers,

        Steve