The Ensembl FTP site, a one-stop shop

Are you looking for whole genomes, protein sequences, alignments or other genome-wide data from Ensembl?

Look no further; our FTP site is the place for you:

  • Download our data from the current release only (i.e. Ensembl 78)
  • Download our data from current and previous releases (including GRCh37)

These are some of our data that can be downloaded in bulk and for free; file types are described in brackets:

  • DNA, cDNA, CDS, ncRNA sequences (FASTA)
  • Annotations of our coding and non-coding genes (GTF)
  • Annotation of regulatory elements for the human and mouse genomes (GFF)
  • Variation data (VCF) for more than 20 Ensembl species
  • RNA-seq reads (BAM) aligned against 25 genomes
  • GERP scores to identify constrained elements (BED)
  • Alignments of resequencing data for several species (EMF)
  • Multiple and pairwise genome alignments (MAF)
  • Ensembl databases for local installation (MySQL)
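As an illustration of what you can do with one of these downloads, here is a minimal Python sketch for reading sequences from a FASTA file. It assumes you have already fetched a file from the FTP site; the helper names `read_fasta` and `open_fasta` are our own for this example and not part of any Ensembl tool, and the records in the demo are made up.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Small in-memory demo; these records are invented for illustration only.
example = io.StringIO(">pep1 hypothetical record\nMKT\nAYI\n>pep2\nMSS\n")
records = list(read_fasta(example))
```

For a real download, you would pass the local path of the file to `open_fasta` and loop over `read_fasta` on the resulting handle. Streaming records one at a time keeps memory use low for genome-sized files.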

How can the Ensembl FTP foster research?

Let’s look at coiled-coils, simple dimers in protein sequences found in many species and believed to enable protein-protein interaction in a variety of biological processes.

Structure of coiled-coil domain from PDBe. Homohexameric assembly by Li et al. (2014)

Coiled-coil domains differ immensely from their globular counterparts, and distinct evolutionary constraints on them are expected. How conserved are coiled-coils? What has driven their evolution?

Intrigued by these questions, Surkont and Pereira-Leal (2015) set out on a journey to compare protein sequences across several vertebrates and yeast. They show that substitution patterns do differ between coiled-coil and globular regions, and they developed an evolutionary model to improve both the detection of coiled-coils by homology and the inference of their phylogeny.

Where did Surkont and Pereira-Leal find these proteomes for their investigation? On our FTP site.

