File Chameleon, easily transform Ensembl FTP files

Transforming file formats has always been a troublesome issue in bioinformatics because of the numerous standards and the slight eccentricities in formatting required by some software packages. How many times have you needed to transform chromosome names between 1, 2, 3 and chr1, chr2, chr3, or vice versa? With the introduction of File Chameleon we hope to smooth this process somewhat for data consumers.

File Chameleon is a web service introduced by Ensembl to transform Ensembl FTP files for easier use across the spectrum of bioinformatics tools. Need UCSC-style chromosome names? Need genes longer than 4 Mbp removed? File Chameleon can do that. From the File Chameleon web interface, simply select the species and the flat file you want to download (individual chromosome GTF, full assembly FASTA, etc.), then select the filters you want to apply. The file will be transformed and ready to download within a few minutes.
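To make the two example filters concrete, here is a minimal Python sketch of what such a transformation does to a GTF file. It is not File Chameleon's own code: the function names (`to_ucsc_name`, `long_gene_ids`, `transform_gtf`) are invented for illustration, the name mapping is simplified (numeric chromosomes, X, Y and MT only; scaffolds and patches are left untouched), and gene length is taken from the `gene` feature lines as found in Ensembl GTF files.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def to_ucsc_name(name):
    """Map an Ensembl-style sequence name to a UCSC-style one (simplified)."""
    if name == "MT":
        return "chrM"
    if name in {"X", "Y"} or name.isdigit():
        return "chr" + name
    return name  # assumption: anything else is left unchanged in this sketch


def long_gene_ids(lines, max_length):
    """Collect the IDs of genes whose 'gene' feature spans more than max_length bp."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "gene":
            start, end = int(fields[3]), int(fields[4])
            match = GENE_ID.search(fields[8])
            if match and end - start + 1 > max_length:
                ids.add(match.group(1))
    return ids


def transform_gtf(lines, max_length=4_000_000):
    """Return GTF lines with UCSC-style names, dropping all features of overlong genes."""
    lines = list(lines)
    drop = long_gene_ids(lines, max_length)
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            out.append(line)  # keep header and malformed lines as they are
            continue
        match = GENE_ID.search(fields[8])
        if match and match.group(1) in drop:
            continue  # this feature belongs to a gene that is too long
        fields[0] = to_ucsc_name(fields[0])
        out.append("\t".join(fields) + "\n")
    return out
```

For a file on disk, something like `transform_gtf(open("input.gtf"))` would return the filtered lines, ready to be written out with `writelines`. The sketch reads the whole file into memory, which is fine for a single chromosome but may not suit a full assembly.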

Currently File Chameleon only operates on GTF, GFF3, and FASTA formats and has a very limited set of filters for each format; however, we're committed to expanding the tool over future releases. Please take a look and give us feedback. Which of the Ensembl formats would be useful to add and, more importantly, what transformations and filters on the data would make it more useful for you? What awk or sed script do you run on the files you download that we could run for you, or that others might find helpful?

File Chameleon is also available as a standalone tool and is designed to have easily pluggable filters. If you find the tool useful, you can run it locally and extend it by writing your own plugins to further process files. The package can be downloaded from GitHub, along with extensive documentation and examples.