Open source and open access

As an Ensembl Outreach Officer I get asked a lot of questions. Mostly questions about our data and interfaces but occasionally, just occasionally, something a bit more blue sky.

A couple of weeks ago I was teaching an Ensembl Browser workshop at the Erasmus MC in Rotterdam. I was just explaining that all our data and code were completely free to use, open source and open access, when someone asked me: Why? What’s in it for you?

Why indeed? Why are there forty people dedicated to producing this project? Why do our funders give us all this money to do it? Why do we just give it all away for free?

Why do science at all?

The fundamental answer varies for all of us. Things like improving people’s lives, curiosity, discovery. These are the motivations that got most of us into careers in science in the first place. Ensembl may not be doing research directly, but we’re enabling it.

A tiny portion of the Ensembl farm

The economic argument

There’s also an economic answer, in terms of time, money and infrastructure. How much does it cost to annotate a genome? To do pairwise sequence comparisons of over a million genes? To annotate variation? To make regulatory data meaningful? How much does it cost to put this into an easily accessible format? How much does it cost to regularly update this with new data? How many terabytes of storage do you need to actually hold this stuff?

Even though these are non-trivial costs, infrastructure projects in bioinformatics are about saving money overall. Funders and scientists understand that lots of different labs need the data and the analysis that we produce. It would be horribly inefficient if each lab that needs the resources we provide had to produce them itself: repeating work that somebody else has already done, spending money that has already been spent, and using up time that could go into other experiments or other analysis. Therefore, we have a system where we do it for them and put it all up where they can find it. Nothing’s repeated. Plus, our experience, expertise and raw computing power mean that we can do it more cheaply and quickly than most labs can.

Free to be serendipitous

By giving the data away for free, we allow serendipitous discovery. If we charged people to use Ensembl on a per-use basis, they’d only use Ensembl to look for things they already knew they were looking for. Yet we know that much of scientific discovery occurs when people accidentally stumble across things, like Alexander Fleming’s mouldy Staphylococcus plates. When people can browse Ensembl freely, without worrying about costs, they may stumble across the tool or data that will be exactly what they need.

A relatively big group of people work for the project and they don’t work for free. But overall, we save the research community money by enabling science to be built on our foundation.

So, the answer to “what’s in it for me?”: I work for a project that makes science happen as efficiently as possible.