Keeping science reproducible in a world of custom code and data

(Image credit: Degui Adil / Getty Images)

It is often said that the difference between science and superstition is that science is reproducible. Unfortunately, many scientific papers aren’t, making them about as reliable as superstition.

Since the mid-1600s, the output from a typical scientific study has been an essay-style journal article describing the results. But today, in fields ranging from astronomy to microbiology, much of the technical work for a journal article involves writing code to manipulate data sets. If the data and code are not available, other researchers can’t reproduce the original authors’ work and, more importantly, may not be able to build upon the work to explore new methods and discoveries.

Thanks to cultural shifts and funding requirements, more researchers are warming up to open data and open code. Even 100-year-old journals like the Quarterly Journal of Economics or the Journal of the Royal Statistical Society now require authors to provide replication materials—including data and code—with any quantitative paper. Some researchers welcome the new paradigm and see the value in pushing science forward via deeper collaboration. But others feel the burden of learning to use distribution-related tools like Git, Docker, Jupyter, and other not-quite words.
