Mark Gerstein and I penned a reaction to two pieces published in Nature News last October, “Publish your computer code: it is good enough,” by Nick Barnes and “Computational Science…. Error” by Zeeya Merali. Nature declined to publish our note and so here it is.
We have read with great interest the recent pieces in Nature about the importance of computer codes associated with scientific manuscripts. As participants in the Yale roundtable mentioned in one of the pieces, we agree that these codes must be constructed robustly and distributed widely. However, we disagree with an implicit assertion, that the computer codes are a component separate from the actual publication of scientific findings, often neglected in preference to the manuscript text in the race to publish. More and more, the key research results in papers are not fully contained within the small amount of manuscript text allotted to them. That is, the crucial aspects of many Nature papers are often sophisticated computer codes, and these cannot be separated from the prose narrative communicating the results of computational science. If the computer code associated with a manuscript were laid out according to accepted software standards, made openly available, and looked over as thoroughly by the journal as the text in the figure legends, many of the issues alluded to in the two pieces would simply disappear overnight.
The approach taken by the journal Biostatistics serves as an exemplar: code and data are submitted to a designated “reproducibility editor” who tries to replicate the results. If he or she succeeds, the first page of the article is kitemarked “R” (for reproducible) and the code and data made available as part of the publication. We propose that high-quality journals such as Nature not only have editors and reviewers that focus on the prose of a manuscript but also “computational editors” that look over computer codes and verify results. Moreover, many of the points made here in relation to computer codes apply equally well to large datasets that underlie experimental manuscripts. These are often organized, formatted, and deposited into databases as an afterthought. Thus, one could also imagine a “data editor” who would look after these aspects of a manuscript. All in all, we have to come to the realization that current scientific papers are more complicated than just a few thousand words of narrative text and a couple of figures, and we need to update journals to handle this reality.
Mark Gerstein (1,2,3)
Victoria Stodden (4)
(1) Program in Computational Biology and Bioinformatics,
(2) Department of Molecular Biophysics and Biochemistry, and
(3) Department of Computer Science,
Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520 Mark.Gerstein@Yale.edu
(4) Department of Statistics, Columbia University, 1255 Amsterdam Ave, New York, NY 10027