On November 20, documents including email and code spanning more than a decade were leaked from the Climatic Research Unit (CRU) at the University of East Anglia in the UK.
The Leak Reveals a Failure of Reproducibility of Computational Results
The leak appears to have followed a long battle to get the CRU scientists to reveal the code and data associated with their published results, and it highlights a crack in the scientific method as practiced in computational science. Publishing standards have not yet adapted to the relatively new computational methods now used pervasively across scientific research.
Other branches of science have long-established methods for bringing reproducibility into their practice. Deductive or mathematical results are published only with proofs, and there are long-established standards for what counts as an acceptable proof. Empirical science has clear mechanisms for communicating methods so that others can replicate the work. Computational methods are a relatively new addition to a scientist’s toolkit, and the scientific community is only just establishing similar standards for verification and reproducibility in this new context. Peer review and journal publishing have generally not yet adapted to computational methods and still operate as they do for the deductive and empirical branches, creating a growing credibility gap in computational science.
Verifying Computational Results without Clear Communication of the Steps Taken is Near-Impossible
The miserable travails of “Harry,” a CRU employee with access to their system who was trying to reproduce the temperature results, show how often verification of computational results is nearly impossible when reproducibility is not considered a research goal. The leaked documents contain logs of his unsuccessful attempts. After four years of documented effort, Harry apparently was still unable to reproduce CRU’s published results.
This example also illustrates why leaving reproducibility to others, with nothing beyond a cursory description of methods in the published text, is wholly inadequate for computational science. Harry seems to have had access to the data and code used, and he still couldn’t computationally replicate the results. The merging and preprocessing of data in preparation for modeling and estimation can involve a very large number of steps, and a change in any one of them could produce different results. The model fits, simulation runs, parameter settings, and function invocation sequences must all be communicated, again because the final results are the culmination of many decisions and each small step must match the original work, a Herculean task. Responding to questions about computational results by supplying only raw data is likely a canard, not a serious attempt to facilitate computational reproducibility.
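To make the point about matching every small step concrete, here is a minimal sketch of what communicating the steps could look like in practice. It is not CRU's code or anyone's actual workflow: the function names (`fingerprint`, `run_pipeline`, `first_divergence`), the two preprocessing steps, and the temperature-like readings are all hypothetical and chosen only for illustration. The idea is to log each step's name, parameters, and a digest of its output, so that a later attempt can be compared against the original log to see where it first departs.

```python
import hashlib
import json


def fingerprint(data):
    """Return a short SHA-256 digest of a JSON-serializable value."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_pipeline(data, steps):
    """Apply each (function, params) step in order, logging what was done."""
    log = [{"step": "input", "params": {}, "digest": fingerprint(data)}]
    for func, params in steps:
        data = func(data, **params)
        log.append({
            "step": func.__name__,
            "params": params,
            "digest": fingerprint(data),
        })
    return data, log


def first_divergence(log_a, log_b):
    """Return the name of the first logged step where two runs disagree."""
    for entry_a, entry_b in zip(log_a, log_b):
        if entry_a != entry_b:
            return entry_a["step"]
    return None  # the steps both logs share are identical


# Hypothetical preprocessing steps, for illustration only.
def drop_missing(values, sentinel):
    return [v for v in values if v != sentinel]


def subtract_baseline(values, baseline):
    return [round(v - baseline, 3) for v in values]


# Made-up readings; -999 stands in for a missing-value marker.
readings = [14.1, -999, 14.6, 15.0]

original_steps = [
    (drop_missing, {"sentinel": -999}),
    (subtract_baseline, {"baseline": 14.0}),
]
# A replication attempt that guessed a slightly different baseline.
attempt_steps = [
    (drop_missing, {"sentinel": -999}),
    (subtract_baseline, {"baseline": 14.2}),
]

result, original_log = run_pipeline(readings, original_steps)
_, attempt_log = run_pipeline(readings, attempt_steps)

print(json.dumps(original_log, indent=2))
print("first divergence:", first_divergence(original_log, attempt_log))
```

Publishing a log like this alongside the code and data would not guarantee reproducibility, but it would let someone in Harry's position see which step differs rather than only that the final numbers do not match.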
The story of Penn State professor of meteorology Michael Mann’s famous hockey stick temperature time series estimates is an example where lack of verifiability had important consequences. Release of the code and data used to generate the results in the hockey stick paper would perhaps have avoided the convening of panels to assess the papers. The hockey stick is a dramatic illustration of global warming and became something of an informal logo for the U.N.’s Intergovernmental Panel on Climate Change (IPCC). Mann was an author of the 2001 IPCC Assessment report and a lead author of the “Copenhagen Diagnosis,” a report released November 24 and intended to synthesize the hundreds of research papers about human-induced climate change published since the last IPCC assessment two years ago. The report was prepared in advance of the Copenhagen climate summit scheduled for December 7-18. Emails between CRU researchers and Mann are included in the leak, which happened right before the release of the Copenhagen Diagnosis (a quick search of the leaked emails for “Mann” returned 489 matches).
These reports are important in part because of their impact on policy. As CBS News reports: “In global warming circles, the CRU wields outsize influence: it claims the world’s largest temperature data set, and its work and mathematical models were incorporated into the United Nations Intergovernmental Panel on Climate Change’s 2007 report. That report, in turn, is what the Environmental Protection Agency acknowledged it ‘relies on most heavily’ when concluding that carbon dioxide emissions endanger public health and should be regulated.”
Discussions of Appropriate Level of Code and Data Disclosure on RealClimate.org, Before and After the CRU Leak
For years researchers had requested the data and programs used to produce Mann’s hockey stick result, and had been rebuffed. The repeated requests for code and data culminated in Freedom of Information (FOI) requests, in particular those made by Willis Eschenbach, who tells the story of his requests for the underlying code and data up until the time of the leak. It appears that a file, FOI2009.zip, was placed on CRU’s FTP server and that comments alerting people to its existence were then posted on several key blogs.
The importance of disclosing code and data in one part of the climate change community is illustrated in this fascinating discussion on the blog RealClimate.org in February. (Thank you to Michael Nielsen for the pointer.) RealClimate.org has five primary authors, including Michael Mann, and Gavin Schmidt is chief among them. In the RealClimate blog post from November 27, Where’s the Data, the position now seems to be very much in favor of data release, but the first comment asks for the steps taken in reconstructing the results as well. This is right: reproducibility of results should be the concern (as argued here, for example).
Policy and Public Relations
A dangerous ramification of the leak could be an undermining of public confidence in science and in the conduct of scientists. My sense is that making code and data readily available in a way that facilitates reproducibility of results can help avoid distractions from the real science, such as questions about potential evasions of FOI requests, whether data were fudged, or whether scientists acted improperly in squelching dissent or manipulating journal editorial boards. Perhaps data release is becoming an accepted norm, but code release for computational reproducibility must follow. The issue here is verification and reproducibility, which are essential for understanding and assessing whether the core science done at CRU was correct, even for peer-reviewing scientists.