The following comments were posted in response to the OSTP’s call as posted here: http://www.ostp.gov/galleries/default-file/RFI%20Final%20for%20FR.pdf. The first wave, comments posted here, asked for feedback on implementation issues. The second wave requested input on Features and Technology (our post is here). For the third and final wave on Management, Chris Wiggins, Matt Knepley, and I posted the following comments:
Q1: Compliance. What features does a public access policy need to ensure compliance? Should this vary across agencies?
One size does not fit all research problems across all research communities, and a heavy-handed general release requirement across agencies could result in de jure compliance (release of data and code as per the letter of the law) without the extra effort necessary to create usable data and code facilitating reproducibility (and extension) of the results. One solution to this barrier would be to require grant applicants to formulate plans for release of the code and data generated through their research proposal, if funded. This creates a natural mechanism by which grantees (and peer reviewers), who best know their own research environments and community norms, contribute complete strategies for release. This would allow federal funding agencies to gather data on needs for release (repositories, further support, etc.), understand which research problem characteristics engender which particular solutions and which solutions are most appropriate in which settings, and uncover as-yet unrecognized problems particular researchers may encounter. These data would permit federal funding agencies to craft release requirements that are more sensitive to the barriers researchers face and the demands of their particular research problems, and to implement strategies for enforcement of these requirements. This approach also permits researchers to address confidentiality and privacy issues associated with their research.
One exemplary precedent by a UK funding agency is the January 2007 “Policy on data management and sharing” adopted by The Wellcome Trust (http://www.wellcome.ac.uk/About-us/index.htm) according to which “the Trust will require that the applicants provide a data management and sharing plan as part of their application; and review these data management and sharing plans, including any costs involved in delivering them, as an integral part of the funding decision.” A comparable policy statement by US agencies would be quite useful in clarifying OSTP’s intent regarding the relationship between publicly-supported research and public access to the research products generated by this support.
An exemplary precedent by a US funding agency is that of NSF’s “broader impact criterion” (cf. http://www.ndsciencehumanitiespolicy.org/workshop/ for links to extensive discussions on the history and examples of what qualifies as evidence of broad impact). Such an existing requirement could allow, encourage, or require data and code sharing plans as possible examples of broader impact.
A second exemplary precedent by a US funding agency is that of NIH’s development of PubMed Central. Submission of manuscripts resulting from NIH support is now mandatory (cf. http://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-033.html). NIH or other agencies might consider developing a similar repository for code, data, or (even better) full compendia (manuscript, data, and code together) of computational research, and possibly requiring use of this reliable, searchable, open repository for future federal funding. By creating and requiring an open access repository for manuscripts, NIH has avoided the possibility that research results can only be accessed by libraries able to pay the increasing costs of subscriptions to closed-access journals.
Q2: Evaluation. How should an agency determine whether a public access policy is successful? What measures could agencies use to gauge whether there is increased return on federal investment gained by expanded access?
One simple gauge is the proportion of funded projects, by field and by agency, which are in compliance. Compliance could be easily measured: whether the research compendia have been made available according to agency policy and the details of the particular grant funding the researcher. When the work is computational, funding agencies could consider implementation of the Reproducible Research Standard (cf. V. Stodden “Enabling Reproducible Research: Licensing For Scientific Innovation” at http://www.ijclp.net/issue_13.html) to untangle intellectual property rights associated with research release and clarify requirements.
The Reproducible Research Standard (RRS) realigns the Intellectual Property framework faced by computational researchers with longstanding scientific norms. The RRS suggests a licensing structure for research compendia, including code and data, that permits others to use and re-use code and data without having to obtain prior permission or assume a Fair Use exception to copyright, so long as attribution is given. The RRS utilizes existing open licenses that permit the free use of licensed work, so long as attribution is given, and is satisfied if the following four conditions hold:
1. The full research compendium, including code and data, is available on the Internet,
2. The media components, such as text or figures (including original selection and arrangement of the data), are licensed under the Creative Commons Attribution License 3.0 or released to the public domain under CC0,
3. The code components are licensed under one of Apache 2.0, the MIT License, or the Modified BSD license, or released to the public domain under CC0,
4. The data have been released into the public domain under CC0 or according to the Science Commons Open Data Protocol.
Using the RRS on all components of computational scholarship will encourage reproducible scientific investigation, facilitate greater collaboration, and promote engagement of the larger community in scientific learning and discovery.
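As a rough illustration, the four RRS conditions listed above could be checked mechanically once a compendium's licenses are recorded. The following is a minimal sketch only: the `Compendium` record, the function names, and the license labels are hypothetical conventions chosen for this example, not part of the RRS itself.

```python
from dataclasses import dataclass
from typing import List

# License labels are illustrative strings chosen for this sketch (assumption).
MEDIA_LICENSES = {"CC-BY-3.0", "CC0"}
CODE_LICENSES = {"Apache-2.0", "MIT", "Modified-BSD", "CC0"}
DATA_LICENSES = {"CC0", "Science-Commons-Open-Data-Protocol"}


@dataclass
class Compendium:
    """Hypothetical record describing one research compendium."""
    available_online: bool
    media_license: str
    code_license: str
    data_license: str


def rrs_failures(c: Compendium) -> List[str]:
    """Return a description of each RRS condition the compendium fails."""
    failures = []
    if not c.available_online:
        failures.append("condition 1: compendium is not available on the Internet")
    if c.media_license not in MEDIA_LICENSES:
        failures.append("condition 2: media license " + c.media_license)
    if c.code_license not in CODE_LICENSES:
        failures.append("condition 3: code license " + c.code_license)
    if c.data_license not in DATA_LICENSES:
        failures.append("condition 4: data license " + c.data_license)
    return failures


def satisfies_rrs(c: Compendium) -> bool:
    """A compendium satisfies the RRS only if all four conditions hold."""
    return not rrs_failures(c)
```

Returning the list of failed conditions, rather than a bare yes or no, would let an agency report back to a grantee which component needs relicensing.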
Moreover, an evaluation of compliance should also cover the ability to build, run, and verify any source code. This might be accomplished using:
* spot checks of the repository
* automated checks akin to unit tests
* tests run by a separate reviewer at the time of inclusion
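An automated check of the kind listed above might look like the following minimal sketch. It assumes a hypothetical compendium layout, in which each compendium ships an entry script `run.py` and a recorded result `expected_output.txt`; neither file name comes from any agency policy, and a real repository would need to settle on its own conventions.

```python
import subprocess
import sys
from pathlib import Path


def verify_compendium(root: Path, timeout: int = 60) -> bool:
    """Re-run a compendium's entry script and compare against its recorded output.

    Assumed (hypothetical) layout: root/run.py and root/expected_output.txt.
    """
    script = root / "run.py"
    expected = root / "expected_output.txt"
    # A compendium missing either file cannot be verified.
    if not script.is_file() or not expected.is_file():
        return False
    try:
        result = subprocess.run(
            [sys.executable, script.name],
            cwd=root,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # The code must run cleanly and reproduce the recorded result.
    if result.returncode != 0:
        return False
    return result.stdout.strip() == expected.read_text().strip()
```

An exact text comparison is the simplest possible criterion. Numerical work would more plausibly need a tolerance-based comparison, and running submitted code safely would require sandboxing that this sketch does not attempt.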
Q3: Roles. How might a public private partnership promote robust management of a public access policy? Are there examples already in use that may serve as models? What is the best role for the Federal government?
Two notable examples of public-private partnership which have benefited science are http://arxiv.org, which is partially NSF-supported, and http://PDB.org, funded by a number of (public and private) sources. PDB in particular has for more than a decade been an integral part of the funding and publication policies in the structural biology community (cf. http://www.nature.com/nsmb/wilma/v5n3.892130820.html).
That said, previous experimentation with private management of scientific works has been problematic in at least one case. In December 2008 Google shut down http://researchdatasets.google.com, a repository for research data (cf. http://www.wired.com/wiredscience/2008/12/googlescienceda/). Private interests are not aligned with those of the scientific community, and there must be a public role in the preservation of this aspect of our culture. Moreover, reliance on private resources comes with vulnerability to changes in the missions or solvency of these private and/or corporate partners. The principle of Open Access recognizes that such collections should be considered valuable stewards of our culture, just as the Library of Congress and the National Archives are. Rewards for the availability of scientific compendia (papers, data, and code) come not only through views and downloads, but through the acceleration of scientific research, technological development, and an increase in scientific integrity.
Possible roles for the federal government include:
* facilitating and supporting an open and sustainable database comparable to the PDB for research compendia (manuscripts, data, and code)
* encouraging funding agencies to draft clear statements encouraging reproducibility (e.g., distribution of compendia) and public access to research results (e.g., submission to open access journals or arxiv.org)
* clarification of the relationship between copyright and open access (a topic currently under debate in the form of competing proposed congressional bills, cf. http://www.taxpayeraccess.org/issues/access/access_supporters/ for background)
* clarification of the relationship between broad impact of publicly-funded research (and public access to the output of this federal support) versus university-specific IP policies (e.g., governing code and data even where generated by publicly-funded research), which often act as a disincentive to sharing the results of federally-funded research.
Yale Law School, New Haven, CT
Science Commons, Cambridge, MA
Columbia University, New York, NY
Matthew G. Knepley
University of Chicago, Chicago, IL
References
These issues were discussed at a roundtable convened by one of the authors on research sharing issues held at Yale Law School on November 21, 2009. The webpage, along with thought pieces and research materials, is located at http://www.stanford.edu/~vcs/Conferences/RoundtableNov212009/.
Crossposted to the OSTP blog.