The OSTP's call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government

The following comments were posted in response to the OSTP’s call (available at http://www.ostp.gov/galleries/default-file/RFI%20Final%20for%20FR.pdf):

Open access to our body of federally funded research, including not only published papers but also any supporting data and code, is imperative, not just for scientific progress but for the integrity of the research itself. We list below nine focus areas and recommendations for action.

[1] Each Funding Agency Must Address Open Access: The disparate nature of research discourages the use of blanket mandates in favor of an approach, at least initially, tailored to the research environment at the level of the funding agency. For example, the initiative shown by the National Institutes of Health regarding Open Access derives from the established norms of openness emerging from the Human Genome Project, which may not be directly applicable to each agency. Awards from agencies may currently be subject to data sharing agreements that must be reconciled with Open Access. We recommend advising the funding agencies to develop plans to implement Open Access within a six-month time frame before turning to the powers vested in the Executive Branch. We discuss issues for consideration in the enactment of an Open Access policy for federally funded research.

[2] Public Access to Federally Funded Research: It is imperative to provide public access to taxpayer-funded scientific output, not only the final published paper but also the supporting data and code necessary for the reproducibility and skepticism fundamental to scientific communication and progress.

[3] Exceptions to Open Access: These must be minimized. The goal of transparency in research must accommodate some exceptions, such as research used for national security purposes or research that raises privacy or confidentiality concerns. Research relevant to national security interests falls outside the mandate of these recommendations. Confidentiality exceptions must be circumscribed to apply only to data on individual subjects for which anonymization techniques are ineffective.

[4] Timeliness and Embargo Periods: Funding agencies should require the deposit of agency-funded final peer-reviewed manuscripts. The NIH requires that papers that arise from NIH funds comply with its public access policy: final peer-reviewed journal manuscripts are submitted to PubMed Central upon acceptance for publication, and become accessible to the public no later than 12 months after publication. Ideally, research should be made public as close to the date of publication as possible, but 12 months should be the maximum embargo period for federally funded research.

[5] Digital Archiving: Careful consideration must be given to the locus of the digital archiving. The creation of agency-specific repositories does not facilitate interdisciplinary communication and thwarts scripted search and API usage; a national research repository should be established to house released agency-funded manuscripts, including supporting digital materials such as data and code, and to provide links to research housed elsewhere. Many institutions do not have repositories, nor do they have the resources to maintain them. For computational work, supporting data and code must accompany article release, creating additional demands on a repository. For papers whose results can be replicated from short scripts and small datasets, many computational scientists who do engage in reproducible research are able to host their research compendia (paper, data, and code) on their institutional webpages or using hosting resources their institution is willing to provide. These individual contributions, however, may not conform to standardized formats that facilitate scripted search, may not display transparent versioning and crucial time-stamping of edits and revisions, and may not be labeled with unique object identifiers as required by the NIH Open Access policy. These desiderata could be implemented in a straightforward manner by a neutral third-party site, such as one coordinated among multiple funding agencies. Not all computational research involves small amounts of supplemental data and code, and an inter-agency repository could host very large datasets or complex bodies of code in cases where institutional support is not available to the researcher. Such a repository could extend the capabilities of arxiv.org or PubMed Central for all federally funded research (data, code, and manuscripts; perhaps renaming PubMed Central the more representative “PubSci” or “PubCentral”).
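
As an illustration only, the desiderata named above (a standardized format, time-stamped versions, and unique identifiers) could be captured in a deposit record along the following lines. This is a minimal sketch, not a description of any existing agency system: the class names, the field names, and the hash-derived identifier scheme are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    """One time-stamped version of a deposited file (hypothetical schema)."""
    content_sha256: str
    deposited_at: datetime


@dataclass
class CompendiumItem:
    """A paper, dataset, or code archive within a research compendium."""
    kind: str  # "paper", "data", or "code"
    title: str
    revisions: list = field(default_factory=list)

    def deposit(self, content: bytes) -> Revision:
        """Record a new revision with a content hash and a UTC timestamp."""
        rev = Revision(
            content_sha256=hashlib.sha256(content).hexdigest(),
            deposited_at=datetime.now(timezone.utc),
        )
        self.revisions.append(rev)
        return rev

    def identifier(self) -> str:
        """Illustrative identifier: the kind plus a prefix of the latest hash."""
        if not self.revisions:
            raise ValueError("nothing deposited yet")
        return f"{self.kind}:{self.revisions[-1].content_sha256[:16]}"
```

Because every revision is kept rather than overwritten, the edit history stays transparent; a production repository would more likely assign registered identifiers than derive them from content hashes as this sketch does.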

[6] Copyright and Ownership Issues: The NIH further requires that copyright be lawfully addressed. Many journals require authors to assign copyright to the journal as a condition of publication, but will allow an earlier version to be posted publicly. The NIH has made publication in journals that permit the article — or a version thereof — to be posted in PubMed Central a requirement for funding; this strategy is an option for all funding agencies to consider, as well as a generalization to include data and code deposit (for computational research).

Current complex ownership issues must be clarified between the public, the researcher, the institution at which the researcher works, and publishing entities. OSTP’s current RFI could be viewed as a step in untangling ownership in favor of the taxpayer. Since the passage of the Bayh-Dole Act in 1980, universities have taken a strong interest in maintaining a proprietary interest in research produced at their institutions. Patenting and other forms of intellectual property limit the ability of other researchers to reuse and build upon the research, and thus work against scientific norms and hinder scientific progress.

[7] Incentives to Open Science — Citation and Future Grants: The final requirement the NIH makes of grant recipients is use of the PubMed Central identifier at the end of citations. Encouraging the use of unique identifiers of papers, as well as of data and code, can encourage the release and hence citation of all forms of computational research. Such a unique identifier would indicate compliance with agency open access policies.

Tagging of research compendia is an important issue for communicating work, facilitating topical web searches, and aggregating a researcher’s contributions, including their data and code. Development of a standard RDFa vocabulary for HTML tags for agency funded research would enable search for data, code, and research as well as facilitating the transmission of licensing information, authorship, and sources. Enabling search by author would allow a more granular understanding of a researcher’s contributions, beyond citations. This would provide an incentive to release data and code, and give others — such as funders, award committees, and university hiring and promotion committees — access to a more representative assessment of the researcher’s contributions to the community than mere publication-counting. Such a tagging vocabulary could include unique identifiers for data and code, ideally the same as those required for repository deposit as discussed in the previous section, and thus facilitate and encourage their citation.
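
To make the tagging idea concrete, the sketch below renders a compendium description as RDFa-annotated HTML. The agency vocabulary proposed above does not yet exist, so the example borrows Dublin Core terms purely as a stand-in; the function name, the record fields, and the sample values are hypothetical.

```python
from html import escape

# Assumption: Dublin Core terms stand in for the proposed agency vocabulary.
DC_TERMS = "http://purl.org/dc/terms/"


def render_rdfa(record: dict) -> str:
    """Return an HTML fragment whose RDFa attributes expose the record's fields."""
    subject = escape(record["identifier"], quote=True)
    license_url = escape(record["license"], quote=True)
    lines = [f'<div vocab="{DC_TERMS}" resource="{subject}">']
    # Literal-valued properties: the element text is the value.
    for prop in ("title", "creator"):
        value = escape(record[prop], quote=True)
        lines.append(f'  <span property="{prop}">{value}</span>')
    # The license is a link, so the href carries the value.
    lines.append(f'  <a property="license" href="{license_url}">License</a>')
    lines.append("</div>")
    return "\n".join(lines)


example = {
    "identifier": "https://example.org/compendium/123",  # placeholder identifier
    "title": "Example analysis code",
    "creator": "A. Researcher",
    "license": "https://example.org/licenses/example",  # placeholder license URL
}
print(render_rdfa(example))
```

A crawler that understands RDFa could then read title, author, and licensing information from the page without scraping its free text, which is the kind of scripted search the tagging proposal aims to enable.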

It is important that these requirements be tied to grant funding and a mechanism established that allows compliance to be reflected in future grant determinations. Strategies for release of data and code arising from a particular grant should be subject to peer review in the grant evaluation process.

[8] Posted Guidelines and Recommended Best Practices: A “best practices” document should be publicly available at a stable URL, be updated with versions, and provide clarity regarding the above issues, either at the agency level or at the OSTP. It should be framed to suggest ideal recommendations, rather than list a series of requirements. Some points such a document may wish to address follow.

Reproducibility is a goal of computational science, and practicing reproducible research means:
* Uploading the final peer-reviewed journal manuscripts that arise from federally funded research to a digital archive upon acceptance for publication,
* Making the data and code required to reproduce results from federally funded works publicly available online upon acceptance for publication,
* Utilizing appropriate licensing structures for federally funded research, such as the Reproducible Research Standard (see IJCLP Webdoc 1-13-2009 at http://www.ijclp.net/issue_13.html ),
* Utilizing tagging structures for agency funded compendia release, as part of inclusion in repositories or posting in institutional repositories, in order to facilitate search of research results.

[9] References: These issues were discussed at a roundtable on research sharing issues held at Yale Law School on November 21, 2009. The webpage, along with thought pieces and research materials on the subject, is located at http://www.stanford.edu/~vcs/Conferences/RoundtableNov212209/ . A possibly useful reference discussing the communication of research and scientific progress, “Reproducible Research in Computational Harmonic Analysis,” is available at http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.15 .

Victoria Stodden
Yale Law School, New Haven, CT 06511
Science Commons, Cambridge, MA 02138
http://www.stanford.edu/~vcs

Chris Wiggins
Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY
http://www.columbia.edu/~chw2/

Cross-posted on http://blog.ostp.gov/2009/12/10/policy-forum-on-public-access-to-federally-funded-research-implementation/.