Monthly Archive for September, 2009

Optimal Information Disclosure Levels: Data.gov and "Taleb's Criticism"

I was listening to the audio recording of last Friday’s “Scientific Data for Evidence Based Policy and Decision Making” symposium at the National Academies, and was struck by the earnest effort on the part of members of the Whitehouse to release governmental data to the public. Beth Noveck, Obama’s Deputy Chief Technology Officer for Open Government, frames the effort with a slogan, “Transparency, Participation, and Collaboration.” A plan is being developed by the Whitehouse in collaboration with the OMB to implement these three principles via a “massive release of data in open, downloadable, accessible for machine readable formats, across all agencies, not only in the Whitehouse,” says Beth. “At the heart of this commitment to transparency is a commitment to open data and open information..”

Vivek Kundra, Chief Information Officer in the Whitehouse’s Open Government Initiative, was even more explicit – saying that “the dream here is that you have a grad student, sifting through these datasets at 3 in the morning, who finds, at the intersection of multiple datasets, insight that we may not have seen, or developed a solution that we may not have thought of.”

This is an extraordinary vision. This discussion comes hot on the heels of a debate in Congress regarding the level of information they are willing to release to the public in advance of voting on a bill. Last Wednesday CBS reports, with regard to the health care bill, that “[t]he Senate Finance Committee considered for two hours today a Republican amendment — which was ultimately rejected — that would have required the “legislative” language of the committee’s final bill, along with a cost estimate for the bill, to be posted online for 72 hours before the committee voted on it. Instead, the committee passed a similar amendment, offered by Committee Chair Max Baucus (D-Mont.), to put online the “conceptual” or “plain” language of the bill, along with the cost estimate.” What is remarkable is the sense this gives that somehow the public won’t understand the raw text of the bill (I noticed no compromise position offered that would make both versions available, which seems an obvious solution).

The Whitehouse’s efforts have the potential to test this hypothesis: if given more information will people pull things out of context and promulgate misinformation? The Whitehouse is betting that they won’t, and Kundra does state the Whitehouse is accompanying dataset release with efforts to provide contextual meta-data for each dataset while safeguarding national security and individual privacy rights.

This sense of limits in openness isn’t unique to governmental issues and in my research on data and code sharing among scientists I’ve termed the concern “Taleb’s crticism.” In a 2008 essay on The Edge website, Taleb worries about the dangers that can result from people using statistical methodology without having a clear understanding of the techniques. An example of concern about Taleb’s Criticism appeared on UCSF’s EVA website, a repository of programs for automatic protein structure prediction. The UCSF researchers won’t release their code publicly because, as stated on their website, “We are seriously concerned about the ‘negative’ aspect of the freedom of the Web being that any newcomer can spend a day and hack out a program that predicts 3D structure, put it on the web, and it will be used.” Like the congressmen seemed to fear, for these folks openness is scary because people may misuse the information.

It could be argued, and for scientific research should be argued, that an open dialog of an idea’s merits is preferable to no dialog at all, and misinformation can be countered and exposed. Justice Brandeis famously elucidated this point in Whitney v. California (1927), writing that “If there be time to expose through discussion the falsehood and fallacies, to avert the evil by the processes of education, the remedy to be applied is more speech, not enforced silence.” Data.gov is an experiment in context and may bolster trust in the public release of complex information. Speaking of the Data.gov project, Noveck explained that “the notion of making complex information more accessible to people and to make greater sense of that complex information was really at the heart.” This is a very bold move and it will be fascinating to see the outcome.

Crossposted on Yale Law School’s Information Society Project blog.