Archive for the 'Law' Category

My input for the OSTP RFI on reproducibility

Until September 23, 2014, the US Office of Science and Technology Policy in the White House was accepting comments on its “Strategy for American Innovation.” My submitted comments on one part of that RFI, section 11:

“11) Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the Federal Government leverage its role as a significant funder of scientific research to most effectively address the problem?”

follow (corrected for typos).

This comment is directed at point 11, requesting comments on the reproducibility of scientific findings. I believe there are two threads to this issue: first, a traditional problem that has existed in science for hundreds of years, whose longstanding solution has been the methods section of the scientific publication; second, a new issue that has arisen over the last twenty years as computation has assumed a central role in scientific research. This new element is not yet accommodated in scientific publication, and it introduces serious consequences for reproducibility.

Putting aside the first issue of traditional reproducibility, for which longstanding solutions exist, I encourage the federal government, in concert with the scientific community, to consider how the current set of laws and funding agency practices do not support the production of reproducible computational science.

In all research that utilizes a computer, instructions for the research are stored in software and scientific data are stored digitally. A typical publication in computational research rests on data and on the computer instructions applied to those data to generate the scientific findings. The complexity of the data generation mechanism and the computational instructions is typically very large, far too large to capture in a traditional scientific publication. Hence, when computers are involved in the research process, scientific publication must shift from the scientific article alone to the triple of the paper, the software, and the data from which the findings were generated. This triple has been called a “research compendium,” and its aim is to transmit research findings that others in the field will be able to reproduce by running the software on the data. Data and software that permit others to reproduce the findings must therefore be made available.

Two primary bodies of law come to bear on this idea of computational reproducibility. The first is copyright law, which attaches to software and, to some degree, to data. Software and data from scientific research should not receive the same legal protection that most original artistic works receive from copyright law. These objects should be made openly available by default (rather than closed by copyright law by default), with attribution for their creators.

Second, the Bayh-Dole Act of 1980 is having the effect of reducing transparency and knowledge and technology transfer as the computer becomes central to scientific research. Bayh-Dole charges the institutions that support research, such as universities, with using the patent system for inventions that arise under its auspices. Since software may be patentable, this introduces a barrier to knowledge transfer and reproducibility. A research compendium would include code and would be made openly available, whereas Bayh-Dole adds an incentive to create a barrier by introducing the option to patent software. Rather than using openly available software, a researcher would need to submit a license request to the university and negotiate appropriate rates. For the scientific community, this is equivalent to closed, unusable code.

I encourage you to rethink the legal environment that attends to the digital objects produced by scientific research in support of research findings: the software, the data, and the digital article. Science, as a rule, demands that these be made openly available to society (as do scientists), yet unfortunately they are frequently captured by external third parties, through copyright transfer and patents, that restrict access to knowledge and information that has arisen from federal funding. This retards American innovation and competitiveness.

Federal funding agencies and other government entities must financially support the sharing, access, and long-term archiving of research data and code that support published results. With guiding principles from the federal government, scientific communities should implement infrastructure solutions that support openly available reproducible computational research. There are best practices in most communities regarding data and code release for reproducibility. Federal action is needed because the scientific community faces a collective action problem: producing a research compendium, as opposed to a published article alone, has historically gone unrewarded. To change this practice, the scientific community must move in concert. The levers exerted by the federal funding agencies are key to breaking this collective action problem.

Finally, I suggest a different wording for point 11 in your request. Scientific findings are not the right level at which to think about reproducibility; it is better to think about enabling the replication of the research process associated with published results, rather than of the findings themselves. This is what makes research reproducible and reliable. When different processes are compared, whether or not they produce the same result, the availability of code and data enables the reconciliation of differences in methods. Open data and code permit reproducibility in this sense and increase the reliability of the scholarly record by permitting error detection and correction.

I have written extensively on all these issues. I encourage you to look at my work, especially the papers and talks.

Changes in the Research Process Must Come From the Scientific Community, not Federal Regulation

I wrote this piece as an invited policy article for a major journal but they declined to publish it. It’s still very much a draft and they made some suggestions, but since realistically I won’t be able to get back to this for a while and the text is becoming increasingly dated, I thought I would post it here. Enjoy!

Recent U.S. policy changes are mandating a particular vision of scientific communication: public access to data and publications for federally funded research. On February 22, 2013, the Office of Science and Technology Policy (OSTP) in the White House released an executive memorandum instructing the major federal funding agencies to develop plans to make both the datasets and research articles resulting from their grants publicly available [1]. On March 5, the House Science, Space, and Technology subcommittee convened a hearing on Scientific Integrity & Transparency, and on May 9, President Obama issued an executive order requiring government data to be made openly available to the public [2].

Many in the scientific community have demanded increased data and code disclosure in scholarly dissemination to address issues of reproducibility and credibility in computational science [3-19]. At first blush, the federal policy changes appear to support these scientific goals, but the scope of government action is limited in ways that impair its ability to respond directly to these concerns. The scientific community cannot rely on federal policy to bring about changes that enable reproducible computational research. These recent policy changes must be a catalyst for a well-considered update in research dissemination standards by the scientific community: computational science must move to publication standards that include the digital data and code sufficient to permit others in the field to replicate and verify the results. Authors and journals must be ready to use existing repositories and infrastructure to ensure the communication of reproducible computational discoveries.
Continue reading ‘Changes in the Research Process Must Come From the Scientific Community, not Federal Regulation’

Data access going the way of journal article access? Insist on open data

The discussion around open access to published scientific results, the Open Access movement, is well known. The primary cause of the current situation — journal publishers owning copyright on journal articles and therefore charging for access — stems from authors signing their copyright over to the journals. I believe this happened because authors didn’t really realize what they were doing when they signed away ownership of their work, and had they known, they would not have done so. I believe another solution would have been used, such as granting the journal a license to publish, e.g. Science’s readily available alternative license. At some level, authors were entering into binding legal contracts without an understanding of the implications and without the right counsel.

I am seeing a similar situation arising with respect to data. It is not atypical for a data-producing entity, particularly one in the commercial sphere, to require that researchers with access to the data sign a non-disclosure agreement. This seems to be standard for Facebook data, Elsevier data, and many others. I’m witnessing researchers grabbing their pens and signing and, as in the publication context, feeling powerless to do otherwise. Again, they are without the appropriate counsel. Even the general counsel’s office at their institution typically sees the GC’s role as protecting the institution against liability, rather than the larger concern of protecting the scholar’s work and the integrity of the scholarly record. What happens when research from these protected datasets is published, and questioned? How can others independently verify the findings? They’ll need access to the data.

There are many legitimate reasons such data may not be able to be publicly released, for example protection of subjects’ privacy (see what happened when Harvard released Facebook data from a study). But as scientists we should be mindful of the need for our published findings to be reproducible. Some commercial data do not come with privacy concerns, only concerns from the company that they are still able to sell the data to other commercial entities, and sometimes not even that. Sometimes lawyers simply want an NDA to minimize any risk to the commercial entity that might arise should the data be released. To me, that seems perfectly rational since they are not stewards of scientific knowledge.

It is also perfectly rational for authors publishing findings based on these data to push back as hard as possible to ensure maximum reproducibility and credibility of their results. Many companies share data with scientists because they seek to deepen goodwill and ties with the academic community, or because they are interested in the results of the research. As researchers we should condition our acceptance of the data on its release when the findings are published, if there are no privacy concerns associated with the data. If there are privacy concerns, I can imagine sharing the data in a “walled garden” within which other researchers, but not the public, are able to access the data and verify results. There are a number of solutions that can bridge the gap between open access to data and an access-blocking NDA (e.g. differential privacy), and as scientists the integrity and reproducibility of our work is a core concern for which we bear responsibility in this negotiation for data.
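To make the differential privacy idea mentioned above concrete, here is a minimal sketch of the classic Laplace mechanism applied to a count query. This is an illustration only, not a production implementation; the function names and the epsilon value are my own, not anything referenced in this post.

```python
import math
import random

def laplace_noise(scale):
    # Draw Laplace(0, scale) noise via inverse transform sampling.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    # A count query has sensitivity 1: adding or removing a single
    # record changes the true answer by at most 1, so Laplace noise
    # with scale 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

The point of the sketch is the trade-off it exposes: a small epsilon adds more noise and protects individual records more strongly, while a large epsilon returns answers close to the truth — exactly the kind of middle ground between an NDA and fully open data that the paragraph above alludes to.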

A few template data sharing agreements between academic researchers and data producing companies would be very helpful, if anyone feels like taking a crack at drafting them (Creative Commons?). Awareness of the issue is also important, among researchers, publishers, funders, and data producing entities. We cannot unthinkingly default to a legal situation regarding data that is anathema to scientific progress, as we did with access to scholarly publications.

Regulatory steps toward open science and reproducibility: we need a science cloud

This past January Obama signed the America COMPETES Reauthorization Act. It contains two interesting sections that advance the notions of open data and the federal role in supporting online access to scientific archives, sections 103 and 104, which read in part:

• § 103: “The Director [of the Office of Science and Technology Policy at the White House] shall establish a working group under the National Science and Technology Council with the responsibility to coordinate Federal science agency research and policies related to the dissemination and long-term stewardship of the results of unclassified research, including digital data and peer-reviewed scholarly publications, supported wholly, or in part, by funding from the Federal science agencies.” (emphasis added)

This is a cause for celebration insofar as Congress has recognized that published articles are an incomplete communication of computational scientific knowledge, and the data (and code) must be included as well.

• § 104: Federal Scientific Collections: The Office of Science and Technology Policy “shall develop policies for the management and use of Federal scientific collections to improve the quality, organization, access, including online access, and long-term preservation of such collections for the benefit of the scientific enterprise.” (emphasis added)

I was very happy to see the importance of online access recognized, and hopefully this will include the data and code that underlies published computational results.

One step further in each of these directions: mention code explicitly, and create a federally funded cloud not only for data but one linked to code and computational results, to enable reproducibility.

Post 3: The OSTP’s call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government

The following comments were posted in response to the OSTP’s call as posted here. The first wave (comments posted here) asked for feedback on implementation issues. The second wave requested input on Features and Technology (our post is here). For the third and final wave, on Management, Chris Wiggins, Matt Knepley, and I posted the following comments:

Q1: Compliance. What features does a public access policy need to ensure compliance? Should this vary across agencies?

One size does not fit all research problems across all research communities, and a heavy-handed general release requirement across agencies could result in de jure compliance – release of data and code as per the letter of the law – without the extra effort necessary to create usable data and code that facilitate reproducibility (and extension) of the results. One solution to this barrier would be to require grant applicants to formulate plans for release of the code and data generated through their research proposal, if funded. This creates a natural mechanism by which grantees (and peer reviewers), who best know their own research environments and community norms, contribute complete strategies for release. This would allow federal funding agencies to gather data on the needs surrounding release (repositories, further support, etc.); understand which research problem characteristics engender which particular solutions, and which solutions are most appropriate in which settings; and uncover as-yet unrecognized problems particular researchers may encounter. These data would permit federal funding agencies to craft release requirements that are more sensitive to the barriers researchers face and to the demands of their particular research problems, and to implement strategies for enforcing these requirements. This approach also permits researchers to address confidentiality and privacy issues associated with their research.


One exemplary precedent by a UK funding agency is the January 2007 “Policy on data management and sharing” adopted by the Wellcome Trust, according to which “the Trust will require that the applicants provide a data management and sharing plan as part of their application; and review these data management and sharing plans, including any costs involved in delivering them, as an integral part of the funding decision.” A comparable policy statement by US agencies would be quite useful in clarifying OSTP’s intent regarding the relationship between publicly supported research and public access to the research products generated by this support.

Continue reading ‘Post 3: The OSTP’s call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government’

Post 2: The OSTP’s call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government

The following comments were posted in response to the second wave of the OSTP’s call as posted here. The first wave (comments posted here and on the OSTP site here; scroll to the second-last comment) asked for feedback on implementation issues. The second wave requests input on Features and Technology, and Chris Wiggins and I posted the following comments:

We address each of the questions for phase two of OSTP’s forum on public access in turn. The answers generally depend on the community involved and (particularly for question 7, asking for a cost estimate) on the scale of implementation. Inter-agency coordination is crucial, however, in (i) providing a centralized repository for accessing agency-funded research output and (ii) encouraging and/or providing a standardized tagging vocabulary and structure (as discussed further below).

Continue reading ‘Post 2: The OSTP’s call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government’

The OSTP's call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government

The following comments were posted in response to the OSTP’s call as posted here:

Open access to our body of federally funded research, including not only published papers but also any supporting data and code, is imperative, not just for scientific progress but for the integrity of the research itself. We list below nine focus areas and recommendations for action.

Continue reading ‘The OSTP's call for comments regarding Public Access Policies for Science and Technology Funding Agencies Across the Federal Government’

The Climate Modeling Leak: Code and Data Generating Published Results Must be Open and Facilitate Reproducibility

On November 20, documents including emails and code spanning more than a decade were leaked from the Climatic Research Unit (CRU) at the University of East Anglia in the UK.

The Leak Reveals a Failure of Reproducibility of Computational Results

It appears as though the leak came about through a long battle to get the CRU scientists to reveal the code and data associated with published results, and highlights a crack in the scientific method as practiced in computational science. Publishing standards have not yet adapted to the relatively new computational methods used pervasively across scientific research today.

Other branches of science have long-established methods to bring reproducibility into their practice. Deductive or mathematical results are published only with proofs, and there are long established standards for an acceptable proof. Empirical science contains clear mechanisms for communication of methods with the goal of facilitation of replication. Computational methods are a relatively new addition to a scientist’s toolkit, and the scientific community is only just establishing similar standards for verification and reproducibility in this new context. Peer review and journal publishing have generally not yet adapted to the use of computational methods and still operate as suitable for the deductive or empirical branches, creating a growing credibility gap in computational science.

The key point emerging from the leak of the CRU documents is that without the code and data it is all but impossible to tell whether the research is right or wrong, and this community’s lack of awareness of reproducibility, together with its blustery demeanor, does not inspire confidence in its production of reliable knowledge. This leak and the ensuing embarrassment would not have happened if code and data that permit reproducibility had been released alongside the published results. When mature, computational science will routinely produce verifiable results.

Verifying Computational Results without Clear Communication of the Steps Taken is Near-Impossible

The frequent near-impossibility of verification of computational results when reproducibility is not considered a research goal is shown by the miserable travails of “Harry,” a CRU employee with access to their system who was trying to reproduce the temperature results. The leaked documents contain logs of his unsuccessful attempts. It seems reasonable to conclude that CRU’s published results aren’t reproducible if Harry, an insider, was unable to do so after four years.

This example also illustrates why leaving reproducibility to others, beyond a cursory description of methods in the published text, is wholly inadequate for computational science. Harry seems to have had access to the data and code used, and he still couldn’t replicate the results. The merging and preprocessing of data in preparation for modeling and estimation encompass a potentially very large number of steps, and a change in any one could produce different results. Likewise, when fitting models or running simulations, parameter settings and function invocation sequences must be communicated, because the final results are the culmination of many decisions, and without this information each small step must be guessed to match the original work – a Herculean task. Responding with raw data when questioned about computational results is merely a canard, not a serious attempt to facilitate reproducibility.

The story of Penn State professor of meteorology Michael Mann‘s famous hockey stick temperature time series estimates is an example where lack of verifiability had important consequences. In February 2005 two panels examined the integrity of his work and debunked the results, largely based on work done by Peter Bloomfield, a statistics professor at North Carolina State University, and Ed Wegman, a statistics professor at George Mason University. (See also this site for further explanation of the statistical errors.) Release of the code and data used to generate the results in the hockey stick paper likely would have caught the errors earlier, avoided the convening of the panels to assess the papers, and prevented the widespread promulgation of incorrect science. The hockey stick is a dramatic illustration of global warming and became something of a logo for the U.N.’s Intergovernmental Panel on Climate Change (IPCC). Mann was an author of the 2001 IPCC Assessment report, and was a lead author on the “Copenhagen Diagnosis,” a report released Nov 24 and intended to synthesize the hundreds of research papers about human-induced climate change that have been published since the last assessment by the IPCC two years ago. The report was prepared in advance of the Copenhagen climate summit scheduled for Dec 7-18. Emails between CRU researchers and Mann are included in the leak, which happened right before the release of the Copenhagen Diagnosis (a quick search of the leaked emails for “Mann” returned 489 matches).

These reports are important in part because of their impact on policy, as CBS news reports, “In global warming circles, the CRU wields outsize influence: it claims the world’s largest temperature data set, and its work and mathematical models were incorporated into the United Nations Intergovernmental Panel on Climate Change’s 2007 report. That report, in turn, is what the Environmental Protection Agency acknowledged it “relies on most heavily” when concluding that carbon dioxide emissions endanger public health and should be regulated.”

Discussions of the Appropriate Level of Code and Data Disclosure Before and After the CRU Leak

For years researchers had requested the data and programs used to produce Mann’s hockey stick result, and were rebuffed. The repeated requests for code and data culminated in Freedom of Information (FOI) requests, in particular those made by Willis Eschenbach, who tells the story of the requests he made for underlying code and data up until the time of the leak. It appears that a file was placed on CRU’s FTP server, and comments alerting people to its existence were then posted on several key blogs.

The thinking regarding disclosure of code and data in one part of the climate change community is illustrated in a fascinating discussion on the RealClimate blog in February. (Thank you to Michael Nielsen for the pointer.) The blog has five primary authors, one of whom is Michael Mann, and its primary author is Gavin Schmidt, who was described earlier this year as a “computer jockey for Nasa’s James Hansen, the world’s loudest climate alarmist.” In a RealClimate blog post from November 27, Where’s the Data, the position now seems very much in favor of data release, but the first comment asks for the steps taken in reconstructing the results as well. This is right – reproducibility of results should be the concern, but it does not yet appear to be taken seriously (as also argued here).

Policy and Public Relations

The Hill‘s Blog Briefing Room reported that Senator Inhofe (R-Okla.) will investigate whether the IPCC “cooked the science to make this thing look as if the science was settled, when all the time of course we knew it was not.” With the current emphasis on evidence-based policy making, Inhofe’s review should recommend code and data release and require reliance on verified scientific results in policy making. The Federal Research Public Access Act should be modified to include reproducibility in publicly funded research.

A dangerous ramification of the leak could be an undermining of public confidence in science and the conduct of scientists. My sense is that had this climate modeling community made its code and data readily available in a way that facilitated reproducibility of results, not only would it have avoided this embarrassment but the discourse would have been about scientific methods and results rather than about potential evasions of FOIA requests, whether data were fudged, or whether scientists acted improperly in squelching dissent or manipulating journal editorial boards. Perhaps data release is becoming an accepted norm, but code release for reproducibility must follow. The issue here is verification and reproducibility, without which it is all but impossible to tell whether the core science done at CRU was correct or not, even for peer-reviewing scientists.

My Interview with ITConversations on Reproducible Research

On September 30, I was interviewed by Jon Udell of ITConversations in his Interviews with Innovators series, on Reproducibility of Computational Science.

Here’s the blurb: “If you’re a writer, a musician, or an artist, you can use Creative Commons licenses to share your digital works. But how can scientists license their work for sharing? In this conversation, Victoria Stodden — a fellow with Science Commons — explains to host Jon Udell why scientific output is different and how Science Commons aims to help scientists share it freely.”

Optimal Information Disclosure Levels and "Taleb's Criticism"

I was listening to the audio recording of last Friday’s “Scientific Data for Evidence Based Policy and Decision Making” symposium at the National Academies, and was struck by the earnest effort on the part of members of the White House to release governmental data to the public. Beth Noveck, Obama’s Deputy Chief Technology Officer for Open Government, frames the effort with a slogan: “Transparency, Participation, and Collaboration.” A plan is being developed by the White House in collaboration with the OMB to implement these three principles via a “massive release of data in open, downloadable, accessible, machine readable formats, across all agencies, not only in the White House,” says Beth. “At the heart of this commitment to transparency is a commitment to open data and open information.”

Vivek Kundra, Chief Information Officer in the White House’s Open Government Initiative, was even more explicit, saying that “the dream here is that you have a grad student, sifting through these datasets at 3 in the morning, who finds, at the intersection of multiple datasets, insight that we may not have seen, or developed a solution that we may not have thought of.”

This is an extraordinary vision. This discussion comes hot on the heels of a debate in Congress regarding the level of information members are willing to release to the public in advance of voting on a bill. Last Wednesday CBS reported, with regard to the health care bill, that “[t]he Senate Finance Committee considered for two hours today a Republican amendment — which was ultimately rejected — that would have required the “legislative” language of the committee’s final bill, along with a cost estimate for the bill, to be posted online for 72 hours before the committee voted on it. Instead, the committee passed a similar amendment, offered by Committee Chair Max Baucus (D-Mont.), to put online the “conceptual” or “plain” language of the bill, along with the cost estimate.” What is remarkable is the sense this gives that somehow the public won’t understand the raw text of the bill (I noticed no compromise position offered that would make both versions available, which seems an obvious solution).

The White House’s efforts have the potential to test this hypothesis: if given more information, will people pull things out of context and promulgate misinformation? The White House is betting that they won’t, and Kundra states that the White House is accompanying dataset release with efforts to provide contextual metadata for each dataset while safeguarding national security and individual privacy rights.

This sense of limits in openness isn’t unique to governmental issues, and in my research on data and code sharing among scientists I’ve termed the concern “Taleb’s Criticism.” In a 2008 essay on The Edge website, Taleb worries about the dangers that can result from people using statistical methodology without a clear understanding of the techniques. An example of concern about Taleb’s Criticism appeared on UCSF’s EVA website, a repository of programs for automatic protein structure prediction. The UCSF researchers won’t release their code publicly because, as stated on their website, “We are seriously concerned about the ‘negative’ aspect of the freedom of the Web being that any newcomer can spend a day and hack out a program that predicts 3D structure, put it on the web, and it will be used.” As the congressmen seemed to fear, for these folks openness is scary because people may misuse the information.

It could be argued, and for scientific research should be argued, that an open dialog on an idea’s merits is preferable to no dialog at all, and that misinformation can be countered and exposed. Justice Brandeis famously elucidated this point in Whitney v. California (1927), writing that “If there be time to expose through discussion the falsehood and fallacies, to avert the evil by the processes of education, the remedy to be applied is more speech, not enforced silence.” The project is an experiment in context and may bolster trust in the public release of complex information. Speaking of the project, Noveck explained that “the notion of making complex information more accessible to people and to make greater sense of that complex information was really at the heart.” This is a very bold move and it will be fascinating to see the outcome.

Crossposted on Yale Law School’s Information Society Project blog.

Sunstein speaks on extremism

Cass Sunstein, Professor at Harvard Law School, is speaking today on Extremism: Politics and Law. Related to this topic, he is the author of Nudge, Republic 2.0, and Infotopia. He discussed Republic 2.0 with Henry Farrell on this diavlog, which touches on the theme of extremism in discourse and the web’s role in facilitating the polarization of political views (notably, Farrell gives a good counterfactual to Sunstein’s claims, and Sunstein ends up agreeing with him).

Sunstein is in the midst of writing a new book on extremism, and this talk is a teaser. He gives us a quote from Churchill: “Fanatics are people who can’t change their minds and will not change the subject.” Political scientist Hardin says he agrees with the first clause epistemologically, but that the second clause is wrong because they *cannot* change the subject. Sunstein says extremism in multiple domains (the White House, company boards, unions) results from group polarization.

He thinks the concept of group polarization should replace the notion of groupthink in all fields. Group polarization involves both information exchange and reputation. His thesis is that like-minded people talking with other like-minded people tend to move to more extreme positions upon discussion – partly because of the new information and partly because of the pressure from peer viewpoints.

His empirical work on this began with his Colorado study. He and his coauthors recorded the private views on three issues (climate change, same-sex marriage, and race-conscious affirmative action) for citizens in Boulder and for citizens in Colorado Springs. Boulder is liberal, so they screened people to ensure liberalness: if they liked Cheney they were excused from the test. They asked the same Cheney question in Colorado Springs and excused those who didn’t like him. Then he interviewed them to determine their private views after deliberation, as well as after they had come to a group consensus.

Benkler: We are collaborators, not knaves

Yochai Benkler gave a talk today marking his appointment as the Jack N. and Lillian R. Berkman Professor of Entrepreneurial Legal Studies at Harvard Law School. Jack Berkman (now deceased) is the father of Myles Berkman, whose family endowed both the Berkman Center (where I am a fellow) and Benkler’s professorial chair.

His talk was titled “After Selfishness: Wikipedia 1, Hobbes 0 at Half Time,” and he sets out to show that a sea change is happening in the study of organizational systems, one that far better reflects how we actually interact, organize, and operate. He explains that the collaborative movements we generally characterize as belonging to the new internet age (free and open source software, Wikipedia) are really just instantiations of a wider and pervasive, in fact completely natural and longstanding, phenomenon in human life.

This is due to how we can organize capital in the information and networked society: we own the core physical means of production as well as knowledge, insight, and creativity. Now we’re seeing longstanding social practices, such as non-hierarchical norm generation and collaboration, move from the periphery of society to the center of our productive enterprises. Benkler’s key point in this talk is that this shift is not limited to Internet-based environments but is part of a broader change happening across society.

So how do we get people to produce good stuff? Money? Prizes? Competition? Benkler notes an example in which contributors are not paid yet the community thrives. Benkler hypothesizes that the key is that people feel secure in their involvement with the community: not paying, but creating a context where people feel secure in their collaboration in a system. In another example, Benkler attributes success to the ways the community has found to assure contributors that when they produce something they will be able to control what they produce. Cash doesn’t change hands. The challenge is to learn about human collaboration in general from these web-based examples – in Benkler’s words, “replacing Leviathan with a collaborative system.”

Examples outside the web-based world include GM’s experience with its Fremont plant. This plant was among the worst performing in the company. GM shut it down for two years and brought it back 85% staffed by the previous workforce, with the same union, but reorganized collaboratively to align incentives. This means there are no longer process engineers on the shop floor, and direct control over experimentation and flow now sits at the team level. The plant did so well it forced the Big Three to copy it, although they did so in less purely collaborative ways, such as retaining competitive bidding. Benkler’s point is that there is a need for long-term relationships based on trust. An emphasis on norms and trust, along with greater teamwork and autonomy for workers, implies a more complex system with less perfect control than Hobbes’ Leviathan vision. The world changes too quickly for the old encumbered hierarchical model of economic production.

Benkler thinks this leads us to study social dynamics, an open field without many answers yet. He also relates this work to evolutionary biology: from the group selection theory of the 1950s, to the individualistic conception in Dawkins’ theory of the selfish gene in the 1970s, and now to multi-level selection and cooperation as a distinct driving force in evolution rather than the other way around. This opens a vein of research into empirical deviations from selfishness, a pillar of homo economicus, just as Kahneman and Tversky challenged its twin pillar of rationality.

Benkler’s vision is to move away from the simple rigid hierarchical models toward ones that are richer and more complex and can capture more of our actual behavior, while still being tractable enough to produce predictions and a larger understanding of our world.

Justice Scalia: Populist

Justice Scalia (HLS 1960) is speaking at the inaugural Herbert W. Vaughan Lecture today at Harvard Law School. It’s packed – I arrived at 4pm for the 4:30 talk and joined the end of a long line…. then was immediately told the auditorium was full and was relegated to an overflow room with video. I’m lucky to have been early enough to even see it live.

The topic of the talk hasn’t been announced and we’re all waiting with palpable anticipation in the air. The din is deafening.

Scalia takes the podium. The title of his talk is “Methodology of Originalism.”

His subject is the intersection of constitutional law and history. He notes that the orthodox view of constitutional interpretation, up to the time of the Warren Court, was that the constitution is no different from any other legal text. That is, it bears a static meaning that doesn’t change from generation to generation, although it gets applied to new situations. The application to pre-existing phenomena doesn’t change over time, but these applications do provide the data upon which to decide the cases on the new phenomena.

Things changed when the Warren Court held in New York Times Co. v. Sullivan, 376 U.S. 254 (1964), that good-faith libel of public figures was good for democracy. Scalia says this might be so, but that such a change should be made by statute and not by the court. He argues this is respectful of the democratic system in that the laws are reflections of people’s votes. This is the first, and perhaps the best known, of two ways Scalia comes across as populist in this talk. In a question at the end he says that the whole theory of democracy is that a justice is not supposed to be writing a constitution but just reflecting what the American people have decided. If you believe in democracy, he explains, you believe in majority rule. In liberal democracies like ours we have made exceptions and given protection to certain minorities, such as religious or political minorities. But his key point is that the people made these exceptions, i.e., they were adopted in a democratic fashion.

But doesn’t originalism require you to know the original meaning of a document? And isn’t history a science unto itself, different from law? Scalia responds first that history is central to the law, at the very least because the meanings of words change over time. So inquiry into the past certainly has to do with the law, and vice versa. He notes that the only way to assign meaning to many of the phrases in the constitution is through historical understanding: for example “letters of marque and reprisal,” “habeas corpus,” etc. Secondly, he gives a deeply non-elitist argument about the quality of expert vs. nonexpert reasoning. This is the second way Scalia expresses a populist sentiment.

In District of Columbia v. Heller, 554 U.S. ___ (2008), the petitioners contended that the term “bear arms” carried only a military meaning, although there are previous cases that show this isn’t true. But this case was about more than the historical usage of words: the 2nd Amendment doesn’t say “the people shall have the right to keep and bear arms,” for example, but that “the right of the people to keep and bear arms shall not be infringed” – as if this were a pre-existing right. So Scalia argues there was a place for historical inquiry that showed there was such a pre-existing right: in the English Bill of Rights of 1689 (found by Blackstone). After that, it’s hard to see the 2nd Amendment as merely the right to join a militia, the reading suggested by its prologue about a well regulated militia. This goes much further than just lexicography.

So what can be expected of judges? Scalia argues, like Churchill’s argument for democracy, that all an originalist need show is that originalism beats the alternatives. He says this isn’t hard to do, since inquiry into original meaning is not as difficult as opponents suggest. One place to look when the framers’ intent is not clear is states’ older interpretations. And in the vast majority of cases, including the most controversial ones, the originalist interpretation is clear. His examples of clear original intent are abortion, a right to engage in homosexual sodomy, assisted suicide, and prohibition of the death penalty (the death penalty was historically the only penalty for a felony) – these rights are not found in the constitution. Determining whether there should be (and hence is, for a non-originalist judge) a right to abortion or same-sex marriage or whatnot requires moral philosophy, which Scalia says is harder than historical inquiry.

As further evidence of the symbiotic relationship between law and history, he notes that history departments have legal historians and law schools have historical experts.

Scalia gives the case of Thompson v. Oklahoma, 487 U.S. 815 (1988), as an example of a situation in which historical reasoning played little part, and he uses this as a baseline to argue that the role of historical reasoning in Supreme Court opinions is increasing. The briefs in Thompson were of no help with historical questions since they did not touch on the history of the 8th Amendment, but Scalia says this isn’t surprising since the history of the clause had been written out of the argument by previous thinking. Another case, Morrison v. Olson, 487 U.S. 654 (1988), considered a challenge to the statute creating the independent counsel. Scalia thinks these questions could have benefited from historical clarification, and the briefing in Morrison did focus on historical questions such as what the term “inferior officers” meant at the time of the founding. Two briefs authored by HLS faculty (Cox, Fried) provided useful historical material, but the historical referencing was sparse and none of these briefs were written by scholars of legal history.

In contrast, in Heller the parties’ briefs again provided little historical context, but this time many amicus briefs focused on historical arguments and material. This is a very different situation from that of 20 years ago. There were several briefs from legal historians, each containing detailed discussions of the historical right to bear arms in England and here at the time of the founding. Such material was the heart of these briefs, not relegated to a footnote as it likely would have been 20 years ago, and was in Morrison. Scalia thinks this reinforces the case for the originalist approach by showing how easy it is compared to other approaches.

Scalia is wary of amicus briefs in general, especially insofar as they repeat the arguments made by the parties, because their pretense of scholarly impartiality may convince judges to sign on to briefs that are anything but impartial. “Disinterested scholarship and advocacy do not mix well.”

Scalia takes on a second argument made against the use of history in the courts – that the history used is “law office history,” that is, the selection of data favorable to the position being advanced without regard for contradictory data or relevance. Here the charge is not incompetence but tendentiousness: advocates cannot be trusted to present an unbiased view. But of course! says Scalia, since they are advocates. Insofar as the criticism is directed at the court, however, it is essential that the adjudicator be impartial. “Of course a judicial opinion can give a distorted picture of historical truth, but this would be an inadequate historical opinion and not that which is expected” from the Court. Scalia admonishes that one must review the historical evidence in detail rather than raise the “know nothing” cry.

This is Scalia’s second populist argument: it is deeply non-elitist, since it implies that nonprofessional historians are capable of coming up with good historical understanding. It dovetails with the notion of opening knowledge and with respect for the autonomy of individuals to evaluate reasoning and data and come to their own conclusions (and even be right sometimes). Scalia notes that he sees the role of the Court as drawing conclusions from these facts, which is different from the role of the historian.

But he feels quite differently about the conclusions of experts in other fields. For example, in overruling Dr. Miles Medical Co. v. John D. Park and Sons, 220 U.S. 373 (1911), and holding that resale price maintenance isn’t a per se violation of the Sherman Act, he didn’t feel uncomfortable, since this is the almost uniform view of professional economists. Scalia seems to be saying that experts are probably right more often than nonexperts, but that nonexperts can also contribute. He frames this as a matter of expertise in judicial analysis – and he says there is a difference between historical analysis and, say, the type of engineering analysis that might be required for patent cases. He makes a distinction between subjects that are more and less susceptible to successful nonexpert analysis.

Scalia then advocates submitting analysis to public scrutiny with the data open, allowing suspect conclusions to be challenged. The originalist will reach substantive results he doesn’t personally favor, and the reasoning process should be open. Scalia notes that this is more honest than judges who reason morally, who will never disagree with their own opinions.

There was a question that got the audience laughing at the end. The questioner claims to have approached a Raytheon manufacturing facility to buy a missile or tank, since in his view the 2nd Amendment is about keeping the government scared of the people, and somehow having a gun when the government has more advanced weaponry misses the point. Scalia thinks this is outside the scope of the 2nd Amendment because “You can’t bear a tank!”