Archive for the 'Open Science' Category

Software and Intellectual Lock-in in Science

In a recent discussion with a friend, a hypothesis occurred to me: that increased levels of computation in scientific research could cause greater intellectual lock-in to particular ideas.

Examining how ideas change in scientific thinking isn’t new. Thomas Kuhn, for example, caused a revolution himself in how scientific progress is understood with his 1962 book The Structure of Scientific Revolutions. The notion of technological lock-in isn’t new either: see, for example, Paul David’s examination of how we ended up with the non-optimal QWERTY keyboard (“Clio and the Economics of QWERTY,” AER, 75(2), 1985) or Brian Arthur’s “Competing Technologies and Lock-in by Historical Events: The Dynamics of Allocation Under Increasing Returns” (Economic Journal, 99, 1989).

Computer-based methods are relatively new to scientific research, and are reaching even the most seemingly uncomputational edges of the humanities, like English literature and archaeology. Did Shakespeare really write all the plays attributed to him? Let’s see if word distributions by play are significantly different. Or can we use signal processing to “see” artifacts without unearthing them, thereby preserving artifact features?
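To make the word-distribution question concrete, here is a minimal sketch in Python of one way such a comparison could be set up: count a handful of common words in two texts and compute a Pearson chi-square statistic on the resulting table. The word list, function names, and the choice of statistic are my own illustrative assumptions, not the method of any particular authorship study.

```python
from collections import Counter

# Hypothetical marker words chosen for illustration only;
# real stylometric work selects its vocabulary with far more care.
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]

def word_counts(text, vocabulary):
    """Count how often each vocabulary word appears in a text (case-insensitive)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of word counts."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each text must contain at least one counted word")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # a word absent from both texts contributes nothing
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat
```

A large statistic suggests the two texts use these words at different rates. Whether that difference says anything about authorship is a separate question that the code cannot answer.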

Software has the property of encapsulating ideas and methods for scientific problem solving. Software also has a second property, brittleness: it breaks before it bends. Computing hardware has grown steadily in capability, speed, reliability, and capacity, but as Jaron Lanier describes in his essay on The Edge, trends in software are “a macabre parody of Moore’s Law” and the “moment programs grow beyond smallness, their brittleness becomes the most prominent feature, and software engineering becomes Sisyphean.” My concern is that as ideas become increasingly manifest as code, with all the scientific advancement that can imply, it becomes more difficult to adapt, modify, and change the underlying scientific approaches. We become, as scientists, more locked into particular methods for solving scientific questions and particular ways of thinking.

For example, what happens when an approach to solving a problem is encoded in software and becomes a standard tool? Many such tools exist, and are vital to research – just look at the list at Andrej Sali’s highly regarded lab at UCSF, or the statistical packages in the widely used language R, for example. David Donoho laments the now widespread use of test cases he released online to illustrate his methods for particular types of data, “I have seen numerous papers and conference presentations referring to “Blocks,” “Bumps,” “HeaviSine,” and “Doppler” as standards of a sort (this is a practice I object to but am powerless to stop; I wish people would develop new test cases which are more appropriate to illustrate the methodology they are developing).” Code and ideas should be reused and built upon, but at what point does the cost of recoding outweigh the scientific cost of not improving the method? In fact, perhaps counterintuitively, it’s hardware that is routinely upgraded and replaced, not the seemingly ephemeral software.

In his essay Lanier argues that the brittle state of software today results from metaphors used by the first computer scientists – electronic communications devices that sent signals on a wire. It’s an example of intellectual lock-in itself that’s become hardened in how we encode ideas as machine instructions now.

My Interview with ITConversations on Reproducible Research

On September 30, I was interviewed by Jon Udell from ITConversations.org in his Interviews with Innovators series, on Reproducibility of Computational Science.

Here’s the blurb: “If you’re a writer, a musician, or an artist, you can use Creative Commons licenses to share your digital works. But how can scientists license their work for sharing? In this conversation, Victoria Stodden — a fellow with Science Commons — explains to host Jon Udell why scientific output is different and how Science Commons aims to help scientists share it freely.”

Optimal Information Disclosure Levels: Data.gov and "Taleb's Criticism"

I was listening to the audio recording of last Friday’s “Scientific Data for Evidence Based Policy and Decision Making” symposium at the National Academies, and was struck by the earnest effort on the part of members of the White House to release governmental data to the public. Beth Noveck, Obama’s Deputy Chief Technology Officer for Open Government, frames the effort with a slogan, “Transparency, Participation, and Collaboration.” A plan is being developed by the White House in collaboration with the OMB to implement these three principles via a “massive release of data in open, downloadable, accessible for machine readable formats, across all agencies, not only in the White House,” says Noveck. “At the heart of this commitment to transparency is a commitment to open data and open information.”

Vivek Kundra, Chief Information Officer in the White House’s Open Government Initiative, was even more explicit – saying that “the dream here is that you have a grad student, sifting through these datasets at 3 in the morning, who finds, at the intersection of multiple datasets, insight that we may not have seen, or developed a solution that we may not have thought of.”

This is an extraordinary vision. This discussion comes hot on the heels of a debate in Congress regarding the level of information they are willing to release to the public in advance of voting on a bill. Last Wednesday CBS reports, with regard to the health care bill, that “[t]he Senate Finance Committee considered for two hours today a Republican amendment — which was ultimately rejected — that would have required the “legislative” language of the committee’s final bill, along with a cost estimate for the bill, to be posted online for 72 hours before the committee voted on it. Instead, the committee passed a similar amendment, offered by Committee Chair Max Baucus (D-Mont.), to put online the “conceptual” or “plain” language of the bill, along with the cost estimate.” What is remarkable is the sense this gives that somehow the public won’t understand the raw text of the bill (I noticed no compromise position offered that would make both versions available, which seems an obvious solution).

The White House’s efforts have the potential to test this hypothesis: if given more information, will people pull things out of context and promulgate misinformation? The White House is betting that they won’t, and Kundra does state that the White House is accompanying dataset release with efforts to provide contextual metadata for each dataset while safeguarding national security and individual privacy rights.

This sense of limits in openness isn’t unique to governmental issues, and in my research on data and code sharing among scientists I’ve termed the concern “Taleb’s criticism.” In a 2008 essay on The Edge website, Taleb worries about the dangers that can result from people using statistical methodology without having a clear understanding of the techniques. An example of concern about Taleb’s criticism appeared on UCSF’s EVA website, a repository of programs for automatic protein structure prediction. The UCSF researchers won’t release their code publicly because, as stated on their website, “We are seriously concerned about the ‘negative’ aspect of the freedom of the Web being that any newcomer can spend a day and hack out a program that predicts 3D structure, put it on the web, and it will be used.” As the congressmen seemed to fear, for these folks openness is scary because people may misuse the information.

It could be argued, and for scientific research should be argued, that an open dialog of an idea’s merits is preferable to no dialog at all, and misinformation can be countered and exposed. Justice Brandeis famously elucidated this point in Whitney v. California (1927), writing that “If there be time to expose through discussion the falsehood and fallacies, to avert the evil by the processes of education, the remedy to be applied is more speech, not enforced silence.” Data.gov is an experiment in context and may bolster trust in the public release of complex information. Speaking of the Data.gov project, Noveck explained that “the notion of making complex information more accessible to people and to make greater sense of that complex information was really at the heart.” This is a very bold move and it will be fascinating to see the outcome.

Crossposted on Yale Law School’s Information Society Project blog.

What's New at Science Foo Camp 2009

SciFoo is a wonderful annual gathering of thinkers about science. It’s an unconference and people who choose to speak do so. Here’s my reaction to a couple of these talks.

In Pete Worden’s discussion of modeling future climate change, I wondered about the reliability of simulation results. Worden conceded that there are several models doing the same predictions he showed, and they can give wildly opposing results. We need to develop the machinery to quantify error in simulation models just as we routinely do for conventional statistical modeling: simulation is often the only empirical tool we have for guiding policy responses to some of our most pressing issues.
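As a rough illustration of what quantifying disagreement between simulation models might look like at its simplest, the Python sketch below summarizes a set of projections by their mean, sample standard deviation, and range. The function name is hypothetical and the inputs are placeholders rather than real climate projections. Between-model spread is also only one ingredient of simulation error, not a full uncertainty analysis.

```python
import statistics

def ensemble_summary(projections):
    """Summarize the spread across several models' projections of one quantity.

    `projections` is a list of numbers, one per model. This captures only
    between-model disagreement, not errors that all the models share.
    """
    if len(projections) < 2:
        raise ValueError("need at least two model projections to measure spread")
    return {
        "mean": statistics.mean(projections),
        "stdev": statistics.stdev(projections),  # sample standard deviation
        "range": (min(projections), max(projections)),
    }
```

A wide range relative to the mean is a signal that the models disagree enough for a single headline number to be misleading.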

But the newest idea I saw was Bob Metcalfe’s call for us to imagine what to do with the coming overabundance of energy. Metcalfe likened solving energy scarcity to the early days of Internet development: because of the generative design of Internet technology, we now have things that were unimagined in the early discussions, such as YouTube and online video. According to Metcalfe, we need to envision our future as including a “squanderable abundance” of energy, and use Internet lessons such as standardization and distribution of power sources to get there, rather than building for energy conservation.

Cross posted on The Edge.

Bill Gates to Development Researchers: Create and Share Statistics

I was recently in Doha, Qatar, presenting my research on global communication technology use and democratic tendency at ICTD09. I spoke right before the keynote, Bill Gates, whose main point was that when you engage in a goal-oriented activity, such as development, progress can only be made when you measure the impact of your efforts.

Gates paints a positive picture, measured by deaths before age 5. In the 1880s, he says, about 30% of children died before their 5th birthday in most countries, and this gradually moved to 20 million deaths in 1960 and then 10 million in 2006. Gates postulates this is due to rising income levels (40% of the decrease) and medical innovation such as vaccines (60% of the decrease).
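The arithmetic behind that split can be written out in a few lines of Python. The figures are the ones quoted from the talk; the variable names, and my reading of the percentages as shares of the 1960 to 2006 drop, are my own assumptions.

```python
# Figures as quoted from the talk: annual deaths before age 5.
deaths_1960 = 20_000_000
deaths_2006 = 10_000_000

# Total decrease in annual under-5 deaths between the two years.
decrease = deaths_1960 - deaths_2006

# Shares of the decrease, assuming the 40% / 60% attribution applies to this drop.
from_income = decrease * 40 // 100       # rising income levels
from_medicine = decrease * 60 // 100     # medical innovation such as vaccines

print(decrease, from_income, from_medicine)
```

On this reading, roughly 4 million of the 10 million fewer annual deaths would be credited to income growth and 6 million to medical innovation.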

This is an example of Gates’ mantra: you can only improve what you can measure. For example, an outbreak of measles tells you your vaccine system isn’t functioning. In his example about childhood deaths, he says we are getting somewhere here because we are measuring the value for money spent on the problem.

Gates thinks the wealthy in the world need to be exposed to these problems ideally through intermingling, or since that is unlikely to happen, through statistics and data visualization. Collect data, then communicate it. In short, Gates advocates creating statistics through measuring development efforts, and changing the world by exposing people to these data.

Wolfram|Alpha Demoed at Harvard: Limits on Human Understanding?

Yesterday Stephen Wolfram gave the first demo of Wolfram|Alpha, coming in May, which he modestly describes as a system to make our stock of human knowledge computable. It includes not just facts, but also our algorithmic knowledge. He says, “Given all the methods, models, and equations that have been created from science and analysis – take all that stuff and package it so that we can walk up to a website and ask it a question and have it generate the knowledge that we want. … like interacting with an expert.”

It’s ambitious, but so are Wolfram’s previous projects: Mathematica and MathWorld. I remember relying on MathWorld as a grad student – it was excellent, and so I remember when it suddenly disappeared when the content was to be published as a book. In 2002 he published A New Kind of Science, arguing that all processes, including thought, can be viewed as computations and a simple set of rules can describe a complex system. This thinking is clearly evident in Wolfram|Alpha and here are some key examples.

Stuart Shieber and the Future of Open Access Publishing

Back in February Harvard adopted a mandate requiring its faculty members to make their research papers available within a year of publication. Stuart Shieber is a computer science professor at Harvard and responsible for proposing the policy. He has since been named director of Harvard’s new Office for Scholarly Communication.

On November 12 Shieber gave a talk entitled “The Future of Open Access — and How to Stop It” to give an update on where things stand after the adoption of the open access mandate. Open access isn’t just something that makes sense from an ethical standpoint, as Shieber points out that (for-profit) journal subscription costs have risen out of proportion with inflation costs and out of proportion with the costs of nonprofit journals. He notes that the cost per published page in a commercial journal is six times that of the nonprofits. With the current library budget cuts, open access — meaning both access to articles directly on the web and shifting subscriptions away from for-profit journals — is something that appears financially unavoidable.

Here’s the business model for an Open Access (OA) journal: authors pay a fee upfront in order for their paper to be published. Then the issue of the journal appears on the web (possibly also in print) without an access fee. Conversely, traditional for-profit publishing doesn’t charge the author to publish, but keeps the journal closed and charges subscription fees for access.

Shieber recaps Harvard’s policy:

1. The faculty member grants permission to the University to make the article available through an OA repository.

2. There is a waiver for articles: a faculty member can opt out of the OA mandate at his or her sole discretion. For example, if you have a prior agreement with a publisher you can abide by it.

3. The author themselves deposits the article in the repository.

Shieber notes that the policy is also valuable because it allows Harvard to make a collective statement of principle, to systematically provide metadata about articles, to clarify the rights accruing to the article, to facilitate the article deposit process, and to negotiate collectively. In addition, having the mandate be opt out rather than opt in might increase rights retention at the author level.

So the concern Shieber set up in his talk is whether standards for research quality and peer review will be weakened. Here’s how the dystopian argument runs:

1. all universities enact OA policies
2. all articles become OA
3. libraries cancel subscriptions
4. prices go up on remaining journals
5. these remaining journals can’t recoup their costs
6. publishers can’t adapt their business model
7. so the journals and the logistics of peer review they provide, disappear

Shieber counters this argument: 1 through 5 are good because journals will start to feel some competitive pressure. What would be bad is if publishers cannot change their way of doing business. Shieber thinks that even if this is so it will have the effect of pushing us towards OA journals, which provide the same services, including peer review, as the traditional commercial journals.

But does the process of getting there cause a race to the bottom? The argument goes like this: since OA journals are paid by the number of articles published they will just publish everything, thereby destroying standards. Shieber argues this won’t happen because there is price discrimination among journals – authors will pay more to publish in the more prestigious journals. For example, PLOS costs about $3k, Biomed Central about $1000, and Scientific Publishers International is $96 for an article. Shieber also makes an argument that Harvard should have a fund to support faculty who wish to publish in an OA journal and have no other way to pay the fee.

This seems to imply that researchers with sufficient grant funding or falling under his proposed Harvard publication fee subsidy, would then be immune to the fee pressure and simply submit to the most prestigious journal and work their way down the chain until their paper is accepted. This also means that editors/reviewers decide what constitutes the best scientific articles by determining acceptance.

But is democratic representation in science a goal of OA? Missing from Shieber’s described market for scientific publications is any kind of feedback from the readers. The content of these journals, and the determination of prestige, is defined solely by the editors and reviewers. Maybe this is a good thing. But maybe there’s an opportunity to open this by allowing readers a voice in the market. This could be done through ads or a very tiny fee on articles – both would give OA publishers an incentive to respond to the preferences of the readers. Perhaps OA journals should be commercial in the sense of profit-maximizing: they might have a reason to listen to readers and might be more effective at maximizing their prestige level.

This vision of OA publishing still effectively excludes researchers who are unable to secure grants or are not affiliated with a university that offers a publication subsidy. The dream behind OA publishing is that everyone can read the articles, but to fully engage in the intellectual debate, quality research must still find its way into print, and at the appropriate level of prestige, regardless of the affiliation of the researcher. This is the other side of OA that is very important for researchers from the developing world or thinkers whose research is not mainstream (see, for example, Garrett Lisi, a high impact researcher who is unaffiliated with an institution).

The OA publishing model Shieber describes is a clear step forward from the current model where journals are only accessible by affiliates of universities who have paid the subscription fees. It might be worth continuing to move toward an OA system where, not only can anyone access publications, but any quality research is capable of being published, regardless of the author’s affiliation and wealth. To get around the financial constraints one approach might be to allow journals to fund themselves through ads, or provide subsidies to certain researchers. This also opens up the idea of who decides what is quality research.

Justice Scalia: Populist

Justice Scalia (HLS 1960) is speaking at the inaugural Herbert W. Vaughan Lecture today at Harvard Law School. It’s packed – I arrived at 4pm for the 4:30 talk and joined the end of a long line…. then was immediately told the auditorium was full and was relegated to an overflow room with video. I’m lucky to have been early enough to even see it live.

The topic of the talk hasn’t been announced and we’re all waiting with palpable anticipation in the air. The din is deafening.

Scalia takes the podium. The title of his talk is “Methodology of Originalism.”

His subject is the intersection of constitutional law and history. He notes that the orthodox view of constitutional interpretation, up to the time of the Warren Court, was that the constitution is no different from any other legal text. That is, it bears a static meaning that doesn’t change from generation to generation, although it gets applied to new situations. The application to pre-existing phenomena doesn’t change over time, but these applications do provide the data upon which to decide the cases on the new phenomena.

Things changed when the Warren Court held in New York Times Co. v. Sullivan, 376 U.S. 254 (1964), that good faith libel of public figures was good for democracy. Scalia says this might be so but that change should be made by statute and not by the court. He argues this is respectful of the democratic system in that the laws are reflections of people’s votes. This is the first, and perhaps the best known, of two ways Scalia comes across as populist in this talk. In a question at the end he says that the whole theory of democracy is that a justice is not supposed to be writing a constitution but just reflecting what the American people have decided. If you believe in democracy, he explains, you believe in majority rule. In liberal democracies like ours we have made exceptions and given protection to certain minorities such as religious or political minorities. But his key point is that the people made these exceptions, i.e., they were adopted in a democratic fashion.

But doesn’t originalism require you to know the original meaning of a document? And isn’t history a science unto itself, and different from law? Scalia responds to this argument by saying first that history is central to the law, at the very least through the fact that the meanings of words change over time. So inquiry into the past certainly has to do with the law and vice versa. He notes that the only way to assign meaning to many of the phrases in the constitution is through historical understanding: for example “letters of marque and reprisal” and “habeas corpus” etc. Secondly, he gives a deeply non-elitist argument about the quality of expert vs nonexpert reasoning. This is the second way Scalia expresses a populist sentiment.

In District of Columbia v. Heller, 554 U.S. ___ (2008), the petitioners contended that the term “bear arms” only meant a military interpretation, although there are previous cases that show this isn’t true. But this case was about more than the historical usage of words: the 2nd Amendment didn’t say “the people shall have the right to keep and bear arms,” for example, but that “the right of the people to keep and bear arms shall not be infringed” – as if this was a pre-existing right. So Scalia argues that there was a place for historical inquiry here that showed there was such a pre-existing right: in the English Bill of Rights of 1689 (found by Blackstone). So now it’s hard to see the 2nd Amendment as more than the right to join a militia. Which goes with the prologue of the 2nd Amendment: the right of a well regulated militia to keep arms. This goes much further than just lexicography.

So what can be expected of judges? Scalia argues, like Churchill’s argument for democracy, that all an originalist need show is that originalism beats the alternatives. He says this isn’t hard to do since inquiry into original meaning is not as difficult as opponents suggest. He says one place to look when the framers’ intent is not clear is states’ older interpretations. And in the vast majority of cases, including the most controversial ones, the originalist interpretation is clear. His examples of cases with clear original intent are abortion, a right to engage in homosexual sodomy, assisted suicide, and prohibition of the death penalty (the death penalty was historically the only penalty for a felony) – these rights are not found in the constitution. Determining whether there should be (and hence is, for a non-originalist judge) a right to abortion or same sex marriage or whatnot requires moral philosophy, which Scalia says is harder than historical inquiry.

He also uses as evidence for the symbiotic relationship between law and history that history departments have legal historical scholars and law schools have historical experts.

Scalia gives the case of Thompson v. Oklahoma, 487 U.S. 815 (1988), as an example of a situation in which historical reasoning played little part, and he uses this as a baseline to argue that the role of historical reasoning in Supreme Court opinions is increasing. The briefs in Thompson were of no help with historical questions since they did not touch on the history of the 8th Amendment, but Scalia says this isn’t surprising since the history of the clause had been written out of the argument by previous thinking. Another case, Morrison v. Olson, 487 U.S. 654 (1988), considered a challenge to the statute creating the independent counsel. Scalia thinks these questions could benefit little from historical clarification, so the briefing in Morrison focused on historical questions such as what did the term “inferior officers” mean at the time of the founding. Two briefs authored by HLS faculty (Cox, Fried) provided useful historical material, but the historical referencing was sparse and none of these briefs were written by scholars of legal history.

In contrast, in 2007 in Heller there was again little historical context but in this case many amicus briefs focused on historical arguments and material. This is a very different situation to that of 20 years ago. There were several briefs from legal historical experts and each contained detailed discussions of the historical right to bear arms in England and here at the time of the founding. Such foci were the heart of the brief, and not relegated to a footnote as it likely would have been 20 years ago, and was in Morrison. Scalia thinks this reinforces the use of the originalist approach, by showing how easy it is compared to other approaches.

Scalia eschews amicus briefs in general, especially insofar as they repeat the arguments made by the parties, because of their pretense to scholarly impartiality, which may convince judges to sign on to briefs that are anything but impartial. “Disinterested scholarship and advocacy do not mix well.”

Scalia takes on a second argument made against the use of history in the courts – that the history used is “law office history.” That is, the selection of data favorable to the position being advanced without regard or concern for contradictory data or relevance. Here the charge is not incompetence but tendentiousness: advocates cannot be trusted to present an unbiased view. But of course! says Scalia, since they are advocates. But insofar as the criticism is directed at the court, it is essential that the adjudicator is impartial. “Of course a judicial opinion can give a distorted picture of historical truth, but this would be an inadequate historical opinion and not that which is expected” from the Court. Scalia admonishes that one must review the historical evidence in detail rather than raise the “know nothing” cry.

This is Scalia’s second populist argument: it is deeply non-elitist since it seems to imply that nonprofessional historians are capable of coming up with good historical understanding. It provides an example that dovetails with the notion of opening knowledge and the respect for autonomy in allowing individuals to evaluate reasoning and data and come to their own conclusions (and even be right sometimes). Scalia notes that he sees the role of the Court as finding conclusions from these facts, which is different from the role of the historians.

But he feels quite differently about the conclusions of experts in other fields. For example, in overruling Dr. Miles Medical Co. v. John D. Park and Sons, 220 U.S. 373 (1911), holding that resale price maintenance isn’t a per se violation of the Sherman Act, he didn’t feel uncomfortable since this is the almost uniform view of professional economists. Scalia seems to be saying that experts are probably right more often than nonexperts, but nonexperts can also contribute. He phrases this as an expert in judicial analysis – and he says there is a difference in historical analysis vs, say, the type of engineering analysis that might be required for patent cases. He makes a distinction between types of subject which are more susceptible to successful nonexpert analysis.

Scalia then advocates for submission of analysis to public scrutiny with data open, thus allowing suspect conclusions to be challenged. The originalist will reach substantive results he doesn’t personally favor and the reasoning process should be open. Scalia notes that this is more honest than judges who reason morally, who will never disagree with their own opinions.

There was a question that got the audience laughing at the end. The questioner claims to have approached a Raytheon manufacturing facility to buy a missile or tank, since in his view the 2nd Amendment is about keeping the government scared of the people, and somehow having a gun when the government has more advanced weaponry misses the point. Scalia thinks this is outside the scope of the 2nd Amendment because “You can’t bear a tank!”

A2K3: Opening Scientific Research Requires Societal Change

In the A2K3 panel on Open Access to Science and Research, Eve Gray, from the Centre for Educational Technology, University of Cape Town, sees the Open Access movement as a real societal change. Accordingly she shows us a picture of Nelson Mandela and asks us to think about his release from prison and the amount of change that ushered in. She also asks us to consider whether or not Mandela is an international person or a local person. She sees a parallel between how South African society changed with Mandela and the change people are advocating toward open access to research knowledge. She shows a worldmapper.org map of countries distorted by the amount of (copyrighted) scientific research publications. South Africa looks small. She blames this on South Africa’s willingness to uphold colonial traditions in copyright law and norms in knowledge dissemination. She says this happens almost unquestioningly, and in South Africa to rise in the research world you are expected to publish in ‘international’ journals – the prestigious journals are not South African, she says. (I am familiar with this attitude from my own experience in Canada. The top American journals and schools were considered the holy grail. When I asked about attending a top American graduate school I was laughed at by a professor and told that maybe it could happen, if perhaps I had an Olympic gold medal.) She states that for real change in this area to come about people have to recognize that they must mediate a “complex meshing” of policies: at the university level, the various government levels, and the level of norms and the individual scientist… just as Mandela had to mediate a large number of complex policies at a variety of different levels in order to bring about the change he did.

Legal Barriers to Open Science: my SciFoo talk

I had an amazing time participating at Science Foo Camp this year. This is a unique conference: there are 200 invitees comprising some of the most innovative thinkers about science today. Most are scientists but not all – there are publishers, science reporters, scientific entrepreneurs, writers on science, and so on. I met old friends there and found many amazing new ones.

One thing that I was glad to see was the level of interest in Open Science. Some of the top thinkers in this area were there and I’d guess at least half the participants are highly motivated by this problem. There were sessions on reporting negative results, the future of the scientific method, reproducibility in science. I organized a session with Michael Nielsen on overcoming barriers in open science. I spoke about the legal barriers and O’Reilly Media has made the talk available here.

I have papers forthcoming on this topic you can find on my website.