Don’t expect computer scientists to be on top of every use that’s found for computers, including scientific investigation

Computational scientists need to understand and assert their computational needs, and see that they are met.

I just read this excellent interview with Donald Knuth, inventor of TeX and of the concept of literate programming, as well as author of the famous textbook The Art of Computer Programming. When asked for comments on (the lack of) software development using multicore processing, he says something very interesting – that multicore technology isn’t that useful, except in a few applications such as “rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc.” This caught my eye because parallel processing is a key advance for data processing. Statistical analysis typically proceeds record by record through the data, making it a natural fit for multithreaded execution. This isn’t some obscure corner of science either – most science carried out today involves some element of digital data processing (although of course not always at scales that warrant parallel processing).
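
To make the record-by-record point concrete, here is a minimal sketch (my illustration, not anything from the interview) of how such an analysis parallelizes across cores; the summary statistic, the file name observations.csv, and its layout are hypothetical:

```python
from multiprocessing import Pool

def summarize(record):
    """Per-record statistic (here just a mean); each record is independent,
    so the work spreads across cores with no coordination."""
    values = [float(x) for x in record.split(",")]
    return sum(values) / len(values)

if __name__ == "__main__":
    # Hypothetical input: one observation per line, comma-separated fields.
    with open("observations.csv") as f:
        records = f.read().splitlines()

    # Serial version: one core, record by record.
    serial = [summarize(r) for r in records]

    # Parallel version: the same map, spread over the available cores.
    with Pool() as pool:
        parallel = pool.map(summarize, records)

    assert serial == parallel
```

The serial and parallel versions compute the same result; the only change is which core handles each record, which is why this kind of data processing scales naturally with core count.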

Knuth then says that “all these applications [that use parallel processing] require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.” As the state of our scientific knowledge changes, so does our problem-solving ability, requiring modification of the code used to generate scientific discovery. If I’m reading him correctly, Knuth seems to think this makes such applications less relevant to mainstream computer science.

The discussion reminded me of comments made at the “Workshop on Algorithms for Modern Massive Datasets” at Stanford in June 2010. Researchers in scientific computation (a specialized subdiscipline of computational science; see the Institute for Computational and Mathematical Engineering at Stanford or UT Austin’s Institute for Computational Engineering and Sciences for examples) were lamenting the direction computer hardware architecture was taking toward accelerating certain problems, such as particular techniques for matrix inversion and other hot topics in linear algebra.

As scientific discovery transforms into a deeply computational process, we computational scientists must be prepared to partner with computer scientists to develop tools suited to the needs of scientific knowledge creation, or develop these skills ourselves. I’ve written elsewhere on the need to develop software that natively supports scientific ends (especially for workflow sharing; see e.g. http://stodden.net/AMP2011), and this applies to hardware as well.

3 Responses to “Don’t expect computer scientists to be on top of every use that’s found for computers, including scientific investigation”


  • In a sense, you’re preaching to the choir, but in a very real sense, you aren’t. We’ve been down this road before. I worked in high-performance computing from 1980 to 1990. We told ourselves how limited the serial programming paradigm was, that our languages – FORTRAN, C, Pascal and Lisp – were hopelessly inadequate, and that only functional programming would save us.

    What we ended up with is for the most part “clusters” of x86_64 machines, GPUs and a few other scattered experiments. Our codes are mostly written in Python, Java and R, of which only R is based on a functional paradigm. As far as I can tell, the dominant “computer science” paradigms for concurrency – Petri nets and the CSP/π-calculus line – aren’t used to reason about the correctness of concurrent programs.

    In short, ours is and always will be an *engineering* discipline and not a “science” or a branch of mathematics. We build stuff that can be made to work at acceptable expenditures of human waiting time, electricity and other resources.

  • It is called literate programming, not literature programming.

    Also, in the context of Knuth’s interview, he is talking from the point of view of algorithmics. Having 32 cores can make your program at most 32 times faster, which is multiplication by a constant, and multiplication by a constant is completely irrelevant from the algorithmic point of view (an n² algorithm on 32 cores is still an n² algorithm). It is not a surprise that the person who cares most about the exact running time of an algorithm is not bothered too much by multiplication by such constants. But then again, by his own definition, Knuth likes to be at the *bottom* of things, not at all at the *top* of things.

  • enric,

    Thank you for catching my typo. I have fixed it.

    Sometimes people argue to me that a fully specified algorithm, with carefully documented inputs and outputs, is enough for reproducibility. I agree in principle, but I’d argue this is not typically the case, and that full code/data disclosure is warranted (even for HPC).
