Literate programming is now a team sport

On GitHub, countless stories of countless programs are being told every day

In the mid-1980s I worked for a company that squandered a goodly number of tax dollars on a software project for the Army. Do horrors spring to mind when you hear the phrase "milspec software"? It was even worse. My gig was milspec software documentation.

We produced some architectural docs, but our output was dwarfed by piles of that era's precursor to Javadoc: boilerplate extracted from source code. During one meeting, the head of the doc team thudded a two-foot-thick mound of the stuff onto the table and said: "This means nothing to anyone."

Meanwhile, we were following the adventures of Donald Knuth, who was developing an idea he called literate programming. A program, he said, was a story told in two languages: code and prose. You needed to be able to write both at the same time, and in a way that elevated the prose to equal status with the code. Ideas were explored in prose narration, at varying levels of abstraction, then gradually fleshed out in code. 

To work in this way, he created a system called WEB that consumed a blend of the two. TeX was the language in which he expressed prose that could be beautifully typeset. Pascal was the language in which he expressed code. Together they made up a literate program. One tool, WEAVE, produced a TeX document for publication; another, TANGLE, extracted the code for compilation.

There have been several implementations of this idea over the years, CWEB and noweb among them, for various programming languages and publishing systems. But it has never caught on in a big way. I used to think that was because, to do what Knuth advocated in the way Knuth did it, you had to be Knuth or one of the relatively few others able to think like him.

Lately, though, I've come to see that literate programming is happening all around us, albeit differently than we once imagined. You can see what I mean most clearly on GitHub, where countless stories of countless programs are being told every day. 

In a related story posted this week, "GitHub for the rest of us," I consider some of the ways in which nonprogrammers are being drawn into the culture of GitHub. It is, among other things, a literary culture. Inline comments are one form of expression, but increasingly the code's story is told in commit messages and issue discussions. That story is often crafted with exquisite care.

The notion of a clean commit history, for example, should fascinate scholars from other disciplines. In an era of always-on and continuously improving software services, built and operated by evolving teams of contributors, there can be no secrets, no tricks, no mysteries. Anyone may need to fix a bug, add a feature, or integrate with another service. Understanding what exists, and how it came to be, is crucial. 

That history, summarized in streams of commit messages, is so important that there are huge debates about how to write it. Should you merge changes in order to preserve the graph of branches that preceded a commit? Or should you rebase in order to flatten that graph into a linear view for posterity? If your personal style is to commit more frequent and more granular changes than will be helpful to a future collaborator, should you aggregate those changes into bigger chunks before you push? 

Those are great questions for somebody working in digital humanities to explore, particularly because this literary form is inherently collaborative. In the 1980s, when Donald Knuth first envisioned literate programming, it was not unusual for a major software project -- like TeX -- to emerge from the mind of a single author. Times have changed. Software development is much more likely to be a team sport, as is the storytelling that surrounds and informs it.

This story, "Literate programming is now a team sport," was originally published by InfoWorld.