You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

What's new?

Jun 16 2011

Let me get back into the blogging swing with a (too long; this is why I can't handle Twitter, folks) reflection on an offhand comment. Don't worry, there's some data stuff in the pipe, maybe including some long-delayed playing with topic models.

Even at the NEH's Digging into Data conference last weekend, one commenter brought out one of the standard criticisms of digital work: that it doesn't tell us anything we didn't know before. The context was some of Gregory Crane's work in describing shifting word use patterns in Latin over very long time spans (2000 years) at the Perseus Project: Cynthia Damon, from Penn, worried that being able to represent this as a graph instead of by traditional reading is "not necessarily a major gain." That is to say, we already know this; having a chart restate the things any classicist could tell you is less than useful. I might have written down the quote wrong; it doesn't really matter, because this is a pretty standard response from humanists to computational work, and Damon didn't press the point as forcefully as others do. Outside the friendly confines of the digital humanities community, we have to deal with it all the time.
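
Mechanically, the kind of chart at issue is just relative word frequency bucketed by period. As a toy sketch (the mini-corpus and function name are my own inventions, not anything from Perseus), the underlying computation looks like this:

```python
from collections import Counter

def frequency_by_period(docs, word, period=100):
    """Relative frequency of `word` per `period`-year bucket.
    `docs` is a list of (year, text) pairs standing in for a corpus."""
    counts, totals = Counter(), Counter()
    for year, text in docs:
        bucket = (year // period) * period   # e.g. year 150 -> bucket 100
        tokens = text.lower().split()        # crude tokenization for the sketch
        totals[bucket] += len(tokens)
        counts[bucket] += tokens.count(word.lower())
    return {b: counts[b] / totals[b] for b in sorted(totals)}
```

A real version would tokenize and lemmatize properly and plot the result; the point is only that the computation behind such a graph is simple and fully inspectable.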

Now, there are a bunch of responses to this question on the level of pure research. Just a few in passing: knowledge of overall trends from arbitrary sampling can suffer major confirmation biases; by definition, only the rarest research is truly groundbreaking, and confirmatory research is important and underprivileged across all fields in the academy; there's a difference between knowing the existence of a trend and knowing the magnitude and contours of that trend. It's easy to go on.

But this time, the question that jumps out for me is Tonto's: "What do you mean 'we,' kemosabe?" Just who is it that already knows about these trends? The obvious answer, presumably, is that it's some academic field or subfield. Our expert speaks from authority to say that the research doesn't contribute to their field. (Note that the statement can be exclusionary: an implication is that if you the researcher find this discovery interesting, you must not really be in the field, even if you're a professor in it.) But though the field is important, it's more complicated. I've read a lot of pieces in the ever-lively crisis-of-the-humanities/defense-of-the-humanities genre, and pretty much all of them would agree that "we" also means the culture as a whole: scholars know it in their capacity as the keepers of the flame of knowledge. And for that "we," different types of knowledge reshaping do actually contribute to what we know.

This struck me in Damon's commentary because she mentioned elsewhere that she was working on a translation of Tacitus. I'm outside the field, obviously, but I still feel pretty confident in saying that putting Tacitus into modern English contributes very little to the body of scholarly knowledge. Jack Gladney notwithstanding, scholars speak the language of their field. If we think that kind of work broadens knowledge, it's because it makes Tacitus available to the much larger group of people who can't read Latin. If translations are a worthy activity for senior scholars, why aren't data representations?

I can think of a couple of potentially concerning reasons that humanists don't think this work increases what we know. The first is that while humanists care about non-Latin readers knowing things about Tacitus, we don't care about people who are more persuaded by quantitative data than by anecdotal impressions. Requiring numbers for proof is naive empiricism, blind to the complexities of human experience, etc. While there's certainly an undercurrent of this thinking, I don't think it's insuperable or always present in these critiques: at least in history, I've long been struck by how often maps and stats get used in lecture courses by faculty who would never use them in their published work.

Moreover, plenty of humanists themselves are interested in this type of knowledge; that's what's driving much of the interest in reaching out to new methods by humanists today now that the data is available. Thus even within the field, there's been an undercurrent of people who don't find our conclusions from traditional reading completely persuasive: I, for one, love to see more solid evidence on a few canards of historical interpretation. (The Culturomics keynote, for example, has a slide that helps answer some live questions about things many historians claim to simply know about the transition of the United States from a plural to a singular subject around the Civil War.)

But if we do accept that new representations help persuade different groups of people, including some who aren't obviously outside the scholarly field, why don't they expand what we know in a real way? I think it has to do with what one of the Digging into Data speakers (can't remember who) talked about as the privileging of method over questions in the humanities. Learning that language changed by looking at a chart isn't real knowledge, the argument would go, not like knowledge gained by reading lots of books. Even if I read a result off a Google ngram, the only way to confirm its truth is to ask someone who's actually read all the books. It's easy to make a mistake off a graph, so the only real knowledge is rooted in reading (in reading techniques, anyway). Humanists would fail in their obligations to students if they let them reach conclusions through charts rather than through extended reading.

Now, a methodological fight may be coming, and it might be fun. I think a lot of participants would like this to be a purely epistemological issue. JB Michel briefly mentioned Viennese logical positivism in the keynote while suggesting that now we can speak quantitatively about culture, although way back when it wasn't possible. Many more humanists, I suspect, would think that we still can't: that humanistic questions are by definition not tractable to purely quantitative analysis.

So far as possible, I want to sidestep those issues to point out some more pragmatic problems with defending the scholarly status quo. Although for many humanists defending reading seems like a warmly resounding defense of human practices in texts, for students and outsiders it can seem much closer to an unquestionable assertion of authority. To assert a trend on the basis of experience that is not open to critical interrogation tends to tighten, enormously, the circle of "we" for whom humanities knowledge is accessible. That's a mistake; trust in authority is not a core humanistic value. Neither is dismissing the relevance of particular types of learning. By making it easier to draw conclusions about the past, quantification allows us to enormously broaden the circle of people who can know things about the past.

I say "know" a bit uncautiously, but I do think there is an enormous difference between someone told by an authority figure that language use changed, and someone given a graph to figure it out. Even on a printed page, a chart invites an engaged reading in a way that a simple pronouncement does not. It's hard to overstate how important that is. Now, a chart doesn't need to be quantitative; I actually think dynamically generated concordances might allow people to reach conclusions in the same way without statistics, although with a bit more effort and a bit more computing power. But if it changes the number of people to whom basic knowledge about culture is available, even if it doesn't change the type of knowledge, that serves the purpose of the humanities better than anything.
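
The dynamically generated concordance I have in mind is cheap to build. Here is a minimal keyword-in-context (KWIC) sketch; the function name and sample usage are my own, for illustration:

```python
import re

def concordance(text, keyword, width=30):
    """Return each occurrence of `keyword` with `width` characters
    of context on either side -- a keyword-in-context display."""
    lines = []
    # \b...\b matches the keyword only as a whole word, case-insensitively
    for m in re.finditer(r'\b' + re.escape(keyword) + r'\b', text, re.IGNORECASE):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        lines.append(text[start:end].replace('\n', ' '))
    return lines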

~~~~~
A concluding parable: imagine a world with no maps. Most people know only their immediate neighborhoods, but a few social misfits spent their twenties driving the long hauls between cities, sacrificing fame, fortune, and family to learn the lay of the land. If you want to get from New York to midcoast Maine, they tell you about how to take the Hutchinson to the Merritt, about the I-84 turnoff from 91 just before Hartford, about the merits of taking the coastal route north from Portland or following the interstate to Augusta. They tell you that they recall a friend who wrote a book mentioning a cutoff from Route 1 that saves a few miles by skipping Rockland (Maine route 90, route 95, something like that) that you might want to look into further. This is a useful service. Some people make careers out of it.

If someone walks into this world with a stack of Hagstrom atlases, what happens? Those people go on about how those maps can't capture the rush hour traffic in Hartford or the backups near Wiscasset on summer weekends, about how you'd never know from a map to buy your gas in Massachusetts and your alcohol in New Hampshire. They say they've already driven all these roads; the maps don't tell us anything that we don't know already. Real knowledge of the terrain can only be gained by driving it.

In a way, they'd be right. The routes that a map-reader gets may be more interesting at times, but they will also be shallower. If they rely on a completely algorithmic solution (Google Maps!) they will frequently get terrible results. But there's no surer way to keep people from learning the landscape than making it as inaccessible as possible. It might validate the choices and expertise of a few, but it certainly does far less for the knowledge of the land itself than opening it up to new audiences.

Comments:

Anonymous - Jun 4, 2011

I think you're very much on the right track when you suggest that this is an issue of social or institutional organization, rather than a strictly epistemological one.

Part of the problem, I think, is that the payoff for digital projects (especially ones involving big data) often does not *fit* neatly into a single field, as fields are presently defined. This is particularly an issue in literary studies, I think, because we periodize ourselves very tightly. If you draw a time-series graph with an x-axis longer than about 100 years, it can start to be hard to say who, exactly, would be the audience for such a thesis.

Or, to pick up the way you're describing it here, it may be the case that Romanticists already know one implication of the graph, and Modernists already know another implication, and no one thinks it's particularly important that those two insights can be fused in a single trend line.

Jamie - Jun 5, 2011

It's weird: this kind of personal protectiveness over knowledge is exactly the kind of thing that will crash and burn a job talk (I can just tell). It also wouldn't work in a book, at least not for a new scholar. Why is it okay in other contexts? Maybe resisting quantitative analysis is a knee-jerk defense against a public that's already skeptical about the existence and value of expert knowledge in the humanities. Fortunately, your take on this is much more productive.

scritic - Jun 5, 2011

I think you're absolutely right - that quantitative proofs or demonstrations can serve a different purpose - perhaps of making the work reach a different kind of disciplinary audience. I'm going to use this argument the next time (with due credit!) I get into a conversation with anyone about the utility of quantitative analysis. :)

I suspect, though, that when humanists claim that pattern recognition by programs is not a major gain, they're also trying to raise an epistemological point. That is: if we construct programs to look for statistical regularities in texts, the programmer (or researcher) already has some idea of these regularities, and the program only finds them for her (or verifies for her that they exist). Your rejoinder is that it is indeed valuable for the program to be able to find them for her, even if that's all it does.

I agree. But it's worth mentioning too that it is now possible in machine learning - through a technique called boosting - to construct weak pattern recognizers and then build them up together into a very strong one (such that the strong classifier is more than the sum of its weak parts). So there does exist a possibility of being able to go beyond things we already know in the computational analysis of texts.
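
For readers curious what boosting mechanically does: here is a minimal AdaBoost sketch on one-dimensional data, using threshold "stumps" as the weak learners. All names and data are invented for illustration; real work would use a library implementation.

```python
import numpy as np

def adaboost(X, y, rounds=10):
    """Minimal AdaBoost on 1-D data. Labels y are in {-1, +1}. Each weak
    learner is a threshold 'stump'; boosting reweights the examples each
    round so later stumps focus on earlier mistakes."""
    n = len(X)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for thresh in X:                       # exhaustive stump search
            for polarity in (1, -1):
                pred = np.where(polarity * X < polarity * thresh, 1, -1)
                err = w[pred != y].sum()       # weighted training error
                if best is None or err < best[0]:
                    best = (err, thresh, polarity, pred)
        err, thresh, polarity, pred = best
        err = max(err, 1e-10)                  # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's vote weight
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified points
        w = w / w.sum()
        ensemble.append((thresh, polarity, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * np.where(p * X < p * t, 1, -1) for t, p, a in ensemble)
    return np.sign(score)
```

The stumps here are deliberately trivial; the interest of boosting is that the same loop turns any slightly-better-than-chance recognizer into a much stronger combined one.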

Hopefully we'll get there soon.