You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Today's Times Article

Dec 04 2010

Patricia Cohen's new article about the digital humanities doesn't come with the rafts of crotchety comments the first one did, so unlike last time I'm not in a defensive crouch. To the contrary: I'm thrilled and grateful that Dan Cohen, the main subject of the article, took the time in his moment in the sun to link to me. The article itself is really good, not just because the Cohen-Gibbs Victorian project is so exciting, but because P. Cohen gets some thoughtful comments and the NYT graphic designers, as always, do a great job. So I just want to focus on the Google connection for now, and then I'll post my versions of the charts the Times published.

There's one strange subtext that I can't quite figure out: the secret Google metadata. Cohen says Google has substantially better metadata than they put on their site, which makes me somewhat doubtful of just how open they can be with all their resources. If Google can get a full API with access to texts and good metadata, which seems like it's a year or two off, that will obviate any need for databases like the one I've built. But if it's hampered by restrictions put on by content providers, that could cripple their ability to give the full access scholars need to engage in real dialogue with the data. Google Trends for historical terms might be worse than nothing, because it would only allow the facile sort of thinking Cohen's discouraging.* It was a big, messy production for Google to wean itself from outside providers to let them do more interesting things with Maps: is metadata for historians and literature scholars going to be worth that effort for them, particularly when errors could result in copyright infringement? The Google employee in the article has a long comment about metadata that makes it sound like they currently have some obligations to providers, which is a bad sign. On the other hand, Cohen seems to trust them, which is something, and it's their book-scanning and free circulation of PDFs (though not OCR) that makes all of this possible.

But given the lack of clarity on a) what Google will offer, and b) when it will offer it, I'm happy for now to be working with Internet Archive OCR on Google scans, even though their metadata is quite a headache. The completeness of the Google stuff is appealing, but for most of the actual, historical questions I can think of dealing with books (not serials, which is a whole other mess) a combination of Internet Archive sources and Library of Congress catalog information should be fine. (Not that I've made any progress towards getting them to play together since the last time I said that.)