You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form on Sapping Attention itself. For current posts, see here.

Do it yourself

Dec 02 2010

Jamie’s been asking for some thoughts on what it takes to do this: statistics backgrounds, etc. I should say that I’m mostly doing this the hard way, because 1) my database is too large to start out with most of the tools I know of, including, I think, the R text-mining package, and 2) I want to understand better how it works. I don’t think I’m going to do the software-review thing here, but there are what look like a _lot_ of promising leads at an American Studies blog.

As for whether the courses exist, I think they do here and there: Stephen Ramsay says he’s taught one at Nebraska for years.

It’s easy to follow a few of these links and quickly end up drinking from a firehose of information. I come away with two initial impressions: 1) English is ahead of history on this; 2) there are a lot of highly developed applications for doing similar kinds of text analysis. The upside is that it’s leading me to think more carefully about how my applications differ from other people’s.