You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Posts with tag Resources


Feb 01 2011

I'm changing several things about my data, so I'm going to describe my system again in case anyone is interested, and so I have a page to link to in the future.

Dec 02 2010

Jamie's been asking for some thoughts on what it takes to do this: statistics backgrounds, etc. I should say that I'm doing this, for the most part, the hard way, because 1) my database is too large to start out using most tools I know of, including, I think, the R text-mining package, and 2) I want to understand better how it works. I don't think I'm going to do the software review thing here, but there are what look like a _lot_ of promising leads at an American Studies blog.

Nov 07 2010

A collection as large as the Internet Archive's OCR database means I have to think through what I want well in advance of doing it. I'm only using a small subset of their 900,000 Google-scanned books, but that's still 16 gigabytes; it takes a couple hours just to get my baseline count of the 200,000 most common words. I could probably improve a lot of my search time through some more sophisticated database management, but I'll still have to figure out what sort of relations are worth looking for. So what are some?
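For readers wondering what a baseline count like that involves, here is a minimal sketch in Python. It is not the code behind these posts. The function name `top_words`, the lowercase-letters-only tokenizer, and the line-at-a-time reading are all assumptions made for illustration. The point is only that the text can be streamed, so the 16 gigabytes never has to sit in memory at once; only the table of counts does.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
# Real OCR text would need more care (hyphenation, accents, scanning noise).
TOKEN = re.compile(r"[a-z]+")


def top_words(lines: Iterable[str], n: int = 200_000) -> List[Tuple[str, int]]:
    """Return the n most frequent words seen across an iterable of text lines."""
    counts: Counter = Counter()
    for line in lines:
        # Update the tally one line at a time; the full text is never held in memory.
        counts.update(TOKEN.findall(line.lower()))
    return counts.most_common(n)


# Small in-memory example; an open file handle can be passed the same way,
# since iterating over a file yields one line at a time.
sample = ["The cat and the hat", "the end"]
print(top_words(sample, n=1))
```

Even streamed like this, a single pass over a collection that size is slow, which is why it pays to decide which counts are worth having before starting.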