You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Posts with tag Metadata


Apr 13 2011

All the cool kids are talking about shortcomings in digitized text databases. I don't have anything so detailed to say as what Goose Commerce or Shane Landrum have gone into, but I do have one fun fact. Those guys describe ways that projects miss things we might think are important but that lie just outside the most mainstream interests: the neglected Early Republic in newspapers, letters to the editor in journals, etc. They raise the important point that digital resources are nowhere near as comprehensive as we sometimes think, which is a big caveat we all need to keep in mind. I want to point out that it's not just at the margins we're missing texts: omissions are also, maybe surprisingly, lurking right at the heart of the canon. Here's an example.

Feb 01 2011

I'm changing several things about my data, so I'm going to describe my system again in case anyone is interested, and so I have a page to link to in the future.

Dec 27 2010

I finally got some call numbers. Not for everything, but for a better portion than I thought I would: about 7,600 records, or c. 30% of my books.

Dec 09 2010

A commenter asked why I don't improve the metadata instead of doing this clustering stuff, which seems just to reproduce, poorly, the work of generations of librarians in classifying books. I'd like to. The biggest problem right now for text analysis for historical purposes is metadata (followed closely by OCR quality). What are the sources? I'm going to think through what I know, but I'd love any advice on this because it's really outside my expertise.