You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Posts with tag Evolution


Apr 01 2011

When I first thought about using digital texts to track shifts in language usage over time, the largest reliable repository of e-texts was Project Gutenberg. I quickly found out, though, that they didn't have years for works, somewhat to my surprise. (It's remarkable how much metadata holds this sort of work back, rather than data itself.) They did, though, have one kind of year information: author birth dates. You can use those to create the same type of charts of word use over time that people like me, the Victorian Books project, or the Culturomists have been doing, but in a different dimension: we can see how all the authors born in a year use language rather than looking at how books published in a year use language.
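As a rough illustration of the idea (not the code behind the charts), here is a minimal Python sketch that groups texts by author birth year and computes how often a word appears per 1,000 words. The function name `usage_by_birth_year`, the `(birth_year, text)` input format, and the toy sample are all assumptions made up for this example; real Gutenberg data would need proper tokenizing and metadata cleanup first.

```python
from collections import defaultdict


def usage_by_birth_year(books, word):
    """Return {birth_year: occurrences of `word` per 1,000 words}.

    `books` is an iterable of (author_birth_year, text) pairs.
    This input format is a hypothetical one chosen for the sketch.
    """
    hits = defaultdict(int)    # occurrences of the word, per birth year
    totals = defaultdict(int)  # total tokens seen, per birth year
    target = word.lower()
    for birth_year, text in books:
        # Naive whitespace tokenization; punctuation is not stripped.
        tokens = text.lower().split()
        hits[birth_year] += tokens.count(target)
        totals[birth_year] += len(tokens)
    # Normalize so that years with more text are comparable to years with less.
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Toy data, invented purely to show the shape of the input.
sample = [
    (1809, "evolution by natural selection"),
    (1809, "the origin of species"),
    (1820, "evolution evolution and progress"),
]
rates = usage_by_birth_year(sample, "evolution")
```

Normalizing by total words per birth year matters because some birth cohorts contribute far more text than others; raw counts would mostly chart the size of the collection.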

Feb 02 2011

Genre information is important and interesting. Using the smaller of my two book databases, I can get some pretty good genre information about some fields I'm interested in for my dissertation by using the Library of Congress classifications for the books. I'm going to start with the difference between psychology and philosophy. I've already got some more interesting stuff than these basic charts, but I think a constrained comparison like this should be somewhat more clear.

Jan 18 2011

I'll end my unannounced hiatus by posting several charts that show the limits of the search-term clustering I talked about last week before I respond to a couple things that caught my interest in the last week.

Jan 11 2011

Because of my primitive search engine, I've been thinking about some of the ways we can better use search data to a) interpret historical data, and b) improve our understanding of what goes on when we search. As I was saying then, there are two things that search engines let us do that we usually don't get:

Jan 10 2011

More access to the connections between words makes it possible to separate word-use from language. This is one of the reasons that we need access to analyzed texts to do any real digital history. I'm thinking through ways to use patterns of correlations across books as a way to start thinking about how connections between words and concepts change over time, just as word count data can tell us something (fuzzy, but something) about the general prominence of a term. This post is about how the search algorithm I've been working with can help improve this sort of search. I'll get back to evolution (which I talked about in my post introducing these correlation charts) in a day or two, but let me start with an even more basic question that illustrates some of the possibilities and limitations of this analysis: What was the Civil War fought about?

Dec 27 2010

I finally got some call numbers. Not for everything, but for a better portion than I thought I would: about 7,600 records, or c. 30% of my books.

Dec 04 2010

Lexical analysis widens the hermeneutic circle. The statistics need to be kept close to the text to keep any work sufficiently under the researcher's control. I've noticed that when I ask the computer to do too much work for me in identifying patterns, outliers, and so on, it frequently responds with mistakes in the data set, not with real historical data. So as I start to harness this new database, one of the big questions is how to integrate what the researcher already knows into the patterns he or she is analyzing.

Nov 27 2010

What can we do with this information we've gathered about unexpected occurrences? The most obvious thing is simply to look at what words appear most often with other ones. We can do this for any ism given the data I've gathered. Hank asked earlier in the comments about the difference between Darwinism and evolutionism, so:

Nov 15 2010

Hank asked for a couple of charts in the comments, so I thought I'd oblige. Since I'm starting to feel they're better at tracking the permeation of concepts, we'll use appearances per 1000 books as the y axis:

Nov 13 2010

Henry asks in the comments whether the decline in evolutionary thought in the 1890s is the Eclipse of Darwinism, rise or prominence of neo-Lamarckians and saltationism and kooky discussions of hereditary mechanisms? Let's take a look, with our new and improved data (and better charts, too, compared to earlier in the week; any suggestions on design?). First, three words very closely tied to the theory of natural selection.

Nov 08 2010

An anonymous correspondent says: