Posts with tag isms
Dan asks for some numbers on “capitalism” and “capitalist” similar to the ones on “Darwinism” and “Darwinist” I ran for Hank earlier. That seems like a nice big question I can use to warm up the new database I set up this week, work out some basic methods, and get some basic functionality written into it.
In addition to finding the similarities in use between particular isms, we can look at their similarities in general. Since we have the distances, it’s possible to create a dendrogram, which is a sort of family tree. Looking around the literary studies text-analysis blogs, I see these done quite a few times to classify works by their vocabulary. I haven’t seen it done much with words, though it works fairly well. I thought it might help answer Hank’s question about the difference between evolutionism and Darwinism, but, as you’ll see, that distinction seems to be a little too fine for now.
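For anyone who wants to try this at home, here is a minimal Python sketch of how a tree can be grown from pairwise distances. It assumes average-linkage clustering, which may not be the rule behind the actual chart, and the distances and the `build_dendrogram` name are made up for illustration.

```python
from itertools import combinations


def build_dendrogram(distances):
    """Average-linkage agglomerative clustering.

    distances maps frozenset({word_a, word_b}) to the distance between them.
    Returns the tree as nested tuples, e.g. (('a', 'b'), 'c').
    """
    words = sorted({w for pair in distances for w in pair})
    # Each cluster is (tree_so_far, list_of_member_words).
    clusters = [(w, [w]) for w in words]

    def linkage(c1, c2):
        # Mean distance over every cross-cluster pair of words.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(distances[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical distances, invented for this example only.
example = {
    frozenset({"darwinism", "evolutionism"}): 0.10,
    frozenset({"capitalism", "socialism"}): 0.20,
    frozenset({"darwinism", "capitalism"}): 0.80,
    frozenset({"darwinism", "socialism"}): 0.90,
    frozenset({"evolutionism", "capitalism"}): 0.85,
    frozenset({"evolutionism", "socialism"}): 0.90,
}
tree = build_dendrogram(example)
print(tree)
```

Words that merge early sit on neighboring twigs, and words that only join at the last step are distant cousins. With real data a plotting library would draw the tree; the nested tuples are just the simplest way to show its shape.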
What can we do with this information we’ve gathered about unexpected occurrences? The most obvious thing is simply to look at what words appear most often with other ones. We can do this for any ism given the data I’ve gathered. Hank asked earlier in the comments about the difference between “Darwinism” and evolutionism, so:
Now to the final term in my sentence from earlier: “How often, compared to what we would expect, does a given word appear with any other given word?” Let’s think about “how much more often.” I thought for a while that this was more complicated than it is, so this post will be short and not very important.
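One simple reading of “how much more often” is a ratio of observed to expected co-occurrences. The sketch below is in Python and assumes that “expected” means what we would see if the two words turned up in books independently of each other. That independence baseline, the function names, and the counts are all assumptions for illustration, not necessarily the measure used in these posts.

```python
def expected_together(count_a, count_b, total_books):
    """Books expected to contain both words if they occurred independently."""
    return count_a * count_b / total_books


def overrepresentation(observed, count_a, count_b, total_books):
    """How many times more often the pair appears than independence predicts."""
    return observed / expected_together(count_a, count_b, total_books)


# Hypothetical counts: 10,000 books, word A in 200 of them, word B in 500,
# and both together in 40.
ratio = overrepresentation(40, 200, 500, 10_000)
print(ratio)
```

A ratio above 1 means the pair shows up together more often than chance alone would suggest, and a ratio below 1 means the words tend to avoid each other.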
This is the second post on ways to measure connections—or more precisely, distance—between words by looking at how often they appear together in books. These are a little dry, and the payoff doesn’t come for a while, so let me remind you of the payoff (after which you can bail on this post). I’m trying to create some simple methods that will work well with historical texts to see relations between words—what words are used in similar semantic contexts, what groups of words tend to appear together. First I’ll apply them to the isms, and then we’ll put them in the toolbox to use for later analysis.
I said earlier I would break up the sentence “How often, compared to what we would expect, does a given word appear with any other given word?” into different components. Now let’s look at the central, and maybe most important, part of the question—how often do we expect words to appear together?
Ties between words are one of the most important things computers can tell us about language. I already looked at one way of measuring connections between words in talking about the phrase “scientific method”: the percentage of occurrences of a word that occur with another phrase. I’ve been trying a different tack, however, in looking at the interrelations among the isms. The whole thing has been so complicated that I never posted anything from Russia, because I couldn’t get the whole system in order in my time here. So instead, I want to take a couple posts to break down a simple sentence and think about how we could statistically measure each component. Here’s the sentence:
Hank asked for a couple of charts in the comments, so I thought I’d oblige. Since I’m starting to feel they’re better at tracking the permeation of concepts, we’ll use appearances per 1000 books as the y axis:
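For the curious, the y axis can be computed along these lines. This Python sketch reads “appearances per 1000 books” as the number of books containing the word per 1,000 books published that year, which is an assumption about the measure; the function name and the counts are hypothetical.

```python
def per_thousand_books(books_with_word, books_published):
    """Books containing the word per 1,000 books published, by year."""
    return {
        year: 1000 * books_with_word.get(year, 0) / total
        for year, total in books_published.items()
        if total > 0  # skip years with no books to avoid dividing by zero
    }


# Hypothetical counts for two years.
rates = per_thousand_books(
    {1880: 12, 1890: 45},
    {1880: 4000, 1890: 9000},
)
print(rates)
```

Counting books rather than raw occurrences keeps one obsessive author from swamping the chart, which is why it seems a better fit for tracking how widely a concept has spread.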
I’m going to keep looking at the list of isms, because a) they’re fun; and b) the methods we use on them can be used on any group of words–for example, ones that we find are highly tied to evolution. So, let’s use them as a test case for one of the questions I started out with: how can we find similarities in the historical patterns of emergence and submergence of words?
Here’s a fun way of using this dataset to convey a lot of historical information. I took all 414 words in my database that end in “ism” and plotted them by the year in which they peaked,* with the size proportional to their use at peak. I’m going to think about how to make it flashier, but it’s pretty interesting as it is. Sample below, and full chart after the break.
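The two numbers each word needs for a chart like this, its peak year and its use at peak, are easy to pull out of a yearly series. A minimal Python sketch, assuming usage is stored as a dictionary of year-to-rate values per word; the data layout, function names, and figures are made up for illustration.

```python
def peak(series):
    """Return (year, value) for the year in which usage is highest."""
    year = max(series, key=series.get)
    return year, series[year]


def peaks_by_word(usage):
    """Map each word to its (peak_year, use_at_peak)."""
    return {word: peak(series) for word, series in usage.items()}


# Hypothetical usage rates, not taken from the real database.
usage = {
    "mesmerism": {1840: 2.0, 1850: 3.5, 1860: 1.0},
    "pragmatism": {1900: 0.5, 1910: 4.0},
}
peaks = peaks_by_word(usage)

# Order the words along the time axis; the second number sets the label size.
for word, (year, size) in sorted(peaks.items(), key=lambda kv: kv[1][0]):
    print(year, word, size)
```

If a word hits the same maximum in more than one year, this sketch keeps the first such year it encounters.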