You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Clustering isms together

Nov 28 2010

In addition to finding the similarities in use between particular isms, we can look at their similarities in general. Since we have the distances, it's possible to create a dendrogram, which is a sort of family tree. Looking around the literary studies text-analysis blogs, I see these done quite a few times to classify works by their vocabulary. I haven't seen many that classify words, though, and it works fairly well. I thought it might help answer Hank's question about the difference between evolutionism and darwinism, but, as you'll see, that distinction seems to be a little too fine for now.
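For readers who want to see the mechanics, here is a minimal sketch of how a tree like this can be grown from pairwise distances. It assumes average linkage, which is only one common choice; the post does not say which linkage method or software produced the trees. The four isms and their distances are invented for illustration, and the function name `average_linkage` is hypothetical.

```python
from itertools import combinations

# Invented distances between four isms, for illustration only.
labels = ["arianism", "nestorianism", "jesuitism", "jansenism"]
dist = {
    frozenset(("arianism", "nestorianism")): 0.4,
    frozenset(("jesuitism", "jansenism")): 0.5,
    frozenset(("arianism", "jesuitism")): 1.2,
    frozenset(("arianism", "jansenism")): 1.3,
    frozenset(("nestorianism", "jesuitism")): 1.4,
    frozenset(("nestorianism", "jansenism")): 1.5,
}


def average_linkage(labels, dist):
    """Return the merge history as a list of (cluster_a, cluster_b, height)."""
    clusters = [(label,) for label in labels]  # every word starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(dist[frozenset((x, y))] for x in a for y in b)
            height = total / (len(a) * len(b))
            if best is None or height < best[0]:
                best = (height, a, b)
        height, a, b = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges


merges = average_linkage(labels, dist)
for a, b, height in merges:
    print(f"{height:.2f}: {a} + {b}")
```

Each merge is one fork in the dendrogram, and its height is the distance at which the two branches join. The pairwise search is slow and only suitable for a toy set; a few hundred words would call for a library implementation.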

Here's the overall tree of the 400-ish isms, with the words removed, just to give a sense. We can cut the tree at any point to divide into however many groups we'd like. The top three branches essentially correspond to 1) Christian and philosophical terminology, 2) social, historical, and everything else, and 3) medical and some scientific terminology.
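Cutting the tree into a chosen number of groups amounts to replaying the merges from the bottom up and stopping once that many groups remain. The sketch below assumes the merge history is already available as `(cluster_a, cluster_b, height)` tuples; the merges and heights shown are invented, and `cut_tree` is a hypothetical helper, not the routine used for the figures in this post.

```python
# Invented merge history for four isms: (cluster_a, cluster_b, height).
labels = ["arianism", "nestorianism", "jesuitism", "jansenism"]
merges = [
    (("arianism",), ("nestorianism",), 0.4),
    (("jesuitism",), ("jansenism",), 0.5),
    (("arianism", "nestorianism"), ("jesuitism", "jansenism"), 1.35),
]


def cut_tree(labels, merges, k):
    """Replay merges from lowest to highest until only k groups remain."""
    group = {label: {label} for label in labels}  # word -> its current group
    n_groups = len(labels)
    for a, b, _height in sorted(merges, key=lambda m: m[2]):
        if n_groups <= k:
            break
        merged = group[a[0]] | group[b[0]]
        for label in merged:
            group[label] = merged
        n_groups -= 1
    # Several words share one group, so collect each group only once.
    unique = {frozenset(g) for g in group.values()}
    return sorted(sorted(g) for g in unique)


print(cut_tree(labels, merges, 2))
```

Asking for two groups here separates the two early-church words from the two modern movements; asking for one returns everything, and asking for four returns each word alone.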

But you probably want to see the actual words.

We can see a little more detail on any of these chunks. If we cut it up lower, it often makes quite a bit of sense. If I break it into 41 clusters, for example, so that they will average ten items per cluster, the largest one looks like this:

From the top, you can see three sub-clusters: one roughly of Catholic friends and enemies, one of Protestant ones, and a third that is less clear but may involve slightly more exotic heresies. The Catholic one nicely separates words connected to the first millennium (heathenism, nestorianism, arianism) from more modern movements (jesuitism, jansenism), although some words involved with the early church fathers (augustinism, manicheanism) show up in the latter cluster, presumably because they had more currency. Sacerdotalism and Ritualism show up among the Protestant words because they are important in defining Protestantism by contrast, not Catholicism: opposites are pulled together by use. There are a lot of reminders that this is classifying by use, not by meaning. I'd like to have montanism and ultramontanism closer together, and a number of other Protestant words, particularly those with a stronger role in American history (Puritanism, Wesleyanism), appear in quite different clusters. But there's certainly some stuff here. In some other cases, it turns up discursive spheres that are quite neatly recognizable:

With a few exceptions (favoritism in the first, opportunism in the second), it's pretty clear why these words are clustered. But in others, the connections are somewhat more mystifying:

None of these words are very closely related to each other (see how they branch off before distance=2, while our big Protestant cluster was all inside 1.5), but they are mostly as close as trinitarianism and tritheism. The frustrating but important thing is that it's just these sorts of odd juxtapositions that can spur us in new directions. A frequent complaint about statistical humanities is that it tells us nothing we didn't know before. Well, I certainly didn't know before that there was any connection between transcendentalism and animalism, or between those two and obscurantism and humanism. There is one. Probably it's not a historically interesting one among these four; it could have to do with publishers, typos, anything else.
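The height at which two words first land on the same branch can be read off a merge history directly, which makes "how closely related is this pair" a concrete number. The sketch below assumes merges are listed in the order they were made, as `(cluster_a, cluster_b, height)` tuples. The heights are invented to mimic a tight cluster and a loose one, and `join_height` is a hypothetical helper.

```python
# Invented merge history, listed in the order the merges were made.
merges = [
    (("lutheranism",), ("calvinism",), 0.6),
    (("transcendentalism",), ("animalism",), 1.8),
    (("lutheranism", "calvinism"), ("transcendentalism", "animalism"), 2.4),
]


def join_height(merges, word_a, word_b):
    """Height of the first merge whose result contains both words."""
    for a, b, height in merges:
        members = set(a) | set(b)
        if word_a in members and word_b in members:
            return height
    return None  # the two words never share a branch in this history


print(join_height(merges, "lutheranism", "calvinism"))
print(join_height(merges, "transcendentalism", "animalism"))
print(join_height(merges, "calvinism", "animalism"))
```

A pair that joins low on the tree is used in similar ways; a pair that only joins near the top shares little beyond being in the same overall set.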

As for evolutionism vs. Darwinism, there's not much to separate them: they appear in different places, but clustered oddly among various philosophical terms. Maybe I could make sense of it if I read the chart more; it's down at the bottom if you want to try.

The isms may not be the best set to use this on. What would be? Some set of names might be interesting. But the real work might involve a related set of concepts whose connections are disputed. I've been thinking about running some kind of a retread of Dan Rodgers's "In Search of Progressivism" article, in which he basically does this same kind of cluster analysis to various strands in the language of progressive reform. Limiting the set to books in relevant categories, which would require some sort of LOC catalog information, and then taking years in the progressive era, we could see what sort of clustering a naive computer program does, or even start up a few clusters around words of social control, reform, etc., and see if we can modify or confirm parts of his ordering. That's a ways off, though. For now, I'll just dump the whole tree, in three segments, at the bottom of this post. I'd show the connection between them, but Blogger doesn't let me post that long an image.

Comments:

Anonymous - Oct 2, 2014

Holy Crap! My compliments to you for taking on this mammoth task! Now, all we need is definitions and a short history of each. LOL! Somebody needs to make an app.