You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Diffusion patterns for news and technological events

Nov 08 2010

An anonymous correspondent says:

You mention in the post about evolution & efficiency that "Offhand, the evolution curve looks more like the ones I see for technologies, while the efficiency curve resembles news events."

That's a very interesting observation, and possibly a very important one if it's original to you, and can be substantiated. Do you have an example of a tech vs news event graph? Something like lightbulbs or batteries vs the Spanish-American War might provide a good test case.

Also, do you think there might be changes in how these graphs play out over a century? That is, do news events remain separate from tech stuff? Tech changes these days are often news events themselves, and distributed similarly across media.

I think another way to put the tech vs news event could be in terms of the kind of event it is: structural change vs superficial, mid-range event vs short-term.

Anyhow, a very interesting idea, of using the visual pattern to recognize and characterize a change. While I think your emphasis on the teaching angle (rather than research) is spot on, this could be one application of these techniques where it'd be more useful in research.

He or she is right that technology vs. news isn't quite the right way to describe it. Even in the 19C, some technology changes are news events, while others aren't. But let's look at some examples here.

First, news events. Events aren't usually defined by non-proper nouns the way that technologies or social/intellectual movements are. Even the obvious ones, like "war", don't work as well as I'd like; if this database included more newspapers and magazines, that would be different. I find it baffling that loess has the use of "war" peaking in 1860, before hostilities started.

Places can serve as proxies for events, but only sometimes. Take three Civil War sites:

Sumter is the most common, and does about what you'd expect: rises out of obscurity from 1860 (although why the mini-spike in 1860?). The others are messier, but so rare that it's hard to make any generalizations. You're probably wondering why I left out Gettysburg:

The huge spike in 1913 would have made the previous chart unreadable. Loess regression ignores it as an outlier, but we know that it's actually the fiftieth anniversary of Lincoln's address. Both Gettysburg and Antietam spike briefly in the 60s, fall off, and then begin a slow climb back up; that says something about the way they took on more important meaning as time went on, perhaps in contrast to Shiloh. Nothing I see here makes me think less of the main themes of David Blight's great book on the Civil War in American memory. The database might not be big enough, though, to let us draw inferences about words that rare. That Gettysburg peak is 1700 mentions, and Shiloh is getting about 50-100 mentions a year after the Civil War. That's pretty good, but I'd have to implement book counts to see if that's all just driven by, e.g., one or two histories of the Civil War a year.

Anyway, let's say that's an OK way of characterizing a news event: the sudden impingement of a previously obscure place on the national consciousness, followed by a recession that leaves it well above the original point. Here are some for three major wars of the century covered. (Bighorn and Hidalgo, for the record, don't have anywhere near as much penetration into language. Plus Bighorn peaks around 1910. Again, interesting. But also indicative that this isn't a science.)

Loess doesn't see that Verdun is already trending down by 1922, but you can.

So what about technologies? My imagination is failing me on them, but here are three. When I talked about technologies having slow adoption curves, I was thinking mostly of the telegraph and railroad, which I had tested before.

But that telephone line is impressive, even if the strength of the spike is driven largely by the point in 1911.

So I agree it's not really about technologies vs. news events: telephones can break into prominence, Antietam can slowly rise.

So how could we class these things? Structural vs superficial, mid-range vs. long term are true; I'd also add some sense of magnitude: does a word increase by 5% a year or 60%? But those are all just descriptors of shape, not really sociological models. Ideally, I guess, we could move from the math to discover what types of words match it. How would we do that?

I see two ways. One is to just look at the shapes of the curves for previously obscure words that rise in prominence: we could cut off everything before it starts to go up, and then normalize the values and run some sort of least-squares comparison on the loess curves. (Or on the moving average; I'm still not totally sold on loess.)
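To make that first approach concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions of mine: the 5% cutoff for "starts to go up", scaling each curve by its peak, and resampling to 20 points so curves of different lengths can be compared are all arbitrary choices, and the function names are hypothetical. The input would be whatever smoothed yearly series (loess or moving average) you already have.

```python
def trim_to_rise(series, threshold=0.05):
    """Drop the flat stretch before the word starts to rise.

    `threshold` is an assumed cutoff: the first year whose value exceeds
    this fraction of the peak is treated as the start of the rise.
    """
    peak = max(series)
    for i, value in enumerate(series):
        if value > threshold * peak:
            return series[i:]
    return series


def normalize(series):
    """Scale a curve so its peak is 1, keeping only its shape.

    Assumes the series has a positive peak.
    """
    peak = max(series)
    return [value / peak for value in series]


def resample(series, n=20):
    """Linearly interpolate a curve onto n evenly spaced points."""
    if len(series) == 1:
        return [series[0]] * n
    out = []
    for k in range(n):
        pos = k * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def shape_distance(a, b, n=20):
    """Sum of squared differences between two trimmed, normalized curves."""
    a = resample(normalize(trim_to_rise(a)), n)
    b = resample(normalize(trim_to_rise(b)), n)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Calling `shape_distance(curve_a, curve_b)` on two yearly series gives a single number that is 0 for identically shaped curves and grows as the shapes diverge. Because of the normalization, a rare word and a common word with the same trajectory come out as close matches, which is the point of comparing shape rather than volume.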

Another would be to come up with a number of individual dimensions that express what we talked about before: the ratio of the peak to the last available datapoint, the time between the start of the rise and the peak, the slope of the fastest-rising five years, a few other things like that. Then we could do a principal components analysis to find words that share similar trends. In theory, this is close to the first way, but by choosing the variables used carefully we might be able to pull out the patterns that really seem important. I've always wanted to use PCA more than I have in the past, so that might be the route I'll take if I pursue this. The prior way has more purity, however.
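As a sketch of what those dimensions might look like in code, the function below computes the three named above for a single word. It stops short of the PCA itself: the idea would be to build one such row per word, standardize the columns, and hand the resulting table to whatever PCA routine you prefer. The 5% definition of "start of the rise", the function name, and the dictionary keys are my own assumptions for illustration.

```python
def curve_features(years, counts, rise_fraction=0.05, window=5):
    """Summarize one word's yearly frequency curve as a few numbers.

    `rise_fraction` and `window` are illustrative choices. Assumes `counts`
    has a positive peak, a nonzero final value, and more than `window` points.
    """
    peak_value = max(counts)
    peak_index = counts.index(peak_value)

    # Start of the rise: first year above a small fraction of the peak.
    start_index = next(
        i for i, c in enumerate(counts) if c > rise_fraction * peak_value
    )

    # Steepest average change per year over any `window`-year stretch.
    steepest = max(
        (counts[i + window] - counts[i]) / (years[i + window] - years[i])
        for i in range(len(counts) - window)
    )

    return {
        "peak_to_last": peak_value / counts[-1],
        "years_to_peak": years[peak_index] - years[start_index],
        "steepest_slope": steepest,
    }
```

A news-event word in the sense described earlier should show a short `years_to_peak` and a `peak_to_last` well above 1, while a slowly adopted technology should show a long `years_to_peak` and a ratio near 1. The raw slope depends on how common the word is, so it would probably need to be computed on normalized counts before going into a PCA.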