You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Centennials, part II

Dec 03 2010

So I just looked at patterns of commemoration for a few famous anniversaries. This is, for some people, kind of interesting: how does the publishing industry focus in on certain figures to create news or resurgences of interest in them? I love the way we get excited about the Civil War sesquicentennial now, or the Darwin/Lincoln year last year.

I was asking whether this spike in mentions of Thoreau in 1917 is extraordinary or merely high.

Emerson (1903) doesn't seem to have much of a spike; he's up in 1904 with everyone, although Hawthorne, whose centenary is 1904, isn't up very much.

Can we look at the centennial spikes for a lot of authors? Yes. The best way would be to use a biographical dictionary or Wikipedia or something, but I can also just use the years built into some of my author metadata to get a rough list of authors born between 1730 and 1822, so they can have a centenary during my sample. A little grepping gets us down to a thousand or so authors. Here are the ten with the most books, to check for reliability:

       Author Birth            Books
1   Thackeray  1811               86
4      Holmes  1809               52
9      Darwin  1809               30
10    Dickens  1812               30
14   Whittier  1807               29
15  Hawthorne  1804               28
21    Spencer  1820               25
22    Tyndall  1820               25
23    Holland  1819               24
26 Longfellow  1807               22

So who has the biggest centenaries? The Percent column is the percentage of all mentions of an author that occur in his or her (hello, Harriet Beecher Stowe!) centenary. I'm only applying this to the 200 most represented authors. So Thoreau is indeed remarkable in having 11% of his mentions in his centennial year. There are a lot of other big centennials in the 90s and aughts, and only one from the first half of our sample.

     Percent       Author Birth Year
1  12.305728     Gieseler       1792
2  11.053941      Thoreau       1817
3   7.580752    Thackeray       1811
4   5.753147      Colburn       1793
5   5.594953        Stowe       1811
6   4.320671 Woodhouselee       1747
7   4.209995        Lewes       1817
8   3.640463      Dickens       1812
9   3.634805      Emerson       1803
10  3.533058      Haswell       1809
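
To make the Percent column concrete, here is a minimal Python sketch of the calculation as described above: the share of an author's total mentions that falls in the year 100 years after birth. This is not the code behind the table, which came out of R and the full database. The function name, the `mentions_by_year` dictionary, and the toy counts are all invented for illustration.

```python
def centenary_percent(mentions_by_year, birth_year):
    """Percent of all mentions that fall in the centenary year (birth + 100)."""
    total = sum(mentions_by_year.values())
    if total == 0:
        return 0.0  # an author with no mentions gets no score
    centenary_mentions = mentions_by_year.get(birth_year + 100, 0)
    return 100.0 * centenary_mentions / total


# Made-up counts for a hypothetical author born in 1817 (not real data).
toy_mentions = {1915: 20, 1916: 25, 1917: 50, 1918: 30, 1919: 25}
print(centenary_percent(toy_mentions, 1817))
```

The per-year values could just as well be normalized frequencies as raw counts; the ratio works the same way either way.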

Do the centenary scores increase as time goes on?

> cor(centenary.scores[,3],centenary.scores[,1])
[1] 0.02625526

Emphatically they do not. It's unusual to get a correlation score so close to zero on this kind of data. But that may just be because most authors don't get centenary celebrations, in which case most of their mentions were probably closer to when they were alive. Or for some, like Darwin, the centennial just doesn't matter compared to the other controversies that get kicked up around the name at other points in time.
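
For readers who don't use R: `cor` with its default settings computes a Pearson correlation. The sketch below is a plain-Python version of that statistic, run on invented numbers, to show how one big centenary in the middle of the sample can leave the correlation with birth year near zero. The function name and the toy data are assumptions for illustration, not values from my database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Invented example: mostly flat centenary scores with one large spike.
toy_birth_years = [1750, 1770, 1790, 1810, 1820]
toy_scores = [1.0, 1.2, 12.0, 0.9, 1.1]
print(pearson(toy_birth_years, toy_scores))
```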

What if we did a different set of centennials: say, presidents? I can just pull their dates off Wikipedia and sort the percentage of their mentions that come in the centenary year:

So any number below 1 means their centenary year had a below average number of mentions. I knew the Lincoln centenary was a big deal, but Fillmore? Who knew? A graphic like this could be good for teaching if we want to talk about, say, the eclipse of Grant, who gets a smaller boost than Hayes in 1922. I could go on, but I think it's clear that there's some interesting stuff about not just publishing practices, but maybe larger questions of reputation.
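
One way to read a score where 1 means "average" is as the centenary year's mentions divided by the mean mentions per year across the sample. The Python sketch below assumes that reading; the exact formula isn't spelled out here, so treat the function name, the input layout, and the toy counts as hypothetical.

```python
def centenary_ratio(mentions_by_year, birth_year):
    """Centenary-year mentions relative to the average year (1.0 = average)."""
    yearly = list(mentions_by_year.values())
    average = sum(yearly) / len(yearly)
    if average == 0:
        raise ValueError("no mentions in any year; ratio is undefined")
    return mentions_by_year.get(birth_year + 100, 0) / average


# Made-up counts for two hypothetical presidents (not real data).
big_centenary = {1907: 40, 1908: 50, 1909: 120, 1910: 60, 1911: 30}
quiet_centenary = {1920: 80, 1921: 70, 1922: 30, 1923: 60, 1924: 60}
print(centenary_ratio(big_centenary, 1809))    # above 1: a boost
print(centenary_ratio(quiet_centenary, 1822))  # below 1: a below-average year
```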

For the record, there is a positive correlation here:

> cor(as.numeric(presidents[,2]),result)
[1] 0.4922704

But it's mostly driven by the lack of centenaries for the founding fathers, which I think is something else entirely.

But enough of this stuff: after 19 hours of processing, I've finally got my database running in a new form, which should open up some new possibilities for comparing across discursive spheres.

Comments:

Ben - Dec 5, 2010

This, from Wikipedia, is one of the dumbest sentences I've read: "The record for the fewest Presidential birthdays is one, shared by June and September."

Dan - Dec 6, 2010

Two questions:
1. Was that Bronson or Louisa May in that graph?
2. Could you do a similar graph with some writers who weren't transcendentalists or Am. Renaissance stars? I'm interested in knowing if that big jump in the 1880s and beyond has to do with subject matter, reputation, or is partly an artifact of a large general jump in US publishing.

Ben - Dec 6, 2010

Dan,

It's just Alcott, unfortunately; my system only works quickly on single words. I'm curious about all those things, too, but I'm not sure I'm going to chase it down right away. Most of the jump is generic, I think: Longfellow gets it too, and more importantly, Shakespeare gets an abbreviated version. Even John Bunyan, who pretty clearly peaks earlier, has a secondary peak around 1885. I said earlier I was working on a post about the pitfalls of loess and the transcendentalists; I'm just not sure how deep I want to go into differences between publishing houses, etc., particularly before I have genre information.

As for whether it's a large general jump, the numbers are normed against all words published that year, so it's not just that there are more books, but there's probably some sort of change, maybe involving publishers printing books about writers more, and not just the primary sources.

A less interesting possibility is that it has to do with publishers putting the names of other books in their catalogs on the endpapers, which my program would parse. I know you see that a lot in the late 19C, but I don't know when it starts. That's a whole problem I've hardly thought about at all.

Hank - Dec 0, 2010

Ben,

If it's the last alternative (which is, I agree, less interesting, at least for content analysis), is there some way to spin out of it to claims about different efforts by different houses to push their wares in different books?

It's possible that thoughts on that would have to wait for genre tags from you, but I just wondered if there were, in theory, some way to zoom out from an admission about a seemingly technical snafu to some analysis about commercial patterns in the publishing industry, à la the last piece of Dan's second question.