You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at For current posts, see here.

Literary Doppelgängers and interestingness

May 30 2016

I started this post with a few digital-humanities posturing paragraphs: if you want to read them, you’ll encounter them eventually. But instead let me just get to the point: here’s a trite new category of analysis that wouldn’t be possible without distant reading techniques, and that sometimes produces charmingly serendipitous results.

I’ll call it doppelgänger books. A doppelgänger is, for any world-historically great work of literature, a book that shares many of the same themes, subjects, and language, but is comparatively obscure, not widely read, and–most likely–of surpassingly mediocre quality.

Edit: Ryan Cordell informs me privately and regretfully that I’m wrong in some of my conclusions here. I said, “I took a grand total of one English literature class in college; does anyone expect me to be right?” But he’s worried that my wrongness might reflect poorly on the field of DH, which has a history of critics straw-manning offhand blog posts into terrible representatives of the field. So let me say up front: Persons attempting to find an argument in this post will be prosecuted; persons attempting to find political advocacy in it will be banished; persons expecting me to have anything above a high-schooler’s knowledge of English literature will be shot.

Take Huck Finn. In hazy recollection (I haven’t read the whole book in probably 10 years), much of what seems great about it is the purely American picaresque of a vision of America. Twain’s interest “is in the boy in whose mouth he puts the story, and in this boy’s view of the world as it passes under his eye.” Huck “is a true child of the river,” and gives us a view of America seen through the eyes of “a perfect vagabond of a youngster, wandering up and down the river at his will, taking in the passing show with open mind, finding it all for to admire.”

All those quotes, as you may already have guessed, are not describing Huck Finn at all, but instead come from a review of Charles Stewart’s Partners of Providence (1904).

Read the book online through Hathi

The table of contents is pretty fascinatingly close to Huckleberry Finn; the reviewers note the comparison, and it’s hard to imagine that the tale of a young boy’s adventures up and down the river with an entertaining ethnic (here Irish) sidekick past swindlers and exhibitions and perils wasn’t somehow modeled on the most famous humorist in the country.

But there are surely differences as well; I wouldn’t be surprised if an in-class discussion on the racial politics of Huckleberry Finn could benefit from a brief comparison to Partners’ account of “the marooning and subsequent escape of a pair of pugnacious darkies.”

Across the c. 4.5 million public domain volumes in the Hathi Trust, there are a surprising number of these, many of them books that seem (based on Google searches) to languish in deserved obscurity. (I’ve got a set of tricks that actually make finding the pairings more feasible than running 20 trillion pairwise comparisons, but the exact mechanics of that are for another day.) But they’re interesting; not in a “distant reading” way, but in that they provide some greater focus around the core texts we all read already.
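
To give a rough sense of what a vocabulary-based nearest-neighbor search involves, here is a minimal sketch in Python. It is not the engine behind these results, whose mechanics are left for another day; the function names, the toy texts, and the choice of cosine distance over raw word counts are all assumptions made for illustration. It also takes the brute-force route, comparing the query against every volume, which is exactly what stops being practical at 4.5 million volumes.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count word tokens (a crude bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty text shares nothing with anything
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, library, k=3):
    """Score every title in `library` against `query`; closest first."""
    query_counts = word_counts(query)
    scored = [
        (cosine_distance(query_counts, word_counts(text)), title)
        for title, text in library.items()
    ]
    return sorted(scored)[:k]


# Hypothetical stand-ins for full volumes; these texts are invented.
library = {
    "river story": "the boy and the raft drifted down the river past the town",
    "whaling story": "the captain chased the whale across the sea in his ship",
    "parlor story": "the lady poured tea in the parlor and spoke of marriage",
}

query = "a boy on a raft on the river"
results = nearest_neighbors(query, library, k=3)

# Brute force means comparing every volume with every other volume.
n_volumes = 4_500_000
pairwise_comparisons = n_volumes ** 2  # roughly 20 trillion
```

On the toy library, the river text ranks first for the river query. Scoring all 4.5 million volumes against each other this way would take 4.5 million squared comparisons, roughly 20 trillion, which is why some shortcut is needed before a search like this becomes feasible.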

So let me just plug a few books in here and see what comes back. My criteria are just that the original book be canonical.

Huckleberry Finn
Twain is closest to himself; Huck Finn is closer to the later Tom Sawyer books than to Tom Sawyer itself, which should perhaps not be surprising.

But nearest-neighbor searching also reveals a deep vein of western boys’ literature. We know that this exists; the interesting questions here would probably involve the specific ways (especially dialect: these are mostly first person narratives in highly vernacular styles) that writers imitate Twain.

Publication years also provide a point of departure. All the books here were written substantially later than Huck Finn except for “Live boys in the Black Hills.” So if I were going to pick any up, maybe I’d start there.

0.628 Danny’s own story, (1912)

0.662 Mr. Pratt, a novel, (1906) (and 4 nearly identical books)

0.665 Lige Mounts: free trapper, (1922) (and 3 nearly identical books)

0.679 Jim Hands / (1911) (and 1 nearly identical book)

0.681 Swatty; a story of real boys, (1920) (and 1 nearly identical book)

0.683 Mark Tidd in the backwoods, (1914)

0.683 Live boys in the Black Hills, or, The young Texan gold hunters : a narrative in Charley’s own language, describing their adventures during a second trip over the great Texas cattle trail … (1880)

0.685 Peace in Friendship Village, (1919) (and 2 nearly identical books)

0.689 Billy Fortune, (1912)

Moby Dick
This has fewer straightforward imitators; but the whaling novel is a perfectly well-represented genre.
The closest match is the romance “The Red Eric; or, The whaler’s last cruise. A tale” from 1883. Some elements of the contents are provocative, at least; but the similarities are less than perfect. (Red Eric’s captain’s “insane resolution” is to bring his daughter on a whaling cruise with him, for example).

A few other options include a collection of sea stories,

0.615 Round the gallery fire / (1914)

0.616 A Bounty boy: being some adventures of a Christian barbarian on an unpremeditated trip round the world, (1912)

0.616 Sea-wrack, (1903) (and 1 nearly identical book)

0.618 Old Jack, a man of-war’s man and South-Sea whaler, (1859)

0.620 The cruise of the Cachalot round the world after sperm whales / (1911) (and 3 nearly identical books)

The Cruise of the Cachalot and Sea-wrack, by Frank Bullen, offer some of the more interesting comparisons. Properly shuffled, it makes sense that Moby Dick’s closest companions might include not literature at all, but piecemeal miscellanea from the magazines like this (“Sea-Wrack”).


Middlemarch
Middlemarch is somewhat harder to find close matches for, and for uninteresting reasons: since the novel is so long, it was frequently chopped into 2, 3, or 4 parts; and each one of those sections ranks highly on the list.

Hannah. (1890) (and 3 others)

A brave lady. (1870) (and 3 others)

Fraternity; a romance ... (1910)

The nearest novels are by Dinah Craik, who I don’t know, but who seems well enough established as a poor man’s George Eliot in the scholarly literature. (Googling quickly brought me to the online version of Sally Mitchell’s monograph on the author.) “Hannah”, the closest, is characterized by Mitchell as “a one-issue novel with a narrow legislative aim.”

Fraternity; a romance … (1910) is a harder nut to crack. It’s a rural novel set in Wales and published by Macmillan around 1888, but the only surviving digital copy was (according to library metadata) published in the United States in 1910. (Galsworthy’s 1911 novel Fraternity further muddies things here.) It’s the subject of a strikingly positive review in the Boston press that explicitly casts it as a diamond in the rough.

I was going to let it go there, but then discovered a whole separate track via this book. The author is one Miss M. M. Holland Thomas, and the novel somehow attracted the intense admiration of JP Morgan for its message of social reform through benevolent patronage. (It is Morgan who paid for the American reprint in 1910.) Does this story have anything to do with a similarity to Middlemarch? Hmm. There’s definitely something here about the connections between the English social novel and political intentions. But beyond that, I couldn’t say.

The Education of Henry Adams

The absolute closest match is his brother’s autobiography. Which should surprise no one, and I’m sure I’ve encountered the book before. “Early Memories” by Henry Cabot Lodge is also high on the list, which is probably a decent choice as well. But I’ll pick as the doppelgänger Cambridge Sketches by Frank Preston Stearns, which hits a number of the same points.

0.586 Charles Francis Adams, 1835-1915; an autobiography; (1916) (and 12 nearly identical books)

0.600 Studies of men / (1895) (and 4 nearly identical books)

0.606 Cambridge sketches (1905) (and 2 nearly identical books)

0.608 Early memories, (1913) (and 3 nearly identical books)

0.609 Charles Sumner, (1892) (and 3 nearly identical books)

0.610 History of the United States of America. (1889) (and 5 nearly identical books)

0.610 Life and letters of Edwin Lawrence Godkin; (1907) (and 4 nearly identical books)

The Souls of Black Folk

A real genre-bender of a book, even more than Moby Dick. And even less often reprinted.

The closest match is a fairly dull-seeming hagiography of Booker T. Washington. But I’ll take as a shadow “Up stream: an American chronicle” by Ludwig Lewisohn. It seems to be the personal memoir of a German-born Jew who grew up in Charleston, SC before attending Columbia and (eventually) becoming a founding faculty member at Brandeis. The grounds for similarity aren’t entirely clear–perhaps some odd combination of self-recognition, music, and the South?–but that’s what makes it an interesting track. Some of the

Booker T. Washington, the master mind of a child of slavery : an appealing life story rivaling in its picturesque simplicity and power those recounted about the lives of Washington and Lincoln : a biographical tale destined to live in history and furnish an inspiration for present and future generations, a human interest story depicting the life achievements of a great leader of a rising race / (1915)

Darkwater : voices from within the veil / (1920) (and 3 others)

Up stream; an American chronicle, (1922) (and 2 others)

Memoirs of a millionaire, (1889)

Circumstantial evidence; a novel. (1890)

"89. (1891) (and 2 others)

The Harvard monthly. (1885)

Autobiography of an ex-colored man

On the topic of great Af-Am literature. This one was suggested to me as a candidate by John Reuland. For this one I’m pasting in a longer list of matches, because we were initially very disappointed at the results. (Very little African American literature on the list).

But on looking at the list, what there is is an extraordinary amount of autobiographical self-help literature about money. So maybe there’s some lesson to be gleaned there.

0.610 A victorious defeat; the story of a franchise, (1906)

0.612 Banner bearers; tales of the suffrage campaigns, (1920) (and 2 nearly identical books)

0.619 Not angels quite. (1893)

0.619 In paradise : a novel, from the German of Paul Heyse. (1878) (and 2 nearly identical books)

0.620 Years of experience; an autobiographical narrative. (1886) (and 1 nearly identical book)

0.621 The “goldfish” : being the confessions of a successful man. (1921) (and 7 nearly identical books)

0.623 Philip Gerard : an individual / (1899)

0.624 Courtship under contract : the science of selection, a tale of woman’s emancipation / (1910) (and 1 nearly identical book)

0.624 Of one blood / (1916)

0.625 The Lawton girl, (1897) (and 1 nearly identical book)

0.625 My threescore years and ten : An autobiography / (1892) (and 2 nearly identical books)

0.626 The works of Charles Dickens … (1898) (and 8 nearly identical books)

0.627 A man of millions / (1901)

0.628 The writings of Mark Twain. (1899) (and 6 nearly identical books)

0.629 Jacob Schuyler’s millions. A novel. (1886)

0.630 The £1,000,000 bank-note, and other new stories, (1893) (and 2 nearly identical books)

0.630 Bubble reputation : a story of modern life / (1906) (and 1 nearly identical book)

OK, that’s enough.

Portrait of the Artist as a Young Man
Again, the matches aren’t as clear; a vocabulary-based approach like mine works best with thematically distinct subjects like riverboats, not with “childhood.”

There are some vaguely interesting similarities: at #3, I particularly like “What to read at Winter Entertainments,” in which it appears the closest antecedent to Joyce is a stuffed-together hodgepodge of great British writers from the 19th century. Sounds about right.

But as a doppelgänger, I’ll take Shaw Desmond’s Gods, which seems to cover similar places in the Irish experience of the early 20th century.

On Interestingness
I’ve been thinking about Ted Underwood’s “old-fashioned, shamelessly opinionated, 1000-word blog post” from yesterday. There are parts I wholeheartedly agree with, such as the section where he dances near to, but decorously avoids citing, Kieran Healy’s magnum opus on what calls for nuance do in contemporary academic discourse. There are parts I don’t; I’m increasingly convinced that efforts to apply and invent novel algorithmic practices should be fully central to the work of some humanists, and that calls to return to the primary questions of the disciplines are not just premature but somewhat misguided.*

(Roughly, although I should boil this up into a richer stew at some point: very few people outside a philosophy department think that only academic philosophers should do philosophy; very few people *inside* history departments think that only academic historians should do history. Just as we let political philosophy flourish in politics departments and cultural history flourish in art and music departments, computer programming shouldn’t be the sole province of computer science departments.)
Is this interesting? I’m not sure. It’s not here-I-come-PMLA interesting, for sure. But then again, I’ve never deliberately sought out much contemporary literary history written since 1980 or so. For a certain sort of Arnoldian prudish conception of literature, I kind of like the game. Much like my anachronism-searching blog posts, it’s a field-and-context approach to literature where the whole is not treated as the object of study itself (the stated purpose of much “distant reading”) but as a conveniently large wall on which to reposition the works of literature we’re already interested in. What that means for literary history, I think I’m under no professional obligation to say.

Bonus links

A little bonus for those who read through to the end; a temporary link to a live interface to the engine I used for this thing, so you can play along at home. Just go to and you can paste in any text you’re interested in. Terms and conditions are: don’t link to that page, because this may not scale; and e-mail me or post in the comments if you find any terrible bugs or interesting matches.


Ted Underwood - May 1, 2016

I suspect I don’t really disagree with you. Notice that I say “writers who want to reach a broad audience need to resist geeking out over the latest algorithm,” not “I personally promise to resist geeking out over the latest algorithm.”

On the contrary, I can pretty much guarantee that I won’t resist. Especially not when I’m reading/writing blog posts.

And doppelganger-search strikes me as something that has immediately practical applications even for non-distant-readers. It could be a pretty decent way of exploring literary influence.

Ben - May 1, 2016

I think before I got all distracted by the novels and gave up actually responding to you, what I think I really meant was:

1. If you want to be publishable to a disciplinary audience, you probably do best to avoid geeking over the latest algorithm;
2. If you want to be of interest to a non-disciplinary audience, you’ll often do well to geek out over the latest algorithm, and pursue the questions it makes possible without regard to whether they have been answered satisfyingly within the disciplines.

But the point being that “interestingness” is often a disciplinarily defined phenomenon. I think I deleted a sentence as too inside-baseball saying that I thought your frequently-voiced suspicion of “DH” as a useful constellation was related here. If DH were related to English the way bioinformatics is related to biology, then things might look different. But I’m probably not ready to declare war on the disciplines yet. Or at least, not on *my* discipline.

Ted Underwood - May 1, 2016

Hmm, interesting. Good points there. In a certain sense, you’re right, embracing algorithms as algorithms could produce a broader (in the sense of more cross-disciplinary) audience.

I don’t know if it’s either/or. I’m not systematically rededicating myself to disciplinarity – far from it, in some ways. Ideally, I’d like to do both of these things: pure and applied text mining.

My suspicion of DH is related to all this, I guess, but in a complicated way. It’s not the cross-disciplinary character of DH that tires me out so much as the lack of a methodological common denominator.