You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Women in the libraries

May 08 2012

It's pretty obvious that one of the many problems in studying history by relying on the print record is that writers of books are disproportionately male.

Data can give some structure to this view. Not in the complicated, archival-silences-filling way (that's important, but hard), but just in the most basic sense. How many women were writing books? Do projects on big digital archives only answer, as Katherine Harris asks, "how do men write?" Where were gender barriers strongest, and where weakest? Once we know these sorts of things, it's easy to do what historians do: read against the grain of archives. It doesn't matter whether they're digital or not.

One of the nice things about having author gender in Bookworm is that it opens a new way to give rough answers to these questions. Gendered patterns of authorship vary according to social spaces, according to time, according to geography: a lot of the time, the most interesting distinctions are comparative, not absolute. Anecdotal data is a terrible way to understand comparative levels of exclusion; being able to see rates across different types of books adds a lot to the picture.

In this post, I'm going to run through a lot of basic metadata about the gender composition of libraries very quickly, because I need to know it to work with this data. Although this is the Bookworm database, the rules for inclusion in Bookworm are so simple (an Open Library page and an Internet Archive downloadable file) that, at least up to 1922, the results here should be broadly similar to any large selection of texts that draws heavily on the Google library-scanning project (most notably HathiTrust and Google Books). And those are so similar to the composition of the university libraries that humanists have been using for decades that even non-digital researchers should have some use for these statistics.

More interesting findings might come out of more complicated questions about interrelations among all these patterns: lots of questions are relatively easy to answer with the data at hand. (If you want to download it, it's temporarily here. For entertainment purposes only, etc., etc.)

The most basic question is: what percentage of books are by women, and how did that change? (Of course, we could flip this and ask it about men; the analysis is clearer if we treat women as the exceptional group.) Here's a basic estimate; as the chart says, post-1922 results are unreliable. The takeaway: something like 5% in the mid-nineteenth century, rising to about 15% by the 1920s.

Although the fall around 1950 is hard to interpret, since the sample gets so small and specific, I do think it's interesting: I think (without references) that a lot of other indicators of women's empowerment (Ph.D. and B.A. earning rates? age at first marriage?) show a similar pattern when plotted against time. Just another reminder that the 1960s are one of the least typical periods in US history, and that the widespread practice of using them as some sort of baseline is very misguided.

From now on, I'm removing post-1922 data from the analysis.
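The year-by-year estimate and the 1922 cutoff can be sketched roughly as follows. This is a minimal sketch, not the code behind the chart: it assumes each book has already been reduced to a (year, gender) pair, where gender is "female", "male", or None for an author the classifier couldn't place. The function name `female_share_by_year` is hypothetical.

```python
from collections import defaultdict


def female_share_by_year(records, last_year=1922):
    """Share of classified books by women in each year, ignoring years after last_year.

    records: iterable of (year, gender) pairs (hypothetical input format);
    gender is "female", "male", or None when the author is unclassifiable.
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for year, gender in records:
        if year > last_year or gender not in ("female", "male"):
            continue  # drop post-cutoff years and unclassifiable authors
        counts[year][gender] += 1
    # Every stored year has at least one classified book, so the denominator is never 0.
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }
```

Books with unclassifiable authors are dropped before dividing, which matches the "female divided by female plus male" definition given in the disclaimers at the end of the post.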

Next: Library of Congress classifications, my favorite proxy for genre. The labels won't fit on this chart, but you can read them here. Most genres fall between 10% and 20% female (roughly comparable to the arXiv nowadays, I think), with some notable exceptions.

(If it's not clear: the transparency here is scaled to the log of the number of books in the category. There will be a strong tendency on these charts to overestimate the importance of some small genres; this is my attempt to help you avoid that.)
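The post doesn't give the exact opacity formula. Assuming opacities run from 0 to 1 (as in most plotting libraries), one plausible scaling is to map log10(count) linearly between the smallest and largest category. `log_alpha` and `min_alpha` are hypothetical names; `min_alpha` keeps the smallest genres faintly visible rather than invisible.

```python
import math


def log_alpha(counts, min_alpha=0.15):
    """Map category sizes to opacities that grow with log10(count).

    counts: dict of category -> number of books (must be positive).
    The smallest category gets min_alpha and the largest gets 1.0.
    """
    logs = {k: math.log10(n) for k, n in counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {k: min_alpha + (1 - min_alpha) * (v - lo) / span for k, v in logs.items()}
```

Using the log rather than the raw count means a class with 100 books is still visibly drawn next to one with 10,000, just fainter.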

  • The Ps (fiction) are far and away the most frequently female fields. There's really no question about it: PZ (fiction and juvenile belles-lettres) in particular, but also PS (American literature), are more female than almost any other field.

  • DD, German history, is far more male-dominated than any other field in history, except maybe E, one of the two classes for US history. Does this reflect greater constraints on access to print in the heavily university-dominated German system of the 19th century? (For American or German authors, the Ph.D.s are probably all going through Berlin anyway.) Are there other places where institutional discrimination might be evident?

  • Genealogy and particularly biography (CT) are a really striking area of female authorship. Might be worth looking into.

  • HQ (The Family. Marriage. Women) is about 45% female. Most of this is probably settlement-house material that is well covered in the historiography, but the figure is nonetheless a little higher than I might have thought.

  • K, the law, has fewer women than anywhere else. As with German history, that may reflect the role of higher education in enforcing discriminatory practices.

  • The religion section of the Bs, BL-BX, is particularly male-dominated, with the exception of practical theology. The really strikingly low bar, BM, is Judaism.

  • From the number of authors I've worked with myself, I think of the Ls (education) as having a very high female percentage (although more in the 1930s than the 1900s). But though they're a little higher, it's not that notable.

  • The Ns, visual art, are a little more female than most other fields.

  • The low numbers in the sciences and technology are not very surprising; the spikes in the Ts are for handicrafts and home economics. The latter of those is the only field to break 50% female.
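Grouping books under labels like PZ, DD, HQ or BM requires reducing each call number to its letter subclass. The post doesn't say how the classes were extracted; assuming the metadata carries raw Library of Congress call numbers, a hypothetical helper could look like this (`lc_subclass` and `lc_class` are illustrative names):

```python
import re

# LC call numbers begin with one to three capital letters (the class and subclass),
# followed by a class number, e.g. "PZ3 .T927", "DD89 .S3", "K230".
_LC_PREFIX = re.compile(r"^\s*([A-Z]{1,3})(?=\s*\d)")


def lc_subclass(call_number):
    """Return the letter subclass of an LC call number, or None if it doesn't parse."""
    match = _LC_PREFIX.match(call_number.upper()) if call_number else None
    return match.group(1) if match else None


def lc_class(call_number):
    """Return only the top-level class letter (e.g. "P" for "PZ3 .T927")."""
    sub = lc_subclass(call_number)
    return sub[0] if sub else None
```

`lc_class` gives the broader groupings used in the text, such as "the Ps" or "the Ts", while `lc_subclass` gives the individual bars. Strings with no class number after the letters (for example a subject word like "Fiction") are rejected rather than guessed at.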

What about geography?
By state: Massachusetts does extremely well; of states with a publishing industry to speak of, only California does better. New York is OK, but in the middle of the pack. A lot of this probably has to do with the individual presses in each state; see the publishers list below for more on that.

 

A question emerges: Montana and Nevada both seem to have high female percentages. We know that western states had women's suffrage early; is the same true of female authors? A map loses the information about which states actually have significant numbers of books published in them, but makes regional comparisons easier. My opinion is that it puts to rest any idea of a particularly progressive West, but I could be dissuaded from that.

International comparisons are interesting as well: we can look at country of publication. The result is a really striking win for the United States, with almost 18% of books written by women. The Swedes are next, followed by the Australians. Once again, the Germans are shockingly bad. This seems too strong to be merely a genre effect: Germany's overall percentage is lower than that of a lot of the science fields. What's different about 19th-century Germany compared to these other countries? And what does America not have? I'm strongly inclined to blame the developed system of universities.


Publishers exist in the data, although they're a little harder to pull out. After a little text scrubbing (to make "Little,Brown" the same as "Little Brown" the same as "Little, Brown & co."), the following are the largest publishers:

The numbers get surprisingly high here in some places:

  • It's nearly 50% for Roberts Brothers; that might even be low, since they seem (from Wikipedia, I'm ashamed to admit) to have built their success on Little Women and generally capitalized on the market that opened up.

  • I thought Dodd, Mead was largely in the education market, but Wikipedia shows no sign of that. Why would one mass-market publisher publish about 1/3 women, while Putnam or Macmillan publish only about 1/8?

  • Houghton Mifflin and Little Brown both get above 20%: this probably has to do largely with the predominance of fiction (remember the PZs above), but there might be other differences as well.

  • Grosset & Dunlap is largely the children's market: that's clearly a confounding factor in a lot of these statistics.

Among the low percentages:

  • The Government Printing Office is not surprising, but worth remembering.

  • T.T. Clark is largely religious materials, I believe.

  • The university presses (U of Chicago, the Clarendon press at Oxford) are among the lowest. Yet another strike against the universities.
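The text scrubbing used to merge publisher names isn't spelled out in the post. A hypothetical normalizer along these lines would collapse "Little,Brown", "Little Brown" and "Little, Brown & co." into one key; the set of dropped suffixes is an assumption, and real imprint data would need more rules than this.

```python
import re

# Hypothetical list of trailing words treated as corporate noise.
_SUFFIXES = {"co", "company", "inc", "ltd", "and"}


def normalize_publisher(name):
    """Collapse spelling variants of a publisher's name into one comparison key."""
    key = name.lower().replace("&", " and ")
    key = re.sub(r"[^a-z ]+", " ", key)  # turn punctuation and digits into spaces
    words = key.split()
    # Strip trailing corporate suffixes such as "& Co." or "and Company".
    while words and words[-1] in _SUFFIXES:
        words.pop()
    return " ".join(words)
```

Because "&" becomes "and" before suffixes are stripped, "Grosset & Dunlap" and "Grosset and Dunlap" also merge, while a trailing "& Co." disappears entirely.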

That's the general outline of gender patterns in the library book metadata I have. One thing I'd like to do, but can't with my current data, is look at whether individual libraries seem to have strongly discriminatory patterns compared to others.

If I were to draw a preliminary conclusion, it might be: established institutions (the state, the universities) seem to suppress women most strongly, presumably because there are more hurdles to jump. In certain areas, things have changed. In others, they haven't: I ran some of this on the arXiv author lists, and the 10-15% figures hold in the sciences. There's no reason to think that the same massively distortionary effects aren't still going on in academia, particularly for or against social groups other than gender.

Keep in mind: women are the only discriminated-against group that we can pull out of library catalogs, but hardly the only one in the 19th century. Surnames might get at ethnicity (I haven't had much luck with that), but race and class are virtually impenetrable. I suspect that access to print is at least as strongly skewed by income and race as it is by gender. I don't think (and I have to write this up at greater length) that it makes any sense to avoid using libraries because they are not representative. They are what they are, and libraries are interesting. Everything that anyone ever said would be interesting, too. We have one of these; we'll never have the other.

A few disclaimers: all this data is restricted to 1,000,000 library books from the Open Library; I see no reason to think they aren't basically representative of the books that make it into university libraries. (Except that all but one or two of them had considerably fewer books around 1910-1920.) The basic gender categorization scheme is here. For the percentage of books, I calculate categorized female authors divided by categorized female plus categorized male, throwing out books I can't classify. Those numbers will be off if unclassifiable authorship skews heavily in one direction or the other, but I don't see substantial reasons to think that's happening.
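Written out, with N_f, N_m and N_u the numbers of books by classified-female, classified-male and unclassified authors, the estimate is the first line below. The second line shows how far it drifts from the true share p under an assumed female share q among the unclassified books (q is illustrative, not something the data gives us):

```latex
\hat{p} = \frac{N_f}{N_f + N_m}

% u = N_u / (N_f + N_m + N_u) is the unclassified fraction of all books
p = \frac{N_f + q\,N_u}{N_f + N_m + N_u} = (1 - u)\,\hat{p} + u\,q
\quad\Longrightarrow\quad
p - \hat{p} = u\,(q - \hat{p})
```

So the estimate is exact when unclassified authors are women at the same rate as classified ones (q = \hat{p}), and the error grows with both the unclassified fraction u and how far q departs from \hat{p}.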

Comments:

Anonymous - May 3, 2012

Awesome stuff, and I'm confident that this is only the beginning of what we can get out of broad gender classification of metadata.

Two questions. 1) Why do you think people haven't done this before? It didn't require digitized text, just metadata, which we've had. My guess would be that error tolerances have been set too low. [And although the data won't be perfect, I can't emphasize enough the degree to which this doesn't matter for the sorts of questions that can usefully be asked of it.]

2) Will you at some point make your gender classification public? A list of imputed genders paired to standard volume identifiers and/or author names would do it.

I don't think there's any rush here: you should feel free to actually publish some results on paper before making that data public. But in the long run, there's obviously a huge potential here for further work.

Ben - May 3, 2012

1) (With the caveat that maybe it has been done, in various places.) Part of the reason is error tolerances, definitely, and the confusion of imprecision with bias that humanists always bring up against data. Part of it is division of labor: this is more a librarian's task than a historian's, but librarians are rightfully wary of doing machine classification. And part is that it may not be that useful: I don't have a clever argument I can build around this data at the moment (and indeed, I wouldn't have done it if I weren't headed towards full-text comparisons with it), and there's not really a venue for fun facts about author genders. Particularly because women's studies is frequently on the individualist side of the structure/agency divide.

2) Most of that should be in the file I linked, as long as it stays up. If I could pull together a real publishing agenda on this, I'd probably run with it. We'll see. I don't totally see a path to pull this from facts to arguments, but maybe that comes.

Ben - May 3, 2012

I'll say only that I'm weirdly confident in that dip, although there are all sorts of reasons to suspect the data (particularly since so much of the public-domain material is governmental). Pages 16-17 of this NSF report (pdf) give the gory details, but basically, pretty much every field experienced a 25-50% drop in the percentage of female doctorates from 1920 to 1960, in a manner only partly attributable (I believe) to the GI Bill.

If I ever munge in that Harvard Library metadata, it would be easy to check for sure.

Andrew Gray - Jun 4, 2012

LibraryThing has done some work on identifying author genders as well, as part of their Common Knowledge data. I don't know how easy it is to extract, but apparently it's been manually recorded for ~400,000 authors.
