Sharing texts better, part 1: Austrian Newspapers
It’s not very hard to get individual texts in digital form. But working with grad students in the humanities looking for large sets of texts to do analysis across, I find that larger corpora are so hodgepodge as to be almost completely unusable. For humanists and ordinary people to work with large textual collections, they need to be distributed in ways that are actually accessible, not just open access.
That means:
Downloading
Reasonable file sizes (rarely more than a gigabyte).
Reasonable numbers of files (don’t make people download more than a dozen for most analysis tasks).
This isn’t happening right now. The hurdles to working with digital texts are overwhelming to almost anyone. I don’t usually write up a simple process story about what it’s like to get collections of texts, but I want to do so a few times here.
What follows here is–I should be clear–a sort of infomercial. Over the last year or so I’ve started formalizing a much better way to distribute texts than any cultural heritage institution currently uses.
I’ll share texts using it. I want to start looking at some collections I encounter to make clear just how high the barriers are to working with text the way we’re distributing it now.
Part one: newspapers. Newspapers should be, in theory, a pretty easy type of text to distribute. In an ideal world, a newspaper is divided up into articles. But most of the open-access newspaper collections I’ve seen instead chop papers up into pages. That’s the case for the first archive I’m going to look at in this series: newspapers from the Austrian National Library hosted on Europeana.
I can’t completely remember the details of why I’m looking at this collection, but in short: a graduate student in my Working with Data class was interested in doing text analysis for their class project on Austrian newspapers. We decided that the Neue Freie Presse would be an especially useful paper, and identified digitized versions both on Europeana and at ANNO, hosted by the Österreichische Nationalbibliothek. (If you visit the Wikipedia page for the NFP, it takes you to a dead Columbia link.) ANNO has a nice online interface including well-formatted links like “https://anno.onb.ac.at/cgi-content/annoshow?text=nfp|18970610|20” for full text: this seems like a possible route for getting data, although the decades of data will take an extremely long time to download in R. Looking for other copies, I first check the Atlas of Digitized Newspapers from the Oceanic Exchanges project, because I know that they have decent information about accessibility. (Despite the name, they are not an atlas in any normal sense, but instead a bibliography, registry, or catalog.) It suggests that access will be to XML files through Europeana, and does not list any access through ANNO beyond what I’ve been able to find.
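That ANNO route, for what it’s worth, would look something like this, page by page (a sketch in Python rather than R, built on the example link above; the paper code, date, and page number are just the example values):

import requests

# Fetch the OCR text of a single page from ANNO, using the URL pattern above.
# nfp = Neue Freie Presse; 18970610 = the issue date; 20 = the page number.
url = "https://anno.onb.ac.at/cgi-content/annoshow?text=nfp|18970610|20"
page_text = requests.get(url).text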
But it also links to a bulk download site at Europeana. Looking at the Europeana sites during a Zoom call, we discover that there are a number of full-text downloads identified by opaque numbers: 9200300 is the first one.
Here’s where we hit the first snag. What are these numbers? Looking at the site for one of the NFP pages in the Europeana browser, we see that it, too, starts with 9200300. Perhaps this is just what we want? But the file is unthinkably large–116 GB, zipped, for the page-level full text. This is too large for the grad student to download, but I click on it to see what will happen. It spins, and spins, long past the end of office hours. The student has to wait.
A week passes. While looking for a completely different file on my computer, I encounter a 63GB zip file in my downloads. I dimly remember downloading this earlier, and think about opening it. To just unzip a 63GB file would be crazy–this is another place where most researchers will be stymied. I know that one can access a zipfile randomly, though, and fire it up in Python to read.
This is a second place where most researchers would be lost–63 GB is just too big. There should never be a single file that large unless it’s completely necessary; in this case, that’s clearly not so. The idea that you can extract single files is simply not obvious, so many people will try to extract the whole thing. I don’t know exactly how big that 63GB file will be once decompressed, but probably large enough to clobber most hard drives.
I’ve named the zipfile ‘NFP.zip’ now, because I’m hoping it has the Neue Freie Presse. Now I can read the list of filenames.
import zipfile
import html

# Open the (renamed) bulk download and list its contents without extracting anything.
f = zipfile.ZipFile("NFP.zip")
fnames = f.filelist
It turns out to have 1.6 million little files bundled in there, with names like 9200300/BibliographicResource_3000116292697/3.xml. Hmm. Well, the end is clearly the page number, and perhaps the bibliographic resource is the individual issue?
I read in a single document–the one-millionth–to see.
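That looks something like this, using the zipfile handle from above (the slice just keeps the output short):

# Read one file straight out of the zip--no extraction to disk needed.
sample = f.read(fnames[1_000_000].filename).decode("utf-8")
print(sample[:1000])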
<TextLine HEIGHT="61" WIDTH="703" VPOS="25" HPOS="166"><String WC="0.5249999762" CONTENT="rung" HEIGHT="29" WIDTH="68" VPOS="37" HPOS="166"/><SP WIDTH="19" VPOS="32" HPOS="234"/><String WC="0.5199999809" CONTENT="des" HEIGHT="29" WIDTH="46" VPOS="33" HPOS="253"/><SP WIDTH="10" VPOS="35" HPOS="299"/><String WC="0.4877777696" CONTENT="höchstens" HEIGHT="43" WIDTH="140" VPOS="30" HPOS="309"/><SP WIDTH="17" VPOS="38" HPOS="449"/><String WC="0.625" CONTENT="ui" HEIGHT="22" WIDTH="28" VPOS="45" HPOS="466"/><SP WIDTH="17" VPOS="45" HPOS="494"/><String WC="0.275000006" CONTENT="emem" HEIGHT="27" WIDTH="84" VPOS="45" HPOS="511"/><SP WIDTH="10" VPOS="42" HPOS="595"/><String WC="0.4562500119" CONTENT="fncvüchm" HEIGHT="40" WIDTH="149" VPOS="42" HPOS="605"/><SP WIDTH="9" VPOS="48" HPOS="754"/><String WC="0.3616666794" CONTENT="Zustan" HEIGHT="36" WIDTH="96" VPOS="48" HPOS="763"/><HYP CONTENT=""/></TextLine>
So–it’s XML of the scans, including the exact position in pixels of each word. I consider parsing the TextLines out properly and deconstructing the XML, but XML parsing is a pain and always tediously, tediously slow. And I don’t care about any of this stuff–I’m doing text mining, so I just want the words. A quick check back at the Europeana site confirms that I have the smallest file on offer.
So let’s do the quick and dirty approach. The letters I want follow the word “CONTENT” in the XML, so I’ll just write something that splits on that string and grabs everything up to the closing quotation mark. This is how people use XML, I tell myself; no one is enough of a sucker to use Python’s XML parsing libraries, so let’s just munge it out. split is so much faster….
import pyarrow as pa
from pyarrow import parquet

i = 0
while i < len(fnames):
    pages = []
    ids = []
    # Work through the files in batches of 5,000 pages.
    for j in range(5000):
        if i >= len(fnames):
            break
        print(i, end="\r")
        r = f.open(fnames[i])
        # Grab everything between CONTENT=" and the closing quote.
        words = []
        for word in r.read().decode("utf-8").split('CONTENT="')[1:]:
            words.append(word.split('"', 1)[0])
        page = html.unescape(" ".join(words))
        pages.append(page)
        ids.append(fnames[i].filename.replace(".xml", ""))
        i += 1
    # Write each batch out as a zstd-compressed parquet file.
    out = pa.table({"ids": ids, "pages": pages})
    parquet.write_table(out, f"{i}.parquet", compression="zstd", compression_level=5)
    print(f"{i}/{len(fnames)}")
This code pulls the text out of the XML into something better: a parquet file, written by pyarrow, for each group of 5,000 pages. I check one to be sure–it looks like German. There will surely be mistakes–perhaps involving quotation marks inside words. But with OCR this low-quality, it’s enough to start.
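The check is just reading one batch back and eyeballing it, something like this (the filename here is what the first batch gets called by the loop above):

# Spot-check a batch: read one parquet file back and look at the start of a page.
check = parquet.read_table("5000.parquet")
print(check["pages"][0].as_py()[:500])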
Arzt der k. k. prio. THÄßbahn, anö den frischen Blätter» des Enca» lyptiis Globnlus. eines ans Anstratten stammende» BaiimcS, i» dem ««oratorwin des Apothekers ^»»>i Sdl»»»»»» Wien. JÄche», - Haupistraze Nr. 16, einzig und allein zukereiteie rmd stets «orrStbig
Rewriting with compression.
I wrote them into a folder with level 5 compression in zstd. The new directory, with parquet files and ids, is a tenth the size: 6.4GB vs 63GB for the zipfile I downloaded. Why on earth have I downloaded massive XML files when I just want text? Who really wants this positional text, anyway? I’ve used it a few times over the years–but most people want text, not XML. Zipfiles at least are nice, because I can grab the specific files I want. But they’re also slow in their own right. I start parsing at 22:21, and leave my computer open–looking at the timestamps, I don’t finish the last file until more than two hours later, at 00:31.
This is bonkers. Mediocre zip compression and uselessly XML-encoded data mean that it takes two hours just to look at the data in the most cursory way. It’s important to distribute things in a complete format, but it’s also important not to waste resources making things too hard to parse. With the parquet-formatted versions of the data, it takes not two hours but 55 seconds to read through every file in this set. That’s a major improvement–roughly 100 times faster to read, and one-tenth the size. Both of those are big enough differences that they actually affect whether this data is usable or not. Here, for instance, is a scan of every page for a single name:
from pathlib import Path
from pyarrow import compute as pc

# Scan every parquet file for pages that mention a phrase.
matches = []
for p in Path("parquet_files").glob("*.parquet"):
    a = parquet.read_table(p)
    which = pc.match_substring(a["pages"], "Gustav Mahler")
    matches.append(a.filter(which))
So–now we’ve got a huge set of text in a fairly navigable form. But we don’t know what the records are. The identifiers are all things like 9200300/BibliographicResource_3000123565676/4; aside from the page number, it’s not clear what any of those mean. My working theory to this point has been that 9200300 means the Neue Freie Presse and BibliographicResource_3000123565676 means the individual issue; but I need to know for sure.
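Under that theory, each identifier splits cleanly into three parts:

# Split a page identifier into its (presumed) parts: collection, issue, page.
collection, issue, page = "9200300/BibliographicResource_3000123565676/4".split("/")
# collection == "9200300"; issue == "BibliographicResource_3000123565676"; page == "4"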
Sorting is information
At this point, I start putting the identifiers into the website and figuring out the layout of the metadata here. It turns out that this is not just one newspaper, but lots–probably everything contributed from the ÖNB to Europeana. And, stunningly, the order seems to be completely random? I call the web-based Europeana API and get dcTitle fields in this order:
["Der Humorist - 1847-01-29"]
["Blätter für Musik, Theater und Kunst - 1871-09-19"]
["Wiener Zeitung - 1841-10-18"]
["Der Humorist - 1841-03-10"]
["Neue Freie Presse - 1871-10-22"]
["Innsbrucker Nachrichten - 1859-11-25"]
["Die Presse - 1867-06-25"]
["Das Vaterland - 1862-09-26"]
["Wiener Zeitung - 1705-02-28"]
["Wiener Zeitung - 1868-12-04"]
There are a couple of things weird here. One is the random order. I suppose that this could be my fault, because I just used the filenames from the zipfile in the order they appeared, rather than sorting. But that itself is a problem–the zipfile should have more of an inherent order. It is an underappreciated fact that good sorting is good compression; the more natural an order information appears in, the better it will compress. And, of course, the fewer files people will have to download. The other is that “title” is wrapped in an array: apparently in the EDM things can have multiple titles. OK, that’s something I can work with.
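A toy demonstration of the point (and only a toy–the real data is messier than this): the same strings compress far better when they arrive in a natural order than when they arrive shuffled.

import random
import zlib

# 100,000-odd date strings, once in a natural repeating order and once shuffled.
dates = [f"1870-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)] * 300
ordered = "\n".join(dates).encode()
shuffled = "\n".join(random.sample(dates, len(dates))).encode()

print(len(zlib.compress(ordered)), len(zlib.compress(shuffled)))
# The ordered version comes out several times smaller.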
So now I have a clear plan.
Get metadata for every record.
Match it to the papers.
Write out each newspaper in chronological order.
To get the metadata, I have to find it–there is no metadata in the data dumps. First I do it using the API: https://api.europeana.eu/record/v2/{id}.json?wskey={api_key}
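One record at a time, that looks roughly like this (the record id is one of the page identifiers minus its page number, and the exact shape of the JSON response is from memory, so treat it as approximate):

import requests

# Fetch the metadata for a single record from the Europeana Record API.
api_key = "YOUR_API_KEY"  # a personal Europeana key
record_id = "9200300/BibliographicResource_3000123565676"
resp = requests.get(f"https://api.europeana.eu/record/v2/{record_id}.json",
                    params={"wskey": api_key})
record = resp.json()["object"]  # the EDM record; dcTitle sits on one of its proxies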
But it quickly becomes clear this won’t scale: running overnight, I’ve only downloaded 35,000 of 1.3 million records. So I go back to the Europeana page and download another enormous zipfile–a 4 gigabyte one with records for the entire set. How this manages to be so large isn’t initially clear to me–perhaps, I think, they’ve bundled the full text into it?
The answer turns out to be that there are massive amounts of text for each record because, chiefly, every record repeats an extremely long definition of ‘newspaper’ in many different languages. That this balloons the size so much is a failure of an over-literal use of linked data. Perhaps there would be a way to reference it as an element in a single HTML file, but really, no one cares. This part of the data model will never be used outside a Europeana site–there is some base-covering in distributing it, but it’s a massive inconvenience for researchers to have the following block of text (and something vaguely equivalent in Latvian, Arabic, Russian, etc.) repeated 1.6 million times in a file that’s supposed to be a metadata dump about newspaper issues:
Many newspapers, besides employing journalists on their own payrolls, also subscribe to news agencies (wire services) (such as the Associated Press, Reuters, or Agence France-Presse), which employ journalists to find, assemble, and report the news, then sell the content to the various newspapers. This is a way to avoid duplicating the expense of reporting.
Now, I understand the need for clear URIs for concepts and the benefits of linked open data. But the nature of linked open data is that any individual record can be ballooned indefinitely. Why is there a definition of ‘newspaper’ at such tedious length and not, say, a full expansion of the geographic definition of ‘Graz’ where it appears? I am sure there is a reason–but I’m equally sure it’s not really a good one.
So now I’ve got to parse these monster XML blobs 1.3 million times. And this time I can’t resort to regex. Ugh. Again, this is something that most researchers will abandon quickly. I increasingly see XML referred to in the past tense online, as a data format/data movement that failed. Evangelists will surely disagree, and certainly a great deal has been lost. But for my purposes, I need something tabular that can be joined, and XML and tables play extremely poorly together.
But I’ll try. The first step will be to get into JSON-LD format, which is a linked-data format that actually works inside of programming languages for non-evangelist humans. It turns out to be something of a pain–maybe ten minutes of vaguely recalling terms before I finally figure out how to use Harold Solbrig’s rdflib-jsonld extension to the rdflib library to squeeze the data into JSON. Solbrig, thank goodness, has provided a code example; once I work out which format string to put in, the transformation is straightforward.
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

# `demo` holds the raw RDF/XML for a single record.
g = Graph().parse(data=demo, format="xml")  # <- took a while to figure this line out!
print(g.serialize(format="json-ld", indent=1))
OK. So all I really need here is the newspaper title and the date, so let’s see how to parse it out. Once again, the JSON-LD is massively large. After wasting 40 minutes trying to figure out if I can implement a general solution to parse out all the various @type entries using a JSON-LD context into a flatter document, and coming up against the difficulties of inferring the many contexts, I decide to just take a quick-and-dirty route that will lose most of the JSON-LD data here. First, filter to only the proxies:
proxies = [f for f in json.loads(d) if 'http://www.openarchives.org/ore/terms/Proxy' in f['@type']]
And then reduce to a dict where we grab the first occurrence of a value or id field if it seems to be a Dublin Core item.
Again, this requires a completely different set of skills from the data wrangling above. If I knew a lot about LOD, I could do much better here. But the Python libraries I’m finding don’t make this especially easy, so I’m giving up on the LOD dream of being able to put it back together in a multilingual frame.
import json

def parse_row(d):
    # Keep only the ore:Proxy entries from the JSON-LD for one record.
    proxies = [f for f in json.loads(d) if 'http://www.openarchives.org/ore/terms/Proxy' in f['@type']]
    out = {}
    # Take the second proxy (the one that turns out to hold the Dublin Core fields).
    for k, v in proxies[1].items():
        if "purl.org/dc" in k:
            try:
                out['dc:' + k.split("/")[-1]] = v[0]['@value']
            except KeyError:
                out['dc:' + k.split("/")[-1]] = v[0]['@id']
    return out
{'dc:identifier': 'oai:fue.onb.at:EuropeanaNewspapers_Delivery_3:ONB_00286/1875/ONB_00286_18750610.zip',
'dc:language': 'deu',
'dc:relation': 'http://de.wikipedia.org/wiki/Neuigkeits-Welt-Blatt',
'dc:source': 'http://anno.onb.ac.at/cgi-content/anno?apm=0&aid=nwb&datum=18750610',
'dc:subject': 'http://d-nb.info/gnd/4067510-5',
'dc:title': 'Neuigkeits-Welt-Blatt - 1875-06-10',
'dc:type': 'http://schema.org/PublicationIssue',
'dc:extent': 'Pages: 4',
'dc:isPartOf': 'http://data.europeana.eu/item/9200300/BibliographicResource_3000095610170',
'dc:issued': '1875-06-10',
'dc:spatial': 'http://d-nb.info/gnd/4066009-6'
}
This whole process can parse about 40 records a second. That sounds kind of fast, maybe. But with 1.3 million metadata items, it would take nine hours to run, single-threaded in Python on my laptop. That is obscene. We can reduce this by batching by issue and getting it down to about an hour–there are “only” 154,000 records in here. But a good metadata format should be able to load a million rows of structured data in under a second, not in nine hours. This data could probably have been released in CSV on the Web, or JSON-LD, or some other format where this process would take a minute or two.
Anyhow–nine hours is too long for me because it’s the morning. I’ll split this up into multiple processes that work on batches of 25,000 at a time, and set it running in a loop.
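Roughly like this–a sketch that assumes the raw metadata strings are already sitting in a list called raw_records, which isn’t quite how I have them, but the shape is the same:

from multiprocessing import Pool

def parse_batch(batch):
    # Flatten one batch of raw JSON-LD strings into dicts of Dublin Core fields.
    return [parse_row(d) for d in batch]

batch_size = 25_000
batches = [raw_records[i:i + batch_size]
           for i in range(0, len(raw_records), batch_size)]

with Pool(processes=8) as pool:
    parsed = [row for batch in pool.map(parse_batch, batches) for row in batch]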
And I’m back! So now I’ve got data and I’ve got texts. Joining these together is pretty easy–I just pull apart the IIIF ID and merge them in. Now I need to figure out how to distribute these to the student. These are big–too big, probably, to simply slap into an e-mail.
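The join looks roughly like this in pandas (a sketch, not necessarily what I actually ran; the record_id column and the ‘Title - date’ split are assumptions about how the pieces fit together):

import pandas as pd

# pages_df holds the "ids" and "pages" columns from the parquet files above;
# meta_df holds one parse_row() dict per issue plus that record's own identifier,
# here called "record_id" (an assumption about how it gets carried through).
pages_df[["collection", "issue", "page"]] = pages_df["ids"].str.split("/", expand=True)
meta_df["issue"] = meta_df["record_id"].str.split("/").str[-1]
meta_df["paper"] = meta_df["dc:title"].str.rsplit(" - ", n=1).str[0]

joined = pages_df.merge(meta_df, on="issue")

# One chronologically sorted file per newspaper.
for paper, group in joined.groupby("paper"):
    ordered = group.assign(page=group["page"].astype(int)).sort_values(["dc:issued", "page"])
    ordered.to_parquet(f"{paper}.parquet")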
But luckily, I set up a static hosting service on Google a few months ago, so I can just upload them there. I’ve created files for all of these newspapers now. So we’ve got one for the student, but also for you.
file | start date | pages | issues | end date | compressed size | link |
---|---|---|---|---|---|---|
Figaro | 1857-01-04 | 5374 | 574 | 1875-12-25 | 9.4 MB | download |
Tages-Post | 1865-01-18 | 10089 | 2082 | 1875-12-31 | 51.0 MB | download |
Salzburger Volksblatt: die unabhängige Tageszeitung für Stadt und Land Salzburg | 1871-01-03 | 3170 | 636 | 1875-12-24 | 10.2 MB | download |
Nasa Sloga | 1870-06-01 | 322 | 79 | 1875-11-16 | 0.9 MB | download |
Wienerische Kirchenzeitung | 1784-01-24 | 1788 | 214 | 1789-12-24 | 2.4 MB | download |
Feldkircher Zeitung | 1861-08-03 | 3987 | 960 | 1875-12-29 | 11.8 MB | download |
Österreichische Buchhändler-Correspondenz | 1860-02-01 | 4154 | 421 | 1875-12-25 | 7.8 MB | download |
Volksblatt für Stadt und Land | 1871-11-09 | 4405 | 319 | 1875-12-31 | 20.9 MB | download |
Teplitz-Schönauer Anzeiger | 1861-05-01 | 6744 | 536 | 1875-12-18 | 13.9 MB | download |
Linzer Volksblatt | 1870-01-03 | 5256 | 1190 | 1875-12-29 | 22.1 MB | download |
Extract-Schreiben oder Europaeische Zeitung | 1700-12-01 | 16 | 2 | 1700-12-04 | 0.0 MB | download |
Grazer Volksblatt | 1868-01-02 | 13692 | 1495 | 1875-12-30 | 49.1 MB | download |
Nordböhmisches Volksblatt | 1873-10-04 | 42 | 7 | 1873-12-13 | 0.2 MB | download |
Agramer Zeitung | 1841-01-06 | 6943 | 1286 | 1858-06-30 | 21.7 MB | download |
Neuigkeits-Welt-Blatt | 1874-01-06 | 7104 | 425 | 1875-12-31 | 29.2 MB | download |
Die Neuzeit | 1861-09-13 | 4012 | 339 | 1872-12-20 | 9.3 MB | download |
Eideseis dia ta anatolika mere | 1811-07-05 | 216 | 27 | 1811-11-19 | 0.2 MB | download |
Die Debatte | 1864-11-13 | 5260 | 1073 | 1869-09-30 | 52.5 MB | download |
Die Bombe | 1871-01-08 | 1512 | 163 | 1875-12-31 | 4.1 MB | download |
Znaimer Wochenblatt | 1858-01-17 | 4986 | 569 | 1875-12-24 | 14.2 MB | download |
Zeitschrift für Notariat und freiwillige Gerichtsbarkeit in Österreich | 1868-01-08 | 1368 | 260 | 1875-12-29 | 3.0 MB | download |
Frauenblätter | 1872-01-01 | 285 | 17 | 1872-12-15 | 0.5 MB | download |
Populäre österreichische Gesundheits-Zeitung | 1830-05-26 | 4337 | 685 | 1840-12-31 | 5.2 MB | download |
Union | 1872-01-07 | 342 | 83 | 1874-11-15 | 2.6 MB | download |
Prager Abendblatt | 1867-01-02 | 9432 | 1697 | 1875-12-22 | 28.4 MB | download |
Kikeriki | 1861-11-14 | 3442 | 592 | 1875-12-30 | 7.9 MB | download |
Vorarlberger Landes-Zeitung | 1863-08-11 | 5402 | 1219 | 1875-12-28 | 15.9 MB | download |
Hermes ho logios | 1811-02-01 | 2791 | 114 | 1819-12-15 | 3.4 MB | download |
Philologikos telegraphos | 1817-01-01 | 400 | 84 | 1820-12-15 | 0.9 MB | download |
Oesterreichisches Journal | 1870-08-06 | 2854 | 305 | 1875-12-15 | 12.4 MB | download |
Weltausstellung: Wiener Weltausstellungs-Zeitung | 1871-08-18 | 1446 | 233 | 1875-11-19 | 5.0 MB | download |
Der Floh | 1869-01-01 | 1893 | 193 | 1875-12-19 | 6.3 MB | download |
Wiener Abendzeitung | 1848-03-28 | 438 | 106 | 1848-10-24 | 0.6 MB | download |
Feldkircher Anzeiger | 1866-01-02 | 1498 | 239 | 1875-12-21 | 1.0 MB | download |
Allgemeine Österreichische Gerichtszeitung | 1851-01-03 | 9182 | 2233 | 1875-12-31 | 22.1 MB | download |
Leitmeritzer Zeitung | 1871-07-08 | 2530 | 285 | 1875-12-31 | 7.3 MB | download |
Feldkircher Wochenblatt | 1810-02-13 | 3762 | 743 | 1857-12-22 | 2.9 MB | download |
Politische Frauen-Zeitung | 1869-10-17 | 568 | 69 | 1871-12-31 | 1.8 MB | download |
Militär-Zeitung | 1849-07-03 | 12170 | 1628 | 1875-12-08 | 35.3 MB | download |
Ellēnikos tēlegraphos: ētoi eidēseis dia ta anatolika mere | 1812-01-03 | 5343 | 1182 | 1836-12-27 | 10.9 MB | download |
Blätter für Musik, Theater und Kunst | 1855-02-02 | 4840 | 1196 | 1873-12-27 | 16.8 MB | download |
Cur-Liste Bad Ischl | 1842-06-02 | 3998 | 646 | 1875-09-11 | 2.7 MB | download |
Innsbrucker Nachrichten | 1854-01-26 | 42010 | 4330 | 1875-12-31 | 36.4 MB | download |
Der Humorist | 1837-01-02 | 18850 | 4430 | 1862-05-03 | 55.3 MB | download |
Bregenzer Wochenblatt | 1793-03-15 | 8739 | 1725 | 1863-07-28 | 9.4 MB | download |
Ephemeris | 1791-01-03 | 2774 | 311 | 1797-12-11 | 2.7 MB | download |
Wiener Sonntags-Zeitung | 1867-01-01 | 4326 | 589 | 1875-12-26 | 20.5 MB | download |
Österreichische Zeitschrift für Verwaltung | 1868-01-02 | 1130 | 280 | 1875-12-30 | 2.6 MB | download |
Vorarlberger Zeitung | 1849-04-06 | 272 | 67 | 1850-03-22 | 0.6 MB | download |
Die Gartenlaube für Österreich | 1867-01-28 | 937 | 67 | 1869-04-19 | 2.5 MB | download |
Allgemeine land- und forstwirthschaftliche Zeitung | 1851-07-05 | 3742 | 301 | 1867-12-27 | 7.1 MB | download |
Wiener Vororte-Zeitung | 1875-02-15 | 52 | 13 | 1875-11-01 | 0.3 MB | download |
Siebenbürgisch-deutsches Wochenblatt | 1868-06-10 | 3182 | 193 | 1873-12-31 | 7.3 MB | download |
Neue Wiener Musik-Zeitung | 1852-01-15 | 1289 | 312 | 1860-12-29 | 3.8 MB | download |
Österreichische Badezeitung | 1872-04-14 | 600 | 54 | 1875-08-22 | 1.6 MB | download |
Deutsche Zeitung | 1872-04-02 | 9284 | 604 | 1874-12-29 | 63.3 MB | download |
Internationale Ausstellungs-Zeitung | 1873-05-02 | 492 | 79 | 1873-09-30 | 3.1 MB | download |
Janus | 1818-10-10 | 236 | 52 | 1819-06-30 | 0.4 MB | download |
Wiener Moden-Zeitung | 1862-01-01 | 126 | 13 | 1863-07-15 | 0.3 MB | download |
Die Emancipation | 1875-04-22 | 64 | 8 | 1875-05-25 | 0.1 MB | download |
Die Vedette | 1869-11-01 | 3253 | 187 | 1875-12-19 | 5.8 MB | download |
Salzburger Chronik | 1873-07-01 | 986 | 238 | 1875-12-30 | 3.1 MB | download |
Wiener Feuerwehr-Zeitung | 1871-01-01 | 336 | 78 | 1875-12-15 | 0.7 MB | download |
Gerichtshalle | 1857-03-30 | 6132 | 1005 | 1875-12-23 | 14.6 MB | download |
Illustrirtes Wiener Extrablatt | 1872-03-24 | 6354 | 662 | 1875-12-31 | 29.7 MB | download |
Wiener Salonblatt | 1870-03-13 | 2170 | 138 | 1875-12-24 | 5.0 MB | download |
Sonntagsblätter | 1842-01-16 | 5277 | 227 | 1848-09-17 | 6.1 MB | download |
Wiener Theater-Zeitung | 1806-07-15 | 14345 | 3110 | 1838-12-29 | 33.5 MB | download |
Wiener Landwirtschaftliche Zeitung | 1868-01-03 | 746 | 76 | 1869-12-18 | 2.3 MB | download |
Vorarlberger Volks-Blatt | 1866-06-15 | 4143 | 644 | 1875-12-31 | 10.0 MB | download |
Marburger Zeitung | 1862-04-13 | 447 | 104 | 1870-11-30 | 1.6 MB | download |
Vaterländische Blätter für den österreichischen Kaiserstaat | 1808-05-10 | 5861 | 816 | 1820-12-27 | 9.0 MB | download |
Freie Pädagogische Blätter | 1867-01-19 | 5136 | 316 | 1875-12-25 | 7.0 MB | download |
Jörgel Briefe | 1852-01-02 | 14086 | 757 | 1875-12-06 | 13.0 MB | download |
Österreichische Feuerwehrzeitung | 1865-08-15 | 430 | 95 | 1872-06-02 | 1.2 MB | download |
Österreichische Buchdrucker-Zeitung | 1873-02-11 | 675 | 96 | 1875-12-30 | 1.9 MB | download |