Nov 24 2024

I was talking on Bluesky about why I dislike the widespread use of alphabetical ordering for states on the y-axis of charts. There are better ways! My favorite is detailed in this notebook, where I talk through some methods for treating paths. I have an interactive tool for building out paths like this one, which is a decent way to order all the countries in the world for data visualization.

Dec 26 2023

Although I've given up on historically professing myself, I still have a number of automated scripts for analyzing the state of the historical profession hanging around. Since a number of people have asked for updates, it seems worth doing. As a reminder, I'm scraping H-Net for listings. When I've looked at job ads from the American Historical Association's website, they seem roughly comparable.

May 11 2023

Blaming the humanities fields for their travails recently can seem as sensible as blaming polar bears for not cultivating new crops as the arctic warms. It's not just that it places the blame for a crisis in the fundamentally wrong place; it's that it
It's coming up on a year since I last taught graduate students in the humanities.

Apr 20 2023

Last week we released a big data visualization in collaboration with the Berens Lab at the University of Tübingen. It presents a rich, new interface for exploring an extremely large textual collection.

Apr 07 2023

Yesterday was a big day for the Web: Chrome just shipped WebGPU without flags in the Beta for Version 113. Someone on Nomic's GPT4All discord asked me to ELI5 what this means, so I'm going to cross-post it here. It's more important than you'd think for both visualization and ML people. (thread)

Mar 22 2023

This is a Twitter thread from March 14 that I'm cross-posting here. Nothing massively original below. It went viral because I was one of the first to extract the ridiculous paragraph below from on the release of GPT-4, and because it expresses some widely shared concerns.

Mar 04 2023

Recently, Marymount, a small Catholic university in Arlington, Virginia, has been in the news for a draconian plan to eliminate a number of majors, ostensibly to better meet student demand. I recently learned the university leadership has been circulating one of my charts to justify the decision, so I thought I'd chime in on the context a bit. My understanding of the situation, primarily informed by the coverage in ARLNow, is that this seems like a bad plan, so I thought I'd take a quick look at the university's situation.

Feb 19 2023

I sure don't fully understand how large language models work, but in that I'm not alone. But in the discourse over the last week over the Bing/Sydney chatbot there's one pretty basic category error I've noticed a lot of people making. It's thinking that there's some entity that you're talking to when you chat with a chatbot. Blake Lemoine, the Google employee who torched his career over the misguided belief that a Google chatbot was sentient, was the first but surely not the last of what will be an increasing number of people thinking that they've talked to a ghost in the machine.

Jan 07 2023

I attended the American Historical Association's conference last week, possibly for the last time since I've given up history professorin'. Since then, the collapse of the hiring prospects in history has been on my mind more. See Erin Bartram, Kathryn Otrofsky and Daniel Bessner on the way that this AHA was haunted by a sense of terminal decline in the history profession. I was motivated to look a bit at something I've thought about several times over the years: what happens to people after receiving a PhD in history?

Jan 01 2023

The collapse of Twitter under Elon Musk over the last few months feels, in my corner of the universe, like something potentially a little more germinal; unlike in the various Facebook exoduses of the 2010s, I see people grasping towards different models of the architecture of the Web. Mastodon itself (I've ended up at @benmschmidt@vis.social for the time being) seems so obviously imperfect as for its imperfections to be a selling point; it's so hard to imagine social media staying on a Rails application for the next decade that using it feels like a bet on the future, because everyone now knows they need to be prepared to migrate again.

Oct 27 2022

I'm excited to finally share some news: I've resigned my position on the NYU faculty and started working full time as Vice President of Information Design at Nomic, a startup helping people explore, visualize, and interact with massive vector datasets in their browser.

Oct 07 2022

When you teach programming skills to people with the goal that they'll be able to use them, the most important obligation is not to waste their time or make things seem more complicated than they are. This should be obvious. But when I'm helping humanists decide what workshops to take, reviewing introductory materials for classes, or browsing tutorials to adapt for teaching, I see the same violation of the principle again and again. Introductory tutorials waste enormous amounts of time vainly covering ways of accomplishing tasks that not only have absolutely no use for beginners, but which will confuse learners by making them

Apr 19 2022

It's not very hard to get individual texts in digital form. But working with grad students in the humanities looking for large sets of texts to do analysis across, I find that larger corpora are so hodgepodge as to be almost completely unusable. For humanists and ordinary people to work with large textual collections, they need to be distributed in ways that are actually accessible, not just open access.

Mar 28 2022

I've never done the Day of DH tradition where people explain what, exactly, it means to have a job in digital humanities. But today looks to be a pretty DH-full day, so I think, in these last days of Twitter, I'll give it a shot. (thread)

We'll start it at the beginning: 1:30 or so AM, finally sent out an e-mail I'd been procrastinating on to the college grants administrator for a public humanities project about immigrant histories I'm running with @ellennoonan and Sibylle Fischer.

We've had NYU funding as a Bennett-Polonksy Humanities Lab (https://nyuhumanities.org/program/asylum-h-lab-2020-2021/) to this point, but presenting to the history department last month clarified the use in making one of our primary sorts of records, A-files, more accessible to historians and family researchers.

But that will take some real institutional support, because the stuff we've obtained (legally!) from US customs and immigration in our trial run is so shockingly personal in a lot of cases that I can't really share it yet.

("Yet" is the wrong word: can't ethically share in my lifetime, probably. But there are still really important reasons to work on auditing these records especially. If you're a naturalized citizen or permanent resident and want any help getting your own A-file, let me know!)

OK, skipping to about 9:50 AM. (Late start b/c the first-grader had a school event and my wife teaches Thursday AM). Today's first teaching, for my class https://benschmidt.org/WWD22, will be focused on 19C directories from the NYPL.

Nick Wolf and @bertspaan digitized these years ago, but there's more to do with them. A couple weeks ago @SWrightKennedy shared a preview of Columbia's great new geolocation data about 19C New York https://mappinghny.com/about/

And yesterday I finally pushed a full pipeline bringing the last two weeks of student work together for doing geo-matching and cleaning of these to the github repo. https://github.com/HumanitiesDataAnalysis/Directories . This should allow some amazing analysis of economic geography, name types, etc.

So now we've got 8.3m individual people for every year from 1850-1889 queued up and ready for a variety of analyses. I want to send the students a map to show how all their R code is paying off, but the deepscatter module is breaking: only one of the filters is working here.

I spend 40 minutes poking in the web code there to try to refactor the code to get the interface working right, but this isn't really relevant for the class right now; more something for the summer, I guess. So I give up and decide to do this DH tweeting instead.

Because of the whole "Twitter is almost over" thing, and some lingering guilt about not blogging enough, I decide that a Day of DH post should really be a blog first, so let's finally structure some markdown for a twitter thread that can go on benschmidt.org.

It takes a surprising amount of mucking around with the svelte-kit settings to get things publishing correctly, and I have to remember my own markdown naming conventions. But after a few minutes, we've got full recursion. https://benschmidt.org/post/2022-03-28-day-of-dh/day-of-dh-22/

Whoops, or not. Time to muck with svelte-kit a little more.

Well, this is embarrassing but typical. Turns out there was a bug in the bleeding-edge svelte-kit build that broke trailing slash behavior in URLs. Because https://benschmidt.org/post/2022-03-19-better-texts/ is different from https://benschmidt.org/post/2022-03-19-better-texts. Finally fixed.

Insane levels of debugging is a real pain and occupational hazard. But to be honest, I don't know how anyone could responsibly teach this stuff without doing this sort of rebuilding and rescaling all the time. Every one of those things is kind of interesting and builds up ability to fix others' code.

Insane levels of debugging is a real pain and occupational hazard. But I don't know how you can responsibly teach this stuff without these frequent rabbit holes. Every one of those things is kind of interesting and builds up ability to fix others' code.

Feb 28 2022

There are programming languages that people use for money, and programming languages people use for love. There are "Weekend at Bernie's"/Jeremy Bentham corpses that you prop up for the cash, and there are "Rose for Emily" corpses you sleep with every night for decades because it's too painful to admit that the best version of your life you ever glimpsed is not going to happen.

Jan 22 2022

I've been spending more time in the last year exploring modern web stacks, and have started evangelizing for SvelteKit, which is a new-ish entry into the often-mystifying world of web frameworks. As of today, I've migrated this personal web site from Hugo, which I've been using the last couple years, to sveltekit. Let me know if you encounter any broken links, unexpected behavior, accessibility issues, etc. I figured here I'd give a brief explanation of why sveltekit, and how I did a Hugo-to-SvelteKit migration.

Sep 15 2021

Scott Enderle is one of the rare people whose Twitter pages I frequently visit, apropos of nothing, just to read in reverse. A few months ago, I realized he had at some point changed his profile to include the two words "increasingly stealthy." He had told me he had cancer months earlier, warning that he might occasionally drop out of communication on a project we were working on. I didn't then parse out all the other details of the page: that he had replaced his Twitter mugshot with a photo of a tree reaching to the sky, that the last retweet was my friend Johanna introducing a journal issue about interpretive difficulty, the problems literary scholars, for all their struggles to make sense, simply can't solve. I only knew (and immediately stuffed down the knowledge) that things must have gotten worse.

Jun 07 2021

This article in the New Yorker about the end of genre prompts me to share a theory I've had for a year or so that models at Spotify, Netflix, etc, are most likely not just removing artificial silos that old media companies imposed on us, but actively destroying genre without much pushback. I'm curious what you think.

May 20 2021

I've been yammering online about the distinctions between different entities in the landscape of digital publishing and access, especially for digital scholarship on text. So I've collected everything I've learned over the last 10 years into one handy-to-use chart on a 10-year-old meme. The big points here are:

Apr 28 2021

I mentioned earlier that I've been doing some work on the old Bookworm project as I see that there's nothing else that occupies quite the same spot in the world of public-facing, nonconsumptive text tools.

Mar 08 2021

Mar 07 2021

I used to blog everything that I did about a project like Bookworm, but have got out of the habit. There are some useful changes coming through the pipeline, so I thought I'd try to keep track of them, partly to update on some of the more widely used installations and partly

Nov 12 2020

I last looked at the H-Net job numbers about a month ago.

Oct 01 2020

Out of a train-wreck curiosity about what's been happening to the historical profession, I've been watching the numbers on tenure-track hiring as posted on H-Net, one of the major venues for listing history jobs.

Sep 01 2020

I've been doing a lot of my data exploration lately on Observable Notebooks, which is (sort of) a Javascript version of Jupyter notebooks that automatically runs all the code inline. Married with Vega-Lite or D3, it provides a way to make data exploration editable and shareable in a way that R and python data code simply can't be; and since it's all HTML, you can do more interesting things.

Aug 28 2020

Every year, I run the numbers to see how college degrees are changing. The Department of Education released this summer the figures for 2019; these and next year's are probably the least important that we'll ever see, since they capture the weird period as the 2008 recession's shakeout was wrapping up but before COVID-19 upended everything once again. But for completism, it's worth seeing how things changed.

Jul 28 2020

Ranking Graduate Programs

While I was choosing graduate programs back in 2005, I decided to come up with my own ranking system. I had been reading about the Google PageRank algorithm, which essentially imagines the web as a bunch of random browsing sessions that rank pages based on the likelihood that you (after clicking around at random for a few years) will end up on any given page. It occurred to me that you could model graduate school rankings the same way. It's essentially a four-step process:

  1. Pick a random department in the United States.

  2. Pick a random faculty member from that department.

  3. Go to that faculty member's graduate department.

  4. 90% of the time, return to step 2; 10% of the time, return to step 1.

At the end of each stage, you'll be in a different department; but the more prestigiously any given department's faculty are placed, the more likely you are to be there.

Using transition matrices, these numbers converge after a relatively short period.
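The four steps above can be sketched as a short power iteration. This is a minimal illustration in Python, not the original R code: the function name `placement_rank`, the `(employer, phd)` input format, and the toy departments A, B and C are all assumptions made up for the example, as is the choice to treat a department with no listed faculty as an immediate random restart.

```python
from collections import Counter, defaultdict


def placement_rank(faculty, damping=0.9, tol=1e-12, max_iter=1000):
    """Rank departments from (employing_department, phd_department) pairs.

    Hypothetical helper: the name and input format are illustrative only.
    """
    depts = sorted({d for pair in faculty for d in pair})
    # hires[d] counts where department d's faculty earned their PhDs.
    hires = defaultdict(Counter)
    for employer, phd in faculty:
        hires[employer][phd] += 1

    n = len(depts)
    rank = {d: 1.0 / n for d in depts}  # step 1: start anywhere at random
    for _ in range(max_iter):
        # Step 4, the 10% branch: jump back to a random department.
        new = {d: (1.0 - damping) / n for d in depts}
        for d in depts:
            total = sum(hires[d].values())
            if total == 0:
                # Assumption: no listed faculty means a random restart.
                for e in depts:
                    new[e] += damping * rank[d] / n
                continue
            # Steps 2 and 3: follow a random faculty member of d
            # back to the department that granted their PhD.
            for phd, count in hires[d].items():
                new[phd] += damping * rank[d] * count / total
        delta = sum(abs(new[d] - rank[d]) for d in depts)
        rank = new
        if delta < tol:
            break
    return rank


# Toy data: each pair is (where someone teaches, where they got the PhD).
toy = [
    ("A", "A"), ("A", "B"),
    ("B", "A"), ("B", "A"),
    ("C", "A"), ("C", "B"),
]
scores = placement_rank(toy)
for dept, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{dept}: {100 * p:.1f}")
```

Each pass of the loop is one multiplication by the transition matrix, so the scores settle once successive passes stop changing. In the toy data nobody was trained at C, so C only ever collects the random-restart share.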

I ran it on history departments, but have never circulated the history scores. (Rankings make people mad, and the benefit seems worse than the cost.) But one of my roommates at the time, Matthew Chingos, was already moving towards working in higher education policy and grad school in political science, so we wrote up a paper applying it to Political Science departments and published it in PS in 2007. (Schmidt, B., & Chingos, M. (2007). Ranking Doctoral Programs by Placement: A New Method. PS: Political Science & Politics, 40(3), 523-529. doi:10.1017/S1049096507070771)

It's a pretty simple method, but I still occasionally get questions about it, the data, and the underlying code. As I recall, the political science data was viewed as slightly sensitive, so the arrangement we made with the American Political Science Association was that they would handle requests for the data and we would only provide code.

This was in 2005, so reproducibility was not a worry; nowadays, you'd put all this stuff on github. In response to a recent request, I've just done that.

The core code was interesting to look at, because it's stuff I wrote in R fifteen years ago. It basically seems to still work, but it has little in common with how I'd handle the problem nowadays.

Ranking Computer Science Programs as of 2015

Still, the proof is in the eating. So I went looking for some new data to try it on. On the theory that computer science faculty are too distracted by their overwhelming course sizes and endless parade of job searches to be bothered by this, I'll do them.

Alexandra Papoutsaki et al. created a crowdsourced dataset of CS faculty that they expect to be 80% correct at Brown. They seem to have updated a version that's sitting inside a Github repository here, so that's what I've used. I'm using placements that are from 2005-2015 here.

| school | p |
|---|---|
| University of California - Berkeley | 17.2835408 |
| Massachusetts Institute of Technology | 16.6558147 |
| Stanford University | 9.8659918 |
| Carnegie Mellon University | 7.9750700 |
| University of Washington | 4.5314467 |
| Cornell University | 3.4656622 |
| Princeton University | 2.9223387 |
| University of Texas - Austin | 2.5394603 |
| Columbia University | 2.3110282 |
| University of California - Santa Barbara | 2.0507537 |
| California Institute of Technology | 1.9028543 |
| Georgia Institute of Technology | 1.5902598 |
| University of Illinois at Urbana-Champaign | 1.5324409 |
| University of California - Los Angeles | 1.5238573 |
| University of California - San Diego | 1.2106396 |
| University of Maryland - College Park | 1.1716862 |
| University of Pennsylvania | 1.0691726 |
| Brown University | 1.0167585 |
| University of North Carolina - Chapel Hill | 0.9371394 |
| University of Michigan | 0.9263730 |
| University of Minnesota - Twin Cities | 0.7845679 |
| Harvard University | 0.7668788 |
| New York University | 0.7561730 |
| University of Wisconsin - Madison | 0.7021781 |
| University of Massachusetts - Amherst | 0.6569323 |
| Purdue University | 0.6213802 |
| University of Chicago | 0.6157431 |
| Rice University | 0.6154933 |
| Johns Hopkins University | 0.5860418 |
| University of Virginia | 0.5794159 |

There is nothing shocking, as an outsider, here, which is good. Technical schools are pretty high up, and my current employer is on the list and right next to Harvard. Nobody ever got in trouble for saying their school is as good as Harvard, even when Harvard is (as in CS) not so hot.

Feb 26 2020

As I often do, I'm going to pull away from various forms of Internet reading/engagement through Lent. This year, this brings to mind one of my favorite stray observations about digital libraries that I've never posted anywhere.

Dec 05 2019

(This is a talk from a January 2019 panel at the annual meeting of the American Historical Association. You probably need to know, to read it, that the MLA conference was simultaneously taking place about 20 blocks north.)

Jun 30 2019

Since 2010, I've done most of my web hosting the way that the Internet was built to facilitate: from a computer under the desk in my office. This worked extremely well for me, and made it possible to rapidly prototype a lot of websites serving large amounts of data which could then stay up indefinitely; I have a curmudgeonly resistance to cloud servers, although I have used them a bit in the last few years (mostly for course websites where I wanted to keep student information separate from the big stew.)

May 03 2019

Some news: in September, I'll be starting a new job as Director of Digital Humanities at NYU. There's a wide variety of exciting work going on across the Faculty of Arts and Sciences, which is where my work will be based; and the university as a whole has an amazing array of programs that might be called "Digital Humanities" at another university, as well as an exciting new center for Data Science. I'll be helping the humanities better use all the advantages offered in this landscape. I'll also be teaching as a clinical associate professor in the history department.

Mar 19 2019

Critical Inquiry has posted an article by Nan Da offering a critique of some subset of digital humanities that she calls "Computational Literary Studies," or CLS. The premise of the article is to demonstrate the poverty of the field by showing that the new structure of CLS is easily dismantled by the master's own tools. It appears to have succeeded enough at gaining attention that it clearly does some kind of work far outsize to the merits of the article itself.

Dec 03 2018

I wrote this year's report on history majors for the American Historical Association's magazine, Perspectives on History; it takes a medium-term view of the significant hit the history major has taken since the 2008 financial crisis. You can read it here.

Oct 30 2018

As part of the Creating Data project, I've been doing a lot of work lately with interactive scatterplots. The most interesting of them is this one about the full Hathi collection. But I've posted a few more I want to link to from here:

Oct 22 2018

I have a new article on dimensionality reduction on massive digital libraries this month. Because it's a technique with applications beyond the specific tasks outlined there, I want to link to a few things here.

Oct 21 2018

I'm switching this site over from Wordpress to Hugo, which makes it easier for me to maintain.

Aug 31 2018

I have a new article in the Atlantic about declining numbers for humanities majors.

Jul 30 2018

In short, it's been bad enough to make me recant earlier statements of mine about the long-term health of the humanities discipline.

Mar 19 2016

This is some real inside baseball; I think only two or three people will be interested in this post. But I'm hoping to get one of them to act out or criticize a quick idea. This started as a comment on Scott Enderle's blog, but then I realized that Andrew Goldstone doesn't have comments for the parts pertaining to him. Anyway.

Jun 12 2015

I've gotten a couple e-mails this week from people asking advice about what sort of computers they should buy for digital humanities research. That makes me think there aren't enough resources online for this, so I'm posting my general advice here. (For some solid other perspectives, see here). For keyword optimization I'm calling this post "digital humanities." But, obviously, I really mean the subset that is humanities computing, what I tend to call humanities data analysis. [Edit: To be clear, ] Moreover, the guidelines here are specifically tailored for text analysis; if you are working with images, you'll have somewhat different needs (in particular, you may need a better graphics card). If you do GIS, god help you. I don't do any serious social network analysis, but I think the guidelines below should work relatively well with Gephi.

Apr 03 2015

Practically everyone in Digital Humanities has been posting increasingly epistemological reflections on Matt Jockers' Syuzhet package since Annie Swafford posted a set of critiques of its assumptions. I've been drafting and redrafting one myself. One of the major reasons I haven't posted it is that the obligatory list of links keeps growing. Suffice it to say that this here is not a broad methodological disputation, but rather a single idea crystallized after reading Scott Enderle on "sine waves of sentiment." I'll say what this all means for the epistemology of the Digital Humanities in a different post, to the extent that that's helpful.

Feb 06 2015

Just some quick FAQs on my professor evaluations visualization: adding new ones to the front, so start with 1 if you want the important ones.

Dec 12 2014

I promised Matt Jockers I'd put together a slightly longer explanation of the weird constraints I've imposed on myself for topic models in the Bookworm system, like those I used to look at the breakdown of typical TV show episode structures. So here they are.

Nov 07 2014

Oct 23 2014

I've been thinking a little more about how to work with the topic modeling extension I recently built for bookworm. (I'm curious if any of those running installations want to try it on their own corpus.) With the movie corpus, it is most interesting split across genre; but there are definite temporal dimensions as well. As I've said before, I have issues with the widespread practice of just plotting trends over time; and indeed, for the movie model I ran, nothing particularly interesting pops out. (I invite you, of course, to tell me how it is interesting.)

Sep 23 2014

I've been seeing how deeply we could integrate topic models into the underlying Bookworm architecture a bit lately.

Sep 11 2014

This is a post about several different things, but maybe it's got something for everyone. It starts with 1) some thoughts on why we want comparisons between seasons of the Simpsons, hits on 2) some previews of some yet-more-interesting Bookworm browsers out there, then 3) digs into some meaty comparisons about what changes about the Simpsons over time, before finally 4) talking about the internal story structure of the Simpsons and what these tools can tell us about narrative formalism, and maybe why I'd care.

Sep 05 2014

Like many technically inclined historians (for instance, Caleb McDaniel, Jason Heppler, and Lincoln Mullen) I find that I've increasingly been using the plain-text format Markdown for almost all of my writing.

Aug 29 2014

I thought it would be worth documenting the difficulty (or lack of) in building a Bookworm on a small corpus: I've been reading too much lately about the Simpsons thanks to the FX marathon, so figured I'd spend a couple hours making it possible to check for changing language in the longest running TV show of all time.

Jun 05 2014

Here's a very technical, but kind of fun, problem: what's the optimal order for a list of geographical elements, like the states of the USA?

Mar 27 2014

String distance measurements are useful for cleaning up the sort of messy data from multiple sources.
