Revised 27 August 2017

Work in Progress: Word frequencies

This page describes some of my current work in progress on word frequencies. The visualizations are experiments with the general types of visualization that I might use. I have included only the ones that made the first cut. As I elaborate these static visualizations, I also keep in mind what kinds of interactions would be relevant.

EBB and RB word frequencies

When we have a collection of letters, one of the main things we’re interested in is exploring what they talk about. Word frequencies are a first step in that exploration.

Of course, “small” words like the, a, etc. are common. But with natural language processing, we can identify nouns, pronouns, verbs, etc., and look at them separately. We can also group together the different forms of words (singulars and plurals, past and present tense, …) under their dictionary form (the lemma). That’s what I’ve done here.
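The grouping described above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used for these visualizations: it assumes the tokens have already been tagged and lemmatized by some NLP tool, and the sample tokens, the tag names (NOUN, VERB, ...) and the function name lemma_counts_by_pos are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged tokens: (surface form, part of speech, lemma).
# In practice these would come from an NLP tagger and lemmatizer.
tagged_tokens = [
    ("letters", "NOUN", "letter"),
    ("letter", "NOUN", "letter"),
    ("wrote", "VERB", "write"),
    ("writes", "VERB", "write"),
    ("the", "DET", "the"),
    ("friend", "NOUN", "friend"),
    ("she", "PRON", "she"),
]

def lemma_counts_by_pos(tokens):
    """Count lemmas separately for each part of speech."""
    counts = defaultdict(Counter)
    for _form, pos, lemma in tokens:
        # Different surface forms (letter, letters) share one lemma,
        # so they add to the same count.
        counts[pos][lemma] += 1
    return counts

counts = lemma_counts_by_pos(tagged_tokens)

# The most frequent nouns, highest count first (at most 20).
top_nouns = counts["NOUN"].most_common(20)
```

Keeping one counter per part of speech means the “small” words end up in their own groups (determiners, prepositions, ...) and do not crowd out the nouns or verbs.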

As with many of the other examples in the demo videos, I am using the letters between Elizabeth Barrett (EBB, or just E) and Robert Browning (RB, or just R).

Comparing word rankings

The first type of visualization is a slope graph popularized by Edward Tufte. It lets us easily compare the rankings (by frequency) between EBB and RB.


First up are nouns. I’m using just the top 20 nouns for each. Lines connect the same word (lemma) for E and R; where there is no line, the word does not occur in the other person’s top 20 nouns. For example, Ba (a nickname for EBB) is R’s top noun, but it is not one of EBB’s top 20, even though she signs many of her letters “Ba”. Looking at the missing lines, we can see that EBB’s and RB’s top 20 nouns are far from overlapping completely.

When we do have a line between EBB’s and RB’s sides, the slope of the line indicates how different their rankings are. For example, Mr. is the most frequent noun for EBB, but only 12th for RB. On the other hand, they both have letter in second place.
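The data behind such a slope graph can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code that produced the graphs: the counts are toy numbers invented for the example (not the real figures from the letters), and rank_top and slope_rows are hypothetical helper names. A rank of None on one side corresponds to a missing line.

```python
def rank_top(counts, n=20):
    """Map each of the n most frequent lemmas to its rank (1 = most frequent)."""
    # Sort by descending count; break ties alphabetically so ranks are stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {lemma: rank for rank, (lemma, _) in enumerate(ordered[:n], start=1)}

def slope_rows(left_counts, right_counts, n=20):
    """Pair each lemma's rank on the left and right sides of a slope graph.

    A rank of None means the lemma is not in that side's top n,
    so no line would be drawn for it.
    """
    left = rank_top(left_counts, n)
    right = rank_top(right_counts, n)
    return [
        (lemma, left.get(lemma), right.get(lemma))
        for lemma in sorted(set(left) | set(right))
    ]

# Toy counts, invented for illustration only.
e_counts = {"Mr.": 9, "letter": 7, "day": 5, "word": 2}
r_counts = {"Ba": 8, "letter": 6, "day": 3, "Mr.": 1}

rows = slope_rows(e_counts, r_counts, n=3)

# Lemmas ranked on both sides get a line; its slope is the rank difference.
shared = [(lemma, r - l) for lemma, l, r in rows if l is not None and r is not None]
```

Counting the rows that have a rank on both sides gives a simple measure of overlap between the two top-n lists, and the rank difference gives the steepness of each line.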

Main Verbs

Main verbs show a slightly different pattern: here there is much more overlap between their top 20 verbs, as seen by the relatively few missing lines. However, their rankings do differ.


Pronouns show a mixed pattern: the top 8 pronouns are very similar, while the rest differ more.