Chris Culy

Work in Progress: Key Words

This page continues some of my current work in progress, in particular exploring what letters are about. The first part looked at word frequencies, but simple word frequencies don’t tell the whole story. We used variance to get one idea of the variability of word frequencies. Another idea is that of key words, words which “clump” in certain letters. A typical measure of key words is the ratio of the relative frequency of the word in the letter to the number of letters it occurs in (called TFIDF, short for “term frequency, inverse document frequency”). The higher the ratio, the more special the word is for that letter.
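To make the measure concrete, here is a minimal sketch in Python of the ratio just described. It is only an illustration under stated assumptions, not the code behind the graphs on this page: the function name key_word_scores and the toy letters are made up, and many TFIDF formulations use a log-scaled inverse document frequency rather than a plain ratio.

```python
from collections import Counter

def key_word_scores(letters):
    """Score each word in each letter as
    (relative frequency in the letter) / (number of letters containing the word).
    `letters` is a list of letters, each a list of lemmas (an assumed input format)."""
    # Document frequency: in how many letters does each word appear?
    doc_freq = Counter()
    for words in letters:
        doc_freq.update(set(words))

    scores = []
    for words in letters:
        counts = Counter(words)
        total = len(words)
        scores.append({w: (c / total) / doc_freq[w] for w, c in counts.items()})
    return scores

# Toy, made-up letters, purely for illustration.
letters = [
    ["kenyon", "book", "book", "letter"],
    ["kenyon", "flush", "flush", "flush"],
    ["kenyon", "letter", "walk", "walk"],
]
scores = key_word_scores(letters)

# In the second letter, "flush" clumps (it occurs only there), while
# "kenyon" is spread over all three letters and so scores low.
print(scores[1])
```

A word that turns up in every letter gets divided by a large number, so it scores low everywhere, however frequent it is overall.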

EBB and RB key words

As with word frequencies, and with many of the other examples in the demo videos, I am using the letters between Elizabeth Barrett (EBB, or just E) and Robert Browning (RB, or just R). And as with word frequencies, we will be looking at nouns, verbs, etc. individually, and we will be using lemmas (the dictionary forms).

Comparing key words

Nouns

First up are nouns as key words. We’ll use a slope graph to compare the key words for EBB and RB. As usual, I’m using just the top 20 nouns for each. Lines connect the same word (lemma) for E and R; where there is no line, the word does not occur in the other person’s top 20 noun key words. We can see that EBB and RB have few key words in common: just book and happiness.
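The comparison behind the slope graph can be sketched as follows: take each writer’s top-scoring words and see which appear in both lists. This is a hypothetical sketch. It assumes each writer already has a single score per word (how the per-letter scores are combined into one score per writer is not specified here), and the function names and the scores in the example are invented.

```python
def top_key_words(scores, n=20):
    """Return the n highest-scoring words, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def shared_key_words(scores_a, scores_b, n=20):
    """Words in both writers' top-n lists, in the first writer's order.
    These are the words that would be joined by a line in a slope graph."""
    top_a = top_key_words(scores_a, n)
    top_b = set(top_key_words(scores_b, n))
    return [w for w in top_a if w in top_b]

# Made-up scores, and n=3 instead of 20, just to keep the example small.
ebb = {"book": 0.9, "flush": 0.8, "happiness": 0.7, "walk": 0.1}
rb = {"happiness": 0.9, "letter": 0.8, "book": 0.6, "poem": 0.2}
shared = shared_key_words(ebb, rb, n=3)
print(shared)
```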

We can also see a glitch in the automatic processing: &c (“etc”) is considered to be a noun, even though we probably wouldn’t consider it to be one.

We can also compare key words versus frequent words. One thing that is interesting is that Kenyon (a relative of EBB and a friend of RB) is a frequent word for EBB, but not a key word. This is because she mentions Mr. Kenyon in lots of letters, so he is not a special mention: key words are words that are concentrated in just some letters.

Finally, we can also look at the key words across time. Here the size of the dots indicates the TFIDF measure. Only book and happiness have dots for both EBB and RB, since those are the only two key words they have in common. One interesting word to note is Flush, EBB’s dog. Flush was kidnapped and RB helped recover him. This was important to EBB, but apparently not as important to RB (Flush actually bit RB at one point early on).

Verbs

When we look at verbs, we see that EBB and RB have no overlap at all in their key word verbs.

And looking at them over time … (Note R’s use of kiss)

TFIDF for individual words

We can also look at the TFIDF scores for individual words, not just the top key words. Here we look at love, poem, and poetry, since EBB and RB were romantic poets. Since these are not key words, there is much more overlap in their usage. There is actually even more overlap, but I am not showing the letters with low TFIDF scores (anything less than 0.1).
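The cutoff mentioned above amounts to a simple filter. Here is a minimal sketch, assuming the scores are stored as one dictionary per letter; the function name, the letter identifiers, and the numbers are all hypothetical.

```python
def letters_above_threshold(scores_by_letter, word, threshold=0.1):
    """Return (letter_id, score) pairs for letters where the word's score
    is at least the threshold; letters scoring lower are left out."""
    return [
        (letter_id, scores[word])
        for letter_id, scores in scores_by_letter.items()
        if scores.get(word, 0.0) >= threshold
    ]

# Made-up scores for three letters.
scores_by_letter = {
    "letter_1": {"love": 0.25, "poem": 0.02},
    "letter_2": {"love": 0.05},
    "letter_3": {"poem": 0.30},
}
shown = letters_above_threshold(scores_by_letter, "love")
print(shown)
```

Here letter_2 mentions love, but its score is below 0.1, so it would not be shown.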

