Posting to Autocat
I recently read the article “Big Data Meets the Bard” in the Financial Times (freely available): http://www.ft.com/intl/cms/s/2/fb67c556-d36e-11e2-b3ff-00144feab7de.html#axzz2WDkbftz3. The article begins “Here’s some advice for bibliophiles with teetering piles of books and not enough hours in the day: don’t read them. Instead, feed the books into a computer program and make graphs, maps and charts: it is the best way to get to grips with the vastness of literature.”
This article led me to find a video of one of the people mentioned, Franco Moretti, in which he discusses what he is doing and why: http://vimeo.com/11869895. It intrigued me, so I decided to try something myself. Each posting to the Autocat listserv has “metadata”: who wrote it, when, the length of the message, and the topic. I have taken the metadata from the beginning of this year, put it in a spreadsheet, and played with it in different ways.
The average post is 70.89 lines long, and the median length is 60 lines.
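For anyone who would rather script this than use a spreadsheet, here is a minimal sketch of the same two figures in Python. The line counts are made-up placeholders, not the actual Autocat data, and the variable names are my own.

```python
from statistics import mean, median

# Hypothetical line counts for a handful of postings (NOT the real Autocat data).
line_counts = [12, 45, 60, 75, 210]

# The mean is pulled upward by unusually long posts; the median is not.
average_length = mean(line_counts)
median_length = median(line_counts)

print(f"average: {average_length:.2f} lines, median: {median_length} lines")
```

When the average sits above the median, as it does in the real figures, that usually means a relatively small number of very long messages are pulling the average up.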
The top 10 posters by volume (i.e., the total number of lines across all of their messages) were:
|J. McRee Elrod||14117|
|john g marr||7387|
|Mark K. Ehlert||6005|
Of course, these message lengths include any quoting from other messages.
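The per-poster totals can also be produced with a few lines of Python instead of a spreadsheet. This is only a sketch under assumptions: the rows are invented, and `total_lines_by_poster` is a hypothetical helper name. As in the list above, quoted text is counted along with everything else.

```python
from collections import Counter

# Hypothetical (poster, line_count) rows; the names and numbers are made up.
postings = [
    ("Poster A", 120),
    ("Poster B", 40),
    ("Poster A", 300),
    ("Poster C", 75),
    ("Poster B", 10),
]

def total_lines_by_poster(rows):
    """Sum the line counts of every message, grouped by poster."""
    totals = Counter()
    for poster, lines in rows:
        totals[poster] += lines
    return totals

totals = total_lines_by_poster(postings)

# most_common(10) returns up to ten posters, largest total first.
for poster, lines in totals.most_common(10):
    print(f"{poster}: {lines}")
```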
Here are some word clouds I made with “Word It Out” http://worditout.com. They are only PNG images and are not interactive.
The first shows people by number of postings.
To get a decent result, I deleted all spaces and commas in each person’s name, so that each name counts as a single word.
The second word cloud shows topics by number of postings.
In this case, I replaced spaces with underscores. I am not so sure about the result, but the basic “information” is discernible. It has just occurred to me that it would be better to delete every “Re:” from the message titles, but I’ve done enough!
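The cleanup I skipped could be scripted. Below is a rough sketch, assuming the subject lines are available as plain strings; the sample subjects are invented, and `topic_token` is a hypothetical helper, not anything provided by Word It Out. It strips leading “Re:” prefixes and then joins the remaining words with underscores so that each topic counts as one word.

```python
import re
from collections import Counter

# One or more leading "Re:" prefixes, in any letter case, with optional spaces.
RE_PREFIX = re.compile(r"^(?:\s*re:\s*)+", re.IGNORECASE)

def topic_token(subject):
    """Drop leading 'Re:' prefixes, then replace runs of spaces with '_'."""
    cleaned = RE_PREFIX.sub("", subject).strip()
    return re.sub(r"\s+", "_", cleaned)

# Hypothetical subject lines (not taken from the real list).
subjects = [
    "RDA question",
    "Re: RDA question",
    "RE: Re: RDA question",
    "Call numbers",
]

# Replies now fall under the same topic as the original message.
topic_counts = Counter(topic_token(s) for s in subjects)
print(topic_counts.most_common())
```

With the prefixes removed, a thread and its replies are counted as one topic rather than two.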
Once again, this relates to messages from the beginning of 2013.