If At First You Don’t Succeed

A couple of weeks ago, I blogged about a failed attempt to do some exploratory text-mining on the US National Security Strategy reports (here). That project was supposed to give me a fun way to learn the basics of text mining in R, something I’ve been eager to do of late. In writing the blog post, I had two motives: 1) to help normalize the experience of getting stuck and failing in social science and data science, and 2) to appeal to more experienced coders who could get me unstuck on this particular task.

The post succeeded on both counts. I won’t pepper you with evidence on the commiseration front, but I am excited to share the results of the coding improvements. In addition to learning how to text-mine, I have also been trying to learn how to use RStudio and Shiny to build interactive apps, and this project seemed like a good opportunity to practice both. So, I’ve created an app that lets users explore this corpus in three ways:

  • Plot word counts over time to see how the use of certain terms has waxed and waned over the 28 years the reports span.
  • Generate word clouds showing the 50 most common words in each of the 16 reports.
  • Explore associations between terms by picking one and seeing which 10 other terms are most closely correlated with it across the entire corpus. (A rough sketch of the preprocessing behind all three features follows this list.)
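
If you’re curious what sits underneath those three features, here is a rough sketch of the kind of preprocessing involved, using the ‘tm’ package. The folder path, file pattern, and object names are my own placeholders, not the app’s actual code:

library(tm)

# Read the report texts from a local folder (placeholder path and pattern)
nss <- VCorpus(DirSource("texts/", pattern = "\\.txt$"))

# Standard cleaning: lower-case, strip punctuation and numbers, drop stopwords, stem
nss <- tm_map(nss, content_transformer(tolower))
nss <- tm_map(nss, removePunctuation)
nss <- tm_map(nss, removeNumbers)
nss <- tm_map(nss, removeWords, stopwords("english"))
nss <- tm_map(nss, stemDocument)   # stemming is why terms like 'terror' and 'climat' appear truncated below

# Document-term matrix: one row per report, one column per stemmed term
dtm <- DocumentTermMatrix(nss)

# Terms correlated with a chosen term across the corpus (the third feature);
# the 0.5 cutoff is an arbitrary choice for illustration
findAssocs(dtm, "terror", corlimit = 0.5)

Once you have a matrix like that, the counts over time, the word clouds, and the correlations are all just different summaries of the same rows and columns.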

For example, here’s a plot of change over time in the relative frequency of the term ‘terror’. Its usage spikes after 9/11 and then falls sharply when Barack Obama replaces George W. Bush as president.

NSS terror time trend

That pattern contrasts sharply with references to climate, which rarely appear until the Obama presidency, when usage spikes upward. (Note, though, that the y-axis is rescaled from the previous chart; even after this large increase, ‘climat’ appears only about half as often as ‘terror’.)

NSS climat time trend
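
For anyone who wants to reproduce a chart like these two, here is one way to get a relative-frequency trend out of the document-term matrix from the sketch above. The assumption that report years can be pulled from the file names is mine, not a fact about the repository:

# Relative frequency of a stemmed term in each report: its count divided by
# the total number of terms in that report
m <- as.matrix(dtm)
rel_freq <- m[, "terror"] / rowSums(m)

# Guess at recovering the publication year from file names like 'nss1987.txt'
year <- as.numeric(gsub("\\D", "", rownames(m)))

plot(year, rel_freq, type = "b",
     xlab = "Report year", ylab = "Relative frequency of 'terror'")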

And here’s a word cloud of the 50 most common terms from the first US National Security Strategy, published in 1987. Surprise! The Soviet Union dominates the monologue.

NSS 1987 word cloud
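
A cloud like that takes only a couple of lines once the document-term matrix exists, using the ‘wordcloud’ package. Here ‘nss1987.txt’ is a placeholder row name; substitute whichever row of your matrix holds the 1987 report:

library(wordcloud)

# 50 most frequent stemmed terms in one report, reusing the matrix 'm' from above
counts <- sort(m["nss1987.txt", ], decreasing = TRUE)[1:50]
wordcloud(words = names(counts), freq = counts, min.freq = 1, random.order = FALSE)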

When I built an initial version of the app a couple of Sundays ago, I promptly launched it on shinyapps.io to show it off. Unfortunately, shinyapps.io’s free plan only gives you 25 hours of usage per billing cycle, and when I tweeted about the app, it got so much attention that those hours disappeared in a little over a day!

I don’t have my own server to host this thing, and I’m not sure when Shiny’s billing cycle refreshes. So, for the moment, I can’t link to a permanently working version of the app. If anyone reading this post is interested in hosting the app on a semi-permanent basis, please drop me a line at ulfelder <at> gmail. Meanwhile, R users can launch the app from an R session with these two lines of code, assuming the ‘shiny’ package is already installed:

library(shiny)                                        # load Shiny
runGitHub("national-security-strategy", "ulfelder")   # fetch the repo from GitHub and run the app

You can also find all of the texts and code used in the app, plus some other material (e.g., the nss.explore.R script also implements topic modeling), in that GitHub repository, here.
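
If you just want the general shape of the topic-modeling step without reading nss.explore.R, a minimal sketch with the ‘topicmodels’ package might look like the following. The choice of ten topics and the seed are arbitrary illustrative values, not what the script actually uses:

library(topicmodels)

# Fit a simple LDA model to the document-term matrix from earlier
lda_fit <- LDA(dtm, k = 10, control = list(seed = 20150225))

# Ten most probable terms in each topic
terms(lda_fit, 10)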


2 Comments

  1. bruce kay, February 25, 2015

     Nice. Texstat is a free tool that gives you word counts and context (“concordances”). Yep, linguistics has the edge on the social sciences when it comes to text analysis.
  2. Lori, February 27, 2015

     I can appreciate the interplay between words and reality. Very interesting work, Jay. Thanks for sharing.
