Breaking Down History: Word Clouds, Google’s Ngram viewer, and historians.

Searching the keywords ‘civil rights’, ‘America’, and ‘congress’ brings up 431,024 results on JSTOR. Searching the same keywords on Google Scholar brings up 1.6 million results. Committed historian though I am, reading and understanding the vast body of work on my particular historical interest is impossible. How can we break down history into a more manageable workload? And are these methods useful? In an ideal world, time would somehow expand so that historians could read every work published about their topic. In reality, time is a luxury that academics rarely have. So what options do we have?

Word clouds are a quick and easy way of getting an overview of a text. A word cloud generates a group of keywords, and the size of each word shows how often it was used in the text.[1] Word clouds can be used to show overarching themes, and as a quick way of presenting an analysis. Adam Crymble outlined his concerns about this approach in a blog post.[2] Whilst word clouds present keywords, they present them out of context. They can break up phrases into separate words, and there is a risk that they oversimplify a complicated topic. However, historians can use word clouds to their benefit, whether by entering sections of a book to see which topics feature most or by entering their own work. As with all internet resources, historians should try to make them work for their own purposes. Word clouds won’t replace reading entire articles or books, as they cannot provide the context and understanding of traditional research. However, they are a new way to condense information, and for a quick glance at a book they give historians relevant information.

(An example of a word cloud, using a proposal for a Civil Rights & Congresswomen project)
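
To make the counting behind a word cloud concrete, here is a minimal sketch in Python. It is not how WordItOut or any other word cloud tool is actually built; the stop-word list, the sample text, and the function name `word_frequencies` are all made up for illustration. It simply counts how often each word appears, which is the number a word cloud would turn into font size.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "was", "by"}

def word_frequencies(text, top_n=5):
    """Count how often each non-stop word appears, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up sample text, standing in for a section of a book or proposal.
sample = (
    "The Civil Rights Act of 1964 was debated in Congress. "
    "Congress passed the Civil Rights Act, and civil rights "
    "campaigners welcomed the Act."
)

# Each (word, count) pair is what a word cloud would scale by size.
for word, count in word_frequencies(sample):
    print(word, count)
```

Note that ‘civil’ and ‘rights’ are counted as two separate words, which is exactly the loss of context described above: the phrase ‘civil rights’ never appears in the output as a unit.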

Google’s Ngram Viewer is another tool aimed at lightening the workload. Searching for a keyword produces a graph of how often that word appears in published books over a period of time. Ngram draws on around eight million books, digitised using optical character recognition (OCR), to create these graphs,[3] a corpus that no historian could conquer alone. How can this tool be useful for historians? Firstly, it gives a subtle hint towards historiography. For example, I used the Ngram Viewer to search for ‘congresswomen’.

(Screenshot: Ngram Viewer results for ‘congresswomen’)

I narrowed my search to books published between 1940 and 2015. My focus for this topic would be the role of congresswomen in the passage of the Civil Rights Act of 1964. The first thing I noticed was that when the Act was passed, very little was being published about women in Congress. Secondly, the graph shows a slow increase in books concerning congresswomen after the 1990s, until it begins declining again. Whilst this does not tell me which books to look at or where to begin my research, it gives me a clue. I could start looking in the 1990s, because when I searched for the word ‘feminism’ I discovered a similar trend.
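
The kind of figure plotted on such a graph can be sketched in a few lines of Python. This is only an illustration of the idea, not Google’s implementation: the toy corpus, its years and texts, and the function name `yearly_share` are invented. For each publication year, it works out what share of all the words printed that year match the search term.

```python
from collections import defaultdict

# Hypothetical toy corpus of (publication year, text) pairs.
corpus = [
    (1964, "congress passed the civil rights act"),
    (1992, "more congresswomen entered congress this year"),
    (1992, "congresswomen and feminism in congress"),
    (2005, "a history of congress"),
]

def yearly_share(corpus, term):
    """Return {year: share of that year's words that equal `term`}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# One point per year, as on the horizontal axis of an Ngram graph.
for year, share in yearly_share(corpus, "congresswomen").items():
    print(year, f"{share:.1%}")
```

Because the result is a share rather than a raw count, a rise on the graph means the word became relatively more common in print, not simply that more books were published that year. It still says nothing about which books used the word or why.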

(Screenshot: Ngram Viewer results for ‘feminism’)

However, this tool should not be relied on too heavily. Words often change meaning, and as mentioned in a previous blog post, the Ngram Viewer relies on OCR, which means errors are always a possibility.[4] For example, in older printed books the letter ‘s’ was often printed as the long s (‘ſ’), which looks much like the letter ‘f’. As a result, phrases like ‘Paradise Lost’ can be missed, because the OCR scanner registers them as ‘Paradife Loft’.

(Screenshot)[5]
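
The effect of this OCR error on a search can be shown with a small Python sketch. The OCR text below is invented, and `long_s_pattern` is a hypothetical helper rather than anything the Ngram Viewer offers. It compares an exact search with one that accepts ‘f’ wherever the search phrase has an ‘s’.

```python
import re

def long_s_pattern(phrase):
    """Build a regex that accepts 'f' wherever the phrase has 's'."""
    parts = ("[sf]" if ch in "sS" else re.escape(ch) for ch in phrase)
    return re.compile("".join(parts), re.IGNORECASE)

# Invented OCR output in which the long s was read as 'f'.
ocr_text = "A new edition of Paradife Loft is advertised here."

# An exact search finds nothing; the tolerant search finds the title.
exact_hits = ocr_text.count("Paradise Lost")
tolerant_hits = len(long_s_pattern("Paradise Lost").findall(ocr_text))
print(exact_hits, tolerant_hits)
```

A pattern this loose will also match genuinely different words that differ only by ‘s’ and ‘f’, so it trades one kind of error for another. Checking the results by eye remains part of the job.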

Ultimately, these two approaches to breaking down the vast amount of data historians have at their fingertips are useful. However, they do not replace good old-fashioned research. A word cloud won’t give you a full understanding of a book, and the Ngram Viewer lacks context. But for historians starting a project and faced with mountains of research options, these tools offer a new way of breaking down the past.

[1] ‘About’, WordItOut, http://worditout.com/about; consulted 29 April 2015

[2] ‘Can We Reconstruct a Text from a Wordcloud?’, Thoughts on Public & Digital History, http://adamcrymble.blogspot.ca/2013/08/can-we-reconstruct-text-from-wordcloud.html; consulted 29 April 2015

[3] ‘Bigger, Better Google Ngrams: Brace Yourself for the Power of Grammar’, The Atlantic, http://www.theatlantic.com/technology/archive/2012/10/bigger-better-google-ngrams-brace-yourself-for-the-power-of-grammar/263487/; consulted 29 April 2015

[4] ‘Putting Big Data to Good Use: An Overview’, The Historian’s Macroscope: Big Digital History, http://www.themacroscope.org/?page_id=246; consulted 29 April 2015

[5] ‘A curiosity about the F-word in Google Ngram Viewer’, Language Learning, Science and art, http://jakubmarian.com/a-curiosity-about-the-f-word-in-google-ngram-viewer/; consulted 29 April 2015
