The World Wide Web of Lies: Can historians rely on the Internet?

A study focusing on perceptions of the Internet discovered that respondents considered information found on the Internet to be just as credible as information they would receive through newspapers and magazines.[1] With the advent of online archives and blogs, historical research has entered the World Wide Web. Can historians rely on Internet sources? How can they do so safely in their research?

There have been instances of the Internet being used to spread historical lies. Professor Mills Kelly of the History Department at George Mason University decided to teach a class in 2008 called ‘Lying About The Past’.[2] This course asked students to deceive Internet users by creating the ‘historical’ figure of Edward Owens. The aim was to prove how easy it was to post inaccurate sources online. Professor Kelly announced the scam and warned that another would follow the next semester.[3] A second elaborate scheme duly followed, this time with a story about a serial killer. Reddit users quickly responded, and in less than half an hour users started to doubt the sources.[4] Whilst this project may have started in good faith, to highlight the problems of relying on the Internet, did this scheme help or hinder?

There is no doubt that the Internet can be easily manipulated. Users can edit websites like Wikipedia, and relying on online sources can be problematic. Does that mean we should all turn off our Wi-Fi and run? Absolutely not. A poor workman blames his tools. The Internet can be as useful as you allow it to be. Historians should enter the fray with scepticism, of course, but the Internet can provide reliable sources for research. Before relying on a website, historians should research the site itself. For example, I was recently searching for sources on the Civil Rights Act of 1964. When you Google that phrase, the first link that appears is Wikipedia. Checking the sources at the bottom of the Wikipedia page can lead you to more reliable information, but the page itself is useless to academics. However, on the first page of Google’s results we find the Our Documents website. At the click of a button you can find out the numerous institutions that the site partners with to enable this online archive. With sponsorship from the History Channel and the National Archives and Records Administration, the site checks out.[5] Websites like Our Documents that provide scans of original documents are safe havens for historians. There are numerous sites that provide this security. Whether it’s the Old Bailey online archive or the Civil Rights Digital Library, there are many online resources that historians can trust.

When logging on, historians should be wary. Wikipedia, Google Images, and Yahoo Answers will not provide the credible resources historians are searching for. So continue searching. The Internet is not out to get you. No self-respecting historian starts his or her research on Reddit, or on Wikipedia. Ideally, published articles and books should be the starting point for any research project. However, we should not be so quick to dismiss the Internet. My problem with the ‘Lying About The Past’ project? Professor Kelly only furthers the mistrust between the world of academia and the World Wide Web. Historians are well trained in analysing the usefulness of sources. Applying that skill to surfing the web should protect historians from the parts of the Internet that aim to mislead.

[1] Andrew J. Flanagin and Miriam J. Metzger, ‘Perceptions of Internet Information Credibility’, Journalism & Mass Communication Quarterly, (2000), pp515-540, at p515

[2] ‘The Real Story of Edward Owens’, The Last American Pirate; consulted 29 April 2015

[3] ‘How the Professor Who Fooled Wikipedia Got Caught by Reddit’, The Atlantic; consulted 29 April 2015

[4] Ibid

[5] ‘Organizations’, Our Documents; consulted 29 April 2015

Historians & Programming: how to make data work for you

In a field where reading every book on a subject is almost impossible, programming can help historians tackle the mountain of research before them. It is argued that historians are wary of the technological age.[1] Whilst that may be true for some historians, the Internet is becoming a valuable resource for academics. Whether it’s online archives or programming, historians should not be wary of technology, but embrace it with open arms. The irony is that historians have been programming since the 1970s.[2] Historical research may even be particularly well suited to programming, which enables more efficient research.[3] So should historians code, and how could it benefit them?

Personally, the idea of programming conjures up images of dramatic Hollywood movies of hackers and long lines of code. My first venture into coding didn’t exactly go well, but as always, learning a new skill takes time and a whole lot of patience. Thanks to the wonderful resource that is the Internet, learning how to programme can be both free and easy. Websites like the Programming Historian provide tutorials on how to venture into the world of ‘codes’, ‘interfaces’, and ‘data manipulation’.[4] This blog has discussed Optical Character Recognition in digitization projects, and coding can even help with OCR mistakes. All these methods ensure that the time historians spend researching is spent effectively.

When researching a topic, the biggest advantage and disadvantage facing historians is the sheer number of books and articles available. Learning how to programme can make this process less daunting. Programming can change the way historians approach their projects. It can enable historians to work with large bodies of material in a way they could not before.

Whilst a complete knowledge of all the ins and outs of programming is not required for historical research, some understanding of how programmes are built could enable historians to edit the software to work best for them. Programming can be left to the programmers, but historians can and should use that software to their advantage. After all, building and expanding on the work of other professionals is certainly a skill historians know well. Programming can organize all the data you’ve collected and present it in an easy-to-understand way. Whether that’s creating a graph in Excel or creating historical maps, utilising data allows you to present your argument in a new way. Outside of making research easier, programming has other added benefits for historians. Whether it’s being able to add another skill to your CV or being able to edit the HTML code on your blog, programming opens up new doors.

So where do historians begin? Thankfully, as previously mentioned, there are resources available online aimed at making learning how to code less intimidating. Learning how to clean text created by the OCR scanners enables texts to be searched easily with keywords.[5] Topic modeling aims to understand the language used in a text and programming helps discover patterns in those words.[6] Learning how to take advantage of tools like Google Maps can be the foundation of creating your own digital maps.[7] Tutorials are available for these topics and many more on the Programming Historian site.
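
As a rough illustration of what cleaning OCR output with regular expressions can look like, here is a minimal Python sketch. It is not taken from the Programming Historian lesson; the function name `clean_ocr_text` and the sample text are assumptions made up for this example, and real OCR output usually needs many more rules than these two.

```python
import re


def clean_ocr_text(text):
    """Tidy two common OCR artefacts (hypothetical helper for illustration)."""
    # Rejoin words that were hyphenated across a line break: "land-\nmark" -> "landmark".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Invented sample text, standing in for raw OCR output.
raw = "The Civil  Rights\nAct was a land-\nmark   statute."
print(clean_ocr_text(raw))
```

Once the text is tidied like this, a keyword such as ‘landmark’ can be found with an ordinary search, which it could not be while the word was split across two lines.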

Ultimately, programming will only help historians in organizing and understanding their research. In their book, historians William J. Turkel and Alan MacEachern make their argument as to why historians should enter the world of programming.

“If you don’t program your research process will always be at the mercy of those who do”.[8]

It seems as if there is no downside to venturing into the world of programming. Whether it’s making research easier, or presenting your work in new ways, programming can ensure that the data you create works for you.

[1] Julian J. DelGaudio, ‘Should Historians Become Programmers? Limitations and Possibilities of a Computer-Assisted Instruction in the United States History Survey’, The History Teacher, Vol. 33, (1999) pp67-78, at p67

[2] ‘Building the Historian’s Toolkit’, The Historian’s Macroscope: Big Digital History; consulted 29 April 2015

[3] ‘Digital History: Concepts, Methods, Problems’, Stanford.Edu; consulted 29 April 2015

[4] ‘Lesson Directory’, The Programming Historian; consulted 29 April 2015

[5] ‘Cleaning OCR’d text with Regular Expressions’, The Programming Historian; consulted 29 April 2015

[6] ‘What is Topic Modeling and For Whom is this Useful?’, The Programming Historian; consulted 29 April 2015

[7] ‘Google Maps’, The Programming Historian; consulted 29 April 2015

[8] ‘Building the Historian’s Toolkit’, The Historian’s Macroscope: Big Digital History; consulted 29 April 2015

Breaking Down History: Word Clouds, Google’s Ngram viewer, and historians.

Searching the keywords ‘civil rights’, ‘America’, and ‘congress’ brings up 431,024 results on JSTOR. Searching the same keywords on Google Scholar brings up 1.6 million results. Whilst I am a committed historian, to read and understand the vast sum of work about my particular historical interest is impossible. How can we break down history into a more manageable workload? And are these methods useful? In an ideal world, time would somehow expand to ensure historians could read every work published about their topic. In reality, time is a luxury that academics rarely have. So what options do we have?

Word clouds are a quick and easy way of getting an overview of a text. A word cloud displays a group of keywords, and the size of each word shows how often it was used in the text.[1] Word clouds can be used to show over-arching themes, and as a quick way of presenting an analysis. Adam Crymble outlined his concerns about this approach in a blog post.[2] Whilst word clouds present key words, they present them out of context. They can break up phrases into separate words, and there is a risk that they over-simplify a complicated topic. However, historians can utilise word clouds to their benefit, whether by entering sections of a book to see what topics feature most, or by entering their own work. As with all Internet resources, historians should try to make them work for them. Word clouds won’t replace reading entire articles or books, as they do not provide the context and understanding of traditional research. However, they are a new way to condense information, and for a quick glance at a book, they provide historians with relevant information.
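
The counting that sits behind a word cloud can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any particular word cloud site works; the stop-word list, the function name `word_frequencies`, and the sample sentence are all assumptions invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "was", "by", "for", "on"}


def word_frequencies(text, top_n=10):
    """Count how often each non-stop word appears, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


# Invented sample text for demonstration only.
sample = (
    "The Civil Rights Act of 1964 was debated in Congress. "
    "Congress passed the Act, and the Act was signed into law."
)
for word, count in word_frequencies(sample, top_n=3):
    print(word, count)
```

A word cloud would draw ‘act’ largest and ‘congress’ next. Notice that ‘civil’ and ‘rights’ are counted as two unrelated words, which is exactly the loss of context described above.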

[Figure: an example of a word cloud, using a proposal for a Civil Rights & Congresswomen project]

Google’s Ngram Viewer is another tool aimed at lightening the workload. Searching a keyword will provide a graph of the shifting popularity of that word in published books over a period of time. The Ngram Viewer draws on around eight million books, digitised using OCR, to create these graphs,[3] a corpus that no historian could conquer. How can this tool be useful for historians? Well, firstly, it gives a subtle hint towards historiography. For example, I used the Ngram Viewer to search for ‘congresswomen’.

[Figure: Google Ngram Viewer graph for ‘congresswomen’]

I narrowed my search to books published between 1940 and 2015. My focus for this topic would be the role of congresswomen in the passage of the Civil Rights Act of 1964. The first thing I noticed was that when the act was passed very little was being published about women in Congress. Secondly, the graph shows a slow increase after the 1990s in books concerning congresswomen, until it begins declining again. Whilst this does not tell me which books to look at or where to begin my research, it gives me a clue. I could start looking in the 1990s, as when I searched for the word ‘feminism’, I discovered a similar trend.
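
To give a sense of the kind of calculation behind such a graph, here is a minimal Python sketch that works out, for each year, what share of all the words published that year a search term accounts for. It is a simplified illustration under my own assumptions, not Google’s actual method; the function name `yearly_relative_frequency` and the three-line ‘corpus’ are invented for the example.

```python
from collections import Counter


def yearly_relative_frequency(books, term):
    """Share of all words in each year that match `term`.

    `books` is a list of (year, text) pairs, a stand-in for a real corpus.
    """
    totals = Counter()  # total words seen per year
    hits = Counter()    # occurrences of the term per year
    for year, text in books:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Invented miniature corpus for demonstration only.
books = [
    (1964, "congress passed the act"),
    (1992, "congresswomen in congress"),
    (1992, "the year of the woman"),
]
print(yearly_relative_frequency(books, "congresswomen"))
```

Plotting those yearly shares against time gives a line of the sort the Ngram Viewer draws. Working with shares rather than raw counts matters because far more books are published in some years than others.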

[Figure: Google Ngram Viewer graph for ‘feminism’]

However, this tool should not be heavily relied on. Words often change meaning, and as mentioned in a previous blog, the Ngram Viewer relies on OCR, which means errors are always a possibility.[4] For example, older printed books used the ‘long s’ (ſ), which looks very much like the letter ‘f’. As a result, phrases like ‘Paradise Lost’ can be missed, as the OCR scanner registers ‘Paradife Loft’.
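
One way to approach the ‘Paradife Loft’ problem can be sketched in Python: try reading each ‘f’ as an ‘s’ and accept the result only if it produces a known word. This is a rough illustration under stated assumptions, not a method used by Google; the function names and the three-word vocabulary are hypothetical.

```python
from itertools import product


def long_s_candidates(word):
    """Return every spelling obtained by reading each 'f' as either 'f' or 's'."""
    options = [("f", "s") if letter == "f" else (letter,) for letter in word]
    return {"".join(choice) for choice in product(*options)}


def correct_long_s(word, vocabulary):
    """Return a corrected spelling only when exactly one candidate is a known word."""
    if word in vocabulary:
        return word  # already a real word, so leave it alone
    matches = long_s_candidates(word) & vocabulary
    return matches.pop() if len(matches) == 1 else word


# Hypothetical miniature word list; a real one would hold many thousands of words.
vocabulary = {"paradise", "lost", "loft"}
print([correct_long_s(word, vocabulary) for word in "paradife loft".split()])
```

The sketch repairs ‘paradife’ but leaves ‘loft’ untouched, because ‘loft’ is itself a real word. That ambiguity is a reminder that software alone cannot settle every case, and that a human reader with the original page is still needed.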

[Figure: screenshot][5]

Ultimately, these two approaches to breaking down the vast amount of data historians have at their fingertips are useful. However, they do not replace the good old-fashioned way of research. A WordCloud won’t provide you with the full understanding of a book, and the Ngram viewer lacks context. For historians starting a project, faced with mountains of research options, these tools can be used as a new way of breaking down the past.

[1] ‘About’, WordItOut; consulted 29 April 2015

[2] ‘Can We Reconstruct a Text from a Wordcloud?’, Thoughts on Public & Digital History; consulted 29 April 2015

[3] ‘Bigger, Better Google Ngrams: Brace Yourself for the Power of Grammar’, The Atlantic; consulted 29 April 2015

[4] ‘Putting Big Data to Good Use: An Overview’, The Historian’s Macroscope: Big Digital History; consulted 29 April 2015

[5] ‘A curiosity about the F-word in Google Ngram Viewer’, Language Learning, Science and art; consulted 29 April 2015

Double Rekeying & OCR: online archives and their approaches to digitization

Online archives now provide academics everywhere with access to thousands of primary sources. Cutting down the cost and time of research trips, these online archives play an important role in modern historical research. The British History Online website currently offers 1,200 volumes for visitors to discover, and that number is only growing.[1] With sources ranging from Ancient history to the 20th century, and covering topics from religion to the economy, the website offers a free and easy service to inquiring historians. Similarly, the Old Bailey online archive features almost 200,000 trials from 1674-1913, for historians to peruse at the click of a button. What methods do these archives use in the digitization of thousands of sources, and how effective are their approaches?

Both websites have utilised the ‘double rekeying’ method. Two typists independently transcribe each source, and their transcriptions are then compared. Any discrepancies are manually corrected. Whilst double rekeying is arguably the most effective method of digitization, it comes with its own problems. Any project will have to cover the costs of the typists, and when you’re dealing with as many as 200,000 sources (as with the Old Bailey archive), time is certainly an issue. Despite the cost and time involved, it is an effective method of digitization: double rekeying offers a ‘99.995%’ accuracy rate.[2] However, the Old Bailey website outlines the limitations of this method. When dealing with handwritten sources from the seventeenth and eighteenth centuries, human error is almost inevitable. To combat this, the website offers users the original scanned document, which provides security to academics wanting to utilise the sources. Although the double rekeying method is not perfect, it seems to be the best option available to historians wanting to digitize sources.
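
The comparison step in double rekeying can be illustrated with a short Python sketch that lines up two transcriptions word by word and reports where they disagree. This is only a sketch of the general idea, using Python’s standard `difflib` module; it is not the software either archive used, and the function name and the sample sentences are invented for the example.

```python
import difflib


def find_disagreements(first, second):
    """Return pairs of (first typist's words, second typist's words) that differ."""
    a = first.split()
    b = second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the stretches where the typists differ
    ]


# Invented transcriptions of the same line by two typists.
typist_one = "the prisoner was indicted for stealing a silver watch"
typist_two = "the prisoner was indited for stealing a silver watch"
print(find_disagreements(typist_one, typist_two))
```

Only the flagged words need a human decision, which is why the method can reach such high accuracy: an error slips through only if both typists make the same mistake in the same place.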

Both websites have also utilised OCR in creating these online archives. Optical Character Recognition, or ‘OCR’, is less reliable than double rekeying. This method relies on software rather than human transcription: an optical scanner is used to digitize the documents. OCR could create more problems for a project than double rekeying, as the margin of error is much greater.

The picture below provides an example of errors made by OCR systems.

[Figure: examples of errors made by OCR systems]

The first example shows that the slightest change in writing causes errors within the systems. Words like ‘recommendations’ become ‘RecDmENASTIONS’. With just this example, the problems with OCR become clear. The inaccuracy renders searching for keywords difficult, and without manual corrections OCR can be ineffective. Manually correcting mistakes does not need to be a financial burden on a digitization project. Crowdsourcing is always an option, as with the Transcribe Bentham project at University College London.
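
To show why such errors defeat a keyword search, and one way a search could be made more forgiving, here is a small Python sketch using approximate matching from the standard `difflib` module. It is an illustration under my own assumptions rather than a feature of either archive; the function name `fuzzy_find`, the cutoff of 0.75, and the sample sentence are invented for the example.

```python
import difflib


def fuzzy_find(keyword, ocr_text, cutoff=0.75):
    """Return words in the OCR text that closely resemble the keyword."""
    tokens = sorted({token.strip(".,;:!?").lower() for token in ocr_text.split()})
    return difflib.get_close_matches(keyword.lower(), tokens, n=5, cutoff=cutoff)


# Invented sentence containing the garbled word from the example above.
ocr_text = "The committee made several RecDmENASTIONS to the board."

print("recommendations" in ocr_text.lower())    # an exact search misses it
print(fuzzy_find("recommendations", ocr_text))  # an approximate search may not
```

The cutoff is a judgement call: set it too low and unrelated words will match, too high and badly garbled words will still be missed. Approximate searching helps, but it does not replace correcting the transcription itself.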

Whilst these two approaches do not provide complete accuracy to a digitization project, there are significant advantages to utilising OCR and double rekeying. As a historian based in England wanting to research American history, online archives are the foundation of my research. Both the British History Online website and the Old Bailey website provide a valuable resource to historians interested in their respective fields. Whilst the approaches themselves may be problematic, using both OCR and the double rekeying method ensures a greater level of trust when using sources on their websites. If either website only used OCR and did not have a team of highly trained academics ensuring the credibility of the site, historians could be well within their rights to be wary. However, the approaches adopted by these websites ensure a quick and easy way of making primary sources available to all online, especially in comparison with transcribing everything by hand. Although vast improvements could be made to OCR software, and double rekeying is time-consuming and expensive, the pairing of these two methods has made the Old Bailey and British History Online websites a worthwhile source, providing historians with accurate transcriptions of primary sources.

[1] Homepage, British History Online; consulted 28 April 2015

[2] ‘About British History Online’, British History Online; consulted 28 April 2015

How useful can a website be?: ‘The History Learning Site’ and the cardinal sins of history.

With around 1.2 million students turning to the Internet to help them revise, online educational resources are becoming ever more important.[1] How useful are these revision sites? For history, students are inundated with options. Whether it’s the YouTube channel Crash Course (with its almost three million subscribers), BBC’s GCSE Bitesize, or the ‘History Learning Site’, students are relying on the Internet to succeed in their studies. The History Learning Site was established in 2000, and provides visitors with information on almost too many topics.[2]

Chris Trueman, the original author of the site, graduated with a BA (Hons) in History, and taught History for 26 years.[3] Trueman himself had some significant training in history, and taught subjects covered by the site. However, the sheer breadth of topics seems problematic. With subjects ranging from Tudor England to Hitler’s Germany, the phrase ‘jack of all trades, master of none’ does come to mind when visiting this website. Any academic would be hard pushed to convince an audience that they were an expert on this vast list of topics, which spans from the 8th century to the 2008 presidential election. The website now lists Trueman’s niece and nephew, along with a ‘team of history graduates’, as the new authors of the website. This description is rather vague, and the website seems to rely on Trueman’s credentials even after he can no longer contribute to the website.

The site itself seems outdated. The graphics and fonts remind visitors that the website was created in the early 2000s. The homepage is simple to use. It lists the topics in chronological order, and splits them up into exam subjects as well. This suggests that the website is aimed at younger students, namely those studying for GCSEs or A-levels.

The Civil Rights movement of the 1950s and 1960s, and specifically the Civil Rights Act of 1964, are of particular interest to me. The section on the 1964 Act provides a rough overview of the topic.

The biggest concern for anyone above GCSE level is that the information provided is not cited. For example, the section refers to the statistic that ‘57% of African-America housing judged to be unacceptable’.[4] Firstly, the information is not cited. An A-level student, or certainly any undergraduate, could not use that information in a project or an essay. Secondly, the website does not make it clear which year this statistic refers to. 1963? 1964? It makes it almost impossible to rely on the evidence it’s providing to its users. This page also features the cardinal sin of history. It refers to ‘many historians’ when discussing the importance of the Civil Rights Act. You can almost hear the strained cry of academics screaming, ‘which historians?’

This site succeeds in providing an overview of a topic. For GCSE students, this site could be a valuable resource when revising for an exam. A-level students could use this site as a quick reminder of information, but could not rely on it for essays or projects. For undergraduate students? This website is rendered almost completely useless by its lack of citations, and its ambiguous nods to historiography. However, whilst the website is irrelevant to undergraduate students, that certainly does not mean the website is not a useful resource. For a quick overview of a topic, or out of intrigue, this website provides a quick glance at history. Undergraduates could use the website to get an outline of a topic they might be considering for an essay, but this website could not be cited in any assignments. For history enthusiasts and younger students, this website could be invaluable in providing a quick and easy narrative of an otherwise complicated topic.

[1] ‘Children with internet access at home gain exam advantage, charity says’, The Guardian; consulted 27 April 2015

[2] ‘About the Author’, History Learning Site; consulted 27 April 2015

[3] ‘About the Author’, History Learning Site; consulted 27 April 2015

[4] ‘1964 Civil Rights Act’, History Learning Site; consulted 27 April 2015

Crowdsource transcription critical reflection

This post will critically reflect on two crowdsourcing projects, Transcribe Bentham and Old Weather. It will analyse what these projects offer to those who participate, and how they can improve the experience for their audience.

Transcribe Bentham is an award-winning project aimed at publishing the works of Jeremy Bentham.[1] Bentham was a prominent philosopher and advocate of representative democracy,[2] and the digitisation and increased availability of his works is important as they are becoming increasingly relevant to modern life.[3] The Old Weather project is looking at how the climate has changed by transcribing ships’ logbook entries.[4]

Both websites allow members to join for free, meaning that all a volunteer needs to participate in these projects is an Internet connection. Old Weather provides a tutorial for those interested in participating, which walks the volunteer through how to contribute to the project. The Old Weather website also offers a forum where volunteers can discuss the project and get advice from other contributors. The forum encourages discussion between volunteers and fosters a sense of community. For volunteers of the Old Weather project, transcribing provides an opportunity to be a part of a community.

To encourage participation, Old Weather gives its volunteers titles. For example, when you first begin transcribing you are a ‘Cadet’, and once you complete a certain number of transcriptions, you can be promoted to ‘Lieutenant’. This is an effective way to encourage participation, and inspire users to keep transcribing.

Transcribe Bentham’s website is a bit more difficult to navigate. However, you can pick what document to transcribe by subject matter, difficulty, or the period in which it was written. This engages users as they can pick which documents to transcribe based on subjects that interest them. Being able to choose by difficulty is particularly useful, as Bentham’s handwriting can be rather difficult to decipher. Once you familiarise yourself with the layout of the website, the actual mechanics for transcribing documents are easy to use.

Both websites limit volunteers to those who can read English, and as a result the number of people able to volunteer for these projects is dramatically restricted. Transcribe Bentham’s website is difficult to navigate, and the projects could provide more historical background to the sources. The Old Weather website is more modern, and Transcribe Bentham’s website looks rather dated in comparison.

Overall, both websites offer a unique experience to those who choose to volunteer for the projects. Users are able to interact with history in a more direct way than they would by reading a journal article, for example. Volunteers are playing an active part in ensuring these documents are available for years to come.


Causer, Tim, and Melissa Terras, ‘Crowdsourcing Bentham: beyond the traditional boundaries of academic history’, International Journal of Humanities and Arts Computing, Vol. 8, No. 1 (2014)

Causer, Tim, Justin Tonra, and Valerie Wallace, ‘Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham’, Lit Linguist Computing, Vol. 27, No. 2 (2012)

Brumfield, Ben W., ‘Collaborative Manuscript Transcription’, Manuscript Transcription

[1] Tim Causer, Melissa Terras, ‘Crowdsourcing Bentham: beyond the traditional boundaries of academic history’, International Journal of Humanities and Arts Computing, Vol. 8, No. 1. (2014) P2

[2] Tim Causer, Justin Tonra and Valerie Wallace, ‘Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham’, Lit Linguist Computing, Vol.27, No. 2 (2012) P120

[3] Tim Causer, Transcription Maximised, P120.

[4] Ben W. Brumfield, ‘Collaborative Manuscript Transcription’, Manuscript Transcription; consulted 4 March 2015.

#twitterstorians: academia and social media.

Sometimes it can be difficult to imagine the worlds of social media and academia colliding, but that’s no longer the case. Of course, Twitter can be a source of constant tedious updates about the weather or someone’s bus journey, courtesy of that school friend you have not spoken to in years. However, Twitter is as good or as bad as you make it. Historians can create a Twitter feed that is a valuable resource for networking, research, and finding a job. Historians seem to be latecomers to this social media party, but the presence of academics on Twitter is certainly growing.[1] Around 1 in 40 scholars have a Twitter account, and it seems this number will only rise.[2] With around 288 million users, Twitter has quickly established itself as a frontrunner in the social media world.[3] So, how can historians benefit from using Twitter?

Twitter has, to some extent, democratized academia. Undergraduates can tweet professors, and the social aspect of this social media website opens up new conversations that may not happen without Twitter. The hashtag makes finding other historians on Twitter simple. Just search for hashtags such as #twitterstorians or #earlymodernhistory, and join the discussions. Conversations that would usually be limited to long emails or conferences can now happen at the click of a button. The hashtag function can also help in that tiring search for a new job. Searching both ‘#twitterstorians’ and ‘#jobs’ brings up a long list of employment opportunities, including permanent lectureships and oral history curators. Hashtags, as well as other accounts, can provide a historian with information they may not have otherwise found, such as new archive openings or new sources in their field. Ultimately, if you know where to look, Twitter can be a goldmine for historians, whether it’s finding other academic work, new archives, or even just an interesting blog post. Twitter puts those discoveries at your fingertips.

Following organizations such as The British Library or The National Archives on Twitter provides up-to-date information that could be invaluable when planning a research trip. A quick visit to the National Archives Twitter page and their ‘reply’ section shows that Twitter can be a quick and easy way of getting your questions answered. With frequent updates, it can offer you information on new exhibitions, opening hours, and advice. For any research trips, Twitter can be a helpful resource for historians.

Although Twitter limits posts to 140 characters, this should not discourage historians from participating. Twitter is certainly not the best tool to publish an article or a dissertation. However, it’s the perfect way to send your followers links to your work. Whether it’s a blog post or the link to an article, Twitter can be a free advertising platform to encourage more people to visit your website or read your work. Arguably, the biggest disadvantage of Twitter is the 140-character rule. However, with the ability to include short hyperlinks in a Tweet, this should not dissuade academics from signing up.

Twitter will not replace the typical forums of discussion for historians. Twitter will not usurp conferences, archives, or the lecture hall. 140 characters will not shake the world of academics, but they can change how academics interact with each other. Twitter can provide new audiences for a historian’s work, and it can offer a quick and easy way to play an active role in the community. Inherently, Twitter is a social media website. However, if you use it to your advantage it can be a useful resource for involving yourself in the historical community.

[1] ‘Twitter Among Scholars’, Figshare; consulted 22 April 2015

[2] Ibid

[3] ‘About’, Twitter; consulted 22 April 2015
