Data Visualization

Hey all! In class we’re starting to do some distant readings of our corpus of Idaho Statesman articles relating to the Fourth of July in order to see what trends emerge over time. This week specifically, we’re looking at different ways that we can visualize those trends.

men, women, children chart

I’m starting out by looking at the demographics of the individuals who attended Fourth of July celebrations from 1864 up until 1998. Using Voyant, I was able to search how frequently the terms “men,” “man,” “women,” “woman,” “child,” and “children” were used over the years in relation to the celebrations. It seems that generally, men were referred to more often in the articles than children, and children seemed to be referred to more often in some years than women, which stood out as interesting to me.

crowds, president, speaker chart

Using Voyant and Google Sheets together, I was also able to search how often the terms in the chart above were used throughout the articles in order to see who in particular the stories seemed to focus on. We can see here that the terms “crowd,” “crowds,” “audience,” and “spectators” combined make up the largest percentage of the words used, which can tell us that in general the articles focused on telling the story of the attendees at the Fourth of July celebrations. “President” was also a frequently used term, showing that many articles focused on the celebrations of the U.S. presidents throughout the years, while less prominent public figures such as mayors did not appear as often. Lastly, those who gave speeches on the holiday also seemed to receive less attention, as “speaker” and “speakers” were used less often than other terms.

citizens vs soldiers

My final chart looks at how often stories covered military figures involved in the celebrations versus the civilians who attended the events. From this search in Voyant, we can see that “citizens” and “citizen” were used 60 times combined, while “soldiers” and “soldier” were used 21 times. This may tell us again that stories focused on ordinary citizens who attended the events, but also that soldiers were often an important part of the celebratory traditions.
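For anyone who wants to double-check counts like these outside of Voyant, here is a minimal Python sketch. The sample sentence and the function name count_terms are made up for illustration, and this is not necessarily how Voyant itself tallies words.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {term: counts[term] for term in terms}

# Made-up sample sentence, not taken from the Idaho Statesman corpus.
sample = "The citizens cheered as soldiers marched. One soldier saluted every citizen."
counts = count_terms(sample, ["citizen", "citizens", "soldier", "soldiers"])

# Combine singular and plural forms, as in the chart above.
civilians = counts["citizen"] + counts["citizens"]
military = counts["soldier"] + counts["soldiers"]
print(civilians, military)
```

Adding the singular and plural counts together mirrors the way the two forms were combined for the chart.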

 

Fourth of July Insights

Hey all! This past week in class we’ve been working to develop a corpus to explore major trends concerning Independence Day and how it has been written about and celebrated over the years. Our corpus is made up of newspaper articles from the Idaho Statesman on July 4th and 5th from the year 1900 until 1949. We compiled 193 separate articles from those years and explored them using Voyant.

July 4th Word Cloud

We can see from the word cloud above that the words “fourth,” “July,” and “day” are by far the most commonly used words throughout the years, as is to be expected. But we also see words such as “Boise,” “people,” “celebration,” and “program,” from which we can start to guess that generally these articles were about the celebrations themselves and the people who took part in them, as well as what towns people celebrated in.
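A word cloud is basically a picture of a word frequency count with the very common filler words removed. Here is a rough Python sketch of that idea. The stopword list is a tiny hand-picked one, the sample text is invented, and top_words is just a name I chose, so treat it as an illustration rather than what Voyant does under the hood.

```python
import re
from collections import Counter

# A tiny hand-picked stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "was", "at"}

def top_words(text, n=3):
    """Return the n most frequent words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOPWORDS]
    return Counter(kept).most_common(n)

# Invented sample text, not from the actual corpus.
sample = ("The Fourth of July celebration in Boise was a grand celebration. "
          "The July parade and the July program drew people to Boise.")
print(top_words(sample, 3))
```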

rodeo v. concert v. sports

We can see from the word cloud that parades and firework shows were popular forms of celebration, but by using the search trends function in Voyant, we can look into what other celebrations outside of those more traditional ones were popular over the years. Rodeos and concerts were among the most frequently mentioned forms of entertainment. Concerts look to be mentioned in more years, but in years where both rodeos and concerts took place, rodeos seemed to be mentioned more often.

This is just the beginning of looking into these trends concerning the Fourth of July over the years. With some more time and research, we can gain deeper insight into the general attitudes and customs of Americans during the Fourth of July.

 

Discovering Digital Tools

Hello all! This week in class we’re exploring the types of tools that are offered online via DiRT Directory, which is essentially a registry of a number of digital tools that serve a variety of purposes. It suggests software programs that will allow you to visualize data, collect information, record audio and video, or accomplish a number of other tasks. So today, we’re looking at a couple examples of those tools and what you could accomplish in using them.

First off, we have a program called Heurist which allows you to create, manage, and share a unique database in a short time, typically a few hours. It looks to be simple to use if you already have some knowledge about databases and have an idea of what you want to be in your database. This could be useful for organizing large amounts of texts and links for a project and would allow you to share the sources of your information with others who may be interested.

Another tool that I thought looked interesting and could have some cool implications for researching is an audio recorder and editor program called Audacity. It allows you to record live audio and play it back on your laptop later, as well as edit out the bits of audio you don’t need, move audio pieces around, or change the speed of the audio. This might be useful in a project exploring the evolution of the music industry over the years, or could allow you to embed an audio recording of a famous speech from a politician.

This tool is also interesting because it allows you to convert physical tapes and records into digital recordings or burn them onto a CD. This is something that could be really useful in giving you more freedom in using audio recordings despite what form they are in. This could allow you to take an old recording that may not be digitized yet and digitize it for your specific project. How exactly Audacity does this is a bit unclear, but the implications of that are big.

Those are just a couple of the digital research tools available online at DiRT Directory that could be helpful in current or future history projects. These two are also open-source programs and are free. Many more useful tools can be found on the site for free or for a relatively cheap price.

 

Regular Expressions

Hey all! This past week in class we spent some time exploring regular expressions, a tool that helps us find and manipulate text in a text document.

To start out with regular expressions, I downloaded the program Notepad++, which allows us to input a text document and search that document for certain words, phrases and trends.

When using Notepad++, there are a lot of different search functions that you can use to find particular words. I used the text from an essay on Oliver Stone’s film Platoon to explore those functions.

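To start off, I searched for all the uses of the word “soldier.” But if you wish to search multiple words at a time, such as “soldier” and “Vietnam,” you can switch the search mode to “Regular expression” and search using “soldier|Vietnam.” The vertical bar means “or,” so this will highlight all instances of either word. Leave out any spaces around the bar, since a space would be treated as part of the text to match.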
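The same “or” search can be tried in Python with its built-in re module. This is only a small sketch with a made-up sentence (not the Platoon essay), and Notepad++ uses its own regular expression engine, so small details can differ between the two.

```python
import re

# Made-up sample sentence for illustration.
sample = "The soldier in Vietnam wrote home. Other soldiers in Vietnam did too."

# The vertical bar means "match either the left side or the right side".
pattern = re.compile(r"soldier|Vietnam")
matches = pattern.findall(sample)
print(matches)
```

Notice that “soldier” is also found inside “soldiers,” because the pattern matches the letters wherever they appear.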

Notepad++ also allows you to search using a period, which stands in for any single character. For example, searching “s.t” will show you all uses of “sit” or “sat,” as well as any spot where a word ending in ‘s’ is followed by a space and a word beginning with ‘t,’ since the space counts as a character too. It will also match in the middle of a longer word, such as the “sit” in “visitor.”
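Here is a quick Python sketch of how the period behaves, again using an invented sentence. The second search adds \b (a word boundary) on each side, which is one way to keep only whole three-letter words. That addition is my own suggestion and not something we covered in class.

```python
import re

# Invented sample sentence for illustration.
sample = "She sat to sit; the cats took a set."

# The period matches any single character, including a space.
anywhere = re.findall(r"s.t", sample)

# \b marks a word boundary, so only standalone words are matched.
whole_words = re.findall(r"\bs.t\b", sample)

print(anywhere)
print(whole_words)
```

The first search picks up “s t” from “cats took,” while the second one leaves it out.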

This week we also explored the functionality of Wget. This program allows us to download a large amount of text in an effective way. It essentially automates the process of data collection by allowing you to download a selected website or portions of that website.

It allows you to retrieve or mirror an entire website that you choose. One simple command can download the entire site onto your computer. It also allows you to download specific files within a website.

One issue that I encountered when trying to use Wget is that it is difficult to install on a computer with a Windows operating system. Wget works well on Mac computers, but I personally had trouble finding a version that would install on my laptop so I could explore the full range of its capabilities.

Nevertheless, Wget does serve a unique purpose! It allows you to very quickly download the text you wish to analyze with other tools, such as Notepad++, and could save you a good deal of time.

Exploring the HTML Editor

Hello all! In this blog I am simply looking at the capabilities of the HTML editor on WordPress.

By using the HTML editor, we can add emphasis to a particular word through the use of a specific code.

We can also add links to a body of text to other sites that might be useful, such as The Programming Historian home page.

Images such as this may also be inserted into a text via the HTML editor.
old pic

This coding effect may also be used to make your text appear in a different manner.

The block quote feature also allows you to separate a block of text from the rest of the body of your work.

 

Benefits of Text Mining

Hello all! This past week or so we’ve been discussing in class the different approaches to a research method known as text mining. Text mining is essentially a way that we as researchers can search through a large amount of data or text for specific words or phrases and observe trends throughout different works that can give us some more evidence as to the context and significance of those works.

In the past, analyzing a work or a collection of works would have been a very time-consuming and labor-intensive process. But with the technology that we have at our disposal today, a book that would have taken weeks or even months to examine can be analyzed in a matter of seconds.

Text mining relies on having a digital corpus, or collection of works, to read through in order to analyze trends in the usage of words and phrases. There are several tools that can be used to begin text mining, such as AntConc, Overview, and Voyant. These tools all work in similar manners, allowing you to search terms in context and explore the number of times those words were used. I found from my explorations that Voyant was typically the most user friendly out of the three and presented all of its data in an easily readable format. So to begin to explore its capabilities, I uploaded the works from my Crime in the Media class last year at the UI to serve as my database.

Voyant is able to take any amount of text that you upload and can give you an analysis of word uses, trends in those usages, along with other useful bits of information on the text itself.

voyant crime in the media

Voyant is able to construct a word cloud from the provided text, which can serve as a starting point for some more in-depth research. We see above that the most common word in the collection of my works was “Arias,” referring to the Jodi Arias murder case, which began in 2008. We also see one of the more commonly used words was “Walter,” referring to Breaking Bad’s Walter White. By analyzing the number of times each word was used, we can begin to infer things such as what topic was discussed at the most length throughout these papers as well as the context in which they were spoken of.

We see in the bottom right corner of Voyant there is actually a “Context” box that allows us to see the exact way that a word was used in any given sentence. In this case, we can see “Arias” is typically performing some sort of action, which tells us that this piece was written largely in telling the story of her and her trial.
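The idea behind a context view like this is often called “keyword in context.” Below is a minimal Python sketch of it. The function name keyword_in_context, the width setting, and the sample sentences are all my own assumptions for illustration, and Voyant’s real tool is much more polished.

```python
def keyword_in_context(text, keyword, width=2):
    """Return each use of keyword with `width` words on either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, ignoring case.
        if word.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

# Invented sample text for illustration.
sample = "The parade began at noon. After the parade, the crowd watched fireworks."
hits = keyword_in_context(sample, "parade")
for left, word, right in hits:
    print(f"{left} [{word}] {right}")
```

Reading the words on either side of each hit is what lets us judge how a term is being used rather than just how often.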

These sorts of text mining tools are definitely useful in providing a statistical analysis and can give a certain amount of context for any particular term, but to maximize the effectiveness of these tools, we must use critical thinking and come up with good questions to investigate in order to get to the heart of the work itself.

If we take a larger work, for example the 1846 penny dreadful The String of Pearls, and wish to look at the role that gender played in this piece, we may begin by looking at the usage of gender-specific terms throughout the piece.

voyant sweeney todd gender

If we search the terms “man” and “woman” in this work, we can see that “man” is used drastically more often than “woman” throughout the entirety of the piece. From this we might be able to guess that the story revolves more around men and their exploits, or that most of the main characters are men rather than women.

voyant himher

But if we search the terms “himself” and “herself” we see there is more of a fluctuation in the feminine term. “Herself” is used more often in the first half of the work, maybe telling us that there is indeed a female character who is described to the readers in the beginning of the story. Notice on both searches, the terms “woman” and “herself” both increase in the end portion of the work. This may be able to give us clues that though a female character didn’t appear often for a large portion of the book, she plays a more pivotal role as the story comes to a close.
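To get a feel for how a trend line like this could be built, here is a rough Python sketch that cuts a text into equal slices and counts a term in each slice. The name term_trend, the number of segments, and the sample text are assumptions of mine, and Voyant may divide up a document differently.

```python
import re

def term_trend(text, term, segments=4):
    """Count a term in each of `segments` equal slices of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so every word lands in some slice.
    size = max(1, -(-len(words) // segments))
    counts = []
    for start in range(0, size * segments, size):
        counts.append(words[start:start + size].count(term))
    return counts

# Invented 12-word sample, not the text of The String of Pearls.
sample = " ".join(
    ["herself", "she", "went"] + ["he", "said", "so"] * 2 + ["then", "herself", "herself"]
)
trend = term_trend(sample, "herself")
print(trend)
```

A count that is low in the middle slices and rises in the last one is the same kind of pattern described above for “herself.”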

Using text mining in this manner can help us as researchers to explore vast amounts of material. It allows us to get hard, statistical facts on the usages of certain phrases and words throughout a work. But strong critical thinking skills and analysis on our own parts are also required to get to the bottom of that statistical data.

 

Exploring Ngrams

Hey all! This week for class we’re discussing and practicing the use of Ngrams in our historical research. By searching within a certain collection of works and books, these Ngrams are able to display a collection of data which tells users how often a certain word or phrase has been used at specific points in the past. Using an Ngram allows historians to begin to notice language use trends over time which can be quite informative and can provide some insight into the events happening in the world at that time.
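As a rough picture of what an Ngram viewer calculates, here is a small Python sketch that finds how large a share of all the two-word sequences in each year’s text is a given phrase. The years and sentences are made up, ngram_frequency is a name I invented, and real tools like the Google Ngram Viewer work on enormous collections with their own processing steps.

```python
def ngram_frequency(texts_by_year, phrase):
    """For each year, the share of n-grams that exactly match phrase."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, text in texts_by_year.items():
        words = text.lower().split()
        # Every run of n consecutive words is one n-gram.
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        result[year] = grams.count(target) / len(grams) if grams else 0.0
    return result

# Made-up texts keyed by made-up years, purely for illustration.
sample = {
    1930: "a new comic book arrived",
    1960: "the comic book hero met the comic book villain today",
}
freqs = ngram_frequency(sample, "comic book")
print(freqs)
```

Dividing by the total number of n-grams is what makes years with different amounts of text comparable.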

To begin exploring how these Ngrams work, I decided to search the keywords “comic book,” “superhero,” and “Captain America.” This is the information I was presented with when using the Google Ngram Viewer.

google ngram

We see that the phrase “comic book” started to appear in print some time during the 1930’s, while “superhero” was not a common word until the mid 1960’s. From this one can guess that perhaps early comic books were not entirely focused on heroes with extraordinary powers as many are today, or perhaps comic book creators had not yet identified those heroes specifically as “superheroes.” It’s also interesting to note that the use of “superhero” and “Captain America” in publications seemed to begin and rise at a similar rate for about a decade starting in the 1960’s. Perhaps Captain America started to become more popular and was marketed as one of the first true superheroes during that time.

Google Ngrams is not the only Ngram tool online though. Many other sites offer their own Ngram viewers that allow users to search their sets of collections. When searching the Bookworm Open Library for the same terms as those above, we see a fairly different result.

bookworm ngram

We can see here that this Ngram viewer has the same spike in the term “comic book” during the 1950’s as the Google Ngram Viewer. The interesting thing though is how drastically higher the spike is here than on the first chart, along with the relatively low usage of the phrase after that spike compared to the steady incline found in the first chart. This could tell us that these two Ngram tools are drawing on different collections of published materials, therefore showing differing results.

One last Ngram tool that I used to research this topic is the TIME Magazine Corpus, which shows specifically how often words were used over time in their TIME Magazine publications.

time magazine superhero

Here we see the number of times the word “superhero” was used each decade in TIME Magazine. Its trends are quite similar to the Google Ngram Viewer chart, as it shows a steady increase in usage from the 1960’s and onward. This can tell us as historians that the superhero trend gained popularity even in the more influential and mainstream media outlets and gives us a better understanding of the reach of its culture over time.

These Ngram tools are quite effective in showing researchers the trends of the popularity of certain subjects over time, and are to be considered a useful tool. But because of the varying collections that each site uses, it can be difficult to take this information as 100 percent accurate. Even so, these Ngrams serve as a good first indicator of trends in historical research.