Hello all! This past week or so we’ve been discussing in class the different approaches to a research method known as text mining. Text mining is essentially a way that we as researchers can search through a large amount of data or text for specific words or phrases and observe trends throughout different works that can give us some more evidence as to the context and significance of those works.
In the past, analyzing a work or a collection of works would have been a very time-consuming and labor-intensive process. But with the technology that we have at our disposal today, a book that would have taken weeks or even months to examine can be analyzed in a matter of seconds.
Text mining relies on having a digital corpus, or collection of works, to read through in order to analyze trends in the usage of words and phrases. There are several tools that can be used to begin text mining, such as AntConc, Overview, and Voyant. These tools all work in a similar manner, letting you search for terms in context and see how many times those words were used. I found from my explorations that Voyant was typically the most user-friendly of the three and presented all of its data in an easily readable format. So to begin to explore its capabilities, I uploaded the works from my Crime in the Media class last year at the UI to serve as my corpus.
Voyant can take any amount of text that you upload and give you an analysis of word usage and trends in that usage, along with other useful bits of information about the text itself.
Voyant is able to construct a word cloud from the provided text, which can serve as a starting point for some more in-depth research. We see above that the most common word in the collection of my works was “Arias,” referring to Jodi Arias, whose trial for a 2008 murder drew heavy media coverage. We also see one of the more commonly used words was “Walter,” referring to Breaking Bad’s Walter White. By analyzing the number of times each word was used, we can begin to infer things such as which topics were discussed at the most length throughout these papers, as well as the context in which they were discussed.
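To give a feel for the counting that sits behind a word cloud, here is a minimal Python sketch. It is not Voyant's actual code, just an illustration of the idea: the sample sentence, the tiny stopword list, and the `top_words` function are all made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use much longer ones).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "is", "her", "his"}

def top_words(text, n=5):
    """Return the n most frequent non-stopword words in text, lowercased."""
    # Lowercase everything and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample text, not taken from the actual papers.
sample = "Arias testified. The jury watched Arias. Walter watched the jury."
print(top_words(sample, 3))
```

A word cloud is then just a picture of these counts, with the bigger numbers drawn as bigger words.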
In the bottom right corner of Voyant there is a “Context” box that shows us exactly how a word was used in any given sentence. In this case, we can see that “Arias” is typically performing some sort of action, which tells us that this piece was largely written to tell the story of her and her trial.
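The idea behind a context view like this is often called "keyword in context." Here is a rough Python sketch of it, under the assumption that we simply grab a few words on either side of each match. The `keyword_in_context` function and the sample text are hypothetical and are not how Voyant itself is implemented.

```python
import re

def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words on either side."""
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

# Made-up sample text for illustration only.
sample = "Arias testified for days. The jury said Arias changed her story."
for left, word, right in keyword_in_context(sample, "arias"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} [{word}] {right}")
```

Lining the keyword up in a column makes it easy to scan what tends to come right after it, which is how we can spot that a name is usually followed by an action.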
These sorts of text mining tools are definitely useful for statistical analysis and can give a certain amount of context for any particular term. But to get the most out of them, we must use critical thinking and come up with good questions to investigate in order to get to the heart of the work itself.
If we take a larger work, for example the 1846 penny dreadful The String of Pearls, and wish to look at the role that gender played in this piece, we may begin by looking at the usage of gender-specific terms throughout the piece.
If we search the terms “man” and “woman” in this work, we can see that “man” is used drastically more often than “woman” throughout the entirety of the piece. From this we might guess that the story revolves more around men and their exploits, or that most of the main characters are men rather than women.
But if we search the terms “himself” and “herself,” we see more fluctuation in the feminine term. “Herself” is used more often in the first half of the work, maybe telling us that there is indeed a female character who is described to the readers in the beginning of the story. Notice that in both searches, the terms “woman” and “herself” increase in the end portion of the work. This may give us a clue that though a female character didn’t appear often for a large portion of the book, she plays a more pivotal role as the story comes to a close.
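A trend line like this can be approximated by cutting the text into equal slices and counting a term in each slice. The Python sketch below shows that idea under my own assumptions: the `term_trend` function, the number of segments, and the toy text are all hypothetical, and Voyant may segment and count differently.

```python
import re

def term_trend(text, term, segments=10):
    """Count occurrences of term in each of `segments` roughly equal slices of text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so every word falls into one of the slices.
    size = max(1, -(-len(words) // segments))
    counts = [0] * segments
    for i, word in enumerate(words):
        if word == term.lower():
            counts[min(i // size, segments - 1)] += 1
    return counts

# Toy text standing in for a full novel: 12 words split into 3 slices of 4.
toy = "man man man woman man man herself man woman herself woman herself"
for term in ("man", "woman", "herself"):
    print(term, term_trend(toy, term, segments=3))
```

Reading the counts from left to right is the same as reading the graph from the start of the book to the end, so a term that climbs in the final slices is one that matters more as the story closes.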
Using text mining in this manner can help us as researchers to explore vast amounts of material. It allows us to get hard statistics on the usage of certain phrases and words throughout a work. But strong critical thinking and analysis on our own part are also required to make sense of that statistical data.