John Matey

Text Pattern Analysis, Emily Dickinson: 1864-5

Emily Dickinson’s writing has continued to fascinate readers for over one hundred years past its creation.  Her reclusive personality and peculiar quirks, coupled with the mysterious nature of her writing, leave much to the imagination.  Aside from the obvious strength of her body of work, much of the interest in her and her poetry comes from the lack of information available about her life.  Much has been argued back and forth about the role an artist’s life plays in their work, or “separating the art from the artist”.  In the case of Dickinson, it is difficult to tie real-life events to her poetry without a fair degree of guesswork.  For this project, I attempted to take the little information available about her and draw a connection between two years of her life (1864 and 1865) and her poetry through word usage.

My interest in these two years specifically was due to the change in her writing (as well as some technical difficulties, which will be addressed in more detail in the DH Process Blog).  According to Thomas H. Johnson (who was responsible for the numbering of Dickinson’s poems), Dickinson’s most productive years in terms of poems written were the early 1860s.  Johnson claims that “Dickinson continued to write poetry [after 1865], but never again with the urgency she experienced in the early 1860s, when she fully developed her ‘flood subjects’ on the themes of living and dying” (viii).  By analyzing 1864 and 1865, I hoped to find a change in her word choice that would signal a change in her writing and help explain the decrease in poetic output: in 1864, Dickinson wrote one hundred and seventy-two poems, and in 1865, eighty-four (Johnson viii).  After running the entire text of her poetry from each respective year through Voyant, I produced two graphs that display the fifteen most used words in each of the two years.
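For readers who want to reproduce this kind of count outside Voyant, the following is a minimal Python sketch of a "most used words" tally.  It is not Voyant's method: the function name `top_words`, the tiny stopword list, and the sample sentence are all illustrative assumptions, and Voyant's own tokenization and stopword handling will give somewhat different results.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real analysis would use a much longer one.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "it", "that", "but"}

def top_words(text, n=15, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and straight apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Made-up sample text, not a line from the corpus.
sample = "the day and the sun and the day"
print(top_words(sample, n=2))
```

Running the same function on the full text of each year, with n=15, would give a list comparable to the one the graphs display.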

Anyone with a passing interest in Dickinson will know about her fascination with death, so “death” appearing in both years is no surprise.  In fact, the two years share several words: day, death, till, sun, unto, like, light, and little.  Because the number of poems differed so much between the years, raw word counts were hard to compare, so I converted each count into a percentage of that year’s total word count.  For example, in 1864, “day” was used thirty times, which makes that word 0.335% of the total words used.  In 1865, the word was used seven times, or 0.207% of the total words used.  Dickinson’s language is very diverse, as the small percentages show; however, changes can still be measured between the two years.
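The conversion from raw counts to percentages can be sketched in a few lines of Python.  The function name `percent_of_total` is hypothetical, and the yearly word totals are assumptions: the essay does not state them, so the values below are back-calculated from the reported counts and percentages (roughly 8,955 words for 1864 and 3,382 for 1865) and should be treated as approximations.

```python
def percent_of_total(count, total_words):
    """Express a raw word count as a percentage of all words in the corpus."""
    return count / total_words * 100

# Assumed totals, back-calculated from the figures reported for "day";
# they are not taken from the Voyant output itself.
TOTAL_1864 = 8955
TOTAL_1865 = 3382

day_1864 = percent_of_total(30, TOTAL_1864)
day_1865 = percent_of_total(7, TOTAL_1865)

print(f"1864: {day_1864:.3f}%")
print(f"1865: {day_1865:.3f}%")
```

Dividing by each year's own total is what makes the two years comparable: thirty uses against seven looks like a steep drop, but as a share of all words the decline is from about 0.335% to about 0.207%.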

Of these words, the most telling would be the nouns: day, death, sun, and light.  As potential symbols, these words often express moods and feelings when used in poetry.  However, the only major change was in the word “day,” which was used less in 1865.  Aside from “sun,” the other words’ usage increased.  1865 marked the beginning of a decline in the number of poems Dickinson wrote, but the information here does not seem to signify any sort of drastic change in her writing.  Death is still a significant topic, while the other words are situational and likely more dependent on context to ascertain their meanings.

Unfortunately, after analyzing this data, I find it difficult to come to any sort of conclusion about Dickinson’s life or her poetry.  My complications with transcribing Dickinson’s poems into Voyant severely limited the amount of data I had access to (I initially intended to use all of her poetry, rather than two years’ worth).  Fractions of a percent do not seem very telling for a greater picture of how Dickinson’s life affected her poetry.  Additionally, looking only for specific words severely limits the information available about words that could be used in multiple contexts.  For example, “day” might be associated with light, but it could also have a negative connotation if it is used in another manner.  This is an inherent flaw with searching for specific words.  Perhaps in a large-scale search across her entire body of work, this sort of analysis could be useful, but with a limited selection, it leaves too much room for ambiguity.  However, I can reaffirm that Dickinson’s preoccupation with death remains strong throughout the years I explored.


What text did you choose and why?

I chose Emily Dickinson’s poetry.  Initially I wanted to work with her entire collection of poetry, but due to time constraints I was forced to narrow it down to the years 1864 and 1865.  I have always felt a personal connection with Dickinson’s work.  Her language speaks to me, and I find her ruminations on death interesting, so I wanted to explore more of her writing from a digital humanities perspective.

What hypothesis or research question did you start with?  How did you come up with that?

Initially, I wanted to look at the numbering of Dickinson’s poems and analyze how it correlated with events that occurred during her lifetime.  A healthy amount of guesswork would have been involved due to a lack of concrete dates on any of Dickinson’s poems, but I planned to look at poems from the same years in which significant events occurred and see if there was any reflection of those events in her writing.  I came up with this because I have always wondered about the connection between an artist and their art.  For example, when an artist does something morally suspect, we tend to re-examine their work in a negative light.  It seemed contradictory to me that the public’s opinion of a piece of art could change from a positive one to a negative one when the art itself is unchanging.

What tools did you choose to use and why?

I only used Voyant, simply because I only needed a program that could tell me the frequency of certain words in a body of text.  Given that the chronology of her poetry is based on guesswork, I felt any kind of analysis of changing word use over time within a single year would not be accurate.  I intended to analyze change over time across years of her work, as I felt that Johnson was more likely to have identified the correct year for a poem than the specific order of every poem.

What steps in order did you take to complete this project?

After choosing a topic, I did some research on which years of Dickinson’s poetry I should start with.  The Johnson collection mentioned her productive years being the early 1860s, so I decided I would start with the year that he considered a transition into a different stage of her writing.  I hypothesized the most change would occur in this year.  After narrowing my topic to 1864 and 1865, I copied the text into Voyant and used it to analyze the word usage.  Following this, I put the data into graphs and attempted to find some sort of meaning in the data.

What challenges did you come across? What kinds of decisions did you have to make?  How did you work around them, and how did they shape the final outcome?

I felt the challenges I faced had the greatest effect on my project.  When searching for a complete collection of Dickinson’s work, I decided that I wanted the Johnson version of her poems, as they seemed the most accurate to what Dickinson originally wrote.  Unfortunately, the only available form of these poems was a PDF.  This meant I could not copy the data into Voyant without the text being butchered (certain letters would often be misread; for example, some Ms became Ns, but it was not consistent), so I was forced to copy each poem over by hand.  This took an extremely long time, much longer than I anticipated.  I did my best to avoid any human error in the transcription, but with the mass of poetry I transcribed, it is likely that at least some mistakes were made.  Initially I intended to analyze all of Dickinson’s poems, but I was forced to narrow down to two years due to time constraints.  This severely changed the outcome of the project.

How do you feel about the project that you made?  Does it meet with your expectations?

I feel that my project could be better.  Had I been aware of the amount of time it would take to retype all of Dickinson’s poems, I would have allocated much more time to it.  Because I was not able to include all of Dickinson’s poems, I was forced to change my topic and was left with a narrower range of data that did not yield as much of an answer to my hypothesis as I had hoped.

If you had unlimited time and energy to pursue this project, what things would you do differently? What questions still remain unanswered?

Firstly, I would put all of Dickinson’s poems into Voyant.  I would also do extensive research on Dickinson’s life to try to gain as much insight as possible so I could relate the information gathered from her poems to her life more clearly.  Additionally, I would research the topic of my project itself to see if any work has already been done on the subject.  Perhaps I could build upon someone else’s work rather than start from scratch.  From my research, I feel the question of how much Dickinson’s life affected her poetry is still mostly unanswered.  While common knowledge tells me that the obvious answer is that Dickinson’s life played a major role in her writing, I wanted to find specific changes in word use to back this claim up.

How could you imagine someone else building on or extending the work you started?

I imagine there is a large psychological component to my work that I have not explored.  How the mind is affected by its surroundings and the psychology behind writing and expression would play a large role in understanding Dickinson.