Foundations of Analysis: Part 1

On the Shoulders of Giants and Other Researchers of Above Average Height

[Image: shoulders4]

In the introductory post, I indicated that most of my research in Pop Culture derives its foundation from the field of Digital Humanities (DH) and the works of “giants” in the field like Lev Manovich and Franco Moretti. In this post, I want to put this foundation in context, discussing some of the key ideas and the researchers behind those ideas.

For the record, Manovich and Moretti are #3 and #4 in the picture above. The identity of the others is revealed below. That’s me “standing on their shoulders.” You may have heard the phrase “If I have seen further, it is by standing on the shoulders of giants,” often attributed to Sir Isaac Newton, or read Stephen Hawking’s book On the Shoulders of Giants. Matthew Jockers, one of the leaders in DH (#5 above and discussed at the end of the post), also noted in describing the state of DH’s union that, “In 2012 we stand upon the shoulders of giants, and the view from the top is breath taking.” Actually, the original phrase, and the one quoted in the picture above, was more like “A dwarf standing on the shoulders of a giant may see further than a giant himself.” This phrasing dates to sometime in the 12th century, well before political correctness. The term was meant to be metaphorical, referring to someone with mortal skills. For me it’s both metaphorical and literal, since I’m 3 standard deviations to the left of the average height for US males.

Roots of Digital Humanities

The humanities encompass a variety of disciplines focused on the study of human culture.  Among the various disciplines are literature, history, philosophy, religion, languages, the performing arts, and the visual arts. Besides their focus on human culture, the humanities traditionally have been distinguished (from the sciences) by the methods they employ, relying primarily on “reflective assessment, introspection and speculation” underpinned by a large element of historical (textual) data.

In recent years a growing segment of humanists (and researchers in other disciplines) have also begun to apply some of the newer analytical and visualization methods to a range of traditional humanist topics. The application of these newer tools and techniques falls under the rubric of digital humanities (DH). The DH label was created at the beginning of the 21st century to distinguish the field from its earlier counterpart, humanities computing (which encompassed a variety of efforts aimed at standardizing the digital encoding of humanities texts), and to avoid any confusion that might be raised by labeling these newer activities as digitized humanities.

Google Trends and Ngram - Digital Humanities and Humanities Computing

Basically, digitized humanities refers to the digitization of resources so they can be accessed by digital means (e.g. digitizing images so they can be viewed online or digitizing an image catalog so it can be queried online). In contrast, DH refers to humanities study and research performed with the aid of advanced computational, analytical, and visualization techniques enabled by digital technology. Digitization is simply one component of DH, albeit a critical component.

“The ‘IZEs’ have it” or should that be “The ‘IZations’ have it”

Speaking of “digitization,” have you ever noticed the proliferation of “izes” and “izations” in today’s rhetoric, especially IT rhetoric? As John Burkardt, a research associate in the Department of Scientific Computing at Florida State University, reminds us, we owe this phenomenon to the “16th century poet Thomas Nashe for inventing the suffix –ize as a means of easily generating new and longer words” and to the (Madison Ave.) ad-speak of the 1950s, which relied on (among other things) the use of -ize and -ization to create new verbs and nouns. Burkardt provides a long list of (393) examples, many of which are current day. In the world of IT, “izes” and “izations” abound. For instance, a recent Tweet highlighted “The digitization and ‘cloud’ization of data #digital data.” Similarly, recent articles about the Internet of Things (IoT) have opined that the IoT is about “the ‘dataization’ of our bodies, ourselves, and our environment.”

In the context of this posting, some of the key “izes” and “izations” impacting the world of DH include:

  • Digitize and digitization – Converting cultural artifacts (e.g. text documents, paintings, photographs, song lyrics) to digital form.
  • Webize and webization – Conversion to digital form often goes hand-in-hand with the process of adapting digitized artifacts to the Web (or Internet).
  • Dataize and dataization – “…translating digitized cultural artifacts into ‘data’ which captures their content, form and use.”
  • Algorithmize and algorithmization – Converting an informal description of a process or a procedure into an algorithm.
  • (Data or Information) Visualize and visualization – Presenting data or information in a pictorial or graphic form.
  • Artize and artization – Emphasizing the artistic quality of an information or data visualization.

All but the last one of these “izes” is “real,” meaning that I was able to find definitions for the words on the Web. The last one is sort of a figment of my imagination. I say “sort of” because the phenomenon is real and important, even though the word doesn’t and probably shouldn’t exist. Lev Manovich calls this phenomenon “artistic visualization,” meaning visualizations that place a heavy emphasis on their artistic dimension. I plan to cover this dimension in detail in future postings. A couple of places where you can see a number of examples are Manuel Lima’s visualcomplexity.com and Mike Bostock’s D3 gallery.

In a crude sense, the sequence of “izes” listed above provides an outline for doing research in DH. That is to say, for some (pop) cultural phenomenon of interest and its associated artifacts, the artifacts first have to be digitized and maybe webized, then dataized, then algorithmized, and finally visualized (at least in a primitive sense). So, the question is: how do you do this? Some of the answers are aimed at specific types of cultural phenomena and artifacts, while others rest on very general frameworks for doing data analysis, data science, or (data) visualization. The section below briefly discusses a couple of instances of the former, while general frameworks are described in Part II of the post.
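To make the sequence of “izes” a little more concrete, here is a minimal Python sketch, assuming the artifact has already been digitized as a plain-text string. The function names (dataize, top_words, visualize) and the sample lyric are invented for illustration; they are not taken from any DH toolkit, and a real project would use far richer data and graphics.

```python
import re
from collections import Counter


def dataize(text):
    """Dataize: turn a digitized text into data (a count of each lowercase word)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def top_words(counts, k):
    """Algorithmize: rank words by count, breaking ties alphabetically, and keep k."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def visualize(ranked):
    """Visualize (in a primitive sense): one text bar per word."""
    return "\n".join(f"{word:<10} {'#' * n}" for word, n in ranked)


# Hypothetical digitized artifact: an invented lyric, not a real song.
lyric = "Dance all night, dance all day, dance away the night"

ranked = top_words(dataize(lyric), 3)
chart = visualize(ranked)
print(chart)
```

Each step is kept as a separate function so that any one of them (say, the visualization) can be swapped out without touching the others.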

Distant Analysis: Reading, Viewing and Listening

Close Reading

Much of the research and analysis conducted in the humanities revolves around the examination of textual information utilizing a method known as close reading. Close reading involves concentrated, thoughtful and critical analysis that focuses on significant details garnered from one text or a small sample of texts (e.g. a novel or collection of novels) with an eye toward understanding key ideas and events, as well as the underlying form (words and structure). While close reading remains the primary method in literary analysis and criticism, it is not without its drawbacks. Given its attention to detail, as well as its reliance on the skills of the individual reader, close reading makes it difficult: (1) to replicate results; (2) to ascertain general patterns within a single text or among a collection of texts; and (3) to generalize findings from the analysis of a single text or a small (non-random) sample of texts to some larger population of which the analyzed text or sample of texts is a small part.

Distant Reading

This is where distant reading can come into play.  The term distant reading was coined by the literary expert Franco Moretti in 2000 to advocate the use of various statistical, data analysis, and visualization techniques to understand the aggregate patterns of large(r) collections of text at a “distance.” Towards this end, he suggested that the (types of) “graphs used in quantitative history, the maps from geography, and trees from evolutionary theory” could serve to “reduce and abstract” the text within the collections of interest and to “place the literary field literally in front of our eyes — and show us how little we know about it” (Moretti 2007). Obviously, these are fighting words to the average humanist, which means that distant reading has an abundance of critics (both literary and otherwise).

Closely related to the concept of distant reading is the notion of culturomics. As Wikipedia notes, this is a form of “computational lexicology” (I’ll let you look that one up) that studies human culture and cultural trends by means of the quantitative analysis of words and phrases in a very large corpus of digitized texts. The best known example of culturomics is the Google Labs’ Ngram Viewer project, which uses n-grams to analyze the Google Books digital library for cultural patterns in language use over time. Some interesting examples of the types of analysis that can be performed are provided in a research article by Jean-Baptiste Michel (photo #6) and Erez Lieberman Aiden (photo #2) et al. that was entitled “Quantitative Analysis of Culture Using Millions of Digitized Books” and appeared in the January 2011 issue of Science.
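For readers who have not run into n-grams before, the idea of tracking a phrase over time can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the tiny corpus below is invented, tokenization is a naive whitespace split, and the helper names (ngrams, relative_frequency_by_year) are hypothetical. It is not how the Ngram Viewer itself is implemented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(corpus, phrase):
    """For each year, the share of that year's n-grams that match `phrase`."""
    target = tuple(phrase.lower().split())
    n = len(target)
    trend = {}
    for year, texts in corpus.items():
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        trend[year] = counts[target] / total if total else 0.0
    return trend


# Invented toy corpus keyed by year; real studies use millions of books.
toy_corpus = {
    1990: ["humanities computing is a small field",
           "the humanities computing lab"],
    2010: ["digital humanities is a growing field",
           "humanities computing became digital humanities"],
}

trend = relative_frequency_by_year(toy_corpus, "digital humanities")
for year in sorted(trend):
    print(year, round(trend[year], 3))
```

Dividing by the total number of n-grams in each year matters because the amount of digitized text differs from year to year; raw counts alone would mostly reflect corpus size.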

While the notions of distant reading and culturomics highlight the need for analyzing larger collections of texts, a number of DH research projects conducted in the past 10 years (since the publication of Moretti’s Graphs, Maps and Trees in 2005) have focused on individual texts or smaller samples of texts, employing the analytical and visualization techniques advocated by Moretti to enhance or supplement a close reading exercise. A very recent report by Jänicke (photo #1) et al. (2015) surveyed the “close and distant reading visualization techniques” utilized by DH research papers published in key visualization and digital humanities journals from 2005-2014. The close reading visualizations used things like color, font size, glyphs or connection lines to highlight various features of the text or the reader’s annotations, while the distant reading visualizations involved the usual suspects: charts, graphs, maps, networks, etc. Below is a table displaying the relationship between the type(s) of visualization used (close, distant or both) by the size of the text sample being analyzed (single text, small collection, large corpus).

close-distant viz techniques

Among other things, the results indicate that: (1) although all of these studies fall under the DH umbrella, a sizeable share used either a single text or a smaller collection of texts (37 out of 100, or 37%); and (2) within those studies that employed either a single text or a smaller collection of texts, almost half used either distant reading visualization techniques or some combination of both close and distant visualization techniques (18 out of 37, or about 49%).

A Word about Distant Viewing and Listening

Even in humanistic fields outside of literature, like the visual arts where the cultural artifacts of interest (e.g. paintings or sculptures) are non-textual, something akin to close reading and text play critical roles.  That is, single pieces of art, single artists, or even specific styles or movements are the focal point of much of the scholarly research in this area, and much of the scholarly communication is delivered in textual form as “catalogues, treatises, monographs, articles and books.”  Increasingly, the tools and techniques of digital humanities are also being applied to these non-textual areas – what we might call distant viewing and distant listening of much larger, digital collections of art, artists, music and musicians.

In the world of distant viewing, the leading light is Lev Manovich. Currently, Manovich is a Professor at The Graduate Center, City University of New York (CUNY) and Director of the Software Studies Initiative, which has labs at CUNY and the University of California, San Diego (UCSD). His focus, and the focus of the labs, is on the intertwined topics of software (as a cultural phenomenon and artifact) and cultural analytics. Cultural Analytics is defined as “the use of computational methods for the analysis of massive cultural data sets and flows.” Basically, it’s the subset of Digital Humanities focused on “big (cultural) data,” especially really big image data sets. Manovich and his colleagues at the Software Studies Initiative are very prolific, so it’s hard to pinpoint one article or book that summarizes their work. However, Manovich’s article “Museum without Walls, Art History without Names: Visualization Methods for Humanities and Media Studies” does an excellent job of summarizing a number of his articles dealing with the topics of this post, including distant and close reading.

A Final Reference

This post barely touches the world of Digital Humanities and the associated notions of distant and close reading. For a really good book on these topics, take a look at Matthew Jockers’ Macroanalysis: Digital Methods and Literary History (Topics in the Digital Humanities, 2013). As the title implies, Jockers prefers the terms microanalysis and macroanalysis (à la economics), with a bit of mesoanalysis thrown in between, instead of standard terms like close and distant reading. He’s also written a how-to book, Text Analysis with R for Students of Literature (Quantitative Methods in the Humanities and Social Sciences), that details how to perform micro-, meso- and macroanalysis of text.

As an aside, Jockers (photo #5) was a colleague of Moretti’s at Stanford University and with Moretti was co-founder and co-director of the Stanford Literary Lab.  Today he is an Associate Professor of English at the University of Nebraska, Lincoln, Faculty Fellow in the Center for Digital Research in the Humanities and Director of the Nebraska Literary Lab.

Resources

References

Michel, J.-B.*, E. Lieberman Aiden* et al. (2011). “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science. [* joint lead authors]

Bostock, M. D3 Gallery.

Burkardt, J. Word Play.

Lima, M. Visual Complexity.

Jänicke, S. et al. (2015). “On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges.” Proceedings of EuroVis 2015 STARs.

Jockers, M. (2013) Macroanalysis: Digital Methods and Literary History  – Topics in the Digital Humanities. University of Illinois Press.

Jockers, M. (2014). Text Analysis with R for Students of Literature (Quantitative Methods in the Humanities and Social Sciences). Springer.

Manovich, L. (2013). “Museum without Walls, Art History without Names: Visualization Methods for Humanities and Media Studies.” In Oxford Handbook of Sound and Image in Digital Media. Oxford University Press.

Moretti, Franco (2007). Graphs, Maps and Trees:  Abstract Models for Literary History. London: Verso.

Moretti, Franco (2013). Distant Reading.  London: Verso.

Places

Center for Digital Research, University of Nebraska at Lincoln.

Stanford Literary Lab, Stanford University.

Culturomics.

Software Studies Initiative.