
April 24, 2017

How do we support those who wish to push beyond what they can learn from the languages that they know? New developments in Digital Humanities offer some intriguing avenues for dealing with scholarly material in unfamiliar languages, even if present achievements only highlight more challenges. In the following visualization, David Mimno of Cornell and Thomas Koentges of Leipzig have identified recurring clusters of words in a collection of Greek Christian Church Fathers. The works of these men were produced over more than a thousand years and amount to more than 30 million words. I do not think many specialists in Christian Church history have read this entire corpus, and I do not believe that any human being has ever been able to read a collection this large critically—it is just too big.

But what about scholars in, say, Iran who wish to understand how Christian thought evolved and wish to do so by analyzing the actual textual data for themselves? The words listed above are all in Greek, and only a tiny fraction of the 30+ million-word collection is available in translation into any one modern language. How do scholars in Iran work with this data? Do they simply depend upon the articles and books of their colleagues, who are experts in patristic Greek?

Consider an example that offers different linguistic challenges.

The visualization above shows where one book quotes another and the quotation has been automatically detected. We are able to scan millions of documents and see which parts of which texts are quoted at different points in time. You do not have to be a Digital Humanist, or even fond of technology, to recognize the importance of a system that can tell you which parts of the plays of Shakespeare, or which Suras from the Koran, are quoted in subsequent literature.

But the results above are, of course, in Arabic and reflect the development of Islamicate culture. How do we support people who are not specialists in Classical Arabic, but who also want to go beyond what they can read in articles and books composed in the European languages that they do understand?
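The kind of automatic quotation detection described above can be approximated, in a very reduced form, by looking for runs of words that two texts share. The sketch below is only an illustration of that general idea, not the method used to produce the visualization: the function names `word_ngrams` and `shared_passages`, the window size `n=4`, and the second example sentence are all assumptions made up for this example.

```python
import re


def word_ngrams(text, n=4):
    """Return the set of n-word sequences (shingles) found in a text."""
    # Lowercase and keep only word characters so punctuation does not block a match.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(source, target, n=4):
    """List the n-word sequences that occur in both texts, in alphabetical order."""
    common = word_ngrams(source, n) & word_ngrams(target, n)
    return sorted(" ".join(gram) for gram in common)


# A line from Hamlet and an invented later sentence that echoes it.
hamlet = "To be, or not to be, that is the question"
later_text = "As Hamlet says, to be or not to be is the real dilemma."

print(shared_passages(hamlet, later_text))
# ['be or not to', 'or not to be', 'to be or not']
```

Exact matching of this kind misses paraphrase, spelling variation, and the rich morphology of languages such as Arabic or Greek, so a production system would need to normalize or lemmatize the words first.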

Individuals are already exploiting technology in novel ways to open up literary sources to new audiences. On the following YouTube page we find a poem by Rumi that is read aloud with an accompanying textual transcription and a translation into another modern language (in this case, English).

By adding sound, this page goes beyond what is possible in a traditional edition, which prints a translation on the page facing the source text.

With rich linguistic annotation and a basic understanding of grammar, we can do quite a bit with texts in a language that we do not understand. The Leipzig Glossing Rules are designed for linguists who may have to work with hundreds and even thousands of languages; they provide one framework for such annotation.

The Alpheios Project has begun to provide this level of annotation with digital texts. In the first visualization, a reader mouses over a word in the Greek and the corresponding English is highlighted.

Here, the same data is visualized as an interlinear translation, with the English below the Greek.

Here we add morphological information, as well as a brief dictionary entry.

We might at this point choose to examine the grammatical paradigm to which this particular word belongs:

or, we might explore the (in this case) brief dictionary entry that goes somewhat beyond the short definition initially offered.

Alternatively, we could shift to syntax and see the precise role that the highlighted word plays in this sentence.
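The interlinear view with morphological labels described above can be sketched in a few lines of code. The example below is a minimal illustration, not the Alpheios implementation: the function name `interlinear` is hypothetical, the Greek is the transliterated opening of John 1:1, and the glosses are written by hand using category abbreviations in the style of the Leipzig Glossing Rules (DAT, NOM, SG, IMPF, and so on).

```python
import unicodedata


def interlinear(tokens, glosses, gap=2):
    """Lay out source words and their glosses in vertically aligned columns."""
    if len(tokens) != len(glosses):
        raise ValueError("each source word needs exactly one gloss")
    # Normalize so that accented letters count as one character when measuring width.
    tokens = [unicodedata.normalize("NFC", t) for t in tokens]
    glosses = [unicodedata.normalize("NFC", g) for g in glosses]
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(t), len(g)) for t, g in zip(tokens, glosses)]
    sep = " " * gap
    top = sep.join(t.ljust(w) for t, w in zip(tokens, widths)).rstrip()
    bottom = sep.join(g.ljust(w) for g, w in zip(glosses, widths)).rstrip()
    return top + "\n" + bottom


# John 1:1, "In the beginning was the Word", in transliteration.
greek = ["en", "archêi", "ên", "ho", "logos"]
gloss = ["in", "beginning.DAT.SG", "be.IMPF.3SG", "the.NOM.SG.M", "word.NOM.SG"]

print(interlinear(greek, gloss))
```

Because each gloss sits directly under its word, a reader with no Greek can still see that the word order of the source differs from that of the English translation.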

[pullquote]There is no substitute for fluency or for the sensibilities that we cultivate when we study a language, literature and culture over years and decades, but if we add annotations such as those above, then we are able to do quite a bit more with a source text than we ever could in print culture.[/pullquote]

Consider the problem of relying upon a modern language translation. English translations conventionally translate the Greek words erôs and agapê with the same English word, “love.” The Greek word erôs designates physical desire and is common in Plato. By contrast, agapê is common in the Christian New Testament and designates a kind of idealized Christian love that may be difficult to describe but that is definitely not sexual. By tracking how English translations represent Greek words using the Perseus Dynamic Lexicon, we can begin to push beyond the flat, previously impenetrable surface of the translation and to see the language beyond.

In the examples above, we compare the noun erôs and agapaô, the verb related to agapê. If readers go on to look at a half-dozen passages in Plato and a half-dozen passages in the New Testament, the difference between erôs and agapê will quickly emerge. I do this regularly with students who know no Greek. These students learn two lessons: first, they learn about the semantics of two particular Greek words; second, they realize that they have the power to engage directly with a language that they have never studied and to form their own conclusions, however provisional these conclusions may be.
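The idea of tracking how translations render particular Greek words can be sketched as a simple tally over word-aligned pairs. This is only an illustration under stated assumptions: the function names `translation_profile` and `ambiguous_renderings` are hypothetical, and the aligned pairs are a tiny hand-made sample for demonstration, not data drawn from the Perseus Dynamic Lexicon.

```python
from collections import Counter, defaultdict


def translation_profile(aligned_pairs):
    """Count how often each source lemma is rendered by each English word."""
    profile = defaultdict(Counter)
    for lemma, english in aligned_pairs:
        profile[lemma][english.lower()] += 1
    return dict(profile)


def ambiguous_renderings(profile):
    """Find English words that stand in for more than one source lemma."""
    sources = defaultdict(set)
    for lemma, counts in profile.items():
        for english in counts:
            sources[english].add(lemma)
    return {eng: sorted(lemmas) for eng, lemmas in sources.items() if len(lemmas) > 1}


# Invented sample of (Greek lemma, English rendering) pairs, for illustration only.
pairs = [
    ("erôs", "love"),
    ("erôs", "desire"),
    ("erôs", "love"),
    ("agapê", "love"),
    ("agapê", "charity"),
]

profile = translation_profile(pairs)
print(profile["erôs"].most_common())  # [('love', 2), ('desire', 1)]
print(ambiguous_renderings(profile))  # {'love': ['agapê', 'erôs']}
```

The second result is the point of the exercise: a single English word such as "love" flags a place where the translation hides a distinction that the source language makes, and where a reader should go back to the passages themselves.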

The examples above illustrate new ways in which we can interact with source materials in an unfamiliar language. Most of the work on which we build comes from corpus and computational linguistics, but the applications to broader textual analysis should be clear, especially for the study of short, dense compositions (such as poems of Rumi or Hafez) to which readers traditionally devote a great deal of time, whether they are Anglophones reading poems by Keats or Persian speakers reading Hafez. Serious students will learn to compare multiple passages and multiple poems, and will begin to detect some of those associations between poems or variations on standard patterns that characterize literature in any language.

(Header Image: “Babel” by Chris Murtagh via Flickr.com. Licensed under CC BY 2.0.)


Authors

Gregory Crane is Winnick Family Chair of Technology and Entrepreneurship at Tufts University, Alexander von Humboldt Professor of Digital Humanities at University of Leipzig, and editor-in-chief of the Perseus Project. He has published on a wide range of ancient Greek authors and has a long-standing interest in the relationship between the humanities and digital technology. gregory.crane@tufts.edu