What Can Computers Do for Philology? A Case Study in Pseudo-Seneca

Pramit Chaudhuri and Joseph P. Dexter

This paper outlines a sample of recent computational approaches aimed at aiding traditional interpretive work in literary criticism. Taking the corpus of Senecan tragedies as a case study, the paper demonstrates the power of computation both to identify standard stylistic features far more rapidly than manual methods and to highlight features too small and numerous for human readers to register with any comprehensiveness. As an example of the hermeneutic power of this computational method, we argue that the resulting numerical figures offer a more objective account of the distinctiveness of the two outlying works in the corpus, the Octavia and the Hercules Oetaeus (cf. Billerbeck). In particular, we show how quantitative calculations can 1) contrast the verse style of the Octavia against that of the eight main tragedies, and 2) reveal sound preferences that highlight important aspects of diction in each play.

Stylometry, especially the measurement of word frequencies or rhetorical figures, has long been used to bolster arguments for attribution or dating (cf. Kitto, Vickers, Holmes). Fitch, for instance, counted sense-pauses in Senecan tragedy to establish a stylistic progression, and hence a relative dating, of the plays. Such work can be accomplished much faster and at larger scale using simple computational methods that have yet to be broadly applied by classicists. Extending the analysis of Fitch and Ferri, we find computationally that the ratio of intra-line to total sense-pauses is lowest in the Octavia, corroborating the view that the author of the Octavia imitated the style of early Seneca. By contrast, instances of enjambment, which occur infrequently and therefore can be tracked by hand more easily than all sense-pauses, do not differentiate the Octavia from the main corpus.
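
The ratio of intra-line to total sense-pauses can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that strong punctuation in an edited text (. ; : ? !) marks a sense-pause and that each string in the input is one verse; the function name sense_pause_ratio is hypothetical.

```python
import re

# Assumption: strong punctuation in the edited text marks a sense-pause.
PAUSE = re.compile(r"[.;:?!]")

# Closing quotation marks and brackets that may follow line-final punctuation.
TRAILING = " \t\r\n\"')]\u2019\u201d\u00bb"


def sense_pause_ratio(lines):
    """Return the ratio of intra-line sense-pauses to all sense-pauses."""
    intra = 0
    total = 0
    for line in lines:
        verse = line.rstrip(TRAILING)
        for match in PAUSE.finditer(verse):
            total += 1
            # A pause is intra-line when more text follows it in the same verse.
            if match.end() < len(verse):
                intra += 1
    return intra / total if total else 0.0
```

Running the function on each play in turn yields one figure per play, which can then be ranked. The result depends on the punctuation of the edition used, so the same edition should be used across the whole corpus.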

The benefits of such methods go further than automation of existing approaches. Following Michel et al., researchers in literary studies have taken the frequency of the n-gram (a block of n characters or words) as a unit of analysis across massive corpora, such as the content of Google Books (cf. Hughes et al.). Within the Digital Humanities, the n-gram has typically been used to study large-scale trends in word usage or as an index of stylistic attribution and imitation (e.g., Forstall and Scheirer 2010, Scheirer and Forstall 2014). N-gram analysis has rarely been used, however, to address subtler literary critical questions.

Charting character-level n-grams in the Senecan tragic corpus reveals two important results for the Octavia and Oetaeus. The Octavia shows a higher frequency of the n-grams ‘tri’ and ‘tris’ compared to the other nine plays. Cross-referencing these results with a wordlist of the text reveals frequent usage of parts of tristis (noted by Helm and Ferri) and noster. We argue that both words play an important role in the mood and rhetorical purpose of the work and that the initial n-gram analysis is important in identifying these aspects of the work’s diction. Whereas manual methods of tabulating word frequencies are time-consuming, examination of n-grams instantly highlights the disproportionate presence of the ‘tri/tris’ sound in the Octavia and thereby enables the identification and interpretive contextualisation of the relevant words.
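
The two steps described above, counting character n-grams and then cross-referencing a salient n-gram against the words that contain it, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code. It assumes plain lowercase-able Latin text without diacritics and counts n-grams within words only; whether the original analysis also counted across word boundaries is not stated. All function names are hypothetical.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams within words (lowercased, letters a-z only)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def relative_frequency(text, gram):
    """Occurrences of `gram` per 1,000 n-grams of the same length."""
    counts = char_ngrams(text, len(gram))
    total = sum(counts.values())
    return 1000 * counts[gram] / total if total else 0.0


def words_containing(text, gram):
    """Tally the word forms that contain `gram`, for cross-referencing."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if gram in word)
```

Comparing relative_frequency(play_text, "tri") across the ten plays would flag an outlier, and words_containing(play_text, "tri") would then list the word forms responsible. Normalising per 1,000 n-grams keeps plays of different lengths comparable.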

The Oetaeus shows a higher frequency of various n-grams of the form ‘vowel + nt + vowel,’ which points to increased use of the present participle. Plotting the appearance of these n-grams across the work shows a marked preference for clusters of the ‘-nt-’ sound as against the other nine plays. We conclude that the jingle is clearly tolerable to, if not a stylistic preference of, the author of the Oetaeus and in that way distinguishes the work from the rest of the corpus. Once again, only a computational method sensitive to small-scale elements can plausibly lead the researcher to these conclusions: counting such units by hand would be both technically challenging and overly time-consuming. Computationally, however, the task is now straightforward, and such methods should become part of every philologist’s array of tools.
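
A short Python sketch shows one way to find n-grams of the form ‘vowel + nt + vowel’ and to chart their density across a work. It is an assumption-laden illustration, not the authors' method: vowels are taken to be a, e, i, o, u in a text without diacritics, each input string is one verse, and the window of 50 verses is an arbitrary choice. The names vntv_counts and density_by_window are hypothetical.

```python
import re
from collections import Counter

# A lookahead is used so that overlapping matches sharing a vowel are all found.
VNTV = re.compile(r"(?=([aeiou]nt[aeiou]))")


def vntv_counts(text):
    """Count each distinct 'vowel + nt + vowel' n-gram in the text."""
    return Counter(match.group(1) for match in VNTV.finditer(text.lower()))


def density_by_window(lines, window=50):
    """Count matches in consecutive blocks of `window` verses."""
    return [
        sum(len(VNTV.findall(line.lower())) for line in lines[i:i + window])
        for i in range(0, len(lines), window)
    ]
```

The list returned by density_by_window can be passed to any plotting library; peaks in the series correspond to clusters of the ‘-nt-’ sound.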
