In this talk I will share my experiences as a classical philologist who has learned LDA topic modelling in order to apply this method in collaborative work on a number of classical languages, including Latin, Greek, Classical Arabic, Persian and Sanskrit.

While I will give an accessible introduction to LDA topic modelling itself, I will also showcase a subset of the results obtained by applying this method to the Corpus Platonicum during my residency at the Center for Hellenic Studies (CHS).

The Corpus Platonicum is among the best-known and most influential works of antiquity. Tracing its ideas through two millennia of Greek is a daunting task that requires not only intimate knowledge of the Corpus Platonicum, itself more than 500,000 words long, but also the reading and manual analysis of several hundred million words of Greek.

Philology is the art of reading slowly, and I hope to demonstrate that, while this is a challenge when facing large corpora, topic modelling can help us regain this quality by finding the passages that we really want to read. Furthermore, topic modelling can now be employed to disclose complex patterns and intratextuality in the Corpus Platonicum, and can also help train machines to detect Platonic thinking in a huge corpus of unclassified ancient Greek text.

The Corpus Platonicum is available through the Perseus Digital Library (PerseusDL) at Tufts University and the Open Greek and Latin project (OGL) at the University of Leipzig. Over the last 30 years, PerseusDL has digitised and curated most of the Corpus Platonicum, and, in a move to a CTS-inspired citation scheme, OGL has performed several transformations of the data. In collaboration with CHS, the Definitiones of the Corpus Platonicum have also been made machine-actionable. For the first time, it is now possible to apply computational analyses to the whole Corpus Platonicum.

Building on prior topic-modelling work and on traditional philological training, I’d like to present a subset of the results of the first thorough multi-method computational analysis of the whole Corpus Platonicum. I will place particular emphasis on the application of LDA topic modelling to detect verbatim and non-verbatim re-use and reception of Platonic thought in ancient texts.