Trees into Nets: Network-based Approaches to Ancient Greek Treebanks

Francesco Mambrini and Marco Passarotti

(Additional presenter: Marco Passarotti, Università Cattolica del Sacro Cuore, Milan)

"Networks" are rapidly becoming the dominant model for scientific knowledge. Networks, it has been argued, "will drive the fundamental questions that form our view of the world in the coming era" [1: 7].

Very recently, linguistics too has been touched by this paradigm shift. Graphs based on synonyms or co-occurring words (where words are the nodes, and proximity in a given text creates links between them) have been compared to the complex networks studied in computer science, physics or sociology [2]. Yet co-occurrence is only a superficial phenomenon that hardly accounts for the structure of a language. By encoding information on the syntactic relations between words, dependency treebanks can drastically improve the quality of the resources available for network analysis [3, 4, 5].

In our talk, we will apply this new approach to the domain of Ancient Greek literary texts for the first time. We will take our data from the Ancient Greek Dependency Treebank (AGDT) and PROIEL, two dependency-based treebanks that include texts from the Archaic and Classical ages with complete morpho-syntactic annotation. From a subset of these collections we will generate networks in which the lemmata are the nodes and the dependency relations between them are the (directed) edges. Difficult authors such as Sophocles (fig. 1) and Aeschylus (from the AGDT) will be analyzed using the standard metrics that are employed to describe the structure of a network (its topology): average path length, clustering coefficient, and degree distribution [4].
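As an illustration of the kind of network described here, the following minimal sketch builds a lemma network from head-dependent pairs and computes the three metrics just named. It is a sketch under stated assumptions, not the authors' pipeline: the transliterated lemmata and their dependency pairs are invented toy data (not taken from the AGDT or PROIEL), all function names are hypothetical, and clustering and path length are computed on the undirected projection of the directed network, which is one common simplification among several.

```python
from collections import Counter, deque
from itertools import combinations

# Invented toy data: (head lemma, dependent lemma) pairs, transliterated.
# These are NOT drawn from the AGDT or PROIEL.
DEPENDENCIES = [
    ("lego", "anthropos"),
    ("lego", "logos"),
    ("lego", "akouo"),
    ("anthropos", "ho"),
    ("logos", "ho"),
    ("logos", "kalos"),
    ("akouo", "logos"),
    ("akouo", "anthropos"),
]


def build_network(pairs):
    """Return (directed successors, undirected neighbours) as dicts of sets."""
    successors = {}
    neighbours = {}
    for head, dep in pairs:
        if head == dep:
            continue  # ignore self-loops in this sketch
        successors.setdefault(head, set()).add(dep)
        successors.setdefault(dep, set())
        neighbours.setdefault(head, set()).add(dep)
        neighbours.setdefault(dep, set()).add(head)
    return successors, neighbours


def degree_distribution(neighbours):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(adj) for adj in neighbours.values())


def average_clustering(neighbours):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for adj in neighbours.values():
        k = len(adj)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(adj, 2) if b in neighbours[a])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)


def average_path_length(neighbours):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


successors, neighbours = build_network(DEPENDENCIES)
print("degree distribution:", dict(degree_distribution(neighbours)))
print("average clustering:", round(average_clustering(neighbours), 4))
print("average path length:", round(average_path_length(neighbours), 4))
```

On a real treebank the pairs would be read from the annotation files, and the same measures could be taken on the directed graph instead (for example, in-degree and out-degree distributions kept separate).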

The observations on the tragic poets will be compared with a similar network based on a contemporary prose author (Herodotus), whose text is partially annotated in PROIEL.

We will discuss whether even the difficult text of the Greek tragic poets complies with the model of small-world, highly clustered networks that is commonly observed in physics or sociology [1, 3, 4]. At the same time, our analysis will serve to open other questions that are crucial for the field of Classics and the Humanities. What are the peculiarities of a network representing literary works? What word-classes play the role of the highly connected "hubs"? And ultimately, can this approach tell us anything about the language or the style of a work?
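One way the question about "hubs" might be approached is sketched below: rank lemmata by total degree (in-degree plus out-degree) and tally the word classes of the top-ranked ones. This is only an illustrative sketch. The dependency pairs and the lemma-to-word-class mapping are invented toy data, the function names are hypothetical, and the cut-off for what counts as a hub (`top_n`) is an arbitrary assumption.

```python
from collections import Counter

# Invented toy data, transliterated; NOT drawn from the AGDT or PROIEL.
DEPENDENCIES = [
    ("lego", "anthropos"),
    ("lego", "logos"),
    ("lego", "akouo"),
    ("anthropos", "ho"),
    ("logos", "ho"),
    ("logos", "kalos"),
    ("akouo", "logos"),
    ("akouo", "anthropos"),
]
WORD_CLASS = {
    "lego": "verb",
    "akouo": "verb",
    "anthropos": "noun",
    "logos": "noun",
    "ho": "article",
    "kalos": "adjective",
}


def total_degree(pairs):
    """In-degree plus out-degree per lemma, counting each distinct edge once."""
    degree = Counter()
    for head, dep in set(pairs):
        degree[head] += 1
        degree[dep] += 1
    return degree


def hubs_by_word_class(pairs, word_class, top_n):
    """Tally the word classes of the top_n lemmata ranked by total degree."""
    degree = total_degree(pairs)
    # Ties are broken alphabetically so that the ranking is deterministic.
    ranked = sorted(degree, key=lambda lemma: (-degree[lemma], lemma))
    return Counter(word_class[lemma] for lemma in ranked[:top_n])


print(hubs_by_word_class(DEPENDENCIES, WORD_CLASS, top_n=4))
```

In a real study the hub threshold would more plausibly be derived from the degree distribution itself than fixed in advance.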

