We present a joint many-task language model (Godwin et al. 2016; Søgaard and Goldberg 2016; Hashimoto et al. 2016), tailored to the literary character of Ancient Greek, with its rich morphology and complex syntactic structures. In this way we can formalize the intuitive stylistic and syntactic differences across authors and genres, and potentially map the transmission history of specific texts. More specifically, the joint many-task language model presented here is trained on part-of-speech prediction as the most junior, word-level task; on translation from Greek to English as the intermediate, sentence-level task, simultaneously inducing the latent dependency tree that describes the source sentence's syntactic structure; and, lastly, on author attribution as the document-level task. The syntactic structure of the source sentence is represented as a tree whose nodes are the words of the sentence and whose directed arcs denote head-dependent relationships between words. The tree is induced without annotated training data, as in unsupervised dependency grammar induction tasks (Klein and Manning 2004; Tran and Bisk 2018). Recent work on the evaluation of latent dependency trees in neural machine translation suggests that unsupervised latent representations of syntactic dependency yield better results than similar models in which the dependency trees are either provided as input or predicted with supervision from treebank annotations (Williams et al. 2018; Havrylov et al. 2019). Our joint many-task model is trained hierarchically according to the different tasks, encouraging information flow between tasks by penalizing representations of junior tasks whose predictions are inconsistent with the senior tasks.
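As a rough illustration of this hierarchy, the sketch below wires the three tasks together in PyTorch, with each junior task's output fed upward to the more senior tasks. All module names, dimensions, and the mean-pooling document aggregation are our own illustrative assumptions rather than the model's actual implementation; the latent tree induction and the inter-task consistency penalty are omitted for brevity.

```python
import torch
import torch.nn as nn

class JointManyTaskSketch(nn.Module):
    # Hierarchical multi-task model: POS tagging (word level) feeds
    # translation (sentence level), which feeds authorship (document level).
    def __init__(self, src_vocab, tgt_vocab, n_pos, n_authors, d=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d)
        self.tgt_embed = nn.Embedding(tgt_vocab, d)
        # Word-level encoder for the most junior task (POS prediction).
        self.word_rnn = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * d, n_pos)
        # Sentence-level encoder consumes word states plus softened POS
        # predictions, so the senior tasks can see the junior output.
        self.sent_rnn = nn.LSTM(2 * d + n_pos, d, batch_first=True)
        self.decoder = nn.LSTM(d, d, batch_first=True)
        self.gen_head = nn.Linear(d, tgt_vocab)
        # Document-level head over pooled sentence representations.
        self.author_head = nn.Linear(d, n_authors)

    def forward(self, src, tgt):
        h_word, _ = self.word_rnn(self.src_embed(src))
        pos_logits = self.pos_head(h_word)                    # word-level task
        sent_in = torch.cat([h_word, pos_logits.softmax(-1)], dim=-1)
        h_sent, (h_n, c_n) = self.sent_rnn(sent_in)
        dec_out, _ = self.decoder(self.tgt_embed(tgt), (h_n, c_n))
        tgt_logits = self.gen_head(dec_out)                   # sentence-level task
        doc_vec = h_sent.mean(dim=1)                          # naive pooling
        author_logits = self.author_head(doc_vec)             # document-level task
        return pos_logits, tgt_logits, author_logits

# e.g. a batch of 2 Greek sentences with English reference tokens
model = JointManyTaskSketch(src_vocab=20000, tgt_vocab=15000, n_pos=17, n_authors=30)
src = torch.randint(0, 20000, (2, 12))
tgt = torch.randint(0, 15000, (2, 14))
pos, trans, author = model(src, tgt)
```

In training, one would sum the per-task losses and add a regularization term that discourages junior-task parameters from drifting as senior tasks are updated, in the spirit of the successive regularization of Hashimoto et al. (2016).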

Syntactic dependency trees can also be predicted with a supervision signal as part of the intermediate sentence-level task of the joint many-task model. By training the two alternative many-task architectures (one with a latent parser, one with a supervised sentence-level parser), we can compare the learned unsupervised latent dependency trees with their supervised counterparts and analyze the discrepancy between the corresponding grammars. In addition, we can compare the latent dependency representations across documents by authors writing at different times, in different dialects, geographical locations, and genres. The learned syntactic representations of the sentences can be aggregated to form structure-aware document representations, which can serve as features for the senior classification task of authorship attribution.
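To make the grammar comparison concrete, a helper along the following lines could score the unlabeled agreement between a latent parse and a supervised parse of the same sentence. The function and the example head arrays are hypothetical, shown only to fix the idea; the paper does not specify its comparison metric.

```python
from typing import Sequence

def unlabeled_agreement(heads_a: Sequence[int], heads_b: Sequence[int]) -> float:
    """Fraction of tokens assigned the same head by two dependency
    analyses of one sentence (heads are token positions; root = -1)."""
    assert len(heads_a) == len(heads_b), "analyses must cover the same tokens"
    matches = sum(a == b for a, b in zip(heads_a, heads_b))
    return matches / len(heads_a)

# Hypothetical head indices for a five-token sentence.
latent_heads = [2, 2, -1, 4, 2]       # induced by the latent parser
supervised_heads = [2, 0, -1, 4, 2]   # predicted with treebank supervision
print(unlabeled_agreement(latent_heads, supervised_heads))  # 0.8
```

Averaging such per-sentence scores over a corpus would give one coarse measure of how far the induced grammar diverges from the supervised one, which could then be broken down by author, dialect, or genre.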