Linking, publishing and evaluating language resources: The “LiLa: Linking Latin” project

Francesco Mambrini

Semantic-Web technologies and Linked-Open-Data (LOD) have considerably affected the way data are published on the web. The adoption of LOD also has important consequences for how digital scholarship is used and evaluated. By aligning data on common vocabularies and ontologies, the LOD paradigm introduces a form of standardization between projects, thus allowing for a more informed comparison. Moreover, interoperability makes projects easier to update and more discoverable, and thus more open to use and debate by a larger community. On the other hand, Linked Data rest on the principle that anyone can make any assertion about any resource: the Resource Description Framework (RDF) on which Linked Data are based “does not prevent anyone from making assertions that are nonsensical or inconsistent with other statements, or the world as people see it” [1].
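The principle quoted above can be made concrete with a small sketch. It models RDF statements as plain (subject, predicate, object) tuples instead of using an RDF library. The `example.org` URIs, the two source names, and the `conflicting_values` helper are all hypothetical and serve only as illustration. Treating several values for one property as a conflict is also an assumption, since in RDF a property may legitimately have more than one value.

```python
from collections import defaultdict

# Hypothetical URIs, for illustration only.
LEMMA = "http://example.org/lemma/amo"
HAS_POS = "http://example.org/vocab/hasPOS"

# Each source publishes its own (subject, predicate, object) statements.
statements = {
    "source_a": [(LEMMA, HAS_POS, "verb")],
    "source_b": [(LEMMA, HAS_POS, "noun")],  # nothing in the data model forbids this
}


def conflicting_values(statements_by_source):
    """Map each (subject, predicate) that has more than one distinct
    object to a dict of {object: [sources asserting it]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for source, triples in statements_by_source.items():
        for subject, predicate, obj in triples:
            seen[(subject, predicate)][obj].append(source)
    return {key: dict(objs) for key, objs in seen.items() if len(objs) > 1}


conflicts = conflicting_values(statements)
```

Both statements are equally well-formed, so any judgement about which one to trust has to come from outside the data model, for instance from provenance information about the sources.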

Recently, digital projects in Classics have productively adopted the LOD paradigm, in particular in Archaeology, Epigraphy and Ancient History (see e.g. Arachne [2], EAGLE [3][4], and Pelagios [5]). Using knowledge bases like Pleiades [6] or SNAP:DRGN [7], researchers can now connect all sorts of different data that make reference to locations or persons.

In spite of the growing interest in LOD in both Classics and Computational Linguistics [8], linguistic and textual resources for ancient languages remain largely scattered and unconnected. While several corpora with linguistic annotation (including lemmatization, PoS-tagging, and syntax) or lexical resources like dictionaries and thesauri exist, no infrastructure is in place to connect those data in a unified architecture based on LOD.

In this presentation, we intend to discuss these questions starting from our experience in the LiLa project. LiLa aims to create an open-ended, lexically-based Knowledge Base for Latin using the Linked-Data paradigm. LiLa will enable users to exploit the wealth of linguistic resources for Latin assembled so far. At the same time, the project creates a space for newly published digital resources to interact with other projects all over the web. After a short introduction to LiLa’s architecture and the resources that we plan to connect, we intend to discuss the challenges and opportunities that Semantic-Web technologies pose to how we publish, use and evaluate data related to the ancient languages. Particularly relevant to the evaluation of digital works is the potential trade-off, mentioned above, between increased publicity and lower control over data quality, which the very same principle of openness in LOD may produce.
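One way to picture a lexically-based knowledge base is to let every resource point to a shared lemma identifier, so that corpora and dictionaries meet at the lemma. The sketch below illustrates only that general idea and is not LiLa's actual data model. The lemma URI, the resource names, the record fields, and the `resources_for` helper are all hypothetical.

```python
# Hypothetical lemma identifier and records; not LiLa's actual data model.
LEMMA_URI = "http://example.org/lemma/amo"

# Tokens from two imaginary annotated corpora, each linked to a lemma.
corpus_tokens = [
    {"form": "amat", "corpus": "treebank_x", "lemma": LEMMA_URI},
    {"form": "amabam", "corpus": "corpus_y", "lemma": LEMMA_URI},
]

# An entry from an imaginary dictionary, linked to the same lemma.
dictionary_entries = [
    {"headword": "amo", "dictionary": "lexicon_z", "lemma": LEMMA_URI},
]


def resources_for(lemma_uri, *collections):
    """Collect every record, from any collection, that points at the lemma."""
    return [
        record
        for collection in collections
        for record in collection
        if record["lemma"] == lemma_uri
    ]


linked = resources_for(LEMMA_URI, corpus_tokens, dictionary_entries)
```

Because the only shared element is the lemma identifier, a newly published resource could join the network by adding links to existing lemmas, without changing the resources already connected.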

References

[1] https://www.w3.org/TR/rdf-concepts/#section-anyone

[2] https://arachne.dainst.org/

[3] https://www.eagle-network.eu/

[4] Orlandi, Silvia. 2016. Ancient Inscriptions between Citizens and Scholars: The Double Soul of the EAGLE Project. In: M. Romanello and G. Bodard (edd.), Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. DOI: https://doi.org/10.5334/bat.l

[5] http://commons.pelagios.org/

[6] https://pleiades.stoa.org/

[7] http://snapdrgn.net/

[8] Chiarcos, Christian, Sebastian Nordhoff, and Sebastian Hellmann (edd.). 2012. Linked data in linguistics: representing and connecting language data and language metadata.
