
(additional presenters: Elton Barker, The Open University, Pau de Soto, The University of Southampton, Rainer Simon, The Austrian Institute of Technology)

What do you do with a Million Links?

What use is Linked Data for the Classics? We use the term as shorthand for a middle ground between structure and scalability. A million links isn't Big Data by today's standards; yet producing and making use of semantic connections between content - i.e. the very business of interpretation - remain arguably the biggest challenges facing the Digital Humanities community. In this paper we use the Pelagios project as a means of flagging up the benefits of taking this middle path, and some of the issues it raises.

Research projects in Classics have produced a critical mass of digital resources that hold the key to a successful Linked Data approach. Perseus has pioneered the encoding of online classical works so that fragments of texts can be canonically cited (Crane et al. 2009). The online ancient world gazetteer, Pleiades (http://pleiades.stoa.org/), assigns a Uniform Resource Identifier (URI) to each ancient place. Resources such as these are not merely valuable in their own right; they act as essential nodes in an evolving cloud of connections much greater than the sum of its parts. Pelagios (http://pelagios-project.blogspot.co.uk/) is representative of this growing trend, allowing users to link different data (visual and textual, literary and archaeological) that until now have been stored in separate 'silos', with only occasional and unidirectional links. Of course, the ability to link is not enough: 'we need to be clear about why we are linking data, what sort of data we are linking, and our aim in doing so' (Prescott 2013).
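To make this concrete, the sketch below shows what a Pelagios-style place annotation might look like, expressed with Python's rdflib library and the Open Annotation vocabulary on which Pelagios builds. The annotation and text-passage URIs are hypothetical placeholders; the body is the actual Pleiades URI for Athens.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

# Open Annotation vocabulary, as used by Pelagios-style annotations.
OA = Namespace("http://www.w3.org/ns/oa#")

g = Graph()
g.bind("oa", OA)

# The annotation and passage URIs below are hypothetical placeholders;
# the body is the real Pleiades URI for Athens.
annotation = URIRef("http://example.org/annotations/1")
passage = URIRef("http://example.org/texts/herodotus#1.1")
athens = URIRef("https://pleiades.stoa.org/places/579885")

# One small assertion: this passage refers to this place.
g.add((annotation, RDF.type, OA.Annotation))
g.add((annotation, OA.hasBody, athens))
g.add((annotation, OA.hasTarget, passage))

print(g.serialize(format="turtle"))
```

Because both ends of the link are stable URIs, any third party can publish annotations of this form independently, and they will still resolve against the same gazetteer, which is what allows the separate 'silos' to be traversed.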

This paper tackles the differing but complementary benefits of statistical and dimensional approaches on the one hand, and graph-based inferencing on the other. The former are exceptionally powerful for crunching big data, producing summary values and identifying patterns that follow well-known distributive laws (Moretti 2005). Yet they are less robust when applied to small data sets, and tend to reduce results to just a few key parameters, making it hard to notice phenomena that are not already part of the model. Semantic approaches are much better suited to 'rich' data, allowing them to be queried and combined in a more nuanced fashion. Nonetheless, the cost of encoding and normalising such data is extremely high - so high, in fact, that it can be hard to scale beyond the level at which an individual human can already operate effectively.
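As a minimal sketch of the more nuanced querying a semantic graph affords, the snippet below runs a single SPARQL query over annotations of the form shown earlier, retrieving every annotated target (a text passage, an image, a find-spot record) that refers to a given place, regardless of which collection the annotation came from. The file name annotations.ttl is a hypothetical aggregate of such annotations.

```python
from rdflib import Graph

g = Graph()
g.parse("annotations.ttl", format="turtle")  # hypothetical aggregated file

# Find every target annotated with the Pleiades URI for Athens.
query = """
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?target WHERE {
    ?annotation a oa:Annotation ;
                oa:hasBody <https://pleiades.stoa.org/places/579885> ;
                oa:hasTarget ?target .
}
"""
for row in g.query(query):
    print(row.target)
```

The query makes no assumptions about what kind of resource the target is, which is precisely the strength of the graph model; the corresponding cost, as noted above, is that every annotation must first be encoded and normalised by hand or by fallible automation.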