
August 21, 2017

Latinists enjoy ready access to online texts collected under names like Perseus, PHI, and the Latin Library, collections which are now as much a fixture of scholarly workflows as OCTs, Teubners, and Loebs. Descriptive data and statistics about these texts are harder to find. How many times does Lucretius use the future imperative? How many ablatives absolute are there in Cicero’s De amicitia? Where does ensis appear in Caesar’s writings? (Answers at the end of this post.) Opera Latina is a search interface from the Laboratoire d’Analyse Statistique des Langues Anciennes (LASLA) at the University of Liège that draws on over five decades of linguistic research on Latin literature to answer the sort of descriptive questions posed above. The database currently includes 154 works from 19 authors: Caesar, Cato, Catullus, Cicero, Horace, Juvenal, Lucretius, Ovid, Persius, Petronius, Plautus, Pliny the Younger, Propertius, Quintus Curtius Rufus, Sallust, Seneca the Younger, Tacitus, Tibullus, and Vergil. For these authors, the majority of their output is available to be searched. There are in all 1,630,825 words. The most prominent gaps include Cicero’s correspondence, Ovid’s Metamorphoses, and all but six of Plautus’s comedies.

Every word in the corpus has been annotated with the following information: the lemma, or dictionary head word (following Forcellini’s 1864 Lexicon totius latinitatis); the form of the word as it appears in the text; a citation with the word’s location in the text; the word’s morphology; and its subordinating syntax. Records are also flagged to distinguish ambiguous forms, mark proper nouns, and call attention to notable miscellany.

Sample records from LASLA’s annotation of the beginning of Tacitus’s Annales.

LASLA researchers have been annotating their texts with the same method since 1962, and their results have a long print history in indices verborum or concordances and lexica that have been published over the last century through Centre Informatique de Philosophie & Lettres and G. Olms. This consistency in method, the hand-curated and hand-checked nature of the annotations, and the longevity of the LASLA project speak to high accuracy. The quality of the LASLA data is undoubtedly the strongest aspect of Opera Latina.

The interface has three parts: corpus selection, lemma search, and options for restricting searches by syntax and morphology.

The main Opera Latina interface.

At the top, users will find an option to select a corpus for their searches. By default one searches all authors and works; selecting a subset is as easy as clicking on the green bar marked “Corpus selection (click here)” and ticking a few checkboxes.

List of authors and works in the Opera Latina Corpus Selection.

Under corpus selection is the humble centerpiece of the interface: a textbox where the user enters the lemma. The box has a kind of autocomplete for lemmata: after typing the first letters of a word, a dropdown list of potential matches appears. For most searches, this autocomplete is not necessary, but for ambiguous lemmata, like “CVM [1] - adv.”, “CVM [2] - prép.”, and “CVM [3] - conj. sub.,” it appears to be the only way to differentiate them.[1] Retrieval is snappy; no search I ran took more than a second or two.

Example of autocomplete in lemma search for cum.

The third part of the search interface consists of a dropdown list and a series of radio buttons that limit searches by morphological or syntactical details. The morphological options include part of speech; declension or conjugation category; gender, number, and case for nouns; and person, number, tense, mood, and voice for verbs. Options for searching by subordinating syntax can be found on a dropdown menu labeled “Subordination Code” and include dozens of words that introduce subordinating clauses (e.g., cum, si, quotusquisque) or in some cases an independent grammatical structure (e.g., ablatif absolu, surprisingly in French unlike the rest of the site[2]). This part of the interface takes up a disproportionate amount of screen space and can be cumbersome to use; the list displays especially poorly on mobile devices. Despite the large number of parameters, there is no accompanying documentation online to explain them. The closest thing to documentation that exists is Joseph Denooz’s 2004 article, “Opera latina: le site Internet du LASLA.” It suffices; Opera Latina 2004 appears to be substantially similar to Opera Latina 2017, cosmetic differences aside. Fortunately, the interface is simple enough that the features are understandable with a bit of trial and error.

Search results appear below the interface and display search parameters, the number of matches, and a paginated web table with the following columns: author, work, citation, form, and a link for more context. Clicking on “View Context” brings up the search term, displayed in the sentence in which it appears and highlighted in yellow. The sentences immediately before and after are also shown. There is no option to view more context, a limit that seems artificial considering the widespread availability of these texts online. It is easy enough, though obviously not as convenient, to cut and paste the Opera Latina-generated context into another site for more information. I found myself doing this often.

Results of a search for the conjunction et in Caesar’s Bellum Gallicum.

Opera Latina can export results, even if the export options are a bit lacking. Clicking on the button “Export in printable format” presents the results in a new HTML page with line breaks. The results are structured: author, work, and citation are set off with a hyphen and the form is set off by a colon (e.g. “CAESAR - Commentarii Belli Civilis - 1,2,2 : et”). Thus, an exported page could be brought into Excel and split into columns with the Text to Columns function. A savvy scripter could convert the results to a .csv file from the command line. But these steps feel like hoops to jump through in order to make the Opera Latina data research-friendly. More robust options for export would perhaps be the single most helpful improvement that the LASLA developers could make to future versions of Opera Latina.
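For readers who want to try the scripted route, here is a minimal sketch in Python. It assumes that every exported line has the same shape as the one example quoted above (author, work, and citation separated by " - ", then " : " and the form); I have not checked this against every kind of result. The function names and the column names are my own, not anything defined by Opera Latina.

```python
import csv
import io


def parse_export_line(line):
    """Split one exported line into (author, work, citation, form).

    Assumed shape, based on a single sample line:
    AUTHOR - Work - citation : form
    """
    # The form follows the last " : "; everything before it is the reference.
    reference, form = line.rsplit(" : ", 1)
    # The author precedes the first " - " and the citation follows the last,
    # so a work title that itself contains " - " stays in one piece.
    author, rest = reference.split(" - ", 1)
    work, citation = rest.rsplit(" - ", 1)
    return author.strip(), work.strip(), citation.strip(), form.strip()


def export_to_csv(lines):
    """Return CSV text (with a header row) for the non-blank exported lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are illustrative only.
    writer.writerow(["author", "work", "citation", "form"])
    for line in lines:
        if line.strip():
            writer.writerow(parse_export_line(line.strip()))
    return buffer.getvalue()


sample = ["CAESAR - Commentarii Belli Civilis - 1,2,2 : et"]
print(export_to_csv(sample))
```

Because citations such as "1,2,2" contain commas, the csv module quotes that field automatically, which is one reason to prefer it over joining the pieces by hand.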

Exported results for the conjunction et in Caesar’s Bellum Gallicum.

Two final notes on data export. The exported results contain no indication of the parameters used to generate this data. A header showing the parameters of a given search would be extremely useful, especially if one wanted to string together the results of multiple searches. Secondly, a minor point, but one that will gain in importance as researchers in Classics become more attuned to the need for replicability in their research: there is no version control, at least none made publicly obvious, nor is there a version history. This goes for the interface and the data. If you plan to use Opera Latina data in your research, you should want to know that you or a colleague or perhaps a diligent peer reviewer could rerun a search and return the same results. A more explicit record of changes and access to earlier versions would help address this.
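Until something like this exists on the site, a user can keep the record by hand. The sketch below is one possible workaround, not a feature of Opera Latina: it prefixes each batch of exported lines with a comment line recording the search that produced it, so that several batches can be concatenated without losing track of their origin. The parameter names ("lemma", "corpus") and the "# search:" prefix are hypothetical choices of mine.

```python
import json


def with_search_header(parameters, result_lines):
    """Return the result lines preceded by one line describing the search.

    `parameters` is a dict the user fills in by hand; its keys are
    illustrative and do not correspond to any Opera Latina field names.
    """
    # sort_keys gives a stable header regardless of the order keys were typed in.
    header = "# search: " + json.dumps(parameters, sort_keys=True, ensure_ascii=False)
    return [header] + list(result_lines)


def combine(searches):
    """Concatenate several (parameters, result_lines) pairs into one list."""
    combined = []
    for parameters, result_lines in searches:
        combined.extend(with_search_header(parameters, result_lines))
    return combined


batch = with_search_header(
    {"lemma": "ET", "corpus": "Caesar"},
    ["CAESAR - Commentarii Belli Civilis - 1,2,2 : et"],
)
print("\n".join(batch))
```

Adding the date of the search to the parameters would also give a rough substitute for the missing version history.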

LASLA has a long tradition of producing database-driven lexical research, beginning in 1961 with Étienne Évrard and Louis Delatte among others. They consider part of their mission to be “the establishment of literary databases and informatics tools aiming at their distribution and use in all available media.” While they began with and continue to release print concordances, LASLA brought their database work to the public with a CD-ROM edition in 1995–1996. The first iteration of the project was brought online in 2004 with Opera Latina. They continue to innovate with Hyperbase, another LASLA search interface that extends the flexibility and scope of Latin corpus searches to include collocations and more complicated morphological combinations (e.g. the ability to search for all plural nouns followed by an adjective) among other features. What has never been publicly available in a machine-actionable format, however, is the data itself.

And with LASLA, the data is the main event. I understand that curated data is expensive and what LASLA offers through Opera Latina should be appreciated for what it is: the culmination of over five decades of data collected, annotated, reviewed, and published, that is, data that has been subject to exacting philological standards. Does this reviewer wish that this data was open data? Yes. Does this reviewer understand the enormous scholarly labor and deep, long-running institutional investment that LASLA has made in creating this data and for that reason their instinct to protect it? Also, yes. Nevertheless, I still hope that a better balance can be struck between the two extremes in the near future so that this rich source of data can find a larger, more engaged audience through increased access.

Opera Latina can return a wide variety of answers to lexical, morphological, and syntactical questions about Latin literature that would be otherwise far more labor-intensive to produce. Search by subordinating syntax seems to be a particularly beneficial feature and one that is not readily available elsewhere in a convenient interface. If I were writing a Latin textbook, or even a series of classroom exercises, it would be a godsend to discover a way to generate in a matter of seconds lengthy lists of conditionals, relative clauses, supines, and so on. Better documentation, including more examples and use cases, would go a long way in helping the site find a larger audience, as would a push to bring its interface design more in line with what web users expect in 2017. But the superb quality of the data, combined with the relative ease and speed with which you can return highly specific corpus searches, makes Opera Latina a worthwhile addition to the Latinist’s digital toolkit.

(Answers to the questions in the first paragraph: Lucretius uses the future imperative seven times in De rerum natura, there are 27 ablatives absolute in De amicitia, and, trick question, ensis does not appear in Caesar’s Commentarii.)


Metadata

Title: Opera Latina

Description: Web interface for retrieving lexical, morphological, and syntactical data about Latin literature

URL: http://web.philo.ulg.ac.be/lasla/description-opera-latina/?

Name: D. Longrée (Project Leader); G. Purnelle (Associate Project Leader); L. Simon (Lead Programmer)

Publisher: Laboratoire d’Analyse Statistique des Langues Anciennes (LASLA); Centre Informatique de Philosophie & Lettres (CIPL)

Place: Université de Liège

Date Created: 2004–2017 (work ongoing)

Date Accessed: June 12–15, 2017

Availability: Free

Rights: Copyright LASLA-CIPL 2014

Classification: databases, Latin, lemmatization, morphology, reference materials, syntax

(Header Image: "Sheet 1: Soldiers carrying banners depicting Julius Caesar's triumphant military exploits", from The Triumph of Julius Caesar, Andrea Andreani (1558/1559–1629). Metropolitan Museum of Art 22.73.3-9. Licensed under CC0 1.0. Public Domain.)


[1] Why this disambiguation cannot be done with a single lemma search for “CVM” and choosing, say, “Adverb” from the “Category” option is not clear. Nor is it clear why a search for plain “CVM” returns results for “CVM [2] - prép.” rather than the first lemma entry.

[2] The online description and the majority of related LASLA research are in French; Opera Latina itself is almost entirely in English.


Authors

Patrick J. Burns is a Postdoctoral Fellow at the Quantitative Criticism Lab at the University of Texas at Austin where he works on large-scale computational literary criticism. He formerly worked as an Assistant Research Scholar at NYU's Institute for the Study of the Ancient World. Patrick received his PhD from Fordham University in 2016 writing about the influence of Latin love elegy on later epic. He is a contributor to the Latin language resources at the Classical Language Toolkit with a focus on automated lemmatization. Lastly, he writes about Latin and digital philology on Twitter at @diyclassics.