Automatically Encoding Critical Editions of Latin Texts

Virginia K. Felkner

Scholars who want to produce critical editions for the Library of Digital Latin Texts (LDLT) may lack the technical background, or the time, to learn XML, the standard format for LDLT editions. The goal of my work has been to make digital editions easier to publish by automating the XML-encoding process. I have developed a set of four Python scripts, each of which produces a complete TEI XML document from simple input files for a different type of source text: prose, poetry, drama, and mixed matter. Each script takes two inputs: a plain-text version of the editor's text and a spreadsheet version of the critical apparatus. The text input uses the same editorial symbols as traditional critical editions, and the spreadsheet contains the same information, in the same order, as a printed critical apparatus. Because the purpose of this project is to make it possible to produce digital critical editions without writing code, the input formats have been designed to be intuitive for anyone familiar with critical editions.
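The two inputs can be illustrated with a minimal Python sketch that loads them into simple data structures. The actual spreadsheet layout is not specified here, so the CSV export, the column names (`line`, `lemma`, `lemma_wit`, `reading`, `reading_wit`, `note`), the sigla A, B, C, the reading `facto`, and the helper names `load_text` and `load_apparatus` are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical apparatus layout: one row per variant, in the order a printed
# apparatus gives it (line, lemma and its witnesses, variant and its
# witnesses, editorial note). Witness columns hold space-separated sigla.
APPARATUS_CSV = """line,lemma,lemma_wit,reading,reading_wit,note
2,fato,A B,facto,C,illustrative note
"""

POEM_TXT = """Arma virumque cano, Troiae qui primus ab oris
Italiam, fato profugus, Laviniaque venit
"""


def load_text(raw: str) -> list[str]:
    """Return the editor's text as a list of non-blank, stripped lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def load_apparatus(raw_csv: str) -> list[dict]:
    """Parse apparatus rows into dictionaries with witness lists split out."""
    entries = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        entries.append({
            "line": int(row["line"]),
            "lemma": row["lemma"].strip(),
            "lemma_wit": row["lemma_wit"].split(),
            "reading": row["reading"].strip(),
            "reading_wit": row["reading_wit"].split(),
            "note": row["note"].strip(),
        })
    return entries


if __name__ == "__main__":
    lines = load_text(POEM_TXT)
    apparatus = load_apparatus(APPARATUS_CSV)
    print(len(lines), apparatus[0]["lemma_wit"], apparatus[0]["reading_wit"])
```

Keeping one variant per row mirrors how a printed apparatus lists each reading after its lemma, which is the property the input format is meant to preserve.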

Each script begins by encoding the edition's main text, wrapping structural elements such as paragraphs or lines in the appropriate XML tags. It numbers elements automatically as it encodes them and can handle transposed or missing lines in poetry. Once the main text has been encoded, the script adds the remaining structural elements (e.g., the TEI header and footer) and then turns to the critical apparatus. For each apparatus entry, it first encodes the lemma together with its witnesses and sources, then the alternate readings and editorial annotations. The script automatically generates a unique XML ID for each lemma and reading and uses these IDs to link notes to their targets.
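The two encoding stages can be sketched with Python's standard `xml.etree.ElementTree`. The element names (`lg`/`l`, `listApp`/`app`/`lem`/`rdg`/`note`) follow common TEI practice, but this is not the project's code: the entry dictionary layout, the ID scheme (`lem1`, `rdg1`), and the helper names are hypothetical, and the TEI namespace, header, sources, and transposed or missing lines are left out for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def encode_poem(lines: list[str]) -> ET.Element:
    """Wrap each verse in <l>, numbering lines as they are encoded."""
    lg = ET.Element("lg")
    for n, text in enumerate(lines, start=1):
        verse = ET.SubElement(lg, "l", n=str(n))
        verse.text = text
    return lg


def sigla(wits: list[str]) -> str:
    """Turn ['A', 'B'] into the pointer list '#A #B'."""
    return " ".join("#" + s for s in wits)


def encode_apparatus(entries: list[dict]) -> ET.Element:
    """Encode lemma, reading and note for each entry, linking the note by ID."""
    list_app = ET.Element("listApp")
    for i, entry in enumerate(entries, start=1):
        app = ET.SubElement(list_app, "app", loc=str(entry["line"]))
        # Lemma first, with its witnesses, then the alternate reading.
        lem = ET.SubElement(app, "lem", wit=sigla(entry["lemma_wit"]))
        lem.set(XML_ID, f"lem{i}")
        lem.text = entry["lemma"]
        rdg = ET.SubElement(app, "rdg", wit=sigla(entry["reading_wit"]))
        rdg.set(XML_ID, f"rdg{i}")
        rdg.text = entry["reading"]
        # The editorial annotation points at the reading it discusses.
        if entry.get("note"):
            note = ET.SubElement(app, "note", target=f"#rdg{i}")
            note.text = entry["note"]
    return list_app


if __name__ == "__main__":
    entries = [{"line": 2, "lemma": "fato", "lemma_wit": ["A", "B"],
                "reading": "facto", "reading_wit": ["C"],
                "note": "illustrative note"}]
    print(ET.tostring(encode_poem(["Arma virumque cano"]), encoding="unicode"))
    print(ET.tostring(encode_apparatus(entries), encoding="unicode"))
```

Deriving each ID from the entry's position guarantees uniqueness without a separate lookup table, and the note's `target` is built from the same string, so the link cannot drift out of sync with the reading it refers to.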

The script can encode a poem or short prose text in about thirty seconds. In my experience, encoding a comparable text manually takes about four hours (14,400 seconds), so the script reduces encoding time by a factor of roughly 480. Because the script checks the validity of each tag before inserting it, its output is guaranteed to be well-formed XML. In addition to the XML file, each script writes a log file: a concise list of the issues encountered during encoding. The log lets editors focus on the cases that require critical judgment and experience to decide how the information should be represented.
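A log of this kind can be sketched with Python's standard `logging` module. The particular check shown (an apparatus lemma that does not occur in the line it cites) is an assumed example of an issue worth logging, not a description of the scripts' actual checks, and the helper names are hypothetical. The well-formedness test simply re-parses the output with `xml.etree.ElementTree`. A real script would more likely attach a `logging.FileHandler`; an in-memory stream keeps the sketch self-contained.

```python
import io
import logging
import xml.etree.ElementTree as ET


def make_logger(stream) -> logging.Logger:
    """Send concise 'LEVEL: message' lines to the given stream."""
    logger = logging.getLogger("encoder")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.handlers = [handler]   # replace any handlers from earlier runs
    logger.propagate = False      # keep messages out of the root logger
    return logger


def count_lemma_problems(lines, entries, logger) -> int:
    """Log apparatus entries whose lemma is missing from the cited line."""
    problems = 0
    for entry in entries:
        n = entry["line"]
        if not 1 <= n <= len(lines):
            logger.warning("line %d: no such line in the text", n)
            problems += 1
        elif entry["lemma"] not in lines[n - 1]:
            logger.warning("line %d: lemma %r not found", n, entry["lemma"])
            problems += 1
    return problems


def is_well_formed(xml_text: str, logger) -> bool:
    """Re-parse the output; log and return False if it is not well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        logger.error("output is not well-formed: %s", err)
        return False
    return True


if __name__ == "__main__":
    stream = io.StringIO()
    log = make_logger(stream)
    count_lemma_problems(["Arma virumque cano"],
                         [{"line": 1, "lemma": "canto"}], log)
    print(stream.getvalue(), end="")
```

Logging problems instead of stopping on them lets a single run encode everything it can and leaves the editor one short list of cases to resolve by judgment.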

Working as an undergraduate researcher on this script-development project has benefited me as a classics student. It gave me the chance to work with real texts rather than textbook excerpts. In developing the spreadsheet guidelines and making test spreadsheets from print editions, I also came to understand the purpose of textual criticism more fully. I learned more than what sigla are or how a witness differs from a source: through my work for the Digital Latin Library (DLL), I began to appreciate the sophisticated, nuanced arguments that editors make in just a few words of the apparatus. Understanding these arguments has allowed me to break them down, decide how best to represent them in XML, and then find a way to generate that XML automatically from the editor's apparatus. My experience with textual criticism also prepared me for an advanced undergraduate Latin seminar, because understanding how a text may have been corrupted or emended made difficult passages easier to understand and translate. My work in digital humanities has benefited me tremendously as an undergraduate classics student, and I hope it will benefit the classics community as much or more.


About this Abstract