Blog: How Much Latin Does ChatGPT “Know”? Patrick Burns Mon, 07/31/2023 - 15:02

Why is ChatGPT, OpenAI’s chatbot-style large language model (LLM) and the focus of recent artificial-intelligence buzz, good at so many Latin tasks? Wait, is it good at Latin? For those of you who have yet to kick its rotas, the answer is decidedly yes! It can correct errors in sentences. It will write a short Latin story and then produce reading comprehension sentences (in Latin!) based on that story. It will lemmatize sentences and provide part-of-speech tags.

A screenshot of a Latin grammar question and answer in ChatGPT
Figure 1. An example of ChatGPT correcting the grammar of a simple Latin sentence.

Are the results perfect? Hardly. But often they are good, or at least good enough. It would take more than a blog post to dive deeply into the mechanics of LLMs, but there is a quick way into a discussion of their effectiveness: ChatGPT is good at Latin because ChatGPT has seen a lot of Latin. In a recent talk at CANE, I asked “How much Latin does ChatGPT ‘know’?” In today’s post, I take this quantitative question at face value.

A circle chart in various shades of green showing a small, yellow circle labeled "Catullus tokens" contained within a much larger turquoise circle labeled "GPT-3 Latin Tokens"
Figure 2. A nested proportional area chart based on the list of Latin works and their word counts. The smallest, yellowest circle represents Catullus’ 15k words, while the largest, greenest circle represents the 339.1M tokens that may be in the ChatGPT training data.
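To get a feel for the scale in Figure 2, here is a rough sketch of the arithmetic behind a proportional area chart. It assumes each circle's area is drawn proportional to its count, so the radius grows with the square root of the count. The helper `radius_for` is a hypothetical name, and the comparison treats "words" and "tokens" as the same unit, which is only an approximation.

```python
import math

def radius_for(count, unit_area=1.0):
    """Radius of a circle whose area equals count * unit_area (hypothetical helper)."""
    return math.sqrt(count * unit_area / math.pi)

# Figures quoted in the caption above.
catullus_words = 15_000
gpt3_latin_tokens = 339_100_000

# How many "Catulluses" fit into the estimated GPT-3 Latin data.
count_ratio = gpt3_latin_tokens / catullus_words

# In an area-proportional chart, radii differ by the square root of that ratio.
radius_ratio = radius_for(gpt3_latin_tokens) / radius_for(catullus_words)

print(f"count ratio:  {count_ratio:,.0f}x")
print(f"radius ratio: {radius_ratio:.1f}x")
```

Under these assumptions the largest circle stands for more than twenty thousand times as much text as the Catullus circle, but its radius is only about 150 times larger. That is why the innermost circle is still visible at all.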

When we start talking about millions of Latin words, we already have some sense of what is possible. I trained a Latin model for the NLP platform spaCy on a little less than a million words: this model can predict lemmas with around 94% accuracy and part-of-speech tags with around 97% accuracy. Good performance, and on a small fraction of the data available to GPT-3.
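For readers wondering what "around 94% accuracy" means in practice, the following is a minimal sketch of token-level accuracy: the share of tokens for which the predicted label matches a hand-checked "gold" label. The function name `accuracy` and the toy data are invented for illustration. They are not output from the spaCy model described above.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Toy example (invented): "arma virumque cano" split into four tokens.
tokens = ["arma", "virum", "que", "cano"]
gold_lemmas = ["arma", "vir", "que", "cano"]

# A made-up prediction with one deliberate mistake on "virum".
predicted_lemmas = ["arma", "virus", "que", "cano"]

score = accuracy(predicted_lemmas, gold_lemmas)
print(f"lemma accuracy: {score:.0%}")
```

The same calculation applies to part-of-speech tags: swap the lemma lists for tag lists and score them token by token.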

Review: The Duolingo Latin Course Ashley Francese Fri, 07/31/2020 - 07:06

After many years of offering free language courses to students of popular modern languages such as French, Spanish, Chinese, and German, and to people interested in learning rather more obscure languages such as Esperanto, Klingon, High Valyrian, and Navajo, Duolingo added a Latin course. The course was prepared for Duolingo by the Paideia Institute and was road-tested by a group of Duolingo learners before it was made available to the general public.