[A robotic cat sitting at a table, translating a book and laughing.]

Computational Pun-derstanding

Computer-Assisted Translation of Humorous Wordplay


About the Project


Creative language, such as humour and wordplay, is all around us: every day we are amused by clever advertising slogans; our televisions and cinemas play an endless string of eloquent comedies; and literary critics write volumes on the wit of contemporary and classic authors. The ubiquity of creative language, and the constant need for creative professionals to analyze and translate it, would seem to make it a prime candidate for automatic language processing techniques such as machine translation. However, computers have tremendous difficulty processing the vagaries of creative language: they treat the anomalies, incongruities, and ambiguities in their input as problems to be resolved in favour of a single “correct” interpretation, rather than as features to be preserved and interpreted in their own right. But if computers cannot translate creative language on their own, can they at least provide specialized support to creative professionals, such as human translators of humour and wordplay?


The translation of wordplay is one of the most extensively researched problems in translation studies, but until now it has attracted little attention in the fields of artificial intelligence and language technology. In Computational Pun-derstanding, we will study how professional translators process wordplay, with particular attention to the tools, knowledge sources, and working processes they employ. We will then decompose these processes and look for parts that can be modelled computationally as part of an interactive, computer-assisted translation system. With this “machine-in-the-loop” paradigm, language technology will be applied only to those subtasks it can perform best, such as searching a large vocabulary space for translation candidates matching certain phonetic and semantic constraints. Subtasks that depend heavily on real-world background knowledge—such as selecting the candidate that best fits the wider humorous context—will be left to the human translator. To fulfil this ambitious vision, it will be necessary to develop innovative, interactive techniques for identifying instances of wordplay, interpreting and exploring their semantics, and generating target-language candidates that best preserve the ambiguity and humorousness of the original.
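One of the machine-friendly subtasks mentioned above, searching a vocabulary for candidates under phonetic constraints, can be illustrated with a toy sketch. This is not the project's actual method: the crude `phonetic_key` below merely stands in for a real pronunciation lexicon such as CMUdict, and the vocabulary is hypothetical.

```python
import re

def phonetic_key(word):
    """Very rough phonetic key: lowercase, collapse 'ph' to 'f',
    drop vowels, and de-duplicate repeated consonants. A toy
    stand-in for a proper pronunciation dictionary lookup."""
    w = word.lower().replace("ph", "f")
    w = re.sub(r"[aeiou]", "", w)      # keep only the consonant skeleton
    w = re.sub(r"(.)\1+", r"\1", w)    # collapse doubled consonants
    return w

def pun_candidates(target, vocabulary):
    """Return vocabulary words that sound (roughly) like `target`
    but are spelled differently, i.e. potential pun substitutes."""
    key = phonetic_key(target)
    return [w for w in vocabulary if w != target and phonetic_key(w) == key]

vocab = ["prophet", "profit", "pundit", "planet", "proper"]
print(pun_candidates("profit", vocab))  # -> ['prophet']
```

A real system would search over pronunciations rather than spellings and would additionally filter candidates by semantic constraints, e.g. fit with the surrounding context.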


The project's scientific innovation lies in its connection of hitherto separate strands of research: linguistic theories of humour, computational representations and analyses of word meanings, manual translation of wordplay, and computer-assisted translation technologies. Besides providing new insights into the linguistic processes and translation strategies for wordplay, the research has the potential to significantly ease the burdens borne by professional translators in the processing of creative language, fostering creative solutions to unorthodox translation problems.


Principal Investigator

Cooperation Partners

Funding Agency



[The logo of IberLEF2019]

Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, and Iryna Gurevych.
OFAI–UKP at HAHA@IberLEF2019: Predicting the humorousness of tweets using Gaussian process preference learning.
In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), CEUR Workshop Proceedings, September 2019. To appear.
Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
author       = {Tristan Miller and Do Dinh, Erik-L{\^{a}}n and Edwin Simpson and Iryna Gurevych},
title        = {{OFAI}--{UKP} at {HAHA}@{IberLEF}2019: {Predicting} the Humorousness of Tweets Using {Gaussian} Process Preference Learning},
booktitle    = {Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)},
series       = {CEUR Workshop Proceedings},
month        = sep,
year         = {2019},
note         = {To appear},
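The conversion discussed in the abstract, from per-text numeric scores to the pairwise judgments that a preference learner consumes, can be sketched as follows. This is a toy illustration, not the paper's actual pipeline, and the example scores are hypothetical.

```python
from itertools import combinations

def scores_to_pairwise(scores):
    """Convert per-text numeric scores into pairwise preference
    judgments (winner_index, loser_index). Ties are skipped, since
    a pairwise-preference learner needs a strict ordering for each pair."""
    pairs = []
    for (i, si), (j, sj) in combinations(enumerate(scores), 2):
        if si > sj:
            pairs.append((i, j))
        elif sj > si:
            pairs.append((j, i))
    return pairs

funniness = [2.5, 0.0, 1.8]  # hypothetical mean ratings for three tweets
print(scores_to_pairwise(funniness))  # -> [(0, 1), (0, 2), (2, 1)]
```

Note that this mapping loses information the original scores carried, such as the size of the gap between two items, which is one source of the conversion issues the paper discusses.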

[The logo of the Association for Computational Linguistics next to the text '57th']

Edwin Simpson, Erik-Lân Do Dinh, Tristan Miller, and Iryna Gurevych.
Predicting Humorousness and Metaphor Novelty with Gaussian Process Preference Learning.
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), July 2019. To appear.

The inability to quantify key aspects of creative language is a frequent obstacle to natural language understanding. To address this, we introduce novel tasks for evaluating the creativeness of language—namely, scoring and ranking text by humorousness and metaphor novelty. To sidestep the difficulty of assigning discrete labels or numeric scores, we learn from pairwise comparisons between texts. We introduce a Bayesian approach for predicting humorousness and metaphor novelty using Gaussian process preference learning (GPPL), which achieves a Spearman's ρ of 0.56 against gold using word embeddings and linguistic features. Our experiments show that given sparse, crowdsourced annotation data, ranking using GPPL outperforms best–worst scaling. We release a new dataset for evaluating humor containing 28,210 pairwise comparisons of 4,030 texts, and make our software freely available.
author       = {Edwin Simpson and Do Dinh, Erik-L{\^{a}}n and Tristan Miller and Iryna Gurevych},
title        = {Predicting Humorousness and Metaphor Novelty with {Gaussian} Process Preference Learning},
booktitle    = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
month        = jul,
year         = {2019},
note         = {To appear},
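As a rough illustration of the ranking-from-pairwise-comparisons setting described in the abstract, here is a minimal Bradley-Terry model fitted by gradient ascent. It is a far simpler stand-in for the paper's Gaussian process preference learning (no kernel, no uncertainty estimates, no text features), and the comparison data is hypothetical.

```python
import math

def bradley_terry(n_items, comparisons, lr=0.1, epochs=500):
    """Fit latent scores s_i so that P(i beats j) = sigmoid(s_i - s_j),
    by gradient ascent on the log-likelihood of the observed pairs.
    `comparisons` is a list of (winner, loser) index pairs."""
    s = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # 1 - sigmoid(s_w - s_l): gradient of log-likelihood for this pair
            p = 1.0 / (1.0 + math.exp(s[w] - s[l]))
            grad[w] += p
            grad[l] -= p
        for i in range(n_items):
            s[i] += lr * grad[i]
    return s

# Hypothetical pairwise judgments over three one-liners: item 0 is
# preferred to both others, and item 2 is preferred to item 1.
pairs = [(0, 1), (0, 2), (2, 1)]
scores = bradley_terry(3, pairs)
ranking = sorted(range(3), key=lambda i: -scores[i])
print(ranking)  # -> [0, 2, 1]
```

GPPL goes beyond this by placing a Gaussian process prior over the latent function of text features, which lets it generalize to unseen texts and cope with sparse, noisy crowdsourced comparisons.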