Harald Baayen

Also published as: R. Harald Baayen


2022

Visual Grounding of Inter-lingual Word-Embeddings
Wafaa Mohammed | Hassan Shahmohammadi | Hendrik P. A. Lensch | R. Harald Baayen
Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)

Visual grounding of language aims at enriching textual representations of language with multiple sources of visual knowledge such as images and videos. Although visual grounding is an area of intense research, inter-lingual aspects of visual grounding have not received much attention. The present study investigates the inter-lingual visual grounding of word embeddings. We propose an implicit alignment technique between the two spaces of vision and language in which inter-lingual textual information interacts in order to enrich pre-trained textual word embeddings. We focus on three languages in our experiments, namely, English, Arabic, and German. We obtained visually grounded vector representations for these languages and studied whether visual grounding on one or multiple languages improved the performance of embeddings on word similarity and categorization benchmarks. Our experiments suggest that inter-lingual knowledge improves the performance of grounded embeddings in similar languages such as German and English. However, inter-lingual grounding of German or English with Arabic led to a slight degradation in performance on word similarity benchmarks. On the other hand, we observed the opposite trend on categorization benchmarks where Arabic had the most improvement on English. In the discussion section, several reasons for those findings are laid out. We hope that our experiments provide a baseline for further research on inter-lingual visual grounding.

2021

Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training
Hassan Shahmohammadi | Hendrik P. A. Lensch | R. Harald Baayen
Proceedings of the 25th Conference on Computational Natural Language Learning

Language grounding aims at linking the symbolic representation of language (e.g., words) to the rich perceptual knowledge of the outside world. The general approach is to embed both textual and visual information into a common space (the grounded space) confined by an explicit relationship. We argue that since concrete and abstract words are processed differently in the brain, such approaches sacrifice the abstract knowledge obtained from textual statistics in the process of acquiring perceptual information. The focus of this paper is to solve this issue by implicitly grounding the word embeddings. Rather than learning two mappings into a joint space, our approach integrates modalities by implicit alignment. This is achieved by learning a reversible mapping between the textual and the grounded space by means of multi-task training. Intrinsic and extrinsic evaluations show that our way of visual grounding is highly beneficial for both abstract and concrete words. Our embeddings are correlated with human judgments and outperform previous works using pretrained word embeddings on a wide range of benchmarks. Our grounded embeddings are publicly available.

2017

Understanding Idiomatic Variation
Kristina Geeraert | R. Harald Baayen | John Newman
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This study investigates the processing of idiomatic variants through an eye-tracking experiment. Four types of idiom variants were included, in addition to the canonical form and the literal meaning. Results suggest that modifications to idioms, modulo obvious effects of length differences, are not more difficult to process than the canonical forms themselves. This fits with recent corpus findings.

2014

Electrophysiological correlates of noun-noun compound processing by non-native speakers of English
Cecile De Cat | Harald Baayen | Ekaterini Klepousniotou
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)

2002

Experiences from the Spoken Dutch Corpus Project
Nelleke Oostdijk | Wim Goedertier | Frank van Eynde | Louis Boves | Jean-Pierre Martens | Michael Moortgat | Harald Baayen
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Extracting the lowest-frequency words: pitfalls and possibilities
Marc Weeber | Rein Vos | R. Harald Baayen
Computational Linguistics, Volume 26, Number 3, September 2000

1996

Estimating Lexical Priors for Low-Frequency Morphologically Ambiguous Forms
Harald Baayen | Richard Sproat
Computational Linguistics, Volume 22, Number 2, June 1996

The Effects of Lexical Specialization on the Growth Curve of the Vocabulary
R. Harald Baayen
Computational Linguistics, Volume 22, Number 4, December 1996

1991

A Stochastic Process for Word Frequency Distributions
Harald Baayen
29th Annual Meeting of the Association for Computational Linguistics