Benjamin Piwowarski


2023

pdf bib
LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization
Laura Nguyen | Thomas Scialom | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Text Summarization is a popular task and an active area of research for the Natural Language Processing community. By definition, it requires to account for long input texts, a characteristic which poses computational challenges for neural models. Moreover, real-world documents come in a variety of complex, visually-rich, layouts. This information is of great relevance, whether to highlight salient content or to encode long-range interactions between textual passages. Yet, all publicly available summarization datasets only provide plain text content. To facilitate research on how to exploit visual/layout information to better capture long-range dependencies in summarization models, we present LoRaLay, a collection of datasets for long-range summarization with accompanying visual/layout information. We extend existing and popular English datasets (arXiv and PubMed) with layout information and propose four novel datasets – consistently built from scholar resources – covering French, Spanish, Portuguese, and Korean languages. Further, we propose new baselines merging layout-aware and long-range models – two orthogonal approaches – and obtain state-of-the-art results, showing the importance of combining both lines of research.

pdf bib
XPMIR: Une bibliothèque modulaire pour l’apprentissage d’ordonnancement et les expériences de RI neuronale
Yuxuan Zong | Benjamin Piwowarski
Actes de CORIA-TALN 2023. Actes de la 18e Conférence en Recherche d'Information et Applications (CORIA)

Ces dernières années, plusieurs librairies pour la recherche d’information (neuronale) ont été proposées. Cependant, bien qu’elles permettent de reproduire des résultats déjà publiés, il est encore très difficile de réutiliser certaines parties des chaînes de traitement d’apprentissage, comme par exemple le pré-entraînement, la stratégie d’échantillonnage ou la définition du coût dans les modèles nouvellement développés. Il est également difficile d’utiliser de nouvelles techniques d’apprentissage avec d’anciens modèles, ce qui complique l’évaluation de l’utilité des nouvelles idées pour les différents modèles de RI neuronaux. Cela ralentit l’adoption de nouvelles techniques et, par conséquent, le développement du domaine de la RI. Dans cet article, nous présentons XPMIR, une librairie Python définissant un ensemble réutilisable de composants expérimentaux. La bibliothèque contient déjà des modèles et des techniques d’indexation de pointe, et est intégrée au hub HuggingFace.

2022

pdf bib
Choisir le bon co-équipier pour la génération coopérative de texte (Choosing The Right Teammate For Cooperative Text Generation)
Antoine Chaffin | Vincent Claveau | Ewa Kijak | Sylvain Lamprier | Benjamin Piwowarski | Thomas Scialom | Jacopo Staiano
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Les modèles de langue génèrent des textes en prédisant successivement des distributions de probabilité pour les prochains tokens en fonction des tokens précédents. Pour générer des textes avec des propriétés souhaitées (par ex. être plus naturels, non toxiques ou avoir un style d’écriture spécifique), une solution — le décodage coopératif — consiste à utiliser un classifieur lors de la génération pour guider l’échantillonnage de la distribution du modèle de langue vers des textes ayant la propriété attendue. Dans cet article, nous examinons trois familles de discriminateurs (basés sur des transformers) pour cette tâche de décodage coopératif : les discriminateurs bidirectionnels, unidirectionnels (de gauche à droite) et génératifs. Nous évaluons leurs avantages et inconvénients, en explorant leur précision respective sur des tâches de classification, ainsi que leur impact sur la génération coopérative et leur coût de calcul, dans le cadre d’une stratégie de décodage état de l’art, basée sur une recherche arborescente de Monte-Carlo (MCTS). Nous fournissons également l’implémentation (batchée) utilisée pour nos expériences.

2021

pdf bib
Skim-Attention: Learning to Focus via Document Layout
Laura Nguyen | Thomas Scialom | Jacopo Staiano | Benjamin Piwowarski
Findings of the Association for Computational Linguistics: EMNLP 2021

Transformer-based pre-training techniques of text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention only attends to the 2-dimensional position of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works, while being more computationally efficient. Skim-Attention can be further combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any Pre-trained Language Model, allowing to improve their performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.

pdf bib
QuestEval: Summarization Asks for Fact-based Evaluation
Thomas Scialom | Paul-Alexis Dray | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano | Alex Wang | Patrick Gallinari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments. In this paper, we extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in extensive experiments.

pdf bib
Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation
Clement Rebuffel | Thomas Scialom | Laure Soulier | Benjamin Piwowarski | Sylvain Lamprier | Jacopo Staiano | Geoffrey Scoutheeten | Patrick Gallinari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

QuestEval is a reference-less metric used in text-to-text tasks, that compares the generated summaries directly to the source text, by automatically asking and answering questions. Its adaptation to Data-to-Text tasks is not straightforward, as it requires multimodal Question Generation and Answering systems on the considered tasks, which are seldom available. To this purpose, we propose a method to build synthetic multimodal corpora enabling to train multimodal components for a data-QuestEval metric. The resulting metric is reference-less and multimodal; it obtains state-of-the-art correlations with human judgment on the WebNLG and WikiBio benchmarks. We make data-QuestEval’s code and models available for reproducibility purpose, as part of the QuestEval project.

2020

pdf bib
MLSUM: The Multilingual Summarization Corpus
Thomas Scialom | Paul-Alexis Dray | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages – namely, French, German, Spanish, Russian, Turkish. Together with English news articles from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multi-lingual dataset.

2019

pdf bib
Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Étienne Simon | Vincent Guigue | Benjamin Piwowarski
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted in average. These losses improve the performance of discriminative based models, and enable us to train deep neural networks satisfactorily, surpassing current state of the art on three different datasets.

pdf bib
Self-Attention Architectures for Answer-Agnostic Neural Question Generation
Thomas Scialom | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural architectures based on self-attention, such as Transformers, recently attracted interest from the research community, and obtained significant improvements over the state of the art in several tasks. We explore how Transformers can be adapted to the task of Neural Question Generation without constraining the model to focus on a specific answer passage. We study the effect of several strategies to deal with out-of-vocabulary words such as copy mechanisms, placeholders, and contextual word embeddings. We report improvements obtained over the state-of-the-art on the SQuAD dataset according to automated metrics (BLEU, ROUGE), as well as qualitative human assessments of the system outputs.

pdf bib
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Patrick Bordes | Eloi Zablocki | Laure Soulier | Benjamin Piwowarski | Patrick Gallinari
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Language grounding is an active field aiming at enriching textual representations with visual information. Generally, textual and visual elements are embedded in the same representation space, which implicitly assumes a one-to-one correspondence between modalities. This hypothesis does not hold when representing words, and becomes problematic when used to learn sentence representations — the focus of this paper — as a visual scene can be described by a wide variety of sentences. To overcome this limitation, we propose to transfer visual information to textual representations by learning an intermediate representation space: the grounded space. We further propose two new complementary objectives ensuring that (1) sentences associated with the same visual content are close in the grounded space and (2) similarities between related elements are preserved across modalities. We show that this model outperforms the previous state-of-the-art on classification and semantic relatedness tasks.

pdf bib
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Thomas Scialom | Sylvain Lamprier | Benjamin Piwowarski | Jacopo Staiano
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL enables to consider complex, possibly non differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most used summarization metric, is known to suffer from bias towards lexical similarity as well as from sub-optimal accounting for fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, favorably compare to ROUGE – with the additional property of not requiring reference summaries. Training a RL-based model on these metrics leads to improvements (both in terms of human or automated metrics) over current approaches that use ROUGE as reward.