Crowdsourcing a Large Corpus of Clickbait on Twitter

Martin Potthast; Tim Gollub; Kristof Komlossy; Sebastian Schuster; Matti Wiegmann; Erika Patricia Garces Fernandez; Matthias Hagen; Benno Stein

Abstract

Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited via Amazon’s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017. Download: https://webis.de/data/webis-clickbait-17.html Challenge: https://clickbait-challenge.org

Anthology ID: C18-1127
Volume: Proceedings of the 27th International Conference on Computational Linguistics
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1498–1507
URL: https://aclanthology.org/C18-1127/
Bibkey: potthast-etal-2018-crowdsourcing
Cite (ACL): Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. 2018. Crowdsourcing a Large Corpus of Clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1498–1507, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Crowdsourcing a Large Corpus of Clickbait on Twitter (Potthast et al., COLING 2018)
PDF: https://aclanthology.org/C18-1127.pdf

BibTeX

@inproceedings{potthast-etal-2018-crowdsourcing,
    title = "Crowdsourcing a Large Corpus of Clickbait on {T}witter",
    author = "Potthast, Martin  and
      Gollub, Tim  and
      Komlossy, Kristof  and
      Schuster, Sebastian  and
      Wiegmann, Matti  and
      Garces Fernandez, Erika Patricia  and
      Hagen, Matthias  and
      Stein, Benno",
    editor = "Bender, Emily M.  and
      Derczynski, Leon  and
      Isabelle, Pierre",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/C18-1127/",
    pages = "1498--1507",
    abstract = "Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited via Amazon{'}s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017. Download: \url{https://webis.de/data/webis-clickbait-17.html} Challenge: \url{https://clickbait-challenge.org}"
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="potthast-etal-2018-crowdsourcing">
    <titleInfo>
        <title>Crowdsourcing a Large Corpus of Clickbait on Twitter</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Martin</namePart>
        <namePart type="family">Potthast</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Tim</namePart>
        <namePart type="family">Gollub</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kristof</namePart>
        <namePart type="family">Komlossy</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sebastian</namePart>
        <namePart type="family">Schuster</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Matti</namePart>
        <namePart type="family">Wiegmann</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Erika</namePart>
        <namePart type="given">Patricia</namePart>
        <namePart type="family">Garces Fernandez</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Matthias</namePart>
        <namePart type="family">Hagen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Benno</namePart>
        <namePart type="family">Stein</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 27th International Conference on Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Emily</namePart>
            <namePart type="given">M</namePart>
            <namePart type="family">Bender</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Leon</namePart>
            <namePart type="family">Derczynski</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Pierre</namePart>
            <namePart type="family">Isabelle</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Santa Fe, New Mexico, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited via Amazon’s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017. Download: https://webis.de/data/webis-clickbait-17.html Challenge: https://clickbait-challenge.org</abstract>
    <identifier type="citekey">potthast-etal-2018-crowdsourcing</identifier>
    <location>
        <url>https://aclanthology.org/C18-1127/</url>
    </location>
    <part>
        <date>2018-08</date>
        <extent unit="page">
            <start>1498</start>
            <end>1507</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Crowdsourcing a Large Corpus of Clickbait on Twitter
%A Potthast, Martin
%A Gollub, Tim
%A Komlossy, Kristof
%A Schuster, Sebastian
%A Wiegmann, Matti
%A Garces Fernandez, Erika Patricia
%A Hagen, Matthias
%A Stein, Benno
%Y Bender, Emily M.
%Y Derczynski, Leon
%Y Isabelle, Pierre
%S Proceedings of the 27th International Conference on Computational Linguistics
%D 2018
%8 August
%I Association for Computational Linguistics
%C Santa Fe, New Mexico, USA
%F potthast-etal-2018-crowdsourcing
%X Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited via Amazon’s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017. Download: https://webis.de/data/webis-clickbait-17.html Challenge: https://clickbait-challenge.org
%U https://aclanthology.org/C18-1127/
%P 1498-1507

Markdown (Informal)

[Crowdsourcing a Large Corpus of Clickbait on Twitter](https://aclanthology.org/C18-1127/) (Potthast et al., COLING 2018)
