Over the last few years, there has been growing public and enterprise interest in 'social media' and their role in modern society. At the heart of this interest is the ability for users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites, social networking sites, etc. The volume and variety of user-generated content (UGC) and the user participation network behind it are creating new opportunities for understanding web-based practices and building socially intelligent and personalized applications. The goal of our workshop is to share research efforts and results in the area of understanding language usage on social media.
While there is a rich body of previous work in processing textual content, certain characteristics of UGC on social media introduce challenges in its analysis. A large portion of the language found in UGC is in the Informal English domain: a blend of abbreviations, slang and context-specific terms that lacks sufficient context and regularity and is delivered with an indifferent approach to grammar and spelling. Traditional content analysis techniques developed for more formal genres such as news, Wikipedia or scientific articles do not translate effectively to UGC. Consequently, well-understood problems such as information extraction, search or monetization on the Web face pertinent challenges owing to this new class of textual data.
Workshops and conferences such as the NIPS workshop on Machine Learning for Social Computing, the International Conference on Social Computing and Behavioral Modeling, the Workshop on Algorithms and Models for the Web Graph, the International Conference on Weblogs and Social Media, the Workshop on Search on Social Media, and the Workshop on Social Data on the Web have focused on a variety of problem areas in Social Computing. Results of these meetings have highlighted the challenges in processing social data and the insights that can be garnered to complement traditional techniques (e.g., polling methods).
The goal of the workshop we propose is to bring together researchers from all of these areas but, in contrast to the above conferences and workshops, with a specific focus on exploring the characteristics and challenges associated with language on this evolving digital platform. We believe that the proposed workshop can serve as a focused venue for the linguistics community around the topic of language in social media.
Call For Papers
We invite original and unpublished research papers on all topics related to the intersection of computational linguistics and language in social media, including but not limited to the sample topics below. Note that we will also consider submissions on email corpora, with the caveat that the research should be generalizable or emphasize cross-applicability to web-based public social media.
The following is a list of possible topics that may be covered in contributions to this workshop:
- What are people talking about?
What are the Named Entities and topics that people are making references to?
What are effective summaries of volumes of user comments around a news-worthy event that offer a lens into the society's perceptions?
How are cultures interpreting any situation in local contexts and supporting them in their variable observations on a social medium?
- How are they expressing themselves?
What do word usages tell us about an active population or about individual allegiances or non-conformity to group practices?
Are we seeing differences in how users self-present on this new form of digital media?
Can groups of users be described in terms of their language use (e.g. stylistic properties)?
- Why do they scribe?
What are the diverse intentions that produce the diverse content on social media?
Can we understand why we share by looking at what we predominantly do with the medium? What emotions are people sharing about content?
How are community structures and roles evidenced via language usage? Can content analysis shed more light on network properties of a community, such as link-based diffusion models?
- What level of linguistic analysis is possible/necessary in a noisy medium such as social media?
How can existing analysis techniques be adapted to this medium?
- Language and network structure: How do language and social network properties interact?
What properties of a network (structural connections) or the participants (personalities, influencers, followers) correlate with which properties of the language used?
- Semantic Web / Ontologies / Domain models to aid in social data understanding:
Given the recent interest within the Semantic Web and Linked Open Data (LOD) communities in exposing models of a domain, how can we use these public knowledge bases as priors in linguistic analysis?
Meena Nagarajan (IBM Almaden)
Sara Owsley Sood (Pomona College)
Michael Gamon (Microsoft Research)
John Breslin (U of Galway)
Cindy Chung (UTexas)
Munmun De Choudhury (Arizona State University)
Cristian Danescu-Niculescu-Mizil (Cornell)
Susan Dumais (Microsoft Research)
Jennifer Foster (Dublin City University)
Daniel Gruhl (IBM)
Kevin Haas (Microsoft)
Emre Kiciman (Microsoft Research)
Nicolas Nicolov (Microsoft)
Daniel Ramage (Stanford)
Alan Ritter (University of Washington)
Christine Robson (IBM)
Hassan Sayyadi (University of Maryland)
Valerie Shalin (Wright State)
Amit Sheth (Wright State)
Ian Soboroff (NIST)
Scott Spangler (IBM)
Patrick Pantel (Microsoft Research)
Andrew Gordon (USC)
Georgia Koutrika (IBM)
Hyung-il Ahn (IBM)
Smaranda Muresan (Rutgers)