Workshop on Information Extraction & Entity Analytics on Social Media Data

Event Notification Type: Call for Papers
Location: VMCC, IIT Bombay
Date: Sunday, 9 December 2012
Organizers: Ganesh Ramakrishnan, Ajay Nagesh
Submission Deadline: Sunday, 30 September 2012

9th December 2012, Indian Institute of Technology Bombay, Mumbai, India

(co-located with the 24th International Conference on Computational Linguistics - COLING 2012)

The growth of online social media represents a fundamental shift in the generation, consumption, and sharing of digital information. Social media data comes in many forms: from blogs (Blogger, LiveJournal) and micro-blogs (Twitter) to social networking (Facebook, LinkedIn, Google+), wikis, social bookmarking (Delicious), reviews (Yelp), media sharing (YouTube, Flickr), and many others. The information inherent in these online conversations is a veritable gold mine with the ability to influence every aspect of a modern enterprise -- from marketing and brand management to product design and customer support. However, drawing concrete, relevant, trustworthy, and actionable insights from the ever-increasing volumes of social data presents a significant challenge to current-day information management and business intelligence systems. As a result, there is growing interest and activity in the academic and industrial research communities around several fundamental questions in this space:

- How do we collect, curate, and cleanse massive amounts of social media data?
- What new analytic techniques, models, and algorithms are required to deal with the unique characteristics of social media data?
- How does one combine information extracted from textual content with the structural information in the "network" (linking, sharing, friending, etc.)?
- What kind of platforms and infrastructure components are required to support all of these analytic activities at scale?

Currently, relevant work in this area is distributed across the individual conferences and workshops organized by different computer science research disciplines such as information retrieval, database systems, NLP, and machine learning. Furthermore, much of the innovation and hands-on experience of industrial practitioners is not publicly available. This workshop aims to bring together industrial and academic practitioners with a focus on an aspect of this problem of particular relevance to COLING -- namely, robust and scalable techniques for information extraction and entity analytics on social media data.

We plan to organize the content of the workshop broadly along the following lines:

- Invited keynote addresses by Marius Pasca (Google Research, US) and Dan Roth (Professor, University of Illinois at Urbana-Champaign)
- Two sessions with research papers
- One session on case studies and practical applications of social media analytics, highlighting technical challenges and lessons learnt

Topics of interest to the workshop include:

- Statistical/Rule-based techniques for information extraction (IE) from social media data
- Robust techniques for extraction and analysis in the presence of:
  - Noise (slang, acronyms, colloquialisms, etc.)
  - Multi-lingual text
  - Very short texts (e.g., tweets)
  - Context-heavy text (e.g., conversations between participants in a forum, a series of comments on an article, etc.)
- Techniques for robust sentiment extraction & mining from social media data
- Detection of emerging topics and themes from social media conversations
- Entity extraction, resolution, and disambiguation from social data
- Infrastructure components and/or platforms to enable all of these analytic techniques to operate at scale

Submission Guidelines

Authors are required to provide a Portable Document Format (PDF) version of their papers on or before September 30th, 2012 (11:59pm Samoa time, UTC-11). The submission link is

For this workshop, manuscripts may be at most 14 (A5-sized) pages long, plus 2 additional pages for references, and should report original, unpublished research. Papers must conform to the official COLING 2012 style guidelines. Please use the following style files for formatting your submissions: LaTeX style files, Microsoft Word style files.

Reviewing: Long papers will be reviewed by three members of the program committee and short papers by two. The final selection will be made by the committee based on the reviewers' reports. We will follow a double-blind reviewing policy, so all submissions should be anonymous. Please do not include any information that could reveal the identity of the authors.

Dual submission policy: Authors can submit papers that are under review at, or have been submitted to, another conference/workshop. However, upon acceptance of the paper, the authors have to decide whether they want to present the paper at IEEASMD or at the other forum.

Presentation and Participation: At least one of the authors MUST register for the workshop to ensure the inclusion of the paper in the proceedings. It is also expected that at least one author of each accepted submission will attend and present the work in person at the workshop. There will be both oral and poster presentations. The mode of presentation will be decided later based on the suggestions of the program committee and does not reflect the technical quality of the paper.


Invited Keynote Speakers

Marius Pasca, Google Research
Dan Roth, University of Illinois at Urbana-Champaign

Important Dates

30th September, 2012 (11:59pm Samoa time, UTC-11): Paper submission deadline
31st October, 2012: Paper accept/reject notification
15th November, 2012: Camera-ready paper due
9th December, 2012: Workshop (schedule to be uploaded soon)

Organizing Committee

Sriram Raghavan, IBM Research - India
Ganesh Ramakrishnan, IIT Bombay
Ajay Nagesh, IIT Bombay

Program Committee

Sunita Sarawagi, IIT Bombay
Indrajit Bhattacharyya, Indian Institute of Science, Bangalore
Rajasekar Krishnamurthy, IBM Research - Almaden
L. V. Subramanian, IBM Research - India
Sundararajan Sellamanickam, Yahoo! Labs, Bangalore
Rahul Gupta, Google Inc, USA
Anhai Doan, University of Wisconsin-Madison and WalmartLabs
Kevin Chen-Chuan Chang, Univ. of Illinois at Urbana-Champaign
Parag Singla, IIT Delhi

For any queries, please contact us at :