First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Language

Event Notification Type: 
Call for Papers
Abbreviated Title: 
Co-Located with Coling 2014
Saturday, 23 August 2014 to Sunday, 24 August 2014
Submission Deadline: 
Friday, 2 May 2014

Co-located with COLING 2014, August 23-24 in Dublin, Ireland

Submission Deadline: May 02, 2014

The CFP below is for the SPMRL-SANCL Main Workshop. The workshop also features:

* Second Shared Task on Semi-Supervised Parsing of Morphologically
Rich Languages
* Special Track on the Syntactic Analysis of Non-Canonical Language


Statistical parsing of morphologically-rich languages (MRLs) and
syntactic analysis of non-canonical language (NCL) have been shown in
recent research to share several properties and challenges. This year
we therefore organize a joint workshop for these two research
communities to foster cross-pollination of ideas and technology.

Statistical parsing of morphologically-rich languages has repeatedly
been shown to exhibit non-trivial challenges including, among others,
sparse lexica in the face of rich inflectional systems, parsing
deficiency in the face of free word order, and treebank annotation
idiosyncrasies in the face of morphosyntactic interactions.

Similar problems arise for parsing of non-canonical language. Besides
technical issues such as lexical sparseness and ad-hoc structures, we
also face theoretical problems, including constructions that do not
occur, or occur only rarely, in standard language, such as verbless
sentences or complex hashtags.

The first joint SPMRL-SANCL workshop addresses the challenges of
parsing both MRLs and NCL. It provides a forum for researchers addressing
the often overlapping issues of both fields with the goal of
identifying cross-cutting issues in the annotation and parsing
methodology for such languages.


The areas of interest of the SPMRL-SANCL workshop include, but are not
limited to, the following list of topics:

* applying cutting-edge parsing techniques to new languages and domains

* identifying the strengths and weaknesses of current parsing
techniques when applied to morphologically-rich and/or non-canonical
language

* developing techniques that are targeted at improving parsing quality
of morphologically-rich and/or non-canonical language

* developing models and architectures that explicitly integrate
morphological analysis and parsing

* addressing data sparseness due to lexical variants,
out-of-vocabulary (OOV) words and noise, ad-hoc syntactic rules,
non-canonical word order, ungrammatical structures, or disfluencies

* using insights from parsing and associated processing problems to
motivate decisions in the creation of new syntactically annotated
corpora ("treebanks"), especially in domains, genres, and languages
that are not yet, or only barely, covered; tag set design

* discussing the role of parsing in higher-level NLP applications
involving MRLs and NCLs, e.g. syntax-enhanced MT and semantic


The workshop will also host the second shared task on parsing
morphologically rich languages. The first shared task was
held in conjunction with SPMRL 2013. It helped show that carefully
engineered approaches can push the envelope on languages such
as Hungarian, Basque, Hebrew and Polish, where the shared task results
for constituency parsing are the best currently known for those
languages. The task embodied a focus on realistic scenarios (no gold
tokenization, no gold part-of-speech or morphology), as well as
meaningful evaluation measures, including a cross-framework evaluation
that permits comparisons between constituent and dependency parsing

The second installment of the Shared Task will feature a similar range
of languages. Moreover, it will also consider a semi-supervised
scenario where larger quantities of in-domain text are available.
These unlabeled data are intended for use in self-training,
co-training, lexical acquisition, generating word clusters, word
embeddings and so on. A separate call for the Shared Task is


In addition to regular paper submissions, we solicit poster
submissions addressing the syntactic analysis of frequent phenomena of
non-canonical language, which are difficult to annotate and parse
using conventional annotation schemes. Cases in point are the
representation of verbless utterances in a dependency scheme, the pros
and cons of different representations of disfluencies for statistical
parsing, or the analysis of complex hashtags which incorporate and
merge different syntactic arguments into one token. The posters should
focus, in more detail, on one or more of these issues. More details on
the submission categories for the poster session can be found below.


* Submission Date: May 02, 2014 (23:59 UTC-12)
* Author Notification: June 06, 2014
* Camera-ready papers due: June 27, 2014
* Workshop: August 23 or 24, 2014


We solicit the following submission categories:

* Long papers (up to 11 pages with unlimited references)
* Short papers (up to 6 pages with unlimited references)
* Abstracts (500 words excluding examples/references, for SANCL poster
topics only)
* Shared task paper submissions (format will be disclosed later)

Long papers are most appropriate for presenting substantial and
completed research addressing a topic relevant to either SANCL or
SPMRL.

Short papers are suited for presenting work in progress, position
papers or short, focused contributions, relevant to either SANCL or
SPMRL (including the poster session topics described above).

Both long and short papers should present original, unpublished
research. They will be peer reviewed and will be presented as either
an oral talk or as a poster at the workshop. Long/short papers will be
included in the proceedings. Abstract submissions are most appropriate
for presenting an idea for an analysis for one or more of the poster
topics. In contrast to long/short paper submissions, abstract
submissions do not need to back up their ideas with experimental
results. Abstract submissions will receive a yes/no review and will not
be included in the proceedings.

Submissions will be accepted in PDF format via the START system and
must conform to the COLING 2014 formatting instructions.



Yoav Goldberg (Bar Ilan University, Israel)
Yuval Marton (Microsoft Corp., US)
Ines Rehbein (Potsdam University, Germany)
Yannick Versley (Heidelberg University, Germany)
Özlem Çetinoğlu (University of Stuttgart, Germany)
Joel Tetreault (Yahoo! Labs, US)

-SANCL Special Session-

Ines Rehbein (Potsdam University, Germany)
Özlem Çetinoğlu (University of Stuttgart, Germany)
Djamé Seddah (Université Paris Sorbonne & INRIA's Alpage Project, France)
Joel Tetreault (Yahoo! Labs, US)

-Shared task-

Sandra Kübler (Indiana University, US)
Djamé Seddah (Université Paris Sorbonne & INRIA's Alpage Project, France)
Reut Tsarfaty (The Weizmann Institute of Science, Israel)


Bernd Bohnet (University of Birmingham, UK)
Marie Candito (University of Paris 7, France)
Aoife Cahill (Educational Testing Service, US)
Jinho D. Choi (University of Massachusetts Amherst, US)
Grzegorz Chrupala (Tilburg University, Netherlands)
Gülşen Cebiroğlu Eryiğit (Istanbul Technical University, Turkey)
Markus Dickinson (Indiana University, US)
Stefanie Dipper (Ruhr-Universität Bochum, Germany)
Jacob Eisenstein (Georgia Institute of Technology, US)
Richard Farkas (University of Szeged, Hungary)
Jennifer Foster (Dublin City University, Ireland)
Josef van Genabith (DFKI, Germany)
Koldo Gojenola (University of the Basque Country, Spain)
Spence Green (Stanford University, US)
Samar Husain (Potsdam University, Germany)
Joseph Le Roux (Université Paris-Nord, France)
John Lee (City University of Hong Kong, China)
Wolfgang Maier (University of Düsseldorf, Germany)
Takuya Matsuzaki (University of Tokyo, Japan)
David McClosky (IBM Research, US)
Detmar Meurers (University of Tübingen, Germany)
Joakim Nivre (Uppsala University, Sweden)
Kemal Oflazer (Carnegie Mellon University, Qatar)
Adam Przepiorkowski (ICS PAS, Poland)
Owen Rambow (Columbia University, US)
Kenji Sagae (University of Southern California, US)
Benoit Sagot (Inria Rocquencourt, France)
Wolfgang Seeker (IMS Stuttgart, Germany)
Anders Søgaard (University of Copenhagen, Denmark)
Lamia Tounsi (Dublin City University, Ireland)
Daniel Zeman (Charles University, Czechia)

