2019Q3 Reports: NAACL 2019
Program Committee
Organising Committee
General Chair
Jill Burstein, Educational Testing Service, USA
Program Co-Chairs
Christy Doran, Interactions LLC, USA
Thamar Solorio, University of Houston, USA
Industry Track Co-chairs
Rohit Kumar
Anastassia Loukina, Educational Testing Service, USA
Michelle Morales, IBM, USA
Workshop Co-Chairs
Smaranda Muresan, Columbia University, USA
Swapna Somasundaran, Educational Testing Service, USA
Elena Volodina, University of Gothenburg, Sweden
Tutorial Co-Chairs
Anoop Sarkar, Simon Fraser University, Canada
Michael Strube, Heidelberg Institute for Theoretical Studies, Germany
System Demonstration Co-Chairs
Waleed Ammar, Allen Institute for AI, USA
Annie Louis, University of Edinburgh, Scotland
Nasrin Mostafazadeh, Elemental Cognition, USA
Publication Co-Chairs
Stephanie Lukin, U.S. Army Research Laboratory
Alla Roskovskaya, City University of New York, USA
Handbook Chair
Steve DeNeefe, SDL, USA
Student Research Workshop Co-Chairs & Faculty Advisors
Sudipta Kar, University of Houston, USA
Farah Nadeem, University of Washington, USA
Laura Wendlandt, University of Michigan, USA
Greg Durrett, University of Texas at Austin, USA
Na-Rae Han, University of Pittsburgh, USA
Diversity & Inclusion Co-Chairs
Jason Eisner, Johns Hopkins University, USA
Natalie Schluter, IT University of Copenhagen, Denmark
Publicity & Social Media Co-Chairs
Yuval Pinter, Georgia Institute of Technology, USA
Rachael Tatman, Kaggle, USA
Website & Conference App Chair
Nitin Madnani, Educational Testing Service, USA
Student Volunteer Coordinator
Lu Wang, Northeastern University, USA
Video Chair
Spencer Whitehead, Rensselaer Polytechnic Institute, USA
Remote Presentation Co-Chairs
Meg Mitchell, Google, USA
Abhinav Misra, Educational Testing Service, USA
Local Sponsorship Co-Chairs
Chris Callison-Burch, University of Pennsylvania, USA
Tonya Custis, Thomson Reuters, USA
Local Organization
Priscilla Rasmussen, ACL
Area Chairs
Biomedical NLP & Clinical Text Processing
Bridget McInnes, Virginia Commonwealth University, USA
Byron C. Wallace, Northeastern University, USA
Cognitive Modeling – Psycholinguistics
Serguei Pakhomov, University of Minnesota, USA
Emily Prud’hommeaux, Boston College, USA
Dialog and Interactive systems
Nobuhiro Kaji, Yahoo Japan Corporation, Japan
Zornitsa Kozareva, Google, USA
Sujith Ravi, Google, USA
Michael White, Ohio State University, USA
Discourse and Pragmatics
Ruihong Huang, Texas A&M University, USA
Vincent Ng, University of Texas at Dallas, USA
Ethics, Bias and Fairness
Saif Mohammad, National Research Council Canada, Canada
Mark Yatskar, University of Washington, USA
Generation
He He, Amazon Web Services, USA
Wei Xu, Ohio State University, USA
Yue Zhang, Westlake University, China
Information Extraction
Heng Ji, Rensselaer Polytechnic Institute, USA
David McClosky, Google, USA
Gerard de Melo, Rutgers University, USA
Timothy Miller, Boston Children’s Hospital, USA
Mo Yu, IBM Research, USA
Information Retrieval
Sumit Bhatia, IBM’s India Research Laboratory, India
Dina Demner-Fushman, US National Library of Medicine, USA
Machine Learning for NLP
Ryan Cotterell, Johns Hopkins University, USA
Daichi Mochihashi, The Institute of Statistical Mathematics, Japan
Marie-Francine Moens, KU Leuven, Belgium
Vikram Ramanarayanan, Educational Testing Service, USA
Anna Rumshisky, University of Massachusetts Lowell, USA
Natalie Schluter, IT University of Copenhagen, Denmark
Machine Translation
Rafael E. Banchs, HLT Institute for Infocomm Research A*Star, Singapore
Daniel Cer, Google Research, USA
Haitao Mi, Ant Financial US, USA
Preslav Nakov, Qatar Computing Research Institute, Qatar
Zhaopeng Tu, Tencent, China
Mixed Topics
Ion Androutsopoulos, Athens Univ. of Economics and Business, Greece
Steven Bethard, University of Arizona, USA
Multilingualism, Cross lingual resources
Željko Agić, IT University of Copenhagen, Denmark
Ekaterina Shutova, University of Amsterdam, Netherlands
Yulia Tsvetkov, Carnegie Mellon University, USA
Ivan Vulic, Cambridge University, UK
NLP Applications
T. J. Hazen, Microsoft, USA
Alessandro Moschitti, Amazon, USA
Shimei Pan, University of Maryland Baltimore County, USA
Wenpeng Yin, University of Pennsylvania, USA
Su-Youn Yoon, Educational Testing Service, USA
Phonology, Morphology and Word Segmentation
Ramy Eskander, Columbia University, USA
Grzegorz Kondrak, University of Alberta, Canada
Question Answering
Eduardo Blanco, University of North Texas, USA
Christos Christodoulopoulos, Amazon, USA
Asif Ekbal, Indian Institute of Technology Patna, India
Yansong Feng, Peking University, China
Tim Rocktäschel, Facebook, USA
Avi Sil, IBM Research, USA
Resources and Evaluation
Torsten Zesch, University of Duisburg-Essen, Germany
Tristan Miller, Technische Universität Darmstadt, Germany
Semantics
Ebrahim Bagheri, Ryerson University, Canada
Samuel Bowman, New York University, USA
Matt Gardner, Allen Institute for Artificial Intelligence, USA
Kevin Gimpel, Toyota Technological Institute at Chicago, USA
Daisuke Kawahara, Kyoto University, Japan
Carlos Ramisch, Aix Marseille University, France
Sentiment Analysis
Isabelle Augenstein, University of Copenhagen, Denmark
Wai Lam, The Chinese University of Hong Kong, Hong Kong
Soujanya Poria, Nanyang Technological University, Singapore
Ivan Vladimir Meza Ruiz, UNAM, Mexico
Social Media
Dan Goldwasser, Purdue University, USA
Michael J. Paul, University of Colorado Boulder, USA
Sara Rosenthal, IBM Research, USA
Paolo Rosso, Universitat Politècnica de València, Spain
Chenhao Tan, University of Colorado Boulder, USA
Xiaodan Zhu, Queen’s University, Canada
Speech
Keelan Evanini, Educational Testing Service, USA
Yang Liu, LAIX Inc, USA
Style
Beata Beigman Klebanov, Educational Testing Service, USA
Manuel Montes, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
Joel Tetreault, Grammarly, USA
Summarization
Mohit Bansal, University of North Carolina Chapel Hill, USA
Fei Liu, University of Central Florida, USA
Ani Nenkova, University of Pennsylvania, USA
Tagging, Chunking, Syntax and Parsing
Adam Lopez, University of Edinburgh, Scotland
Roi Reichart, Technion – Israel Institute of Technology, Israel
Agata Savary, University of Tours, France
Guillaume Wisniewski, Université Paris Sud, France
Text Mining
Kai-Wei Chang, University of California Los Angeles, USA
Anna Feldman, Montclair State University, USA
Shervin Malmasi, Harvard Medical School, USA
Verónica Pérez-Rosas, University of Michigan, USA
Kevin Small, Amazon, USA
Diyi Yang, Carnegie Mellon University, USA
Theory and Formalisms
Valia Kordoni, Humboldt University Berlin, Germany
Andreas Maletti, University of Stuttgart, Germany
Vision, Robotics and other grounding
Francis Ferraro, University of Maryland Baltimore County, USA
Vicente Ordóñez, University of Virginia, USA
William Yang Wang, University of California Santa Barbara, USA
Main Innovations
- Conference theme
- The CFP made a special request for papers addressing the tension between data privacy and model bias in NLP, including: using NLP for surveillance and profiling, balancing the need for broadly representative data sets with protections for individuals, understanding and addressing model bias, and where bias correction becomes censorship. The three invited speakers were all selected to tie into the theme, and a Best Thematic Paper was selected.
- Land Acknowledgement
- Similar to what has been done in recent *CL conferences, the opening session included a land acknowledgement to recognize and honor Indigenous Peoples.
- Video Poster Highlights
- This year included one-minute slides with pre-recorded audio that showcased the posters to be presented that day. The goal was to provide more visibility to posters. These were shown during the welcome reception, breakfast and breaks. A/V failures on the first day of the conference made it hard to assess effectiveness.
- Remote Presentations
- Remote presentations were supported for both talks and posters, via an application form to the committee.
- The new Diversity & Inclusion team piloted a number of new initiatives including:
- additional questions on the registration form to identify any accommodations
- preferred pronouns (optionally) added to badges
- I’m hiring/I’m looking for a job/I’m new badge stickers
- Link to D&I report will be included when it is available.
- Two-stage Submissions
- This year we followed a two-stage submission process, in which abstracts were due one week before full papers. Our goal was to get a head start on assigning papers to areas, and recruiting additional area chairs where submissions exceeded our predicted volume.
- Pro: early response to areas with larger than predicted number of papers
- Con: too much overhead for PCs, as authors repeatedly contacted chairs to request that papers be moved between long and short, or asked about changes to authorship, titles and abstracts.
- Full papers available for bidding: reviewers loved it, authors did not
- Student Research Papers
- Talks and posters from the SRW were integrated into the main conference program. Positive feedback was received about this, as it gave students a better experience.
Submissions rates and distributions
Authors were permitted to switch format (long/short) when they submitted the full papers, so the chart below uses 2271 as the total number of submissions, discounting the 103 that never submitted a full paper in the second phase. Seventy-nine papers were desk-rejected due to anonymity, formatting, or dual-submission violations; 456 papers were withdrawn prior to acceptance decisions being sent, although some were withdrawn part way through the review process; and an additional 11 papers were withdrawn after acceptance notifications had been sent. Keeping the acceptance rate consistent with past years meant 5 parallel tracks were needed to fit more papers into 3 days. As the conference grows, decisions will have to be made about continuing to add more tracks, adding more days to the main conference, or lowering the acceptance rate. The overall technical program consisted of 423 main conference papers, plus 9 TACL papers, 23 SRW papers, 28 Industry papers, and 24 demos. The TACL and SRW papers were integrated into the program, and marked SRW or TACL accordingly.
NEED TO CONVERT LATEX
Long | Short | Total | TACL |
---|---|---|---|
Detailed statistics by area
Area | Long (%) | Short (%) |
---|---|---|
Bio and clinical NLP | 7 (57) | 28 (17) |
Cognitive modeling | 24 (29) | 14 (14) |
Dialog and Interactive systems | 64 (20) | 18 (27) |
Discourse and Pragmatics | 38 (21) | 11 (36) |
Ethics, Bias and Fairness | 16 (25) | 12 (50) |
Generation | 46 (14) | 19 (23) |
Information Extraction | 46 (28) | 16 (12) |
Information Retrieval | 22 (22) | 13 (30) |
Machine Learning for NLP | 100 (29) | 22 (22) |
Machine Translation | 49 (30) | 53 (18) |
Multilingual NLP | 43 (25) | 28 (10) |
NLP Applications | 60 (30) | 41 (17) |
Phonology | 24 (33) | 24 (25) |
Question Answering | 73 (36) | 41 (17) |
Resources and Evaluation | 33 (27) | 20 (20) |
Semantics | 80 (13) | 42 (11) |
Sentiment Analysis | 32 (28) | 40 (20) |
Social Media | 44 (18) | 41 (36) |
Speech | 19 (31) | 9 (33) |
Style | 24 (25) | 16 (25) |
Summarization | 22 (27) | 28 (28) |
Syntax | 36 (52) | 54 (13) |
Text Mining | 101 (18) | 29 (24) |
Theory and Formalisms | 12 (58) | 12 (16) |
Vision & Robotics | 41 (12) | 22 (36) |
Conference tracks
The Industry Track, in its second year, had 28 accepted papers (10 oral and 18 posters, acceptance rate: ~28%), and ran a lunchtime Careers in Industry panel, which was very well attended. Panelists were Judith Klavans, Yunyao Li, Owen Rambow, and Joel Tetreault, and the moderator was Phil Resnik.
The Student Research Workshop had 23 accepted papers, distributed throughout the conference, and 19 submissions received pre-submission mentoring. For the first time, both archival and non-archival submissions were offered, meaning that authors who opted for the non-archival version will not have a paper available in the archive and are free to publish elsewhere.
There were 25 accepted Demos, which were spread across several of the poster sessions.
Reviewing
Recruiting ACs and Reviewers
Similar to what other PCs have done in the past, we distributed a wide call for volunteers to recruit the Area Chairs and Reviewers. All volunteers were screened by the PCs and assigned AC or reviewer roles, and each area was seeded with a set of volunteer reviewers. Area Chairs then filled out the remainder of their respective committees. There were 25 specific areas + one for “Mixed Topics”, and at least 2 ACs per topic area. After the abstract deadline, we added more ACs to teams with larger than predicted numbers of submissions. Our goal was to ensure greater diversity by including in each area some participants who may not have been previously involved, and therefore would not have been invited if the committees were built from lists of previous reviewers. 390 of 1321 reviewers were reviewing for NAACL for the first time. 40 of the 94 area chairs were first-time area chairs for NAACL.
Breakdown by gender for ACs and reviewers
Response | Area Chair | Reviewer |
---|---|---|
Female | 24.4 | 25.2 |
Male | 73 | 71.7 |
Prefer not to answer | 2.6 | 3.1 |
Breakdown by employment category and country for ACs and reviewers
Abstract Submissions
This year we followed a two-stage submission process, in which abstracts were due one week before full papers. Our goal was to get a head start on assigning papers to areas, and recruiting additional area chairs where submissions exceeded our predicted volume. Relative to the projected numbers from NAACL-HLT 2018, several areas received a higher-than-predicted number of submissions: Biomedical/Clinical, Dialogue and Vision. Text Mining ended up with the overall largest number of submissions.
Handling desk rejects
Our process for identifying desk rejects has been very similar to what other PCs have done in the past. First, the area chairs check their batch of assigned papers and report any issues to us. As the reviewing begins, reviewers may also identify issues that were not caught by ACs, which they flag up to ACs or directly to PCs. We then review each of these issues and make a final decision, to ensure that papers are handled consistently. This means each paper is reviewed for non-content issues by at least three people. The major categories for desk rejects are:
- Violations of the dual submission policy specified in the call for papers
- Violations of the anonymity policy as specified in the call for papers
- “Format cheating”: submissions not following the clearly stated format and style guidelines, either in LaTeX or Word (thanks to Emily and Leon for introducing the concept).
As of February 7th, out of 2378 submissions, there were 44 rejections for format issues, 24 for anonymity violations, and 11 for dual submissions. This means that a total of 3% of the submissions were desk-rejected.
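The desk-reject figure can be re-derived from the counts quoted above. The snippet below is only an arithmetic check; the variable names are illustrative and not taken from any START export.

```python
# Counts as reported in the paragraph above (as of February 7th).
desk_rejects = {"format": 44, "anonymity": 24, "dual_submission": 11}
total_submissions = 2378

total_rejected = sum(desk_rejects.values())
rate = 100 * total_rejected / total_submissions

print(f"{total_rejected} desk rejects = {rate:.1f}% of submissions")
# prints "79 desk rejects = 3.3% of submissions"
```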
Review process
Assignment to areas used the initial START assignments, followed by load-rebalancing and conflict resolution using keywords and manual inspection of the papers. Authors were blind to Area Chairs.
Review assignment
- Criteria: Fairness, Expertise, Interest
- Method: area chair expertise + Toronto Paper Matching System (TPMS) + reviewer bids + manual tweaking
- Many reviewers did not have TPMS profiles
The goal was no more than 5 papers per reviewer, though some reviewers agreed to handle more. First-round accept/reject suggestions were made by area chairs. Final decisions were made by the program chairs.
We used a hybrid reviewing form, combining elements of the EMNLP 2018, NAACL-HLT 2018 and ACL 2018 forms, with a 6-point overall rating scale so there was no “easy out” mid-point, distinct sections for summary, strengths and weaknesses to make it easy to scan and compare relevant sections, and the minimum length feature of START enabled to elicit more consistently substantive content for the authors. This received excellent feedback from authors, but some reviewers complained about it and others outright circumvented it via HTML tags or repeated filler content.
The graph below shows the timeline of first review submissions.
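As a rough illustration of how affinity scores, bids and a load cap can be combined, here is a minimal greedy sketch. It is not the procedure the PCs or START actually used. The function name, the `bid_weight` parameter, the `per_paper` value and the toy data are all hypothetical. Reviewers without a TPMS-style profile simply get an affinity of 0 and are ranked on bids alone.

```python
from collections import defaultdict


def assign_reviewers(papers, reviewers, affinity, bids, conflicts,
                     per_paper=3, max_load=5, bid_weight=0.5):
    """Greedy sketch: give each paper its best-scoring available reviewers.

    affinity and bids map (paper, reviewer) pairs to scores; missing pairs
    count as 0.0. conflicts is a set of (paper, reviewer) pairs to exclude.
    """
    load = defaultdict(int)
    assignment = {}
    for paper in papers:
        # Only reviewers without a conflict and below the load cap qualify.
        candidates = [
            r for r in reviewers
            if (paper, r) not in conflicts and load[r] < max_load
        ]
        # Combined score: affinity plus a weighted bid (weights are assumed).
        candidates.sort(
            key=lambda r: affinity.get((paper, r), 0.0)
            + bid_weight * bids.get((paper, r), 0.0),
            reverse=True,
        )
        chosen = candidates[:per_paper]
        for r in chosen:
            load[r] += 1
        assignment[paper] = chosen
    return assignment


# Toy data, purely illustrative.
papers = ["p1", "p2"]
reviewers = ["a", "b", "c"]
affinity = {("p1", "a"): 0.9, ("p1", "b"): 0.2,
            ("p2", "a"): 0.8, ("p2", "c"): 0.5}
bids = {("p1", "b"): 1.0}
conflicts = {("p2", "a")}

result = assign_reviewers(papers, reviewers, affinity, bids, conflicts,
                          per_paper=1, max_load=1)
print(result)  # {'p1': ['a'], 'p2': ['c']}
```

A greedy pass like this depends on paper order and does not balance load globally, which is one reason a manual tweaking step remains useful.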
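One way a minimum-length check could be made harder to circumvent is to measure length only after stripping HTML tags and repeated sentences. The sketch below is hypothetical: it does not describe how START implements its minimum length feature, and the function names and the `min_chars` threshold are assumptions.

```python
import re


def effective_length(text):
    """Count characters after dropping HTML tags and repeated sentences."""
    # Replace anything that looks like an HTML tag with a space.
    no_tags = re.sub(r"<[^>]+>", " ", text)
    # Split into rough sentences and normalise case for comparison.
    sentences = [s.strip().lower() for s in re.split(r"[.!?\n]+", no_tags)]
    seen = set()
    unique = []
    for s in sentences:
        if s and s not in seen:
            seen.add(s)
            unique.append(s)
    return sum(len(s) for s in unique)


def meets_minimum(text, min_chars=100):
    """Return True if the de-duplicated, tag-free text is long enough."""
    return effective_length(text) >= min_chars


print(effective_length("<b></b><i></i>"))            # 0
print(meets_minimum("Good paper. " * 20, min_chars=50))  # False
```

This only catches exact repeats and tag padding, so a heuristic like this would complement area chair judgment rather than replace it.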
Regarding the increasing challenge in preserving double blind review, PCs found that the papers whose authors the reviewers could guess were more likely to receive an overall score of 5 or 6, compared to papers whose authors were not identified by the reviewers.
There was no author response period, due to time constraints and the finding from NAACL 2018 that it had little impact. Authors were unhappy about this; they really want to be able to respond to reviews.
We did not repeat the Test of Time awards from 2018. Should this be something that the NAACL/ACL board runs, and/or be done every few years? [There were ToT awards at ACL 2019 and it looks like this will be happening at ACLs.]
Best paper awards
- Best Thematic Paper:
- What’s in a Name? Reducing Bias in Bios Without Access to Protected Attributes
- Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky and Adam Kalai
- Best Explainable NLP Paper:
- CNM: An Interpretable Complex-valued Network for Matching
- Qiuchi Li, Benyou Wang and Massimo Melucci
- Best Long Paper
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova
- Best Short Paper
- Probing the Need for Visual Context in Multimodal Machine Translation
- Ozan Caglayan, Pranava Madhyastha, Lucia Specia and Loïc Barrault
- Best Resource Paper
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- Alon Talmor, Jonathan Herzig, Nicholas Lourie and Jonathan Berant
Presentations
- Long-paper presentations: 22 sessions in total (4 sessions in parallel) + 2 dedicated Industry Track sessions, duration: 15 minutes for talk + 3 minutes for questions
- Short-paper presentations: 12 sessions in total (4 sessions in parallel), duration: 12 minutes for talk + 3 minutes for questions
- Best-paper presentation: 1 session at the end of the last day
- Posters: 8 sessions in total (1 session in parallel with every non-plenary talk session) + 1 dedicated Industry Poster session
Timeline
- Dec. 10, 2018: Paper submission deadline (both long and short)
- Dec. 14-17: Area chairs check papers
- Dec. 20-Jan. 2, 2019: Paper bidding window
- Jan. 3-8: Area chairs review assignment
- Jan. 9: Review period starts
- Jan. 29: Reviews due (around 3 weeks for reviewing)
- Jan. 30-Feb. 3: Area chairs chase late reviewers and add emergency reviewers
- Feb. 4-7: Area chairs discussion period
- Feb. 8-12: Area chairs determine recommendations and enter meta reviews
- Feb. 13-21: Final decisions made
- Feb. 22: Decisions sent to authors
- March 11: Presentation format recommendations
- March 18: ACs send best reviewers list
- March 20-April 8: Best paper selection period
Issues and recommendations
- Maintaining anonymity
- Wording of ACL policies invites reinterpretation (e.g. "are asked not to publicize [the paper] further during the anonymity period – the submitted paper should be as anonymous as possible.")
- Open review from overlapping conferences requires Chairs to make ad hoc decisions about whether de-anonymization as part of the review process does or does not violate ACL policies
- Expectation for transparency at odds with confidential review process (community wants to discuss all aspects of review process in social media)
- Higher volume of papers & participants is straining our infrastructure
- START tools struggle to support this volume of papers
- Reviewer overload/burnout
- Challenges in coordinating logistics with the venue (A/V, coffee, recruiting lunch, video release forms, random people jumping into banquet buses) in the absence of a Local Chair
- Possible solutions
- Look into sharing reviews for rejected papers with subsequent conferences
- Revisit using Open Review for *ACL
- Strict policy on double submissions (like EMNLP)
- Other recommendations
- Do not print handbooks for all participants; instead, have a smaller number available by request. The post-conference survey indicated that a majority of participants used only the conference app.
- Have a Local Arrangements Chair for NAACL
- Revise ACL anonymity and submission policies to remove alternate interpretations and thereby spare PCs time-consuming negotiations with authors
- Consider moving NAACL to spring so that *ACL timelines are less compressed and NAACL reviewing does not fall over end-of-year holidays
- More automation of format checks in START & better documentation of the ones that are already there (obscure and buried flags) to ease the desk reject process
- Extend START COI tools to allow authors to list reviewers who should not be assigned to their paper