2008Q1 Reports: Anthology
From Admin Wiki
ACL ANTHOLOGY Report, January 2008
Steven Bird & Min-Yen Kan
The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community and freely available to all. It includes the Computational Linguistics journal and the proceedings of many conferences and workshops, including ACL, EACL, NAACL, ANLP, TINLAP, COLING, HLT, MUC, and Tipster. Conference proceedings are published in the Anthology at around the same time as the conference. CL articles are published in the Anthology one year in arrears (but individual subscribers can access recent issues electronically via the MIT Press website). The Anthology now contains 14,000 papers (up from 12,500 twelve months ago) and supports full-text search. The materials are now hosted on the ACL website, at http://aclweb.org/anthology-index/, thanks to Drago Radev. Most of the papers are also indexed by CiteSeer and Google Scholar, which helps the citation counts of ACL authors. The ACM Digital Library creates full metadata for all Anthology materials and registers digital object identifiers (DOIs) for ACL papers (e.g. http://dx.doi.org/10.3115/1118693.1118695), at a cost to the ACL of $275 annually. The new ACL Anthology Network (AAN) website at the University of Michigan provides detailed citation analysis for the Anthology. Updates to the Anthology are announced on the mailing list at http://groups.google.com/group/acl-anthology.
Steven Bird has now stepped down as editor and has passed the role on to Min-Yen Kan. This transition marks the conclusion of the Anthology's development phase: (a) materials from the ACL's hardcopy and microfiche eras have all been digitized; (b) born-digital materials published in ad hoc formats have been manually converted; and (c) the Anthology has been incorporated into the ACL's operations, including the publications process and web hosting.
The ongoing maintenance of the Anthology involves several challenges: streamlining the proceedings upload process; incorporating richer bibliographic metadata as it becomes available via DOI services; and supporting community initiatives that build on the Anthology.
ONGOING ACTIVITIES
PACLIC PROCEEDINGS: The steering committee of PACLIC (the Pacific Asia Conference on Language, Information and Computation) has approached the Anthology editor to request that PACLIC proceedings be included in the Anthology. PACLIC has been an important regional conference on language in the Pacific Asian region for the past twenty years. Recently, with great help from Professor Harada's team at Waseda University, all PACLIC proceedings have been digitized and posted at http://www.decode.waseda.ac.jp/PACLIC-STEERING/. Including these materials would add to the geographical and linguistic diversity of the Anthology. The Executive needs to establish the scope of the Anthology beyond the ACL's own publications.
IJCNLP PROCEEDINGS: The IJCNLP 2005 proceedings were excluded from the ACL Anthology because of an agreement with Springer. Once the required three-year period elapses during 2008, the IJCNLP-05 proceedings can be incorporated into the Anthology. Su Jian is the contact person for organizing this. The IJCNLP-08 proceedings will also be added to the Anthology later this year, pending the final list of archived papers from the IJCNLP conference chairs.
HIGHER-QUALITY BIBLIOGRAPHIC METADATA: The ACM Digital Library is creating high-quality bibliographic metadata for each individual paper, in conjunction with registering each paper with a DOI. It should be possible to extract that metadata and use it to improve the quality of the metadata on the Anthology site (e.g. by correcting OCR errors in author names and paper titles).
PUBLICATION INSTRUCTIONS: The instructions for the publication software need to be updated to cover two further tasks: (i) obtaining workshop identifiers from the Anthology editor, and (ii) uploading the materials to the Anthology by FTP. Conferences and workshops not held in conjunction with a regular ACL meeting are not automatically included in the Anthology. Organizers of such events should consider using the ACL publication software and contacting the Anthology editor to ensure timely incorporation of their proceedings into the Anthology.
SIG-RELATED MATERIALS: Min-Yen Kan is now working on expanding the scope of Anthology materials where feasible. In particular, SIGs are likely to have their own specialized Anthology pages, featuring links to materials relevant to, or supported by, each SIG. Once this is done, we hope to extend archiving to SIG-related workshops and conferences.
TIMING: Conference and workshop organizers hold a variety of opinions about exactly when proceedings should appear in the Anthology (e.g. before, during, or after the event). We recommend that the Executive establish a standard practice here.
ACM DL: Our ACM Digital Library contact, Bernard Rous, has asked to receive CD-ROMs of ACL conferences as they are published, so that he can initiate the process of assigning DOIs. His address is: Bernard Rous, Electronic Publishing Program Director, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701.
TEXT EXTRACTION: There is an ongoing initiative, involving Dragomir Radev, Min-Yen Kan and others, to extract plain text from the ACL Anthology materials. Most of the Anthology has been converted and can be found at http://wing.comp.nus.edu.sg/~min/dAnth/acl/. This will facilitate the application of NLP techniques to our own publications.
In particular, the Linked Anthology proposal submitted to the ACL Executive's grassroots initiative plans to create a standardized test corpus for future bibliographic and bibliometric studies; we expect its results to be reported later this year.
TOPICAL INDEXING: The existence of persistent URLs makes it easy for individuals and special interest groups to set up annotated bibliographies with pointers to papers in the Anthology. Moreover, the community's text categorization techniques ought to be applied to its own text collection. The Anthology site should link to any well-curated, comprehensive categorizations of its content, so that members of the CL community can benefit from them. The new ACL Wiki would be a convenient place for members to maintain topical indexes of ACL papers.
WIKIFIED EDITING: A longer-term goal, scheduled for late this year, is for the Anthology to incorporate edits from the user community. Such edits to metadata would still be reviewed by the Anthology editor, but letting users submit corrections in context would make this kind of feedback much easier.