2022Q3 Reports: Anthology Director

From Admin Wiki

What's new

Here are noteworthy changes and accomplishments from March through July 2022.

  • Some volumes have been accepted for indexing in important indexes (volume and index identities to follow)
  • Papers with Code crawls our site and adds explicit links to software and data linked in papers. This continues to run smoothly.
  • We continue to work with Slideslive to ingest videos of current conferences. 2021 is done, and we are processing 2022's main conference videos. Slideslive has been helpful and responsive, and it has been a pleasure working with them.
  • (Small, and actually from February, but my favorite) We now have buttons on each paper page that allow the user to quickly copy formal and informal citations to their clipboard (thanks to Marcel Bollmann).

Pain Points

The following items continue to consume a lot of our time.

Ingestion

  • Conferences are growing, with more and more workshops, all of which have to be ingested together and linked up. This is an increasingly complicated and error-prone process, often involving new and inexperienced people. Ryan Cotterell and his team have vastly simplified and improved this process by writing ACLPUB2 and by simply serving extended terms, accruing institutional knowledge and familiarity. The importance of this experience for this technical task is hard to overstate.
  • On numerous occasions, workshops erroneously submit non-archival materials, which then need to be removed.

Corrections

  • Processing revisions and correcting metadata requires manual approval from Anthology staff. We have moved to a monthly processing window, which helps, but it is still needlessly complicated.

Roadmap

We plan the following projects, moving from short-term to long-term.

  • (Summer 2022) Explicit event representation. Many archival-worthy items are not associated with a paper (e.g., keynotes, business meeting videos), but we have no way to represent them. Events in the Anthology are inferred from a collection of volumes (e.g., NAACL'22). We are in the process of explicitly representing events, which will permit us to incorporate event-related (versus volume-related) items.
  • (Late 2022) Announcements mechanism. We will add a blog of sorts that allows us to post announcements and host other content related to the management of the Anthology.
  • (2022) Videos. We continue to ingest old videos and host them locally.
  • (2023+) Dynamic site. The Anthology is currently built statically from public data on GitHub. Medium-term, we should move to a proper digital management system. This will ease pain points, for example by allowing authors to claim their papers and manage many revisions and corrections themselves. Ideally this work will be done in-house. It should be coordinated with the IT director, and the work should be paid.
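
To make the "Explicit event representation" item above more concrete, here is a rough sketch of the difference between inferring an event from its volumes and representing it as its own record. It is only an illustration: the class names, field names, identifiers, and the rule of grouping by venue and year are all assumptions for this example, not the Anthology's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    volume_id: str  # hypothetical identifier format
    venue: str
    year: int


@dataclass
class Event:
    event_id: str
    volumes: List[str] = field(default_factory=list)
    # Items tied to the event itself rather than to any volume,
    # e.g. keynotes or business meeting videos.
    extras: List[str] = field(default_factory=list)


def infer_events(volumes: List[Volume]) -> Dict[str, Event]:
    """Build events implicitly by grouping volumes (assumed rule: venue + year)."""
    events: Dict[str, Event] = {}
    for v in volumes:
        key = f"{v.venue}-{v.year}"
        events.setdefault(key, Event(event_id=key)).volumes.append(v.volume_id)
    return events


volumes = [
    Volume("2022.naacl-main", "naacl", 2022),
    Volume("2022.naacl-demos", "naacl", 2022),
]
events = infer_events(volumes)

# With an explicit Event record, event-level items have somewhere to live.
events["naacl-2022"].extras.append("keynote video")
```

When events exist only as an inferred grouping, there is no record to attach the keynote to; giving the event its own record is what makes the last line possible.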