ACL 2010: The 48th Annual Meeting of the Association for Computational
Linguistics

Review form for LONG SYSTEMS/APPLICATIONS research papers

This review form is appropriate for papers that present a system
that has been developed, implemented, and tested with users.


APPROPRIATENESS (1-5)

Does the paper fit in ACL 2010? (Please answer this question in light
of the desire to broaden the scope of the research areas represented
at ACL.)

5 = Certainly.
4 = Probably.
3 = Unsure.
2 = Probably not.
1 = Certainly not.


CLARITY (1-5)

For the reasonably well-prepared reader, is it clear what was done and
why? Is the paper well-written and well-structured?

5 = Very clear.
4 = Understandable by most readers.
3 = Mostly understandable to me with some effort.
2 = Important questions were hard to resolve even with effort.
1 = Much of the paper is confusing. 


ORIGINALITY / INNOVATIVENESS (1-5)

Is there novelty in the developed system? Does it address a new
problem or one that has received little attention? Alternatively, does
it present a system that has significant benefits over other systems
in terms of its usability, coverage, or success?

5 = Seminal: Significant new problem, or a major advance
    over other systems that attack this problem.
4 = Noteworthy: An interesting new problem, or substantial
    benefits over other systems that attack this problem.
3 = Respectable: A nice research contribution that represents a
    significant extension of prior approaches.
2 = Marginal: Minor improvements on existing systems in this 
    area.
1 = The system does not represent any advance in the area of
    natural language processing.


IMPLEMENTATION AND SOUNDNESS (1-5)

Has the system been fully implemented or do certain parts of the
system remain to be implemented? Does the system achieve its claims?
Is enough detail provided that one might be able to replicate the
system with some effort? Are working examples of the system provided
and do they adequately illustrate the claims made for the system?

5 = The system is fully implemented, and the claims for the system are
    convincingly supported. Other researchers should be able to
    replicate the work.
4 = Generally solid work, although there are some aspects of the
    system that still need work, and/or some claims that should be
    better illustrated and supported.
3 = Fairly reasonable work. The main claims are illustrated to some
    extent with examples, but I am not entirely ready to accept that
    the system can do everything that it should (based on the material
    in the paper).
2 = Troublesome. There are some aspects of the system that might be
    good, but the system has significant deficiencies and/or
    limitations that make it premature.
1 = Fatally flawed. 


EVALUATION (1-5)

To what extent has the system been tested and evaluated? Have there
been any user studies?

5 = The system has been thoroughly tested. Rigorous evaluation of the
    system on a large corpus or via formal user studies supports the
    claims made for the system. Critical analysis of the results
    yields insight into the system's limitations (if any).
4 = The system has been tested and evaluated on a reasonable corpus or
    with a small set of users. The results support the claims made for
    the system. Critical analysis of the results yields insight into
    the system's limitations (if any).
3 = The system has been tested and evaluated to a limited extent. The
    results have been critically analyzed to gain insight into the
    system's performance.
2 = A few test cases have been run on the system, but no significant
    evaluation or user study has been performed.
1 = The system has not been tested or evaluated.


RESOURCES (1-5)

Has the system been deployed in a real-world setting, either for
commercial or research use? Is the system available for distribution
to other researchers? 

5 = The system has been deployed or is available for distribution to
    other researchers. 
4 = The system is close to deployment or will very shortly be
    available for distribution to other researchers. 
3 = The system is not yet ready for deployment. Upon request,
    other researchers can be shown demos of the system.
2 = The system will not be deployed but limited demos of the system
    will be available soon.
1 = There are no plans to provide demos of the system.


MEANINGFUL COMPARISON (1-5)

Does the author make clear where the presented system sits with
respect to existing literature? Are the references adequate? Are the
benefits of the system/application well-supported and are the
limitations identified?

5 = Precise and complete comparison with related work. Benefits and
    limitations are fully described and supported.
4 = Mostly solid bibliography and comparison, but there are a few
    additional references that should be included. Discussion of
    benefits and limitations is acceptable but not enlightening.
3 = Bibliography and comparison are somewhat helpful, but it could be
    hard for a reader to determine exactly how this work relates to
    previous work or what its benefits and limitations are.
2 = Only partial awareness and understanding of related work, or a
    flawed or deficient comparison with other work.
1 = Little awareness of related work, or insufficient justification of
    benefits and discussion of limitations.


SUBSTANCE (1-5)

Does this paper have enough substance, or would it benefit from more
ideas or analysis?

Note that this question mainly concerns the amount of work; its
quality is evaluated in other categories.

5 = Contains more ideas or analysis than most publications in this
    conference; goes the extra mile.
4 = Represents an appropriate amount of work for a publication in this
    conference (most submissions).
3 = Leaves open one or two natural questions that should have been
    pursued within the paper.
2 = Work in progress. There are enough good ideas, but perhaps not
    enough results yet.
1 = Seems thin. Not enough ideas here for a full-length paper.


IMPACT OF IDEAS OR RESULTS (1-5)

How significant is the work described? Will novel aspects of the
system result in other researchers adopting the approach in their own
work? Does the system represent a significant and important advance in
implemented and tested human language technology?

5 = A major advance in the state-of-the-art in human language
    technology that will have a major impact on the field.
4 = Some important advances over previous systems, and likely to
    impact development work of other research groups.
3 = Interesting but not too influential. The work will be cited, but
    mainly for comparison or as a source of minor contributions.
2 = Marginally interesting. May or may not be cited.
1 = Will have no impact on the field.


RECOMMENDATION (1-6)

There are many good submissions competing for slots at ACL 2010; how
important is it to feature this one? Will people learn a lot by
reading this paper or seeing it presented?

In deciding on your ultimate recommendation, please think over all
your scores above. But remember that no paper is perfect, and remember
that we want a conference full of interesting, diverse, and timely
work. If a paper has some weaknesses, but you really got a lot out of
it, feel free to fight for it. If a paper is solid but you could live
without it, let us know that you're ambivalent. Remember also that the
author has a few weeks to address reviewer comments before the
camera-ready deadline.

Should the paper be accepted or rejected?

6 = Exciting: I'd fight to get it accepted; probably would be one
              of the best papers at the conference.
5 = Strong: I'd like to see it accepted; it will be one of the
            better papers at the conference.
4 = Worthy: A good paper that is worthy of being presented at ACL.
3 = Ambivalent: OK but does not seem up to the standards of ACL.
2 = Leaning against: I'd rather not see it in the conference.
1 = Poor: I'd fight to have it rejected.


REVIEWER CONFIDENCE (1-5)

5 = Positive that my evaluation is correct. I read the paper very
    carefully and am familiar with related work.
4 = Quite sure. I tried to check the important points carefully. It's
    unlikely, though conceivable, that I missed something that should
    affect my ratings.
3 = Pretty sure, but there's a chance I missed something. Although I
    have a good feel for this area in general, I did not carefully check
    the paper's details, e.g., the math, experimental design, or novelty.
2 = Willing to defend my evaluation, but it is fairly likely that I
    missed some details, didn't understand some central points, or can't
    be sure about the novelty of the work.
1 = Not my area, or paper is very hard to understand. My evaluation is
    just an educated guess.


RECOMMENDATION FOR BEST LONG PAPER AWARD (1-3)

3 = Definitely.
2 = Maybe.
1 = Definitely not.


ACCEPTANCE AS A SHORT PAPER (1-4)

If this submission is rejected as a long paper, could it be turned
into a reasonable short paper? (It is possible that some of the long
paper submissions will be accepted as short papers.) Submissions to
which this might apply include long papers in which the work is too
preliminary or still in progress, and descriptions of a new system
that does not really need 8 pages of content but would make a good
4-page paper.

In making this judgement, please consider only whether the paper could
be reasonably reduced to 4 pages, rather than its quality. (If some
long papers are accepted as short papers, then your evaluation above
will also be taken into account.)

4 = Would be an excellent short paper.
3 = Would be a good short paper.
2 = Could be a short paper but would be one of the weaker ones.
1 = Definitely not.