Core body of knowledge


This page is a community-based effort to identify the core body of knowledge of the computational linguistics (CL) curriculum, following the ACM model. It seeks to define "a minimal core consisting of those units for which there is a broad consensus that the corresponding material is essential" for any introductory course in computational linguistics. Following ACM's definition, "the core is not a complete curriculum ... the core must be supplemented by additional material."

The consensus of the participants of the Third Workshop on Teaching Computational Linguistics was to use the ACL wiki to develop and refine the definition of the CL core body of knowledge. Please use the talk page for discussion.

Computational Linguistics Core Body of Knowledge

CL1. Goals of computational linguistics
roots, philosophical underpinnings, ideology, contemporary divides
CL2. Introduction to Language
written vs spoken language; linguistic levels; typology, variation and change
CL3. Words, morphology and the lexicon
tokenization, lexical categories, part-of-speech (POS) tagging, stemming, morphological analysis, finite-state automata (FSAs)
CL4. Syntax, grammars and parsing
grammar formalisms, grammar development, formal complexity of natural language
CL5. Semantics and discourse
lexical semantics, multiword expressions, discourse representation
CL6. Generation
text planning, syntactic realization
CL7. Language engineering
architecture, robustness, evaluation paradigms
CL8. Language resources
corpora, web as corpus, data-intensive linguistics, linguistic annotation, Unicode
CL9. Language technologies
named entity detection, coreference, information extraction (IE), question answering (QA), summarization, machine translation (MT), natural language (NL) interfaces

References

Bird, Steven (2008). Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics.