ACL Wiki - User contributions [en]

SIGGEN

2021-09-22T17:41:37Z

Ereiter: /* Recent Events */

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources for learning about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== NLG Data Sets and Other Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Resources]]

== Upcoming Events ==

We are looking for bids to host INLG 2022! Email the SIGGEN board (details below) for more information.

[https://inlg2021.github.io INLG 2021] will be hosted by the University of Aberdeen on 20-24 September. It will be mostly online, but there will be extra events for people who are able to travel to Aberdeen in person. The submission deadline is 31 May.

The [https://gem-benchmark.com/workshop GEM workshop] will be held at ACL in early August. It will focus on a series of NLG shared tasks.

SIGGEN is starting a monthly webinar series. We are looking for talks on anything of interest to the SIGGEN/NLG community, including research, tutorials, and commercial projects. If you have an idea for a webinar, please email siggen-board@aclweb.org.

== Mailing List ==
=== Joining the mailing list ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://gem-benchmark.com/workshop GEM] was held in August 2021. Its proceedings are available in the [https://aclanthology.org/volumes/2021.gem-1/ ACL Anthology].

[https://www.inlg2020.org/ INLG 2020] was held online, hosted virtually by Dublin City University (DCU) in Dublin, Ireland, on 15-18 December 2020. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/2020.inlg-1/ ACL Anthology].

Endorsed events:
* [https://pragma.ruhr-uni-bochum.de/workshopINLG2020/ 1st Workshop on Discourse Theories for Text Planning] ([ proceedings])
* [https://sites.google.com/view/nl4xai2020 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI)] ([ proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2020/ 2nd Workshop on Natural Language Generation for Human-Robot Interaction (NLG4HRI)] ([ proceedings])
* [https://evalnlg-workshop.github.io/ 1st Workshop on Evaluating NLG Evaluation] ([ proceedings])
* [https://webnlg-challenge.loria.fr/workshop_2020/ 3rd Workshop on Natural Language Generation from the Semantic Web] ([ proceedings])

2020 SIGGEN supported events:
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 October - 1 November 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, the Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])
* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])
* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])
* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])
* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])
* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:
* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])
* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at the University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://citius.usc.es/v/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/ CiTIUS, University of Santiago de Compostela], Spain
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.emielvanmiltenburg.nl/ Emiel van Miltenburg] ([mailto:c.w.j.vanmiltenburg@tilburguniversity.edu mail]) [https://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc Tilburg center for Cognition and Communication, Tilburg University], The Netherlands (treasurer)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[http://nil.fdi.ucm.es/?q=members/raquelhervas Raquel Hervás] ([mailto:raquelhb@fdi.ucm.es mail]) [https://www.ucm.es/ Universidad Complutense de Madrid], Spain (secretary)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[https://www.edinburgh-robotics.org/students/miruna-adriana-clinciu Miruna-Adriana Clinciu] ([mailto:mc191@hw.ac.uk mail]) [https://www.edinburgh-robotics.org/ Edinburgh Centre for Robotics], Scotland, UK (student member)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2022

To contact the entire board, please use the email alias: <u>'''siggen-board (at) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2021-09-22T17:34:24Z

Ereiter: /* NLG Data Sets and Other Resources */

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== NLG Data Sets and Other Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Resources]]

== Upcoming Events ==

[https://inlg2021.github.io INLG 2021] will be hosted by Aberdeen University on 20-24 September. It will mostly be online, but there will be extra events for people who are able to physically travel to Aberdeen. Submission deadline is 31 May.

[https://gem-benchmark.com/workshop GEM workshop] will be held at ACL in early August. It will focus on a series of NLG shared tasks.

SIGGEN is starting a monthly webinar series. We're looking for talks on anything of interest to the SIGGEN/NLG community, including research, tutorials, and commercial projects. If you have an idea for a webinar, please email siggen-board@aclweb.org.

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2020.org/ INLG 2020] was held online, hosted virtually by Dublin City University (DCU) in Dublin, Ireland, on 15-18 December 2020. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/2020.inlg-1/ ACL Anthology].

Endorsed events:
* [https://pragma.ruhr-uni-bochum.de/workshopINLG2020/ 1st Workshop on Discourse Theories for Text Planning] ([ proceedings])
* [https://sites.google.com/view/nl4xai2020 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI)] ([ proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2020/ 2nd Workshop on Natural Language Generation for Human-Robot Interaction (NLG4HRI)] ([ proceedings])
* [https://evalnlg-workshop.github.io/ 1st Workshop on Evaluating NLG Evaluation] ([ proceedings])
* [https://webnlg-challenge.loria.fr/workshop_2020/ 3rd Workshop on Natural Language Generation from the Semantic Web] ([ proceedings])

2020 SIGGEN supported events:
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 Oct - 1 Nov 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://citius.usc.es/v/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/ CiTIUS, University of Santiago de Compostela], Spain
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.emielvanmiltenburg.nl/ Emiel van Miltenburg] ([mailto:c.w.j.vanmiltenburg@tilburguniversity.edu mail]) [https://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc Tilburg center for Cognition and Communication, Tilburg University], The Netherlands (treasurer)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[http://nil.fdi.ucm.es/?q=members/raquelhervas Raquel Hervás] ([mailto:raquelhb@fdi.ucm.es mail]) [https://www.ucm.es/ Universidad Complutense de Madrid], Spain (secretary)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[https://www.edinburgh-robotics.org/students/miruna-adriana-clinciu Miruna-Adriana Clinciu] ([mailto:mc191@hw.ac.uk mail]) [https://www.edinburgh-robotics.org/ Edinburgh Centre of Robotics], Scotland, UK (student member)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2022

To contact the entire board, please use the email alias: <u>'''siggen-board (at) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2021-09-22T17:33:12Z

Ereiter: clarify that this includes data sets

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== NLG Data Sets and Other Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

[https://inlg2021.github.io INLG 2021] will be hosted by Aberdeen University on 20-24 September. It will mostly be online, but there will be extra events for people who are able to physically travel to Aberdeen. Submission deadline is 31 May.

[https://gem-benchmark.com/workshop GEM workshop] will be held at ACL in early August. It will focus on a series of NLG shared tasks.

SIGGEN is starting a monthly webinar series. We're looking for talks on anything of interest to the SIGGEN/NLG community, including research, tutorials, and commercial projects. If you have an idea for a webinar, please email siggen-board@aclweb.org.

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2020.org/ INLG 2020] was held online, hosted virtually by Dublin City University (DCU) in Dublin, Ireland, on 15-18 December 2020. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/2020.inlg-1/ ACL Anthology].

Endorsed events:
* [https://pragma.ruhr-uni-bochum.de/workshopINLG2020/ 1st Workshop on Discourse Theories for Text Planning] ([ proceedings])
* [https://sites.google.com/view/nl4xai2020 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI)] ([ proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2020/ 2nd Workshop on Natural Language Generation for Human-Robot Interaction (NLG4HRI)] ([ proceedings])
* [https://evalnlg-workshop.github.io/ 1st Workshop on Evaluating NLG Evaluation] ([ proceedings])
* [https://webnlg-challenge.loria.fr/workshop_2020/ 3rd Workshop on Natural Language Generation from the Semantic Web] ([ proceedings])

2020 SIGGEN supported events:
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 Oct - 1 Nov 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://citius.usc.es/v/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/ CiTIUS, University of Santiago de Compostela], Spain
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.emielvanmiltenburg.nl/ Emiel van Miltenburg] ([mailto:c.w.j.vanmiltenburg@tilburguniversity.edu mail]) [https://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc Tilburg center for Cognition and Communication, Tilburg University], The Netherlands (treasurer)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[http://nil.fdi.ucm.es/?q=members/raquelhervas Raquel Hervás] ([mailto:raquelhb@fdi.ucm.es mail]) [https://www.ucm.es/ Universidad Complutense de Madrid], Spain (secretary)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[https://www.edinburgh-robotics.org/students/miruna-adriana-clinciu Miruna-Adriana Clinciu] ([mailto:mc191@hw.ac.uk mail]) [https://www.edinburgh-robotics.org/ Edinburgh Centre of Robotics], Scotland, UK (student member)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2022

To contact the entire board, please use the email alias: <u>'''siggen-board (at) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2021-03-01T10:48:45Z

Ereiter: add INLG details, GEM

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

[https://inlg2021.github.io INLG 2021] will be hosted by Aberdeen University on 20-24 September. It will mostly be online, but there will be extra events for people who are able to physically travel to Aberdeen. Submission deadline is 31 May.

[https://gem-benchmark.com/workshop GEM workshop] will be held at ACL in early August. It will focus on a series of NLG shared tasks.

SIGGEN is starting a monthly webinar series. We're looking for talks on anything of interest to the SIGGEN/NLG community, including research, tutorials, and commercial projects. If you have an idea for a webinar, please email siggen-board@aclweb.org.

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2020.org/ INLG 2020] was held online, hosted virtually by Dublin City University (DCU) in Dublin, Ireland, on 15-18 December 2020. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/2020.inlg-1/ ACL Anthology].

Endorsed events:
* [https://pragma.ruhr-uni-bochum.de/workshopINLG2020/ 1st Workshop on Discourse Theories for Text Planning] ([ proceedings])
* [https://sites.google.com/view/nl4xai2020 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI)] ([ proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2020/ 2nd Workshop on Natural Language Generation for Human-Robot Interaction (NLG4HRI)] ([ proceedings])
* [https://evalnlg-workshop.github.io/ 1st Workshop on Evaluating NLG Evaluation] ([ proceedings])
* [https://webnlg-challenge.loria.fr/workshop_2020/ 3rd Workshop on Natural Language Generation from the Semantic Web] ([ proceedings])

2020 SIGGEN supported events:
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 Oct - 1 Nov 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://citius.usc.es/v/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/ CiTIUS, University of Santiago de Compostela], Spain
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.emielvanmiltenburg.nl/ Emiel van Miltenburg] ([mailto:c.w.j.vanmiltenburg@tilburguniversity.edu mail]) [https://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc Tilburg center for Cognition and Communication, Tilburg University], The Netherlands (treasurer)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[http://nil.fdi.ucm.es/?q=members/raquelhervas Raquel Hervás] ([mailto:raquelhb@fdi.ucm.es mail]) [https://www.ucm.es/ Universidad Complutense de Madrid], Spain (secretary)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[https://www.edinburgh-robotics.org/students/miruna-adriana-clinciu Miruna-Adriana Clinciu] ([mailto:mc191@hw.ac.uk mail]) [https://www.edinburgh-robotics.org/ Edinburgh Centre of Robotics], Scotland, UK (student member)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2022

To contact the entire board, please use the email alias: <u>'''siggen-board (at) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2021-01-12T09:34:12Z

Ereiter: removed mention of CSL special issue, since I don't think it is formally linked to SIGGEN

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

'''INLG 2021 will be hosted by Aberdeen University in September, either in person or online, with an anticipated submission deadline in May.'''

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2020.org/ INLG 2020] was held online, hosted virtually by Dublin City University (DCU) in Dublin, Ireland, on 15-18 December 2020. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/2020.inlg-1/ ACL Anthology].

Endorsed events:
* [https://pragma.ruhr-uni-bochum.de/workshopINLG2020/ 1st Workshop on Discourse Theories for Text Planning] ([ proceedings])
* [https://sites.google.com/view/nl4xai2020 2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI)] ([ proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2020/ 2nd Workshop on Natural Language Generation for Human-Robot Interaction (NLG4HRI)] ([ proceedings])
* [https://evalnlg-workshop.github.io/ 1st Workshop on Evaluating NLG Evaluation] ([ proceedings])
* [https://webnlg-challenge.loria.fr/workshop_2020/ 3rd Workshop on Natural Language Generation from the Semantic Web] ([ proceedings])

2020 SIGGEN supported events:
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 Oct - 1 Nov 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, the Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])
* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])
* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])
* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])
* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])
* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:
* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])
* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://citius.usc.es/v/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/ CiTIUS, University of Santiago de Compostela], Spain
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.emielvanmiltenburg.nl/ Emiel van Miltenburg] ([mailto:c.w.j.vanmiltenburg@tilburguniversity.edu mail]) [https://www.tilburguniversity.edu/research/institutes-and-research-groups/ticc Tilburg center for Cognition and Communication, Tilburg University], The Netherlands (treasurer)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[http://nil.fdi.ucm.es/?q=members/raquelhervas Raquel Hervás] ([mailto:raquelhb@fdi.ucm.es mail]) [https://www.ucm.es/ Universidad Complutense de Madrid], Spain (secretary)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2024
*[https://www.edinburgh-robotics.org/students/miruna-adriana-clinciu Miruna-Adriana Clinciu] ([mailto:mc191@hw.ac.uk mail]) [https://www.edinburgh-robotics.org/ Edinburgh Centre for Robotics], Scotland, UK (student member)
:elected in December 2020 for the period from 1st January 2021 to 31st December 2022

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2020-10-22T14:52:40Z

Ereiter: /* Upcoming Events */ update INLG

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

[https://www.inlg2020.org/ INLG 2020] will take place online, 15-18 December 2020.

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 October - 1 November 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, the Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])
* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])
* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])
* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])
* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])
* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:
* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])
* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG] ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

Data sets for NLG

2020-10-21T14:12:25Z

Ereiter: add SportSett

This page lists data sets and corpora used for research in natural language generation. All of them are available for download on the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain input data paired with texts based on that data.

=== boxscore-data (Rotowire) and SportSett ===
https://github.com/harvardnlp/boxscore-data/

Boxscore-data consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

https://github.com/nlgcat/sport_sett_basketball

SportSett is an expanded data set which includes additional information about basketball games. It is structured as a relational database.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.

=== Methodius Corpus ===
https://www.inf.ed.ac.uk/research/isdd/admin/package?view=1&id=197

This dataset consists of 5000 short texts describing ancient Greek artefacts, generated by the Methodius NLG system. Each text is linked to its corresponding content plan (including rhetorical relations) and OpenCCG logical form (which describes the syntactic structure).

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== ToTTo ===
https://github.com/google-research-datasets/ToTTo/

Over 100,000 examples of descriptions of the content of highlighted cells in Wikipedia tables.

=== Weather ===
https://github.com/facebookresearch/TreeNLG

~30K human-annotated utterances for tree-structured weather meaning representations.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG ===
http://webnlg.loria.fr/pages/data.html ([[Data sets for NLG blog#WebNLG|blog comments]])

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset ([[Data sets for NLG blog#WikiBio|blog comments]])

This dataset gathers 728,321 biographies from Wikipedia. For each biography, it contains the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== Wikipedia Person and Animal Dataset ===
https://eaglew.github.io/dataset/narrating

This dataset gathers 428,748 person infoboxes and 12,236 animal infoboxes, with descriptions, based on a Wikipedia dump (2018/04/01) and Wikidata (2018/04/12).

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations, as well as short and long descriptions, for 51K companies in English.

== Referring Expression Generation ==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs in situations designed so as to challenge existing REG algorithms, with a particular focus on the issue of attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both of which are labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

=== Finnish morphology ===

https://www.kaggle.com/mikahama/finnish-locative-cases-for-nouns

Dataset for picking the correct locative case for Finnish nouns (e.g. Venäjä'''llä''' vs Suome'''ssa''')

https://www.kaggle.com/mikahama/cases-of-complements-of-finnish-verbs

Dataset for picking the right case for objects of verbs in Finnish (e.g. näen talo'''n''' vs uneksin talo'''sta''')

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context along with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, and number of sentences generated), as well as subjective measures (the user scores), were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== CASS (French) ===
https://github.com/euranova/CASS-dataset

This dataset is composed of decisions made by the French Court of Cassation and summaries of these decisions written by lawyers.

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

Dataset for abstractive summarization constructed from Reddit posts. It is the largest corpus (approximately 3 million posts) of informal text, such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

== Paper Generation ==

=== ACL Title and Abstract Dataset ===
https://github.com/EagleW/ACL_titles_abstracts_dataset

This dataset gathers 10,874 title and abstract pairs from the ACL Anthology Network (until 2016).

=== PubMed Term, Abstract, Conclusion, Title Dataset ===
https://eaglew.github.io/dataset/paperrobot_writing

This dataset gathers three types of pairs from PubMed: Title-to-Abstract (Training: 22,811/Development: 2095/Test: 2095), Abstract-to-Conclusion and Future work (Training: 22,811/Development: 2095/Test: 2095), and Conclusion and Future work-to-Title (Training: 15,902/Development: 2095/Test: 2095). Each pair consists of an input and an output, together with the corresponding terms (from the original KB and link prediction results).

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

SIGGEN

2020-07-31T09:26:45Z

Ereiter: added MSR2020 as upcoming event

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin, Ireland, in the week of 7 September (i.e., the week before [https://coling2020.org/ COLING 2020]).

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)
* [http://taln.upf.edu/pages/msr2020-ws/ MSR2020 - Third Workshop on Multilingual Surface Realisation] (COLING workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 October - 1 November 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, the Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:
* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])
* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])
* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])
* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com/ Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2020-07-31T08:57:25Z

Ereiter: added MSR events to list of endorsed events

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc., of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin (Ireland), in the week of 7 September (i.e., the week before [https://coling2020.org/ COLING 2020]).

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 October - 1 November 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])
* [http://taln.upf.edu/pages/msr2019-ws/ The Second Workshop on Multilingual Surface Realization (MSR 2019)] ([https://www.aclweb.org/anthology/volumes/D19-63/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])
* [http://taln.upf.edu/pages/msr2018-ws/ Workshop on Multilingual Surface Realization (MSR 2018)] ([https://www.aclweb.org/anthology/volumes/W18-36/ proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com/ Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

Data sets for NLG

2020-05-01T09:54:06Z

Ereiter: added ToTTo

This page lists data sets and corpora used for research in natural language generation. They are available for download over the web. If you know of a dataset that is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== boxscore-data (Rotowire) ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced English restaurant descriptions with corresponding restaurant data.

=== Methodius Corpus ===
https://www.inf.ed.ac.uk/research/isdd/admin/package?view=1&id=197

This dataset consists of 5000 short texts describing ancient Greek artefacts, generated by the Methodius NLG system. Each text is linked to its corresponding content plan (including rhetorical relations) and OpenCCG logical form (which describes the syntactic structure).

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== ToTTo ===
https://github.com/google-research-datasets/ToTTo/

100,000 examples of descriptions of the content of highlighted cells in a Wikipedia table.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG ===
http://webnlg.loria.fr/pages/data.html ([[Data sets for NLG blog#WebNLG|blog comments]])

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset ([[Data sets for NLG blog#WikiBio|blog comments]])

This dataset gathers 728,321 biographies from Wikipedia. For each biography, it contains the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations together with short and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication between speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. Each data instance includes the preceding dialogue context, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices on Information Presentation strategy (summary, compare, recommend, or a combination of those) and attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, number of sentences generated, etc.), as well as subjective measures (the user scores), were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== CASS (French) ===
https://github.com/euranova/CASS-dataset

This dataset is composed of decisions made by the French Court of Cassation and summaries of these decisions written by lawyers.

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

Dataset for abstractive summarization constructed from Reddit posts. It is the largest corpus (approximately 3 million posts) of informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

SIGGEN

2020-02-27T17:31:04Z

Ereiter: added endorsed events for 2019

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc., of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin (Ireland), in the week of 7 September (i.e., the week before [https://coling2020.org/ COLING 2020]).

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://www.inlg2019.com/ INLG 2019] was held in Tokyo, Japan, 29 October - 1 November 2019. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W19-86/ ACL Anthology].

Endorsed events:
* [https://sites.google.com/view/dsnnlg2019/ 1st Workshop on Discourse Structure in Neural NLG] ([https://www.aclweb.org/anthology/volumes/W19-81/ proceedings])
* [https://sites.google.com/view/nl4xai2019/ 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence] ([https://www.aclweb.org/anthology/volumes/W19-84/ proceedings])
* [https://aiwolfdial.kanolab.net/ The 1st International Workshop of AI Werewolf and Dialog System] ([https://www.aclweb.org/anthology/volumes/W19-83/ proceedings])
* [http://www.ccnlg.org/ The 4th Workshop on Computational Creativity in Natural Language Generation] ([http://www.ccnlg.org/index.php/home/programme/ proceedings])

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W18-65/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W17-35/ ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://www.aclweb.org/anthology/volumes/W16-66/ ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2020-02-05T12:37:24Z

Ereiter: add INLG date

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin (Ireland) in the week of 7 September (i.e., the week before [https://coling2020.org/ COLING 2020]).

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2018 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2017 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2016 ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2020-01-24T17:37:03Z

Ereiter: update events

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin (Ireland). The date is still to be decided, but it will most likely be either just before or just after [https://coling2020.org/ COLING 2020].

Other SIGGEN-supported events are:
* [https://intellang.github.io/ IntelLanG - Intelligent Information Processing and Natural Language Generation] (ECAI workshop)

There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG].

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2018 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2017 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2016 ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN

2020-01-24T11:08:46Z

Ereiter: update events

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources to learn about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

INLG 2020 will be held in Dublin (Ireland). The date is still to be decided, but it will most likely be either just before or just after [https://coling2020.org/ COLING 2020].

Other related events are:
* There is a [https://www.journals.elsevier.com/computer-speech-and-language/call-for-papers/special-issue-on-natural-language-generation special issue of Computer Speech and Language on NLG], which is partially a follow-up to INLG 2019.

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://inlg2018.uvt.nl/ INLG 2018] was held in Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2018 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held in Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2017 ACL Anthology].

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held in Edinburgh, UK, 5-8 September 2016. Conference proceedings are published in the [https://aclanthology.info/events/inlg-2016 ACL Anthology].

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

Data sets for NLG blog

2019-10-15T11:43:49Z

Ereiter: update webnlg info

This blog supplements [[Data sets for NLG]] by collecting comments about those data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (e.g., its scope or how it was constructed), and pointers to related data sets that may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

=== WebNLG ===
Thiago Castro Ferreira and Diego Moussallem spent six months producing an enriched version of WebNLG with high-quality annotations. This is available on [https://github.com/ThiagoCF05/webnlg GitHub].

The WebNLG dataset was used in the [http://webnlg.loria.fr/pages/results.html WebNLG challenge].

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== WikiBio ===
The WikiBio data was not manually verified or filtered ([https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/#comment-15983 blog comment]).

Data sets for NLG

2019-10-15T11:37:24Z

Ereiter: update webnlg URL

This page lists data sets and corpora used for research in natural language generation. They are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.

=== Methodius Corpus ===
https://www.inf.ed.ac.uk/research/isdd/admin/package?view=1&id=197

This dataset consists of 5000 short texts describing ancient Greek artefacts, generated by the Methodius NLG system. Each text is linked to its corresponding content plan (including rhetorical relations) and OpenCCG logical form (which describes the syntactic structure).

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
http://webnlg.loria.fr/pages/data.html ([[Data sets for NLG blog#WebNLG|blog comments]])

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset ([[Data sets for NLG blog#WikiBio|blog comments]])

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations, together with short and long descriptions, for 51K companies. English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs in situations designed so as to challenge existing REG algorithms, with a particular focus on the issue of attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both of which are labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. Each data instance includes the preceding dialogue context, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.
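As a sketch of how the dialogue act in such a tuple might be handled, the parser below splits a dialogue act string into its act type and slot-value pairs. The string format assumed here (for example `inform(name='red door cafe';food='thai')`) and the function name `parse_dialogue_act` are illustrative assumptions, not a documented Cam4NLG API; check the files in the repository for the exact conventions before relying on it.

```python
import re
from typing import List, Tuple

# Assumed format: act_type(slot='value';slot='value'); value-less slots allowed.
_DA_PATTERN = re.compile(r"^\s*([^()]+?)\s*\((.*)\)\s*$")


def parse_dialogue_act(da: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a dialogue act string into (act_type, [(slot, value), ...])."""
    match = _DA_PATTERN.match(da)
    if match is None:
        raise ValueError(f"unrecognised dialogue act: {da!r}")
    act_type, body = match.group(1), match.group(2)
    slots: List[Tuple[str, str]] = []
    # Naive split: values that themselves contain ';' would be broken apart.
    for part in body.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            slot, value = part.split("=", 1)
            slots.append((slot.strip(), value.strip().strip("'\"")))
        else:
            slots.append((part, ""))  # slot mentioned without a value
    return act_type, slots
```

Under these assumptions, a data point could be unpacked as `da, reference, baseline = example` and `parse_dialogue_act(da)` used to obtain the slots that the reference text is expected to realise.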

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and their attribute selection. The domain is restaurant search in Edinburgh. Both objective measures (such as dialogue length, number of database hits and number of sentences generated) and subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== CASS (French) ===
https://github.com/euranova/CASS-dataset

This dataset consists of decisions of the French Court of Cassation, together with summaries of these decisions written by lawyers.

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest summarization corpus of informal text such as social media posts, and can be used to train neural summarization models.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].
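Inter-rater reliability is commonly summarised with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two raters who label the same items. It is a generic illustration; we have not checked which reliability statistic Godwin and Piwek (2016) report, and ordinal rating scales are often assessed with weighted variants instead.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)
```

For example, with the hypothetical ratings `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.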

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.
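Correlations between a metric and human judgements are computed in several ways in the literature (Pearson, Spearman and Kendall correlations are all common). As a minimal sketch, the function below computes a Pearson correlation between, say, per-system BLEU scores and mean human ratings. Any input lists are placeholders supplied by the user, not values from this data set.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Perfectly linearly related lists such as `pearson([1, 2, 3], [2, 4, 6])` give 1.0. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same Pearson coefficient.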

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG

2019-10-10T11:21:45Z

Ereiter: added CASS

This page lists data sets and corpora used for research in natural language generation. They are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the data set yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg ([[Data sets for NLG blog#WebNLG|blog comments]])

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset ([[Data sets for NLG blog#WikiBio|blog comments]])

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. For 51K companies, the dataset contains semantic representations together with short and long descriptions in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (eg, "Norte de Galicia") and corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs in situations designed so as to challenge existing REG algorithms, with a particular focus on the issue of attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. Each data instance includes the preceding dialogue context, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and their attribute selection. The domain is restaurant search in Edinburgh. Both objective measures (such as dialogue length, number of database hits and number of sentences generated) and subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== CASS (French) ===
https://github.com/euranova/CASS-dataset

This dataset consists of decisions of the French Court of Cassation, together with summaries of these decisions written by lawyers.

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest summarization corpus of informal text such as social media posts, and can be used to train neural summarization models.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG

2019-09-27T13:00:57Z

Ereiter: add blog links

This page lists data sets and corpora used for research in natural language generation. They are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the data set yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg ([[Data sets for NLG blog#WebNLG|blog comments]])

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset ([[Data sets for NLG blog#WikiBio|blog comments]])

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. For 51K companies, the dataset contains semantic representations together with short and long descriptions in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (eg, "Norte de Galicia") and corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs in situations designed so as to challenge existing REG algorithms, with a particular focus on the issue of attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. Each data instance includes the preceding dialogue context, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and their attribute selection. The domain is restaurant search in Edinburgh. Both objective measures (such as dialogue length, number of database hits and number of sentences generated) and subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest summarization corpus of informal text such as social media posts, and can be used to train neural summarization models.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-09-27T13:00:34Z

Ereiter:

This blog supplements [[Data sets for NLG]] with comments about the listed data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (eg, scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

=== WebNLG ===
Thiago Castro Ferreira and Diego Moussallem spent six months cleaning up WebNLG, so quality should be good.

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== WikiBio ===
No manual verification or filtering has been applied to this data set [https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/#comment-15983].

Data sets for NLG blog

2019-09-27T12:59:18Z

Ereiter:

This blog supplements [[Data sets for NLG]] with comments about the listed data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (eg, scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

=== WebNLG ===
Thiago Castro Ferreira and Diego Moussallem spent six months cleaning up WebNLG, so quality should be good.

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== Wikibio ===
No manual verification or filtering has been applied to this data set [https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/#comment-15983].

Data sets for NLG blog

2019-09-27T12:59:00Z

Ereiter: webnlg

This blog is a supplement to [[Data sets for NLG]], which lists comments about these data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (e.g., scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

=== Web NLG ===
Thiago Castro Ferreira and Diego Moussallem spent six months cleaning up WebNLG, so its quality should be good.

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== Wikibio ===
No manual verification or filtering has been applied to WikiBio [https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/#comment-15983].

Data sets for NLG blog

2019-09-26T14:39:50Z

Ereiter: add wikibio note

This blog is a supplement to [[Data sets for NLG]], which lists comments about these data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (e.g., scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== Wikibio ===
No manual verification or filtering has been applied to WikiBio [https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/#comment-15983].

Data sets for NLG

2019-09-26T14:36:44Z

Ereiter: put wikibio in alpha order

This page lists data sets and corpora used for research in natural language generation. They are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.
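
Each E2E text is paired with a flat meaning representation (MR) made of attribute[value] pairs, for example <code>name[The Eagle], eatType[coffee shop], near[Burger King]</code>. The sketch below parses such a string into a dictionary. It assumes values never contain square brackets, and the function name is our own, not part of any official E2E tooling.

```python
import re

# One "attribute[value]" pair. Attribute names may contain spaces
# (e.g. "customer rating"); values are assumed not to contain brackets.
_PAIR = re.compile(r"\s*([^,\[\]]+?)\s*\[([^\[\]]*)\]")


def parse_mr(mr: str) -> dict[str, str]:
    """Parse an E2E-style meaning representation into {attribute: value}."""
    slots = {}
    for match in _PAIR.finditer(mr):
        slots[match.group(1)] = match.group(2)
    return slots


if __name__ == "__main__":
    mr = "name[The Eagle], eatType[coffee shop], customer rating[5 out of 5], near[Burger King]"
    print(parse_mr(mr))
```

A dictionary is convenient for checking, after generation, which attribute values a text actually mentions.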

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG ===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.
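
As a starting point for reading the data, here is a minimal sketch that extracts triples and texts from WebNLG-style XML. It assumes the layout of the WebNLG releases we are familiar with (an <code>entry</code> element containing <code>mtriple</code> elements written as "subject | property | object", plus <code>lex</code> texts); check the element names against the version you download. The sample text is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny sample in the assumed layout; the <lex> text is invented.
SAMPLE = """
<benchmark><entries>
  <entry category="Airport" eid="Id1" size="1">
    <modifiedtripleset>
      <mtriple>Aarhus_Airport | cityServed | "Aarhus, Denmark"</mtriple>
    </modifiedtripleset>
    <lex lid="Id1">Aarhus Airport serves the city of Aarhus, Denmark.</lex>
  </entry>
</entries></benchmark>
"""


def read_entries(xml_text: str):
    """Yield (triples, texts) per entry; each triple is (subject, property, object)."""
    root = ET.fromstring(xml_text.strip())
    for entry in root.iter("entry"):
        triples = []
        for mtriple in entry.iter("mtriple"):
            # Split on the first two "|" separators only.
            subj, prop, obj = (part.strip() for part in mtriple.text.split("|", 2))
            triples.append((subj, prop, obj))
        texts = [lex.text.strip() for lex in entry.iter("lex")]
        yield triples, texts


if __name__ == "__main__":
    for triples, texts in read_entries(SAMPLE):
        print(triples, texts)
```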

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations and short and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.
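
Attribute selection, the focus of the 2007 TUNA shared task, is often introduced through Dale and Reiter's Incremental Algorithm: attributes are tried in a fixed preference order, and an attribute is kept if it rules out at least one remaining distractor. The sketch below assumes each object is a dictionary of attribute values; the domain objects are invented, not taken from TUNA, and refinements in the original, such as always including the object's type, are omitted.

```python
def incremental_algorithm(target, distractors, preferred_attributes):
    """Select attributes of `target` that together rule out every distractor.

    Attributes are tried in `preferred_attributes` order and kept only if
    they eliminate at least one remaining distractor (no backtracking).
    Returns {attribute: value}, or None if no distinguishing description
    can be built from these attributes.
    """
    description = {}
    remaining = list(distractors)
    if not remaining:
        return description
    for attr in preferred_attributes:
        value = target.get(attr)
        if value is None:
            continue
        # Keep the attribute only if it rules out some remaining distractor.
        if any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
            if not remaining:
                return description
    return None


if __name__ == "__main__":
    target = {"type": "chair", "colour": "red", "size": "large"}
    distractors = [
        {"type": "chair", "colour": "blue", "size": "large"},
        {"type": "sofa", "colour": "red", "size": "small"},
    ]
    print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
    # {'type': 'chair', 'colour': 'red'}
```

Because it never backtracks, the algorithm can include attributes that later turn out to be redundant, which is one reason it produces overspecified descriptions similar to those people produce.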

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs in situations designed so as to challenge existing REG algorithms, with a particular focus on the issue of attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both of which are labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

Description: A multilingual dataset automatically converted from the Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).
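
To show what realising from an unordered tree involves, here is a toy sketch. It assumes the inputs are in the 10-column CoNLL-U format of Universal Dependencies, with lemmas and scrambled token order; check the shared task description for the exact format. The linearisation rule (a fixed set of relations placed before the head) is a hand-written assumption, whereas real systems learn ordering from data.

```python
from collections import defaultdict

# Relations placed before their head in this toy linearizer (an assumption).
PRE_HEAD = {"nsubj", "det", "amod", "aux", "advmod"}

# Hypothetical input: lemmas only, token order scrambled.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "cat", "cat", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["2", "the", "the", "DET", "_", "_", "1", "det", "_", "_"],
    ["3", "sleep", "sleep", "VERB", "_", "_", "0", "root", "_", "_"],
])


def read_conllu(text):
    """Return {id: (lemma, head, deprel)} from CoNLL-U token lines."""
    nodes = {}
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword-token and empty-node lines
            continue
        nodes[int(cols[0])] = (cols[2], int(cols[6]), cols[7])
    return nodes


def linearize(nodes):
    """Order the tree: PRE_HEAD dependents, then the head, then the rest."""
    children = defaultdict(list)
    for idx, (_, head, _) in nodes.items():
        children[head].append(idx)

    def visit(idx):
        deps = sorted(children[idx])  # deterministic order among siblings
        words = []
        for d in deps:
            if nodes[d][2] in PRE_HEAD:
                words += visit(d)
        words.append(nodes[idx][0])
        for d in deps:
            if nodes[d][2] not in PRE_HEAD:
                words += visit(d)
        return words

    return " ".join(w for root in sorted(children[0]) for w in visit(root))


if __name__ == "__main__":
    print(linearize(read_conllu(SAMPLE)))  # the cat sleep
```

The output "the cat sleep" also shows that, since the inputs give lemmas, a realiser must inflect words as well as order them; this sketch leaves inflection out.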

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context along with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thus improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of those) and attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, number of sentences generated, etc.) as well as subjective measures (the user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. It is the largest corpus (approximately 3 million posts) of informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology
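
For readers new to such analyses, here is a minimal sketch of one correlation a paper might report: Pearson's r between system-level BLEU scores and mean human ratings. The scores are invented for illustration; papers also report other measures such as Spearman or Kendall correlations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented system-level scores, for illustration only.
    bleu = [0.21, 0.25, 0.28, 0.33]
    human = [3.4, 3.1, 3.8, 3.9]
    print(f"r = {pearson(bleu, human):.2f}")
```

From Python 3.10, <code>statistics.correlation</code> in the standard library computes the same coefficient.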

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Downloadable NLG systems

2019-09-18T12:14:03Z

Ereiter: add RosaeNLG

The natural language generation systems listed below are available for download over the web.
If you know of a system which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the system yourself.

== ASTROGEN ==
http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

Aggregated deep and Surface naTuRal language GENerator - Prolog-based system.

== Chimera ==

https://github.com/AmitMY/chimera

Chimera is a component-based step-by-step pipeline for data-to-text generation based on https://arxiv.org/abs/1904.03396
It handles the necessary pre-processing for text planning and surface realization, which use neural networks, and also performs referring expression generation.
It can automatically evaluate datasets with a train-dev-test split, with both BLEU and data coverage.

== CRISP ==
http://code.google.com/p/crisp-nlg/

CRISP is Alexander Koller's NLG system that tries to cast both microplanning and sentence realisation as an AI planning problem. The code is a mixture of Java and Scala, a programming language for the Java virtual machine. CRISP comes with its own implementation of GraphPlan, but it can also output plans in PDDL (“Planning Domain Definition Language”, a successor to STRIPS) for use with other AI planners. License: LGPL.

== CODA Tools software Release 1.1 ==
http://computing.open.ac.uk/coda/resources/tools_form.html

This release contains 1) software for converting text parsed with RST relations into dialogue and 2) an annotation tool for annotating dialogue and translating it into monologue (used for creating CODA corpus).

== Elvex ==
https://github.com/lionelclement/Elvex

Elvex is an NLG system based on a functional unification grammar close to LFG. It is implemented in C++ and is freely available under the GNU GPL.

== FUF/SURGE ==
https://www.cs.bgu.ac.il/~elhadad/install-fuf.html

FUF/SURGE is a surface realisation system, based on functional unification grammar.

== GenI ==
http://kowey.github.io/GenI

GenI is a surface realiser for (Feature-Based Lexicalised) Tree Adjoining Grammar and a flat MRS-like semantics (sans top handle and underspecification). Toy example grammars provided for English and French. Largish core grammar for French is under development (contact us for details). GPL (commercial dual licensing available upon request). Known to work under Linux and Mac OS X (potential for making it work on Windows as well). Written in Haskell. Source code available via [http://hackage.haskell.org/package/GenI hackage], [https://github.com/kowey/GenI GitHub], or [http://hub.darcs.net/kowey/GenI hub.darcs.net].

== Grammar Explorer ==
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/tutorials/Grexplorer/grexplorer.html

The Grammar Explorer provides a means of exploring large-scale systemic-functional grammars in order to see how they are
organized and what kinds of things they cover. It can be used to explore the KPML resources.
Downloadable standalone executables of the grammar explorer are available for Windows 95/98/NT.
These already include a version of the Nigel grammar of English and pre-installed examples.

== GoPhi: an AMR to English verbalizer ==

https://github.com/rali-udem/gophi

GoPhi (Generation Of Parenthesized Human Input) is a system for generating a literal reading of Abstract Meaning Representation (AMR) structures. The system, written in SWI-Prolog, uses a symbolic approach to transform the original rooted graph into a tree of constituents that is transformed into an English sentence by jsRealB.

== jsRealB ==

http://rali.iro.umontreal.ca/rali/?q=en/jsrealb-bilingual-text-realiser

jsRealB is a text realizer designed specifically for the web, easy to learn and to use. This realizer allows its user to build a variety of French and English expressions and sentences, to add HTML tags to them and to easily integrate them into web pages. jsRealB can also be used in JavaScript applications by means of a node.js module.
Sources for the programs, linguistic resources and demonstrations are available on the RALI GitHub [https://github.com/rali-udem/jsRealB].

== KPML ==

http://www.purl.org/net/kpml

The KPML system offers a robust, mature platform for large-scale grammar engineering that is particularly oriented to multilingual grammar development and generation. It is particularly targeted at providing resources for realistic but broad-coverage generation applications, where both flexibility of expression and speed of generation are at issue, for example in online webpage generation or spoken dialogue. KPML is also used extensively in multilingual text generation research and for teaching. It is based on systemic functional linguistics.

A growing set of generation grammars is under development for a variety of languages, including English, Spanish, Dutch, Chinese, German, Czech, and more. See the
Generation Bank (http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/generation-bank.html )
for current examples. The development of further languages and of extensions to existing resources is very welcome!

== LKB ==
http://wiki.delph-in.net/moin/LkbTop

LKB (Linguistic Knowledge Builder) is a grammar engineering environment for unification-based formalisms, typically HPSG.
It includes a [http://wiki.delph-in.net/moin/LkbGeneration realiser] that takes Minimal Recursion Semantics (MRS) as input. LKB is implemented in Common Lisp and is freely available under an open source license. It also includes a KNOPPIX-based GNU/Linux live CD with the whole system installed, ready to use.

== Multimodal Unification Grammar ==
http://www.david-reitter.com/compling/mug/

MUG Workbench is a development and debugging tool for Multimodal NLG. The grammar formalism supported is
Multimodal Functional Unification Grammar (MUG). The MUG system runs MUG grammars with fixed (test cases)
and arbitrary input specifications to produce output in a natural language, graphical user interface and
possibly in other modes. It is designed to do three things:
* Multimodal Fission (distributing output to interaction/communication modes)
* Some sentence planning (choosing information to include in the utterance)
* Natural Language and graphical user interface realization (producing some form of output)
The MUG system does these three jobs in parallel. MUG Workbench can serve to inspect the data-structures
used during generation. It should help you to learn more about the nature of unification grammars used
for parsing or natural language generation. Furthermore, the MUG Workbench is helpful in debugging your grammars.

== NaturalOWL ==
[http://www.aueb.gr/users/ion/software/NaturalOWL1.1.tar.gz NaturalOWL (version 1.1)]

Generates descriptions of entities and classes from OWL ontologies that have been annotated with linguistic and user modeling resources expressed in RDF. Currently supports English and Greek. Extensions for other languages welcome. NaturalOWL can also be used as a [http://protege.stanford.edu/ Protégé] plug-in. See [http://www.aueb.gr/users/ion/publications.html here] for publications describing NaturalOWL. (GPL)

== NLGen and NLGen2 ==
https://launchpad.net/nlgen

https://launchpad.net/nlgen2

The NLGen natural language generation system applies the [http://www.opencog.org/wiki/SegSim SegSim strategy] for generating English sentences. Probabilistic inference for sentence construction is based on a statistical analysis of [http://opencog.org/wiki/RelEx RelEx] output. Java, Apache license. See demo: [http://novamente.net/example/nlp.html Demo of AI Virtual Pet Answering Simple Questions].

NLGen2 uses [http://opencog.org/wiki/RelEx RelEx] dependency parses, together with [http://www.abisource.com/projects/link-grammar/ Link Grammar] linkage analysis to generate English-language output. Java, Apache license. Reference: Blake Lemoine, "[http://www.louisiana.edu/~bal2277/NLGen2.doc NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System]".

== OpenCCG ==
http://openccg.sourceforge.net/

OpenCCG is both a parser and a realizer for [[Combinatory Categorial Grammar]]. It has been used in several dialog systems. The realizer has been enhanced with n-gram models and a supertagging approach called hypertagging. OpenCCG is implemented in Java, and is freely available under the LGPL.

== rLDCP: Text Generation from Data ==
https://cran.r-project.org/web/packages/rLDCP/index.html

An R package for text generation from data.

== RNNLG ==
https://github.com/shawnwun/RNNLG

RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.

== RosaeNLG ==

https://rosaenlg.org

RosaeNLG is a Natural Language Generation library for node.js or client side (browser) execution, based on the Pug template engine. It was previously known as FreeNLG. It supports English, French, German and Italian, and is complete enough to write production grade real life NLG applications.

== SimpleNLG ==

https://github.com/simplenlg/simplenlg (English)

https://github.com/rali-udem/SimpleNLG-EnFr (English and French)

https://github.com/citiususc/SimpleNLG-GL (Galician)

https://github.com/citiususc/SimpleNLG-ES (Spanish)

SimpleNLG is a simple Java-based realiser. Its grammatical coverage and syntactic knowledge are small compared to KPML or FUF/SURGE. However, because it is so simple, it is relatively easy for people to learn how to use it. It has a Java API, and can be used from other languages via an XML interface. There are "unofficial" ports to other programming languages such as Python and Ruby. Versions for other human languages are being worked on, including [https://aclweb.org/anthology/W18-6508 Dutch], [https://github.com/alexmazzei/SimpleNLG-IT Italian] and [https://aclweb.org/anthology/papers/W/W18/W18-6506/ Mandarin].
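
To illustrate what a realiser does, namely turning an abstract clause specification (subject, verb, object, tense) into a grammatical sentence, here is a toy Python sketch. It is not SimpleNLG's API (SimpleNLG is a Java library); the class and function names are hypothetical, and the morphology only handles simple suffixation for regular verbs.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """Hypothetical clause specification: who does what to whom, and when."""
    subject: str
    verb: str                 # base form of a regular verb, e.g. "chase"
    obj: str = ""
    tense: str = "present"    # "present", "past" or "future"
    plural_subject: bool = False


def inflect(verb: str, tense: str, plural: bool) -> str:
    """Simple suffixation only; irregular verbs and -y verbs are not handled."""
    if tense == "past":
        return verb + "d" if verb.endswith("e") else verb + "ed"
    if tense == "future":
        return "will " + verb
    if plural:
        return verb
    # Third person singular present.
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    return verb + "s"


def realise(c: Clause) -> str:
    """Order constituents, inflect the verb, capitalise and punctuate."""
    words = [c.subject, inflect(c.verb, c.tense, c.plural_subject)]
    if c.obj:
        words.append(c.obj)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    print(realise(Clause("the dog", "chase", "the cat")))  # The dog chases the cat.
```

The division of labour is the point here: the caller states content and features, and the realiser takes care of agreement, word order and orthography.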

== SPUD ==
http://www.cs.rutgers.edu/~mdstone/nlg.html

SPUD (Sentence Planner Using Descriptions) is Matthew Stone's LTAG-based NLG system. There are two versions: SPUD version 0.01 was written in SML. Later versions, known as SPUD lite, are written in Prolog. The small codebase of SPUD lite makes it ideal for teaching, but it is also used in dialog system prototypes.

== STANDUP ==
https://www.abdn.ac.uk/ncs/departments/computing-science/standup-315.php

STANDUP (System To Augment Non-speakers' Dialogue Using Puns) is a collaborative project on generating simple jokes from a graphical user interface appropriate for non-speaking children. The project began in October 2003 and ran until March 2007. The software was written in Java and is available for Windows and Linux, including source code and database files.

== Suregen-2 ==
http://www.suregen.de/00023.html

Suregen is “a hybrid, multilingual (German, English) ontology based and NLG-oriented formalism for generating text for documents in clinical medicine.”
The system Suregen-2 is written in (Allegro) Common Lisp. A [http://www.suregen.de/ftp/standalone1.zip demo system] which runs under Windows is available for download. A [http://www.suregen.de/ftp/selfrunningdemo.zip screencast video] shows data being entered into computer forms using mouse and keyboard while a feedback text is continually updated and shown below. (Try playing the AVI file in [http://www.videolan.org/vlc/ VLC] if you run into problems.) Perhaps this system could be considered an instance of the [http://en.wikipedia.org/wiki/WYSIWYM_(Meant) WYSIWYM] approach.

== TGen ==
A statistical generator that produces sentences from dialogue acts or similar representations, based on the sequence-to-sequence (seq2seq) neural network architecture. Beams generated by the seq2seq model are reranked according to whether they conform to the input meaning representation. The system is written in Python and uses TensorFlow.

Link: https://github.com/UFAL-DSG/tgen

Paper: https://aclweb.org/anthology/P16-2008
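
The reranking idea can be sketched as follows. TGen itself uses a trained classifier to check the output against the input dialogue act; this sketch swaps in a much cruder check, assuming slot values should appear verbatim in the text, and subtracts a penalty for each missing value from the model's log-probability. The function names, penalty weight and example candidates are hypothetical.

```python
def mr_penalty(text, slot_values):
    """Count input slot values that do not appear verbatim in the text."""
    lowered = text.lower()
    return sum(1 for value in slot_values if value.lower() not in lowered)


def rerank(beam, slot_values, weight=1.0):
    """Sort (log_prob, text) candidates by log_prob minus weighted MR penalty."""
    return sorted(
        beam,
        key=lambda cand: cand[0] - weight * mr_penalty(cand[1], slot_values),
        reverse=True,
    )


if __name__ == "__main__":
    slots = ["The Eagle", "French"]
    beam = [
        (-1.2, "The Eagle is a coffee shop."),         # omits "French"
        (-1.5, "The Eagle is a French coffee shop."),  # covers both slots
    ]
    print(rerank(beam, slots)[0][1])  # The Eagle is a French coffee shop.
```

String matching misses paraphrases (e.g. a price range expressed as "cheap") and cannot detect superfluous content, which is why a learned check is preferable in practice.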

[[Category:Software]]
{{SIGGEN Wiki}}

Downloadable NLG systems

2019-09-05T09:42:01Z

Ereiter: add Elvex

The natural language generation systems listed below are available for download over the web.
If you know of a system which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the system yourself.

== ASTROGEN ==
http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

Aggregated deep and Surface naTuRal language GENerator - Prolog-based system.

== Chimera ==

https://github.com/AmitMY/chimera

Chimera is a component-based step-by-step pipeline for data-to-text generation based on https://arxiv.org/abs/1904.03396
It handles the necessary pre-processing for text planning and surface realization, which use neural networks, and also performs referring expression generation.
It can automatically evaluate datasets with a train-dev-test split, with both BLEU and data coverage.

== CRISP ==
http://code.google.com/p/crisp-nlg/

CRISP is Alexander Koller's NLG system that tries to cast both microplanning and sentence realisation as an AI planning problem. The code is a mixture of Java and Scala, a programming language for the Java virtual machine. CRISP comes with its own implementation of GraphPlan, but it can also output plans in PDDL (“Planning Domain Definition Language”, a successor to STRIPS) for use with other AI planners. License: LGPL.

== CODA Tools software Release 1.1 ==
http://computing.open.ac.uk/coda/resources/tools_form.html

This release contains 1) software for converting text parsed with RST relations into dialogue and 2) an annotation tool for annotating dialogue and translating it into monologue (used for creating CODA corpus).

== Elvex ==
https://github.com/lionelclement/Elvex

Elvex is an NLG system based on a functional unification grammar close to LFG. It is implemented in C++ and is freely available under the GNU GPL.

== FUF/SURGE ==
https://www.cs.bgu.ac.il/~elhadad/install-fuf.html

FUF/SURGE is a surface realisation system, based on functional unification grammar.

== GenI ==
http://kowey.github.io/GenI

GenI is a surface realiser for (Feature-Based Lexicalised) Tree Adjoining Grammar and a flat MRS-like semantics (sans top handle and underspecification). Toy example grammars provided for English and French. Largish core grammar for French is under development (contact us for details). GPL (commercial dual licensing available upon request). Known to work under Linux and Mac OS X (potential for making it work on Windows as well). Written in Haskell. Source code available via [http://hackage.haskell.org/package/GenI hackage], [https://github.com/kowey/GenI GitHub], or [http://hub.darcs.net/kowey/GenI hub.darcs.net].

== Grammar Explorer ==
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/tutorials/Grexplorer/grexplorer.html

The Grammar Explorer provides a means of exploring large-scale systemic-functional grammars in order to see how they are
organized and what kinds of things they cover. It can be used to explore the KPML resources.
Downloadable standalone executables of the grammar explorer are available for Windows 95/98/NT.
These already include a version of the Nigel grammar of English and pre-installed examples.

== GoPhi: an AMR to English verbalizer ==

https://github.com/rali-udem/gophi

GoPhi (Generation Of Parenthesized Human Input) is a system for generating a literal reading of Abstract Meaning Representation (AMR) structures. The system, written in SWI-Prolog, uses a symbolic approach to transform the original rooted graph into a tree of constituents that is transformed into an English sentence by jsRealB.

== jsRealB ==

http://rali.iro.umontreal.ca/rali/?q=en/jsrealb-bilingual-text-realiser

jsRealB is a text realizer designed specifically for the web, easy to learn and to use. This realizer allows its user to build a variety of French and English expressions and sentences, to add HTML tags to them and to easily integrate them into web pages. jsRealB can also be used in JavaScript applications by means of a node.js module.
Sources for the programs, linguistic resources and demonstrations are available on the RALI GitHub [https://github.com/rali-udem/jsRealB].

== KPML ==

http://www.purl.org/net/kpml

The KPML system offers a robust, mature platform for large-scale grammar engineering that is particularly oriented to multilingual grammar development and generation. It is particularly targeted at providing resources for realistic but broad-coverage generation applications, where both flexibility of expression and speed of generation are at issue, for example in online webpage generation or spoken dialogue. KPML is also used extensively in multilingual text generation research and for teaching. It is based on systemic functional linguistics.

A growing set of generation grammars is under development for a variety of languages, including English, Spanish, Dutch, Chinese, German, Czech, and more. See the
Generation Bank (http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/generation-bank.html )
for current examples. Grammars for further languages and extensions to existing resources are very welcome!

== LKB ==
http://wiki.delph-in.net/moin/LkbTop

LKB (Linguistic Knowledge Builder) is a grammar engineering environment for unification-based formalisms, typically HPSG.
It includes a [http://wiki.delph-in.net/moin/LkbGeneration realiser] that takes Minimal Recursion Semantics (MRS) as input. LKB is implemented in Common Lisp and is freely available under an open source license. It also includes a KNOPPIX-based GNU/Linux live CD with the whole system installed and ready to use.

== Multimodal Unification Grammar ==
http://www.david-reitter.com/compling/mug/

MUG Workbench is a development and debugging tool for multimodal NLG. The grammar formalism supported is
Multimodal Functional Unification Grammar (MUG). The MUG system runs MUG grammars with fixed (test case)
and arbitrary input specifications to produce output in natural language, in a graphical user interface, and
possibly in other modes. It is designed to do three things:
* Multimodal fission (distributing output to interaction/communication modes)
* Some sentence planning (choosing information to include in the utterance)
* Natural language and graphical user interface realization (producing some form of output)
The MUG system does these three jobs in parallel. MUG Workbench can be used to inspect the data structures
used during generation, which should help you learn more about the nature of unification grammars used
for parsing or natural language generation. MUG Workbench is also helpful for debugging your grammars.
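The core operation in unification grammars such as MUG, and in the unification-based formalisms supported by LKB above, is unification of feature structures: two partial descriptions are merged into the most general description compatible with both, or the merge fails on a clash. The Python sketch below assumes feature structures are nested dicts with string atoms. It omits structure sharing (re-entrancy), disjunction and the path notation that real FUG/MUG grammars rely on, so it illustrates the idea rather than MUG's implementation; the name `unify` is hypothetical.

```python
from typing import Optional, Union

# Simplified model: a feature structure is either an atomic value (a string)
# or a mapping from feature names to feature structures. No re-entrancy.
FeatureStructure = Union[str, dict]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the most general structure compatible with both a and b,
    or None if they clash. Inputs are not modified."""
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    return None  # an atom cannot unify with a complex structure


# Hypothetical example structures.
noun_phrase = {"cat": "np", "agr": {"num": "sg"}}
subject_slot = {"agr": {"per": "3"}}
print(unify(noun_phrase, subject_slot))
# {'cat': 'np', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(noun_phrase, {"agr": {"num": "pl"}}))  # None
```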

== NaturalOWL ==
[http://www.aueb.gr/users/ion/software/NaturalOWL1.1.tar.gz NaturalOWL (version 1.1)]

Generates descriptions of entities and classes from OWL ontologies that have been annotated with linguistic and user modeling resources expressed in RDF. Currently supports English and Greek. Extensions for other languages welcome. NaturalOWL can also be used as a [http://protege.stanford.edu/ Protégé] plug-in. See [http://www.aueb.gr/users/ion/publications.html here] for publications describing NaturalOWL. (GPL)

== NLGen and NLGen2 ==
https://launchpad.net/nlgen

https://launchpad.net/nlgen2

The NLGen natural language generation system applies the [http://www.opencog.org/wiki/SegSim SegSim strategy] for generating English sentences. Probabilistic inference for sentence construction is based on a statistical analysis of [http://opencog.org/wiki/RelEx RelEx] output. Java, Apache license. See demo: [http://novamente.net/example/nlp.html Demo of AI Virtual Pet Answering Simple Questions].

NLGen2 uses [http://opencog.org/wiki/RelEx RelEx] dependency parses, together with [http://www.abisource.com/projects/link-grammar/ Link Grammar] linkage analysis to generate English-language output. Java, Apache license. Reference: Blake Lemoine, "[http://www.louisiana.edu/~bal2277/NLGen2.doc NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System]".

== OpenCCG ==
http://openccg.sourceforge.net/

OpenCCG is both a parser and a realizer for [[Combinatory Categorial Grammar]]. It has been used in several dialog systems. The realizer has been enhanced with n-gram models and a supertagging approach called hypertagging. OpenCCG is implemented in Java, and is freely available under the LGPL.
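In CCG, each word is assigned a category, and categories combine through a small set of rules. A minimal sketch of the two application rules, forward (X/Y Y => X) and backward (Y X\Y => X), driving a CKY-style chart over a toy lexicon, is shown below in Python. This is not OpenCCG's API: the lexicon, the tuple encoding of categories and the function names are hypothetical, and OpenCCG additionally supports composition, type-raising and other combinators, and realizes from logical forms rather than only parsing strings.

```python
from itertools import product

# Hypothetical encoding: an atomic category is a string ("S", "NP");
# a complex category is a tuple (result, slash, argument), where "/"
# looks for its argument to the right and "\\" to the left.
IV = ("S", "\\", "NP")  # intransitive verb (phrase): S\NP
TV = (IV, "/", "NP")    # transitive verb: (S\NP)/NP

# Toy lexicon, for illustration only.
LEXICON = {"Mary": {"NP"}, "John": {"NP"}, "sees": {TV}, "sleeps": {IV}}


def combine(left, right):
    """Categories derivable from two adjacent categories by application."""
    results = set()
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        results.add(left[0])
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        results.add(right[0])
    return results


def parse(words, lexicon=LEXICON):
    """CKY chart parse; returns the categories spanning the whole input."""
    n = len(words)
    chart = {(i, i + 1): set(lexicon[w]) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    cell |= combine(left, right)
            chart[(i, j)] = cell
    return chart[(0, n)]


print(parse(["Mary", "sees", "John"]))  # {'S'}
print(parse(["Mary", "John", "sees"]))  # set()
```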

== rLDCP: Text Generation from Data ==
https://cran.r-project.org/web/packages/rLDCP/index.html

An R package for text generation from data.

== RNNLG ==
https://github.com/shawnwun/RNNLG

RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.

== SimpleNLG ==

https://github.com/simplenlg/simplenlg (English)

https://github.com/rali-udem/SimpleNLG-EnFr (English and French)

https://github.com/citiususc/SimpleNLG-GL (Galician)

https://github.com/citiususc/SimpleNLG-ES (Spanish)

SimpleNLG is a simple Java-based realiser. Its grammatical coverage and syntactic knowledge are small compared to KPML or FUF/SURGE. However, because it is so simple, it is relatively
easy to learn. It has a Java API and can be used from other languages via an XML interface. There are "unofficial" ports to other programming languages such as Python and Ruby. Versions for other human languages are being developed, including [https://aclweb.org/anthology/W18-6508 Dutch], [https://github.com/alexmazzei/SimpleNLG-IT Italian] and [https://aclweb.org/anthology/papers/W/W18/W18-6506/ Mandarin].

== SPUD ==
http://www.cs.rutgers.edu/~mdstone/nlg.html

SPUD (Sentence Planner Using Descriptions) is Matthew Stone's LTAG-based NLG system. There are two versions: SPUD version 0.01 was written in SML, and later versions, known as SPUD lite, are written in Prolog. SPUD lite's small codebase makes it ideal for teaching, but it is also used in dialog system prototypes.

== STANDUP ==
https://www.abdn.ac.uk/ncs/departments/computing-science/standup-315.php

STANDUP (System To Augment Non-speakers' Dialogue Using Puns) is a collaborative project on generating simple jokes through a graphical user interface suitable for non-speaking children. The project began in October 2003 and ran until March 2007. The software was written in Java and is available for Windows and Linux, including source code and database files.

== Suregen-2 ==
http://www.suregen.de/00023.html

Suregen is “a hybrid, multilingual (German, English) ontology based and NLG-oriented formalism for generating text for documents in clinical medicine.”
The system Suregen-2 is written in (Allegro) Common Lisp. A [http://www.suregen.de/ftp/standalone1.zip demo system] which runs under Windows is available for download. A [http://www.suregen.de/ftp/selfrunningdemo.zip screencast video] shows data being entered into computer forms using mouse and keyboard while a feedback text is continually updated and shown below. (Try playing the AVI file in [http://www.videolan.org/vlc/ VLC] if you run into problems.) Perhaps this system could be considered an instance of the [http://en.wikipedia.org/wiki/WYSIWYM_(Meant) WYSIWYM] approach.

== TGen ==
A statistical generator that produces sentences from dialogue acts or similar representations, based on the sequence-to-sequence (seq2seq) neural network architecture. The beams produced by the seq2seq model are reranked according to whether they conform to the input meaning representation. The system is written in Python and uses TensorFlow.

Link: https://github.com/UFAL-DSG/tgen

Paper: https://aclweb.org/anthology/P16-2008
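The reranking step can be pictured as adjusting each beam's model score by how well its text covers the input meaning representation. TGen's actual reranker is a trained classifier; the Python sketch below substitutes plain string matching on slot values, an assumption made purely for illustration. The names `slot_errors` and `rerank`, the penalty weight and the example scores are hypothetical.

```python
def slot_errors(text, mr_values, known_values):
    """Count MR values missing from the text, plus other known values
    that appear in the text although the MR did not ask for them.
    Simplification: case-insensitive substring matching, not a classifier."""
    lowered = text.lower()
    missing = sum(1 for v in mr_values if v.lower() not in lowered)
    spurious = sum(
        1 for v in known_values if v not in mr_values and v.lower() in lowered
    )
    return missing + spurious


def rerank(beam, mr_values, known_values, penalty=100.0):
    """Sort (text, log_prob) candidates, best first, by log_prob minus
    penalty times the number of slot errors."""
    def score(candidate):
        text, log_prob = candidate
        return log_prob - penalty * slot_errors(text, mr_values, known_values)
    return sorted(beam, key=score, reverse=True)


# Hypothetical beam and MR: the more probable candidate has the wrong price.
beam = [("Aromi is a cheap pub.", -1.0), ("Aromi is an expensive pub.", -2.0)]
mr = {"Aromi", "expensive", "pub"}
known = {"Aromi", "cheap", "expensive", "pub", "restaurant"}
print(rerank(beam, mr, known)[0][0])  # Aromi is an expensive pub.
```

Substring matching is fragile ("expensive" also matches inside "inexpensive"), which is one reason a learned reranker is preferable in practice.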

[[Category:Software]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-08-29T12:12:24Z

Ereiter:

This blog supplements [[Data sets for NLG]] with comments about these data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (eg, scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

We'd love to see more content here; please email Ehud Reiter (e.reiter@abdn.ac.uk) with contributions or other comments.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

Data sets for NLG

2019-08-29T10:35:27Z

Ereiter: add blog description

This page lists data sets and corpora used for research in natural language generation that are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

We also have a [[Data sets for NLG blog|blog page]] about data sets, which includes comments about appropriate and inappropriate usage, additional information about data sets, and pointers to related resources.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.
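The E2E meaning representations are flat lists of `attribute[value]` pairs, for example `name[The Eagle], eatType[coffee shop], customer rating[3 out of 5]`. A minimal Python sketch for reading them follows. It assumes the CSV files have `mr` and `ref` columns and that values contain no square brackets, so check both assumptions against the release you download; the function names are hypothetical.

```python
import csv
import re

# One "attribute[value]" pair; attribute names may contain spaces
# (e.g. "customer rating"). Assumes values contain no square brackets.
PAIR_RE = re.compile(r"\s*([^,\[\]]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> dict:
    """Turn an E2E-style MR string into an attribute -> value dict."""
    return {attr: value for attr, value in PAIR_RE.findall(mr)}


def load_pairs(lines):
    """Read (MR dict, reference text) pairs from CSV lines whose header
    row has 'mr' and 'ref' columns (assumed column names). `lines` can be
    an open file (opened with newline="") or any iterable of strings."""
    return [(parse_mr(row["mr"]), row["ref"]) for row in csv.DictReader(lines)]


print(parse_mr("name[The Eagle], eatType[coffee shop], customer rating[3 out of 5]"))
# {'name': 'The Eagle', 'eatType': 'coffee shop', 'customer rating': '3 out of 5'}
```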

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.
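Each WebNLG input is a set of `subject | predicate | object` triples such as `Aarhus_Airport | cityServed | "Aarhus, Denmark"`. The Python sketch below is a deliberately naive one-template-per-triple baseline, useful mainly as a sanity check when first loading the data. It assumes the pipe-separated triple strings shown here; the function names and template wording are hypothetical and not part of the WebNLG release.

```python
import re


def parse_triple(triple: str):
    """Split 'subject | predicate | object' into its three parts
    (assumes exactly two ' | ' separators)."""
    subject, predicate, obj = (part.strip() for part in triple.split(" | "))
    return subject, predicate, obj


def entity_text(entity: str) -> str:
    """'Aarhus_Airport' -> 'Aarhus Airport'; surrounding quotes are dropped."""
    return entity.strip('"').replace("_", " ")


def predicate_text(predicate: str) -> str:
    """Split camelCase: 'cityServed' -> 'city served'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", predicate).lower()


def verbalise(triples) -> str:
    """One templated sentence per triple (hypothetical baseline template)."""
    sentences = []
    for triple in triples:
        s, p, o = parse_triple(triple)
        sentences.append(
            f"The {predicate_text(p)} of {entity_text(s)} is {entity_text(o)}."
        )
    return " ".join(sentences)


print(verbalise(['Aarhus_Airport | cityServed | "Aarhus, Denmark"']))
# The city served of Aarhus Airport is Aarhus, Denmark.
```

The stilted output is the point of comparison: the crowdsourced WebNLG texts aggregate several triples into fluent sentences, which a template baseline like this does not attempt.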

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations, short descriptions and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.
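A classic baseline for choosing the content of a referring expression is Dale and Reiter's (1995) Incremental Algorithm: go through the attributes in a fixed preference order, keep any attribute that rules out at least one remaining distractor, and stop once the target is the only candidate left. The Python sketch below is a simplified version (the original also treats the type attribute and value subsumption specially). Objects are assumed to be plain attribute dicts, and the scene is invented for illustration.

```python
def incremental_algorithm(target, distractors, preferred_attributes):
    """Return a dict of attributes that distinguishes target from all
    distractors, or None if the preferred attributes cannot do so.
    Simplified: no special handling of the type attribute."""
    description = {}
    remaining = list(distractors)
    for attr in preferred_attributes:
        if not remaining:
            break
        value = target.get(attr)
        if value is None:
            continue
        # Keep the attribute only if it rules out at least one distractor.
        if any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    return description if not remaining else None


# Hypothetical scene: a large red chair among a blue chair and a red desk.
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "large"},
    {"type": "desk", "colour": "red", "size": "small"},
]
print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
# {'type': 'chair', 'colour': 'red'}
```

Because attributes are never retracted, the result can be overspecified, which is one of the behaviours corpora such as TUNA and GRE3D were used to study.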

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (eg, "Norte de Galicia") and corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication between speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

A multilingual dataset automatically converted from Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context along with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thus improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of those) and attribute selection. The domain is restaurant search in Edinburgh. Both objective measures (such as dialogue length, number of database hits and number of sentences generated) and subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. It is the largest summarization corpus (approximately 3 million posts) for informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.
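Each correlation in this dataset summarizes how closely a metric's scores track human judgements over the same set of outputs. The Pearson coefficient, one common choice, can be computed as in the Python sketch below. The function name and the paired scores in the example are hypothetical, and the surveyed papers may use other coefficients (such as Spearman's) or correlate at the system level rather than the text level.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-system BLEU scores and mean human ratings (invented numbers).
bleu = [0.21, 0.25, 0.30, 0.34]
human = [3.1, 3.0, 3.6, 3.9]
print(round(pearson(bleu, human), 3))
```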

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-08-21T13:56:21Z

Ereiter: another SumTime comment

This blog supplements [[Data sets for NLG]] with comments about these data sets from users, authors and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, useful additional information about a data set (eg, scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

=== E2E ===
The E2E dataset was used in the [http://www.macs.hw.ac.uk/InteractionLab/E2E/ E2E challenge].

=== SumTime ===
The SumTime corpus is structured as a database, and presented in text (CSV) and MDB (Microsoft Access) formats.

A good example of the use of SumTime is [https://doi.org/10.1017/S1351324907004664 Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models].

=== Weathergov ===
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

=== Tuna ===
[http://www.lrec-conf.org/proceedings/lrec2010/pdf/251_Paper.pdf Dutch] and [https://www.aclweb.org/anthology/W17-3532 Mandarin] versions of Tuna have been developed.

Data sets for NLG

2019-08-21T13:48:46Z

Ereiter: link to Tuna blog

This page lists data sets and corpora used for research in natural language generation that are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions with corresponding restaurant data. English.

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations, short descriptions and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (eg, "Norte de Galicia") and corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php ([[Data sets for NLG blog#Tuna|blog comments]])

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication between speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

A multilingual dataset automatically converted from Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context along with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thus improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of those) and attribute selection. The domain is restaurant search in Edinburgh. Both objective measures (such as dialogue length, number of database hits and number of sentences generated) and subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. It is the largest summarization corpus (approximately 3 million posts) for informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.
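
As a reminder of what such a correlation measures, here is a minimal sketch of Pearson's r between per-system BLEU scores and mean human ratings. The scores below are hypothetical, and Pearson is only one option: the surveyed papers may report other coefficients (for example Spearman or Kendall rank correlation).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all unnormalised (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical system-level scores, for illustration only.
bleu = [0.21, 0.25, 0.30, 0.28]
human = [3.1, 3.4, 3.9, 3.6]
print(round(pearson(bleu, human), 3))
```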

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-08-21T13:46:45Z

Ereiter: add Tuna blog

Data sets for NLG

2019-08-21T13:36:38Z

Ereiter: add blog links for SumTime and E2E

This page lists data sets and corpora used for research in natural language generation. All are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).
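
The tokenized infoboxes are typically turned into field/value pairs before training. The sketch below assumes each infobox is one line of tab-separated `field_position:token` items, with `<none>` marking empty fields (for example `name_1:john`, `name_2:doe`, `image:<none>`). This format is an assumption to check against the downloaded files, and `parse_infobox` is a hypothetical helper, not part of the dataset release.

```python
def parse_infobox(line):
    """Group an assumed WikiBio-style infobox line into {field: value}."""
    tokens_by_field = {}
    for item in line.rstrip("\n").split("\t"):
        if ":" not in item:
            continue  # skip malformed items
        key, token = item.split(":", 1)
        if token == "<none>":
            continue  # field present but empty
        # "birth_date_2" -> field "birth_date", position 2; keys without a
        # numeric suffix are treated as single-token fields.
        base, sep, pos = key.rpartition("_")
        if sep and pos.isdigit():
            field, position = base, int(pos)
        else:
            field, position = key, 0
        tokens_by_field.setdefault(field, []).append((position, token))
    # Reassemble each field's tokens in position order.
    return {
        field: " ".join(tok for _, tok in sorted(pairs))
        for field, pairs in tokens_by_field.items()
    }


example = "name_1:john\tname_2:doe\tbirth_date_1:12\tbirth_date_2:may\tbirth_date_3:1950\timage:<none>"
print(parse_infobox(example))  # {'name': 'john doe', 'birth_date': '12 may 1950'}
```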

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data ([[Data sets for NLG blog#E2E|blog comments]])

Crowdsourced restaurant descriptions in English, with the corresponding restaurant data.
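
The restaurant data is given as a flat meaning representation (MR). Assuming the `attribute[value]` format used in the E2E release files, for example `name[The Vaults], eatType[pub], near[Café Adriatic]`, an MR can be turned into a dictionary with a small regular expression. `parse_mr` is a hypothetical helper, and the format should be checked against the downloaded data.

```python
import re

# One "attribute[value]" item; attribute names may contain spaces
# (e.g. "customer rating"), values may not contain "]".
MR_ITEM = re.compile(r"([^\[\],]+)\[([^\]]*)\]")


def parse_mr(mr):
    """Turn an E2E-style meaning representation into an attribute -> value dict."""
    return {attr.strip(): value.strip() for attr, value in MR_ITEM.findall(mr)}


mr = "name[The Vaults], eatType[pub], priceRange[more than £30], customer rating[5 out of 5], near[Café Adriatic]"
print(parse_mr(mr))
```

A dictionary like this is a convenient starting point for checking which attributes a generated description actually mentions.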

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip ([[Data sets for NLG blog#SumTime|blog comments]])

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.
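
Assuming each RDF triple is stored as a single string of the form `subject | predicate | object` (check this against the release you download), the sketch below parses triples and flattens a triple set into one input string, as is often done for sequence-to-sequence models. The `<S>`, `<P>` and `<O>` markers and both function names are hypothetical choices, not part of the dataset.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Split an assumed 'subject | predicate | object' string into a Triple."""
    parts = [part.strip() for part in text.split(" | ")]
    if len(parts) != 3:
        raise ValueError(f"expected 'subject | predicate | object', got {text!r}")
    return Triple(*parts)


def linearise(triples: List[Triple]) -> str:
    """Flatten a triple set into one string with hypothetical role markers."""
    return " ".join(f"<S> {t.subject} <P> {t.predicate} <O> {t.obj}" for t in triples)


triples = [parse_triple("Aarhus_Airport | cityServed | Aarhus, Denmark")]
print(linearise(triples))  # <S> Aarhus_Airport <P> cityServed <O> Aarhus, Denmark
```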

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations and both short and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.
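
The 2007 shared task was about attribute selection: choosing which properties of a target to mention so that it is distinguished from the distractors. A classic baseline for this is the Incremental Algorithm of Dale and Reiter (1995). The sketch below is a simplified version (it omits the original's special treatment of the type attribute and value subsumption), and the objects in the example are hypothetical, not entries from the corpus.

```python
def incremental_algorithm(target, distractors, preference_order):
    """Simplified Incremental Algorithm (after Dale and Reiter, 1995).

    target: {attribute: value} for the object to be described
    distractors: list of {attribute: value} dicts for the other objects
    preference_order: attributes to consider, most preferred first
    Returns the chosen {attribute: value} description, or None if the
    listed attributes cannot single out the target.
    """
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        if attr not in target:
            continue
        value = target[attr]
        # Keep the attribute only if it rules out at least one distractor.
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):
            description[attr] = value
            remaining = survivors
        if not remaining:
            return description
    return None if remaining else description


target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "large"},
    {"type": "desk", "colour": "red", "size": "small"},
]
print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
# {'type': 'chair', 'colour': 'red'}
```

The algorithm never backtracks, so an attribute added early stays in the description even if later attributes would have made it redundant; this is one reason it can produce overspecified descriptions, as human speakers often do.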

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

A multilingual dataset automatically converted from Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).
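
As a rough illustration of what an unordered syntactic input looks like (not the organisers' conversion script, which this page does not describe), the sketch below reads one CoNLL-U sentence, keeps lemmas, POS tags and dependency relations, drops word forms, and shuffles and renumbers the nodes so that the original word order cannot be read off the IDs. The function name and `seed` parameter are hypothetical.

```python
import random


def to_unordered(conllu_sentence, seed=0):
    """Turn one CoNLL-U sentence into shuffled, renumbered lemma nodes."""
    nodes = []
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        node_id, _form, lemma, upos, _xpos, _feats, head, deprel = cols[:8]
        nodes.append({"id": int(node_id), "lemma": lemma, "upos": upos,
                      "head": int(head), "deprel": deprel})
    # Shuffle so that list order no longer reflects word order ...
    random.Random(seed).shuffle(nodes)
    # ... and renumber so the IDs do not give it away either (head 0 = root).
    new_id = {node["id"]: i for i, node in enumerate(nodes, start=1)}
    return [
        {"id": new_id[n["id"]], "lemma": n["lemma"], "upos": n["upos"],
         "head": new_id.get(n["head"], 0), "deprel": n["deprel"]}
        for n in nodes
    ]


sentence = "\n".join([
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tloudly\tloudly\tADV\t_\t_\t2\tadvmod\t_\t_",
])
for node in to_unordered(sentence):
    print(node)
```

The tree structure survives the renumbering: each node's head still points at the same lemma, only the linear order is lost, which is what a surface realiser then has to reconstruct.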

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.
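
Each dialogue act combines an act type with slot-value pairs. Assuming a notation like `inform(name='red door cafe';food='italian')` (an assumption to verify against the files in the repository), a dialogue act can be parsed as below; `parse_dialogue_act` is a hypothetical helper.

```python
import re

# Act type, then a parenthesised list of slot=value pairs separated by ";".
DA_PATTERN = re.compile(r"^\s*([^\s(]+)\((.*)\)\s*$")


def parse_dialogue_act(da):
    """Return (act_type, [(slot, value_or_None), ...]) for an assumed DA string."""
    match = DA_PATTERN.match(da)
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    act, body = match.groups()
    slots = []
    for pair in filter(None, (p.strip() for p in body.split(";"))):
        if "=" in pair:
            slot, value = pair.split("=", 1)
            slots.append((slot.strip(), value.strip().strip("'\"")))
        else:
            slots.append((pair, None))  # slot mentioned without a value
    return act, slots


print(parse_dialogue_act("inform(name='red door cafe';food='italian')"))
# ('inform', [('name', 'red door cafe'), ('food', 'italian')])
```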

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and of attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, and number of sentences generated) as well as subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest corpus of informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets, available in various document formats as well as in structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-08-21T13:34:10Z

Ereiter: add SumTime blog

Data sets for NLG blog

2019-08-21T13:29:36Z

Ereiter: add E2E blog

Data sets for NLG

2019-08-21T13:26:51Z

Ereiter:

This page lists data sets and corpora used for research in natural language generation. All are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data

Crowdsourced restaurant descriptions in English, with the corresponding restaurant data.

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations and both short and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

A multilingual dataset automatically converted from Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and of attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, and number of sentences generated) as well as subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest corpus of informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

==Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets, available in various document formats as well as in structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG

2019-08-21T13:24:22Z

Ereiter: add blog link for Weathergov

This page lists data sets and corpora used for research in natural language generation. All are available for download over the web. If you know of a dataset which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the dataset yourself.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== WikiBio (Wikipedia biography dataset) ===
https://github.com/DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It consists of the first paragraph and the infobox (both tokenized).

=== WikiBio German and French (Wikipedia biography dataset) ===
https://github.com/PrekshaNema25/StructuredData_To_Descriptions

This dataset consists of the first paragraph and the infobox from German and French Wikipedia biography pages.

=== boxscore-data ===
https://github.com/harvardnlp/boxscore-data/

This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data

Crowdsourced restaurant descriptions in English, with the corresponding restaurant data.

=== Personage Stylistic Variation for NLG ===
https://nlds.soe.ucsc.edu/stylistic-variation-nlg

This dataset provides training data for natural language generation of restaurant descriptions in different Big-Five personality styles.

=== Personage Sentence Planning for NLG ===
https://nlds.soe.ucsc.edu/sentence-planning-NLG

This dataset provides training data for natural language generation of restaurant descriptions using sentence planning operations of various kinds.

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip ([[Data sets for NLG blog#Weathergov|blog comments]])

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.

=== The Wikipedia company corpus ===
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus

Company descriptions collected from Wikipedia. The dataset contains semantic representations and both short and long descriptions for 51K companies, in English.

== Referring Expressions Generation==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images, and the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for real-world objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

=== Stars2 corpus of referring expressions ===
A collection of 884 annotated definite descriptions produced by 56 subjects in collaborative communication involving speaker-hearer pairs, in situations designed to challenge existing REG algorithms, with a particular focus on attribute choice in referential overspecification.
Link: https://drive.google.com/file/d/0B-KyU7T8S8bLZ1lEQmJRdUc1V28/view?usp=sharing
Cite: https://link.springer.com/article/10.1007/s10579-016-9350-y

=== b5 corpus of text and referring expressions labelled with personality information ===
A collection of crowdsourced scene descriptions and an annotated REG corpus, both labelled with the Big Five personality scores of their authors. Suitable for studies in personality-dependent text generation and referring expression generation.
Link: https://drive.google.com/open?id=0B-KyU7T8S8bLTHpaMnh2U2NWZzQ
Cite: https://www.aclweb.org/anthology/L18-1183

==Surface Realisation ==

=== Surface Realization Shared Task 2018 (SR'18) dataset ===
http://taln.upf.edu/pages/msr2018-ws/SRST.html#data

A multilingual dataset automatically converted from Universal Dependencies v2.0, comprising unordered syntactic structures (10 languages) and predicate-argument structures (3 languages).

== Dialogue ==

=== Alex Context NLG Dataset===
https://github.com/UFAL-DSG/alex_context_nlg_dataset

A dataset for NLG in dialogue systems in the public transport information domain. It includes the preceding context with each data instance, which should allow NLG systems trained on this data to adapt to the user's way of speaking and thereby improve perceived naturalness. Papers: http://workshop.colips.org/re-wochat/documents/02_Paper_6.pdf, https://www.aclweb.org/anthology/W16-3622

=== Cam4NLG ===
https://github.com/shawnwun/RNNLG/tree/master/data

Cam4NLG contains four NLG datasets for dialogue system development, each in a different domain. Each data point is a (dialogue act, ground truth, handcrafted baseline) tuple.

===CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and of attribute selection. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, and number of sentences generated) as well as subjective measures (user scores) were logged.

=== CODA corpus Release 1.0 ===
http://computing.open.ac.uk/coda/resources/code_form.html

This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags.

=== Hotel Dialogs for NLG ===
https://nlds.soe.ucsc.edu/hotels

This set of hotel corpora includes a set of paraphrases, room and property descriptions, and full hotel dialogues aimed at exploring different ways of eliciting dialogic, conversational descriptions about hotels.

== Summarisation ==

=== TL;DR ===
https://toolbox.google.com/datasetsearch/search?query=Webis-TLDR-17%20Corpus&docid=kzcwbWD9z3B4Ah3wAAAAAA%3D%3D

A dataset for abstractive summarization constructed from Reddit posts. With approximately 3 million posts, it is the largest corpus of informal text such as social media text, and can be used to train neural networks for summarization.

== Image description ==

===Chinese===
* Flickr8k-CN: http://lixirong.net/datasets/flickr8kcn

===Dutch===

* DIDEC: http://didec.uvt.nl
* Flickr30K: https://github.com/cltl/DutchDescriptions

===German===
* Multi30K: http://www.statmt.org/wmt16/multimodal-task.html

== Question Generation ==

=== QGSTEC 2010 Generating Questions from Sentences Corpus ===
http://computing.open.ac.uk/coda/resources/qg_form.html

A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).

=== QGSTEC+ ===
https://github.com/Keith-Godwin/QG-STEC-plus

Improved annotations for the QGSTEC corpus (with higher inter-rater reliability) as described in [http://oro.open.ac.uk/47284/ Godwin and Piwek (2016)].

== Challenge Data Repository ==

https://sites.google.com/site/genchalrepository/

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}

Data sets for NLG blog

2019-08-21T13:21:44Z

Ereiter:

Data sets for NLG blog

2019-08-21T13:18:13Z

Ereiter: Create blog to supplement NLG data sets

This page is a blog supplement to [[Data sets for NLG]], which lists comments about the data sets from users and other interested parties. We are especially interested in comments about appropriate and inappropriate usage of a data set, "best practice" use of a data set, additional information about a data set (e.g., scope, how it was constructed), and pointers to related data sets which may be more appropriate for some users. Links to relevant papers and other resources are welcome.

== Weathergov ==
The Weathergov corpus contains the output of a template-based weather forecast generator, not human-written forecasts ([https://ehudreiter.com/2017/05/09/weathergov/ blog post]). Hence ML on Weathergov is an exercise in reverse engineering a template-based NLG system, not in training an NLG system from human data. If you want to train on human-written weather forecasts, consider using the [https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip SumTime corpus] instead.

SIGGEN

2019-07-12T12:47:53Z

Ereiter: /* Workshop Proceedings */ fix URL

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources for learning about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

We are glad to announce that '''INLG 2020 will be held in Dublin (Ireland).'''

More details about INLG 2020 are coming soon, but now it is time to think about submissions to
[https://www.inlg2019.com/ INLG 2019] (submission deadline: '''1 July 2019''')

Tokyo, Japan, 29 October - 1 November 2019. Follow [https://twitter.com/INLG2019 @INLG2019] on Twitter for up-to-date information and announcements.

Other related events are:

* [https://naacl2019.org/ NAACL-HLT 2019] to be held in Minneapolis, USA, June 2-7, with [https://neuralgen.io/ NeuralGen 2019] as a co-located event.

* [http://www.acl2019.org/EN/index.xhtml ACL 2019] to be held in Florence, Italy, July 28 - August 2

* [https://www.emnlp-ijcnlp2019.org/ EMNLP-IJCNLP 2019] to be held in Hong Kong, China, November 3 - 7. It will include [http://taln.upf.edu/pages/msr2019-ws/ MSR19] among other events.

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://inlg2018.uvt.nl/ INLG 2018] was held at Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published at [https://aclanthology.info/events/inlg-2018 ACL Anthology]

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held at Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published at [https://aclanthology.info/events/inlg-2017 ACL Anthology]

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held at Edinburgh, UK, 5-8 September 2016. Conference proceedings are published at [https://aclanthology.info/events/inlg-2016 ACL Anthology]

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [https://www.aclweb.org/anthology/sigs/siggen/ Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

Natural Language Generation Portal

2019-06-17T08:07:20Z

Ereiter:

[[File:Siggen_logo_small.JPG|right]]
''This portal is based on and replaces the [http://www.siggen.org/resources/moin.html NLG Resources Wiki] run by [http://www.siggen.org/ ACL SIGGEN] from November 2005 to February 2009.'' Updated in 2019 to remove dead links/resources and add new ones.

== Resources for Natural Language Generation ==

* [[Downloadable NLG systems]]
* [[Data sets for NLG]]
* [[NLG research groups]]



[[Category:Natural Language Generation|*]]
[[Category:Imported from the SIGGEN Resources Wiki]]
[[Category:Resources]]
[[Category:Natural Language Generation Portal]]

NLG research groups

2019-04-18T15:49:10Z

Ereiter: Updated Aberdeen link

* [http://mcs.open.ac.uk/nlg/ The Open University Natural Language Generation Group]
* [http://www.siggen.org/ ACL Special Interest Group on Natural Language Generation (SIGGEN)]
* [https://www.abdn.ac.uk/ncs/departments/computing-science/natural-language-generation-187.php Natural Language Generation, School of Natural and Computing Sciences, The University of Aberdeen]
* [https://www.isi.edu/research_groups/nlg/home Information Sciences Institute, University of Southern California]
* [http://web.inf.ed.ac.uk/ilcc Institute for Language, Cognition and Computation, School of Informatics, The University of Edinburgh]
* [http://nlp.seas.harvard.edu/ Harvard NLP, Harvard University]
* [https://sites.google.com/site/hwinteractionlab/ Interaction Lab, School of Mathematical and Computer Sciences, Heriot-Watt University]
* [https://synalp.loria.fr/ SyNaLP, LORIA]
* [https://www.sheffield.ac.uk/dcs/research/groups/nlp Natural Language Processing Group, The University of Sheffield]
* [https://www.cs.washington.edu/ Paul G. Allen School of Computer Science and Engineering, University of Washington]

Natural Language Generation Portal

2019-04-18T15:47:11Z

Ereiter: deleted pubs; moved grammara to archive; commented out Bateman/Zock list

[[File:Siggen_logo_small.JPG|right]]
''This portal is based on and replaces the [http://www.siggen.org/resources/moin.html NLG Resources Wiki] run by [http://www.siggen.org/ ACL SIGGEN] from November 2005 to February 2009.''

== Resources for Natural Language Generation ==

* [[Downloadable NLG systems]]
* [[Data sets for NLG]]
* [[NLG research groups]]



[[Category:Natural Language Generation|*]]
[[Category:Imported from the SIGGEN Resources Wiki]]
[[Category:Resources]]
[[Category:Natural Language Generation Portal]]

SIGGEN: Archive

2019-04-18T15:43:18Z

Ereiter: moved generation grammara here

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="margin:0;font-size:125%;padding:0.2em 0.4em;">This archive collects material on the history of ACL SIGGEN.</h4>
|}
__TOC__
==[[SIGGEN:_ACL_Reports|annual reports to the ACL]]==
==SIGGEN newsletters (1993–2005)==
==[[SIGGEN: Constitution|SIGGEN constitution]] (including older versions)==
==[http://www.ics.mq.edu.au/mailman/private/siggen/ mailing list archive] (list members only)==
== [[SIGGEN: Who's Who in NLG|Who's Who in NLG]] ==
== [[SIGGEN: What's Where in NLG|What's Where in NLG]] ==
== [[Generation grammars]] ==

SIGGEN

2019-04-18T15:39:37Z

Ereiter: Moved Whos Who and Whats Where to archive; reordered sections

__NOTOC__

<h1>ACL Special Interest Group on Natural Language Generation </h1>

{|
|-
|[[File:Siggen_logo_small.JPG|left]]||<h4 style="width:95%;margin:0;background-color:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:justify;color:#000;padding:0.2em 0.4em;">Welcome to the home page of the Association for Computational Linguistics Special Interest Group on Natural Language Generation. SIGGEN [ˈsɪɡ.ʤɛn] is a special interest group of the Association for Computational Linguistics (ACL). It provides a forum for the discussion, dissemination and archiving of research topics and results in the field of text generation. </h4>

|}

== What is Natural Language Generation? ==
Natural language ''generation'' (NLG) focuses on algorithms and models for producing texts in English or other natural languages. NLG systems generally produce summaries, explanations, descriptions, etc. of non-linguistic data from databases, knowledge bases, sensors, and so forth. Good sources for learning about NLG include:
* [https://ehudreiter.com/2018/01/16/learn-about-nlg/ How do I Learn about NLG?]
* [https://www.jair.org/index.php/jair/article/view/11173 Survey of the State of the Art in Natural Language Generation]
* [https://en.wikipedia.org/wiki/Natural-language_generation Wikipedia article on NLG]

== Resources ==
[[Natural_Language_Generation_Portal|Natural Language Generation Portal]]

== Upcoming Events ==

[https://www.inlg2019.com/ INLG 2019]

Tokyo, Japan, 29 October - 1 November 2019. Follow [https://twitter.com/INLG2019 @INLG2019] on Twitter for up-to-date information and announcements.

Other related events are:

* [https://naacl2019.org/ NAACL-HLT 2019] to be held in Minneapolis, USA, June 2-7, with [https://neuralgen.io/ NeuralGen 2019] as a co-located event.

* [http://www.acl2019.org/EN/index.xhtml ACL 2019] to be held in Florence, Italy, July 28 - August 2

* [https://www.emnlp-ijcnlp2019.org/ EMNLP-IJCNLP 2019] to be held in Hong Kong, China, November 3 - 7

== Mailing List ==
=== Joining the mailing list: ===

:The SIGGEN mailing list is currently going through a transition.
:To sign up, view preferences, change preferences, or unsubscribe, go to:

::'''[http://www.jiscmail.ac.uk/SIGGEN http://www.jiscmail.ac.uk/SIGGEN]'''

:If there are any issues, e-mail: <u>'''siggen-webmaster (ta) aclweb (dot) org'''</u>.

=== Posting messages to the mailing list ===

:Please join the mailing list first (see above). Then you may use the email alias <u>'''siggen-list (ta) aclweb (dot) org'''</u> to post e-mails to the list.

== Recent Events ==

[https://inlg2018.uvt.nl/ INLG 2018] was held at Tilburg, Netherlands, 5-8 November 2018. Conference proceedings are published at [https://aclanthology.info/events/inlg-2018 ACL Anthology]

Endorsed events:

* [http://www.ccnlg.org/ CC-NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-3rd-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2018 Workshop Proceedings])

* [https://sites.google.com/view/2is-nlg2018 2IS&NLG 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-intelligent-interactive-systems-and-language-generation-2is-nlg Workshop Proceedings])

* [https://hbuschme.github.io/nlg-hri-workshop-2018/organisation/ NLG4HRI 2018] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-nlg-for-human-robot-interaction Workshop Proceedings])

* [https://www.ida.liu.se/~evere22/ATA-18/ ATA 2018] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-automatic-text-adaptation-ata Workshop Proceedings])

[https://eventos.citius.usc.es/inlg2017/ INLG 2017] was held at Santiago de Compostela, Spain, 4-7 September 2017. Conference proceedings are published at [https://aclanthology.info/events/inlg-2017 ACL Anthology]

Endorsed events:

* [http://www.ccnlg.org/index.php/cc-nlg-2017/ CC-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-workshop-on-computational-creativity-in-natural-language-generation-cc-nlg-2017 Workshop Proceedings])

* [http://www.nooj-association.org/media/k2/attachments/events/LiRANLG.htm#programme LiRA-NLG 2017] ([https://aclanthology.info/volumes/proceedings-of-the-linguistic-resources-for-automatic-natural-language-generation-lira-nlg Workshop Proceedings])

* [https://sites.google.com/site/workshoprst2017/schedule RST 2017] ([https://aclanthology.info/volumes/proceedings-of-the-6th-workshop-on-recent-advances-in-rst-and-related-formalisms Workshop Proceedings])

* [http://xci2017.arg.tech/index.php/schedule/ XCI 2017] ([https://aclanthology.info/volumes/proceedings-of-the-1st-workshop-on-explainable-computational-intelligence-xci-2017 Workshop Proceedings])

[http://www.macs.hw.ac.uk/InteractionLab/INLG2016/ INLG 2016] was held at Edinburgh, UK, 5-8 September 2016. Conference proceedings are published at [https://aclanthology.info/events/inlg-2016 ACL Anthology]

Endorsed events:

* [http://webprojects.eecs.qmul.ac.uk/mpurver/ccnlg/ CC-NLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-inlg-2016-workshop-on-computational-creativity-in-natural-language-generation Workshop Proceedings])

* [https://webnlg2016.sciencesconf.org/ WebNLG 2016] ([https://aclanthology.info/volumes/proceedings-of-the-2nd-international-workshop-on-natural-language-generation-and-the-semantic-web-webnlg-2016 Workshop Proceedings])

== Board ==
The SIGGEN board is made up of the following people:

*[https://ehudreiter.com/ Ehud Reiter] ([mailto:e.reiter@abdn.ac.uk mail]) [https://www.abdn.ac.uk/ncs/profiles/e.reiter/ Professor/Chair in Computer Science at University of Aberdeen] and also Chief Scientist of [https://www.arria.com Arria NLG]. ([mailto:siggen-chair(ta)aclweb(dot)org chair])
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://dimitragkatzia.wordpress.com Dimitra Gkatzia] ([mailto:d.gkatzia@napier.ac.uk mail]) [http://www.napier.ac.uk/about-us/our-schools/school-of-computing/staff School of Computing, Edinburgh Napier University], Edinburgh.
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[http://amandastent.com// Amanda Stent] ([mailto:amanda.stent@gmail.com mail]), Bloomberg LP ([mailto:siggen-treasurer(ta)aclweb(dot)org treasurer])
:elected in December 2016 for the period from 1st January 2017 to 31st December 2020
*[https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral Jose M. Alonso] ([mailto:josemaria.alonso.moral@usc.es mail]) [https://citius.usc.es/equipo/investigadores-postdoutorais/jose-maria-alonso-moral University of Santiago de Compostela], Spain (secretary)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2022
*[https://www.edinburgh-robotics.org/students/amanda-cercas-curry Amanda Curry] ([mailto:ac293@hw.ac.uk mail]) [https://www.hw.ac.uk/ School of Mathematical and Computer Sciences, Heriot-Watt University] (student member)
:elected in December 2018 for the period from 1st January 2019 to 31st December 2020

To contact the entire board, please use the email alias: <u>'''siggen-board (ta) aclweb (dot) org'''</u>.

== [http://www.aclweb.org/anthology/siggen.html Workshop Proceedings ] ==

== [[SIGGEN: Archive|Archive]] ==
== [[SIGGEN: Constitution|Constitution]] ==

SIGGEN: Archive

2019-04-18T15:37:01Z

Ereiter: Moved Who's Who and What's Where to archive

Data sets for NLG

2019-04-11T15:16:05Z

Ereiter: clean up

Downloadable NLG systems

2019-04-11T15:13:32Z

Ereiter: more cleanup

The natural language generation systems listed below are available for download over the web.
If you know of a system which is not listed here, you can email siggen-board@aclweb.org, or just click on Edit in the upper left corner of this page and add the system yourself.

== ASTROGEN ==
http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

Aggregated deep and Surface naTuRal language GENerator: a Prolog-based system.

== CRISP ==
http://code.google.com/p/crisp-nlg/

CRISP is Alexander Koller's NLG system that tries to cast both microplanning and sentence realisation as an AI planning problem. The code is a mixture of Java and Scala, a programming language that runs on the Java virtual machine. CRISP comes with its own implementation of GraphPlan, but it can also output plans in PDDL (“Planning Domain Definition Language”, a successor to STRIPS) for use with other AI planners. License: LGPL.

== FUF/SURGE ==
https://www.cs.bgu.ac.il/~elhadad/install-fuf.html

FUF/SURGE is a surface realisation system, based on functional unification grammar.

== GenI ==
http://kowey.github.io/GenI

GenI is a surface realiser for (Feature-Based Lexicalised) Tree Adjoining Grammar and a flat MRS-like semantics (sans top handle and underspecification). Toy example grammars are provided for English and French. A largish core grammar for French is under development (contact us for details). Licensed under the GPL (commercial dual licensing available upon request). Known to work under Linux and Mac OS X (it could potentially be made to work on Windows as well). Written in Haskell. Source code available via [http://hackage.haskell.org/package/GenI hackage], [https://github.com/kowey/GenI GitHub], or [http://hub.darcs.net/kowey/GenI hub.darcs.net].

== Grammar Explorer ==
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/tutorials/Grexplorer/grexplorer.html

The Grammar Explorer provides a means of exploring large-scale systemic-functional grammars in order to see how they are
organized and what kinds of things they cover. It can be used to explore the KPML resources.
Downloadable standalone executables of the grammar explorer are available for Windows 95/98/NT.
These already include a version of the Nigel grammar of English and pre-installed examples.

== jsRealB ==

http://rali.iro.umontreal.ca/rali/?q=en/jsrealb-bilingual-text-realiser

jsRealB is a bilingual (French and English) text realiser for web programming.

== KPML ==

http://www.purl.org/net/kpml

The KPML system offers a robust, mature platform for large-scale grammar engineering that is particularly oriented to multilingual grammar development and generation. It is specifically targeted at providing resources for realistic but broad-coverage generation applications, where both flexibility of expression and speed of generation are at issue, for example in online webpage generation or spoken dialogue. KPML is also used extensively in multilingual text generation research and for teaching. It is based on systemic functional linguistics.

A growing set of generation grammars is under development for a variety of languages, including English, Spanish, Dutch, Chinese, German, Czech, and more. See the
Generation Bank (http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/generation-bank.html)
for current examples. The development of further languages and extensions to existing resources are very welcome!

== LKB ==
http://wiki.delph-in.net/moin/LkbTop

LKB (Linguistic Knowledge Builder) is a grammar engineering environment for unification-based formalisms, typically HPSG.
It includes a [http://wiki.delph-in.net/moin/LkbGeneration realiser] that takes Minimal Recursion Semantics (MRS) as input. LKB is implemented in Common Lisp, and is freely available under an open source license. A KNOPPIX-based GNU/Linux live CD is also available, with the whole system installed and ready to use.

== Multimodal Unification Grammar ==
http://www.david-reitter.com/compling/mug/

MUG Workbench is a development and debugging tool for Multimodal NLG. The grammar formalism supported is Multimodal Functional Unification Grammar (MUG). The MUG system runs MUG grammars with fixed (test cases) and arbitrary input specifications to produce output in a natural language, in a graphical user interface, and possibly in other modes. It is designed to do three things:
* Multimodal Fission (distributing output to interaction/communication modes)
* Some sentence planning (choosing information to include in the utterance)
* Natural Language and graphical user interface realization (producing some form of output)
The MUG system does these three jobs in parallel. MUG Workbench can be used to inspect the data structures used during generation. It should help you to learn more about the nature of unification grammars used for parsing or natural language generation. Furthermore, MUG Workbench is helpful in debugging your grammars.

== NaturalOWL ==
http://www.aueb.gr/users/ion/software/NaturalOWL1.1.tar.gz NaturalOWL (version 1.1)

Generates descriptions of entities and classes from OWL ontologies that have been annotated with linguistic and user modeling resources expressed in RDF. Currently supports English and Greek. Extensions for other languages welcome. NaturalOWL can also be used as a [http://protege.stanford.edu/ Protégé] plug-in. See [http://www.aueb.gr/users/ion/publications.html here] for publications describing NaturalOWL. (GPL)

== NLGen and NLGen2 ==
https://launchpad.net/nlgen

https://launchpad.net/nlgen2

The NLGen natural language generation system applies the [http://www.opencog.org/wiki/SegSim SegSim strategy] for generating English sentences. Probabilistic inference for sentence construction is based on a statistical analysis of [http://opencog.org/wiki/RelEx RelEx] output. Java, Apache license. See demo: [http://novamente.net/example/nlp.html Demo of AI Virtual Pet Answering Simple Questions].

NLGen2 uses [http://opencog.org/wiki/RelEx RelEx] dependency parses, together with [http://www.abisource.com/projects/link-grammar/ Link Grammar] linkage analysis to generate English-language output. Java, Apache license. Reference: Blake Lemoine, "[http://www.louisiana.edu/~bal2277/NLGen2.doc NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System]".

== OpenCCG ==
http://openccg.sourceforge.net/

OpenCCG is both a parser and a realizer for [[Combinatory Categorial Grammar]]. It has been used in several dialog systems. The realizer has been enhanced with n-gram models and a supertagging approach called hypertagging. OpenCCG is implemented in Java, and is freely available under the LGPL.

== SimpleNLG ==

https://github.com/simplenlg/simplenlg (English)

http://www-etud.iro.umontreal.ca/~vaudrypl/snlgbil/snlgEnFr_english.html (French)

https://github.com/citiususc/SimpleNLG-GL (Galician)

https://github.com/citiususc/SimpleNLG-ES (Spanish)

SimpleNLG is a simple Java-based realiser. Its grammatical coverage and syntactic knowledge are limited compared to KPML or FUF/SURGE. However, because it is so simple, it is relatively
easy for people to learn how to use it. It has a Java API, and can be used from other languages via an XML interface. There are "unofficial" ports to other programming languages such as Python and Ruby. Versions for other human languages are being worked on, including [https://aclweb.org/anthology/W18-6508 Dutch], [https://github.com/alexmazzei/SimpleNLG-IT Italian], and [https://aclweb.org/anthology/papers/W/W18/W18-6506/ Mandarin].

== SPUD ==
http://www.cs.rutgers.edu/~mdstone/nlg.html

SPUD (Sentence Planner Using Descriptions) is Matthew Stone's LTAG-based NLG system. There are two versions: SPUD version 0.01 was written in SML. Later versions, known as SPUD lite, are written in Prolog. The small codebase of SPUD lite makes it ideal for teaching, but it is also used in dialog system prototypes.

== STANDUP ==
https://www.abdn.ac.uk/ncs/departments/computing-science/standup-315.php

STANDUP (System To Augment Non-speakers' Dialogue Using Puns) is a collaborative project on generating simple jokes through a graphical user interface designed for non-speaking children. The project began in October 2003 and ran until March 2007. The software was written in Java and is available for Windows and Linux, including source code and database files.

== Suregen-2 ==
http://www.suregen.de/00023.html

Suregen is “a hybrid, multilingual (German, English) ontology based and NLG-oriented formalism for generating text for documents in clinical medicine.”
The system Suregen-2 is written in (Allegro) Common Lisp. A [http://www.suregen.de/ftp/standalone1.zip demo system] which runs under Windows is available for download. A [http://www.suregen.de/ftp/selfrunningdemo.zip screencast video] shows data being entered into computer forms using mouse and keyboard while a feedback text is continually updated and shown below. (Try playing the AVI file in [http://www.videolan.org/vlc/ VLC] if you run into problems.) Perhaps this system could be considered an instance of the [http://en.wikipedia.org/wiki/WYSIWYM_(Meant) WYSIWYM] approach.

[[Category:Software]]
{{SIGGEN Wiki}}

Downloadable NLG systems

2019-04-11T15:09:41Z

Ereiter: Removed CLINT, RAGS, SURGE; cleaned up rest

'''[[Tools and Software for English]] - Downloadable NLG systems'''

For languages other than English, see [[List of resources by language]].






The natural language generation systems listed below are available for download over the web.
If you know of a system which is not listed here, please click on Edit in the upper left corner of this page and add the system yourself.

== ASTROGEN ==
http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

Aggregated deep and Surface naTuRal language GENerator: a Prolog-based system.

== CRISP ==
http://code.google.com/p/crisp-nlg/

CRISP is Alexander Koller's NLG system that tries to cast both microplanning and sentence realisation as an AI planning problem. The code is a mixture of Java and Scala, a programming language that runs on the Java virtual machine. CRISP comes with its own implementation of GraphPlan, but it can also output plans in PDDL (“Planning Domain Definition Language”, a successor to STRIPS) for use with other AI planners. License: LGPL.

== FUF/SURGE ==
https://www.cs.bgu.ac.il/~elhadad/install-fuf.html

FUF/SURGE is a surface realisation system, based on functional unification grammar.

== GenI ==
http://kowey.github.io/GenI

GenI is a surface realiser for (Feature-Based Lexicalised) Tree Adjoining Grammar and a flat MRS-like semantics (sans top handle and underspecification). Toy example grammars are provided for English and French. A largish core grammar for French is under development (contact us for details). Licensed under the GPL (commercial dual licensing available upon request). Known to work under Linux and Mac OS X (it could potentially be made to work on Windows as well). Written in Haskell. Source code available via [http://hackage.haskell.org/package/GenI hackage], [https://github.com/kowey/GenI GitHub], or [http://hub.darcs.net/kowey/GenI hub.darcs.net].

== Grammar Explorer ==
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/tutorials/Grexplorer/grexplorer.html

The Grammar Explorer provides a means of exploring large-scale systemic-functional grammars in order to see how they are
organized and what kinds of things they cover. It can be used to explore the KPML resources.
Downloadable standalone executables of the grammar explorer are available for Windows 95/98/NT.
These already include a version of the Nigel grammar of English and pre-installed examples.

== jsRealB ==

http://rali.iro.umontreal.ca/rali/?q=en/jsrealb-bilingual-text-realiser

jsRealB is a bilingual (French and English) text realiser for web programming.

== KPML ==

http://www.purl.org/net/kpml

The KPML system offers a robust, mature platform for large-scale grammar engineering, oriented particularly towards multilingual grammar development and generation. It is targeted at providing resources for realistic, broad-coverage generation applications where both flexibility of expression and speed of generation matter, for example online webpage generation or spoken dialogue. KPML is also used extensively in multilingual text generation research and for teaching. It is based on systemic functional linguistics.

A growing set of generation grammars is under development for a variety of languages, including English, Spanish, Dutch, Chinese, German, Czech, and more. See the [http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/generation-bank.html Generation Bank] for current examples. Contributions of grammars for further languages and extensions to existing resources are very welcome!

== LKB ==
http://wiki.delph-in.net/moin/LkbTop

LKB (Linguistic Knowledge Builder) is a grammar engineering environment for unification-based formalisms, typically HPSG.
It includes a [http://wiki.delph-in.net/moin/LkbGeneration realiser] that takes Minimal Recursion Semantics (MRS) as input. LKB is implemented in Common Lisp and is freely available under an open source license. A KNOPPIX-based GNU/Linux live CD, with the whole system installed and ready to use, is also available.

== Multimodal Unification Grammar ==
http://www.david-reitter.com/compling/mug/

MUG Workbench is a development and debugging tool for multimodal NLG. The grammar formalism it supports is
Multimodal Functional Unification Grammar (MUG). The MUG system runs MUG grammars on fixed (test case)
and arbitrary input specifications to produce output in natural language, in a graphical user interface, and
possibly in other modes. It is designed to do three things:
* Multimodal fission (distributing output across interaction/communication modes)
* Some sentence planning (choosing the information to include in the utterance)
* Natural language and graphical user interface realization (producing some form of output)
The MUG system does these three jobs in parallel. MUG Workbench can be used to inspect the data structures
used during generation, which should help you learn more about the nature of unification grammars used
for parsing or natural language generation. MUG Workbench is also helpful for debugging your grammars.

== NaturalOWL ==
[http://www.aueb.gr/users/ion/software/NaturalOWL1.1.tar.gz NaturalOWL (version 1.1)]

NaturalOWL generates descriptions of entities and classes from OWL ontologies that have been annotated with linguistic and user modelling resources expressed in RDF. It currently supports English and Greek; extensions for other languages are welcome. NaturalOWL can also be used as a [http://protege.stanford.edu/ Protégé] plug-in. See [http://www.aueb.gr/users/ion/publications.html here] for publications describing NaturalOWL. License: GPL.

== NLGen and NLGen2 ==
https://launchpad.net/nlgen

https://launchpad.net/nlgen2

The NLGen natural language generation system applies the [http://www.opencog.org/wiki/SegSim SegSim strategy] for generating English sentences. Probabilistic inference for sentence construction is based on a statistical analysis of [http://opencog.org/wiki/RelEx RelEx] output. Java, Apache license. See demo: [http://novamente.net/example/nlp.html Demo of AI Virtual Pet Answering Simple Questions].

NLGen2 uses [http://opencog.org/wiki/RelEx RelEx] dependency parses, together with [http://www.abisource.com/projects/link-grammar/ Link Grammar] linkage analysis to generate English-language output. Java, Apache license. Reference: Blake Lemoine, "[http://www.louisiana.edu/~bal2277/NLGen2.doc NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System]".

== OpenCCG ==
http://openccg.sourceforge.net/

OpenCCG is both a parser and a realizer for [[Combinatory Categorial Grammar]]. It has been used in several dialog systems. The realizer has been enhanced with n-gram models and a supertagging approach called hypertagging. OpenCCG is implemented in Java, and is freely available under the LGPL.

== SimpleNLG ==

https://github.com/simplenlg/simplenlg (English)

http://www-etud.iro.umontreal.ca/~vaudrypl/snlgbil/snlgEnFr_english.html (French)

https://github.com/citiususc/SimpleNLG-GL (Galician)

https://github.com/citiususc/SimpleNLG-ES (Spanish)

SimpleNLG is a simple Java-based realiser. Its grammatical coverage and syntactic knowledge are limited compared to KPML or FUF/SURGE. However, because it is so simple, it is relatively
easy for people to learn how to use it. It has a Java API, and can be used from other languages via an XML interface. There are "unofficial" ports to other programming languages such as Python and Ruby. Versions for other human languages are being worked on, including [https://aclweb.org/anthology/W18-6508 Dutch], [https://github.com/alexmazzei/SimpleNLG-IT Italian], and [https://aclweb.org/anthology/papers/W/W18/W18-6506/ Mandarin].

== SPUD ==
http://www.cs.rutgers.edu/~mdstone/nlg.html

SPUD (Sentence Planner Using Descriptions) is Matthew Stone's LTAG-based NLG system. There are two versions: SPUD version 0.01 was written in SML, while later versions, known as SPUD lite, are written in Prolog. The small codebase of SPUD lite makes it ideal for teaching, but it is also used in dialog system prototypes.

== STANDUP ==
https://www.abdn.ac.uk/ncs/departments/computing-science/standup-315.php

STANDUP (System To Augment Non-speakers' Dialogue Using Puns) was a collaborative project that built software, with a graphical user interface suitable for non-speaking children, for generating simple jokes. The project began in October 2003 and ran until March 2007. The software was written in Java and is available for Windows and Linux, including source code and database files.

== Suregen-2 ==
http://www.suregen.de/00023.html

Suregen is “a hybrid, multilingual (German, English) ontology based and NLG-oriented formalism for generating text for documents in clinical medicine.”
The system Suregen-2 is written in (Allegro) Common Lisp. A [http://www.suregen.de/ftp/standalone1.zip demo system] which runs under Windows is available for download. A [http://www.suregen.de/ftp/selfrunningdemo.zip screencast video] shows data being entered into computer forms using mouse and keyboard while a feedback text is continually updated and shown below. (Try playing the AVI file in [http://www.videolan.org/vlc/ VLC] if you run into problems.) Perhaps this system could be considered an instance of the [http://en.wikipedia.org/wiki/WYSIWYM_(Meant) WYSIWYM] approach.

[[Category:Software]]
{{SIGGEN Wiki}}

Data sets for NLG

2019-04-11T10:19:58Z

Ereiter: add BLEU data

This page lists sets of structured data to be used as input for natural language generation tasks, or to inform research on NLG.

==Data-to-text/Concept-to-text Generation==
These datasets contain data and corresponding texts based on this data.

=== E2E ===
http://www.macs.hw.ac.uk/InteractionLab/E2E/#data

Crowdsourced restaurant descriptions with corresponding restaurant data. English.
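
The E2E inputs are flat meaning representations made of attribute[value] pairs. As a rough illustration, the following Python sketch assumes inputs of the form <code>name[The Eagle], eatType[coffee shop], ...</code>, parses such a string into a dictionary, and realises it with a hand-written template. The function names <code>parse_mr</code> and <code>realise</code> and the template itself are hypothetical, not part of the dataset's own tooling, and they cover only a few attributes.

```python
import re

# Matches one "attribute[value]" pair. Attribute names may contain spaces
# (e.g. "customer rating"); values may not contain "]".
MR_PAIR = re.compile(r"([^,\[\]]+)\[([^\]]*)\]")


def parse_mr(mr: str) -> dict[str, str]:
    """Parse an E2E-style meaning representation into an attribute -> value dict."""
    return {attr.strip(): value.strip() for attr, value in MR_PAIR.findall(mr)}


def realise(slots: dict[str, str]) -> str:
    """Hypothetical toy template realiser covering only a few attributes."""
    sentence = f"{slots['name']} is a {slots.get('eatType', 'restaurant')}"
    if "food" in slots:
        sentence += f" serving {slots['food']} food"
    if "area" in slots:
        sentence += f" in the {slots['area']} area"
    sentence += "."
    if "near" in slots:
        sentence += f" It is near {slots['near']}."
    return sentence


mr = "name[The Eagle], eatType[coffee shop], food[French], area[riverside], near[Burger King]"
print(realise(parse_mr(mr)))
# The Eagle is a coffee shop serving French food in the riverside area. It is near Burger King.
```

A data-driven system would instead learn this mapping from the crowdsourced texts paired with each input.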

=== SUMTIME ===
https://ehudreiter.files.wordpress.com/2016/12/sumtime.zip

Weather forecasts written by human forecasters, with corresponding forecast data, for UK North Sea marine forecasts.

=== WeatherGov ===
https://cs.stanford.edu/~pliang/data/weather-data.zip

Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.

=== WebNLG ===
https://github.com/ThiagoCF05/webnlg

Crowdsourced descriptions of semantic web entities, with corresponding RDF triples.
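
WebNLG inputs are sets of RDF triples, commonly written as <code>subject | property | object</code>. As a rough illustration, assuming that notation, the Python sketch below verbalises a single triple, using a hand-written template where one exists and otherwise splitting the camelCase property name. The <code>TEMPLATES</code> table and the helper names are hypothetical; real WebNLG inputs group several triples per text, which this sketch does not handle.

```python
import re

# Hypothetical hand-written templates for a few properties; {s} and {o} are
# replaced by the subject and object text.
TEMPLATES = {
    "cityServed": "{s} serves the city of {o}.",
}


def parse_triple(triple: str) -> tuple[str, str, str]:
    """Split a 'subject | property | object' string into its three parts."""
    subj, prop, obj = (part.strip() for part in triple.split("|", 2))
    return subj, prop, obj


def entity_text(entity: str) -> str:
    """Make an RDF resource name readable: 'Aarhus_Airport' -> 'Aarhus Airport'."""
    return entity.replace("_", " ").strip('"')


def property_text(prop: str) -> str:
    """Fallback lexicalisation by splitting camelCase: 'runwayLength' -> 'runway length'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", prop).lower()


def verbalise(triple: str) -> str:
    subj, prop, obj = parse_triple(triple)
    s, o = entity_text(subj), entity_text(obj)
    if prop in TEMPLATES:
        return TEMPLATES[prop].format(s=s, o=o)
    return f"The {property_text(prop)} of {s} is {o}."


print(verbalise('Aarhus_Airport | cityServed | "Aarhus, Denmark"'))
# Aarhus Airport serves the city of Aarhus, Denmark.
print(verbalise("Aarhus_Airport | runwayLength | 2776.0"))
# The runway length of Aarhus Airport is 2776.0.
```

The fallback keeps the output grammatical but stilted, which is one reason the crowdsourced reference texts are useful for learning better lexicalisations.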

== Referring Expressions Generation ==
Referring expression generation is a sub-task of NLG that focuses only on the generation of referring expressions (descriptions) that identify specific entities called targets.
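
A core part of this task, and the focus of the TUNA attribute-selection shared task listed below, is choosing which properties of the target to mention so that it is distinguished from the other objects (distractors). As an illustration, the Python sketch below implements a simplified version of Dale and Reiter's (1995) Incremental Algorithm, a classic attribute-selection baseline: attributes are tried in a fixed preference order, and an attribute is kept if it rules out at least one remaining distractor. The toy furniture domain and the preference order are hypothetical; the original algorithm also always includes the object's type, which this version does not enforce.

```python
from typing import Optional

# An entity is a mapping from attribute names to values, e.g. {"colour": "red"}.
Entity = dict[str, str]


def incremental_algorithm(target: Entity, distractors: list[Entity],
                          preferred: list[str]) -> Optional[Entity]:
    """Simplified Incremental Algorithm for attribute selection.

    Tries attributes in the given preference order and keeps an attribute
    if the target's value rules out at least one remaining distractor.
    Returns the selected attribute-value pairs, or None if the distractors
    cannot all be ruled out with these attributes.
    """
    description: Entity = {}
    remaining = list(distractors)
    for attr in preferred:
        if not remaining:
            break  # target already distinguished
        if attr not in target:
            continue
        value = target[attr]
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):  # this value rules out a distractor
            description[attr] = value
            remaining = survivors
    return description if not remaining else None


# Hypothetical toy domain in the style of a furniture scene.
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "large"},
    {"type": "sofa", "colour": "red", "size": "small"},
]
print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
# {'type': 'chair', 'colour': 'red'}, i.e. "the red chair"
```

Because the algorithm never backtracks, the preference order determines which description is produced; corpora such as TUNA and GRE3D3/GRE3D7 make it possible to compare such choices with what people actually say.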

=== GRE3D3 and GRE3D7: Spatial Relations in Referring Expressions ===
http://jetteviethen.net/research/spatial.html

Two web-based production experiments were conducted by Jette Viethen under the supervision of Robert Dale.
The resulting corpora GRE3D3 and GRE3D7 contain 720 and 4480 referring expressions, respectively. Each referring expression describes a simple object in a simple 3D scene. GRE3D3 scenes contain 3 objects and GRE3D7 scenes contain 7 objects.

=== RefClef, RefCOCO, RefCOCO+ and RefCOCOg ===
https://github.com/lichengunc/refer

Referring expressions for objects in images (ImageCLEF photographs for RefClef; MS COCO images for RefCOCO, RefCOCO+ and RefCOCOg), together with the corresponding images.

=== The REAL dataset ===
https://datastorre.stir.ac.uk/handle/11667/82

Referring expressions for objects in images, and the corresponding images.

=== GeoDescriptors ===
https://gitlab.citius.usc.es/alejandro.ramos/geodescriptors

Geographical descriptions (e.g., "Norte de Galicia", "north of Galicia") and the corresponding regions on a map.

=== TUNA Reference Corpus ===
https://www.abdn.ac.uk/ncs/departments/computing-science/corpus-496.php

https://www.abdn.ac.uk/ncs/documents/corpus.zip [direct download]

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008). Main authors: Kees van Deemter, Albert Gatt, Ielka van der Sluis.

=== COCONUT Corpus ===
http://www.pitt.edu/~coconut/coconut-corpus.html

http://www.pitt.edu/%7Ecoconut/corpora/corpus.tar.gz [direct download]

COCONUT was a project on “Cooperative, coordinated natural language utterances”. The COCONUT corpus is a collection of computer-mediated dialogues in which two subjects collaborate on a simple task, namely buying furniture. SGML annotations were added according to the [http://www.pitt.edu/%7Epjordan/papers/coconut-manual.pdf COCONUT-DRI coding scheme].

== Dialogue Systems ==

=== CLASSiC WOZ corpus on Information Presentation in Spoken Dialogue Systems ===
http://www.classic-project.org/corpora

CLASSiC is a project on [http://www.classic-project.org/ Computational Learning in Adaptive Systems for Spoken Conversation]. The Wizard-of-Oz corpus on Information Presentation in Spoken Dialogue Systems contains the wizards' choices of Information Presentation strategy (summary, compare, recommend, or a combination of these) and of attributes. The domain is restaurant search in Edinburgh. Objective measures (such as dialogue length, number of database hits, and number of sentences generated) as well as subjective measures (the user scores) were logged.

== Other ==
=== PIL: Patient Information Leaflet corpus ===
http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/

http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL-corpus-2.0.tar.gz [direct download]

The Patient Information Leaflet (PIL) corpus is a [http://www.itri.brighton.ac.uk/projects/pills/corpus/PIL/searchtool/search.html searchable] and [http://mcs.open.ac.uk/nlg/old_projects/pills/corpus/PIL/ browsable] collection of patient information leaflets available in various document formats as well as structurally annotated SGML. The PIL corpus was initially developed as part of the ICONOCLAST project at ITRI, Brighton.

=== Validity of BLEU Evaluation Metric ===
https://abdn.pure.elsevier.com/en/datasets/data-for-structured-review-of-the-validity-of-bleu

https://abdn.pure.elsevier.com/files/125166547/bleu_survey_data.zip [direct download]

Correlations between BLEU and human evaluations (for MT as well as NLG), extracted from papers in the ACL Anthology.

[[Category:Knowledge Collections and Datasets]]
{{SIGGEN Wiki}}