%0 Conference Proceedings
%T Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation
%A Zhou, Mingyang
%A Arnold, Josh
%A Yu, Zhou
%Y Inui, Kentaro
%Y Jiang, Jing
%Y Ng, Vincent
%Y Wan, Xiaojun
%S Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
%D 2019
%8 November
%I Association for Computational Linguistics
%C Hong Kong, China
%F zhou-etal-2019-building
%X Reinforcement learning (RL) is an effective approach to learn an optimal dialog policy for task-oriented visual dialog systems. A common practice is to apply RL on a neural sequence-to-sequence (seq2seq) framework, with the action space being the output vocabulary of the decoder. However, it is difficult to design a reward function that balances learning an effective policy with generating natural dialog responses. This paper proposes a novel framework that alternately trains an RL policy for image guessing and a supervised seq2seq model to improve dialog generation quality. We evaluate our framework on the GuessWhich task, where it achieves state-of-the-art performance in both task completion and dialog quality.
%R 10.18653/v1/D19-1014
%U https://aclanthology.org/D19-1014
%U https://doi.org/10.18653/v1/D19-1014
%P 143-153