Krzysztof Wolk


2021

Mining Bilingual Word Pairs from Comparable Corpus using Apache Spark Framework
Sanjanasri Jp | Vijay Krishna Menon | Soman Kp | Krzysztof Wolk
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

Bilingual dictionaries are essential resources for many natural language processing tasks, but resource-scarce and less popular language pairs rarely have them. Efficient automatic methods for inducing bilingual dictionaries are needed because manual resources and effort are scarce for low-resource languages. In this paper, we induce word translations using bilingual embeddings, with the Apache Spark framework for parallel computation. To validate the quality of the generated bilingual dictionary, we use it in a phrase-table-aided Neural Machine Translation (NMT) system. The system performs moderately well with a manual bilingual dictionary; we replace that dictionary with our induced one. The corresponding translated outputs are compared using the Bilingual Evaluation Understudy (BLEU) and Rank-based Intuitive Bilingual Evaluation Score (RIBES) metrics.

2017

PJIIT’s systems for WMT 2017 Conference
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the Second Conference on Machine Translation

2016

PJAIT Systems for the WMT 2016
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

PJAIT systems for the IWSLT 2015 evaluation campaign enhanced by comparable corpora
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

2014

Polish-English speech statistical machine translation systems for the IWSLT 2014
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This research explores the effects of various training settings on Statistical Machine Translation systems between Polish and English for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2014 evaluation campaign, together with Wikipedia-based comparable corpora that we prepared, were used as the basis for training language models and for development, tuning, and testing of the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparation on translation results. Our experiments included systems that use lemma and morphological information on Polish words. We also conducted a deep analysis of the provided Polish data as preparatory work for the automatic data correction and cleaning phase.

2013

Polish-English speech statistical machine translation systems for the IWSLT 2013
Krzysztof Wolk | Krzysztof Marasek
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This research explores the effects of various training settings on a Polish-to-English Statistical Machine Translation system for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2013 evaluation campaign were used as the basis for training language models and for development, tuning, and testing of the translation system. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparation on translation results. Our experiments included systems that use stems and morphological information on Polish words. We also conducted a deep analysis of the provided Polish data as preparatory work for the automatic data correction and cleaning phase.