MulCode: A Multiplicative Multi-way Model for Compressing Neural Language Model

Yukun Ma, Patrick H. Chen, Cho-Jui Hsieh


Abstract
It is challenging to deploy deep neural networks on memory-constrained devices due to the explosion in the number of parameters. In particular, the input embedding and Softmax layers usually dominate the memory usage of an RNN-based language model; for example, the input embedding and Softmax matrices for the IWSLT-2014 German-to-English dataset account for more than 80% of the total model parameters. To compress these embedding layers, we propose MulCode, a novel multi-way multiplicative neural compressor. MulCode learns an adaptively created matrix and its multiplicative compositions. Together with a prior weighted loss, MulCode is more effective than state-of-the-art compression methods. On the IWSLT-2014 machine translation dataset, MulCode achieves a 17-times compression rate for the embedding and Softmax matrices, and when combined with a quantization technique, it achieves a 41.38-times compression rate with very little loss in performance.
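
The idea of replacing a large embedding table with small codebooks combined multiplicatively can be illustrated with a minimal PyTorch sketch. This is only an assumption-laden toy (hypothetical class and parameter names, fixed random codes rather than the adaptively learned codes described in the paper), not the authors' MulCode implementation:

import torch
import torch.nn as nn

class MultiplicativeCompositionalEmbedding(nn.Module):
    # Toy sketch: reconstruct each word vector as the elementwise product
    # of M small codebook vectors selected by discrete codes.
    # (Hypothetical names; not the authors' released code.)
    def __init__(self, vocab_size, dim, num_ways=4, codes_per_way=32):
        super().__init__()
        # Each of the M "ways" has its own small codebook of shape (K, dim).
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codes_per_way, dim) for _ in range(num_ways)]
        )
        # Fixed random code assignment here; a real system would learn or
        # construct these codes adaptively, as the abstract describes.
        self.register_buffer(
            "codes", torch.randint(0, codes_per_way, (vocab_size, num_ways))
        )

    def forward(self, word_ids):
        # word_ids: (batch,) -> per-word codes: (batch, num_ways)
        codes = self.codes[word_ids]
        # Look up one small vector per way: (num_ways, batch, dim)
        vecs = torch.stack(
            [cb(codes[:, m]) for m, cb in enumerate(self.codebooks)], dim=0
        )
        # Multiplicative composition: elementwise product over the ways.
        return torch.prod(vecs, dim=0)

With the toy settings vocab_size = 30,000 and dim = 512, the codebooks hold 4 × 32 × 512 = 65,536 floats versus roughly 15.4M for a full embedding table, plus a small integer code table; these figures describe only this sketch, not the compression rates reported in the paper.
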
Anthology ID:
D19-1529
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
5257–5266
URL:
https://aclanthology.org/D19-1529
DOI:
10.18653/v1/D19-1529
Cite (ACL):
Yukun Ma, Patrick H. Chen, and Cho-Jui Hsieh. 2019. MulCode: A Multiplicative Multi-way Model for Compressing Neural Language Model. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5257–5266, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
MulCode: A Multiplicative Multi-way Model for Compressing Neural Language Model (Ma et al., EMNLP-IJCNLP 2019)
PDF:
https://aclanthology.org/D19-1529.pdf