BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

Benjamin Heinzerling, Michael Strube


Anthology ID:
L18-1473
Volume:
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Month:
May
Year:
2018
Address:
Miyazaki, Japan
Editors:
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
https://aclanthology.org/L18-1473
Cite (ACL):
Benjamin Heinzerling and Michael Strube. 2018. BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Cite (Informal):
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages (Heinzerling & Strube, LREC 2018)
PDF:
https://aclanthology.org/L18-1473.pdf
Code:
bheinzerling/bpemb