Extending ImageNet to Arabic using Arabic WordNet

Abdulkareem Alsudais


Abstract
ImageNet contains millions of images labeled with English WordNet synsets. This paper investigates extending ImageNet to Arabic using Arabic WordNet. The objective is to determine whether Arabic synsets can be found for the synsets used in ImageNet. The primary finding is that Arabic synsets were identified for 1,219 of the 21,841 synsets used in ImageNet, covering 1.1 million images. By leveraging the parent-child structure of ImageNet synsets, the dataset is extended to 10,462 synsets (7.1 million images) that have an Arabic label, either a direct match or a direct hypernym, and to 17,438 synsets (11 million images) when a hypernym of a hypernym is also allowed. When all hypernyms of a node are considered, an Arabic synset is found for all but four synsets. This is the major contribution of the work: a dataset with Arabic labels for 99.9% of the images in ImageNet.
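The hypernym fallback described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy hierarchy, the Arabic labels, and the function name `nearest_arabic_label` are all hypothetical, chosen only to show how a synset without an Arabic match could inherit the label of its closest labeled ancestor.

```python
from typing import Dict, List, Optional, Tuple

# Toy data invented for illustration; not taken from ImageNet or Arabic WordNet.
PARENTS: Dict[str, List[str]] = {
    "husky": ["sled_dog"],
    "sled_dog": ["working_dog"],
    "working_dog": ["dog"],
    "dog": ["animal"],
    "animal": [],
}
ARABIC: Dict[str, str] = {"dog": "كلب", "animal": "حيوان"}


def nearest_arabic_label(
    synset: str,
    parents: Dict[str, List[str]],
    arabic: Dict[str, str],
    max_levels: Optional[int] = None,
) -> Optional[Tuple[str, int]]:
    """Return (label, levels_up) for the closest labeled synset, or None.

    levels_up is 0 for a direct match, 1 for a direct hypernym,
    2 for a hypernym of a hypernym, and so on. max_levels=None
    means every ancestor is considered.
    """
    frontier = [synset]
    seen = set()
    level = 0
    while frontier and (max_levels is None or level <= max_levels):
        # Check the current level before climbing any higher.
        for s in frontier:
            if s in arabic:
                return arabic[s], level
        seen.update(frontier)
        # Move one level up, skipping synsets that were already visited.
        frontier = [p for s in frontier for p in parents.get(s, []) if p not in seen]
        level += 1
    return None
```

Setting `max_levels` to 0, 1, or 2 corresponds to the three coverage tiers in the abstract (direct match, direct hypernym, hypernym of a hypernym), while leaving it unset corresponds to considering all hypernyms of a node.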
Anthology ID:
2020.alvr-1.1
Volume:
Proceedings of the First Workshop on Advances in Language and Vision Research
Month:
July
Year:
2020
Address:
Online
Editors:
Xin Wang, Jesse Thomason, Ronghang Hu, Xinlei Chen, Peter Anderson, Qi Wu, Asli Celikyilmaz, Jason Baldridge, William Yang Wang
Venue:
ALVR
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/2020.alvr-1.1
DOI:
10.18653/v1/2020.alvr-1.1
Cite (ACL):
Abdulkareem Alsudais. 2020. Extending ImageNet to Arabic using Arabic WordNet. In Proceedings of the First Workshop on Advances in Language and Vision Research, pages 1–6, Online. Association for Computational Linguistics.
Cite (Informal):
Extending ImageNet to Arabic using Arabic WordNet (Alsudais, ALVR 2020)
PDF:
https://aclanthology.org/2020.alvr-1.1.pdf
Video:
http://slideslive.com/38929757
Code:
alsudais/ImageNet_to_AWN