Edgar Simo-Serra


2017

Multi-Modal Fashion Product Retrieval
Antonio Rubio Romano | LongLong Yu | Edgar Simo-Serra | Francesc Moreno-Noguer
Proceedings of the Sixth Workshop on Vision and Language

Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to perform retrieval effectively in this space. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset.

2016

Structured Prediction with Output Embeddings for Semantic Image Annotation
Ariadna Quattoni | Arnau Ramisa | Pranava Swaroop Madhyastha | Edgar Simo-Serra | Francesc Moreno-Noguer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies