Scalable multimedia retrieval by deep learning hashing with relative similarity learning

L Gao, J Song, F Zou, D Zhang, J Shao
Proceedings of the 23rd ACM International Conference on Multimedia, 2015
Learning-based hashing methods are becoming the mainstream approach to approximate, scalable multimedia retrieval. They consist of two main components: learning hash codes for the training data and learning hash functions for new data points. Considerable effort has been devoted to designing novel methods for these two components, i.e., supervised and unsupervised methods for learning hash codes, and different models for inferring hash functions. However, little work integrates supervised and unsupervised hash code learning into a single framework. Moreover, the hash function learning component is usually based on hand-crafted visual features extracted from the training images. The performance of a content-based image retrieval system depends crucially on the feature representation, and such hand-crafted visual features may degrade the accuracy of the hash functions. In this paper, we propose a semi-supervised deep learning hashing (DLH) method for fast multimedia retrieval. More specifically, in the first component, we use both visual and label information to learn a relative similarity graph that more precisely reflects the relationships among the training data, and then generate the hash codes from the graph. In the second component, we apply a deep convolutional neural network (CNN) to simultaneously learn a good multimedia representation and the hash functions. Extensive experiments on three popular datasets demonstrate the superiority of our DLH over both supervised and unsupervised hashing methods.
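To illustrate why binary hash codes make retrieval fast, the following is a minimal sketch of Hamming-distance ranking, the lookup step that hashing methods in general rely on once codes have been produced. It is not the DLH method itself: the 8-bit codes are made-up toy values, and the helper names (`to_code`, `hamming`, `rank_by_hamming`) are hypothetical and do not come from the paper.

```python
def to_code(bits):
    """Pack a sequence of 0/1 values into an integer (first bit most significant)."""
    code = 0
    for bit in bits:
        code = (code << 1) | int(bit)
    return code


def hamming(a, b):
    """Count the bit positions in which two integer-packed codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database, top_k=3):
    """Return (index, distance) pairs for the top_k codes closest to the query."""
    scored = [(hamming(query, code), idx) for idx, code in enumerate(database)]
    scored.sort()  # ascending distance; ties are broken by database index
    return [(idx, dist) for dist, idx in scored[:top_k]]


# Toy 8-bit codes standing in for the output of a learned hash function (assumed values).
database = [
    to_code([0, 0, 0, 0, 1, 1, 1, 1]),
    to_code([0, 0, 0, 0, 1, 1, 1, 0]),
    to_code([1, 1, 1, 1, 0, 0, 0, 0]),
    to_code([0, 1, 0, 0, 1, 1, 1, 1]),
]
query = to_code([0, 0, 0, 0, 1, 1, 1, 1])
results = rank_by_hamming(query, database, top_k=3)
```

Each comparison is an XOR followed by a bit count, so the cost per database item is small compared with a distance computed on real-valued feature vectors.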