DOI: 10.1145/3297156.3297244
research-article

Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval

Published: 08 December 2018

Abstract

Heterogeneous data embedding is the process of mapping different kinds of data into a common vector space of a given dimension. Image-text embedding, in particular, maps image data and text data, which have completely different characteristics, into such a common vector space. In this paper, we propose an image-text embedding method that uses hierarchical knowledge, namely the coarse and fine labels of the text data. The proposed method improves the training efficiency of the embedding model by keeping the coarse label vectors fixed. In addition, the loss function is designed by arbitrarily selecting the negative sample from among the fine labels that belong to the same coarse label as the positive, so that the distance between the vectors of fine labels sharing a coarse label becomes larger. As a result, when images (visual data) are mapped into the common vector space, their semantics become clearer. Experimental results show that the proposed method successfully learns an embedding that incorporates hierarchical knowledge and that the resulting embedding model supports efficient cross-modal retrieval.
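The abstract does not give the exact form of the loss, so the following is only a minimal sketch under stated assumptions. It assumes a margin-based hinge (triplet-style) loss over cosine similarity and illustrates the hierarchy-aware negative sampling rule described above: the negative is drawn from the fine labels that share the positive's coarse label. It also shows image-to-text retrieval as a nearest-neighbour ranking of label vectors in the common space. The toy hierarchy (loosely modeled on CIFAR-100-style superclasses), the label names, `margin`, and every function name are hypothetical; the learned image encoder and the fixed coarse label vectors themselves are not modeled.

```python
import math
import random

# Hypothetical toy hierarchy: coarse label -> fine labels (illustration only).
HIERARCHY = {
    "vehicle": ["bus", "train", "bicycle"],
    "flower": ["rose", "tulip", "orchid"],
}

# Reverse lookup: fine label -> its coarse (parent) label.
PARENT = {fine: coarse for coarse, fines in HIERARCHY.items() for fine in fines}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_sibling_negative(fine_label, rng):
    """Randomly pick a negative fine label that shares the coarse label of `fine_label`."""
    siblings = [f for f in HIERARCHY[PARENT[fine_label]] if f != fine_label]
    return rng.choice(siblings)


def hinge_loss(image_vec, fine_label, label_vecs, rng, margin=0.1):
    """Assumed margin ranking loss: the image embedding should be closer to its own
    fine label vector than to a sibling fine label vector by at least `margin`."""
    negative = sample_sibling_negative(fine_label, rng)
    pos_sim = cosine(image_vec, label_vecs[fine_label])
    neg_sim = cosine(image_vec, label_vecs[negative])
    return max(0.0, margin - pos_sim + neg_sim)


def retrieve_labels(image_vec, label_vecs, k=3):
    """Image-to-text retrieval: return the k label names most similar to the image."""
    ranked = sorted(
        label_vecs,
        key=lambda name: cosine(image_vec, label_vecs[name]),
        reverse=True,
    )
    return ranked[:k]
```

Usage: pass a dict mapping each fine label to its vector, a `random.Random` instance for reproducible sampling, and an image embedding of the same dimension. Because negatives come only from siblings (for example "train" or "bicycle" for "bus"), the loss pushes apart exactly the fine labels that a coarse-level objective would otherwise leave close together.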


    Information

    Published In

    ACM Other conferences
    CSAI '18: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence
    December 2018
    641 pages
    ISBN:9781450366069
    DOI:10.1145/3297156
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Shenzhen University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Cross-modal Retrieval
    2. Heterogeneous Data Embedding
    3. Hierarchical Knowledge
    4. Image Text Embedding

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAI '18

    Bibliometrics & Citations

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 96
    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 13 Feb 2025
