Cross-modal Retrieval Fusing Multilayer Semantics

Computer Science, 2019, Vol. 46, Issue (3): 227-233. doi: 10.11896/j.issn.1002-137X.2019.03.034

FENG Yao-gong, CAI Guo-yong

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received: 2018-02-07 Revised: 2018-05-16 Online: 2019-03-15 Published: 2019-03-22
Corresponding author: CAI Guo-yong (born 1971), male, Ph.D., professor. His main research interest is social media data mining. E-mail: ccgycai@gmail.com
About the author: FENG Yao-gong (born 1992), male, master's degree. His main research interest is cross-modal retrieval. E-mail: fengyaogong@gmail.com
Funding: Supported by the National Natural Science Foundation of China (61763007) and the Natural Science Foundation of Guangxi (2017JJD160017).

Abstract

Abstract: Mining the latent semantic correlations between data of different modalities is the core problem of cross-modal retrieval. Previous work has shown that models which combine representation learning and correlation learning in a single process are well suited to the cross-modal retrieval task, but in existing models of this kind the abstraction levels of the different modalities are linked only by 1-1 correspondences. Because heterogeneous multimodal data do not share exactly the same granularity of abstraction, the correlations between modalities are unlikely to exist only at one designated pair of abstraction layers. This paper therefore proposes a cross-modal retrieval model that fuses multilayer semantics. The model exploits the bidirectional (undirected) structure of the deep Boltzmann machine to associate each abstraction level of the text modality with multiple abstraction layers of the image modality simultaneously, and thus explores the inherent N-M correlations between the abstraction layers of different modalities more fully. Experimental results on three public datasets show that the model outperforms previous comparable cross-modal retrieval models and achieves higher retrieval accuracy.

Key words: Cross-modal, Deep learning, Fusion, Multilayer semantics, Retrieval
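
To make the contrast between 1-1 and N-M layer correspondences concrete, the following minimal Python sketch enumerates both pairing schemes and fuses the per-pair similarities into a single score. It is an illustration only and not the paper's deep Boltzmann machine: the function names, the toy vectors, the assumption that every layer representation has already been projected to a common dimension, and the unweighted mean of cosine similarities are all hypothetical choices made here.

```python
from itertools import product
from math import sqrt


def one_to_one_pairs(n_layers):
    """Pair text layer i only with image layer i (1-1 correspondence)."""
    return [(i, i) for i in range(n_layers)]


def many_to_many_pairs(n_text_layers, n_image_layers):
    """Pair every text layer with every image layer (N-M correspondence)."""
    return list(product(range(n_text_layers), range(n_image_layers)))


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def fused_similarity(text_layers, image_layers, pairs):
    """Average cosine similarity over the chosen (text, image) layer pairs.

    Assumption: an unweighted mean stands in for whatever fusion a real
    model would learn.
    """
    scores = [cosine(text_layers[t], image_layers[m]) for t, m in pairs]
    return sum(scores) / len(scores)


# Toy per-layer representations (hypothetical values, common dimension 2).
# The content that matches sits at different depths in the two modalities.
text_layers = [[1.0, 0.0], [0.0, 1.0]]
image_layers = [[0.0, 1.0], [1.0, 0.0]]

score_1_1 = fused_similarity(text_layers, image_layers, one_to_one_pairs(2))
score_n_m = fused_similarity(text_layers, image_layers, many_to_many_pairs(2, 2))
```

In this toy case the 1-1 scheme scores 0.0 because the related representations lie at mismatched layers, while the N-M scheme scores 0.5 because it also compares the cross-layer pairs.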

CLC Number: TP183

FENG Yao-gong, CAI Guo-yong. Cross-modal Retrieval Fusing Multilayer Semantics[J]. Computer Science, 2019, 46(3): 227-233. https://doi.org/10.11896/j.issn.1002-137X.2019.03.034


