Abstract
In this paper, we present a quantum-inspired interactive video search engine (QIVISE), which will be tested in VBS2023. QIVISE aims to help the user handle Known-Item Search and Ad-hoc Video Search tasks efficiently and accurately. QIVISE builds on a text-image encoder to obtain multi-modal embeddings and offers multiple interaction options, including a novel quantum-inspired interaction paradigm, label search, and multi-modal search, which refine the retrieval results through the user's interaction and feedback.
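As a rough illustration of embedding-based retrieval with feedback, the sketch below ranks keyframes by cosine similarity to a query embedding and then refines the query with a classical Rocchio-style update. It is not the QIVISE implementation and does not model the quantum-inspired interaction. The toy three-dimensional vectors stand in for the output of a text-image encoder, and the names `rank`, `refine`, `frames`, and the weights `alpha`, `beta`, `gamma` are assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank(query, frames):
    """Return frame ids sorted by descending similarity to the query."""
    return sorted(frames, key=lambda fid: cosine(query, frames[fid]), reverse=True)


def refine(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update (a stand-in technique, assumed for illustration):
    move the query toward liked embeddings and away from disliked ones."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos, neg = mean(liked), mean(disliked)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Hypothetical embeddings; a real system would obtain these from an encoder.
frames = {
    "shot_a": [1.0, 0.0, 0.0],
    "shot_b": [0.6, 0.8, 0.0],
    "shot_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.2, 0.0]

first = rank(query, frames)
refined = refine(query, liked=[frames["shot_c"]], disliked=[frames["shot_a"]])
second = rank(refined, frames)
print(first)
print(second)
```

In this toy run, marking one shot as relevant and another as irrelevant shifts the query vector, so the second ranking differs from the first. An interactive system could repeat this loop after each round of user feedback.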
W. Song, J. He and X. Li contributed equally to this work.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. U1903214, 61876135) and by the Ministry of Education Industry-University Cooperation and Collaborative Education Projects (No. 202102246004). The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Song, W., He, J., Li, X., Feng, S., Liang, C. (2023). QIVISE: A Quantum-Inspired Interactive Video Search Engine in VBS2023. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_52
DOI: https://doi.org/10.1007/978-3-031-27077-2_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science (R0)