Abstract
In this paper, we present a new version of our interactive video retrieval system V-FIRST. Besides the existing features of querying by textual descriptions and visual examples, we propose the usage of an image generator that can generate images from a text prompt as a means to bridge the domain gap. We also include a novel referring expression segmentation module to highlight the objects in an image. This is the first step towards providing adequate explainability to retrieval results, ensuring that the system can be trusted and used in domain-specific and critical scenarios. Searching by a sequence of events is also a new addition, as it proves to be pivotal in finding events from memory. Furthermore, we improved our Optical Character Recognition capability, especially in the case of scene text. Finally, the inclusion of relevant feedback allows the user to explicitly refine the search space. All combined, our system has greatly improved user interaction, leveraging more explicit information and providing more tools for the user to work with.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Hezel, N., Schall, K., Jung, K., Barthel, K.U.: Efficient search and browsing of large-scale video collections with vibro. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 487–492. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_43
Hoang-Xuan, N., et al.: Flexible interactive retrieval SysTem 2.0 for visual lifelog exploration at LSC 2021 Submitted for review
Hoang-Xuan, N., et al.: Flexible interactive retrieval SysTem 3.0 for visual lifelog exploration at LSC 2022. In: Proceedings of the 5th Annual on Lifelog Search Challenge. LSC 2022, pp. 20–26. Association for Computing Machinery (2022). https://doi.org/10.1145/3512729.3533013
Lokoč, J., Mejzlík, F., Souček, T., Dokoupil, P., Peška, L.: Video search with context-aware ranker and relevance feedback. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 505–510. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_46
Nguyen, E.R., Hoang-Xuan, N., Tran, M.T.: Visual-language transformer for referring video object segmentation. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, YouTube-VOS
Nguyen, N., et al.: Dictionary-guided scene text recognition, pp. 7383–7392. https://openaccess.thecvf.com/content/CVPR2021/html/Nguyen_Dictionary-Guided_Scene_Text_Recognition_CVPR_2021_paper.html
Nguyen, T.-N., Puangthamawathanakun, B., Healy, G., Nguyen, B.T., Gurrin, C., Caputo, A.: Videofall - a hierarchical search engine for VBS2022. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 518–523. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_48
Schoeffmann, K., Lokoč, J., Bailer, W.: 10 years of video browser showdown. In: Proceedings of the 2nd ACM International Conference on Multimedia in Asia. MMAsia 2020, pp. 1–3. Association for Computing Machinery (2021). https://doi.org/10.1145/3444685.3450215
Tran, M.-T., et al.: V-FIRST: a flexible interactive retrieval system for video at VBS 2022. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 562–568. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_55
Trang-Trung, H.P., et al.: Flexible interactive retrieval SysTem 2.0 for visual lifelog exploration at LSC 2021. In: Proceedings of the 4th Annual on Lifelog Search Challenge. LSC 2021, Taipei, Taiwan, pp. 81–87. Association for Computing Machinery (2021). https://doi.org/10.1145/3463948.3469072
Acknowledgement
This research was funded by Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hoang-Xuan, N. et al. (2023). V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_54
Download citation
DOI: https://doi.org/10.1007/978-3-031-27077-2_54
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer ScienceComputer Science (R0)