
Interactive Search vs. Automatic Search: An Extensive Study on Video Retrieval

Published: 11 May 2021

Abstract

This article conducts a user evaluation to study the performance difference between interactive and automatic video search. In particular, the study provides empirical insights into how the performance landscape of video search changes when tens of thousands of concept detectors are freely available to exploit for query formulation. We compare three types of search modes: free-to-play (i.e., searching from scratch), non-free-to-play (i.e., searching by inspecting results provided by automatic search), and automatic search, covering both concept-free and concept-based retrieval paradigms. The study involves a total of 40 participants; each performs interactive search over 15 queries of varying difficulty levels using two search modes on the IACC.3 dataset provided by the TRECVid organizers. The results suggest that the performance of automatic search still lags far behind interactive search. Furthermore, providing users with the results of automatic search for exploration shows no obvious advantage over asking users to search from scratch. The study also analyzes user behavior to reveal how users compose queries, browse results, and discover new query terms for search, which can serve as a guideline for future research on both interactive and automatic search.
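The comparison between search modes rests on a retrieval-performance score; TRECVid ad-hoc video search runs are conventionally reported with (mean) average precision over the evaluated queries. As a rough illustration only, and not the paper's exact evaluation protocol (TRECVid uses a sampled, extended inferred AP variant), plain AP and mAP can be sketched as:

```python
def average_precision(ranked, relevant):
    """AP of one ranked result list against the set of relevant items.

    Precision is accumulated at each rank where a relevant item appears,
    then normalized by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a run that retrieves the two relevant items at ranks 1 and 3 scores (1/1 + 2/3) / 2 ≈ 0.83 for that query.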




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 2
May 2021
410 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3461621

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2021
Accepted: 01 October 2020
Revised: 01 August 2020
Received: 01 April 2020
Published in TOMM Volume 17, Issue 2


Author Tags

  1. Video retrieval
  2. ad hoc video search
  3. automatic search
  4. interactive search
  5. user study

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Research Grants Council of the Hong Kong Special Administrative Region, China


Cited By

  • (2024) Performance Evaluation in Multimedia Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 21, 1, 1--23. DOI: 10.1145/3678881. Online publication date: 14-Oct-2024.
  • (2024) Application Analysis of Music Video Retrieval Technology Based on Dynamic Programming in Piano Performance Teaching. Journal of Information & Knowledge Management 23, 04. DOI: 10.1142/S0219649224500527. Online publication date: 13-Jun-2024.
  • (2023) BDNet: A method based on forward and backward convolutional networks for action recognition in videos. The Visual Computer: International Journal of Computer Graphics 40, 6, 4133--4147. DOI: 10.1007/s00371-023-03073-9. Online publication date: 17-Oct-2023.
  • (2022) Feature Extraction of Fused Residual Network and Single Target-Assisted Vessel Image Recognition of MRF Grayscale Information. Mobile Information Systems 2022. DOI: 10.1155/2022/1041934. Online publication date: 1-Jan-2022.
  • (2022) Interactive Video Corpus Moment Retrieval using Reinforcement Learning. In Proceedings of the 30th ACM International Conference on Multimedia, 296--306. DOI: 10.1145/3503161.3548277. Online publication date: 10-Oct-2022.
  • (2022) Data Driven Disaggregation Method for Electricity Based Energy Consumption for Smart Homes. Journal of Physics: Conference Series 2385, 1, 012006. DOI: 10.1088/1742-6596/2385/1/012006. Online publication date: 1-Dec-2022.
  • (2022) Evaluating a Bayesian-like relevance feedback model with text-to-image search initialization. Multimedia Tools and Applications 82, 15, 22305--22341. DOI: 10.1007/s11042-022-14046-w. Online publication date: 4-Nov-2022.
  • (2022) AI-big data analytics for building automation and management systems: A survey, actual challenges and future perspectives. Artificial Intelligence Review 56, 6, 4929--5021. DOI: 10.1007/s10462-022-10286-2. Online publication date: 15-Oct-2022.
  • (2021) Exploring the Strengths of Neural Codes for Video Retrieval. In Machine Learning, Advances in Computing, Renewable Energy and Communication, 519--531. DOI: 10.1007/978-981-16-2354-7_46. Online publication date: 20-Aug-2021.
