Abstract
This paper presents the most recent additions to the vitrivr retrieval stack, which will be put to the test in the context of the 2019 Video Browser Showdown (VBS). The vitrivr stack has been extended by approaches for detecting, localizing, or describing concepts and actions in video scenes using various convolutional neural networks. Leveraging those additions, we have added support for searching the video collection based on semantic sketches. Furthermore, vitrivr offers new types of labels for text-based retrieval. In the same vein, we have also improved upon vitrivr’s pre-existing capabilities for extracting text from video through scene text recognition. Moreover, the user interface has received a major overhaul so as to make it more accessible to novice users, especially for query formulation and result exploration.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abadi, M., Barham, P., Chen, J., et al.: Tensorflow: a system for large-scale machine learning. In: Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), vol. 16, pp. 265–283. USENIX, Savannah, GA, USA (2016)
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany (2018, page to appear)
Cobârzan, C., et al.: Interactive video search tools: a detailed analysis of the video browser showdown 2015. Multimedia Tools and Appl. (MTAP) 76(4), 5539–5571 (2017)
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3223. IEEE, Las Vegas (2016)
Mark Everingham, S.M., Eslami, A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. (IJCV) 111(1), 98–136 (2015)
Furuta, R., Inoue, N., Yamasaki, T.: Efficient and interactive spatial-semantic image retrieval. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10704, pp. 190–202. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73603-7_16
Giangreco, I., Schuldt, H.: ADAM\(_{pro}\): database support for big multimedia retrieval. Datenbank-Spektrum 16(1), 17–26 (2016)
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227, pp. 1–10 (2014)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.-F.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1725–1732. IEEE, Columbus (2014)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. (JMLR) 9(Nov), 2579–2605 (2008)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, pp. 1–12 (2013)
Rossetto, L., Giangreco, I., Gasser, R., Schuldt, H.: Competitive video retrieval with vitrivr. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 403–406. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73600-6_41
Rossetto, L., et al.: IMOTION – searching for video sequences using multi-shot sketch queries. In: Tian, Q., Sebe, N., Qi, G.-J., Huet, B., Hong, R., Liu, X. (eds.) MMM 2016. LNCS, vol. 9517, pp. 377–382. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27674-8_36
Rossetto, L., Giangreco, I., Schuldt, H.: Cineast: a multi-feature sketch-based video retrieval engine. In: Proceedings of the International Symposium on Multimedia (ISM), pp. 18–23. IEEE, Taichung, December 2014
Rossetto, L., et al.: IMOTION — a content-based video retrieval engine. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds.) MMM 2015. LNCS, vol. 8936, pp. 255–260. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14442-9_24
Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H.: vitrivr: a flexible retrieval stack supporting multiple query modesfor searching in multimedia collections. In: Proceedings of the ACM Conference on Multimedia Conference (ACM MM), pp. 1183–1186. ACM, Amsterdam, October 2016
Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H., Dupont, S., Seddati, O.: Enhanced retrieval and browsing in the IMOTION system. In: Amsaleg, L., Guðmundsson, G.Þ., Gurrin, C., Jónsson, B.Þ., Satoh, S. (eds.) MMM 2017. LNCS, vol. 10133, pp. 469–474. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-51814-5_43
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(11), 2298–2304 (2017)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 4489–4497. IEEE, Santiago (2015)
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(4), 652–663 (2017)
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 4. IEEE, Honolulu (2017)
Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651. IEEE, Honolulu (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Rossetto, L., Amiri Parian, M., Gasser, R., Giangreco, I., Heller, S., Schuldt, H. (2019). Deep Learning-Based Concept Detection in vitrivr. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science(), vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_55
Download citation
DOI: https://doi.org/10.1007/978-3-030-05716-9_55
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer ScienceComputer Science (R0)