Abstract
There is an increasing need for intelligent interaction with media collections, and mobile phones are gaining significant traction as the device of choice for many users. In this paper, we present XQM, a mobile approach for intelligent interaction with the user’s media on the phone, tackling the inherent challenges of the highly dynamic nature of mobile media collections and limited computational resources of the mobile device. We employ interactive learning, a method that conducts interaction rounds with the user, each consisting of the system suggesting relevant images based on its current model, the user providing relevance labels, the system’s model retraining itself based on these labels, and the system obtaining a new set of suggestions for the next round. This method is suitable for the dynamic nature of mobile media collections and the limited computational resources. We show that XQM, a full-fledged app implemented for Android, operates on 10K image collections in interactive time (less than 1.4 s per interaction round), and evaluate user experience in a user study that confirms XQM’s effectiveness.
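The interaction round described above can be sketched in a few lines of Python. This is a minimal illustration, not XQM's implementation: the centroid-difference scorer stands in for whatever classifier the app actually trains, and every name here (`train_centroid_model`, `suggest`, `run_session`, `is_relevant`) is hypothetical. The random fallback follows Note 2, which states that random images are shown until at least one positive and one negative example exist.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def train_centroid_model(features: Dict[int, Vector],
                         positives: Set[int],
                         negatives: Set[int]) -> List[float]:
    """Stand-in model (an assumption): mean positive vector minus mean negative vector."""
    dim = len(next(iter(features.values())))

    def mean(ids: Set[int]) -> List[float]:
        return [sum(features[i][d] for i in ids) / len(ids) for d in range(dim)]

    return [p - n for p, n in zip(mean(positives), mean(negatives))]


def score(weights: Vector, vector: Vector) -> float:
    """Linear relevance score of one image under the current model."""
    return sum(w * x for w, x in zip(weights, vector))


def suggest(features: Dict[int, Vector], positives: Set[int],
            negatives: Set[int], k: int, rng: random.Random) -> List[int]:
    """Return up to k images that have not been judged yet."""
    unseen = sorted(set(features) - positives - negatives)
    # Per Note 2: without one positive and one negative, fall back to random images.
    if not positives or not negatives:
        return rng.sample(unseen, min(k, len(unseen)))
    # Retrain on all labels so far, then rank the unseen images by score.
    weights = train_centroid_model(features, positives, negatives)
    unseen.sort(key=lambda i: score(weights, features[i]), reverse=True)
    return unseen[:k]


def run_session(features: Dict[int, Vector],
                is_relevant: Callable[[int], bool],
                rounds: int, k: int, seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Run interaction rounds: suggest, collect labels, retrain, suggest again."""
    rng = random.Random(seed)
    positives: Set[int] = set()
    negatives: Set[int] = set()
    for _ in range(rounds):
        shown = suggest(features, positives, negatives, k, rng)
        if not shown:
            break  # every image has been judged
        for image_id in shown:
            # The callback plays the role of the user's relevance judgment.
            (positives if is_relevant(image_id) else negatives).add(image_id)
    return positives, negatives
```

In a real session the `is_relevant` callback would be replaced by the user's taps on the suggested images, and the feature vectors would come from a visual model rather than a hand-built dictionary.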
Notes
1. XQM is an acronym of Exquisitor Mobile, as the design of XQM relies heavily on Exquisitor, the state-of-the-art interactive learning system [10]. The XQM app is available to the research community at www.github.com/ITU-DASYALab/XQM.
2. In the current implementation, at least one positive and one negative example are needed; until these have been identified, random images replace the judged images.
3. Loading random images is useful when the model is missing positive examples with concepts that have not yet been seen; in a future version we plan to implement search functionality to further help find positive examples.
References
Aggarwal, C., Kong, X., Gu, Q., Han, J., Yu, P.: Active learning: a survey. In: Data Classification, pp. 571–605. CRC Press (2014)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of SCG, pp. 253–262 (2004)
Ensor, A., Hall, S.: GPU-based image analysis on mobile devices. CoRR abs/1112.3110 http://arxiv.org/abs/1112.3110 (2011)
Geirhos, R., Temme, C.R.M., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. In: Proceedings of NIPS, pp. 7538–7550 (2018)
Guðmundsson, G.Þ., Amsaleg, L., Jónsson, B.Þ.: Impact of storage technology on the efficiency of cluster-based high-dimensional index creation. In: Proceedings of DASFAA, pp. 53–64 (2012)
Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE PAMI 33(1), 117–128 (2010)
Khan, O.S., et al.: Interactive learning for multimedia at large. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 495–510. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_33
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, pp. 1097–1105 (2012)
Nielsen, J.: 10 usability heuristics for user interface design (1995). https://www.nngroup.com/articles/ten-usability-heuristics/. Accessed 25 Mar 2020
Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison (2009)
Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE PAMI 22(12), 1349–1380 (2000)
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of CVPR, pp. 1–9 (2015)
Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of CVPR (2017)
Zahálka, J., Rudinac, S., Jónsson, B., Koelma, D., Worring, M.: Blackthorn: large-scale interactive multimodal learning. IEEE TMM 20, 687–698 (2018)
Zahálka, J., Worring, M.: Towards interactive, intelligent, and integrated multimedia analytics. In: Proceedings of IEEE VAST, pp. 3–12 (2014)
Zhou, X., Huang, T.: Relevance feedback in image retrieval: a comprehensive review. Multimed. Syst. 8, 536–544 (2003)
Acknowledgments
This work was supported by a PhD grant from the IT University of Copenhagen and by the European Regional Development Fund (project Robotics for Industry 4.0, CZ.02.1.01/0.0/0.0/15 003/0000470). Thanks to Dennis C. Koelma for his help with adopting the ResNeXt-101 model.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bagi, A.M., Schild, K.I., Khan, O.S., Zahálka, J., Jónsson, B.Þ. (2021). XQM: Interactive Learning on Mobile Phones. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_24
DOI: https://doi.org/10.1007/978-3-030-67835-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)