Abstract
The Internet of Things (IoT) is one of the major emerging networking technologies. With the IoT, different items or devices can continuously generate, obtain, and exchange information. Video sensor networks have gradually become a research hotspot in the field of wireless sensor networks, and their rich perceptual information is conducive to target positioning and tracking. This paper presents a novel model for object tracking with IoT video sensors, based on a deep reinforcement learning (RL) algorithm and a spatial-temporal context learning algorithm. The model directly predicts the bounding box location of the target in every successive frame of a surveillance video. Crucially, this task is tackled end to end. Tracking can be treated as a sequential decision-making process in which the semantic encoding of past frames is highly relevant to future decisions. A recurrent convolutional neural network is therefore adopted as the agent in this model, with the important insight that it can interact with the video over time. To maximize tracking performance and make good use of continuous inter-frame correlation over the long term, this paper harnesses the power of deep RL. Specifically, the Spatial-Temporal Context learning (STC) algorithm is added to the model to make tracking more efficient. The proposed tracking model demonstrates good performance on an existing tracking benchmark.
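The sequential decision-making view described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the intersection-over-union (IoU) reward, the `policy` callable standing in for the recurrent convolutional agent, the `(x, y, width, height)` box layout, and the discount factor `gamma`. The sketch shows only how an agent that emits one bounding box per frame could be scored over a whole episode, which is the quantity a policy-gradient method would try to maximize.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical box layout: (x, y, width, height) of the top-left-anchored box.
Box = Tuple[float, float, float, float]
# Hypothetical agent interface: (frame, hidden_state) -> (predicted_box, new_hidden_state).
Policy = Callable[[Any, Any], Tuple[Box, Any]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (assumed reward signal)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def run_episode(frames: Sequence[Any], targets: Sequence[Box], policy: Policy) -> List[float]:
    """Let the agent predict one box per frame and collect a per-frame reward."""
    state = None  # recurrent state carried across frames
    rewards = []
    for frame, target in zip(frames, targets):
        box, state = policy(frame, state)
        rewards.append(iou(box, target))
    return rewards


def discounted_returns(rewards: Sequence[float], gamma: float = 0.9) -> List[float]:
    """Return-to-go for each step: r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # A trivial stand-in policy that always predicts the same box.
    def static_policy(frame, state):
        return (0.0, 0.0, 2.0, 2.0), state

    targets = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 2.0)]
    rewards = run_episode(frames=[None, None], targets=targets, policy=static_policy)
    print(rewards, discounted_returns(rewards))
```

In a trained tracker, the stand-in policy would be replaced by a learned network and the discounted returns would weight the gradient of the log-probability of each predicted box. That training step is deliberately left out of this sketch.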
Acknowledgements
This research was supported by the Shanghai Science and Technology Innovation Action Plan Project (16111107502, 17511107203) and the Shanghai Key Lab of Modern Optical Systems.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
He, P., Wu, C., Liu, K., Xiong, N.N. (2021). Deep Reinforcement Learning Based on Spatial-Temporal Context for IoT Video Sensors Object Tracking. In: Qiu, M. (eds) Smart Computing and Communication. SmartCom 2020. Lecture Notes in Computer Science, vol 12608. Springer, Cham. https://doi.org/10.1007/978-3-030-74717-6_24
DOI: https://doi.org/10.1007/978-3-030-74717-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74716-9
Online ISBN: 978-3-030-74717-6
eBook Packages: Computer Science, Computer Science (R0)