Abstract
The state of Kerala in India has experienced several intense cyclones in recent years, resulting in heavy flooding. One of the biggest challenges rescuers face is accessing flooded areas and buildings during rescue operations. In such scenarios, unmanned aerial vehicles (UAVs) can deliver reliable aerial visual data to aid rescue planning and operations. Object detectors based on deep learning provide an effective way to automate the detection of relevant information in image and video data, but these models are complex and resource-hungry, which imposes severe speed constraints during field operations. This work develops the pixel displacement algorithm (PDA), a portable and effective technique for speeding up object detection models on resource-limited devices such as edge devices. The method can be integrated with any object detection model to reduce inference time, and it is combined with multiple detection models in this work to demonstrate its effectiveness. The YOLOv4 model combined with the proposed method outperformed the AP50 of the YOLOv4-tiny model by 6% while maintaining the same processing time, and it gave an almost 10× speedup on the Jetson Nano at an accuracy cost of 3% compared to YOLOv4. Further, a model is proposed to predict the maximum pixel shift for a given frame skip from parameters such as the altitude and velocity of the UAV and the tilt of the camera. Accurate prediction of the pixel shift reduces the search area and, in turn, the inference time. The proposed model was tested against annotated locations and was able to predict the search area for each test video segment with a high degree of accuracy.
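To make the idea concrete, the sketch below illustrates the two ingredients the abstract describes: bounding the pixel shift between processed frames from flight parameters, and restricting template matching to a window of that radius around the previous detection instead of scanning the whole frame. This is a minimal illustration assuming a nadir-facing pinhole camera; the function names, the `tilt_correction` factor, and the SSD matcher are simplifying assumptions, not the paper's exact equations.

```python
import numpy as np

def max_pixel_shift(velocity_mps, frames_skipped, fps, altitude_m,
                    focal_px, tilt_correction=1.0):
    """Upper bound on object motion (in pixels) between processed frames.

    Assumes a nadir-facing pinhole camera; `tilt_correction` is a
    placeholder scale factor standing in for the camera-tilt term.
    """
    dt = frames_skipped / fps                # time elapsed over skipped frames
    ground_shift_m = velocity_mps * dt       # UAV ground displacement
    px_per_m = focal_px / altitude_m         # pinhole projection at this altitude
    return int(np.ceil(ground_shift_m * px_per_m * tilt_correction))

def match_in_window(frame, template, last_xy, max_shift):
    """Locate `template` by SSD, searching only a +/- max_shift window
    around the previous detection rather than the full frame."""
    th, tw = template.shape
    x0, y0 = last_xy
    ys = max(0, y0 - max_shift)
    ye = min(frame.shape[0] - th, y0 + max_shift)
    xs = max(0, x0 - max_shift)
    xe = min(frame.shape[1] - tw, x0 + max_shift)
    best, best_xy = np.inf, last_xy
    for y in range(ys, ye + 1):
        for x in range(xs, xe + 1):
            ssd = np.sum((frame[y:y + th, x:x + tw] - template) ** 2)
            if ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy
```

A tighter pixel-shift bound shrinks the window quadratically, which is where the inference-time saving comes from.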
Data availability
Images used for model training cannot be published as they are privately owned. We have obtained permission to use them for research purposes. Videos used for the validation of equations are available on request.
Author information
Authors and Affiliations
Contributions
S. AV wrote the manuscript under the supervision of P. S. and R. CV, who provided valuable corrections.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
AV, S., Sankaran, P. & C.V, R. Towards real-time video analysis of flooded areas: redundancy-based accelerator for object detection models. J Real-Time Image Proc 21, 119 (2024). https://doi.org/10.1007/s11554-024-01490-0