The INUS Platform: A Modular Solution for Object Detection and Tracking from UAVs and Terrestrial Surveillance Assets
Figure 1. The proposed situational awareness system, namely the INUS platform.
Figure 2. Basic hardware modules of the system.
Figure 3. Example snapshot of the User Interface of the intelligence officer’s workstation on the OKUTAMA dataset.
Figure 4. Workflow of the object tracking module.
Figure 5. Processed frames as time progresses (frames (a–e)) during a multiple car (three objects) tracking process on the UAV123 dataset.
Figure 6. Processed frames as time progresses (frames (a–g)) during a multiple person (two objects) tracking process on the OKUTAMA dataset.
Figure 7. Processed frames as time progresses (frames (a–e)) during a multiple car (two objects) tracking process on the VIVID dataset.
Figure 8. Processed frames as time progresses (frames (a–e)) during a multiple person (two objects) tracking process on the LITIV2012 dataset.
Figure 9. Object detection results (persons in blue rectangles and cars in orange rectangles) as time progresses (frames (a–c)) on the OKUTAMA dataset, applying several deep learning schemes.
Figure 10. Object detection results (persons in blue rectangles and cars in orange rectangles) as time progresses (frames (a–c)) on the UCF-ARG dataset, applying several pretrained deep learning schemes.
Figure 11. Left: annotation file; right: the corresponding image subset with the bounding boxes of each sample object (green rectangles).
Figure 12. Distribution of the samples in each class per dataset used for the training and validation sets.
Figure 13. Training and validation loss percentages versus training epochs for the custom YOLOv3 model.
Figure 14. Object detection results (persons in cyan and cars in orange/yellow rectangles) applying the custom YOLOv3 model to real-case videos.
Figure 15. Left column: object detection results (green rectangles) as time progresses (frames (a–c)) on the LITIV2012 dataset, applying the OBIA technique. Right column: the corresponding binary masks associated with the selected color and area thresholds.
Figure 16. Object detection results (persons in blue rectangles and cars in orange rectangles) for the terrestrial asset using the thermal camera as time progresses (frames (a–f)).
Figure 17. Collinearity condition in 3D space.
Figure 18. Examples of the calculated positions of a tracked object superimposed on Google Earth. The blue point indicates the real position of the object and the red point indicates the corresponding position calculated by the localization module.
Figure 19. Sample chess-board pattern images from the RGB camera collected in our laboratory.
Figure 20. Left: calibrated radial distortion curve. Right: calibrated intrinsic RGB camera parameters.
Abstract
1. Introduction
2. Approach
- (i) Modular architecture that facilitates the design of a multipurpose situational awareness system that exploits aerial assets from UAVs and terrestrial assets from fixed positions.
- (ii) Provision of localized object detection and tracking events exploiting novel computer vision, image processing, and machine learning techniques that can be efficiently performed in complex scenes. Object detection and tracking from RGB and thermal videos is a demanding task due to radiometric distortions (e.g., shadows, reflections, etc.), occlusions, repetitive object patterns, multiple moving targets, the simultaneous existence of several types of objects in the scene, etc.
- (iii) User-friendly operational interface for standalone deployment and for seamless integration with external systems such as surveillance systems, decision support systems, and command and control systems (C2s).
3. INUS Platform
3.1. Proposed System
3.2. Features and Hardware
3.3. Intelligence Officer’s Workstation
- Video Streaming Proxy Server: Janus is a general-purpose WebRTC server capable of setting up WebRTC media communication with a browser. It acts as an intermediary between the UAV and terrestrial assets, which generate video streams, and any component that consumes those video streams.
- Message Broker: Eclipse Mosquitto is an open source (EPL/EDL licensed) message broker that implements the MQTT protocol. All messages from and to UAV and terrestrial assets are produced and consumed through appropriate topics on this message broker.
- Intelligence Officer’s Workstation Back-End: A back-end service was developed that consumes data from the message broker and stores them in a database. At the same time, it exposes a REST API for communication with the front-end. It was developed using Python/Flask and MongoDB (a minimal sketch of such a service is given after this list).
- Intelligence Officer’s Workstation Front-End: A web application was developed that consumes data from the back-end and displays it to the intelligence officer. At the same time, the intelligence officer is capable of issuing several commands to the UAV and terrestrial assets. These commands are: (i) choose an event and focus it on the map; (ii) choose a UAV or terrestrial asset and focus it on the map; (iii) choose a camera on the UAV or terrestrial asset and show a video stream from that camera; (iv) change the zoom level of a camera on the UAV or terrestrial asset (only if supported); (v) control the camera of the UAV either by using the map to point to a specific location or by interacting with the video player; and (vi) draw a bounding box on the video player and start tracking the object in the bounding box.
- Map component: Used to represent the geolocalized information of the platform (assets, events, etc.).
- Streaming video player: Its main task is to show the selected video stream, manipulate the camera, and start tracking and following an object.
- Status bar: Used to display critical information in the form of colored boxes. This interactive display method is used to emphasize certain status information (e.g., the connection with the UAV or terrestrial asset, status of tracking objects) with the use of color (green indicates normal status, red indicates alert status, blue indicates general information) and the use of animations (a pulsing animation is triggered on alert status).
- Event list: A list of the events that have been detected by the platform.
- Asset list: A list of the UAVs or terrestrial assets that are being handled by the platform.
- Top menu: Supplies access to specific functions, such as camera drop down menu, zoom drop down menu, and settings menu.
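The back-end implementation itself is not published with the paper; the sketch below only illustrates, under stated assumptions, how such a service could be assembled from the components named above (Flask and MongoDB via pymongo, with the Mosquitto broker accessed via paho-mqtt). The topic name, ports, and document schema are hypothetical and chosen for illustration.

```python
# Minimal sketch of a workstation back-end (assumed topic names, ports, and schema).
import json
import paho.mqtt.client as mqtt
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
events = MongoClient("mongodb://localhost:27017")["inus"]["events"]  # hypothetical DB/collection

def on_message(client, userdata, msg):
    # Store every detection/tracking event published by the UAV and terrestrial assets.
    payload = json.loads(msg.payload.decode("utf-8"))
    payload["topic"] = msg.topic
    events.insert_one(payload)

mqtt_client = mqtt.Client()                     # paho-mqtt 1.x style constructor
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)          # Mosquitto broker
mqtt_client.subscribe("inus/assets/+/events")   # hypothetical topic structure
mqtt_client.loop_start()

@app.route("/api/events")
def list_events():
    # REST endpoint consumed by the front-end.
    return jsonify([{k: v for k, v in e.items() if k != "_id"} for e in events.find()])

if __name__ == "__main__":
    app.run(port=5000)
```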
4. Datasets for Experiments and Training
5. Object Tracking
5.1. Object Tracking Module
- Define and set the tracker. The object tracker is selected from: (i) MedianFlow, (ii) Boosting, (iii) GOTURN, (iv) MOSSE, (v) CSRT, (vi) TLD, (vii) KCF, and (viii) MIL.
- Read the current frame of the camera stream and the image coordinates of the user-defined bounding box on the starting frame.
- Apply the tracker and get the matrix object name and matrix box point. The matrix object name includes the name of the tracked object (person or vehicle). The matrix box point includes the bounding box (i.e., the top left and the bottom right image coordinates) of the tracked object.
- Overlay the rectangle of the matrix box point onto the current frame of the camera stream.
- Calculate the object centroid (as image coordinates) of the bounding box.
- Import the object centroid to the localization module and calculate the corresponding WGS’84 coordinates (latitude, longitude) and UTM projection coordinates. The localization module is described in Section 7.
- Import the Universal Transverse Mercator (UTM) projection coordinates of the tracked object to the speed and heading module. The speed and heading module is described in Section 7.2.
- Export the matrix object name, the matrix box point, the WGS’84 coordinates (latitude, longitude), the speed, and the heading to the Message Broker (see Section 3.3). A minimal sketch of this loop is given below.
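The workflow above can be illustrated with the tracker implementations shipped with OpenCV (opencv-contrib-python). The following is a sketch under that assumption, with the localization call and the MQTT export left as hypothetical placeholders rather than the platform's actual interfaces.

```python
# Illustrative single-object tracking loop (assumed OpenCV contrib trackers).
import cv2

TRACKERS = {
    "CSRT": cv2.TrackerCSRT_create,
    "KCF": cv2.TrackerKCF_create,
    "MIL": cv2.TrackerMIL_create,
    # MedianFlow, Boosting, MOSSE, and TLD are exposed via cv2.legacy in recent OpenCV builds.
}

def track(stream_url, init_box, object_name="person", tracker_name="CSRT"):
    cap = cv2.VideoCapture(stream_url)
    ok, frame = cap.read()
    tracker = TRACKERS[tracker_name]()
    tracker.init(frame, init_box)                 # init_box = (x, y, w, h) drawn by the user
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        success, box = tracker.update(frame)
        if success:
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cx, cy = x + w // 2, y + h // 2       # object centroid in image coordinates
            # lat, lon, easting, northing = localize(cx, cy)   # hypothetical call to the localization module (Section 7)
            # publish_event(object_name, (x, y, x + w, y + h), lat, lon)  # hypothetical MQTT export (Section 3.3)
        cv2.imshow("tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:           # Esc to stop
            break
    cap.release()
```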
5.2. Experiments
6. Object Detection
6.1. Object Detection Module
- Custom deep learning YOLOv3 model.
- An OBIA technique based on a thresholding/rule-based model that considers color values and pixel area.
- Define and set the object detector. The object detector is selected from: (i) the pretrained YOLOv3 model, (ii) the pretrained TinyYOLOv3 model, (iii) the pretrained RetinaNET model, and (iv) the custom YOLOv3 model. The process used to create and train the custom YOLOv3 model is provided in Section 6.3.2.
- Read the deep learning weights of the defined object detector. The deep learning weights are in *.h5 format. In the case of the custom YOLOv3 model, a *.json file that includes the newly trained anchors is also read.
- Read the current frame of the camera stream.
- Apply the object detection model to the current frame and get the matrix object name, matrix probability, and matrix box point. The matrix object name includes the name of the class of each detected object (person or vehicle). The matrix probability includes the probability (as a percentage) that the detected object belongs to the assigned class. The matrix box point includes the bounding box (i.e., the top left and the bottom right image coordinates) of each detected object.
- Overlay the rectangles of the matrix box point, the matrix object name, and the matrix probability onto the current frame of the camera stream.
- Calculate the object centroids (as image coordinates) of each bounding box from the matrix box point.
- Import the object centroids to the localization module. The localization module is described in Section 7.
- Export the matrix object name, the matrix box point, and the WGS’84 coordinates (latitude, longitude) of each detected object from the localization module to the Message Broker (see Section 3.3). A minimal sketch of this detection step is given below.
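The combination of *.h5 weight files, an anchors *.json configuration, and name/probability/box-point outputs resembles the interface of the ImageAI library; the paper does not name its implementation, so the following is only an illustrative sketch under that assumption for a pretrained model.

```python
# Illustrative pretrained-model detection step (assumed ImageAI interface; the paper does not name a library).
from imageai.Detection import ObjectDetection

detector = ObjectDetection()
detector.setModelTypeAsYOLOv3()            # alternatives: setModelTypeAsTinyYOLOv3(), setModelTypeAsRetinaNet()
detector.setModelPath("yolo.h5")           # *.h5 deep learning weights
detector.loadModel()

def detect(frame_path):
    """Return matrix-style outputs (names, probabilities, box points, centroids) for one frame image."""
    detections = detector.detectObjectsFromImage(
        input_image=frame_path,
        output_image_path="frame_annotated.jpg",   # frame with overlaid rectangles and labels
        minimum_percentage_probability=40)
    names, probs, boxes, centroids = [], [], [], []
    for d in detections:
        x1, y1, x2, y2 = d["box_points"]            # top-left and bottom-right image coordinates
        names.append(d["name"])                     # class name (e.g., "person", "car")
        probs.append(d["percentage_probability"])   # class membership probability (%)
        boxes.append((x1, y1, x2, y2))
        centroids.append(((x1 + x2) // 2, (y1 + y2) // 2))  # centroids passed to the localization module
    return names, probs, boxes, centroids
```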
6.2. Evaluation Metrics
6.3. Aerial Asset
6.3.1. Experiments with Pretrained Deep Learning Models
6.3.2. Experiments with Custom Deep Learning Model
6.3.3. Experiments with OBIA
- (i) Color threshold = 170. Pixels with color values higher than this threshold correspond to the objects of interest with high temperature.
- (ii) Dilation contour = 20 pixels. A dilation of the contours of the detected objects is performed to fill holes and to extract solid objects.
- (iii) Minimum area = 10 pixels and maximum area = 30,000 pixels. Only objects within this pixel area range are assigned as objects of interest (a minimal implementation sketch is given after this list).
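A minimal OpenCV sketch of this rule set is given below. The exact morphological and contour-handling details of the authors' OBIA pipeline are not stated in the text, so the kernel shape and the use of bounding rectangles here are assumptions.

```python
# Illustrative OBIA-style rule set: intensity threshold + dilation + area filter (assumed OpenCV pipeline).
import cv2
import numpy as np

COLOR_THRESHOLD = 170            # pixels brighter than this are treated as hot objects
DILATION_SIZE = 20               # dilation applied to the detected contours to fill holes / solidify blobs
MIN_AREA, MAX_AREA = 10, 30000   # accepted pixel-area range for objects of interest

def detect_hot_objects(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, COLOR_THRESHOLD, 255, cv2.THRESH_BINARY)
    kernel = np.ones((DILATION_SIZE, DILATION_SIZE), np.uint8)
    mask = cv2.dilate(mask, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        if MIN_AREA <= cv2.contourArea(c) <= MAX_AREA:
            boxes.append(cv2.boundingRect(c))  # (x, y, w, h) of an object of interest
    return boxes, mask                         # mask corresponds to the binary masks shown in Figure 15
```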
6.4. Terrestrial Asset
Experiments with Pretrained Deep Learning Models
7. Localization, Speed and Heading
7.1. Localization Module
- Import the object centroids from the object detection or object tracking module.
- Import the yaw, pitch, and roll angles of the gimbal.
- Import the UAV position as WGS’84 coordinates.
- Import the camera sensor parameters.
- Calculate the distance between the UAV position and the projection of the image center on the ground via the pitch angle.
- Transform the UAV position from WGS’84 to the UTM projection.
- Transform the object centroid coordinates from the image plane system to the photogrammetric system (origin at the image center).
- Calculate the real-world coordinates of the detected or tracked object in the UTM projection via the collinearity condition.
- Transform the coordinates from the UTM projection to WGS’84.
- Export the UTM projection coordinates to the speed and heading module if the object tracking module is active. The speed and heading module is described in Section 7.2.
- Export the WGS’84 coordinates to the object detection or object tracking module. A minimal sketch of the coordinate conversions used in this workflow is given below.
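As referenced in the last step above, the WGS’84 ↔ UTM conversions can be performed with a standard geodesy library. The sketch below uses pyproj and a hypothetical UTM zone; both are assumptions, since the paper does not state which library or zone was used.

```python
# Illustrative WGS'84 <-> UTM conversion for the localization workflow (assumed pyproj, hypothetical zone 34N).
from pyproj import Transformer

to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32634", always_xy=True)   # WGS'84 -> UTM zone 34N
to_wgs = Transformer.from_crs("EPSG:32634", "EPSG:4326", always_xy=True)   # UTM zone 34N -> WGS'84

# UAV position reported by the autopilot (hypothetical values).
lat_uav, lon_uav = 37.9838, 23.7275
X0, Y0 = to_utm.transform(lon_uav, lat_uav)       # easting, northing of the projection center

# ... the collinearity solution yields the object's easting/northing X, Y (see Section 7.1.1) ...
X, Y = X0 + 12.4, Y0 - 7.9                        # placeholder object coordinates
lon_obj, lat_obj = to_wgs.transform(X, Y)         # exported as latitude/longitude
```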
7.1.1. Collinearity Condition
- We assume the earth is semi-flat between the aircraft and the target in the Field Of View (FOV), with a variance of less than a couple of meters.
- We have at least 1 arcsecond post data for the digital elevation model (DEM).
- We assume that the plumb-line (e.g., local gravity vector) is parallel to the geodetic normal of the ellipsoid. This allows us to simply add/subtract DEM, ellipsoid, and geoid elevations.
- We assume that we know the aircraft position to within a few centimeters, and its attitude to some immeasurable error.
- We assume that we know the position of the camera and its orientation relative to the aircraft perfectly.
- We assume that there are no timing errors between image frames and position/attitude information.

- x′, y′ are the centroid coordinates of the object of interest in the image plane, converted from the image plane system to the photogrammetric system.
- xo, yo are the coordinates of the principal point.
- Δxr, Δyr are the corrections of the radial-symmetric lens distortion parameters.
- Δxd, Δyd are the corrections of the decentering lens distortion parameters.
- Rωφκ is the matrix with elements ri,j incorporating the turns in 3D space of the image plane referred to the world’s system.
- c is the focal length of the camera.
- X0, Y0, Z0 are the coordinates of the camera projection center in 3D space. These coordinates are provided by the UAV position (latitude, longitude) converted from the WGS’84 coordinate system to UTM.
- X, Y, Z are the coordinates of the object of interest in 3D space. Because a monocular vision system is considered, the Z value must be predefined. Thus, the Z value is set as the home UAV height altitude associated with the ground geometric height in WGS’84. Solving Equation (5) for the unknowns X and Y, the corresponding UTM coordinates are calculated. The final latitude and longitude coordinates of the object are calculated by converting the UTM coordinates to WGS’84 (the corresponding inverse form of the collinearity condition is sketched below).
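For completeness, and assuming Equation (5) follows the usual photogrammetric convention with Rωφκ rotating from the image system to the world system, the planimetric coordinates for a known Z can be written as below, where x̄ = x′ − xo − Δxr − Δxd and ȳ = y′ − yo − Δyr − Δyd denote the distortion-corrected photogrammetric image coordinates. This is only a sketch of the standard inverse form; the paper's Equation (5) is assumed to be equivalent.

```latex
X = X_0 + (Z - Z_0)\,\frac{r_{11}\bar{x} + r_{12}\bar{y} - r_{13}c}{r_{31}\bar{x} + r_{32}\bar{y} - r_{33}c},
\qquad
Y = Y_0 + (Z - Z_0)\,\frac{r_{21}\bar{x} + r_{22}\bar{y} - r_{23}c}{r_{31}\bar{x} + r_{32}\bar{y} - r_{33}c}
```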
7.1.2. Camera Calibration
7.2. Speed and Heading
- Import the object coordinates of the tracked object in the UTM projection from the localization module with a time interval of 1 s.
- Calculate the differences Dx = xB − xA and Dy = yB − yA along the X axis and Y axis between the two positions at timestamps A and B.
- Calculate the azimuth of the vector between the two position points as azimuth = arctan(Dx/Dy).
- Calculate the distance between the two positions as distance = sqrt(Dx² + Dy²).
- Export the speed, which is the distance divided by the time interval between the two timestamps, to the object tracking module.
- Export the heading, which is the azimuth, to the object tracking module. A minimal sketch of this calculation is given below.
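As referenced above, a minimal sketch of the speed and heading computation follows. math.atan2 is used instead of a plain arctangent so the azimuth falls in the correct quadrant; this is an implementation choice, not something stated in the text.

```python
# Illustrative speed and heading computation from two UTM positions 1 s apart.
import math

def speed_and_heading(pos_a, pos_b, dt=1.0):
    """pos_a, pos_b: (easting, northing) in meters at timestamps A and B; dt: time interval in seconds."""
    dx = pos_b[0] - pos_a[0]
    dy = pos_b[1] - pos_a[1]
    distance = math.hypot(dx, dy)                     # sqrt(Dx^2 + Dy^2)
    speed = distance / dt                             # meters per second over the interval
    azimuth = math.degrees(math.atan2(dx, dy)) % 360  # clockwise from grid north
    return speed, azimuth

# Example: an object that moved 5 m east and 5 m north in 1 s -> ~7.07 m/s at 45 degrees.
print(speed_and_heading((500000.0, 4200000.0), (500005.0, 4200005.0)))
```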
8. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Dataset | Type of Sensor | View |
---|---|---|
OKUTAMA | RGB | |
UAV123 | RGB | |
UCF-ARG | RGB | |
VisDrone2019 | RGB | |
AU-AIR | RGB | |
CARPK | RGB | |
PUCPR | RGB | |
UAVDT | RGB | |
OIRDS | RGB | |
JEKHOR | RGB | |
OTCBVS-RGB | RGB | |
P-DESTRE | RGB | |
VIVID | Thermal | |
LITIV2012 | Thermal | |
OTCBVS-THERMAL | Thermal |
IRICRA | Thermal |
Pretrained Deep Learning Model | OKUTAMA CM (%) | OKUTAMA CR (%) | OKUTAMA Q (%) | UCF-ARG CM (%) | UCF-ARG CR (%) | UCF-ARG Q (%)
---|---|---|---|---|---|---
YOLOv3 | 60.0 | 100.0 | 60.0 | 79.2 | 100.0 | 79.2 |
TinyYOLOv3 | 33.3 | 100.0 | 33.3 | 22.5 | 81.8 | 21.4 |
RetinaNET | 68.1 | 89.4 | 63.0 | 70.8 | 95.5 | 68.5 |
Dataset | Number of Total Images | Number of Samples of the Class “Vehicle” | Number of Samples of the Class “Person” |
---|---|---|---|
OKUTAMA | 4128 | - | 23,749 |
UAV123 | 795 | 2888 | 289 |
UCF-ARG | 1007 | 3021 | 1024 |
VisDrone2019 | 34,061 | 845,686 | 406,896 |
AU-AIR | 32,713 | 125,426 | 5158 |
CARPK | 1448 | 89,774 | - |
PUCPR | 120 | 16,916 | - |
UAVDT | 40,403 | 763,817 | - |
OIRDS | 428 | 1644 | - |
JEKHOR | 26 | 270 | - |
OTCBVS-RGB | 25 | 24 | 27 |
P-DESTRE | 20,635 | - | 292,003 |
VIVID | 239 | 875 | - |
LITIV2012 | 143 | - | 429 |
OTCBVS-THERMAL | 659 | 99 | 2034
IRICRA | 3237 | - | 5747 |
Total count | 140,067 | 1,850,440 | 737,356 |
Video | CM (%) | CR (%) | Q (%) |
---|---|---|---|
1 | 78.3 | 92.1 | 73.4 |
2 | 58.2 | 100.0 | 58.2 |
3 | 95.0 | 100.0 | 95.0 |
4 | 74.5 | 92.3 | 70.3 |
5 | 71.4 | 100.0 | 71.4 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).