Multi-Dimensional Underwater Point Cloud Detection Based on Deep Learning
Figure 1. Raw data obtained from the BV5000.
Figure 2. Data pre-processing flow chart.
Figure 3. Data augmentation flow chart.
Figure 4. Schematic diagram of the data stack.
Figure 5. Evolution from R-CNN to Faster R-CNN.
Figure 6. 3D results after choosing the ROI.
Figure 7. 3D results after removing the seabed.
Figure 8. 2D bird's eye view of the point cloud.
Figure 9. Results for 2D detection.
Figure 10. Intersection over the union.
Figure 11. Precision-recall curves for the YOLOv3 and Faster R-CNN neural networks.
Figure 12. After removing the tire from the entire point cloud.
Abstract
1. Introduction
2. Acoustic Sonar Sensors
2.1. Side-Scan Sonar
2.2. Multi-Beam Echo Sounder
2.3. BlueView BV5000 3D MSS
3. Methods
3.1. Choosing the Region of Interest
3.2. Removing the Seabed
3.3. Random Sample Consensus (RANSAC) Algorithm
- (1) The seabed is assumed to be flat, although the real seabed is uneven; the plane normal was therefore fixed to (0, 0, 1) along the z-axis, and the plane was fitted to the point cloud under this orientation constraint.
- (2) The seabed thickness was set to 20 cm in this study; this is the maximum distance allowed from an inlier point to the plane, and it may be changed for other experimental locations.
- (3) All points within this 20 cm distance of the candidate plane were identified.
- (4) The inlier points, considered part of the seabed, were removed.
- (5) A point cloud without the seabed was thus obtained (a minimal sketch of this procedure is shown below).
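A minimal NumPy sketch of this constrained RANSAC step is given here. With the normal fixed to (0, 0, 1), a candidate plane reduces to a single height value, so the search only has to find the height supported by the most points within the 20 cm band. The function name, iteration count, and array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def remove_seabed(points, thickness=0.20, n_iters=500, seed=None):
    """Constrained RANSAC: fit a horizontal plane (normal fixed to the z-axis)
    and drop every point within `thickness` metres of it.

    points : (N, 3) array of x, y, z coordinates in metres.
    Returns the point cloud with the seabed inliers removed.
    """
    rng = np.random.default_rng(seed)
    z = points[:, 2]
    best_inliers = np.zeros(len(points), dtype=bool)

    for _ in range(n_iters):
        # With the normal constrained to (0, 0, 1), a candidate plane is fully
        # described by one height value sampled from the data.
        z0 = z[rng.integers(len(z))]
        inliers = np.abs(z - z0) <= thickness    # step (3): points within 20 cm
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers               # keep the plane with the most support

    # Steps (4)-(5): remove the seabed inliers, keep the rest.
    return points[~best_inliers]

# Example with a hypothetical (N, 3) array `cloud_xyz`:
# cloud_no_seabed = remove_seabed(cloud_xyz, thickness=0.20)
```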
3.4. Pre-Processing and Expansion of 2D Data
3.5. Pre-Processing and Expansion of 3D Data
- (1) The point cloud of each tire contains many points, so 2048 points were first randomly selected from it.
- (2) The sampled coordinates of a tire can be treated as a 2D array: one dimension holds the 2048 points and the other holds the x, y, and z coordinates (as shown in Figure 4).
- (3) The 2048 points were rotated in 5° increments to change their coordinates.
- (4) The resulting 2D arrays were stacked into a 3D array, where the third dimension N denotes the number of data samples (as shown in Figure 4).
- (5) Check whether the rotation procedure is complete; if not, return to step (3) and rotate the data again.
- (6) Check whether all tires have been processed; if not, return to step (1) and read new data.
- (7) The data are exported in ModelNet40 format.
- (8) PointNet and PointConv are used to train the models.
- (9) The accuracy of the models is evaluated.
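A short NumPy sketch of steps (1)-(5) above is given here for one tire. The rotation axis is not stated in the paper, so rotation about the z-axis is an assumption, as are the function and variable names.

```python
import numpy as np

def augment_tire(points, num_points=2048, step_deg=5.0, seed=None):
    """Steps (1)-(5): subsample one tire's point cloud to 2048 points and rotate
    it in 5-degree increments, stacking every rotated copy into one 3D array.

    points : (M, 3) raw tire point cloud with M >= num_points.
    Returns an (N, num_points, 3) array, where N = 360 / step_deg.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), num_points, replace=False)   # step (1)
    sample = points[idx]                                        # step (2): (2048, 3) array

    stack = []
    for angle in np.arange(0.0, 360.0, step_deg):               # step (3)
        a = np.deg2rad(angle)
        # Rotation about the z-axis (assumed axis; the paper only states 5° steps).
        rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                        [np.sin(a),  np.cos(a), 0.0],
                        [0.0,        0.0,       1.0]])
        stack.append(sample @ rot.T)

    return np.stack(stack, axis=0)                              # step (4): (N, 2048, 3)
```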
3.6. Combining 3D Data with ModelNet40
3.7. Faster R-CNN
- (1) Feature extraction network: a series of convolution, rectified linear unit (ReLU), and pooling layers extracts a feature map from the original image.
- (2) Region Proposal Network (RPN): the RPN estimates the approximate locations of objects on the feature map and generates region proposals. A softmax layer decides whether each anchor is positive or negative (i.e., whether the proposed region contains an object), and bounding-box regression refines the anchors into precise proposals.
- (3) ROI pooling: evolved from spatial pyramid pooling [32]. The feature map and the proposals are combined to extract proposal feature maps, which are passed to fully connected layers to determine the target category.
- (4) Classification: the proposal feature maps are used to determine the category of each proposal, and bounding-box regression yields the precise locations of the detection boxes.
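The paper does not state which implementation was used; the sketch below shows one common way to instantiate the same four-stage pipeline with torchvision's pretrained Faster R-CNN, with the box head replaced for a single "tire" class. The `pretrained=True` argument, class count, and image size are assumptions (newer torchvision versions use a `weights=` argument instead).

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Backbone (feature extraction) + RPN + ROI pooling + classification head,
# matching the four components described above.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification head: 2 classes = background + tire.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    # One dummy 3-channel bird's-eye-view image with values in [0, 1].
    image = torch.rand(3, 480, 640)
    prediction = model([image])[0]   # dict with 'boxes', 'labels', 'scores'
    print(prediction["boxes"].shape)
```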
3.8. YOLOv3
3.9. PointNet
- (1) Unordered: point cloud data are insensitive to the order of the points, so the network output must not depend on how the points are arranged.
- (2) Spatial relationship: an object is usually composed of a certain number of points in a specific region of space, and spatial relationships exist among these points.
- (3) Invariance: point cloud data are invariant to certain spatial transformations, such as rotation and translation.
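A minimal PyTorch sketch of PointNet's core idea follows: a shared per-point MLP followed by a symmetric max-pooling over points, which makes the output invariant to point order (property (1) above). Layer widths and the 41-class output are illustrative assumptions, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier: a shared per-point MLP followed by a
    max-pool over points (a symmetric function), so the output is unchanged
    when the input points are re-ordered."""

    def __init__(self, num_classes=41):
        super().__init__()
        # Shared MLP applied independently to every point (1D convs with kernel size 1).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                        # x: (batch, 3, num_points)
        feat = self.point_mlp(x)                 # (batch, 1024, num_points)
        global_feat = feat.max(dim=2).values     # symmetric max-pool over points
        return self.classifier(global_feat)

# points = torch.rand(8, 3, 2048)   # batch of 8 clouds, 2048 points each
# logits = TinyPointNet()(points)   # (8, 41)
```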
3.10. PointConv
4. Results
4.1. Two-Dimensional Image Detection
- (1) Each bounding box predicted by the model was used to compute the IOU with the ground truth. If the value was larger than the IOU threshold, the box was considered true; otherwise, it was considered false. If more than one bounding box was true for the same ground-truth object, the box with the highest IOU was kept as true and the others were set to false.
- (2) The predicted bounding boxes were sorted by score from highest to lowest. A true bounding box was counted as a true positive; otherwise, it was counted as a false positive. Precision and recall were then estimated using (15) and (16).
- (3) If an object existed in the ground truth but the model did not detect it, it was counted as a false negative.
- (4) If the model detected an object that did not exist in the ground truth, it was counted as a false positive.
- (5) Every tire in the image was evaluated with the steps above, so the method can precisely determine whether the model obtains the correct result for each individual tire.
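A brief Python sketch of steps (1)-(4) is shown here for one image. The [x1, y1, x2, y2] box format, the 0.5 threshold, and the function names are assumptions made for illustration.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes [x1, y1, x2, y2]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_detections(pred_boxes, gt_boxes, iou_thr=0.5):
    """Label each prediction TP or FP; unmatched ground-truth tires are FN."""
    matched_gt = set()
    labels = []
    for pb in pred_boxes:                       # assumed already sorted by score
        ious = [iou(pb, gb) for gb in gt_boxes]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and best not in matched_gt:
            labels.append("TP")
            matched_gt.add(best)                # each ground-truth tire matches once
        else:
            labels.append("FP")
    num_fn = len(gt_boxes) - len(matched_gt)
    return labels, num_fn
```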
4.2. Three-Dimensional Point Cloud Classification
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28; MIT Press: Cambridge, MA, USA, 2015; pp. 91–99. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
- Charles, R.Q.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems 30; Neural Information Processing Systems Foundation, Inc. (NIPS): Montréal, QC, Canada, 2017; pp. 5099–5108. [Google Scholar]
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
- Menna, F.; Agrafiotis, P.; Georgopoulos, A. State of the art and applications in archaeological underwater 3D recording and mapping. J. Cult. Herit. 2018, 33, 231–248. [Google Scholar] [CrossRef]
- Abu, A.; Diamant, R. A Statistically-Based Method for the Detection of Underwater Objects in Sonar Imagery. IEEE Sens. J. 2019, 19, 6858–6871. [Google Scholar] [CrossRef]
- Williams, D.P. Transfer Learning with SAS-Image Convolutional Neural Networks for Improved Underwater Target Classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 78–81. [Google Scholar]
- Wang, Z.; Zhang, S. Sonar Image Detection Based on Multi-Scale Multi-Column Convolution Neural Networks. IEEE Access 2019, 7, 160755–160767. [Google Scholar] [CrossRef]
- Valdenegro-Toro, M. End-to-end object detection and recognition in forward-looking sonar images with convolutional neural networks. In Proceedings of the 2016 IEEE/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan, 12 December 2016; pp. 144–150. [Google Scholar]
- Sung, M.; Cho, H.; Kim, T.; Joe, H.; Yu, S. Crosstalk Removal in Forward Scan Sonar Image Using Deep Learning for Object Detection. IEEE Sens. J. 2019, 21, 9929–9944. [Google Scholar] [CrossRef]
- Aykin, M.D.; Negahdaripour, S. Three-Dimensional Target Reconstruction from Multiple 2-D Forward-Scan Sonar Views by Space Carving. IEEE J. Ocean. Eng. 2017, 42, 574–589. [Google Scholar] [CrossRef]
- Cho, H.; Kim, B.; Yu, S. AUV-Based Underwater 3-D Point Cloud Generation Using Acoustic Lens-Based Multibeam Sonar. IEEE J. Ocean. Eng. 2018, 43, 856–872. [Google Scholar] [CrossRef]
- Xuefeng, Z.; Qingning, L.; Ye, M.; Yongsheng, J. Dimensional Imaging Sonar Damage Identification Technology Research On Sea-Crossing Bridge Main Pier Pile Foundations. In Proceedings of the 2016 5th International Conference on Energy and Environmental Protection, Sanya, China, 21–23 November 2016. [Google Scholar]
- Moisan, E.; Charbonnier, P.; Foucher, P.; Grussenmeyer, P.; Guillemin, S.; Koehl, M. Building a 3D reference model for canal tunnel surveying using sonar and laser scanning. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Piano di Sorrento, Italy, 16–17 April 2015. [Google Scholar]
- Moisan, E.; Charbonnier, P.; Foucher, P.; Grussenmeyer, P.; Guillemin, S. Evaluating a Static Multibeam Sonar Scanner for 3D Surveys in Confined Underwater Environments. Remote Sens. 2018, 10, 1395. [Google Scholar] [CrossRef] [Green Version]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
- Wu, W.; Qi, Z.; Fuxin, L. PointConv: Deep Convolutional Networks on 3D Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9613–9622. [Google Scholar]
- Mishne, G.; Talmon, R.; Cohen, I. Graph-Based Supervised Automatic Target Detection. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2738–2754. [Google Scholar] [CrossRef]
- Sinai, A.; Amar, A.; Gilboa, G. Mine-Like Objects detection in Side-Scan Sonar images using a shadows-highlights geometrical features space. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–6. [Google Scholar]
- Herkül, K.; Peterson, A.; Paekivi, S. Applying multibeam sonar and mathematical modeling for mapping seabed substrate and biota of offshore shallows. Estuar. Coast. Shelf Sci. 2017, 192, 57–71. [Google Scholar] [CrossRef]
- Snellen, M.; Gaida, T.C.; Koop, L.; Alevizos, E.; Simons, D.G. Performance of Multibeam Echosounder Backscatter-Based Classification for Monitoring Sediment Distributions Using Multitemporal Large-Scale Ocean Data Sets. IEEE J. Ocean. Eng. 2019, 44, 142–155. [Google Scholar] [CrossRef] [Green Version]
- Landmark, K.; Solberg, A.H.S.; Austeng, A.; Hansen, R.E. Bayesian Seabed Classification Using Angle-Dependent Backscatter Data from Multibeam Echo Sounders. IEEE J. Ocean. Eng. 2014, 39, 724–739. [Google Scholar] [CrossRef]
- Söhnlein, G.; Rush, S.; Thompson, L. Using manned submersibles to create 3D sonar scans of shipwrecks. In Proceedings of the OCEANS’11 MTS/IEEE, Waikoloa, HI, USA, 19–22 September 2011; pp. 1–10. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Tang, J.; Mao, Y.; Wang, J.; Wang, L. Multi-task Enhanced Dam Crack Image Detection Based on Faster R-CNN. In Proceedings of the IEEE 4th International Conference on Image, Vision and Computing, Xiamen, China, 5–7 July 2019; pp. 336–340. [Google Scholar]
- Kafedziski, V.; Pecov, S.; Tanevski, D. Detection and Classification of Land Mines from Ground Penetrating Radar Data Using Faster R-CNN. In Proceedings of the 26th Telecommunications Forum, Belgrade, Serbia, 20–21 November 2018; pp. 1–4. [Google Scholar]
- You, W.; Chen, L.; Mo, Z. Soldered Dots Detection of Automobile Door Panels based on Faster R-CNN Model. In Proceedings of the Chinese Control and Decision Conference, Nanchang, China, 3–5 June 2019; pp. 5314–5318. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
- Zhang, X.; Zhu, X. Vehicle Detection in the Aerial Infrared Images via an Improved Yolov3 Network. In Proceedings of the IEEE 4th International Conference on Signal and Image Processing, Wuxi, China, 19–21 July 2019; pp. 372–376. [Google Scholar]
- Miao, F.; Tian, Y.; Jin, L. Vehicle Direction Detection Based on YOLOv3. In Proceedings of the 2019 11th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 24–25 August 2019; pp. 268–271. [Google Scholar]
- Liu, Y.; Ji, X.; Pei, S.; Ma, Z.; Zhang, G.; Lin, Y.; Chen, Y. Research on automatic location and recognition of insulators in substation based on YOLOv3. High. Volt. 2020, 5, 62–68. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Maturana, D.; Scherer, S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany, 28 September–2 October 2015; pp. 922–928. [Google Scholar]
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems 31; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 820–830. [Google Scholar]
- Li, J.; Chen, B.M.; Lee, G.H. SO-Net: Self-Organizing Network for Point Cloud Analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9397–9406. [Google Scholar]
- Simonovsky, M.; Komodakis, N. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 29–38. [Google Scholar]
Characteristic | Side-Scan | Multi-Beam | BV5000 |
---|---|---|---|
Installation | Mounted under the vehicle | Mounted under the vehicle | Placed on the seabed |
Other sensor requirements | GPS | GPS, IMU | None |
Measurement method | Sailing back and forth | Sailing back and forth | Multiple measurement stations |
Measurement range | 150 m on both sides | User-selected | 30 m radius |
Resolution | 5 cm | 10 cm | 1 cm |
Application | Target search | Submarine geomorphology survey | Inspection of dock structures |
Difficulty of operation | Must monitor vehicle speed and distance from the sea floor | Inappropriate for shallow water because of post-processing difficulties | Easy to operate |
Confusion Matrix | Ground Truth: Tire | Ground Truth: Not a Tire |
---|---|---|
Prediction: Tire | True Positive (TP) | False Positive (FP) |
Prediction: Not a tire | False Negative (FN) | True Negative (TN) |
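For reference, the metrics reported in the following table follow the standard confusion-matrix definitions restated below (precision and recall correspond to Equations (15) and (16) in the text); this is a restatement under the usual convention, not a new derivation.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}
                              {\text{Precision} + \text{Recall}}
\end{align}
```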
Model | Faster R-CNN | YOLOv3 |
---|---|---|
Accuracy (%) | 87.0 | 95.8 |
Precision (%) | 90.5 | 98.7 |
Recall (%) | 95.8 | 96.4 |
F1 (%) | 93.0 | 97.5 |
AP | 0.954 | 0.96 |
Method | Classes | Accuracy (%), ModelNet40 Only | Accuracy (%), ModelNet40 & Tire |
---|---|---|---|
PointNet | 40 | 89.2 | - |
PointConv | 40 | 92.5 | - |
Expanded PointNet | 41 | 88.1 | 87.4 |
Expanded PointConv | 41 | 91.5 | 93.0 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).