Sensors 23 00053 v2
Article
RUC-Net: A Residual-Unet-Based Convolutional Neural
Network for Pixel-Level Pavement Crack Segmentation
Gui Yu 1,2,3,4 , Juming Dong 2 , Yihang Wang 2 and Xinglin Zhou 1,3,4, *
Abstract: Automatic crack detection remains a challenging task because of complex
backgrounds, uneven illumination, irregular crack patterns, and various types of noise
interference. In this paper, we propose RUC-Net, a U-shaped encoder–decoder semantic
segmentation network that combines Unet and ResNet for pixel-level pavement crack image
segmentation. We introduce the spatial-channel squeeze and excitation (scSE) attention
module to improve detection performance and use the focal loss function to deal with the
class imbalance problem in the pavement crack segmentation task. We evaluated our method
on three public datasets, CFD, Crack500, and DeepCrack, and on all three it achieved
better results than FCN, Unet, and SegNet. In addition, taking the CFD dataset as an
example, we performed ablation studies and compared how various scSE modules and their
combinations affect crack detection performance.
Keywords: pavement crack segmentation; convolutional neural network; U-net; scSE attention
mechanism module
In recent studies, several minimal path methods [15,16] have also been used for crack
detection. Although these methods make use of crack features in a global view [3] and
achieve good performance, their main limitation is that seed points for path tracking need
to be set in advance [17], and the calculation cost is too high for practical application.
To improve the adaptability of IPTS-based methods in the real environment, researchers
have used methods based on machine learning (ML) for damage detection, including
artificial neural networks (ANNs) [18,19], support vector machines (SVMs) [20–22],
random structured forests [23], AdaBoost [24], and so on. These methods perform well
but rely heavily on manual feature extraction.
More recently, supervised deep learning methods, such as convolutional neural
networks (CNNs), have achieved state-of-the-art performance in many advanced computer
vision tasks, such as image recognition [25], object detection [26,27], and semantic
segmentation [28–30]. The main advantage of deep learning is that it does not rely on
expert-driven heuristic thresholds or hand-designed features and offers high accuracy
and robustness to image variations [31].
Unet [32], a typical semantic segmentation algorithm, has achieved great success in
medical image segmentation. There are many similarities between pavement crack
detection and medical image segmentation, so it is natural to apply Unet to
pavement crack segmentation.
The spatial-channel squeeze and excitation (scSE) [33] attention mechanism can enhance
important features while suppressing unimportant ones in both the spatial and channel
dimensions [34], which helps to improve semantic segmentation performance.
Inspired by Unet and scSE, this paper proposes a U-shaped encoder–decoder semantic
segmentation network for pavement crack detection that combines Unet with ResNet and
uses the scSE attention module to enhance crack detection performance.
The main contributions of this paper can be summarized as follows:
1. We modified Unet and proposed a residual U-shaped encoder–decoder semantic
segmentation network that combines Unet with ResNet18, named RUC-Net, which
achieved better detection results than the original Unet and other classical
segmentation algorithms, such as FCN [29] and SegNet [30].
2. We integrated the scSE attention mechanism into RUC-Net. This attention module
correlates the global information of cracks, effectively improving detection performance.
In addition, we experimentally compared how much detection performance improves
when various scSE attention module combinations are used in the encoder part
(downsampling stage) and the decoder part (upsampling stage).
3. We introduced the focal loss function, which reduces the weight of easy-to-classify
samples, to deal with the problem of class imbalance in crack segmentation.
The rest of the paper is organized as follows: Section 2 reviews the previous work
on pavement crack detection based on deep learning. Then, in Section 3, we describe
the network architecture of our model, loss function, and optimization method. Next,
in Section 4, we perform experimental verification and discuss our method. In addition,
we provide ablation studies on the scSE module and the focal loss parameter choice in
Section 5. Finally, in Section 6, we summarize our work and point out its limitations.
2. Related Work
2.1. Convolutional Neural Network-Based Method
With the tremendous success of deep learning methods in various computer vision
tasks, many deep convolutional neural network-based methods have been proposed for
road crack detection. According to the way the crack detection problem is handled, these
methods can be roughly divided into three categories, namely, pure image classification
methods, object detection-based methods, and pixel-level segmentation methods [35].
Sensors 2023, 23, 53 3 of 16
2.1.1. Classification
Some researchers have carried out image-level classification studies, which mainly
solve the problem of determining whether a road image contains cracks and, if so, what
type of cracks. Ma et al. [36] developed a deep learning method for road detection and
evaluation based on convolutional neural network, Fisher vector coding, and UnderBagging
random forest. Notably, they developed a way to create large-scale datasets of road images,
matching Google Street View maps with government inspectors’ ratings of road surfaces
on specific sections. However, this method can only determine whether the condition of
a road image is good, fair, or poor. Gopalakrishnan et al. proposed to use a pretrained
deep convolutional neural network model with transfer learning to automatically detect
pavement cracks [37]. Xu et al. proposed an end-to-end crack detection model based on a
convolutional neural network (CNN) with atrous convolution, the Atrous Spatial Pyramid
Pooling (ASPP) module, and depthwise separable convolution [38]. Although these methods
achieved good accuracy, none of them provided localization information for cracks.
The patchwise detection method, which divides the original pavement images into
many small patches, is adopted by more researchers due to its two advantages. First,
more data can be generated, and second, the localization information of cracks can be
obtained. Zhang et al. [39] proposed a six-layer CNN network with four convolutional
layers and two fully connected layers and used their convolutional neural network to train
99 × 99 × 3 small patches, which were split from 3264 × 2248 road images collected by low-
cost smartphones. The output of the network was the probability of whether a small patch
was a crack or not. Their study shows that deep CNNs are superior to traditional machine
learning techniques, such as SVM and boosting methods, in detecting pavement cracks.
Pauly et al. [40] used a self-designed CNN model to study the relationship between network
depth and network accuracy and proved the effectiveness of using a deeper network to
improve detection accuracy in pavement crack detection based on computer vision. In
contrast with [39], which used the same number of convolution kernels in all convolution
layers, Nguyen et al. [41] used a convolutional neural network with an increasing number
of convolution kernels in each layer because the features are more generic in the early
layers and more specific to the original dataset in later layers [42]. Eisenbach et al. [43]
presented the GAPs dataset, constructed a CNN with eight convolution layers and three
fully connected layers, and analyzed the effectiveness of state-of-the-art regularization
techniques. However, the network input size was 64 × 64 pixels, which was too small to
provide enough context information. The same problem also existed in [44–46].
Cha et al. [44] trained an eight-layer CNN and used sliding window technology to
detect concrete cracks. While the sliding window technology was helpful in locating the
crack, it was difficult to find the best size of the sliding window because the test images
may have had different sizes and scales.
Mandal et al. [48] used Yolo V2, and Hu et al. [49] used Yolo V5 for road crack detection.
Similar to patch-level classification, object detection can generate crack localization
information, but the important features of the cracks cannot be estimated from the
generated bounding boxes [50].
long-range dependencies of different parts in a crack image from local and global
perspectives. Qu et al. [64] proposed CrackT-net, a method for pavement crack
segmentation that combined a CNN with the transformer. The Swin Transformer module was
used as the last feature extraction layer to obtain better global information. Wang et al. [65]
put forward SegCrack, which adopted a hierarchical transformer as the encoder and employed
a top-down pathway with lateral connections as the decoder. Liu et al. [66] proposed a
crack transformer encoder–decoder structure, named CrackFormer, which introduced a
self-attention block and a scaling-attention block for fine-grained crack detection. These
transformer-based methods used cascaded self-attention modules to capture feature
dependencies over long distances, so as to obtain better global information.
3. Proposed Method
Unet was originally designed for biomedical image segmentation, such as cell image
segmentation and retinal image capillary segmentation. Although these biomedical image
training datasets are generally small, Unet still achieves good segmentation results. Due to
the high cost of data acquisition and labeling, crack segmentation datasets are
usually small too. In addition, there are similarities between the topological structures
of crack images and biomedical images. In view of these two points, the segmentation
tasks of crack images and biomedical images have strong similarities. Therefore, we
chose a Unet-based network for crack image segmentation.
To further improve the segmentation performance of Unet, we first introduced
residual modules in the downsampling path, which improve gradient propagation and
help to improve the generalization ability of the network. Second, we introduced the
scSE attention mechanism, which can enhance important features while suppressing
unimportant ones in both the spatial and channel dimensions, so as to improve
semantic segmentation performance.
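As an illustration of the scSE recalibration of Roy et al. [33], the following is a minimal NumPy sketch rather than the implementation used in RUC-Net. It treats a 1 × 1 convolution as a weighted sum over channels at each pixel. The function names, the omitted biases, the intermediate width `C_mid`, and the ReLU between the two channel convolutions are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def sse(x, w_spatial):
    """Spatial SE: squeeze channels with a 1x1 conv, excite every pixel.

    x: feature map [C, H, W]; w_spatial: 1x1 conv weights [C] (bias omitted).
    """
    # A 1x1 convolution to one channel is a weighted sum over C at each pixel.
    attention = sigmoid(np.tensordot(w_spatial, x, axes=([0], [0])))  # [H, W]
    return x * attention[None, :, :]


def cse(x, w1, w2):
    """Channel SE: global average pool, two 1x1 convs, excite every channel.

    w1: [C_mid, C] and w2: [C, C_mid]; C_mid and the ReLU are assumptions.
    """
    squeezed = x.mean(axis=(1, 2))            # [C], i.e. [C, 1, 1] flattened
    hidden = np.maximum(w1 @ squeezed, 0.0)   # first 1x1 conv + ReLU (assumed)
    attention = sigmoid(w2 @ hidden)          # second 1x1 conv + sigmoid, [C]
    return x * attention[:, None, None]


def scse(x, w_spatial, w1, w2):
    """Concurrent SE: add the spatially and channel-wise recalibrated maps."""
    return sse(x, w_spatial) + cse(x, w1, w2)


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8, 8))               # hypothetical 64-channel feature map
out = scse(x, rng.normal(size=64), rng.normal(size=(16, 64)), rng.normal(size=(64, 16)))
print(out.shape)  # (64, 8, 8)
```

The output keeps the shape of the input, so a block of this kind can be placed after any encoder or decoder stage without changing the rest of the network.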
The decoder part of RUC-Net was an expansive path, which upsampled the feature
map and restored its resolution step by step. The feature map obtained by each
upsampling step was linked by a skip connection to the feature map in the corresponding
downsampling path. This skip connection reused the image details that may
have been lost in the encoding layers and took into account both the global information
and the localization accuracy of the image, so that the decoding layers could reconstruct
image details more effectively [57].
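One decoder step with a skip connection can be sketched as follows. This is an illustrative NumPy example, not the RUC-Net code. The paper states that shallow and deep features are combined by concatenation; the nearest-neighbour upsampling, the function names, and the channel counts are assumptions.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a [C, H, W] map (assumed operator)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def decoder_step(deep, skip):
    """Upsample the deeper map and concatenate it with the encoder map."""
    up = upsample2x(deep)
    if up.shape[1:] != skip.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    # Channel-wise concatenation keeps fine encoder detail next to deep semantics.
    return np.concatenate([skip, up], axis=0)


deep = np.zeros((128, 16, 16))   # hypothetical deep feature map
skip = np.zeros((64, 32, 32))    # hypothetical encoder feature map at 2x resolution
print(decoder_step(deep, skip).shape)  # (192, 32, 32)
```

In a full network, convolution layers would follow the concatenation to fuse the two sources and reduce the channel count again.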
Figure 2. The details of the first residual downsample block and its subsequent links. The other
three residual blocks are similar, except that the number of channels and the size of the feature map
are different.
3.2. scSE Module
Roy et al. [33] proposed an scSE module, which had three variants: sSE (‘squeezes’
along the channels and ‘excites’ spatially), cSE (‘squeezes’ along the spatial domain and
‘excites’ along the channels), and scSE (concurrent sSE and cSE). Details of their structure
can be found in the original article, and their principles are briefly described below.
• The sSE module. The original feature map was changed from [C, H, W] to [1, H, W]
via a 1 × 1 convolution, then activated by a sigmoid to obtain the spatial attention
map, which was applied to the original feature map to recalibrate the spatial
information.
• The cSE module. The feature map was first changed from [C, H, W] to [C, 1, 1] by
global average pooling, then converted to a C-dimension vector after twice performing
1 × 1 convolution operations. This vector was normalized by a sigmoid and was
channelwise multiplied with the original feature map to obtain a feature map recalibrated
by channel information.
• The scSE module. The scSE was the combination of the sSE and cSE modules, which
was essentially the parallel connection of the two modules. Specifically, after the
feature map was operated through the sSE and cSE modules, we added up the two
outputs to recalibrate the feature map both spatially and channelwise.
In this paper, we discuss the influence of various scSE modules or their combinations
on the performance of crack detection in the downsampling and upsampling stages. The
details are presented in Section 5.
3.3. Loss Function
The loss function is a core component of deep learning methods that was used for
measuring the deviation between the predicted values and the true values of models and
usually served as an objective function of the model optimization. The essence of crack
segmentation is to classify each pixel of the pavement image containing cracks as cracks or
background. It is worth noting that compared with the pavement background, the cracked
pixels only accounted for a small proportion of the whole pavement image. To solve this
serious class imbalance problem, we chose focal loss [67] as the loss function. Focal loss
was modified based on standard cross-entropy loss. It introduced two penalty factors
to reduce the weight of easy-to-classify samples, which made the model focus more on
difficult-to-classify samples in the training process. The focal loss could be expressed as
\[
FL(p, \hat{p}) = -\left[ \alpha (1 - \hat{p})^{\gamma}\, p \log(\hat{p}) + (1 - \alpha)\, \hat{p}^{\gamma} (1 - p) \log(1 - \hat{p}) \right] \tag{1}
\]
where p is the ground-truth label of a pixel and p̂ is its predicted crack probability;
α and (1 − α) weight the positive and negative samples, respectively, with α in [0, 1].
The parameter γ is called the focusing parameter, and its value range is [0, +∞).
When γ = 0, focal loss degenerates into the α-weighted cross-entropy loss, and the
larger γ is, the more strongly the easy-to-classify samples are down-weighted.
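The behaviour of Equation (1) can be checked with a small per-pixel sketch in plain Python. It is illustrative only: the function name and the clamping constant `eps` are assumptions, while the defaults α = 0.6 and γ = 1.5 are the combination reported as best in the parameter study.

```python
import math


def focal_loss(p, p_hat, alpha=0.6, gamma=1.5, eps=1e-7):
    """Per-pixel focal loss for label p in {0, 1} and predicted probability p_hat."""
    p_hat = min(max(p_hat, eps), 1.0 - eps)  # clamp to keep log() finite (assumption)
    crack_term = alpha * (1.0 - p_hat) ** gamma * p * math.log(p_hat)
    background_term = (1.0 - alpha) * p_hat ** gamma * (1.0 - p) * math.log(1.0 - p_hat)
    return -(crack_term + background_term)


easy = focal_loss(1, 0.9)   # crack pixel predicted confidently
hard = focal_loss(1, 0.3)   # crack pixel predicted poorly
print(easy < hard)          # True
```

With γ = 0 the modulating factors equal 1, so only the α weighting of the cross-entropy terms remains; in training, the per-pixel values would be averaged over the image or batch.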
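\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, \nabla_\theta J(\theta) \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, (\nabla_\theta J(\theta))^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{t}}
\end{aligned} \tag{2}
\]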
where β 1 and β 2 represent the exponential decay rates of first-order moment estimation
and second-order moment estimation, which are set to 0.9 and 0.99, respectively; t is the
index of iterations; α represents the learning rate; mt and vt represent exponential moving
averages of the first-order and second-order moments of the gradient, respectively; and m̂t
and v̂t are the unbiased values of mt and vt , respectively. θ represents the network model
parameters that need to be updated by learning [59].
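The moment estimates in Equation (2) can be written out for a single scalar parameter as below. This is a sketch for illustration: the final parameter update is the standard Adam step, which is not spelled out in Equation (2), and the learning rate and `eps` values are assumptions. The decay rates 0.9 and 0.99 are the ones stated in the text.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based iteration index."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment moving average
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment moving average
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    # Standard Adam parameter update (assumed; not part of Equation (2)).
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step on J(theta) = theta^2 starting from theta = 1 (gradient 2 * theta).
theta, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1, lr=0.1)
print(round(theta, 6))  # 0.9
```

At t = 1 the bias correction makes the corrected moments equal to the gradient and its square, so the first step has a magnitude of about the learning rate regardless of the gradient scale.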
4.2. Datasets
We evaluated our methods using three public datasets: CFD, Crack500, and DeepCrack.
The following is a brief description of them.
The CFD dataset, published in [23], consists of 118 RGB images with a resolution of
480 × 320 pixels. All of the images were taken using an iPhone5 smartphone on the road in
Beijing, China, and can roughly reflect the existing urban road conditions in Beijing. These
crack images have uneven illumination and contain noise such as shadows, oil spots, and
lane lines, and most cracks in these images are thin cracks, which make crack detection
difficult. We randomly divided 70% of the dataset (82 images) for training and 30% of the
dataset (36 images) for testing.
The Crack500 dataset, shared by Yang et al. in the literature [60], contains 500 original
images with a resolution of 2560 × 1440 collected at the main campus of Temple University.
Each original image was cropped into a non-overlapping image area of 640 × 360, resulting
in 1896 training images, 348 validation images, and 1123 test images. These images are
characterized by low contrast between cracks and background, as well as noise such as oil
pollution and occlusions, which increase the difficulty of detection.
The DeepCrack dataset [2] contains 537 crack images, including both concrete pave-
ment and asphalt pavement, with complex background and various crack widths, ranging
from 1 pixel to 180 pixels. We kept the same data split as the original paper, with 300 images
for training and 237 images for testing.
We randomly applied data augmentations to each image during training; the main
methods included random vertical or horizontal flipping, random brightness and contrast
changes, random scaling, and rotation.
Table 1. Confusion matrix of the predicted classes and the ground truth classes.
                          Predicted: Crack       Predicted: No crack
Ground truth: Crack       True positive (TP)     False negative (FN)
Ground truth: No crack    False positive (FP)    True negative (TN)
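\[ Pr = \frac{TP}{TP + FP} \tag{3} \]
\[ Re = \frac{TP}{TP + FN} \tag{4} \]
\[ F1 = \frac{2 \times Pr \times Re}{Pr + Re} \tag{5} \]
\[ IoU = \frac{|\mathrm{GroundTruth} \cap \mathrm{Prediction}|}{|\mathrm{GroundTruth} \cup \mathrm{Prediction}|} \tag{6} \]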
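Equations (3) to (6) can be computed from flattened binary masks as in the following sketch. The function name and the convention of returning 0 when a denominator is empty are assumptions made for the example; for binary masks the IoU reduces to TP / (TP + FP + FN).

```python
def crack_metrics(pred, truth):
    """Return (Pr, Re, F1, IoU) for two equal-length sequences of 0/1 pixel labels."""
    tp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, truth) if p == 0 and g == 1)
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    # Intersection = TP; union = TP + FP + FN.
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return pr, re, f1, iou


pred = [1, 1, 0, 0, 1]    # hypothetical predicted mask, flattened
truth = [1, 0, 0, 1, 1]   # hypothetical ground truth mask, flattened
pr, re, f1, iou = crack_metrics(pred, truth)
print(iou)  # 0.5
```

True negatives do not enter any of the four metrics, which is why they stay informative when background pixels dominate the image.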
Table 4. The Pr, Re, F1, and IoU of compared methods for the DeepCrack dataset.
Methods Pr Re F1 IoU
FCN 0.8600 0.7737 0.8146 0.6871
SegNet 0.8632 0.7954 0.8279 0.7064
Unet 0.8810 0.7829 0.8291 0.7080
TransUnet 0.8730 0.7976 0.8336 0.7147
Ours 0.8833 0.8120 0.8461 0.7333
2.5, α being 0.6 achieved the best results. As far as the average value of F1 scores under
different α values was concerned, γ being 1.5 was superior to γ being 2 or 2.5. Obviously,
the best parameter combination was γ being 1.5 and α being 0.6. This was exactly the
parameter combination used in the previous experiments in this paper.
Table 5. The differences of various scSE modules and their combinations in improving the
performance of crack detection, taking CFD as an example.
Methods Pr Re F1 IoU
RUC-Net 0.7136 0.7633 0.7375 0.5842
RUC-Net+downcSE * 0.7055 0.7596 0.7315 0.5767
RUC-Net+downsSE 0.7092 0.7699 0.7383 0.5851
RUC-Net+upsSE 0.7135 0.7643 0.7381 0.5849
RUC-Net+upcSE 0.7122 0.7676 0.7388 0.5858
RUC-Net+downscSE 0.7099 0.7691 0.7383 0.5852
RUC-Net+upscSE 0.7160 0.7657 0.7398 0.5871
RUC-Net+fullscSE 0.7064 0.7758 0.7395 0.5866
* The downcSE represents using only the cSE module in the downsampling stage, the upsSE represents using
only the sSE module in the upsampling stage, and so on, while the fullscSE represents using the scSE module in both
the upsampling stage and the downsampling stage.
Parameter Combination
γ      α      Pr       Re       F1       IoU
1.5    0.5    0.7353   0.7347   0.7349   0.5809
1.5    0.6    0.7160   0.7657   0.7398   0.5871
1.5    0.7    0.7017   0.7747   0.7359   0.5822
1.5    0.8    0.6704   0.8058   0.7318   0.5770
2      0.5    0.7347   0.7289   0.7316   0.5768
2      0.6    0.7027   0.7776   0.7381   0.5850
2      0.7    0.6840   0.7987   0.7369   0.5834
2      0.8    0.6697   0.7999   0.7284   0.5729
2.5    0.5    0.7337   0.7293   0.7315   0.5767
2.5    0.6    0.7062   0.7748   0.7389   0.5859
2.5    0.7    0.6867   0.7924   0.7369   0.5834
2.5    0.8    0.6805   0.7825   0.7279   0.5722
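The comparison of average F1 scores across γ can be reproduced from the table values with a few lines of Python. The dictionary layout and variable names are chosen for this example; the numbers are copied from the F1 column above, in the order α = 0.5, 0.6, 0.7, 0.8.

```python
# F1 scores from the parameter study, keyed by gamma.
f1_scores = {
    1.5: [0.7349, 0.7398, 0.7359, 0.7318],
    2.0: [0.7316, 0.7381, 0.7369, 0.7284],
    2.5: [0.7315, 0.7389, 0.7369, 0.7279],
}

# Average F1 over the four alpha values for each gamma.
mean_f1 = {gamma: sum(scores) / len(scores) for gamma, scores in f1_scores.items()}
best_gamma = max(mean_f1, key=mean_f1.get)

for gamma, value in mean_f1.items():
    print(gamma, round(value, 4))
print(best_gamma)  # 1.5
```

The averages come out at roughly 0.7356, 0.7338, and 0.7338 for γ = 1.5, 2, and 2.5, which agrees with the statement that γ = 1.5 gives the best average F1 score.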
6. Conclusions
In this paper, RUC-Net was proposed for pixel-level pavement crack segmentation.
The architecture of RUC-Net was a U-shaped encoder–decoder network combining Unet
and Resnet. The residual block in ResNet was used to replace the two 3 × 3 convolution
layers in the encoder of original Unet, so as to extract more precise crack feature information.
In the decoder network part, RUC-Net combined local information in shallow layers
and semantic information in deep layers through concatenating to obtain more refined
segmentation effects. In addition, we introduced the scSE attention module to enhance
important information features while suppressing unimportant information features in
space and channels, so as to further improve the crack segmentation effect. The focal loss
function was used to deal with the class imbalance problem in crack segmentation. Our
approach achieved an F1 score of 73.92% for the CFD dataset, 72.9% for the Crack500
dataset, and 84.61% for the DeepCrack dataset, outperforming FCN, Unet, and SegNet.
One limitation of this research is that training our algorithm still requires manual
annotation of every pixel of the ground truth image, which makes data acquisition expensive.
Adopting unsupervised learning-based techniques is one research direction for mitigating
this issue. Because a supervised learning algorithm fits a function that approximates the
given labeled training data, its actual performance largely depends on the size and quality
of the training dataset. Establishing a larger, more diverse, and high-quality dataset and
fully investigating data augmentation techniques are therefore also directions we need
to work on.
Author Contributions: Conceptualization, G.Y.; methodology, G.Y.; software, G.Y. and Y.W.;
validation, G.Y. and J.D.; formal analysis, G.Y.; investigation, G.Y. and J.D.; resources, G.Y. and J.D.; data
curation, G.Y. and Y.W.; writing—original draft preparation, G.Y.; writing—review and editing, G.Y.
and X.Z.; visualization, G.Y.; supervision, X.Z.; project administration, X.Z.; funding acquisition, X.Z.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (grant no.
51827812 and 51778509).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Please contact Gui Yu (yugui@hgnu.edu.cn).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zakeri, H.; Nejad, F.M.; Fahimifar, A. Image Based Techniques for Crack Detection, Classification and Quantification in Asphalt
Pavement: A Review. Arch. Comput. Methods Eng. 2017, 24, 935–977. [CrossRef]
2. Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation.
Neurocomputing 2019, 338, 139–153. [CrossRef]
3. Fan, Z.; Wu, Y.; Lu, J.; Li, W. Automatic Pavement Crack Detection Based on Structured Prediction with the Convolutional Neural
Network. arXiv 2018, arXiv:1802.02208.
4. Oliveira, H.; Correia, P.L. Automatic Road Crack Segmentation Using Entropy and Image Dynamic Thresholding. In Proceedings
of the 2009 17th European Signal Processing Conference, Glasgow, UK, 24–28 August 2009; pp. 622–626. [CrossRef]
5. Li, P.; Wang, C.; Li, S.; Feng, B. Research on Crack Detection Method of Airport Runway Based on Twice-Threshold Segmentation.
In Proceedings of the 5th International Conference on Instrumentation and Measurement, Computer, Communication and
Control (IMCCC), Qinhuangdao, China, 18–20 September 2015; pp. 1716–1720. [CrossRef]
6. Tsai, Y.C.; Kaul, V.; Mersereau, R.M. Critical Assessment of Pavement Distress Segmentation Methods. J. Transp. Eng. 2010, 136,
11–19. [CrossRef]
7. Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of Edge-Detection Techniques for Crack Identification in Bridges. J. Comput.
Civ. Eng. 2003, 17, 255–263. [CrossRef]
8. Santhi, B.; Krishnamurthy, G.; Siddharth, S.; Ramakrishnan, P.K. Automatic Detection of Cracks in Pavements Using Edge
Detection Operator. J. Theor. Appl. Inf. Technol. 2012, 36, 199–205.
9. Nisanth, A.; Mathew, A. Automated Visual Inspection of Pavement Crack Detection and Characterization. Int. J. Technol. Eng.
Syst. 2014, 6, 14–20.
10. Yeum, C.M.; Dyke, S.J. Vision-Based Automated Crack Detection for Bridge Inspection. Comput. Civ. Infrastruct. Eng. 2015, 30,
759–770. [CrossRef]
11. Cheng, H.D.; Chen, J.R.; Glazier, C.; Hu, Y.G. Novel Approach to Pavement Cracking Detection Based on Fuzzy Set Theory. J.
Comput. Civ. Eng. 1999, 13, 270–280. [CrossRef]
12. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement Using Fully
Convolutional Network. Comput. Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [CrossRef]
13. Zhou, J. Wavelet-Based Pavement Distress Detection and Evaluation. Opt. Eng. 2006, 45, 027007. [CrossRef]
14. Wu, S.; Liu, Y. A Segment Algorithm for Crack Dection. In Proceedings of the 2012 IEEE Symposium on Electrical & Electronics
Engineering (EEESYM), Kuala Lumpur, Malaysia, 24–27 June 2012; pp. 674–677. [CrossRef]
15. Nguyen, T.S.; Begot, S.; Duculty, F.; Avila, M. Free-Form Anisotropy: A New Method for Crack Detection on Pavement Surface
Images. In Proceedings of the IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011.
16. Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic Crack Detection on Two-Dimensional Pavement Images: An Algorithm
Based on Minimal Path Selection. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2718–2729. [CrossRef]
17. Zou, Q.; Zhang, Z.; Li, Q.; Qi, X.; Wang, Q.; Wang, S. DeepCrack: Learning Hierarchical Convolutional Features for Crack
Detection. IEEE Trans. Image Process. 2019, 28, 1498–1512. [CrossRef] [PubMed]
18. Lee, B.J.; Lee, H.D. Position-Invariant Neural Network for Digital Pavement Crack Analysis. Comput. Civ. Infrastruct. Eng. 2004,
19, 105–118. [CrossRef]
19. Moon, H.G.; Kim, J.H. Inteligent Crack Detecting Algorithm on the Concrete Crack Image Using Neural Network. In Proceedings
of the 28th International Symposium on Automation and Robotics in Construction (ISARC), Seoul, Republic of Korea, 29 June–2
July 2011; pp. 1461–1467. [CrossRef]
20. Gavilán, M.; Balcones, D.; Marcos, O.; Llorca, D.F.; Sotelo, M.A.; Parra, I.; Ocaña, M.; Aliseda, P.; Yarza, P.; Amírola, A. Adaptive
Road Crack Detection System by Pavement Classification. Sensors 2011, 11, 9628–9657. [CrossRef]
21. O’Byrne, M.; Schoefs, F.; Ghosh, B.; Pakrashi, V. Texture Analysis Based Damage Detection of Ageing Infrastructural Elements.
Comput. Civ. Infrastruct. Eng. 2013, 28, 162–177. [CrossRef]
22. Cha, Y.J.; You, K.; Choi, W. Vision-Based Detection of Loosened Bolts Using the Hough Transform and Support Vector Machines.
Autom. Constr. 2016, 71, 181–188. [CrossRef]
23. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road Crack Detection Using Random Structured Forests. IEEE Trans. Intell.
Transp. Syst. 2016, 17, 3434–3445. [CrossRef]
24. Cord, A.; Chambon, S. Automatic Road Defect Detection by Textural Pattern Recognition Based on AdaBoost. Comput. Civ.
Infrastruct. Eng. 2012, 27, 244–259. [CrossRef]
25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM
2017, 60, 84–90. [CrossRef]
26. Girshick, R.B. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile,
7–13 December 2015; pp. 1440–1448.
27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE
Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [CrossRef] [PubMed]
28. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep
Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
[CrossRef] [PubMed]
29. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 2017, 39, 640–651. [CrossRef] [PubMed]
30. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [CrossRef]
31. Alipour, M.; Harris, D.K.; Miller, G.R. Robust Pixel-Level Crack Detection Using Deep Fully Convolutional Neural Networks. J.
Comput. Civ. Eng. 2019, 33, 04019040. [CrossRef]
32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015,
arXiv:1505.04597.
33. Roy, A.G.; Navab, N.; Wachinger, C. Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks.
arXiv 2018, arXiv:1803.02579v2.
34. Qiao, W.; Liu, Q.; Wu, X.; Ma, B.; Li, G. Automatic Pixel-Level Pavement Crack Recognition Using a Deep Feature Aggregation
Segmentation Network with a Scse Attention Mechanism Module. Sensors 2021, 21, 2902. [CrossRef]
35. Cao, W.; Liu, Q.; He, Z. Review of Pavement Defect Detection Methods. IEEE Access 2020, 8, 14531–14544. [CrossRef]
36. Ma, K.; Hoai, M.; Samaras, D. Large-Scale Continual Road Inspection: Visual Infrastructure Assessment in the Wild. In
Proceedings of the British Machine Vision Conference 2017 (BMVC), London, UK, 4–7 September 2017. [CrossRef]
37. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep Convolutional Neural Networks with Transfer Learning for
Computer Vision-Based Data-Driven Pavement Distress Detection. Constr. Build. Mater. 2017, 157, 322–330. [CrossRef]
38. Xu, H.; Su, X.; Xu, H.; Li, H. Autonomous Bridge Crack Detection Using Deep Convolutional Neural Networks. In Proceedings of
the 3rd International Conference on Computer Engineering, Information Science & Application Technology, Chongqing, China,
30–31 May 2019. [CrossRef]
39. Zhang, L.; Yang, F.; Daniel Zhang, Y.; Zhu, Y.J. Road Crack Detection Using Deep Convolutional Neural Network. In Proceedings
of the International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712. [CrossRef]
40. Pauly, L.; Peel, H.; Luo, S.; Hogg, D.; Fuentes, R. Deeper Networks for Pavement Crack Detection. In Proceedings of the 34th
International Symposium on Automation and Robotics in Construction and Mining (ISARC), Taipei, Taiwan, 28 June–1 July 2017;
pp. 479–485. [CrossRef]
41. Nguyen, N.T.H.; Le, T.H.; Perry, S.; Nguyen, T.T. Pavement Crack Detection Using Convolutional Neural Network. ACM Int.
Conf. Proceeding Ser. 2018, 251–256. [CrossRef]
42. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable Are Features in Deep Neural Networks? In Proceedings of the
27th International Conference on Neural Information Processing Systems, Montreal, Canada, 8–13 December 2014; MIT Press:
Cambridge, MA, USA, 2014; Volume 2, pp. 3320–3328.
43. Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.M. How to
Get Pavement Distress Detection Ready for Deep Learning? A Systematic Approach. In Proceedings of the 2017 International
Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2039–2047. [CrossRef]
44. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks.
Comput. Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
45. Nie, M.; Wang, K. Pavement Distress Detection Based on Transfer Learning. In Proceedings of the 2018 5th International
Conference on Systems and Informatics (ICSAI), Nanjing, China, 10–12 November 2018; pp. 435–439.
46. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous Structural Visual Inspection Using Region-Based
Deep Learning for Detecting Multiple Damage Types. Comput. Civ. Infrastruct. Eng. 2018, 33, 731–747. [CrossRef]
47. Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata, H. Road Damage Detection and Classification Using Deep Neural
Networks with Smartphone Images. Comput. Civ. Infrastruct. Eng. 2018, 33, 1127–1141. [CrossRef]
48. Mandal, V.; Uong, L.; Adu-Gyamfi, Y. Automated Road Crack Detection Using Deep Convolutional Neural Networks. In
Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018;
pp. 5212–5215. [CrossRef]
49. Hu, G.X.; Hu, B.L.; Yang, Z.; Huang, L.; Li, P. Pavement Crack Detection Method Based on Deep Learning Models. Wirel. Commun.
Mob. Comput. 2021, 2021, 1–13. [CrossRef]
50. Hsieh, Y.-A.; Tsai, Y.J. Machine Learning for Crack Detection: Review and Model Performance Comparison. J. Comput. Civ. Eng.
2020, 34, 04020038. [CrossRef]
51. Huyan, J.; Li, W.; Tighe, S.; Xu, Z.; Zhai, J. CrackU-Net: A Novel Deep Convolutional Neural Network for Pixelwise Pavement
Crack Detection. Struct. Control Health Monit. 2020, 27, e2551. [CrossRef]
52. Zhang, A.; Wang, K.C.P.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated Pixel-Level Pavement Crack
Detection on 3D Asphalt Surfaces Using a Deep-Learning Network. Comput. Civ. Infrastruct. Eng. 2017, 32, 805–819. [CrossRef]
53. Fei, Y.; Wang, K.C.P.; Zhang, A.; Chen, C.; Li, J.Q.; Liu, Y.; Yang, G.; Li, B. Pixel-Level Cracking Detection on 3D Asphalt Pavement
Images through Deep-Learning-Based CrackNet-V. IEEE Trans. Intell. Transp. Syst. 2020, 21, 273–284. [CrossRef]
54. Huang, H.-W.; Li, Q.-T.; Zhang, D.-M. Deep Learning Based Image Recognition for Crack and Leakage Defects of Metro Shield
Tunnel. Tunn. Undergr. Sp. Technol. 2018, 77, 166–176. [CrossRef]
55. Li, S.; Zhao, X.; Zhou, G. Automatic Pixel-Level Multiple Damage Detection of Concrete Structure Using Fully Convolutional
Network. Comput. Civ. Infrastruct. Eng. 2019, 34, 616–634. [CrossRef]
56. Cheng, J.; Xiong, W.; Chen, W.; Gu, Y.; Li, Y. Pixel-Level Crack Detection Using U-Net. In Proceedings of the IEEE Region 10
Annual International Conference TENCON 2018, Jeju, Republic of Korea, 28–31 October 2018; pp. 462–466. [CrossRef]
57. Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A Deep Convolutional Neural Network for Semantic Pixel-Wise
Segmentation of Road and Pavement Surface Cracks. In Proceedings of the 2018 26th European Signal Processing Conference
(EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 2120–2124. [CrossRef]
58. Lau, S.L.H.; Chong, E.K.P.; Yang, X.; Wang, X. Automated Pavement Crack Segmentation Using U-Net-Based Convolutional
Neural Network. IEEE Access 2020, 8, 114892–114899. [CrossRef]
59. Bang, S.; Park, S.; Kim, H.; Kim, H. Encoder–Decoder Network for Pixel-Level Road Crack Detection in Black-Box Images. Comput.
Civ. Infrastruct. Eng. 2019, 34, 713–727. [CrossRef]
60. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement
Crack Detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535. [CrossRef]
61. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using
Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17
October 2021; pp. 9992–10002. [CrossRef]
62. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking Semantic
Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6877–6886. [CrossRef]
63. Ju, X.; Zhao, X.; Qian, S. TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection. Mathematics 2022, 10, 2354.
[CrossRef]
64. Qu, Z.; Li, Y.; Zhou, Q. CrackT-Net: A Method of Convolutional Neural Network and Transformer for Crack Segmentation. J.
Electron. Imaging 2022, 31, 023040. [CrossRef]
65. Wang, W.; Su, C. Automatic Concrete Crack Segmentation Model Based on Transformer. Autom. Constr. 2022, 139, 104275.
[CrossRef]
66. Liu, H.; Miao, X.; Mertz, C.; Xu, C.; Kong, H. CrackFormer: Transformer Network for Fine-Grained Crack Detection. In
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021;
pp. 3763–3772. [CrossRef]
67. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell.
2020, 42, 318–327. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.