Article

Deep Learning for Pavement Condition Evaluation Using Satellite Imagery

by Prathyush Kumar Reddy Lebaku 1, Lu Gao 1,*, Pan Lu 2 and Jingran Sun 3

1 Department of Construction Management, University of Houston, Houston, TX 77004, USA
2 College of Business, North Dakota State University, Fargo, ND 58105, USA
3 Center for Transportation Research, The University of Texas at Austin, Austin, TX 78712, USA
* Author to whom correspondence should be addressed.
Infrastructures 2024, 9(9), 155; https://doi.org/10.3390/infrastructures9090155
Submission received: 6 August 2024 / Revised: 1 September 2024 / Accepted: 6 September 2024 / Published: 9 September 2024
(This article belongs to the Special Issue Pavement Design and Pavement Management)

Abstract:
Civil infrastructure systems cover large land areas and need frequent inspections to maintain their public service capabilities. Conventional approaches to assessing infrastructure conditions, whether manual surveys or vehicle-based automated surveys, are often labor-intensive and time-consuming. For this reason, it is worthwhile to explore more cost-effective methods for monitoring and maintaining these infrastructures. Fortunately, recent advancements in satellite systems and image processing algorithms have opened up new possibilities. Numerous satellite systems have been employed to monitor infrastructure conditions and identify damage. Due to improvements in ground sample distance (GSD), the level of detail that can be captured has significantly increased. Taking advantage of these technological advancements, this research evaluated pavement conditions using deep learning models to analyze satellite images. We gathered over 3000 satellite images of pavement sections, together with pavement evaluation ratings from TxDOT's PMIS database. The results of our study show an accuracy rate exceeding 90%. This research paves the way for a rapid and cost-effective approach to evaluating the pavement network in the future.

1. Introduction

Civil infrastructure encompasses essential elements such as roads, bridges, buildings, and other physical structures and systems that are crucial for society's seamless functioning and progress. These components, which differ in their applications and objectives, play a vital role and must be engineered to last for an extended period. Therefore, the management of civil infrastructure is critical for governing its planning, design, construction, maintenance, and operation.
Managing infrastructure presents a range of challenges, notably in detecting failures and ensuring timely maintenance and rehabilitation. Insights from infrastructure management agencies show that early detection and preventative actions significantly enhance the longevity of infrastructure and cut down on maintenance expenses. The adoption of large-scale monitoring tools is essential for efficient problem detection. Traditionally, damage assessments and evaluations required extensive fieldwork by inspection teams, which is both time-consuming and labor-intensive. However, advances in technology have introduced an array of efficient tools for damage detection, reducing the reliance on manual inspections and facilitating rapid identification of wear and tear. Notably, the analysis of satellite images has become a promising method for monitoring the condition and utilization of infrastructure. Recent progress in satellite technology, image analysis techniques, and computer vision has expanded the possibilities for infrastructure management.
To take advantage of these recent technological advancements, this research explores the application of satellite image processing to pavement condition assessment using deep learning models. One contribution of this research is the innovative application of satellite imagery to evaluate pavement condition, offering a cost-effective and scalable alternative to traditional surveys conducted manually or with vehicles. To enhance the accuracy of pavement condition classification, the research utilizes various pre-trained deep learning models and an ensemble model, which combines predictions from multiple top-performing models and achieves an accuracy of 93% and an F1 score of 0.93. These findings highlight the potential of combining satellite imagery with deep learning for network-level pavement assessments, an approach that could greatly reduce the need for resource-intensive on-site inspections.
The rest of this paper is organized as follows: Section 2 reviews the current models in pavement inspection and assessment, Section 3 presents the proposed method, Section 4 applies and evaluates the performance of the proposed method, and Section 5 summarizes the findings.

2. Literature Review

Pavement management systems play a crucial role in facilitating effective decision-making regarding pavement evaluation and maintenance. Pavement inspections and assessments have also been made easier by the introduction of various machine learning algorithms. Cha et al. [1] introduced a deep convolutional neural network architecture using images taken with hand-held cameras for concrete crack detection, achieving an impressive 98.22% accuracy. Pan et al. [2] employed UAV-captured pavement images and applied machine learning methods, attaining accuracies of 98.78% with support vector machines, 98.46% with artificial neural networks, and 98.83% with the random forest approach in identifying pavement deformations such as cracks and potholes. Fan et al. [3] demonstrated the superiority of deep learning over conventional machine learning and image processing methods, particularly through a CNN-based strategy for crack detection. Chitale et al. [4] utilized the deep learning algorithm YOLO (You Only Look Once) to identify deformities in the pavement, and it proved effective in detecting potholes and estimating their size. Fan et al. [5] went further by proposing a CNN ensemble without pooling layers, based on probability fusion; when tested on two public crack databases, this method surpassed traditional approaches with over 90% precision and recall and an F1 score exceeding 0.9, showcasing the capability of deep learning to enhance pavement inspection and assessment methodologies. Similarly, Ji et al. [6] exploited UAV-gathered images, employing a Deep Convolutional Neural Network (DCNN) to identify cracks and accurately measure their width and location for a detailed risk assessment. Maniat et al. [7] explored the assessment of pavement quality using Google Street View (GSV) images analyzed through a Convolutional Neural Network (CNN). Ahmadi et al. [8] applied different machine learning techniques, including a neural network, K-Nearest Neighbors, Support Vector Machine, Decision Tree, a hybrid model, and bagged trees, for crack detection in asphalt. Sholevar et al. [9] showed in their review paper that machine and deep learning models significantly outperform traditional methods in pavement condition assessments, delivering faster, more accurate, and more adaptable results. Jiang et al. [10] introduced an innovative method for the swift inspection of urban road cracks by harnessing aerial imagery in conjunction with deep learning techniques for 3D reconstruction and crack segmentation; the team employed a deep learning-supported segmentation network for efficient image processing, alongside an improved U-Net model to ensure precise crack detection and analysis. Gagliardi et al. [11] employed Deep Neural Network (DNN) algorithms, including YOLO v7 and U-Net, for object detection and semantic segmentation, enhancing the accuracy of identifying pavement damages and assessing their severity.
Over the past decade, the research community has explored utilizing satellites for pavement management. For instance, Haider et al. [12] proposed a method using satellite images for pavement monitoring, envisioning improved highway maintenance and reduced reliance on traditional inspections. Fagrhi et al. [13] reviewed various applications employing satellite imagery, including pavement management, analyzing historical data to detect deformations and assess deformation speed in highways, railways, and pavement irregularities. Li et al. [14] conducted a cost–benefit analysis of pavement monitoring activities to study the financial aspects of this technology. Recently, the emergence of deep learning-based image processing methods has driven additional research into the efficacy of using satellite imagery to monitor pavement surface conditions. Brewer et al. [15] summarized the use of high-resolution satellite imagery with a CNN to determine road quality, achieving an accuracy rate of 80%. Bashar and Torres-Machi [16] highlighted the cost-effective and rapid nature of using satellite imagery to evaluate road conditions; they conducted assessments by analyzing spectral and texture information derived from satellite imagery, successfully identifying pavement distresses. Jiang et al. [17] proposed a pavement evaluation tool that uses a Deep Convolutional Neural Network to segment highway images from Google Earth and detect longitudinal and transverse cracking. Karimzadeh et al. [18] explored the application of remote sensing and deep learning in evaluating road surface conditions; utilizing Synthetic Aperture Radar (SAR) for quick data gathering and deep learning for detailed analysis, their research successfully forecasted the state of roads before and after the 2016 Kumamoto earthquake, reaching a precision level of 87%.
Based on previous studies, this research aims to investigate the capability of using satellite imagery to monitor the condition of pavements by linking satellite imagery data to pavement performance assessments provided by highway authorities. For this purpose, we accessed satellite imagery through the National Oceanic and Atmospheric Administration (NOAA). This imagery, sourced from the National Geodetic Survey’s emergency database, offers a vast collection of high-resolution, geographically pinpointed images from significant storms and disasters over time. In a similar manner, the data on pavement condition assessments were acquired from the Texas Department of Transportation (TxDOT).

3. Methodology

In this study, we investigated pre-trained models, utilizing various well-established architectures and also employed an ensemble learning strategy, combining predictions from a select few top-performing models to enhance the overall performance. Figure 1 illustrates the workflow of our method, which consisted of four main stages.
  • Data preprocessing: We divided the dataset into training and test sets in an 8:2 ratio. For the training dataset, we used an oversampling strategy to create a balanced dataset with an equal distribution across the five condition categories;
  • Individual transfer learning models: To obtain suitable networks, we employed various pre-trained models, such as VGG19, ResNet50, InceptionV3, DenseNet121, InceptionResNetV2, MobileNet, MobileNetV2, and EfficientNetB0. After evaluating their performance, we selected the four top models as the base classifiers for the subsequent steps;
  • Ensemble learning: We used a weighted voting method to further enhance classification performance, with the weight for each model determined by its accuracy;
  • Performance evaluation: To measure the overall classification ability of the algorithm, we used evaluation indicators such as accuracy, precision, recall, and F1 score.
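The oversampling step in the preprocessing stage can be sketched as follows. `oversample_balanced` is a hypothetical helper (the paper does not publish its resampling code) that duplicates minority-class samples until every class matches the size of the largest one:

```python
import numpy as np

def oversample_balanced(labels, rng=None):
    """Return indices that oversample every class up to the size of the
    largest class, yielding a balanced training set.

    `labels` is a 1-D array of class ids (e.g. 0..4 for the five PMIS
    condition states).
    """
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # sample with replacement so minority classes reach `target`
        picked.append(rng.choice(idx, size=target, replace=True))
    return np.concatenate(picked)

# Toy example: 50 / 10 / 5 samples across three classes
labels = np.array([0] * 50 + [1] * 10 + [2] * 5)
idx = oversample_balanced(labels)
counts = np.bincount(labels[idx])   # every class now has 50 samples
```

Sampling with replacement is the simplest balancing choice; image augmentation (Section 4.2) then keeps the duplicated samples from being pixel-identical.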

3.1. Pre-Trained Models

Pre-trained models represent a powerful application of transfer learning within deep neural networks, particularly for image processing tasks. This technique involves utilizing a deep neural network trained on one dataset and transferring its learned weights to a new, related dataset. This approach expedites model training and enhances performance, especially when dealing with limited data.
The success of pre-trained models stems from the hierarchical nature of Convolutional Neural Networks (CNNs). Early layers capture low-level, generic features such as edges, textures, and shapes, which are widely applicable across different datasets. In contrast, later layers learn high-level features specific to the original dataset. Leveraging a pre-trained model allows these generic features to be transferred to a target dataset, enabling the model to adapt quickly to new tasks and conserving computational resources and time compared to training from scratch.
For our image processing project, we evaluated 16 popular pre-trained models, including ResNet50, VGG19, MobileNet, MobileNetV2, EfficientNetB0, DenseNet121, InceptionResNetV2, and InceptionV3. These models, extensively trained on large-scale datasets, have learned rich and transferable feature representations. Our performance evaluation involved running each model on our target dataset and assessing its accuracy, generalization capability, and ability to extract relevant features. The characteristics of these models are outlined in Table 1.
In this study, we utilized pre-trained models for image classification training. We discarded the top classification layer of each pre-trained model, preserving the feature extraction layers responsible for capturing general patterns and image representations. Subsequently, we introduced a customized top classification layer consisting of global average pooling and dense layers. To adapt a pre-trained model for a different task, its architecture is employed to build a base model with existing weights. It is important to freeze the pre-trained layers to avoid resetting their weights, as this would effectively mean starting the training process from scratch. Additional trainable layers are introduced to leverage features learned from the original dataset for predictions on the new task. The pre-trained model, initially loaded without its final output layer, requires the addition of a new output layer, usually a dense layer with units matching the number of classes to predict. This setup enables the model to generate predictions on new datasets. To further enhance performance, fine-tuning can be employed, which involves unfreezing the base model and retraining it on the entire dataset with a very low learning rate. Deep transfer learning for image classification operates on the principle that a model trained on a large and diverse dataset effectively captures a broad representation of the visual world; by starting from such a model, the learned feature maps can be leveraged, eliminating the need to train from the ground up [32].
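The frozen-backbone setup described above might look like the following Keras sketch, assuming a MobileNet base and the five PMIS condition classes. The layer sizes are illustrative assumptions, and `weights=None` stands in for the ImageNet weights a real run would load:

```python
import tensorflow as tf

# Backbone loaded without its top classification layer. The study would
# use weights="imagenet"; weights=None avoids the download in this sketch.
base = tf.keras.applications.MobileNet(
    include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained feature extractor

# Customized top: global average pooling + dense layers, ending in a
# softmax over the five condition states.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),  # size is illustrative
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Fine-tuning would then set `base.trainable = True` and recompile with a much lower learning rate before a second round of training.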
The following table presents an overview of selected pre-trained models, detailing their parameters, development years, and depths. Each model possesses distinct attributes that make it ideal for various image processing tasks. The VGG16 and VGG19 models, developed in 2014, are known for their simplicity and depth, consisting of 16 and 19 layers, respectively. They use small (3 × 3) convolution filters, which capture fine details in images, making them effective for image classification tasks despite their large parameter sizes (138 million for VGG16 and 143.7 million for VGG19). InceptionV3, with 23.9 million parameters and 159 layers, is part of the Inception series developed by Google in 2016. It introduced factorized convolutions and aggressive regularization, enhancing computational efficiency and decreasing the parameter count relative to previous models. DenseNet121, introduced in 2017, employs a dense connectivity pattern where each layer receives inputs from all preceding layers. With 8.1 million parameters and 121 layers, it mitigates the vanishing gradient problem and improves feature propagation while maintaining a smaller model size compared to other architectures. InceptionResNetV2 combines the Inception and ResNet architectures, featuring 55.9 million parameters and 572 layers. Developed in 2017, it merges the efficiency of Inception modules with the residual connections of ResNet, leading to improved training and convergence. ResNet50 and its updated version ResNet50V2, both with 25.6 million parameters and 50 layers, were introduced in 2016. Known for their use of residual blocks, these models mitigate the vanishing gradient problem, allowing for deeper networks and robust performance in various image classification tasks. MobileNet and MobileNetV2, developed in 2017 and 2018, respectively, are designed for mobile and embedded vision applications. 
They use depthwise separable convolutions to reduce parameters and computational cost, making them highly efficient for real-time applications. MobileNet has 4.2 million parameters and 88 layers, while MobileNetV2 has 3.5 million parameters and the same number of layers. EfficientNetB0, introduced in 2019 with 5.3 million parameters and 69 layers, is part of the EfficientNet family. It scales the network dimensions (depth, width, and resolution) uniformly using a compound scaling method, resulting in better performance and efficiency compared to traditional models. MobileNetV3Large and MobileNetV3Small, released in 2019, improve on the efficiency of the MobileNet series. MobileNetV3Large has 5.4 million parameters and 88 layers, while MobileNetV3Small has 2.9 million parameters and 88 layers. RegNetX models, introduced in 2020, have varying parameters and depths. They are designed to provide a simple, regular structure that can be easily adjusted to achieve optimal performance across a range of tasks, making them highly flexible. EfficientNetV2B0, part of the EfficientNetV2 series introduced in 2021, has 7.1 million parameters and 69 layers. It builds upon the original EfficientNet models with improved training speed and parameter efficiency, making it suitable for high-performance applications with limited computational resources. Mobile_ViT, developed in 2021, combines the efficiency of MobileNet with the performance benefits of Vision Transformers (ViT). With 5.6 million parameters and 20 layers, it provides a lightweight model that performs well on a variety of image classification tasks. ConvNeXtBase, introduced in 2022, represents a modernized version of conventional CNN architectures. With 88 million parameters and 53 layers, it integrates several advancements from recent research to achieve cutting-edge performance on image classification benchmarks. 
These pre-trained models, each with their unique architectures and strengths, offer a range of options for enhancing image classification tasks through deep transfer learning. By leveraging these models, the process of training for new tasks becomes significantly more efficient and effective.

3.2. Ensemble Learning Model

In this paper, we leveraged ensemble learning to improve the image classification performance of our dataset. We selected the top four performing models among those discussed earlier and applied an ensemble approach. Each model was equipped with custom top classification layers and trained independently on the training data, providing predictions for the test dataset. To create a combined prediction, we took advantage of the complementary strengths of these four models by averaging their individual predictions. This ensemble learning approach capitalized on the varied strengths of the individual models, leading to improved accuracy and robustness in image classification tailored to our specific task. By combining the strengths of these top models, we aimed to achieve better results than when using a single model alone. We found that combining these pre-trained models through ensemble techniques significantly enhances classification accuracy by optimally extracting features. The ensemble model, designed for pavement evaluation, leverages fine-tuned transfer learning to precisely extract features, addressing the multi-class classification challenge. Furthermore, the predicted class label can be obtained by using Equation (1).
\hat{y} = \underset{i \in \{1, \ldots, c\}}{\operatorname{argmax}} \; \frac{1}{k} \sum_{j=1}^{k} p_j(y = i \mid M_j, x) \quad (1)

where
M_j = the j-th model;
k = the number of selected pre-trained models;
p_j(y = i \mid M_j, x) = the predicted probability of class i for data sample x under model M_j.
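Equation (1) reduces to averaging the k per-model probability vectors and taking the argmax; the sketch below also accepts optional accuracy-based weights, matching the weighted voting described in the workflow. The function name and toy probabilities are illustrative, not the paper's code:

```python
import numpy as np

def ensemble_predict(prob_list, weights=None):
    """Combine per-model class-probability matrices as in Equation (1).

    prob_list : list of (n_samples, n_classes) arrays, one per model M_j.
    weights   : optional accuracy-based weights; equal weights reproduce
                the plain average of Equation (1).
    """
    probs = np.stack(prob_list)            # (k, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize the weights
    avg = np.tensordot(w, probs, axes=1)   # weighted mean over models
    return avg.argmax(axis=1)              # y-hat for each sample

# Two toy models, two samples, three classes
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p2 = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]])
pred = ensemble_predict([p1, p2])   # class 0 for sample 1, class 1 for sample 2
```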

3.3. Evaluation Indicators

To evaluate the classification results, we used a confusion matrix and various indicators to assess the overall performance of the models. Accuracy (Equation (2)) represents the overall correctness of a model's predictions. Precision (Equation (3)) measures the proportion of true positive predictions of a class to all instances predicted as positive for that class. Recall (Equation (4)) assesses the model's ability to capture all positive instances of a class and is calculated as the ratio of true positive predictions to all actual positive instances. The F1 score (Equation (5)), the harmonic mean of precision and recall, balances these metrics and is particularly useful when dealing with imbalanced class distributions or when precision and recall are equally important.
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)
\text{Precision} = \frac{TP}{TP + FP} \quad (3)
\text{Recall} = \frac{TP}{TP + FN} \quad (4)
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)
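These indicators can be computed directly from the prediction lists; the following sketch implements Equations (2) through (5) per class. `per_class_metrics` is a hypothetical helper, not the study's code:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Accuracy plus per-class precision, recall, and F1 (Equations (2)-(5))."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = (y_true == y_pred).mean()
    precision, recall, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)
    return accuracy, precision, recall, f1

# Tiny worked example with three classes
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
acc, prec, rec, f1 = per_class_metrics(y_true, y_pred, 3)
# acc = 0.8; class 0 has recall 0.5 (one of its two instances was missed)
```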

4. Case Study

4.1. Data Collection

4.1.1. Pavement Image Data Collection

In this case study, we utilized satellite imagery obtained from the National Oceanic and Atmospheric Administration (NOAA) with a resolution of 50 cm per pixel [33]. The images were specifically captured in September 2017 from the Houston metropolitan area. These images were originally collected by NOAA to document Hurricane Harvey, which struck Texas as a Category 4 hurricane on 25 August 2017. Figure 2 offers a comprehensive view of the areas encompassed by the satellite images.
Commercial high-resolution satellite imagery can be quite expensive, with costs for 30–50 cm resolution images typically ranging from USD 10 to USD 20 per square kilometer. For this case study, which focuses on the road network within the Houston metropolitan area, covering approximately 26,000 square kilometers, the use of commercial satellite imagery would incur significant costs. However, following Hurricane Harvey, NOAA released high-resolution satellite images to the public at no charge. This invaluable access allowed us to leverage unique data for our research, an opportunity that would have been prohibitively expensive without access to these free images.
The extraction of road segments from the satellite images involved several steps. Initially, we acquired the coordinates of the centerline of the pavement network from the Texas Department of Transportation (TxDOT). Using interpolation methods, we obtained the centerline coordinates of the entire pavement network with foot-level accuracy, which allowed us to match the coordinates with the reference markers of the pavement network. For any pavement section defined in TxDOT's pavement management system, we could therefore determine the exact coordinates of that section.
Subsequently, for each pavement section in the TxDOT pavement management dataset, typically 0.5 miles in length, a polygon was generated by extending 12 feet on each side of the centerline, effectively outlining the lane’s coverage area. The satellite image provided by NOAA is georeferenced, meaning that each pixel of the image corresponds to a specific coordinate. Utilizing the polygons as guides, we precisely cropped out the corresponding portions of the georeferenced satellite images that represented the pavement sections. This process ensured that only the relevant parts of the satellite imagery were retained for further analysis. Using this approach, we obtained images of more than 3000 pavement sections. The next step was to match these pavement sections with the pavement condition evaluation.
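The 12-foot buffer around the centerline can be illustrated for a single straight segment; a production pipeline would buffer the full polyline (e.g., with shapely) and crop the georeferenced raster against those polygons (e.g., with rasterio's masking utilities), but the geometric idea is simply a perpendicular offset. `segment_polygon` is a hypothetical helper written under those assumptions:

```python
import math

LANE_HALF_WIDTH_FT = 12.0  # extend 12 ft on each side of the centerline

def segment_polygon(p1, p2, half_width=LANE_HALF_WIDTH_FT):
    """Rectangle extending `half_width` on each side of the straight
    centerline segment p1 -> p2 (coordinates in feet).
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    # unit normal perpendicular to the centerline direction
    nx, ny = -dy / length, dx / length
    ox, oy = nx * half_width, ny * half_width
    # corners: offset each endpoint to both sides of the centerline
    return [(x1 + ox, y1 + oy), (x2 + ox, y2 + oy),
            (x2 - ox, y2 - oy), (x1 - ox, y1 - oy)]

# A 100 ft east-west segment yields a 100 ft x 24 ft rectangle
poly = segment_polygon((0.0, 0.0), (100.0, 0.0))
```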

4.1.2. Pavement Condition Data Collection

Pavement condition data for this study were obtained from the Texas Department of Transportation (TxDOT). Figure 3 shows the road network managed by TxDOT’s Houston district, and the data were specifically sourced from the Pavement Management Information System (PMIS) database. For this research, we utilized PMIS data corresponding to the same year from which satellite images were obtained from NOAA. Each pavement section in the PMIS database is uniquely identified by the columns ROUTE NAME, OFFSET FROM, and OFFSET TO, with most sections being approximately 0.5 miles long, though the exact length can vary from section to section. The starting and ending points of each section are fixed and remain consistent year over year. The coordinates of these points are available, allowing us to precisely locate each pavement section within the georeferenced satellite images, where every pixel is tagged with specific coordinates. This geospatial data enables us to accurately crop the pavement sections from the satellite images using their coordinates and then match each section’s image with its corresponding condition score provided by TxDOT’s annual evaluations.
We used the condition score in the PMIS as the pavement condition indicator in this research. In the PMIS, there are three primary indicators that reflect the general condition of the road pavement: the distress score, ride score, and condition score. The distress score, ranging from 1 (most severe distress) to 100 (minimal distress), evaluates visible surface deterioration. This score combines various distress types to assess the impacts of cracking and rutting comprehensively. TxDOT calculates the overall distress score using a multiplicative utility analysis, which converts each specific distress type into a utility value (ranging from 0 to 1) via a formula:
U_i = 1 - \alpha e^{-(\rho / L_i)^{\beta}} \quad (6)

where
U_i = the utility value of distress type i;
L_i = the length of distress type i;
\alpha, \rho, and \beta = the coefficients controlling the shape of the curve and the value of U_i.
The overall distress score is calculated by multiplying 100 by the utility values associated with each type of distress relevant to the pavement category of the data collection segment and is included in the condition score.
The ride score quantifies the ride quality of a pavement section on a scale from 0.1 (roughest) to 5.0 (smoothest). It is determined by calculating the length-weighted average of raw serviceability index (SI) values collected from the area, providing a comprehensive evaluation of ride comfort.
The condition score offers an integrated evaluation of a pavement’s state, combining the distress score and ride score into a single value ranging from 1 (poorest condition) to 100 (best condition), presented in Equation (7). This metric captures the general public’s perception of the pavement’s overall condition. TxDOT classifies pavements into five condition states based on their condition score, as shown in Table 2, and these states are used for labeling each satellite image.
CS = U_{ride} \times 100 \times \prod_{i=1}^{n} U_i \quad (7)

where
U_i = the utility value of distress type i;
U_{ride} = the utility value of the ride score.
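The utility curve and condition score can be combined in a few lines. The coefficient values below are purely illustrative, since TxDOT's actual α, ρ, and β depend on the distress type and pavement category:

```python
import math

def distress_utility(length, alpha, rho, beta):
    """Utility value U_i = 1 - alpha * exp(-(rho / L_i)^beta).
    Larger distress length -> lower utility. Coefficients are illustrative.
    """
    if length <= 0:
        return 1.0  # no observed distress -> full utility
    return 1.0 - alpha * math.exp(-((rho / length) ** beta))

def condition_score(u_ride, distress_utils):
    """CS = U_ride * 100 * product of distress utilities."""
    cs = 100.0 * u_ride
    for u in distress_utils:
        cs *= u
    return cs

# Illustrative section: good ride quality, two mild distress types
cs = condition_score(0.95, [distress_utility(5.0, 1.0, 20.0, 1.0),
                            distress_utility(2.0, 1.0, 20.0, 1.0)])
```

With these toy numbers the section scores in the high "very good" range; heavier distress lengths drive the utilities, and hence CS, toward zero.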
Table 2. PMIS condition score classes.
Condition Score | Description
90–100 | Very good
70–89 | Good
50–69 | Fair
35–49 | Poor
1–34 | Very poor
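Labeling each satellite image from its condition score is then a simple threshold lookup over Table 2; `condition_class` is a hypothetical helper:

```python
def condition_class(score):
    """Map a PMIS condition score (1-100) to its condition state per Table 2."""
    if score >= 90:
        return "Very good"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    if score >= 35:
        return "Poor"
    return "Very poor"

# One example score from each band
labels = [condition_class(s) for s in (95, 72, 55, 40, 10)]
```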
Figure 4 presents sample cropped satellite images from the datasets utilized in this research. Each image corresponds to a pavement management unit within the TxDOT Pavement Management System. These units are assigned a condition score based on annual inspections. As illustrated in Figure 4, the length and shape of each pavement segment vary. Figure 5 provides a closer view of segments from five different pavement sections categorized as very good, good, fair, poor, and very poor. As evident in Figure 5, the image processing algorithm effectively learns from these images and identifies patterns in the pixel changes, thereby establishing a correlation with the condition evaluation results.

4.2. Result and Analysis

In this study, we employed TensorFlow and Keras to construct and train our models, incorporating data augmentation techniques to enhance the training process. Data augmentation was used to artificially increase the diversity of our dataset by applying random transformations such as rotations, flips, zooms, shifts, and brightness adjustments to the original images. This approach helps prevent overfitting by allowing the models to generalize better from a more varied set of training examples. Figure 6 shows the process of evaluating pavement conditions using satellite images through a deep learning model, organized into three phases: preprocessing, training, and testing. In the Preprocessing Phase, a dataset of satellite images, containing various views of roads and pavements, undergoes augmentation to create diverse variations. The augmented dataset is then split into 80% for training and 20% for testing. During the Training Phase, the training images are fed into a deep learning transfer model that utilizes pre-trained architectures, which are fine-tuned to learn specific features related to pavement conditions. In the Testing Phase, the trained model is evaluated using the remaining images to assess its accuracy in predicting pavement conditions, categorizing them into levels ranging from very good to very poor.
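The augmentation described above might be configured as follows in Keras; the specific transformation ranges are illustrative assumptions, not the study's exact settings:

```python
import numpy as np
import tensorflow as tf

# Random rotations, flips, zooms, and shifts applied on the fly during
# training (brightness_range=(0.8, 1.2) could be added for brightness
# adjustments). Parameter values here are illustrative.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# Toy batch standing in for cropped pavement-section images
images = np.random.rand(8, 224, 224, 3).astype("float32")
labels = np.arange(8) % 5
batch_x, batch_y = next(augmenter.flow(images, labels, batch_size=4, seed=1))
```

Because the transformations are sampled anew each epoch, the model never sees exactly the same training image twice, which is what curbs overfitting on the oversampled classes.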
The training process was carried out for 50 epochs on all models, using the default learning rate of the Adam optimizer and a batch size of 32. Table 3 presents a comprehensive comparison of the models' performance in terms of accuracy and F1 score. Notably, MobileNet and ResNet50V2 achieved the highest accuracies, both exceeding 89%, with F1 scores of 0.91 and 0.90, respectively. InceptionV3, DenseNet121, and MobileNetV2 also performed well, each achieving accuracies of around 87% and F1 scores between 0.87 and 0.88. These results indicate that these models are highly effective for this image classification task. In contrast, the VGG19 model, despite being a well-known architecture, achieved a relatively lower accuracy of 72% and an F1 score of 0.73; this lower performance is likely due to the model's large number of parameters, which may result in overfitting. The VGG16 model performed slightly better, with an accuracy of 75% and an F1 score of 0.74. EfficientNetB0 and RegNetX demonstrated significantly lower performance, with accuracies of around 20% and 18%, respectively, and correspondingly low F1 scores, suggesting that these models may not be well suited to this dataset or task without further fine-tuning. MobileNetV3Large and MobileNetV3Small showed moderate performance, with accuracies of 58% and 31%, respectively. The ConvNeXtBase model also performed relatively well, achieving an accuracy of 78% and an F1 score of 0.76.
To diagnose the training behavior of the models, learning curves (training and validation loss over epochs) are plotted in Figure 7. Based on the learning curve of each model, it is evident that all models trained well without showing signs of underfitting or overfitting. The learning curves depict the training and validation losses over 50 epochs for each model. For instance, the curves for VGG16, VGG19, InceptionV3, DenseNet121, and InceptionResNetV2 show a consistent decrease in both training and validation losses, indicating stable learning and good generalization to the validation set. The MobileNet, MobileNetV2, and ResNet50V2 models exhibit particularly smooth curves with minimal fluctuation, suggesting robust training and high stability; these models converged relatively quickly and maintained low validation losses, further supporting their strong performance metrics noted earlier. On the other hand, models like ResNet50 and EfficientNetB0 display higher variability in validation loss than in training loss, which may indicate sensitivity to the validation set or potential overfitting that warrants closer inspection; however, their overall trends still demonstrate effective learning without severe overfitting. MobileNetV3Large and MobileNetV3Small show some fluctuations in their validation loss curves but eventually stabilize, suggesting that with more epochs or fine-tuning their performance could improve. RegNetX and EfficientNetV2B0 show high initial losses that decrease steadily, indicating that, despite their initial high error rates, they do learn effectively over time.
After comparing the model performances, we selected the top four models, ResNet50V2, InceptionV3, MobileNet, and DenseNet121, for the ensemble learning model. The ensemble model’s performance is summarized in the confusion matrix shown in Figure 8; its accuracy is 0.93 and its F1 score is 0.93.
The confusion matrix (Figure 8) provides a detailed breakdown of the ensemble model’s predictions across different categories: fair, good, poor, very good, and very poor. The matrix reveals that the model achieved high accuracy across most categories. Specifically,
  • For the “Fair” category, the model correctly identified 160 instances without any misclassifications;
  • In the “Good” category, the model correctly predicted 142 instances, with minimal misclassifications (3 instances classified as fair and 6 as very good);
  • For the “Poor” category, the model achieved perfect classification with all 147 instances correctly identified;
  • The “Very Good” category showed the most confusion: 121 instances were correctly predicted, while 12 were misclassified as fair, 24 as good, 5 as poor, and 2 as very poor;
  • Lastly, for the “Very Poor” category, the model demonstrated perfect classification, correctly identifying all 153 instances.
Overall, the confusion matrix indicates that the ensemble model performs very well, with high precision and recall in most categories. This strong performance is reflected in the overall accuracy and F1 score of 0.93, confirming the effectiveness of combining the strengths of ResNet50V2, InceptionV3, MobileNet, and DenseNet121 in an ensemble. The ensemble balances the advantages of the individual models, leading to superior classification performance and robust predictions.
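The text does not spell out how the four models’ outputs are combined. A common choice, and a plausible reading, is soft voting: average the softmax probability vectors of the networks and take the argmax. The sketch below uses that rule with toy probabilities and five condition classes; the function name and data are illustrative, not from the study.

```python
import numpy as np


def ensemble_predict(prob_list):
    """Soft-voting ensemble over per-model class probabilities.

    prob_list: list of arrays of shape (n_samples, n_classes), one per
    model (e.g., ResNet50V2, InceptionV3, MobileNet, DenseNet121).
    Returns the predicted class index for each sample.
    """
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return np.argmax(avg, axis=1)


# Three samples, five condition classes, two toy "models":
p1 = np.array([[0.6, 0.2, 0.1, 0.05, 0.05],
               [0.1, 0.7, 0.1, 0.05, 0.05],
               [0.2, 0.2, 0.2, 0.2, 0.2]])
p2 = np.array([[0.5, 0.3, 0.1, 0.05, 0.05],
               [0.2, 0.6, 0.1, 0.05, 0.05],
               [0.1, 0.1, 0.6, 0.1, 0.1]])

preds = ensemble_predict([p1, p2])  # -> array([0, 1, 2])
```

Soft voting lets a model that is confidently right outweigh models that are weakly wrong, which is one way an ensemble can exceed each member’s individual accuracy, as observed here (0.93 vs. 0.91 for the best single model).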

5. Conclusions

This research experiments with satellite image analysis for evaluating pavement conditions, exploring deep learning-based computer vision models to estimate road conditions at relatively large scales. These models showed potential in identifying general pavement conditions. The use of 50 cm resolution satellite images presents both challenges and opportunities. While this resolution is insufficient for directly detecting smaller pavement cracks, the aggregated pixels still convey the overall texture and pattern of the pavement surface, and deep learning models can recognize these patterns to infer the general condition of the pavement and highlight areas needing more detailed inspection, thus optimizing resource allocation for manual surveys. By considering the surrounding area and the overall appearance of the pavement, the models can make broad predictions about its condition, which is sufficient to differentiate between condition states such as very good, good, fair, poor, and very poor. The case study results show that satellite-based infrastructure monitoring offers a promising and cost-effective approach for continuously monitoring pavement assets across extensive areas. By complementing traditional methods, this approach allows extensive coverage without on-site inspections, reducing both time and resource expenditure.
Although the initial application focused on a specific dataset, the framework was designed to be adaptable to various types of road conditions, including those influenced by unique climatic, geographic, or usage factors. Future work will involve training the models on a more diverse set of data, which includes different types of road environments, to enhance their robustness and accuracy across a broader spectrum of conditions. It is important to note that this approach cannot replace vehicle-based pavement condition inspections, which provide much more detailed assessments, including specific measurements of cracks, rutting, and roughness. The current satellite image approach is only suitable for network-level quick estimates of the condition of the network, facilitating high-level decision-making. Future research could focus on enhancing satellite imagery analyses with additional data sources to improve accuracy. Incorporating information such as traffic patterns, historical maintenance records, and climatic conditions can provide a comprehensive view of the pavement’s health. This integrated approach enables more precise condition assessments.

Author Contributions

Conceptualization, P.K.R.L. and L.G.; methodology, L.G.; software, P.K.R.L. and L.G.; validation, P.K.R.L. and L.G.; formal analysis, P.K.R.L. and L.G.; investigation, P.K.R.L. and L.G.; resources, P.K.R.L. and L.G.; data curation, P.K.R.L. and L.G.; writing—original draft preparation, P.K.R.L. and L.G.; writing—review and editing, P.K.R.L., L.G., P.L. and J.S.; visualization, P.K.R.L. and L.G.; supervision, L.G.; project administration, L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to legal reasons.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar] [CrossRef]
  2. Pan, Y.; Zhang, X.; Cervone, G.; Yang, L. Detection of Asphalt Pavement Potholes and Cracks Based on the Unmanned Aerial Vehicle Multispectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3701–3712. [Google Scholar] [CrossRef]
  3. Fan, Z.; Wu, Y.; Lu, J.; Li, W. Automatic Pavement Crack Detection Based on Structured Prediction with the Convolutional Neural Network. arXiv 2018, arXiv:1802.02208. [Google Scholar]
  4. Chitale, P.A.; Kekre, K.Y.; Shenai, H.R.; Karani, R.; Gala, J.P. Pothole Detection and Dimension Estimation System Using Deep Learning (YOLO) and Image Processing. In Proceedings of the 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, New Zealand, 25–27 November 2020; pp. 1–6. [Google Scholar] [CrossRef]
  5. Fan, Z.; Li, C.; Chen, Y.; Mascio, P.D.; Chen, X.; Zhu, G.; Loprencipe, G. Ensemble of Deep Convolutional Neural Networks for Automatic Pavement Crack Detection and Measurement. Coatings 2020, 10, 152. [Google Scholar] [CrossRef]
  6. Ji, A.; Xue, X.; Wang, Y.; Luo, X.; Wang, L. Image-based Road Crack Risk-informed Assessment Using a Convolutional Neural Network and an Unmanned Aerial Vehicle. Struct. Control Health Monit. 2021, 28, e2749. [Google Scholar] [CrossRef]
  7. Maniat, M.; Camp, C.V.; Kashani, A.R. Deep Learning-Based Visual Crack Detection Using Google Street View Images. Neural Comput. Appl. 2021, 33, 14565–14582. [Google Scholar] [CrossRef]
  8. Ahmadi, A.; Khalesi, S.; Golroo, A. An Integrated Machine Learning Model for Automatic Road Crack Detection and Classification in Urban Areas. Int. J. Pavement Eng. 2022, 23, 3536–3552. [Google Scholar] [CrossRef]
  9. Sholevar, N.; Golroo, A.; Esfahani, S.R. Machine Learning Techniques for Pavement Condition Evaluation. Autom. Constr. 2022, 136, 104190. [Google Scholar] [CrossRef]
  10. Jiang, S.; Gu, S.; Yan, Z. Pavement Crack Measurement Based on Aerial 3D Reconstruction and Learning-Based Segmentation Method. Meas. Sci. Technol. 2023, 34, 015801. [Google Scholar] [CrossRef]
  11. Gagliardi, V.; Giammorcaro, B.; Francesco, B.; Sansonetti, G. Deep Neural Networks for Asphalt Pavement Distress Detection and Condition Assessment. In Proceedings of the Earth Resources and Environmental Remote Sensing/GIS Applications XIV; Schulz, K., Nikolakopoulos, K.G., Michel, U., Eds.; SPIE: Amsterdam, The Netherlands, 2023; p. 35. [Google Scholar] [CrossRef]
  12. Haider, S.W.; Baladi, G.Y.; Chatti, K.; Dean, C.M. Effect of Frequency of Pavement Condition Data Collection on Performance Prediction. Transp. Res. Rec. J. Transp. Res. Board 2010, 2153, 67–80. [Google Scholar] [CrossRef]
  13. Faghri, A.; Li, M.; Ozden, A. Satellite Assessment and Monitoring for Pavement Management; Technical Report CAIT-UTC-NC4; Delaware Center for Transportation: Newark, DE, USA, 2015. [Google Scholar]
  14. Li, M.; Faghri, A.; Ozden, A.; Yue, Y. Economic Feasibility Study for Pavement Monitoring Using Synthetic Aperture Radar-Based Satellite Remote Sensing: Cost–Benefit Analysis. Transp. Res. Rec. J. Transp. Res. Board 2017, 2645, 1–11. [Google Scholar] [CrossRef]
  15. Brewer, E.; Lin, J.; Kemper, P.; Hennin, J.; Runfola, D. Predicting Road Quality Using High Resolution Satellite Imagery: A Transfer Learning Approach. PLoS ONE 2021, 16, e0253370. [Google Scholar] [CrossRef] [PubMed]
  16. Bashar, M.Z.; Torres-Machi, C. Exploring the Capabilities of Optical Satellite Imagery in Evaluating Pavement Condition. In Proceedings of the Construction Research Congress 2022, Arlington, Virginia, 9–12 March 2022; pp. 108–115. [Google Scholar] [CrossRef]
  17. Jiang, Y.; Han, S.; Bai, Y. Development of a Pavement Evaluation Tool Using Aerial Imagery and Deep Learning. J. Transp. Eng. Part B Pavements 2021, 147, 04021027. [Google Scholar] [CrossRef]
  18. Karimzadeh, S.; Ghasemi, M.; Matsuoka, M.; Yagi, K.; Zulfikar, A.C. A Deep Learning Model for Road Damage Detection After an Earthquake Based on Synthetic Aperture Radar (SAR) and Field Datasets. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5753–5765. [Google Scholar] [CrossRef]
  19. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar] [CrossRef]
  20. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef]
  21. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  22. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17); Google Inc.: Mountain View, CA, USA, 2017; Volume 31. [Google Scholar] [CrossRef]
  23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  24. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  25. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
  26. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proc. Mach. Learn. Res. 2019, 97, 6105–6114. [Google Scholar]
  27. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar] [CrossRef]
  28. Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollar, P. Designing Network Design Spaces. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10425–10433. [Google Scholar] [CrossRef]
  29. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. Proc. Mach. Learn. Res. 2021, 139, 10096–10106. [Google Scholar]
  30. Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. arXiv 2022, arXiv:2110.02178. [Google Scholar]
  31. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976. [Google Scholar] [CrossRef]
  32. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018, Proceedings, Part III 27; Springer: Berlin/Heidelberg, Germany, 2018; pp. 270–279. [Google Scholar]
  33. NOAA. Hurricane Harvey Imagery. 2023. Available online: https://shorturl.at/13dtB (accessed on 5 September 2024).
Figure 1. Proposed workflow.
Figure 2. Satellite images coverage.
Figure 3. Pavement network data used in this case study.
Figure 4. Sample cropped satellite images.
Figure 5. Close-up view of pavement segments across condition categories.
Figure 6. Satellite image-based pavement evaluation.
Figure 7. Learning curve for different models.
Figure 8. Confusion matrix of the ensemble model.
Table 1. Summary of pre-trained models.
| Model | Parameters (Millions) | Year of Development | Depth (Layers) |
|---|---|---|---|
| VGG16 [19] | 138 | 2014 | 16 |
| VGG19 [19] | 143.7 | 2014 | 19 |
| InceptionV3 [20] | 23.9 | 2016 | 159 |
| DenseNet121 [21] | 8.1 | 2017 | 121 |
| InceptionResNetV2 [22] | 55.9 | 2017 | 572 |
| ResNet50 [23] | 25.6 | 2016 | 50 |
| ResNet50V2 [23] | 25.6 | 2016 | 50 |
| MobileNet [24] | 4.2 | 2017 | 88 |
| MobileNetV2 [25] | 3.5 | 2018 | 88 |
| EfficientNetB0 [26] | 5.3 | 2019 | 69 |
| MobileNetV3Large [27] | 5.4 | 2019 | 88 |
| MobileNetV3Small [27] | 2.9 | 2019 | 88 |
| RegNetX [28] | Varies | 2020 | Varies |
| EfficientNetV2B0 [29] | 7.1 | 2021 | 69 |
| Mobile_ViT [30] | 5.6 | 2021 | 20 |
| ConvNeXtBase [31] | 88 | 2022 | 53 |
Table 3. Performance of different models.
| Model | Accuracy | F1 Score |
|---|---|---|
| VGG16 [19] | 0.75 | 0.74 |
| VGG19 [19] | 0.72 | 0.73 |
| InceptionV3 [20] | 0.87 | 0.88 |
| DenseNet121 [21] | 0.86 | 0.88 |
| InceptionResNetV2 [22] | 0.86 | 0.87 |
| ResNet50 [23] | 0.39 | 0.32 |
| ResNet50V2 [23] | 0.89 | 0.90 |
| MobileNet [24] | 0.91 | 0.91 |
| MobileNetV2 [25] | 0.87 | 0.88 |
| EfficientNetB0 [26] | 0.20 | 0.19 |
| MobileNetV3Large [27] | 0.58 | 0.53 |
| MobileNetV3Small [27] | 0.31 | 0.32 |
| RegNetX [28] | 0.18 | 0.12 |
| EfficientNetV2B0 [29] | 0.20 | 0.19 |
| Mobile_ViT [30] | 0.55 | 0.54 |
| ConvNeXtBase [31] | 0.78 | 0.76 |
