1. Introduction
Current practice in steel bridge fatigue crack inspection relies primarily on human vision. In the United States, to ensure inspection quality, the Federal Highway Administration (FHWA) and state departments of transportation have established guidelines concerning inspector qualifications, training, and certification. Nevertheless, Campbell et al. [
1] assessed inspectors’ ability to detect fatigue cracks by inviting a diverse group of 30 inspectors to conduct hands-on inspections of 147 steel bridge specimens with known fatigue cracks. The crack lengths for an average of 50% and 90% detection rates were 26 mm and 113 mm, respectively, highlighting the limitations of visual inspection even under optimal conditions. Moreover, 42% of the bridges in the U.S. are at least 50 years old, and many of them are in poor condition [
2]. In particular, fatigue cracks can lead to catastrophic failures in these bridges [
3,
No existing fatigue crack detection technique, whether contact-based or non-contact, can fully substitute for human visual inspection with absolute accuracy and reliability. Human judgment remains essential for decision-making in the real-life bridge inspection process. Therefore, a human-in-the-loop approach, combining the expertise of the inspector with the capabilities of technology, can potentially result in better accuracy and efficiency in detecting fatigue cracks, thereby enhancing the safety and integrity of steel bridges.
Augmented reality (AR) is a great example of such technology that can provide inspectors with additional information and visual cues to aid in the detection and assessment of fatigue cracks. Azuma [
5] defines AR as a system with the following three essential characteristics: (1) it combines the real and virtual worlds by overlaying virtual elements onto the physical environment, (2) it provides real-time interaction between the virtual and real worlds, and (3) the virtual objects are registered in three dimensions. Examples of AR devices include Microsoft’s HoloLens 1 (HL1), HoloLens 2 (HL2), and Magic Leap. Moreu et al. [
6] used HL1 for a hands-free measurement of distance between two points on a structure. Napolitano et al. [
7] demonstrated a system for documenting, visualizing and annotating data using an AR environment. Maharajan et al. [
8] used an AR head-mounted device to access real-time data and thereby enhance field inspections. In an experimental validation, the bridge inspector visualized strain changes, which helped the inspector make real-time decisions based on changes in the surrounding environment. Kamat and El-Tawil [
9] superimposed computer-aided design (CAD) shear wall images onto the actual structure using an AR device integrated with a GPS-based tracking system. Inter-story drift was measured by interpreting key differences between the real and augmented views of the facility. Dunston and Shin [
10] applied AR to evaluate steel column tilt in a laboratory setting and reported higher efficiency than traditional manual inspection methods. Although AR devices can capture high-dimensional data such as images and videos, their computational power is still limited, making it challenging to provide near-real-time analysis results in many cases. To address this challenge, AR devices can leverage additional computational power and analysis resources through a wireless connection, allowing image and video data to be processed in real time. The analysis results can then be sent back to the AR device, providing the inspector with the information necessary for decision-making. Using this approach, Mohammadkhorasani et al. [
11] developed an AR software package integrating the computer vision-based fatigue crack detection algorithm by Mojidra et al. [
12]. However, the algorithm itself lacks sufficient computational efficiency for nonstationary cameras, and the developed AR environment lacks the capability of enabling in-depth human–machine interaction for refining the crack detection result, as will be elaborated on later in the paper.
Various image processing-based crack detection methods have been developed as a result of the recent rapid advancement in computer vision. Abdel-Qader et al. [
13] used fast Haar transform, fast Fourier transform, and Sobel and Canny filters to detect cracks in concrete bridge images. Nishikawa et al. [
14] developed multi-sequential filters to remove noise and detect cracks in concrete structures. Yamaguchi et al. [
15] designed a percolation model to extract a continuous texture representing a crack depending on a length criterion. Iyer and Sinha [
16] generated crack segmentation maps in pipe structure images by enhancing the contrast of input images, applying filters, and evaluating curvature in the cross direction to produce the final binary map for segmented cracks. However, these methods based on traditional image processing struggle to differentiate between real cracks and crack-like features such as edges, corrosion marks, and scratches. Recent advancements in artificial intelligence (AI) have led to the development of convolutional neural networks (CNNs), which are capable of extracting valuable information from images. For example, Cha and Choi [
17] successfully detected cracks in concrete images under various challenging conditions, such as different lighting, shadows, blurs, and close-up views, by training a CNN for image analysis.
The aforementioned techniques were designed primarily to detect cracks in concrete structures or pavements, which are easier to detect than fatigue cracks in steel structures. Concrete cracks are typically more conspicuous due to their wider and more irregular nature, while fatigue cracks in steel structures are often subtler and harder to detect visually. In addition, images of steel structures often contain features that resemble cracks, such as corrosion marks, bolts, and bolt holes, which can make crack detection more challenging. Dung et al. [
18] trained a CNN to identify fatigue cracks in steel bridges, partitioning the dataset images into smaller segments so that the segments containing actual cracks were free of crack-like features. Dong et al. [
19] suggested a neural network with an encoder–decoder architecture, adapted from the U-Net structure, for segmenting fatigue crack pixels. However, the fatigue cracks had been annotated with markers drawn above the cracks, which made the detection easier; the authors observed that the network also identified marker curves, weld line edges, and handwriting as crack pixels. The performance of CNNs is often dependent on the quality and characteristics of the training data. Specifically, if the test images differ significantly from the images in the training dataset, the predictions can be inaccurate or false. Developing a neural network that can accurately detect fatigue cracks in a variety of bridge images requires a significant amount of data gathering and labeling, a process that is time-consuming and expensive. Furthermore, false positive predictions of other crack-like features remain a significant challenge in the development of accurate fatigue crack detection techniques.
To overcome the challenges associated with fatigue crack detection, Kong and Li [
20] developed an approach that tracks and analyzes structural surface displacements in a short video stream recorded under live load to accurately detect fatigue cracks. Specifically, a short video is recorded using a fixed camera to capture a region of interest (ROI) undergoing several fatigue cycles. Subsequently, feature points are identified to track the surface displacements, from which fatigue cracks are detected by recognizing differential displacement patterns, caused by the crack opening and closing under fatigue load, that surpass a predefined threshold. The method demonstrates strong robustness in the presence of crack-like features such as corrosion marks, dust, and bolt holes, as it relies not only on spatial surface features but also on temporal changes in these features. Unlike crack-like features, which remain relatively stable over short periods, feature points around fatigue cracks experience substantial displacements due to the opening and closing of the cracks. Later, Mojidra et al. [
12] improved the above displacement-based method for nonstationary cameras by introducing Global Motion Compensation (GMC) techniques specifically designed for both 2D and 3D videos. GMC eliminates camera motion from the video stream, such that the tracked displacements are free of the camera motion and the same method by Kong and Li [
20] can be applied to videos captured by cameras hosted on a moving platform such as unmanned aerial vehicles and wearable devices. Nonetheless, a limitation of the displacement-based crack detection method is its dependence on GMC, which is computationally expensive. For example, the authors reported that GMC took 1.1 s per video frame for 2D videos and 1.75 s per frame for 3D videos even at reduced resolution. A 5–6 s video at 30 frames per second could thus take up to 5 min to process, posing a challenge for achieving near-real-time human-centered bridge inspection.
In this paper, we present a novel human-centered fatigue crack inspection methodology empowered by AR and computer vision. The novelties and contributions of this paper are twofold as follows: (1) improvement of the crack detection algorithm by Mojidra et al. [
12] and development of a method to quantify the detection accuracy and (2) integration of the algorithm into an AR environment with unique features to enable human–machine interaction for bridge inspections. First, for fatigue crack detection using a moving camera, we further improved the video-based method [
12] by eliminating the need for GMC, achieving near-real-time results with higher accuracy. Rather than analyzing the absolute displacements of feature points, we track the change in distance between feature point pairs, which effectively removes the impact of any rigid-body motion, including that induced by camera movement. In addition, unlike other methods, our crack detection result is composed of discrete feature points surrounding the crack, and no method currently exists to quantify its accuracy. To bridge this gap, this paper proposes a new method to quantify the crack detection result through clustering analysis. Furthermore, an interactive AR environment is created to integrate the crack detection algorithm and facilitate human–machine interaction for achieving optimal detection results. The AR environment enables the inspector to perform video acquisition via the embedded camera of the AR headset and then sends the video to the cloud server for processing. The processing results are then converted into holograms. In particular, these holograms, along with an AR virtual menu that allows the inspector to toggle through various crack detection parameters, enable in-depth human–machine interaction to refine the crack detection results, hence facilitating human-in-the-loop decision-making. Finally, the integrated human-centered bridge inspection process is demonstrated using a laboratory bridge specimen through near-real-time crack detection.
2. Methodology
This section describes the methodology of the proposed human-centered bridge inspection process.
Figure 1 depicts the overall concept and the essential components. First, a local area network (LAN) is created using a Wi-Fi router, which establishes a connection between the AR headset, e.g., HL2, and the server computer, which is responsible for running the developed crack detection algorithm and storing data (raw video and processing results). Then, the bridge inspector records a short video of a fatigue-crack-prone region of the structure, capturing several fatigue load cycles, using the embedded camera through the developed AR application. The video is then automatically transmitted to the server over the LAN, where it is stored and processed for crack detection. Note that the LAN can be replaced by a cellular network, which connects to a remote server computer for data processing and storage.
To process the video at the server, an ROI is first selected based on the first video frame. Distinctive feature points, determined by the pixel intensity gradients of the image, are identified in the first frame, and their positions are then tracked throughout the video. Subsequently, these features are grouped into small local circular regions (LCRs), within which the distance between each feature point pair is calculated. Feature point pairs not situated on opposite sides of the fatigue crack display minimal distance changes, whereas pairs spanning the crack exhibit distance variations resulting from the crack opening and closing. By examining the distance history of all unique feature point pairs within the LCRs and identifying significant distance changes that surpass a threshold value, the feature pairs associated with a fatigue crack within the LCRs are isolated and highlighted. Ultimately, the collection of highlighted feature points, or feature point cluster, outlines the location of the fatigue crack. Compared to the method in [
12], which performs GMC first before tracking the absolute displacements, this improved method does not require GMC for removing camera motion as tracking distance changes effectively eliminates the impact of rigid-body motions, hence it possesses much higher computational efficiency and generates results in near-real-time. A more in-depth explanation of the improved crack detection algorithm is provided in the subsequent section.
Further, since the optimal threshold value cannot be uniquely defined for all situations, human input in the field is essential to ensure success. To facilitate in-depth human–machine interaction, a range of threshold values is applied in the algorithm at the server to produce a group of feature point clusters as candidate crack detection results, which are then transmitted to the AR device via the LAN. The AR application converts each feature point cluster into a hologram, which can be overlaid onto the real-world view of the actual structure for further examination. Meanwhile, an interactive virtual menu is presented to the inspector, in which the threshold values can be selected to view the associated crack detection results. By toggling through the range of threshold values, the inspector interacts with the holograms and decides the optimal threshold for the final crack detection result. These enhanced visualizations and visual indicators help the inspector detect fatigue cracks that may be imperceptible to the naked eye, enabling more informed decisions during the inspection process.
3. Crack Detection Algorithm
The core principle of the proposed crack detection approach is adopted from [
20], which focuses on identifying crack-induced surface motion under fatigue loading, rather than merely extracting crack edges and features. The recorded video is analyzed to detect discontinuities caused by fatigue crack opening and closing under live load. Feature points, which are specific pixels in images with high intensity gradient, are detected and used to track the surface motion. Notable algorithms for feature point detection include Harris–Stephens [
21], Scale Invariant Feature Transform (SIFT) [
22], Speeded-Up Robust Features (SURF) [
23], Shi–Tomasi [
24], and Features from Accelerated Segment Test (FAST) [
25]. In this study, the Shi–Tomasi algorithm was chosen for its robust performance: it improves upon the Harris–Stephens method by using the minimum eigenvalue of the gradient matrix for corner detection, leading to more accurate and reliable feature point detection in many scenarios. Moreover, the Shi–Tomasi algorithm is relatively simple and computationally efficient compared to more complex methods like SIFT and SURF, making it suitable for real-time applications where speed is crucial. Feature points are detected within the selected ROI in the first frame of the video. These features are then tracked in subsequent video frames using the Kanade–Lucas–Tomasi (KLT) tracker [
26,
27]. For robust tracking, we implemented forward and backward tracking [
28] as well as a five-level pyramid representation in the KLT tracker. One advantage of the KLT tracker is its computational efficiency, as it is based on sparse optical flow.
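To make this stage concrete, the following is a minimal sketch in Python with OpenCV (the study’s own implementation is in MATLAB); the video file name, ROI, corner count, and forward–backward error tolerance are illustrative assumptions. cv2.goodFeaturesToTrack implements the Shi–Tomasi minimum-eigenvalue criterion, and maxLevel=4 corresponds to the five pyramid levels mentioned above.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("inspection_video.mp4")  # hypothetical file name
ok, first = cap.read()
prev = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)

roi_mask = np.zeros_like(prev)
roi_mask[100:400, 200:600] = 255  # placeholder ROI

# Shi-Tomasi (minimum-eigenvalue) corner detection within the ROI
p = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01,
                            minDistance=5, mask=roi_mask)

# KLT parameters; maxLevel=4 gives five pyramid levels (0-4)
lk = dict(winSize=(21, 21), maxLevel=4,
          criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

valid = np.ones(len(p), dtype=bool)
tracks = [p.reshape(-1, 2).copy()]
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, st_f, _ = cv2.calcOpticalFlowPyrLK(prev, gray, p, None, **lk)    # forward
    p0r, st_b, _ = cv2.calcOpticalFlowPyrLK(gray, prev, p1, None, **lk)  # backward
    fb_err = np.linalg.norm((p - p0r).reshape(-1, 2), axis=1)
    # Keep a point only if both passes succeed and the forward-backward
    # round trip returns to within 0.5 px of the starting position
    valid &= (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < 0.5)
    tracks.append(p1.reshape(-1, 2).copy())
    p, prev = p1, gray
cap.release()

# (n_frames, n_points, 2): trajectories that passed every consistency check
trajectories = np.stack(tracks)[:, valid, :]
```

Only trajectories that survive the forward–backward check in every frame are carried into the distance analysis described below.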
However, the original method tracks absolute displacements of surface motion and only works for videos taken by a fixed camera. Mojidra et al. [
12] introduced GMC to remove the camera motion, but it significantly increased the processing time. In this section, to achieve accurate detection results while minimizing computation, a method is proposed based on tracking distance change between feature point pairs.
To illustrate the distance-based crack detection algorithm, consider the four video frames from a video stream as shown in
Figure 2. The camera’s field of view, represented by the black rectangle, covers a steel plate (illustrated by the blue rectangle) and other background elements. The detected feature points in the selected ROI are indicated by the plus symbols. Two LCRs are evaluated: LCR 1, situated over the fatigue crack, and LCR 2, located away from it. LCR 1 contains a total of seven feature points, with feature point 1 at the center of the LCR. Feature points 1 to 4 are beneath the fatigue crack, while points 5 to 7 are above it. Meanwhile, LCR 2 encompasses a total of six feature points, with feature point 10 at its center. As an illustration, in Frame 2, due to the camera movement, the steel plate has moved to the right within the camera’s field of view. In Frame 3, the camera has moved further, and the fatigue crack has opened under the live load. In Frame 4, the steel plate has moved upward, and the fatigue crack has closed. The distances between the central feature point and the remaining feature points within the same LCRs of all four video frames are depicted at the bottom right of
Figure 2.
The distances between feature point pairs within the LCRs in the first and second frames remain virtually identical because the global camera motion only introduces rigid-body movements; hence, feature points within a local vicinity move in an identical manner. However, in Frame 3 and Frame 4, the distances between the feature point pairs that cross the crack have increased due to the crack opening or closing, while the distances between the remaining pairs that do not cross the crack remain unchanged. In this case, LCR 1 contains two different patterns in the distances among feature points due to the crack’s response to live load, whereas LCR 2, lacking crack activity, shows only one pattern. The camera-induced rigid-body motion did not influence the distance patterns in either case, eliminating the requirement for motion compensation when distance serves as the metric. Consequently, the computationally demanding process of removing the camera motion, a crucial part of the displacement-based method [
12], becomes unnecessary for the distance-based method, leading to much improved computation efficiency, which is essential for near-real-time crack detection for field applications. Moreover, as will be described in the validation studies, this distance-based method also outperforms the displacement-based approach in terms of crack detection accuracy.
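The invariance that makes GMC unnecessary can be stated compactly. Within a small local region, camera motion induces an approximately rigid transformation of the image, mapping every tracked point as \( \mathbf{p} \mapsto R\mathbf{p} + \mathbf{t} \), where \( R \) is a rotation matrix and \( \mathbf{t} \) a translation vector. For any feature pair \( (i, j) \),

\[ \| (R\mathbf{p}_i + \mathbf{t}) - (R\mathbf{p}_j + \mathbf{t}) \| = \| R(\mathbf{p}_i - \mathbf{p}_j) \| = \| \mathbf{p}_i - \mathbf{p}_j \|, \]

so pairwise distances are (approximately) unaffected by camera-induced motion, and only differential surface motion, such as crack opening and closing, alters the tracked distance histories.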
To effectively analyze the entire ROI, an initial feature point is chosen at random as the center of the first LCR, followed by a search to collect the feature points within that LCR. Next, the standard deviations of the distance histories for all unique feature pairs are computed. If the standard deviation of a distance history falls below a certain threshold value, no differential motion is detected; conversely, if it exceeds the threshold, the feature point pair is highlighted in the results. This process is repeated by selecting a different feature point as the center of the next LCR until all feature points have been analyzed. The final crack detection result comprises clusters of all the highlighted feature points for which a differential movement pattern is identified within the associated LCR. As these highlighted feature point clusters trace along the fatigue crack, they intuitively indicate the location and extent of the detected crack. As explained previously, a range of threshold values is employed to filter out feature point pairs with low distance variations, generating a series of outputs. Subsequently, the inspector can toggle through the holograms of this series of feature point clusters as the crack detection results at the given bridge site, and the enhanced visualization aids in decision-making in terms of selecting the optimal threshold.
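Continuing the Python sketch above, the LCR analysis can be expressed as follows; the LCR radius and threshold values are illustrative assumptions, and pairs appearing in several overlapping LCRs are simply re-evaluated rather than deduplicated, for clarity.

```python
from itertools import combinations

import numpy as np

def detect_crack_points(trajectories, lcr_radius=30.0, threshold=0.2):
    """Highlight feature points whose pairwise distance histories within a
    local circular region (LCR) fluctuate beyond `threshold` (std, pixels).
    Parameter values are illustrative, not those used in the study."""
    n_frames, n_pts, _ = trajectories.shape
    ref = trajectories[0]                      # first-frame positions
    highlighted = np.zeros(n_pts, dtype=bool)
    for c in range(n_pts):                     # each point serves once as LCR center
        members = np.where(np.linalg.norm(ref - ref[c], axis=1) <= lcr_radius)[0]
        for i, j in combinations(members, 2):  # all unique pairs in the LCR
            d = np.linalg.norm(trajectories[:, i] - trajectories[:, j], axis=1)
            if d.std() > threshold:            # differential motion across the crack
                highlighted[i] = highlighted[j] = True
    return highlighted

# Sweep a range of thresholds to generate the candidate results that the
# inspector later toggles through in the AR menu
candidates = {t: detect_crack_points(trajectories, threshold=t)
              for t in np.linspace(0.0, 1.0, 11)}
```

Note that a threshold of zero highlights every tracked point, which matches the inspector’s ability (described later) to view all detected feature points before raising the threshold.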
4. AR Environment
An AR software package has been developed to create a holographic interface and facilitate human-centered bridge inspections. We utilized the HL2 as the AR device, which is capable of performing 3D scanning (spatial mapping) of the surrounding environment, programming, and projection. A server, which could be a laptop or PC, runs both a crack detection algorithm in MATLAB (R2021a) [
29] and a Structured Query Language (SQL) database. This setup communicates with the HL2 through the LAN, established using a wireless router that also serves as a Wi-Fi hotspot. The network allows for bidirectional communication, enabling the SQL database on the server to send and receive data not only with the HL2 but also with the MATLAB application. For robust and seamless interaction between the HL2 and the server, it is crucial that both devices remain within the range of the Wi-Fi hotspot. The database transmits the video captured by the HL2 to the server for crack detection analysis, and the results are sent back to the HL2 to generate a hologram. An Auto Anchoring System (AAS) has also been developed for automatic anchoring of holograms onto the structure. More details on the AAS and the database are described in [
11,
30].
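As a rough illustration of the server-side workflow, the sketch below emulates the standby/processing loop with a SQLite stand-in for the SQL database; the table and column names (uploads, results, processed) and the run_crack_detection routine are hypothetical placeholders for the actual MATLAB/SQL setup.

```python
import sqlite3
import time

import numpy as np

def run_crack_detection(path):
    """Placeholder for the detection pipeline sketched in Section 3:
    returns {threshold: highlighted-point coordinates} for a video file."""
    raise NotImplementedError

db = sqlite3.connect("inspection.db")  # hypothetical database file

while True:
    row = db.execute(
        "SELECT id, video FROM uploads WHERE processed = 0 LIMIT 1").fetchone()
    if row is None:
        time.sleep(1.0)                  # standby until the HL2 uploads a video
        continue
    vid_id, blob = row
    with open("latest.mp4", "wb") as f:  # hand the uploaded video to the detector
        f.write(blob)
    for thr, pts in run_crack_detection("latest.mp4").items():
        db.execute(
            "INSERT INTO results (video_id, threshold, points) VALUES (?, ?, ?)",
            (vid_id, float(thr), np.asarray(pts).tobytes()))
    db.execute("UPDATE uploads SET processed = 1 WHERE id = ?", (vid_id,))
    db.commit()                          # results now visible to the HL2 application
```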
To provide a user-friendly AR interface, a virtual menu has been designed for the AR software, enabling users to smoothly operate and interact with the software.
Figure 3 displays the virtual menu with its associated functionalities labeled. A “Visual Mesh” button can be found on the top right section of the virtual menu. The visual mesh button leverages HL2’s spatial mapping capabilities, which involve its comprehension of the surrounding environment. This feature allows the user to visualize all objects that the HL2 assesses as part of its 3D surroundings, enabling the inspector to confirm that the HL2 is examining the correct surface during the inspection. The video upload button facilitates video recording during the inspection process. The slider for controlling video length allows the user to adjust the video duration (between 1 and 10 s) captured by the HL2. This feature enables users to ensure that an adequate number of fatigue load cycles are included in the recorded video. The flying menu button toggles the virtual menu between flying and hovering modes. The flying mode enables the menu to follow the inspector’s movement during the inspection, freeing the inspector’s hands for other tasks and eliminating the need to manually move the menu. In contrast, the hovering mode is suitable when the inspector needs to inspect a specific section of the structure. By switching to the hovering mode, the inspector can manually reposition the virtual menu as needed, allowing for more focused examination. The “Thresholds” function presents a series of virtual buttons corresponding to the number of thresholds for crack detection specified by the inspector. As illustrated in
Figure 4, the virtual threshold options are displayed upon pressing the “Thresholds” button. Each button represents a single threshold result produced by the crack detection algorithm. This feature is the key to enabling human-in-the-loop decision-making, allowing the inspector to interact with the crack detection results, effortlessly reviewing the outcomes associated with a range of thresholds, thereby enabling the selection of the most appropriate result for documentation, monitoring, and informed decision-making.
6. Quantification for Crack Detection
Multiple methodologies have been adopted for quantitatively evaluating crack detection outcomes. Vision-based deep learning models for semantic segmentation assign a label to each pixel of the image, and these pixel labels are then compared to the corresponding ground truth labels. In contrast, our vision-based approach uses salient feature points to analyze differential surface motion, and feature points surrounding the crack are highlighted as the detected crack. Hence, the crack detection result is composed of sparse pixel points rather than a continuous area labeled as the crack region. Nevertheless, detecting the clusters of feature points can be considered as segmenting the crack region based on the area associated with the detected feature points. Moreover, the width of the detected feature point cluster depends on the radius of the chosen LCR, which can be used to define the ground truth of the crack region. To provide a quantitative measure of the crack detection result, a new approach is developed in this paper, consisting of three steps: (1) clustering the detected feature points and forming boundaries of the clusters, (2) determining the boundary of the ground truth, and (3) evaluating the result based on the intersection over union (IOU). Details are explained as follows.
6.1. Clustering
Clustering involves organizing data points into groups according to their similarities: within each group, data points exhibit a higher degree of similarity to each other than to data points in other groups. To determine the level of similarity or difference, a domain-specific dissimilarity measure (or distance metric) applicable to the dataset is utilized. Various clustering algorithms exist, including K-means, Gaussian Mixture Models (GMM), Hierarchical Clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Spectral Clustering. Among these, K-means and Hierarchical Clustering rely on distance for grouping, whereas DBSCAN focuses on the density of data points, and GMM is predicated on a mixture of Gaussian distributions. Both K-means and GMM require the number of clusters as a predetermined input, whereas Hierarchical Clustering provides a dendrogram that offers a range of clustering options based on different threshold levels. DBSCAN, on the other hand, clusters the detected feature points based on density, producing a single result by connecting dense regions into clusters and treating noise as a separate group. Since it requires no manual input for the number of clusters, DBSCAN was selected as the clustering method in this study.
The DBSCAN algorithm depends on two principal parameters: (1) epsilon (ε), which serves as a neighborhood distance measure for determining cluster membership, and (2) the minimum number of points (MinPts), which establishes the threshold for cluster formation. Specifically, ε delineates whether a point is considered part of a cluster based on proximity. For instance, should a point be situated 13 units from cluster K and ε is set to 13 or more, the point is deemed a member of cluster K; otherwise, it is classified as external to cluster K. MinPts, in turn, specifies the least number of points required to constitute a cluster.
In this study, ε is defined as the radius of the LCR because our method relies on analyzing the movement patterns of sparse feature points within this radius. By setting ε equal to the radius of the LCR, we ensure that feature points located beyond this boundary are classified as belonging to a separate cluster. This distinction is crucial, as feature points situated at greater distances should not be interpreted as part of the contiguous area identified as a crack region. The parameter MinPts is set to 10, which is high enough to eliminate isolated feature points from the crack detection results, whereas setting MinPts to a value greater than 10 could erroneously classify a valid cluster of detected feature points as noise, which is not desirable. The DBSCAN methodology initiates with a random feature point, assessing its ε-neighboring points. This neighborhood qualifies as a cluster if the number of feature points is at least MinPts; if not, the point is tagged as noise. Subsequent points are then evaluated for potential inclusion in this or another cluster or identified as noise. To illustrate,
Figure 7a presents the crack detection outcome using a low threshold value, while
Figure 7b visualizes three distinct clusters colored in red, black, and blue. Feature points on the lower clevis and at the boundary with the plate, being more distant than ε from any cluster and numbering fewer than MinPts, are disregarded as noise. For each identified cluster, a boundary is formed by connecting its outermost points, with every pixel within the boundary classified as part of a crack. This process effectively transforms sparse feature points into continuous areas, providing a reliable basis for assessing the accuracy of crack detection.
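A minimal sketch of this clustering step follows, continuing the variables from the Section 3 sketch and using scikit-learn’s DBSCAN with ε equal to the LCR radius and MinPts = 10 as described above; the use of a convex hull to connect the outermost points is an assumption, as the paper states only that the outermost points are connected.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import DBSCAN

lcr_radius = 30.0                   # same illustrative LCR radius as before
pts = trajectories[0][highlighted]  # first-frame coordinates of detected points

labels = DBSCAN(eps=lcr_radius, min_samples=10).fit_predict(pts)

boundaries = []
for k in set(labels) - {-1}:        # label -1 marks points rejected as noise
    cluster = pts[labels == k]
    hull = ConvexHull(cluster)      # boundary through the outermost points
    boundaries.append(cluster[hull.vertices])
```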
6.2. Ground Truth
The ground truth for crack detection is established by defining the actual dimensions and positioning of the cracks.
Figure 8a illustrates the path of an in-plane fatigue crack, indicated by a continuous black line, for the C(T) specimen. Given that this crack follows a straight trajectory, the ground truth is represented as a rectangular area whose length corresponds to that of the crack. The width of the ground truth rectangle is set to the diameter D of the LCR, since the width of the detected feature point cluster closely aligns with the LCR’s diameter. Meanwhile, the fatigue crack in the bridge girder specimen features three distinct branches, designated as A, B, and C. Because Branch A dominated the crack movement, it is chosen as the reference for establishing the ground truth. As illustrated in
Figure 8b, Branch A includes three linear sections; therefore, a polygon is formed to represent the ground truth. Similar to the C(T) specimen, the height of this polygon is defined based on the diameter D of the LCR.
6.3. Intersection over Union (IOU)
With the clustering result and the ground truth, intersection over union (IOU) is computed as the metric for assessing the performance of crack detection. IOU describes the extent of overlap between two regions. The value of IOU ranges between 0 and 1, with 0 indicating no overlap and 1 indicating a perfect overlap.
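In standard notation, for a detected crack region A and a ground truth region B,

\[ \mathrm{IOU} = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}, \]

where \( \lvert \cdot \rvert \) denotes the area in pixels.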
Figure 9 demonstrates examples of three different IOU values: an IOU score of 0.40 represents good performance in localization but not in coverage, an IOU score of 0.73 signifies a satisfactory result in both localization and coverage, and an IOU of 0.92 exemplifies exceptionally high precision in both localization and coverage.
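A rasterization-based sketch of this computation is shown below, assuming the cluster boundary and ground truth polygons are available in pixel coordinates; the frame size is an illustrative assumption.

```python
import cv2
import numpy as np

def iou(poly_a, poly_b, shape=(720, 1280)):
    """Rasterize two polygons (detected cluster boundary and ground-truth
    rectangle/polygon) onto binary masks and compute intersection over union."""
    mask_a = np.zeros(shape, np.uint8)
    mask_b = np.zeros(shape, np.uint8)
    cv2.fillPoly(mask_a, [np.round(poly_a).astype(np.int32)], 1)
    cv2.fillPoly(mask_b, [np.round(poly_b).astype(np.int32)], 1)
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0
```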
8. Parametric Study
Since the developed method relies on the crack opening and closing under live load, the result is significantly influenced by the load level. Therefore, a parametric study was carried out to examine the influence of fatigue load level on crack detection accuracy. Videos were captured from two distinct perspectives, as depicted in
Figure 6b. View 1 observes the cracked region from a position approximately equidistant from the connection plate and the girder web. In View 2, the video is recorded almost parallel to the girder web, resulting in a considerably greater parallax effect compared to View 1. A total of 10 fatigue load cases were considered in this parametric study, as listed in
Table 1. The minimum load level (Fmin) in each load case was 0.9 kN, while the maximum load level (Fmax) started at 2.2 kN for LC1 and increased by approximately 2.2 kN in each subsequent load case. As a result, the first fatigue load case had a load range of 0.9 kN to 2.2 kN, and the last ranged from 0.9 kN to 22.2 kN. In particular, based on a deflection criterion, the AASHTO fatigue truck loading [
35] corresponds to an applied actuator load of 7.8 kN [
34], making it somewhere between LC3 and LC4. The IOU values for the various load cases are tabulated in
Table 1.
As shown in
Table 1, results based on the videos recorded from View 1 have IOU values ranging from 0 to 0.77. The IOU scores fall into two groups: a lower group ranging from 0 to 0.40 for View 1 and from 0 to 0.44 for View 2, and a higher group ranging from 0.62 to 0.77 for View 1 and from 0.64 to 0.70 for View 2. As the load level increases, a corresponding increase in the IOU value is observed, enhancing crack detection capability. The IOU value is 0 for the initial load cases, LC1 and LC2, but increases to 0.34 and 0.40 for LC3 and LC4, respectively. For higher load cases, from LC5 to LC10, the IOU values exceed 0.62, indicating a significant overlap between the detected and actual crack areas, with more than two-thirds of the actual crack area covered. For View 2, the IOU values for LC2 to LC5 range between 0.42 and 0.44, and the detected crack area covers more than 64% of the actual crack area for load cases LC6 to LC10. Additionally,
Table 1 includes the magnitude of crack opening for each load case, demonstrating the algorithm’s consistent performance in detecting fatigue cracks when the crack opening size surpasses 0.5 mm.
9. Validation of Human-Centered Bridge Inspection
The proposed human-centered bridge inspection approach was validated on the bridge girder specimen through the developed AR environment integrated with the proposed crack detection algorithm. First, as shown in
Figure 14, the HL2 and the server, which hosts the MATLAB program for the crack detection algorithm, and the SQL database were both connected to a Wi-Fi hotspot. Then, the MATLAB program was initiated to enter the standby mode, waiting for a new inspection video to be uploaded to the database. The bridge inspector opened the AR software and adjusted the virtual menu to a convenient position using the flying menu button. Then, a 10 s video was recorded by pressing the video upload button as shown in
Figure 15a, in which the recorder view of the figure shows the third-person view of the entire bridge inspection process, and the HoloLens view shows the first-person view of the bridge inspector through the HL2. Once the 10 s video was recorded, it was automatically sent to the server for crack detection using the MATLAB program.
Upon completion of the near-real-time processing of the video (approximately 30 s), the crack detection results corresponding to a range of threshold values were transmitted to the HL2. Subsequently, the AR software converted the feature points associated with the crack detection results into holograms and generated a virtual menu (
Figure 4), from which the inspector could choose a specific threshold value to examine the crack detection outcome anchored on the bridge surface. For instance, as depicted in
Figure 15b, the inspector initially selected a threshold value of zero to examine all the detected feature points in the analyzed region. This step provided the inspector with an understanding of the extent of the area assessed for fatigue crack detection. Subsequently, as illustrated in
Figure 15c, the inspector selected a higher threshold value and evaluated the identified fatigue crack result. Throughout this process, the inspector changed position multiple times and was able to see the fatigue crack feature points remain anchored on top of the crack, demonstrating the effectiveness of the automatic hologram anchoring system. This enhanced visualization aids the inspector in detecting and localizing fatigue cracks with minimal effort, making the inspection process more robust and efficient.