1 Introduction

With the advent of the digital age, the demand for electronic devices is growing exponentially, and flexible printed circuits (FPCs) are a fundamental part of this technology-driven world. Printed circuit boards are the backbone of modern electronics, providing vital interconnections for countless electronic components. As technology advances, the complexity and miniaturization of printed circuit boards continue to increase, requiring robust and efficient defect detection to ensure product quality and reliability. Connectors are among the key components of flexible circuit boards, and defects that occur during manufacturing can lead to functional failures, shortened product life, and even safety hazards [1,2,3]. Therefore, developing accurate and efficient printed circuit board defect detection technology is an important task in the electronics manufacturing industry. Based on the defects encountered by our partner enterprises during production, combined with the FPC defect classifications of other researchers [1,2,3], we categorize common defects into the following five types: scratches, oxidation, foreign matter, dirty defects, and missing electrodes. Foreign matter, dirty defects, and missing electrodes have relatively few samples and all appear as dark regions. Therefore, in this paper, they are combined into a single category called black defects (Fig. 1).

Fig. 1

Five types of defects. All of the defects found in the 1898 defect images we collected can be grouped into five categories. According to descriptions provided by our partner manufacturers, these categories also cover the vast majority of defect types encountered in actual production. a Scratches, b Oxidation, c Dirty defect, d Missing electrodes, e Foreign matter, and f Foreign matter

Traditional printed circuit board defect inspection methods, such as optical and manual visual inspection, are time-consuming, labor-intensive, and prone to human error. With the proliferation of consumer electronics and Internet of Things (IoT) devices, there is an urgent need to automate FPC defect inspection to meet the demand for low-volume production and customized products. This has led to the exploration of various image processing and machine learning algorithms to enhance the inspection process.

Currently, object detection methods are mainly divided into two-stage detection algorithms, represented by Fast R-CNN [4, 5], and single-stage detection algorithms, represented by YOLO [6,7,8]. According to previous comparisons, the recent YOLO algorithms not only outperform common two-stage detectors in detection speed but also surpass them in accuracy [9,10,11,12,13].

In the task of defect detection for Flexible Printed Circuits (FPCs), challenges arise from the subtle distinction between oxidation defects and the background, as well as the varied boundaries of oxidation and black defects; YOLOv9 does not detect them effectively. In addition, the connector has a clear boundary between the conductive sheet and the void, which greatly increases the likelihood that the predicted box and the real box end up in an inclusion relationship (one contained within the other). As a result, the CIoU loss used by YOLOv9 is more likely to degenerate in this task, which significantly weakens gradient descent during bounding box regression. FPC defect detection also requires high speed, and there is still room for improvement in the computational load and accuracy of the RepNCSPELAN4 module in YOLOv9. To address these problems, this paper makes the following improvements:

  1. The insertion of the Deformable Large Kernel Attention (DLKA) module improves adaptability to defects with complex boundaries.

  2. The inclusion of the Multi-Scale Dilated Attention (MSDA) module at the output further enhances the extraction of defect features.

  3. Replacing the CIoU loss with MPDIoU eliminates the possibility of IoU failure or degradation.

  4. Replacing the RepConvN operations in the RepNCSPELAN4 module with the Faster Block increases accuracy while reducing the number of parameters and floating-point operations (FLOPs).

Experiments demonstrate that our improvements address the aforementioned problems: the modified YOLOv9 significantly improves the detection of oxidation defects and black defects. The mAP75 for oxidation defects increased by 7.5%, and the mAP50 for black defects increased by 5.7%. Because the conductive sheets of the connectors have distinct boundaries, a defect is often divided into multiple parts, each marked by its own prediction box. A defective board passes quality inspection only when all of these prediction boxes fail to be detected. Therefore, the reliability of the model in practical applications is higher than the recall figures alone suggest.

The remainder of this paper is organized as follows: the second part reviews work related to circuit board defect identification and improvements to the YOLO algorithm; the third part introduces the framework and implementation details of the improved YOLOv9 model; the fourth part presents comparative experiments with existing models, ablation experiments on each module, and case studies; the fifth part presents the conclusions.

2 Related Work

Redmon et al. proposed YOLO [6], a real-time object detection model that offers high speed and accuracy by predicting bounding boxes and class probabilities directly from images. YOLOv4 by Bochkovskiy et al. [9] enhances real-time object detection, achieving 43.5% AP at 65 FPS on MS COCO through features such as WRC, CSP, and Mish, setting a new standard in accuracy and speed. Wang et al. [11] introduced YOLOv7, a state-of-the-art real-time object detection system that achieves superior speed and accuracy through trainable bag-of-freebies and model scaling techniques, significantly outperforming other detectors in its category. Wang et al. [13] presented YOLOv10, a cutting-edge object detection model that advances the state of the art with superior real-time performance and accuracy, underpinned by its NMS-free training strategy and efficiency-accuracy driven design. The YOLO algorithm is widely used in PCB defect detection research. Adibhatla et al. [14] used the YOLO algorithm, implemented as a deep convolutional neural network (CNN), to achieve high accuracy in PCB defect detection. Wang et al. [15] improved the YOLO model by introducing a hybrid attention module and a skip-connected dilated convolution pyramid to enhance the recognition of tiny defects. Xia et al. [16] proposed a YOLO model combining global contextual attention and a ConvMixer prediction head, which effectively improves the detection accuracy of small targets. Li et al. [17] optimized the YOLOv3 algorithm by combining real and virtual PCB image datasets, significantly improving detection accuracy. Additionally, Yuan et al. [18] proposed an improved YOLOv5 framework, YOLO-HMC, for more accurate identification of tiny PCB defects. Chen et al. [19] proposed a Transformer-YOLO-based PCB defect detection method, which improves detection accuracy and efficiency by enhancing the clustering algorithm and the feature extraction network. Santoso et al. [20] developed a PCB defect detection system using the YOLO CNN method and image processing techniques to help identify fracture paths and drilled holes in PCBs.

Mnih et al. [21] proposed a novel recurrent neural network model that efficiently processes large images by adaptively selecting image regions and independently controlling the amount of computation. This was an early application of the attention mechanism in computer vision, demonstrating its potential in complex visual tasks. Nowadays, attention mechanisms are widely used to improve YOLO, significantly enhancing the model’s feature extraction capabilities. For example, Hu et al. [22] proposed an attention-guided YOLO network (PAG-YOLO), which improves the detection accuracy of small-boat targets through attention in both the spatial and channel dimensions. Additionally, Xue et al. [23] developed a multimodal attention fusion YOLO (MAF-YOLO) for pedestrian detection, achieving efficient detection in nighttime environments by combining visible and infrared images. For object detection in UAV-captured scenes, Zhu et al. [10] proposed an improved YOLOv5 model (TPH-YOLOv5), which utilizes Transformer prediction heads and the convolutional block attention module to significantly improve detection performance. Zixiao et al. [24] pioneered the use of Multihead Self-Attention (MHSA) to construct an improved backbone, MHSA-Darknet, which preserves global contextual information and extracts more discriminative features; the integration of a Bidirectional Feature Pyramid Network (BiFPN) further optimizes cross-scale feature fusion, showing notable results on UAV-captured images. Zexuan et al. [25] introduced a TRANS module based on the Transformer architecture into the backbone and detection head of YOLOv5; this module allows features to be combined with global information, enhancing the model’s ability to accurately detect steel surface defects in real time.

In bounding box regression, the loss function is the basis of gradient descent. To address the failure of the early IoU loss when the boxes do not overlap, improved variants such as GIoU and DIoU were gradually proposed [26, 27]. They add extra terms to the IoU to alleviate the failure problem. However, DIoU degenerates into IoU when the centroids of the predicted box and the groundtruth box coincide. To solve this problem, Zhaohui et al. proposed the CIoU loss function based on DIoU [28], which is also used in the YOLOv9 model. When the aspect ratios of the predicted box and the groundtruth box are equal, CIoU degrades into DIoU, leading to performance degradation. To avoid this degradation, Yi-Fan et al. proposed EIoU [29], and Siliang et al. proposed MPDIoU [30] (Table 1).

Table 1 Comparison of the contributions and drawbacks of common YOLO models

3 Methods

3.1 Overview of YOLOv9

YOLOv9 represents a significant advancement in the YOLO family of single-stage object detection algorithms in computer vision. Its authors examine the problem of gradient information being gradually lost when a model has too many layers, which makes the gradients of the earlier layers unreliable. To address this problem, YOLOv9 introduces Programmable Gradient Information (PGI), a novel auxiliary supervision mechanism. Through an auxiliary reversible branch, this mechanism ensures that the initial layers obtain reliable gradient information for gradient descent. Since the auxiliary branch is not used during inference, it does not increase the computational cost of model inference [12].

Due to its network characteristics, YOLOv9 cannot flexibly adjust the number of network parameters using depth and width multipliers, as is common in models such as YOLOv5. The original paper provides two network structures, YOLOv9-C and YOLOv9-E, with the latter performing better in this task. Therefore, we focus on improving YOLOv9-E. Furthermore, YOLOv9 already includes a rich set of data augmentation methods to address the small amount of training data, so we do not modify the data augmentation pipeline in this paper.

3.2 IoU with Minimum Points Distance

Fig. 2

This figure illustrates how the parameters of each expression are obtained for three IoU variants: a IoU, b DIoU and CIoU, and c MPDIoU

In bounding box regression, the model quantifies the difference between the predicted bounding box and the groundtruth bounding box based on the bounding box loss function. It then gradually brings the predicted box closer to the groundtruth box through gradient descent. Intersection over Union (IoU) quantifies the magnitude of the difference between the predicted bounding box and the groundtruth bounding box by considering the areas where the predicted and groundtruth bounding boxes coincide and do not coincide (Fig. 2)

$$\begin{aligned} IoU=\frac{{{B}_{\textrm{prd}}}\cap {{B}_{\textrm{gt}}}}{{{B}_{\textrm{prd}}}\cup {{B}_{\textrm{gt}}}}. \end{aligned}$$
(1)

However, when the predicted bounding box and the groundtruth bounding box do not overlap at all, the IoU is identically 0. In that case, the gradient vanishes and the IoU loss cannot drive gradient descent

$$\begin{aligned} DIoU=IoU-\frac{{{\rho }^{2}}({{B}_{\textrm{gt}}},{{B}_{\textrm{prd}}})}{{{d}^{2}}}. \end{aligned}$$
(2)

DIoU introduces \(\rho ^2(B_{\text {gt}},B_{\text {prd}})\), the squared Euclidean distance between the centroids of the groundtruth and predicted bounding boxes. As shown by the dotted line in Fig. 2, d denotes the diagonal length of the smallest rectangle enclosing the two boxes. This makes it possible to continue gradient descent via the second term, bringing the predicted bounding box progressively closer to the groundtruth bounding box even when the IoU term fails. However, the second term is 0 when the centers of the predicted bounding box and the groundtruth bounding box coincide, at which point DIoU degenerates into IoU

$$\begin{aligned} \begin{aligned}&CIoU=DIoU-\alpha V,\\&V=\frac{4}{{{\pi }^{2}}} \left( \arctan {\frac{{{w}^{\textrm{gt}}}}{{{h}^{\textrm{gt}}}}}-\arctan {\frac{{{w}^{\textrm{prd}}}}{{{h}^{\textrm{prd}}}}}\right) ^{2}\\&\alpha =\frac{V}{1-IoU+V}. \end{aligned} \end{aligned}$$
(3)

CIoU adds the \(\alpha V\) term to DIoU, introducing the difference between the aspect ratios of the predicted and groundtruth bounding boxes to reduce the possibility of degradation. However, CIoU only considers whether the aspect ratios of the two boxes are the same, not their actual dimensions. When the predicted and groundtruth boxes have the same aspect ratio but different sizes, the \(\alpha V\) term is 0, and CIoU degenerates into DIoU.
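For example, with hypothetical box sizes, a \(20\times 10\) predicted box and a \(40\times 20\) groundtruth box have identical aspect ratios, so the penalty term vanishes even though the boxes differ greatly in scale:

$$\begin{aligned} V=\frac{4}{{{\pi }^{2}}}\left( \arctan \frac{40}{20}-\arctan \frac{20}{10}\right) ^{2}=0\quad \Rightarrow \quad \alpha V=0,\; CIoU=DIoU. \end{aligned}$$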

To address these issues, Minimum Points Distance Intersection over Union (MPDIoU) was proposed [30]. MPDIoU is not an improvement built on CIoU or DIoU. Instead, it directly incorporates the distances between the top-left corner points (\(d_1\)) and between the bottom-right corner points (\(d_2\)) of the predicted and groundtruth boxes into the loss function, normalized by the squared diagonal \(d_{0}^{2}\) of the input image

$$\begin{aligned} MPDIoU=IoU-\frac{{{d}_{1}^{2}}}{{{d}_{0}^{2}}}-\frac{{{d}_{2}^{2}}}{{{d}_{0}^{2}}}. \end{aligned}$$
(4)

We believe that this method, more than all the above modifications of IoU, is closer to the essence of the loss function: quantifying the difference between the predicted box and the groundtruth box. MPDIoU avoids degradation and gradient vanishing in the cases where the predicted box does not intersect the groundtruth box, where the aspect ratios are equal, and where the centroids coincide. It can effectively quantify the difference between the predicted box and the groundtruth box in every case and, through gradient descent, bring the predicted box progressively closer to the groundtruth box.
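For clarity, a minimal PyTorch sketch of Eq. (4) is shown below. Following [30], the normalizer \(d_{0}^{2}\) is taken as the squared diagonal of the input image; the function and argument names are illustrative rather than the exact implementation used in our modified YOLOv9.

```python
import torch

def mpdiou(pred, gt, img_w, img_h, eps=1e-7):
    """Sketch of Eq. (4) for axis-aligned boxes given as (x1, y1, x2, y2).
    The regression loss would then be 1 - mpdiou(...)."""
    # Intersection area
    ix1 = torch.max(pred[..., 0], gt[..., 0])
    iy1 = torch.max(pred[..., 1], gt[..., 1])
    ix2 = torch.min(pred[..., 2], gt[..., 2])
    iy2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and IoU (Eq. 1)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)

    # Squared distances between the top-left and bottom-right corner pairs
    d1_sq = (pred[..., 0] - gt[..., 0]) ** 2 + (pred[..., 1] - gt[..., 1]) ** 2
    d2_sq = (pred[..., 2] - gt[..., 2]) ** 2 + (pred[..., 3] - gt[..., 3]) ** 2
    d0_sq = img_w ** 2 + img_h ** 2   # squared diagonal of the input image

    return iou - d1_sq / d0_sq - d2_sq / d0_sq
```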

3.3 Faster Block

Fig. 3

Comparison of the Faster Block and traditional convolution. As shown in the figure, to replicate the sampling behaviour of a conventional convolutional layer, PConv samples features in the spatial (height and width) directions, while PWConv samples in the channel (depth) direction

PConv (Partial Convolution), a new type of convolution, was proposed to address the drawback of DWConv (Depthwise Convolution), whose frequent memory accesses lead to low computational speed. Compared to traditional convolution, PConv performs the convolution operation on only part of the tensor, while the rest passes through unchanged [31]. As shown in Fig. 3b, c, the Faster Block is a combination of PConv and PWConv. It greatly reduces the amount of computation while approximating the sampling behaviour of a traditional convolution.
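The sketch below illustrates the idea in PyTorch. The split ratio (one quarter of the channels convolved) and the expansion ratio of 2 are typical values from [31] and should be read as assumptions, not the exact configuration used in our modified RepNCSPELAN4.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial Convolution: only the first dim/n_div channels pass through a
    3x3 conv; the remaining channels are carried through untouched."""
    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_pass = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_pass], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterBlock(nn.Module):
    """PConv followed by two pointwise (1x1) convolutions with a residual
    connection, approximating a full conv layer (Fig. 3) at a fraction of
    the FLOPs."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.pconv = PConv(dim)
        self.pwconv = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.pwconv(self.pconv(x))
```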

3.4 Multi-scale Dilated Attention

The global attention mechanism is more effective for feature extraction but suffers from a large computational overhead, which makes it difficult to apply to high-resolution images. Sliding Window Dilated Attention (SWDA) attends only to a small \({m\times m}\) dilated window at a time, which greatly reduces the amount of computation [32]. Its output \(x_{ij}\) can be expressed as

$$\begin{aligned} \begin{aligned} {x}_{ij}&= {\text {Attention}}({Q}_{ij}, {K}_{r}, {V}_{r}) \\&= {\text {Softmax}}\left( \frac{{Q}_{ij} {K}_{r}^{T}}{\sqrt{{d}_{k}}}\right) {V}_{r},\\&\quad \text {where } {K}_{r}, {V}_{r} \text { are gathered at positions } (i',j') \text { with}\\&\quad i' = i + p \times r,\quad j' = j + q \times r,\quad -\frac{m}{2} \le p, q \le \frac{m}{2}, \end{aligned} \end{aligned}$$
(5)

where (i, j) indexes positions on the feature map of height H and width W. The query Q is taken at position (i, j) of the original feature map, while K and V consider only a small portion of the feature map, selected at the positions \((i', j')\) of a sliding dilated window. In this way, the sparsity of the attention can be varied through the dilation rate r to obtain different extraction characteristics. Several different values of r are used to compute SWDA separately; the results are then merged by concatenation and passed through a linear layer, forming the Multi-Scale Dilated Attention (MSDA) module that we use (Fig. 4).

Fig. 4

Multi-scale dilated attention. Q, K, and V are produced by a linear layer and then passed through three SWDA branches with dilation rates of \(r=1\), 2, and 3, respectively. The branch outputs are concatenated and passed through a linear output layer
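A simplified sketch of SWDA and MSDA is given below. For readability, every dilation branch attends over the full channel dimension with a single head and the neighbourhood is gathered with unfold; the real module splits channels across heads, so treat this as an illustration of Eq. (5) and Fig. 4 rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlidingWindowDilatedAttention(nn.Module):
    """Single-head sketch of SWDA (Eq. 5): each query position attends to an
    m x m neighbourhood sampled with dilation r around it."""
    def __init__(self, dim, window=3, dilation=1):
        super().__init__()
        self.m, self.r = window, dilation
        self.scale = dim ** -0.5

    def forward(self, q, k, v):                           # all: (B, C, H, W)
        B, C, H, W = q.shape
        pad = self.r * (self.m - 1) // 2
        # Gather the m*m dilated neighbourhood of every spatial position.
        k = F.unfold(k, self.m, dilation=self.r, padding=pad)   # (B, C*m*m, H*W)
        v = F.unfold(v, self.m, dilation=self.r, padding=pad)
        k = k.reshape(B, C, self.m * self.m, H * W)
        v = v.reshape(B, C, self.m * self.m, H * W)
        q = q.reshape(B, C, 1, H * W)
        attn = (q * k).sum(dim=1, keepdim=True) * self.scale    # (B, 1, m*m, H*W)
        attn = attn.softmax(dim=2)
        out = (attn * v).sum(dim=2)                             # (B, C, H*W)
        return out.reshape(B, C, H, W)

class MSDA(nn.Module):
    """Multi-Scale Dilated Attention: SWDA branches with r = 1, 2, 3 whose
    outputs are concatenated and mixed by a linear (1x1 conv) layer (Fig. 4)."""
    def __init__(self, dim, dilations=(1, 2, 3)):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.branches = nn.ModuleList(
            [SlidingWindowDilatedAttention(dim, dilation=r) for r in dilations])
        self.proj = nn.Conv2d(dim * len(dilations), dim, kernel_size=1)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=1)
        return self.proj(torch.cat([b(q, k, v) for b in self.branches], dim=1))
```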

3.5 Deformable Large Kernel Attention

Deformable Convolution, a novel convolution operation, was first proposed in [33]. The method adds a learnable offset \(\Delta P\) to the sampling positions of the standard convolution kernel, allowing the kernel to deform freely for better adaptability

$$\begin{aligned} y(p_{0}) = \sum _{n=1}^{N} w(p_{n}) \cdot x(p_{0} + p_{n} + \Delta p_{n}). \end{aligned}$$
(6)

The offset \(\Delta P\) is learned during model training without manual intervention or tuning. Meng-Hao et al. [34] compared the advantages and disadvantages of generating attention maps with self-attention and with large kernel convolution, and subsequently improved the latter. The improved LKA (Large Kernel Attention) expression is as follows:

$$\begin{aligned} LK\text {-}Attention&= Conv_{1 \times 1}(DW\text {-}D\text {-}Conv(DW\text {-}Conv(F)))\nonumber \\ Output&= LK\text {-}Attention \otimes F . \end{aligned}$$
(7)

LKA avoids the large computational overhead of a large convolution kernel while maintaining accuracy (Fig. 5).

Fig. 5

DLKA and the modules that constitute it. a Examples of different offset patterns in deformable convolution. b LKA decomposes a large kernel convolution into three parts: a depthwise convolution, a depthwise dilated convolution, and a \(1 \times 1\) convolution. c DLKA upgrades LKA’s three convolutions to be deformable and incorporates additional convolution and GELU modules

Deformable Large Kernel Attention combines the previous two techniques and was used by Azad et al. [35] for lesion recognition in the medical field. In medical image segmentation, lesion boundaries vary widely in shape, so the deformable convolution kernel improves the model's adaptability. In FPC defect detection, the boundaries of oxidation defects and black defects exhibit a similar shape diversity to lesion boundaries. Because of this similarity, we introduce DLKA into our FPC defect detection algorithm.
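A condensed PyTorch sketch of the idea is shown below, using torchvision's DeformConv2d for the offset-based sampling of Eq. (6) and the LKA decomposition of Eq. (7). The kernel sizes (5 and 7 with dilation 3) follow the common 21 × 21 decomposition in [34], and the layer arrangement mirrors Fig. 5c; all of these should be read as illustrative assumptions rather than the exact configuration of [35].

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableDWConv(nn.Module):
    """Depthwise convolution whose sampling grid is shifted by learned
    offsets (Eq. 6), built on torchvision's DeformConv2d."""
    def __init__(self, dim, k, dilation=1):
        super().__init__()
        pad = dilation * (k - 1) // 2
        # 2 offsets (x and y) for each of the k*k sampling points
        self.offset = nn.Conv2d(dim, 2 * k * k, k, padding=pad, dilation=dilation)
        self.conv = DeformConv2d(dim, dim, k, padding=pad, dilation=dilation, groups=dim)

    def forward(self, x):
        return self.conv(x, self.offset(x))

class DLKA(nn.Module):
    """Sketch of Deformable Large Kernel Attention: the LKA decomposition of
    Eq. (7) (depthwise conv -> dilated depthwise conv -> 1x1 conv) with the
    depthwise convolutions made deformable, framed by 1x1 convs and GELU as
    in Fig. 5c."""
    def __init__(self, dim):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, 1)
        self.act = nn.GELU()
        self.dw = DeformableDWConv(dim, k=5)
        self.dw_d = DeformableDWConv(dim, k=7, dilation=3)
        self.pw = nn.Conv2d(dim, dim, 1)
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        shortcut = x
        x = self.act(self.proj_in(x))
        attn = self.pw(self.dw_d(self.dw(x)))   # attention map, Eq. (7)
        return self.proj_out(x * attn) + shortcut
```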

3.6 Our Improvement Methods

Fig. 6

Our improved network structure based on YOLOv9-E. The blue dotted box in the figure indicates the output head section. The six red lines indicate the outputs produced after passing through the output head

In our comparison experiments, YOLOv9-E performed better than YOLOv9-C in the FPC defect recognition task. Therefore, we chose YOLOv9-E as the baseline for model improvement. TPH-YOLOv5 [10] incorporates attention mechanisms into its baseline model at several points, such as where the backbone connects to the neck and at the output head. Inspired by this, we added attention mechanisms at multiple places in the YOLOv9-E network: the MSDA module at the output head (layers 43, 47, and 49) and the DLKA module on the other side of the output head (layer 52). Additionally, we replaced the RepConvN module in RepNCSPELAN4 with the Faster Block module (Fig. 6).

Due to the specific conductive sheet and void boundaries of FPC connectors, the likelihood of CIoU degradation or gradient vanishing in bounding box regression is higher than on other commonly used datasets. To mitigate this, we replaced the bounding box regression loss with MPDIoU, eliminating the possibility of gradient vanishing.

4 Experiments

4.1 Experiments Settings

All models were trained on an online server with a single NVIDIA RTX 4090 GPU and tested on a personal computer with a single NVIDIA GTX 1660 Super GPU. For the training hyperparameters, we used the default hyp.yaml file from the YOLOv9 open-source code without modification. Training was stopped when no performance improvement was observed over the most recent 100 iterations. No pre-trained weights were used for any of the models. As is customary in papers on the YOLO series of object detection algorithms, we use mAP as the final metric for evaluating model performance. The YOLO series models do not output mAP75 during testing, so we added this metric to diversify the evaluation.

Table 2 Statistics on the number of defects in each category of the dataset used for the experiments
Table 3 Three types of defects merged into black defects
Table 4 Comparison of performance against black defects and oxidation defects before and after model improvement

The images were captured by the manufacturer's automated visual inspection equipment, and the defect locations were annotated by members of our laboratory using labeling software. A dataset of defect images, each sized 300 \(\times \) 300, was then produced. The dataset contains 1898 images, of which 1486 are used as the training set and 412 as the test set. The FPC connectors show five types of defects: scratches, oxidation, foreign matter, dirty defects, and missing electrodes. The three classes foreign matter, dirty defects, and missing electrodes all show obvious dark shadows in the dataset, so we combined them into the black defect class. This measure partly addresses the small amount of data available for each of these three classes. Due to the small dataset, we use the test set for validation (Tables 2, 3).

4.2 Targeted Testing of Methods

Table 5 Statistical analysis of the YOLOv9 model and our proposed method for Class 1 and Class 2 defects (black defect, oxidation) with regard to statistical significance

Our improvement scheme was proposed because the original model detects black defects and oxidation poorly. Table 4 compares the detection of these two defect types by the YOLOv9-E model before and after the improvement. As shown in Table 5, we analyzed the data in Table 4 for statistical significance. For Class 1 and Class 2 defects, the Wilcoxon signed-rank test, the sign test, and the marginal homogeneity test all confirm that our improved method differs significantly from the benchmark model (p value < 0.05). It is evident that the improved model substantially enhances the detection of black defects and oxidation. We attribute the limited accuracy of black defect and oxidation detection mainly to the diversity of their boundaries and the minimal difference between these defects and the background. Therefore, this performance improvement demonstrates, to a certain extent, that our method specifically addresses the poor detection caused by boundary diversity and small differences from the background, verifying the correctness and relevance of our improvement strategy.

4.3 Comparison with State-of-the-Art

Table 6 Performance comparison with state-of-the-art models for all defect types

Because all of the common models perform poorly in this task, they struggle to meet the needs of practical engineering applications. Therefore, our primary goal in selecting models is higher accuracy, with computational efficiency as a secondary goal. For fairness of comparison, we chose the X variant, which has the largest number of parameters, for YOLOv5, YOLOv8, and YOLOv10. As a lightweight model, YOLOv10-X still has fewer parameters than the other models, so we set both the depth and width multipliers of the network to the larger value of 1.4, following the scaling of YOLOv10-L, and call the result YOLOv10-XX to obtain a parameter count closer to the other models. It is important to note that this change may have somewhat disrupted the model's inherent balance, so the resulting model does not necessarily perform optimally (Fig. 7).

Based on Table 6, we plotted the performance of the models against each other. It is evident that the network structure of YOLOv9-E improved significantly after our enhancements: over all defect categories, mAP50 improves by 3.8%, mAP75 by 2.0%, and mAP50:95 by 2.3%. Compared with the milestone models of the YOLO series (YOLOv5, YOLOv8, and YOLOv9-C), our model is also clearly superior, given that their parameter counts and floating-point operations are similar. On the dataset used for this task, the performance improvement delivered by our enhancements clearly exceeds the gain of YOLOv9 over YOLOv8. As an improved model based on YOLOv8, YOLOv10-X achieves accuracy similar to YOLOv8-X with only half the number of parameters, which demonstrates the soundness of YOLOv10's design and its great potential for development.

Fig. 7

Comparison of mAP50, mAP75, and mAP50:95 for each model

4.4 Ablation Studies

Table 7 Performance comparison for all defect types after removing each module

Regarding the effectiveness of the Faster Block module: comparing the models before and after replacing RepConvN with the Faster Block in YOLOv9-E, the FLOPs decreased from 240.7 to 230.3 and the parameters decreased from 68.6 to 65.1. Alongside this reduction in parameters and computation, the average accuracy on all three metrics also increased slightly. This justifies the use of the Faster Block module in this task (Table 7).

Regarding the effectiveness of MPDIoU: adding MPDIoU did not increase the parameters or the FLOPs, yet the average accuracy saw a small gain. The model thus gained at zero cost, which shows that MPDIoU is indeed better suited to defect detection in FPC connectors than the CIoU used by default in YOLOv9.

Regarding the effectiveness of the MSDA module: adding the MSDA module increased the parameters from 65.1 to 67.4 and the FLOPs by 7.6. However, mAP75 and mAP50:95 still increased by 1.9 and 1.7, respectively, on top of the previous two improvements. The MSDA module's performance gain is significant, and we believe this additional computational cost is worthwhile.

Regarding the effectiveness of the DLKA module: the DLKA module increases the parameters by 1.8 and the floating-point operations by 22.3, but it yields a further increase in average accuracy. We retain it for practical engineering scenarios where system reliability and detection accuracy are paramount. If the model must be deployed on devices with limited computational power, the DLKA module can be removed (Fig. 8).

4.5 Case Study

Fig. 8

Comparison of the detection effectiveness of YOLOv5-X, YOLOv9-E, and our model with the threshold set to 0.25. It is evident that for small targets and defects that are easily confused with the background, the detection performance of our model is better than that of the former two

Observing Fig. 9, it can be seen that in this system the detection boxes are bounded by the boundaries of the conductive sheets, and a single detection box rarely spans more than one conductive sheet. Since defects in circuit boards often span several conductive sheets, a defect is usually marked by multiple detection boxes. Let a defect be marked by n boxes, and let the probability that the ith prediction box is successfully detected be \(P(Box_{i})\). Then, the probability that the defect is marked by at least one prediction box can be expressed as

$$\begin{aligned} P_{\text {existbox}} = 1 - \prod _{i=1}^{n} (1 - P(Box_{i})). \end{aligned}$$
(8)

A defective board passes quality inspection if and only if all n marker boxes fail simultaneously. In addition, since the boxes on different conductive sheets are relatively independent, we treat the detection probability of each box as independent. According to Eq. (8), the probability of multiple boxes failing at the same time is very low. Therefore, we believe that the actual reliability of the model is considerably higher than the accuracy data alone reflect.
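As a hypothetical illustration of Eq. (8), suppose a defect spans n = 3 conductive sheets and each of its prediction boxes is detected with probability 0.8 (an assumed value, not a measured one); the defect then escapes all boxes only with probability \(0.2^{3}=0.008\):

```python
# Hypothetical per-box detection probabilities for one defect (assumed values).
p_boxes = [0.8, 0.8, 0.8]

p_all_fail = 1.0
for p in p_boxes:
    p_all_fail *= (1.0 - p)          # probability that every box misses

p_exist_box = 1.0 - p_all_fail        # Eq. (8): at least one box marks the defect
print(f"P_existbox = {p_exist_box:.3f}")   # -> P_existbox = 0.992
```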

Fig. 9

Example of testing. As shown in this figure, our model demonstrates effective detection for various defects that are easily confused with the background, as well as for defects with complex shapes. Additionally, the detection bounding boxes for most defects are accurately constrained by the boundaries of the conductive sheets

5 Conclusion

Defect detection in Flexible Printed Circuit (FPC) connectors faces inherent problems such as the complex boundaries of black and oxidation defects, the minimal difference between oxidation defects and the background, and the easy degradation of the bounding box loss. To address these problems, we introduced the DLKA and MSDA attention modules, replaced the CIoU loss with MPDIoU, and substituted the Faster Block into the RepNCSPELAN4 module. Experimental results show that these measures significantly improve the model's ability to recognize oxidation defects and black defects, leading to an improvement in the average accuracy over all defect types. Our method outperforms existing models and is expected to contribute to the automation and intelligence of FPC production.

In practice, the model can be used alone or as a preliminary screen before manual inspection. When deploying it, the IoU threshold used during detection can be lowered to favor recall and reduce the chance of defective FPCs passing quality inspection. Since a defect is often marked by multiple boxes, the model offers high reliability.