Introduction

Printed circuit boards (PCBs) are a crucial part of numerous electronic products, significantly reducing wiring and assembly errors while optimizing space utilization. In PCB production, the soldering of electronic components is an extremely important step, and the quality of the solder joints directly affects the quality of the electronic product. Recent advances in production processes and technology have made PCBs smaller and more complex, thereby increasing the likelihood of soldering defects such as leakage welding, less tin, even tin, tip, and hole. Therefore, it is necessary to detect defects in PCB solder joints.

Initially, human inspection was the primary method for checking solder joints, but it was inefficient, prone to human error, and influenced by emotions and perceptions [1]. Compared to human observation, machine vision-based defect detection is not only independent of human factors but also able to reach a high level of detection accuracy [2]. Traditional vision-based detection methods [3] typically apply preprocessing steps to remove image noise and then extract and identify defect features from the processed images. For instance, Szymanski et al. [4] used the scale-invariant SIFT (Scale-Invariant Feature Transform) [5] to detect solder joint defects, taking PCB solder joint shapes as the detection features. However, the detection efficiency was not high enough, and the model lacked real-time detection capability. Building on the work of Szymanski et al., Dai et al. [6] combined the particle swarm optimization (PSO) algorithm with SIFT to improve detection accuracy by adding key points for solder joint identification. Nevertheless, this method also suffered from inefficiency. Huang et al. [7] developed a multi-graphic simultaneous solder joint segmentation algorithm that uses geometric neighborhood features to extract and locate the best PCB solder joint region; it is superior to the traditional multi-threshold solder joint extraction method in detection and positioning accuracy. Raihan et al. [8] employed a multi-threshold segmentation method for defect recognition to avoid the effect of illumination on the results of single-threshold segmentation, but it shows low detection accuracy on high-resolution images. Kumar et al. [9] segmented PCB solder pads based on the distinct color of copper against a varying background. Baygin et al. [10] combined the Canny edge detector and the Hough transform to segment the defects of PCB solder joints, thereby improving the efficiency of PCB pad detection. Li et al. [11] used the MSR algorithm to process PCB images and then built a BP neural network model for PCB solder joint defects. However, a BP neural network is only a plain feedforward network without memory; its computational performance is poor and its ability to represent features is limited, which ultimately leads to low accuracy.

In defect detection, traditional machine vision approaches rely on easily identifiable features of the defects themselves, such as texture, color, the background of the detection target, and foreground threshold differences. While these methods yield higher detection accuracy than manual inspection, they require manually designed features for each specific detection target. Because they depend heavily on templates, segmentation thresholds, color information, texture, and similar features, the resulting models are not robust.

In recent years, deep learning-based target detection has become an important research method in machine vision and has been widely used in fields such as autonomous driving, image classification, face detection, visual search, target tracking and detection, and medical diagnosis. Deep learning here refers to deep neural network structures with multiple convolutional layers. By learning the features of the input data, low-level features are abstracted into higher-level semantic features, and attribute categories or features of the data are expressed through vectors, feature maps, etc. [12], which improves the performance of the algorithm. Since deep learning can acquire powerful learning and feature extraction abilities from large amounts of data, many researchers have explored its application to product defect detection in order to improve product quality [13]. Liang et al. [14] proposed a plastic detection model that addresses sample imbalance and small target detection problems. By using data augmentation to obtain a large dataset and ShuffleNetV2 as the backbone network, the model achieved noteworthy results. Li et al. [15] cascaded two networks, taking YOLOv2 as the foreground detector and an improved ResNet101 as the classification network, which improved detection accuracy. However, the detection efficiency was low, at 15 s per detection. Li et al. [16] designed a fast detection network for PCB components based on YOLOv3.

In summary, deep learning-based target detection has been successfully applied in various fields. Compared with traditional vision detection methods, deep learning does not require manually designed features, and the resulting models are robust, with high detection accuracy and good generalization ability. The YOLO series is a classic family of deep learning target detection algorithms. Detection networks based on the YOLO series drop the candidate frame extraction branch and directly implement feature extraction, candidate frame classification, and regression in the same branchless deep convolutional network, which simplifies the network structure and speeds up detection. The YOLO series algorithms can meet the requirement of real-time detection on the factory assembly line while also achieving high detection accuracy.

PCB component soldering is a production technology requiring high precision and strict standards. During soldering, various factors such as process, materials, design, and environment can all affect the quality of solder joints. Typically, problems with solder joint quality can be identified from the appearance of the joints. We study five major types of solder joint defects that seriously impact circuit performance: (a) Less tin, referring to insufficient solder, which may lead to unstable welding and open circuits and is often caused by excessively large solder pads or oxidation; (b) Tip, caused by excessive welding time or an improper withdrawal direction of the welding head, resulting in a tip that can easily cause short circuits; (c) Hole, where the solder joint fails to completely wrap around the pins and solder pads due to oil stains or excessively high welding temperatures; (d) Even tin, which occurs due to overly close solder joints, excessive solder, or offset component positions, is commonly seen in densely designed PCBs, and leads to short circuits; (e) Leakage welding, primarily occurring in manual welding, where some solder joints are not welded due to visual fatigue, severely affecting electrical performance. Figure 1 illustrates these five types of defects, and Table 1 summarizes their visual characteristics.

Figure 1 Defects of PCB soldering.

Table 1 The visual characteristics of PCB soldering defects.

The main difficulties in detecting defects in PCB plug-in solder joints are: (1) real-time operation: in the production process, PCBs move rapidly on the assembly line, so detection must be fast enough; (2) scale and density: solder joints appear in images as multi-scale, densely packed small targets. If existing models are applied directly to production, dense and small-scale solder joints are prone to being missed or mis-detected, and the detection accuracy cannot meet requirements. Therefore, it is particularly important to explore faster and more accurate detection methods for PCB plug-in solder joint defects. Using the YOLOv3 network as the basic framework, this paper proposes a PCB plug-in solder joint defect detection method based on spatial convolutional pooling and information fusion. The main contributions of this paper are as follows:

  1. To address the problem that the FPN in YOLOv3 fuses features by direct up-sampling without considering the differences between contextual higher-level semantic features, a pyramid structure based on attention-guided fusion of contextual feature information is proposed. Three dilated convolutions with different dilation rates (6, 12 and 18) are used together with a 1 × 1 convolution to explore more semantic information in the upper layers, and the fused feature channels are then calibrated using a coordinated attention network to reduce the impact of redundant fused parameters on the model.

  2. The ASPP (Atrous Spatial Pyramid Pooling) structure is introduced into the original Darknet53 network. ASPP captures multi-scale feature information of the detection target, which enhances the feature extraction capability of the backbone network.

The remainder of this article is organized as follows. The second section introduces related work, including the network structure and loss function of the YOLOv3 detection network. The third section describes the improvements to the network in detail. The fourth section presents the experimental validation of the improved network. The last section concludes the article.

Related work

YOLOv3 network

The YOLO series of algorithms are deep learning algorithms for multi-object detection, first proposed by Redmon et al. in 2016 [17]. YOLOv2 and YOLOv3 were released in the following two years. YOLO is a typical one-stage target detection algorithm, which combines classification and target regression with anchors to achieve efficient and flexible detection [18]. In addition, the feature extraction network of YOLO can be replaced by many other networks, which has made it increasingly popular among researchers in engineering fields. In this section, the network structure of YOLOv3 is described.

As shown in Fig. 2, the YOLOv3 network structure is mainly divided into two parts: the backbone network (Darknet53) for feature extraction, and the detection network (FPN + Head) for classification and recognition. The backbone network of YOLOv3 consists of five convolutional stages at different scales, each built from residual blocks, with a different number of blocks per stage. From input to output, the number of residual blocks at each scale is 1, 2, 8, 8, and 4, respectively.
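
The residual counts above can be checked against the name Darknet53 with a little arithmetic. The sketch below assumes the standard Darknet-53 layout, which the text does not spell out: one initial convolution, one stride-2 convolution opening each stage, two convolutions per residual block, and one final fully connected layer. The function names are illustrative only.

```python
# Residual blocks per stage, taken from the text above.
RESIDUALS_PER_STAGE = [1, 2, 8, 8, 4]


def darknet53_layer_count(residuals_per_stage):
    """Count weighted layers, assuming the standard Darknet-53 layout."""
    stem_conv = 1                                  # initial 3x3 convolution
    downsample_convs = len(residuals_per_stage)    # one stride-2 conv opens each stage
    residual_convs = 2 * sum(residuals_per_stage)  # each block = 1x1 conv + 3x3 conv
    classifier = 1                                 # final fully connected layer
    return stem_conv + downsample_convs + residual_convs + classifier


def stage_strides(residuals_per_stage):
    """Cumulative down-sampling factor after each stage (each stage halves the size)."""
    return [2 ** (k + 1) for k in range(len(residuals_per_stage))]


print(darknet53_layer_count(RESIDUALS_PER_STAGE))  # 53
print(stage_strides(RESIDUALS_PER_STAGE))          # [2, 4, 8, 16, 32]
```

Under these assumptions the last three stages sit at 1/8, 1/16, and 1/32 of the input resolution, which are the three feature layers passed on to the detection network.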

Figure 2 Schematic diagram of YOLOv3 network structure.

YOLOv3 predicts objects at multiple scales, with a network structure similar to an image pyramid [19]. The last three feature layers of the backbone network, which contain high-level semantic feature information, are used. To obtain more features of small objects and thereby improve detection performance on them, YOLOv3 fuses the features at 1/32, 1/16, and 1/8 of the input resolution in a top-down order. After fusion, the predicted targets are output at three scales: 13 × 13, 26 × 26, and 52 × 52. Each scale predicts the width and height of the target, the coordinates of its center, and its category.
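
As a rough illustration of where the three output scales come from, the following sketch derives the grid sizes from the down-sampling factors. It assumes a 416 × 416 input (inferred from the 13 × 13 grid, not stated in the text), three anchors per scale as in the standard YOLOv3 configuration, and five classes for the five defect types; `yolo_head_shapes` is a hypothetical helper name.

```python
def yolo_head_shapes(input_size=416, num_classes=5, anchors_per_scale=3,
                     strides=(32, 16, 8)):
    """Return (grid, grid, channels) for each detection scale."""
    shapes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError("input_size must be divisible by every stride")
        grid = input_size // stride
        # Each anchor predicts 4 box values, 1 confidence score, and the class scores.
        channels = anchors_per_scale * (4 + 1 + num_classes)
        shapes.append((grid, grid, channels))
    return shapes


print(yolo_head_shapes())  # [(13, 13, 30), (26, 26, 30), (52, 52, 30)]
```

The finest grid (52 × 52) has sixteen times as many cells as the coarsest one, which is why it carries most of the responsibility for small solder joints.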

The difference between the predicted value and the true value of the model is defined as the training error. During training, error back-propagation is used to continuously adjust the weights of the model until the training error reaches a minimum, at which point training stops. The performance of the model is determined by the degree of convergence of the loss function.

$$\begin{aligned} Loss & = \lambda_{\text{coord}} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} {\text{I}}_{ij}^{\text{obj}} \left[ \left( x_{i} - \hat{x}_{i} \right)^{2} + \left( y_{i} - \hat{y}_{i} \right)^{2} \right] \\ & \quad + \lambda_{\text{coord}} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} {\text{I}}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_{i}} - \sqrt{\hat{w}_{i}} \right)^{2} + \left( \sqrt{h_{i}} - \sqrt{\hat{h}_{i}} \right)^{2} \right] \\ & \quad + \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} {\text{I}}_{ij}^{\text{obj}} \left( C_{i} - \hat{C}_{i} \right)^{2} + \lambda_{\text{noobj}} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} {\text{I}}_{ij}^{\text{noobj}} \left( C_{i} - \hat{C}_{i} \right)^{2} \\ & \quad + \sum_{i = 0}^{S^{2}} {\text{I}}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_{i}\left( c \right) - \hat{p}_{i}\left( c \right) \right)^{2} \end{aligned}$$
(1)

Equation (1) is the YOLOv3 loss function, consisting of four parts: coordinate loss, width and height loss of the prediction box, confidence loss, and classification loss, where the coordinate loss and the width and height loss are collectively referred to as the border loss. \({\lambda }_{\text{coord}}\) and \({\lambda }_{\text{noobj}}\) denote the weights of the border loss and of the confidence loss for boxes that contain no target, respectively. In the loss function, the coordinate position contributes the most, so \({\lambda }_{\text{coord}}\) is 5, and the confidence of boxes without a target contributes the least, so \({\lambda }_{\text{noobj}}\) is 0.5; \({\text{I}}_{ij}^{\text{obj}}\) indicates whether the \(j\) th box in the \(i\) th grid cell is responsible for the current target and takes the value 1 or 0; \({\text{I}}_{ij}^{\text{noobj}}\) is the opposite of \({\text{I}}_{ij}^{\text{obj}}\); \({x}_{i}\), \({y}_{i}\), \({w}_{i}\), \({h}_{i}\) denote the predicted coordinate offset values; \({\widehat{x}}_{i}\), \({\widehat{y}}_{i}\), \({\widehat{w}}_{i}\), \({\widehat{h}}_{i}\) denote the actual coordinate offset values; \({C}_{i}\) denotes the predicted confidence score; \({\widehat{C}}_{i}\) denotes the true confidence score; \({p}_{i}\) denotes the predicted category probability; \({\widehat{p}}_{i}\) denotes the true category.
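
To make the four parts of Equation (1) concrete, here is a minimal pure-Python sketch that evaluates them over a flat list of boxes. It is a simplification rather than the training code used in this paper: it assumes one box per grid cell, so the box-level and cell-level indicators coincide, and, following the original YOLO formulation, it counts the width and height term only for boxes that are responsible for a target. The dictionary keys and the function name `yolo_loss` are illustrative.

```python
import math

LAMBDA_COORD = 5.0   # weight of the border loss, as given in the text
LAMBDA_NOOBJ = 0.5   # weight of the confidence loss for boxes without a target


def squared_error(predicted, actual):
    return (predicted - actual) ** 2


def yolo_loss(predictions, targets):
    """Sum-squared loss in the shape of Equation (1), one box per grid cell.

    predictions: dicts with keys x, y, w, h, conf, probs.
    targets: same keys plus "obj" (1 if the box is responsible for a target).
    Boxes with obj == 0 only need the "conf" key.
    """
    total = 0.0
    for pred, true in zip(predictions, targets):
        if true["obj"]:
            center = (squared_error(pred["x"], true["x"])
                      + squared_error(pred["y"], true["y"]))
            size = (squared_error(math.sqrt(pred["w"]), math.sqrt(true["w"]))
                    + squared_error(math.sqrt(pred["h"]), math.sqrt(true["h"])))
            confidence = squared_error(pred["conf"], true["conf"])
            classes = sum(squared_error(p, t)
                          for p, t in zip(pred["probs"], true["probs"]))
            total += LAMBDA_COORD * (center + size) + confidence + classes
        else:
            # Boxes without a target contribute only a down-weighted confidence term.
            total += LAMBDA_NOOBJ * squared_error(pred["conf"], true["conf"])
    return total


predictions = [
    {"x": 0.5, "y": 0.5, "w": 0.25, "h": 0.25, "conf": 0.8, "probs": [0.9, 0.1]},
    {"conf": 0.2},
]
targets = [
    {"x": 0.5, "y": 0.5, "w": 0.25, "h": 0.25, "conf": 1.0, "probs": [1.0, 0.0], "obj": 1},
    {"conf": 0.0, "obj": 0},
]
print(round(yolo_loss(predictions, targets), 4))  # 0.08
```

In this toy case the box is perfectly localized, so the whole loss comes from the confidence and class terms; the empty box adds only 0.5 × 0.04.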

Dilated convolution

For feature information at different scales, detection performance is affected by the receptive field of the network, and receptive fields of different sizes help to distinguish complex background information, so dilated convolution is used in this paper to enlarge the receptive field. Dilated convolution [20] was first proposed for image segmentation and is able to increase the receptive field of a convolutional layer without additional computational cost. In the computation, zeros are inserted between the elements of the convolution kernel. Taking a 7 × 7 dilated convolution as an example, 4 zeros are added to the 3 × 3 convolution kernel to expand the receptive field, as shown in Fig. 3.

Figure 3 Dilated convolution receptive field.

From Fig. 3, it can be seen that the 7 × 7 dilated convolution has the same number of trainable parameters as the 3 × 3 convolution. Since dilated convolution can enlarge the receptive field and obtain more feature information without increasing the computational cost, this paper combines dilated convolutions with several different dilation rates in the FPN of YOLOv3 [21] to explore more feature information in the upper layers of the network. This reduces the inter-layer differences between semantic information at different levels of the FPN and supports PCB plug-in solder joint defect detection.
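
The relation between kernel size, dilation rate, and receptive field can be sketched in a few lines. The text does not state the dilation rate of the 7 × 7 example; a rate of 3 is assumed here because it is the value that stretches a 3 × 3 kernel to 7 × 7, which places two zeros between neighbouring weights and therefore four zeros along each row. The helper names are illustrative.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilate_kernel_1d(weights, dilation):
    """Insert (dilation - 1) zeros between neighbouring weights of one kernel row."""
    dilated = []
    for index, weight in enumerate(weights):
        dilated.append(weight)
        if index < len(weights) - 1:
            dilated.extend([0] * (dilation - 1))
    return dilated


row = dilate_kernel_1d([1, 2, 3], dilation=3)
print(row)                                  # [1, 0, 0, 2, 0, 0, 3]
print(effective_kernel_size(3, 3))          # 7
print(sum(1 for w in row if w != 0))        # 3
```

The number of non-zero weights per row stays at three, which is why the trainable parameter count matches the ordinary 3 × 3 convolution while the covered span grows to seven.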

Improved YOLOv3 detection algorithm

PCB plug-in solder joints vary in size. When it comes to detecting smaller solder joints, as the number of convolutional layers increases, the distinctiveness of features pertaining to these joints diminishes, resulting in a reduced feature size. Consequently, there is a heightened risk of missing such small solder joints during detection. The traditional YOLOv3 target detection algorithm attempts to enhance the detection of small targets by employing up-sampling and channel concatenation. This approach fuses the last three high-level semantic feature layers using simple top-down convolutions and pooling, thereby combining semantic information from different layers. However, it fails to account for the inherent differences between high-level and low-level features. Merely up-sampling and reducing feature channels independently for features at various levels, without considering the significant semantic disparities among them, diminishes the algorithm’s ability to represent features across multiple scales.

Inspired by [22], this paper proposes a novel pyramid structure for feature fusion, taking into account that contextual feature information can enhance the model’s ability to distinguish between normal and abnormal patterns, ultimately improving the accuracy of defect recognition. This pyramid structure facilitates attention-guided fusion of contextual feature information. Moreover, we introduce a CA (coordinated attention) network within the pyramid to eliminate redundant parameters after feature fusion, thus enhancing the precision of solder joint detection. Additionally, integrating multi-scale contextual information enables the detection network to extract features across different scales and perspectives, thereby strengthening the model’s robustness and ensuring stability across various PCB types and production conditions. Consequently, we incorporate the ASPP (Atrous Spatial Pyramid Pooling) structure into the backbone network to capture richer contextual feature information. Figure 4 illustrates the detection framework of our proposed method.

Figure 4 Schematic diagram of the improved network structure.

Contextual information fusion pyramid structure

To detect defects in small solder joints, a pyramid structure is proposed based on attention-guided contextual feature information fusion [23]. As shown in Fig. 5, we first utilize three convolution blocks with different dilation rates, along with a standard 1 × 1 convolution, to capture feature information from varying receptive fields. After obtaining the feature information from four distinct receptive fields, we apply a 1 × 1 convolution block to reduce the dimension of the fused features. This approach ensures that there is minimal variability between the up-sampled feature map and the upper-level feature map.
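
For the four branches to be fused, their outputs must share one spatial size. The sketch below checks this with the standard convolution output-size formula. It rests on assumptions that the text does not state: the dilated branches use 3 × 3 kernels with stride 1, the dilation rates are the 6, 12, and 18 named in the contributions, and padding is set equal to the dilation rate. The function names and the 13 × 13 example size are illustrative.

```python
def conv_output_size(size, kernel_size, dilation=1, padding=0, stride=1):
    """Standard convolution output-size formula for one spatial dimension."""
    span = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - span) // stride + 1


def fusion_branch_sizes(size, rates=(6, 12, 18)):
    """Output size of the 1x1 branch and of each assumed 3x3 dilated branch."""
    sizes = {"1x1": conv_output_size(size, kernel_size=1)}
    for rate in rates:
        # padding == dilation keeps a 3x3 dilated convolution size-preserving
        sizes[f"3x3, rate {rate}"] = conv_output_size(
            size, kernel_size=3, dilation=rate, padding=rate)
    return sizes


def fused_channels(channels_per_branch, num_branches=4):
    """Channel count after concatenating the branches, before the 1x1 reduction."""
    return channels_per_branch * num_branches


print(fusion_branch_sizes(13))
print(fused_channels(256))  # 1024
```

With these settings every branch returns a 13 × 13 map, so the outputs can be concatenated along the channel axis; the closing 1 × 1 convolution then brings the fourfold channel count back down.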

Figure 5 Contextual information fusion pyramid structure.

When fusing multi-dimensional features, indiscriminately combining the feature information from multiple dilated convolutions can lead to redundant feature parameters. Excessive irrelevant feature information can degrade the detection performance of the algorithm. Therefore, we use a CA network to calibrate the dimension-reduced semantic feature information after fusion. This allows the network to emphasize important features and increases the accuracy of solder joint detection. The fundamental purpose of an attention mechanism is to select the most crucial information from vast amounts of data and focus the network’s attention on these selected details [24].

The CA structure diagram is shown in Fig. 6. The high-level feature information \({f}^{h}\in {\mathbb{R}}^{W\times H\times C}\) produced by upper-level fusion is expanded as \({f}^{h}=[{f}_{1}^{h},{f}_{2}^{h},{f}_{3}^{h},\dots ,{f}_{C}^{h}]\), where \({f}_{i}^{h}\in {\mathbb{R}}^{W\times H}\) is the \(i\) th feature map and \(C\) denotes the total number of feature channels. First, a global pooling operation is applied to each feature map to obtain two one-dimensional feature vectors. Specifically, given the inputs X and Y, the input features in the vertical and horizontal directions are first aggregated into two independent direction-aware feature maps using pooling kernels of size (H, 1) and (1, W), respectively. Then, these two feature maps, embedded with direction-specific information, are encoded into two attention maps, each capturing the dependencies of the input feature maps along one spatial direction. That is, the two feature maps generated by the previous step are first concatenated and then transformed by a shared 1 × 1 convolution to generate an intermediate feature map carrying spatial information in the horizontal and vertical directions. Finally, the intermediate feature map is split, convolved, and normalized to generate the attention weights.
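
The directional pooling step can be illustrated on a single channel with plain Python lists. This is only a sketch of the data flow: the learned 1 × 1 convolutions are omitted, a sigmoid stands in for the normalization (an assumption, since the text does not name the function), and the reweighting by the product of the row and column weights is likewise assumed. All function names are hypothetical.

```python
import math


def directional_average_pool(feature_map):
    """Pool one H x W channel along each spatial direction.

    Returns (per_row, per_column): per_row holds H values (average over the
    width, kernel (1, W)); per_column holds W values (average over the
    height, kernel (H, 1)).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    per_row = [sum(row) / width for row in feature_map]
    per_column = [sum(feature_map[r][c] for r in range(height)) / height
                  for c in range(width)]
    return per_row, per_column


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def reweight(feature_map, per_row, per_column):
    """Scale each position by its row weight and its column weight."""
    row_weights = [sigmoid(v) for v in per_row]
    column_weights = [sigmoid(v) for v in per_column]
    return [[feature_map[r][c] * row_weights[r] * column_weights[c]
             for c in range(len(column_weights))]
            for r in range(len(row_weights))]


channel = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]            # H = 2, W = 3
rows, columns = directional_average_pool(channel)
print(rows)     # [2.0, 5.0]
print(columns)  # [2.5, 3.5, 4.5]
calibrated = reweight(channel, rows, columns)
```

Because each position is scaled by one weight for its row and one for its column, the attention keeps positional information along both axes instead of collapsing the map to a single value per channel.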

Figure 6 Structure diagram of the coordinated attention network.

Spatial convolution-based pooling backbone feature extraction network

By integrating the ASPP (Atrous Spatial Pyramid Pooling) module into the network backbone, the model aims to enhance the integration of contextual information and improve the ability to extract multi-scale features, thereby effectively detecting various defects across different scales and complex backgrounds. ASPP achieves this through parallel application of dilated convolutions with varying dilation rates and global average pooling, which simultaneously maintains high resolution while enlarging the receptive field. This addresses the imbalance issue between receptive field and resolution in traditional convolutional networks. Without notably increasing computational complexity, the ASPP module enhances the robustness of the network’s feature extraction and defect detection capabilities. As illustrated in Fig. 7, the ASPP network architecture comprises multiple branches, each with distinct convolution kernels for independent feature extraction. The input feature maps are identical for each branch, ensuring that the module can capture feature information of PCB plug-in solder joints across various scales within the same feature map.

Figure 7 The ASPP network structure diagram.

Experimental results and discussion

PCB data set

To our knowledge, there is currently no readily available open PCB dataset specifically designed for solder joint defect detection. We therefore produced a PCB image dataset containing less tin, tip, hole, even tin, and leakage welding defects. The original defect images were captured using a Hikvision industrial matrix camera (MV-CE200-11UC) on a commercial PCB production line, at a resolution of 2448 × 2048 pixels. The dataset comprises over 300 defect images, each containing multiple defect areas, and Table 2 details the number of instances of each defect. Considering the high labor cost of collecting and annotating the dataset, we performed data augmentation, including cropping, rotation, and blurring [25], on the original PCB defect images, which expands the number of samples and enhances the diversity of the dataset.

Table 2 Amount of each defect.

Evaluation metrics

To evaluate the model accurately, several of the most common target detection metrics are used in this paper: recall, precision, and F1. The formulas are as follows:

$$R=\frac{TP}{TP+FN}$$
(2)
$$P=\frac{TP}{TP+FP}$$
(3)
$$F1=2\times \frac{P\times R}{P+R}$$
(4)

R is recall, P is precision, and F1 is the harmonic mean of precision and recall. Higher R, P, and F1 indicate better model performance. TP denotes True Positive, FP denotes False Positive, FN denotes False Negative, and TN denotes True Negative.
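
Equations (2) to (4) translate directly into code. The sketch below uses made-up counts purely for illustration; they are not taken from the experiments in this paper, and the zero-division guards are a convenience added here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts (Equations 2 to 4)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed defects.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```

TN does not appear in any of the three formulas, which is usual in object detection because the number of correctly ignored background regions is not well defined.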

Precision and recall alone do not give an intuitive picture of model performance, because recall may be low when precision is high, and recall is often high when precision is low. Therefore, the average precision (AP) is computed for each category, as given in (5).

$$AP={\int }_{0}^{1}P\left(R\right)dR$$
(5)

In the case of multi-category detection, mAP indicates the average AP of all categories. A higher mAP indicates better performance of the model.

$$mAP=\frac{\sum_{i=1}^{n}A{P}_{i}}{n}$$
(6)

where n denotes the number of categories of the detection target.
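
Equation (5) is an integral over the precision-recall curve, which in practice is approximated from a finite set of points. The sketch below uses one common convention, the monotone precision envelope followed by a sum of rectangles; the text does not say which approximation was used in the experiments, so this choice is an assumption. The sample points are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve with a monotone precision envelope.

    `recalls` must be sorted in increasing order and paired with `precisions`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over every step in recall.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_category):
    """Equation (6): arithmetic mean of the per-category AP values."""
    return sum(ap_per_category) / len(ap_per_category)


# Hypothetical curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
print(mean_average_precision([0.5, 1.0]))         # 0.75
```

A detector that keeps precision at 1.0 all the way to full recall obtains an AP of 1.0 under this convention, which is the upper bound of Equation (5).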

Experimental environment

Our model is implemented in Python using the PyTorch framework. The computer runs Windows 10 and is configured with an Intel(R) Core(TM) i5-9400F CPU @ 2.90 GHz and an NVIDIA 1080Ti GPU with 11 GB of memory.

Ablation study

To validate the effectiveness of each improved part of the algorithm, ablation experiments are conducted. In each experiment, we apply only a single improvement to the original YOLOv3 network and observe the change in performance to verify its impact. The experimental networks are: original YOLOv3, YOLOv3 with the proposed FPN (contextual information fusion pyramid structure), and YOLOv3 with ASPP. The parameters of all three models are kept consistent with the original model to ensure fairness and comparability of the experimental results, and all networks are trained without pre-trained weights. The results are shown in Tables 3, 4 and 5. The indicators where the improved algorithm outperforms the original algorithm are highlighted in bold.

Table 3 Evaluation metrics comparison between YOLOv3 and YOLOv3 with Proposed FPN.
Table 4 Evaluation metrics comparison between YOLOv3 and YOLOv3 with ASPP.
Table 5 mAP of original YOLOv3, with Proposed FPN, and with ASPP.

Tables 3 and 4 show the R, P, AP, and F1 of the three networks being compared. In terms of these metrics, both the proposed FPN and ASPP improve on the original YOLOv3 in most cases across the five types of defects.

Table 5 shows the mAP of the three networks. YOLOv3 with the proposed pyramid structure based on contextual feature information fusion improves the mAP from 84.35% to 87.04%, and YOLOv3 with the ASPP structure reaches an mAP of 86.62%.

Therefore, the results of the ablation study indicate that the two improvements are effective and can each independently improve the original YOLOv3 model.

Experimental comparison of the improved algorithm with other algorithms

The improved algorithm is compared with the classical two-stage detector Faster-RCNN [26] and one-stage detector SSD [27], as well as the latest state-of-the-art algorithms YOLOv4 [28] and YOLOv5-L [29]. For YOLOv4 and YOLOv5-L, the hyperparameters are consistent with their original papers. Faster-RCNN employs ResNet50 as its backbone feature extraction network, while its other components remain unchanged. SSD is consistent with the original network.

Prior to network training, both Faster-RCNN and SSD use weights pre-trained on the COCO [30] dataset, which are then loaded when training on the PCB dataset. To minimize the impact of experimental environments and parameters, the batch size of all models is set to 8 when loading the pre-trained weights and to 4 when loading the PCB dataset. The learning rate of every network is set to 0.0001.

Table 6 presents the AP of the five detection methods on the five types of PCB solder joint defects. Overall, the improved detection network presented in this paper significantly outperforms Faster-RCNN and SSD in terms of AP. Although it is slightly inferior to SSD on the less tin defect, the two remain almost comparable. Table 7 compares the mAP and FPS of our improved method with those of the other algorithms. The mAP of the improved model reaches 96.43%, surpassing SSD’s 95.18% and Faster-RCNN’s 92.06%. In Tables 6 and 7, the numbers in bold indicate the best method for each defect.

Table 6 Comparison of AP among different methods (%).
Table 7 Comparison of mAP and FPS among different methods.

In terms of mAP, our results exhibit only a slight edge over YOLOv4 and YOLOv5-L. However, considering that our work is based on the relatively older YOLOv3, this is acceptable. We have elevated the performance of YOLOv3 to the same level as the latest state-of-the-art algorithms, which validates the effectiveness of our work.

Regarding speed, FPS of the proposed model stands at 40.8, which is lower than SSD, YOLOv4, and YOLOv5-L. While this speed is sufficient to meet most industrial real-time detection requirements, it remains an area worthy of further exploration in future work.

Figure 8 shows detection samples from the 5 detection methods on 3 PCB images (A, B, and C). The yellow arrows pointing to solder joints indicate detection errors. The total number of errors across all 3 samples for each detection method is 7, 12, 2, 4, and 2. Although test results based on a small number of samples cannot fully represent the performance of the algorithms on the entire dataset, they are generally consistent with our validation results.

Figure 8 Samples of detection.

Conclusion

Existing PCB solder joint defect detection algorithms struggle to satisfy the concurrent demands of high accuracy, low false alarm rate, and high speed. To address this challenge, this paper proposes a novel PCB solder joint defect detection method, leveraging the speed of the YOLOv3 algorithm and integrating spatial pyramid pooling with information fusion techniques. Firstly, to mitigate the limitation of the original YOLOv3 network, which only extracts single-scale feature information from the same convolution block, ASPP (Atrous Spatial Pyramid Pooling) is introduced into the Darknet53 backbone feature extraction network to capture multi-scale feature information of the detection target. Secondly, while the original YOLOv3 fuses three high-level semantic feature layers through up-sampling and channel concatenation, the differences between high- and low-level features are ignored. To address this, an attention-guided pyramid structure for contextual information fusion is proposed, employing multiple dilated convolutions with different dilation rates to enrich high-level semantic information. Lastly, a coordinated attention network structure is implemented to refine the fused pyramid feature information, emphasizing relevant fused feature channels and minimizing redundant network parameters. Compared to the original YOLOv3 network, which achieved a detection accuracy of 94.45%, the proposed network achieves an average detection accuracy of 96.43%, surpassing the classical detection algorithms Faster-RCNN and SSD and reaching a level comparable to the latest state-of-the-art algorithms YOLOv4 and YOLOv5-L. Extensive experiments demonstrate that the enhanced YOLOv3 algorithm offers higher accuracy in detecting PCB solder joint defects.

Based on the experimental results, the improved algorithm rivals the performance of some of the latest detection models. In future work, we therefore plan to apply our approach to these newer state-of-the-art networks, seeking to further enhance detection accuracy and speed.

Although the dataset is sufficient to obtain expected experimental results through data augmentation, it lacks feature diversity, making it difficult to better demonstrate generalization. Therefore, in future work, we will gradually address this issue.

Furthermore, while the proposed algorithm can detect defects of PCB solder joints efficiently and precisely, it requires significant computational resources. In industrial production, smaller equipment sizes and reduced computational demands are favorable for enterprises to minimize production costs. Therefore, our future research will focus on exploring how to detect PCB defects within the constraints of limited hardware resources.