Article

Diagnostic Model for Transformer Core Loosening Faults Based on the Gram Angle Field and Multi-Head Attention Mechanism

State Key Laboratory of Electrical Insulation and Power Equipment, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(23), 10906; https://doi.org/10.3390/app142310906
Submission received: 7 October 2024 / Revised: 13 November 2024 / Accepted: 22 November 2024 / Published: 25 November 2024
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

Aiming to address the problems of difficulty in selecting characteristic quantities and the reliance on manual experience in the diagnosis of transformer core loosening faults, a diagnosis method for transformer core looseness based on the Gram angle field (GAF), residual network (ResNet), and multi-head attention mechanism (MA) is proposed. This method automatically learns effective fault features directly from GAF images without the need for manual feature extraction. Firstly, the vibration signal is denoised using ensemble empirical mode decomposition (EEMD), and the one-dimensional temporal signal is converted into a two-dimensional image using Gram angle field to generate an image dataset. Subsequently, the image set is input into ResNet to train the model, and the output of ResNet is weighted and summed using a multi-head attention module to obtain the deep feature representation of the image signal. Finally, the classification probabilities of different iron-core loosening states of the transformer are output through fully connected layers and Softmax layers. The experimental results show that the diagnostic model proposed in this paper has an accuracy of 99.52% in identifying loose iron cores in transformers, and can effectively identify loose iron cores in different positions. It is suitable for the identification and diagnosis of loose iron cores in transformers. Compared with traditional methods, this method has better fault classification performance and noise resistance.

1. Introduction

In the power system, transformers are critical devices that convert electrical energy from one voltage level to another, playing an indispensable role in the transmission and distribution of electrical energy. The normal operation of transformers is crucial for ensuring the stable operation of the power grid and the quality of power supply [1,2,3]. However, due to the long-term operation of transformers in harsh environments, such as high temperature and high voltage, as well as the influence of load changes, electrical faults, and other factors, transformers may have various potential fault hazards. Unpredictable accidents can cause serious social impacts. Therefore, strengthening transformer condition monitoring and fault diagnosis is of great significance.
The common faults of transformers mainly include mechanical faults, insulation faults, and overheating faults. Among them, electrical faults caused by mechanical faults occur most frequently, with iron-core faults accounting for a large proportion of these [4]. According to statistical research, iron-core faults are mostly caused by mechanical structural problems in the initial stage, and as the degree of the fault deepens, they pose a threat of damage to the clamping force and insulation of the iron core, and even lead to serious electrical faults, such as the multi-point grounding of the iron core [5]. How to detect the operating status of transformer iron cores and diagnose faults, as well as detect internal faults and potential safety hazards in transformers early to avoid accidents, has gradually become a focus of research for scholars at home and abroad.
The main methods for detecting the state of transformer iron cores include offline detection and online detection [6]. Offline detection involves performing maintenance after the transformer has stopped, such as testing insulation resistance. However, power outage maintenance can cause significant economic losses and result in high costs. Online detection involves detecting the operating status of the transformer through real-time monitoring, without the need to shut down the transformer. At present, the monitoring methods for transformer core faults mainly include gas chromatography analysis and measuring the presence or absence of a current in the grounding wire. These methods focus more on judging the core faults of large oil-immersed transformers, but the overall structure of dry-type transformers is significantly different from that of oil-immersed transformers, and many judgment methods are not fully applicable. Compared with many traditional methods, transformer state monitoring based on vibration signals has no electrical connection to the electrical system as a whole and will not have any impact on the normal operation of the power system. It also has strong anti-interference ability and high sensitivity, and can quickly, safely, and reliably monitor the operating status of transformers, diagnose faults, and better ensure the safe and reliable operation of transformers [7,8].
At present, vibration-signal diagnosis often uses machine learning methods such as support vector machines and K-means clustering for fault identification and classification. For example, Wang et al. [9] used the wavelet packet transform to analyze the vibration signals of transformer windings in different states; the energy spectrum entropy of the vibration signal was used as the feature vector, and an improved multi-class support vector machine was used to train and test the feature vectors, achieving classification diagnosis of transformer windings in different states. Reference [10] proposed two effective feature extraction techniques and used the corresponding feature vectors to train SVM classifiers, demonstrating the effectiveness of the method through experiments. Xue et al. [11] used a variational mode decomposition algorithm to obtain vibration fault signal components and calculated single-component multi-scale entropy to form feature vectors, further optimizing the features for fault identification. Zhang et al. [12] defined the vibration characteristics of high-voltage circuit breakers based on ensemble empirical mode decomposition and diagnosed mechanical faults of high-voltage circuit breakers using support vector machines. The above methods have achieved satisfactory results in fault diagnosis, feature extraction, and state prediction. However, such methods often rely on manually extracted features and expert experience, are typically shallow learning approaches, and lack the ability to process large-scale data.
In order to overcome the influence of manual feature extraction on the results, deep learning algorithms have begun to play a role in transformer fault diagnosis, among which convolutional neural networks (CNNs) have received increasing attention for their adaptive feature extraction ability. Wen et al. [13] proposed a novel convolutional neural network based on LeNet-5 for fault diagnosis; this method converts signals into two-dimensional images and extracts features from them, eliminating the influence of manual feature extraction. Shi et al. [14] first convert the fault vibration signal into a two-dimensional image through recursive encoding and then use a deep separation convolutional neural network (DSD-CNN) to extract features from these images, ultimately achieving accurate fault diagnosis. Reference [15] proposes a deep CNN transformer model that can automatically detect the type, phase, and location of faults in power lines. The deep residual network (ResNet), an improved convolutional neural network, can effectively mitigate the gradient explosion and network degradation problems caused by increasing the number of network layers, and is also widely used in transformer fault diagnosis. Li et al. [16] decompose the vibration signals of a contactor into intrinsic modal components through variational mode decomposition, convert them into modal time–frequency diagrams, and input them into ResNet50 for fault diagnosis. Lan et al. [17] convert one-dimensional perturbation signals into two-dimensional trajectory circles with distinct shape features and input them into ResNet for feature learning and classification. Yan et al. [18] use Markov transition fields to convert one-dimensional signals into two-dimensional images and establish a deep residual network for feature extraction. Li et al. [19] proposed a multi-bolt loosening identification method based on time–frequency graphs and ResNet50, and verified experimentally that the method identifies fault types with reasonable accuracy, computational efficiency, and robustness.
Although deep learning methods have achieved significant results in feature extraction and pattern recognition, existing deep learning methods still face some challenges in the field of transformer fault diagnosis. Especially in terms of feature extraction, traditional methods often struggle to capture weak features and complex changes in fault signals. In addition, the diagnostic performance of existing methods in noisy environments is not satisfactory, and they are prone to interference and misdiagnosis. Therefore, how to develop an accurate, efficient, and noise-robust transformer fault diagnosis method has become a hot and difficult research topic.
In order to overcome the shortcomings of existing deep learning methods in transformer fault diagnosis, this paper proposes a transformer fault diagnosis method based on the residual network, a multi-head attention mechanism, and ensemble empirical mode decomposition (EEMD). This method first uses EEMD to denoise the original fault signal and extract purer signal components. Then, based on the residual network, we introduce a multi-head attention mechanism to enhance the model’s ability to capture key features. By combining these two techniques, we aim to improve the model’s ability to recognize complex and variable fault signals, thereby further enhancing the accuracy and robustness of diagnosis. The specific contributions and novelty of this article are as follows.
(1)
This paper uses EEMD to denoise vibration signals, resulting in vibration signals with higher signal-to-noise ratios.
(2)
Adding a multi-head attention mechanism to the residual network can further enhance the feature representation ability of the model, enabling it to learn deeper features and improve its accuracy.
(3)
By building a transformer iron-core loosening fault test platform, vibration data were collected at different positions of iron-core loosening, and a transformer fault dataset was established.

2. Basic Principles of Diagnostic Models

2.1. Gramian Angle Field

The Gram angle field is a mapping method that converts a one-dimensional waveform into a two-dimensional image. After the Gram angle field conversion, the image not only contains the dynamic information of the original signal in the discrete time domain, but also preserves the dependence of the original signal on the time sequence [20,21]. GAF first converts the time steps and amplitudes of the sequence into radii and angles through a polar projection. Then, the correlation between each pair of points in polar coordinates is measured using trigonometric functions, and a Gram matrix is formed through the trigonometric transformation.
For a given time series $X = \{x_1, x_2, \ldots, x_n\}$, the values are first scaled to the interval $[-1, 1]$ to facilitate the subsequent angle calculation. The scaled value $\tilde{x}_i$ is shown below.

$$\tilde{x}_i = \frac{(x_i - \max X) + (x_i - \min X)}{\max X - \min X} \quad (1)$$
Then, the scaled series is mapped to a polar coordinate system: each value $\tilde{x}_i$ is encoded as a cosine angle $\theta$ in the range $[0, \pi]$, and each time node $t_i$ is mapped to a radius $r$, which preserves the temporal relationships. The mapping relationship is shown below.

$$\theta = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \ \tilde{x}_i \in \tilde{X} \quad (2)$$

$$r = \frac{t_i}{N}, \quad t_i \in \mathbb{N} \quad (3)$$

where $t_i$ is the time node and $N$ is the total number of time nodes.
Within the interval $\theta \in [0, \pi]$, $\cos\theta$ is monotonically decreasing, so a given time series has a unique corresponding representation in polar coordinates. In addition, because the polar radius encodes absolute time, the transformation is bijective. After mapping the time series to polar coordinates, the correlation between points can be computed from the sum or difference of their angles, giving two encoding methods: the Gram Angular Summation Field (GASF) and the Gram Angular Difference Field (GADF). The calculation formulas for GASF and GADF are as follows.
$$\mathrm{GASF} = \begin{bmatrix} \cos(\theta_1+\theta_1) & \cos(\theta_1+\theta_2) & \cdots & \cos(\theta_1+\theta_n) \\ \cos(\theta_2+\theta_1) & \cos(\theta_2+\theta_2) & \cdots & \cos(\theta_2+\theta_n) \\ \vdots & \vdots & \ddots & \vdots \\ \cos(\theta_n+\theta_1) & \cos(\theta_n+\theta_2) & \cdots & \cos(\theta_n+\theta_n) \end{bmatrix} = \tilde{X}'\tilde{X} - \sqrt{I-\tilde{X}^2}'\sqrt{I-\tilde{X}^2} \quad (4)$$

$$\mathrm{GADF} = \begin{bmatrix} \sin(\theta_1-\theta_1) & \sin(\theta_1-\theta_2) & \cdots & \sin(\theta_1-\theta_n) \\ \sin(\theta_2-\theta_1) & \sin(\theta_2-\theta_2) & \cdots & \sin(\theta_2-\theta_n) \\ \vdots & \vdots & \ddots & \vdots \\ \sin(\theta_n-\theta_1) & \sin(\theta_n-\theta_2) & \cdots & \sin(\theta_n-\theta_n) \end{bmatrix} = \sqrt{I-\tilde{X}^2}'\tilde{X} - \tilde{X}'\sqrt{I-\tilde{X}^2} \quad (5)$$

where $I$ is the unit row vector, $\tilde{X}$ is the scaled time series, and $\tilde{X}'$ is its transpose.
After the above transformation, the given time series can be transformed into a two-dimensional matrix that is symmetric along the diagonal, while preserving the integrity and time-dependent characteristics of the time series. The GAF encoding process is shown in Figure 1.
In summary, the generation process of the Gram angle field usually includes the following steps (a code sketch follows the list):
(1)
Data preprocessing: Normalize the input one-dimensional time-series data to ensure that its values are within an appropriate range.
(2)
Polar coding: Map each time-series value to an angle in a polar coordinate system. This typically involves converting time-series values into offsets relative to a reference value (such as maximum or minimum) and calculating the corresponding angle.
(3)
Construct Gram Matrix: Construct the Gram matrix by calculating the cosine values of the angles between different time points. Each element of this matrix reflects the correlation between two time points.
(4)
Image generation: Visualize the Gram matrix as a two-dimensional image. This image preserves the temporal dependence and potential connectivity features of the original time series while removing redundant information between multiple modalities.
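The four steps above can be condensed into a short NumPy sketch. The function name gaf_encode and the synthetic test frame are illustrative, not taken from the authors' code; the 200-point frame length follows Section 4.3.

```python
import numpy as np

def gaf_encode(x):
    """Encode a 1-D time series as GASF and GADF matrices (steps 1-4 above)."""
    x = np.asarray(x, dtype=float)
    # (1) Rescale the series to [-1, 1], Equation (1).
    x_tilde = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    x_tilde = np.clip(x_tilde, -1.0, 1.0)        # guard against rounding just outside [-1, 1]
    # (2) Polar encoding: values become angles in [0, pi], Equation (2).
    theta = np.arccos(x_tilde)
    # (3)-(4) Gram matrices from angle sums (GASF) and differences (GADF), Equations (4)-(5).
    gasf = np.cos(theta[:, None] + theta[None, :])
    gadf = np.sin(theta[:, None] - theta[None, :])
    return gasf, gadf

# Example: one 200-sample frame (a synthetic 100 Hz tone sampled at 2 kHz).
frame = np.sin(2 * np.pi * 100 * np.arange(200) / 2000)
gasf_img, gadf_img = gaf_encode(frame)
print(gasf_img.shape, gadf_img.shape)            # (200, 200) (200, 200)
```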

2.2. Residual Network

The deep residual network is a convolutional neural network model that mitigates the degradation problem of traditional CNNs. In principle, the more layers a CNN has, the more comprehensive the extracted features and the better the performance. In practice, however, as the network deepens, performance fails to improve and may even worsen, a phenomenon known as network degradation.
For the above issues, He et al. [22] proposed a ResNet model based on the residual network structure. The main innovation of the ResNet model is the introduction of residual modules into deep convolutional neural networks, allowing the network to learn identity maps during training to alleviate gradient vanishing and exploding problems in deep networks. Specifically, the residual module ensures that the output of a certain layer in the network not only depends on the result of a series of transformations of the input of that layer, but also directly includes the input of that layer itself. ResNet consists of multiple residual module stacks, with each basic residual block typically consisting of two or three convolutional layers, followed by batch normalization and Rectified Linear Unit (ReLU) activation functions. The input of the residual block will first undergo an identity mapping, which directly skips the convolution operation and then adds it to the feature map obtained through the convolution operation. Finally, the output of the residual block is obtained through an activation function. This design allows the network to learn residuals, which are the differences between input and output, in order to better adapt to the training data. A basic residual module is shown in Figure 2.
BN is a batch normalization layer that aims to (1) transform the output into a standard normal distribution, reduce sample differences, and avoid gradient vanishing and exploding; and (2) reduce dependence on initial parameters, accelerate network convergence speed, and improve model generalization ability. The calculation formula for the BN process is as follows.
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i-\mu_B\right)^2, \qquad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\varepsilon}}, \qquad y_i = \gamma\hat{x}_i + \beta \quad (6)$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance of the batch-processed data, $\gamma$ and $\beta$ are parameters learned through backpropagation, and $y_i$ is the output of $\hat{x}_i$ after scaling and shifting.
ReLU is the activation function, and its calculation formula is as follows.

$$\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \quad (7)$$

where $x$ is the input feature value.
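As an illustration of the structure in Figure 2, a basic two-convolution residual block with batch normalization and ReLU can be sketched in PyTorch as follows; this is a generic BasicBlock under the stated design, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions with BN and ReLU, plus an identity (or 1x1) shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)        # batch normalization, Equation (6)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # If the shape changes, project the shortcut with a 1x1 convolution.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)             # residual connection: learn the difference F(x) = H(x) - x
        return torch.relu(out)
```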

2.3. Multi-Head Attention

The multi-head attention (MA) mechanism is the cornerstone of the Transformer architecture [23,24]. Compared with a conventional attention mechanism, the MA mechanism allows the output of the attention layer to contain representation information from different subspaces, thereby enhancing the expressive ability of the model. It uses different query vectors to focus on different parts of the input information, analyzing the current input from multiple perspectives.
In the field of transformer fault diagnosis based on vibration signals, the multi-head attention mechanism has demonstrated significant advantages over traditional standard attention mechanisms in various aspects. Through the parallel processing of multiple heads, the multi-head attention mechanism can more comprehensively and deeply capture the multidimensional features contained in vibration signals, which is crucial for accurately determining the type, location, and severity of faults. At the same time, this mechanism also has stronger robustness and can cope with possible noise and interference in vibration signals, ensuring the stability and reliability of diagnostic results.
The MA mechanism mainly consists of three steps. First, the extracted features are input and linearly transformed, mapping them to the query space Q, key space K, and value space V. Second, the scaled dot product and Softmax function are used to calculate each attention distribution, and the value vectors are weighted and summed by these distributions to obtain the corresponding output. Finally, the outputs of all heads are concatenated. The formulas are shown in Equations (8)–(12), and a code sketch follows them.
$$Q = HW_Q = [\beta_1, \beta_2, \ldots, \beta_\lambda] \quad (8)$$

$$K = HW_K = [k_1, k_2, \ldots, k_\mu] \quad (9)$$

$$V = HW_V = [v_1, v_2, \ldots, v_\mu] \quad (10)$$

$$Y_i = \sum_{j=1}^{\mu} f_{\mathrm{Softmax}}\!\left[s(\beta_i, k_j)\right] v_j = f_{\mathrm{Softmax}}\!\left(\frac{QK^{\mathrm{T}}}{\sqrt{D_k}}\right)V \quad (11)$$

$$Y = f_{\mathrm{Concat}}(Y_1, Y_2, \ldots, Y_\lambda) \quad (12)$$

where $W_Q$, $W_K$, and $W_V$ are the linear transformation parameters for the query, key, and value spaces; $D_k$ is the dimension of each key; $k$ is an element vector in the key space; $v$ is an element vector in the value space; $\lambda$ is the number of query vectors; $f_{\mathrm{Concat}}(\cdot)$ is the feature concatenation function; and $\mu$ is the number of dimensions after the linear transformation.
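The three steps described by Equations (8)–(12) can be sketched with PyTorch tensor operations as follows; the class name and dimensions are illustrative, and the scoring function is the scaled dot product of Equation (11).

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention with several heads, following Equations (8)-(12)."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)   # W_Q
        self.w_k = nn.Linear(d_model, d_model)   # W_K
        self.w_v = nn.Linear(d_model, d_model)   # W_V
        self.w_o = nn.Linear(d_model, d_model)   # projection applied after concatenation

    def forward(self, h):
        # h: (batch, seq_len, d_model) -- e.g. flattened image features in this paper's setting
        b, n, _ = h.shape
        def split(t):                            # (b, n, d_model) -> (b, heads, n, d_k)
            return t.view(b, n, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(h)), split(self.w_k(h)), split(self.w_v(h))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5   # scaled dot product
        attn = torch.softmax(scores, dim=-1)                 # attention distribution per head
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)   # weighted sum, then concatenate heads
        return self.w_o(out)
```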

3. Diagnostic Model and Diagnostic Process

3.1. Diagnostic Model

The overall model of this article mainly consists of three parts: an image feature extraction model, an attention model, and a classification model. The image feature extraction module is the ResNet model, and the attention module and classification module together form a complete neural network structure, as shown in Figure 3.
The ResNet model mainly extracts features from GAF images to obtain features from different types of fault images. The attention module further focuses on the key information in the image, using a multi-head attention mechanism to enhance the image features in the flattened features, and finally inputs them into the classification module composed of a fully connected network for transformer fault classification.
ResNet has multiple variant structures, including ResNet18, ResNet34, and ResNet50. Considering the relatively small dataset used in this study, ResNet18 (with fewer layers) was chosen as the base model to avoid the overfitting that too many parameters would cause. The structure of the multi-head attention mechanism is shown in Figure 4, with a linear transformation dimension of 32 per head, 8 heads, and a 256-dimensional representation after concatenation. The classification layer applies a ReLU activation after a 256-dimensional linear transformation, followed by a fully connected layer with 16 neurons, and finally a Softmax activation. The Adam optimizer is used, with an initial learning rate of 0.001 and a cross-entropy loss function.
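Under the stated configuration (ResNet18 backbone, 8 heads, 256-dimensional concatenated features, Adam with a 0.001 learning rate, cross-entropy loss), the overall model could be assembled roughly as sketched below using torchvision. The exact layer sizes of the classification head are not fully specified in the text, so the 16-neuron hidden layer and the single-token attention arrangement are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18   # assumes torchvision >= 0.13

class GAFResNetMA(nn.Module):
    """GAF image -> ResNet18 features -> multi-head attention -> fully connected classifier."""
    def __init__(self, n_classes=7, d_model=256, n_heads=8):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the original fc layer
        self.proj = nn.Linear(512, d_model)            # 512-dim ResNet18 output -> 256-dim token
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(d_model, 16), nn.ReLU(),          # assumed 16-neuron hidden layer
            nn.Linear(16, n_classes),                   # Softmax is applied inside the loss during training
        )

    def forward(self, x):                               # x: (batch, 3, H, W) GAF images
        f = self.features(x).flatten(1)                 # (batch, 512)
        tokens = self.proj(f).unsqueeze(1)              # (batch, 1, 256) single-token sequence
        attended, _ = self.attn(tokens, tokens, tokens) # multi-head weighting of the deep features
        return self.classifier(attended.squeeze(1))     # logits for the 7 core states

model = GAFResNetMA()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)   # stated initial learning rate
loss_fn = nn.CrossEntropyLoss()
```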

3.2. Diagnostic Process

The fault diagnosis process based on GAF–ResNet–MA is as follows: Firstly, the vibration signal is divided into frames of a certain length, encoded by GAF, and a two-dimensional image is generated. Secondly, the image dataset is divided into a training set and a testing set in a certain proportion. Thirdly, the training set is input into ResNet for training to obtain a fault diagnosis model, and then the test set data are input to analyze the diagnostic performance. The specific diagnostic process is shown in Figure 4.
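A condensed sketch of this diagnostic pipeline is given below. It reuses the gaf_encode helper from the Section 2.1 sketch and a model such as the GAFResNetMA sketch above; the frame length, 7:3 split, epoch count, and learning rate follow Sections 3 and 4, while the batch size and data-loading details are assumptions.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def build_dataset(signals, labels, frame_len=200):
    """Slice denoised signals into frames, GAF-encode each frame, and stack into tensors.
    `signals` is assumed to be a list of 1-D NumPy arrays, one per recording."""
    images, targets = [], []
    for sig, lab in zip(signals, labels):
        for start in range(0, len(sig) - frame_len + 1, frame_len):
            gasf, _ = gaf_encode(sig[start:start + frame_len])   # helper from the Section 2.1 sketch
            images.append(np.stack([gasf] * 3))                  # replicate to 3 channels for ResNet
            targets.append(lab)
    x = torch.tensor(np.array(images), dtype=torch.float32)
    y = torch.tensor(targets, dtype=torch.long)
    return TensorDataset(x, y)

def train(model, dataset, epochs=100, lr=1e-3):
    n_train = int(0.7 * len(dataset))                            # 7:3 train/test split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)  # batch size is an assumption
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
    return test_set                                              # held out for the accuracy evaluation
```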

4. Results and Discussion

4.1. Data Acquisition

With reference to the DL/T 1540-2016 electric power industry standard of the People’s Republic of China, we built a transformer core vibration-signal acquisition platform to collect vibration signal data. The main equipment includes a transformer, a voltage regulator, and vibration sensors. The transformer is a three-phase dry-type transformer. The sensor is a B&K 4533-B-004 piezoelectric vibration acceleration sensor with a measurement frequency range of 0.2 Hz–12.8 kHz, a sensitivity of 100 mV/g (±10%), and an accuracy of 0.001 m/s². During the experiment, the sensor measured the vibration of the A-phase overlap area on the transformer iron core along the stacking direction, and the data were transmitted to a computer for storage via a data cable. The experimental platform is shown in Figure 5.
This experiment simulated six locations of iron-core looseness: the upper yoke of phase A, the upper yoke of phase B, the upper yoke of phase C, the lower yoke of phase A, the lower yoke of phase B, and the lower yoke of phase C. Loosening the bolts at the corresponding points reduces the clamping force of the clamp on the iron core; the loosening displacement is set to 0.5 cm to simulate iron-core loosening. During the experiment, we performed 100 replicates for each fault type, with a sampling time of 0.1 s and a sampling frequency of 2 kHz. The types of faults are shown in Table 1.

4.2. Data Processing

In the actual experiments, the vibration signal of the transformer core during operation is simultaneously affected by other components and by interference from the external environment. These interferences blur the GAF images, which is not conducive to subsequent image recognition. Therefore, this article adopts EEMD to eliminate the interference. EEMD exploits the property that Gaussian white noise has equal energy at all frequencies. EMD is a completely data-driven, adaptive decomposition method that decomposes a signal, from high frequency to low frequency, into a finite sum of physically meaningful Intrinsic Mode Functions (IMFs) and a residual. By adding white noise to the initial signal and performing EMD multiple times, a series of nearly linear and stationary IMF components is obtained, decomposing the initial signal into the sum of several single-component amplitude- and frequency-modulated signals. This process can be regarded as linear filtering. The main steps are as follows:
(1)
Add Gaussian white noise with a mean of 0 to the decomposed signal x i ( t ) and normalize it. Apply the EMD algorithm to decompose the normalized signal and obtain the IMFs of each order.
(2)
Repeat step (1) and ensure that the intensity of Gaussian white noise added each time is the same, but the sequence is different.
(3)
Perform ensemble averaging on the IMF obtained in step (2) to obtain the EEMD of the initial signal, as shown in the following equation.
$$x(t) = R(t) + \sum_{s=1}^{S} \mathrm{IMF}_s \quad (13)$$
where R(t) is the residual component and S is the number of IMF components.
EEMD decomposes the original signal by adding white noise, so there may be false components in the numerous IMF components obtained from the decomposition. Therefore, by calculating the correlation coefficient r between each IMF component and the original signal, IMF components that are strongly similar to the original signal (r > 0.6) can be selected and considered as a true component of the original signal. Otherwise, they can be discarded. The calculation steps for the correlation coefficient are as follows [25]:
(1)
Calculate the autocorrelation functions $C_{\mathrm{IMF}_1}, C_{\mathrm{IMF}_2}, \ldots, C_{\mathrm{IMF}_S}$ of each IMF component and the autocorrelation function $C_x$ of the original signal using the following formula:

$$C_x(m) = \frac{1}{N}\sum_{i=0}^{N-1} x(i)\, x(i+m) \quad (14)$$
where x ( i ) is the state of the signal at a certain moment and N is the number of sampling points of the signal.
(2)
Normalize the autocorrelation functions and calculate the correlation coefficients $r(j)$ between the normalized $C_{\mathrm{IMF}_1}, C_{\mathrm{IMF}_2}, \ldots, C_{\mathrm{IMF}_S}$ and $C_x$ using the following formula:

$$r(j) = \frac{\sum_{i=1}^{2N-1} R_{\mathrm{IMF}_j}(i)\, R_x(i)}{\sqrt{\sum_{i=1}^{2N-1} R_{\mathrm{IMF}_j}^2(i)\,\sum_{i=1}^{2N-1} R_x^2(i)}} \quad (15)$$

where $R_{\mathrm{IMF}_j}$ and $R_x$ are the normalized autocorrelation sequences, and $r(j)$ represents the correlation coefficient between the $j$th IMF component and the original signal.
It is generally accepted that when $r(j) > 0.6$, the corresponding IMF component correlates well with the original signal and can be retained. These well-correlated IMF components are selected and summed to reconstruct the signal.
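A sketch of this denoising and selection procedure is shown below, assuming the third-party PyEMD package provides the EEMD decomposition; the 0.6 retention threshold follows the text, and the helper names are illustrative.

```python
import numpy as np
from PyEMD import EEMD   # assumed third-party package providing ensemble EMD

def autocorr(x):
    """Biased autocorrelation over all lags, in the spirit of Equation (14)."""
    x = np.asarray(x, dtype=float)
    return np.correlate(x, x, mode="full") / len(x)

def denoise_eemd(signal, threshold=0.6, trials=100):
    """Decompose with EEMD, keep IMFs whose autocorrelation correlates with the original
    signal's autocorrelation above the threshold, and reconstruct the denoised signal."""
    eemd = EEMD(trials=trials)
    imfs = eemd.eemd(np.asarray(signal, dtype=float))   # rows are IMF1..IMFS
    c_x = autocorr(signal)
    kept = []
    for imf in imfs:
        c_imf = autocorr(imf)
        # Normalized correlation between the two autocorrelation sequences, Equation (15).
        r = np.sum(c_imf * c_x) / np.sqrt(np.sum(c_imf**2) * np.sum(c_x**2))
        if r > threshold:
            kept.append(imf)
    return np.sum(kept, axis=0) if kept else np.asarray(signal, dtype=float)
```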
Taking the vibration signal of the B-phase lower-yoke bolt loosening as an example, the vibration signal of the transformer iron core is decomposed by EEMD. The correlation coefficients between each IMF component and the original signal are shown in Table 2.
After EEMD, eight IMF components were obtained. Based on Table 2, IMF1 was chosen to reconstruct the signal. The waveform and frequency spectrum of the vibration signal before and after reconstruction are shown in Figure 6 (with a sampling frequency of 2000 Hz).
In Figure 6, the changes in the time and frequency domains of the original signal before and after EEMD and reconstruction are shown. The time-domain signals display the changes in the signal over time, while the frequency-domain signals reveal the frequency components of the signal. By comparing the signals before and after reconstruction, we can observe the following points:
(1)
Time-domain signal: The reconstructed signal is smoother in the time domain and significantly reduces noise components. This indicates that the EEMD and reconstruction process effectively removes noise interference from the original signal, making the main features of the signal more prominent.
(2)
Frequency-domain signal: In the frequency domain, the reconstructed signal has clearer frequency components. By comparison, we can find that the key frequency components in the reconstructed signal are consistent with the original signal, but the noise frequency components are effectively suppressed. This further validates the effectiveness of the EEMD and reconstruction method in signal processing.
In summary, the results in Figure 6 demonstrate the significant effect of the EEMD and reconstruction method in improving signal quality, providing a more accurate and reliable signal foundation for subsequent diagnostic analysis.

4.3. Signal Image Conversion

The GAF image is a square matrix whose side length equals the length of the time series. When the sampling length is too long, the GAF image becomes large, increasing the time and difficulty of model training; when it is too short, it cannot effectively reflect fault information. This article selects a time-series length of 200 for a single sample and converts it into a two-dimensional image using the Gram angle field. Figure 7 shows the GASF and GADF images when the iron core is loose at different points.
In Figure 7, it can be seen that the different types of GAF two-dimensional images have significant pixel differences. Using this as ResNet input can further enhance the network’s feature extraction ability and improve classification and recognition accuracy.

4.4. Experimental Results

This article selects a time-series length of 200 for a single sample, corresponding to a duration of 0.1 s. The data collection time for each group of experiments is 10 s, so each group of experimental data yields 100 image samples. With seven core states in total (Table 1), 700 GAF image samples are obtained. The training and testing sets are split in a ratio of 7:3, giving 210 images in the testing set.
Two encoded image sets, GASF and GADF, were each trained using the method proposed in this paper for 100 epochs, with Adam selected as the optimizer. The classification accuracy of the two encoding methods is shown in Table 3.
From Table 3, it can be seen that there is little difference between the two encodings on the training set, but on the test set the accuracy of the GASF-encoded images is higher than that of the GADF-encoded images. This may be because GASF, by summing the angles, more comprehensively captures the correlation and similarity between signal points, whereas GADF constructs the feature matrix from angle differences and may lose some important features of the signal. Therefore, only the t-distributed stochastic neighbor embedding (t-SNE) visualization and the confusion matrix of the GASF-encoded images are presented to characterize the classification results, as shown in Figures 8 and 9. Figure 8 shows the t-SNE visualization of the classified features: each type of fault is clustered together, different fault types are separated by a clear margin, and there is no obvious overlap between categories, indicating highly distinguishable features. Figure 9 shows the confusion matrix of the classification results. The accuracy of the proposed model in identifying the various loose-iron-core faults of the transformer is 99.52%. One F5 sample is predicted as F2, likely because the upper- and lower-yoke bolts of the same phase produce similar vibration characteristics when loose, which can lead to confusion in fault diagnosis.
Table 4 shows the diagnostic results before and after signal reconstruction, using the GAF-ResNet-MA model proposed in this article. By comparison, the diagnostic accuracy before reconstruction is 97.62%, while after reconstruction it is 99.52%. Before reconstruction, the large number of interference components in the original signal prevents the diagnostic algorithm from accurately identifying the fault characteristics, leading to misjudgments or omissions. With the reconstructed, clearer, and more stable signals, the algorithm identifies the fault characteristics more accurately and provides correct diagnostic conclusions.
Meanwhile, to further validate the superiority of the proposed method, five sets of comparative experiments were conducted. We compared the proposed method with the GAF-ResNet18 model, a support vector machine (SVM), a random forest (RF), a shallow convolutional neural network (CNN), and a one-dimensional convolutional neural network (1D-CNN). The same dataset was used for each algorithm to ensure the comparability of the results, and the configuration of each model (e.g., hyperparameters and layer structure) follows settings reported in other studies to ensure the reliability of the test. The data input for each model varies according to the needs of the algorithm: the SVM and RF directly use statistical features extracted from the raw data; the 1D-CNN uses the one-dimensional time-series representation of the original data; and the CNN, GAF-ResNet18, and the proposed method convert the time-series data into images through the GAF to capture time-dependent features before inputting them into the networks. The overall average recognition accuracy for transformer iron-core looseness using the different methods is shown in Figure 10. The accuracy of the proposed method is higher than that of the shallow CNN model and the GAF-ResNet18 model without a multi-head attention mechanism. Compared with traditional methods, the proposed method not only improves diagnostic accuracy but also automates the process from signal to diagnosis result, reducing manual intervention.
Finally, the proposed method was compared with a recent study in the field [26], which utilized the continuous wavelet transform (CWT) and an improved CNN. On the same dataset, the accuracy of that method is 99.24%, slightly lower than that of the method in this paper, which further illustrates the feasibility and superiority of the proposed method.

5. Conclusions

To achieve transformer core loosening diagnosis without relying on manual feature extraction and improve diagnostic accuracy, this paper proposes a transformer vibration-signal core loosening diagnosis method based on the Gram angle field and multi-head attention mechanism. The following conclusions are drawn:
(1)
By using GAF to convert vibration signals into images and directly inputting the images into a deep residual network for learning, the feature extraction ability of convolutional neural networks can be fully utilized, without the need for the manual extraction of vibration-signal feature quantities.
(2)
Applying the multi-head attention mechanism to the residual network can enhance the feature representation ability of the model and improve its accuracy.
(3)
The use of Ensemble Empirical Mode Decomposition (EEMD) for signal decomposition and reconstruction can effectively eliminate the noise components of vibration signals in complex environments and enhance robustness, improving the applicability of the proposed method.
The model proposed in this paper has opened up a new way to effectively identify transformer core loosening faults and has good practical application potential. However, it is worth noting that all testing and validation procedures were conducted in an ideal laboratory environment. Therefore, in the future, we plan to conduct field tests in different environments to verify the model’s generalization ability and robustness. This includes testing on various types of transformers and under different operating conditions to ensure that the model can accurately diagnose various iron-core loosening faults. In addition, in practical applications, besides vibration signals, there may be other factors (such as temperature, humidity, load, etc.) that can affect the operating status of transformers. Therefore, we plan to incorporate these variables into the model in future research to more comprehensively evaluate the health status of transformers.

Author Contributions

J.C. proposed the topic of the study; Z.W. surveyed the literature and composed the manuscript; X.Z. conducted the literature review; N.D., J.C., and X.Z. discussed and revised the manuscript; N.D. and J.C. supervised this project; N.D. and J.C. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under Grant 52077161.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sarma, G.S.; Reddy, R.; Nirgude, P.M.; Naidu, V. A review on real time fault detection and intelligent health monitoring techniques of transformer. Int. J. Eng. Res. Appl. 2021, 11, 40–47. [Google Scholar]
  2. Rampersad, R.M.; Bahadoorsingh, S.; Sharma, C. Multifactorial frameworks modelling linkages of power transformer failure modes. In Proceedings of the 2018 IEEE Electrical Insulation Conference (EIC), San Antonio, TX, USA, 17–20 June 2018; pp. 398–402. [Google Scholar]
  3. Moravej, Z.; Bagheri, S. Condition monitoring techniques of power transformers: A review. J. Oper. Autom. Power Eng. 2015, 3, 71–82. [Google Scholar]
  4. Pleite, J.; Gonzalez, C.; Vazquez, J.; Lazaro, A. Power transformer core fault diagnosis using frequency response analysis. In Proceedings of the MELECON 2006—2006 IEEE Mediterranean Electrotechnical Conference, Malaga, Spain, 16–19 May 2006; pp. 1126–1129. [Google Scholar]
  5. Yao, D.; Li, L.; Zhang, S.; Zhang, D.; Chen, D. The vibroacoustic characteristics analysis of transformer core faults based on multi-physical field coupling. Symmetry 2022, 14, 544. [Google Scholar] [CrossRef]
  6. Ji, S.C.; Zhang, F.; Shi, Y.H.; Zhan, C.; Zhu, Y.Y.; Lu, W.F. A review of research on mechanical state diagnosis methods for power transformers based on vibration signals. High Volt. Technol. 2020, 46, 257–272. [Google Scholar]
  7. Tavakoli, A.; De Maria, L.; Bartalesi, D.; Garatti, S.; Bittanti, S.; Valecillos, B.; Piovan, U. Diagnosis of transformers based on vibration data. In Proceedings of the 2019 IEEE 20th International Conference on Dielectric Liquids (ICDL), Roma, Italy, 23–27 June 2019; pp. 1–4. [Google Scholar]
  8. Zhan, C.; Ji, S.; Liu, Y.; Zhu, L.; Shi, Y.; Ren, F. Winding Mechanical Fault Diagnosis Technique of Power Transformer Based on Time-Frequency Vibration Analysis. In Proceedings of the 2018 Condition Monitoring and Diagnosis (CMD), Perth, WA, Australia, 23–26 September 2018; pp. 1–6. [Google Scholar]
  9. Bin, Z.; Xu, J.Y.; Chen, J.B.; Li, H.; Lin, X.; Zang, Z. Winding Deformation Diagnosis Method Based on Vibration Information of Power Transformers. High Volt. Technol. 2015, 41, 2341–2349. [Google Scholar]
  10. Hong, K.; Huang, H.; Zhou, J.; Shen, Y.; Li, Y. A method of real-time fault diagnosis for power transformers based on vibration analysis. Meas. Sci. Technol. 2015, 26, 115011. [Google Scholar] [CrossRef]
  11. Xue, X.; Li, C.; Cao, S.; Sun, J.; Liu, L. Fault diagnosis of rolling element bearings with a two-step scheme based on permutation entropy and random forests. Entropy 2019, 21, 96. [Google Scholar] [CrossRef] [PubMed]
  12. Zhang, J.; Liu, M.; Wang, K.; Sun, L. Mechanical fault diagnosis for HV circuit breakers based on ensemble empirical mode decomposition energy entropy and support vector machine. Math. Probl. Eng. 2015, 2015, 101757. [Google Scholar] [CrossRef]
  13. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A new convolutional neural network-based data-driven fault diagnosis method. IEEE Trans. Ind. Electron. 2017, 65, 5990–5998. [Google Scholar] [CrossRef]
  14. Shi, Y.; Wang, H.; Sun, W.; Bai, R. Intelligent fault diagnosis method for rotating machinery based on recurrence binary plot and DSD-CNN. Entropy 2024, 26, 675. [Google Scholar] [CrossRef] [PubMed]
  15. Thomas, J.B.; Chaudhari, S.G.; Shihabudheen, K.V.; Verma, N.K. CNN-based transformer model for fault detection in power system networks. IEEE Trans. Instrum. Meas. 2023, 72, 1–10. [Google Scholar] [CrossRef]
  16. Li, H.Y.; Sun, Y.; Zhang, X.; Song, J.C. A Fault Diagnosis Method for Vacuum Contactors Using Modal Time Frequency Graph and ResNet50 Fusion. High Volt. Technol. 2023, 49, 1831–1840. [Google Scholar]
  17. Lan, M.Y.; Liu, Y.L.; Jin, T.; Gong, Z.; Liu, Z. Identification of Composite Power Quality Disturbance Types Based on Visual Trajectory Circle and ResNet18. Chin. J. Electr. Eng. 2022, 42, 6274–6286. [Google Scholar]
  18. Yan, J.; Kan, J.; Luo, H. Rolling bearing fault diagnosis based on Markov transition field and residual network. Sensors 2022, 22, 3936. [Google Scholar] [CrossRef] [PubMed]
  19. Li, X.-X.; Li, D.; Ren, W.-X.; Zhang, J.-S. Loosening identification of multi-bolt connections based on wavelet transform and ResNet-50 convolutional neural network. Sensors 2022, 22, 6825. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, Q.; Cui, J.; Xiao, W.; Mei, L.; Yu, X. Demagnetization Fault Diagnosis of a PMSM for Electric Drilling Tools Using GAF and CNN. Electronics 2024, 13, 189. [Google Scholar] [CrossRef]
  21. Luo, L.; Liu, Y. Fault Diagnosis of Planetary Gear Train Crack Based on DC-DRSN. Appl. Sci. 2024, 14, 6873. [Google Scholar] [CrossRef]
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  23. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  24. Liu, H.-I.; Chen, W.-L. X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism. Appl. Sci. 2022, 12, 4502. [Google Scholar] [CrossRef]
  25. Chen, R.; Tang, B.; Lv, Z. A denoising method for EEMD rotor vibration signals based on correlation coefficient. Vib. Test. Diagn. 2012, 32, 542–546+685. [Google Scholar]
  26. Li, C.; Chen, J.; Yang, C.; Yang, J.; Liu, Z.; Davari, P. Convolutional neural network-based transformer fault diagnosis using vibration signals. Sensors 2023, 23, 4781. [Google Scholar] [CrossRef] [PubMed]
Figure 1. GAF coding schematic diagram.
Figure 2. Residual module.
Figure 3. The GAF-ResNet-MA diagnostic model.
Figure 4. The GAF–ResNet–MA diagnostic process.
Figure 5. The transformer vibration test platform.
Figure 6. Time-domain waveforms and spectra before and after reconstruction. (a) Time-domain signal before reconstruction; (b) Time-domain signal after reconstruction; (c) Spectral signal before reconstruction; (d) Spectral signal after reconstruction.
Figure 7. GASF and GADF images of iron-core loosening at different points. (a) GASF under normal operation; (b) GADF under normal operation; (c) GASF with loose upper yoke in phase A; (d) GADF with loose upper yoke in phase A; (e) GASF with loose lower yoke in phase A; (f) GADF with loose lower yoke in phase A.
Figure 8. t-SNE visualization.
Figure 9. Confusion matrix.
Figure 10. Fault recognition accuracy of different models.
Table 1. Typical types of faults.

Name | Type
N1 | Normal
F1 | Loose upper yoke of phase A
F2 | Loose upper yoke of phase B
F3 | Loose upper yoke of phase C
F4 | Loose lower yoke of phase A
F5 | Loose lower yoke of phase B
F6 | Loose lower yoke of phase C
Table 2. Correlation coefficients between IMF components and the original signal.

IMF | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
r | 0.8788 | 0.3753 | 0.2869 | 0.2532 | 0.1486 | 0.0170 | 0.0145 | 0.0078
Table 3. Comparison of accuracy between the two encoding methods.

Encoding Method | Training Set Accuracy | Test Set Accuracy
GASF | 100% | 99.52%
GADF | 99.59% | 95.71%
Table 4. Diagnostic results before and after signal reconstruction.

Signal Type | Training Set Accuracy | Test Set Accuracy
Before reconstruction | 98.98% | 97.62%
After reconstruction | 100% | 99.52%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
