2020 International Conference on Power, Energy and Innovations (ICPEI 2020)

October 14-16, 2020, Chiangmai, THAILAND

Intelligent Machine Learning Techniques for Condition Assessment of Power Transformers

Kunanya Leauprasert and Thanapong Suwanasri
Electrical and Software Systems Engineering, The Sirindhorn International Thai-German Graduate School of Engineering (TGGS), King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
kunaya.l-epe2018@tggs.kmutnb.ac.th, thanapong.s@tggs.kmutnb.ac.th

Cattareeya Suwanasri
Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
cattareeya.s@eng.kmutnb.ac.th

Nitchamon Poonnoy
Department of Teacher Training in Electrical Engineering, Faculty of Technical Education, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
nitchamon.p@fte.kmutnb.ac.th

DOI: 10.1109/ICPEI49860.2020.9431460

Abstract— This paper introduces a condition assessment of power transformers in terms of a percentage health index (%HI) using regression models. The conditions of the major components of the power transformer are assessed using input datasets from visual inspection, electrical tests, and paper and oil insulation tests. 90 features of these input datasets are tested in regression models to determine the predicted %HI. Six regression models, namely linear regression, Ridge regression, Lasso regression, random forest regression, support vector regression, and deep neural network regression, are tested to predict %HI. Actual input datasets related to the actual %HI of 317 power transformers are used to train these regression models. The random forest regression performs best, providing the output dataset with the lowest errors.

Keywords— Condition assessment, Deep neural network, Health index, Regression method, Power transformer

I. INTRODUCTION

Electrical utilities have to manage their assets effectively for maximum benefit. Operational methods must be optimized to achieve the goals of asset management [1], focusing mainly on the life cycle of the asset. All activities, including specification data, inspection and test records, and operating and maintenance processes, are involved in increasing the lifespan of the apparatus. Maintenance is a crucial process, generally consisting of preventive maintenance (PM) and condition-based maintenance (CM).

One of the most important components in the transmission and distribution network is the power transformer. Its major components include the winding and core, bushing, surge arrester, and insulation, which require complex maintenance processes. Correct configuration and proper implementation result in higher equipment reliability as well as reduced operating and maintenance costs. Many tests and visual inspections have been performed to help maintain the equipment condition and extend the life span of the major components of the power transformer, as given in Table I.

TABLE I. POWER TRANSFORMER'S COMPONENTS AND TESTING

Subsystem                       | Measurement
Active part (winding and core)  | Turn ratio test, DC winding resistance test, short circuit impedance test, capacitance test
Paper insulation                | Polarization index test
Oil insulation                  | Oil dielectric strength test, tan delta test
Bushing                         | Capacitance test, tan delta test
Surge arrester                  | Capacitance test, tan delta test, watt loss test
DGA                             | CO2, C2H4, C2H2, C2H6, CH4, CO, C3H6, C3H8, H2, O2, N2

Therefore, this paper presents a condition assessment of the power transformer and its components in terms of a percentage Health Index (%HI) using different historical test results such as visual inspection, electrical tests, paper and oil insulation tests, and Dissolved Gas Analysis (DGA). Different regression techniques are applied to develop a predictive model based on actual data and historical test results in order to determine %HI. Six prediction models are considered: an ordinary linear regression model, a Ridge regression model, a Lasso regression model, a random forest regression (RFR) model, a support vector regression (SVR) model, and a deep neural network (DNN) regression model. Evaluation metrics for continuous variables are computed to compare the models.

II. DATA COLLECTION AND ANALYSIS

A. Health Index as Training Data

The technical data and test results of 350 units of 115/22 kV power transformers are collected to create datasets, which are subsequently used to determine %HI by the scoring-and-weighting technique and the AHP technique given in [2]. The %HI values of the major components of the power transformer comprise %HI_active-part, %HI_paper-insulation, %HI_oil-insulation, %HI_bushing, %HI_surge-arrester, and %HI_DGA. They are used as training data for the regression models.

978-1-7281-7240-8/20/$31.00 ©2020 IEEE



%HI = [ Σ_{i=1}^{N} (S_i × W_i) / Σ_{i=1}^{N} (S_max × W_i) ] × 100        (1)

where S_i is the score, S_max is the maximum score, and W_i is the weight of the i-th diagnostic result.
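As a concrete illustration, Eq. (1) can be computed as follows. The scores and weights below are hypothetical placeholders, not the utility's actual diagnostic data.

```python
def health_index(scores, weights, s_max):
    """Percentage health index per Eq. (1):
    %HI = sum(S_i * W_i) / sum(S_max * W_i) * 100."""
    numerator = sum(s * w for s, w in zip(scores, weights))
    denominator = sum(s_max * w for w in weights)
    return 100.0 * numerator / denominator

# Hypothetical example: three diagnostic results scored on a 0-4 scale.
print(health_index(scores=[3, 4, 2], weights=[0.5, 0.3, 0.2], s_max=4))
```

When every diagnostic achieves its maximum score, the index is 100% regardless of the weights, which is the intended normalization.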
B. Data Preprocessing

To achieve the best performance of the power transformer assessment model, the input dataset must first be preprocessed. There are 317 samples with 89 variables and no categorical transformer features, so no categorical encoding is needed, but feature selection is particularly necessary. It is important to reduce the number of input variables in order to reduce computational cost and overfitting, and occasionally to improve the model's performance. The features are reduced from 89 to 46 by manually selecting only those with a significant impact on the condition assessment of the power transformer. Correlations between the features and the target are also considered.

Then, the input data are split into a training set of 269 samples and a testing set of 48 samples. There are missing values in both the training and testing data; each null record is substituted by the mean value of the corresponding feature. Data scaling is one of the most crucial preprocessing steps. Its purpose is to eliminate range differences between features, because some algorithms are sensitive to distances between data points, and it helps improve the convergence speed of the algorithms.
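The preprocessing steps above (split, mean imputation, standard scaling) can be sketched as follows; the random matrix stands in for the real 317 × 46 test-result dataset, which is not public.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(317, 46))
X[rng.random(X.shape) < 0.05] = np.nan   # simulate missing records

# Split: 269 training samples, 48 testing samples, as in the paper.
X_train, X_test = X[:269], X[269:]

# Mean imputation: replace each null record with the feature's training mean.
col_mean = np.nanmean(X_train, axis=0)
X_train = np.where(np.isnan(X_train), col_mean, X_train)
X_test = np.where(np.isnan(X_test), col_mean, X_test)

# Standard scaling: remove range differences between features.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma
```

Note that the imputation means and scaling statistics are computed on the training set only and then applied to the test set, so no information leaks from the held-out samples.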
III. MACHINE LEARNING TECHNIQUES

A. Evaluation Metrics [3-4]

The root-mean-square error (RMSE) quantifies the error of a numerical prediction, and the mean absolute error (MAE) describes the average error of the model. The coefficient of determination (R2) is calculated and used to compare the performance of the linear regression, Ridge regression, Lasso regression, RFR, SVR, and DNN methodologies in transformer HI prediction. The RMSE and MAE are determined using Eqs. (2) and (3):

RMSE = sqrt( (1/M) Σ_{i=1}^{M} (y_i − ŷ_i)² )        (2)

MAE = (1/M) Σ_{i=1}^{M} |y_i − ŷ_i|        (3)

where M is the number of test samples, y_i is an actual %HI, and ŷ_i is the predicted %HI. A lower RMSE indicates a better-fitting model.

The coefficient of determination (R2), as given in Eq. (4), provides a measure of the consistency of the model. The coefficient is based on the ratio of the total variation of outcomes explained by the model. R2 ranges between 0 and 1, and a higher coefficient indicates a higher percentage of the data fitted.

R2 = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)²        (4)
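Eqs. (2)–(4) can be implemented directly; a minimal NumPy sketch with illustrative %HI values:

```python
import numpy as np

def rmse(y, y_hat):
    # Eq. (2): root-mean-square error over M test samples.
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    # Eq. (3): mean absolute error.
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    # Eq. (4): coefficient of determination.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.array([90.0, 80.0, 70.0, 60.0])       # actual %HI (illustrative)
y_hat = np.array([88.0, 82.0, 71.0, 58.0])   # predicted %HI (illustrative)
print(rmse(y, y_hat), mae(y, y_hat), r2(y, y_hat))
```

A perfect prediction gives RMSE = MAE = 0 and R2 = 1; a model no better than predicting the mean gives R2 = 0, and R2 can go negative for a model worse than that, as the linear regression row of Table II shows.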
B. Linear Regression Model

Linear regression is a well-known method in machine learning and statistical analysis. It aims to establish a relationship between features, i.e., independent, explanatory, or predictor variables, and target variables, i.e., dependent or response variables. In Eq. (5), Y is written in terms of X through a functional relationship. With a single feature this is a simple regression model; with several features it becomes multivariate regression [5], which is the model developed in this paper.

Y_i = X_i,· β + ε_i = β_1 X_i,1 + ... + β_p X_i,p + ε_i        (5)

Here β is the vector of regression parameters, representing the effect size of each covariate on the response Y, ε_i is the error term, and p is the number of features. The ordinary least squares (OLS) method is widely used to estimate β.

C. Ridge Regression Model [6]

Ridge regression is a tool for analyzing data that suffer from multicollinearity, a phenomenon in which two or more variables in a multiple regression are highly linearly related. When multicollinearity is present, least squares estimates remain unbiased, but their variances are large, so the estimates can be unreliable. The Ridge regression model is similar to least squares estimation except that a Ridge penalty coefficient is added and optimized; the net effect is expected to be a more reliable estimate. The Ridge coefficients minimize a penalized residual sum of squares. A wide range of alpha values was investigated; alpha = 20 was found to be the best, giving the errors shown in Table II.

D. Lasso Regression Model [7]

The Least Absolute Shrinkage and Selection Operator (Lasso) was introduced in [7] to estimate parameters and select variables simultaneously. The Lasso model adds an L1 penalty to penalize the least squares regression. An alpha of 0.05 was used to obtain the errors shown in Table II.
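To make the contrast concrete, the sketch below fits OLS and Ridge in closed form on synthetic near-collinear data; the L2 term alpha·I is what the Ridge model adds (the paper uses alpha = 20), while Lasso instead adds an L1 penalty that requires an iterative solver.

```python
import numpy as np

def ols_fit(X, y):
    # Ordinary least squares: minimize ||y - X b||^2.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_fit(X, y, alpha):
    # Ridge: minimize ||y - X b||^2 + alpha * ||b||^2,
    # closed form b = (X^T X + alpha I)^(-1) X^T y.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 3] + 1e-2 * rng.normal(size=100)  # near-collinear pair
y = X @ np.array([1.0, 2.0, 0.5, 3.0, -3.0]) + rng.normal(size=100)

b_ols = ols_fit(X, y)
b_ridge = ridge_fit(X, y, alpha=20.0)  # alpha = 20 as in the paper
```

With alpha = 0 the Ridge solution coincides with OLS; increasing alpha shrinks the coefficient norm, which is what stabilizes the estimate under multicollinearity.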


E. Random Forest Regression Model [8]

Random forest regression (RFR) combines a collection of decision tree predictors, each tree constructed independently on a different subsample of the data. Random forest is an ensemble learning method built on bootstrap aggregation: it generates each training set by randomly drawing a part of the N original samples, where N is the size of the original training set, and uses it to construct individual trees over individual or combined features. This mitigates both the high variance and the high bias of the final prediction by reducing the correlation between the trees, whose outputs are aggregated into a cumulative decision as shown in Fig. 1. In contrast to the linear models above, the RFR model can capture non-linear interactions between features and targets. In this paper, the MAE is lowest for 225 estimators with a tree depth of 15, as shown in Table II.

Fig. 1. Random Forest model.
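A sketch of this configuration (225 trees, maximum depth 15), assuming a scikit-learn implementation and synthetic stand-in data, since the utility dataset is not public:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(269, 46))           # 269 training transformers
y_train = rng.uniform(40.0, 100.0, size=269)   # stand-in %HI targets
X_test = rng.normal(size=(48, 46))             # 48 evaluation transformers

# 225 estimators with maximum tree depth 15, as reported in the paper.
rfr = RandomForestRegressor(n_estimators=225, max_depth=15, random_state=0)
rfr.fit(X_train, y_train)
hi_pred = rfr.predict(X_test)
```

Because each prediction is an average of training targets, the forest's output always stays inside the observed %HI range, a convenient property for a bounded index.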

F. Support Vector Regression Model [9]

The Support Vector Machine (SVM) is an effective machine learning model owing to its ability to classify and to perform regression on both linear and nonlinear data. An SVM used for regression is commonly referred to as Support Vector Regression (SVR). Whereas a linear regression model simply minimizes the error rate, the SVR model approximates the best fit within a margin called the ε-tube, fitting as many instances as possible within the threshold ±ε, as shown in Fig. 2. In this paper, the RBF (Gaussian) kernel [10] is selected; its scores are given in Table II.

Fig. 2. Support Vector Machine model.
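An SVR with the RBF kernel can be sketched as below, again assuming scikit-learn; the C and ε values and the data are illustrative, since the paper reports only the kernel choice.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.normal(size=(269, 46))
y_train = rng.uniform(40.0, 100.0, size=269)
X_test = rng.normal(size=(48, 46))

# RBF/Gaussian kernel as selected in the paper; epsilon sets the +/- eps tube
# within which training errors are not penalized.
svr = SVR(kernel="rbf", C=10.0, epsilon=1.0)
svr.fit(X_train, y_train)
hi_pred = svr.predict(X_test)
```

SVR is distance-based, which is one reason the standard scaling applied during preprocessing matters for this model in particular.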


G. Deep Neural Network Model [11]

An artificial neural network emulates the function of the human brain as a simplified structure derived from biological neurons. Generally, a neural network model comprises input, hidden, and output layers, as shown in Fig. 3. Nodes in the layers imitate biological neurons and are connected by edges carrying different weights. Neural network training consists of forward propagation of the input data and back propagation of the error. In forward propagation, the original data are transferred from the input layer to the hidden layers, which extract data features by applying an activation function.

The activation function generates a non-linear decision boundary by performing non-linear combinations of the weighted inputs. The optimization algorithm in back propagation uses the predicted results from the output layer to adjust the weights of the edges. In this paper, the deep neural network model consists of two hidden layers with 500 and 100 nodes, respectively. The activation function of the input layer, as given in Eq. (6), is the Rectified Linear Unit (ReLU), as for the two hidden layers, shown in Fig. 4. Compared with the sigmoid and tanh functions, the ReLU function greatly accelerates the convergence of stochastic gradient descent [11]. In addition, the Adam optimizer [12] is used to update the weights during back propagation by computing individual learning rates for the different parameters while estimating the gradients; this optimization algorithm shows better performance in practice than other stochastic optimization methods [13].

Fig. 3. Neural network model.

f(x) = max(0, x)        (6)

Fig. 4. ReLU function model.
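The described architecture (two ReLU hidden layers of 500 and 100 nodes, Adam optimizer) can be sketched with scikit-learn's MLPRegressor; the paper does not specify its framework, and the data here are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(269, 46))
y_train = rng.uniform(40.0, 100.0, size=269)
X_test = rng.normal(size=(48, 46))

# Two hidden layers (500 and 100 nodes), ReLU activation, Adam optimizer,
# matching the configuration described in the text.
dnn = MLPRegressor(hidden_layer_sizes=(500, 100), activation="relu",
                   solver="adam", max_iter=300, random_state=0)
dnn.fit(X_train, y_train)
hi_pred = dnn.predict(X_test)
```

As with SVR, the standardized inputs from the preprocessing step help Adam converge quickly here.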
IV. RESULTS AND DISCUSSION

After evaluating all six models on the input dataset, their performance is compared to find the best model. The RMSE, MAE, and R2 of all the models are tabulated in Table II. A comparison between the %HI predicted by the models and the actual %HI from the utility's practice used as the input dataset is illustrated in Fig. 5.

TABLE II. PERFORMANCE ANALYSIS

Model             | RMSE    | MAE    | R2
Linear Regression | 41.5857 | 4.1505 | -0.96753
Ridge Regression  | 3.9778  | 3.3126 | 0.2588
Lasso Regression  | 3.9170  | 3.2963 | 0.2741
RFR               | 2.7147  | 2.0889 | 0.6484
SVR               | 3.4757  | 2.4610 | 0.4849
DNN               | 3.7347  | 3.0317 | 0.4102


According to the results, the Random Forest Regression is clearly the best model for our dataset in power transformer condition assessment, with the lowest RMSE and MAE and the highest R2; the Support Vector Regressor is the second best. The results can be observed in Fig. 6. As Fig. 7 shows, the %HI values predicted by the RFR differ only slightly from the actual trained %HI; only at a few points do the percentage errors of the predicted values spike above or below the actual values.

Fig. 5. Scatter plots of all the predicted values versus actual values.

Fig. 6. Comparison of RFR predicted values with actual values.

Fig. 7. Percentage error of RFR model.

V. CONCLUSION

In this paper, the development of a machine learning model for condition assessment of power transformers has been proposed in order to predict %HI. The historical testing data and actual %HI of the investigated transformers are used as input data for six different regression models, including a deep neural network. A total of 269 power transformers are involved in the training process, while 48 power transformers are used in the evaluation process. The Random Forest Regression is clearly the most effective model: it yields the best predictions, with the lowest root mean squared error, the lowest mean absolute error, and the highest R2 among the models.

REFERENCES
[1] R. Davies, J. Dieter, and T. McGrail, "The IEEE and asset management: A discussion paper," IEEE Power and Energy Society General Meeting, 2011, pp. 1-5.
[2] G. Montanari, "Condition monitoring and dynamic Health Index in electrical grids," International Conference on Condition Monitoring and Diagnosis (CMD), 2016, pp. 82-85.
[3] C. J. Willmott and K. Matsuura, "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance," Climate Research, vol. 30, no. 1, pp. 79-82, 2005.
[4] A. C. Cameron and F. A. Windmeijer, "An R-squared measure of goodness of fit for some common nonlinear regression models," Journal of Econometrics, vol. 77, no. 2, pp. 329-342, 1997.
[5] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis. John Wiley & Sons, 2012.
[6] A. E. Hoerl and R. W. Kennard, "Ridge regression: applications to nonorthogonal problems," Technometrics, vol. 12, no. 1, pp. 69-82, 1970.
[7] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273-282, 2011.
[8] A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18-22, 2002.
[9] H. Drucker, C. J. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, 1997, pp. 155-161.
[10] S. S. Keerthi and C.-J. Lin, "Asymptotic behaviors of support vector machines with Gaussian kernel," Neural Computation, vol. 15, no. 7, pp. 1667-1689, 2003.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[12] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[13] H. MehdipourPicha, R. Bo, H. Chen, M. M. Rana, J. Huang, and F. Hu, "Transformer fault diagnosis using deep neural network," IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia), 2019, pp. 4241-4245.
