CN109308306B

CN109308306B - User power consumption abnormal behavior detection method based on isolated forest

Info

Publication number: CN109308306B
Application number: CN201811151326.9A
Authority: CN
Inventors: 张程; 曹宇佳; 田野; 杨璨宇; 古平; 陈自郁; 陈柯芯
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2018-09-29
Filing date: 2018-09-29
Publication date: 2021-07-06
Anticipated expiration: 2038-09-29
Also published as: CN109308306A

Abstract

The invention provides a method for detecting abnormal user electricity consumption behavior based on an isolation forest, comprising the following steps: S1, acquiring electricity consumption time-series data through data acquisition; S2, cleaning the data to remove incomplete, erroneous and duplicate records; S3, statistics-based feature extraction; S4, preprocessing the data; S5, normalizing the matrix Y_{M×K} to obtain a new matrix Y'_{M×K}; S6, judging whether electricity consumption is abnormal or normal with an isolation forest model: S61, extracting ψ statistical features for each user from the new matrix Y_{M×K} and setting the number of iTrees to t, where y_ij is the element in row i and column j of the new matrix Y_{M×K}; S62, calculating the anomaly score s(y_ij, ψ) of y_ij; S63, judging whether s(y_ij, ψ) is less than 1 - Δe, where Δe is a constant in the range 0.07 to 0.22; if yes, the electricity consumption is abnormal; if not, it is normal. The method solves the prior-art problem that, because the data are not preprocessed, the subsequent computation is heavy and the analysis takes a long time to run.

Description

User power consumption abnormal behavior detection method based on isolated forest

Technical Field

The invention relates to the field of electricity consumption monitoring, and in particular to a method for detecting abnormal user electricity consumption behavior based on an isolation forest.

Background

Earlier methods for monitoring abnormal electricity consumption define a set of abnormality indicators, set a threshold for each indicator, assign each indicator a weight, and sum the weighted results to obtain an electricity-theft suspicion coefficient for each user. Common abnormality indicators fall broadly into line-loss anomalies and instantaneous-quantity anomalies. An electricity-theft identification model is designed around these anomalies, and electricity-theft users are identified by calculating their suspicion coefficients.

However, equipment failures and abnormal user consumption indicators were originally detected on site: technicians travelled to the consumption site to troubleshoot. This approach consumes manpower and material resources and is inefficient and ineffective. Even in areas where centralized meter reading has been deployed, only daily energy can be monitored, and instantaneous quantities of the metering device such as voltage, current and power cannot be acquired. The approach is also heavily subject to human factors, which hinders management in the power industry.

Chinese patent application No. CN201810104000.4 discloses a method for identifying abnormal electricity consumption behavior based on a fuzzy neural network: raw data of some users are extracted from an electricity consumption database as sample data; the data are preprocessed; an evaluation index system for abnormal consumption behavior is designed on the basis of an analysis of historical cases; expert samples are constructed from the preprocessed data; a fuzzy neural network model is built with the abnormal-behavior markers as inputs and the abnormal-consumption suspicion coefficient as output; test data are fed into the model to diagnose abnormal consumption behavior; and the diagnosis results are evaluated, target evaluations are set, and the model is optimized. That invention identifies and diagnoses abnormal consumption behavior automatically, trains and models itself with the fuzzy neural network, locates suspected users quickly and accurately, and makes it easier to gather evidence of various illegal consumption behaviors. However, because the data are not preprocessed, the subsequent computation is heavy and the running time is long, so the system is very prone to crashing.

Disclosure of Invention

The invention provides a method for detecting abnormal user electricity consumption behavior based on an isolation forest, and solves the prior-art problem that, because the data are not preprocessed, the subsequent computation is heavy and the analysis takes a long time to run.

To achieve this object, the invention adopts the following technical solution:

A method for detecting abnormal user electricity consumption behavior based on an isolation forest comprises the following steps:

S1, acquiring electricity consumption time-series data through data acquisition;

S2, cleaning the data to remove incomplete, erroneous and duplicate records;

S3, statistics-based feature extraction:

S31, data definition: S311, let the data set be X = {X_n}, n = 1, …, N, i.e. the data set contains the daily electricity data of N users, and each user's data are divided into D days, M months and Q quarters; S312, the daily electricity consumption sequence of each user: x_n = {x_nd}, d = 1, …, D; S313, the monthly electricity consumption sequence of each user: y_n = {y_nm}, m = 1, …, M;

S314, the quarterly electricity consumption sequence of each user: z_n = {z_nq}, q = 1, …, Q;

S32, the users' electricity consumption behavior is divided in time into yearly, quarterly and monthly units, and the mean, standard deviation and dispersion coefficient (coefficient of variation) sequences of each user per unit time are calculated, namely: the standard deviation D1 of each user's annual consumption; the dispersion coefficient D2 of each user's annual consumption; the standard deviations D3-D6 of quarterly consumption; the dispersion coefficients D7-D10 of quarterly consumption; the standard deviations D11-D21 of monthly consumption; the dispersion coefficients D22-D32 of monthly consumption; the rising-and-falling trends D33-D41 of monthly average consumption; the maxima D42-D43 of the difference and ratio between the mean consumption of adjacent months; the minima D44-D45 of the difference and ratio between the mean consumption of adjacent months; the maxima D46-D47 of the difference and ratio between the mean consumption of adjacent quarters; and the minima D48-D49 of the difference and ratio between the mean consumption of adjacent quarters, where D1-D49 are the statistical features;

S4, preprocessing the data: suppose that, after statistical feature processing, the original data form M samples, each an N-dimensional vector, where M is the number of users and N is the number of statistical features extracted for each user. The statistical features form an M×N matrix X, whose element x_mn is the value of the n-th statistical feature of the m-th user. A preprocessing model reduces X to an M×K matrix Y_{M×K}, with K < N;

S5, judging whether electricity consumption is abnormal or normal with an isolation forest model:

S51, extracting ψ statistical features for each user from the new matrix Y_{M×K} and setting the number of iTrees to t, where y_ij is the element in row i and column j of Y_{M×K};

S52, in the detection process, each user's statistical feature value y_ij traverses every iTree; during traversal, the path length h(y_ij) of y_ij through each iTree is computed, and finally the anomaly score s(y_ij, ψ) of y_ij is computed from all the path lengths by the following formula:

where c(ψ) is the average path length of a binary search tree and serves to normalize the result; H(ψ) is calculated as:

where γ is the Euler constant and E(h(y_ij)) is the average path length of y_ij over all iTrees in the isolation forest;

S53, judging whether s(y_ij, ψ) is less than 1 - Δe, where Δe is a constant in the range 0.07 to 0.22; if yes, the electricity consumption is abnormal; if not, it is normal.
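The formulas referenced in S52 are not reproduced in this text. For orientation, the standard isolation forest formulation, which uses the same symbols ψ, c(ψ), H and γ, is given below; that the patent uses exactly this form (in particular H(ψ - 1) inside c(ψ)) is an assumption.

```latex
s(y_{ij}, \psi) = 2^{-\frac{E(h(y_{ij}))}{c(\psi)}}, \qquad
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad
H(i) \approx \ln(i) + \gamma
```

Under this formulation, s approaches 1 when E(h(y_ij)) is much shorter than c(ψ), equals 0.5 when E(h(y_ij)) = c(ψ), and approaches 0 for long average paths.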

Compared with the prior art, the invention has the following beneficial effects:

Extracting statistical features yields effective data. Dimensionality reduction reduces the amount of data to be computed, increases computation speed and avoids crashes; at the same time, the selection condition keeps the reduced data representative, which avoids the missed detections that would result from computing on only a subset of the statistical features and ensures the precision of the judgment results.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.

Drawings

FIG. 1 is a diagram of the algorithm implementation process of the isolation forest model;

FIG. 2 is a diagram of the autoencoder network architecture;

FIG. 3 is a graph of the ReLU activation function of the autoencoder;

FIG. 4 is a diagram of the algorithm of the autoencoder's training optimization function;

FIG. 5 is a diagram of the deep autoencoder network architecture;

FIG. 6 is the network structure of the autoencoder built with Keras;

FIG. 7 is the network structure of the deep autoencoder built with Keras.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention clearer and easier to understand, the invention is further explained below with reference to the drawings and specific embodiments:

Example 1:

S1, acquiring electricity consumption time-series data through data acquisition;

S2, cleaning the data to remove incomplete, erroneous and duplicate records;

S3, statistics-based feature extraction;

S4, preprocessing the data: suppose that, after statistical feature processing, the original data form M samples, each an N-dimensional vector, where M is the number of users and N is the number of statistical features extracted for each user. The statistical features form an M×N matrix X, whose element x_mn is the value of the n-th statistical feature of the m-th user. A preprocessing model reduces X to an M×K matrix Y_{M×K}, with K < N;

S52, in the detection process, each user's statistical feature value y_ij traverses every iTree; during traversal, the path length h(y_ij) of y_ij through each iTree is computed (the traversal proceeds in the same way as in the isolation forest model, and the count is 1 when no step is taken), and finally the anomaly score s(y_ij, ψ) of y_ij is computed from all the path lengths by the following formula:

As shown in FIG. 1, the isolation forest model is obtained by the following steps:

S711, letting F denote the original data set, randomly selecting F' sample points from it and placing them as a subsample into the root node of the tree;

S712, randomly selecting a dimension q and randomly generating a split point p in the current node's data, where p lies between the maximum and minimum values of dimension q in the current node's data;

S713, using the split point p to generate a hyperplane that divides the data space of the current node into 2 subspaces: data whose value in dimension q is less than p are placed in the left subtree Fl of the current node, and data whose value is greater than or equal to p are placed in the right subtree Fr;

S714, applying steps S712 and S713 recursively in the child nodes to keep constructing new subtree nodes until a node contains only one data point or reaches the height limit, at which point splitting stops; t iTrees are obtained in this way.
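The construction in S711-S714 and the traversal in S52 can be sketched in pure Python as follows. This is a minimal sketch of the standard isolation forest, not the patent's implementation: the helper names (`c`, `build_itree`, `path_length`, `fit_iforest`, `anomaly_score`) are hypothetical, `psi` is treated as the subsample size as in the standard algorithm, the height limit ceil(log2 ψ) and the c(size) correction added at external nodes are assumptions taken from that standard formulation, and the data are synthetic.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant (gamma)


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(points, height, height_limit, rng):
    """S712-S714: recursively split on a random dimension q at a random point p."""
    if len(points) <= 1 or height >= height_limit:
        return {"size": len(points)}  # external node
    q = rng.randrange(len(points[0]))  # S712: random dimension
    lo = min(x[q] for x in points)
    hi = max(x[q] for x in points)
    if lo == hi:
        return {"size": len(points)}  # constant in this dimension, stop here
    p = rng.uniform(lo, hi)  # S712: split point between min and max
    left = [x for x in points if x[q] < p]  # S713: q < p goes left
    right = [x for x in points if x[q] >= p]  # S713: q >= p goes right
    return {
        "q": q,
        "p": p,
        "left": build_itree(left, height + 1, height_limit, rng),
        "right": build_itree(right, height + 1, height_limit, rng),
    }


def path_length(x, node, depth=0):
    """h(x): edges traversed plus c(size) for the points left in the leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_iforest(data, t=100, psi=256, seed=0):
    """S711 + S714: build t iTrees, each on a random subsample of size psi (psi >= 2)."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    height_limit = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, height_limit, rng) for _ in range(t)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """s(x, psi) = 2 ** (-E(h(x)) / c(psi)) in the standard formulation."""
    mean_h = sum(path_length(x, tree) for tree in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


# Usage on synthetic 2-D data: a dense cluster plus one far-away point.
gen = random.Random(42)
normal_points = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(200)]
outlier = [8.0, 8.0]
trees, psi = fit_iforest(normal_points + [outlier], t=100, psi=128, seed=1)
s_out = anomaly_score(outlier, trees, psi)
s_in = anomaly_score([0.0, 0.0], trees, psi)
print(f"outlier score {s_out:.3f}, inlier score {s_in:.3f}")
```

Points that are isolated after few splits get short average paths and therefore scores closer to 1; with the synthetic data above, the far-away point should score higher than the point at the cluster centre.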

In this embodiment, the preprocessing model is PCA dimension reduction.

In order to obtain more effective statistical features, the following steps are also performed after step S12:

S13, dividing electricity consumption trends into three types: the variation trend, the fluctuation trend and the rising-and-falling trend;

S14, calculating the variation trend, the fluctuation trend and the rising-and-falling trend:

S141, fluctuation trend: in statistics, the standard deviation is used to evaluate how much a sequence may vary or fluctuate; the larger the standard deviation, the wider the range over which the values fluctuate. Therefore, the standard deviation std of electricity consumption is calculated to characterize the fluctuation trend of the consumption data. The dispersion coefficient cv of electricity consumption is also calculated to measure how dispersed the user's consumption is. Letting μ be the mean consumption over a given time period:

Standard deviation of electricity consumption:

Dispersion coefficient of electricity consumption:

cv = std/μ (2.2)
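A minimal sketch of the two fluctuation features for one period follows. It assumes std is the population standard deviation (equation 2.1 is not reproduced in the text), and `fluctuation_features` is a hypothetical helper; applying it to the whole year, each quarter and each month would give features of the kind listed as D1-D32 in S32.

```python
import statistics


def fluctuation_features(consumption):
    """Return (std, cv) for one user's consumption values over a period.

    Assumption: std is the population standard deviation (divide by n);
    equation (2.1) itself is not reproduced in the text.
    """
    mu = statistics.fmean(consumption)  # mean consumption over the period
    std = statistics.pstdev(consumption)  # fluctuation trend feature
    cv = std / mu if mu != 0 else float("nan")  # equation (2.2): cv = std / mu
    return std, cv


# Hypothetical daily readings (kWh) for one user.
std, cv = fluctuation_features([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(std, cv)  # 2.0 0.4
```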

S142, variation trend: the variation trend feature measures the change in the user's electricity consumption from one period to the next, i.e. the mean consumption of a time period is compared with that of the preceding adjacent period, and the difference and the ratio reflect how fast consumption is changing. They are defined as follows:

Difference between the mean consumption of adjacent k months or k quarters:

Ratio of the mean consumption of adjacent k months or k quarters:
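The equations for the difference avg_a and the ratio avg_b are not reproduced in this text, so the sketch below assumes the natural forms (later block mean minus earlier block mean, and later over earlier) and reads k as the number of periods per block. The max/min outputs correspond to the kind of features listed as D42-D49 in S32. Both helper names are hypothetical.

```python
def block_means(values, k):
    """Means of consecutive, non-overlapping blocks of k periods (assumed grouping)."""
    n_blocks = len(values) // k
    return [sum(values[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


def adjacent_change_features(values, k=1):
    """Max/min of the difference and ratio between adjacent block means.

    Assumed forms: avg_a = later mean - earlier mean,
                   avg_b = later mean / earlier mean.
    """
    means = block_means(values, k)
    pairs = list(zip(means, means[1:]))  # (earlier, later) adjacent blocks
    diffs = [later - earlier for earlier, later in pairs]
    ratios = [later / earlier for earlier, later in pairs if earlier != 0]
    return {
        "max_diff": max(diffs),
        "max_ratio": max(ratios),
        "min_diff": min(diffs),
        "min_ratio": min(ratios),
    }


# Hypothetical monthly mean consumption (kWh) of one user, compared month to month.
features = adjacent_change_features([100.0, 120.0, 90.0, 180.0], k=1)
print(features)
```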

S143, rising-and-falling trend: this feature predicts the next consumption value from the user's consumption over several consecutive days and compares the prediction with the next actual value to judge whether consumption is likely to rise or fall. Here, the simple moving average method is used to determine the rising-and-falling trend feature: as the time series advances term by term, it computes the mean of a fixed number of the most recent terms and uses that mean as the next predicted value. Letting k be the number of moving terms and x_nt the actual value at time t, the trend feature is calculated as follows:

Predicted value at time t:

F_t = (x_n(t-1) + x_n(t-2) + … + x_n(t-k))/k (2.5)

Rising-and-falling trend at time t:

tr = x_nt - F_t (2.6)

If tr < 0, the consumption trend is falling; if tr > 0, it is rising;

where the standard deviation std of electricity consumption, the dispersion coefficient cv, the difference avg_a between the mean consumption of adjacent k months or k quarters, the ratio avg_b of the mean consumption of adjacent k months or k quarters, and the rising-and-falling trend tr at time t are all statistical feature values.
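Equations (2.5) and (2.6) can be sketched as below. The helper names are hypothetical, and the "flat" label for tr = 0 is an assumption, since the text only defines the rising and falling cases.

```python
def moving_average_trend(series, k):
    """Return tr_t = x_t - F_t for every t that has k earlier values.

    F_t is the simple moving average of the k previous values, as in
    equations (2.5) and (2.6).
    """
    if not 1 <= k < len(series):
        raise ValueError("need 1 <= k < len(series)")
    trends = []
    for t in range(k, len(series)):
        forecast = sum(series[t - k:t]) / k  # F_t, equation (2.5)
        trends.append(series[t] - forecast)  # tr, equation (2.6)
    return trends


def trend_labels(trends):
    """tr < 0 means falling, tr > 0 means rising; tr == 0 is labelled flat (assumption)."""
    return ["rising" if tr > 0 else "falling" if tr < 0 else "flat" for tr in trends]


daily = [10.0, 12.0, 11.0, 15.0, 9.0]  # hypothetical daily kWh readings
trends = moving_average_trend(daily, k=3)
print(trends, trend_labels(trends))
```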

Preferably, the PCA dimension reduction step in step S2 is as follows:

S21, subtracting the mean of each column of X, i.e. zero-centering each feature of X, to obtain X':

S22, calculating the covariance matrix C of X' according to equation (3.1), in which x_i and x_j are vectors of X';

S23, obtaining the N eigenvalues λ of the covariance matrix C and the eigenvector V corresponding to each eigenvalue λ:

CV = λV (3.2)

S24, arranging all eigenvalues λ in descending order as the queue {λ_1, …, λ_i, …, λ_N}, and arranging the eigenvectors V in the same order into an N×N matrix W, whose i-th column is the eigenvector corresponding to the i-th eigenvalue λ_i in the queue; the eigenvectors corresponding to the first K eigenvalues are taken from W to obtain an N×K matrix A_{N×K};

S25, calculating K according to equation (3.3), taking the first K value that satisfies equation (3.3):

S26, calculating equation (3.4), where Y_{M×K} is the new feature data after reduction to K dimensions;

Y_{M×K} = X_{M×N} A_{N×K} (3.4)
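Steps S21-S26 can be sketched with NumPy as follows. Equation (3.3) is not reproduced in the text, so the cumulative explained-variance `threshold` is a hypothetical stand-in; passing `k` fixes K directly, as the experiments below do with K = 32. The sketch also projects the centred X' rather than X, which is common practice but is an assumption relative to equation (3.4) as printed.

```python
import numpy as np


def pca_reduce(X, threshold=0.95, k=None):
    """Sketch of S21-S26. Returns (Y, K).

    threshold is a hypothetical cumulative-variance cut-off standing in for
    equation (3.3); pass k to fix K directly.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # S21: zero-centre every column (feature)
    C = np.cov(Xc, rowvar=False)  # S22: N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # S23: C V = lambda V (C is symmetric)
    order = np.argsort(eigvals)[::-1]  # S24: sort eigenvalues in descending order
    eigvals, W = eigvals[order], eigvecs[:, order]
    if k is None:
        ratio = np.cumsum(eigvals) / eigvals.sum()  # cumulative explained variance
        k = int(np.searchsorted(ratio, threshold)) + 1  # S25: first K reaching threshold
        k = min(k, len(eigvals))
    A = W[:, :k]  # N x K matrix A
    return Xc @ A, k  # S26: project the centred data (assumption, see lead-in)


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Three features: two almost collinear, one nearly constant.
X = np.hstack([base, 2 * base + 0.01 * rng.normal(size=(200, 1)),
               0.01 * rng.normal(size=(200, 1))])
Y, K = pca_reduce(X, threshold=0.95)
print(Y.shape, K)
```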

1. Introduction to the example: the experimental data come from a table of daily electricity consumption collected by the national power grid throughout 2015 for nearly 10,000 users. The table records, for every user, the consumption in kilowatt-hours and the total consumption meter readings of the current day and the previous day, and each user has a time series of dimension 334. A user list provides user identification information and indicates whether each numbered user is an abnormal electricity consumption user.

2. Data cleaning: cleaning the original user electricity consumption data set yields 334 valid data dimensions, covering 1394 users with abnormal electricity consumption behavior and 8562 users whose behavior is unknown; abnormal users account for 14.00%.

3. Data preprocessing:

1) Autoencoder-based data preprocessing: the cleaned data set is preprocessed with an autoencoder and with a deep autoencoder. The data are first normalized so that every feature dimension lies in [0, 1]; then, following the designed autoencoder network models, the network layer structures of the two autoencoders are built with Keras, the TensorFlow-based neural network tool, as shown in FIG. 4. The middle layer uses the ReLU activation function, the training optimizer is Adadelta, the training loss function is binary cross-entropy, and the number of training iterations is 100.
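The [0, 1] normalization described above can be sketched as per-column min-max scaling. This is an assumption: the text states only the target range, and `min_max_normalize` is a hypothetical helper.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column (feature dimension) of X into [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns become all zeros
    return (X - col_min) / col_range


X = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]  # hypothetical: 3 users, 2 features
print(min_max_normalize(X))
```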

The data are preprocessed with the autoencoder model and the deep autoencoder model built above; after 100 training iterations both models stabilize, with loss values of 0.0313 and 0.0311, respectively.

After preprocessing, the dimensionality of the raw data is compressed to 32. To test the effectiveness and performance of the autoencoder-based preprocessing intuitively, the preprocessed data set is mapped onto a two-dimensional plane for visual inspection, as shown in FIG. 6.

White points represent users with no suspicion of abnormal electricity consumption, and red points represent users with abnormal electricity consumption behavior. On the one hand, most white points cluster near (0, 0) and spread little, whereas most red points spread outward markedly and tend to deviate from the bulk of the data set, showing the characteristics of outliers. On the other hand, the abnormal points preprocessed by the deep autoencoder are more dispersed than those preprocessed by the autoencoder. The two types of points are analyzed with the similarity metric defined by the similarity function (equation 7), with α = 0.1; the results are shown in Table 1.

Here dist is a distance function: when two data samples are similar, dist approaches 0 and Lp approaches 1; otherwise Lp approaches 0.

Table 1 Comparison of similarity measures for the autoencoder results (α = 0.1)

In the experiment, dist is computed as the average Euclidean distance between data points of the same type. The table shows that the Lp values of normal data points are far greater than those of abnormal data points: normal users are highly similar to one another, meaning their distribution is concentrated, whereas users with abnormal consumption behavior lie far apart, meaning their data are widely dispersed. Moreover, comparing the two preprocessing models, normal user data trained with the deep autoencoder have larger Lp values and are more aggregated, while abnormal user data have smaller Lp values and are more dispersed. Therefore, in this part of the experiments, deep-autoencoder-based preprocessing performs better than the conventional autoencoder when applied to detecting abnormal electricity consumption data.
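The average within-class Euclidean distance used for dist can be sketched as below. Equation (7) itself is not reproduced in the text, so `similarity_lp` uses exp(-α·dist) purely as a hypothetical stand-in that has the stated properties (1 at dist = 0, tending to 0 as dist grows); both helper names and the example points are hypothetical.

```python
import numpy as np


def mean_pairwise_distance(points):
    """Average Euclidean distance between all distinct pairs in one class."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    if n < 2:
        raise ValueError("need at least two points")
    diffs = P[:, None, :] - P[None, :, :]  # n x n x d differences
    dist = np.sqrt((diffs ** 2).sum(axis=-1))  # n x n distance matrix
    return dist.sum() / (n * (n - 1))  # diagonal is zero, so this averages i != j


def similarity_lp(dist, alpha):
    """Hypothetical stand-in for equation (7): 1 at dist = 0, tends to 0 as dist grows."""
    return float(np.exp(-alpha * dist))


normal_cls = [[0.0, 0.0], [0.3, 0.4], [0.0, 0.5]]  # hypothetical tight cluster
abnormal_cls = [[0.0, 0.0], [3.0, 4.0], [-4.0, 3.0]]  # hypothetical spread-out class
for name, cls in [("normal", normal_cls), ("abnormal", abnormal_cls)]:
    d = mean_pairwise_distance(cls)
    print(name, round(d, 3), round(similarity_lp(d, alpha=0.1), 3))
```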

Data preprocessing based on principal component analysis: linear PCA-based preprocessing is applied to the cleaned data set. The principal components are sorted in descending order, and the feature space corresponding to the first 32 principal components is selected to compute the new feature dimensions, which facilitates comparative analysis.

The original data are preprocessed with the linear PCA-based dimensionality reduction method, the eigenvectors corresponding to the first 32 principal components are selected, and the original data are mapped into a new 32-dimensional feature space. The first 32 principal components are chosen so that all preprocessing methods produce results of the same dimension.

White points represent users with no suspicion of abnormal electricity consumption, and red points represent users with abnormal electricity consumption behavior. First, the PCA-preprocessed data all tend to spread outward from an aggregation point, with the white points relatively concentrated and the red points relatively more dispersed. However, the white and red points still overlap to a large extent after PCA preprocessing, so this method does not separate the two types of data clearly.

The two types of data points are analyzed with the similarity metric defined by equation (7), with α = 0.03; the results are shown in the following table.

Table 2 Comparison of similarity measures for the principal component analysis results (α = 0.03)

In the experiment, dist is again computed as the average Euclidean distance between data points of the same type. As the table shows, the PCA-based approach works well.

Establishing the isolation forest model: the new data sets obtained with the four data preprocessing methods are displayed in two dimensions to compare the effects of the different preprocessing methods.

Next, for the four data preprocessing methods used with the isolation forest model, the resulting confusion matrices, Precision-Recall indices and P-R curves are shown in Table 3 and Table 4.

TABLE 3 Confusion matrix results of the isolation forest model under different preprocessing methods

TABLE 4 Precision-Recall indices and overall precision for abnormal data

First, the confusion matrices and Precision-Recall results above show that the isolation-forest-based anomaly detection model achieves high overall accuracy under all of the preprocessing models, while the choice of preprocessing method affects detection performance in different ways. For users with abnormal electricity consumption behavior, the Precision and Recall of the model based on the deep autoencoder are higher than those based on the autoencoder by 0.07 and 0.14, respectively, so it performs better than the autoencoder method. Linear PCA preprocessing also improves anomaly detection more than the autoencoder method, with Precision and Recall higher by 0.05 and 0.04, but its improvement is smaller than that of the deep autoencoder.
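The indices compared above follow directly from a confusion matrix with abnormal users as the positive class. The sketch below uses hypothetical counts, not the values of Table 3, and `detection_metrics` is a hypothetical helper.

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision and recall for the abnormal class, plus overall accuracy.

    tp: abnormal users flagged as abnormal   fp: normal users flagged as abnormal
    fn: abnormal users missed                tn: normal users correctly passed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Hypothetical confusion-matrix counts, not the values of Table 3.
p, r, acc = detection_metrics(tp=80, fp=20, fn=40, tn=860)
print(f"precision={p:.3f} recall={r:.3f} accuracy={acc:.3f}")
```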

Example 2:

This example differs from Example 1 only in that the preprocessing model is changed: an autoencoder is used in this embodiment.

First, a conventional single-hidden-layer autoencoder model, which is a fully connected neural network, is built as shown in FIG. 2.

In FIG. 2, the first half of the model serves as the encoder and the second half as the decoder. The model takes the 334 feature dimensions obtained by cleaning the raw data as both its input and its output, i.e. the input layer and the output layer have the same number of neurons. The middle layer has 32 nodes, fewer than the input and output layers, so it compresses the data.

Next, the relevant parameters of the autoencoder model are configured. The middle layer of the network uses the ReLU activation function, whose graph is shown in FIG. 3; its basic mathematical form is:

f(x) = max(0, w^T x + b) (5.1)

Compared with the traditional sigmoid activation function, ReLU has two advantages as a nonlinear function. First, because its gradient is constant on the non-negative interval, applying it in deep networks avoids the vanishing-gradient and exploding-gradient problems, which keeps the model's convergence speed stable. Second, ReLU needs only a threshold to obtain the activation value, without a series of complex operations, which simplifies computation.

The Adadelta gradient descent method is adopted as the training optimizer of the model; it is an adaptive learning-rate optimization method that can converge faster when training deep, complex networks. Its calculation process is shown in FIG. 4.

The loss function selected for the model is binary cross-entropy, i.e. the logarithmic loss function, which is mainly used for maximum likelihood estimation; its formula is given in (5.2). Finally, the number of training iterations is set to 100.

L(Y, P(Y|X)) = -log P(Y|X) (5.2)
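The forward pass of the 334-32-334 autoencoder and the log loss of equation (5.2) can be sketched in NumPy as follows. This is not the Keras implementation of FIG. 6 and does not train the model (Adadelta is not shown). The sigmoid output layer, the He-style weight initialization, the helper names and the 128-node layers of the deeper variant are all assumptions; the text fixes only the 334 input/output nodes, the 32-node code layer and ReLU on the hidden layers.

```python
import numpy as np


def init_autoencoder(layer_sizes, seed=0):
    """One (W, b) pair per fully connected layer, e.g. [334, 32, 334]."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))  # He-style init (assumption)
        params.append((W, np.zeros(n_out)))
    return params


def forward(params, X):
    """ReLU on every hidden layer (equation 5.1); sigmoid output (assumption)."""
    h = X
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(0.0, z)  # f(x) = max(0, w^T x + b)
        else:
            h = 1.0 / (1.0 + np.exp(-z))  # keeps reconstructions in (0, 1)
    return h


def binary_cross_entropy(X, X_hat, eps=1e-7):
    """Mean log loss between normalized inputs X and reconstructions X_hat."""
    X_hat = np.clip(X_hat, eps, 1.0 - eps)
    return float(-np.mean(X * np.log(X_hat) + (1.0 - X) * np.log(1.0 - X_hat)))


rng = np.random.default_rng(1)
X = rng.random((5, 334))  # 5 hypothetical users, features already scaled to [0, 1]
shallow = init_autoencoder([334, 32, 334])  # Example 2: one 32-node hidden layer
deep = init_autoencoder([334, 128, 32, 128, 334])  # Example 3 sketch: 128 is hypothetical
for params in (shallow, deep):
    X_hat = forward(params, X)
    print(X_hat.shape, binary_cross_entropy(X, X_hat))
```

The 32-dimensional code used as preprocessed features would be the ReLU output of the encoder half, e.g. `np.maximum(0.0, X @ W + b)` with the first layer's `(W, b)` in the shallow model.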

The software implementation is shown in FIG. 6.

Example 3:

This example differs from Example 2 only in that a hidden layer is added to the autoencoder of Example 2.

The previous autoencoder model has only a single hidden layer; here a deeper autoencoder model is built for the data to be processed, with the network structure shown in FIG. 5:

The basic configuration parameters are the same as those of the previous model: the training optimizer is Adadelta, the loss function is binary cross-entropy, the number of training iterations is 100, and the intermediate encoding and decoding layers use the ReLU activation function. The software implementation is shown in FIG. 7.

Finally, the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope, and all such modifications shall be covered by the claims of the invention.

Claims

1. A method for detecting abnormal user electricity consumption behavior based on an isolation forest, characterized by comprising the following steps:

S1, acquiring electricity consumption time-series data through data acquisition;

S2, cleaning the data to remove incomplete, erroneous and duplicate records;

S3, statistics-based feature extraction:

S31, data definition: S311, let the data set be X = {X_n}, n = 1, …, N, i.e. the data set contains the daily electricity data of N users, and each user's data are divided into D days, M months and Q quarters; S312, the daily electricity quantity sequence of each user: x_n = {x_nd}, d = 1, …, D; S313, the monthly electricity consumption sequence of each user: y_n = {y_nm}, m = 1, …, M;

S32, the users' electricity consumption behavior is divided in time into yearly, quarterly and monthly units, and the mean, standard deviation and dispersion coefficient sequences of each user per unit time are calculated, namely: the standard deviation D1 of each user's annual consumption; the dispersion coefficient D2 of each user's annual consumption; the standard deviations D3-D6 of quarterly consumption; the dispersion coefficients D7-D10 of quarterly consumption; the standard deviations D11-D21 of monthly consumption; the dispersion coefficients D22-D32 of monthly consumption; the rising-and-falling trends D33-D41 of monthly average consumption; the maxima D42-D43 of the difference and ratio between the mean consumption of adjacent months; the minima D44-D45 of the difference and ratio between the mean consumption of adjacent months; the maxima D46-D47 of the difference and ratio between the mean consumption of adjacent quarters; and the minima D48-D49 of the difference and ratio between the mean consumption of adjacent quarters, where D1-D49 are the statistical features;

S52, in the detection process, each user's statistical feature value y_ij traverses every iTree; during traversal, the path length h(y_ij) of y_ij through each iTree is computed, and finally the anomaly score s(y_ij, ψ) of y_ij is computed from all the path lengths by the following formula:

where γ is the Euler constant and E(h(y_ij)) is the average path length of y_ij over all iTrees in the isolation forest;

2. The isolation-forest-based method for detecting abnormal user electricity consumption behavior as claimed in claim 1, wherein the isolation forest model is obtained by the following steps:

S711, letting F denote the original data set, randomly selecting F' sample points from it and placing them as a subsample into the root node of the tree;

S713, using the split point p to generate a hyperplane that divides the data space of the current node into 2 subspaces: data whose value in dimension q is less than p are placed in the left subtree Fl of the current node, and data whose value is greater than or equal to p are placed in the right subtree Fr;

S714, applying steps S712 and S713 recursively in the child nodes to keep constructing new subtree nodes until a node contains only one data point or reaches the height limit, at which point splitting stops; t iTrees are obtained in this way.

3. The isolation-forest-based method for detecting abnormal user electricity consumption behavior as claimed in claim 1, wherein in step S4 the preprocessing model is an autoencoder, a deep autoencoder or PCA dimensionality reduction.

4. The isolation-forest-based method for detecting abnormal user electricity consumption behavior as claimed in claim 3, wherein the following steps are further performed after step S12:

S14, calculating the variation trend, the fluctuation trend and the rising-and-falling trend:

Standard deviation of electricity consumption:

Dispersion coefficient of electricity consumption:

cv = std/μ (2.2)

S142, variation trend: the variation trend feature measures the change in the user's electricity consumption from one period to the next, i.e. the mean consumption of a time period is compared with that of the preceding adjacent period, and the difference and the ratio reflect how fast consumption is changing. They are defined as follows:

Ratio of the mean consumption of adjacent k months or k quarters:

Predicted value at time t:

F_t = (x_n(t-1) + x_n(t-2) + … + x_n(t-k))/k (2.5)

Rising-and-falling trend at time t:

tr = x_nt - F_t (2.6)

where the standard deviation std of electricity consumption, the dispersion coefficient cv, the difference avg_a between the mean consumption of adjacent k months or k quarters, the ratio avg_b of the mean consumption of adjacent k months or k quarters, and the rising-and-falling trend tr at time t are all statistical feature values.

5. The isolation-forest-based method for detecting abnormal user electricity consumption behavior as claimed in claim 4, wherein in step S2 the PCA dimensionality reduction comprises the following steps:

S22, calculating the covariance matrix C of X' according to equation (3.1), in which x_i and x_j are vectors of X';

CV = λV (3.2)

Y_{M×K} = X_{M×N} A_{N×K} (3.4).