Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for quantitative prediction of power-saving potential.
The purpose of the invention can be achieved by the following technical scheme:
A method for quantitative prediction of power-saving potential, the method comprising the steps of:
S1: extracting the electricity consumption data of industrial users, obtaining electricity consumption characteristic indexes from the users' electricity consumption data, and dividing the users into electricity consumption groups through cluster analysis;
S2: establishing a power-saving potential prediction model, selecting a benchmark within the same electricity consumption group, and inputting the electricity consumption of the benchmark into the power-saving potential prediction model to obtain a predicted value of future power-saving potential.
Preferably, when the electricity consumption data of the industrial users are extracted in step S1, power data are collected at 15-minute intervals, giving 96 points per day.
Preferably, the electricity consumption characteristic indexes in step S1 include daily average electricity consumption, daily average peak electricity consumption, daily average valley electricity consumption, peak-valley electricity ratio, and daily average load factor.
Preferably, dividing the electricity consumption groups through cluster analysis in step S1 specifically comprises the steps of:
S101: adaptively selecting the optimal number of clusters using a classification accuracy index;
S102: selecting initial values for the k cluster centers μ_k;
S103: assigning each data point to the cluster represented by the center point nearest to it;
S104: computing the new center point μ_k of each cluster, and repeating S103 until the maximum number of steps is reached or the difference between successive values of the clustering criterion function is smaller than a set threshold.
Preferably, establishing the power-saving potential prediction model in step S2 specifically comprises the steps of:
S201: taking historical weather and social-factor data as input and the daily power-saving potential of a target user as output to form a sample data set D;
S202: forming training sets and test sets by cross-validation, and establishing the power-saving potential prediction model using the XGBoost decision tree algorithm.
Preferably, establishing the power-saving potential prediction model using the XGBoost decision tree algorithm specifically comprises: setting the XGBoost model parameters, including basic parameters and training parameters; training the model; after training, evaluating the prediction accuracy of the model on the test set; and, if the prediction accuracy does not meet the set value, resetting the parameters and training again until the accuracy requirement is met.
Preferably, forming the training sets and test sets by cross-validation in step S202 specifically comprises: dividing the data set D into ten equal parts D1 to D10; taking D1 as the test set and D2 to D10 as the training set and calculating the evaluation index; then taking D2 as the test set and D1 and D3 to D10 as the training set; and so on, so that the evaluation index of the prediction model is calculated in every iteration.
Compared with the prior art, the invention has the following advantages:
1. By predicting the power-saving potential, users can be alerted in advance when their future power-saving potential is high, attention can be paid to their electricity consumption habits, and users with high consumption can be urged to carry out power-saving retrofits in time;
2. The power-saving potential is quantified, so electricity consumption behavior is guided more intuitively;
3. The model is trained on historical meteorological and social-factor data and evaluated by cross-validation, so the prediction results are accurate, and good prediction accuracy can be achieved using forecast data for only the few most sensitive factors.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
This embodiment discloses a method for quantitative prediction of power-saving potential. It applies big-data techniques and provides a complete method for analyzing user power saving, consisting mainly of three modules. First, industry groups are divided: the electricity consumption characteristic indexes of all users in an industry are calculated, and cluster analysis divides the users into typical electricity consumption groups that are similar in scale and comparable with one another. Second, further comparative analysis is performed within each group: the score of each index is calculated for every user in the group, the power-saving potential of each user is quantitatively evaluated against the average level of the group, and the power-saving potential is then predicted by modeling each user on historical data and forecasting the power-saving potential for the next five days. Finally, the factors influencing the user's power-saving potential and the corresponding power-saving strategies are analyzed in terms of the user's typical electricity consumption behavior, the electricity price, and environmental perception, so as to guide the user's electricity consumption behavior. The specific flow is shown in figure 1:
Industry group division
1. Electricity consumption characteristic index extraction
In addition to traditionally collected data such as peak-valley electricity and daily frozen electricity readings, large industrial and commercial users and some residential users in East China with power consumption above 100 kW have begun to collect power data at 15-minute intervals, giving 96 points per day. The electricity consumption characteristics are described by the following indexes:
$X_k = \{\alpha_1, \dots, \alpha_u;\ \beta_1, \dots, \beta_\nu;\ f_1, \dots, f_l\}, \quad k = 1, 2, \dots, m$
where α denotes the daily 96-point power time series, β denotes the monthly electricity consumption time series, f denotes the non-time-series evaluation parameters (daily average electricity consumption, daily average peak electricity consumption, daily average valley electricity consumption, peak-valley electricity ratio, and daily average load factor), and m denotes the number of samples.
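As an illustration only, the non-time-series indicators f could be derived from the 96-point data roughly as follows. The peak window (08:00 to 22:00), the load factor definition (mean power divided by maximum power), and the function name `daily_features` are assumptions made for this sketch; the source does not specify them.

```python
from statistics import mean

INTERVAL_HOURS = 0.25          # 15-minute sampling interval
POINTS_PER_DAY = 96

# Assumed tariff periods (hypothetical): peak 08:00-22:00, valley otherwise.
PEAK_SLOTS = range(32, 88)     # slot index = hour * 4


def daily_features(days):
    """Compute the indicators f from a list of days, each day being a
    list of 96 power readings in kW (with a non-zero maximum)."""
    totals, peaks, valleys, load_factors = [], [], [], []
    for day in days:
        if len(day) != POINTS_PER_DAY:
            raise ValueError("each day must contain 96 readings")
        total_kwh = sum(day) * INTERVAL_HOURS
        peak_kwh = sum(day[i] for i in PEAK_SLOTS) * INTERVAL_HOURS
        totals.append(total_kwh)
        peaks.append(peak_kwh)
        valleys.append(total_kwh - peak_kwh)
        # Assumed definition: average power divided by maximum power.
        load_factors.append(mean(day) / max(day))
    avg_peak, avg_valley = mean(peaks), mean(valleys)
    return {
        "daily_avg_kwh": mean(totals),
        "daily_avg_peak_kwh": avg_peak,
        "daily_avg_valley_kwh": avg_valley,
        "peak_valley_ratio": avg_peak / avg_valley,
        "daily_avg_load_factor": mean(load_factors),
    }


# A constant 100 kW load for one day: 2400 kWh in total, 1400 kWh at peak.
features = daily_features([[100.0] * 96])
```

Days with missing readings would need to be filtered or repaired before being passed to such a function.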
2. Division of industry electricity consumption groups
On the basis of the extracted electricity consumption characteristic indexes, users in the same industry are clustered with a K-means algorithm based on statistical distance. This distance is chosen because the quality of the time series data collected on site is uneven, with points occasionally missing or abnormal. The algorithm takes a parameter K and divides the n input data objects into K clusters such that objects within the same cluster are highly similar and objects in different clusters are dissimilar. The specific steps are as follows:
1) The optimal number of clusters is selected adaptively using a classification accuracy index (DBI), calculated as
$I_{DBI} = \frac{1}{K}\sum_{i=1}^{K}\max_{j \neq i}\frac{C_i + C_j}{\lVert w_i - w_j \rVert}$
where $C_i$ and $C_j$ denote the mean intra-class distances of classes i and j, and $\lVert w_i - w_j \rVert$ denotes the distance between the cluster centers $w_i$ and $w_j$.
2) Select the initial values of the K centers $\mu_k$. This is usually done with a problem-specific heuristic or, in most cases, at random. Because K-means is only guaranteed to reach a local optimum, whether it converges to the global optimum depends heavily on the choice of initial values.
3) Assign each data point to the cluster represented by the center point nearest to it. The distance used is not the Euclidean distance but a statistical distance, defined as $d_{ij} = (e_{ij}^2 + s_{ij}^2)^{0.5}$, where $d_{ij}$ is the distance between curve i and curve j (that is, between data point i and the center point of class j), $e_{ij}$ is the horizontal distance between data point i and the center point of class j, and $s_{ij}$ is the vertical distance between data point i and the center point of class j.
4) Calculate the new center point of each cluster by the formula
$\mu_k = \frac{\sum_{n=1}^{N}\tau_{nk}x_n}{\sum_{n=1}^{N}\tau_{nk}}$
5) Repeat steps 3) and 4) until the maximum number of iterations is reached or the difference between the values of the clustering criterion J before and after an iteration is less than a threshold, where
$J = \sum_{n=1}^{N}\sum_{k=1}^{K}\tau_{nk}\lVert x_n - \mu_k \rVert^2$
Here $\tau_{nk}$ is the membership coefficient of data point n in cluster k, equal to 1 when data point n is assigned to cluster k and 0 otherwise; N is the number of data points; $x_n$ is the sample value; and $\mu_k$ is the center point value.
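The clustering steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how the horizontal and vertical terms of the statistical distance are obtained from load curves, so `stat_dist` simply treats each sample as a 2-D point; the farthest-point initialization is one possible heuristic; and the DBI is assumed to take the usual Davies-Bouldin form. All function names are hypothetical.

```python
import math


def stat_dist(p, c):
    """Statistical distance d = sqrt(e^2 + s^2).

    Assumption: e and s are the horizontal and vertical offsets between
    two 2-D points; the source does not define them further.
    """
    e, s = p[0] - c[0], p[1] - c[1]
    return math.sqrt(e * e + s * s)


def init_centers(points, k, dist):
    """Step 2 (heuristic choice): start from the first point, then keep
    adding the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, dist=stat_dist, max_steps=100, tol=1e-9):
    centers = init_centers(points, k, dist)
    prev_j = None
    for _ in range(max_steps):
        # Step 3: assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Step 4: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
        # Step 5: stop once the criterion J stops changing.
        j_value = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        if prev_j is not None and abs(prev_j - j_value) < tol:
            break
        prev_j = j_value
    return labels, centers


def dbi(points, labels, centers, dist=stat_dist):
    """Davies-Bouldin index (lower is better); needs at least 2 distinct centers."""
    k = len(centers)
    scatter = []  # mean distance of each cluster's members to its center
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(dist(p, centers[i]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def best_k(points, candidates):
    """Step 1: pick the candidate cluster number with the smallest DBI."""
    scores = {}
    for k in candidates:
        labels, centers = kmeans(points, k)
        scores[k] = dbi(points, labels, centers)
    return min(scores, key=scores.get), scores


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]
k_opt, scores = best_k(points, [2, 3, 4])
```

On these three well-separated groups the sketch selects K = 3, since splitting or merging a group raises the index.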
Power saving potential quantitative evaluation and prediction
1. Power saving potential quantitative evaluation
After the industry electricity consumption groups are obtained, a power-saving benchmark is selected within each group: the group mean of the daily average electricity consumption index is taken as the power-saving benchmark of the group. The power-saving potential of a target user is then obtained as the difference between the target user's daily electricity consumption and the benchmark.
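A small worked example of this evaluation is given below. The user names and consumption figures are made up for illustration, and the sign convention (user consumption minus benchmark, so a positive value means room to save) is an assumption about how the difference is taken.

```python
from statistics import mean


def power_saving_potential(group_daily_kwh, target):
    """Potential of `target` relative to the group benchmark.

    group_daily_kwh maps each user in one electricity consumption group
    to that user's daily average consumption in kWh.
    """
    benchmark = mean(group_daily_kwh.values())  # group mean as benchmark
    return group_daily_kwh[target] - benchmark


# Hypothetical group of three users.
group = {"user_a": 900.0, "user_b": 1000.0, "user_c": 1400.0}
potential_c = power_saving_potential(group, "user_c")  # 1400 - 1100 = 300 kWh
potential_a = power_saving_potential(group, "user_a")  # 900 - 1100 = -200 kWh
```

Under this convention a negative value indicates a user who already consumes less than the group benchmark.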
2. Short term power saving potential prediction
Historical weather and social-factor data are taken as input and the daily power-saving potential of the target user as output, forming a sample data set D. Training and test sets are formed by cross-validation, and the power-saving potential prediction model is established with the XGBoost algorithm. The model uses 10-fold cross-validation: the data set D is divided into ten equal parts D1 to D10; D1 is used as the test set and D2 to D10 as the training set, and the evaluation indexes are calculated; then D2 is used as the test set and D1 and D3 to D10 as the training set; and so on. The model is thus evaluated in every iteration. Once the model is established, the short-term future power-saving potential can be predicted from the high-precision numerical weather forecast for the prediction day together with social information such as whether that day is a working day.
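The 10-fold split described above can be sketched with index lists as follows. The function name `k_fold_splits` is hypothetical, and the sketch keeps samples in their original order; whether the data set is shuffled first is not stated in the source.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds.

    Fold i (D1, D2, ...) serves once as the test set while the
    remaining folds together form the training set.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(100, k=10))
first_train, first_test = splits[0]   # D1 as test set, D2..D10 as training set
```

An evaluation index would then be computed once per split, for instance by training on `first_train` and scoring on `first_test`, and repeated for the other nine splits.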
A decision tree model is built by training trees on the training set (true sample values), and a regularization term is added to limit the complexity of the model and prevent overfitting. The resulting decision tree model can then be used to make predictions on actual observations.
The prediction model is established as follows: set the XGBoost model parameters, including basic parameters and training parameters such as the base classifier, the number of threads, the maximum tree depth, the learning rate, and the number of iterations; train the model; evaluate the trained model; if the prediction accuracy meets the requirement, save the trained model, otherwise reset the parameters and train again until the prediction accuracy meets the requirement. The trained model is saved as a file in a given format, for example an XGBoost model file, and can then predict the power-saving potential value from the input data.
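The "train, evaluate, reset parameters, retrain" loop can be outlined as below. To keep the sketch self-contained, the actual XGBoost training and scoring are replaced by caller-supplied stand-ins: `train_fn`, `score_fn`, `tune_until_accurate`, the parameter grid, and the accuracy figures are all hypothetical and only illustrate the control flow.

```python
from itertools import product


def tune_until_accurate(train_fn, score_fn, grid, target):
    """Try parameter settings in turn until the accuracy target is met.

    train_fn(params) returns a trained model; score_fn(model) returns its
    prediction accuracy on the test set. Both are supplied by the caller.
    """
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        model = train_fn(params)          # train with the current parameters
        score = score_fn(model)           # evaluate after training
        if score >= target:
            return params, model, score   # requirement met: keep this model
    raise RuntimeError("no parameter setting met the accuracy requirement")


# Stand-ins for real training and evaluation (made-up accuracies).
fake_accuracy = {3: 0.72, 4: 0.81, 5: 0.86}
grid = {"max_depth": [3, 4, 5], "learning_rate": [0.1]}
params, model, score = tune_until_accurate(
    train_fn=lambda p: dict(p),                       # "model" is just its params
    score_fn=lambda m: fake_accuracy[m["max_depth"]],
    grid=grid,
    target=0.8,
)
```

In a real pipeline `train_fn` would fit an XGBoost model and the returned model would be written to disk once the requirement is satisfied.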
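The XGBoost model prediction function (the optimal weight of leaf node j) is as follows:
$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$
where $g_i$ is the first-order gradient of the loss for sample i, $h_i$ is the second-order gradient, which acts as the node weight, $I_j$ is the set of samples falling in leaf j, and $\lambda$ is a constant term.
The objective function is:
$Obj = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$
In Obj, the first term is the second-order expression obtained after Taylor expansion of the loss function, and $\gamma T$ is the penalty term defining the complexity of the tree model; $G_j = \sum_{i \in I_j} g_i$ is the gradient sum of leaf j, $H_j = \sum_{i \in I_j} h_i$ is the weight of the node, $\lambda$ is a constant term, and T is the number of nodes.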
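For a concrete feel of these quantities, the sketch below evaluates the standard XGBoost closed forms, assumed here to be the leaf weight $-G_j/(H_j+\lambda)$ and the objective $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda) + \gamma T$, on a toy tree with two leaves. It assumes a squared-error loss $\frac{1}{2}(y - \hat{y})^2$ with an initial prediction of 0, so that $g_i = \hat{y}_i - y_i = -y_i$ and $h_i = 1$; the function names and numbers are illustrative only.

```python
def leaf_weight(g, h, lam):
    """Optimal leaf weight: -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves, lam, gamma):
    """Objective value of a tree: -1/2 * sum(G^2 / (H + lambda)) + gamma * T.

    leaves is a list of (g_values, h_values) pairs, one pair per leaf.
    """
    score = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        score += -0.5 * G * G / (H + lam)
    return score + gamma * len(leaves)


# Leaf 1 holds targets 2 and 4, leaf 2 holds target 10 (prediction starts at 0).
leaf1 = ([-2.0, -4.0], [1.0, 1.0])
leaf2 = ([-10.0], [1.0])

w1 = leaf_weight(*leaf1, lam=1.0)     # 6 / 3 = 2.0
w2 = leaf_weight(*leaf2, lam=1.0)     # 10 / 2 = 5.0
obj = structure_score([leaf1, leaf2], lam=1.0, gamma=0.5)  # -6 - 25 + 1 = -30.0
```

The constant λ shrinks each leaf weight toward zero (the unregularized weights would be 3 and 10), while γ charges a fixed cost for every additional node.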
Influencing factors of power saving
Ensuring the forecast accuracy of the factors to which the power-saving potential is most sensitive is the key to improving the accuracy of power-saving potential prediction. When forecast data sources such as weather are incomplete, good prediction accuracy can still be achieved by providing forecast data for only the few most sensitive factors. By predicting the power-saving potential, users can be alerted in advance when their future power-saving potential is high, attention can be paid to their electricity consumption habits, and users with high consumption can be urged to carry out power-saving retrofits in time.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.