CN106251027B

CN106251027B - Electric load probability density forecasting method based on fuzzy support vector quantile regression

Info

Publication number: CN106251027B
Application number: CN201610682457.4A
Authority: CN
Inventors: 何耀耀; 刘瑞; 李海燕; 王刚; 郑丫丫; 秦杨; 严煜东
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2016-08-17
Filing date: 2016-08-17
Publication date: 2018-05-01
Anticipated expiration: 2036-08-17
Also published as: CN106251027A

Abstract

The invention relates to a power load probability density prediction method based on fuzzy support vector quantile regression. First, the daily maximum load data and average temperature data preceding the prediction date are collected, and a training set and a test set are constructed from this historical data. Then, the training set is used to obtain the Lagrange multipliers and support vector indices of the fuzzy support vector quantile regression model; the prediction model is established from the resulting parameter values, and the test set is substituted into the model to obtain the predicted values. Finally, the predicted values obtained at different quantiles are passed to kernel density estimation to produce a probability density prediction of the daily maximum load. The invention can effectively reduce forecasting errors, improve power load forecasting accuracy, and provide a relatively reliable basis for power system dispatching departments to adjust power consumption plans and optimize generator output.

Description

Electric Load Probability Density Forecasting Method Based on Fuzzy Support Vector Quantile Regression

Technical Field

The invention belongs to the field of power load forecasting that combines statistical methods with intelligent computing, and mainly relates to a power load probability density prediction method based on fuzzy support vector quantile regression.

Background

Power system load forecasting uses historical data on power load, the economy, society, weather and so on to explore how past load patterns influence future load, and to identify the intrinsic relationships between power load and its related factors, so that future load can be predicted scientifically. Accurate load forecasting is of great significance for power system dispatching, power consumption, planning, power purchase planning and the arrangement of operating modes. This in turn requires power system researchers to propose more effective methods for improving prediction accuracy.

In recent years, smart grids have developed rapidly and uncertain factors have multiplied, while changes in power load are constrained by many factors such as system operating characteristics, social factors and natural conditions. Load forecasting therefore requires a large amount of data, and the accuracy and reliability of that data is hard to guarantee; even when the data obtained is accurate, uncertainty remains. For example, temperature may cause the load to change. The power system therefore requires more intelligent methods to improve the accuracy of load forecasting. In addition, to improve the safety and economy of grid operation and the quality of power supply, power system operation and dispatching place ever higher demands on load forecasting accuracy.

In recent years, traditional power system load forecasting has considered only the historical load and the factors that affect the forecast, and has failed to handle the uncertainty of those factors. The main shortcomings are:

(1) Many factors affect power load forecasting, including the date, meteorological factors and temperature. How these factors affect forecasting accuracy is uncertain or fuzzy, yet traditional load forecasting methods do not preprocess these uncertain factors. That is, the historical data used in prediction are all treated as definite values, although this data is partly shaped by chance, so the uncertainty of the historical data is ignored;

(2) Traditional power load forecasting methods give only point forecasts or interval forecasts, which cannot accurately characterize the volatility of the power load, and they do not give the probability density distribution of the forecasts obtained.

Summary of the Invention

The purpose of the present invention is to overcome the above deficiencies of the prior art by proposing a power load probability density prediction method based on fuzzy support vector quantile regression. The method fuzzifies the average temperature factor and thereby greatly reduces the uncertainty of the influencing factors, so as to effectively reduce the forecast error, improve the accuracy of power load forecasting, and provide a more reliable basis for power system dispatching departments to adjust power consumption plans and optimize generator output.

The power load probability density prediction method based on fuzzy support vector quantile regression according to the present invention is characterized in that it proceeds as follows:

Step 1. Collect and determine the daily maximum load data $L'$ and the average temperature data $W' = (w'_1, w'_2, \ldots, w'_j, \ldots, w'_N)$ preceding the prediction date, where $w'_j$ is the average temperature on day $j$, $1 \le j \le N$, and $N$ is the total number of days;

Step 2. Normalize the daily maximum load data $L'$ to obtain the normalized daily maximum load data $L = (l_1, l_2, \ldots, l_j, \ldots, l_N)$, where $l_j$ is the normalized daily maximum load on day $j$;

Step 3. Fuzzify the average temperature data $W'$ using fuzzy theory to obtain the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$, where $w_j$ is the fuzzified average temperature on day $j$;

Step 4. Select the daily maximum load values of the $d$ days preceding day $i$, together with the fuzzified average temperature, as the input vector of the training sample for day $i$; select the daily maximum load of day $i$ as the target output value of that training sample. This yields the training sample set, indexed by $i = 1, 2, \ldots, N^{train}$, where $N^{train}$ is the total number of training samples;

Step 5. Select the daily maximum load values of the $d$ days preceding day $k$, together with the fuzzified average temperature, as the input vector of the test sample for day $k$; select the daily maximum load of day $k$ as the target output value of that test sample. This yields the test sample set, indexed by $k = 1, 2, \ldots, N^{test}$, where $N^{test}$ is the total number of test samples;
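
As an illustration of Steps 4 and 5, the sample construction might be sketched in Python as follows. The sketch assumes that the input vector for a given day is the $d$ preceding daily maximum loads followed by the fuzzified average temperature of that same day; the exact layout of the vector is not reproduced in this text, so that layout and the names `build_samples`, `load` and `temp` are hypothetical.

```python
import numpy as np

def build_samples(load, temp, d):
    """Build (input, target) pairs from normalized loads and fuzzified temperatures.

    Assumed layout (not confirmed by the source): each input row holds the d
    previous daily maximum loads followed by the target day's temperature value.
    """
    load = np.asarray(load, dtype=float)
    temp = np.asarray(temp, dtype=float)
    if load.ndim != 1 or load.shape != temp.shape:
        raise ValueError("load and temp must be 1-D arrays of equal length")
    if not 0 < d < load.size:
        raise ValueError("d must be between 1 and len(load) - 1")
    # Day i uses loads from days i-d .. i-1 plus the temperature of day i.
    X = np.array([np.append(load[i - d:i], temp[i]) for i in range(d, load.size)])
    y = load[d:]  # target: the daily maximum load of day i
    return X, y
```

The same function can serve both sets: applying it to the historical period gives the training samples, and applying it to the forecast period (with the last $d$ historical days prepended) gives the test samples.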

Step 6. Take the $d$-dimensional input row vector of row $i$ as both the $i$-th nonlinear component $x_i$ and the $i$-th linear component $u_i$ of the training input variables, take the one-dimensional actual output value of row $i$ of the training set as the $i$-th actual output value $y_i$, and establish the fuzzy support vector quantile regression model shown in formula (1):

$$\min_{\omega_{\tau_r},\, b_{\tau_r}} \; \frac{1}{2}\lVert \omega_{\tau_r} \rVert^2 + C \sum_{i=1}^{N^{train}} w_i\, \rho_{\tau_r}\!\left(y_i - b_{\tau_r} - \beta_{\tau_r}^T u_i - \omega_{\tau_r}^T \phi(x_i)\right) \tag{1}$$

In formula (1), $T$ denotes transposition; $w_i$ is the fuzzified average temperature factor on day $i$; $\tau_r$ is the $r$-th quantile, with $\tau_r \in (0,1)$ and $r = 1, 2, \ldots, N_\tau$, where $N_\tau$ is the number of quantiles; $\omega_{\tau_r}$ is the parameter vector at the $r$-th quantile $\tau_r$; $C$ is the penalty parameter; $\beta_{\tau_r}$ is the coefficient vector at the $r$-th quantile $\tau_r$; $\phi(\cdot)$ is the nonlinear mapping function; $\rho_{\tau_r}(\cdot)$ is the check function; $b_{\tau_r}$ is the threshold at the $r$-th quantile $\tau_r$; $\alpha_i$ and $\alpha_i^*$ are the $i$-th optimal Lagrange multipliers; and $K$ is the kernel matrix in the input space, with:

$K(x_i, x_v) = \phi(x_i)^T \phi(x_v)$, $v \in I$, where $I$ is the index set of the support vectors obtained;
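
The check function and the kernel matrix used by the model can be illustrated with the short Python sketch below. It rests on two assumptions: the check function is taken to be the standard quantile regression form, $\rho_\tau(e) = \tau e$ for $e \ge 0$ and $(\tau - 1)e$ otherwise, and a Gaussian (RBF) kernel with a hypothetical width parameter `gamma` stands in for $K$, since no specific kernel is named in this text.

```python
import numpy as np

def check_loss(residual, tau):
    """Standard quantile check function (assumed form): tau*e if e >= 0 else (tau-1)*e."""
    e = np.asarray(residual, dtype=float)
    return np.where(e >= 0, tau * e, (tau - 1.0) * e)

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Gaussian kernel K[i, v] = exp(-gamma * ||A[i] - B[v]||^2).

    The RBF choice and the gamma parameter are assumptions for illustration.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negatives from rounding
```

With $\tau = 0.9$, an under-prediction of 2 units costs 1.8 while an over-prediction of 2 units costs only 0.2, which is what pushes the fitted curve towards the upper quantile.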

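Step 7. Based on the fuzzy support vector quantile regression model, introduce slack variables to construct the Lagrange function and solve formula (1), which gives the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ at the $r$-th quantile $\tau_r$, as shown in formula (2):

$$\begin{cases} \omega_{\tau_r} = \sum\limits_{i=1}^{N^{train}} (\alpha_i - \alpha_i^*)\, \phi(x_i) \\ (b_{\tau_r}, \beta_{\tau_r})^T = (U^T U)^{-1} U^T \left(y - K(\alpha - \alpha^*)\right) \end{cases} \tag{2}$$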

In formula (2), $\alpha$ and $\alpha^*$ are the optimal Lagrange multiplier vectors, $y = \{y_i \mid i \in I\}$, and $U$ is the design matrix;
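
The second line of formula (2) is an ordinary least-squares solve, which might be sketched as follows. The sketch assumes that the multipliers $\alpha$ and $\alpha^*$ have already been obtained from a quadratic programming solver, that all arrays are restricted to the support vector index set $I$, and that the design matrix $U$ is a column of ones followed by the linear components $u_i$. The layout of $U$ is an assumption, because its definition is not reproduced in this text, and the function name is hypothetical.

```python
import numpy as np

def solve_threshold_and_beta(U_lin, y, K, alpha, alpha_star):
    """Return (b, beta) from (U^T U)^{-1} U^T (y - K (alpha - alpha*)).

    U_lin holds one row of linear components u_i per support vector.
    Assumed design matrix: U = [1, u_i^T].
    """
    U_lin = np.atleast_2d(np.asarray(U_lin, dtype=float))
    y = np.asarray(y, dtype=float)
    diff = np.asarray(alpha, dtype=float) - np.asarray(alpha_star, dtype=float)
    design = np.column_stack([np.ones(y.size), U_lin])
    target = y - np.asarray(K, dtype=float) @ diff
    # lstsq gives the same solution as the normal equations when U has full
    # column rank, without forming (U^T U)^{-1} explicitly.
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]
```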

Step 8. Substitute the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ at the $r$-th quantile $\tau_r$ into the fuzzy support vector quantile regression prediction model, and take the $d$-dimensional input row vector of row $k$ of the test set as the $k$-th nonlinear component $x_k$ and the $k$-th linear component $u_k$ of the test input variables, so that formula (3) gives the output value of row $k$ of the test set at the $r$-th quantile $\tau_r$:

$$Q_{y_k}(\tau_r \mid u_k, x_k) = b_{\tau_r} + \beta_{\tau_r}^T u_k + K_k(\alpha - \alpha^*) \tag{3}$$

In formula (3), $K_k$ is the $k$-th row vector of the kernel matrix $K$;
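
Formula (3) reduces to two dot products once the parameters are known. A minimal Python sketch, with a hypothetical function name and with all fitted quantities passed in as plain arrays:

```python
import numpy as np

def predict_quantile(b, beta, u_k, K_k, alpha, alpha_star):
    """Evaluate formula (3): b + beta^T u_k + K_k (alpha - alpha*)."""
    diff = np.asarray(alpha, dtype=float) - np.asarray(alpha_star, dtype=float)
    linear_part = np.dot(np.asarray(beta, dtype=float), np.asarray(u_k, dtype=float))
    kernel_part = np.dot(np.asarray(K_k, dtype=float), diff)
    return float(b + linear_part + kernel_part)
```

Calling this once per quantile $\tau_r$, each with its own fitted parameters, produces the set of quantile predictions for day $k$ that Step 9 turns into a density.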

Step 9. Use kernel density estimation to realize the probability density prediction of the daily maximum load:

Step 9.1. Denote the prediction result at the $r$-th quantile $\tau_r$ on day $k$ by $z_{k,r}$; collecting these over $r = 1, 2, \ldots, N_\tau$ gives the prediction results at all quantiles on day $k$, and hence the test output values of the daily maximum load;

Step 9.2. Use formula (4) to obtain the probability density function value corresponding to the $w$-th quantile $\tau_w$ on day $k$:

$$\hat{f}_h(z_{k,w}) = \frac{1}{N_\tau h} \sum_{r=1}^{N_\tau} K_1\!\left(\frac{z_{k,w} - z_{k,r}}{h}\right) \tag{4}$$

In formula (4), $z_{k,w}$ is the prediction result at the $w$-th quantile $\tau_w$ on day $k$, with $w = 1, 2, \ldots, N_\tau$; $h$ is the bandwidth (window width); and $K_1(\eta)$ is the Epanechnikov kernel function, $K_1(\eta) = \tfrac{3}{4}(1 - \eta^2)$ for $|\eta| \le 1$ and $0$ otherwise;
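
Formula (4) can be illustrated with the following Python sketch, which evaluates the density at each of the day's own quantile predictions. The function names are hypothetical, and the bandwidth `h` is simply passed in as an argument.

```python
import numpy as np

def epanechnikov(eta):
    """Epanechnikov kernel: 0.75 * (1 - eta^2) on |eta| <= 1, zero elsewhere."""
    eta = np.asarray(eta, dtype=float)
    return np.where(np.abs(eta) <= 1.0, 0.75 * (1.0 - eta ** 2), 0.0)

def density_at_predictions(z_k, h):
    """Formula (4): density estimate at every quantile prediction z_{k,w} of one day."""
    z = np.asarray(z_k, dtype=float)
    # eta[w, r] = (z_{k,w} - z_{k,r}) / h for all pairs of quantile predictions.
    eta = (z[:, None] - z[None, :]) / h
    return epanechnikov(eta).sum(axis=1) / (z.size * h)
```

For two predictions 0 and 1 with $h = 2$, each point receives $0.75$ from itself and $0.5625$ from its neighbour, giving a density of $1.3125 / 4 = 0.328125$ at both points.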

Step 9.2. Use the rule of thumb shown in formula (5) to calculate the optimal bandwidth $h^*$ of the Epanechnikov kernel function $K_1(\eta)$:

$$h^* = 1.06\, s_X\, N_\tau^{-1/5} \tag{5}$$

In formula (5), $s_X$ is the standard deviation of the test output values of the daily maximum load;

Step 9.3. Obtain the probability density prediction of the daily maximum load from the Epanechnikov kernel function $K_1(\eta)$ and the optimal bandwidth $h^*$.

The power load probability density prediction method of the present invention is further characterized in that the fuzzification in Step 3 proceeds as follows:

According to the low-temperature membership function shown in formula (6), the medium-temperature membership function shown in formula (7) and the high-temperature membership function shown in formula (8), the average temperature $w'_j$ on day $j$ is classified as low-temperature, medium-temperature or high-temperature data, which gives the fuzzy set to which $w'_j$ belongs and hence the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$;

The low-temperature membership function is:

The medium-temperature membership function is:

The high-temperature membership function is:

In formulas (6), (7) and (8), $e < g < f < m < n < p$.

Compared with the prior art, the beneficial effects of the present invention are as follows:

1. The invention combines the traditional support vector regression method with the quantile regression method and introduces fuzzy membership functions, giving the fuzzy support vector quantile regression method. Support vector regression can handle nonlinear problems, while quantile regression uses the information provided by the regressors to estimate the conditional quantiles of the response variable. This yields the influence of the explanatory variables on the response variable at different quantiles, and so accurately describes how the explanatory variables affect the range and the conditional distribution shape of the response variable. Combining the two methods gives a complete probability distribution of the future load, which supports scientific decision-making.

2. The invention mainly considers the influence of average temperature on the daily maximum load forecast. Because the average temperature factor is fuzzy, fuzzy theory is introduced and membership functions are used to fuzzify the average temperature in order to improve prediction accuracy. Probability density predictions are given at different quantiles, demonstrating the effectiveness and accuracy of the method.

3. The invention not only effectively reduces the prediction error and improves power load forecasting accuracy, but also accurately characterizes the volatility of the power load and provides probability density prediction curves. This gives power system dispatching departments a more reliable basis for adjusting power consumption plans and optimizing generator output.

Brief Description of the Drawings

Fig. 1 is the overall flowchart of the method of the present invention;

Fig. 2 is the detailed flowchart of the method of the present invention;

Fig. 3 is the probability density plot of the method of the present invention.

Detailed Description

In implementation, the power load probability density prediction method based on fuzzy support vector quantile regression mainly considers the influence of average temperature on power load forecasting. The flowchart is shown in Fig. 1, and the method proceeds as follows:

Step 1. Many factors affect power load forecasting; research and analysis show that the average temperature has a comparatively large influence on the forecast;

Step 1.1. The invention is tested on data from the worldwide load forecasting competition organized by the EUNITE network. The data comprise the load for 1997-1998 at 48 half-hourly intervals per day (one load point every half hour) and the daily average temperature for 1997-1998. The data set is complete. The task is to predict the daily maximum load for the 31 days of January 1999;
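
Reducing the half-hourly readings to one daily maximum is a reshape followed by a row-wise maximum. A minimal Python sketch, assuming the readings arrive as one flat, gap-free series in chronological order (the function name and the synthetic values are hypothetical):

```python
import numpy as np

def daily_max_load(half_hourly, readings_per_day=48):
    """Collapse a flat series of half-hourly loads into one maximum per day."""
    loads = np.asarray(half_hourly, dtype=float)
    if loads.size % readings_per_day != 0:
        raise ValueError("series length must be a whole number of days")
    # One row per day, one column per half-hour slot.
    return loads.reshape(-1, readings_per_day).max(axis=1)

# Two synthetic days: readings 0..47 on the first day, 100..147 on the second.
demo = np.concatenate([np.arange(48), 100 + np.arange(48)])
demo_daily_max = daily_max_load(demo)
```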

Step 1.2. Collect and determine the daily maximum load data $L'$ and the average temperature data $W' = (w'_1, w'_2, \ldots, w'_j, \ldots, w'_N)$ preceding the prediction date, where $w'_j$ is the average temperature on day $j$, $1 \le j \le N$, and $N$ is the total number of days;

Step 2. To avoid saturation during computation, normalize the daily maximum load data.

Normalize the daily maximum load data $L'$ to obtain the normalized daily maximum load data $L = (l_1, l_2, \ldots, l_j, \ldots, l_N)$, where $l_j$ is the normalized daily maximum load on day $j$. Here the average temperature factor is determined from the weather forecast of the meteorological department, and the daily maximum load data is determined from data provided by the power company;
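
The normalization scheme is not specified in this text. As one common choice, the sketch below assumes min-max scaling to $[0, 1]$ and keeps the scaling constants so that forecasts can be mapped back to load units; the function names are hypothetical.

```python
import numpy as np

def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return (v - lo) / (hi - lo), lo, hi

def denormalize(scaled, lo, hi):
    """Map normalized values back to the original load units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo
```

The constants `lo` and `hi` should come from the training period only, so that the test period is scaled with the same mapping.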

Step 3. Because the average temperature factor is uncertain or fuzzy in practice, fuzzy theory is used to establish membership functions that fuzzify it.

Fuzzify the average temperature data $W'$ using fuzzy theory to obtain the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$, where $w_j$ is the fuzzified average temperature on day $j$. The fuzzification proceeds as follows:

According to the low-temperature membership function shown in formula (6), the medium-temperature membership function shown in formula (7) and the high-temperature membership function shown in formula (8), the average temperature $w'_j$ on day $j$ is classified as low-temperature, medium-temperature or high-temperature data, which gives the fuzzy set to which $w'_j$ belongs and hence the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$;

The low-temperature membership function is:

The medium-temperature membership function is:

The high-temperature membership function is:

Formulas (6), (7) and (8) satisfy $e < g < f < m < n < p$, and the values of these variables are determined case by case, with $e \in [-10, -2]$, $g \in [-3, 3]$, $f \in [5, 12]$, $m \in [10, 16]$, $n \in [17, 25]$ and $p \in [30, 40]$. For the data selected here, the values are $e = -5$, $g = 0$, $f = 10$, $m = 15$, $n = 20$, $p = 35$.
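
The exact forms of formulas (6) to (8) are not reproduced in this text, so the sketch below is only one plausible reading and not the patent's own definition. It assumes three overlapping trapezoidal membership functions built from the six breakpoints, with low temperature on $(e, g, f, m)$, medium on $(g, f, m, n)$ and high on $(f, m, n, p)$, and it assigns each temperature to the set with the largest membership degree. The trapezoid shape, the breakpoint assignment and all function names are assumptions.

```python
def trapezoid(t, a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if t <= a or t >= d:
        return 0.0
    if t < b:
        return (t - a) / (b - a)
    if t <= c:
        return 1.0
    return (d - t) / (d - c)

# Breakpoint values quoted in the text for the selected data.
E, G, F, M, N, P = -5.0, 0.0, 10.0, 15.0, 20.0, 35.0

def fuzzify(t):
    """Return (label, degree) of the fuzzy set with the highest membership.

    Assumed assignment of breakpoints to sets; the original formulas may differ.
    """
    degrees = {
        "low": trapezoid(t, E, G, F, M),
        "medium": trapezoid(t, G, F, M, N),
        "high": trapezoid(t, F, M, N, P),
    }
    label = max(degrees, key=degrees.get)
    return label, degrees[label]
```

Under these assumptions a day averaging 5 °C is classified as low with degree 1, while a day averaging 25 °C is classified as high with degree 2/3.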

Step 4. Build the training set: select the daily maximum load values of the $d$ days preceding day $i$, together with the fuzzified average temperature, as the input vector of the training sample for day $i$; select the daily maximum load of day $i$ as the target output value of that training sample. This yields the training sample set, indexed by $i = 1, 2, \ldots, N^{train}$, where $N^{train}$ is the total number of training samples;

Step 5. Build the test set: select the daily maximum load values of the $d$ days preceding day $k$, together with the fuzzified average temperature, as the input vector of the test sample for day $k$; select the daily maximum load of day $k$ as the target output value of that test sample. This yields the test sample set, indexed by $k = 1, 2, \ldots, N^{test}$, where $N^{test}$ is the total number of test samples;

Step 6. Take the $d$-dimensional input row vector of row $i$ as both the $i$-th nonlinear component $x_i$ and the $i$-th linear component $u_i$ of the training input variables, take the one-dimensional actual output value of row $i$ of the training set as the $i$-th actual output value $y_i$, and establish the fuzzy support vector quantile regression model shown in formula (1):

$$\min_{\omega_{\tau_r},\, b_{\tau_r}} \; \frac{1}{2}\lVert \omega_{\tau_r} \rVert^2 + C \sum_{i=1}^{N^{train}} w_i\, \rho_{\tau_r}\!\left(y_i - b_{\tau_r} - \beta_{\tau_r}^T u_i - \omega_{\tau_r}^T \phi(x_i)\right) \tag{1}$$

In formula (1), $T$ denotes transposition; $w_i$ is the fuzzified average temperature factor on day $i$; $\tau_r$ is the $r$-th quantile, with $\tau_r \in (0,1)$ and $r = 1, 2, \ldots, N_\tau$, where $N_\tau$ is the number of quantiles; $\omega_{\tau_r}$ is the parameter vector at the $r$-th quantile $\tau_r$; $C$ is the penalty parameter; $\beta_{\tau_r}$ is the coefficient vector at the $r$-th quantile $\tau_r$; $\phi(\cdot)$ is the nonlinear mapping function; $\rho_{\tau_r}(\cdot)$ is the check function; $b_{\tau_r}$ is the threshold at the $r$-th quantile $\tau_r$; $\alpha_i$ and $\alpha_i^*$ are the $i$-th optimal Lagrange multipliers; and $K$ is the kernel matrix in the input space, with $K(x_i, x_v) = \phi(x_i)^T \phi(x_v)$, $v \in I$, where $I$ is the index set of the support vectors obtained;

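Step 7. Solve for the parameter values: based on the fuzzy support vector quantile regression model, introduce slack variables to construct the Lagrange function and solve formula (1), which gives the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ at the $r$-th quantile $\tau_r$, as shown in formula (2):

$$\begin{cases} \omega_{\tau_r} = \sum\limits_{i=1}^{N^{train}} (\alpha_i - \alpha_i^*)\, \phi(x_i) \\ (b_{\tau_r}, \beta_{\tau_r})^T = (U^T U)^{-1} U^T \left(y - K(\alpha - \alpha^*)\right) \end{cases} \tag{2}$$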

Step 8. Substitute the obtained parameter values into the model and solve for the test target output values:

Substitute the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ at the $r$-th quantile $\tau_r$ into the fuzzy support vector quantile regression prediction model, and take the $d$-dimensional input row vector of row $k$ of the test set as the $k$-th nonlinear component $x_k$ and the $k$-th linear component $u_k$ of the test input variables, so that formula (3) gives the output value of row $k$ of the test set at the $r$-th quantile $\tau_r$:

$$Q_{y_k}(\tau_r \mid u_k, x_k) = b_{\tau_r} + \beta_{\tau_r}^T u_k + K_k(\alpha - \alpha^*) \tag{3}$$

Step 9. Use kernel density estimation to predict the probability density of the daily maximum load; the flowchart is shown in Fig. 2;

Step 9.1. Solve for the test target output values at different quantiles: denote the prediction result at the $r$-th quantile $\tau_r$ on day $k$ by $z_{k,r}$; collecting these over $r = 1, 2, \ldots, N_\tau$ gives the prediction results at all quantiles on day $k$, and hence the test output values of the daily maximum load;

Step 9.2. Compute the probability density function values: use formula (4) to obtain the probability density function value corresponding to the $w$-th quantile $\tau_w$ on day $k$:

$$\hat{f}_h(z_{k,w}) = \frac{1}{N_\tau h} \sum_{r=1}^{N_\tau} K_1\!\left(\frac{z_{k,w} - z_{k,r}}{h}\right) \tag{4}$$

Step 9.2. Solve for the optimal bandwidth: in kernel density estimation, bandwidth selection is a very important issue in the local smoothing of the probability density prediction function. Use the rule of thumb shown in formula (5) to calculate the optimal bandwidth $h^*$ of the Epanechnikov kernel function $K_1(\eta)$:

$$h^* = 1.06\, s_X\, N_\tau^{-1/5} \tag{5}$$

In formula (5), $s_X$ is the standard deviation of the test output values of the daily maximum load; the optimal bandwidth is then obtained from the criterion of 1.06 times the sample standard deviation. In other words, the rule of thumb applies a fixed criterion to the sample standard deviation to obtain the optimal bandwidth directly.
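
Formula (5) can be computed in a few lines. The sketch below assumes that $s_X$ is the sample standard deviation (with the $N_\tau - 1$ denominator) of the $N_\tau$ quantile predictions for one day; the text does not state which denominator is used, and the function name is hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidth(z_k):
    """Formula (5): h* = 1.06 * s_X * N_tau^(-1/5)."""
    z = np.asarray(z_k, dtype=float)
    s_x = z.std(ddof=1)  # sample standard deviation; ddof=1 is an assumption
    return 1.06 * s_x * z.size ** (-0.2)
```

For five predictions 1, 2, 3, 4, 5 this gives $s_X = \sqrt{2.5} \approx 1.581$ and $h^* \approx 1.06 \times 1.581 \times 5^{-0.2} \approx 1.215$.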

Step 9.3. Obtain the probability density prediction of the daily maximum load from the Epanechnikov kernel function $K_1(\eta)$ and the optimal bandwidth $h^*$.

Step 9.4. The probability density curves are shown in Fig. 3, which gives the probability density distributions of the predicted values for days 1, 6, 11, 16, 21 and 26. As the figure shows, the predicted values all fall, with high probability, at the mode of the probability density prediction curve.
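
A density curve of the kind plotted in Fig. 3, and its mode, could be obtained along the following lines. This is a sketch rather than the procedure used for the figure: it assumes the density is evaluated on an evenly spaced grid spanning the quantile predictions, and the function names and grid size are hypothetical.

```python
import numpy as np

def density_curve(z_k, h, grid):
    """Epanechnikov kernel density of the quantile predictions, evaluated on a grid."""
    z = np.asarray(z_k, dtype=float)
    g = np.asarray(grid, dtype=float)
    eta = (g[:, None] - z[None, :]) / h
    kern = np.where(np.abs(eta) <= 1.0, 0.75 * (1.0 - eta ** 2), 0.0)
    return kern.sum(axis=1) / (z.size * h)

def density_mode(z_k, h, num=201):
    """Return (mode, grid, density); the mode is the grid point of highest density."""
    z = np.asarray(z_k, dtype=float)
    # Extend the grid by one bandwidth on each side to cover the full support.
    grid = np.linspace(z.min() - h, z.max() + h, num)
    dens = density_curve(z, h, grid)
    return grid[np.argmax(dens)], grid, dens
```

The mode returned here can be read as a point forecast, and the spread of the curve around it indicates how uncertain that forecast is.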

Claims

1. A power load probability density prediction method based on fuzzy support vector quantile regression is characterized by comprising the following steps:

Step 1, collecting and determining the daily maximum load data $L'$ and the average temperature data $W' = (w'_1, w'_2, \ldots, w'_j, \ldots, w'_N)$ preceding the prediction date, wherein $w'_j$ is the average temperature on day $j$, $1 \le j \le N$, and $N$ is the total number of days;

Step 2, normalizing the daily maximum load data $L'$ to obtain the normalized daily maximum load data $L = (l_1, l_2, \ldots, l_j, \ldots, l_N)$, wherein $l_j$ is the normalized daily maximum load on day $j$;

Step 3, fuzzifying the average temperature data $W'$ using fuzzy theory to obtain the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$, wherein $w_j$ is the fuzzified average temperature on day $j$;

Step 4, selecting the daily maximum load values of the $d$ days preceding day $i$, together with the fuzzified average temperature, as the input vector of the training sample for day $i$, and selecting the daily maximum load of day $i$ as the target output value of that training sample, thereby obtaining the training sample set, indexed by $i = 1, 2, \ldots, N^{train}$, wherein $N^{train}$ is the total number of training samples;

Step 5, selecting the daily maximum load values of the $d$ days preceding day $k$, together with the fuzzified average temperature, as the input vector of the test sample for day $k$, and selecting the daily maximum load of day $k$ as the target output value of that test sample, thereby obtaining the test sample set, indexed by $k = 1, 2, \ldots, N^{test}$, wherein $N^{test}$ is the total number of test samples;

Step 6, taking the $d$-dimensional input row vector of row $i$ as both the $i$-th nonlinear component $x_i$ and the $i$-th linear component $u_i$ of the training input variables, taking the one-dimensional actual output value of row $i$ of the training set as the $i$-th actual output value $y_i$, and establishing the fuzzy support vector quantile regression model shown in formula (1):

$$\min_{\omega_{\tau_r},\, b_{\tau_r}} \; \frac{1}{2}\lVert \omega_{\tau_r} \rVert^2 + C \sum_{i=1}^{N^{train}} w_i\, \rho_{\tau_r}\!\left(y_i - b_{\tau_r} - \beta_{\tau_r}^T u_i - \omega_{\tau_r}^T \phi(x_i)\right) \tag{1}$$

in formula (1), $T$ denotes transposition; $w_i$ is the fuzzified average temperature factor on day $i$; $\tau_r$ is the $r$-th quantile, with $\tau_r \in (0,1)$ and $r = 1, 2, \ldots, N_\tau$, wherein $N_\tau$ is the number of quantiles; $\omega_{\tau_r}$ is the parameter vector at the $r$-th quantile $\tau_r$; $C$ is the penalty parameter; $\beta_{\tau_r}$ is the coefficient vector at the $r$-th quantile $\tau_r$; $\phi(\cdot)$ is the nonlinear mapping function; $\rho_{\tau_r}(\cdot)$ is the check function; $b_{\tau_r}$ is the threshold at the $r$-th quantile $\tau_r$; $\alpha_i$ and $\alpha_i^*$ are the $i$-th optimal Lagrange multipliers; and $K$ is the kernel matrix in the input space, with:

$K(x_i, x_v) = \phi(x_i)^T \phi(x_v)$, $v \in I$, wherein $I$ is the index set of the support vectors obtained;

Step 7, based on the fuzzy support vector quantile regression model, introducing slack variables to construct the Lagrange function and solving formula (1), to obtain the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ at the $r$-th quantile $\tau_r$, as shown in formula (2):

$$\begin{cases} \omega_{\tau_r} = \sum\limits_{i=1}^{N^{train}} (\alpha_i - \alpha_i^*)\, \phi(x_i) \\ (b_{\tau_r}, \beta_{\tau_r})^T = (U^T U)^{-1} U^T \left(y - K(\alpha - \alpha^*)\right) \end{cases} \tag{2}$$

in formula (2), $\alpha$ and $\alpha^*$ are the optimal Lagrange multiplier vectors, $y = \{y_i \mid i \in I\}$, and $U$ is the design matrix;

Step 8, substituting the parameter vector $\omega_{\tau_r}$, the threshold $b_{\tau_r}$ and the coefficient vector $\beta_{\tau_r}$ under the $r$-th quantile $\tau_r$ into the fuzzy support vector quantile regression prediction model; and inputting the $d$-dimensional row vector of the $k$-th row of the test set, with its $k$-th nonlinear component $x_k$ and $k$-th linear component $u_k$ as the test input variables, thereby obtaining, by formula (3), the output value $Q_{y_k}(\tau_r \mid u_k, x_k)$ of the $k$-th row of the test set under the $r$-th quantile $\tau_r$:

$$Q_{y_k}(\tau_r \mid u_k, x_k) = b_{\tau_r} + \beta_{\tau_r}^T u_k + K_k(\alpha - \alpha^*) \tag{3}$$

In formula (3), $K_k$ denotes the $k$-th row vector of the kernel matrix $K$;
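Formula (3) is a sum of a threshold, a linear term and a kernel term, as the short sketch below illustrates. The function name `predict_quantile` and all argument names are hypothetical; `K_k` is assumed to hold the kernel values between test row $k$ and the training points that carry the multipliers.

```python
import numpy as np

def predict_quantile(b, beta, u_k, K_k, alpha, alpha_star):
    """Q(tau_r | u_k, x_k) = b + beta^T u_k + K_k (alpha - alpha*)."""
    beta = np.asarray(beta, dtype=float)
    u_k = np.asarray(u_k, dtype=float)
    K_k = np.asarray(K_k, dtype=float)
    diff = np.asarray(alpha, dtype=float) - np.asarray(alpha_star, dtype=float)
    return b + np.dot(beta, u_k) + np.dot(K_k, diff)
```

Repeating this call for each quantile $\tau_r$, using that quantile's own fitted parameters, yields the set of predictions that the kernel density step then smooths.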

Step 9, realizing the probability density prediction of the daily maximum load by kernel density estimation:

Step 9.1, denoting the prediction result under the $r$-th quantile $\tau_r$ on day $k$ as $z_{k,r}$, thereby obtaining the prediction results $z_{k,1}, z_{k,2}, \ldots, z_{k,N_\tau}$ of all quantiles on day $k$, and further obtaining the test output values of the daily maximum load;

Step 9.2, obtaining, by formula (4), the probability density function value $\hat{f}_h(z_{k,w})$ corresponding to the $w$-th quantile $\tau_w$ on day $k$:

$$\hat{f}_h(z_{k,w}) = \frac{1}{N_\tau h} \sum_{r=1}^{N_\tau} K_1\!\left(\frac{z_{k,w} - z_{k,r}}{h}\right) \tag{4}$$

In formula (4), $z_{k,w}$ denotes the prediction result under the $w$-th quantile $\tau_w$ on day $k$; the quantile index $w = 1, 2, \ldots, N_\tau$; $h$ is the window width; and $K_1(\eta)$ is the Epanechnikov kernel function, with $K_1(\eta) = \frac{3}{4}(1 - \eta^2)$, where $|\eta| \le 1$;
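A minimal sketch of formula (4) follows, evaluating the density estimate at each of the day's quantile predictions. The function names `epanechnikov` and `kde_at_quantile_predictions` are hypothetical, and the kernel is taken in its standard form $\frac{3}{4}(1 - \eta^2)$ on $|\eta| \le 1$ and zero elsewhere.

```python
import numpy as np

def epanechnikov(eta):
    """Standard Epanechnikov kernel: 0.75 * (1 - eta^2) for |eta| <= 1, else 0."""
    eta = np.asarray(eta, dtype=float)
    return np.where(np.abs(eta) <= 1.0, 0.75 * (1.0 - eta ** 2), 0.0)

def kde_at_quantile_predictions(z_k, h):
    """Density value f_h(z_{k,w}) for every w, given one day's predictions z_k."""
    z_k = np.asarray(z_k, dtype=float)
    # eta[w, r] = (z_{k,w} - z_{k,r}) / h
    eta = (z_k[:, None] - z_k[None, :]) / h
    return epanechnikov(eta).sum(axis=1) / (len(z_k) * h)
```

For example, `kde_at_quantile_predictions([0.0, 1.0, 2.0], 2.0)` gives the largest value at the middle point, since both neighbours fall inside its window.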

Step 9.3, calculating the optimal window width $h^*$ of the Epanechnikov kernel function $K_1(\eta)$ by the rule of thumb shown in formula (5):

$$h^* = 1.06\, s_X N_\tau^{-1/5} \tag{5}$$

In formula (5), $s_X$ is the standard deviation of the test output values of the daily maximum load;
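The rule of thumb in formula (5) can be sketched in a few lines. The function name `rule_of_thumb_bandwidth` is hypothetical, and using the sample standard deviation (`ddof=1`) for $s_X$ is an assumption, since the text does not say which estimator is intended.

```python
import numpy as np

def rule_of_thumb_bandwidth(z_k):
    """h* = 1.06 * s_X * N_tau^(-1/5), with N_tau = number of quantile predictions."""
    z_k = np.asarray(z_k, dtype=float)
    s_x = z_k.std(ddof=1)  # assumed: sample standard deviation
    return 1.06 * s_x * len(z_k) ** (-1.0 / 5.0)
```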

Step 9.4, obtaining the probability density prediction result of the daily maximum load according to the Epanechnikov kernel function $K_1(\eta)$ and the optimal window width $h^*$.

2. The electric load probability density forecasting method according to claim 1, wherein the fuzzification in step 3 is performed as follows:

according to the low temperature membership function shown in formula (6), the medium temperature membership function shown in formula (7) and the high temperature membership function shown in formula (8), the average temperature $w'_j$ of day $j$ is classified as low temperature data, medium temperature data or high temperature data, thereby obtaining the fuzzy set to which the average temperature $w'_j$ of day $j$ belongs; the fuzzified average temperature data $W = (w_1, w_2, \ldots, w_j, \ldots, w_N)$ are thus obtained;

The low temperature membership function is:

$$w_j = \begin{cases} 0, & w'_j > f \\ \dfrac{f - w'_j}{f - e}, & e \le w'_j < f \\ 1, & w'_j < e \end{cases} \tag{6}$$

The medium temperature membership function is:

The high temperature membership function is:

$$w_j = \begin{cases} 0, & w'_j < n \\ \dfrac{w'_j - n}{p - n}, & n \le w'_j \le p \\ 1, & w'_j > p \end{cases} \tag{8}$$

In formulas (6), (7) and (8), e < g < f < m < n < p is satisfied, with e ∈ 10, -2, g ∈ 3, f ∈ 5, 12, m ∈ [10, 16], n ∈ [17, 25], p ∈ [30, 40].
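The low and high temperature membership functions of formulas (6) and (8) are linear ramps clipped to the interval from 0 to 1, which the sketch below implements. The function names are hypothetical, and the breakpoint values used in the usage note are illustrative only. The medium temperature function of formula (7) is not sketched because its expression does not appear in this text.

```python
import numpy as np

def low_temperature_membership(w_raw, e, f):
    """Formula (6): 1 below e, falling linearly to 0 at f, 0 above f."""
    w_raw = np.asarray(w_raw, dtype=float)
    return np.clip((f - w_raw) / (f - e), 0.0, 1.0)

def high_temperature_membership(w_raw, n, p):
    """Formula (8): 0 below n, rising linearly to 1 at p, 1 above p."""
    w_raw = np.asarray(w_raw, dtype=float)
    return np.clip((w_raw - n) / (p - n), 0.0, 1.0)
```

With illustrative breakpoints `e=-5, f=10`, an average temperature of 2.5 has low temperature membership 0.5; with `n=20, p=30`, a temperature of 25 has high temperature membership 0.5.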