WO2022222230A1 - Machine-learning-based indicator prediction method, apparatus, device and storage medium
- Publication number
- WO2022222230A1 (PCT/CN2021/097309)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Definitions
- The present application relates to the technical field of artificial intelligence, and in particular to a machine-learning-based indicator prediction method, apparatus, device and storage medium.
- Embodiments of the present application provide a machine-learning-based indicator prediction method, apparatus, device and storage medium that effectively integrate different models to improve prediction accuracy in statistical and econometric processes.
- An embodiment of the present application provides a machine-learning-based indicator prediction method, including: acquiring data to be predicted; when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and taking the classification result that occurs most often among them as a first prediction result for the main indicator to be predicted; and performing regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and taking the mean of those regression results as a second prediction result for the main indicator.
- The final prediction result for the main indicator is determined according to the first prediction result, the second prediction result and the mean classification backtracking accuracy, where the mean classification backtracking accuracy is the average of the classification backtracking accuracies of the first number of pre-trained classification models.
- An embodiment of the present application provides a machine-learning-based indicator prediction apparatus, including: an acquisition module, configured to acquire data to be predicted; a classification processing module, configured to, when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classify the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among them as a first prediction result for the main indicator to be predicted; and a regression processing module, configured to perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and take the mean of those regression results as a second prediction result for the main indicator.
- The determination module is configured to determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy.
- The mean classification backtracking accuracy is the average of the classification backtracking accuracies of the first number of pre-trained classification models.
- An embodiment of the present application provides an electronic device, including a processor and a memory connected to each other, where the memory stores a computer program including program instructions, and the processor is configured to call the program instructions to perform the following method: acquiring data to be predicted; and, when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results.
- The classification result that occurs most often among them is taken as the first prediction result for the main indicator to be predicted; regression is performed on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and the mean of those regression results is taken as the second prediction result for the main indicator; and the final prediction result for the main indicator is determined according to the first prediction result, the second prediction result and the mean classification backtracking accuracy, which is the average of the classification backtracking accuracies of the first number of pre-trained classification models.
- An embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following method: acquiring data to be predicted; and, when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models.
- The first number of classification results is thereby obtained, and the classification result that occurs most often among them is taken as the first prediction result for the main indicator to be predicted; regression is performed on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results.
- The mean calculated from the second number of regression results is taken as the second prediction result for the main indicator; the final prediction result for the main indicator is determined according to the first prediction result, the second prediction result and the mean classification backtracking accuracy.
- The mean classification backtracking accuracy is the average of the classification backtracking accuracies of the first number of pre-trained classification models.
- The present application effectively integrates the first prediction result obtained from the pre-trained classification models with the second prediction result obtained from the pre-trained regression models, and combines these prediction results with the classification backtracking accuracy to determine the final prediction result for the main indicator. The prior art predicts the main indicator with a single model, and because different models have different expressive capabilities, prediction accuracy can be low; the above process therefore effectively improves prediction accuracy in statistical and econometric processes.
- FIG. 1 is a schematic flowchart of a machine-learning-based indicator prediction method provided by an embodiment of the present application.
- FIG. 2 is a schematic flowchart of another machine-learning-based indicator prediction method provided by an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a machine-learning-based indicator prediction apparatus provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- The leading indicators mentioned in the embodiments of the present application are economic indicators that affect future economic development and change before economic growth or recession arrives. They can be used to predict turning points in the economic cycle and to estimate the magnitude of rises and falls in economic activity, and thus to infer the trend of economic fluctuations.
- In this application, a leading indicator is an indicator that changes before the main indicator to be predicted changes.
- The explanatory variables mentioned in the embodiments of the present application, also called controllable variables, are the independent variables of an econometric model. In some cases, explanatory variables affect the economic variable that serves as the dependent variable.
- The main indicator mentioned in the embodiments of the present application is the indicator to be predicted; for example, the main indicator may be an economic variable that serves as the dependent variable.
- The number of leading periods mentioned in the embodiments of the present application is the number of periods by which a leading indicator and the main indicator are staggered in time.
- For example, suppose leading indicator 1 leads by 1 period and one period is 1 month. The value of leading indicator 1 in December 2020 then corresponds to the value of the main indicator in January 2021, the value of leading indicator 1 in November 2020 corresponds to the value of the main indicator in December 2020, and so on. Leading indicator 1 and the main indicator are thus staggered by one period in time, which is what "leading by 1 period" means.
- Similarly, suppose leading indicator 2 leads by 2 periods and one period is 1 month. The value of leading indicator 2 in November 2020 then corresponds to the value of the main indicator in January 2021, the value of leading indicator 2 in October 2020 corresponds to the value of the main indicator in December 2020, and so on. Leading indicator 2 and the main indicator are thus staggered by 2 periods in time, which is what "leading by 2 periods" means.
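A minimal Python sketch of the time staggering described above, assuming both series are equally spaced, in chronological order and start at the same period. The function name `align_with_lead` and the sample numbers are hypothetical; pairing `leading[t - lead]` with `main[t]` is one way to build training pairs for a given number of leading periods.

```python
from typing import List, Sequence, Tuple


def align_with_lead(
    leading: Sequence[float], main: Sequence[float], lead: int
) -> List[Tuple[float, float]]:
    """Pair each main-indicator value with the leading-indicator value `lead` periods earlier.

    Assumes both sequences are equally spaced, chronologically ordered
    (index 0 = earliest period) and start at the same period.
    """
    if lead < 0:
        raise ValueError("the number of leading periods must be non-negative")
    n = min(len(leading), len(main))
    # For period t, the leading value from period t - lead is matched with main[t].
    return [(leading[t - lead], main[t]) for t in range(lead, n)]


# Hypothetical monthly values for Oct 2020, Nov 2020, Dec 2020 and Jan 2021.
leading_1 = [10.0, 11.0, 12.0, 13.0]
main_index = [100.0, 101.0, 102.0, 103.0]

print(align_with_lead(leading_1, main_index, 1))
# [(10.0, 101.0), (11.0, 102.0), (12.0, 103.0)]: Dec 2020 leading value pairs with Jan 2021
print(align_with_lead(leading_1, main_index, 2))
# [(10.0, 102.0), (11.0, 103.0)]: Nov 2020 leading value pairs with Jan 2021
```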
- The electronic device may classify the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator to be predicted.
- The electronic device may also perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain the second number of regression results, and take the mean of those regression results as the second prediction result for the main indicator.
- The electronic device may then determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy. Through this process, the electronic device effectively integrates different models to improve prediction accuracy in statistical and econometric processes.
- In one embodiment, the electronic device classifies the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain the first number of classification results, and takes the classification result that occurs most often among them as the first prediction result for the main indicator to be predicted.
- The electronic device may then use the first prediction result as the final prediction result for the main indicator. In this case, the electronic device can obtain an accurate final prediction result for the main indicator using only the first number of pre-trained classification models.
- In another embodiment, the electronic device may perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain the second number of regression results, and take the mean of those regression results as the second prediction result for the main indicator. The electronic device may then use the second prediction result as the final prediction result for the main indicator. In this case, the electronic device can obtain an accurate final prediction result for the main indicator using only the second number of pre-trained regression models.
- FIG. 1 is a schematic flowchart of a machine-learning-based indicator prediction method according to an embodiment of the present application.
- The method can be applied to an electronic device, which may be a server or a user terminal.
- The server may be a single server or a server cluster.
- The user terminal may be an intelligent terminal such as a laptop or desktop computer.
- The method may include the following steps:
- The data to be predicted may include the first value of the specified leading indicator and/or the second value of the specified explanatory variable.
- There may be one or more specified leading indicators, and one or more specified explanatory variables.
- The first value of the specified leading indicator can be used to determine the first prediction result for the main indicator, and the first value of the specified leading indicator together with the second value of the specified explanatory variable can be used to determine the second prediction result for the main indicator.
- The first prediction result can be one prediction of the main indicator at the target time (the time unit can be a day, month, quarter, year, etc.), and the second prediction result can be another prediction of the main indicator at the target time.
- The two prediction results are used to determine the final prediction result for the main indicator at the target time.
- The specified leading indicator may take different values at different times.
- The first value of the specified leading indicator may be determined according to the target number of leading periods corresponding to that indicator. In one embodiment, the first value is determined from the target number of leading periods and the aforementioned target time. For example, if the specified leading indicator is leading indicator 1, which leads by 1 period, and one period is 1 month, then to predict the value of the main indicator in January 2021 the first value of leading indicator 1 is its value in December 2020. As another example, if the specified leading indicator is leading indicator 2, which leads by 2 periods, and one period is 1 month, then to predict the value of the main indicator in January 2021 the first value of leading indicator 2 is its value in November 2020.
- The specified explanatory variable may also take different values at different times.
- The second value of the specified explanatory variable can be determined according to the aforementioned target time.
- The second value of the specified explanatory variable may be its value at the target time. For example, if the specified explanatory variable is explanatory variable 1 and the value of the main indicator in January 2021 is to be predicted, the second value of explanatory variable 1 is its value in January 2021.
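The selection of the first and second values can be sketched in Python as below, assuming monthly periods keyed by `(year, month)` tuples. `shift_month`, `build_inputs` and the sample series are hypothetical illustrations, not part of the application.

```python
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month), month in 1..12


def shift_month(month: Month, periods: int) -> Month:
    """Return the month that lies `periods` months before `month`."""
    year, m = month
    index = year * 12 + (m - 1) - periods
    return index // 12, index % 12 + 1


def build_inputs(
    leading: Dict[str, Dict[Month, float]],
    leads: Dict[str, int],
    explanatory: Dict[str, Dict[Month, float]],
    target: Month,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """First values are lagged by each indicator's lead; second values are taken at the target month."""
    first_values = {
        name: leading[name][shift_month(target, lead)] for name, lead in leads.items()
    }
    second_values = {name: series[target] for name, series in explanatory.items()}
    return first_values, second_values


# Hypothetical data: leading_1 leads by 1 month, leading_2 leads by 2 months.
leading = {
    "leading_1": {(2020, 11): 5.0, (2020, 12): 6.0},
    "leading_2": {(2020, 11): 7.0, (2020, 12): 8.0},
}
explanatory = {"explanatory_1": {(2021, 1): 3.5}}
first, second = build_inputs(leading, {"leading_1": 1, "leading_2": 2}, explanatory, (2021, 1))
print(first)   # {'leading_1': 6.0, 'leading_2': 7.0}
print(second)  # {'explanatory_1': 3.5}
```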
- Regarding steps S102 to S104: because different models work on different principles, their prediction results show different trend preferences, which poses a potential stability risk to the final fused result. To further improve the robustness of the prediction, different strategies are applied to the outputs of different models to obtain the final prediction result.
- For the process of obtaining the final prediction result with different strategies for different models, refer to steps S102 to S104.
- S102 When the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classify the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator to be predicted.
- Specifically, the electronic device may use the first value of the specified leading indicator as the input data of each of the first number of pre-trained classification models, classify that value with each model to obtain that model's classification result, and thereby obtain the first number of classification results in total.
- The electronic device may determine the first prediction result for the main indicator according to the first number of classification results.
- For example, the electronic device may use a voting method or the like to determine the classification result that occurs most often among the first number of classification results, and use it as the first prediction result for the main indicator.
- The classification model may be any model capable of classification, such as a neural network model.
- The classification result may be a predicted trend (the direction of future development), such as rising, falling or unchanged.
- For example, suppose the first number of pre-trained classification models are pre-trained classification models 1, 2, 3, 4 and 5.
- The electronic device may use the first value of the specified leading indicator as the input data of the five pre-trained classification models and classify it with each of them, obtaining each model's classification result, 5 classification results in total. Assuming the classification result is the predicted trend and the five results are rising, rising, falling, unchanged and rising, the electronic device may determine by voting or a similar method that the most frequent result is rising (3 of 5 votes), and use rising as the first prediction result for the main indicator.
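Majority voting over the classification results can be sketched with Python's `collections.Counter`. The models are represented as plain callables, a hypothetical stand-in for the pre-trained classification models. Tie-breaking is an assumption: the application does not say how ties are resolved, and `Counter.most_common` returns the tied result that was encountered first.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(results: Sequence[T]) -> T:
    """Return the most frequent classification result (ties: first encountered wins)."""
    if not results:
        raise ValueError("at least one classification result is required")
    return Counter(results).most_common(1)[0][0]


def first_prediction(
    classifiers: Sequence[Callable[[float], str]], first_value: float
) -> str:
    """Classify the leading-indicator value with every model, then vote on the results."""
    return majority_vote([classify(first_value) for classify in classifiers])


# The five results from the example above.
print(majority_vote(["rising", "rising", "falling", "unchanged", "rising"]))  # rising
```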
- The electronic device may use the first value of the specified leading indicator and the second value of the specified explanatory variable as the input data of each of the second number of pre-trained regression models, perform regression on them with each model to obtain that model's regression result, and thereby obtain the second number of regression results in total. The electronic device may then determine the second prediction result for the main indicator from these regression results. In one embodiment, the electronic device calculates the mean of the second number of regression results and uses it as the second prediction result for the main indicator.
- In another embodiment, the electronic device may remove the single highest and single lowest values from the second number of regression results, calculate the mean of the remaining regression results, and use that mean as the second prediction result for the main indicator.
- The regression model may be any model capable of regression, such as a neural network model.
- The regression result may be a predicted point value, such as the level of a market index.
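The two ways of forming the second prediction result, a plain mean or a mean after dropping one highest and one lowest value, can be sketched as follows. The function name and the `trim` flag are hypothetical; requiring at least three results before trimming is an assumption, since the application does not cover smaller cases.

```python
from statistics import mean
from typing import Sequence


def second_prediction(regression_results: Sequence[float], trim: bool = False) -> float:
    """Mean of the regression results, optionally without one highest and one lowest value."""
    values = sorted(regression_results)
    if trim:
        if len(values) < 3:
            raise ValueError("trimming needs at least three regression results")
        values = values[1:-1]  # drop one lowest and one highest value
    if not values:
        raise ValueError("at least one regression result is required")
    return mean(values)


print(second_prediction([2.0, 4.0]))                   # 3.0, i.e. (a+b)/2 with a=2, b=4
print(second_prediction([1.0, 2.0, 3.0, 10.0], True))  # 2.5, the mean of 2.0 and 3.0
```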
- For example, suppose the second number of pre-trained regression models are pre-trained regression models 1 and 2. The electronic device may use the first value of the specified leading indicator and the second value of the specified explanatory variable as the input data of pre-trained regression model 1, and likewise as the input data of pre-trained regression model 2, and perform regression on them with these two models, obtaining each model's regression result, 2 regression results in total. Assuming the regression results are predicted point values a and b, the electronic device calculates their mean as (a+b)/2 and uses (a+b)/2 as the second prediction result for the main indicator.
- S104 Determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy, where the mean classification backtracking accuracy is the average of the classification backtracking accuracies of the first number of pre-trained classification models.
- The mean classification backtracking accuracy may take a value such as 0.6; this is not limited in the embodiments of the present application.
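The application does not state how each model's classification backtracking accuracy is computed. Assuming it is the share of historical periods in which a model's predicted trend matched the actual trend, the mean used in step S104 could be computed as in this sketch; both function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def backtracking_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """ASSUMED definition: fraction of historical periods where the predicted trend was correct."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual trends must be non-empty and equally long")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def mean_backtracking_accuracy(per_model_accuracies: Sequence[float]) -> float:
    """Average of the classification backtracking accuracies of all classification models."""
    return mean(per_model_accuracies)


acc = backtracking_accuracy(
    ["rising", "falling", "rising", "rising"],
    ["rising", "rising", "rising", "falling"],
)
print(acc)                                            # 0.5
print(mean_backtracking_accuracy([acc, 0.75, 0.25]))  # 0.5
```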
- In the embodiments of the present application, after steps S102 and S103 are performed, that is, after the model fusion process is completed, the predicted values of the two groups of models are obtained: the first prediction result and the second prediction result for the main indicator.
- The embodiments of the present application further apply a model correction strategy based on the mean classification backtracking accuracy, a first preset value and a second preset value, so that the predicted values of the two groups of models correct each other. This improves prediction accuracy and yields sufficiently stable and reliable predicted values for the indicator.
- Such stable and reliable predictions can help relevant departments adopt appropriate policies in response to the future trend of the indicators under study.
- The embodiments of the present application make full use of the different preferences and capabilities of the various models in extracting data features, understand the nature of the data from multiple angles and dimensions, and to a certain extent make the model interpretable.
- The electronic device may determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy as follows: when the mean classification backtracking accuracy is less than the first preset value, the electronic device takes the second prediction result as the final prediction result for the main indicator. A mean classification backtracking accuracy below the first preset value indicates that the first prediction result is not very reliable, so a final prediction result obtained by combining the first and second prediction results would also have low accuracy. To improve the accuracy of the final prediction result, the second prediction result can be used directly as the final prediction result for the main indicator.
- Alternatively, when the mean classification backtracking accuracy is greater than or equal to the first preset value and less than the second preset value, the electronic device determines, from the second number of regression results, a first regression result in the same direction as the first prediction result; determines, from the regression results in the opposite direction to the first prediction result, a second regression result closest to the first regression result; and determines the final prediction result for the main indicator according to the first regression result and the second regression result.
- Specifically, the electronic device may use the mean of the first regression result and the second regression result as the final prediction result for the main indicator.
- The first regression result is a regression result in the same direction as the first prediction result; there may be one or more such regression results.
- The second regression result is the regression result closest to the first regression result among the regression results in the opposite direction to the first prediction result.
- "Closest to the first regression result" means that, among the regression results in the opposite direction to the first prediction result, it has the smallest absolute difference from the first regression result.
- A mean classification backtracking accuracy greater than or equal to the first preset value but less than the second preset value indicates that the classification results are somewhat reliable but may still be wrong. The first regression result is therefore obtained first and then combined with the second regression result to determine the final prediction result for the main indicator, which reduces the error of the final prediction result.
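Selecting the second regression result and averaging it with the first regression result might look like this in Python. When several regression results share the direction of the first prediction result, this sketch assumes the distance is measured to the nearest of them and that all of them enter the final mean; the application leaves both points open, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def closest_opposite(first_results: Sequence[float], opposite_results: Sequence[float]) -> float:
    """Opposite-direction result with the smallest absolute difference to a first regression result."""
    if not first_results or not opposite_results:
        raise ValueError("both same-direction and opposite-direction results are required")
    return min(opposite_results, key=lambda r: min(abs(r - f) for f in first_results))


def fuse_regressions(first_results: Sequence[float], opposite_results: Sequence[float]) -> float:
    """Mean of the first regression result(s) and the second regression result."""
    second = closest_opposite(first_results, opposite_results)
    return mean([*first_results, second])


# Hypothetical values: 105.0 agrees with the first prediction; 95.0 and 99.0 disagree.
print(closest_opposite([105.0], [95.0, 99.0]))  # 99.0
print(fuse_regressions([105.0], [95.0, 99.0]))  # 102.0
```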
- the electronic device may determine the first regression result in the same direction as the first prediction result, and the regression results in the opposite direction to the first prediction result, as follows: if the first prediction result is a rise, the electronic device determines, from the second number of regression results, the regression results greater than the target value as first regression results in the same direction as the first prediction result.
- the target value refers to the value of the main indicator at the time preceding the target time; it may be the real value of the main indicator at that preceding time or the final prediction result of the main indicator for that preceding time.
- the electronic device can also determine, from the second number of regression results, the regression results smaller than the target value as regression results in the opposite direction to the first prediction result.
- if a regression result is greater than the target value, its trend is rising; since the first prediction result is also rising, a regression result greater than the target value is in the same direction as the first prediction result. If a regression result is smaller than the target value, its trend is falling, so it is in the opposite direction to the first prediction result.
- if the first prediction result is a decline, the electronic device determines, from the second number of regression results, the regression results smaller than the target value as first regression results in the same direction as the first prediction result; the electronic device may also determine, from the second number of regression results, the regression results greater than the target value as regression results in the opposite direction to the first prediction result.
- if a regression result is smaller than the target value, its trend is falling; since the first prediction result is also falling, a regression result smaller than the target value is in the same direction as the first prediction result. If a regression result is greater than the target value, its trend is rising; since the first prediction result is falling, a regression result greater than the target value is in the opposite direction to the first prediction result.
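A minimal Python sketch of the middle-band rule described above follows. It assumes the first prediction result is encoded as the hypothetical labels `"rise"` / `"decline"`, that every same-direction regression result counts as a first regression result and is averaged together with the second regression result, and that "closest" means the smallest absolute difference to any first regression result. The function name is illustrative, and the case with no same-direction result, which the text does not cover, simply raises an error.

```python
from statistics import mean


def middle_band_final(regression_results, target, direction):
    """Sketch: final prediction when the classification backtracking accuracy
    mean lies in [first preset value, second preset value).

    regression_results: outputs of the pre-trained regression models.
    target: value of the main indicator at the time before the target time.
    direction: "rise" or "decline" (hypothetical encoding of the first prediction result).
    """
    if direction == "rise":
        same = [r for r in regression_results if r > target]
        opposite = [r for r in regression_results if r < target]
    elif direction == "decline":
        same = [r for r in regression_results if r < target]
        opposite = [r for r in regression_results if r > target]
    else:
        raise ValueError("direction must be 'rise' or 'decline'")

    if not same:
        # Not covered by the description; treated as an error in this sketch.
        raise ValueError("no regression result agrees with the classification")

    candidates = list(same)  # first regression result(s)
    if opposite:
        # Second regression result (assumed reading): the opposite-direction result
        # with the smallest absolute difference to any first regression result.
        second = min(opposite, key=lambda r: min(abs(r - f) for f in same))
        candidates.append(second)
    # Final prediction: mean of the first regression result(s) and the second one.
    return mean(candidates)
```

With results 10.5, 11.0, 9.8 and 9.0 around a target of 10.0 and a predicted rise, the first regression results are 10.5 and 11.0, the closest opposite result is 9.8, and the sketch returns their mean.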
- the process by which the electronic device determines the final prediction result for the main indicator according to the first prediction result, the second prediction result and the classification backtracking accuracy mean may also be as follows: when the classification backtracking accuracy mean is greater than or equal to the second preset value, if the classification backtracking accuracy mean is less than or equal to the regression backtracking accuracy mean, the electronic device uses the second prediction result as the final prediction result of the main indicator.
- the regression backtracking accuracy mean is the mean of the regression backtracking accuracies of the pre-trained regression models in the second number of pre-trained regression models.
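Assuming the per-model backtracking accuracies are available as plain lists of floats, the rule stated above for the upper band could be sketched as follows. `top_band_rule` is a hypothetical name; returning `None` only signals that the stated condition does not hold, because the remaining branches are described separately and are not modelled here.

```python
from statistics import mean
from typing import Optional, Sequence


def top_band_rule(cls_accuracies: Sequence[float],
                  reg_accuracies: Sequence[float],
                  second_prediction: float,
                  second_preset: float) -> Optional[float]:
    """Return the second prediction result when the stated rule applies, else None."""
    cls_mean = mean(cls_accuracies)  # classification backtracking accuracy mean
    reg_mean = mean(reg_accuracies)  # regression backtracking accuracy mean
    if cls_mean >= second_preset and cls_mean <= reg_mean:
        # Regression models backtrack at least as well: use the mean regression output.
        return second_prediction
    # Other cases are handled by other branches (not modelled in this sketch).
    return None
```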
- the electronic device may determine the final prediction result of the main indicator according to the first regression result by using the mean of the first regression result(s) as the final prediction result for the main indicator.
- the electronic device may also mirror-fit, according to the regression backtracking accuracy of each pre-trained regression model in the second number of pre-trained regression models, a value in the same direction as the first prediction result, and use it as the final prediction result of the main indicator.
- the mirror fitting may be performed as follows: the electronic device weights the regression result of each pre-trained regression model in the second number of pre-trained regression models by that model's regression backtracking accuracy to obtain multiple weighted results, calculates the mean of the weighted results, calculates the absolute value of the difference between this weighted mean and the target value, and determines the sum of the target value and that absolute difference as the final prediction result of the main indicator.
- by calculating the final prediction result of the main indicator through mirror fitting, the present application improves the accuracy of the final prediction result of the main indicator.
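The mirror-fitting step can be sketched in Python as below. This is one reading of the text, not the applicant's implementation: the "weighting processing" is interpreted as an accuracy-weighted mean of the regression results, and since the text only states the rising case (target value plus the absolute difference), the decline branch, which subtracts the difference so the result stays in the direction of the first prediction result, is an assumption.

```python
from typing import Sequence


def mirror_fit(reg_accuracies: Sequence[float],
               regression_results: Sequence[float],
               target: float,
               direction: str) -> float:
    """Mirror the weighted regression mean across the target into the predicted direction."""
    if not regression_results or len(reg_accuracies) != len(regression_results):
        raise ValueError("need one backtracking accuracy per regression result")
    # Assumed interpretation: accuracy-weighted mean of the regression results.
    weighted_mean = (sum(a * r for a, r in zip(reg_accuracies, regression_results))
                     / sum(reg_accuracies))
    gap = abs(weighted_mean - target)
    if direction == "rise":
        return target + gap  # as described: target value plus the absolute difference
    if direction == "decline":
        return target - gap  # assumed symmetric counterpart for a predicted decline
    raise ValueError("direction must be 'rise' or 'decline'")
```

With equal weights, regression results of 9.0 and 9.5 around a target of 10.0 give a weighted mean of 9.25, below the target; for a predicted rise, the sketch mirrors it to 10.75.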
- the electronic device can use each pre-trained classification model in the first number of pre-trained classification models to classify the first value of the specified leading indicator, obtaining a first number of classification results, and use the classification result that occurs most often among the first number of classification results as the first prediction result for the main indicator. The electronic device may also use each of the second number of pre-trained regression models to perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable, obtaining a second number of regression results, and use the mean of the second number of regression results as the second prediction result for the main indicator. The electronic device then determines the final prediction result of the main indicator according to the first prediction result, the second prediction result and the classification backtracking accuracy mean; integrating different models in this way improves prediction accuracy in the statistical measurement process.
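The two intermediate predictions summarized above (a majority vote over the classification results and the mean of the regression results) could look like this in Python. The functions take model outputs rather than models, their names are illustrative, and the tie-breaking rule (the label encountered first wins, which is how `collections.Counter.most_common` orders equal counts) is an assumption the text does not address.

```python
from collections import Counter
from statistics import mean


def first_prediction(classification_results):
    """Most frequent classification result; ties go to the label seen first."""
    return Counter(classification_results).most_common(1)[0][0]


def second_prediction(regression_results):
    """Mean of the regression results."""
    return mean(regression_results)
```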
- FIG. 2 is a schematic flowchart of another method for predicting an indicator according to an embodiment of the present application.
- the method can be applied to an electronic device, and the electronic device can be a server or a user terminal.
- the server may be a single server or a server cluster.
- the user terminal may be an intelligent terminal such as a notebook computer or a desktop computer.
- the method may include the following steps:
- the electronic device may combine each leading indicator in the target leading indicator and each explanatory variable in the target explanatory variable by adopting a method such as stepwise regression or single factor screening to obtain multiple combined results.
- the target leading indicator may be determined from a plurality of leading indicators associated with the main indicator. In one embodiment, the target leading indicator may be determined from a plurality of leading indicators associated with the main indicator according to the first time series data of the main indicator.
- the target explanatory variable can be determined from multiple explanatory variables associated with the main indicator. In one embodiment, the target explanatory variable may be determined from a plurality of explanatory variables associated with the main indicator according to the first time series data of the main indicator.
- the first time series data is the time series data of the main indicator.
- the first time series data may include the real value of the main indicator at each time within the first time range.
- an empirical estimation method may also be used to obtain an estimated value of the main indicator at a given time.
- the first time series data can reflect the change of the main indicator over time.
- the time unit used for the main indicator here may be a time unit such as a year, a quarter, a month, a day, etc., depending on the actual application scenario, which is not limited in this embodiment of the present application.
- for example, if the time unit is a month, the first time series data may include the real value of the main indicator in December 2020, the real value of the main indicator in November 2020...
- if the time unit is a day, the first time series data may include the true value of the main indicator on December 10, 2020, the true value of the main indicator on December 11, 2020, the true value of the main indicator on December 12, 2020...
- the target leading indicator may include the specified leading indicators mentioned in the embodiments of this application.
- the target explanatory variables may include the specified explanatory variables mentioned in the embodiments of the present application.
- the plurality of combined results may include at least one first combined result; a first combined result is a combined result composed of at least one leading indicator, and each first combined result is different.
- the target leading indicator includes leading indicator 1 and leading indicator 2.
- the first combined result may include combined result 1, combined result 2, and combined result 3.
- combined result 1 includes leading indicator 1, combined result 2 includes leading indicator 2, and combined result 3 includes leading indicator 1 and leading indicator 2.
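To illustrate what the combined results look like, the sketch below enumerates every non-empty combination of leading indicators (first combined results) and, for second combined results, every pairing of a non-empty set of leading indicators with a non-empty set of explanatory variables. Exhaustive enumeration is a simplification swapped in for illustration; the text mentions stepwise regression or single-factor screening, which would prune this set. The function names are hypothetical.

```python
from itertools import combinations


def first_combinations(leading_indicators):
    """Every non-empty combination of leading indicators, smallest first."""
    result = []
    for size in range(1, len(leading_indicators) + 1):
        result.extend(combinations(leading_indicators, size))
    return result


def second_combinations(leading_indicators, explanatory_variables):
    """Every pairing of a non-empty leading-indicator set with a non-empty explanatory-variable set."""
    return [leads + variables
            for leads in first_combinations(leading_indicators)
            for variables in first_combinations(explanatory_variables)]
```

For leading indicators 1 and 2, `first_combinations` yields exactly combined results 1, 2 and 3 from the example.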
- the target leading indicator and the target explanatory variable may be determined as follows: the electronic device obtains the first time series data of the main indicator, the second time series data of each of the plurality of leading indicators associated with the main indicator, and the third time series data of each of the plurality of explanatory variables associated with the main indicator. The electronic device determines, in each second time series data, the data associated with each data item in the first time series data; calculates the correlation coefficient between each leading indicator and the main indicator from each data item and its associated data in each second time series data; determines the target leading indicator from the plurality of leading indicators according to these correlation coefficients; and determines the number of target leading periods corresponding to each leading indicator in the target leading indicator. The electronic device likewise determines, in each third time series data, the data associated with each data item in the first time series data; calculates the correlation coefficient between each explanatory variable and the main indicator from each data item and its associated data in each third time series data; and determines the target explanatory variable from the plurality of explanatory variables according to these correlation coefficients.
- the target classification model is any pre-trained classification model among the first number of pre-trained classification models.
- the second time series data refers to the time series data of the leading indicator.
- the second time series data may include the real values of the leading indicator at each time within the second time range.
- an empirical estimation method may also be used to obtain an estimated value of the leading indicator at a given time.
- the second time series data can reflect the change of the leading indicator over time.
- the second time range may be determined based on the first time range.
- the third time series data refers to time series data of explanatory variables.
- the third time series data may include the true values of the explanatory variable at each time within the third time range.
- an empirical estimation method may also be used to obtain an estimated value of the explanatory variable at a given time.
- the third time series data can reflect the change of the explanatory variable over time.
- the third time range may be determined based on the first time range.
- the electronic device may determine, in each second time series data, the data associated with each data item in the first time series data as follows: the electronic device shifts each second time series data by the preset number of leading periods and uses the data item in the shifted series that corresponds to each data item in the first time series data as the associated data.
- for example, the first time series data includes the real value of the main indicator in December 2020, the real value of the main indicator in November 2020...
- the plurality of leading indicators includes leading indicator 1.
- the second time series data of leading indicator 1 includes the real value of leading indicator 1 in December 2020, the real value of leading indicator 1 in November 2020, the real value of leading indicator 1 in October 2020...
- if the preset number of leading periods is one leading period and one period is one month, the electronic device can determine that, in the shifted second time series data of leading indicator 1, the real value of the main indicator in December 2020 corresponds to the real value of leading indicator 1 in November 2020, and take the latter as the data in the second time series of leading indicator 1 associated with the real value of the main indicator in December 2020. Likewise, the real value of the main indicator in November 2020 corresponds to the real value of leading indicator 1 in October 2020, which is taken as its associated data, and so on.
- if the preset number of leading periods is two leading periods and one period is one month, the electronic device can determine the real value of leading indicator 1 in October 2020 as the data in the second time series of leading indicator 1 associated with the real value of the main indicator in December 2020, and the real value of leading indicator 1 in September 2020 as the data associated with the real value of the main indicator in November 2020, and so on.
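The association by leading periods can be sketched as a simple shift between two series on the same monthly grid, ordered oldest first. The function name and the list representation are assumptions for illustration; a lead of 0 reproduces the same-time pairing used for explanatory variables.

```python
def align_with_lead(main_series, indicator_series, lead_periods):
    """Pair main_series[t] with indicator_series[t - lead_periods].

    Both series are assumed to be oldest first and on the same time grid.
    """
    if lead_periods < 0:
        raise ValueError("lead_periods must be non-negative")
    if len(main_series) != len(indicator_series):
        raise ValueError("series must cover the same times")
    return [(main_series[t], indicator_series[t - lead_periods])
            for t in range(lead_periods, len(main_series))]
```

With monthly series from September to December 2020 and a lead of 1, the main indicator's December 2020 value is paired with the leading indicator's November 2020 value; with a lead of 2, it is paired with October 2020, matching the example above.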
- the electronic device may calculate the correlation coefficient between each leading indicator and the main indicator from each data item and its associated data in each second time series data as follows: for any one of the plurality of leading indicators, the electronic device calculates the product of each data item in the first time series data and its associated data in that leading indicator's second time series data, calculates the absolute value of the proportion of same-direction products among all products, and selects the largest such absolute proportion as the correlation coefficient between that leading indicator and the main indicator. Repeating this for every leading indicator yields the correlation coefficient between each leading indicator and the main indicator. The process is described below with reference to Table 1.
- Table 1 (one leading period, one period = one month) pairs each real value of the main indicator with the associated real value of leading indicator 1:

| Main indicator month | Main indicator real value | Associated leading indicator 1 month | Leading indicator 1 real value |
|---|---|---|---|
| December 2020 | 1 | November 2020 | 1 |
| November 2020 | -1 | October 2020 | 1 |
| October 2020 | 1 | September 2020 | -1 |
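One reading of the correlation coefficient described above, assuming the data are direction values (1 for up, -1 for down) as in Table 1, is the larger of the shares of positive and negative products. The sketch below implements that interpretation, which the text does not confirm; the function name is hypothetical.

```python
def direction_agreement_coefficient(main_values, aligned_indicator_values):
    """Larger of the shares of positive and negative products (assumed reading)."""
    products = [m * x for m, x in zip(main_values, aligned_indicator_values)]
    if not products:
        raise ValueError("no aligned data")
    positive_share = sum(1 for p in products if p > 0) / len(products)
    negative_share = sum(1 for p in products if p < 0) / len(products)
    return max(positive_share, negative_share)
```

For the Table 1 values (main indicator 1, -1, 1 against leading indicator 1 values 1, 1, -1), the products are 1, -1 and -1, so the sketch returns 2/3.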
- for each leading indicator, one correlation coefficient with the main indicator can be calculated per preset number of leading periods. If there is a single preset number of leading periods (for example, one leading period), one correlation coefficient is calculated for each leading indicator; if there are multiple preset numbers of leading periods, multiple correlation coefficients are calculated for each leading indicator. For example, if the preset numbers of leading periods range from a lead of 1 period to a lead of 12 periods (inclusive), 12 correlation coefficients between each leading indicator and the main indicator can be calculated.
- the electronic device may determine the target leading indicator from the plurality of leading indicators according to the correlation coefficient between each leading indicator and the main indicator as follows: the electronic device obtains a first correlation coefficient and determines, from the plurality of leading indicators, each leading indicator whose corresponding correlation coefficient is greater than or equal to the first correlation coefficient as a target leading indicator.
- the electronic device may determine the number of target leading periods corresponding to the target leading indicator as follows: if a first leading indicator in the target leading indicator has one correlation coefficient with the main indicator, the preset number of leading periods corresponding to that correlation coefficient is used as the number of target leading periods of the first leading indicator.
- if a second leading indicator in the target leading indicator has multiple correlation coefficients with the main indicator, a particular correlation coefficient (such as the maximum) can be determined from them, and the preset number of leading periods corresponding to it is used as the number of target leading periods of the second leading indicator.
- the first leading indicator refers to a leading indicator that has one correlation coefficient with the main indicator.
- the second leading indicator refers to a leading indicator that has multiple correlation coefficients with the main indicator.
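A sketch of selecting the target leading indicators and their numbers of target leading periods follows. It assumes the correlation coefficients have already been computed per number of leading periods, takes the maximum as the "particular" coefficient (the example the text gives), and applies the first correlation coefficient as a threshold to that maximum. The dictionary layout and names are hypothetical.

```python
def select_target_leading(coefficients, first_threshold):
    """coefficients: {indicator: {lead_periods: correlation coefficient}}.

    Returns {target leading indicator: number of target leading periods}.
    """
    selected = {}
    for indicator, by_lead in coefficients.items():
        # Lead with the largest coefficient; with a single lead this is that lead.
        best_lead = max(by_lead, key=by_lead.get)
        if by_lead[best_lead] >= first_threshold:
            selected[indicator] = best_lead
    return selected
```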
- the electronic device may determine, in each third time series data, the data associated with each data item in the first time series data as follows: the electronic device uses the data item in each third time series data that corresponds to the same time as the data item in the first time series data as the associated data.
- the explanatory variables do not need to be shifted by a number of leading periods; equivalently, they can be understood as being shifted by a lead of 0 periods.
- the first time series data includes the true value of the main indicator in December 2020, the true value of the main indicator in November 2020...
- the third time series data of explanatory variable 1 includes the true value of explanatory variable 1 in December 2020, the true value of explanatory variable 1 in November 2020...
- the electronic device can determine that the true value of the main indicator in December 2020 corresponds to the true value of explanatory variable 1 in December 2020, and use the latter as the data in the third time series data of explanatory variable 1 associated with the true value of the main indicator in December 2020.
- the electronic device can also determine that the true value of the main indicator in November 2020 corresponds to the true value of explanatory variable 1 in November 2020, and use the latter as the associated data, and so on by analogy.
- the electronic device may calculate the correlation coefficient between each explanatory variable and the main indicator using the Pearson correlation coefficient, which is prior art and is not described further in this embodiment of the present application.
- the electronic device may determine the target explanatory variable from the plurality of explanatory variables according to the correlation coefficient between each explanatory variable and the main indicator as follows: each explanatory variable whose correlation coefficient is greater than or equal to a second correlation coefficient is determined as a target explanatory variable.
- the second correlation coefficient here may be the same as or different from the first correlation coefficient.
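For explanatory variables, the text uses the Pearson correlation coefficient with a second threshold. A self-contained sketch using only the standard library is shown below; the names and the threshold used in the usage are illustrative. It compares the signed coefficient, as the text states, and a constant series (zero variance) raises a `ZeroDivisionError`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_target_explanatory(main_series, variables, second_threshold):
    """Names of explanatory variables whose coefficient is >= second_threshold.

    Series are aligned at the same times (a lead of 0 periods).
    """
    return [name for name, series in variables.items()
            if pearson(main_series, series) >= second_threshold]
```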
- the electronic device may perform prediction processing according to the second time series data of each leading indicator in each first combination result as follows: the electronic device determines the number of target leading periods corresponding to each leading indicator in each first combination result, and performs staggered-period processing on the second time series data of each leading indicator according to its number of target leading periods.
- the electronic device then performs prediction processing on the staggered time series data of each leading indicator in each first combination result.
- the staggered period processing referred to here is a process of staggering the leading indicator and the main indicator in time according to the corresponding target leading period number.
- before staggered-period processing, the real value of leading indicator 1 in December 2020 may correspond to the value of the main indicator in December 2020, the real value of leading indicator 1 in November 2020 to the value of the main indicator in November 2020, and so on.
- after staggered-period processing with one target leading period, the real value of leading indicator 1 in November 2020 corresponds to the value of the main indicator in December 2020, the real value of leading indicator 1 in October 2020 corresponds to the value of the main indicator in November 2020, and so on. The staggered-period processing therefore makes the data of the leading indicator at each time correspond correctly to the data of the main indicator at each time.
- after performing prediction processing on the staggered time series data of each leading indicator in each first combination result, the electronic device obtains, for each first combination result, a first prediction result set for the main indicator, and can then calculate the prediction accuracy of each first combination result from that prediction result set and the real value set of the main indicator. For example, the electronic device may count, using the real value set of the main indicator, the proportion of correct predictions in the first prediction result set corresponding to a first combination result, and determine the prediction accuracy of that first combination result according to the proportion, for example by taking the proportion itself as the prediction accuracy.
- for example, the first time series data of the main indicator includes the real value of the main indicator in December 2020, the real value of the main indicator in November 2020...
- combination result 3 includes leading indicator 1 and leading indicator 2.
- the second time series data of leading indicator 1 includes the real value of leading indicator 1 in December 2020, the real value of leading indicator 1 in November 2020, the real value of leading indicator 1 in October 2020...; the second time series data of leading indicator 2 includes the real value of leading indicator 2 in December 2020, the real value of leading indicator 2 in November 2020, the real value of leading indicator 2 in October 2020...
- if the number of target leading periods of leading indicator 1 is one, staggered-period processing of its second time series data yields the staggered time series data of leading indicator 1, in which the real value of leading indicator 1 in November 2020 corresponds to the value of the main indicator in December 2020 (including the real value and the predicted value of the main indicator in December 2020; the real value of leading indicator 1 in November 2020 can be used to determine the predicted value of the main indicator in December 2020), the real value of leading indicator 1 in October 2020 corresponds to the value of the main indicator in November 2020 (including the real value and the predicted value of the main indicator in November 2020), and so on.
- if the number of target leading periods of leading indicator 2 is two, staggered-period processing of its second time series data yields the staggered time series data of leading indicator 2, in which the real value of leading indicator 2 in October 2020 corresponds to the value of the main indicator in December 2020, the real value of leading indicator 2 in September 2020 corresponds to the value of the main indicator in November 2020 (including the real value and the predicted value of the main indicator in November 2020), and so on.
- prediction processing can then be performed on the staggered time series data of leading indicator 1 and leading indicator 2 to obtain a prediction result set for the main indicator corresponding to combination result 3.
- the prediction result set includes the prediction result of the main indicator for December 2020, the prediction result of the main indicator for November 2020..., and the prediction accuracy corresponding to combination result 3 can then be determined in combination with the real value set of the main indicator.
- the real value set of the main indicator includes the real value of the main indicator in December 2020, the real value of the main indicator in November 2020... In one example, to determine the prediction accuracy corresponding to combination result 3, the electronic device can count, according to the prediction result set for the main indicator corresponding to combination result 3 and the real value set of the main indicator, the proportion of correct predictions in that prediction result set, and then determine the prediction accuracy of combination result 3 according to the proportion; for example, the proportion itself can be taken as the prediction accuracy of combination result 3.
- according to the prediction accuracy of each first combination result, a first combination result that meets a first preset condition is determined from the at least one first combination result.
- the electronic device may, according to the prediction accuracy of each first combination result, determine the first combination result with the highest prediction accuracy from at least one first combination result as the first combination result that meets the first preset condition .
- the first combination result that meets the first preset condition is the first combination result that gives the target classification model its highest prediction accuracy. In this way, for each pre-trained classification model, the combination of leading indicators that achieves that model's highest prediction accuracy can be determined.
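The prediction accuracy of a combination result, and the choice of the combination that meets the first preset condition (highest accuracy), could be sketched as follows. It assumes predictions and real values are aligned lists of labels; the names are hypothetical, and ties are resolved by dictionary order, which the text does not specify.

```python
def prediction_accuracy(predictions, actuals):
    """Share of predictions that equal the corresponding real values."""
    if not actuals or len(predictions) != len(actuals):
        raise ValueError("predictions and real values must be non-empty and aligned")
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(actuals)


def best_combination(predictions_by_combination, actuals):
    """Combination result whose prediction set has the highest accuracy."""
    return max(predictions_by_combination,
               key=lambda combo: prediction_accuracy(predictions_by_combination[combo], actuals))
```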
- S204 Use the target classification model to perform S-period backtracking verification on the first combination result that meets the first preset condition, to obtain S backtracking accuracies, where S is an integer greater than or equal to 1.
- S205 Determine the mean of the S backtracking accuracies as the classification backtracking accuracy of the target classification model.
- in steps S204 to S205, the electronic device can use the target classification model to perform S-period backtracking verification on the first combination result that meets the first preset condition, obtain S backtracking accuracies, and determine the mean of the S backtracking accuracies as the classification backtracking accuracy of the target classification model.
- the S-period backtracking may be, for example, a 7-period monthly backtracking (one period is one month) or a 4-period quarterly backtracking (one period is one quarter), which is not limited in this embodiment of the present application.
- the purpose of backtracking is to backtrack the data of the past period to calculate the backtracking accuracy.
- the following illustrates the process of using the target classification model in the first number of pre-trained classification models to perform S-period backtracking verification on the first combination result that meets the first preset condition to obtain S backtracking accuracies.
- the first combined result includes the combined result 3, and the combined result 3 includes the leading indicator 1 and the leading indicator 2.
- the number of target leading periods corresponding to leading indicator 1 is one (a lead of 1 period).
- the number of target leading periods corresponding to leading indicator 2 is two (a lead of 2 periods).
- if the S-period backtracking is a 7-period monthly backtracking, the electronic device can use the target classification model to perform prediction processing on the real value of leading indicator 1 in November 2020 and the real value of leading indicator 2 in October 2020 to obtain the prediction result of the main indicator for December 2020, and then calculate the first prediction accuracy from that prediction result and the real value of the main indicator in December 2020, as the first backtracking accuracy.
- the electronic device can also use the target classification model to perform prediction processing on the real value of leading indicator 1 in October 2020 and the real value of leading indicator 2 in September 2020 to obtain the prediction result of the main indicator for November 2020, and then calculate the second prediction accuracy from that prediction result and the real value of the main indicator in November 2020, as the second backtracking accuracy. Continuing in this way, the target classification model predicts the main indicator for June 2020 from the real value of leading indicator 1 in May 2020 and the real value of leading indicator 2 in April 2020, and the seventh prediction accuracy, calculated against the real value of the main indicator in June 2020, is taken as the seventh backtracking accuracy.
- in this way, 7 backtracking accuracies are calculated through the target classification model, and their mean is determined as the classification backtracking accuracy of the target classification model.
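The backtracking loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each series is a plain list indexed by period with the most recent period last, every input is read `lead` periods before the target period, and `predict` and `accuracy` are hypothetical caller-supplied callables (for a classification model, `accuracy` might simply score a hit as 1 and a miss as 0).

```python
from statistics import mean
from typing import Any, Callable, Dict, Sequence


def backtrack_accuracy(
    predict: Callable[[Dict[str, Any]], Any],   # hypothetical model wrapper
    main: Sequence[Any],                        # real values of the main indicator
    inputs: Dict[str, Sequence[Any]],           # real values of each input series
    leads: Dict[str, int],                      # lead period count per input
    periods: int,                               # S (or T): number of periods to trace back
    accuracy: Callable[[Any, Any], float],      # hypothetical per-period accuracy
) -> float:
    """Mean accuracy of `predict` over the last `periods` periods of `main`.

    For target period t, input `name` is read at index t - leads[name].
    """
    if not 1 <= periods <= len(main):
        raise ValueError("periods must be between 1 and len(main)")
    last = len(main) - 1
    scores = []
    for k in range(periods):
        t = last - k  # latest period first, then step back one period at a time
        if any(t - leads[name] < 0 for name in inputs):
            raise ValueError("not enough history for the requested number of periods")
        features = {name: inputs[name][t - leads[name]] for name in inputs}
        scores.append(accuracy(predict(features), main[t]))
    # The mean of the per-period backtracking accuracies.
    return mean(scores)
```

For combination result 3, `leads` would be `{"lead1": 1, "lead2": 2}` with `periods=7`. For the regression backtracking of combination result 4, a same-period explanatory variable can be passed with a lead of 0, e.g. `{"lead1": 1, "expl1": 0}`.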
- when the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, each pre-trained classification model in the first number of pre-trained classification models performs classification processing on the first value of the specified leading indicator to obtain a first number of classification results, and the classification result that occurs most often among them is used as the first prediction result for the main indicator.
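Choosing the classification result with the largest number of occurrences is a majority vote. A minimal Python sketch follows; the tie-breaking rule (first result encountered wins, which is how `collections.Counter.most_common` orders equal counts) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the most frequent classification result.

    Assumption: ties go to the result encountered first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not results:
        raise ValueError("need at least one classification result")
    return Counter(results).most_common(1)[0][0]
```

For example, with three classifiers returning `["up", "down", "up"]` (the labels are hypothetical), the first prediction result is `"up"`.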
- S209. Determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy, where the mean classification backtracking accuracy is the mean of the classification backtracking accuracies of the pre-trained classification models in the first number of pre-trained classification models.
- for steps S206-S209, reference may be made to steps S101-S104 in the embodiment of FIG. 1; details are not repeated in this embodiment of the present application.
- the plurality of combination results may further include at least one second combination result.
- a second combination result is a combination of at least one leading indicator and at least one explanatory variable, and the second combination results differ from one another.
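One way to produce the first combination results (leading indicators only) and the second combination results (at least one leading indicator plus at least one explanatory variable) is exhaustive enumeration of non-empty subsets. The Python sketch below assumes that; the text does not say whether every subset is generated, and the indicator names are placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def non_empty_subsets(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of `items`, smallest first."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def build_combinations(leading: Sequence[str], explanatory: Sequence[str]):
    # First combination results: one or more leading indicators only.
    first = non_empty_subsets(leading)
    # Second combination results: >= 1 leading indicator and >= 1 explanatory variable.
    second = [
        (lead_set, expl_set)
        for lead_set in first
        for expl_set in non_empty_subsets(explanatory)
    ]
    return first, second
```

The number of combinations grows as 2^k - 1 per group, so in practice the target indicators would first be narrowed down (for example by correlation) before enumerating.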
- the electronic device can use the target regression model in the second number of pre-trained regression models to perform prediction processing according to the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result, obtaining the prediction accuracy of each second combination result.
- the target regression model is any pre-trained regression model in the second number of pre-trained regression models.
- the electronic device determines, according to the prediction accuracy of each second combination result, a second combination result that meets the second preset condition from the at least one second combination result, and uses the target regression model in the second number of pre-trained regression models to perform T-period backtracking verification on that second combination result, obtaining T backtracking accuracies, where T is an integer greater than or equal to 1.
- the mean calculated from the T backtracking accuracies is determined as the regression backtracking accuracy of the target regression model.
- T may be the same as S, or it may be different.
- the T-period backtracking may cover 7 monthly periods (1 period = 1 month) or 4 quarterly periods (1 period = 1 quarter), etc., which is not limited in this embodiment of the present application.
- for example, the second combination result includes combination result 4, which consists of leading indicator 1 and explanatory variable 1.
- the target lead period count of leading indicator 1 is 1 period (1 period = 1 month), and the T-period backtracking covers 7 months (T = 7).
- the electronic device can use the target regression model to perform prediction processing on the real value of leading indicator 1 in November 2020 and the real value of explanatory variable 1 in December 2020 to obtain the prediction result of the main indicator for December 2020, and then calculate the first prediction accuracy from that prediction result and the real value of the main indicator in December 2020 as the first backtracking accuracy.
- the electronic device can also use the target regression model to perform prediction processing on the real value of leading indicator 1 in October 2020 and the real value of explanatory variable 1 in November 2020 to obtain the prediction result of the main indicator for November 2020, and then calculate the second prediction accuracy from that prediction result and the real value of the main indicator in November 2020 as the second backtracking accuracy; and so on, until the target regression model performs prediction processing on the real value of leading indicator 1 in May 2020 and the real value of explanatory variable 1 in June 2020 to obtain the prediction result of the main indicator for June 2020, from which, together with the real value of the main indicator in June 2020, the 7th prediction accuracy is calculated as the 7th backtracking accuracy.
- the mean of these 7 backtracking accuracies is determined as the regression backtracking accuracy of the target regression model.
- after the electronic device uses the target regression model in the second number of pre-trained regression models to perform prediction processing on the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result, a second prediction result set for the main indicator is obtained for each second combination result.
- the prediction accuracy of each second combination result is then calculated from its second prediction result set and the real value set of the main indicator.
- the second prediction result set refers to the prediction result set for the main indicator corresponding to the second combined result.
- the aforementioned first prediction result set refers to the prediction result set for the main indicator corresponding to the first combined result.
- specifically, the electronic device can apply the accuracy calculation formula to the second prediction result set of each second combination result and the real value set of the main indicator to obtain a prediction accuracy set for each second combination result, and then derive the prediction accuracy of each second combination result from that set.
- for example, the mean of the prediction accuracy set of each second combination result can be used as the prediction accuracy of that second combination result.
- the accuracy calculation formula is 1 - abs(predicted value - true value) / true value, where abs denotes the absolute value.
- according to the prediction accuracy of each second combination result, the electronic device may determine the second combination result with the highest prediction accuracy from the at least one second combination result as the second combination result that meets the second preset condition.
- that is, the second combination result that meets the second preset condition is the one with which the target regression model achieves its highest prediction accuracy.
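The accuracy formula and the "highest prediction accuracy" selection can be sketched in Python as below. The formula follows the text literally (it divides by the true value, so a true value of zero is rejected); averaging the per-point accuracies follows the example given for the prediction accuracy set. The dictionary keys used for combinations are placeholders.

```python
from statistics import mean
from typing import Dict, Hashable, Sequence


def point_accuracy(predicted: float, true: float) -> float:
    """1 - abs(predicted - true) / true, as stated in the text."""
    if true == 0:
        raise ValueError("true value must be non-zero")
    return 1 - abs(predicted - true) / true


def combination_accuracy(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Mean of the prediction accuracy set of one combination result."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need equally long, non-empty sequences")
    return mean(point_accuracy(p, t) for p, t in zip(predicted, true))


def best_combination(
    prediction_sets: Dict[Hashable, Sequence[float]], true: Sequence[float]
) -> Hashable:
    """Key of the combination result with the highest prediction accuracy."""
    return max(prediction_sets, key=lambda k: combination_accuracy(prediction_sets[k], true))
```

For instance, predictions of 90 and 110 against true values of 100 and 100 each score 0.9, so that combination's prediction accuracy is 0.9.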
- in this way, the combination of leading indicators and explanatory variables with which the pre-trained regression model achieves the highest prediction accuracy can be determined. The process by which the target regression model in the second number of pre-trained regression models performs T-period backtracking verification on the second combination result that meets the second preset condition to obtain T backtracking accuracies is as illustrated with combination result 4.
- the electronic device may also train the target classification model with the S-period data of each leading indicator in the first combination result that meets the first preset condition, so as to optimize the target classification model.
- the S-period data of a leading indicator may be the S periods of data traced back during the backtracking verification described above, or S periods of data traced back at other times, and so on.
- for example, the first combination result that meets the first preset condition includes combination result 3, which consists of leading indicator 1 and leading indicator 2; the lead period count of leading indicator 1 is 1 period, that of leading indicator 2 is 2 periods, and 1 period is 1 month.
- with S = 7, the S-period data of leading indicator 1 can include its real values from November 2020 back to May 2020, and the S-period data of leading indicator 2 can include its real values from October 2020 back to April 2020.
- the electronic device can use the real value of the main indicator in December 2020, the real value of the leading indicator 1 in November 2020, and the real value of the leading indicator 2 in October 2020 to train the target classification model, and so on.
- the target classification model can also be trained with the true value of the main indicator in June 2020, the true value of the leading indicator 1 in May 2020, and the true value of the leading indicator 2 in April 2020.
- the electronic device may also train the target regression model with the T-period data of each leading indicator and each explanatory variable in the second combination result that meets the second preset condition, so as to optimize the target regression model.
- the T-period data of a leading indicator may be the T periods of data traced back during the backtracking verification described above, or T periods of data traced back at other times, and so on.
- likewise, the T-period data of an explanatory variable may be the T periods of data traced back during the backtracking verification described above, or T periods of data traced back at other times, and so on.
- the second combination result satisfying the second preset condition includes combination result 4, which includes leading indicator 1 and explanatory variable 1.
- the lead period count of leading indicator 1 is 1 period, and 1 period is 1 month. Assuming T = 7, the T-period data of leading indicator 1 can include its real values from November 2020 back to May 2020, and the T-period data of explanatory variable 1 can include its real values from December 2020 back to June 2020.
- the electronic device can use the true value of the main indicator in December 2020, the true value of the leading indicator 1 in November 2020, and the true value of the explanatory variable 1 in December 2020 to train the target regression model, and so on.
- the target regression model can also be trained with the real value of the main indicator in June 2020, the real value of leading indicator 1 in May 2020, and the real value of explanatory variable 1 in June 2020.
- the aforementioned first number of pre-trained classification models and second number of pre-trained regression models may be determined as follows:
- the first data set includes the first time series data of the main indicator and the lag-shifted (staggered) time series data of each leading indicator in the target leading indicators.
- the first data set may be represented as {(x1, y1), ..., (xn, yn)}.
- x represents the target leading indicator.
- y represents the main indicator.
- the lag-shifted time series data of each leading indicator in the target leading indicators is obtained as follows: the electronic device staggers the second time series data of each target leading indicator by the target lead period count corresponding to that leading indicator.
- the second data set includes the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables.
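Staggering a leading indicator by its target lead period count and pairing it with the main indicator yields the (x, y) samples of the first data set. The Python sketch below assumes equally long, period-aligned lists and drops the earliest periods for which some shifted input does not exist yet; both choices are assumptions, not details from the text.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def stagger(series: Sequence[float], lead: int) -> List[Optional[float]]:
    """Shift `series` forward by `lead` periods: position t holds series[t - lead]."""
    if lead < 0:
        raise ValueError("lead must be >= 0")
    return [None] * lead + list(series[: len(series) - lead])


def build_dataset(
    main: Sequence[float],
    leading: Dict[str, Sequence[float]],
    leads: Dict[str, int],
) -> List[Tuple[Tuple[float, ...], float]]:
    """Samples (x_t, y_t): x_t holds each leading indicator `lead` periods earlier."""
    shifted = {name: stagger(series, leads[name]) for name, series in leading.items()}
    start = max(leads[name] for name in leading)  # first period where every input exists
    return [
        (tuple(shifted[name][t] for name in leading), main[t])
        for t in range(start, len(main))
    ]
```

With leading indicator 1 at a lead of 1 and leading indicator 2 at a lead of 2, each sample pairs the main indicator in month t with indicator 1 from month t-1 and indicator 2 from month t-2, matching the backtracking example.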
- the second data set may be represented as {((x1, z1), y1), ..., ((xn, zn), yn)}.
- z represents the target explanatory variable.
- for example, with M = 5, the electronic device may input the first training set into five initial classification models and train them to obtain five pre-trained classification models.
- similarly, with N = 5, the electronic device may input the second training set into five initial regression models and train them to obtain five pre-trained regression models.
- a first number of pre-trained classification models are determined from the M pre-trained classification models.
- the electronic device may determine, from the M pre-trained classification models, a first number of classification models with the highest prediction accuracy for the first validation set.
- for example, the electronic device may sort the M pre-trained classification models in descending order of prediction accuracy on the first validation set and select the first number of top-ranked models.
- a second preset number of pre-trained regression models are determined from the N pre-trained regression models.
- the electronic device may determine a second number of regression models with the highest prediction accuracy for the second validation set from the N pre-trained regression models.
- for example, the electronic device may sort the N pre-trained regression models in descending order of prediction accuracy on the second validation set and select the second number of top-ranked models.
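Sorting the trained models by validation accuracy in descending order and keeping the top k can be sketched in Python as follows. The model names and accuracy values are placeholders; how the validation accuracy itself is computed is left to the caller.

```python
from typing import Dict, List


def select_top_models(val_accuracy: Dict[str, float], k: int) -> List[str]:
    """Names of the k models with the highest validation accuracy, best first."""
    if not 1 <= k <= len(val_accuracy):
        raise ValueError("k must be between 1 and the number of models")
    # sorted() stays stable with reverse=True, so equal accuracies keep input order.
    ranked = sorted(val_accuracy, key=lambda name: val_accuracy[name], reverse=True)
    return ranked[:k]
```

The same function serves both cases: k is the first number for the M classification models and the second number for the N regression models.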
- the electronic device can combine each leading indicator in the target leading indicators and each explanatory variable in the target explanatory variables to obtain multiple combination results, including at least one first combination result; use the target classification model in the first number of pre-trained classification models to perform prediction processing according to the second time series data of each leading indicator in each first combination result, obtaining the prediction accuracy of each first combination result; determine, according to these prediction accuracies, a first combination result that meets the first preset condition from the at least one first combination result; use the target classification model to perform S-period backtracking verification on that first combination result, obtaining S backtracking accuracies; and determine the mean of the S backtracking accuracies as the classification backtracking accuracy of the target classification model.
- the above process identifies a first combination result with high prediction accuracy and performs S-period backtracking verification with it to obtain the classification backtracking accuracy of the target classification model.
- the classification backtracking accuracy obtained in this way is relatively reliable, so using it when determining the final prediction result for the main indicator also makes that final prediction result more accurate.
- This application relates to blockchain technology; for example, the final prediction result of the main indicator can be written into a blockchain.
- the final prediction result of the main indicator can later be compared with the real value of the main indicator at the target time, so as to analyze the prediction error of the machine-learning-based indicator prediction method described in the embodiments of the present application.
- FIG. 3 is a schematic structural diagram of an apparatus for predicting indicators based on machine learning according to an embodiment of the present application.
- the apparatus can be used in electronic equipment.
- the device includes:
- the obtaining module 301 is used for obtaining the data to be predicted.
- the classification processing module 302 is configured to, when the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, use each pre-trained classification model in the first number of pre-trained classification models to perform classification processing on the first value of the specified leading indicator to obtain a first number of classification results, and use the classification result that occurs most often among the first number of classification results as the first prediction result for the main indicator.
- the regression processing module 303 is configured to perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable by using each pre-trained regression model in the second number of pre-trained regression models to obtain a second number of regression results, and use the mean of the second number of regression results as the second prediction result for the main indicator.
- a determining module 304 is configured to determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy, where the mean classification backtracking accuracy is the mean calculated from the classification backtracking accuracy of each pre-trained classification model in the first number of pre-trained classification models.
- the determining module 304 determines the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy as follows: when the mean classification backtracking accuracy is less than the first preset value, the second prediction result is used as the final prediction result for the main indicator;
- when the mean classification backtracking accuracy is greater than or equal to the first preset value but less than the second preset value, a first regression result in the same direction as the first prediction result is determined from the second number of regression results, a second regression result closest to the first regression result is determined from the regression results in the opposite direction to the first prediction result, and the final prediction result for the main indicator is determined according to the first regression result and the second regression result;
- when the mean classification backtracking accuracy is greater than or equal to the second preset value: if it is less than or equal to the mean regression backtracking accuracy, the second prediction result is used as the final prediction result for the main indicator; otherwise, a first regression result in the same direction as the first prediction result is determined from the second number of regression results, and the final prediction result for the main indicator is determined according to the first regression result;
- the mean regression backtracking accuracy is the mean calculated from the regression backtracking accuracy of each pre-trained regression model in the second number of pre-trained regression models.
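The three-case decision rule above can be sketched in Python. The text leaves several details open, so this is a hedged illustration with explicit assumptions, all hypothetical: the classification result is labelled `"up"` or `"down"`; "direction" is measured against a reference value `last_actual` (for example the latest real value of the main indicator); the "first regression result" is the same-direction result closest to the regression mean; the blended case returns the midpoint of the first and second regression results; and when no regression result agrees with the vote, the regression mean is used.

```python
from statistics import mean
from typing import Sequence


def final_prediction(
    vote: str,                     # first prediction result: "up" or "down" (assumed labels)
    regressions: Sequence[float],  # the second number of regression results
    last_actual: float,            # hypothetical reference value that defines "direction"
    cls_acc_mean: float,           # mean classification backtracking accuracy
    reg_acc_mean: float,           # mean regression backtracking accuracy
    first_preset: float,           # first preset value
    second_preset: float,          # second preset value
) -> float:
    second_result = mean(regressions)  # second prediction result
    # Classifiers unreliable: use the regression mean.
    if cls_acc_mean < first_preset:
        return second_result
    # Classifiers reliable but no better than the regressors: use the regression mean.
    if cls_acc_mean >= second_preset and cls_acc_mean <= reg_acc_mean:
        return second_result
    up = vote == "up"
    same = [r for r in regressions if (r > last_actual if up else r < last_actual)]
    if not same:
        return second_result  # assumption: nothing agrees with the vote
    # Assumption: the "first regression result" is the agreeing result closest to the mean.
    first_reg = min(same, key=lambda r: abs(r - second_result))
    # Classifiers beat the regressors: follow the agreeing regression result.
    if cls_acc_mean >= second_preset:
        return first_reg
    # Middle band: blend with the nearest opposite-direction result.
    opposite = [r for r in regressions if (r < last_actual if up else r > last_actual)]
    if not opposite:
        return first_reg  # assumption
    second_reg = min(opposite, key=lambda r: abs(r - first_reg))
    return (first_reg + second_reg) / 2  # assumption: simple midpoint
```

With regression results 8, 9, 13 and 16, a reference value of 10 and an "up" vote, the sketch returns the mean 11.5 in the low-accuracy case, 13 when the classifiers clearly outperform the regressors, and the midpoint of 13 and 9 in the middle band.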
- the determining module 304 is further configured to combine each leading indicator in the target leading indicators and each explanatory variable in the target explanatory variables to obtain multiple combination results, where the target leading indicators are determined from a plurality of leading indicators associated with the main indicator according to the first time series data of the main indicator, and the target explanatory variables are determined from a plurality of explanatory variables associated with the main indicator according to the first time series data.
- the target leading indicators include the specified leading indicator, and the target explanatory variables include the specified explanatory variable; the multiple combination results include at least one first combination result, a first combination result being a combination of at least one leading indicator, and the first combination results differ from one another.
- the determining module 304 uses the target classification model in the first number of pre-trained classification models to perform prediction processing according to the second time series data of each leading indicator in each first combination result, obtaining the prediction accuracy of each first combination result, where the target classification model is any pre-trained classification model in the first number of pre-trained classification models; determines, according to the prediction accuracy of each first combination result, a first combination result that meets the first preset condition from the at least one first combination result; uses the target classification model to perform S-period backtracking verification on that first combination result to obtain S backtracking accuracies, S being an integer greater than or equal to 1; and determines the mean of the S backtracking accuracies as the classification backtracking accuracy of the target classification model.
- when performing prediction processing according to the second time series data of each leading indicator in each first combination result, the determining module 304 specifically determines the target lead period count corresponding to each leading indicator in each first combination result, and staggers the second time series data of each leading indicator in each first combination result by the target lead period count of that leading indicator to obtain the lag-shifted time series data of the leading indicator.
- the plurality of combination results further include at least one second combination result, a second combination result being a combination of at least one leading indicator and at least one explanatory variable, and the second combination results differ from one another.
- the determining module 304 is further configured to use the target regression model in the second number of pre-trained regression models to perform prediction processing according to the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result, obtaining the prediction accuracy of each second combination result, where the target regression model is any pre-trained regression model in the second number of pre-trained regression models;
- determine, according to the prediction accuracy of each second combination result, a second combination result that meets the second preset condition from the at least one second combination result;
- use the target regression model to perform T-period backtracking verification on that second combination result to obtain T backtracking accuracies, T being an integer greater than or equal to 1.
- the determining module 304 is further configured to: obtain the first time series data of the main indicator, the second time series data of each of the plurality of leading indicators associated with the main indicator, and the third time series data of each of the plurality of explanatory variables associated with the main indicator; determine, in each second time series data, the data associated with each data point in the first time series data, calculate the correlation coefficient between each leading indicator and the main indicator from each data point and its associated data, determine the target leading indicators from the plurality of leading indicators according to these correlation coefficients, and determine the target lead period count corresponding to each target leading indicator;
- determine, in each third time series data, the data associated with each data point in the first time series data, calculate the correlation coefficient between each explanatory variable and the main indicator from each data point and its associated data, and determine the target explanatory variables from the plurality of explanatory variables according to these correlation coefficients.
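A correlation-based selection of target leading indicators and their lead period counts can be sketched in Python as follows. The text does not name the correlation coefficient, so Pearson correlation is assumed here; `max_lead` and `threshold` are hypothetical parameters, and all series are assumed to be equally long and aligned by period. For an explanatory variable, the same `pearson` function could be applied at lead 0, i.e. to the same-period values.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant (assumption)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_lead(main: Sequence[float], leading: Sequence[float], max_lead: int) -> Tuple[int, float]:
    """Lead (1..max_lead) whose shifted series correlates most strongly with `main`."""
    n = len(main)
    # Pair main[t] with leading[t - lead] for every t where both exist.
    candidates = [
        (lead, pearson(main[lead:], leading[: n - lead])) for lead in range(1, max_lead + 1)
    ]
    return max(candidates, key=lambda pair: abs(pair[1]))


def select_leading(
    main: Sequence[float], candidates: Dict[str, Sequence[float]], max_lead: int, threshold: float
) -> Dict[str, int]:
    """Map each leading indicator with |r| >= threshold to its best lead period count."""
    chosen = {}
    for name, series in candidates.items():
        lead, r = best_lead(main, series, max_lead)
        if abs(r) >= threshold:
            chosen[name] = lead
    return chosen
```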
- the apparatus for predicting indicators based on machine learning further includes a training module 305.
- the training module 305 is configured to: construct a first data set and divide it into a first training set and a first validation set, the first data set including the first time series data of the main indicator and the lag-shifted time series data of each leading indicator in the target leading indicators; construct a second data set and divide it into a second training set and a second validation set, the second data set including the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables; and train M initial classification models with the first training set to obtain M pre-trained classification models, and N initial regression models with the second training set to obtain N pre-trained regression models, where M and N are integers greater than or equal to 2.
- the determining module 304 is further configured to use each of the M pre-trained classification models to predict the first validation set, obtain the prediction accuracy of each pre-trained classification model on the first validation set, and determine a first number of pre-trained classification models from the M pre-trained classification models according to those prediction accuracies.
- the determining module 304 is also configured to use each of the N pre-trained regression models to predict the second validation set, obtain the prediction accuracy of each pre-trained regression model on the second validation set, and determine a second number of pre-trained regression models from the N pre-trained regression models according to those prediction accuracies.
- FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the electronic device described in this embodiment may include one or more processors 1000 and a memory 2000.
- the processor 1000 and the memory 2000 may be connected through a bus or the like.
- the processor 1000 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the memory 2000 can be a high-speed RAM memory, or a non-volatile memory, such as a disk memory.
- the memory 2000 is used to store a set of program codes, and the processor 1000 can call the program codes stored in the memory 2000. Specifically:
- the processor 1000 is configured to: obtain the data to be predicted; when the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, use each of the first number of pre-trained classification models to perform classification processing on the first value of the specified leading indicator to obtain a first number of classification results, and use the classification result that occurs most often among them as the first prediction result for the main indicator; use each of the second number of pre-trained regression models to perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable to obtain a second number of regression results, and use the mean of the second number of regression results as the second prediction result for the main indicator; and determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking accuracy.
- the mean classification backtracking accuracy is the mean calculated from the classification backtracking accuracy of each pre-trained classification model in the first number of pre-trained classification models.
- specifically, when determining the final prediction result for the main indicator, the processor 1000 uses the second prediction result as the final prediction result when the mean classification backtracking accuracy is less than the first preset value; when it is greater than or equal to the first preset value but less than the second preset value, determines from the second number of regression results a first regression result in the same direction as the first prediction result, determines from the regression results in the opposite direction to the first prediction result a second regression result closest to the first regression result, and determines the final prediction result for the main indicator according to the first and second regression results.
- when the mean classification backtracking accuracy is greater than or equal to the second preset value, the processor 1000 uses the second prediction result as the final prediction result for the main indicator if the mean classification backtracking accuracy is less than or equal to the mean regression backtracking accuracy; otherwise, it determines from the second number of regression results a first regression result in the same direction as the first prediction result and determines the final prediction result for the main indicator according to that first regression result.
- the processor 1000 is further configured to combine each leading indicator in the target leading indicators with each explanatory variable in the target explanatory variables to obtain multiple combination results;
- the target leading indicators are determined from a plurality of leading indicators associated with the main indicator according to the first time series data of the main indicator, and the target explanatory variables are determined from a plurality of explanatory variables associated with the main indicator according to the first time series data;
- the target leading indicators include the specified leading indicator, and the target explanatory variables include the specified explanatory variable;
- the multiple combination results include at least one first combination result; a first combination result is a combination composed of at least one leading indicator, and every first combination result is different;
- the target classification model among the first preset number of pre-trained classification models performs prediction processing according to the second time series data of each leading indicator in each first combination result to obtain the prediction accuracy for each first combination result; the target classification model is any one of the first number of pre-trained classification models;
- according to the prediction accuracy of each first combination result, the first combination result that meets the first preset condition is determined from the at least one first combination result, and S-period backtracking verification is performed on it by using the target classification model.
- when the processor 1000 performs prediction processing according to the second time series data of each leading indicator in each first combination result, it specifically determines the number of target leading periods corresponding to each leading indicator in each first combination result, lags the second time series data of each leading indicator by its number of target leading periods, and performs prediction processing on the lagged time series data of each leading indicator in the first combination result.
- the plurality of combination results further includes at least one second combination result; a second combination result is a combination composed of at least one leading indicator and at least one explanatory variable, and every second combination result is different.
- the processor 1000 is further configured to use the target regression model among the second preset number of pre-trained regression models to perform prediction processing according to the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result, to obtain the prediction accuracy of each second combination result;
- the target regression model is any one of the second number of pre-trained regression models; according to the prediction accuracy of each second combination result, a second combination result that meets the second preset condition is determined from the at least one second combination result, and T-period backtracking verification is performed on it by using the target regression model to obtain T backtracking precisions, T being an integer greater than or equal to 1;
- the mean calculated from the T backtracking precisions is determined as the regression backtracking precision of the target regression model.
- the processor 1000 is further configured to acquire the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and the third time series data of each of the multiple explanatory variables associated with the main indicator; for each data item in the first time series data, the associated data in each second time series data is determined;
- according to each data item and its associated data in each second time series data, the correlation coefficient between each leading indicator and the main indicator is calculated, the target leading indicators are determined from the plurality of leading indicators according to these correlation coefficients, and the number of target leading periods corresponding to each target leading indicator is determined; for each data item in the first time series data, the associated data in each third time series data is determined;
- according to each data item and its associated data in each third time series data, the correlation coefficient between each explanatory variable and the main indicator is calculated, and the target explanatory variables are determined from the plurality of explanatory variables according to these correlation coefficients.
- the processor 1000 is further configured to construct a first data set and divide it into a first training set and a first validation set;
- the first data set includes the first time series data of the main indicator and the lagged time series data of each leading indicator in the target leading indicators; a second data set is constructed and divided into a second training set and a second validation set;
- the second data set includes the first time series data of the main indicator, the lagged time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables;
- the first training set is used to train M initial classification models respectively to obtain M pre-trained classification models, and the second training set is used to train N initial regression models respectively to obtain N pre-trained regression models, where M and N are integers greater than or equal to 2;
- each of the M pre-trained classification models predicts on the first validation set, and the first number of pre-trained classification models is determined from the M models according to their prediction accuracy on the first validation set; each of the N pre-trained regression models predicts on the second validation set, and a second preset number of pre-trained regression models is determined from the N models according to their prediction accuracy on the second validation set.
- the processor 1000 described in the embodiments of the present application may execute the implementations described in the embodiments of FIG. 1 and FIG. 2, as well as the other implementations described in the embodiments of the present application, which will not be repeated here.
- Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method in the foregoing embodiments; details are not repeated here.
- the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.
- Each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.
- the above integrated modules may be implemented in the form of hardware or in the form of software functional modules.
- the computer-readable storage medium can be volatile or non-volatile.
- the computer storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), and the like.
- the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created during use, and the like.
- A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
- the blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Marketing (AREA)
- General Engineering & Computer Science (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Evolutionary Computation (AREA)
- Educational Administration (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
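A machine-learning-based indicator prediction method, apparatus, device and storage medium. The method includes: performing classification processing on a first value of a specified leading indicator with each of a first number of pre-trained classification models to obtain a first number of classification results, so as to determine a first prediction result for the main indicator being predicted (S102); performing regression processing on the first value of the specified leading indicator and a second value of a specified explanatory variable with each of a second number of pre-trained regression models to obtain a second number of regression results, so as to determine a second prediction result for the main indicator (S103); and determining a final prediction result for the main indicator according to the first prediction result, the second prediction result and a mean classification backtracking precision (S104). The method can fuse different models to improve the prediction accuracy of statistical and econometric processes. The method involves blockchain technology: the final prediction result for the main indicator can be written into a blockchain.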
Description
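This application claims priority to Chinese Patent Application No. 202110442036.5, entitled "Machine-learning-based indicator prediction method, apparatus, device and storage medium", filed with the China Patent Office on April 23, 2021, the entire contents of which are incorporated herein by reference.
This application relates to the field of artificial intelligence technology, and in particular to a machine-learning-based indicator prediction method, apparatus, device and storage medium.
By organizing, analyzing and studying historical data and information, traditional statistical and econometric methods can draw qualitative and quantitative conclusions about the current state of economic activity and deepen the understanding of its inherent laws. Drawing on the historical state of economic phenomena, they can also apply scientific methods to forecast future economic development. However, the inventors realized that, as data volumes keep growing, traditional statistical and econometric methods alone cannot forecast economic trends well. Researchers have therefore begun to introduce computer technology into their work, using traditional machine learning methods to capture future economic trends. With a single model, however, different models have different expressive power on different data. How to effectively fuse different models to improve the prediction accuracy of statistical and econometric processes has therefore become an urgent problem.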
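Summary
Embodiments of this application provide a machine-learning-based indicator prediction method, apparatus, device and storage medium, which can effectively fuse different models to improve the prediction accuracy of statistical and econometric processes.
In a first aspect, an embodiment of this application provides a machine-learning-based indicator prediction method. The method includes:
- acquiring data to be predicted;
- when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, performing classification processing on the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain a first number of classification results, and taking the classification result that occurs most often among them as a first prediction result for the main indicator being predicted;
- performing regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain a second number of regression results, and taking the mean calculated from them as a second prediction result for the main indicator;
- determining a final prediction result for the main indicator according to the first prediction result, the second prediction result and a mean classification backtracking precision, the mean classification backtracking precision being the mean of the classification backtracking precisions of the first number of pre-trained classification models.
In a second aspect, an embodiment of this application provides a machine-learning-based indicator prediction apparatus, including:
- an acquisition module, configured to acquire data to be predicted;
- a classification processing module, configured to, when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, perform classification processing on the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain a first number of classification results, and take the classification result that occurs most often among them as a first prediction result for the main indicator being predicted;
- a regression processing module, configured to perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain a second number of regression results, and take the mean calculated from them as a second prediction result for the main indicator;
- a determination module, configured to determine a final prediction result for the main indicator according to the first prediction result, the second prediction result and a mean classification backtracking precision, the mean classification backtracking precision being the mean of the classification backtracking precisions of the first number of pre-trained classification models.
In a third aspect, an embodiment of this application provides an electronic device including a processor and a memory connected to each other. The memory is configured to store a computer program including program instructions, and the processor is configured to invoke the program instructions to perform the following method:
- acquiring data to be predicted;
- when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, performing classification processing on the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain a first number of classification results, and taking the classification result that occurs most often among them as a first prediction result for the main indicator being predicted;
- performing regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain a second number of regression results, and taking the mean calculated from them as a second prediction result for the main indicator;
- determining a final prediction result for the main indicator according to the first prediction result, the second prediction result and a mean classification backtracking precision, the mean classification backtracking precision being the mean of the classification backtracking precisions of the first number of pre-trained classification models.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, which is executed by a processor to implement the following method:
- acquiring data to be predicted;
- when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, performing classification processing on the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain a first number of classification results, and taking the classification result that occurs most often among them as a first prediction result for the main indicator being predicted;
- performing regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain a second number of regression results, and taking the mean calculated from them as a second prediction result for the main indicator;
- determining a final prediction result for the main indicator according to the first prediction result, the second prediction result and a mean classification backtracking precision, the mean classification backtracking precision being the mean of the classification backtracking precisions of the first number of pre-trained classification models.
This application effectively fuses the first prediction result of the pre-trained classification models with the second prediction result of the pre-trained regression models. It then combines these prediction results with the classification backtracking precision to determine the final prediction result for the main indicator. The prior art predicts the main indicator with a single model and so suffers from low prediction accuracy, because different models have different expressive power. Compared with that approach, the above process effectively improves the prediction accuracy of statistical and econometric processes.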
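The accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic flowchart of a machine-learning-based indicator prediction method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of another machine-learning-based indicator prediction method provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a machine-learning-based indicator prediction apparatus provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.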
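A leading indicator, as referred to in the embodiments of this application, is an economic indicator that influences future economic development: it changes before economic growth or recession actually arrives. It can predict turning points in the economic cycle and estimate the magnitude of rises and falls in economic activity, and so indicate the direction of economic fluctuations. In this application a leading indicator may also be called an advance indicator; it changes before the main indicator being predicted does.
An explanatory variable, as referred to in the embodiments of this application, may also be called a descriptive variable or a controllable variable, and is an independent variable in an econometric model. In some cases, an explanatory variable influences the economic variable that serves as the dependent variable.
The main indicator referred to in the embodiments of this application is the indicator to be predicted; for example, the main indicator may be an economic variable serving as the dependent variable.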
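The number of leading periods referred to in the embodiments of this application is the number of periods by which a leading indicator is offset in time from the main indicator.

For example, suppose leading indicator 1 leads by 1 period and one period is one month. The value of leading indicator 1 in December 2020 can then be paired with the value of the main indicator in January 2021, and the value of leading indicator 1 in November 2020 with the value of the main indicator in December 2020, and so on. Leading indicator 1 is thus offset from the main indicator by 1 period, which is what "leads by 1 period" means.

Similarly, suppose leading indicator 2 leads by 2 periods and one period is one month. The value of leading indicator 2 in November 2020 can then be paired with the value of the main indicator in January 2021, and the value of leading indicator 2 in October 2020 with the value of the main indicator in December 2020, and so on. Leading indicator 2 is thus offset from the main indicator by 2 periods, that is, it leads by 2 periods.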
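The month pairing described above can be sketched as follows. The sketch assumes one period equals one calendar month and represents each month by its first day. The function names are hypothetical illustrations and are not part of the original text.

```python
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months  # months counted from year 0
    return date(index // 12, index % 12 + 1, 1)


def paired_leading_month(main_month: date, lead_periods: int) -> date:
    """Month whose leading-indicator value is paired with `main_month` (1 period = 1 month)."""
    return shift_months(main_month, -lead_periods)


print(paired_leading_month(date(2021, 1, 1), 1))  # 2020-12-01 (leading indicator 1, lead 1)
print(paired_leading_month(date(2021, 1, 1), 2))  # 2020-11-01 (leading indicator 2, lead 2)
```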
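In the embodiments of this application, when the data to be predicted includes the first value of a specified leading indicator and the second value of a specified explanatory variable, the electronic device may classify the first value of the specified leading indicator with each of the first number of pre-trained classification models. This yields a first number of classification results, and the device takes the most frequent one as the first prediction result for the main indicator being predicted. At the same time, the electronic device may perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models. This yields a second number of regression results, and the device takes their mean as the second prediction result for the main indicator. After obtaining the first and second prediction results, the electronic device may determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking precision. Through this process, the electronic device effectively fuses different models to improve the prediction accuracy of statistical and econometric processes.
In one embodiment, when the data to be predicted includes the first value of the specified leading indicator, the electronic device classifies that first value with each of the first number of pre-trained classification models, obtaining a first number of classification results. It then takes the most frequent classification result as the first prediction result for the main indicator being predicted, and may use this first prediction result as the final prediction result for the main indicator. In this case, the electronic device obtains an accurate final prediction result for the main indicator through the first number of pre-trained classification models alone.
In one embodiment, when the data to be predicted includes the second value of the specified explanatory variable, the electronic device may perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models, obtaining a second number of regression results. It then takes their mean as the second prediction result for the main indicator, and may use this second prediction result as the final prediction result for the main indicator. In this case, the electronic device obtains an accurate final prediction result for the main indicator through the second number of pre-trained regression models alone.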
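The machine-learning-based indicator prediction method and apparatus provided by the embodiments of this application are described below.
Referring to FIG. 1, which is a schematic flowchart of a machine-learning-based indicator prediction method provided by an embodiment of this application. The method may be applied to an electronic device, which may be a server or a user terminal. The server may be a single server or a server cluster, and the user terminal may be a smart terminal such as a laptop or desktop computer. Specifically, the method may include the following steps:
S101. Acquire data to be predicted.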
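The data to be predicted may include the first value of a specified leading indicator and/or the second value of a specified explanatory variable. There may be one or more specified leading indicators and one or more specified explanatory variables. The first value of the specified leading indicator can be used to determine the first prediction result for the main indicator. The first value of the specified leading indicator together with the second value of the specified explanatory variable can be used to determine the second prediction result for the main indicator. The first prediction result may be one kind of prediction for the main indicator at a target time (the time unit may be a day, month, quarter, year, etc.), and the second prediction result may be another kind of prediction for the main indicator at the target time. Together, the two prediction results are used to determine the final prediction result for the main indicator, that is, its final predicted value at the target time.
In one embodiment, the specified leading indicator can take different values at different times. The first value of the specified leading indicator may be determined according to the number of target leading periods corresponding to the specified leading indicator. In one embodiment, the first value may be determined according to that number of target leading periods and the aforementioned target time. For example, suppose the specified leading indicator is leading indicator 1, its number of target leading periods is 1, and one period is one month. To predict the value of the main indicator in January 2021, the first value of leading indicator 1 is its value in December 2020. As another example, suppose the specified leading indicator is leading indicator 2, its number of target leading periods is 2, and one period is one month. To predict the value of the main indicator in January 2021, the first value of leading indicator 2 is its value in November 2020.
In one embodiment, the specified explanatory variable can also take different values at different times. The second value of the specified explanatory variable may be determined according to the aforementioned target time; in one embodiment, it is the value of the specified explanatory variable at the target time. For example, suppose the specified explanatory variable is explanatory variable 1 and the value of the main indicator in January 2021 is being predicted. The second value of explanatory variable 1 is then its value in January 2021.
Because different models work on different principles, their predictions show different trend preferences, which poses a potential stability risk to the final fused result. To further improve the robustness of the prediction model, different methods are applied to different models to obtain the final prediction result, as described in steps S102 to S104.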
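S102. When the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, perform classification processing on the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain a first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator being predicted.
In this embodiment, when the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, the electronic device may feed the first value of the specified leading indicator as input to each of the first number of pre-trained classification models. Each model classifies the first value and outputs one classification result, giving a first number of classification results in total. The electronic device may then determine the first prediction result for the main indicator from these results. In one embodiment, the electronic device may use a method such as majority voting to find the classification result that occurs most often, and use it as the first prediction result for the main indicator. A classification model here is any model that can be used for classification, such as a neural network model. In one embodiment, a classification result may be a predicted trend (the direction of future development), such as rising, falling or unchanged.
For example, suppose the first number of pre-trained classification models consists of pre-trained classification models 1 to 5. The electronic device feeds the first value of the specified leading indicator into each of these 5 models, which classify it and output 5 classification results in total. Suppose the classification results are predicted trends and the 5 results are rising, rising, falling, unchanged and rising. The electronic device may then use majority voting or a similar method to determine that the most frequent of the 5 results is rising, and take rising as the first prediction result for the main indicator.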
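The voting step in the example above can be sketched with Python's `collections.Counter`. The text does not say how ties are broken. In this sketch, `most_common` keeps the label that was seen first among equally frequent labels, which is an assumption. `first_prediction` is a hypothetical name.

```python
from collections import Counter


def first_prediction(classification_results: list[str]) -> str:
    """Majority vote over the classifiers' outputs; ties keep the first-seen label."""
    label, _count = Counter(classification_results).most_common(1)[0]
    return label


votes = ["rising", "rising", "falling", "unchanged", "rising"]  # the five results above
print(first_prediction(votes))  # rising
```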
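S103. Perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain a second number of regression results, and take the mean calculated from them as the second prediction result for the main indicator.
In this embodiment, the electronic device may feed the first value of the specified leading indicator and the second value of the specified explanatory variable as input to each of the second number of pre-trained regression models. Each model performs regression on these values and outputs one regression result, giving a second number of regression results in total. The electronic device may then determine the second prediction result for the main indicator from these results. In one embodiment, the electronic device computes the mean of the second number of regression results and uses it as the second prediction result. In another embodiment, the electronic device discards the single highest and the single lowest of the second number of regression results, computes the mean of the remaining results, and uses that mean as the second prediction result. A regression model here is any model that can be used for regression, such as a neural network model. In one embodiment, a regression result may be a predicted level, such as the level of a market index.
For example, suppose the second number of pre-trained regression models consists of pre-trained regression models 1 and 2. The electronic device feeds the first value of the specified leading indicator and the second value of the specified explanatory variable into each of the two models. The models perform regression on these values and output 2 regression results in total. Suppose the regression results are predicted levels a and b. The electronic device computes their mean (a+b)/2 and takes (a+b)/2 as the second prediction result for the main indicator.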
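Both averaging variants described above (the plain mean and the mean after dropping one highest and one lowest result) can be sketched as follows. The index levels in the usage lines are made-up illustrative numbers, and `second_prediction` is a hypothetical name.

```python
from statistics import mean


def second_prediction(regression_results: list[float], trim: bool = False) -> float:
    """Mean of the regression outputs; with trim=True, drop one highest and one lowest first."""
    values = sorted(regression_results)
    if trim:
        if len(values) < 3:
            raise ValueError("need at least 3 results to drop the highest and lowest")
        values = values[1:-1]
    return mean(values)


print(second_prediction([3100.0, 3200.0]))                           # 3150.0, i.e. (a+b)/2
print(second_prediction([3000.0, 3100.0, 3200.0, 3900.0], trim=True))  # 3150.0
```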
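S104. Determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking precision, where the mean classification backtracking precision is the mean of the classification backtracking precisions of the first number of pre-trained classification models.
The mean classification backtracking precision may take a value such as 0.6; this is not limited in the embodiments of this application.
After steps S102 and S103 are executed, that is, once the model fusion processing is complete, two groups of model predictions are available: the first prediction result and the second prediction result for the main indicator. To output a more stable and reliable prediction, the embodiments further apply a model correction strategy. This strategy uses the mean classification backtracking precision together with a first preset value and a second preset value to let the two groups of predictions correct each other. This improves prediction accuracy and yields a sufficiently stable and reliable indicator prediction. In one application scenario, such a prediction can help the relevant authorities adopt appropriate policies in response to the future trend of the indicator under study. In addition, through model fusion, the embodiments make full use of each kind of model's preferences and strengths in extracting data features. They interpret the data from multiple angles and dimensions, and so make model interpretability practical to some extent.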
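In one embodiment, the electronic device may determine the final prediction result for the main indicator from the first prediction result, the second prediction result and the classification backtracking precision as follows. When the mean classification backtracking precision is less than the first preset value, the electronic device uses the second prediction result as the final prediction result for the main indicator. A mean classification backtracking precision below the first preset value indicates that the first prediction result is of little reference value. A final result that combined the first and second prediction results would then be of low accuracy. To improve accuracy, the second prediction result is therefore used directly as the final prediction result.
In one embodiment, the process may also be as follows. When the mean classification backtracking precision is greater than or equal to the first preset value but less than the second preset value, the electronic device first determines, from the second number of regression results, the first regression result(s) in the same direction as the first prediction result. It then determines, from the regression results in the opposite direction to the first prediction result, the second regression result closest to the first regression result. Finally, it determines the final prediction result for the main indicator according to the first and second regression results. In one embodiment, the electronic device uses the mean calculated from the first regression result and the second regression result as the final prediction result.

Here, a first regression result is a regression result in the same direction as the first prediction result; there may be one or more of them. The second regression result is the one, among the regression results in the opposite direction to the first prediction result, that is closest to the first regression result, that is, the one with the smallest absolute difference from it. A mean classification backtracking precision at least equal to the first preset value but below the second preset value indicates that the classification backtracking precision has some reference value. Because it may still contain error, the first regression result is obtained first and then combined with the second regression result to determine the final prediction result, which reduces the error of the final prediction result.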
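In one embodiment, the electronic device may determine the first regression results in the same direction as the first prediction result, and the regression results in the opposite direction, as follows.

Rising case: if the first prediction result is rising, the electronic device takes the regression results that are greater than a target value as the first regression results in the same direction as the first prediction result. The target value is the value of the main indicator at the time immediately before the target time; it may be either the actual value or the final predicted value of the main indicator at that earlier time. The electronic device may also take the regression results that are less than the target value as the regression results in the opposite direction. A regression result greater than the target value indicates a rising trend, and since the first prediction result is also rising, such a result is in the same direction as the first prediction result. A regression result less than the target value indicates a falling trend, and since the first prediction result is rising, such a result is in the opposite direction.

Falling case: in one embodiment, if the first prediction result is falling, the electronic device takes the regression results that are less than the target value as the first regression results in the same direction. It takes the regression results that are greater than the target value as the regression results in the opposite direction. A regression result less than the target value indicates a falling trend, and since the first prediction result is also falling, such a result is in the same direction as the first prediction result. A regression result greater than the target value indicates a rising trend, and since the first prediction result is falling, such a result is in the opposite direction.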
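The direction test above can be sketched as a comparison against the target value. The text does not say what happens to a regression result exactly equal to the target value. This sketch assumes such a result is placed in neither group. `split_by_direction` is a hypothetical name.

```python
def split_by_direction(regression_results, first_prediction, target_value):
    """Return (same_direction, opposite_direction) relative to the voted trend.

    target_value is the main indicator's value at the period before the target time.
    Results exactly equal to target_value go to neither list (an assumption).
    """
    above = [r for r in regression_results if r > target_value]
    below = [r for r in regression_results if r < target_value]
    if first_prediction == "rising":
        return above, below
    if first_prediction == "falling":
        return below, above
    raise ValueError("direction is only defined for 'rising' or 'falling'")


same, opposite = split_by_direction([105.0, 98.0, 102.0], "rising", 100.0)
print(same, opposite)  # [105.0, 102.0] [98.0]
```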
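In one embodiment, the process may also be as follows. When the mean classification backtracking precision is greater than or equal to the second preset value, there are two cases. If it is less than or equal to the mean regression backtracking precision, the electronic device uses the second prediction result as the final prediction result for the main indicator. If it is greater than the mean regression backtracking precision, the electronic device determines, from the second number of regression results, the first regression result(s) in the same direction as the first prediction result, and determines the final prediction result for the main indicator according to them. The mean regression backtracking precision is the mean of the regression backtracking precisions of the second number of pre-trained regression models. In one embodiment, the electronic device uses the mean calculated from the first regression results as the final prediction result for the main indicator.
In one embodiment, a first regression result in the same direction as the first prediction result may not exist, that is, all of the second number of regression results may be in the opposite direction to the first prediction result. In that case, the electronic device may mirror-fit a value in the same direction as the first prediction result, based on the regression backtracking precisions of the second number of pre-trained regression models, and use it as the final prediction result. In one embodiment, this is done as follows:
1. Weight the regression result of each of the second number of pre-trained regression models by that model's regression backtracking precision, obtaining multiple weighted results.
2. Compute the mean of the weighted results.
3. Compute the absolute difference between this mean and the aforementioned target value.
4. Take the sum of the target value and this absolute difference as the final prediction result for the main indicator.
In this case the mean classification backtracking precision is highly reliable, yet all regression results point in the opposite direction to the first prediction result. This application therefore obtains the final prediction result by mirror fitting, which improves its accuracy.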
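The correction strategy of S104 can be drawn together into a single sketch. It relies on several interpretive assumptions that are not stated in the text:
- In the middle tier, the "mean of the first and second regression results" is taken over all same-direction results plus the nearest opposite one.
- Mirror fitting reflects the precision-weighted average of the regression results across the target value. For a falling vote the reflection lands below the target value; the text literally states "target value plus absolute difference", which matches only the rising case.
- A vote of "unchanged" falls back to the second prediction result.
- Mirror fitting is applied whenever no same-direction result exists.

The function name and parameter names are hypothetical.

```python
from statistics import mean


def final_prediction(first_prediction: str,
                     regression_results: list[float],
                     regression_precisions: list[float],
                     class_precision_mean: float,
                     target_value: float,
                     first_preset: float,
                     second_preset: float) -> float:
    """Sketch of the S104 model-correction strategy (interpretive assumptions noted above)."""
    second_prediction = mean(regression_results)
    reg_precision_mean = mean(regression_precisions)

    # Classification backtests are weak: trust the regression mean.
    if class_precision_mean < first_preset:
        return second_prediction
    # Classification is reliable but not better than regression: regression mean again.
    if second_preset <= class_precision_mean <= reg_precision_mean:
        return second_prediction
    # Assumption: "same/opposite direction" is only defined for rising/falling votes.
    if first_prediction not in ("rising", "falling"):
        return second_prediction

    above = [r for r in regression_results if r > target_value]
    below = [r for r in regression_results if r < target_value]
    same, opposite = (above, below) if first_prediction == "rising" else (below, above)

    if not same:
        # Mirror fitting (our reading): reflect the precision-weighted average across target_value.
        weighted = (sum(p * r for p, r in zip(regression_precisions, regression_results))
                    / sum(regression_precisions))
        offset = abs(weighted - target_value)
        return target_value + offset if first_prediction == "rising" else target_value - offset

    if class_precision_mean < second_preset:
        # Middle tier: average same-direction results with the closest opposite-direction one.
        nearest = min(opposite, key=lambda r: min(abs(r - s) for s in same), default=None)
        return mean(same + [nearest]) if nearest is not None else mean(same)

    # High tier and classification beats regression: keep only same-direction results.
    return mean(same)
```

For example, `final_prediction("rising", [103.0, 101.0, 97.0, 99.0], [0.7] * 4, 0.6, 100.0, 0.5, 0.7)` falls in the middle tier. The same-direction results are 103 and 101, the nearest opposite result is 99, and the returned value is their mean.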
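As can be seen, in the embodiment shown in FIG. 1, the electronic device may:
- classify the first value of the specified leading indicator with each of the first number of pre-trained classification models, obtaining a first number of classification results, and take the most frequent one as the first prediction result for the main indicator being predicted;
- perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models, obtaining a second number of regression results, and take their mean as the second prediction result for the main indicator;
- determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking precision.
Fusing different models in this way improves the prediction accuracy of statistical and econometric processes.
Referring to FIG. 2, which is a schematic flowchart of another indicator prediction method provided by an embodiment of this application. The method may be applied to an electronic device, which may be a server or a user terminal. The server may be a single server or a server cluster, and the user terminal may be a smart terminal such as a laptop or desktop computer. Specifically, the method may include the following steps:
S201. Combine each leading indicator in the target leading indicators with each explanatory variable in the target explanatory variables to obtain multiple combination results.
In this embodiment, the electronic device may use a method such as stepwise regression or single-factor screening to combine the leading indicators in the target leading indicators and the explanatory variables in the target explanatory variables, obtaining multiple combination results.

The target leading indicators may be determined from multiple leading indicators associated with the main indicator; in one embodiment, they are determined from those leading indicators according to the first time series data of the main indicator. The target explanatory variables may be determined from multiple explanatory variables associated with the main indicator; in one embodiment, they are determined from those explanatory variables according to the first time series data of the main indicator.

The first time series data is the time series data of the main indicator. For example, it may include the actual values of the main indicator at each time within a first time range. In some cases the actual value of the indicator at a certain time cannot be obtained, and an empirical estimation method may then be used to obtain an estimated value of the main indicator at that time. The first time series data reflects how the main indicator changes over time. The time unit used for the main indicator may be a year, quarter, month, day, etc., depending on the actual application scenario; this is not limited in the embodiments of this application. For example, the first time series data may include the actual values of the main indicator in December 2020, November 2020, ...; or it may include the actual values of the main indicator on December 10, 2020, December 11, 2020, December 12, 2020, ....

The target leading indicators may include the specified leading indicator mentioned in the embodiments of this application, and the target explanatory variables may include the specified explanatory variable. The multiple combination results may include at least one first combination result. A first combination result is a combination composed of at least one leading indicator, and every first combination result is different. For example, suppose the target leading indicators are leading indicator 1 and leading indicator 2. In one case, the first combination results may then be combination results 1, 2 and 3: combination result 1 contains leading indicator 1, combination result 2 contains leading indicator 2, and combination result 3 contains both leading indicators 1 and 2.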
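The text names stepwise regression or single-factor screening for building the combinations. The sketch below swaps in plain exhaustive enumeration with `itertools.combinations`. This is only practical for small indicator sets, because the number of subsets grows as 2^n - 1. It reproduces combination results 1 to 3 above and also builds the mixed combinations (at least one leading indicator plus at least one explanatory variable) that the text later calls second combination results. All names are hypothetical.

```python
from itertools import combinations


def non_empty_subsets(items):
    """All non-empty subsets of `items`, smallest first, as tuples."""
    return [combo for k in range(1, len(items) + 1) for combo in combinations(items, k)]


def first_combinations(leading):
    """Combinations made only of leading indicators."""
    return non_empty_subsets(leading)


def second_combinations(leading, explanatory):
    """Combinations with at least one leading indicator and at least one explanatory variable."""
    return [lead + expl
            for lead in non_empty_subsets(leading)
            for expl in non_empty_subsets(explanatory)]


print(first_combinations(["lead1", "lead2"]))
# [('lead1',), ('lead2',), ('lead1', 'lead2')]
```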
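In one embodiment, the target leading indicators and target explanatory variables may be determined as follows.
1. The electronic device acquires the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and the third time series data of each of the multiple explanatory variables associated with the main indicator.
2. For each data item in the first time series data, the electronic device determines its associated data in each second time series data. From each data item and its associated data, it calculates the correlation coefficient between each leading indicator and the main indicator. It then determines the target leading indicators from the multiple leading indicators according to these correlation coefficients, and determines the number of target leading periods corresponding to each target leading indicator.
3. For each data item in the first time series data, the electronic device also determines its associated data in each third time series data. From each data item and its associated data, it calculates the correlation coefficient between each explanatory variable and the main indicator, and determines the target explanatory variables from the multiple explanatory variables according to these correlation coefficients.

The target classification model is any one of the first number of pre-trained classification models.

The second time series data is the time series data of a leading indicator. For example, it may include the actual values of the leading indicator at each time within a second time range. In some cases the actual value of a leading indicator at a certain time cannot be obtained, and an empirical estimation method may then be used to obtain an estimated value at that time. The second time series data reflects how the leading indicator changes over time. In one embodiment, the second time range may be determined according to the first time range.

The third time series data is the time series data of an explanatory variable. For example, it may include the actual values of the explanatory variable at each time within a third time range. In some cases the actual value of an explanatory variable at a certain time cannot be obtained, and an empirical estimation method may then be used to obtain an estimated value at that time. The third time series data reflects how the explanatory variable changes over time. In one embodiment, the third time range may be determined according to the first time range.

Through the above process, the leading indicators and explanatory variables most strongly correlated with the main indicator can be selected as targets, which improves the accuracy of predicting the main indicator.
In one embodiment, the electronic device may determine, for each data item in the first time series data, its associated data in each second time series data as follows. It takes the data in each second time series data that corresponds to the data item after offsetting by a preset number of leading periods, and uses that as the associated data.

For example, suppose the first time series data includes the actual values of the main indicator in December 2020, November 2020, .... Suppose the multiple leading indicators include leading indicator 1, whose second time series data includes its actual values in December 2020, November 2020, October 2020, ....

If the preset number of leading periods includes a lead of 1 period and one period is one month, the electronic device may determine that, after a 1-period offset, the main indicator's December 2020 actual value corresponds to leading indicator 1's November 2020 actual value in leading indicator 1's second time series data. It takes that value as the associated data of the main indicator's December 2020 actual value. Likewise, it may determine that, after a 1-period offset, the main indicator's November 2020 actual value corresponds to leading indicator 1's October 2020 actual value, and take that as its associated data, and so on.

If the preset number of leading periods includes a lead of 2 periods and one period is one month, the electronic device may take leading indicator 1's October 2020 actual value as the associated data of the main indicator's December 2020 actual value, and leading indicator 1's September 2020 actual value as the associated data of the main indicator's November 2020 actual value, and so on.
In one embodiment, the electronic device may calculate the correlation coefficient between each leading indicator and the main indicator from each data item and its associated data as follows. For any one of the multiple leading indicators, it calculates the product of each data item and its associated data in that leading indicator's second time series data. It then computes the share of same-sign products among all products, and takes the largest such share as the correlation coefficient between that leading indicator and the main indicator. In this way, the correlation coefficient between every leading indicator and the main indicator is obtained. This process is illustrated below with Table 1.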
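Table 1
Main indicator | Leading indicator 1 | Product |
1 (Dec 2020) | 1 (Nov 2020) | 1 |
-1 (Nov 2020) | 1 (Oct 2020) | -1 |
1 (Oct 2020) | -1 (Sep 2020) | -1 |
As the table shows, with a preset lead of 1 period, the main indicator's actual values in December, November and October 2020 correspond to leading indicator 1's actual values in November, October and September 2020, respectively. The main indicator's actual values in December, November and October 2020 are 1, -1 and 1. Leading indicator 1's actual values in November, October and September 2020 are 1, 1 and -1. Of the three products, one is positive and two are negative, so the largest same-sign share is 2/3. The above process therefore gives a correlation coefficient of 2/3 between leading indicator 1 and the main indicator.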
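The sign-agreement coefficient can be sketched as follows. The sketch reads "the largest same-sign share" as the larger of the positive-product share and the negative-product share. It returns a `Fraction` so that the Table 1 result is exact. `sign_agreement` is a hypothetical name.

```python
from fractions import Fraction


def sign_agreement(main_values, leading_values):
    """Larger of the positive-product share and the negative-product share of aligned pairs."""
    products = [m * l for m, l in zip(main_values, leading_values)]
    if not products:
        raise ValueError("no aligned pairs")
    positive = sum(1 for p in products if p > 0)
    negative = sum(1 for p in products if p < 0)
    return Fraction(max(positive, negative), len(products))


# Table 1, lead of 1 period: main indicator Dec/Nov/Oct 2020 vs. leading indicator 1 Nov/Oct/Sep 2020
print(sign_agreement([1, -1, 1], [1, 1, -1]))  # 2/3
```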
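In one embodiment, if there is a single preset number of leading periods, one correlation coefficient with the main indicator is calculated for each leading indicator. For example, if the preset lead is 1 period, one correlation coefficient is calculated for each leading indicator. If there are multiple preset numbers of leading periods, multiple correlation coefficients are calculated for each leading indicator. For example, if the preset leads range from 1 to 12 periods (12 preset numbers of leading periods), 12 correlation coefficients with the main indicator are calculated for each leading indicator.

In one embodiment, the electronic device may determine the target leading indicators according to these correlation coefficients as follows: it selects, as target leading indicators, the leading indicators whose correlation coefficient is greater than or equal to a first correlation coefficient (used as a threshold).

In one embodiment, the electronic device may determine the number of target leading periods for a target leading indicator as follows. If a target leading indicator has a single correlation coefficient with the main indicator (a first leading indicator), the preset number of leading periods corresponding to that coefficient becomes its number of target leading periods. If a target leading indicator has multiple correlation coefficients with the main indicator (a second leading indicator), the preset number of leading periods corresponding to one of them, for example the largest, becomes its number of target leading periods.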
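The lead scan and threshold selection above can be sketched as follows. The sketch makes several assumptions not stated in the text:
- Both series are oldest-first lists on the same monthly grid.
- The coefficient for lead k pairs `main[t]` with `leading[t - k]`.
- Ties between leads go to the shortest lead.
- A hypothetical `min_overlap` parameter stops the scan before very short overlaps, which would otherwise produce trivially high shares.

The helper from the Table 1 sketch is repeated so that the block runs on its own.

```python
from fractions import Fraction


def sign_agreement(main_values, leading_values):
    """Larger of the positive-product share and the negative-product share."""
    products = [m * l for m, l in zip(main_values, leading_values)]
    if not products:
        raise ValueError("no aligned pairs")
    positive = sum(1 for p in products if p > 0)
    negative = sum(1 for p in products if p < 0)
    return Fraction(max(positive, negative), len(products))


def best_lead(main_series, leading_series, max_lead=12, min_overlap=3):
    """Try leads 1..max_lead and return (best_lead, best_coefficient)."""
    if len(main_series) != len(leading_series):
        raise ValueError("series must cover the same months")
    scores = {}
    for k in range(1, max_lead + 1):
        overlap = len(main_series) - k
        if overlap < min_overlap:
            break
        scores[k] = sign_agreement(main_series[k:], leading_series[:overlap])
    if not scores:
        raise ValueError("series too short for any lead")
    best = max(scores, key=scores.get)  # first maximum in insertion order = shortest lead
    return best, scores[best]


def select_leading(main_series, candidates, threshold, max_lead=12):
    """Map each candidate whose best coefficient >= threshold to its target number of leading periods."""
    selected = {}
    for name, series in candidates.items():
        lead, score = best_lead(main_series, series, max_lead)
        if score >= threshold:
            selected[name] = lead
    return selected
```

With made-up series such as `main = [-1, -1, 1, -1, 1, 1]` and `lead_a = [1, -1, 1, 1, -1, 1]`, `lead_a` matches the main indicator perfectly at a lead of 2 periods. `best_lead(main, lead_a)` therefore returns a lead of 2 with a coefficient of 1.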
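In one embodiment, the electronic device may determine, for each data item in the first time series data, its associated data in each third time series data as follows: it takes the data in each third time series data at the same time as the data item. In this process, explanatory variables are not offset by any number of leading periods; equivalently, they can be regarded as offset by a lead of 0 periods.

For example, suppose the first time series data includes the actual values of the main indicator in December 2020, November 2020, .... Suppose the third time series data of explanatory variable 1 includes its actual values in December 2020, November 2020, .... The electronic device may determine that the main indicator's December 2020 actual value corresponds to explanatory variable 1's December 2020 actual value, and take the latter as its associated data in explanatory variable 1's third time series data. Likewise, it may take explanatory variable 1's November 2020 actual value as the associated data of the main indicator's November 2020 actual value, and so on.
In one embodiment, the electronic device may calculate the correlation coefficient between each explanatory variable and the main indicator using the Pearson correlation coefficient. The Pearson algorithm is an existing technique and is not described further here. In one embodiment, the electronic device may determine the target explanatory variables according to these correlation coefficients as follows: it selects, as target explanatory variables, the explanatory variables whose correlation coefficient is greater than or equal to a second correlation coefficient. The second correlation coefficient may be the same as or different from the first correlation coefficient.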
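A minimal sketch of the explanatory-variable screening follows. It computes the Pearson coefficient on same-period (lead 0) pairs and keeps the variables whose coefficient reaches the threshold, compared literally as "greater than or equal to". On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient. The function names and sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_explanatory(main_series, candidates, threshold):
    """Keep variables whose same-period Pearson r with the main indicator is >= threshold."""
    return [name for name, series in candidates.items()
            if pearson(main_series, series) >= threshold]


print(select_explanatory([1, 2, 3, 4], {"x1": [2, 4, 6, 8], "x2": [4, 3, 2, 1]}, 0.5))  # ['x1']
```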
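S202. Use the target classification model among the first preset number of pre-trained classification models to perform prediction processing according to the second time series data of each leading indicator in each first combination result, obtaining the prediction accuracy for each first combination result.
In this embodiment, the electronic device may perform this prediction processing as follows. It determines the number of target leading periods corresponding to each leading indicator in each first combination result. It then lags the second time series data of each such leading indicator by its number of target leading periods to obtain that indicator's lagged time series data. Finally, it performs prediction processing on the lagged time series data of the leading indicators in each first combination result.

Lag processing here means offsetting a leading indicator from the main indicator in time by the corresponding number of target leading periods, so that their values line up. Before lagging, for example, leading indicator 1's actual value in December 2020 may correspond to the main indicator's value in December 2020, its November 2020 actual value to the main indicator's November 2020 value, and so on. After lagging by 1 period, leading indicator 1's November 2020 actual value corresponds to the main indicator's December 2020 value, its October 2020 actual value to the main indicator's November 2020 value, and so on. Lag processing thus correctly aligns the leading indicator's data at each time with the main indicator's data at each time.

In one embodiment, after performing prediction processing on the lagged time series data of each leading indicator in each first combination result, the electronic device obtains a first set of prediction results for the main indicator for each first combination result. It can then calculate the prediction accuracy of each first combination result from that set and the set of actual values of the main indicator. For example, the electronic device may count the share of correct predictions within each first combination result's first set of prediction results, and determine that combination result's prediction accuracy from this share, for example by using the share itself as the prediction accuracy.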
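For example, suppose the first time series data of the main indicator includes its actual values in December 2020, November 2020, .... Take combination result 3 among the first combination results, which contains leading indicators 1 and 2. The second time series data of leading indicator 1 includes its actual values in December 2020, November 2020, October 2020, ..., and the second time series data of leading indicator 2 includes its actual values in December 2020, November 2020, October 2020, ....

Suppose leading indicator 1's number of target leading periods is 1. Lagging its second time series data by 1 period yields its lagged time series data. In that lagged series, leading indicator 1's November 2020 actual value corresponds to the main indicator's December 2020 value. Here "value" covers both the main indicator's actual value and its predicted value for December 2020; leading indicator 1's November 2020 actual value is used to determine that predicted value. Likewise, leading indicator 1's October 2020 actual value corresponds to the main indicator's November 2020 value (actual and predicted), and so on.

Suppose leading indicator 2's number of target leading periods is 2. Lagging its second time series data by 2 periods yields its lagged time series data. In that lagged series, leading indicator 2's October 2020 actual value corresponds to the main indicator's December 2020 value (actual and predicted), its September 2020 actual value to the main indicator's November 2020 value (actual and predicted), and so on.

Prediction processing is then performed on the lagged time series data of leading indicators 1 and 2. This yields the set of prediction results for the main indicator corresponding to combination result 3, which includes the predictions for the main indicator in December 2020, November 2020, .... The prediction accuracy of combination result 3 is then determined against the set of actual values of the main indicator, which includes its actual values in December 2020, November 2020, .... In one example, the electronic device counts the share of correct predictions within combination result 3's set of prediction results and determines the prediction accuracy of combination result 3 from this share, for example by using the share itself as the prediction accuracy.
S203. According to the prediction accuracy of each first combination result, determine the first combination result that meets the first preset condition from the at least one first combination result.
In this embodiment, the electronic device may select the first combination result with the highest prediction accuracy as the one that meets the first preset condition. The first combination result meeting the first preset condition is thus the one that gives the target classification model its highest prediction accuracy. In this way, for each pre-trained classification model, the combination of leading indicators that maximizes that model's prediction accuracy can be determined.
S204. Use the target classification model to perform S-period backtracking verification on the first combination result meeting the first preset condition, obtaining S backtracking precisions, where S is an integer greater than or equal to 1.
S205. Determine the mean calculated from the S backtracking precisions as the classification backtracking precision of the target classification model.
In steps S204 and S205, the electronic device uses the target classification model to perform S-period backtracking verification on the first combination result meeting the first preset condition. This yields S backtracking precisions, whose mean becomes the classification backtracking precision of the target classification model. The S-period backtracking may, for example, cover 7 monthly periods (one period being one month) or 4 quarterly periods (one period being one quarter); this is not limited in the embodiments of this application. The purpose of backtracking is to go back over past periods' data to calculate the backtracking precision. The process of obtaining the S backtracking precisions in this way is illustrated below.
For example, suppose the first combination result is combination result 3, which contains leading indicators 1 and 2, with target leads of 1 period and 2 periods respectively. Suppose the S-period backtracking covers 7 monthly periods.
- 1st period: the electronic device uses the target classification model to perform prediction processing on leading indicator 1's November 2020 actual value and leading indicator 2's October 2020 actual value, obtaining a prediction for the main indicator in December 2020. It calculates the 1st prediction accuracy from this prediction and the main indicator's December 2020 actual value, and uses it as the 1st backtracking precision.
- 2nd period: the model predicts the main indicator for November 2020 from leading indicator 1's October 2020 actual value and leading indicator 2's September 2020 actual value. The 2nd prediction accuracy, calculated against the main indicator's November 2020 actual value, is the 2nd backtracking precision.
- This continues until the 7th period, in which the model predicts the main indicator for June 2020 from leading indicator 1's May 2020 actual value and leading indicator 2's April 2020 actual value. The 7th prediction accuracy, calculated against the main indicator's June 2020 actual value, is the 7th backtracking precision.
The target classification model has then produced 7 backtracking precisions. Their average is determined as the classification backtracking precision of the target classification model.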
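The S-period backtracking loop can be sketched as follows. The sketch assumes the inputs for each main-indicator month are already lagged, and scores a single classification prediction as 1.0 if correct and 0.0 otherwise; the text does not spell out the per-period accuracy. `toy_classifier` is a hypothetical stand-in for a pre-trained model, and S is 3 here for brevity (the example above uses 7).

```python
from statistics import mean


def backtest_precision(predict, features_by_month, actual_by_month, months, accuracy):
    """S-period backtest: predict each past month once, score it, and return the mean score."""
    scores = [accuracy(predict(features_by_month[m]), actual_by_month[m]) for m in months]
    return mean(scores)


def class_hit(predicted, actual):
    """Accuracy of one classification prediction: 1.0 if correct, else 0.0 (assumption)."""
    return 1.0 if predicted == actual else 0.0


def toy_classifier(features):
    """Hypothetical stand-in for a pre-trained classification model."""
    return "rising" if sum(features) > 0 else "falling"


# Keys are main-indicator months; values are (leading 1 at t-1, leading 2 at t-2).
features = {(2020, 12): (0.4, 0.1), (2020, 11): (-0.2, -0.3), (2020, 10): (0.5, -0.1)}
actual = {(2020, 12): "rising", (2020, 11): "rising", (2020, 10): "rising"}
months = [(2020, 12), (2020, 11), (2020, 10)]
print(backtest_precision(toy_classifier, features, actual, months, class_hit))
```

The same loop serves the T-period regression backtest if `accuracy` is replaced by the regression accuracy formula given later in the text.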
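S206. Acquire data to be predicted.
S207. When the data to be predicted includes the first value of the specified leading indicator and the second value of the specified explanatory variable, perform classification processing on the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain a first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator being predicted.
S208. Perform regression processing on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain a second number of regression results, and take the mean calculated from them as the second prediction result for the main indicator.
S209. Determine the final prediction result for the main indicator according to the first prediction result, the second prediction result and the mean classification backtracking precision, where the mean classification backtracking precision is the mean of the classification backtracking precisions of the first number of pre-trained classification models.
For steps S206 to S209, refer to steps S101 to S104 in the embodiment of FIG. 1; details are not repeated here.
Having described how the classification backtracking precision is determined, the following describes how the aforementioned regression backtracking precision is determined.
In one embodiment, the multiple combination results may further include at least one second combination result. A second combination result is a combination composed of at least one leading indicator and at least one explanatory variable, and every second combination result is different.

The electronic device may use the target regression model among the second preset number of pre-trained regression models to perform prediction processing according to the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result. This yields the prediction accuracy of each second combination result. The target regression model is any one of the second number of pre-trained regression models.

According to these prediction accuracies, the electronic device determines the second combination result that meets the second preset condition. It then uses the target regression model to perform T-period backtracking verification on that combination result, obtaining T backtracking precisions, where T is an integer greater than or equal to 1. The mean of the T backtracking precisions is determined as the regression backtracking precision of the target regression model. T may be the same as or different from S. For example, the T-period backtracking may cover 7 monthly periods (one period being one month) or 4 quarterly periods (one period being one quarter); this is not limited in the embodiments of this application.
For example, suppose the second combination result is combination result 4, which contains leading indicator 1 and explanatory variable 1, with leading indicator 1's target lead being 1 period. Suppose the T-period backtracking verification covers 7 monthly periods.
- 1st period: the electronic device uses the target regression model to perform prediction processing on leading indicator 1's November 2020 actual value and explanatory variable 1's December 2020 actual value, obtaining a prediction for the main indicator in December 2020. It calculates the 1st prediction accuracy from this prediction and the main indicator's December 2020 actual value, and uses it as the 1st backtracking precision.
- 2nd period: the model predicts the main indicator for November 2020 from leading indicator 1's October 2020 actual value and explanatory variable 1's November 2020 actual value. The 2nd prediction accuracy, calculated against the main indicator's November 2020 actual value, is the 2nd backtracking precision.
- This continues until the 7th period, in which the model predicts the main indicator for June 2020 from leading indicator 1's May 2020 actual value and explanatory variable 1's June 2020 actual value. The 7th prediction accuracy, calculated against the main indicator's June 2020 actual value, is the 7th backtracking precision.
The target regression model has then produced 7 backtracking precisions. Their average is determined as the regression backtracking precision of the target regression model.
In one embodiment, the electronic device uses the target regression model among the second preset number of pre-trained regression models to perform prediction processing according to the second time series data of each leading indicator and the third time series data of each explanatory variable in each second combination result. This yields a second set of prediction results for the main indicator for each second combination result. The device can then calculate each second combination result's prediction accuracy from that set and the set of actual values of the main indicator. A second set of prediction results is the set of predictions for the main indicator corresponding to a second combination result; likewise, the aforementioned first set of prediction results is the set corresponding to a first combination result.

For example, the electronic device may apply an accuracy formula to each second combination result's second set of prediction results and the set of actual values, obtaining a set of prediction accuracies for that combination result. It then derives the combination result's prediction accuracy from this set, for example by taking its mean. The accuracy formula is 1 - abs(predicted value - actual value) / actual value, where abs denotes the absolute value.

In one embodiment, the electronic device may select the second combination result with the highest prediction accuracy as the one that meets the second preset condition. The second combination result meeting the second preset condition is thus the one that gives the target regression model its highest prediction accuracy. In this way, for each pre-trained regression model, the combination of leading indicators and explanatory variables that maximizes that model's prediction accuracy can be determined. The T-period backtracking verification of this second combination result by the target regression model, which yields T backtracking precisions, is illustrated by the example of combination result 4 above.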
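The regression accuracy formula 1 - abs(predicted value - actual value) / actual value, and its mean over a combination result's predictions, can be sketched directly. The sketch rejects an actual value of 0, for which the formula is undefined. Negative actual values are not discussed in the text and are left unhandled. The function names are hypothetical.

```python
from statistics import mean


def point_accuracy(predicted: float, actual: float) -> float:
    """1 - |predicted - actual| / actual, as stated in the text."""
    if actual == 0:
        raise ValueError("accuracy is undefined when the actual value is 0")
    return 1 - abs(predicted - actual) / actual


def combination_accuracy(predictions: list[float], actuals: list[float]) -> float:
    """Mean of the per-point accuracies for one second combination result."""
    return mean([point_accuracy(p, a) for p, a in zip(predictions, actuals)])


# Hypothetical values: predictions 95 and 110 against actual values of 100 each.
print(combination_accuracy([95.0, 110.0], [100.0, 100.0]))
```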
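In one embodiment, the electronic device may also train the target classification model with S periods of data of each leading indicator in the first combination result meeting the first preset condition, so as to optimize the target classification model. In one embodiment, a leading indicator's S periods of data may be the S periods traced back for that leading indicator during the aforementioned backtracking verification, or S periods traced back from some other time, and so on.

For example, suppose the first combination result meeting the first preset condition is combination result 3, which contains leading indicators 1 and 2 with leads of 1 period and 2 periods respectively, one period being one month. Assuming S is 7, leading indicator 1's S periods of data may include its actual values from November 2020 back to May 2020. Leading indicator 2's S periods of data may include its actual values from October 2020 back to April 2020. The electronic device may first train the target classification model with the main indicator's December 2020 actual value, leading indicator 1's November 2020 actual value and leading indicator 2's October 2020 actual value. It continues in this way down to the main indicator's June 2020 actual value, leading indicator 1's May 2020 actual value and leading indicator 2's April 2020 actual value.
In one embodiment, the electronic device may also train the target regression model with T periods of data of each leading indicator and each explanatory variable in the second combination result meeting the second preset condition, so as to optimize the target regression model. In one embodiment, a leading indicator's T periods of data may be the T periods traced back for that leading indicator during the aforementioned backtracking verification, or T periods traced back from some other time, and so on. Likewise, an explanatory variable's T periods of data may be the T periods traced back for that explanatory variable during the backtracking verification, or T periods traced back from some other time, and so on.

For example, suppose the second combination result meeting the second preset condition is combination result 4, which contains leading indicator 1 and explanatory variable 1, with leading indicator 1 leading by 1 period and one period being one month. Assuming T is 7, leading indicator 1's T periods of data may include its actual values from November 2020 back to May 2020. Explanatory variable 1's T periods of data may include its actual values from December 2020 back to June 2020. The electronic device may first train the target regression model with the main indicator's December 2020 actual value, leading indicator 1's November 2020 actual value and explanatory variable 1's December 2020 actual value. It continues in this way down to the main indicator's June 2020 actual value, leading indicator 1's May 2020 actual value and explanatory variable 1's June 2020 actual value.
In one embodiment, the aforementioned first number of pre-trained classification models and second number of regression models may be determined as follows:
① Construct a first data set and divide it into a first training set and a first validation set. The first data set includes the first time series data of the main indicator and the lagged time series data of each leading indicator in the target leading indicators. For example, the first data set may be expressed as {(x1, y1), ..., (xn, yn)}, where x denotes the target leading indicators and y denotes the main indicator. In one embodiment, the lagged time series data is obtained by having the electronic device lag the second time series data of each target leading indicator by that indicator's number of target leading periods.
② Construct a second data set and divide it into a second training set and a second validation set. The second data set includes the first time series data of the main indicator, the lagged time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables. For example, the second data set may be expressed as {{(x1, z1), y1}, ..., {(xn, zn), yn}}, where z denotes the target explanatory variables.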
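Steps ① and ② can be sketched with month-keyed series that have already been lagged. The text does not say how the data sets are divided into training and validation sets. The sketch assumes a chronological split with the hypothetical parameter `train_fraction`, which avoids training on months later than those being validated. All names are hypothetical.

```python
def build_datasets(main, lagged_leading, explanatory, months):
    """Return (first_dataset, second_dataset) as lists of (x, y) and ((x, z), y).

    `main` and every series in `lagged_leading` / `explanatory` map a month to a value;
    the leading series are assumed to be lagged already.
    """
    first, second = [], []
    for ym in months:
        x = tuple(series[ym] for series in lagged_leading)
        z = tuple(series[ym] for series in explanatory)
        first.append((x, main[ym]))
        second.append(((x, z), main[ym]))
    return first, second


def chronological_split(dataset, train_fraction=0.8):
    """Assumed split: the earliest part trains, the latest part validates."""
    cut = int(len(dataset) * train_fraction)
    return dataset[:cut], dataset[cut:]
```

Passing `months` in ascending order keeps the split chronological; a random split would also fit the text's wording, but would leak future information into training.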
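③ Use the first training set to train M initial classification models respectively, obtaining M pre-trained classification models. Use the second training set to train N initial regression models respectively, obtaining N pre-trained regression models. M and N are integers greater than or equal to 2. For example, the electronic device may feed the first training set into 5 initial classification models and train each of them, obtaining 5 pre-trained classification models. Likewise, it may feed the second training set into 5 initial regression models and train each of them, obtaining 5 pre-trained regression models.
④ Use each of the M pre-trained classification models to predict on the first validation set, obtaining each model's prediction accuracy on the first validation set. Then determine the first number of pre-trained classification models from the M models according to these accuracies. In one embodiment, the electronic device selects, from the M pre-trained classification models, the first number of models with the highest prediction accuracy on the first validation set. Alternatively, it may sort the M models by their prediction accuracy on the first validation set from high to low and select the first number of top-ranked models.
⑤ Use each of the N pre-trained regression models to predict on the second validation set, obtaining each model's prediction accuracy on the second validation set. Then determine the second preset number of pre-trained regression models from the N models according to these accuracies. In one embodiment, the electronic device selects, from the N pre-trained regression models, the second number of models with the highest prediction accuracy on the second validation set. Alternatively, it may sort the N models by their prediction accuracy on the second validation set from high to low and select the second number of top-ranked models.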
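The ranking-based selection in steps ④ and ⑤ reduces to sorting models by validation accuracy and keeping the first k. The model names and accuracy figures below are made up for illustration. Because Python's sort is stable, models with equal accuracy keep their original order.

```python
def top_k_models(validation_accuracy: dict, k: int) -> list:
    """Rank models by validation accuracy (high to low) and keep the first k."""
    if not 1 <= k <= len(validation_accuracy):
        raise ValueError("k must be between 1 and the number of candidate models")
    return sorted(validation_accuracy, key=validation_accuracy.get, reverse=True)[:k]


# Hypothetical M = 5 classifiers and their accuracy on the first validation set.
scores = {"logit": 0.62, "random_forest": 0.71, "gbdt": 0.68, "svm": 0.55, "mlp": 0.66}
print(top_k_models(scores, 3))  # ['random_forest', 'gbdt', 'mlp']
```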
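It can thus be seen that, in the embodiment shown in FIG. 2, the electronic device combines the leading indicators in the target leading indicators with the explanatory variables in the target explanatory variables to obtain multiple combinations, including at least one first combination. It then uses the target classification model among the first number of pre-trained classification models to perform prediction on the second time series data of the leading indicators in each first combination, obtaining a prediction accuracy for each first combination, and determines, based on these accuracies, the first combination that satisfies the first preset condition. The target classification model then performs S-period backtest validation on that combination, yielding S backtest accuracies, and their mean is taken as the classification backtest accuracy of the target classification model, which is later used to determine the final prediction result for the main indicator. Because this process first identifies a first combination with high prediction accuracy and then backtests it over S periods, the resulting classification backtest accuracy is relatively reliable, and using it when determining the final prediction result also makes that final prediction for the main indicator more accurate.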
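This application relates to blockchain technology; for example, the final prediction result for the main indicator may be written to a blockchain. In one embodiment, the final prediction result for the main indicator may later be compared with the actual value of the main indicator at the target time, in order to analyze the prediction error of the machine-learning-based indicator prediction method described in the embodiments of this application.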
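Refer to FIG. 3, which is a schematic structural diagram of a machine-learning-based indicator prediction apparatus according to an embodiment of this application. The apparatus may be used in an electronic device. Specifically, the apparatus includes: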
An acquisition module 301, configured to acquire data to be predicted.
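A classification processing module 302, configured to, when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classify the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator being predicted.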
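A regression processing module 303, configured to perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain the second number of regression results, and take the mean of these regression results as the second prediction result for the main indicator.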
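The first and second prediction results produced by modules 302 and 303 can be sketched in a few lines. This assumes class labels are hashable values such as "up"/"down" and regression outputs are numbers. The text does not say how a tied vote is resolved; in this sketch `Counter.most_common` returns the tied label that was encountered first.

```python
from collections import Counter
from statistics import mean


def first_prediction(class_results):
    """Majority vote over the classifier outputs (label that occurs most often)."""
    # most_common orders equal counts by first occurrence; tie handling is an assumption.
    return Counter(class_results).most_common(1)[0][0]


def second_prediction(regression_results):
    """Arithmetic mean of the regression outputs."""
    return mean(regression_results)


votes = ["up", "down", "up"]     # outputs of a hypothetical first number = 3 classifiers
values = [0.02, 0.04, -0.01]     # outputs of a hypothetical second number = 3 regressors
print(first_prediction(votes))   # up
print(second_prediction(values))
```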
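A determination module 304, configured to determine the final prediction result for the main indicator according to the first prediction result, the second prediction result, and a mean classification backtest accuracy, where the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.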
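In an optional implementation, the determination module 304 determines the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy, specifically as follows: when the mean classification backtest accuracy is less than a first preset value, the second prediction result is taken as the final prediction result for the main indicator; when the mean classification backtest accuracy is greater than or equal to the first preset value but less than a second preset value, a first regression result in the same direction as the first prediction result is determined from the second number of regression results, a second regression result closest to the first regression result is determined from the regression results in the opposite direction to the first prediction result, and the final prediction result for the main indicator is determined according to the first regression result and the second regression result; when the mean classification backtest accuracy is greater than or equal to the second preset value, the second prediction result is taken as the final prediction result if the mean classification backtest accuracy is less than or equal to a mean regression backtest accuracy, whereas if the mean classification backtest accuracy is greater than the mean regression backtest accuracy, a first regression result in the same direction as the first prediction result is determined from the second number of regression results and the final prediction result for the main indicator is determined according to that first regression result. The mean regression backtest accuracy is the mean of the regression backtest accuracies of each of the second number of pre-trained regression models.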
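The three-band rule above can be sketched as follows. It is one interpretation under stated assumptions, not the applicant's code: class labels are "up"/"down"; regression outputs are signed changes (positive means up, zero counts as down); `low` and `high` stand for the first and second preset values; the "first regression result" is taken to be the same-direction value closest to zero; the middle band averages it with its nearest opposite-direction neighbour; and the fallbacks for missing same- or opposite-direction results are invented here because the text does not cover those cases.

```python
from statistics import mean


def direction(value: float) -> str:
    # Assumption: regression outputs are signed changes; positive means "up", otherwise "down".
    return "up" if value > 0 else "down"


def final_prediction(first, regressions, cls_acc, reg_acc, low, high):
    """Fuse the vote `first` with the regression outputs using three accuracy bands.

    cls_acc / reg_acc: mean classification / regression backtest accuracy
    low / high:        first / second preset values (low < high)
    """
    second = mean(regressions)
    if cls_acc < low:
        return second                      # band 1: classifiers not trusted
    if high <= cls_acc <= reg_acc:
        return second                      # band 3: regressors backtest at least as well
    same = [r for r in regressions if direction(r) == first]
    if not same:
        return second                      # assumed fallback: nothing agrees with the vote
    first_reg = min(same, key=abs)         # assumption: agreeing value closest to zero
    if cls_acc >= high:
        return first_reg                   # band 3: classifiers backtest better
    opposite = [r for r in regressions if direction(r) != first]
    if not opposite:
        return first_reg                   # assumed fallback: nothing disagrees
    second_reg = min(opposite, key=lambda r: abs(r - first_reg))
    return (first_reg + second_reg) / 2    # band 2: assumed midpoint of the two results
```

Averaging is only one way to "determine the final prediction result according to" the two regression results; a weighted blend would fit the text equally well.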
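In an optional implementation, the determination module 304 is further configured to: combine the leading indicators in the target leading indicators with the explanatory variables in the target explanatory variables to obtain multiple combinations, where the target leading indicators are determined from multiple leading indicators associated with the main indicator according to the first time series data of the main indicator, the target explanatory variables are determined from multiple explanatory variables associated with the main indicator according to the first time series data, the target leading indicators include the specified leading indicator, and the target explanatory variables include the specified explanatory variable; the multiple combinations include at least one first combination, a first combination being a combination consisting of at least one leading indicator, and the first combinations being different from one another; use a target classification model among the first number of pre-trained classification models to perform prediction on the second time series data of the leading indicators in each first combination, obtaining a prediction accuracy for each first combination, where the target classification model is any one of the first number of pre-trained classification models; determine, according to the prediction accuracy of each first combination, the first combination that satisfies a first preset condition from the at least one first combination; use the target classification model to perform S-period backtest validation on the first combination that satisfies the first preset condition, obtaining S backtest accuracies, where S is an integer greater than or equal to 1; and take the mean of the S backtest accuracies as the classification backtest accuracy of the target classification model.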
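The combination and backtest steps can be outlined as below. The sketch assumes the "first preset condition" is simply "highest prediction accuracy" and that `score` and `backtest` are caller-supplied functions wrapping the target classification model; all names are hypothetical. `second_combinations` builds the mixed leading-indicator/explanatory-variable combinations used for the regression models.

```python
from itertools import combinations
from statistics import mean


def non_empty_subsets(items):
    """All non-empty combinations of `items`, smallest first."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def first_combinations(leading):
    """Combinations made only of leading indicators."""
    return non_empty_subsets(leading)


def second_combinations(leading, explanatory):
    """Combinations with at least one leading indicator and at least one explanatory variable."""
    return [a + b for a in non_empty_subsets(leading) for b in non_empty_subsets(explanatory)]


def classification_backtest_accuracy(combos, score, backtest, s):
    """Pick the combination with the highest score, then average its S backtest accuracies.

    score(combo) -> prediction accuracy of the target model on that combination
    backtest(combo, period) -> accuracy for one backtest period (1..s)
    """
    best = max(combos, key=score)  # assumption: first preset condition = highest accuracy
    return best, mean(backtest(best, period) for period in range(1, s + 1))


combos = first_combinations(["L1", "L2", "L3"])
toy_scores = {("L1",): 0.55, ("L1", "L2"): 0.72}  # hypothetical accuracies
best, acc = classification_backtest_accuracy(
    combos, lambda c: toy_scores.get(c, 0.5), lambda c, p: [0.6, 0.8, 0.7][p - 1], s=3)
```

The number of combinations grows as 2^n - 1 in the number of indicators, so exhaustive enumeration is only practical when the correlation screening has already reduced the candidate set.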
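In an optional implementation, the determination module 304 performs prediction on the second time series data of the leading indicators in each first combination specifically by: determining the target lead period count corresponding to each leading indicator in each first combination; shifting the second time series data of each leading indicator in each first combination by the target lead period count corresponding to that leading indicator, to obtain the lag-shifted time series data of that leading indicator; and performing prediction on the lag-shifted time series data of the leading indicators in each first combination.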
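In an optional implementation, the multiple combinations further include at least one second combination, a second combination being a combination consisting of at least one leading indicator and at least one explanatory variable, and the second combinations being different from one another. The determination module 304 is further configured to: use a target regression model among the second number of pre-trained regression models to perform prediction on the second time series data of the leading indicators and the third time series data of the explanatory variables in each second combination, obtaining a prediction accuracy for each second combination, where the target regression model is any one of the second number of pre-trained regression models; determine, according to the prediction accuracy of each second combination, the second combination that satisfies a second preset condition from the at least one second combination; use the target regression model to perform T-period backtest validation on the second combination that satisfies the second preset condition, obtaining T backtest accuracies, where T is an integer greater than or equal to 1; and take the mean of the T backtest accuracies as the regression backtest accuracy of the target regression model.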
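In an optional implementation, the determination module 304 is further configured to: acquire the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and the third time series data of each of the multiple explanatory variables associated with the main indicator; determine, for each data point in the first time series data, the associated data point in each second time series, compute the correlation coefficient between each leading indicator and the main indicator from these data points and their associated data points, determine the target leading indicators from the multiple leading indicators according to these correlation coefficients, and determine the target lead period count corresponding to each leading indicator in the target leading indicators; and determine, for each data point in the first time series data, the associated data point in each third time series, compute the correlation coefficient between each explanatory variable and the main indicator from these data points and their associated data points, and determine the target explanatory variables from the multiple explanatory variables according to these correlation coefficients.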
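One way to realize this correlation screening is to pair each main-indicator value with the leading-indicator value a fixed number of periods earlier, compute a Pearson coefficient for each candidate lag, and keep the lag with the largest absolute coefficient as the target lead period count. The text does not name the coefficient, the lag range, or the selection threshold, so Pearson correlation, `max_lag`, and `threshold` are assumptions here; series are equal-length Python lists aligned by period, and all function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def best_lag(main, lead, max_lag):
    """Pair main[t] with lead[t - lag] for lag = 1..max_lag and keep the lag with largest |r|."""
    best = (None, 0.0)
    for lag in range(1, max_lag + 1):
        r = pearson(main[lag:], lead[:-lag])
        if best[0] is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def select_leading(main, leads, max_lag, threshold):
    """Return {name: target lead period count} for indicators with |r| >= threshold."""
    chosen = {}
    for name, series in leads.items():
        lag, r = best_lag(main, series, max_lag)
        if abs(r) >= threshold:
            chosen[name] = lag
    return chosen


def select_explanatory(main, variables, threshold):
    """Explanatory variables are correlated with the main indicator in the same period."""
    return [name for name, s in variables.items() if abs(pearson(main, s)) >= threshold]


main = [1, 4, 2, 8, 5, 7, 3, 6]
lead_a = [2, 8, 5, 7, 3, 6, 0, 0]  # hypothetical: equals main two periods later
print(select_leading(main, {"lead_a": lead_a}, max_lag=3, threshold=0.9))  # {'lead_a': 2}
```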
In an optional implementation, the machine-learning-based indicator prediction apparatus further includes a training module 305.
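In an optional implementation, the training module 305 is configured to: construct a first data set and divide it into a first training set and a first validation set, the first data set including the first time series data of the main indicator and the lag-shifted time series data of each leading indicator in the target leading indicators; construct a second data set and divide it into a second training set and a second validation set, the second data set including the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables; and train M initial classification models on the first training set to obtain M pre-trained classification models, and train N initial regression models on the second training set to obtain N pre-trained regression models, where M and N are each integers greater than or equal to 2.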
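In an optional implementation, the determination module 304 is further configured to: use each of the M pre-trained classification models to predict on the first validation set, obtaining each model's prediction accuracy on the first validation set, and determine the first number of pre-trained classification models from the M models according to these accuracies; and use each of the N pre-trained regression models to predict on the second validation set, obtaining each model's prediction accuracy on the second validation set, and determine the second number of pre-trained regression models from the N models according to these accuracies.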
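Refer to FIG. 4, which is a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device described in this embodiment may include one or more processors 1000 and a memory 2000. The processor 1000 and the memory 2000 may be connected by a bus or in other ways.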
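The processor 1000 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.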
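The memory 2000 may be a high-speed RAM or a non-volatile memory, such as a magnetic disk memory. The memory 2000 is configured to store a set of program code, and the processor 1000 may call the program code stored in the memory 2000. Specifically: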
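The processor 1000 is configured to: acquire data to be predicted; when the data to be predicted includes a first value of a specified leading indicator and a second value of a specified explanatory variable, classify the first value of the specified leading indicator with each of the first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among them as the first prediction result for the main indicator being predicted; perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of the second number of pre-trained regression models to obtain the second number of regression results, and take the mean of these regression results as the second prediction result for the main indicator; and determine the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy, where the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.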
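In one embodiment, the processor 1000 determines the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy, specifically as follows: when the mean classification backtest accuracy is less than a first preset value, the second prediction result is taken as the final prediction result for the main indicator; when the mean classification backtest accuracy is greater than or equal to the first preset value but less than a second preset value, a first regression result in the same direction as the first prediction result is determined from the second number of regression results, a second regression result closest to the first regression result is determined from the regression results in the opposite direction to the first prediction result, and the final prediction result for the main indicator is determined according to the first regression result and the second regression result; when the mean classification backtest accuracy is greater than or equal to the second preset value, the second prediction result is taken as the final prediction result if the mean classification backtest accuracy is less than or equal to a mean regression backtest accuracy, whereas if the mean classification backtest accuracy is greater than the mean regression backtest accuracy, a first regression result in the same direction as the first prediction result is determined from the second number of regression results and the final prediction result for the main indicator is determined according to that first regression result. The mean regression backtest accuracy is the mean of the regression backtest accuracies of each of the second number of pre-trained regression models.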
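In one embodiment, the processor 1000 is further configured to: combine the leading indicators in the target leading indicators with the explanatory variables in the target explanatory variables to obtain multiple combinations, where the target leading indicators are determined from multiple leading indicators associated with the main indicator according to the first time series data of the main indicator, the target explanatory variables are determined from multiple explanatory variables associated with the main indicator according to the first time series data, the target leading indicators include the specified leading indicator, and the target explanatory variables include the specified explanatory variable; the multiple combinations include at least one first combination, a first combination being a combination consisting of at least one leading indicator, and the first combinations being different from one another; use a target classification model among the first number of pre-trained classification models to perform prediction on the second time series data of the leading indicators in each first combination, obtaining a prediction accuracy for each first combination, where the target classification model is any one of the first number of pre-trained classification models; determine, according to the prediction accuracy of each first combination, the first combination that satisfies a first preset condition from the at least one first combination; use the target classification model to perform S-period backtest validation on the first combination that satisfies the first preset condition, obtaining S backtest accuracies, where S is an integer greater than or equal to 1; and take the mean of the S backtest accuracies as the classification backtest accuracy of the target classification model.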
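In one embodiment, the processor 1000 performs prediction on the second time series data of the leading indicators in each first combination specifically by: determining the target lead period count corresponding to each leading indicator in each first combination; shifting the second time series data of each leading indicator in each first combination by the target lead period count corresponding to that leading indicator, to obtain the lag-shifted time series data of that leading indicator; and performing prediction on the lag-shifted time series data of the leading indicators in each first combination.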
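In one embodiment, the multiple combinations further include at least one second combination, a second combination being a combination consisting of at least one leading indicator and at least one explanatory variable, and the second combinations being different from one another.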
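In one embodiment, the processor 1000 is further configured to: use a target regression model among the second number of pre-trained regression models to perform prediction on the second time series data of the leading indicators and the third time series data of the explanatory variables in each second combination, obtaining a prediction accuracy for each second combination, where the target regression model is any one of the second number of pre-trained regression models; determine, according to the prediction accuracy of each second combination, the second combination that satisfies a second preset condition from the at least one second combination; use the target regression model to perform T-period backtest validation on the second combination that satisfies the second preset condition, obtaining T backtest accuracies, where T is an integer greater than or equal to 1; and take the mean of the T backtest accuracies as the regression backtest accuracy of the target regression model.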
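In one embodiment, the processor 1000 is further configured to: acquire the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and the third time series data of each of the multiple explanatory variables associated with the main indicator; determine, for each data point in the first time series data, the associated data point in each second time series, compute the correlation coefficient between each leading indicator and the main indicator from these data points and their associated data points, determine the target leading indicators from the multiple leading indicators according to these correlation coefficients, and determine the target lead period count corresponding to each leading indicator in the target leading indicators; and determine, for each data point in the first time series data, the associated data point in each third time series, compute the correlation coefficient between each explanatory variable and the main indicator from these data points and their associated data points, and determine the target explanatory variables from the multiple explanatory variables according to these correlation coefficients.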
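In one embodiment, the processor 1000 is further configured to: construct a first data set and divide it into a first training set and a first validation set, the first data set including the first time series data of the main indicator and the lag-shifted time series data of each leading indicator in the target leading indicators; construct a second data set and divide it into a second training set and a second validation set, the second data set including the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and the time series data of each explanatory variable in the target explanatory variables; train M initial classification models on the first training set to obtain M pre-trained classification models, and train N initial regression models on the second training set to obtain N pre-trained regression models, where M and N are each integers greater than or equal to 2; use each of the M pre-trained classification models to predict on the first validation set, obtaining each model's prediction accuracy on the first validation set, and determine the first number of pre-trained classification models from the M models according to these accuracies; and use each of the N pre-trained regression models to predict on the second validation set, obtaining each model's prediction accuracy on the second validation set, and determine the second number of pre-trained regression models from the N models according to these accuracies.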
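In a specific implementation, the processor 1000 described in the embodiments of this application may perform the implementations described in the embodiments of FIG. 1 and FIG. 2, as well as the implementations described in this embodiment; details are not repeated here.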
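An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the methods of the above embodiments; details are not repeated here. Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.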
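The functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.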
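A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The computer-readable storage medium may be volatile or non-volatile; for example, it may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer-readable storage medium may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.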
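The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each block contains information about a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.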
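What is disclosed above is merely a preferred embodiment of this application and certainly cannot be used to limit the scope of its rights. A person of ordinary skill in the art can understand that all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of this application, still fall within the scope covered by this application.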
Claims (20)
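- A machine-learning-based indicator prediction method, comprising: acquiring data to be predicted; when the data to be predicted comprises a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and taking the classification result that occurs most often among the first number of classification results as a first prediction result for a main indicator being predicted; performing regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and taking the mean of the second number of regression results as a second prediction result for the main indicator; and determining a final prediction result for the main indicator according to the first prediction result, the second prediction result, and a mean classification backtest accuracy, wherein the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.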
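- The method according to claim 1, wherein determining the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy comprises: when the mean classification backtest accuracy is less than a first preset value, taking the second prediction result as the final prediction result for the main indicator; when the mean classification backtest accuracy is greater than or equal to the first preset value but less than a second preset value, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result, determining, from the regression results in the opposite direction to the first prediction result, a second regression result closest to the first regression result, and determining the final prediction result for the main indicator according to the first regression result and the second regression result; and when the mean classification backtest accuracy is greater than or equal to the second preset value, taking the second prediction result as the final prediction result for the main indicator if the mean classification backtest accuracy is less than or equal to a mean regression backtest accuracy, or, if the mean classification backtest accuracy is greater than the mean regression backtest accuracy, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result and determining the final prediction result for the main indicator according to the first regression result, wherein the mean regression backtest accuracy is the mean of the regression backtest accuracies of each of the second number of pre-trained regression models.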
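- The method according to claim 1, further comprising: combining the leading indicators in target leading indicators with the explanatory variables in target explanatory variables to obtain multiple combinations, wherein the target leading indicators are determined from multiple leading indicators associated with the main indicator according to first time series data of the main indicator, the target explanatory variables are determined from multiple explanatory variables associated with the main indicator according to the first time series data, the target leading indicators comprise the specified leading indicator, and the target explanatory variables comprise the specified explanatory variable; the multiple combinations comprise at least one first combination, a first combination being a combination consisting of at least one leading indicator, and the first combinations being different from one another; performing, with a target classification model among the first number of pre-trained classification models, prediction on second time series data of the leading indicators in each first combination to obtain a prediction accuracy for each first combination, wherein the target classification model is any one of the first number of pre-trained classification models; determining, according to the prediction accuracy of each first combination, a first combination satisfying a first preset condition from the at least one first combination; performing, with the target classification model, S-period backtest validation on the first combination satisfying the first preset condition to obtain S backtest accuracies, wherein S is an integer greater than or equal to 1; and taking the mean of the S backtest accuracies as the classification backtest accuracy of the target classification model.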
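- The method according to claim 3, wherein performing prediction on the second time series data of the leading indicators in each first combination comprises: determining a target lead period count corresponding to each leading indicator in each first combination; shifting the second time series data of each leading indicator in each first combination by the target lead period count corresponding to that leading indicator, to obtain lag-shifted time series data of that leading indicator; and performing prediction on the lag-shifted time series data of the leading indicators in each first combination.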
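- The method according to claim 3, wherein the multiple combinations further comprise at least one second combination, a second combination being a combination consisting of at least one leading indicator and at least one explanatory variable, and the second combinations being different from one another, and the method further comprises: performing, with a target regression model among the second number of pre-trained regression models, prediction on the second time series data of the leading indicators and third time series data of the explanatory variables in each second combination to obtain a prediction accuracy for each second combination, wherein the target regression model is any one of the second number of pre-trained regression models; determining, according to the prediction accuracy of each second combination, a second combination satisfying a second preset condition from the at least one second combination; performing, with the target regression model, T-period backtest validation on the second combination satisfying the second preset condition to obtain T backtest accuracies, wherein T is an integer greater than or equal to 1; and taking the mean of the T backtest accuracies as the regression backtest accuracy of the target regression model.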
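- The method according to claim 3, further comprising: acquiring the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and third time series data of each of the multiple explanatory variables associated with the main indicator; determining, for each data point in the first time series data, the associated data point in each second time series, calculating a correlation coefficient between each leading indicator and the main indicator according to the data points and their associated data points in each second time series, determining the target leading indicators from the multiple leading indicators according to the correlation coefficients between the leading indicators and the main indicator, and determining a target lead period count corresponding to each leading indicator in the target leading indicators; and determining, for each data point in the first time series data, the associated data point in each third time series, calculating a correlation coefficient between each explanatory variable and the main indicator according to the data points and their associated data points in each third time series, and determining the target explanatory variables from the multiple explanatory variables according to the correlation coefficients between the explanatory variables and the main indicator.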
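- The method according to claim 1, further comprising: constructing a first data set and dividing the first data set into a first training set and a first validation set, wherein the first data set comprises first time series data of the main indicator and lag-shifted time series data of each leading indicator in target leading indicators; constructing a second data set and dividing the second data set into a second training set and a second validation set, wherein the second data set comprises the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and time series data of each explanatory variable in target explanatory variables; training M initial classification models with the first training set to obtain M pre-trained classification models, and training N initial regression models with the second training set to obtain N pre-trained regression models, wherein M and N are each integers greater than or equal to 2; predicting on the first validation set with each of the M pre-trained classification models to obtain the prediction accuracy of each pre-trained classification model on the first validation set, and determining the first number of pre-trained classification models from the M pre-trained classification models according to these prediction accuracies; and predicting on the second validation set with each of the N pre-trained regression models to obtain the prediction accuracy of each pre-trained regression model on the second validation set, and determining the second number of pre-trained regression models from the N pre-trained regression models according to these prediction accuracies.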
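- A machine-learning-based indicator prediction apparatus, comprising: an acquisition module, configured to acquire data to be predicted; a classification processing module, configured to, when the data to be predicted comprises a first value of a specified leading indicator and a second value of a specified explanatory variable, classify the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and take the classification result that occurs most often among the first number of classification results as a first prediction result for a main indicator being predicted; a regression processing module, configured to perform regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and take the mean of the second number of regression results as a second prediction result for the main indicator; and a determination module, configured to determine a final prediction result for the main indicator according to the first prediction result, the second prediction result, and a mean classification backtest accuracy, wherein the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.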
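- An electronic device, comprising a processor and a memory connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the following method: acquiring data to be predicted; when the data to be predicted comprises a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and taking the classification result that occurs most often among the first number of classification results as a first prediction result for a main indicator being predicted; performing regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and taking the mean of the second number of regression results as a second prediction result for the main indicator; and determining a final prediction result for the main indicator according to the first prediction result, the second prediction result, and a mean classification backtest accuracy, wherein the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.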
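- The electronic device according to claim 9, wherein determining the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy comprises: when the mean classification backtest accuracy is less than a first preset value, taking the second prediction result as the final prediction result for the main indicator; when the mean classification backtest accuracy is greater than or equal to the first preset value but less than a second preset value, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result, determining, from the regression results in the opposite direction to the first prediction result, a second regression result closest to the first regression result, and determining the final prediction result for the main indicator according to the first regression result and the second regression result; and when the mean classification backtest accuracy is greater than or equal to the second preset value, taking the second prediction result as the final prediction result for the main indicator if the mean classification backtest accuracy is less than or equal to a mean regression backtest accuracy, or, if the mean classification backtest accuracy is greater than the mean regression backtest accuracy, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result and determining the final prediction result for the main indicator according to the first regression result, wherein the mean regression backtest accuracy is the mean of the regression backtest accuracies of each of the second number of pre-trained regression models.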
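- The electronic device according to claim 9, wherein the processor is further configured to perform: combining the leading indicators in target leading indicators with the explanatory variables in target explanatory variables to obtain multiple combinations, wherein the target leading indicators are determined from multiple leading indicators associated with the main indicator according to first time series data of the main indicator, the target explanatory variables are determined from multiple explanatory variables associated with the main indicator according to the first time series data, the target leading indicators comprise the specified leading indicator, and the target explanatory variables comprise the specified explanatory variable; the multiple combinations comprise at least one first combination, a first combination being a combination consisting of at least one leading indicator, and the first combinations being different from one another; performing, with a target classification model among the first number of pre-trained classification models, prediction on second time series data of the leading indicators in each first combination to obtain a prediction accuracy for each first combination, wherein the target classification model is any one of the first number of pre-trained classification models; determining, according to the prediction accuracy of each first combination, a first combination satisfying a first preset condition from the at least one first combination; performing, with the target classification model, S-period backtest validation on the first combination satisfying the first preset condition to obtain S backtest accuracies, wherein S is an integer greater than or equal to 1; and taking the mean of the S backtest accuracies as the classification backtest accuracy of the target classification model.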
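- The electronic device according to claim 11, wherein the multiple combinations further comprise at least one second combination, a second combination being a combination consisting of at least one leading indicator and at least one explanatory variable, and the second combinations being different from one another, and the processor is further configured to perform: performing, with a target regression model among the second number of pre-trained regression models, prediction on the second time series data of the leading indicators and third time series data of the explanatory variables in each second combination to obtain a prediction accuracy for each second combination, wherein the target regression model is any one of the second number of pre-trained regression models; determining, according to the prediction accuracy of each second combination, a second combination satisfying a second preset condition from the at least one second combination; performing, with the target regression model, T-period backtest validation on the second combination satisfying the second preset condition to obtain T backtest accuracies, wherein T is an integer greater than or equal to 1; and taking the mean of the T backtest accuracies as the regression backtest accuracy of the target regression model.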
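- The electronic device according to claim 11, wherein the processor is further configured to perform: acquiring the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and third time series data of each of the multiple explanatory variables associated with the main indicator; determining, for each data point in the first time series data, the associated data point in each second time series, calculating a correlation coefficient between each leading indicator and the main indicator according to the data points and their associated data points in each second time series, determining the target leading indicators from the multiple leading indicators according to the correlation coefficients between the leading indicators and the main indicator, and determining a target lead period count corresponding to each leading indicator in the target leading indicators; and determining, for each data point in the first time series data, the associated data point in each third time series, calculating a correlation coefficient between each explanatory variable and the main indicator according to the data points and their associated data points in each third time series, and determining the target explanatory variables from the multiple explanatory variables according to the correlation coefficients between the explanatory variables and the main indicator.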
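- The electronic device according to claim 9, wherein the processor is further configured to perform: constructing a first data set and dividing the first data set into a first training set and a first validation set, wherein the first data set comprises first time series data of the main indicator and lag-shifted time series data of each leading indicator in target leading indicators; constructing a second data set and dividing the second data set into a second training set and a second validation set, wherein the second data set comprises the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and time series data of each explanatory variable in target explanatory variables; training M initial classification models with the first training set to obtain M pre-trained classification models, and training N initial regression models with the second training set to obtain N pre-trained regression models, wherein M and N are each integers greater than or equal to 2; predicting on the first validation set with each of the M pre-trained classification models to obtain the prediction accuracy of each pre-trained classification model on the first validation set, and determining the first number of pre-trained classification models from the M pre-trained classification models according to these prediction accuracies; and predicting on the second validation set with each of the N pre-trained regression models to obtain the prediction accuracy of each pre-trained regression model on the second validation set, and determining the second number of pre-trained regression models from the N pre-trained regression models according to these prediction accuracies.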
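- A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method: acquiring data to be predicted; when the data to be predicted comprises a first value of a specified leading indicator and a second value of a specified explanatory variable, classifying the first value of the specified leading indicator with each of a first number of pre-trained classification models to obtain the first number of classification results, and taking the classification result that occurs most often among the first number of classification results as a first prediction result for a main indicator being predicted; performing regression on the first value of the specified leading indicator and the second value of the specified explanatory variable with each of a second number of pre-trained regression models to obtain the second number of regression results, and taking the mean of the second number of regression results as a second prediction result for the main indicator; and determining a final prediction result for the main indicator according to the first prediction result, the second prediction result, and a mean classification backtest accuracy, wherein the mean classification backtest accuracy is the mean of the classification backtest accuracies of each of the first number of pre-trained classification models.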
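- The computer-readable storage medium according to claim 15, wherein determining the final prediction result for the main indicator according to the first prediction result, the second prediction result, and the mean classification backtest accuracy comprises: when the mean classification backtest accuracy is less than a first preset value, taking the second prediction result as the final prediction result for the main indicator; when the mean classification backtest accuracy is greater than or equal to the first preset value but less than a second preset value, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result, determining, from the regression results in the opposite direction to the first prediction result, a second regression result closest to the first regression result, and determining the final prediction result for the main indicator according to the first regression result and the second regression result; and when the mean classification backtest accuracy is greater than or equal to the second preset value, taking the second prediction result as the final prediction result for the main indicator if the mean classification backtest accuracy is less than or equal to a mean regression backtest accuracy, or, if the mean classification backtest accuracy is greater than the mean regression backtest accuracy, determining, from the second number of regression results, a first regression result in the same direction as the first prediction result and determining the final prediction result for the main indicator according to the first regression result, wherein the mean regression backtest accuracy is the mean of the regression backtest accuracies of each of the second number of pre-trained regression models.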
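- The computer-readable storage medium according to claim 15, wherein the computer program, when executed by the processor, further implements: combining the leading indicators in target leading indicators with the explanatory variables in target explanatory variables to obtain multiple combinations, wherein the target leading indicators are determined from multiple leading indicators associated with the main indicator according to first time series data of the main indicator, the target explanatory variables are determined from multiple explanatory variables associated with the main indicator according to the first time series data, the target leading indicators comprise the specified leading indicator, and the target explanatory variables comprise the specified explanatory variable; the multiple combinations comprise at least one first combination, a first combination being a combination consisting of at least one leading indicator, and the first combinations being different from one another; performing, with a target classification model among the first number of pre-trained classification models, prediction on second time series data of the leading indicators in each first combination to obtain a prediction accuracy for each first combination, wherein the target classification model is any one of the first number of pre-trained classification models; determining, according to the prediction accuracy of each first combination, a first combination satisfying a first preset condition from the at least one first combination; performing, with the target classification model, S-period backtest validation on the first combination satisfying the first preset condition to obtain S backtest accuracies, wherein S is an integer greater than or equal to 1; and taking the mean of the S backtest accuracies as the classification backtest accuracy of the target classification model.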
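- The computer-readable storage medium according to claim 17, wherein the multiple combinations further comprise at least one second combination, a second combination being a combination consisting of at least one leading indicator and at least one explanatory variable, and the second combinations being different from one another, and the computer program, when executed by the processor, further implements: performing, with a target regression model among the second number of pre-trained regression models, prediction on the second time series data of the leading indicators and third time series data of the explanatory variables in each second combination to obtain a prediction accuracy for each second combination, wherein the target regression model is any one of the second number of pre-trained regression models; determining, according to the prediction accuracy of each second combination, a second combination satisfying a second preset condition from the at least one second combination; performing, with the target regression model, T-period backtest validation on the second combination satisfying the second preset condition to obtain T backtest accuracies, wherein T is an integer greater than or equal to 1; and taking the mean of the T backtest accuracies as the regression backtest accuracy of the target regression model.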
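- The computer-readable storage medium according to claim 17, wherein the computer program, when executed by the processor, further implements: acquiring the first time series data of the main indicator, the second time series data of each of the multiple leading indicators associated with the main indicator, and third time series data of each of the multiple explanatory variables associated with the main indicator; determining, for each data point in the first time series data, the associated data point in each second time series, calculating a correlation coefficient between each leading indicator and the main indicator according to the data points and their associated data points in each second time series, determining the target leading indicators from the multiple leading indicators according to the correlation coefficients between the leading indicators and the main indicator, and determining a target lead period count corresponding to each leading indicator in the target leading indicators; and determining, for each data point in the first time series data, the associated data point in each third time series, calculating a correlation coefficient between each explanatory variable and the main indicator according to the data points and their associated data points in each third time series, and determining the target explanatory variables from the multiple explanatory variables according to the correlation coefficients between the explanatory variables and the main indicator.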
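- The computer-readable storage medium according to claim 15, wherein the computer program, when executed by the processor, further implements: constructing a first data set and dividing the first data set into a first training set and a first validation set, wherein the first data set comprises first time series data of the main indicator and lag-shifted time series data of each leading indicator in target leading indicators; constructing a second data set and dividing the second data set into a second training set and a second validation set, wherein the second data set comprises the first time series data of the main indicator, the lag-shifted time series data of each leading indicator in the target leading indicators, and time series data of each explanatory variable in target explanatory variables; training M initial classification models with the first training set to obtain M pre-trained classification models, and training N initial regression models with the second training set to obtain N pre-trained regression models, wherein M and N are each integers greater than or equal to 2; predicting on the first validation set with each of the M pre-trained classification models to obtain the prediction accuracy of each pre-trained classification model on the first validation set, and determining the first number of pre-trained classification models from the M pre-trained classification models according to these prediction accuracies; and predicting on the second validation set with each of the N pre-trained regression models to obtain the prediction accuracy of each pre-trained regression model on the second validation set, and determining the second number of pre-trained regression models from the N pre-trained regression models according to these prediction accuracies.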
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110442036.5 | 2021-04-23 | ||
CN202110442036.5A CN113159424A (zh) | 2021-04-23 | 2021-04-23 | Machine-learning-based indicator prediction method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022222230A1 true WO2022222230A1 (zh) | 2022-10-27 |
Family
ID=76869869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/097309 WO2022222230A1 (zh) | Machine-learning-based indicator prediction method, apparatus, device, and storage medium | 2021-04-23 | 2021-05-31 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113159424A (zh) |
WO (1) | WO2022222230A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705889A (zh) * | 2021-08-26 | 2021-11-26 | 广东电网有限责任公司 | Power consumption prediction method and system, terminal device, and computer-readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
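CN109409426A (zh) * | 2018-10-23 | 2019-03-01 | 冶金自动化研究设计院 | Extreme gradient boosting and logistic regression classification prediction method |
CN110222762A (zh) * | 2019-06-04 | 2019-09-10 | 恒安嘉新(北京)科技股份公司 | Object prediction method, apparatus, device, and medium |
WO2020113673A1 (zh) * | 2018-12-07 | 2020-06-11 | 深圳先进技术研究院 | Cancer subtype classification method based on multi-omics integration |
CN111309914A (zh) * | 2020-03-03 | 2020-06-19 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for classifying multi-turn dialogues based on the results of multiple models |
CN111784360A (zh) * | 2020-09-07 | 2020-10-16 | 北京江融信科技有限公司 | Anti-fraud prediction method and system based on network link backtracking |
CN112561179A (zh) * | 2020-12-21 | 2021-03-26 | 深圳大学 | Stock trend prediction method, apparatus, computer device, and storage medium |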
2021
- 2021-04-23 CN CN202110442036.5A patent/CN113159424A/zh active Pending
- 2021-05-31 WO PCT/CN2021/097309 patent/WO2022222230A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113159424A (zh) | 2021-07-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21937455; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21937455; Country of ref document: EP; Kind code of ref document: A1 |