CN111160662A - Risk prediction method, electronic equipment and storage medium - Google Patents
- Publication number
- CN111160662A (application number CN201911415220.XA)
- Authority
- CN
- China
- Prior art keywords
- risk prediction
- project
- sample
- variable
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Educational Administration (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An embodiment of the invention provides a risk prediction method, an electronic device and a storage medium. The method comprises: selecting, based on a risk prediction model, a target risk prediction variable from a plurality of risk prediction variables corresponding to a project to be predicted; and inputting the target risk prediction variable into the risk prediction model to obtain a risk prediction result output by the risk prediction model. The risk prediction model is trained in advance with a plurality of project samples as training samples and the real risk values of the project samples as target values, where each project sample comprises a plurality of sample variables. The embodiment quantifies and tracks the risks in a project and makes risk control more operable.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a risk prediction method, an electronic device, and a storage medium.
Background
Software project implementation follows a standard industry process that mainly comprises a project start-up stage, a requirement collection and research stage, a system architecture design and development stage, an integration test and commissioning stage, a system training stage, a final acceptance stage, a system handover stage and so on. Risk control is an important part of implementing a software project and covers almost every link of project implementation, including cost control, planning of key functions, task scheduling within an iteration cycle, post-launch satisfaction surveys and the like.
At present, risk control in the industry mainly takes the form of implementing Capability Maturity Model Integration (CMMI) for enterprise software, combined with modern software development practices such as Test-Driven Development (TDD) and DevOps (the processes, methods and systems that join development and operations into a complete closed loop). In other words, risk control of software projects is mostly carried out according to mainstream industry practices and standards.
However, these approaches mainly concern process management and the use of related management tools during project implementation. They require the enterprise or department to understand software engineering as a whole and how far it has been implemented, which makes them difficult for small and medium-sized enterprises to adopt and put into practice. Furthermore, they do not allow project implementation to be quantified and tracked.
Disclosure of Invention
The embodiment of the invention provides a risk prediction method, electronic equipment and a storage medium, which are used for quantifying and tracking risks in a project and increasing the operability of risk control.
The embodiment of the invention provides a risk prediction method, which comprises the following steps:
selecting a target risk prediction variable from a plurality of risk prediction variables corresponding to the project to be predicted based on a risk prediction model;
inputting the target risk prediction variable into the risk prediction model to obtain a risk prediction result output by the risk prediction model; wherein,
the risk prediction model is trained in advance with a plurality of project samples as training samples and the real risk values of the project samples as target values, wherein each project sample comprises a plurality of sample variables.
Optionally, before selecting the target risk prediction variable from the plurality of risk prediction variables corresponding to the project to be predicted based on the risk prediction model, the method further includes: acquiring a multivariate adaptive regression splines (MARS) base model; constructing a basis function based on each sample variable in the project sample to obtain a plurality of basis functions; selecting the basis functions by forward stepwise selection and training the fit of the regression spline base model to obtain an over-fitted model; gradually deleting, through a backward pruning process based on the generalized cross-validation (GCV) criterion, the basis functions whose accuracy contribution values are smaller than a preset value from the over-fitted model to obtain a plurality of candidate models; determining the candidate model with the smallest GCV value among the plurality of candidate models as the risk prediction model; and determining a plurality of target basis functions corresponding to the risk prediction model, and determining the sample variables corresponding to the target basis functions as target sample variables.
Optionally, selecting a target risk prediction variable from the plurality of risk prediction variables corresponding to the project to be predicted based on the risk prediction model includes: selecting, from the plurality of risk prediction variables corresponding to the project to be predicted, the risk prediction variables that are the same as the target sample variables, and determining them as the target risk prediction variables.
Optionally, the regression spline base model is:
$$f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(x)$$
where $f(x)$ represents the risk prediction value output by the regression spline base model, $h_m(x)$ represents the basis function corresponding to the m-th sample variable x, $M$ represents the number of sample variables included in the project sample, $\beta_0$ represents the initial regression coefficient, and $\beta_m$ represents the regression coefficient corresponding to $h_m(x)$.
Optionally, $\beta_m$ is determined by the following formula:
$$\min_{\beta_0, \ldots, \beta_M} \sum_{n=1}^{N} \Big( y_n - \beta_0 - \sum_{m=1}^{M} \beta_m h_m(x_n) \Big)^2$$
where $N$ represents the number of project samples, $y_n$ represents the true risk value of project sample n, and $x_n$ denotes the sample variables of project sample n.
Optionally, constructing a basis function based on each sample variable in the project sample to obtain a plurality of basis functions includes representing the basis functions by the following formula:
$$h_m(x) = \prod_{k=1}^{K_m} \big[ s_{km} \, (x_{v(k,m)} - t_{km}) \big]_+$$
where $h_m(x)$ represents the basis function corresponding to the m-th sample variable x, $x_{v(k,m)}$ represents the target value of the m-th sample variable x, $t_{km}$ represents any one of all possible values of the m-th sample variable x, and $K_m$ represents the number of product interaction factors of the sample variables; $s_{km}$ is 1 when $x_{v(k,m)}$ is greater than $t_{km}$, and $s_{km}$ is -1 when $x_{v(k,m)}$ is less than or equal to $t_{km}$.
Optionally, before the inputting the target risk prediction variable into the risk prediction model and obtaining a risk prediction result output by the risk prediction model, the method further includes: and carrying out normalization processing on the target risk prediction variable to obtain a processed target risk prediction variable.
Optionally, the plurality of sample variables belong to at least two project quality control categories; the project quality control category comprises at least two of a market research category, a market demand category, a project technology category, a project management category, a project operation category and a project operation and maintenance category.
An embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the risk prediction method when executing the program.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the risk prediction method.
According to the risk prediction method, the electronic device and the storage medium provided by the embodiment of the invention, the target risk prediction variable is selected from the multiple risk prediction variables corresponding to the project to be predicted based on the risk prediction model, so that the selected target risk prediction variable is matched with the risk prediction model, the implicit valuable data mode is automatically and efficiently found, and the target risk prediction variable influencing the risk output is revealed, so that when the target risk prediction variable is input into the risk prediction model to obtain the risk prediction result output by the risk prediction model, the accuracy of risk prediction can be ensured, the risk in the project can be quantized and tracked, and the operability of risk control is increased.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating the steps of a risk prediction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of steps taken before a target risk prediction variable is selected from a plurality of risk prediction variables corresponding to a project to be predicted in an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the change in the GCV size during backward pruning of an over-fit model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a risk prediction device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of the steps of a risk prediction method according to an embodiment of the present invention. The method includes the following steps:
Step 101: Select a target risk prediction variable from a plurality of risk prediction variables corresponding to the project to be predicted, based on the risk prediction model.
Specifically, the risk prediction model is trained in advance with a plurality of project samples as training samples and the real risk values of those project samples as target values, where each project sample includes a plurality of sample variables.
Because each project sample contains a plurality of sample variables and the model is trained against the real risk values of the project samples, the accuracy of the trained risk prediction model can be ensured.
For example, the plurality of risk prediction variables corresponding to the project to be predicted may include: sufficiency of market demand and consumer research, sufficiency of sales and channel research, sufficiency of competitor research, business model innovation, sufficiency of research into industry development trends, whether the market demand is clear, customer participation, how arbitrarily the market demand changes, whether priorities are defined for the market demand, whether industry standards are referenced and applied, product experience, how forward-looking the project technology is, system error rate and stability, whether task and time scheduling is reasonable, sufficiency of project documentation management, whether capital cost control is reasonable, whether project asset security is fully guaranteed, whether all project-related personnel are in place, skill proficiency of project personnel, mobility of project personnel, whether the operation targets and platforms are clear, whether promotion at each project stage is sufficiently carried out, and the like.
Although the project to be predicted has many risk prediction variables, not all of them have a large influence on its risk prediction. The target risk prediction variables can therefore be selected from the plurality of risk prediction variables based on the risk prediction model, which removes unnecessary risk prediction variables and helps ensure the prediction accuracy for the project to be predicted.
Step 102: Input the target risk prediction variable into the risk prediction model to obtain a risk prediction result output by the risk prediction model.
In this step, specifically, the determined target risk prediction variable may be input into the risk prediction model, so as to obtain a risk prediction result output by the risk prediction model.
Specifically, the risk prediction result may be a risk prediction value. For example, an output value of 1 indicates that the project to be predicted has a first-level risk; 0.8 indicates a second-level risk; 0.6 indicates a third-level risk; 0.4 indicates a fourth-level risk; 0.2 indicates a fifth-level risk; and 0 indicates a sixth-level risk. The risk decreases from the first level to the sixth level: the first level may represent a very high risk, the second a high risk, the third a higher risk, the fourth a medium risk, the fifth a lower risk, and the sixth a very low risk.
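The mapping from prediction value to risk level can be sketched as follows. The text only lists the six reference values, so snapping a continuous model output to the nearest listed value is an assumption made here for illustration, and the names `RISK_LEVELS` and `risk_level` are hypothetical.

```python
# Risk levels as listed in the text, keyed by their reference prediction value.
RISK_LEVELS = {
    1.0: "first level (very high risk)",
    0.8: "second level (high risk)",
    0.6: "third level (higher risk)",
    0.4: "fourth level (medium risk)",
    0.2: "fifth level (lower risk)",
    0.0: "sixth level (very low risk)",
}


def risk_level(predicted_value: float) -> str:
    """Assumption: a continuous prediction is snapped to the nearest listed value."""
    nearest = min(RISK_LEVELS, key=lambda ref: abs(ref - predicted_value))
    return RISK_LEVELS[nearest]


print(risk_level(0.83))
```

Values outside [0, 1] are mapped to the closest end of the scale in this sketch; a real system might prefer to clip or flag them instead.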
In this way, in the embodiment, the target risk prediction variable is selected from the multiple risk prediction variables corresponding to the project to be predicted based on the risk prediction model, so that the selected target risk prediction variable is matched with the risk prediction model, an implicit valuable data pattern is automatically and efficiently found, and the target risk prediction variable influencing risk output is revealed, so that when the target risk prediction variable is input into the risk prediction model to obtain a risk prediction result output by the risk prediction model, the accuracy of risk prediction can be ensured, the risk in the project is quantized and tracked, and the operability of risk control is increased.
Specifically, it should be noted herein that before the target risk prediction variable is input into the risk prediction model to obtain the risk prediction result output by the risk prediction model, normalization processing may be performed on the target risk prediction variable to obtain a processed target risk prediction variable.
The normalization processing can be performed with the following formula, which maps the data into [0, 1] to facilitate subsequent data processing and application:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where $x'$ represents the target risk prediction variable after normalization, $x$ represents the value before normalization, $\min(x)$ represents the minimum value of the target risk prediction variable, and $\max(x)$ represents its maximum value.
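A minimal sketch of this min-max normalization is shown below. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0 is an assumption, since the text does not cover the case where the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Map a list of numeric values into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information and maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2, 4, 6])
```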
In addition, nominal (categorical) attributes may be mapped to natural numbers. For example, the mobility of project personnel takes the values "frequent", "occasional" and "none"; the value 2 may represent frequent, 1 occasional and 0 none. Quantifying the variable numerically in this way facilitates subsequent variable processing and application.
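The natural-number mapping for the personnel mobility example can be sketched as below. The dictionary and function names are hypothetical, and the English labels are assumed spellings of the three values given in the text.

```python
# Codes from the text: 2 = frequent, 1 = occasional, 0 = none.
MOBILITY_CODES = {"none": 0, "occasional": 1, "frequent": 2}


def encode_nominal(values, codes):
    """Replace each nominal label with its natural-number code."""
    try:
        return [codes[v] for v in values]
    except KeyError as exc:
        raise ValueError(f"unknown label: {exc.args[0]!r}") from exc


encoded = encode_nominal(["frequent", "none", "occasional"], MOBILITY_CODES)
```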
In addition, in this embodiment, before selecting the target risk prediction variable from the multiple risk prediction variables corresponding to the item to be predicted based on the risk prediction model, the risk prediction model needs to be obtained to determine the target sample variable, and in this process, the item sample needs to be obtained.
Specifically, the present embodiment can record the industry case and the historical project data into the system database to form the database of the project sample.
Further, the plurality of sample variables in each project sample may belong to at least two project quality control categories; the project quality control categories comprise at least two of a market research category, a market demand category, a project technology category, a project management category, a project operation category and a project operation and maintenance category. The sample variables are organized in two levels: the first level is the project quality control category, and the second level is the specific sample variable within that category. Specific examples of sample variables can be shown in the following table:
Specifically, after the plurality of sample variables in the project sample are obtained, the sample variables may be preprocessed, that is, normalized to unify their scales, with nominal attributes quantified and so on, to facilitate subsequent use of the data.
After the project samples have been preprocessed, a risk prediction model corresponding to the project to be predicted can be established to determine the target sample variables. In this embodiment, a multivariate adaptive regression method may be used to model the sample variables in the project samples against the true risk values of the project samples. The initial sample variables may be the 22 sample variables in the above table, that is, X = {x1, x2, …, x22}. The true risk values may be 1, 0.8, 0.6, 0.4, 0.2 and 0, where 1 indicates that the project sample is at very high risk, 0.8 at high risk, 0.6 at higher risk, 0.4 at moderate risk, 0.2 at low risk, and 0 at very low risk.
Further, as shown in fig. 2, before selecting a target risk prediction variable from a plurality of risk prediction variables corresponding to an item to be predicted based on a risk prediction model, the method may include the following steps:
Step 201: Acquire a multivariate adaptive regression splines (MARS) base model.
In this step, specifically, the regression spline base model is:
$$f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(x)$$
where $f(x)$ represents the risk prediction value output by the regression spline base model, $h_m(x)$ denotes the basis function corresponding to the m-th sample variable x, $M$ denotes the number of sample variables included in the project sample, $\beta_0$ represents the initial regression coefficient, and $\beta_m$ represents the regression coefficient corresponding to $h_m(x)$.
That is, the regression spline base model is the framework of the risk prediction model to be obtained; this embodiment obtains a risk prediction model of the above form by training the parameters of the base model (for example, $\beta_0$, $\beta_m$ and $M$).
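Further, specifically, $\beta_m$ can be determined by the following formula:
$$\min_{\beta_0, \ldots, \beta_M} \sum_{n=1}^{N} \Big( y_n - \beta_0 - \sum_{m=1}^{M} \beta_m h_m(x_n) \Big)^2$$
where $N$ represents the number of project samples, $y_n$ represents the true risk value of project sample n, and $x_n$ denotes the sample variables of project sample n.
That is, $\beta_0$ and the $\beta_m$ corresponding to each $h_m(x)$ can be solved from the above formula by the least squares method.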
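The least-squares estimation of the coefficients for a fixed set of basis functions can be sketched with NumPy as follows. The helper names and the toy data are hypothetical and chosen only so that the fit is exact; this is an illustration of ordinary least squares, not the patent's implementation.

```python
import numpy as np


def fit_coefficients(basis_functions, X, y):
    """Return [beta_0, beta_1, ..., beta_M] minimising the residual sum of squares."""
    columns = [np.ones(len(X))]  # constant column for beta_0
    columns += [np.array([h(row) for row in X], dtype=float) for h in basis_functions]
    H = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(H, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict(beta, basis_functions, row):
    """Evaluate f(x) = beta_0 + sum_m beta_m * h_m(x) for one sample."""
    return beta[0] + sum(b * h(row) for b, h in zip(beta[1:], basis_functions))


# Toy data (hypothetical): one variable, risk drops once it exceeds 0.5.
X_toy = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y_toy = [1.0, 1.0, 1.0, 0.8, 0.6]
hinges = [lambda row: max(0.0, row[0] - 0.5)]
beta_toy = fit_coefficients(hinges, X_toy, y_toy)
```

On this toy data the fitted coefficients are approximately 1 for the intercept and -0.8 for the single basis function.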
Step 202: Construct a basis function based on each sample variable in the project sample to obtain a plurality of basis functions.
In this step, specifically, the basis functions constructed from the sample variables in the project sample may be represented by the following formula:
$$h_m(x) = \prod_{k=1}^{K_m} \big[ s_{km} \, (x_{v(k,m)} - t_{km}) \big]_+$$
where $h_m(x)$ represents the basis function corresponding to the m-th sample variable x, $x_{v(k,m)}$ represents the target value of the m-th sample variable x, $t_{km}$ represents any one of all possible values of the m-th sample variable x, and $K_m$ represents the number of product interaction factors of the sample variables, that is, the number of splits of the basis function corresponding to the m-th sample variable x; $s_{km}$ is 1 when $x_{v(k,m)}$ is greater than $t_{km}$, and $s_{km}$ is -1 when $x_{v(k,m)}$ is less than or equal to $t_{km}$.
In addition, specifically, $[x_{v(k,m)} - t_{km}]_+$ can be defined as follows:
$$[x_{v(k,m)} - t_{km}]_+ = \begin{cases} x_{v(k,m)} - t_{km}, & x_{v(k,m)} > t_{km} \\ 0, & \text{otherwise} \end{cases}$$
The above formula is illustrated with the sample variable x1 in the above table. For example, assume $x_{v(k,m)}$ is 1. When $t_{km}$ is 1, $[x_{v(k,m)} - t_{km}]_+$ is 0; when $t_{km}$ is 0, that is, $x_{v(k,m)} > t_{km}$, $[x_{v(k,m)} - t_{km}]_+$ is 1.
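The truncated function and the product form of a basis function can be sketched as below. Treating the sign as a fixed value chosen when the basis function is built follows the usual MARS convention and is an assumption here; the names `hinge` and `basis_function` are hypothetical.

```python
def hinge(u):
    """Truncated linear function [u]_+ : u if u > 0, otherwise 0."""
    return max(0.0, u)


def basis_function(x, factors):
    """Product of hinge factors.

    factors: list of (sign, variable_index, knot) tuples, one per interaction factor.
    """
    value = 1.0
    for sign, v, t in factors:
        value *= hinge(sign * (x[v] - t))
    return value


# Worked example from the text: the variable equals 1, with knots 1 and 0.
at_knot_one = basis_function([1.0], [(+1, 0, 1.0)])
at_knot_zero = basis_function([1.0], [(+1, 0, 0.0)])
```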
Step 203: Select the basis functions by forward stepwise selection and train the fit of the regression spline base model to obtain an over-fitted model.
In this step, specifically, after the plurality of basis functions have been constructed, the fit of the regression spline base model may be trained by forward stepwise selection of basis functions, which yields an over-fitted model.
Specifically, during this training, a reflected (mirror) pair of splines can first be constructed for each sample variable, with each possible value of that variable as a knot. The set C of candidate basis functions then consists of the following reflected spline pairs:
$$C = \{ (x_j - t)_+, \ (t - x_j)_+ \}, \quad t \in \{ x_{1j}, x_{2j}, \ldots, x_{Nj} \}, \quad j = 1, 2, \ldots, p$$
where $N$ represents the number of project samples, $x_j$ denotes the j-th sample variable, $p$ denotes the total number of sample variables in a project sample, and $x_{Nj}$ represents the value of the j-th sample variable in the N-th project sample.
Each basis function may consist of a spline in the set C or of a product of such splines; a product represents the interaction between sample variables. In this embodiment the number of product interaction factors is 2, that is, each basis function is the product of at most 2 splines.
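Enumerating the candidate set C can be sketched as follows. Each reflected pair is identified here by a (variable index, knot) tuple, which is a representation chosen for illustration; the function name is hypothetical.

```python
def build_candidate_pairs(X):
    """List (j, t) for every variable j and every distinct observed value t.

    Each (j, t) stands for the reflected pair (x_j - t)_+ and (t - x_j)_+.
    """
    n_vars = len(X[0])
    pairs = []
    for j in range(n_vars):
        for t in sorted({row[j] for row in X}):
            pairs.append((j, t))
    return pairs


# Three project samples with two variables each (hypothetical values).
candidate_pairs = build_candidate_pairs([[0.0, 1.0], [0.5, 1.0], [1.0, 0.0]])
```

Duplicate observed values yield a single knot, so the number of pairs is the total count of distinct values across the variables.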
In addition, specifically, at each step of forward stepwise selection, every basis function $h_m(x)$ already in the model set M is multiplied by each of the remaining reflected pairs in the set C to form new candidate pairs of basis functions. The pair of product functions that produces the largest reduction in training error is then added to the model of the current step, in the following recurrence form:
$$\hat{\beta}_{M+1} \, h_l(x) \, (x_j - t)_+ + \hat{\beta}_{M+2} \, h_l(x) \, (t - x_j)_+, \quad h_l(x) \in M$$
where $h_l(x)$ is a basis function generated in a previous iteration, and $x_j - t$ and $t - x_j$ form the reflected pair currently being added, with the same meanings as $x_{v(k,m)}$ and $t_{km}$. The coefficients $\hat{\beta}_{M+1}$ and $\hat{\beta}_{M+2}$ are estimated by the least squares method together with the other M + 1 regression coefficients of the current model.
This process continues until the number of basis functions M in the model reaches a preset maximum. With the 22 sample variables of this embodiment, the recommended maximum is about twice the number of sample variables, so it can be set to 45 here.
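The forward stepwise selection described above can be sketched with NumPy as follows. This is a simplified illustration, not the patent's implementation: all names are hypothetical, every candidate is refitted from scratch, and letting a variable appear only once per product is an assumption.

```python
import numpy as np


def residual_sum_of_squares(H, y):
    """Least-squares fit of y on the columns of H; returns the RSS."""
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    residual = y - H @ beta
    return float(residual @ residual)


def forward_select(X, y, max_terms, max_factors=2):
    """Greedily add reflected pairs until the model has max_terms basis functions."""
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]  # constant basis function
    terms = [()]                    # factors (variable, knot, sign) behind each column
    while len(columns) + 2 <= max_terms:
        best = None
        for l, parent in enumerate(columns):
            if len(terms[l]) >= max_factors:
                continue
            used = {v for v, _, _ in terms[l]}
            for j in range(n_vars):
                if j in used:  # assumption: a variable appears once per product
                    continue
                for t in np.unique(X[:, j]):
                    plus = parent * np.maximum(0.0, X[:, j] - t)
                    minus = parent * np.maximum(0.0, t - X[:, j])
                    trial = np.column_stack(columns + [plus, minus])
                    err = residual_sum_of_squares(trial, y)
                    if best is None or err < best[0]:
                        best = (err, l, j, float(t), plus, minus)
        if best is None:
            break
        _, l, j, t, plus, minus = best
        columns += [plus, minus]
        terms += [terms[l] + ((j, t, +1),), terms[l] + ((j, t, -1),)]
    return terms, np.column_stack(columns)


# Toy data (hypothetical): the target has a single kink at 0.5.
x_demo = np.arange(11) / 10.0
X_demo = x_demo.reshape(-1, 1)
y_demo = np.maximum(0.0, x_demo - 0.5)
terms_demo, H_demo = forward_select(X_demo, y_demo, max_terms=3)
```

On the toy data the first pair added has its knot at 0.5, which reproduces the target exactly.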
The process of improving the fitting degree by forward and gradually selecting the basis functions usually obtains an over-fitting model, so a backward pruning process is needed at the moment to prevent the over-fitting of the model.
Step 204: Through a GCV-based backward pruning process, gradually delete from the over-fitted model the basis functions whose accuracy contribution values are smaller than a preset value, to obtain a plurality of candidate models.
In this step, specifically, the backward pruning process gradually deletes from the over-fitted model, based on the GCV, the basis functions whose contribution to the accuracy of the over-fitted model is smaller than the preset value, which yields a plurality of candidate models.
If deleting a given basis function from the over-fitted model would produce the smallest increase in the residual sum of squares, its accuracy contribution to the over-fitted model is deemed smaller than the preset value, and the basis function is deleted to obtain a candidate model. Pruning continues in the same way, deleting one basis function at a time, to obtain a plurality of candidate models.
Specifically, the backward pruning process can be performed based on the Generalized Cross-Validation (GCV) criterion. At each step of backward pruning, the basis function whose deletion produces the smallest increase in the residual sum of squares (that is, has the least influence on the accuracy of the current model) is deleted from the model, which yields an estimated best model $\hat{f}_\lambda$ for each number of deleted basis functions. The GCV is defined by the following formula:
$$\mathrm{GCV}(\lambda) = \frac{\sum_{i=1}^{N} \big( y_i - \hat{f}_\lambda(x_i) \big)^2}{\big( 1 - M(\lambda)/N \big)^2}$$
where $y_i$ is the true risk value of a project sample, $\hat{f}_\lambda(x_i)$ is the predicted value for that project sample, and $M(\lambda)$ is a penalty term for the complexity of the over-fitted model. In this embodiment, $M(\lambda) = (M + 1) + d \times M$ may be used, where $M$ is the number of basis functions of the over-fitted model, $d$ is a penalty factor that may be an integer from 0 to 4, and $N$ is the number of project samples. Specifically, when the number of project samples is small, d may be 2, 3 or 4; choosing d = 3 rather than d = 2 may reduce the number of sample variables participating in the regression. In this embodiment d is 2. When the number of project samples is large, d may take a suitably smaller value.
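The GCV score can be sketched as below, assuming the common form in which the residual sum of squares is divided by (1 - M(λ)/N) squared, with the complexity term M(λ) = (M + 1) + d × M given in the text. The function name and the sample numbers are hypothetical.

```python
def gcv(y_true, y_pred, num_basis_functions, d=2):
    """GCV = RSS / (1 - M(lambda) / N)^2 with M(lambda) = (M + 1) + d * M."""
    n = len(y_true)
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    complexity = (num_basis_functions + 1) + d * num_basis_functions
    if complexity >= n:
        raise ValueError("model is too complex for the number of samples")
    return rss / (1.0 - complexity / n) ** 2


# 46 samples and 6 basis functions with d = 2, so M(lambda) = 7 + 12 = 19.
score = gcv([0.0] * 46, [0.0] * 45 + [1.0], num_basis_functions=6, d=2)
```

Because the denominator shrinks as basis functions are added, a larger model must reduce the residual sum of squares substantially to achieve a lower score.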
Step 205: Determine the candidate model with the smallest GCV value among the plurality of candidate models as the risk prediction model.
In this step, specifically, the candidate model with the smallest GCV value among the plurality of candidate models may be directly determined as the risk prediction model.
Specifically, fig. 3 is a schematic diagram of how the GCV changes during backward pruning of the overfitting model, where the number of project samples is 46. As can be seen from the diagram, the GCV reaches its minimum when the number of basis functions is 6, so the candidate model with 6 basis functions may be determined as the risk prediction model. That is, the risk prediction model may now comprise 6 basis functions (BF).
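Steps 204 and 205 can be sketched together in Python. This is only an illustration under stated assumptions: the basis functions are assumed to be already evaluated into a matrix `B` whose column 0 is the constant basis function, coefficients are refit by ordinary least squares after each deletion, and the constant is counted among the M basis functions as in the worked example. All function names are hypothetical.

```python
import numpy as np


def residual_sum_of_squares(B, y):
    """Refit coefficients by least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    residual = y - B @ coef
    return float(residual @ residual)


def gcv_from_rss(rss, n_samples, n_basis, d=2):
    """GCV with penalty M(lambda) = (M + 1) + d * M; infinite if the penalty reaches N."""
    penalty = (n_basis + 1) + d * n_basis
    if penalty >= n_samples:
        return float("inf")
    return (rss / n_samples) / (1.0 - penalty / n_samples) ** 2


def backward_prune(B, y, d=2):
    """Delete one basis function per step and keep the candidate with the smallest GCV."""
    n_samples = len(y)
    active = list(range(B.shape[1]))  # column 0 (constant) is never deleted

    def score(cols):
        return gcv_from_rss(residual_sum_of_squares(B[:, cols], y), n_samples, len(cols), d)

    candidates = [(list(active), score(active))]
    while len(active) > 1:
        # Deleting the column that leaves the smallest RSS is the same as
        # choosing the smallest increase in RSS.
        drop = min(
            active[1:],
            key=lambda j: residual_sum_of_squares(B[:, [c for c in active if c != j]], y),
        )
        active = [c for c in active if c != drop]
        candidates.append((list(active), score(active)))
    best_columns, best_gcv = min(candidates, key=lambda item: item[1])
    return best_columns, best_gcv, candidates
```

The returned `candidates` list holds one model per pruning step, which corresponds to the curve of GCV against the number of basis functions described for fig. 3.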
Taking the 22 sample variables in the above table as an example, the 6 basis functions may be:
BF1 = max(0, x21 - 0.5); BF2 = max(0, x16 + 0);
BF3 = max(0, x6 - 0.5); BF4 = max(0, x13 - 0.5);
BF5 = max(0, x20 - 0.5) * max(0, x4 + 0);
y = 1.0148 - 0.8291*BF1 - 0.23267*BF2 - 0.35705*BF3 - 0.44021*BF4 - 0.41323*BF5;
From the above, the final risk prediction model includes 6 basis functions (the basis function corresponding to the intercept coefficient 1.0148 is the constant 1), which involve 6 sample variables: x4 (business model innovation), x6 (clear market demand), x13 (system error rate and stability), x16 (reasonable capital cost control), x20 (project personnel mobility), and x21 (clear operation target and platform).
That is, M in the risk prediction model is 6, and $h_m(x)$ corresponds to the 6 basis functions described above.
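The fitted model can be evaluated directly from the coefficients and knots listed above. The following Python sketch transcribes them; the function names and the dictionary input keyed by variable index are illustrative assumptions, and the inputs are taken to be the normalized values of x4, x6, x13, x16, x20 and x21.

```python
def hinge(value, knot):
    """Hinge function max(0, value - knot) used by each basis function."""
    return max(0.0, value - knot)


def predict_risk(x):
    """Evaluate the example risk prediction model.

    `x` maps a sample-variable index (4, 6, 13, 16, 20, 21) to its
    normalized value; this input layout is an assumption for illustration.
    """
    bf1 = hinge(x[21], 0.5)
    bf2 = hinge(x[16], 0.0)
    bf3 = hinge(x[6], 0.5)
    bf4 = hinge(x[13], 0.5)
    bf5 = hinge(x[20], 0.5) * hinge(x[4], 0.0)  # interaction of x20 and x4
    return (1.0148 - 0.8291 * bf1 - 0.23267 * bf2 - 0.35705 * bf3
            - 0.44021 * bf4 - 0.41323 * bf5)
```

With every input at 0 all hinge terms vanish and the prediction equals the intercept 1.0148; every coefficient other than the intercept is negative, so larger values of these variables lower the predicted risk.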
Step 206: Determine a plurality of target basis functions corresponding to the risk prediction model, and determine the sample variables corresponding to the target basis functions as target sample variables.
In this step, specifically, when the candidate model with the smallest GCV value is determined as the risk prediction model, a plurality of target basis functions corresponding to the risk prediction model can be determined. For example, if M in the risk prediction model is determined to be 6 in the above steps and respectively corresponds to the 6 basis functions, a plurality of target basis functions can be obtained as the 6 basis functions.
In addition, after the target basis functions are determined, the sample variables corresponding to the target basis functions may be determined as target sample variables, that is, the target sample variables may be directly used as input variables of the risk prediction model.
In addition, it should be noted that the candidate model automatically assigns a weight score to each sample variable participating in the regression, according to the contribution of that sample variable to the fitting degree of the risk prediction model. The contribution of a sample variable can be taken as the square root of the GCV obtained after all basis functions involving that sample variable are deleted, minus the square root of the GCV of the model with no basis function deleted. The most important sample variable is given a weight of 100%, while sample variables that are unimportant to the model are given a weight of 0 and do not participate in the final risk prediction model. The risk prediction model can therefore reveal how strongly the output predicted value depends on each sample variable, providing a reliable reference for further risk prediction.
The importance of the 6 target sample variables included in the final risk prediction model can be gauged by the post-deletion GCV value: the greater the post-deletion GCV value, the greater the importance.
Specifically, the post-deletion GCV value is 1.991 for x4, 10.630 for x6, 29.326 for x13, 28.311 for x16, 1.991 for x20, and 100.000 for x21. That is, x21 (clear operation target and platform) has the largest influence on the risk output of the risk prediction model, followed by x13 (system error rate and stability) and x16 (reasonable capital cost control), while x4 (business model innovation) and x20 (project personnel mobility) have the smallest influence. The influence degree of all other sample variables is 0, and they do not participate in the output of the risk prediction model.
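The weighting rule described above (square-root GCV difference, scaled so the most important variable scores 100) can be sketched as follows. The function name, the dictionary input, and the clamping of negative differences to 0 are assumptions for illustration; the GCV values used in any call would come from refitting the model with a variable's basis functions removed.

```python
import math


def relative_importance(gcv_full, gcv_without):
    """Scale per-variable contributions so the largest equals 100.

    gcv_full: GCV of the model with no basis function deleted.
    gcv_without: maps a variable name to the GCV after deleting every
        basis function that involves that variable.
    """
    raw = {
        name: max(0.0, math.sqrt(value) - math.sqrt(gcv_full))  # clamp: assumption
        for name, value in gcv_without.items()
    }
    largest = max(raw.values())
    if largest == 0.0:
        return {name: 0.0 for name in raw}
    return {name: 100.0 * value / largest for name, value in raw.items()}
```

A variable whose removal leaves the GCV unchanged scores 0, which matches the statement that such variables do not participate in the final model.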
The process of establishing the risk prediction model is described below by means of code. Specifically, the code mainly calls aresparams2 and aresbuild, which are functions of ARESLab, a multivariate adaptive regression splines toolbox. Part of the core code is as follows:
% Risk prediction model based on time-series multivariate adaptive regression
% Input data (filepath and filename are assumed to be defined earlier)
M = importdata([filepath filename '.csv']);
[m, n] = size(M.data);
X_TMP = M.data(:, 1:n-1);      % sample variables: every column except the last
% Data normalization
max_attrs = max(X_TMP);        % maximum value per column
min_attrs = zeros(1, n-1);     % minimum value per column, fixed at 0
m = size(X_TMP, 1);
maxnew = repmat(max_attrs, m, 1);
minnew = repmat(min_attrs, m, 1);
X = (X_TMP - minnew) ./ (maxnew - minnew);   % element-wise scaling; normalized data assigned to X
Y = M.data(:, n);              % true risk value: last column
%%
% ARESLab modeling
params = aresparams2('maxFuncs', 45, 'c', 2, 'maxInteractions', 2, 'cubic', false);
[model, ~, resultsEval] = aresbuild(X, Y, params);
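For readers working outside MATLAB, the normalization step of the listing can be sketched in Python with NumPy. As in the listing, the per-column minimum is fixed at 0 and the per-column maximum is taken from the data. The function name and the handling of an all-zero column (left as zeros rather than dividing by zero) are assumptions not present in the original code.

```python
import numpy as np


def normalize_columns(x_raw):
    """Scale each column to [0, 1] using a fixed minimum of 0 and the column maximum."""
    x_raw = np.asarray(x_raw, dtype=float)
    col_max = x_raw.max(axis=0)
    col_min = np.zeros_like(col_max)
    span = col_max - col_min
    # Assumption: a column whose maximum is 0 is left unchanged to avoid 0/0.
    safe_span = np.where(span == 0.0, 1.0, span)
    return (x_raw - col_min) / safe_span
```

The division here is element-wise, which is also what the scaling step of the MATLAB listing needs in order to normalize each entry by its own column maximum.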
In addition, it should be noted that the risk prediction model in this embodiment performs risk prediction at each staged time point of the project to be predicted, that is, prediction is a time-series process. At the time point of each key version of the project to be predicted (such as the demand analysis time point or the pre-release time point), the corresponding target risk prediction variables are input into the established risk prediction model to carry out risk prediction.
For example, taking the 22 sample variables in this embodiment, and assuming that the preprocessed values are [1,0,1,0,1,0.5,0,0,0.5,0.5,0.5,0, 0,0,0,1,0.5,0,1,0], the risk prediction result output by the risk prediction model is 0.6002, which indicates that the project to be predicted has a high risk.
Thus, starting from the multivariate adaptive regression spline basic model, the fitting degree of the model is improved by selecting basis functions step by step in the forward direction, and the basis functions that contribute little to the accuracy of the model are deleted in backward pruning. The candidate model with the smallest GCV among the plurality of candidate models is then selected as the final risk prediction model, and the sample variables corresponding to the risk prediction model are determined as the target sample variables. This ensures the accuracy of the risk prediction model and ties it to a fixed set of target sample variables, so that the model can better quantify the risk of the current project and the degree of influence of each risk point. The model therefore provides a valuable reference for risk control of the project to be predicted, allows the target risk prediction variables to be found automatically and efficiently, enables quantification and tracking of risks in the project, and gives related enterprises and users a project risk assessment scheme that is reliable and easy to put into practice.
In addition, when a target risk prediction variable is selected, based on the risk prediction model, from the plurality of risk prediction variables corresponding to the project to be predicted, the risk prediction variable that is the same as the target sample variable may be selected directly from those risk prediction variables and determined as the target risk prediction variable.
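The selection described here amounts to keeping only those risk prediction variables whose identity matches a target sample variable. A minimal Python sketch follows; the function name, the use of variable names as dictionary keys, and the error raised for a missing variable are illustrative assumptions.

```python
def select_target_variables(project_variables, target_sample_variables):
    """Keep only the project's risk prediction variables that match target sample variables.

    project_variables: maps variable name to its value for the project to be predicted.
    target_sample_variables: names of the variables used by the trained model.
    """
    missing = [name for name in target_sample_variables if name not in project_variables]
    if missing:
        # Assumption: a model input that the project does not provide is an error.
        raise KeyError(f"missing target variables: {missing}")
    return {name: project_variables[name] for name in target_sample_variables}
```

For instance, with the six target sample variables of the example model, a project described by 22 variables would be reduced to the values of x4, x6, x13, x16, x20 and x21 before prediction.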
A target risk prediction variable suitable for risk prediction is thus determined through the trained risk prediction model: among the plurality of risk prediction variables corresponding to the project to be predicted, the risk prediction variable that is the same as the target sample variable is taken as the target risk prediction variable. Because the risk prediction model is obtained by selecting basis functions forward to improve the fitting degree and deleting basis functions backward, it has higher accuracy, so prediction accuracy can be ensured when the target risk prediction variable is the same as the target sample variable of the risk prediction model.
In this way, according to the risk prediction method provided by this embodiment, the target risk prediction variable is selected from the multiple risk prediction variables corresponding to the project to be predicted based on the risk prediction model, so that the selected target risk prediction variable is matched with the risk prediction model, an implicit valuable data pattern is automatically and efficiently found, and the key risk prediction variables affecting risk output are revealed, so that when the target risk prediction variable is input into the risk prediction model to obtain the risk prediction result output by the risk prediction model, accuracy of risk prediction can be ensured, quantification and tracking of risks in the project are realized, and operability of risk control is increased.
In addition, as shown in fig. 4, a block diagram of a risk prediction apparatus according to an embodiment of the present invention is shown, where the apparatus includes:
a first obtaining module 401, configured to select a target risk prediction variable from multiple risk prediction variables corresponding to a project to be predicted based on a risk prediction model;
a second obtaining module 402, configured to input the target risk prediction variable into the risk prediction model, so as to obtain a risk prediction result output by the risk prediction model; wherein,
the risk prediction model is obtained in advance by training with a plurality of project samples as training samples and the real risk values of the project samples as target values, wherein each project sample comprises a plurality of sample variables.
It should be noted that, the apparatus in this embodiment can implement all the steps in the method and achieve the same technical effect, and detailed descriptions of the steps and the technical effects that are the same in this embodiment as those in the method embodiment are omitted here.
In addition, fig. 5 is a schematic diagram of the physical structure of the electronic device provided in an embodiment of the present invention. The electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke a computer program stored on the memory 530 and executable on the processor 510 to perform the methods provided by the above embodiments, including, for example: selecting a target risk prediction variable from a plurality of risk prediction variables corresponding to the project to be predicted based on a risk prediction model; inputting the target risk prediction variable into the risk prediction model to obtain a risk prediction result output by the risk prediction model; wherein the risk prediction model is obtained in advance by training with a plurality of project samples as training samples and the real risk values of the project samples as target values, and each project sample comprises a plurality of sample variables.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method provided by the above embodiments.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of risk prediction, comprising:
selecting a target risk prediction variable from a plurality of risk prediction variables corresponding to the project to be predicted based on a risk prediction model;
inputting the target risk prediction variable into the risk prediction model to obtain a risk prediction result output by the risk prediction model; wherein,
the risk prediction model is obtained in advance by training with a plurality of project samples as training samples and the real risk values of the project samples as target values, wherein each project sample comprises a plurality of sample variables.
2. The risk prediction method according to claim 1, wherein before selecting the target risk prediction variable from the plurality of risk prediction variables corresponding to the item to be predicted based on the risk prediction model, the method further comprises:
acquiring a multivariate adaptive regression spline basic model;
constructing a basis function based on each sample variable in the project sample to obtain a plurality of basis functions;
gradually selecting the plurality of basis functions in a forward direction, and performing fitting degree training on the regression spline basic model to obtain an overfitting model;
gradually deleting, through a backward pruning process based on the generalized cross-validation (GCV) criterion, the basis functions whose accuracy contribution values are smaller than a preset value from the overfitting model to obtain a plurality of candidate models;
determining a candidate model with the smallest GCV value in the plurality of candidate models as the risk prediction model;
and determining a plurality of target basis functions corresponding to the risk prediction model, and determining sample variables corresponding to the target basis functions as target sample variables.
3. The risk prediction method of claim 2, wherein selecting the target risk prediction variable from the plurality of risk prediction variables corresponding to the item to be predicted based on the risk prediction model comprises:
and selecting the risk prediction variable which is the same as the target sample variable from the plurality of risk prediction variables corresponding to the project to be predicted, and determining the risk prediction variable which is the same as the target sample variable as the target risk prediction variable.
4. The risk prediction method according to claim 2, characterized in that the regression spline basic model is:

$$\hat{f}(x)=\beta_0+\sum_{m=1}^{M}\beta_m h_m(x)$$

wherein $\hat{f}(x)$ represents the risk prediction value output by the regression spline basic model, $h_m(x)$ represents the basis function corresponding to the m-th sample variable x, M represents the number of sample variables included in the project sample, $\beta_0$ represents the initial regression coefficient, and $\beta_m$ represents the regression coefficient corresponding to $h_m(x)$.
6. The risk prediction method of claim 5, wherein constructing basis functions based on each sample variable in the project sample to obtain a plurality of basis functions comprises:
the basis functions are represented by the following formula:

$$h_m(x)=\prod_{k=1}^{k_m}\left[s_{km}\left(x_{v(k,m)}-t_{km}\right)\right]_+$$

wherein $h_m(x)$ represents the basis function corresponding to the m-th sample variable x, $x_{v(k,m)}$ represents the target value of the m-th sample variable x, $t_{km}$ represents any one of all possible values of the m-th sample variable x, $k_m$ represents the number of product interaction factors of the sample variables, and $[\cdot]_+$ denotes the positive part; when $x_{v(k,m)}$ is greater than $t_{km}$, $s_{km}$ is 1, and when $x_{v(k,m)}$ is less than or equal to $t_{km}$, $s_{km}$ is -1.
7. The risk prediction method according to claim 1, wherein before the inputting the target risk prediction variable into the risk prediction model and obtaining the risk prediction result output by the risk prediction model, the method further comprises:
and carrying out normalization processing on the target risk prediction variable to obtain a processed target risk prediction variable.
8. The risk prediction method according to any one of claims 1 to 7, wherein the plurality of sample variables belong to at least two project quality control categories; the project quality control category comprises at least two of a market research category, a market demand category, a project technology category, a project management category, a project operation category and a project operation and maintenance category.
9. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the risk prediction method of any one of claims 1 to 8 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, is adapted to carry out the steps of the risk prediction method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911415220.XA CN111160662A (en) | 2019-12-31 | 2019-12-31 | Risk prediction method, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911415220.XA CN111160662A (en) | 2019-12-31 | 2019-12-31 | Risk prediction method, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111160662A true CN111160662A (en) | 2020-05-15 |
Family
ID=70560049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911415220.XA Pending CN111160662A (en) | 2019-12-31 | 2019-12-31 | Risk prediction method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160662A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967774A (en) * | 2020-08-18 | 2020-11-20 | 中国银行股份有限公司 | Software quality risk prediction method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090043637A1 (en) * | 2004-06-01 | 2009-02-12 | Eder Jeffrey Scott | Extended value and risk management system |
CN106022592A (en) * | 2016-05-16 | 2016-10-12 | 中国电子科技集团公司电子科学研究院 | Power consumption behavior anomaly detection and public security risk early warning method and device |
US9563852B1 (en) * | 2016-06-21 | 2017-02-07 | Iteris, Inc. | Pest occurrence risk assessment and prediction in neighboring fields, crops and soils using crowd-sourced occurrence data |
CN108364106A (en) * | 2018-02-27 | 2018-08-03 | 平安科技(深圳)有限公司 | A kind of expense report Risk Forecast Method, device, terminal device and storage medium |
CN109165840A (en) * | 2018-08-20 | 2019-01-08 | 平安科技(深圳)有限公司 | Risk profile processing method, device, computer equipment and medium |
CN109300040A (en) * | 2018-08-29 | 2019-02-01 | 中国科学院自动化研究所 | Overseas investment methods of risk assessment and system based on full media big data technology |
CN109816221A (en) * | 2019-01-07 | 2019-05-28 | 平安科技(深圳)有限公司 | Decision of Project Risk method, apparatus, computer equipment and storage medium |
CN110570111A (en) * | 2019-08-30 | 2019-12-13 | 阿里巴巴集团控股有限公司 | Enterprise risk prediction method, model training method, device and equipment |
CN110598959A (en) * | 2018-05-23 | 2019-12-20 | 中国移动通信集团浙江有限公司 | Asset risk assessment method and device, electronic equipment and storage medium |
-
2019
- 2019-12-31 CN CN201911415220.XA patent/CN111160662A/en active Pending
Non-Patent Citations (3)
Title |
---|
张影等: "大数据分析与应用", 天津大学出版社, pages: 214 - 126 * |
李万庆;李海涛;孟文清;: "工程项目工期风险的支持向量机预测模型", no. 04, pages 401 - 412 * |
王梦菊;胡晓旭;: "基于组合数据挖掘技术的信用评估模型研究", no. 23, pages 135 - 136 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967774A (en) * | 2020-08-18 | 2020-11-20 | 中国银行股份有限公司 | Software quality risk prediction method and device |
CN111967774B (en) * | 2020-08-18 | 2023-08-22 | 中国银行股份有限公司 | Software quality risk prediction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200515 |