CN117952566A - Project cost prediction method and computer system based on ridge regression machine learning

Info

Publication number: CN117952566A
Application number: CN202410343125.8A
Authority: CN
Inventors: 孙国岩; 马杰; 何太明
Original assignee: NANJING AUDIT UNIVERSITY
Current assignee: NANJING AUDIT UNIVERSITY
Priority date: 2024-03-25
Filing date: 2024-03-25
Publication date: 2024-04-30

Abstract

The invention provides a project cost prediction method and a computer system based on ridge regression machine learning, relating to the technical field of project cost prediction. The method comprises the following steps: establishing data communication with a project management database, acquiring time-series data, and establishing a basic data set; performing data noise reduction to generate a data noise reduction result, and constructing a ridge regression model; analyzing the data distribution and data quantity, and matching a division K value; randomly dividing the data noise reduction result into K data sets, optimizing the regularization parameter of the ridge regression model, and updating the ridge regression model; extracting project data of a target project, predicting the project cost, and generating an initial prediction result; and performing risk cost deviation compensation to generate a calibration prediction result. The method solves the technical problem that traditional methods cannot effectively handle the time-series nature of, and noise in, project data and have difficulty optimizing model parameters accurately, which results in poor accuracy and reliability of project cost prediction.

Description

Project cost prediction method and computer system based on ridge regression machine learning

Technical Field

The invention relates to the technical field of project cost prediction, in particular to a project cost prediction method based on ridge regression machine learning and a computer system.

Background

In project management, accurate prediction of project cost is critical to sound planning, resource allocation, and risk management. The prior art, however, still has technical problems. On the one hand, project cost changes over time and is affected by various interference factors, and traditional project cost prediction methods cannot effectively handle the time-series nature of, and noise in, project data, which makes cost prediction difficult. On the other hand, the choice of regularization parameter is critical to the performance of a ridge regression model, and the prior art has difficulty optimizing this parameter accurately, so the generalization capability of the model is insufficient.

Therefore, a new method is needed, which can better adapt to the actual requirements in project management, and improve the accuracy and reliability of cost prediction.

Disclosure of Invention

The application provides a project cost prediction method based on ridge regression machine learning, which aims to solve the technical problem that traditional methods cannot effectively handle the time-series nature of, and noise in, project data and have difficulty optimizing model parameters accurately, which results in poor accuracy and reliability of project cost prediction.

In view of the above, the present application provides a project cost prediction method and a computer system based on ridge regression machine learning.

In a first aspect of the present disclosure, there is provided a project cost prediction method based on ridge regression machine learning, the method comprising: establishing data communication with a project management database, acquiring time-series data based on the data communication result, and establishing a basic data set; performing data noise reduction on the basic data set to generate a data noise reduction result, and constructing a ridge regression model according to the data noise reduction result; performing data distribution and data quantity analysis on the data noise reduction result, and matching a division K value according to the analysis result; randomly dividing the data noise reduction result according to the division K value to establish K data sets, taking K-1 data sets as the training set and the remaining 1 data set as the test set, performing regularization parameter optimization of the ridge regression model, and updating the ridge regression model with the optimization result; extracting project data of a target project, inputting the project data into the ridge regression model, predicting the project cost, and generating an initial prediction result; and performing risk cost deviation compensation on the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected to the ridge regression model.

In a second aspect of the present disclosure, there is provided a project cost prediction computer system based on ridge regression machine learning, the computer system being used to implement the project cost prediction method based on ridge regression machine learning and comprising: a basic data set acquisition module, used for establishing data communication with a project management database, acquiring time-series data based on the data communication result, and establishing a basic data set; a ridge regression model construction module, used for performing data noise reduction on the basic data set to generate a data noise reduction result, and constructing a ridge regression model according to the data noise reduction result; a data analysis module, used for performing data distribution and data quantity analysis on the data noise reduction result, and matching a division K value according to the analysis result; a ridge regression model updating module, used for randomly dividing the data noise reduction result according to the division K value to establish K data sets, taking K-1 data sets as the training set and the remaining 1 data set as the test set, performing regularization parameter optimization of the ridge regression model, and updating the ridge regression model with the optimization result; an initial prediction result generation module, used for extracting project data of a target project, inputting the project data into the ridge regression model, predicting the project cost, and generating an initial prediction result; and a calibration prediction result generation module, used for performing risk cost deviation compensation on the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected to the ridge regression model.

In a third aspect of the present disclosure, there is provided a computer device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: establishing data communication with a project management database, acquiring time-series data based on the data communication result, and establishing a basic data set; performing data noise reduction on the basic data set to generate a data noise reduction result, and constructing a ridge regression model according to the data noise reduction result; performing data distribution and data quantity analysis on the data noise reduction result, and matching a division K value according to the analysis result; randomly dividing the data noise reduction result according to the division K value to establish K data sets, taking K-1 data sets as the training set and the remaining 1 data set as the test set, performing regularization parameter optimization of the ridge regression model, and updating the ridge regression model with the optimization result; extracting project data of a target project, inputting the project data into the ridge regression model, predicting the project cost, and generating an initial prediction result; and performing risk cost deviation compensation on the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected to the ridge regression model.

In a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following steps: establishing data communication with a project management database, acquiring time-series data based on the data communication result, and establishing a basic data set; performing data noise reduction on the basic data set to generate a data noise reduction result, and constructing a ridge regression model according to the data noise reduction result; performing data distribution and data quantity analysis on the data noise reduction result, and matching a division K value according to the analysis result; randomly dividing the data noise reduction result according to the division K value to establish K data sets, taking K-1 data sets as the training set and the remaining 1 data set as the test set, performing regularization parameter optimization of the ridge regression model, and updating the ridge regression model with the optimization result; extracting project data of a target project, inputting the project data into the ridge regression model, predicting the project cost, and generating an initial prediction result; and performing risk cost deviation compensation on the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected to the ridge regression model.

One or more technical schemes provided by the application have at least the following technical effects or advantages:

By establishing a basic data set and performing data noise reduction, the time-series nature of the data and the noise in it are handled effectively, and the model's ability to fit actual project data is improved, thereby improving the accuracy of cost prediction. By performing data distribution and data quantity analysis on the data noise reduction result and matching a division K value, the quality of the random data division is improved, which facilitates model training and validation and improves the generalization performance of the model. By performing regularization parameter optimization, the regularization parameter of the ridge regression model is optimized effectively, improving the performance and stability of the model. By performing risk cost deviation compensation through the adaptive optimization network, the cost prediction becomes more reliable and practical. In general, by jointly considering the time-series nature of the data, noise, model parameter optimization, and the like, the project cost prediction method based on ridge regression machine learning achieves the technical effect of improving the accuracy and reliability of cost prediction, so that the model adapts better to changes in actual projects and provides more accurate cost prediction for project management.

The foregoing description is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the application are more readily apparent, specific embodiments are set out below.

Drawings

Fig. 1 is a schematic flow chart of a project cost prediction method based on ridge regression machine learning according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a project cost prediction computer system based on ridge regression machine learning according to an embodiment of the present application.

Fig. 3 is an internal structure diagram of a computer device according to an embodiment of the present application.

Description of reference numerals: basic data set acquisition module 10, ridge regression model construction module 20, data analysis module 30, ridge regression model updating module 40, initial prediction result generation module 50, and calibration prediction result generation module 60.

Detailed Description

The project cost prediction method based on ridge regression machine learning provided by the application solves the technical problem that traditional methods cannot effectively handle the time-series nature of, and noise in, project data and have difficulty optimizing model parameters accurately, which results in poor accuracy and reliability of project cost prediction.

Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

As shown in Fig. 1, an embodiment of the present application provides a project cost prediction method based on ridge regression machine learning, the method comprising:

Establishing data communication with a project management database, acquiring data with time sequence based on a data communication result, and establishing a basic data set;

The data communication result comprises the raw data retrieved from the project management database, including the phases of each project, tasks, resource allocation, cost information, and the like. The retrieved data is sorted chronologically, for example by timestamp or by the order in which the project was executed. By sorting and processing the time-series data, a basic data set is established that includes the characteristics of each stage of project execution, such as time, resource usage, and task completion.
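The construction of the basic data set can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the record fields (`project_id`, `recorded_on`, `phase`, `resource_hours`, `tasks_completed`, `cost`) and the function `build_base_dataset` are hypothetical names, and the rows are assumed to have already been fetched from the project management database as dictionaries with ISO-formatted dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class ProjectRecord:
    """One time-stamped observation of a project (hypothetical schema)."""
    project_id: str
    recorded_on: date
    phase: str
    resource_hours: float
    tasks_completed: int
    cost: float


def build_base_dataset(raw_rows: Iterable[Dict[str, Any]]) -> List[ProjectRecord]:
    """Convert raw database rows into records sorted in chronological order."""
    records = [
        ProjectRecord(
            project_id=str(row["project_id"]),
            recorded_on=date.fromisoformat(str(row["recorded_on"])),
            phase=str(row["phase"]),
            resource_hours=float(row["resource_hours"]),
            tasks_completed=int(row["tasks_completed"]),
            cost=float(row["cost"]),
        )
        for row in raw_rows
    ]
    # Sort by timestamp first; the project id only breaks ties.
    return sorted(records, key=lambda r: (r.recorded_on, r.project_id))
```

Sorting on the timestamp first keeps the whole data set in time order; sorting on the project identifier first would instead give one chronological sequence per project, which is the other ordering the description allows.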

Performing data denoising of the basic data set, generating a data denoising result, and performing ridge regression model construction according to the data denoising result;

Data noise reduction refers to processing the noise present in the basic data set to improve the quality and reliability of the data. Data noise reduction methods include smoothing techniques, filters, outlier detection, and the like. After data noise reduction is performed, a data noise reduction result is obtained, which helps improve the accuracy of the subsequent model. The ridge regression model is constructed from the data noise reduction result. Ridge regression is a variant of linear regression that prevents overfitting by introducing a regularization term; its goal is to fit the data and find the optimal model parameters for cost prediction.
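As an illustration of the model construction step, the sketch below fits a ridge regression in closed form using only the Python standard library. It assumes a squared-error loss with an L2 penalty `alpha` on the weights and an unpenalized intercept; the function names, the choice of a direct linear solve, and the example feature layout are assumptions for illustration and are not taken from the application.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve_linear_system(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ridge(features: Matrix, costs: Vector, alpha: float) -> Tuple[Vector, float]:
    """Return (weights, intercept) minimising ||y - Xw - b||^2 + alpha * ||w||^2."""
    n, d = len(features), len(features[0])
    # Centre the data so that the intercept is not penalised.
    x_mean = [sum(row[j] for row in features) / n for j in range(d)]
    y_mean = sum(costs) / n
    xc = [[row[j] - x_mean[j] for j in range(d)] for row in features]
    yc = [y - y_mean for y in costs]
    # Normal equations with the ridge term: (Xc^T Xc + alpha * I) w = Xc^T yc.
    gram = [[sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(d)]
    weights = solve_linear_system(gram, rhs)
    intercept = y_mean - sum(w * m for w, m in zip(weights, x_mean))
    return weights, intercept


def predict(features: Matrix, weights: Vector, intercept: float) -> Vector:
    """Predicted cost for each feature row."""
    return [sum(w * v for w, v in zip(weights, row)) + intercept for row in features]
```

With `alpha = 0` the fit reduces to ordinary least squares; increasing `alpha` shrinks the weights toward zero, which is the mechanism by which the regularization term limits overfitting.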

Carrying out data distribution and data quantity analysis on the data noise reduction result, and matching and dividing a K value according to the analysis result;

Data distribution analysis is performed on the data noise reduction result, for example by using statistical methods or by plotting histograms or density plots to obtain the distribution of each feature. Data quantity analysis is performed to obtain the amount of data for each feature, for example by checking for missing values or sparse data. According to the results of the data distribution and data quantity analysis, a suitable K value is selected for K-fold cross-validation. The K value may be selected based on the nature of the data, the complexity of the model, and the requirements on the training and test sets. In general, a larger K value gives a more reliable evaluation of the model but increases the computational cost. The purpose of the division K value is to evaluate the performance of the model better: through repeated training and testing, the model is checked for good generalization capability on different subsets of the data.
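The description does not give a concrete rule for matching the K value, so the following is only a hypothetical heuristic showing how data quantity and one distribution statistic (sample skewness) could drive the choice. The thresholds, the doubling rule for skewed data, and the names `sample_skewness` and `choose_k` are all assumptions.

```python
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Moment-based skewness m3 / m2**1.5 (0.0 for constant data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def choose_k(n_samples: int, skewness: float) -> int:
    """Pick a fold count from the data quantity and skewness (assumed rule)."""
    if n_samples < 2:
        raise ValueError("need at least two samples for cross-validation")
    # Fewer samples: use more folds so each training split keeps most of the data.
    if n_samples < 100:
        k = 10
    elif n_samples < 10_000:
        k = 5
    else:
        k = 3
    # Strongly skewed data: use more folds, capped at 10 (assumed rule).
    if abs(skewness) > 1.0:
        k = min(k * 2, 10)
    # A fold must contain at least one sample.
    return min(k, n_samples)
```

A call such as `choose_k(len(costs), sample_skewness(costs))` would then supply the division K value; in practice the rule would be tuned to the project data at hand.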

Randomly dividing the data noise reduction result according to the division K value, establishing K data sets, taking K-1 data sets as the training set and the remaining 1 data set as the test set, performing regularization parameter optimization of the ridge regression model, and updating the ridge regression model with the optimization result;

According to the matched K value, the data noise reduction result is randomly divided into K subsets. In each iteration, one subset is selected as the test set and the remaining K-1 subsets are used as the training set. This division is used for cross-validation, so that the model is trained and tested on different subsets of the data and its performance is evaluated more reliably.

In each iteration, the ridge regression model is trained on the K-1 training subsets and its regularization parameter is optimized. The regularization parameter of ridge regression is a hyperparameter to be tuned; cross-validation is used to find the optimal value, namely the one that performs best on the validation set, so as to prevent overfitting. The ridge regression model is then retrained on the entire training set with the optimized regularization parameter, so that the entire training set is used for the final training of the model and a more accurate cost prediction model is obtained.

Throughout the process, the model is trained and tested over repeated iterations on different subsets of the data, and the ridge regression model is improved by optimizing the regularization parameter, which helps improve the generalization capability of the model and its prediction accuracy on new data.
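The cross-validated search for the regularization parameter can be sketched as below. To keep the example short and dependency-free it uses a single-feature ridge regression with a closed-form slope; the candidate grid `alphas`, the mean-squared-error criterion, the seed handling, and all function names are assumptions rather than details taken from the application.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Single-feature ridge fit: slope = Sxy / (Sxx + alpha), unpenalised intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar


def k_fold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Randomly partition the indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(xs, ys, alpha: float, folds: List[List[int]]) -> float:
    """Mean squared error averaged over the held-out folds."""
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_ridge_1d(
            [xs[i] for i in train], [ys[i] for i in train], alpha)
        errors.append(mean((ys[i] - (slope * xs[i] + intercept)) ** 2
                           for i in test_idx))
    return mean(errors)


def select_alpha(xs, ys, alphas: Sequence[float], k: int, seed: int = 0):
    """Pick the alpha with the lowest cross-validated error, then refit on all data."""
    folds = k_fold_indices(len(xs), k, seed)
    best = min(alphas, key=lambda a: cv_error(xs, ys, a, folds))
    return best, fit_ridge_1d(xs, ys, best)
```

The same folds are reused for every candidate value so that the candidates are compared on identical splits; the final refit uses all of the data passed to `select_alpha`, mirroring the retraining step described above.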

Extracting project data of a target project, inputting the project data into the ridge regression model, predicting the project cost, and generating an initial prediction result;

Relevant data of the target project is extracted, including information such as the characteristics, cost, resource allocation, and execution time sequence of the project, to obtain the target project data. The extracted target project data is used as the input of the ridge regression model, which predicts the cost of the target project: the model applies the parameters obtained in the preceding training to the features of the target project and generates a predicted cost value that reflects the model's estimate for the current project. The cost predicted by the ridge regression model for the target project is taken as the initial prediction result, which provides a reference for project management and can be further optimized and adjusted in the subsequent steps.

Performing risk cost deviation compensation on the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected to the ridge regression model.

The adaptive optimization network is a network for correcting initial prediction results, which may be a neural network or other machine learning model, and aims to adjust initial predictions by learning so as to reduce deviation of risk costs.

The initial prediction result contains a certain deviation and cannot fully and accurately reflect the risk cost of the actual project; this deviation is adjusted through learning. By analyzing the risk cost of actual projects, the adaptive optimization network learns how to adjust the initial prediction so that it reflects the actual risk of the project more accurately. The structure and parameters of the adaptive optimization network are determined, including the number of layers, the number of nodes, the activation function, and the like; the network is designed to process the input risk factors and output the corresponding risk prediction result. The network is trained on existing historical data, which includes information such as the duration, historical technical risk, and historical actual cost of historical projects. During training, the network adjusts its weights and biases to capture the complex relationship between project duration, technical risk, and actual cost. The trained network can thus adjust the prediction dynamically to suit the characteristics and risk conditions of different projects.

The initial prediction result is corrected by the adaptive optimization network to generate the calibration prediction result. This result takes the risk cost adjustment into account and better reflects the actual cost of the project, making the final prediction more accurate and reliable.

Further, the method further comprises:

Performing data analysis on the project data and extracting data features;

Performing project duration prediction according to the data features, and generating a project duration risk factor;

Performing technical risk prediction according to the data features, and generating a project technology risk factor;

Inputting the project duration risk factor and the project technology risk factor into the adaptive optimization network, and performing risk prediction;

Generating the calibration prediction result from the risk prediction result and the initial prediction result.

Data analysis is performed on the data of the target project, including using statistical methods, visualization tools, and the like to obtain the distribution, trends, and outliers of the data. The purpose of the data analysis is to gain a thorough understanding of the characteristics of the project data. Based on the results of the data analysis, the data features relevant to cost prediction are extracted. These data features are the input variables for model training and capture the factors that influence project cost, including project duration, resource allocation, technical difficulty, project frequency, and the like.

Project duration prediction is performed using the extracted data features, for example with a regression model or another duration estimation method, to estimate the time required to complete the project from its features and historical data. A project duration risk factor is generated from the duration prediction result: the longer the predicted duration, the larger the project duration risk factor. The risk factor is an adjustment parameter used in the subsequent steps to refine the cost prediction.

Technical risk prediction is performed using the extracted data features, for example by evaluating the technical difficulty, innovativeness, and the like of the project to determine its technical risk level. Technical risk prediction is intended to identify potential technical challenges and uncertainties and to evaluate the technical complexity of the project. A project technology risk factor is generated from the technical risk prediction result. This factor reflects the degree of technical risk of the project, which may affect its actual cost; like the duration risk factor, it is an adjustment parameter, used to account for technical uncertainty and risk in the cost prediction.

The generated project duration risk factor and project technology risk factor, which reflect the risk levels of the project in terms of duration and technology, are provided as inputs to the trained adaptive optimization network. Based on the complex relationship between project duration, technical risk, and actual cost learned during training, the network performs risk prediction on the risk factors and outputs the corresponding risk prediction result. The prediction result jointly considers the duration and technical risks and reflects the cost risk level of the project more accurately, thereby improving the accuracy of cost prediction and matching the risk situation of the actual project more closely.

The risk prediction result is combined with the initial prediction result to generate the calibration prediction result. This result takes factors such as project duration and technical risk into account and provides a more comprehensive estimate of the final project cost. The combination may be performed by directly multiplying the risk prediction result by the initial prediction result, so that the risk adjustment is incorporated into the initial prediction. The resulting calibration prediction result can be used in project management and decision making to understand potential risks better and to provide a more reasonable budget for project implementation.
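The multiplicative combination can be sketched as follows. The function `risk_multiplier` is only a hypothetical stand-in for the trained adaptive optimization network: it uses a fixed linear form with assumed weights so that the example is runnable, whereas the description calls for a learned network. Treating the risk prediction result as a multiplier centred on 1 is also an assumption.

```python
def risk_multiplier(duration_risk: float, technical_risk: float,
                    w_duration: float = 0.3, w_technical: float = 0.2) -> float:
    """Hypothetical stand-in for the trained correction network.

    Maps the two risk factors to a cost multiplier; the linear form and the
    default weights are assumptions made for illustration only.
    """
    return 1.0 + w_duration * duration_risk + w_technical * technical_risk


def calibrate(initial_prediction: float, risk_prediction: float) -> float:
    """Combine the two results by direct multiplication."""
    return initial_prediction * risk_prediction


# Example: an initial ridge prediction adjusted for moderate risk.
initial_cost = 1000.0
calibrated_cost = calibrate(initial_cost, risk_multiplier(0.5, 0.5))
```

With both risk factors at zero the multiplier is exactly 1 and the calibration prediction result equals the initial prediction result.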

Further, the method further comprises:

Establishing a main body database of the project execution subject;

Performing, on the basis of the main body database, an execution time-sequence evaluation based on project difficulty and project frequency, and constructing a current state feature from the evaluation result, wherein the current state feature characterizes the existing execution capacity of the execution subject;

Performing a time-sequence-based evaluation of progress in problem-tackling innovation using the main body database, and establishing an innovation evaluation factor;

Establishing a main body influence from the current state feature and the innovation evaluation factor, updating the project duration risk factor and the project technology risk factor with the main body influence, and completing the risk prediction with the updated project duration risk factor and the updated project technology risk factor.

The scope and members of the project execution subject are defined. The project execution subject refers to an organization, team, or individual involved in implementing and executing the project, such as a project manager, team members, and related stakeholders. A database is created for storing information related to the project execution subject, including the basic information, roles, skills, historical execution capacity, and the like of each member, and the relevant data of the project execution subject is entered into this main body database. The database helps analyze the existing execution capacity of the project execution subject and provides a basis for updating the project duration risk factor and the project technology risk factor.

Execution time-sequence evaluation is performed on the project execution subject based on project difficulty and project frequency. Project difficulty relates to factors such as technical complexity and resource requirements, and project frequency represents how often the execution subject has participated in projects in the past. The evaluation aims to capture the performance of the execution subject across different types of projects and to assess its execution capacity quantitatively.

Based on the result of the execution time-sequence evaluation, current state features are constructed. These features include an adaptability score for project difficulty, an adaptability score for project frequency, and the like. Their values reflect the capacity of the execution subject to handle various projects in its current state and provide quantitative data for the subsequent establishment of evaluation factors.

The data in the main body database is sorted by time and divided according to a preset period to obtain time-series data for different periods, which includes information such as the innovation activities at different project stages and the degree of innovation of the solutions. Based on the time-series data, innovation and progress in different periods are evaluated to understand the development trend of the execution subject in problem-tackling innovation, such as the trend in its innovation capability, its effectiveness in solving difficult problems, and technical innovation within projects. Based on the time-sequence evaluation result, an innovation evaluation factor is established. Its purpose is to quantify the performance of the execution subject in problem-tackling innovation, providing quantitative data for the subsequent update of the project duration risk factor and the project technology risk factor.

The main body influence is established from the current state features and the innovation evaluation factor. For example, weights are preset for the current state features and the innovation evaluation factor according to actual conditions and specific requirements, the current state features and the innovation evaluation factor are weighted by the corresponding weights and summed, and the result is taken as the main body influence. The main body influence takes the current state and innovation capability of the execution subject into account, so that the risk factors can be adjusted more accurately.

The project duration risk factor and the project technology risk factor are updated using the established main body influence. The update may be performed by directly multiplying the main body influence by the project duration risk factor and by the project technology risk factor, the products being the updated project duration risk factor and the updated project technology risk factor. The updated risk factors reflect the actual influence of the execution subject on project execution and take its existing capacity and innovation level into account.
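The weighted sum and the multiplicative update can be written out as a short sketch. The default weights, the use of a single scalar for the current state feature, and the function names are assumptions; the description leaves the weights and the scaling of the main body influence to be set according to actual conditions.

```python
from typing import Tuple


def main_body_influence(current_state: float, innovation_factor: float,
                        w_state: float = 0.6, w_innovation: float = 0.4) -> float:
    """Weighted sum of the current state feature and the innovation evaluation factor.

    The default weights are placeholders, not values from the application.
    """
    return w_state * current_state + w_innovation * innovation_factor


def update_risk_factors(duration_risk: float, technical_risk: float,
                        influence: float) -> Tuple[float, float]:
    """Update both risk factors by direct multiplication with the influence."""
    return duration_risk * influence, technical_risk * influence


# Example with hypothetical scores on a 0..1 scale.
influence = main_body_influence(0.8, 0.9, w_state=0.5, w_innovation=0.5)
updated_duration, updated_technical = update_risk_factors(0.4, 0.2, influence)
```

Because the update is a plain product, an influence below 1 lowers both risk factors and an influence above 1 raises them, so the scale on which the influence is expressed determines the direction of the adjustment.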

The final risk prediction is completed using the updated project duration risk factor and project technology risk factor. This prediction takes the influence of the execution subject into account and reflects the cost risk level of the project more accurately, helping the project manager understand the potential risks of the project and make better-informed decisions.

Further, the method further comprises:

Performing a data integrity check on each data packet of the basic data set, performing a first round of screening according to the integrity check result, and marking the screening result as a first data set;

Performing data similarity clustering among the group data of the first data set to obtain a data similarity clustering result;

Performing inter-group data smoothing screening on each clustering result according to the data similarity clustering result, and generating a second data set from the smoothing screening result;

Taking the second data set as the data noise reduction result.

Data integrity criteria are defined, that is, the key information and fields that each data packet in the basic data set should contain are determined, and the data quality requirements that each data packet should meet are specified. A data integrity check is performed on each data packet, including: missing value detection, which checks whether any key information or field is missing; data format checking, which verifies whether the data conforms to the specified format and type; and logical relationship checking, which ensures that the logical relationships among the data are consistent. For each data packet, the result of the integrity check is recorded, including which data packets passed and which did not meet the criteria. Based on the integrity check result, a first round of screening is performed: the data packets that passed the integrity check are selected as the first data set, which is a higher-quality subset of the data.
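The three checks and the first round of screening can be illustrated with a small sketch. The required fields, their types, and the logical rule (a positive duration and a non-negative cost) are hypothetical examples of integrity criteria, not criteria stated in the application.

```python
from typing import Any, Dict, List

# Hypothetical integrity criteria: required fields and their accepted types.
REQUIRED_FIELDS = {
    "project_id": str,
    "recorded_on": str,
    "duration_days": (int, float),
    "cost": (int, float),
}


def integrity_issues(packet: Dict[str, Any]) -> List[str]:
    """Return the list of integrity problems found in one data packet."""
    issues = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = packet.get(field)
        if value is None:
            issues.append(f"missing field: {field}")       # missing value detection
        elif not isinstance(value, expected_type):
            issues.append(f"wrong type: {field}")           # format and type check
    # Logical relationship check, only meaningful once the fields are valid.
    if not issues and (packet["duration_days"] <= 0 or packet["cost"] < 0):
        issues.append("inconsistent values")
    return issues


def first_round_screen(packets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the packets that pass every integrity check (the first data set)."""
    return [p for p in packets if not integrity_issues(p)]
```

Returning the list of issues rather than a bare pass or fail flag keeps a record of why each rejected packet was rejected, in line with the recording step described above.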

Clustering algorithms such as K-means clustering, hierarchical clustering and DBSCAN are used to perform similarity clustering among the group data in the first data set. Measures of data similarity include Euclidean distance, Manhattan distance, cosine similarity and the like. According to the similarity calculation result, the algorithm divides the data into different clusters, such that the data similarity within each cluster is high and the similarity between different clusters is low. The data similarity clustering result, that is, the cluster to which each data sample belongs, is thereby obtained and reflects the groups of similar data present in the first data set.
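As one possible illustration of similarity clustering, the sketch below implements plain K-means with Euclidean distance. The function names and the choice of seeding the centroids with the first k points are assumptions made for brevity; the application does not prescribe a particular algorithm or initialization.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=20):
    """Return one cluster label per point using plain Lloyd iterations."""
    # Assumption: the first k points seed the centroids (deterministic, not optimal).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The returned labels correspond to "the cluster to which each data sample belongs"; Manhattan distance or cosine similarity could be substituted by replacing the euclidean helper.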

Using the obtained data similarity clustering result, inter-group data smoothing screening is carried out on each cluster. Smoothing methods include moving average, exponential smoothing and the like. The data under each clustering result are smoothed by a trend smoothing method to generate a smoothed data sequence that better reflects the overall trend of the data, makes the trend among the data smoother and reduces short-term fluctuation. A second data set is generated based on the smoothing screening result; the second data set has a smoother data trend and a lower noise level.
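A moving average, one of the smoothing methods named above, can be sketched as follows. The trailing window and the default window length of 3 are illustrative assumptions.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over the values available so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A longer window suppresses more short-term fluctuation at the price of responding more slowly to genuine changes in trend.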

The obtained second data set is used as the data noise reduction result. This ensures that the data have been processed and cleaned to a certain degree before the ridge regression model is constructed, and can improve the training effect and prediction accuracy of the model.

Further, the method further comprises:

performing data dimension reduction on each clustering result by taking group data as a unit, and establishing data group characteristics, wherein the data group characteristics are characteristic sets that ignore the cost data in the basic data set;

taking the data group characteristics as reference characteristics, and executing data sequence ordering under each clustering result;

And executing trend smoothing processing under each clustering result according to the sequential ordering result to finish inter-group data smoothing screening.

Data dimension reduction is performed with group data as the unit: the data in each clustering result are treated as one group and reduced in dimension by a method such as principal component analysis (PCA). In other words, an independent dimension reduction operation is performed on each cluster obtained by the clustering. The data group characteristics are established from the dimension reduction result and reflect the position and distribution of each data group in the reduced-dimension space. When the data group characteristics are established, the cost data in the basic data set are ignored, because cost is the parameter to be optimized, so the data group characteristics mainly reflect characteristics that are unrelated to cost.
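The per-cluster dimension reduction can be illustrated with a one-component PCA computed by power iteration. This is a sketch under stated assumptions: the field names in FEATURE_KEYS are hypothetical, only the first principal component is kept, and a cluster is assumed to contain at least two data packets.

```python
import math

# Hypothetical non-cost fields; the "cost" field is deliberately left out.
FEATURE_KEYS = ("duration", "team_size")


def group_feature(cluster, iterations=100):
    """Return the first principal direction and the score of each packet along it."""
    rows = [[float(p[k]) for k in FEATURE_KEYS] for p in cluster]  # cost is ignored
    n, d = len(rows), len(FEATURE_KEYS)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the non-cost features.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the
    # dominant direction of variance.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c * vi for c, vi in zip(row, v)) for row in centered]
    return v, scores
```

Because the cost field never enters the computation, changing a packet's cost leaves its score unchanged, which matches the requirement that the data group characteristics ignore cost data.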

The established data group characteristics are used as reference characteristics to order the data under each clustering result. The data groups are sorted by the reference characteristics so that similar data groups are adjacent after sorting, and the data within each data group are rearranged according to the sorting result to form an ordered data sequence. The purpose of the data sequence ordering is to establish an ordered data sequence within the data groups, so that similar data groups are easier to compare and analyze after ordering.

Trend smoothing is performed on the ordered data sequence under each clustering result. Trend smoothing removes short-term fluctuation and noise while retaining the long-term trend and change of the data. Trend smoothing methods include moving average, exponential smoothing and the like; applying such a method to the ordered data sequence under each clustering result generates a smoothed data sequence that better reflects the overall trend of the data. Through the trend smoothing processing, inter-group data smoothing screening is achieved, which makes the data trends among different groups smoother and reduces short-term fluctuation.
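The ordering and trend smoothing steps can be combined in a short sketch. It assumes that each item of a cluster has already been reduced to a single reference characteristic value, and that the cost value is the quantity being smoothed; both are simplifying assumptions, as are the function name and the default smoothing factor.

```python
def order_and_smooth(cluster, alpha=0.5):
    """Sort (reference_value, cost) pairs by reference value, then smooth the costs.

    Exponential smoothing is used: each smoothed cost is
    alpha * cost + (1 - alpha) * previous smoothed cost.
    """
    ordered = sorted(cluster, key=lambda pair: pair[0])  # data sequence ordering
    smoothed, previous = [], None
    for reference_value, cost in ordered:
        previous = cost if previous is None else alpha * cost + (1 - alpha) * previous
        smoothed.append((reference_value, previous))
    return smoothed
```

A smaller alpha gives heavier smoothing; alpha equal to 1 returns the ordered sequence unchanged.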

Further, the method further comprises:

Obtaining construction constraint parameters of a ridge regression model, wherein the construction constraint parameters comprise speed constraint parameters;

Configuring a screening rule of the isolated data by the construction constraint parameters;

And carrying out rejection screening when carrying out data similarity clustering according to the screening rule, and completing data similarity clustering result construction according to the rejection screening result.

The mathematical expression and construction mode of the ridge regression model are determined. Ridge regression is a regularized linear regression model that includes an additional regularization term for limiting the complexity of the model. The construction constraint parameters, which control the construction process of the ridge regression model, comprise speed constraint parameters; the speed constraint parameters are the weights of the regularization term and are used to balance the goodness of fit of the model against its complexity.
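For reference, the standard ridge regression objective and its closed-form solution can be written as below. The notation is an assumption introduced here and does not appear in the application: X is the feature matrix of the n training samples, y the vector of observed costs, w the coefficient vector, and λ ≥ 0 the weight of the regularization term (the role the text assigns to the speed constraint parameter).

```latex
\hat{w} \;=\; \arg\min_{w} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top} w \right)^2 \;+\; \lambda \,\lVert w \rVert_2^2 ,
\qquad
\hat{w} \;=\; \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y
```

A larger λ shrinks the coefficients toward zero and lowers model complexity; λ = 0 recovers ordinary least squares, provided that X^T X is invertible.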

The values of the speed constraint parameters are set based on actual conditions and specific requirements. The choice of values involves a balance among the training effect, overfitting and underfitting of the model, and the convergence speed and degree of fit of the ridge regression model can be controlled by adjusting the speed constraint parameters. During construction of the ridge regression model, the selected speed constraint parameters are applied to the regularization term of the model, so that the model is constrained by regularization while learning from the data, which makes the model more robust and improves its generalization.

Isolated data refers to data in the data set that has abnormal characteristics or differs significantly from other data points. In constructing the ridge regression model, such isolated data needs to be screened out to improve the robustness of the model. Rules for screening the isolated data, including threshold setting, distance measurement, abnormal value detection methods and the like, are configured by the construction constraint parameters. The configured screening rules are applied to the data set and the isolated data points that match the rules are marked, so that abnormal data that may interfere with model performance can be excluded from model training.

When data similarity clustering is carried out, the configured screening rules are applied to perform rejection screening on the data set: data points that do not conform to the rules are rejected from the clustering operation, ensuring that only high-quality data conforming to the rules are used in clustering. The data similarity clustering is then completed according to the rejection screening result. Because isolated data that may introduce noise is eliminated, the obtained clustering result is more accurate, which facilitates construction of the ridge regression model on high-quality data and improves the performance and generalization capability of the model.
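One possible distance-based screening rule is sketched below: a point is treated as isolated when fewer than a minimum number of other points lie within a given radius. The rule itself and the parameter names radius and min_neighbors are illustrative assumptions; the application only states that thresholds, distance measures and abnormal value detection methods may be configured.

```python
import math


def screen_isolated(points, radius, min_neighbors=1):
    """Split points into (kept, isolated) using a neighbor-count rule."""
    kept, isolated = [], []
    for i, p in enumerate(points):
        # Count the other points lying within the configured radius.
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
        else:
            isolated.append(p)
    return kept, isolated
```

Only the kept points would then be passed to the clustering step, so isolated points cannot distort the cluster centers.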

Further, the method further comprises:

continuously executing project data monitoring and obtaining, and establishing an updated data set;

replacing time sequence data in the data noise reduction result by the updated data set;

And carrying out parameter test updating of the ridge regression model according to the time sequence data replacement result, and continuously completing project cost prediction according to the updated ridge regression model.

The rules and frequency of project data monitoring are set, including the data items to be monitored and the monitoring time interval, to ensure that updated project data are acquired in time. Project data monitoring is executed continuously: the latest project data are acquired periodically according to the set rules and frequency and are integrated into an updated data set, which includes adding new data points, updating the values of existing data points and the like.

The corresponding time sequence data in the data noise reduction result are replaced according to the latest time sequence data in the updated data set. Replacement modes include direct replacement with the latest data, replacement with a sliding-window average, and the like. The result after time sequence data replacement is written back into the data noise reduction result, which ensures that the time sequence data used by the model are based on the latest information and improves prediction accuracy.
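Both replacement modes can be expressed with a single helper, sketched below under the assumption that the data noise reduction result is held as a mapping from time index to value. The function name and the window parameter are hypothetical.

```python
def replace_time_series(denoised, updates, window=1):
    """Return a copy of `denoised` with the entries in `updates` replaced or added.

    denoised: dict mapping time index -> value
    updates:  list of (time index, value) pairs in time order
    window:   1 replaces directly with the latest value; larger values write the
              mean of the most recent `window` updated values instead.
    """
    result = dict(denoised)
    recent = []
    for time_index, value in updates:
        recent.append(value)
        recent = recent[-window:]          # keep only the sliding window
        result[time_index] = sum(recent) / len(recent)
    return result
```

The input mapping is left unmodified, so the previous data noise reduction result stays available for comparison after the update.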

Parameter testing of the ridge regression model is performed using the updated time sequence data, including adjustment of the regularization parameters, speed constraint parameters and the like, so as to optimize the performance of the model. The parameters of the ridge regression model are updated according to the parameter test result, so that the model better fits the latest time sequence data and its prediction precision is improved. Project cost prediction is then completed with the updated ridge regression model, which keeps the model consistent with the latest project data and improves its prediction capability and adaptability.

In summary, the project cost prediction method based on ridge regression machine learning provided by the embodiment of the application has the following technical effects:

1. By establishing a basic data set and executing data noise reduction, the timeliness and noise of the data are effectively processed, and the fitting capacity of the model to actual project data is improved, so that the accuracy of cost prediction is improved;

2. The data distribution and the data quantity analysis are carried out on the data noise reduction result, and the K value is matched and divided, so that the quality of random data division is improved, model training and verification are better facilitated, and the generalization performance of the model is improved;

3. By executing regularization parameter optimization of the ridge regression model, regularization parameters of the ridge regression model are effectively optimized, and the performance and stability of the model are improved;

4. Risk cost deviation compensation is performed through the adaptive optimization network, so that cost prediction is more reliable and practical.

In general, the project cost prediction method based on the ridge regression machine learning achieves the technical effect of improving the accuracy and the reliability of cost prediction by comprehensively considering data time sequence, noise, model parameter optimization and the like, so that the model is more suitable for the change of an actual project, and more accurate cost prediction is provided for project management.

Based on the same inventive concept as the project cost prediction method based on the ridge regression machine learning in the foregoing embodiments, as shown in FIG. 2, the present application provides a project cost prediction computer system based on the ridge regression machine learning, the computer system comprising:

The basic data set acquisition module 10 is used for establishing data communication with the project management database, acquiring time sequence data based on a data communication result and establishing a basic data set;

The ridge regression model construction module 20 is used for executing data denoising of the basic data set, generating a data denoising result, and executing ridge regression model construction according to the data denoising result;

The data analysis module 30 is used for carrying out data distribution and data quantity analysis on the data noise reduction result, and matching a division K value according to the analysis result;

The ridge regression model updating module 40 is configured to perform random data division of the data noise reduction result with the division K value, establish K data sets, use K-1 data sets as training sets and the remaining 1 data set as the test set, perform regularization parameter optimization of the ridge regression model, and update the ridge regression model with the optimization result;

the initial prediction result generation module 50 is used for extracting item data of a target item, inputting the item data into the ridge regression model, and predicting item cost to generate an initial prediction result;

And the calibration prediction result generation module 60 is used for performing risk cost deviation compensation of the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected with a ridge regression model.
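The random division into K data sets and the regularization parameter optimization performed by the ridge regression model updating module can be sketched as a K-fold cross-validation. Several simplifications are assumed here and are not taken from the application: the model has a single feature with an unpenalized intercept, each of the K data sets serves once as the test set, the candidate regularization values are supplied by the caller, and all function names are hypothetical.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for one feature with an unpenalized intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # lam > 0 shrinks the slope toward zero
    return slope, y_mean - slope * x_mean


def k_fold_select_lambda(xs, ys, k, candidates, seed=0):
    """Return the candidate with the lowest mean test error, plus all scores."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)        # random data division
    folds = [indices[i::k] for i in range(k)]   # K data sets
    scores = {}
    for lam in candidates:
        total = 0.0
        for test_idx in folds:                  # K-1 sets train, 1 set tests
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            slope, intercept = fit_ridge_1d(
                [xs[i] for i in train_idx], [ys[i] for i in train_idx], lam
            )
            total += sum(
                (ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx
            ) / len(test_idx)
        scores[lam] = total / k                 # mean squared error over folds
    best = min(scores, key=scores.get)
    return best, scores
```

The selected value would then be used to refit the model on the full data noise reduction result. With several features the same loop applies, with the single-feature fit replaced by a matrix solve.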

Further, the computer system further comprises a calibration prediction result generation module for executing the following operation steps:

Further, the computer system further comprises a risk prediction module to perform the following operation steps:

establishing a main body database of the project execution main body;

Further, the computer system further comprises a data noise reduction result acquisition module, so as to execute the following operation steps:

and taking the second data set as the data noise reduction result.

Further, the computer system further comprises a data smoothing screening module for executing the following operation steps:

Further, the computer system further comprises a clustering result construction module for executing the following operation steps:

Further, the computer system further comprises a project cost prediction module to perform the following operation steps:

From the foregoing detailed description of the project cost prediction method based on ridge regression machine learning, the project cost prediction computer system based on ridge regression machine learning in this embodiment will be clear to those skilled in the art. Because the system corresponds to the method disclosed in the embodiments, it is described relatively briefly; for relevant details, refer to the description of the method section.

In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device is used for storing news data, time attenuation factors and other data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a project cost prediction method based on ridge regression machine learning.

It will be appreciated by those skilled in the art that the structure shown in FIG. 3 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of: establishing data communication with a project management database, acquiring data with time sequence based on a data communication result, and establishing a basic data set; performing data denoising of the basic data set, generating a data denoising result, and performing ridge regression model construction according to the data denoising result; carrying out data distribution and data quantity analysis on the data noise reduction result, and matching and dividing a K value according to the analysis result; performing data random division of a data noise reduction result by using the division K values, establishing K data sets, taking K-1 data sets as training sets, taking the rest 1 data set as a test set, performing regularization parameter optimization of a ridge regression model, and updating the ridge regression model through an optimization result; extracting item data of a target item, inputting the item data into the ridge regression model, predicting item cost, and generating an initial prediction result; and performing risk cost deviation compensation of the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected with a ridge regression model.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: establishing data communication with a project management database, acquiring data with time sequence based on a data communication result, and establishing a basic data set; performing data denoising of the basic data set, generating a data denoising result, and performing ridge regression model construction according to the data denoising result; carrying out data distribution and data quantity analysis on the data noise reduction result, and matching and dividing a K value according to the analysis result; performing data random division of a data noise reduction result by using the division K values, establishing K data sets, taking K-1 data sets as training sets, taking the rest 1 data set as a test set, performing regularization parameter optimization of a ridge regression model, and updating the ridge regression model through an optimization result; extracting item data of a target item, inputting the item data into the ridge regression model, predicting item cost, and generating an initial prediction result; and performing risk cost deviation compensation of the initial prediction result through an adaptive optimization network to generate a calibration prediction result, wherein the adaptive optimization network is a correction network connected with a ridge regression model.

The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. Project cost prediction method based on ridge regression machine learning, characterized in that the method comprises the following steps:

2. The method of claim 1, wherein the method further comprises:

3. The method of claim 2, wherein the method further comprises:

establishing a main body database of the project execution main body;

4. The method of claim 1, wherein the method further comprises:

and taking the second data set as the data noise reduction result.

5. The method of claim 4, wherein the method further comprises:

6. The method of claim 4, wherein the method further comprises:

7. The method of claim 1, wherein the method further comprises:

8. A ridge regression machine learning based project cost prediction computer system for implementing the ridge regression machine learning based project cost prediction method of any one of claims 1-7, comprising:

The basic data set acquisition module is used for establishing data communication with the project management database, acquiring time sequence data based on a data communication result and establishing a basic data set;

The ridge regression model construction module is used for executing data denoising of the basic data set, generating a data denoising result and executing ridge regression model construction according to the data denoising result;

the data analysis module is used for carrying out data distribution and data quantity analysis on the data noise reduction result and dividing K values according to matching of the analysis result;

The ridge regression model updating module is used for randomly dividing the data of the data noise reduction result by the division K value, establishing K data sets, taking K-1 data sets as training sets and the rest 1 data as test sets, executing regularization parameter optimization of the ridge regression model, and updating the ridge regression model through the optimizing result;

the initial prediction result generation module is used for extracting item data of a target item, inputting the item data into the ridge regression model, predicting item cost and generating an initial prediction result;

and the calibration prediction result generation module is used for executing risk cost deviation compensation of the initial prediction result through the self-adaptive optimization network to generate a calibration prediction result, wherein the self-adaptive optimization network is a correction network connected with the ridge regression model.

9. Computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the ridge regression machine learning based project cost prediction method of any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the project cost prediction method based on ridge regression machine learning of any one of claims 1 to 7.