CN114048436A

CN114048436A - Construction method and construction device for forecasting enterprise financial data model

Info

Publication number: CN114048436A
Application number: CN202111359394.6A
Authority: CN
Inventors: 冷宇; 孔祥永; 王浩; 袁伟; 蔡明�
Original assignee: Beijing Daokou Jinke Technology Co ltd
Current assignee: Beijing Daokou Jinke Technology Co ltd
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2022-02-15

Abstract

The invention provides a construction method and a construction device for an enterprise financial data prediction model, and belongs to the technical field of data information processing. The device comprises a prediction target and original feature input module, a derived feature generation module, a sample data extraction module, a model input feature determination module, an enterprise financial data prediction model construction module and a prediction module. The method comprises the following steps: determining prediction targets and their influencing factors; selecting sample enterprises, collecting their historical financial data, deriving new features, and extracting feature data sets; performing correlation analysis between the prediction targets and the influencing factors to screen the model input features; and constructing and training an enterprise financial data prediction model, and predicting with the trained model. The invention fills a gap in the field of enterprise financial data prediction models, reduces labor cost, improves the efficiency of obtaining prediction data, and provides an objective data basis for subsequent financial management and production and operation management.

Description

Construction method and construction device for forecasting enterprise financial data model

Technical Field

The invention relates to the technical fields of artificial intelligence, data mining and data information processing, concerns the processing and analysis of enterprise financial data, and in particular relates to a construction method and a construction device for an enterprise financial data prediction model.

Background

Artificial intelligence may be defined as the ability of a computer system to undertake tasks that traditionally required human intelligence. Artificial intelligence covers a wide range of fields; machine learning is one of its branches and can be defined as designing a series of "algorithms" that automatically make optimal choices by learning from experience, with no or limited human intervention. In recent years artificial intelligence technology has developed rapidly and has been widely applied in the financial field. Many of the factors driving the development of financial technology also drive the adoption of artificial intelligence and machine learning in financial services, and financial institutions are motivated to use them to reduce costs, manage risk, improve business performance and increase revenue.

At present, artificial intelligence is applied to enterprise financial data analysis in several ways. In the first, the health of an enterprise's financial indexes is evaluated along multiple dimensions such as operating capacity, profitability, capital structure and liability risk, and an overall enterprise financial evaluation score and a score for each index are output. In the second, based on the analysis results of a financial fraud recognition model combined with the enterprise's industrial and commercial, judicial and tax data, the credibility of the enterprise's financial information is evaluated along multiple dimensions such as financial figure characteristics, financial index characteristics, and internal and external controls; credibility scores for each kind of financial information are output, and information that may involve financial window dressing is interpreted comprehensively. In the third, for credit granting management, associated risk analysis and associated enterprise cluster management of a group and its member enterprises, the internal accounts of the consolidated statements among the members are offset, and a regression model is established using the stable accounts and the accounts subject to offsetting. The use of artificial intelligence models to predict enterprise financial data, however, is currently a blank field. As a tool that effectively and quantitatively reflects the financial and operating condition of a company, financial data is naturally an important basis for judging the company's production and operating condition and its valuation. Predicted financial data can be used for financial management, guidance of production and operation, and company valuation.
Once a scientific and credible financial data prediction model has been established, future operating conditions can be determined scientifically by setting and adjusting only a few key parameters, which facilitates a clear judgment of investment value. Moreover, because the core parameters of future operation are set in the model, the model can also be used to guide post-investment management. Therefore, there is a need to create a prediction model for enterprise financial data.

Disclosure of Invention

To meet the need for an enterprise financial data prediction model, the invention provides a construction method and a construction device for such a model based on artificial intelligence (machine learning) technology. The model mines the historical data of a large number of enterprises to discover implicit rules, thereby realizing enterprise financial prediction and providing an objective data basis for subsequent financial management, production and operation management, company valuation and the like.

The construction method of the enterprise financial data forecasting model provided by the invention comprises the following steps:

Step 1: determining the prediction targets of the enterprise financial data and the influencing factors of each target;

the prediction targets of the enterprise financial data comprise: business income, total profit, net profit, income tax, total assets, and sales volume.

Step 2: selecting sample enterprises from enterprises of different industries and different scales, collecting the historical financial data of the sample enterprises, and preprocessing the data; the preprocessing comprises: labeling the collected financial data with the industry and scale to which each enterprise belongs, and placing data with the same label in one group; taking all the influencing factors obtained in Step 1 as original features, deriving new features from the original features, and extracting an original feature data set and a derived feature data set.

Step 3: for enterprises of different industries and different scales, calculating, by regression analysis, a correlation coefficient between the prediction target vector of the enterprise financial data and each feature, screening the features according to a correlation coefficient threshold, and generating a training sample set from the original feature data set and the derived feature data set.

Step 4: for enterprises of different industries and different scales, constructing an enterprise financial data prediction model using a regression prediction model, where the input of the model is the screened feature values and the output is the prediction targets of the enterprise financial data;

wherein, during training, the model is optimized using the following quantile logarithmic loss function $L_\gamma(y, y^p)$;

wherein $y$ and $y^p$ respectively represent the true values and the predicted values of the samples; $y_i$ and $y_i^p$ respectively represent the true value and the predicted value of the $i$-th sample; $i = 1, 2, \ldots, n$, where $n$ is the number of samples; $\gamma$ represents the quantile value; $y_i < y_i^p$ indicates the overestimation case and $y_i \ge y_i^p$ indicates the underestimation case; the quantile logarithmic loss function separates the two cases and assigns them different coefficients; the quantile value $\gamma$ is set to different values through training on the sample data of enterprises of different industries and different scales.

Step 5: for enterprises of different industries and different scales, training the enterprise financial data prediction models with the training sample sets corresponding to the respective labels, and predicting the financial data of a target enterprise with the trained model.

Accordingly, the invention provides a construction device for an enterprise financial data prediction model, comprising:

a prediction target and original feature input module, used for determining the prediction targets of the enterprise financial data and the influencing factors of each target, wherein the prediction targets comprise business income, total profit, net profit, income tax, total assets and sales volume, and the influencing factors are the original features;

a derived feature generation module, used for deriving new features from the original features;

a sample data extraction module, used for selecting sample enterprises from enterprises of different industries and different scales, collecting the historical financial data of the sample enterprises, labeling the collected financial data with the industry and scale to which each enterprise belongs, and placing data with the same label in one group; and for generating an original feature data set and a derived feature data set from the original features and the derived features;

the model input feature determination module is used for calculating a correlation coefficient of a prediction target vector of financial data of an enterprise and each original feature or each derivative feature by utilizing regression analysis aiming at enterprises of different industries and different scales, performing feature screening according to a correlation coefficient threshold value, and generating a training sample set by using an original feature data set and a derivative feature data set;

the enterprise financial data prediction model construction module is used for constructing a regression prediction model aiming at enterprises of different industries and different scales, wherein the input of the model is the characteristic correspondingly output by the model input characteristic determination module, and the output is the prediction target of the enterprise financial data; aiming at enterprises of different industries and different scales, training the model by using a corresponding training sample set, and outputting the trained model;

during training, the model is optimized using the following quantile logarithmic loss function $L_\gamma(y, y^p)$;

wherein $y_i$ and $y_i^p$ respectively represent the true value and the predicted value of the $i$-th sample; $i = 1, 2, \ldots, n$, where $n$ is the number of samples; $\gamma$ represents the quantile value, which is set to different values through training on the sample data of enterprises of different industries and different scales;

and the forecasting module forecasts the input financial data of the target enterprise by using the trained model.

Compared with the prior art, the invention has the following advantages and positive effects: the method and the device realize an enterprise financial data prediction model and fill the current gap in this field. The method and the device can predict the financial data of an enterprise through the model and provide objective data for subsequent financial management and production and operation management, so that enterprise managers can guide further investment and production management; they also provide a more credible basis for enterprise valuation and help users understand the financial state of the enterprise objectively. In addition, enterprise financial data prediction currently consumes considerable labor and time; the model constructed by the method reduces the labor cost of this work and improves the efficiency of obtaining prediction data.

Drawings

FIG. 1 is a schematic overall flow chart of a method of constructing an enterprise financial data forecasting model of the present invention;

FIG. 2 is a schematic diagram of the construction apparatus of the enterprise financial data forecasting model of the present invention.

Detailed Description

The technical solution of the present invention is described below with reference to the accompanying drawings and examples.

The method of the invention models the financial condition of enterprises: the model mines the implicit rules in the historical data of a large number of enterprises and uses them to predict enterprise financial condition, and the prediction should generalize well to "new samples". The machine learning model to be constructed can be regarded as a function to be found, whose input is sample data and whose output is the expected result. The function is very complex, and the model is considered successfully established when the learned function also produces good financial predictions for enterprises that were not used in learning.

As shown in FIG. 1, the method for constructing a forecast enterprise financial data model of the present invention comprises 6 steps, which are described separately below.

Step 1: the prediction target and the influencing factors are determined.

The invention aims to predict the financial data of an enterprise for the next year, including business income, total profit, net profit, income tax, total assets, sales volume and the like. Each target to be predicted is taken as a dependent variable, its influencing factors are identified and taken as independent variables, and the dependent variable is then predicted from the independent variables.

The main influencing factors of each prediction target need to be identified through market research, data statistics, reference material and the like. For example, when the prediction target is next year's sales volume, the sales volume Y is taken as the dependent variable; the influencing factors related to the prediction target, that is, the independent variables, are searched for through market research and reference material, and the main influencing factors are selected from them.

Step 2: selecting sample enterprises, collecting their historical financial data, and preprocessing the data. The influencing factors obtained are used as original feature indexes, from which new indexes can be derived.

The preprocessing mainly consists of cleaning dirty data from the collected data, assigning labels to enterprises of different industries and different scales so that they are divided into different groups, and inspecting the overall data distribution and smoothing the data, so as to facilitate model prediction and improve prediction accuracy.
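By way of a non-limiting illustration, the cleaning and label-grouping part of the preprocessing can be sketched as follows. The record layout, the field names (`industry`, `scale`, `revenue`) and the rule that a record with a missing required field counts as dirty data are assumptions made for this sketch and are not specified in the text.

```python
from collections import defaultdict


def group_by_label(records, required=("industry", "scale", "revenue")):
    """Drop incomplete records, then group the rest by (industry, scale) label."""
    groups = defaultdict(list)
    for rec in records:
        # Assumption: a record missing any required field is treated as dirty data.
        if any(rec.get(field) is None for field in required):
            continue
        groups[(rec["industry"], rec["scale"])].append(rec)
    return dict(groups)


# Hypothetical sample records, for illustration only.
records = [
    {"name": "A", "industry": "manufacturing", "scale": "small", "revenue": 120.0},
    {"name": "B", "industry": "manufacturing", "scale": "small", "revenue": 95.5},
    {"name": "C", "industry": "retail", "scale": "large", "revenue": None},
    {"name": "D", "industry": "retail", "scale": "large", "revenue": 880.0},
]
groups = group_by_label(records)
```

Each group can then be used to build its own training sample set, in line with the per-industry, per-scale treatment described in the later steps.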

Enterprise financial information is evaluated mainly through the enterprise's basic strength, team members, operating stability, profitability, growth capacity and the like. The main sources of the preprocessed data are therefore the enterprise's industrial and commercial change information, illegal complaint information, tax and fee arrears information, abnormal operation information, tax declaration information, invoice transaction information, enterprise public opinion information and the like. Because the data sources are broad, the data structures complex and the degree of standardization low, data acquisition, data verification, data cleaning, data conversion, data integration, data warehousing and data quality management are required to form the original feature index data set, which provides reliable support for subsequent index derivation and deep mining. To improve prediction accuracy, the method also derives new indexes from the original feature indexes: several original feature indexes can be combined, and a new feature index is obtained according to a pre-designed mathematical formula.
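By way of a non-limiting illustration, deriving new feature indexes by combining original ones through pre-designed formulas might look like the sketch below. The three formulas and all field names are hypothetical; the text does not disclose which formulas are actually used.

```python
def derive_features(original):
    """Compute derived features from a dict of original invoice features."""
    invoiced = original["invoiced_amount"]
    reversal = original["reversal_amount"]
    count = original["invoice_count"]
    return {
        # Hypothetical formula: invoiced amount net of reversed invoices.
        "net_invoiced_amount": invoiced - reversal,
        # Hypothetical formula: average amount per issued invoice.
        "avg_amount_per_invoice": invoiced / count if count else 0.0,
        # Hypothetical formula: share of the invoiced amount that was reversed.
        "reversal_ratio": reversal / invoiced if invoiced else 0.0,
    }


sample = {"invoiced_amount": 5000.0, "reversal_amount": 250.0, "invoice_count": 100}
derived = derive_features(sample)
```

The derived values are stored alongside the original ones, so that both the original feature data set and the derived feature data set are available for the correlation screening in Step 3.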

Step 3: performing correlation analysis between the overall prediction target and the influencing factors. The influencing factors in this step comprise the original feature indexes and the derived feature indexes. The overall prediction target is the vector of prediction targets of the enterprise financial data to be predicted. Model input variables are screened out according to the correlation analysis results, and a training sample set is generated from the original feature data set and the derived feature data set.

Regression analysis is a mathematical statistical analysis performed on influencing factors and prediction targets that have a causal relationship. The regression equation established is meaningful only if the two are in fact related. Therefore, whether an influencing factor taken as an independent variable is related to the prediction target taken as the dependent variable, how strong the correlation is, and how reliable the judgment of that correlation is, are the questions that the regression analysis must answer. The method uses regression analysis to analyze the correlation between the prediction targets and the influencing factors, judges the degree of correlation between independent and dependent variables from the magnitude of the correlation coefficient, and screens the modeling feature variables against a correlation threshold. The correlation threshold may be set in advance from experience and adjusted as the situation requires. Different thresholds may be set for enterprises of different industries and different scales.

For enterprises of different industries and different scales, the same influencing factor can affect the prediction of enterprise financial data to different degrees. The method therefore performs the correlation analysis separately on the data set corresponding to each industry and scale, screens out the input features of the model for each, and forms the corresponding training data sets.
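By way of a non-limiting illustration, threshold-based feature screening for one industry-and-scale group can be sketched as follows. The use of the Pearson correlation coefficient, the single prediction target, the threshold value and the feature names are assumptions of this sketch; the text only states that a correlation coefficient is compared against a threshold.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, target, threshold):
    """Keep the features whose absolute correlation with the target meets the threshold."""
    kept = {}
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical data for one group of enterprises.
target = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {
    "invoiced_amount": [1.0, 2.1, 2.9, 4.2, 5.0],
    "invoice_count": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],
}
selected = screen_features(features, target, threshold=0.8)
```

Running the screening once per group, each with its own threshold, yields a separate set of model input features for each industry and scale.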

Step 4: constructing the enterprise financial data prediction model.

The enterprise financial data prediction model adopts a regression prediction model. Because invoicing rates differ across industries, the financial data differ greatly, and predicting with a single shared loss function gives low model accuracy; at the same time, because industries and enterprises differ in scale, the financial data have large variance and non-uniform magnitudes. The method therefore improves on the original LightGBM loss function. The original loss function is the mean squared error (MSE), shown in formula (1), but it cannot handle the differing magnitudes of financial data, which leads to low prediction accuracy; the loss function is therefore transformed into a quantile logarithmic loss function, shown in formula (2).

where $n$ denotes the number of samples, $y_i$ denotes the true value of the $i$-th sample, and $y_i^p$ denotes the predicted value of the $i$-th sample; $y$ denotes the true values of the samples, $y^p$ denotes the predicted values of the samples, and $\gamma$ denotes the quantile value. The modified loss function $L_\gamma(y, y^p)$ is a piecewise function in which the overestimation case $y_i < y_i^p$ and the underestimation case $y_i \ge y_i^p$ are separated and given different coefficients. When $\gamma > 0.5$, the loss for underestimation is greater than the loss for overestimation; conversely, when $\gamma < 0.5$, the loss for overestimation is greater than the loss for underestimation. Quantile prediction is thus added on the basis of the mean squared error, transforming the original regression prediction into interval prediction, which addresses the differing invoicing rates of different industries; the logarithmic smoothing that is also added mitigates the excessive differences in magnitude among the prediction data.

According to the method, the quantile loss after log transformation is added, so that the loss of overestimation and underestimation is controlled by using different coefficients respectively, and quantile regression is further realized.
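By way of a non-limiting illustration, a loss with the properties described above can be sketched as follows. Formula (2) itself is not reproduced in this text, so the exact expression used here (the absolute difference of natural logarithms, weighted by 1 − γ for overestimation and by γ for underestimation) is an assumption; only the piecewise structure and the role of γ are taken from the description. The sketch also assumes strictly positive true and predicted values.

```python
import math


def quantile_log_loss(y_true, y_pred, gamma):
    """Assumed quantile loss on log-transformed values (strictly positive inputs)."""
    total = 0.0
    for y, yp in zip(y_true, y_pred):
        diff = abs(math.log(y) - math.log(yp))
        if y < yp:
            total += (1.0 - gamma) * diff  # overestimation case
        else:
            total += gamma * diff          # underestimation case
    return total


# With gamma > 0.5, underestimating by a factor of 2 costs more than overestimating by 2.
over = quantile_log_loss([100.0], [200.0], gamma=0.8)
under = quantile_log_loss([100.0], [50.0], gamma=0.8)

# With gamma < 0.5 the ordering is reversed.
flipped_over = quantile_log_loss([100.0], [200.0], gamma=0.2)
flipped_under = quantile_log_loss([100.0], [50.0], gamma=0.2)

# The log transform makes the loss depend on the ratio, not the magnitude.
over_large = quantile_log_loss([10000.0], [20000.0], gamma=0.8)
```

Because the loss depends only on the ratio between true and predicted values, an enterprise with business income in the hundreds and one in the tens of thousands contribute comparably, which is one way to read the "logarithmic smoothing" described above. Using such a loss as a custom objective in a gradient boosting library would additionally require its gradient and second derivative, which are not shown here.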

The method trains the regression prediction model separately on the sample data of enterprises of different industries and different scales to obtain the corresponding enterprise financial data prediction models. Based on the historical statistical data of the independent and dependent variables, a LightGBM model with the improved loss function is established as the regression prediction model, which effectively improves the generalization ability and accuracy of the model.

Step 5: testing the regression prediction model, calculating the prediction error, and training the regression prediction model.

Whether a regression prediction model is available for actual prediction depends on the examination of the regression prediction model and the calculation of the prediction error. The regression equation can be used as a prediction model for prediction only if the regression equation passes various tests and the prediction error is small.

And training enterprise financial data prediction models of different industries and different scales until the prediction error meets the requirement, and finishing the model training.
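By way of a non-limiting illustration, training until the prediction error meets the requirement can be sketched as a simple loop. The callables `train_round` and `evaluate`, the tolerance and the round limit are all hypothetical placeholders; the text does not specify the error measure or the stopping rule.

```python
def train_until_acceptable(train_round, evaluate, tolerance, max_rounds=100):
    """Repeat training rounds until the evaluated error is within the tolerance."""
    error = None
    for round_no in range(1, max_rounds + 1):
        train_round()          # hypothetical: one more round of model fitting
        error = evaluate()     # hypothetical: prediction error on held-out samples
        if error <= tolerance:
            return round_no, error
    return max_rounds, error


# Toy stand-ins that replay a decreasing error sequence, for illustration only.
_errors = iter([0.40, 0.22, 0.12, 0.08, 0.05])
_state = {"error": None}


def train_round():
    _state["error"] = next(_errors)


def evaluate():
    return _state["error"]


rounds, final_error = train_until_acceptable(train_round, evaluate, tolerance=0.10)
```

In this toy run the loop stops at the first round whose error falls at or below the tolerance; one such loop would be run for each industry-and-scale group.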

Step 6: predicting the financial data of the target enterprise using the trained prediction model.

A predicted value is calculated using the regression prediction model, and the predicted value is comprehensively analyzed to determine the final predicted value.

For a target enterprise, the corresponding enterprise financial data prediction model is called according to the enterprise scale, the input feature values required by the model are extracted and fed into the prediction model, and the prediction result is obtained. Finally, the output of the enterprise financial data prediction model is displayed in the form of a result table, together with a deviation analysis of each predicted financial index against the real data.
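By way of a non-limiting illustration, selecting the model that matches a target enterprise and tabulating the deviation of each predicted index from the real value can be sketched as follows. The model registry keyed by (industry, scale), the stand-in models, the field names and the relative-deviation formula are assumptions of this sketch.

```python
def predict_for_enterprise(models, enterprise):
    """Look up the model for the enterprise's (industry, scale) label and apply it."""
    key = (enterprise["industry"], enterprise["scale"])
    model = models[key]
    return model(enterprise["features"])


def deviation_table(predicted, actual):
    """Return rows of (indicator, predicted, actual, relative deviation)."""
    rows = []
    for indicator, pred in predicted.items():
        real = actual[indicator]
        # Assumption: deviation is measured relative to the real value.
        rate = (pred - real) / abs(real) if real else None
        rows.append((indicator, pred, real, rate))
    return rows


# Hypothetical stand-ins for trained models: plain functions of the input features.
models = {
    ("manufacturing", "small"): lambda f: {"business_income": 2.0 * f["invoiced_amount"]},
    ("retail", "large"): lambda f: {"business_income": 1.5 * f["invoiced_amount"]},
}
enterprise = {
    "industry": "retail",
    "scale": "large",
    "features": {"invoiced_amount": 400.0},
}
predicted = predict_for_enterprise(models, enterprise)
table = deviation_table(predicted, {"business_income": 500.0})
```

In practice the registry entries would be the trained regression models rather than lambda functions, and the table would contain one row per predicted financial index.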

Accordingly, the device for constructing a forecast enterprise financial data model, as shown in fig. 2, includes: the system comprises a prediction target and original feature input module, a derived feature generation module, a sample data extraction module, a model input feature determination module, an enterprise financial data prediction model construction module and a prediction module.

And the prediction target and original characteristic input module is used for determining the prediction target of the enterprise financial data and the influence factors of each target, wherein the influence factors are original characteristics. The forecast goals for the corporate financial data include revenue, total profit, net profit, income tax, total assets, and sales volume.

The derived feature generation module is used for deriving new features from the original features. The specific derivation method is described in Step 2.

The sample data extraction module selects sample enterprises from enterprises of different industries and different scales, collects historical financial data of the sample enterprises, sets tags of the industries and scales to which the enterprises belong for the collected financial data, and divides the data of the same tags into a group. Different sets of data will construct different sets of training samples. And generating a raw feature data set and a derivative feature data set according to the raw features and the derivative features.

And the model input feature determination module is used for calculating a correlation coefficient of the prediction target vector of the financial data of the enterprise and each original feature or each derivative feature by utilizing regression analysis aiming at enterprises with different industries and different scales, performing feature screening according to a correlation coefficient threshold value, and generating a training sample set by using the original feature data set and the derivative feature data set.

The enterprise financial data prediction model construction module is used for constructing a regression prediction model aiming at enterprises of different industries and different scales, wherein the input of the model is the characteristic correspondingly output by the model input characteristic determination module, and the output is the prediction target of the enterprise financial data; and aiming at enterprises of different industries and different scales, training the model by using the corresponding training sample set and outputting the trained model.

During training, the model is optimized using the quantile logarithmic loss function of formula (2).

Example:

The annual financial statement data of the sample enterprises are taken as the prediction target values. After correlation analysis, the influencing factors obtained are feature variables such as the enterprises' annual invoicing data and derived invoice data; the main features include annual invoiced amount, red-letter (reversal) invoice amount, input invoice amount, number of invoices issued, electricity consumption and the like, and industry classification and regional features are derived from the industrial and commercial information. Important feature variables are screened according to measures such as the Gini coefficient to form the variable set input to the model, and the financial prediction model is trained on a LightGBM model with the improved loss function. Finally, four enterprises are predicted with the trained models, and the prediction results are shown in Table 1.

TABLE 1 Financial data prediction results for four enterprises (financial data period: year 2018; unit: ten thousand yuan)

Enterprise name	Business income	Total profit	Income tax	Net profit	Value-added tax amount	Total assets
Anhui-Hui-Co Ltd	54,300.31	4,786.11	717.92	4,068.2	2,243.81	33,897.16
Wuxi company	2,875.53	1,341.67	201.25	1,140.42	11.78	1,865.98
Shaanxi company	2,083.24	-19.65	0	-19.65	0	1,901.09
Fujian company	6,459.69	1,586.98	238.05	1,348.93	0	1,099.85

TABLE 2 comparative analysis of financial data prediction results

In Table 2 above, the rows represent the deviation rates: for each prediction target, the first row gives the count at each deviation rate and the second row gives the corresponding percentage. The following conclusions can be drawn from Table 2:

1. for the two indexes of business income and total profit, provided that the enterprise has tax data, the proportion of cases in which the model prediction is consistent with the actual value (that is, the corresponding deviation rate is 10-50%) is about 80%;

2. for the income tax and net profit indexes, the proportion of consistent model predictions is about 70%;

3. for the total assets index, the proportion of consistent model predictions is about 60%.

Therefore, the model constructed by the method can effectively predict enterprise financial data and thereby provide a data basis for subsequent production.
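By way of a non-limiting illustration, the count-and-percentage layout described for Table 2 can be produced from a list of deviation rates as sketched below. The bucket edges (10% and 50%) and the sample rates are hypothetical and do not reproduce the contents of Table 2.

```python
def deviation_distribution(rates, edges):
    """Count absolute deviation rates per bucket and return (counts, shares)."""
    counts = [0] * (len(edges) + 1)
    for rate in rates:
        # The bucket index is the number of edges that the absolute rate reaches.
        idx = sum(abs(rate) >= edge for edge in edges)
        counts[idx] += 1
    total = len(rates)
    shares = [c / total if total else 0.0 for c in counts]
    return counts, shares


# Hypothetical deviation rates for one prediction target across ten enterprises.
rates = [0.05, -0.12, 0.30, 0.48, 0.75, -0.08, 0.20, 0.33, 0.51, 0.15]
counts, shares = deviation_distribution(rates, edges=(0.10, 0.50))
```

Here `counts` corresponds to the first row for a prediction target (number of cases per deviation bucket) and `shares` to the second row (the same counts as fractions of the total).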

Claims

1. A construction method of a forecast enterprise financial data model is characterized by comprising the following steps:

the forecast target of the enterprise financial data comprises: revenue, total profit, net profit, income tax, total assets, and sales;

step 2: selecting sample enterprises from enterprises of different industries and different scales, collecting the historical financial data of the sample enterprises, and preprocessing the data; the preprocessing comprises: labeling the collected financial data with the industry and scale to which each enterprise belongs, and placing data with the same label in one group; taking all the influencing factors obtained in step 1 as original features, deriving new features from the original features, and extracting an original feature data set and a derived feature data set;

step 3: for enterprises of different industries and different scales, calculating, by regression analysis, a correlation coefficient between the prediction target vector of the enterprise financial data and each feature, screening the features according to a correlation coefficient threshold, and generating a training sample set from the original feature data set and the derived feature data set;

$y_i < y_i^p$ indicates the overestimation case and $y_i \ge y_i^p$ indicates the underestimation case; the quantile logarithmic loss function separates the two cases and assigns them different coefficients; the quantile value $\gamma$ is set to different values through training on the sample data of enterprises of different industries and different scales;

2. The method of claim 1, wherein in step 2 the collected financial data comprise the enterprise's industrial and commercial change information, illegal complaint information, tax and fee arrears information, abnormal operation information, tax declaration information, invoice transaction information and enterprise public opinion information.

3. An apparatus for constructing a predictive enterprise financial data model, comprising:

the sample data extraction module is used for selecting sample enterprises from enterprises of different industries and different scales, acquiring historical financial data of the sample enterprises, setting labels of the industries and the scales of the enterprises, and dividing the data of the same label into a group; generating an original characteristic data set and a derivative characteristic data set according to the original characteristics and the derivative characteristics;

wherein $y$ and $y^p$ respectively represent the true values and the predicted values of the samples; $y_i$ and $y_i^p$ respectively represent the true value and the predicted value of the $i$-th sample; $i = 1, 2, \ldots, n$, where $n$ is the number of samples; $\gamma$ represents the quantile value, which is set to different values through training on the sample data of enterprises of different industries and different scales;