CN109299728A - Federated learning method, system and readable storage medium - Google Patents
Federated learning method, system and readable storage medium Download PDF Info
- Publication number
- CN109299728A CN109299728A CN201810918868.8A CN201810918868A CN109299728A CN 109299728 A CN109299728 A CN 109299728A CN 201810918868 A CN201810918868 A CN 201810918868A CN 109299728 A CN109299728 A CN 109299728A
- Authority
- CN
- China
- Prior art keywords
- data terminal
- sum
- tree
- derivative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a federated learning method, system and readable storage medium. The federated learning method includes the following steps: data terminals perform federated training on multi-party training samples based on the gradient boosting decision tree (GBDT) algorithm to construct a gradient tree model, where there are multiple data terminals, the gradient tree model comprises multiple regression trees, each regression tree comprises multiple cut points, each training sample comprises multiple features, and the features and the cut points correspond one to one; the data terminals then perform joint prediction on a sample to be predicted based on the gradient tree model to determine the predicted value of the sample to be predicted. By performing federated training on multi-party training samples with the GBDT algorithm, the invention constructs a gradient tree model that suits scenarios with very large data volumes and can well meet the needs of real production environments; joint prediction through the gradient tree model realizes prediction for the sample to be predicted.
Description
Technical field
The present invention relates to the field of big data processing technologies, and in particular to a federated learning method, system and readable storage medium.
Background art
At present, privacy-preserving federated machine learning schemes remain largely at the stage of theoretical research and academic papers. According to our investigation, limited by the maturity of the technology and by practical application constraints, the industry currently has no related deployed applications. Existing privacy-preserving federated learning schemes mostly appear in academic papers, and those papers largely describe simple constructions for elementary algorithm models such as logistic regression or a single decision tree (e.g. ID3, C4.5). They pay insufficient attention to real-world problems, mostly remain at the theoretical stage, lack consideration of real production environments, and are difficult to apply directly in practical industry scenarios.
Summary of the invention
The main purpose of the present invention is to provide a federated learning method, system and readable storage medium, aiming to solve the technical problem in the prior art that training on samples held by a single party or only two parties is inefficient.

To achieve the above object, the present invention provides a federated learning method comprising the following steps:

Data terminals perform federated training on multi-party training samples based on the gradient boosting decision tree (GBDT) algorithm to construct a gradient tree model, where there are multiple data terminals, the gradient tree model comprises multiple regression trees, each regression tree comprises multiple cut points, each training sample comprises multiple features, and the features and the cut points correspond one to one;

The data terminals perform joint prediction on a sample to be predicted based on the gradient tree model to determine the predicted value of the sample to be predicted.
Preferably, the multi-party training samples are such that each data terminal stores its own training samples, and all training samples have the same sample features.
Preferably, the step in which each data terminal performs federated training on the multi-party training samples based on the GBDT algorithm to construct the gradient tree model includes:

When constructing the current round's regression tree, for a pending node of that tree, each data terminal uses the first gradient tree model obtained in the previous round to predict its local training samples and obtain the first and second derivatives of the loss function;

Each data terminal determines the cut-point set corresponding to all partitioning schemes of its own sample features;

For each cut point in the cut-point set, the data terminals perform secure multi-party computation to obtain a first computation result;

Based on its own cut points and the first computation result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the samples divided into the left branch, and the sum of first derivatives and the sum of second derivatives of the samples divided into the right branch;

Each data terminal encrypts the left-branch and right-branch sums of first and second derivatives and sends them to the data terminal where the cut point resides, which aggregates them into a summary result;

The data terminal where the cut point resides sends the summary result to a coordination terminal; the coordination terminal decrypts it to obtain the left-branch and right-branch sums of first and second derivatives, calculates the gain value of the cut point from those sums, determines the best cut point based on the gain values, and returns the best cut point to the first data terminal to which it corresponds;

Upon receiving the best cut point, the first data terminal sends it to a second data terminal for storage and splits the pending node into two new pending nodes, where the second data terminal is the data terminal among the data terminals that stores the gradient tree model.
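The per-cut-point computation described above — summing the first derivatives (g) and second derivatives (h) of the samples falling on each side of a cut point and scoring the cut by a gain value — can be sketched on a single machine as follows. The gain formula used here is the standard second-order (XGBoost-style) approximation; the regularization parameter `lambda_reg` and all function names are illustrative assumptions, not taken from the patent.

```python
def split_sums(feature, g, h, cut):
    """Sum first derivatives (g) and second derivatives (h) of the
    samples falling into the left and right branch of a cut point."""
    GL = sum(gi for x, gi in zip(feature, g) if x <= cut)
    HL = sum(hi for x, hi in zip(feature, h) if x <= cut)
    GR = sum(g) - GL
    HR = sum(h) - HL
    return GL, HL, GR, HR

def gain(GL, HL, GR, HR, lambda_reg=1.0):
    """Reduction in the second-order loss approximation for one split."""
    def score(G, H):
        return G * G / (H + lambda_reg)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR))

def best_cut(feature, g, h, cuts):
    """Pick the cut point with the highest gain, as the coordination
    terminal would after aggregating each party's decrypted sums."""
    return max(cuts, key=lambda c: gain(*split_sums(feature, g, h, c)))
```

In the federated protocol these sums are computed per party, encrypted, aggregated at the terminal holding the cut point, and decrypted only by the coordination terminal before the gains are compared.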
Preferably, the step in which each data terminal obtains, based on its own cut points and the first computation result, the left-branch and right-branch sums of first and second derivatives includes:

Each data terminal compares, based on its own cut points and the first computation result, the local training samples of the fourth data terminals (every data terminal other than the third data terminal to which the cut point belongs) to obtain a first comparison result;

Based on the first comparison result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the left branch, and the sum of first derivatives and the sum of second derivatives of the right branch.
Preferably, the step in which each data terminal encrypts the left-branch and right-branch sums of first and second derivatives, sends them to the data terminal where the cut point resides, and obtains the summary result includes:

Each data terminal encrypts the sum of first derivatives of the left branch and the sum of first derivatives of the right branch to obtain a first encrypted result;

Each data terminal encrypts the sum of second derivatives of the left branch and the sum of second derivatives of the right branch to obtain a second encrypted result;

Each data terminal aggregates the first and second encrypted results into the summary result, so that it can be sent to the coordination terminal, where the first and second encrypted results can only be decrypted with the private key retained by the coordination terminal.
Preferably, before the step in which each data terminal uses the previous round's first gradient tree model to predict the first and second derivatives of the loss function on its local training samples, the federated learning method further includes:

Each data terminal receives the public key sent by the coordination terminal, so that each data terminal can encrypt the left-branch sum of first derivatives, the right-branch sum of first derivatives, the left-branch sum of second derivatives and the right-branch sum of second derivatives.
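The encryption described above is additively homomorphic in spirit: each terminal encrypts its derivative sums under the coordination terminal's public key, ciphertexts are combined, and only the coordinator's retained private key decrypts the aggregate. The patent does not name a specific cryptosystem; the following is a toy Paillier sketch with textbook-small primes, insecure and purely illustrative (real derivative sums would also need fixed-point scaling to integers):

```python
import math

# Toy Paillier cryptosystem (textbook parameters, NOT secure), showing
# how each terminal's derivative sums can be combined in ciphertext and
# decrypted only by the coordination terminal holding the private key.
p, q = 61, 53
n = p * q                                   # public key modulus
n2 = n * n
g_pub = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # private key
mu = pow(lam % n, -1, n)

def encrypt(m, r):
    """Encrypt integer m (0 <= m < n) with randomness r, gcd(r, n) = 1."""
    return (pow(g_pub, m, n2) * pow(r, n, n2)) % n2

def add_cipher(c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n
```

Multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts, which is exactly what the aggregation step of the protocol requires.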
Preferably, after the step in which, upon receiving the best cut point, the first data terminal sends it to the second data terminal for storage and splits the pending node into two new pending nodes, the federated learning method further includes:

When a new pending node is generated while constructing a regression tree of the gradient tree model, each data terminal judges whether the current round's regression tree has reached the leaf condition;

If so, the new pending node stops splitting and one regression tree of the gradient tree model is obtained; otherwise, each data terminal updates its local training samples with the sample data corresponding to the new pending node and returns to the step in which each data terminal uses the previous round's first gradient tree model to predict the first and second derivatives of the loss function on its local training samples.
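The node-splitting loop just described — split the pending node, check the leaf condition, otherwise route the node's samples onward and repeat — amounts to recursive tree construction. A minimal single-machine sketch, with a depth/sample-count bound standing in for the leaf condition and a placeholder median split standing in for the federated best-cut-point search (both are illustrative assumptions):

```python
def build_tree(samples, depth, max_depth=3, min_samples=2):
    """Grow one regression tree; `samples` is a list of (feature, target)
    pairs. The stopping test plays the role of the 'leaf condition'."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    if depth >= max_depth or len(samples) < min_samples or len(set(xs)) == 1:
        # Leaf: under squared loss the optimal leaf value is the mean target.
        return {"leaf": sum(ys) / len(ys)}
    cut = sorted(xs)[len(xs) // 2]  # placeholder for the best cut point
    left = [(x, y) for x, y in samples if x <= cut]
    right = [(x, y) for x, y in samples if x > cut]
    if not left or not right:       # degenerate split: stop here as well
        return {"leaf": sum(ys) / len(ys)}
    return {"cut": cut,
            "left": build_tree(left, depth + 1, max_depth, min_samples),
            "right": build_tree(right, depth + 1, max_depth, min_samples)}
```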
Preferably, the step of performing joint prediction on the sample to be predicted based on the gradient tree model to determine its predicted value includes:

A fifth data terminal traverses the regression trees of the gradient tree model, where the fifth data terminal is the data terminal among the data terminals that holds the gradient tree model;

The fifth data terminal compares the first data point of its local sample to be predicted with the attribute value of the current first traversal node to obtain a second comparison result, and based on the second comparison result decides whether to enter the left subtree or the right subtree of the current first traversal node, until it reaches a leaf node of the current first traversal node and obtains a first prediction result from that leaf node;

or:

A sixth data terminal traverses the regression trees of the gradient tree model, where the sixth data terminal is any data terminal other than the fifth data terminal;

The sixth data terminal performs secure multi-party computation between the attribute value of its current second traversal node and the attribute value of the fifth data terminal's current first traversal node to obtain a second computation result; based on the second computation result, the fifth data terminal compares the attribute values of the current first and second traversal nodes to obtain a third comparison result, decides based on the third comparison result whether to enter the left subtree or the right subtree of the current first traversal node, and upon reaching a leaf node of the current first traversal node obtains a second prediction result from that leaf node and sends it to the sixth data terminal.
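Once a tree is built, the traversal described above is an ordinary decision path: compare the sample's value for the node's feature with the node's cut point, descend left or right, and read the leaf. In the federated protocol a comparison at a node held by another terminal is done via secure computation; this sketch (the node layout and names are illustrative) runs everything locally:

```python
def predict(tree, sample):
    """Traverse one regression tree for a sample given as a dict of
    feature -> value, returning the leaf value as the tree's output."""
    node = tree
    while "leaf" not in node:
        branch = "left" if sample[node["feature"]] <= node["cut"] else "right"
        node = node[branch]
    return node["leaf"]
```

A full GBDT prediction would sum `predict` over all regression trees in the gradient tree model.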
In addition, to achieve the above object, the present invention also provides a system comprising a memory, a processor, and a federated learning program stored on the memory and runnable on the processor; when the federated learning program is executed by the processor, the steps of any of the federated learning methods above are implemented.

In addition, to achieve the above object, the present invention also provides a readable storage medium on which a federated learning program is stored; when the federated learning program is executed by a processor, the steps of any of the federated learning methods above are implemented.
In the present invention, federated training is performed on multi-party training samples based on the GBDT algorithm to construct a gradient tree model, realizing joint training across the data terminals holding the multi-party training samples; the approach suits scenarios with very large data volumes, can well meet the needs of real production environments, and solves the problem that training on samples held by a single party or only two parties is inefficient. Joint prediction through the gradient tree model then yields the predicted value of the sample to be predicted, realizing prediction for the sample to be predicted.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system hardware architecture involved in embodiments of the present invention;
Fig. 2 is a flow diagram of the first embodiment of the federated learning method of the present invention;
Fig. 3 is a flow diagram of the second embodiment of the federated learning method of the present invention;
Fig. 4 is a further flow diagram of the second embodiment of the federated learning method of the present invention;
Fig. 5 is a flow diagram of the third embodiment of the federated learning method of the present invention;
Fig. 6 is a flow diagram of the fourth embodiment of the federated learning method of the present invention;
Fig. 7 is a flow diagram of the fifth embodiment of the federated learning method of the present invention;
Fig. 8 is a further flow diagram of the fifth embodiment of the federated learning method of the present invention.
The realization of the objects, functional characteristics and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic diagram of the system architecture of the hardware operating environment involved in embodiments of the present invention.

As shown in Fig. 1, the system may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002, where the communication bus 1002 realizes connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM, or it may be stable non-volatile memory such as disk storage; optionally, it may also be a storage device independent of the aforementioned processor 1001.
Optionally, the system may also include a camera, an RF (radio frequency) circuit, sensors, an audio circuit, a Wi-Fi module, etc. Of course, the system may also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, which are not described in detail here.
Those skilled in the art will understand that the system structure shown in Fig. 1 does not limit the system; it may include more or fewer components than illustrated, combine certain components, or arrange components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a federated learning program.
In the system shown in Fig. 1, the network interface 1004 is mainly used to connect to background devices and communicate data with a background server; the user interface 1003 is mainly used to connect to a client (user terminal) and communicate data with the client; and the processor 1001 may be used to call the federated learning program stored in the memory 1005 and perform the following operations:

Data terminals perform federated training on multi-party training samples based on the GBDT algorithm to construct a gradient tree model, where there are multiple data terminals, the gradient tree model comprises multiple regression trees, each regression tree comprises multiple cut points, each training sample comprises multiple features, and the features and the cut points correspond one to one;

The data terminals perform joint prediction on a sample to be predicted based on the gradient tree model to determine the predicted value of the sample to be predicted.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operations:

When constructing the current round's regression tree, for a pending node of that tree, each data terminal uses the first gradient tree model obtained in the previous round to predict its local training samples and obtain the first and second derivatives of the loss function;

Each data terminal determines the cut-point set corresponding to all partitioning schemes of its own sample features;

For each cut point in the cut-point set, the data terminals perform secure multi-party computation to obtain a first computation result;

Based on its own cut points and the first computation result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the left branch, and the sum of first derivatives and the sum of second derivatives of the right branch;

Each data terminal encrypts the left-branch and right-branch sums of first and second derivatives and sends them to the data terminal where the cut point resides, which aggregates them into a summary result;

The data terminal where the cut point resides sends the summary result to the coordination terminal; the coordination terminal decrypts it to obtain the left-branch and right-branch sums of first and second derivatives, calculates the gain value of the cut point from those sums, determines the best cut point based on the gain values, and returns the best cut point to the first data terminal to which it corresponds;

Upon receiving the best cut point, the first data terminal sends it to the second data terminal for storage and splits the pending node into two new pending nodes, where the second data terminal is the data terminal among the data terminals that stores the gradient tree model.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operations:

Each data terminal compares, based on its own cut points and the first computation result, the local training samples of the fourth data terminals (every data terminal other than the third data terminal to which the cut point belongs) to obtain a first comparison result;

Based on the first comparison result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the left branch, and the sum of first derivatives and the sum of second derivatives of the right branch.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operations:

Each data terminal encrypts the sum of first derivatives of the left branch and the sum of first derivatives of the right branch to obtain a first encrypted result;

Each data terminal encrypts the sum of second derivatives of the left branch and the sum of second derivatives of the right branch to obtain a second encrypted result;

Each data terminal aggregates the first and second encrypted results into a summary result and sends the summary result to the coordination terminal, where the first and second encrypted results can only be decrypted with the private key retained by the coordination terminal.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operation:

Each data terminal receives the public key sent by the coordination terminal, so that each data terminal can encrypt the left-branch sum of first derivatives, the right-branch sum of first derivatives, the left-branch sum of second derivatives and the right-branch sum of second derivatives.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operations:

When a new pending node is generated while constructing a regression tree of the gradient tree model, each data terminal judges whether the current round's regression tree has reached the leaf condition;

If so, the new pending node stops splitting and one regression tree of the gradient tree model is obtained; otherwise, each data terminal updates its local training samples with the sample data corresponding to the new pending node and returns to the step in which each data terminal uses the previous round's first gradient tree model to predict the first and second derivatives of the loss function on its local training samples.
Further, the processor 1001 may call the federated learning program stored in the memory 1005 and also perform the following operations:

The fifth data terminal traverses the regression trees of the gradient tree model, where the fifth data terminal is the data terminal among the data terminals that holds the gradient tree model;

The fifth data terminal compares the first data point of its local sample to be predicted with the attribute value of the current first traversal node to obtain a second comparison result, based on the second comparison result decides whether to enter the left subtree or the right subtree of the current first traversal node, and upon reaching a leaf node of the current first traversal node obtains a first prediction result from that leaf node;

or:

The sixth data terminal traverses the regression trees of the gradient tree model, where the sixth data terminal is any data terminal other than the fifth data terminal;

The sixth data terminal performs secure multi-party computation between the attribute value of its current second traversal node and the attribute value of the fifth data terminal's current first traversal node to obtain a second computation result; based on the second computation result, the fifth data terminal compares the attribute values of the current first and second traversal nodes to obtain a third comparison result, decides based on the third comparison result whether to enter the left subtree or the right subtree of the current first traversal node, and upon reaching a leaf node of the current first traversal node obtains a second prediction result from that leaf node and sends it to the sixth data terminal.
Referring to Fig. 2, Fig. 2 is a flow diagram of the first embodiment of the federated learning method of the present invention.

In the first embodiment, the federated learning method includes:

Step S10: data terminals perform federated training on multi-party training samples based on the GBDT algorithm to construct a gradient tree model, where there are multiple data terminals, the gradient tree model comprises multiple regression trees, each regression tree comprises multiple cut points, each training sample comprises multiple features, and the features and the cut points correspond one to one.
GBDT, short for gradient boosting decision tree, is among the algorithms in conventional machine learning that best fit the true data distribution. Before deep learning became widespread a few years ago, GBDT performed outstandingly in all kinds of competitions, for several reasons: first, its accuracy really is quite good; second, it can be used for both classification and regression; and third, it can be used to screen features. GBDT (Gradient Boosting Decision Tree) is an iterative decision-tree algorithm composed of multiple decision trees, with the conclusions of all the trees summed to produce the final result. When first proposed it was, together with SVM, considered an algorithm with strong generalization ability, and in recent years it has drawn further attention through machine-learned ranking models used in search.
In this embodiment, the GBDT algorithm is used to perform federated training on multi-party training samples to construct the gradient tree model. GBDT is built from three main concepts: the regression decision tree (DT), gradient boosting (GB), and shrinkage (an important evolutionary branch of the algorithm; at present most source code implements this variant). The decision tree in the GBDT algorithm is usually a regression tree, and the gradient tree model constructed through federated training comprises multiple regression trees, with each cut point of a regression tree corresponding to one feature of the training samples.

By performing federated training on multi-party training samples simultaneously, training efficiency is effectively improved; the approach suits scenarios with very large data volumes and can well meet the needs of real production environments.
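The behaviour described above — every tree's conclusion is added up, and each round fits the derivatives of the loss on the current predictions — can be seen in a single boosting round. A minimal sketch under squared loss, where the negative gradient is simply the residual; the depth-1 "stump" and the fixed cut point are illustrative simplifications:

```python
def fit_stump(xs, residuals, cut):
    """Depth-1 regression tree: the mean residual on each side of the cut."""
    left = [r for x, r in zip(xs, residuals) if x <= cut]
    right = [r for x, r in zip(xs, residuals) if x > cut]
    return sum(left) / len(left), sum(right) / len(right)

def boost_round(xs, ys, preds, cut, lr=0.5):
    """One GBDT round: fit a tree to the negative gradient of squared
    loss (the residual) and add its shrunken output to the predictions."""
    residuals = [y - p for y, p in zip(ys, preds)]
    lv, rv = fit_stump(xs, residuals, cut)
    return [p + lr * (lv if x <= cut else rv) for x, p in zip(xs, preds)]
```

The learning rate `lr` is the shrinkage factor mentioned above: each tree contributes only a damped step, which is what makes the additive ensemble generalize well.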
Step S20: the data terminals perform joint prediction on the sample to be predicted based on the gradient tree model to determine the predicted value of the sample to be predicted.

In this embodiment, the data terminals perform joint prediction on the sample to be predicted based on the gradient tree model to determine its predicted value. The inputs are each party's sample features X_owner and sample class labels Y_owner, where X_owner = {[x_{i,1}, x_{i,2}, ..., x_{i,dim}], i = 1...N} and N is the number of samples in X_owner; dim is the sample feature dimension, which is equal for all parties, and the meaning of each feature dimension is consistent across parties, e.g. [loan amount, loan term, debt status]. After federated training on the parties' sample features X_owner and class labels Y_owner based on the GBDT algorithm, the gradient tree model is obtained, and joint prediction on the sample to be predicted through the gradient tree model determines its predicted value.
In the present invention, federated training is performed on multi-party training samples based on the GBDT algorithm to construct a gradient tree model, realizing joint training across the data terminals holding the multi-party training samples; the approach suits scenarios with very large data volumes, can well meet the needs of real production environments, and solves the problem that training on samples held by a single party or only two parties is inefficient. Joint prediction through the gradient tree model then yields the predicted value of the sample to be predicted, realizing prediction for the sample to be predicted.
Further, the multi-party training samples include a training sample stored at each data terminal respectively, and each training sample has the same sample features.
In this embodiment, the multi-party training samples include the training samples corresponding to the multiple data terminals; each training sample has the same sample features and is stored locally at its data terminal. Since there are multiple data terminals, the data is cut horizontally (row-wise): each sample's features exist completely in exactly one of the partitions, and only in that partition. Cutting the data into several parts facilitates subsequent parallel training.
As shown in Table 1 below, the data contained by data terminal X is:
ID (certificate number / telephone number) | Age
X1 | 10
X2 | 20
X3 | 30
X4 | 40
X5 | 50
Table 1: data terminal X
Cutting Table 1 horizontally into 3 parts yields the training samples shown in Tables 2 to 4:
ID (certificate number / telephone number) | Age
X1 | 10
Table 2: training sample A
ID (certificate number / telephone number) | Age
X2 | 20
X3 | 30
Table 3: training sample B
ID (certificate number / telephone number) | Age
X4 | 40
X5 | 50
Table 4: training sample C
As shown in Tables 2, 3 and 4, the data owned by data terminal X is cut horizontally into three parts, yielding training sample A, training sample B and training sample C; the three training samples have the same sample features and are kept at data terminal X.
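The row-wise cut in Tables 1 to 4 can be sketched as follows; the function name, the tuple encoding of a record, and the explicit shard sizes are illustrative assumptions, not part of the patent.

```python
def split_horizontally(rows, sizes):
    """Cut a table row-wise into consecutive shards of the given sizes,
    so every record lives completely in exactly one shard."""
    assert sum(sizes) == len(rows)
    shards, start = [], 0
    for n in sizes:
        shards.append(rows[start:start + n])
        start += n
    return shards

# Data terminal X (Table 1) split into training samples A, B and C.
data_x = [("X1", 10), ("X2", 20), ("X3", 30), ("X4", 40), ("X5", 50)]
sample_a, sample_b, sample_c = split_horizontally(data_x, [1, 2, 2])
```

Each shard keeps the full feature set of its rows, matching the horizontal-partitioning description above.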
Based on the first embodiment, a second embodiment of the federated learning method of the present invention is proposed. As shown in Figs. 3-4, step S10 includes:
Step S11: when constructing the current round's regression tree, for a node to be processed of the current round's regression tree, each data terminal obtains, by prediction with the first gradient tree model obtained in the previous round, the first and second derivatives of the loss function with respect to its local training samples;
Step S12: each data terminal determines the cut-point set corresponding to all partitioning schemes of its own sample features;
Step S13: based on each cut-point in the cut-point set, each data terminal performs secure multi-party computation to obtain a first computation result;
Step S14: based on its own cut-points and the first computation result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the samples divided into the left branch, and the sum of first derivatives and the sum of second derivatives of the samples divided into the right branch;
Step S15: each data terminal performs an encryption operation on the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, and sends them to the data terminal where the cut-point resides for summation and aggregation, obtaining an aggregated result;
Step S16: the data terminal where the cut-point resides sends the aggregated result to the coordination terminal; after decryption, the coordination terminal obtains the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, computes the gain value corresponding to the cut-point based on these sums, computes the best cut-point based on the gain values, and returns the best cut-point to the first data terminal corresponding to the best cut-point;
Step S17: upon receiving the best cut-point, the first data terminal sends the best cut-point to the second data terminal for storage, and splits the node to be processed into two new nodes to be processed, wherein the second data terminal is the data terminal among the data terminals that is used to store the gradient tree model.
In this embodiment, when constructing the current round's regression tree, for a node to be processed of that tree, each data terminal obtains the first and second derivatives of the loss function with respect to its local training samples by prediction with the first gradient tree model obtained in the previous round. If the current round's regression tree is the t-th regression tree, the first gradient tree model obtained in the previous round is the (t-1)-th tree; if t = 1, the prediction is made by the first gradient tree model itself.
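As a sketch of this local derivative computation, assume a logistic loss over binary class labels (the patent does not fix a particular loss function, so the loss choice and all names here are illustrative):

```python
import math

def logistic_grad_hess(y_true, y_score):
    """Per-sample first derivative g_i and second derivative h_i of the
    logistic loss, evaluated at the raw scores predicted by the previous
    round's gradient tree model."""
    g, h = [], []
    for y, f in zip(y_true, y_score):
        p = 1.0 / (1.0 + math.exp(-f))  # probability from the raw score
        g.append(p - y)                 # first derivative
        h.append(p * (1.0 - p))         # second derivative
    return g, h
```

Each data terminal would run this only on its own rows; the g and h values then feed the branch sums described in steps S14-S16.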
Based on each cut-point in the cut-point set, each data terminal performs secure multi-party computation to obtain the first computation result. The secure multi-party computation can be solved with Yao's garbled circuits. Yao's millionaires' problem is the classic problem of secure multi-party computation: by using 0-encoding and 1-encoding, the millionaires' problem is converted into a set-intersection problem, and an efficient solution based on a commutative encryption function has been proposed and proven secure; that scheme requires no complex modular exponentiation, its encryption and decryption cost is O(n), and its number of communication rounds is 4. Of course, the secure multi-party computation in this case may also use other approaches and is not limited to Yao's circuits, as long as secure multi-party computation can be carried out.
Specifically, as shown in Fig. 4, model owner H corresponds to the second data terminal in this embodiment, data owners X1 and X2 correspond to the data terminals, and coordinator C corresponds to the coordination terminal. For each tree node to be processed, if the stopping condition, i.e. the leaf condition, has not been reached, the second data terminal, i.e. model owner H, selects several cut-points for each column feature, and at the same time notifies all data owners, i.e. data owner X1 and data owner X2, to prepare the candidate cut-point set for each column feature of their own data. For each (feature, candidate cut-point) P, the party X where cut-point P resides notifies the other data parties to perform secure multi-party computation (assuming there are N other data participants, all N participate in the computation for P). After the computation, the other data parties compare their data with P to obtain the first comparison result, from which they derive the sums of first derivatives G and second derivatives H of the samples divided into the left and right branches; all parties send the encrypted results [[G]] and [[H]] to model owner H for summation and aggregation, obtaining (P, Sum([[G]]), Sum([[H]])). After the results for all candidate points P have been computed, the party where each candidate cut-point resides sends all (Sum([[G]]), Sum([[H]])) to coordinator C for decryption and optimal split-value selection. Coordinator C selects the optimal split value S and returns the (Sum([[G]]), Sum([[H]])) corresponding to that value to party Y where split value S resides (here Y is data owner X1); Y sends the P corresponding to S to model owner H, and at the same time adds two new nodes to be processed, Node1 and Node2.
The gain value is computed as follows:
Gain = (1/2) x [ G_L^2 / H_L + G_R^2 / H_R - (G_L + G_R)^2 / (H_L + H_R) ] - γ
where Gain is the gain value, G_L is the sum of first derivatives of the left branch, G_R is the sum of first derivatives of the right branch, H_L is the sum of second derivatives of the left branch, H_R is the sum of second derivatives of the right branch, and γ is the complexity cost introduced by adding a new node to be processed.
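A minimal sketch of the gain computation performed at the coordination terminal, transcribing the formula above directly; the function name and the default for γ are illustrative.

```python
def split_gain(g_left, h_left, g_right, h_right, gamma=0.0):
    """Gain value of a candidate cut-point, computed from the decrypted
    sums of first derivatives (g) and second derivatives (h) per branch."""
    def score(g, h):
        return g * g / h
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma
```

A split that separates samples with opposite-sign gradients scores highest; identical branches score zero before the γ penalty.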
Taking Tables 1 to 4 as an example, when constructing the t-th regression tree of the gradient tree (t = 1, 2, 3, ..., N): first, each data terminal computes the first and second derivatives of the loss function with respect to its local training samples; for Tables 1 to 4, the gradients obtained are A(g_A, h_A), B(g_B1, h_B1), B(g_B2, h_B2), C(g_C1, h_C1) and C(g_C2, h_C2). Then each data terminal determines the cut-point set corresponding to all partitioning schemes of its sample features; for Tables 1 to 4, the cut-points P_A, P_B and P_C are obtained, e.g. assume P_A is (age ≤ 10), P_B is (age ≤ 20) and P_C is (age ≤ 40). Each data terminal notifies the other data terminals to jointly compute, for all cut-points in its cut-point set, the sum of first derivatives G_L and the sum of second derivatives H_L of the corresponding left branch, and the sum of first derivatives and the sum of second derivatives of the right branch. Each data terminal encrypts, with the public key, the sum of first derivatives G_L and the sum of second derivatives H_L of the left branch and the sum of first derivatives G_R and the sum of second derivatives H_R of the right branch, sends them to model owner H for summation and aggregation to obtain the aggregated result (P, Sum([[G]]), Sum([[H]])), and the aggregated result is sent to the coordination terminal. The coordination terminal decrypts all Sum([[G]]) and Sum([[H]]) with the private key, obtains the sums of first and second derivatives of the left branch and of the right branch, and computes the gain value Gain corresponding to each cut-point according to the above formula. Based on the gain values Gain, the coordination terminal computes the best cut-point among P_A, P_B and P_C; if the best cut-point is P_B, the best cut-point P_B is returned to the location of P_B in training sample B, and the data terminal where training sample B resides is data terminal X. After that data terminal receives the best cut-point, the best cut-point is sent to the first data terminal for storage, and the node to be processed is split into two new nodes to be processed, wherein the first data terminal is the data terminal among the data terminals that is used to store the gradient tree model.
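The coordination terminal's selection step can be sketched as below; the gain numbers attached to P_A, P_B and P_C are made up purely for illustration.

```python
def best_cut_point(candidates):
    """candidates: (name, gain) pairs as computed by the coordination
    terminal; return the cut-point name with the largest gain."""
    return max(candidates, key=lambda item: item[1])[0]

# Hypothetical gains for the three candidate cut-points of the example.
best = best_cut_point([("P_A", 0.8), ("P_B", 2.1), ("P_C", 1.3)])
```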
Based on the second embodiment, a third embodiment of the federated learning method of the present invention is proposed. As shown in Fig. 5, step S14 includes:
Step S141: each data terminal compares, based on its own cut-points and the first computation result, against the local training samples of the fourth data terminals, i.e. the data terminals other than the third data terminal corresponding to the cut-point, to obtain a first comparison result;
Step S142: based on the first comparison result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the samples divided into the left branch, and the sum of first derivatives and the sum of second derivatives of the samples divided into the right branch.
In this embodiment, the data terminal where each cut-point resides notifies the other data terminals to perform secure multi-party computation. After the first computation result is obtained, each data terminal compares, based on its own cut-points and the first computation result, against the local training samples of the fourth data terminals, i.e. the data terminals other than the third data terminal corresponding to the cut-point, obtaining the first comparison result; based on the first comparison result, each data terminal obtains the sums of first and second derivatives of the samples divided into the left branch and of the samples divided into the right branch.
For example, based on the first computation result: suppose data terminal X1 has three records in the monthly-pay dimension, namely 1000, 2000 and 3000, and data terminal X2 has three records in the same feature dimension, namely 2000, 3000 and 4000. Assume the cut-points of data terminal X1 are 1500 and 2500, and the cut-points of data terminal X2 are 2500 and 3500. For data terminal X1, it wishes to know, over all the monthly-pay data (the union of X1's and X2's data), the sums for the records < 1500 and the records > 1500 for its cut-point 1500, and similarly for X1's cut-point 2500 and X2's cut-points 2500 and 3500. Specifically, the cut-point 1500 is compared against the union formed by the three records 1000, 2000, 3000 of data terminal X1 and the three records 2000, 3000, 4000 of data terminal X2, i.e. (1000, 2000, 3000, 2000, 3000, 4000); within this union, all records < 1500 and > 1500 are found. Evidently, the records < 1500 here are (1000) and the records greater than 1500 are (2000, 3000, 4000). Data terminal X2 then computes for data terminal X1 the derivatives of the loss for the records < 1500 and > 1500, and sends the ciphertexts of these derivatives, encrypted with the public key, to data terminal X1.
Since what data terminal X2 sends to data terminal X1 are the ciphertexts of the loss derivatives, encrypted with the public key, data terminal X1 cannot obtain the actual records in data terminal X2, thereby realizing privacy protection for data terminal X2.
Based on the third embodiment, a fourth embodiment of the federated learning method of the present invention is proposed. As shown in Fig. 6, step S15 includes:
Step S151: each data terminal performs an encryption operation on the sum of first derivatives of the left branch and the sum of first derivatives of the right branch, obtaining a first encrypted result;
Step S152: each data terminal performs an encryption operation on the sum of second derivatives of the left branch and the sum of second derivatives of the right branch, obtaining a second encrypted result;
Step S153: each data terminal sums and aggregates the first encrypted result and the second encrypted result to obtain the aggregated result, so that each data terminal sends the aggregated result to the coordination terminal, wherein the first encrypted result and the second encrypted result are decrypted with the private key retained by the coordination terminal.
Each data terminal performs the encryption operation on the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, and sends them to the data terminal where the cut-point resides for summation and aggregation, obtaining the aggregated result. Specifically: each data terminal performs an encryption operation on the sum of first derivatives of the left branch and the sum of first derivatives of the right branch to obtain the first encrypted result [[G]]; each data terminal performs an encryption operation on the sum of second derivatives of the left branch and the sum of second derivatives of the right branch to obtain the second encrypted result [[H]]; each data terminal then sends the first encrypted result [[G]] and the second encrypted result [[H]] to the other data terminals for summation and aggregation, obtaining (P, Sum([[G]]), Sum([[H]])). Sum([[G]]) and Sum([[H]]) are sent to the coordination terminal, which decrypts the first encrypted result [[G]] and the second encrypted result [[H]] with its retained private key.
Through the public key sent by the coordination terminal, each data terminal performs encryption to obtain the first and second encrypted results, which can be decrypted only with the coordinator's private key, guaranteeing that no data is leaked between the data terminals.
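The text does not name the public-key scheme, but the described flow (data terminals encrypt with the coordinator's public key, ciphertexts of the derivative sums are aggregated without decryption, only the coordinator can decrypt) matches an additively homomorphic scheme such as Paillier. A toy Paillier sketch with deliberately tiny, insecure primes, purely for illustration:

```python
import math
import random

# Toy key pair: in Paillier, the coordination terminal would publish N and
# keep LAM/MU private. These primes are far too small for real security.
P, Q = 47, 59
N, NSQ = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # Carmichael lambda(N)
MU = pow(LAM, -1, N)                               # decryption constant

def encrypt(m):
    """Public-key encryption performed by a data terminal (0 <= m < N)."""
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(2, N)
    return (pow(N + 1, m, NSQ) * pow(r, N, NSQ)) % NSQ

def decrypt(c):
    """Private-key decryption performed by the coordination terminal."""
    return ((pow(c, LAM, NSQ) - 1) // N) * MU % N

# Multiplying ciphertexts sums the plaintexts: the Sum([[G]]) aggregation.
cipher_sum = (encrypt(12) * encrypt(30)) % NSQ
```

The homomorphic property is what lets the derivative sums be aggregated across terminals while remaining encrypted.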
Further, before step S10, the federated learning method further includes: each data terminal receives the public key sent by the coordination terminal, so that each data terminal performs the encryption operations respectively on the sum of first derivatives of the left branch, the sum of first derivatives of the right branch, the sum of second derivatives of the left branch and the sum of second derivatives of the right branch.
The coordination terminal sends the public key to each data terminal, so that each data terminal performs the encryption operations on the sums of first and second derivatives of the left and right branches to obtain the first and second encrypted results; in this way, the coordination terminal keeps the data of each data terminal confidential during coordination.
Based on the second embodiment, a fifth embodiment of the federated learning method of the present invention is proposed. As shown in Fig. 7, after step S17, the federated learning method further includes:
Step S18: when generating new nodes to be processed to construct a regression tree of the gradient tree model, each data terminal judges whether the current round's regression tree has reached the leaf condition;
Step S191: if so, the new nodes to be processed stop splitting, and one regression tree of the gradient tree model is obtained;
Step S192: if not, each data terminal updates its local training sample data with the sample data corresponding to the new nodes to be processed, and proceeds to step S11.
After the data terminal receives the best cut-point, it sends it to the first data terminal for storage and splits the node to be processed into two new nodes to be processed, completing the processing of one node. After one node has been processed, when generating new nodes to construct a regression tree of the gradient tree model, it is judged whether the current round's regression tree has reached the leaf condition: if so, node splitting stops and one regression tree of the gradient tree model is obtained; otherwise, the local training sample data is updated with the sample data corresponding to the new nodes and the next round of node splitting begins. In general, a regression tree has multiple leaf nodes. As the regression tree grows, the number of samples handled at each branch decreases, and the tree's representativeness of the data as a whole keeps declining: when branching at the root node, all samples are processed, whereas further down only the samples grouped under each branch are processed. To avoid over-fitting, splitting may also be stopped at a non-leaf node under certain conditions; otherwise, the next round of node splitting continues.
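The split-or-stop loop above can be sketched as follows; the depth and minimum-sample limits stand in for the leaf condition, and the midpoint split stands in for the best cut-point returned by the coordination terminal (both are assumptions for illustration).

```python
def grow_node(samples, depth, max_depth=3, min_samples=2):
    """Recursively split a node until the leaf condition holds, mirroring
    the judge-then-split-or-stop loop of steps S18, S191 and S192."""
    if depth >= max_depth or len(samples) < min_samples:
        return {"leaf": True, "size": len(samples)}
    mid = len(samples) // 2  # placeholder for the best cut-point
    return {"leaf": False,
            "left": grow_node(samples[:mid], depth + 1, max_depth, min_samples),
            "right": grow_node(samples[mid:], depth + 1, max_depth, min_samples)}

tree = grow_node(list(range(8)), depth=0)
```

Each recursive call handles fewer samples than its parent, matching the observation that a branch's sample count shrinks as the tree grows.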
Based on the first embodiment, a sixth embodiment of the federated learning method of the present invention is proposed, in which step S20 includes:
the fifth data terminal traverses the regression trees of the gradient tree model, wherein the fifth data terminal is the data terminal among the data terminals that possesses the gradient tree model;
the fifth data terminal compares the first data point of its local sample to be predicted with the attribute value of the current first traversal node to obtain a second comparison result, judges, based on the second comparison result, whether to enter the left subtree or the right subtree of the current first traversal node, until a leaf node of the current first traversal node is reached, and obtains a first prediction result based on the leaf node;
or:
the sixth data terminal traverses the regression trees of the gradient tree model, wherein the sixth data terminal is a data terminal among the data terminals other than the fifth data terminal;
the sixth data terminal performs secure multi-party computation between the attribute value of its current second traversal node and the attribute value of the current first traversal node of the fifth data terminal, obtaining a second computation result, so that the fifth data terminal obtains a third comparison result based on the second computation result, the attribute value of the current first traversal node and the attribute value of the current second traversal node; the fifth data terminal judges, based on the third comparison result, whether to enter the left subtree or the right subtree of the current first traversal node, until a leaf node of the current first traversal node is reached, and the fifth data terminal obtains a second prediction result based on the leaf node and sends it to the sixth data terminal.
For the fifth data terminal: prediction is performed locally and directly according to the ordinary GBDT rule, i.e. the process of traversing the tree. Specifically: the fifth data terminal compares the first data point of its local sample to be predicted with the attribute value of the current first traversal node to obtain the second comparison result, judges based on the second comparison result whether to enter the left subtree or the right subtree of the current first traversal node, until a leaf node of the current first traversal node is reached, and obtains the first prediction result based on the leaf node. For example, if the currently traversed tree node N is (monthly pay < 1500), then for a monthly pay of 2000 the right subtree of node N is entered, otherwise the left subtree, until a leaf node is reached and the first prediction result is obtained.
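The local traversal just described, in sketch form; the dict node layout and the leaf values are illustrative assumptions.

```python
def predict_one(tree, sample):
    """Traverse one regression tree as the fifth data terminal does locally:
    at each internal node compare the sample's feature with the threshold."""
    node = tree
    while not node.get("leaf", False):
        if sample[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]

# Node N is (monthly_pay < 1500); a monthly pay of 2000 goes right.
tree = {"feature": "monthly_pay", "threshold": 1500,
        "left": {"leaf": True, "value": -0.4},
        "right": {"leaf": True, "value": 0.7}}
result = predict_one(tree, {"monthly_pay": 2000})
```

In the full GBDT prediction this traversal is repeated for every regression tree and the leaf values are accumulated.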
For a sixth data terminal, i.e. a data terminal other than the fifth data terminal, the sixth data terminal traverses the regression trees of the gradient tree model. The sixth data terminal performs secure multi-party computation between the attribute value of its current second traversal node and the attribute value of the current first traversal node of the fifth data terminal, obtaining the second computation result, so that the fifth data terminal obtains the third comparison result based on the second computation result, the attribute value of the current first traversal node and the attribute value of the current second traversal node; the fifth data terminal judges, based on the third comparison result, whether to enter the left subtree or the right subtree of the current first traversal node, until a leaf node of the current first traversal node is reached, and the fifth data terminal obtains the second prediction result based on the leaf node and sends it to the sixth data terminal.
A sixth data terminal other than the fifth data terminal needs to execute the following prediction process N times, where N is the number of decision trees in the GBDT. For each decision tree, assume the current position is node node (initially the root of the tree): the data party performs secure multi-party computation between each (feature name, feature value) of its data to be predicted and the node node of model owner H; without leaking data, it obtains (feature name, the binary comparison result with node), selects the next branch node node_child of node, updates the current node node to node_child, and repeats this process until a leaf node is reached and the predicted value is obtained.
For example: the current node is (monthly pay < 1500), and suppose there are two attribute values, one being monthly pay and one being age. In one round of comparison, both the monthly pay and the age need to be compared. For confidentiality, the fifth data terminal does not need to inform the sixth data terminal of the specific age and monthly-pay data; supposing the subtree actually splits on age, a false monthly-pay datum can be set alongside the real age datum. The fifth data terminal then compares against both data to obtain the third comparison result, judges based on the third comparison result whether to enter the left subtree or the right subtree of the current node, until a leaf node of the current node is reached, and sends the second prediction result obtained from the leaf node to the sixth data terminal. At this point, the sixth data terminal obtains only the second prediction result and cannot learn any other data of the fifth data terminal, thereby keeping the prediction process confidential.
Referring to Fig. 8, Fig. 8 is a flow diagram of the fifth embodiment of the present invention, in which model owner H corresponds to the fifth data terminal. If the model owner predicts locally, it simply traverses its own nodes; if a non-model-owner predicts, it traverses the nodes of the model owner after secure multi-party computation with the model owner. Moreover, the model owner's data remains confidential: only the second prediction result is sent to the other data parties.
In addition, an embodiment of the present invention also proposes a readable storage medium. A federated learning program is stored on the readable storage medium, and when executed by a processor, the federated learning program implements the steps of the federated learning method described above.
The specific embodiments of the readable storage medium of the present invention are essentially the same as the embodiments of the federated learning method described above and are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disc) as described above and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structural or flow transformation made by using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.
Claims (10)
1. A federated learning method, characterized in that the federated learning method comprises the following steps:
a data terminal performs federated training on multi-party training samples based on the gradient boosting decision tree (GBDT) algorithm to construct a gradient tree model, wherein there are multiple data terminals, the gradient tree model includes multiple regression trees, each regression tree includes multiple cut-points, each training sample includes multiple features, and the features correspond one to one with the cut-points;
the data terminal performs joint prediction on a sample to be predicted based on the gradient tree model, so as to determine the predicted value of the sample to be predicted.
2. The federated learning method of claim 1, characterized in that the multi-party training samples include a training sample stored at each data terminal respectively, each training sample having the same sample features.
3. The federated learning method of claim 2, characterized in that the step of each data terminal performing federated training on multi-party training samples based on the gradient boosting decision tree (GBDT) algorithm to construct a gradient tree model comprises:
when constructing the current round's regression tree, for a node to be processed of the current round's regression tree, each data terminal obtains, by prediction with the first gradient tree model obtained in the previous round, the first and second derivatives of the loss function with respect to its local training samples;
each data terminal determines the cut-point set corresponding to all partitioning schemes of its own sample features;
based on each cut-point in the cut-point set, each data terminal performs secure multi-party computation to obtain a first computation result;
based on its own cut-points and the first computation result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the samples divided into the left branch, and the sum of first derivatives and the sum of second derivatives of the samples divided into the right branch;
each data terminal performs an encryption operation on the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, and sends them to the data terminal where the cut-point resides for summation and aggregation, obtaining an aggregated result;
the data terminal where the cut-point resides sends the aggregated result to the coordination terminal; after decryption, the coordination terminal obtains the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, computes the gain value corresponding to the cut-point based on these sums, computes the best cut-point based on the gain values, and returns the best cut-point to the first data terminal corresponding to the best cut-point;
upon receiving the best cut-point, the first data terminal sends the best cut-point to the second data terminal for storage, and splits the node to be processed into two new nodes to be processed, wherein the second data terminal is the data terminal among the data terminals that is used to store the gradient tree model.
4. The federated learning method of claim 3, characterized in that the step of each data terminal obtaining, based on its own cut-points and the first computation result, the sum of first derivatives and the sum of second derivatives of the left branch and the sum of first derivatives and the sum of second derivatives of the right branch comprises:
each data terminal compares, based on its own cut-points and the first computation result, against the local training samples of the fourth data terminals, i.e. the data terminals other than the third data terminal corresponding to the cut-point, to obtain a first comparison result;
based on the first comparison result, each data terminal obtains the sum of first derivatives and the sum of second derivatives of the samples divided into the left branch, and the sum of first derivatives and the sum of second derivatives of the samples divided into the right branch.
5. The federated learning method according to claim 4, wherein the step of each data terminal performing an encryption operation on the sums of first and second derivatives of the left branch and the sums of first and second derivatives of the right branch, sending the encrypted sums to the data terminal where the split point is located to be summed and aggregated, and obtaining an aggregated result, comprises:
each data terminal performing an encryption operation on the sum of first derivatives of the left branch and the sum of first derivatives of the right branch to obtain a first encrypted result;
each data terminal performing an encryption operation on the sum of second derivatives of the left branch and the sum of second derivatives of the right branch to obtain a second encrypted result;
each data terminal summing and aggregating the first encrypted result and the second encrypted result to obtain the aggregated result, and sending the aggregated result to a coordination terminal, wherein the first encrypted result and the second encrypted result can only be decrypted with a private key held by the coordination terminal.
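Summing ciphertexts that only the coordination terminal can decrypt is the behavior of an additively homomorphic scheme such as Paillier; the patent does not name a specific cryptosystem, so the following is a toy Paillier sketch (deliberately tiny, insecure primes) illustrating the property claims 5 and 6 rely on:

```python
# Toy Paillier cryptosystem (hypothetical choice; tiny primes, NOT secure).
# Multiplying ciphertexts adds the underlying plaintexts, so any terminal can
# aggregate encrypted derivative sums while only the holder of the private key
# (the coordination terminal) can decrypt the total.
import math
import random

def keygen(p=2357, q=2551):
    n = p * q
    lam = math.lcm(p - 1, q - 1)                  # Carmichael function of n
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)                   # (public key, private key)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                    # r must be invertible mod n
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def add_encrypted(pub, c1, c2):
    n, _ = pub
    return c1 * c2 % (n * n)                      # homomorphic addition

def decrypt(priv, c):
    lam, mu, n = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

pub, priv = keygen()
# Two data terminals encrypt their local (integer-scaled) derivative sums;
# the ciphertexts are aggregated and only the coordinator decrypts the total.
c = add_encrypted(pub, encrypt(pub, 123), encrypt(pub, 456))
print(decrypt(priv, c))  # 579
```

In claim 6's terms, `pub` is the public key the coordination terminal distributes to the data terminals, and `priv` is the private key it retains.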
6. The federated learning method according to claim 5, wherein before the step of each data terminal using the first gradient tree model obtained in the previous round to predict the loss function of the local training samples and obtain local first derivatives and second derivatives, the federated learning method further comprises:
each data terminal receiving a public key sent by the coordination terminal, so that each data terminal can perform the encryption operation on the sum of first derivatives of the left branch, the sum of first derivatives of the right branch, the sum of second derivatives of the left branch, and the sum of second derivatives of the right branch, respectively.
7. The federated learning method according to claim 3, wherein after the step of, when the best split point is received, the first data terminal sending the best split point to the second data terminal for storage and splitting the node to be processed to obtain two new nodes to be processed, the federated learning method further comprises:
when a new node to be processed is generated for constructing a regression tree of the gradient tree model, each data terminal judging whether the current round's regression tree satisfies a leaf condition;
if so, stopping the splitting of the new node to be processed to obtain one regression tree of the gradient tree model; otherwise, each data terminal updating the local training sample data with the sample data corresponding to the new node to be processed, and returning to the step of each data terminal using the first gradient tree model obtained in the previous round to predict the loss function of the local training samples and obtain first derivatives and second derivatives.
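Stripped of the multi-party protocol, the round structure in claim 7 is the standard gradient-boosting loop: derive per-sample first/second derivatives from the previous model's predictions, then split nodes until a leaf condition holds. A hypothetical single-party sketch (squared-error loss, one feature, leaf condition taken to be a maximum depth; the patent distributes these same steps across data terminals):

```python
# Single-party sketch of one boosting round: gradient/Hessian statistics drive
# split selection, and a leaf condition (here: max depth or no distinct values)
# stops node splitting, yielding one regression tree of the gradient tree model.

def best_split(xs, g, h, reg=1.0):
    """Choose the split value maximizing the XGBoost-style gain."""
    G, H, best = sum(g), sum(h), None
    for s in sorted(set(xs))[:-1]:
        GL = sum(gi for xi, gi in zip(xs, g) if xi <= s)
        HL = sum(hi for xi, hi in zip(xs, h) if xi <= s)
        gain = GL * GL / (HL + reg) + (G - GL) ** 2 / (H - HL + reg) - G * G / (H + reg)
        if best is None or gain > best[0]:
            best = (gain, s)
    return best[1]

def build_tree(xs, g, h, depth=0, max_depth=2, reg=1.0):
    # Leaf condition reached: stop splitting, emit leaf weight -G / (H + reg).
    if depth >= max_depth or len(set(xs)) < 2:
        return {"leaf": -sum(g) / (sum(h) + reg)}
    s = best_split(xs, g, h, reg)
    left = [t for t in zip(xs, g, h) if t[0] <= s]
    right = [t for t in zip(xs, g, h) if t[0] > s]
    return {"split": s,
            "left": build_tree(*map(list, zip(*left)), depth + 1, max_depth, reg),
            "right": build_tree(*map(list, zip(*right)), depth + 1, max_depth, reg)}

# One round: for squared-error loss, g = prediction - target and h = 1.
xs, y, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0], [0.0] * 4
g = [p - t for p, t in zip(pred, y)]
h = [1.0] * 4
tree = build_tree(xs, g, h)
print(tree["split"])  # 2.0
```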
8. The federated learning method according to claim 1, wherein the step of performing joint prediction on a sample to be predicted based on the gradient tree model to determine a predicted value of the sample to be predicted comprises:
a fifth data terminal traversing a regression tree corresponding to the gradient tree model, wherein the fifth data terminal is the data terminal, among the data terminals, that possesses the gradient tree model;
the fifth data terminal comparing a first data point of its local sample to be predicted with an attribute value of a current first traversal node to obtain a second comparison result, and judging, based on the second comparison result, whether to enter the left subtree or the right subtree of the current first traversal node, until a leaf node of the current first traversal node is reached, and obtaining a first prediction result based on the leaf node;
or:
a sixth data terminal traversing the regression tree corresponding to the gradient tree model, wherein the sixth data terminal is a data terminal, among the data terminals, other than the fifth data terminal;
the sixth data terminal performing secure multi-party computation on an attribute value of a current second traversal node of the sixth data terminal and the attribute value of the current first traversal node of the fifth data terminal to obtain a second calculation result, so that the fifth data terminal compares, based on the second calculation result, the attribute value of the current first traversal node with the attribute value of the current second traversal node to obtain a third comparison result; the fifth data terminal judging, based on the third comparison result, whether to enter the left subtree or the right subtree of the current first traversal node, until the leaf node of the current first traversal node is reached; and the fifth data terminal obtaining a second prediction result based on the leaf node and sending it to the sixth data terminal.
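The first branch of claim 8 is an ordinary regression-tree walk: compare the sample's attribute value with each node's split value and descend left or right until a leaf. A minimal sketch with a hypothetical node layout (the second branch of the claim replaces the plain comparison with secure multi-party computation when the attribute is held by another terminal):

```python
# Sketch of local tree traversal for prediction: at each internal node, compare
# the sample's attribute value with the node's split value; a leaf node yields
# the prediction result. Node dictionaries here are illustrative only.

def predict(node, x):
    """Traverse a regression tree: go left if x <= split, else right."""
    while "leaf" not in node:
        node = node["left"] if x <= node["split"] else node["right"]
    return node["leaf"]

tree = {"split": 2.0,
        "left": {"leaf": 0.5},
        "right": {"split": 3.0, "left": {"leaf": 2.5}, "right": {"leaf": 2.5}}}
print(predict(tree, 1.0))  # 0.5
print(predict(tree, 3.5))  # 2.5
```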
9. A system, comprising: a memory, a processor, and a federated learning program stored on the memory and executable on the processor, wherein the federated learning program, when executed by the processor, implements the steps of the federated learning method according to any one of claims 1 to 8.
10. A readable storage medium, wherein a federated learning program is stored on the readable storage medium, and the federated learning program, when executed by a processor, implements the steps of the federated learning method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810918868.8A CN109299728B (en) | 2018-08-10 | 2018-08-10 | Sample joint prediction method, system and medium based on construction of gradient tree model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109299728A true CN109299728A (en) | 2019-02-01 |
CN109299728B CN109299728B (en) | 2023-06-27 |
Family
ID=65170261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810918868.8A Active CN109299728B (en) | 2018-08-10 | 2018-08-10 | Sample joint prediction method, system and medium based on construction of gradient tree model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299728B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106204103A (en) * | 2016-06-24 | 2016-12-07 | 有米科技股份有限公司 | The method of similar users found by a kind of moving advertising platform |
TW201732591A (en) * | 2016-01-29 | 2017-09-16 | Alibaba Group Services Ltd | Disk failure prediction method and apparatus |
CN107563429A (en) * | 2017-07-27 | 2018-01-09 | 国家计算机网络与信息安全管理中心 | A kind of sorting technique and device of network user colony |
CN108256052A (en) * | 2018-01-15 | 2018-07-06 | 成都初联创智软件有限公司 | Automobile industry potential customers' recognition methods based on tri-training |
CN108364018A (en) * | 2018-01-25 | 2018-08-03 | 北京墨丘科技有限公司 | A kind of guard method of labeled data, terminal device and system |
Non-Patent Citations (3)
Title |
---|
CHEN, QIWEI et al.: "Class-Imbalanced Credit Scoring Model Based on Ext-GBDT Ensemble", Application Research of Computers, 28 February 2018, page 421 *
Cited By (90)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020211240A1 (en) * | 2019-04-19 | 2020-10-22 | 平安科技(深圳)有限公司 | Joint construction method and apparatus for prediction model, and computer device |
CN110210233A (en) * | 2019-04-19 | 2019-09-06 | 平安科技(深圳)有限公司 | Joint mapping method, apparatus, storage medium and the computer equipment of prediction model |
CN110210233B (en) * | 2019-04-19 | 2024-05-24 | 平安科技(深圳)有限公司 | Combined construction method and device of prediction model, storage medium and computer equipment |
CN110569659A (en) * | 2019-07-01 | 2019-12-13 | 阿里巴巴集团控股有限公司 | data processing method and device and electronic equipment |
CN112183759B (en) * | 2019-07-04 | 2024-02-13 | 创新先进技术有限公司 | Model training method, device and system |
CN112183759A (en) * | 2019-07-04 | 2021-01-05 | 创新先进技术有限公司 | Model training method, device and system |
CN110443408A (en) * | 2019-07-04 | 2019-11-12 | 特斯联(北京)科技有限公司 | Travel forecasting approaches and device |
WO2021008017A1 (en) * | 2019-07-17 | 2021-01-21 | 深圳前海微众银行股份有限公司 | Federation learning method, system, terminal device and storage medium |
CN110443378B (en) * | 2019-08-02 | 2023-11-03 | 深圳前海微众银行股份有限公司 | Feature correlation analysis method and device in federal learning and readable storage medium |
CN110443378A (en) * | 2019-08-02 | 2019-11-12 | 深圳前海微众银行股份有限公司 | Feature correlation analysis method, device and readable storage medium storing program for executing in federation's study |
CN110598870A (en) * | 2019-09-02 | 2019-12-20 | 深圳前海微众银行股份有限公司 | Method and device for federated learning |
CN110598870B (en) * | 2019-09-02 | 2024-04-30 | 深圳前海微众银行股份有限公司 | Federal learning method and device |
CN110728317A (en) * | 2019-09-30 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Training method and system of decision tree model, storage medium and prediction method |
CN110782340A (en) * | 2019-10-25 | 2020-02-11 | 深圳前海微众银行股份有限公司 | Interactive modeling method, device and equipment of decision tree model and storage medium |
CN110807528B (en) * | 2019-10-30 | 2024-11-05 | 深圳前海微众银行股份有限公司 | Feature correlation calculation method, device and computer readable storage medium |
CN110807528A (en) * | 2019-10-30 | 2020-02-18 | 深圳前海微众银行股份有限公司 | Feature correlation calculation method, device and computer-readable storage medium |
CN110751294A (en) * | 2019-10-31 | 2020-02-04 | 深圳前海微众银行股份有限公司 | Model prediction method, device, equipment and medium combining multi-party characteristic data |
CN110851785A (en) * | 2019-11-14 | 2020-02-28 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning optimization method, device, equipment and storage medium |
CN110851786A (en) * | 2019-11-14 | 2020-02-28 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning optimization method, device, equipment and storage medium |
CN110851785B (en) * | 2019-11-14 | 2023-06-06 | 深圳前海微众银行股份有限公司 | Longitudinal federal learning optimization method, device, equipment and storage medium |
CN111091197B (en) * | 2019-11-21 | 2022-03-01 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training GBDT model in trusted execution environment |
CN110990829A (en) * | 2019-11-21 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training GBDT model in trusted execution environment |
CN111091197A (en) * | 2019-11-21 | 2020-05-01 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training GBDT model in trusted execution environment |
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
US11588621B2 (en) | 2019-12-06 | 2023-02-21 | International Business Machines Corporation | Efficient private vertical federated learning |
WO2021114821A1 (en) * | 2019-12-12 | 2021-06-17 | 支付宝(杭州)信息技术有限公司 | Isolation forest model construction and prediction method and device based on federated learning |
CN110991552A (en) * | 2019-12-12 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Isolated forest model construction and prediction method and device based on federal learning |
CN111046425B (en) * | 2019-12-12 | 2021-07-13 | 支付宝(杭州)信息技术有限公司 | Method and device for risk identification by combining multiple parties |
CN111046425A (en) * | 2019-12-12 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Method and device for risk identification by combining multiple parties |
CN111144576A (en) * | 2019-12-13 | 2020-05-12 | 支付宝(杭州)信息技术有限公司 | Model training method and device and electronic equipment |
CN111178408A (en) * | 2019-12-19 | 2020-05-19 | 中国科学院计算技术研究所 | Health monitoring model construction method and system based on federal random forest learning |
CN110968886A (en) * | 2019-12-20 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Method and system for screening training samples of machine learning model |
CN111242385A (en) * | 2020-01-19 | 2020-06-05 | 苏宁云计算有限公司 | Prediction method, device and system of gradient lifting tree model |
CN111310933B (en) * | 2020-02-11 | 2024-02-02 | 深圳前海微众银行股份有限公司 | Feature dependency graph calculation optimization method, device, equipment and readable storage medium |
CN111310933A (en) * | 2020-02-11 | 2020-06-19 | 深圳前海微众银行股份有限公司 | Feature dependence graph calculation optimization method, device and equipment and readable storage medium |
CN111339275B (en) * | 2020-02-27 | 2023-05-12 | 深圳大学 | Answer information matching method, device, server and storage medium |
CN111339275A (en) * | 2020-02-27 | 2020-06-26 | 深圳大学 | Method and device for matching answer information, server and storage medium |
CN111340614A (en) * | 2020-02-28 | 2020-06-26 | 深圳前海微众银行股份有限公司 | Sample sampling method and device based on federal learning and readable storage medium |
CN113392101A (en) * | 2020-03-13 | 2021-09-14 | 京东城市(北京)数字科技有限公司 | Method, main server, service platform and system for constructing horizontal federated tree |
CN113554476B (en) * | 2020-04-23 | 2024-04-19 | 京东科技控股股份有限公司 | Training method and system of credit prediction model, electronic equipment and storage medium |
CN113554476A (en) * | 2020-04-23 | 2021-10-26 | 京东数字科技控股有限公司 | Training method and system of credit prediction model, electronic device and storage medium |
CN111353554A (en) * | 2020-05-09 | 2020-06-30 | 支付宝(杭州)信息技术有限公司 | Method and device for predicting missing user service attributes |
CN111340150A (en) * | 2020-05-22 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and device for training first classification model |
CN111598186A (en) * | 2020-06-05 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Decision model training method, prediction method and device based on longitudinal federal learning |
CN111695697A (en) * | 2020-06-12 | 2020-09-22 | 深圳前海微众银行股份有限公司 | Multi-party combined decision tree construction method and device and readable storage medium |
WO2021249086A1 (en) * | 2020-06-12 | 2021-12-16 | 深圳前海微众银行股份有限公司 | Multi-party joint decision tree construction method, device and readable storage medium |
CN111695697B (en) * | 2020-06-12 | 2023-09-08 | 深圳前海微众银行股份有限公司 | Multiparty joint decision tree construction method, equipment and readable storage medium |
CN113824546B (en) * | 2020-06-19 | 2024-04-02 | 百度在线网络技术(北京)有限公司 | Method and device for generating information |
CN113824546A (en) * | 2020-06-19 | 2021-12-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN111783139A (en) * | 2020-06-29 | 2020-10-16 | 京东数字科技控股有限公司 | Federal learning classification tree construction method, model construction method and terminal equipment |
CN111783139B (en) * | 2020-06-29 | 2024-07-19 | 京东科技控股股份有限公司 | Federal learning classification tree construction method, model construction method and terminal equipment |
CN112001500A (en) * | 2020-08-13 | 2020-11-27 | 星环信息科技(上海)有限公司 | Model training method, device and storage medium based on longitudinal federated learning system |
CN112085758B (en) * | 2020-09-04 | 2022-06-24 | 西北工业大学 | Edge-end fused terminal context adaptive model segmentation method |
CN112085758A (en) * | 2020-09-04 | 2020-12-15 | 西北工业大学 | Edge-end fused terminal context adaptive model segmentation method |
CN111967615A (en) * | 2020-09-25 | 2020-11-20 | 北京百度网讯科技有限公司 | Multi-model training method and system based on feature extraction, electronic device and medium |
CN111967615B (en) * | 2020-09-25 | 2024-05-28 | 北京百度网讯科技有限公司 | Multi-model training method and device based on feature extraction, electronic equipment and medium |
CN112199706A (en) * | 2020-10-26 | 2021-01-08 | 支付宝(杭州)信息技术有限公司 | Tree model training method and business prediction method based on multi-party safety calculation |
CN112199706B (en) * | 2020-10-26 | 2022-11-22 | 支付宝(杭州)信息技术有限公司 | Tree model training method and business prediction method based on multi-party safety calculation |
CN112464174A (en) * | 2020-10-27 | 2021-03-09 | 华控清交信息科技(北京)有限公司 | Method and device for verifying multi-party secure computing software and device for verifying |
CN112464174B (en) * | 2020-10-27 | 2023-09-29 | 华控清交信息科技(北京)有限公司 | Method and device for verifying multi-party security computing software and device for verification |
WO2022088606A1 (en) * | 2020-10-29 | 2022-05-05 | 平安科技(深圳)有限公司 | Gbdt and lr fusion method and apparatus based on federated learning, device, and storage medium |
CN112308157A (en) * | 2020-11-05 | 2021-02-02 | 浙江大学 | Decision tree-oriented transverse federated learning method |
CN113822311B (en) * | 2020-12-31 | 2023-09-01 | 京东科技控股股份有限公司 | Training method and device of federal learning model and electronic equipment |
CN113822311A (en) * | 2020-12-31 | 2021-12-21 | 京东科技控股股份有限公司 | Method and device for training federated learning model and electronic equipment |
CN114841374A (en) * | 2021-01-14 | 2022-08-02 | 新智数字科技有限公司 | Method for optimizing transverse federated gradient spanning tree based on stochastic greedy algorithm |
WO2022151654A1 (en) * | 2021-01-14 | 2022-07-21 | 新智数字科技有限公司 | Random greedy algorithm-based horizontal federated gradient boosted tree optimization method |
CN112836830A (en) * | 2021-02-01 | 2021-05-25 | 广西师范大学 | Method for voting and training in parallel by using federated gradient boosting decision tree |
CN112836830B (en) * | 2021-02-01 | 2022-05-06 | 广西师范大学 | Method for voting and training in parallel by using federated gradient boosting decision tree |
CN112884164A (en) * | 2021-03-18 | 2021-06-01 | 中国地质大学(北京) | Federal machine learning migration method and system for intelligent mobile terminal |
CN112884164B (en) * | 2021-03-18 | 2023-06-23 | 中国地质大学(北京) | Federal machine learning migration method and system for intelligent mobile terminal |
CN112712182B (en) * | 2021-03-29 | 2021-06-01 | 腾讯科技(深圳)有限公司 | Model training method and device based on federal learning and storage medium |
CN112712182A (en) * | 2021-03-29 | 2021-04-27 | 腾讯科技(深圳)有限公司 | Model training method and device based on federal learning and storage medium |
CN112989399A (en) * | 2021-05-18 | 2021-06-18 | 杭州金智塔科技有限公司 | Data processing system and method |
CN113204443A (en) * | 2021-06-03 | 2021-08-03 | 京东科技控股股份有限公司 | Data processing method, equipment, medium and product based on federal learning framework |
CN113204443B (en) * | 2021-06-03 | 2024-04-16 | 京东科技控股股份有限公司 | Data processing method, device, medium and product based on federal learning framework |
CN113408747A (en) * | 2021-06-28 | 2021-09-17 | 淮安集略科技有限公司 | Model parameter updating method and device, computer readable medium and electronic equipment |
CN113674843A (en) * | 2021-07-08 | 2021-11-19 | 浙江一山智慧医疗研究有限公司 | Method, device, system, electronic device and storage medium for medical expense prediction |
CN113723477B (en) * | 2021-08-16 | 2024-04-30 | 同盾科技有限公司 | Cross-feature federal abnormal data detection method based on isolated forest |
CN113723477A (en) * | 2021-08-16 | 2021-11-30 | 同盾科技有限公司 | Cross-feature federal abnormal data detection method based on isolated forest |
CN113449880A (en) * | 2021-08-30 | 2021-09-28 | 深圳致星科技有限公司 | Heterogeneous acceleration system and method for longitudinal federated learning decision tree model |
CN113449880B (en) * | 2021-08-30 | 2021-11-30 | 深圳致星科技有限公司 | Heterogeneous acceleration system and method for longitudinal federated learning decision tree model |
CN115130681B (en) * | 2022-06-29 | 2024-10-25 | 蓝象智联(杭州)科技有限公司 | Data processing method and device for federal learning |
CN115130681A (en) * | 2022-06-29 | 2022-09-30 | 蓝象智联(杭州)科技有限公司 | Data processing method and device for federal learning |
CN115936112A (en) * | 2023-01-06 | 2023-04-07 | 北京国际大数据交易有限公司 | Client portrait model training method and system based on federal learning |
CN117034000A (en) * | 2023-03-22 | 2023-11-10 | 浙江明日数据智能有限公司 | Modeling method and device for longitudinal federal learning, storage medium and electronic equipment |
CN117034000B (en) * | 2023-03-22 | 2024-06-25 | 浙江明日数据智能有限公司 | Modeling method and device for longitudinal federal learning, storage medium and electronic equipment |
CN116757286B (en) * | 2023-08-16 | 2024-01-19 | 杭州金智塔科技有限公司 | Multi-party joint causal tree model construction system and method based on federal learning |
CN116757286A (en) * | 2023-08-16 | 2023-09-15 | 杭州金智塔科技有限公司 | Multi-party joint causal tree model construction system and method based on federal learning |
CN117251805B (en) * | 2023-11-20 | 2024-04-16 | 杭州金智塔科技有限公司 | Federal gradient lifting decision tree model updating system based on breadth-first algorithm |
CN117251805A (en) * | 2023-11-20 | 2023-12-19 | 杭州金智塔科技有限公司 | Federal gradient lifting decision tree model updating system based on breadth-first algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN109299728B (en) | 2023-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299728A (en) | Federated learning method, system and readable storage medium | |
CN110189192B (en) | Information recommendation model generation method and device | |
Zhang et al. | Privacy preserving deep computation model on cloud for big data feature learning | |
Talarposhti et al. | A secure image encryption method based on dynamic harmony search (DHS) combined with chaotic map | |
Song et al. | Protection of image ROI using chaos-based encryption and DCNN-based object detection | |
CN110443378A (en) | Feature correlation analysis method and device in federated learning, and readable storage medium | |
CN112613602A (en) | Recommendation method and system based on knowledge-aware hypergraph neural network | |
CN111428887B (en) | Model training control method, device and system based on multiple computing nodes | |
CN109687952A (en) | Data processing method and its device, electronic device and storage medium | |
CN111144576A (en) | Model training method and device and electronic equipment | |
Viard et al. | Enumerating maximal cliques in link streams with durations | |
CN114448598B (en) | Ciphertext compression method, ciphertext decompression device, ciphertext compression equipment and storage medium | |
Qadir et al. | Digital image scrambling based on two dimensional cellular automata | |
Nepomuceno et al. | On the use of interval extensions to estimate the largest Lyapunov exponent from chaotic data | |
Wang et al. | OblivGM: Oblivious attributed subgraph matching as a cloud service | |
Zur et al. | Comparison of two methods of adding jitter to artificial neural network training | |
CN108021815A (en) | Image encryption method, device and electronic equipment | |
CN112685788A (en) | Data processing method and device | |
Wang et al. | Federated cf: Privacy-preserving collaborative filtering cross multiple datasets | |
US10333697B2 (en) | Nondecreasing sequence determining device, method and program | |
Mitrovska et al. | Secure federated learning for Alzheimer's disease detection | |
CN110175283A (en) | A kind of generation method and device of recommended models | |
Zefreh et al. | Image security system using recursive cellular automata substitution and its parallelization | |
Gravina et al. | Long Range Propagation on Continuous-Time Dynamic Graphs | |
CN114358323A (en) | Third-party-based efficient Pearson coefficient calculation method in federated learning environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |