
CN114841361A - Model training method and related equipment thereof - Google Patents


Info

Publication number
CN114841361A
Authority
CN
China
Prior art keywords
model
trained
neurons
layer
neuron
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210304574.2A
Other languages
Chinese (zh)
Inventor
詹德川
李新春
邵云峰
李秉帅
李银川
宋绍铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Huawei Technologies Co Ltd
Original Assignee
Nanjing University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University, Huawei Technologies Co Ltd
Priority to CN202210304574.2A
Publication of CN114841361A
Priority to PCT/CN2023/082679 (WO2023185541A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a model training method and related equipment, applied in the technical field of artificial intelligence, which enable a trained model obtained through joint training by a client and a server to perform sufficiently well. The method comprises the following. When the server needs to obtain a neural network model with a data processing function, it can send a model to be trained to the client, where each of a plurality of neurons in the model to be trained has a parameter and a position code. After receiving the model to be trained, the client can use its locally stored data as training data and input the training data into the model to be trained, so that the training data is processed through the parameters and position codes of the plurality of neurons, thereby updating the model to be trained and obtaining an updated model. The client can then send the updated model to the server, so that the server can aggregate the updated models uploaded by the clients to obtain the trained model.

Description

Model training method and related equipment thereof
Technical Field
The embodiments of the application relate to the technical field of Artificial Intelligence (AI), and in particular to a model training method and related equipment.
Background
As users' awareness of data security grows and data security incidents such as leaks of users' personal private data occur frequently, users increasingly protect data related to personal privacy, which poses a new challenge for model training in AI technology. Federated learning has emerged as a training approach in response.
A federated learning system generally comprises a server and a plurality of clients. During model training, the server first sends a model to be trained to each client. After receiving the model to be trained, each client trains it using locally stored training data to obtain an updated model. Each client then uploads its updated model to the server. Finally, the server aggregates the updated models uploaded by the clients to obtain the trained model.
Different clients train the same model to be trained on different training data, and the neural network model is subject to rearrangement invariance, also known as permutation invariance: exchanging the positions of some neurons with different functions in the model leaves the output of the model unchanged. As a result, compared with the distribution of neurons in the model to be trained, some neurons with different functions change position in the updated models obtained by some clients. Neurons with the same function are therefore not all located at the same position across the updated models obtained by the clients, yet the server processes the neurons in each updated model by position when it aggregates. The trained model obtained in this way cannot perform sufficiently well.
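The effect described above can be reproduced on a toy network. The following is a minimal sketch, assuming a two-layer network with a tanh hidden layer and position-wise averaging as the aggregation rule; the function names, weights, and the choice of averaging are illustrative assumptions, not details taken from the application. It shows that reordering hidden neurons leaves the output unchanged, while averaging a model with its own reordered copy by position changes the output.

```python
import math

def forward(x, W1, b1, W2):
    """Toy two-layer network: tanh hidden layer, linear output (assumed form)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden neurons: rows of W1/b1 and the matching columns of W2."""
    W1p = [W1[k] for k in perm]
    b1p = [b1[k] for k in perm]
    W2p = [[row[k] for k in perm] for row in W2]
    return W1p, b1p, W2p

def average(A, B):
    """Position-wise average of two nested lists of equal shape."""
    if isinstance(A, list):
        return [average(a, b) for a, b in zip(A, B)]
    return (A + B) / 2

x = [0.5, -1.0]
W1 = [[1.0, 2.0], [-1.0, 0.5], [0.3, 0.3]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -2.0, 0.5]]

# Reordering the hidden neurons does not change what the network computes.
W1p, b1p, W2p = permute_hidden(W1, b1, W2, [2, 0, 1])
original_out = forward(x, W1, b1, W2)[0]
same_output = math.isclose(original_out, forward(x, W1p, b1p, W2p)[0])

# Averaging the two equivalent models by position yields a different function.
averaged_out = forward(x, average(W1, W1p), average(b1, b1p), average(W2, W2p))[0]
averaging_changes_output = not math.isclose(averaged_out, original_out)
```

Both models compute the same function, but their position-wise average does not, which is the misalignment problem that aggregation by position runs into.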
Disclosure of Invention
The embodiments of the application provide a model training method and related equipment, so that a trained model obtained through joint training by a client and a server performs sufficiently well.
A first aspect of an embodiment of the present application provides a model training method, including:
When model training is needed, the model to be trained is obtained first. The model to be trained includes a plurality of neurons, and the plurality of neurons are associated with parameters (i.e., the parameter information) and Position Encodings (PE) (i.e., the position coding information); that is, each neuron has a parameter and a position code. Among the plurality of neurons, different neurons have different position codes, and the position code of any neuron can be used to indicate the position of that neuron in the model to be trained. For example, if layer 2 and layer 3 are the two intermediate layers of the model to be trained, with layer 2 containing 3 neurons and layer 3 containing 4 neurons, the position codes of these 7 neurons may be 1, 2, 3, 4, 5, 6, and 7 in order.
After the model to be trained is obtained, the training data can be processed through the parameters and the position codes of the plurality of neurons in the model to be trained so as to update the model to be trained and obtain an updated model. The updated model is then sent out so that the models can be aggregated to obtain the trained model.
In a possible implementation, the steps of the model training method may be implemented by a client, where the client may be any one of the multiple clients of a federated learning system. The client can then complete model training jointly with a server in the federated learning system, and the model training process is specifically as follows:
When the server needs to obtain a neural network model with a data processing function, it can first obtain the model to be trained and send it to the plurality of clients. After receiving the model to be trained from the server, any one of the clients can use its locally stored data as training data to train the model to be trained.
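The numbering in the example above can be sketched as follows. This is a minimal illustration, assuming that position codes are consecutive integers assigned layer by layer; the function name and the dictionary layout are hypothetical and not part of the application.

```python
def assign_position_codes(layer_sizes):
    """Assign consecutive integer position codes to intermediate-layer neurons.

    layer_sizes maps a layer index to its neuron count, e.g. {2: 3, 3: 4}.
    Returns {(layer, neuron_index): code}, with 1-based neuron indices.
    """
    codes = {}
    next_code = 1
    for layer in sorted(layer_sizes):
        for j in range(1, layer_sizes[layer] + 1):
            codes[(layer, j)] = next_code
            next_code += 1
    return codes

# Layer 2 has 3 neurons and layer 3 has 4 neurons, as in the example.
codes = assign_position_codes({2: 3, 3: 4})
```

Every neuron receives a distinct code, and the code depends only on where the neuron sits in the model.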
Specifically, after receiving the model to be trained, the client may use local data stored by the client as training data and input the training data into the model to be trained, so as to process the training data through a plurality of neurons of the model to be trained, thereby obtaining a processing result of the training data. It should be noted that, for a plurality of neurons in the model to be trained, each of the plurality of neurons has a parameter and a position code, and then, the plurality of neurons, as a plurality of data processing units in the model to be trained, may use their own parameter and position codes to process the training data, thereby obtaining a processing result of the training data.
After the processing result of the training data is obtained, the client can update the parameters of the neurons in the model to be trained based on the processing result of the training data to obtain an updated model.
After the updated model is obtained, the client can send the updated model to the server, so that the server can perform federated aggregation based on the updated model uploaded by this client and the updated models uploaded by the other clients to obtain the trained model.
From the above model training process, it can be seen that after a client obtains the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client can then update the parameters of the neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Each neuron of the model to be trained has a parameter and a position code, and its position code differs from those of the other neurons, so the position code constrains the function of the neuron and keeps it distinct from the functions of the other neurons. Suppose that, relative to the distribution of neurons in the model to be trained, the functions of the neurons at certain positions in the updated model change (that is, some neurons with different functions exchange positions). Because the position codes at these positions remain unchanged (a position code depends only on the position of the neuron), the output of the model after the position change differs from the output of the model without the position change. Such a change would destabilize the training process and degrade the performance of the trained model. When updating the parameters of each neuron, the client therefore keeps the function of the neuron at each position as unchanged as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively suppress the rearrangement invariance of the neural network model.
Since the other clients can perform the same operations as this client, neurons with the same function are located at the same positions in the updated models that the clients upload to the server. The server can therefore process the neurons in each updated model by position during aggregation, and the resulting trained model performs sufficiently well.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or the position codes of the plurality of neurons are determined jointly by the client and the server based on the positions of the plurality of neurons in the model to be trained. In the foregoing implementation, the position codes of the neurons can be set in several ways: (1) for all neurons of layer 2 through layer N-1 of the model to be trained, the position codes can be determined by the server based on the positions of the neurons in the model to be trained; that is, the position code of any one of the neurons is defined by the server based on the position of that neuron in the model to be trained. (2) For all neurons of layer 2 through layer N-1 of the model to be trained, the position codes can be determined jointly by the plurality of clients and the server based on the positions of the neurons in the model to be trained; that is, the position code of any one of the neurons is agreed on in advance by the server and the plurality of clients based on the position of that neuron in the model to be trained.
In one possible implementation, the model to be trained includes N layers, where the 1st layer is the input layer, the 2nd to (N-1)th layers are intermediate layers, and the Nth layer is the output layer, and each layer includes at least one neuron. All neurons of the 1st layer are used for receiving input data, so they need not have parameters or position codes. All neurons of the 2nd to (N-1)th layers are used for data processing, so they have parameters (e.g., weights and biases) and position codes. All neurons of the Nth layer are used for outputting the processing result of the data, so they have parameters (e.g., weights). The client processing the training data through the plurality of neurons of the model to be trained to obtain the processing result includes: the client performs a first calculation on the parameters of the jth neuron of the ith layer and the final outputs of all neurons of the (i-1)th layer to obtain an initial output of the jth neuron of the ith layer, where i = 2, ..., N-1, j = 1, ..., M, N ≥ 3, and M ≥ 1; and the client performs a second calculation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer. The final outputs of all neurons of the 1st layer are the training data; the parameters of the jth neuron of the Nth layer are used for performing the first calculation on the final outputs of all neurons of the (N-1)th layer to obtain the final output of the jth neuron of the Nth layer; and the final outputs of all neurons of the Nth layer are the processing result.
In the foregoing implementation, after the client inputs the training data into the model to be trained, all neurons of the 1st layer send their final outputs, which are the training data, to each neuron of the 2nd layer. The 1st neuron of the 2nd layer performs the first calculation based on its own parameters and the final outputs of all neurons of the 1st layer to obtain its initial output, and then performs the second calculation using its own position code and that initial output to obtain its final output. The remaining neurons of the 2nd layer perform the same operations as the 1st neuron of the 2nd layer, yielding the final outputs of all neurons of the 2nd layer, which are then sent to each neuron of the 3rd layer, and so on, until all neurons of the (N-2)th layer send their final outputs to each neuron of the (N-1)th layer. The 1st neuron of the (N-1)th layer performs the first calculation based on its own parameters and the final outputs of all neurons of the (N-2)th layer to obtain its initial output, and then performs the second calculation using its own position code and that initial output to obtain its final output. The remaining neurons of the (N-1)th layer perform the same operations as the 1st neuron of the (N-1)th layer, yielding the final outputs of all neurons of the (N-1)th layer.
Because the neurons of the Nth layer in the model to be trained do not have position codes, once the final outputs of all neurons of the (N-1)th layer are obtained, the 1st neuron of the Nth layer performs the first calculation on its own parameters and the final outputs of all neurons of the (N-1)th layer to obtain its final output. The 2nd neuron of the Nth layer likewise performs the first calculation on its own parameters and the final outputs of all neurons of the (N-1)th layer to obtain its final output, and so on, until the final outputs of all neurons of the Nth layer are obtained. The final outputs of all neurons of the Nth layer are the output of the model to be trained, that is, the processing result of the training data. The processing result of the training data is thus obtained from the outputs of the neurons in the model to be trained, and the output of any such neuron is obtained by the neuron performing a data processing operation with its own parameter and position code. Each neuron is therefore constrained by its position code when processing data (i.e., when realizing its function), and the influence of the position code is reflected in the processing result of the training data. When the client then updates the parameters of a neuron based on the processing result of the training data, the function of the neuron can be kept as unchanged as possible, which limits the rearrangement invariance of the neural network.
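The layer-by-layer procedure above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the first calculation is taken to be a weighted sum plus bias passed through tanh, and the second calculation is taken to be plain addition of the position code; the application leaves both forms open, and all names and numeric values here are hypothetical.

```python
import math

def first_calculation(weights, bias, inputs):
    # Assumed form: weighted sum plus bias, passed through tanh.
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

def second_calculation(initial_output, position_code):
    # Assumed form: plain addition of the position code.
    return initial_output + position_code

def forward(x, hidden_layers, output_layer):
    """hidden_layers: list of layers, each a list of (weights, bias, position_code).
    output_layer: list of (weights, bias); output neurons carry no position code."""
    final = list(x)  # the final output of layer 1 is the input data itself
    for layer in hidden_layers:
        final = [second_calculation(first_calculation(w, b, final), pe)
                 for (w, b, pe) in layer]
    return [first_calculation(w, b, final) for (w, b) in output_layer]

x = [0.5, -1.0]
hidden = [[([1.0, 2.0], 0.1, 0.1), ([-1.0, 0.5], -0.2, 0.2), ([0.3, 0.3], 0.0, 0.3)]]
output = [([1.0, -2.0, 0.5], 0.0)]
y = forward(x, hidden, output)

# Swap the parameters of hidden neurons 1 and 2 (and the matching output weights)
# while the position codes 0.1 and 0.2 stay attached to their positions.
swapped_hidden = [[([-1.0, 0.5], -0.2, 0.1), ([1.0, 2.0], 0.1, 0.2), ([0.3, 0.3], 0.0, 0.3)]]
swapped_output = [([-2.0, 1.0, 0.5], 0.0)]
y_swapped = forward(x, swapped_hidden, swapped_output)
```

Without position codes the swap would leave the output unchanged; with position codes fixed to positions, `y` and `y_swapped` differ, which is the constraint the text describes.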
In a possible implementation, the client performing the second calculation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer includes: the client performs one of the four basic arithmetic operations (addition, subtraction, multiplication, or division) on the position code and the initial output of the jth neuron of the ith layer to obtain the final output of that neuron; or the client performs a trigonometric function operation on the position code and the initial output of the jth neuron of the ith layer to obtain the final output of that neuron; or the client performs an exponential operation on the position code and the initial output of the jth neuron of the ith layer to obtain the final output of that neuron; or the client performs a logarithmic operation on the position code and the initial output of the jth neuron of the ith layer to obtain the final output of that neuron. In the foregoing implementation, the second calculation performed by a neuron may be any one of a basic arithmetic operation, a trigonometric function operation, an exponential operation, and a logarithmic operation, so the constraint that the position code places on the function of the neuron can be realized in several ways.
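The four families of second calculation listed above might look like the following. The application names only the families and gives no formulas, so each expression below is a hypothetical instance chosen for illustration, as are the names `SECOND_CALCULATIONS` and `final_output`.

```python
import math

# Hypothetical instances of the four families; the source gives no formulas.
SECOND_CALCULATIONS = {
    "arithmetic": lambda initial, pe: initial + pe,
    "trigonometric": lambda initial, pe: math.sin(initial + pe),
    "exponential": lambda initial, pe: initial * math.exp(pe),
    "logarithmic": lambda initial, pe: initial + math.log1p(pe),  # requires pe > -1
}

def final_output(kind, initial_output, position_code):
    """Combine a neuron's initial output with its position code."""
    return SECOND_CALCULATIONS[kind](initial_output, position_code)

# The same initial output yields distinct final outputs under distinct position codes.
outputs = [final_output("arithmetic", 0.5, pe) for pe in (1.0, 2.0, 3.0)]
```

Whichever family is chosen, the point is the same: two neurons with identical parameters but different position codes produce different final outputs.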
In a possible implementation, the client updating the parameters based on the processing result to obtain the updated model includes: the client obtains a target loss based on the processing result and the real processing result of the training data, where the target loss indicates the difference between the processing result and the real processing result; and the client updates the parameters and the position codes based on the target loss to obtain the updated model. It should be noted that when the client updates the model to be trained, the update frequency of the position codes is lower than the update frequency of the parameters. For example, given 5 batches of training data, the client inputs the 5 batches into the model to be trained one after another and, after processing by the parameters and position codes of the neurons in the model to be trained, obtains the corresponding processing results of the 5 batches. The client then uses the processing results of the 5 batches to update the parameters of the neurons 5 times in succession, but uses the processing result of only 1 batch to update the position codes of the neurons once. In this way, the rearrangement invariance of the model can be suppressed to some extent.
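The 5-to-1 update schedule in the example can be sketched with a toy model. This is a minimal sketch, assuming a single neuron y = w * x + pe trained with squared error and plain gradient descent, and assuming the position code is updated on the last of every 5 batches; the model, the loss, the learning rate, and the choice of which batch triggers the position code update are all illustrative assumptions.

```python
def train_one_round(w, pe, batches, lr=0.1, pe_every=5):
    """Toy neuron y = w * x + pe with squared error loss.

    The parameter w is updated on every batch; the position code pe is
    updated only on every `pe_every`-th batch.
    """
    w_updates = pe_updates = 0
    for step, (x, target) in enumerate(batches, start=1):
        error = (w * x + pe) - target
        grad_w, grad_pe = 2 * error * x, 2 * error
        w -= lr * grad_w
        w_updates += 1
        if step % pe_every == 0:
            pe -= lr * grad_pe
            pe_updates += 1
    return w, pe, w_updates, pe_updates

# Five batches of (input, target) pairs; values are made up for illustration.
batches = [(1.0, 2.0), (2.0, 3.5), (0.5, 1.0), (1.5, 3.0), (1.0, 2.5)]
w, pe, w_updates, pe_updates = train_one_round(0.0, 0.5, batches)
```

After the round, the parameter has been updated 5 times and the position code once, matching the frequencies in the example.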
Further, the client sending the updated model to the server includes: the client obtains a parameter update amount and a position code update amount based on the updated model and the model to be trained; and the client sends the parameter update amount and the position code update amount to the server, where they are used by the server to update the model to be trained until a model training condition is met, thereby obtaining the trained model. In this implementation, the position codes of the neurons in the model to be trained are not fixed values, so the client and the server can jointly update both the parameters and the position codes in the model to be trained. The model can thus learn suitable position codes according to the nature of the specific task (that is, the data processing function that a user needs the model to have in a given service scenario), which helps align the neurons more reasonably.
In a possible implementation manner, the client updates the parameter based on the processing result, and obtaining the updated model includes: the client acquires a target loss based on the processing result and a real processing result of the training data, wherein the target loss is used for indicating the difference between the processing result and the real processing result; and the client updates the parameters based on the target loss to obtain an updated model.
Further, the client sending the updated model to the server includes: the client obtains a parameter update amount based on the updated model and the model to be trained; and the client sends the parameter update amount to the server, where it is used by the server to update the model to be trained until a model training condition is met, thereby obtaining the trained model. In the foregoing implementation, the position codes of the neurons in the model to be trained are fixed values, so the client and the server jointly update only the parameters in the model to be trained, and the trained model acquires the required data processing function.
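A parameter update amount can be read as the element-wise difference between the updated model and the model to be trained. The following is a minimal sketch under that assumption; the dictionary layout, the parameter names, and the function name are hypothetical.

```python
def update_amount(updated, original):
    """Element-wise difference between two models stored as {name: [floats]}."""
    return {name: [u - o for u, o in zip(updated[name], original[name])]
            for name in original}

# Made-up values for one intermediate layer.
model_to_train = {"layer2.weights": [0.5, -0.25], "layer2.bias": [0.0]}
updated_model = {"layer2.weights": [0.75, -0.5], "layer2.bias": [0.125]}

# This difference is what the client would upload instead of the full model.
delta = update_amount(updated_model, model_to_train)
```

When position codes are trainable, the same function could be applied to them to obtain a position code update amount.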
A second aspect of the embodiments of the present application provides a model training method, including: sending a model to be trained, where the model to be trained includes a plurality of neurons, the plurality of neurons are associated with parameter information and position coding information, and the neurons correspond one-to-one to the position coding information; and obtaining an updated model and aggregating the updated model to obtain a trained model, where the updated model is obtained by updating the model to be trained based on the parameter information, the position coding information, and training data.
In a possible implementation, the steps of the model training method may be implemented by a server deployed in a federated learning system. The server can complete model training jointly with multiple clients in the federated learning system, and the model training process is specifically as follows:
The server sends the model to be trained to the client. The model to be trained includes a plurality of neurons, each of which has a parameter and a position code, and neurons at different positions have different position codes. The plurality of neurons are used by the client to process training data to obtain a processing result, and the parameters are updated based on the processing result to obtain an updated model. The server then obtains the updated model from the client and aggregates the updated model to obtain the trained model.
From the above method, it can be seen that after a client obtains the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client can then update the parameters of the neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Each neuron of the model to be trained has a parameter and a position code, and its position code differs from those of the other neurons, so the position code constrains the function of the neuron and keeps it distinct from the functions of the other neurons. Suppose that, relative to the distribution of neurons in the model to be trained, the functions of the neurons at certain positions in the updated model change (that is, some neurons with different functions exchange positions). Because the position codes at these positions remain unchanged (a position code depends only on the position of the neuron), the output of the model after the position change differs from the output of the model without the position change. Such a change would destabilize the training process and degrade the performance of the trained model. When updating the parameters of each neuron, the client therefore keeps the function of the neuron at each position as unchanged as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively suppress the rearrangement invariance of the neural network model.
Since the other clients can perform the same operations as this client, neurons with the same function are located at the same positions in the updated models that the clients upload to the server. The server can therefore process the neurons in each updated model by position during aggregation, and the resulting trained model performs sufficiently well.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or the position codes of the plurality of neurons are determined jointly by the client and the server based on the positions of the plurality of neurons in the model to be trained.
In a possible implementation, the server receiving the updated model from the client and aggregating the updated model to obtain the trained model includes: the server obtains a parameter update amount and a position code update amount from the client, where both are obtained based on the updated model and the model to be trained; and the server updates the model to be trained based on the parameter update amount and the position code update amount until a model training condition is met, thereby obtaining the trained model.
In a possible implementation, the server receiving the updated model from the client and aggregating the updated model to obtain the trained model includes: the server obtains a parameter update amount from the client, where the parameter update amount is obtained based on the updated model and the model to be trained; and the server updates the model to be trained based on the parameter update amount until a model training condition is met, thereby obtaining the trained model.
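On the server side, one plausible reading of this step is to average the update amounts from the clients position by position and add the average to the model to be trained. The sketch below assumes exactly that; the application only says that the server updates the model based on the update amounts, so the unweighted average, the function name, and the data layout are assumptions.

```python
def aggregate_and_apply(model, client_deltas):
    """Average the clients' update amounts by position and add them to the model.

    model and each entry of client_deltas are dicts of {name: [floats]}.
    """
    n = len(client_deltas)
    return {name: [v + sum(d[name][k] for d in client_deltas) / n
                   for k, v in enumerate(values)]
            for name, values in model.items()}

# Made-up model and two made-up client update amounts.
model = {"layer2.weights": [0.5, -0.25]}
deltas = [{"layer2.weights": [0.25, -0.25]},
          {"layer2.weights": [0.75, 0.25]}]
new_model = aggregate_and_apply(model, deltas)
```

Because the entries are combined by position, the result is only meaningful when neurons with the same function sit at the same positions in every client's update, which is what the position codes are meant to encourage. The server would repeat this step each round until the model training condition is met.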
A third aspect of an embodiment of the present application provides a model training apparatus, including: the acquisition module is used for acquiring a model to be trained, wherein the model to be trained comprises a plurality of neurons, the neurons are associated with parameters and position codes, and the neurons correspond to the position codes one by one; the processing module is used for updating the model to be trained through the parameters, the position codes and the training data to obtain an updated model; and the sending module is used for sending the updated model.
In a possible implementation, the apparatus may be any one of the clients in a federated learning system. The obtaining module of the client is configured to obtain a model to be trained from a server, where the model to be trained includes a plurality of neurons, each of the neurons has a parameter and a position code, and neurons at different positions have different position codes. The processing module of the client is configured to: process the training data through the parameters and the position codes of the plurality of neurons in the model to be trained to obtain a processing result; and update the parameters of the plurality of neurons in the model to be trained based on the processing result to obtain an updated model. The sending module of the client is configured to send the updated model to the server, where the updated model is used for aggregation at the server to obtain the trained model.
As can be seen from the above, after a client obtains the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client can then update the parameters of the neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Each neuron of the model to be trained has a parameter and a position code, and its position code differs from those of the other neurons, so the position code constrains the function of the neuron and keeps it distinct from the functions of the other neurons. Suppose that, relative to the distribution of neurons in the model to be trained, the functions of the neurons at certain positions in the updated model change (that is, some neurons with different functions exchange positions). Because the position codes at these positions remain unchanged (a position code depends only on the position of the neuron), the output of the model after the position change differs from the output of the model without the position change. Such a change would destabilize the training process and degrade the performance of the trained model. When updating the parameters of each neuron, the client therefore keeps the function of the neuron at each position as unchanged as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively suppress the rearrangement invariance of the neural network model. Since the other clients can perform the same operations as this client, neurons with the same function are located at the same positions in the updated models that the clients upload to the server. The server can therefore process the neurons in each updated model by position during aggregation, and the resulting trained model performs sufficiently well.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or the position codes of the plurality of neurons are determined by the client and the server based on the positions of the plurality of neurons in the model to be trained.
In a possible implementation, the model to be trained includes N layers, the plurality of neurons are all neurons from layer 2 to layer N-1, and the processing module is configured to: perform a first calculation on the parameters of the jth neuron of the ith layer and the final outputs of all neurons of the (i-1)th layer to obtain an initial output of the jth neuron of the ith layer, where i = 2, ..., N-1; and perform a second calculation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer. The final output of the jth neuron of the ith layer is used for generating the processing result; that is, the final outputs of all neurons of the (N-1)th layer are used for generating the processing result.
In a possible implementation, in the model to be trained, layer 1 is the input layer and layer N is the output layer. The final outputs of all neurons of layer 1 are the training data. The parameters of the jth neuron of layer N are used for performing the first calculation on the final outputs of all neurons of layer N-1 to obtain the final output of the jth neuron of layer N, and the final outputs of all neurons of layer N are the processing result.
In one possible implementation, the processing module is configured to: perform an arithmetic operation (addition, subtraction, multiplication, or division) on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform a trigonometric function operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform an exponential operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform a logarithmic operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer.
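The layer-by-layer processing described in the preceding implementations can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the form of the first calculation (tanh of a weighted sum plus a bias), the concrete second calculations listed in `SECOND_CALCULATIONS`, and all names and numbers are assumptions chosen for the example.

```python
import math

def first_calculation(params, prev_outputs):
    # Assumed form of the "first calculation": tanh of a weighted sum.
    weights, bias = params
    return math.tanh(sum(w * v for w, v in zip(weights, prev_outputs)) + bias)

# Illustrative choices for the "second calculation" (initial output z, code p).
SECOND_CALCULATIONS = {
    "add": lambda z, p: z + p,                    # arithmetic operation
    "multiply": lambda z, p: z * p,               # arithmetic operation
    "sine": lambda z, p: math.sin(z + p),         # trigonometric operation
    "exponential": lambda z, p: z * math.exp(p),  # exponential operation
}

def forward(x, layers, pos_codes, second):
    """layers[k] lists the (weights, bias) of every neuron in layer k + 2;
    the last entry is output layer N. pos_codes[k][j] is the position code
    of neuron j in hidden layer k + 2."""
    outputs = list(x)  # layer 1: the final outputs are the training data
    for k, layer in enumerate(layers[:-1]):  # hidden layers 2 .. N-1
        initial = [first_calculation(p, outputs) for p in layer]
        outputs = [second(z, c) for z, c in zip(initial, pos_codes[k])]
    # Layer N: first calculation only, no position code.
    return [first_calculation(p, outputs) for p in layers[-1]]

layers = [
    [([0.5, -0.5], 0.0), ([1.0, 0.0], 0.0)],  # layer 2 (hidden)
    [([1.0, 1.0], 0.0)],                      # layer 3 (output, N = 3)
]
codes = [[0.1, 0.2]]  # one distinct code per hidden neuron
result = forward([1.0, 2.0], layers, codes, SECOND_CALCULATIONS["add"])
```

Passing a different entry of `SECOND_CALCULATIONS` as `second` switches between the operation families without touching the rest of the forward pass.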
In one possible implementation, the processing module is configured to: obtaining a target loss based on the processing result and a real processing result of the training data, the target loss being indicative of a difference between the processing result and the real processing result; and updating the parameters and the position codes based on the target loss to obtain an updated model, wherein the updating frequency of the position codes is less than that of the parameters.
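One way to make the position codes update less often than the parameters is to refresh them only every few optimization steps. The sketch below assumes a single neuron with a scalar weight `w` and an additive position code `code`, a mean squared error as the target loss, and plain gradient descent; `code_every`, the learning rate, and the data are hypothetical.

```python
def loss(data, w, code):
    # Target loss: mean squared difference between output and real result.
    return sum(((w * x + code) - t) ** 2 for x, t in data) / len(data)

def train(data, w, code, lr=0.05, steps=20, code_every=5):
    param_updates = code_updates = 0
    for step in range(1, steps + 1):
        grad_w = grad_c = 0.0
        for x, target in data:
            err = (w * x + code) - target
            grad_w += 2 * err * x / len(data)
            grad_c += 2 * err / len(data)
        w -= lr * grad_w              # parameters: updated on every step
        param_updates += 1
        if step % code_every == 0:    # position code: updated less often
            code -= lr * grad_c
            code_updates += 1
    return w, code, param_updates, code_updates

data = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5)]
w_new, code_new, param_updates, code_updates = train(data, w=0.0, code=0.0)
```

With these settings the weight is updated 20 times and the position code 4 times.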
In one possible implementation, the apparatus includes a sending module configured to: acquiring a parameter updating amount and a position coding updating amount based on the updated model and the model to be trained; and sending the parameter updating amount and the position code updating amount to the server, wherein the parameter updating amount and the position code updating amount are used for updating the model to be trained by the server until model training conditions are met, and obtaining the trained model.
In one possible implementation, the processing module is configured to: obtaining a target loss based on the processing result and a real processing result of the training data, the target loss being indicative of a difference between the processing result and the real processing result; and updating the parameters based on the target loss to obtain an updated model.
In one possible implementation, the apparatus includes a sending module configured to: acquiring a parameter updating amount based on the updated model and the model to be trained; and sending the parameter updating quantity to the server, wherein the parameter updating quantity is used for updating the model to be trained by the server until the model training condition is met, and obtaining the trained model.
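The parameter update amount can be read as the difference between the updated model and the model to be trained. A minimal sketch under that assumption, with parameters stored in a dictionary keyed by a hypothetical `(layer, neuron)` pair:

```python
def update_amount(updated, original):
    """Element-wise difference between two models with identical keys."""
    return {key: [u - o for u, o in zip(updated[key], original[key])]
            for key in original}

# Hypothetical parameters of the two neurons of layer 2.
model_to_train = {(2, 1): [0.5, -0.5], (2, 2): [1.0, 0.0]}
updated_model = {(2, 1): [0.75, -0.25], (2, 2): [1.0, 0.5]}

delta = update_amount(updated_model, model_to_train)
```

The same function applies unchanged to position codes when they are also updated, since they can be stored under their own keys.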
A fourth aspect of an embodiment of the present application provides a model training apparatus, including: a sending module, configured to send a model to be trained, where the model to be trained includes a plurality of neurons, the neurons are associated with parameters and position codes, and the neurons correspond one-to-one to the position codes; an obtaining module, configured to obtain an updated model, where the updated model is obtained by updating the model to be trained based on the parameters, the position codes, and training data; and an aggregation module, configured to aggregate the updated model to obtain a trained model.
In a possible implementation, the apparatus may be a server in a federated learning system. The sending module of the server is configured to send a model to be trained to a client, where the model to be trained includes a plurality of neurons, each of the neurons has a parameter and a position code, neurons at different positions have different position codes, and the parameters and position codes are used by the client to process training data to obtain a processing result and to update the parameters based on the processing result to obtain an updated model. The obtaining module is configured to obtain the updated model from the client, and the aggregation module is configured to aggregate the updated model to obtain a trained model. From the server's perspective: after a client obtains the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Each neuron of the model to be trained has a parameter and a position code, and its position code differs from those of the other neurons, so the position code constrains the function of the neuron and keeps it distinct from the functions of the other neurons. Suppose that, relative to the distribution of neurons in the model to be trained, the functions of the neurons at certain positions in the updated model change (i.e., some neurons with different functions exchange positions). Because the position codes at these positions remain unchanged (a position code depends only on the position of the neuron), the output of the model with exchanged positions differs from the output of the model without the exchange, which destabilizes the training process and degrades the performance of the trained model. Consequently, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model is as stable as possible. In this way, the position codes of the neurons effectively constrain the rearrangement invariance of the neural network model. Because the other clients can perform the same operations as this client, neurons with the same function are located at the same position in the updated models uploaded to the server by the clients. The server, which processes the neurons in each updated model by position during aggregation, can therefore obtain a trained model with sufficiently excellent functions.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or are determined by the client and the server based on the positions of the plurality of neurons in the model to be trained.
In a possible implementation manner, the obtaining module is configured to obtain a parameter update amount and a position code update amount from the client, where the parameter update amount and the position code update amount are obtained based on the updated model and the model to be trained; and the aggregation module is used for updating the model to be trained based on the parameter updating amount and the position coding updating amount until the model training condition is met, so as to obtain the trained model.
In a possible implementation manner, the obtaining module is configured to obtain a parameter update amount from the client, where the parameter update amount is obtained based on the updated model and the model to be trained; and the aggregation module is used for updating the model to be trained based on the parameter updating amount until the model training condition is met, so as to obtain the trained model.
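On the server side, one round of aggregation can be sketched as below. The sketch assumes that the server averages the update amounts received from all clients and adds the average to its local copy of the model to be trained; the dictionary layout with `"params"` and `"codes"` entries and all values are hypothetical. When only parameter update amounts are uploaded, the `"codes"` entry is simply omitted.

```python
def apply_average_update(model, client_updates):
    """Add the client-wise mean of the update amounts to the server's model."""
    n = len(client_updates)
    new_model = {}
    for name, values in model.items():
        mean = [sum(u[name][i] for u in client_updates) / n
                for i in range(len(values))]
        new_model[name] = [v + m for v, m in zip(values, mean)]
    return new_model

server_model = {"params": [0.5, -0.5], "codes": [0.0, 1.0]}
updates = [
    {"params": [0.25, 0.25], "codes": [0.0, 0.0]},     # from one client
    {"params": [0.75, -0.25], "codes": [0.5, -0.5]},   # from another client
]
server_model = apply_average_update(server_model, updates)
```

Repeating this step round after round, until the model training condition is met, yields the trained model.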
A fifth aspect of an embodiment of the present application provides a client, including a memory and a processor; the memory stores code and the processor is configured to execute the code, and when executed, the client performs the method according to the first aspect or any one of the possible implementations of the first aspect.
A sixth aspect of an embodiment of the present application provides a server, where the server includes a memory and a processor; the memory stores code and the processor is configured to execute the code, and when executed, the server performs the method according to the second aspect or any one of the possible implementations of the second aspect.
A seventh aspect of an embodiment of the present application provides a federated learning system, where the federated learning system includes the client according to the fifth aspect and the server according to the sixth aspect, and the client and the server are communicatively connected.
An eighth aspect of embodiments of the present application provides a circuit system, which includes a processing circuit configured to execute the method according to the first aspect, any one of the possible implementations of the first aspect, or the second aspect.
A ninth aspect of an embodiment of the present application provides a chip system, where the chip system includes a processor, and is configured to invoke a computer program or computer instructions stored in a memory, so as to cause the processor to execute the method according to the first aspect, any one of the possible implementation manners of the first aspect, or the second aspect.
In one possible implementation, the processor is coupled to the memory through an interface.
In one possible implementation, the system-on-chip further includes a memory having a computer program or computer instructions stored therein.
A tenth aspect of embodiments of the present application provides a computer storage medium storing a computer program which, when executed by a computer, causes the computer to implement the method according to the first aspect, any one of the possible implementations of the first aspect, or the second aspect.
An eleventh aspect of embodiments of the present application provides a computer program product storing instructions that, when executed by a computer, cause the computer to implement the method according to the first aspect, any one of the possible implementations of the first aspect, or the second aspect.
In the embodiments of the present application, after a client obtains a model to be trained from a server, the client processes local training data through the parameters and position codes of a plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Each neuron of the model to be trained has a parameter and a position code, and its position code differs from those of the other neurons, so the position code constrains the function of the neuron and keeps it distinct from the functions of the other neurons. Suppose that, relative to the distribution of neurons in the model to be trained, the functions of the neurons at certain positions in the updated model change (i.e., some neurons with different functions exchange positions). Because the position codes at these positions remain unchanged (a position code depends only on the position of the neuron), the output of the model with exchanged positions differs from the output of the model without the exchange, which destabilizes the training process and degrades the performance of the trained model. Consequently, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model is as stable as possible. In this way, the position codes of the neurons effectively constrain the rearrangement invariance of the neural network model. Because the other clients can perform the same operations as this client, neurons with the same function are located at the same position in the updated models uploaded to the server by the clients. The server, which processes the neurons in each updated model by position during aggregation, can therefore obtain a trained model with sufficiently excellent functions.
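The effect of position codes on rearrangement invariance can be checked numerically. The following sketch is an illustration under stated assumptions, not the embodiment itself: it uses one hidden layer of two tanh neurons, an additive position code, and a linear output, and all weights, codes, and names are hypothetical. It exchanges the two hidden neurons (together with their outgoing weights) while the codes stay attached to the positions.

```python
import math

def model_output(x, hidden, codes, out_weights):
    # hidden[j] = (input weights, bias) of the neuron stored at position j;
    # codes[j] is tied to position j, not to the neuron stored there.
    finals = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b) + c
              for (ws, b), c in zip(hidden, codes)]
    return sum(o * f for o, f in zip(out_weights, finals))

x = [1.0, 2.0]
hidden = [([0.5, -0.5], 0.1), ([1.0, 0.25], -0.2)]
out_w = [2.0, -1.0]

def swapped(codes):
    # Exchange the two neurons and their outgoing weights; codes stay in place.
    return model_output(x, hidden[::-1], codes, out_w[::-1])

same_codes = [0.0, 0.0]       # equivalent to having no position codes
distinct_codes = [0.1, 0.2]   # a different code at each position

gap_without = abs(model_output(x, hidden, same_codes, out_w) - swapped(same_codes))
gap_with = abs(model_output(x, hidden, distinct_codes, out_w) - swapped(distinct_codes))
```

With identical codes the exchange leaves the output unchanged (`gap_without` is zero), whereas with distinct codes the output shifts (`gap_with` is about 0.3 here), so an exchange of neuron positions is no longer free during training.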
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework;
FIG. 2 is a schematic structural diagram of a federated learning system provided in an embodiment of the present application;
FIG. 3 is a diagram illustrating an architecture of the system 100 according to an embodiment of the present application;
FIG. 4 is an illustration of an application of the federated learning system provided in an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating another application of the federated learning system provided in an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating another application of the federated learning system provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a federated learning system provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a model to be trained according to an embodiment of the present application;
FIG. 10 is another schematic structural diagram of a federated learning system provided in an embodiment of the present application;
FIG. 11 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a client according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a server according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a training apparatus provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a model training method and related equipment thereof, so that a trained model obtained by joint training of a client and a server has sufficiently excellent functions.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the manner in which objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As users' awareness of data security grows and data security incidents such as leaks of users' personal private data occur frequently, users place ever higher demands on the protection of data related to personal privacy, which poses a new challenge for model training in AI technology. Federated learning has emerged as a model training approach in response.
A federated learning system generally includes a server and a plurality of clients. During model training, the server first sends the model to be trained to each client. After receiving the model to be trained, each client trains it using locally stored training data to obtain an updated model. Each client then uploads its updated model to the server. Finally, the server aggregates the updated models uploaded by the clients to obtain the trained model.
Because different clients use different training data for the same model to be trained and are affected by rearrangement invariance of the neural network model (after the positions of some neurons with different functions in the model are exchanged, the output of the model is not changed), compared with the distribution of the neurons in the model to be trained, some neurons with different functions in the updated model obtained by some clients have position changes. It can be seen that, in the updated model obtained by each client, the neurons with the same function are not all located at the same position, but when the server implements aggregation, the neurons in each updated model are processed according to the position, which results in that the obtained trained model cannot have a sufficiently excellent data processing function.
For example, suppose the federated learning system contains a server, a client 1, and a client 2. In layer 2 of the model to be trained issued by the server, the 1st neuron implements function 1 and the 2nd neuron implements function 2. After client 1 inputs local data 1 into the model to be trained, it obtains processing result 1 and then updates the parameters of the neurons in the model to be trained based on processing result 1 to obtain updated model 1. In updated model 1, the function of the 1st neuron of layer 2 has changed to function 2, and the function of the 2nd neuron of layer 2 has changed to function 1. Similarly, after client 2 inputs local data 2 into the model to be trained, it obtains processing result 2 and then updates the parameters of the neurons in the model to be trained based on processing result 2 to obtain updated model 2. In updated model 2, the 1st neuron of layer 2 still implements function 1, and the 2nd neuron of layer 2 still implements function 2. Thus, compared with the neuron distribution of the model to be trained, the two neurons implementing function 1 and function 2 have exchanged positions in updated model 1, whereas their positions remain unchanged in updated model 2.
After obtaining updated model 1, client 1 may obtain parameter update amount 1 based on updated model 1 and the model to be trained, and send parameter update amount 1 to the server. Parameter update amount 1 includes the parameter update amount of each neuron in updated model 1, for example, that of the 1st neuron of layer 2, that of the 2nd neuron of layer 2, and so on. Similarly, client 2 may obtain parameter update amount 2 (including the parameter update amount of each neuron in updated model 2, for example, that of the 1st neuron of layer 2, that of the 2nd neuron of layer 2, and so on) and send it to the server. After obtaining parameter update amount 1 and parameter update amount 2, the server may average them to obtain an average parameter update amount (including the average parameter update amount of each neuron, for example, that of the 1st neuron of layer 2, that of the 2nd neuron of layer 2, and so on). Because the server stores the model to be trained locally, it can update the parameters of each neuron in the model to be trained based on the average parameter update amount.
The server can then repeat the foregoing process with client 1 and client 2 until the model satisfies the preset model training condition, thereby obtaining the trained model. However, a trained model obtained in this way cannot have a sufficiently excellent data processing function. For example, when the server calculates the average parameter update amount of the 1st neuron of layer 2, it averages the parameter update amount of the 1st neuron of layer 2 in updated model 1 and that of the 1st neuron of layer 2 in updated model 2. However, the 1st neuron of layer 2 in updated model 1 implements function 2, whereas the 1st neuron of layer 2 in updated model 2 implements function 1. When the parameters of the 1st neuron of layer 2 in the model to be trained are updated based on this average, the neuron becomes functionally disordered. Hence, neurons at multiple positions in the trained model may be functionally disordered, and the trained model cannot have an excellent data processing function.
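The failure described in this example can be reproduced with a toy network. The sketch below is illustrative only: the two-neuron tanh hidden layer, the linear output, and every number are assumptions. It averages the two updated models position by position, which is equivalent to averaging their parameter update amounts when both clients start from the same model to be trained.

```python
import math

def output(x, hidden, out_weights):
    acts = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b) for ws, b in hidden]
    return sum(o * a for o, a in zip(out_weights, acts))

def average_by_position(model_a, model_b):
    (hid_a, out_a), (hid_b, out_b) = model_a, model_b
    hidden = [([(p + q) / 2 for p, q in zip(wa, wb)], (ba + bb) / 2)
              for (wa, ba), (wb, bb) in zip(hid_a, hid_b)]
    out = [(p + q) / 2 for p, q in zip(out_a, out_b)]
    return hidden, out

f1 = ([0.5, -0.5], 0.1)    # neuron implementing "function 1"
f2 = ([1.0, 0.25], -0.2)   # neuron implementing "function 2"

model_1 = ([f2, f1], [-1.0, 2.0])  # updated model 1: positions exchanged
model_2 = ([f1, f2], [2.0, -1.0])  # updated model 2: positions unchanged

x = [1.0, 2.0]
y1 = output(x, *model_1)
y2 = output(x, *model_2)
y_avg = output(x, *average_by_position(model_1, model_2))
```

Here `y1` and `y2` agree, because the two updated models compute the same function, yet `y_avg` is far from both: averaging by position mixes a neuron implementing function 1 with a neuron implementing function 2.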
To solve the above problem, the embodiments of the present application provide a model training method based on federated learning, which can be implemented in combination with artificial intelligence (AI) technology. AI technology is a technical discipline that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, obtaining the best results by perceiving the environment, acquiring knowledge, and using knowledge. In other words, artificial intelligence technology is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Data processing using artificial intelligence is a common application of artificial intelligence.
The general workflow of an artificial intelligence system is described first. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an artificial intelligence main framework, which is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by a base platform. Communication with the outside is carried out through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphs, images, speech, and text, as well as Internet of Things data from conventional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or an intelligent system, in which the machine uses formalized information to think about and solve problems according to a reasoning control strategy; typical functions are searching and matching.
Decision making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decision making, and put it into practical use. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
Several application scenarios of the present application are presented next.
FIG. 2 is a schematic structural diagram of a federated learning system provided in an embodiment of the present application. The federated learning system includes a server and a plurality of clients, and the server and the clients may be connected through a communication network. A client may be an intelligent terminal such as a mobile phone, a personal computer, or an information processing center. The server may be a device or server with a data processing function, such as a cloud server, a network server, an application server, or a management server. The clients and the server can cooperatively train a neural network model.
Multiple iterations of model training can be carried out between the clients and the server. Specifically, in the first iteration, each client can receive the model to be trained from the server through an interactive interface and then, using a memory that stores local data and a processor that processes data, train the model by means of machine learning, deep learning, searching, reasoning, decision making, and the like. After a client completes model training (that is, updates the parameters of the model to be trained), it can upload the updated model to the server, so that the server aggregates the updated models uploaded by the clients and trains its local model to be trained based on the aggregation result. The server may then send the updated model obtained from its own training to each client as a new model to be trained, so as to perform the second iteration of model training (that is, repeat the foregoing process). After multiple iterations, once the server determines that the updated model obtained from its most recent training satisfies the model training condition, it can use that model as the trained model. In this way, the server completes model training by using each client's local data only indirectly, which keeps each client's data secure and thereby protects the users' personal privacy.
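The iterative interaction between the server and the clients can be sketched as a simple loop. Everything concrete here is an assumption made for illustration: the local training step (`local_train` moves each parameter halfway toward a per-client target), the model training condition (the largest averaged update falls below `tol`), and the names and numbers.

```python
def local_train(model, local_target, lr=0.5):
    # Hypothetical local step: move each parameter toward the client's optimum.
    return [w - lr * (w - t) for w, t in zip(model, local_target)]

def federated_training(model, client_targets, tol=1e-6, max_rounds=100):
    for round_no in range(1, max_rounds + 1):
        deltas = []
        for target in client_targets:             # each client trains locally
            updated = local_train(model, target)
            deltas.append([u - w for u, w in zip(updated, model)])
        # Server: average the uploaded update amounts, then update its model.
        avg = [sum(col) / len(deltas) for col in zip(*deltas)]
        model = [w + a for w, a in zip(model, avg)]
        if max(abs(a) for a in avg) < tol:         # assumed training condition
            break
    return model, round_no

trained, rounds = federated_training([0.0, 0.0], [[1.0, 0.0], [3.0, 2.0]])
```

Only update amounts leave the clients in this loop; the local data behind `client_targets` never reaches the server.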
It should be noted that, in order to further ensure data security, in each iteration, each client may upload the parameter update amount of the model to the server, so as to represent the updated model obtained by each client. Then, the server can average the parameter updating quantities from the clients, and update the parameters of the local model to be trained of the server based on the average value of the parameter updating quantities, so as to implement model training of the server.
In fig. 2, a server and a client may jointly execute the model training method according to the embodiment of the present application.
In addition, in the federated learning system provided in the embodiment of the present application, the trained model obtained by the server has a data processing function and can therefore be deployed on each client. Each client can then provide a data processing service for users: after a client obtains to-be-processed data input by a user, it can call the trained model to process that data and return the corresponding processing result to the user.
In fig. 2, a client may use a trained model obtained by the model training method of the embodiment of the present application to implement a data processing function.
FIG. 3 is a schematic diagram of an architecture of the system 100 according to an embodiment of the present application. In FIG. 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through a client device 140, where the input data may include: each task to be scheduled, the resources that can be invoked, and other parameters.
During the process that the execution device 110 preprocesses the input data or during the process that the calculation model 111 of the execution device 110 performs calculations and other related processes (such as performing the functional implementation of the neural network in the present application), the execution device 110 may call data, codes and the like in the data storage system 150 for corresponding processes, and may store the data, instructions and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate corresponding target models/rules based on different training data for different targets or different tasks, and the corresponding target models/rules may be used to achieve the targets or complete the tasks, so as to provide the user with the required results. Wherein the training data may be stored in the database 130 and derived from training samples collected by the data collection device 160.
In the case shown in fig. 3, the user may manually specify the input data through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112; if automatically sending the input data requires the user's authorization, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the result may be presented in the form of display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting the input data fed to the I/O interface 112 and the output results returned by the I/O interface 112 as new sample data and storing them in the database 130. Of course, the input data and the output results shown in the figure may also be stored in the database 130 as new sample data directly by the I/O interface 112, without being collected by the client device 140.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 3, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110. As shown in fig. 3, a neural network may be trained from the training device 120.
It should be noted that, in the embodiment of the present application, the training device 120 generally refers to the aforementioned server, the executing device 110 generally refers to the aforementioned client, and the training device 120 may implement model training together with the executing device 110 when training a model, that is, both may implement model training in a federal learning manner.
The embodiment of the application also provides a chip, which comprises the NPU. The chip may be provided in an execution device 110 as shown in fig. 3 to perform the computational work of the computational model 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 3 to complete the training work of the training apparatus 120 and output the target model/rule.
The neural network processor (NPU) is mounted as a coprocessor on the main central processing unit (CPU) (host CPU), and the main CPU distributes tasks to it. The core portion of the NPU is an arithmetic circuit; a controller controls the arithmetic circuit to fetch data from a memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuitry includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit is a two-dimensional systolic array. The arithmetic circuit may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory and carries out matrix operation with the matrix B, and partial results or final results of the obtained matrix are stored in an accumulator (accumulator).
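The arithmetic in the example above can be sketched in software as follows. This is only an illustration of accumulating partial results of C = A x B; it does not model the PE array, the memories, or the dataflow of the hardware, and the function name `matmul_accumulate` and the sample matrices are assumptions made for this sketch.

```python
def matmul_accumulate(a, b):
    """Compute C = A x B by adding one partial result per inner index k."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # 'acc' plays the role of the accumulator holding partial results.
    acc = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):
        for i in range(rows):
            for j in range(cols):
                # Add the contribution of column k of A and row k of B.
                acc[i][j] += a[i][k] * b[k][j]
    return acc


# Hypothetical input matrix A and weight matrix B.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = matmul_accumulate(a, b)
```

After the last value of k has been processed, the accumulator holds the final result of the output matrix C; before that it holds partial results.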
The vector calculation unit may further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector computation unit may be used for network computation of the non-convolution/non-FC layer in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit can store the processed output vector to a unified buffer. For example, the vector calculation unit may apply a non-linear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to arithmetic circuitry, e.g., for use in subsequent layers in a neural network.
The unified memory is used for storing input data and output data.
A direct memory access controller (DMAC) is used to transfer input data in the external memory to the input memory and/or the unified memory, to store the weight data in the external memory into the weight memory, and to store data in the unified memory into the external memory.
And the Bus Interface Unit (BIU) is used for realizing interaction among the main CPU, the DMAC and the instruction fetch memory through a bus.
An instruction fetch buffer is connected to the controller and is used for storing instructions used by the controller.
The controller is used for calling the instructions cached in the instruction fetch buffer to control the working process of the operation accelerator.
Generally, the unified memory, the input memory, the weight memory, and the instruction fetch memory are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
In addition, the federated learning system provided by the embodiment of the application can also be applied to various fields, which will be described below. Fig. 4 is an application illustration of the federated learning system provided in the embodiment of the present application. As shown in fig. 4, the federated learning system may be applied to the smart home field, in which case the plurality of clients in the system are smart home devices located in a plurality of households, and the households are located at different geographic locations. The smart home devices can communicate with a server in the cloud (i.e., the aforementioned server) to implement federated learning.
The cloud server can be used for realizing repeated iterative model training with a plurality of intelligent household devices in order to deploy a neural network model with a voice recognition function on each intelligent household device. In the first iteration, each intelligent household device can receive the model to be trained from the server, then update the parameters of the model to be trained through local voice data, and upload the updated model to the server, so that the server aggregates the updated model uploaded by each intelligent household device, and the local model to be trained of the server is trained based on the aggregated result. Then, the server may send the updated model obtained by the training of the server itself to each smart home device again as a new model to be trained, so as to perform the second iteration of model training (i.e., repeatedly perform the foregoing process). Therefore, after multiple iterations, the server determines that the updated model obtained by the last training of the server meets the model training condition, and then deploys the updated model obtained by the last training of the server to each intelligent household device as a neural network model capable of realizing the voice recognition function, so as to provide intelligent household service for each family.
Fig. 5 is a schematic view illustrating another application of the federated learning system provided in this embodiment of the present application. As shown in fig. 5, the federated learning system may be applied in the field of teaching, in which case the plurality of clients in the system are teaching devices (e.g., personal computers, tablet computers, etc.) located at a plurality of schools, and the schools are located at different geographic locations. The teaching devices may communicate with a server of a solver developer (i.e., the aforementioned server) to implement federated learning.
In order to deploy a solver, namely a neural network model with an equation solving function, on each teaching device, a server of a solver developer can realize model training of multiple iterations with a plurality of teaching devices. In the first iteration, each teaching device can receive the model to be trained from the server, update the parameters of the model to be trained through local mathematical data, and upload the updated model to the server, so that the server aggregates the updated model uploaded by each teaching device, and trains the local model to be trained of the server based on the aggregated result. Then, the server may send the updated model obtained by the training thereof as a new model to be trained to each teaching device again, so as to perform the second iteration of model training (i.e. repeatedly perform the above-mentioned process). Therefore, after multiple iterations, the server determines that the updated model obtained by the last training of the server meets the model training condition, and the updated model obtained by the last training of the server can be used as a solver to be deployed on each teaching device, so that teaching service is provided for students and teachers in each school.
Fig. 6 is a schematic view illustrating another application of the federated learning system provided in an embodiment of the present application. As shown in fig. 6, the federated learning system may be applied in the field of software services, in which case the plurality of clients in the system are intelligent terminal devices used by a plurality of users or enterprises. The intelligent terminal devices can communicate with a remote server of a software developer (i.e., the aforementioned server) to implement federated learning.
In order to deploy image processing software, namely a neural network model with image classification, on each intelligent terminal device, a server of a software developer can realize model training of multiple iterations with a plurality of intelligent terminal devices. In the first iteration, each intelligent terminal device can receive the model to be trained from the server, update the parameters of the model to be trained through local image data, and upload the updated model to the server, so that the server aggregates the updated model uploaded by each intelligent terminal device, and the local model to be trained of the server is trained based on the aggregation result. Then, the server may send the updated model obtained by the training of the server itself to each intelligent terminal device again as a new model to be trained, so as to execute the second iteration of model training (i.e., repeatedly execute the foregoing process). Therefore, after multiple iterations, the server determines that the updated model obtained by the last training of the server meets the model training condition, and then can deploy the updated model obtained by the last training of the server to each intelligent terminal device as image processing software to provide image processing service for enterprises and individuals.
It should be understood that the above description is only schematically presented in the context that the federal learning system is applicable to the smart home field, the teaching field, and the software service field, and in practical applications, the federal learning system provided in the embodiments of the present application can also be applied to more fields, which are not described herein one by one.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:
$$f\left(\sum_{s=1}^{n} W_s x_s + b\right) \qquad (1)$$
where s = 1, 2, …, n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as an input to the next convolutional layer. The activation function may be, for example, a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e. the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with a local receptive field of the previous layer to extract the features of that local receptive field, and the local receptive field may be a region composed of several neural units.
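As a minimal sketch of the neural unit described above, the following computes the weighted sum of the inputs plus the bias and passes it through a sigmoid activation. The function names and the sample inputs are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f applied to sum_s(W_s * x_s) + b."""
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


# Hypothetical inputs: the weighted sum is 0.5 - 0.5 + 0.0 = 0.0.
out = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
```

With these sample values the weighted sum is zero, so the sigmoid returns 0.5.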
The operation of each layer in a neural network can be described by the mathematical expression $y = a(Wx + b)$. At the physical level, the work of each layer can be understood as completing the transformation from the input space to the output space (i.e. from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are performed by $Wx$, operation 4 is performed by $+b$, and operation 5 is performed by $a()$. The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the collection of all individuals of that class. W is a weight vector, and each value in the vector represents the weight value of a neuron in this layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, i.e. the weight W of each layer controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrix formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrix.
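The layer operation described above can be illustrated with a small sketch. ReLU is used as the function a() purely as an assumption, and the function name `dense_layer` and the sample values of W, x and b are hypothetical.

```python
def relu(values):
    """Element-wise ReLU, used here as the 'bending' operation a()."""
    return [max(0.0, v) for v in values]


def dense_layer(W, x, b, a=relu):
    """Compute y = a(Wx + b) for one layer."""
    # Wx: raises/lowers the dimension, scales and rotates the input.
    # + b: translates the result.
    pre_activation = [
        sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)
    ]
    return a(pre_activation)


# A 3 x 2 weight matrix maps a 2-dimensional input to a 3-dimensional output.
W = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
x = [2.0, 3.0]
b = [0.5, 0.5, -1.0]
y = dense_layer(W, x, b)
```

Here Wx + b equals [2.5, -2.5, 4.0], and the activation clips the negative entry to zero.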
Because it is desirable that the output of the neural network be as close as possible to the value actually desired, the weight vector of each layer can be updated by comparing the predicted value of the current network with the target value and adjusting the weight vector according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are configured in advance for each layer of the neural network). Therefore, how to measure the difference between the predicted value and the target value must be defined in advance. This is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the neural network becomes a process of reducing the loss as much as possible.
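To make the role of the loss function concrete, the sketch below assumes a squared-error loss and a one-weight model pred = w * x, and applies one gradient-descent update. The loss choice, the learning rate and all names are assumptions for illustration only.

```python
def squared_error(pred, target):
    """Loss: a larger value means a larger gap between prediction and target."""
    return (pred - target) ** 2


def gradient_step(w, x, target, lr):
    """One update of the weight w for the model pred = w * x."""
    pred = w * x
    grad = 2.0 * (pred - target) * x  # derivative of (w*x - target)^2 w.r.t. w
    return w - lr * grad


w, x, target = 0.0, 1.0, 2.0          # w = 0.0 is the initialization
loss_before = squared_error(w * x, target)
w = gradient_step(w, x, target, lr=0.25)
loss_after = squared_error(w * x, target)
```

The update moves the weight from 0.0 to 1.0, and the loss drops from 4.0 to 1.0.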
(2) Back propagation algorithm
During training, the neural network can use a back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until the output, which produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, aiming at obtaining the optimal parameters of the neural network model, such as the weight matrix.
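The forward and backward passes described above can be sketched on a deliberately tiny network with one hidden neuron and one output neuron. The network shape, the ReLU activation, the squared-error loss and the learning rate are all assumptions made for this illustration.

```python
def forward_backward(x, target, w1, w2):
    """Forward pass, then gradients of the loss obtained by the chain rule."""
    # Forward pass: input -> hidden neuron (ReLU) -> output -> error loss.
    z = w1 * x
    h = max(0.0, z)
    y = w2 * h
    loss = (y - target) ** 2
    # Backward pass: propagate the error from the output back to each weight.
    d_y = 2.0 * (y - target)
    d_w2 = d_y * h
    d_h = d_y * w2
    d_z = d_h * (1.0 if z > 0.0 else 0.0)
    d_w1 = d_z * x
    return loss, d_w1, d_w2


x, target = 1.0, 0.0
w1, w2 = 0.5, 2.0
loss_before, g1, g2 = forward_backward(x, target, w1, w2)

lr = 0.0625
w1, w2 = w1 - lr * g1, w2 - lr * g2   # update both weights
loss_after, _, _ = forward_backward(x, target, w1, w2)
```

Repeating the forward pass, backward pass and update is what drives the error loss toward convergence.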
(3) Federated learning
Federated learning is a machine learning technology for protecting user privacy. Structurally, it generally comprises a server (central server) and a plurality of clients as participants, and its technical process mainly comprises a model issuing process and a model aggregation process. In the model issuing process, a client downloads the model from the server, trains the model on local data, and uploads the model to the server once the training has reached a certain degree. In the model aggregation process, the server collects the models uploaded by the clients and fuses them. The two processes are iterated repeatedly until the model converges, so that a trained model is obtained.
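The alternation of model issuing and model aggregation can be sketched as a loop. The sketch assumes a one-parameter model y = w * x, plain averaging as the fusion step, and hypothetical client datasets; only the trained weight, never the local data, is returned to the server.

```python
def local_train(w, data, lr=0.1, steps=5):
    """Client side: a few gradient steps on local data for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_server, client_datasets):
    """One round: issue the model, train it on each client, fuse by averaging."""
    uploaded = [local_train(w_server, data) for data in client_datasets]
    return sum(uploaded) / len(uploaded)


# Hypothetical local datasets; both follow y = 2 * x.
client_datasets = [
    [(1.0, 2.0), (2.0, 4.0)],   # stays on client 1
    [(3.0, 6.0)],               # stays on client 2
]

w = 0.0
converged = False
for round_index in range(50):
    w_next = federated_round(w, client_datasets)
    converged = abs(w_next - w) < 1e-9
    w = w_next
    if converged:
        break
```

The loop stops once the server model no longer changes between rounds, which stands in for the convergence condition mentioned above.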
(4) Federated aggregation
Federated aggregation is a subprocess of federated learning. The main task of the server in federated learning is to aggregate the models uploaded by the clients; federated aggregation is the process in which the server fuses a plurality of models into one model.
(5) Parameter point-to-point aggregation
Parameter point-to-point aggregation is the simplest federated aggregation mode. It requires that the models uploaded by the clients have the same structure, so that the server can average the parameters of the neurons at the same position in those models.
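A minimal sketch of parameter point-to-point aggregation follows. It assumes each uploaded model is represented as a dictionary mapping a layer name to a list of per-neuron weight lists; this representation and the sample values are hypothetical.

```python
def point_to_point_average(models):
    """Average the parameters of neurons at the same position in each model."""
    first = models[0]
    aggregated = {}
    for layer_name, neurons in first.items():
        aggregated[layer_name] = [
            [
                # Same layer, same neuron j, same weight k in every model.
                sum(m[layer_name][j][k] for m in models) / len(models)
                for k in range(len(neurons[j]))
            ]
            for j in range(len(neurons))
        ]
    return aggregated


# Two uploaded models with identical structure (one layer, two neurons).
client_1 = {"layer2": [[1.0, 2.0], [3.0, 4.0]]}
client_2 = {"layer2": [[3.0, 0.0], [1.0, 2.0]]}
merged = point_to_point_average([client_1, client_2])
```

The averaging pairs neurons purely by their index, which is why the uploaded models must share the same structure.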
The method provided by the present application is described below from the training side of the neural network and the application side of the neural network.
The model training method provided by the embodiment of the application relates to data processing, and the client can specifically apply methods such as data training, machine learning and deep learning to perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on training data (for example, the training data locally stored by the client in the application), finally obtain an updated neural network (for example, an updated model obtained by the client based on the training data in the application) and return the updated neural network to the server for aggregation, so as to obtain a trained neural network (for example, a trained model obtained by the server aggregating based on the updated model in the application); in addition, the trained neural network obtained by the model training method provided in the embodiment of the present application may be deployed at the client by the server, so that the client realizes a data processing function, that is, input data is input to the trained neural network deployed in the client, thereby obtaining output data (i.e., a processing result of the input data). It should be noted that the model training method provided in the embodiment of the present application and the model obtained based on the model training method to realize the data processing function are inventions generated based on the same concept, and may also be understood as two parts in a system or two stages of an overall process: such as a model training phase and a model application phase.
Fig. 7 is a schematic flowchart of a model training method provided in the embodiment of the present application. The method may be implemented by the federated learning system shown in fig. 8 (fig. 8 is a schematic structural diagram of the federated learning system provided in the embodiment of the present application). As shown in fig. 8, the system includes a server and a plurality of clients, each of which includes a computing device (e.g., CPU, GPU, etc.) for training the model to be trained and a transmission device (e.g., communication interface, etc.) for transmitting the model or information related to the model, so that the server and the clients can train the model in a federated learning manner. To further explain this process, it is described below with reference to fig. 7. As shown in fig. 7, the method includes:
701. the client side obtains a model to be trained from the server side.
In this embodiment, when the server needs to obtain the neural network model with a data processing function (e.g., image processing, voice processing, text processing, etc.), the server may first obtain a model to be trained (i.e., the neural network model that needs to be trained), and send the model to the plurality of clients.
It should be noted that the model to be trained issued by the server includes N layers (N is an integer greater than or equal to 3): the 1st layer is an input layer, the 2nd to (N-1)th layers are intermediate layers, the Nth layer is an output layer, and each layer includes at least one neuron. All neurons of layer 1 are used for receiving input data, so they may have neither parameters nor position codes. All neurons of layer 2 to layer N-1 are used for data processing, so they have parameters (i.e. the aforementioned parameter information, such as weights, biases, etc.) and position codes (i.e. the aforementioned position code information). All neurons of layer N are used for outputting the processing result of the data, so they have only parameters (such as weights, etc.). Among the plurality of neurons formed by all neurons of layer 2 to layer N-1, the position code of any one neuron is associated with the position of that neuron in the model to be trained, and neurons at different positions usually have different position codes.
In particular, the position coding of the neurons can be set in a number of ways, which will be described separately below:
In one possible implementation, for all neurons of layer 2 to layer N-1, the position codes of the plurality of neurons may be determined by the server based on the positions of the plurality of neurons in the model to be trained, i.e., the position code of any one of these neurons is defined by the server based on the position of that neuron in the model to be trained. For example, suppose the model to be trained includes 4 layers, where the 1st layer is an input layer, the 4th layer is an output layer, the 2nd layer includes 3 neurons and the 3rd layer includes 4 neurons. The server can define the position code of the 1st neuron of layer 2 as 1 and set it in that neuron, so that the 1st neuron of layer 2 has not only its own parameters but also its own position code. Similarly, the server can define the position code of the 2nd neuron of layer 2 as 2 and set it in that neuron, so that the 2nd neuron of layer 2 has not only its own parameters but also its own position code, and so on, until the server defines the position code of the 4th neuron of layer 3 as 7 and sets it in that neuron, so that the 4th neuron of layer 3 has not only its own parameters but also its own position code. In this way, the server may define the position codes of all neurons of layer 2 and layer 3 in the model to be trained, and set the position codes in the corresponding neurons.
In another possible implementation manner, for all neurons of layer 2 to layer N-1, the position codes of the plurality of neurons may be determined jointly by the plurality of clients and the server based on the positions of the plurality of neurons in the model to be trained, that is, the position code of any one of these neurons is agreed in advance by the server and the plurality of clients based on the position of that neuron in the model to be trained. For example, as shown in fig. 9 (fig. 9 is a schematic structural diagram of a model to be trained provided in this embodiment; it should be noted that fig. 9 only illustrates the position codes of the neurons and does not illustrate their parameters), suppose the federated learning system includes a server, a client 1 and a client 2, and the model to be trained includes 4 layers, where the 1st layer is an input layer, the 4th layer is an output layer, the 2nd layer includes 3 neurons and the 3rd layer includes 4 neurons. The server, the client 1 and the client 2 can agree that the position code of the 1st neuron of layer 2 is 1 and set it in that neuron, so that the 1st neuron of layer 2 has not only its own parameters but also its own position code. Similarly, the server, the client 1 and the client 2 can agree that the position code of the 2nd neuron of layer 2 is 2 and set it in that neuron, so that the 2nd neuron of layer 2 has not only its own parameters but also its own position code, and so on, until the server, the client 1 and the client 2 agree that the position code of the 4th neuron of layer 3 is 7 and set it in that neuron, so that the 4th neuron of layer 3 has not only its own parameters but also its own position code.
In this way, the server, the client 1 and the client 2 may agree in advance on the position codes of all neurons in the layer 2 and the position codes of all neurons in the layer 3 in the model to be trained, and set the position codes in the corresponding neurons.
Because the server issues the same model to be trained to the plurality of clients, the position codes of the plurality of neurons in the model received by any one client are the same set of position codes as those in the model received by every other client.
In the multiple clients, each client performs the same operation on the model to be trained, so that the following description will schematically take one of the multiple clients as an example. For a certain client, after receiving the model to be trained sent by the server, the client can utilize local data stored by the client as training data to train the model to be trained.
It should be understood that, in the foregoing example, the model to be trained is schematically illustrated by only including 2 intermediate layers, and the layer 2 includes 3 neurons, and the layer 3 includes 4 neurons, without limiting the number of intermediate layers and the number of neurons in the layers of the model to be trained in this application.
It should also be understood that, in the foregoing examples, the position codes of the neurons are only exemplified as 1 to 7, and the size of the position codes of the neurons is not limited.
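The numbering used in the foregoing example (codes 1 to 3 for layer 2 and 4 to 7 for layer 3) can be reproduced with the sketch below. Consecutive integers are only the scheme of that example, not a required form of position code, and the function name and the input/output layer sizes are hypothetical.

```python
def assign_position_codes(neurons_per_layer):
    """Number the neurons of the intermediate layers consecutively from 1.

    neurons_per_layer[0] is the input layer and neurons_per_layer[-1] is the
    output layer; neither of them receives position codes.
    """
    codes = {}
    next_code = 1
    num_layers = len(neurons_per_layer)
    for layer in range(2, num_layers):            # layers 2 .. N-1 (1-based)
        for j in range(1, neurons_per_layer[layer - 1] + 1):
            codes[(layer, j)] = next_code         # key: (layer, neuron index)
            next_code += 1
    return codes


# 4 layers: layer 2 has 3 neurons, layer 3 has 4 neurons.
# The sizes 5 and 2 of the input and output layers are made up.
codes = assign_position_codes([5, 3, 4, 2])
```

Because every participant derives the codes from the same structure, the server and all clients end up with the same set of position codes.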
702. The client processes the training data through a plurality of neurons of the model to be trained to obtain a processing result, each neuron in the plurality of neurons has parameters and position codes, and the neurons at different positions in the plurality of neurons have different position codes.
After receiving the model to be trained, the client can use the local data it stores as training data and input the training data into the model to be trained, so as to process the training data through a plurality of neurons of the model to be trained and obtain a processing result of the training data. It should be noted that, among all neurons of layer 2 to layer N-1 in the model to be trained, each neuron has parameters and a position code, so these neurons, as the data processing units of the model to be trained, can process (calculate) the training data by using their own parameters and position codes, thereby obtaining the processing result of the training data.
Specifically, the client may obtain a processing result of the training data by using the model to be trained in the following manner:
because the model to be trained comprises N layers, the number of neurons in each layer can be recorded as follows:
$$d_i,\quad i = 1, 2, \ldots, N \qquad (2)$$
in the above formula, $d_1$ is the number of neurons in the input layer, which is equivalent to the dimension of the input training data (i.e., the input dimension), and $d_N$ is the number of neurons in the output layer, which corresponds to the dimension of the processing result of the training data (i.e., the output dimension).
Then, note the parameters of each layer as:
$$W_i \in \mathbb{R}^{d_i \times d_{i-1}},\quad b_i \in \mathbb{R}^{d_i},\quad i = 2, \ldots, N \qquad (3)$$
in the above formula, $W_i$ is the weight of the neurons in the ith layer, and $b_i$ is the bias of the neurons in the ith layer. The elements in the jth row of $W_i$ are the weights of the jth neuron of the ith layer, and the element in the jth row of $b_i$ is the bias of the jth neuron of the ith layer; both can be considered as parameters of the jth neuron of the ith layer.
Then, denote the position code of the jth neuron in the ith layer as:
$$g_i(j),\quad j = 1, 2, \ldots, d_i \qquad (4)$$
in the above formula, since only neurons of the intermediate layers (i.e., layers 2 to N-1) in the model to be trained have position codes, $i = 2, \ldots, N-1$.
Next, denote the activation function of the intermediate layers (i.e., layers 2 to N-1) as:
$$f_i,\quad i = 2, \ldots, N-1 \qquad (5)$$
then, the training data of the model to be trained is recorded as:
$$x \in \mathbb{R}^{d_1} \qquad (6)$$
in the above formula, the training data is the final output of all neurons in layer 1 (including the final output of each neuron in layer 1), so x can also be written as $h_1$; $h_1$ comprises $d_1$ elements, i.e. $h_1$ is $d_1$-dimensional.
Then, after the client inputs the training data x into the model to be trained, all neurons in layer 1 can send the final output $h_1$ of all neurons in layer 1 to each neuron in layer 2. After each neuron of layer 2 performs its calculation on $h_1$, the final output $h_2$ of all neurons in layer 2 is obtained ($h_2$ comprises $d_2$ elements, i.e. $h_2$ is $d_2$-dimensional, and $h_2$ contains the final output of each neuron in layer 2). All neurons in layer 2 can then send $h_2$ to each neuron in layer 3, and so on, until all neurons in layer N-2 send the final output $h_{N-2}$ of all neurons in layer N-2 to each neuron in layer N-1. After each neuron of layer N-1 performs its calculation on $h_{N-2}$, the final output $h_{N-1}$ of all neurons in layer N-1 is obtained ($h_{N-1}$ comprises $d_{N-1}$ elements, i.e. $h_{N-1}$ is $d_{N-1}$-dimensional, and $h_{N-1}$ contains the final output of each neuron in layer N-1). It can be seen that the calculation performed by the ith layer is as shown in the following equation:
Figure BDA0003566655930000203
In the above formula, i = 2, ..., N-1. h_{i-1} is the final output of all neurons of layer i-1, and h_i' is the initial output of all neurons of the ith layer. h_i'(j) is the element in the jth row of h_i', i.e., the initial output of the jth neuron of the ith layer, and h_i(j) is the element in the jth row of the final output h_i of all neurons of the ith layer, i.e., the final output of the jth neuron of the ith layer. j = 1, ..., M, where M is the number of neurons of the ith layer and varies with i; for example, when i = 2, M = d_2, and when i = N-1, M = d_{N-1}.
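The image of formula (7) is not reproduced in this text. Judging only from the surrounding description (a first-row formula that applies the activation function f_i to the parameters of a neuron and the final output of the previous layer, and a second-row formula that multiplies the result by the position code), one plausible reading is the following. The affine form W_i(j) h_{i-1} + b_i(j) is an assumption and not a quotation of the original formula:

```latex
\begin{aligned}
h_i'(j) &= f_i\bigl(W_i(j)\,h_{i-1} + b_i(j)\bigr), \\
h_i(j)  &= g_i(j)\cdot h_i'(j), \qquad j = 1, \dots, M .
\end{aligned}
```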
Based on formula (7), when i = 2, ..., N-1, the jth neuron of the ith layer performs a first calculation on the parameters of the jth neuron of the ith layer and the final output h_{i-1} of all neurons of layer i-1 (for the first calculation, refer to the first-row formula in formula (7)) to obtain the initial output h_i'(j) of the jth neuron of the ith layer. Then, the jth neuron of the ith layer performs a second calculation on its position code g_i(j) and its initial output h_i'(j) (for the second calculation, refer to the second-row formula in formula (7)) to obtain the final output h_i(j) of the jth neuron of the ith layer. The remaining neurons of the ith layer perform the same operations as the jth neuron, so the final output h_i of all neurons of the ith layer is obtained.
Because the neurons of the Nth layer of the model to be trained do not have position codes, after each neuron of the Nth layer obtains the final output h_{N-1} of all neurons of layer N-1, the 1st neuron of the Nth layer performs the first calculation on its parameters and h_{N-1} (refer to the first-row formula in formula (7)) to obtain the final output h_N(1) of the 1st neuron of the Nth layer. Likewise, the 2nd neuron of the Nth layer performs the first calculation on its parameters and h_{N-1} to obtain its final output h_N(2), and so on. In this way, the initial output h_N' of all neurons of the Nth layer is obtained; it is also the final output h_N of all neurons of the Nth layer, that is, the output of the model to be trained, which is the processing result of the training data.
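The layer-by-layer computation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes that the first calculation is an affine map followed by the activation function, that the second calculation is a multiplication, and that `tanh` is the hidden activation. The function name `forward` and all example values are hypothetical.

```python
import math

def forward(x, layers, pos_codes, activation=math.tanh):
    """Forward pass of a small fully connected model with position codes.

    layers:    list of (W, b) for layers 2..N; W holds one row per neuron.
    pos_codes: one list of codes g_i per intermediate layer (layers 2..N-1).
    """
    h = list(x)  # h_1: the training data is the final output of layer 1
    last = len(layers) - 1
    for idx, (W, b) in enumerate(layers):
        # Weighted sum plus bias for every neuron j of this layer.
        z = [sum(w * v for w, v in zip(row, h)) + bj for row, bj in zip(W, b)]
        if idx == last:
            # Output layer: no position code, activation treated as identity.
            h = z
        else:
            g = pos_codes[idx]
            # First calculation (activation), then second calculation
            # (assumed multiplication): h_i(j) = g_i(j) * h_i'(j).
            h = [gj * activation(zj) for gj, zj in zip(g, z)]
    return h

# Hypothetical 3-layer model: 2 inputs, 2 hidden neurons, 1 output neuron.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # layer 2
    ([[1.0, 1.0]], [0.5]),                   # layer 3 (output layer)
]
pos_codes = [[1.0, 2.0]]                     # codes of the two hidden neurons
result = forward([0.0, 0.0], layers, pos_codes)
```

Passing a different `activation` callable changes only the first calculation; the position codes are applied in the same way.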
It should be understood that, in this embodiment, when the client processes the training data through the model to be trained, the operation of each neuron in the model to be trained may also be regarded as the operation of the client.
It should also be understood that, since the Nth layer (output layer) of the model to be trained does not have an activation function, when the neurons of the Nth layer perform the first calculation according to the first-row formula in formula (7), the activation function f_N in the first-row formula can be treated as an identity function, i.e., a function whose output equals its input.
It should also be understood that this embodiment only schematically illustrates the second calculation as a multiplication operation and does not limit the type of the second calculation. For example, the second calculation may also be one of the four arithmetic operations or any combination of them; that is, the client may perform one or any combination of a multiplication operation, an addition operation, a subtraction operation, and a division operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer. For another example, the second calculation may be a trigonometric function operation; that is, the client may perform a trigonometric function operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer. For another example, the second calculation may be an exponential operation; that is, the client may perform an exponential operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer. For another example, the second calculation may be a logarithmic operation; that is, the client may perform a logarithmic operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer.
703. The client updates the parameters of the neurons in the model to be trained based on the processing result to obtain an updated model.
After obtaining the processing result of the training data, the client can update the parameters of the neurons in the model to be trained based on that processing result to obtain an updated model.
Specifically, the client may update the parameters of the neurons in the model to be trained based on the processing result in the following manner to obtain the updated model:
Since the real processing result of the training data is known, the client may calculate, through a preset target loss function, a target loss from the processing result of the training data output by the model to be trained and the real processing result of the training data. The target loss indicates the difference between the processing result output by the model to be trained and the real processing result.
After the target loss is obtained, the client updates parameters of neurons in the model to be trained (including parameters of all neurons in the layer 2 to all neurons in the layer N) based on the target loss, but does not update position codes of the neurons (including position codes of all neurons in the layer 2 to all neurons in the layer N-1, and the position codes are regarded as fixed values), so that the updated model is obtained.
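A local update of this kind, in which the parameters change while the position codes stay fixed, might look like the sketch below. It is only an illustration under stated assumptions: a 3-layer model (input, one hidden layer, output), a squared-error target loss, `tanh` activation, multiplicative position codes, and plain gradient descent. The name `local_step`, the learning rate, and the example values are hypothetical.

```python
import math

def local_step(params, g, x, target, lr=0.1):
    """One gradient step on the parameters; the position codes g are never modified."""
    W2, b2, W3, b3 = params["W2"], params["b2"], params["W3"], params["b3"]
    # Forward pass: hidden layer with position codes, identity output layer.
    a = [math.tanh(sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W2, b2)]
    h = [gj * aj for gj, aj in zip(g, a)]
    y = [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W3, b3)]
    # Target loss (assumed squared error) and its gradient w.r.t. the output.
    err = [yk - tk for yk, tk in zip(y, target)]
    loss = 0.5 * sum(e * e for e in err)
    # Backward pass through the output layer and the hidden layer.
    dh = [sum(err[k] * W3[k][j] for k in range(len(W3))) for j in range(len(h))]
    dz = [dh[j] * g[j] * (1.0 - a[j] ** 2) for j in range(len(h))]
    new_params = {
        "W3": [[W3[k][j] - lr * err[k] * h[j] for j in range(len(h))] for k in range(len(W3))],
        "b3": [b3[k] - lr * err[k] for k in range(len(b3))],
        "W2": [[W2[j][m] - lr * dz[j] * x[m] for m in range(len(x))] for j in range(len(W2))],
        "b2": [b2[j] - lr * dz[j] for j in range(len(b2))],
    }
    return new_params, loss

# Hypothetical starting point for one client.
params = {"W2": [[0.5, -0.3], [0.1, 0.2]], "b2": [0.0, 0.0],
          "W3": [[0.4, -0.6]], "b3": [0.0]}
g = [1.0, 2.0]  # fixed position codes of the two hidden neurons
x, target = [1.0, 2.0], [1.0]
params_1, loss_0 = local_step(params, g, x, target)
params_2, loss_1 = local_step(params_1, g, x, target)
```

Because `g` is only read and never written, the codes act as constants during backpropagation, which corresponds to treating them as fixed values.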
704. The client obtains the parameter update amount based on the updated model and the model to be trained.
705. The client sends the parameter update amount to the server, where the parameter update amount is used by the server to update the model to be trained until the model training condition is met, so as to obtain the trained model.
After obtaining the updated model, the client can send the updated model to the server, so that the server can perform federated aggregation based on the updated model uploaded by this client and the updated models uploaded by the other clients to obtain the trained model.
Specifically, the client may upload the updated model in the following manner, so that the server implements federated aggregation based on the updated model:
after obtaining the updated model, the client may obtain a parameter update amount between the updated model and the model to be trained, where the parameter update amount generally refers to a parameter update amount of each neuron of the updated model compared to the model to be trained (hereinafter referred to as a parameter update amount of each neuron in the updated model). It should be noted that the client may compare the parameters of the neurons at the same position in the updated model and the model to be trained, so as to obtain the parameter update amount of the neurons at the position, and thus, the parameter update amount of each neuron of the updated model may be obtained. Still as in the above example, since the model to be trained includes 4 layers, the updated model 1 obtained by the client 1 also includes 4 layers, and then the client 1 may compare the parameter of the 1 st neuron in the layer 1 in the model to be trained with the parameter of the 1 st neuron in the layer 1 in the updated model 1 to obtain the parameter update amount of the 1 st neuron in the layer 1 in the updated model 1, and the client may also compare the parameter of the 2 nd neuron in the layer 1 in the model to be trained with the parameter of the 2 nd neuron in the layer 1 in the updated model 1 to obtain the parameter update amount of the 2 nd neuron in the layer 1 in the updated model 1, …, and so on, the client may obtain the parameter update amount of each neuron in the updated model 1.
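The position-by-position comparison described above can be sketched as follows. The sketch assumes that a model is stored as a mapping from layer index to a list of per-neuron parameter vectors (weights followed by bias) and that the update amount is the updated value minus the original value; both the storage layout and the sign convention are assumptions, and `update_amount` is a hypothetical name.

```python
def update_amount(before, after):
    """Per-neuron update amount between two models with identical layout."""
    return {
        layer: [
            [new - old for new, old in zip(neuron_after, neuron_before)]
            for neuron_after, neuron_before in zip(after[layer], before[layer])
        ]
        for layer in before
    }

# Hypothetical model to be trained and updated model of one client.
before = {2: [[0.5, -0.25, 0.0], [0.25, 0.5, 0.0]], 3: [[0.5, -0.5, 0.0]]}
after = {2: [[0.75, -0.25, 0.25], [0.25, 0.0, 0.0]], 3: [[0.75, -0.25, 0.0]]}
delta = update_amount(before, after)
```

Neurons are matched purely by their index within a layer, which is why the same function needs to sit at the same position in every client's model.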
Then, the client can send the parameter update amount of each neuron in the updated model to the server. The server thus obtains the parameter update amount of each neuron in the updated model uploaded by each client and averages these amounts to obtain the average value of the parameter update amount of each neuron. Based on these average values, the server can update the parameters of each neuron in the locally stored model to be trained accordingly, obtaining an updated model trained by the server itself. Still as in the above example, after receiving the parameter update amount of each neuron in the updated model 1 uploaded by the client 1 and the parameter update amount of each neuron in the updated model 2 uploaded by the client 2, the server may average the parameter update amount of the 1st neuron of layer 1 in the updated model 1 and the parameter update amount of the 1st neuron of layer 1 in the updated model 2 to obtain the average value of the parameter update amount of the 1st neuron of layer 1, and so on, so that the server obtains the average value of the parameter update amount of each neuron.
Then, the server may perform corresponding update on the parameters of each neuron in the locally stored model to be trained based on the average value of the parameter update amounts of each neuron, that is, update the parameter of the 1 st neuron in the layer 1 in the model to be trained by using the average value of the parameter update amounts of the 1 st neuron in the layer 1, update the parameter of the 2 nd neuron in the layer 1 in the model to be trained by using the average value of the parameter update amounts of the 2 nd neuron in the layer 1, …, and so on, so as to obtain an updated model obtained by the server through self-training.
Thereafter, the server may use the updated model obtained by the server through self training as a new model to be trained, and send the new model to each client for next iterative model training (i.e., repeatedly execute steps 701 to 704) until, in a certain iterative model training, the updated model obtained by the server through self training meets the model training requirements (e.g., the target loss converges or the iteration number is greater than the preset number, etc.), and may use the updated model obtained by the server through self training in the iteration as the trained model (i.e., the trained neural network model).
It should be understood that this embodiment only schematically illustrates the server averaging the parameter update amounts of each neuron in the updated models uploaded by the clients. In practical applications, the server may instead perform a weighted average calculation or the like on the parameter update amounts of each neuron in the updated models uploaded by the clients.
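Server-side aggregation by position, with either a plain or a weighted average, can be sketched as follows. The model layout (layer index mapped to per-neuron parameter vectors), the function name `aggregate`, and the choice of weights are assumptions made for illustration; the text does not specify how weights would be chosen.

```python
def aggregate(global_model, client_deltas, weights=None):
    """Apply the (weighted) average of the clients' update amounts to the global model."""
    if weights is None:
        weights = [1.0] * len(client_deltas)  # plain averaging
    total = sum(weights)
    new_model = {}
    for layer, neurons in global_model.items():
        new_model[layer] = []
        for j, neuron in enumerate(neurons):
            new_neuron = []
            for p, value in enumerate(neuron):
                # Average the update amounts of the neuron at the same position.
                avg = sum(w * d[layer][j][p] for w, d in zip(weights, client_deltas)) / total
                new_neuron.append(value + avg)
            new_model[layer].append(new_neuron)
    return new_model

# Hypothetical global model with one neuron in layer 2 and two clients.
global_model = {2: [[0.0, 1.0]]}
deltas = [{2: [[1.0, 0.0]]}, {2: [[3.0, 2.0]]}]
averaged = aggregate(global_model, deltas)
weighted = aggregate(global_model, deltas, weights=[1.0, 3.0])
```

With equal weights this reduces to the averaging described in the embodiment; unequal weights give one possible form of the weighted average.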
In this embodiment of the application, after obtaining the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the neurons in the model to be trained based on the processing result to obtain an updated model. Each of the plurality of neurons has parameters and a position code, and its position code differs from the position codes of the remaining neurons, so the position code of a neuron constrains the function of that neuron to differ from the functions of the remaining neurons. Relative to the distribution of neurons in the model to be trained, if the functions of the neurons at some positions in the updated model changed (i.e., neurons with different functions exchanged positions), the position codes at those positions would remain unchanged, because a position code depends only on the position of the neuron. The output of the model after such a change would therefore differ from the output of the model without it, which would make the training process unstable and degrade the performance of the trained model. Consequently, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively suppress the rearrangement invariance of the neural network model.
Since the other clients can also execute the same operation as the client, the neurons with the same function are all in the same position in the updated model uploaded to the server by each client, so that the server can process the neurons in each updated model according to the position when realizing aggregation, and the obtained trained model can have excellent functions.
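The effect of position codes on rearrangement invariance can be checked numerically with a small sketch. It assumes a 3-layer model with `tanh` activation and multiplicative position codes; all names and values are hypothetical. Without position codes (codes all equal to 1), exchanging two hidden neurons leaves the output unchanged; with position-dependent codes, the same exchange changes the output.

```python
import math

def forward(x, W2, b2, g, W3, b3):
    """Hidden layer with position codes g, followed by an identity output layer."""
    a = [math.tanh(sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W2, b2)]
    h = [gj * aj for gj, aj in zip(g, a)]
    return [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W3, b3)]

def permute_hidden(W2, b2, W3, perm):
    """Move hidden neuron perm[j] to position j (rows of W2 and b2, columns of W3)."""
    return ([W2[p] for p in perm], [b2[p] for p in perm],
            [[row[p] for p in perm] for row in W3])

x = [1.0, 2.0]
W2, b2 = [[0.5, 0.25], [0.25, 0.5]], [0.0, 0.0]
W3, b3 = [[1.0, -1.0]], [0.0]
pW2, pb2, pW3 = permute_hidden(W2, b2, W3, [1, 0])

ones = [1.0, 1.0]   # equivalent to a model without position codes
codes = [1.0, 2.0]  # position-dependent codes that stay at their positions

# Output difference between the original and the permuted model.
gap_without_codes = forward(x, W2, b2, ones, W3, b3)[0] - forward(x, pW2, pb2, ones, pW3, b3)[0]
gap_with_codes = forward(x, W2, b2, codes, W3, b3)[0] - forward(x, pW2, pb2, codes, pW3, b3)[0]
```

A nonzero `gap_with_codes` means that a client whose neurons drift to other positions pays for it in the loss, which is the mechanism that discourages such drift during local training.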
Further, in some related technologies, after receiving the updated models uploaded by the clients, the server may use the local data of each client to perform neuron alignment on these models (i.e., exchange the positions of neurons so that neurons with the same function are located at the same position in these models). However, such alignment may compromise user privacy, raises a series of data security issues, and introduces additional computational overhead. In the solution provided in this embodiment of the present application, while each client trains the model to be trained using its own local data, the position codes of the neurons in the model to be trained achieve pre-alignment of the neurons (as shown in fig. 10, which is another schematic structural diagram of the federated learning system provided in this embodiment of the present application). The server therefore does not need to perform a neuron alignment operation, so user privacy can be effectively protected, data security problems are avoided, and the computational overhead of the server is reduced.
Fig. 11 is another schematic flowchart of the model training method provided in an embodiment of the present application. The method may also be implemented by the federated learning system shown in fig. 8. As shown in fig. 11, the method includes:
1101. The client obtains a model to be trained from the server.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or are determined by the client and the server based on those positions.
1102. The client processes the training data through a plurality of neurons of the model to be trained to obtain a processing result, each neuron in the plurality of neurons has parameters and position codes, and the neurons at different positions in the plurality of neurons have different position codes.
In a possible implementation, the model to be trained includes N layers, the 1st layer is an input layer, the Nth layer is an output layer, and the plurality of neurons are all neurons from the 2nd layer to the (N-1)th layer. The client processing the training data through the plurality of neurons of the model to be trained to obtain the processing result includes: the client performs a first calculation on the parameters of the jth neuron of the ith layer and the final outputs of all neurons of the (i-1)th layer to obtain an initial output of the jth neuron of the ith layer, where i = 1, ..., N, j = 1, ..., M, N ≥ 3, and M ≥ 1; the client performs a second calculation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain a final output of the jth neuron of the ith layer; the final output of all neurons of the 1st layer is the training data, and the initial output of all neurons of the Nth layer is the processing result.
In a possible implementation manner, the second calculation is performed by the client on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer, and obtaining the final output of the jth neuron of the ith layer includes: the client performs four arithmetic operations on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or the client performs trigonometric function operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or, the client performs exponential operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or the client performs logarithm operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer.
For the description of step 1101 and step 1102, reference may be made to the relevant description parts of step 701 and step 702 in the embodiment shown in fig. 7, and details are not repeated here.
1103. The client updates the parameters and the position codes of the neurons in the model to be trained based on the processing result to obtain an updated model.
After obtaining the processing result of the training data, the client can update the parameters and the position codes of the neurons in the model to be trained based on that processing result to obtain an updated model.
Specifically, the client may update the parameters and the position codes of the neurons in the model to be trained based on the processing result in the following manner to obtain the updated model:
Since the real processing result of the training data is known, the client may calculate, through a preset target loss function, a target loss from the processing result of the training data output by the model to be trained and the real processing result of the training data. The target loss indicates the difference between the processing result output by the model to be trained and the real processing result.
After the target loss is obtained, the client updates the parameters of the neurons (including the parameters of all neurons in the layer 2 to the layer N) and the position codes of the neurons (including the position codes of all neurons in the layer 2 to the layer N-1, and the position codes are regarded as non-fixed values) in the model to be trained based on the target loss, so as to obtain an updated model.
It should be noted that the client can often obtain multiple batches of training data in advance, so the client can perform multiple rounds of updating on the model to be trained (i.e., execute steps 1102 and 1103 for multiple rounds). Specifically, in the first round of updating, the client may input the first batch of training data into the model to be trained to obtain a processing result of the first batch of training data, and update the model to be trained based on that processing result to obtain the model of the first round. Then, in the second round of updating, the client may input the second batch of training data into the model of the first round to obtain a processing result of the second batch of training data, and update the model of the first round based on that processing result to obtain the model of the second round. Note, however, that every round of updating updates the parameters of the neurons in the model, whereas only some of the rounds update the position codes of the neurons. The position codes are therefore updated less frequently than the parameters, so that the rearrangement invariance of the model can still be suppressed to a certain extent.
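One way to realize "parameters every round, position codes only in some rounds" is sketched below. The text does not say which rounds update the position codes; updating them every `code_every` rounds is purely an assumption. The scalar model y = w3 * g * tanh(w2 * x) with a squared-error loss, and the name `train_rounds`, are likewise hypothetical simplifications.

```python
import math

def train_rounds(w2, w3, g, batches, lr=0.05, code_every=3):
    """Update the parameters every round and the position code only every `code_every` rounds."""
    code_updates = 0
    for r, (x, t) in enumerate(batches):
        a = math.tanh(w2 * x)
        err = w3 * g * a - t                  # d(loss)/dy for loss = 0.5 * (y - t)^2
        grad_w3 = err * g * a
        grad_w2 = err * w3 * g * (1.0 - a * a) * x
        grad_g = err * w3 * a
        # Parameters are updated in every round.
        w2, w3 = w2 - lr * grad_w2, w3 - lr * grad_w3
        # The position code is updated only in some rounds (assumed schedule).
        if r % code_every == 0:
            g -= lr * grad_g
            code_updates += 1
    return w2, w3, g, code_updates

# Seven hypothetical batches, each a single (input, target) pair.
batches = [(1.0, 0.5)] * 7
w2, w3, g, code_updates = train_rounds(0.5, 1.0, 1.0, batches)
```

With seven rounds and `code_every=3`, the position code is touched in rounds 0, 3, and 6 only, while the parameters change in all seven rounds.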
For any batch of training data, the client may update the model to be trained with the batch of training data (i.e., each batch,
1104. The client obtains the parameter update amount and the position code update amount based on the updated model and the model to be trained.
1105. The client sends the parameter update amount and the position code update amount to the server, where the parameter update amount and the position code update amount are used by the server to update the model to be trained until the model training condition is met, so as to obtain the trained model.
After obtaining the updated model, the client can send the updated model to the server, so that the server can perform federated aggregation based on the updated model uploaded by this client and the updated models uploaded by the other clients to obtain the trained model.
Specifically, the client may upload the updated model in the following manner, so that the server implements federated aggregation based on the updated model:
after obtaining the updated model, the client may obtain a parameter update amount and a position code update amount between the updated model and the model to be trained, where the parameter update amount generally refers to a parameter update amount of each neuron (hereinafter referred to as a parameter update amount of each neuron in the updated model) of the updated model compared with the model to be trained, and the position code update amount generally refers to a position code update amount of each neuron (hereinafter referred to as a position code update amount of each neuron in the updated model) of the updated model compared with the model to be trained. It should be noted that the client may compare the parameters of the neurons at the same position in the updated model and the model to be trained, so as to obtain the parameter update amount of the neurons at the position, and compare the position codes of the neurons at the same position, so as to obtain the position code update amount of the neurons at the position, so that the parameter update amount and the position code update amount of each neuron of the updated model can be obtained. Still as in the above example, since the model to be trained includes 4 layers, the updated model 1 obtained by the client 1 also includes 4 layers, and then the client 1 may compare the parameter of the 1 st neuron in the layer 1 in the model to be trained with the parameter of the 1 st neuron in the layer 1 in the updated model 1 to obtain the parameter update amount of the 1 st neuron in the layer 1 in the updated model 1, and the client may also compare the parameter of the 2 nd neuron in the layer 1 in the model to be trained with the parameter of the 2 nd neuron in the layer 1 in the updated model 1 to obtain the parameter update amount of the 2 nd neuron in the layer 1 in the updated model 1, …, and so on, the client may obtain the parameter update amount of each neuron in the updated model 1. 
Similarly, the client 1 may further compare the position code of the 1st neuron of layer 1 in the model to be trained with the position code of the 1st neuron of layer 1 in the updated model 1 to obtain the position code update amount of the 1st neuron of layer 1 in the updated model 1, and may compare the position code of the 2nd neuron of layer 1 in the model to be trained with the position code of the 2nd neuron of layer 1 in the updated model 1 to obtain the position code update amount of the 2nd neuron of layer 1 in the updated model 1, and so on, so that the client 1 obtains the position code update amount of each neuron in the updated model 1.
Then, the client can send the parameter update amount and the position code update amount of each neuron in the updated model to the server. The server thus obtains the parameter update amount and the position code update amount of each neuron in the updated model uploaded by each client and averages them to obtain the average value of the parameter update amount of each neuron and the average value of the position code update amount of each neuron. Based on the average value of the parameter update amount of each neuron, the server updates the parameters of each neuron in the locally stored model to be trained accordingly; based on the average value of the position code update amount of each neuron, the server updates the position code of each neuron in the locally stored model to be trained accordingly, thereby obtaining an updated model trained by the server itself. Still as in the above example, after receiving the parameter update amount of each neuron in the updated model 1 uploaded by the client 1 and the parameter update amount of each neuron in the updated model 2 uploaded by the client 2, the server may average the parameter update amount of the 1st neuron of layer 1 in the updated model 1 and the parameter update amount of the 1st neuron of layer 1 in the updated model 2 to obtain the average value of the parameter update amount of the 1st neuron of layer 1, and so on, so that the server obtains the average value of the parameter update amount of each neuron.
Similarly, after receiving the position code update amount of each neuron in the updated model 1 uploaded by the client 1 and the position code update amount of each neuron in the updated model 2 uploaded by the client 2, the server may average the position code update amount of the 1st neuron of layer 1 in the updated model 1 and the position code update amount of the 1st neuron of layer 1 in the updated model 2 to obtain the average value of the position code update amount of the 1st neuron of layer 1, and so on, so that the server obtains the average value of the position code update amount of each neuron. Then, the server may update the parameters of each neuron in the locally stored model to be trained based on the average value of the parameter update amount of each neuron; that is, the parameter of the 1st neuron of layer 1 in the model to be trained is updated by using the average value of the parameter update amount of the 1st neuron of layer 1, the parameter of the 2nd neuron of layer 1 in the model to be trained is updated by using the average value of the parameter update amount of the 2nd neuron of layer 1, and so on, until the server completes the parameter update of each neuron.
Similarly, the server may further perform corresponding update on the position codes of the neurons in the locally stored model to be trained based on the average value of the update amounts of the position codes of the neurons, that is, the position code of the 1 st neuron in the 1 st layer in the model to be trained is updated by using the average value of the update amounts of the position codes of the 1 st neuron in the 1 st layer, and the position code of the 2 nd neuron in the 1 st layer in the model to be trained is updated by using the average value of the update amounts of the position codes of the 2 nd neuron in the 1 st layer, …, and so on, the server may complete the update of the position codes of the neurons, thereby obtaining an updated model obtained by the server through self-training.
Thereafter, the server may use the updated model obtained by the server through self-training as a new model to be trained, and send the new model to each client for next iterative model training (i.e., repeatedly execute steps 1101 to 1104), until the updated model obtained by the server through self-training meets the model training requirements (e.g., the target loss converges or the iteration number is greater than the preset number, etc.) in a certain iterative model training, and may use the updated model obtained by the server through self-training in the iteration as the trained model (i.e., the trained neural network model).
It should be understood that this embodiment only schematically illustrates the server averaging the parameter update amounts of each neuron in the updated models uploaded by the clients. In practical applications, the server may instead perform a weighted average calculation or the like on the parameter update amounts of each neuron in the updated models uploaded by the clients.
It should be further understood that, in this embodiment, it is only schematically illustrated that the server performs averaging calculation on the location code update amount of each neuron in the updated model uploaded by each client, and in practical application, the server may also perform weighted average calculation on the location code update amount of each neuron in the updated model uploaded by each client, and the like.
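Aggregating both kinds of update amounts can be sketched as follows. For brevity the sketch stores the parameters and the position codes as flat lists ordered by neuron position; this layout, the names `weighted_mean` and `server_update`, and the weights are assumptions for illustration only.

```python
def weighted_mean(vectors, weights):
    """Element-wise weighted mean of equally long vectors."""
    total = sum(weights)
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(len(vectors[0]))]

def server_update(params, codes, client_updates, weights):
    """Apply averaged parameter and position code update amounts to the server's model.

    client_updates: list of (param_delta, code_delta) pairs, one pair per client.
    """
    mean_param_delta = weighted_mean([u[0] for u in client_updates], weights)
    mean_code_delta = weighted_mean([u[1] for u in client_updates], weights)
    new_params = [p + d for p, d in zip(params, mean_param_delta)]
    new_codes = [g + d for g, d in zip(codes, mean_code_delta)]
    return new_params, new_codes

# Hypothetical server state and update amounts from two clients.
params, codes = [0.0, 1.0], [1.0]
client_updates = [([1.0, 0.0], [0.5]), ([3.0, 2.0], [1.5])]
new_params, new_codes = server_update(params, codes, client_updates, weights=[1.0, 1.0])
```

Equal weights reproduce the plain averaging of the embodiment; passing unequal weights gives one possible weighted average for both the parameters and the position codes.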
In this embodiment of the application, after obtaining the model to be trained from the server, the client processes local training data through the parameters and position codes of the plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the neurons in the model to be trained based on the processing result to obtain an updated model. Each of the plurality of neurons has parameters and a position code, and its position code differs from the position codes of the remaining neurons, so the position code of a neuron constrains the function of that neuron to differ from the functions of the remaining neurons. Relative to the distribution of neurons in the model to be trained, if the functions of the neurons at some positions in the updated model changed (i.e., neurons with different functions exchanged positions), the position codes at those positions would remain unchanged, because a position code depends only on the position of the neuron. The output of the model after such a change would therefore differ from the output of the model without it, which would make the training process unstable and degrade the performance of the trained model. Consequently, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively suppress the rearrangement invariance of the neural network model.
Since other clients can also execute the same operation as the client, neurons with the same function are all in the same position in the updated model uploaded to the server by each client, so that the server can process the neurons in each updated model according to the position when realizing aggregation, and the obtained trained model has a sufficiently excellent function.
Further, in some related technologies, after receiving the updated models uploaded by the clients, the server may perform neuron alignment on the models by using the local data of each client (that is, exchange the positions of neurons so that neurons with the same function in the models are located at the same position). However, such alignment may compromise user privacy, involves a number of data security issues, and introduces additional computational overhead. In the scheme provided by the embodiment of the present application, when the clients train the model to be trained by using their respective local data, the position codes of the neurons in the model to be trained allow the neurons to be pre-aligned (as shown in fig. 5, which is another schematic structural diagram of the federated learning system provided by the embodiment of the present application). The server therefore does not need to perform the neuron alignment operation, so user privacy can be effectively protected, data security problems can be avoided, and the computational overhead of the server can be reduced.
Furthermore, the position codes of the neurons can be updated, so that the model can learn the appropriate position codes according to the properties of specific tasks (namely, in a certain service scene, a user needs the model to have a certain data processing function), and more reasonable alignment processing on the neurons is facilitated.
The above is a detailed description of the model training method provided in the embodiments of the present application, and the following describes a model training apparatus provided in the embodiments of the present application. Fig. 12 is a schematic structural diagram of a client according to an embodiment of the present application, and as shown in fig. 12, the client includes:
an obtaining module 1201, configured to obtain a to-be-trained model from a server, where the to-be-trained model includes a plurality of neurons, each neuron in the plurality of neurons has a parameter and a position code, and neurons at different positions in the plurality of neurons have different position codes.
A processing module 1202, configured to: process the training data through the parameters and the position codes of the plurality of neurons in the model to be trained to obtain a processing result, where neurons at different positions in the plurality of neurons have different position codes; and update the parameters of the plurality of neurons in the model to be trained based on the processing result to obtain an updated model.
A sending module 1203, configured to send the updated model to the server, where the updated model is used for aggregation at the server, so as to obtain a trained model.
In the embodiment of the present application, after a client obtains a model to be trained from a server, the client processes local training data through the parameters and position codes of a plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the plurality of neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Any one of the plurality of neurons has a parameter and a position code, and its position code differs from the position codes of the remaining neurons, so the position code of the neuron constrains the function of the neuron and keeps that function distinct from the functions of the remaining neurons. Relative to the distribution of neurons in the model to be trained, suppose the functions of the neurons at certain positions change in the updated model (that is, some neurons with different functions exchange positions). Because a position code depends only on the position of a neuron, the position codes at these positions remain unchanged, so the output of the model after the exchange differs from the output of the model before the exchange. This destabilizes the model training process and degrades the performance of the trained model. Therefore, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively constrain the permutation invariance of the neural network model.
Since other clients can perform the same operations as this client, neurons with the same function are located at the same position in the updated models uploaded to the server by the respective clients. The server can therefore process the neurons in the updated models position by position when performing aggregation, and the resulting trained model has sufficiently good performance.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or the position codes of the plurality of neurons are determined by the client and the server based on the positions of the plurality of neurons in the model to be trained.
In a possible implementation manner, the model to be trained includes N layers, the plurality of neurons are all neurons from layer 2 to layer N-1, and the processing module 1202 is configured to: perform a first calculation on the parameters of the jth neuron of the ith layer and the final outputs of all neurons of the (i-1)th layer to obtain an initial output of the jth neuron of the ith layer, where i = 2, ..., N-1, j = 1, ..., M, N ≥ 3, and M ≥ 1; and perform a second calculation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer, where the final output of the jth neuron of the ith layer is used for generating the processing result, that is, the final outputs of all neurons of the (N-1)th layer are used for generating the processing result.
In a possible implementation manner, in the model to be trained, layer 1 is an input layer and layer N is an output layer. The final outputs of all neurons in layer 1 are the training data. The parameters of the jth neuron of the Nth layer are used for performing the first calculation on the final outputs of all neurons in the (N-1)th layer to obtain the final output of the jth neuron of the Nth layer, and the final outputs of all neurons in the Nth layer are the processing result.
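The layer-by-layer processing described in the two implementations above can be sketched as follows. This is only an illustration under stated assumptions: the first calculation is taken to be a weighted sum plus bias, the second calculation is taken to be addition of the position code, any activation function is omitted, and the names forward, weights, biases, and codes are hypothetical.

```python
def forward(x, weights, biases, codes):
    """Forward pass through an N-layer model with position codes.

    weights[k] and biases[k] belong to layer k + 2 (layers are 1-indexed).
    codes[k] holds the position codes of hidden layer k + 2.
    The output layer (layer N) has no position codes.
    """
    final = list(x)  # layer 1: the final outputs are the training data
    last = len(weights) - 1
    for k, (W, b) in enumerate(zip(weights, biases)):
        # First calculation (assumed): weighted sum of the previous final outputs.
        initial = [sum(w * a for w, a in zip(row, final)) + bj
                   for row, bj in zip(W, b)]
        if k < last:
            # Second calculation (assumed additive): combine with the position code.
            final = [v + c for v, c in zip(initial, codes[k])]
        else:
            # Layer N: only the first calculation is performed.
            final = initial
    return final

# N = 3: two inputs, two hidden neurons, one output neuron.
weights = [[[1.0, 0.0], [0.0, -1.0]], [[1.0, 1.0]]]
biases = [[0.0, 0.0], [0.0]]
codes = [[0.5, 0.25]]
result = forward([1.0, 2.0], weights, biases, codes)
print(result)  # [-0.25]
```

Here the hidden initial outputs are 1.0 and -2.0, the position codes shift them to 1.5 and -1.75, and the output neuron sums them to -0.25.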
In one possible implementation, the processing module 1202 is configured to: perform an arithmetic operation (addition, subtraction, multiplication, or division) on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform a trigonometric function operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform an exponential operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or perform a logarithmic operation on the position code of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer.
In one possible implementation, the processing module 1202 is configured to: obtaining a target loss based on the processing result and a real processing result of the training data, the target loss being indicative of a difference between the processing result and the real processing result; and updating the parameters and the position codes based on the target loss to obtain an updated model, wherein the updating frequency of the position codes is less than that of the parameters.
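As a hedged illustration of updating the position codes less frequently than the parameters, the sketch below trains a deliberately tiny model by gradient descent. Everything concrete in it is an assumption: the model (one hidden neuron with an additive position code c), the squared-error target loss, the learning rate, and the choice to update the code once every code_period steps.

```python
def train(x, y, w, v, c, lr=0.05, steps=20, code_period=5):
    """Tiny model: hidden output h = w * x + c, model output = v * h."""
    param_updates = code_updates = 0
    for step in range(1, steps + 1):
        h = w * x + c             # first calculation, then additive position code
        err = v * h - y           # the target loss is err ** 2
        grad_w = 2 * err * v * x
        grad_v = 2 * err * h
        grad_c = 2 * err * v
        # Parameters are updated at every step.
        w -= lr * grad_w
        v -= lr * grad_v
        param_updates += 1
        # The position code is updated only every code_period steps.
        if step % code_period == 0:
            c -= lr * grad_c
            code_updates += 1
    loss = (v * (w * x + c) - y) ** 2
    return w, v, c, loss, param_updates, code_updates

initial_loss = (1.0 * (0.5 * 1.0 + 0.1) - 2.0) ** 2
w, v, c, final_loss, param_updates, code_updates = train(
    1.0, 2.0, w=0.5, v=1.0, c=0.1)
print(initial_loss, final_loss, param_updates, code_updates)
```

With the default settings the parameters are updated 20 times and the position code 4 times, so the update frequency of the position code is lower than that of the parameters.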
In one possible implementation manner, the sending module 1203 is configured to: acquiring a parameter updating amount and a position coding updating amount based on the updated model and the model to be trained; and sending the parameter updating amount and the position code updating amount to the server side, wherein the parameter updating amount and the position code updating amount are used for updating the model to be trained by the server side until the model training condition is met, and obtaining the trained model.
In one possible implementation, the processing module 1202 is configured to: obtaining a target loss based on the processing result and a true processing result of the training data, the target loss being indicative of a difference between the processing result and the true processing result; and updating the parameters based on the target loss to obtain an updated model.
In one possible implementation manner, the sending module 1203 is configured to: acquiring a parameter updating amount based on the updated model and the model to be trained; and sending the parameter updating quantity to the server, wherein the parameter updating quantity is used for updating the model to be trained by the server until the model training condition is met, and obtaining the trained model.
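One plausible reading of the update amount is the element-wise difference between the updated model and the model to be trained. The sketch below assumes that reading; the dictionary layout and the names update_amount, "params", and "codes" are hypothetical. The "codes" entry applies only to the implementation in which position codes are also updated.

```python
def update_amount(updated, initial):
    """Element-wise difference between the updated model and the model to be trained."""
    return {name: [u - i for u, i in zip(updated[name], initial[name])]
            for name in initial}

model_to_train = {"params": [1.0, 2.0], "codes": [0.5]}
updated_model = {"params": [1.5, 1.0], "codes": [0.75]}

delta = update_amount(updated_model, model_to_train)
print(delta)  # {'params': [0.5, -1.0], 'codes': [0.25]}
```

Sending only these differences, rather than the full updated model, is the variant described in the two sending-module implementations above.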
Fig. 13 is a schematic structural diagram of a server according to an embodiment of the present application, and as shown in fig. 13, the server includes:
a sending module 1301, configured to send a model to be trained to a client, where the model to be trained includes multiple neurons, each of the multiple neurons has a parameter and a position code, and neurons at different positions in the multiple neurons have different position codes, and the parameter and the position code are used for the client to process training data to obtain a processing result, and update the parameter based on the processing result to obtain an updated model;
an obtaining module 1302, configured to obtain an updated model from a client;
and an aggregation module 1303, configured to aggregate the updated models to obtain a trained model.
In the embodiment of the present application, after a client obtains a model to be trained from a server, the client processes local training data through the parameters and position codes of a plurality of neurons in the model to be trained to obtain a processing result of the training data. The client may then update the parameters of the plurality of neurons in the model to be trained based on the processing result, thereby obtaining an updated model. Any one of the plurality of neurons has a parameter and a position code, and its position code differs from the position codes of the remaining neurons, so the position code of the neuron constrains the function of the neuron and keeps that function distinct from the functions of the remaining neurons. Relative to the distribution of neurons in the model to be trained, suppose the functions of the neurons at certain positions change in the updated model (that is, some neurons with different functions exchange positions). Because a position code depends only on the position of a neuron, the position codes at these positions remain unchanged, so the output of the model after the exchange differs from the output of the model before the exchange. This destabilizes the model training process and degrades the performance of the trained model. Therefore, when updating the parameters of each neuron in the model to be trained, the client keeps the function of the neuron at each position unchanged as far as possible, so that the output of the updated model remains as stable as possible. In this way, the position codes of the neurons effectively constrain the permutation invariance of the neural network model.
Since other clients can perform the same operations as this client, neurons with the same function are located at the same position in the updated models uploaded to the server by the respective clients. The server can therefore process the neurons in the updated models position by position when performing aggregation, and the resulting trained model has sufficiently good performance.
In one possible implementation, the position codes of the plurality of neurons are determined by the server based on the positions of the plurality of neurons in the model to be trained, or the position codes of the plurality of neurons are determined by the client and the server based on the positions of the plurality of neurons in the model to be trained.
In a possible implementation manner, the obtaining module 1302 is configured to obtain a parameter update amount and a position code update amount from a client, where the parameter update amount and the position code update amount are obtained based on an updated model and a model to be trained; and the aggregation module 1303 is configured to update the model to be trained based on the parameter update amount and the position coding update amount until a model training condition is met, so as to obtain a trained model.
In a possible implementation manner, the obtaining module 1302 is configured to obtain a parameter update amount from the client, where the parameter update amount is obtained based on the updated model and the model to be trained; and the aggregation module 1303 is configured to update the model to be trained based on the parameter update amount until the model training condition is met, so as to obtain the trained model.
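The server-side update can be sketched as a position-by-position aggregation of the update amounts received from the clients. The embodiment does not specify the aggregation rule in this passage, so plain averaging (in the style of federated averaging) is an assumption here, as are the names aggregate and "params".

```python
def aggregate(model, client_updates):
    """Average the clients' update amounts by position and apply them to the model."""
    n = len(client_updates)
    new_model = {}
    for name, values in model.items():
        averaged = [sum(update[name][k] for update in client_updates) / n
                    for k in range(len(values))]
        new_model[name] = [v + d for v, d in zip(values, averaged)]
    return new_model

model_to_train = {"params": [1.0, 2.0]}
client_updates = [{"params": [0.5, -1.0]}, {"params": [1.5, 1.0]}]

aggregated = aggregate(model_to_train, client_updates)
print(aggregated)  # {'params': [2.0, 2.0]}
```

Because the neurons are pre-aligned by their position codes, the sketch combines entries at the same index directly and performs no matching or reordering step. In practice this step would be repeated over several rounds until the model training condition is met.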
It should be noted that, because the contents of information interaction, execution process, and the like between the modules/units of the apparatus are based on the same concept as the method embodiment of the present application, the technical effect brought by the contents is the same as the method embodiment of the present application, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not repeated herein.
The embodiment of the present application further relates to an execution device, and fig. 14 is a schematic structural diagram of the execution device provided in the embodiment of the present application. As shown in fig. 14, the execution device 1400 may be embodied as a mobile phone, a tablet, a notebook, a smart wearable device, a server, and the like, which is not limited herein. The execution device 1400 may be deployed with the client shown in fig. 8, and is used to implement the function of model training in the embodiment corresponding to fig. 4 or fig. 6 in combination with the subsequent training device. Specifically, the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403, and a memory 1404 (the number of processors 1403 in the execution device 1400 may be one or more, and one processor is taken as an example in fig. 14), where the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of the present application, the receiver 1401, the transmitter 1402, the processor 1403, and the memory 1404 may be connected by a bus or other means.
The memory 1404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include a non-volatile random access memory (NVRAM). The memory 1404 stores operating instructions executable by the processor, executable modules, or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations.
The processor 1403 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 1403, or implemented by the processor 1403. The processor 1403 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method can be performed by hardware integrated logic circuits or instructions in software form in the processor 1403. The processor 1403 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1403 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, or a register. The storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404 and completes the steps of the above method in combination with its hardware.
The receiver 1401 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1402 may be used to output numeric or character information through a first interface; the transmitter 1402 may also be configured to send instructions to the disk pack via the first interface to modify data in the disk pack; the transmitter 1402 may also include a display device such as a display screen.
In this embodiment, in one case, the processor 1403 may be configured to implement the model training method in the embodiment corresponding to fig. 7 or fig. 11, and may also be configured to implement a corresponding data processing function through the trained model obtained in the embodiment corresponding to fig. 7 or fig. 11.
The embodiment of the present application further relates to a training device, and fig. 15 is a schematic structural diagram of the training device provided in the embodiment of the present application. As shown in fig. 15, the training device 1500 is implemented by one or more servers. The training device 1500 may vary widely depending on configuration or performance, and may include one or more central processing units (CPUs) 1514 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) that store applications 1542 or data 1544. The memory 1532 and the storage medium 1530 may be transient storage or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a sequence of instruction operations for the training device. Still further, the central processing unit 1514 may be configured to communicate with the storage medium 1530 and to execute, on the training device 1500, the sequence of instruction operations in the storage medium 1530.
The training device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Specifically, the training device may be combined with the aforementioned executing device to jointly execute the model training method in the embodiment corresponding to fig. 4 or fig. 6.
The present embodiment also relates to a computer storage medium, in which a program for signal processing is stored, which, when running on a computer, causes the computer to perform the steps performed by the aforementioned execution apparatus, or causes the computer to perform the steps performed by the aforementioned training apparatus.
Embodiments of the present application also relate to a computer program product having instructions stored thereon, which, when executed by a computer, cause the computer to perform the steps performed by the aforementioned execution apparatus, or cause the computer to perform the steps performed by the aforementioned training apparatus.
The execution device, the training device, or the terminal device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer execution instructions stored by the storage unit to cause the chip in the execution device to execute the data processing method described in the above embodiment, or to cause the chip in the training device to execute the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, referring to fig. 16, fig. 16 is a schematic structural diagram of a chip provided in the embodiment of the present application, where the chip may be represented as a neural network processor NPU 1600, and the NPU 1600 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core part of the NPU is an arithmetic circuit 1603, and the controller 1604 controls the arithmetic circuit 1603 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1603 includes a plurality of processing units (PEs) therein. In some implementations, the arithmetic circuitry 1603 is a two-dimensional systolic array. The arithmetic circuit 1603 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 1603 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1602 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1601 and performs a matrix operation with matrix B, and the partial results or final results of the resulting matrix are stored in an accumulator 1608.
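The role of the accumulator in this example can be sketched in software. The sketch below is not a model of the NPU hardware: the tile size, the loop order, and the function name matmul_accumulate are assumptions. It only shows how partial products of A and B are summed into an accumulator until the final result C is complete.

```python
def matmul_accumulate(A, B, tile=2):
    """Compute C = A x B by accumulating partial results over tiles of the inner dimension."""
    rows, inner, cols = len(A), len(B), len(B[0])
    acc = [[0.0] * cols for _ in range(rows)]  # plays the role of the accumulator
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        for i in range(rows):
            for j in range(cols):
                # Partial result for this tile, added to what is already stored.
                acc[i][j] += sum(A[i][k] * B[k][j] for k in range(k0, k1))
    return acc

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # input matrix
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # weight matrix
C = matmul_accumulate(A, B)
print(C)  # [[4.0, 5.0], [10.0, 11.0]]
```

After the first tile the accumulator holds only a partial result; it holds the final matrix C once all tiles of the inner dimension have been processed.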
The unified memory 1606 is used to store input data as well as output data. The weight data is transferred directly to the weight memory 1602 through a storage unit access controller (direct memory access controller, DMAC) 1605. The input data is also transferred to the unified memory 1606 through the DMAC.
The bus interface unit (BIU) 1613 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1609.
Specifically, the bus interface unit 1613 is used by the instruction fetch buffer 1609 to obtain instructions from the external memory, and is also used by the storage unit access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606, or to transfer weight data to the weight memory 1602, or to transfer input data to the input memory 1601.
The vector calculation unit 1607 includes a plurality of arithmetic processing units and, if necessary, further processes the output of the arithmetic circuit 1603, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. The vector calculation unit 1607 is mainly used for network calculation of the non-convolutional/non-fully-connected layers in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a prediction label plane.
In some implementations, the vector calculation unit 1607 can store the processed output vector to the unified memory 1606. For example, the vector calculation unit 1607 may apply a linear function or a non-linear function to the output of the arithmetic circuit 1603, for example, to linearly interpolate the prediction label planes extracted by the convolutional layers, or, as another example, to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1607 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 1603, e.g., for use in subsequent layers in a neural network.
The instruction fetch buffer 1609 is connected to the controller 1604 and is used for storing instructions used by the controller 1604.
The unified memory 1606, the input memory 1601, the weight memory 1602, and the instruction fetch buffer 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above programs.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including special-purpose integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and the specific hardware structures for implementing the same function may be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the method according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

Claims (19)

1. A method of model training, the method comprising:
obtaining a model to be trained, wherein the model to be trained comprises a plurality of neurons, the neurons are associated with parameter information and position coding information, and the neurons correspond to the position coding information one by one;
updating the model to be trained through the parameter information, the position coding information and the training data to obtain an updated model;
and sending the updated model.
2. The method of claim 1, wherein the position-encoded information of the plurality of neurons is determined based on positions of the plurality of neurons in the model to be trained.
3. The method according to claim 1 or 2, wherein the updating the model to be trained through the parameter information, the position coding information and the training data to obtain an updated model comprises:
processing the training data through the parameter information and the position coding information to obtain a processing result;
and updating the parameter information based on the processing result to obtain an updated model.
4. The method according to claim 3, wherein the model to be trained includes N layers, the plurality of neurons are all neurons from layer 2 to layer N-1, and the processing the training data by the parameter information and the position encoding information to obtain the processing result includes:
performing first calculation on parameter information of a jth neuron of an ith layer and final outputs of all neurons of an (i-1)th layer to obtain an initial output of the jth neuron of the ith layer, wherein i = 2, ..., N-1, j = 1, ..., M, N ≥ 3, and M ≥ 1;
and performing second calculation on the position coding information of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer, wherein the final output of the jth neuron of the ith layer is used for generating a processing result.
5. The method according to claim 4, wherein in the model to be trained, layer 1 is an input layer, layer N is an output layer, final outputs of all neurons in layer 1 are the training data, parameter information of a jth neuron of the Nth layer is used for performing the first calculation on the final outputs of all neurons in the (N-1)th layer to obtain a final output of the jth neuron of the Nth layer, and the final outputs of all the neurons in the Nth layer are the processing result.
6. The method of claim 4 or 5, wherein performing a second calculation on the position coding information of the jth neuron at the ith layer and the initial output of the jth neuron at the ith layer to obtain a final output of the jth neuron at the ith layer comprises:
performing one of the four basic arithmetic operations (addition, subtraction, multiplication, or division) on the position coding information of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or
performing a trigonometric function operation on the position coding information of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or
performing an exponential operation on the position coding information of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer; or
performing a logarithmic operation on the position coding information of the jth neuron of the ith layer and the initial output of the jth neuron of the ith layer to obtain the final output of the jth neuron of the ith layer.
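Claim 6 names four families of operations for the second calculation without giving concrete formulas. The sketch below shows one possible instance of each family; the specific formulas and the function names are assumptions made for illustration and are not taken from the claims.

```python
import math


def second_calc_arithmetic(pos: float, initial: float) -> float:
    # Assumed arithmetic variant: add the position code to the initial output.
    return initial + pos


def second_calc_trig(pos: float, initial: float) -> float:
    # Assumed trigonometric variant: add the sine of the position code.
    return initial + math.sin(pos)


def second_calc_exp(pos: float, initial: float) -> float:
    # Assumed exponential variant: scale the initial output by exp(pos).
    return initial * math.exp(pos)


def second_calc_log(pos: float, initial: float) -> float:
    # Assumed logarithmic variant: add log(1 + |pos|), which is defined for any real pos.
    return initial + math.log(1.0 + abs(pos))
```

Each function takes the position code of one neuron and that neuron's initial output and returns its final output, so any of them could serve as the second calculation of claim 4.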
7. The method according to any one of claims 3 to 6, wherein the updating the parameter information based on the processing result to obtain an updated model comprises:
obtaining a target loss based on the processing result and a real processing result of the training data, the target loss indicating a difference between the processing result and the real processing result;
and updating the parameter information and the position coding information based on the target loss to obtain an updated model, wherein the updating frequency of the position coding information is less than that of the parameter information.
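The differing update frequencies in claim 7 can be sketched with a toy single-neuron model. Everything concrete here is an assumption for illustration: the model `w * x + p` (with `p` playing the role of an additive position code), the mean squared error used as the target loss, plain gradient descent, and the hypothetical `pos_every` schedule parameter.

```python
from typing import List, Tuple


def local_update(
    w: float,
    p: float,
    data: List[Tuple[float, float]],   # (input, real processing result) pairs
    lr: float = 0.1,
    steps: int = 6,
    pos_every: int = 3,                # position code is updated once every pos_every steps
) -> Tuple[float, float, int]:
    """Update parameter w every step and position code p less frequently."""
    pos_updates = 0
    for step in range(1, steps + 1):
        grad_w = 0.0
        grad_p = 0.0
        for x, y in data:
            err = (w * x + p) - y      # difference between processing result and real result
            grad_w += 2.0 * err * x / len(data)
            grad_p += 2.0 * err / len(data)
        w -= lr * grad_w               # parameter information: updated at every step
        if step % pos_every == 0:
            p -= lr * grad_p           # position coding information: updated less often
            pos_updates += 1
    return w, p, pos_updates


w_new, p_new, n_pos = local_update(0.0, 0.0, [(1.0, 2.0)])
```

With the default arguments the parameter is updated six times and the position code twice, which satisfies the condition that the position coding information is updated less frequently than the parameter information.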
8. The method of claim 7, wherein sending the updated model comprises:
acquiring a parameter information updating quantity and a position coding information updating quantity based on the updated model and the model to be trained;
and sending the parameter information updating amount and the position coding information updating amount.
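A minimal sketch of how the update amounts of claim 8 might be derived, assuming each update amount is the element-wise difference between the updated model and the model to be trained. The dictionary layout with keys `"params"` and `"pos_codes"` and the function name `update_amounts` are hypothetical.

```python
from typing import Dict, List

Model = Dict[str, List[float]]


def update_amounts(updated: Model, to_be_trained: Model) -> Model:
    """Return per-entry differences for parameters and position codes (assumed definition)."""
    return {
        key: [u - o for u, o in zip(updated[key], to_be_trained[key])]
        for key in ("params", "pos_codes")
    }


payload = update_amounts(
    {"params": [1.5, 2.0], "pos_codes": [0.25]},
    {"params": [1.0, 2.5], "pos_codes": [0.0]},
)
```

Only `payload` would then be sent, rather than the full updated model. For the variant of claims 9 and 10, the `"pos_codes"` entry would simply be left out.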
9. The method according to any one of claims 3 to 6, wherein the updating the parameter information based on the processing result to obtain an updated model comprises:
obtaining a target loss based on the processing result and a real processing result of the training data, the target loss indicating a difference between the processing result and the real processing result;
and updating the parameter information based on the target loss to obtain an updated model.
10. The method of claim 9, wherein sending the updated model comprises:
acquiring parameter information updating quantity based on the updated model and the model to be trained;
and sending the parameter information updating amount.
11. A method of model training, the method comprising:
sending a model to be trained, wherein the model to be trained comprises a plurality of neurons, the neurons are associated with parameter information and position coding information, and the neurons correspond to the position coding information one to one;
and acquiring an updated model, and aggregating the updated model to obtain a trained model, wherein the updated model is obtained by updating the model to be trained based on the parameter information, the position coding information and the training data.
12. The method of claim 11, wherein the position coding information of the plurality of neurons is determined based on positions of the plurality of neurons in the model to be trained.
13. The method according to claim 11 or 12, wherein the obtaining the updated model and aggregating the updated model to obtain the trained model comprises:
acquiring a parameter information updating amount and a position coding information updating amount, wherein the parameter information updating amount and the position coding information updating amount are acquired based on the updated model and the model to be trained;
and updating the model to be trained based on the parameter information updating amount and the position coding information updating amount until model training conditions are met, so as to obtain a trained model.
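The aggregation step of claim 13 could look like the following sketch. The claims do not state how update amounts from several senders are combined, so a plain average is assumed here, and a fixed round limit stands in for the unspecified model training condition. The names `aggregate_round`, `train`, and `max_rounds`, as well as the dictionary layout, are hypothetical.

```python
from typing import Dict, List

Model = Dict[str, List[float]]


def aggregate_round(model: Model, client_updates: List[Model]) -> Model:
    """Apply the average of the received update amounts to the model to be trained."""
    n = len(client_updates)
    return {
        key: [
            value + sum(update[key][k] for update in client_updates) / n
            for k, value in enumerate(values)
        ]
        for key, values in model.items()
    }


def train(model: Model, rounds: List[List[Model]], max_rounds: int) -> Model:
    """Repeat aggregation until the (assumed) training condition, a round limit, is met."""
    for r, client_updates in enumerate(rounds):
        if r >= max_rounds:
            break
        model = aggregate_round(model, client_updates)
    return model


start = {"params": [1.0, 1.0], "pos_codes": [0.0]}
updates = [
    {"params": [1.0, 0.0], "pos_codes": [0.5]},
    {"params": [3.0, -2.0], "pos_codes": [-0.5]},
]
trained = train(start, [updates, updates], max_rounds=1)
```

Because both parameter and position code update amounts are averaged entry by entry, this sketch relies on the one-to-one correspondence between neurons and position coding information stated in claim 11.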
14. The method according to claim 11 or 12, wherein the obtaining the updated model and aggregating the updated model to obtain the trained model comprises:
acquiring a parameter information updating amount, wherein the parameter information updating amount is acquired based on the updated model and the model to be trained;
and updating the model to be trained based on the parameter information updating amount until model training conditions are met, and obtaining the trained model.
15. A model training apparatus, the apparatus comprising:
an acquisition module, configured to acquire a model to be trained, wherein the model to be trained comprises a plurality of neurons, the neurons are associated with parameter information and position coding information, and the neurons correspond to the position coding information one to one;
a processing module, configured to update the model to be trained through the parameter information, the position coding information and the training data to obtain an updated model;
and a sending module, configured to send the updated model.
16. A model training apparatus, the apparatus comprising:
a sending module, configured to send a model to be trained, wherein the model to be trained comprises a plurality of neurons, the neurons are associated with parameter information and position coding information, and the neurons correspond to the position coding information one to one;
an acquisition module, configured to acquire an updated model, wherein the updated model is obtained by updating the model to be trained based on the parameter information, the position coding information and the training data;
and an aggregation module, configured to aggregate the updated model to obtain a trained model.
17. A model training apparatus, comprising a memory and a processor;
the memory stores code, the processor is configured to execute the code, and when the code is executed, the model training apparatus performs the method of any one of claims 1 to 14.
18. A computer storage medium, characterized in that it stores a computer program which, when executed by a computer, causes the computer to carry out the method of any one of claims 1 to 14.
19. A computer program product having stored thereon instructions which, when executed by a computer, cause the computer to carry out the method of any one of claims 1 to 14.
CN202210304574.2A 2022-03-26 2022-03-26 Model training method and related equipment thereof Pending CN114841361A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210304574.2A CN114841361A (en) 2022-03-26 2022-03-26 Model training method and related equipment thereof
PCT/CN2023/082679 WO2023185541A1 (en) 2022-03-26 2023-03-21 Model training method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210304574.2A CN114841361A (en) 2022-03-26 2022-03-26 Model training method and related equipment thereof

Publications (1)

Publication Number Publication Date
CN114841361A true CN114841361A (en) 2022-08-02

Family

ID=82563158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210304574.2A Pending CN114841361A (en) 2022-03-26 2022-03-26 Model training method and related equipment thereof

Country Status (2)

Country Link
CN (1) CN114841361A (en)
WO (1) WO2023185541A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023185541A1 (en) * 2022-03-26 2023-10-05 Huawei Technologies Co., Ltd. Model training method and related device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288097B (en) * 2020-10-29 2024-04-02 Ping An Technology (Shenzhen) Co., Ltd. Federated learning data processing method, federated learning data processing device, computer equipment and storage medium
CN112396191B (en) * 2020-12-29 2021-05-11 Alipay (Hangzhou) Information Technology Co., Ltd. Method, system and device for updating model parameters based on federated learning
CN113159283B (en) * 2021-03-31 2023-03-31 Huawei Technologies Co., Ltd. Model training method based on federated transfer learning and computing node
CN113723619B (en) * 2021-08-31 2024-06-21 Nanjing University Federated learning training method based on training stage perception strategy
CN113989561B (en) * 2021-10-29 2024-04-16 Hohai University Parameter aggregation updating method, device and system based on asynchronous federated learning
CN114841361A (en) * 2022-03-26 2022-08-02 Huawei Technologies Co., Ltd. Model training method and related equipment thereof

Also Published As

Publication number Publication date
WO2023185541A1 (en) 2023-10-05

Similar Documents

Publication Publication Date Title
WO2022022274A1 (en) Model training method and apparatus
CN112651511A (en) Model training method, data processing method and device
CN113065633B (en) Model training method and associated equipment
CN115238909A (en) Data value evaluation method based on federated learning and related equipment thereof
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
WO2022111387A1 (en) Data processing method and related apparatus
CN113627422A (en) Image classification method and related equipment thereof
CN113627163A (en) Attention model, feature extraction method and related device
WO2023020185A1 (en) Image classification method and related device
CN113536970A (en) Training method of video classification model and related device
CN114169393A (en) Image classification method and related equipment thereof
WO2023185541A1 (en) Model training method and related device
CN113627421A (en) Image processing method, model training method and related equipment
CN116739154A (en) Fault prediction method and related equipment thereof
CN117056589A (en) Article recommendation method and related equipment thereof
CN116611861A (en) Consumption prediction method and related equipment thereof
WO2023045949A1 (en) Model training method and related device
CN115795025A (en) Abstract generation method and related equipment thereof
CN116309226A (en) Image processing method and related equipment thereof
CN116343004A (en) Image processing method and related equipment thereof
CN115623242A (en) Video processing method and related equipment thereof
CN117746047A (en) Image processing method and related equipment thereof
CN116259311A (en) Voice processing method and related equipment thereof
CN115907041A (en) Model training method and device
CN115565104A (en) Action prediction method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination